* [PATCH v2 0/6] sched/fair: Fix imbalance issue when balancing fork
@ 2025-07-17  6:20 Adam Li
  2025-07-17  6:20 ` [PATCH v2 1/6] sched/fair: Only update stats for allowed CPUs when looking for dst group Adam Li
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: Adam Li @ 2025-07-17  6:20 UTC (permalink / raw)
  To: mingo, peterz, juri.lelli, vincent.guittot
  Cc: dietmar.eggemann, rostedt, bsegall, mgorman, vschneid, cl,
	linux-kernel, patches, shkaushik, Adam Li

Load imbalance is observed when the workload frequently forks new threads.
Due to CPU affinity, the workload can run on CPUs 0-7 in the first group,
but only on CPUs 8-11 in the second group. CPUs 12-15 are always idle.

{ 0 1 2 3 4 5 6 7 } {8 9 10 11 12 13 14 15}
  * * * * * * * *    * * *  *

When looking for a dst group for newly forked threads, update_sg_wakeup_stats()
often reports that the second group has more idle CPUs than the first group.
The scheduler then considers the second group less busy and picks the least
busy CPU among CPUs 8-11. As a result, CPUs 8-11 can become crowded with newly
forked threads while CPUs 0-7 sit idle.
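
For reference, here is a minimal user-space sketch of this kind of workload.
It is not the actual test case: the CPU range, the loop counts and the assumed
sched group layout ({0-7} {8-15}) are only illustrative.

/* Hypothetical fork-heavy reproducer, affined to CPUs 0-11 only. */
#define _GNU_SOURCE
#include <sched.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
	cpu_set_t set;
	int cpu, i;

	/* Allow CPUs 0-11; CPUs 12-15 stay outside the affinity mask. */
	CPU_ZERO(&set);
	for (cpu = 0; cpu <= 11; cpu++)
		CPU_SET(cpu, &set);
	if (sched_setaffinity(0, sizeof(set), &set))
		return 1;

	/* Fork short-lived children frequently; they inherit the mask. */
	for (i = 0; i < 10000; i++) {
		if (fork() == 0) {
			volatile unsigned long x = 0;

			while (x < 50000000UL)	/* brief CPU burst */
				x++;
			_exit(0);
		}
		if (i % 8 == 7)		/* reap a batch of children */
			while (wait(NULL) > 0)
				;
	}
	while (wait(NULL) > 0)
		;
	return 0;
}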

The first patch, 'Only update stats for allowed CPUs when looking for dst
group', *alone* can fix this imbalance issue. With this patch, performance
improves significantly for workloads that fork frequently and are affined
to only part of the CPUs in a sched group.

I think the second patch also makes sense in this scenario: if the group
weight includes CPUs the task cannot use, group classification can be
incorrect.
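
As a simplified, self-contained illustration (a toy paraphrase, not the
actual kernel code): group classification compares the number of runnable
tasks against sgs->group_weight, so a weight that counts disallowed CPUs can
make a group that is already full on its allowed CPUs still look like it has
spare room.

#include <stdbool.h>
#include <stdio.h>

/* Toy model of the two fields that matter here. */
struct toy_sg_stats {
	unsigned int sum_nr_running;	/* runnable tasks in the group */
	unsigned int group_weight;	/* CPUs counted for the group */
};

/* Paraphrased "has spare capacity" style check: fewer tasks than CPUs. */
static bool looks_to_have_capacity(const struct toy_sg_stats *sgs)
{
	return sgs->sum_nr_running < sgs->group_weight;
}

int main(void)
{
	/* Second group from the example: span is CPUs 8-15, task may only use 8-11. */
	struct toy_sg_stats full_span    = { .sum_nr_running = 6, .group_weight = 8 };
	struct toy_sg_stats allowed_only = { .sum_nr_running = 6, .group_weight = 4 };

	/* Prints 1: with all 8 CPUs counted, the group still looks to have room. */
	printf("full span   : %d\n", looks_to_have_capacity(&full_span));
	/* Prints 0: counting only the 4 allowed CPUs shows it is already full. */
	printf("allowed only: %d\n", looks_to_have_capacity(&allowed_only));
	return 0;
}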

Peter mentioned [1] that the second patch might also apply to
update_sg_lb_stats(). The third patch therefore counts the group weight from
'env->cpus' (active CPUs), since group classification can be incorrect if
the group weight includes inactive CPUs.

Peter also mentioned that update_sg_wakeup_stats() and update_sg_lb_stats()
are very similar and might be unified. The RFC patches 4-6 refactor the two
functions, moving the common logic into a new function, update_sg_stats().

I tested with the SPECjbb workload on an arm64 server. The patch set does not
introduce any observable performance change, but this test cannot cover every
code path. Please review.

v2:
  Follow Peter's suggestions:
  1) Apply the second patch to update_sg_lb_stats().
  2) Refactor and unify update_sg_wakeup_stats() and update_sg_lb_stats().

v1:
  https://lore.kernel.org/lkml/20250701024549.40166-1-adamli@os.amperecomputing.com/

links:
[1]: https://lore.kernel.org/lkml/20250704091758.GG2001818@noisy.programming.kicks-ass.net/

Adam Li (6):
  sched/fair: Only update stats for allowed CPUs when looking for dst
    group
  sched/fair: Only count group weight for allowed CPUs when looking for
    dst group
  sched/fair: Only count group weight for CPUs doing load balance when
    looking for src group
  sched/fair: Make update_sg_wakeup_stats() helper functions handle NULL
    pointers
  sched/fair: Introduce update_sg_stats()
  sched/fair: Unify update_sg_lb_stats() and update_sg_wakeup_stats()

 kernel/sched/fair.c | 274 ++++++++++++++++++++++++--------------------
 1 file changed, 148 insertions(+), 126 deletions(-)

-- 
2.34.1



* [PATCH v2 1/6] sched/fair: Only update stats for allowed CPUs when looking for dst group
  2025-07-17  6:20 [PATCH v2 0/6] sched/fair: Fix imbalance issue when balancing fork Adam Li
@ 2025-07-17  6:20 ` Adam Li
  2025-07-17  6:20 ` [PATCH v2 2/6] sched/fair: Only count group weight " Adam Li
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Adam Li @ 2025-07-17  6:20 UTC (permalink / raw)
  To: mingo, peterz, juri.lelli, vincent.guittot
  Cc: dietmar.eggemann, rostedt, bsegall, mgorman, vschneid, cl,
	linux-kernel, patches, shkaushik, Adam Li

A task may not be able to use all the CPUs in a sched group due to CPU
affinity. Only update sched group statistics for the allowed CPUs.

Signed-off-by: Adam Li <adamli@os.amperecomputing.com>
---
 kernel/sched/fair.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7a14da5396fb..78a3d9b78e07 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -10693,7 +10693,7 @@ static inline void update_sg_wakeup_stats(struct sched_domain *sd,
 	if (sd->flags & SD_ASYM_CPUCAPACITY)
 		sgs->group_misfit_task_load = 1;
 
-	for_each_cpu(i, sched_group_span(group)) {
+	for_each_cpu_and(i, sched_group_span(group), p->cpus_ptr) {
 		struct rq *rq = cpu_rq(i);
 		unsigned int local;
 
-- 
2.34.1



* [PATCH v2 2/6] sched/fair: Only count group weight for allowed CPUs when looking for dst group
  2025-07-17  6:20 [PATCH v2 0/6] sched/fair: Fix imbalance issue when balancing fork Adam Li
  2025-07-17  6:20 ` [PATCH v2 1/6] sched/fair: Only update stats for allowed CPUs when looking for dst group Adam Li
@ 2025-07-17  6:20 ` Adam Li
  2025-07-17  6:20 ` [PATCH v2 3/6] sched/fair: Only count group weight for CPUs doing load balance when looking for src group Adam Li
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Adam Li @ 2025-07-17  6:20 UTC (permalink / raw)
  To: mingo, peterz, juri.lelli, vincent.guittot
  Cc: dietmar.eggemann, rostedt, bsegall, mgorman, vschneid, cl,
	linux-kernel, patches, shkaushik, Adam Li

A task may not be able to use all the CPUs in a sched group due to CPU
affinity. If the group weight includes CPUs the task is not allowed to run
on, group classification may be incorrect.

Signed-off-by: Adam Li <adamli@os.amperecomputing.com>
---
 kernel/sched/fair.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 78a3d9b78e07..452e2df961b9 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -10722,7 +10722,9 @@ static inline void update_sg_wakeup_stats(struct sched_domain *sd,
 
 	sgs->group_capacity = group->sgc->capacity;
 
-	sgs->group_weight = group->group_weight;
+	/* Only count group_weight if p can run on these cpus */
+	sgs->group_weight = cpumask_weight_and(sched_group_span(group),
+				p->cpus_ptr);
 
 	sgs->group_type = group_classify(sd->imbalance_pct, group, sgs);
 
-- 
2.34.1



* [PATCH v2 3/6] sched/fair: Only count group weight for CPUs doing load balance when looking for src group
  2025-07-17  6:20 [PATCH v2 0/6] sched/fair: Fix imbalance issue when balancing fork Adam Li
  2025-07-17  6:20 ` [PATCH v2 1/6] sched/fair: Only update stats for allowed CPUs when looking for dst group Adam Li
  2025-07-17  6:20 ` [PATCH v2 2/6] sched/fair: Only count group weight " Adam Li
@ 2025-07-17  6:20 ` Adam Li
  2025-07-17  6:20 ` [RFC PATCH v2 4/6] sched/fair: Make update_sg_wakeup_stats() helper functions handle NULL pointers Adam Li
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Adam Li @ 2025-07-17  6:20 UTC (permalink / raw)
  To: mingo, peterz, juri.lelli, vincent.guittot
  Cc: dietmar.eggemann, rostedt, bsegall, mgorman, vschneid, cl,
	linux-kernel, patches, shkaushik, Adam Li

Load balancing is limited to a set of CPUs, such as the active CPUs.
Group classification may be incorrect if the group weight counts inactive CPUs.

Signed-off-by: Adam Li <adamli@os.amperecomputing.com>
---
 kernel/sched/fair.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 452e2df961b9..db9ec6a6acdf 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -10427,7 +10427,7 @@ static inline void update_sg_lb_stats(struct lb_env *env,
 
 	sgs->group_capacity = group->sgc->capacity;
 
-	sgs->group_weight = group->group_weight;
+	sgs->group_weight = cpumask_weight_and(sched_group_span(group), env->cpus);
 
 	/* Check if dst CPU is idle and preferred to this group */
 	if (!local_group && env->idle && sgs->sum_h_nr_running &&
-- 
2.34.1



* [RFC PATCH v2 4/6] sched/fair: Make update_sg_wakeup_stats() helper functions handle NULL pointers
  2025-07-17  6:20 [PATCH v2 0/6] sched/fair: Fix imbalance issue when balancing fork Adam Li
                   ` (2 preceding siblings ...)
  2025-07-17  6:20 ` [PATCH v2 3/6] sched/fair: Only count group weight for CPUs doing load balance when looking for src group Adam Li
@ 2025-07-17  6:20 ` Adam Li
  2025-07-17  6:20 ` [RFC PATCH v2 5/6] sched/fair: Introduce update_sg_stats() Adam Li
  2025-07-17  6:20 ` [RFC PATCH v2 6/6] sched/fair: Unify update_sg_lb_stats() and update_sg_wakeup_stats() Adam Li
  5 siblings, 0 replies; 7+ messages in thread
From: Adam Li @ 2025-07-17  6:20 UTC (permalink / raw)
  To: mingo, peterz, juri.lelli, vincent.guittot
  Cc: dietmar.eggemann, rostedt, bsegall, mgorman, vschneid, cl,
	linux-kernel, patches, shkaushik, Adam Li

update_sg_wakeup_stats() uses a set of helper functions:
  cpu_load_without(struct task_struct *p),
  cpu_runnable_without(struct task_struct *p),
  cpu_util_without(struct task_struct *p),
  task_running_on_cpu(struct task_struct *p),
  idle_cpu_without(struct task_struct *p).

update_sg_lb_stats() uses similar helper functions, without the 'p'
argument: cpu_load(), cpu_runnable(), cpu_util_cfs(), idle_cpu().

Make the update_sg_wakeup_stats() helper functions handle the case where
'p == NULL', so that update_sg_lb_stats() can use the same helper functions.

This is the first step to unify update_sg_wakeup_stats() and
update_sg_lb_stats().

Signed-off-by: Adam Li <adamli@os.amperecomputing.com>
---
 kernel/sched/fair.c | 95 ++++++++++++++++++++++++---------------------
 1 file changed, 50 insertions(+), 45 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index db9ec6a6acdf..69dac5b337d8 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7250,7 +7250,8 @@ static unsigned long cpu_load_without(struct rq *rq, struct task_struct *p)
 	unsigned int load;
 
 	/* Task has no contribution or is new */
-	if (cpu_of(rq) != task_cpu(p) || !READ_ONCE(p->se.avg.last_update_time))
+	if (!p || cpu_of(rq) != task_cpu(p) ||
+	    !READ_ONCE(p->se.avg.last_update_time))
 		return cpu_load(rq);
 
 	cfs_rq = &rq->cfs;
@@ -7273,7 +7274,8 @@ static unsigned long cpu_runnable_without(struct rq *rq, struct task_struct *p)
 	unsigned int runnable;
 
 	/* Task has no contribution or is new */
-	if (cpu_of(rq) != task_cpu(p) || !READ_ONCE(p->se.avg.last_update_time))
+	if (!p || cpu_of(rq) != task_cpu(p) ||
+	    !READ_ONCE(p->se.avg.last_update_time))
 		return cpu_runnable(rq);
 
 	cfs_rq = &rq->cfs;
@@ -7285,6 +7287,51 @@ static unsigned long cpu_runnable_without(struct rq *rq, struct task_struct *p)
 	return runnable;
 }
 
+/*
+ * task_running_on_cpu - return 1 if @p is running on @cpu.
+ */
+
+static unsigned int task_running_on_cpu(int cpu, struct task_struct *p)
+{
+	/* Task has no contribution or is new */
+	if (!p || cpu != task_cpu(p) || !READ_ONCE(p->se.avg.last_update_time))
+		return 0;
+
+	if (task_on_rq_queued(p))
+		return 1;
+
+	return 0;
+}
+
+/**
+ * idle_cpu_without - would a given CPU be idle without p ?
+ * @cpu: the processor on which idleness is tested.
+ * @p: task which should be ignored.
+ *
+ * Return: 1 if the CPU would be idle. 0 otherwise.
+ */
+static int idle_cpu_without(int cpu, struct task_struct *p)
+{
+	struct rq *rq = cpu_rq(cpu);
+
+	if (!p)
+		return idle_cpu(cpu);
+
+	if (rq->curr != rq->idle && rq->curr != p)
+		return 0;
+
+	/*
+	 * rq->nr_running can't be used but an updated version without the
+	 * impact of p on cpu must be used instead. The updated nr_running
+	 * be computed and tested before calling idle_cpu_without().
+	 */
+
+	if (rq->ttwu_pending)
+		return 0;
+
+	return 1;
+}
+
 static unsigned long capacity_of(int cpu)
 {
 	return cpu_rq(cpu)->cpu_capacity;
@@ -8099,7 +8146,7 @@ unsigned long cpu_util_cfs_boost(int cpu)
 static unsigned long cpu_util_without(int cpu, struct task_struct *p)
 {
 	/* Task has no contribution or is new */
-	if (cpu != task_cpu(p) || !READ_ONCE(p->se.avg.last_update_time))
+	if (!p || cpu != task_cpu(p) || !READ_ONCE(p->se.avg.last_update_time))
 		p = NULL;
 
 	return cpu_util(cpu, p, -1, 0);
@@ -10631,48 +10678,6 @@ static inline enum fbq_type fbq_classify_rq(struct rq *rq)
 
 struct sg_lb_stats;
 
-/*
- * task_running_on_cpu - return 1 if @p is running on @cpu.
- */
-
-static unsigned int task_running_on_cpu(int cpu, struct task_struct *p)
-{
-	/* Task has no contribution or is new */
-	if (cpu != task_cpu(p) || !READ_ONCE(p->se.avg.last_update_time))
-		return 0;
-
-	if (task_on_rq_queued(p))
-		return 1;
-
-	return 0;
-}
-
-/**
- * idle_cpu_without - would a given CPU be idle without p ?
- * @cpu: the processor on which idleness is tested.
- * @p: task which should be ignored.
- *
- * Return: 1 if the CPU would be idle. 0 otherwise.
- */
-static int idle_cpu_without(int cpu, struct task_struct *p)
-{
-	struct rq *rq = cpu_rq(cpu);
-
-	if (rq->curr != rq->idle && rq->curr != p)
-		return 0;
-
-	/*
-	 * rq->nr_running can't be used but an updated version without the
-	 * impact of p on cpu must be used instead. The updated nr_running
-	 * be computed and tested before calling idle_cpu_without().
-	 */
-
-	if (rq->ttwu_pending)
-		return 0;
-
-	return 1;
-}
-
 /*
  * update_sg_wakeup_stats - Update sched_group's statistics for wakeup.
  * @sd: The sched_domain level to look for idlest group.
-- 
2.34.1



* [RFC PATCH v2 5/6] sched/fair: Introduce update_sg_stats()
  2025-07-17  6:20 [PATCH v2 0/6] sched/fair: Fix imbalance issue when balancing fork Adam Li
                   ` (3 preceding siblings ...)
  2025-07-17  6:20 ` [RFC PATCH v2 4/6] sched/fair: Make update_sg_wakeup_stats() helper functions handle NULL pointers Adam Li
@ 2025-07-17  6:20 ` Adam Li
  2025-07-17  6:20 ` [RFC PATCH v2 6/6] sched/fair: Unify update_sg_lb_stats() and update_sg_wakeup_stats() Adam Li
  5 siblings, 0 replies; 7+ messages in thread
From: Adam Li @ 2025-07-17  6:20 UTC (permalink / raw)
  To: mingo, peterz, juri.lelli, vincent.guittot
  Cc: dietmar.eggemann, rostedt, bsegall, mgorman, vschneid, cl,
	linux-kernel, patches, shkaushik, Adam Li

Unify the common logic in update_sg_lb_stats() and update_sg_wakeup_stats()
into a new function, update_sg_stats().

Signed-off-by: Adam Li <adamli@os.amperecomputing.com>
---
 kernel/sched/fair.c | 115 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 115 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 69dac5b337d8..f4ab520951a8 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -10398,6 +10398,121 @@ sched_reduced_capacity(struct rq *rq, struct sched_domain *sd)
 	return check_cpu_capacity(rq, sd);
 }
 
+struct sg_lb_stat_env {
+	/* true: find src group, false: find dst group */
+	bool			find_src_sg;
+	struct cpumask		*cpus;
+	struct sched_domain	*sd;
+	struct task_struct	*p;
+	bool			*sg_overloaded;
+	bool			*sg_overutilized;
+	int			local_group;
+	struct lb_env		*lb_env;
+};
+
+static inline void update_sg_stats(struct sg_lb_stats *sgs,
+				   struct sched_group *group,
+				   struct sg_lb_stat_env *env)
+{
+	bool find_src_sg = env->find_src_sg;
+	int i, sd_flags = env->sd->flags;
+	bool balancing_at_rd = !env->sd->parent;
+	struct task_struct *p = env->p;
+	enum cpu_idle_type idle;
+
+	if (env->lb_env)
+		idle = env->lb_env->idle;
+
+	for_each_cpu_and(i, sched_group_span(group), env->cpus) {
+		struct rq *rq = cpu_rq(i);
+		unsigned int local, load = cpu_load_without(rq, p);
+		int nr_running;
+
+		sgs->group_load += load;
+		sgs->group_util += cpu_util_without(i, p);
+		sgs->group_runnable += cpu_runnable_without(rq, p);
+		local = task_running_on_cpu(i, p);
+		sgs->sum_h_nr_running += rq->cfs.h_nr_runnable - local;
+
+		nr_running = rq->nr_running - local;
+		sgs->sum_nr_running += nr_running;
+
+		if (find_src_sg && cpu_overutilized(i))
+			*env->sg_overutilized = 1;
+
+		/*
+		 * No need to call idle_cpu_without() if nr_running is not 0
+		 */
+		if (!nr_running && idle_cpu_without(i, p)) {
+			sgs->idle_cpus++;
+			/* Idle cpu can't have misfit task */
+			continue;
+		}
+
+		if (!find_src_sg) {
+			/* Check if task fits in the CPU */
+			if (sd_flags & SD_ASYM_CPUCAPACITY &&
+			    sgs->group_misfit_task_load &&
+			    task_fits_cpu(p, i))
+				sgs->group_misfit_task_load = 0;
+
+			/* We are done when looking for the dst (idlest) group */
+			continue;
+		}
+
+		/* Overload indicator is only updated at root domain */
+		if (balancing_at_rd && nr_running > 1)
+			*env->sg_overloaded = 1;
+
+#ifdef CONFIG_NUMA_BALANCING
+		/* Only fbq_classify_group() uses this to classify NUMA groups */
+		if (sd_flags & SD_NUMA) {
+			sgs->nr_numa_running += rq->nr_numa_running;
+			sgs->nr_preferred_running += rq->nr_preferred_running;
+		}
+#endif
+		if (env->local_group)
+			continue;
+
+		if (sd_flags & SD_ASYM_CPUCAPACITY) {
+			/* Check for a misfit task on the cpu */
+			if (sgs->group_misfit_task_load < rq->misfit_task_load) {
+				sgs->group_misfit_task_load = rq->misfit_task_load;
+				*env->sg_overloaded = 1;
+			}
+		} else if (idle && sched_reduced_capacity(rq, env->sd)) {
+			/* Check for a task running on a CPU with reduced capacity */
+			if (sgs->group_misfit_task_load < load)
+				sgs->group_misfit_task_load = load;
+		}
+	}
+
+	sgs->group_capacity = group->sgc->capacity;
+
+	/* Only count group_weight for allowed cpus */
+	sgs->group_weight = cpumask_weight_and(sched_group_span(group), env->cpus);
+
+	/* Check if dst CPU is idle and preferred to this group */
+	if (find_src_sg && !env->local_group && idle && sgs->sum_h_nr_running &&
+	    sched_group_asym(env->lb_env, sgs, group))
+		sgs->group_asym_packing = 1;
+
+	/* Check for loaded SMT group to be balanced to dst CPU */
+	if (find_src_sg && !env->local_group && smt_balance(env->lb_env, sgs, group))
+		sgs->group_smt_balance = 1;
+
+	sgs->group_type = group_classify(env->sd->imbalance_pct, group, sgs);
+
+	/*
+	 * Computing avg_load makes sense only when group is fully busy or
+	 * overloaded
+	 */
+	if (sgs->group_type == group_fully_busy ||
+		sgs->group_type == group_overloaded)
+		sgs->avg_load = (sgs->group_load * SCHED_CAPACITY_SCALE) /
+				sgs->group_capacity;
+}
+
 /**
  * update_sg_lb_stats - Update sched_group's statistics for load balancing.
  * @env: The load balancing environment.
-- 
2.34.1



* [RFC PATCH v2 6/6] sched/fair: Unify update_sg_lb_stats() and update_sg_wakeup_stats()
  2025-07-17  6:20 [PATCH v2 0/6] sched/fair: Fix imbalance issue when balancing fork Adam Li
                   ` (4 preceding siblings ...)
  2025-07-17  6:20 ` [RFC PATCH v2 5/6] sched/fair: Introduce update_sg_stats() Adam Li
@ 2025-07-17  6:20 ` Adam Li
  5 siblings, 0 replies; 7+ messages in thread
From: Adam Li @ 2025-07-17  6:20 UTC (permalink / raw)
  To: mingo, peterz, juri.lelli, vincent.guittot
  Cc: dietmar.eggemann, rostedt, bsegall, mgorman, vschneid, cl,
	linux-kernel, patches, shkaushik, Adam Li

The two functions now call the common function update_sg_stats(), each with
its own context.

Signed-off-by: Adam Li <adamli@os.amperecomputing.com>
---
 kernel/sched/fair.c | 136 ++++++--------------------------------------
 1 file changed, 18 insertions(+), 118 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index f4ab520951a8..96a2ca4fa880 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -10529,83 +10529,20 @@ static inline void update_sg_lb_stats(struct lb_env *env,
 				      bool *sg_overloaded,
 				      bool *sg_overutilized)
 {
-	int i, nr_running, local_group, sd_flags = env->sd->flags;
-	bool balancing_at_rd = !env->sd->parent;
+	struct sg_lb_stat_env stat_env = {
+		.find_src_sg		= true,
+		.cpus			= env->cpus,
+		.sd			= env->sd,
+		.p			= NULL,
+		.sg_overutilized	= sg_overutilized,
+		.sg_overloaded		= sg_overloaded,
+		.local_group		= group == sds->local,
+		.lb_env			= env,
+	};
 
 	memset(sgs, 0, sizeof(*sgs));
 
-	local_group = group == sds->local;
-
-	for_each_cpu_and(i, sched_group_span(group), env->cpus) {
-		struct rq *rq = cpu_rq(i);
-		unsigned long load = cpu_load(rq);
-
-		sgs->group_load += load;
-		sgs->group_util += cpu_util_cfs(i);
-		sgs->group_runnable += cpu_runnable(rq);
-		sgs->sum_h_nr_running += rq->cfs.h_nr_runnable;
-
-		nr_running = rq->nr_running;
-		sgs->sum_nr_running += nr_running;
-
-		if (cpu_overutilized(i))
-			*sg_overutilized = 1;
-
-		/*
-		 * No need to call idle_cpu() if nr_running is not 0
-		 */
-		if (!nr_running && idle_cpu(i)) {
-			sgs->idle_cpus++;
-			/* Idle cpu can't have misfit task */
-			continue;
-		}
-
-		/* Overload indicator is only updated at root domain */
-		if (balancing_at_rd && nr_running > 1)
-			*sg_overloaded = 1;
-
-#ifdef CONFIG_NUMA_BALANCING
-		/* Only fbq_classify_group() uses this to classify NUMA groups */
-		if (sd_flags & SD_NUMA) {
-			sgs->nr_numa_running += rq->nr_numa_running;
-			sgs->nr_preferred_running += rq->nr_preferred_running;
-		}
-#endif
-		if (local_group)
-			continue;
-
-		if (sd_flags & SD_ASYM_CPUCAPACITY) {
-			/* Check for a misfit task on the cpu */
-			if (sgs->group_misfit_task_load < rq->misfit_task_load) {
-				sgs->group_misfit_task_load = rq->misfit_task_load;
-				*sg_overloaded = 1;
-			}
-		} else if (env->idle && sched_reduced_capacity(rq, env->sd)) {
-			/* Check for a task running on a CPU with reduced capacity */
-			if (sgs->group_misfit_task_load < load)
-				sgs->group_misfit_task_load = load;
-		}
-	}
-
-	sgs->group_capacity = group->sgc->capacity;
-
-	sgs->group_weight = cpumask_weight_and(sched_group_span(group), env->cpus);
-
-	/* Check if dst CPU is idle and preferred to this group */
-	if (!local_group && env->idle && sgs->sum_h_nr_running &&
-	    sched_group_asym(env, sgs, group))
-		sgs->group_asym_packing = 1;
-
-	/* Check for loaded SMT group to be balanced to dst CPU */
-	if (!local_group && smt_balance(env, sgs, group))
-		sgs->group_smt_balance = 1;
-
-	sgs->group_type = group_classify(env->sd->imbalance_pct, group, sgs);
-
-	/* Computing avg_load makes sense only when group is overloaded */
-	if (sgs->group_type == group_overloaded)
-		sgs->avg_load = (sgs->group_load * SCHED_CAPACITY_SCALE) /
-				sgs->group_capacity;
+	update_sg_stats(sgs, group, &stat_env);
 }
 
 /**
@@ -10805,7 +10742,12 @@ static inline void update_sg_wakeup_stats(struct sched_domain *sd,
 					  struct sg_lb_stats *sgs,
 					  struct task_struct *p)
 {
-	int i, nr_running;
+	struct sg_lb_stat_env stat_env = {
+		.find_src_sg	= false,
+		.cpus		= (struct cpumask *)p->cpus_ptr,
+		.sd		= sd,
+		.p		= p,
+	};
 
 	memset(sgs, 0, sizeof(*sgs));
 
@@ -10813,49 +10755,7 @@ static inline void update_sg_wakeup_stats(struct sched_domain *sd,
 	if (sd->flags & SD_ASYM_CPUCAPACITY)
 		sgs->group_misfit_task_load = 1;
 
-	for_each_cpu_and(i, sched_group_span(group), p->cpus_ptr) {
-		struct rq *rq = cpu_rq(i);
-		unsigned int local;
-
-		sgs->group_load += cpu_load_without(rq, p);
-		sgs->group_util += cpu_util_without(i, p);
-		sgs->group_runnable += cpu_runnable_without(rq, p);
-		local = task_running_on_cpu(i, p);
-		sgs->sum_h_nr_running += rq->cfs.h_nr_runnable - local;
-
-		nr_running = rq->nr_running - local;
-		sgs->sum_nr_running += nr_running;
-
-		/*
-		 * No need to call idle_cpu_without() if nr_running is not 0
-		 */
-		if (!nr_running && idle_cpu_without(i, p))
-			sgs->idle_cpus++;
-
-		/* Check if task fits in the CPU */
-		if (sd->flags & SD_ASYM_CPUCAPACITY &&
-		    sgs->group_misfit_task_load &&
-		    task_fits_cpu(p, i))
-			sgs->group_misfit_task_load = 0;
-
-	}
-
-	sgs->group_capacity = group->sgc->capacity;
-
-	/* Only count group_weight if p can run on these cpus */
-	sgs->group_weight = cpumask_weight_and(sched_group_span(group),
-				p->cpus_ptr);
-
-	sgs->group_type = group_classify(sd->imbalance_pct, group, sgs);
-
-	/*
-	 * Computing avg_load makes sense only when group is fully busy or
-	 * overloaded
-	 */
-	if (sgs->group_type == group_fully_busy ||
-		sgs->group_type == group_overloaded)
-		sgs->avg_load = (sgs->group_load * SCHED_CAPACITY_SCALE) /
-				sgs->group_capacity;
+	update_sg_stats(sgs, group, &stat_env);
 }
 
 static bool update_pick_idlest(struct sched_group *idlest,
-- 
2.34.1


