* Re: [PATCH v2 1/4] sched/fair: Check CPU capacity before comparing group types during load balance
  [not found] ` <20260429-rneri-fix-cas-clusters-v2-1-cd787de35cc6@linux.intel.com>
@ 2026-05-06 10:38 ` Christian Loehle
  2026-05-06 23:45   ` Ricardo Neri
  0 siblings, 1 reply; 6+ messages in thread

From: Christian Loehle @ 2026-05-06 10:38 UTC (permalink / raw)
  To: Ricardo Neri, Ingo Molnar, Peter Zijlstra, Juri Lelli,
      Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
      Mel Gorman, Valentin Schneider, Tim C Chen, Chen Yu, Barry Song
  Cc: Rafael J. Wysocki, Len Brown, ricardo.neri, linux-kernel, Andrea Righi

On 4/29/26 22:19, Ricardo Neri wrote:
> update_sd_pick_busiest() may incorrectly select a fully_busy group as the
> busiest group when its per-CPU capacity exceeds that of the destination
> CPU. This happens because the type of busiest group is initialized to
> group_has_spare and allows the fully_busy group to win the type comparison.
>
> update_sd_pick_busiest() should not choose a candidate scheduling group
> with at most one runnable task if its per-CPU capacity is greater than that
> of the destination CPU. Such a check already exists, but it is done too
> late: after the type comparison, preventing a subsequent fully_busy group
> of equal per-CPU capacity from being correctly selected.
>
> Move this check to occur before comparing group types.
>
> Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> ---
> Changes since v1:
> * Added a note clarifying that SMT and SD_ASYM_CPUCAPACITY are mutually
>   exclusive. (Tim)
> * Kept parentheses around bitwise operators for clarity.
> * Rewrote patch description for clarity.
> ---
> kernel/sched/fair.c | 25 ++++++++++++++-----------
> 1 file changed, 14 insertions(+), 11 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 728965851842..0dbed82aa63f 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -10788,6 +10788,20 @@ static bool update_sd_pick_busiest(struct lb_env *env,
>  		    sds->local_stat.group_type != group_has_spare))
>  			return false;
>  
> +	/*
> +	 * Candidate sg has no more than one task per CPU and has higher
> +	 * per-CPU capacity. Migrating tasks to less capable CPUs may harm
> +	 * throughput. Maximize throughput, power/energy consequences are not
> +	 * considered.
> +	 *
> +	 * Systems with SMT are unaffected, as asymmetric capacity is not set
> +	 * in such case.
> +	 */
> +	if ((env->sd->flags & SD_ASYM_CPUCAPACITY) &&
> +	    (sgs->group_type <= group_fully_busy) &&
> +	    (capacity_greater(sg->sgc->min_capacity, capacity_of(env->dst_cpu))))
> +		return false;
> +
>  	if (sgs->group_type > busiest->group_type)
>  		return true;
>  
> @@ -10890,17 +10904,6 @@ static bool update_sd_pick_busiest(struct lb_env *env,
>  		break;
>  	}
>  
> -	/*
> -	 * Candidate sg has no more than one task per CPU and has higher
> -	 * per-CPU capacity. Migrating tasks to less capable CPUs may harm
> -	 * throughput. Maximize throughput, power/energy consequences are not
> -	 * considered.
> -	 */
> -	if ((env->sd->flags & SD_ASYM_CPUCAPACITY) &&
> -	    (sgs->group_type <= group_fully_busy) &&
> -	    (capacity_greater(sg->sgc->min_capacity, capacity_of(env->dst_cpu))))
> -		return false;
> -
>  	return true;
>  }
>  

I think it deserves a Fixes, but nonetheless:
Reviewed-by: Christian Loehle <christian.loehle@arm.com>

I've CCed Andrea, just because of this SMT -> !SD_ASYM_CPUCAPACITY currently
being up for debate...

^ permalink raw reply	[flat|nested] 6+ messages in thread
* Re: [PATCH v2 1/4] sched/fair: Check CPU capacity before comparing group types during load balance
  2026-05-06 10:38 ` [PATCH v2 1/4] sched/fair: Check CPU capacity before comparing group types during load balance Christian Loehle
@ 2026-05-06 23:45   ` Ricardo Neri
  0 siblings, 0 replies; 6+ messages in thread

From: Ricardo Neri @ 2026-05-06 23:45 UTC (permalink / raw)
  To: Christian Loehle
  Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
      Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
      Valentin Schneider, Tim C Chen, Chen Yu, Barry Song,
      Rafael J. Wysocki, Len Brown, ricardo.neri, linux-kernel, Andrea Righi

On Wed, May 06, 2026 at 11:38:31AM +0100, Christian Loehle wrote:
> On 4/29/26 22:19, Ricardo Neri wrote:
> > update_sd_pick_busiest() may incorrectly select a fully_busy group as the
> > busiest group when its per-CPU capacity exceeds that of the destination
> > CPU. This happens because the type of busiest group is initialized to
> > group_has_spare and allows the fully_busy group to win the type comparison.
> >
> > update_sd_pick_busiest() should not choose a candidate scheduling group
> > with at most one runnable task if its per-CPU capacity is greater than that
> > of the destination CPU. Such a check already exists, but it is done too
> > late: after the type comparison, preventing a subsequent fully_busy group
> > of equal per-CPU capacity from being correctly selected.
> >
> > Move this check to occur before comparing group types.
> >
> > Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> > ---
> > Changes since v1:
> > * Added a note clarifying that SMT and SD_ASYM_CPUCAPACITY are mutually
> >   exclusive. (Tim)
> > * Kept parentheses around bitwise operators for clarity.
> > * Rewrote patch description for clarity.
> > ---
> > kernel/sched/fair.c | 25 ++++++++++++++-----------
> > 1 file changed, 14 insertions(+), 11 deletions(-)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 728965851842..0dbed82aa63f 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -10788,6 +10788,20 @@ static bool update_sd_pick_busiest(struct lb_env *env,
> >  		    sds->local_stat.group_type != group_has_spare))
> >  			return false;
> >  
> > +	/*
> > +	 * Candidate sg has no more than one task per CPU and has higher
> > +	 * per-CPU capacity. Migrating tasks to less capable CPUs may harm
> > +	 * throughput. Maximize throughput, power/energy consequences are not
> > +	 * considered.
> > +	 *
> > +	 * Systems with SMT are unaffected, as asymmetric capacity is not set
> > +	 * in such case.
> > +	 */
> > +	if ((env->sd->flags & SD_ASYM_CPUCAPACITY) &&
> > +	    (sgs->group_type <= group_fully_busy) &&
> > +	    (capacity_greater(sg->sgc->min_capacity, capacity_of(env->dst_cpu))))
> > +		return false;
> > +
> >  	if (sgs->group_type > busiest->group_type)
> >  		return true;
> >  
> > @@ -10890,17 +10904,6 @@ static bool update_sd_pick_busiest(struct lb_env *env,
> >  		break;
> >  	}
> >  
> > -	/*
> > -	 * Candidate sg has no more than one task per CPU and has higher
> > -	 * per-CPU capacity. Migrating tasks to less capable CPUs may harm
> > -	 * throughput. Maximize throughput, power/energy consequences are not
> > -	 * considered.
> > -	 */
> > -	if ((env->sd->flags & SD_ASYM_CPUCAPACITY) &&
> > -	    (sgs->group_type <= group_fully_busy) &&
> > -	    (capacity_greater(sg->sgc->min_capacity, capacity_of(env->dst_cpu))))
> > -		return false;
> > -
> >  	return true;
> >  }
> >  
>
> I think it deserves a Fixes, but nonetheless:

I will add this tag.

> Reviewed-by: Christian Loehle <christian.loehle@arm.com>

Thank you!

>
> I've CCed Andrea, just because of this SMT -> !SD_ASYM_CPUCAPACITY currently
> being up for debate...

Ah, I missed that patchset. I will take a look. For now I will leave the
comment. It can always be updated later.
[parent not found: <20260429-rneri-fix-cas-clusters-v2-2-cd787de35cc6@linux.intel.com>]
* Re: [PATCH v2 2/4] sched/fair: Skip misfit load accounting when the destination CPU cannot help
  [not found] ` <20260429-rneri-fix-cas-clusters-v2-2-cd787de35cc6@linux.intel.com>
@ 2026-05-06 11:39 ` Christian Loehle
  2026-05-06 23:47   ` Ricardo Neri
  0 siblings, 1 reply; 6+ messages in thread

From: Christian Loehle @ 2026-05-06 11:39 UTC (permalink / raw)
  To: Ricardo Neri, Ingo Molnar, Peter Zijlstra, Juri Lelli,
      Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
      Mel Gorman, Valentin Schneider, Tim C Chen, Chen Yu, Barry Song
  Cc: Rafael J. Wysocki, Len Brown, ricardo.neri, linux-kernel

On 4/29/26 22:19, Ricardo Neri wrote:
> In domains with asymmetric capacity, identifying misfit load in a
> scheduling group is not useful when the destination CPU cannot help (i.e.,
> its capacity exceeds the group's maximum CPU capacity by less than ~5%). In
> such cases, it also prevents load balance among clusters of equal capacity
> when CONFIG_SCHED_CLUSTER is enabled. This happens because
> update_sd_pick_busiest() skips candidate groups of type misfit_task if the
> destination CPU has similar capacity.
>
> Skipping misfit load accounting in this situation allows the group to be
> classified as has_spare or fully_busy and lets load balancing proceed. Keep
> marking scheduling groups as overloaded when misfit tasks are present. This
> flag propagates to the root domain and allows bigger CPUs in it to help
> via newly idle balance.
>
> Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> ---
> Changes since v1:
> * Moved the check of the destination CPU capacity inside the code block
>   used for SD_ASYM_CPUCAPACITY. v1 inadvertently broke the mutual
>   exclusion of the sched_reduced_capacity() path.
> * Keep marking the root domain as overloaded to allow bigger CPUs to
>   help. (sashiko)
> * Fixed patch description to clarify that capacity_greater() looks at
>   differences of 5% or more. (Christian)
> * Reworded the patch description for clarity.
> * I did not include the Reviewed-by tag from Christian since the patch
>   changed functionally.
> ---
> kernel/sched/fair.c | 20 +++++++++++++++++---
> 1 file changed, 17 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 0dbed82aa63f..166a5b109e0e 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -10719,10 +10719,24 @@ static inline void update_sg_lb_stats(struct lb_env *env,
>  			continue;
>  
>  		if (sd_flags & SD_ASYM_CPUCAPACITY) {
> -			/* Check for a misfit task on the cpu */
> -			if (sgs->group_misfit_task_load < rq->misfit_task_load) {
> -				sgs->group_misfit_task_load = rq->misfit_task_load;
> +			if (rq->misfit_task_load) {
> +				/*
> +				 * Always mark the domain overloaded so big CPUs
> +				 * can pick up misfit tasks via newly idle
> +				 * balance.
> +				 */
>  				*sg_overloaded = 1;
> +
> +				/*
> +				 * Only account misfit load if @dst_cpu can
> +				 * help, otherwise the group may be classified
> +				 * as misfit_task and update_sd_pick_busiest()
> +				 * will skip it.
> +				 */
> +				if (capacity_greater(capacity_of(env->dst_cpu),
> +						     group->sgc->max_capacity) &&
> +				    (sgs->group_misfit_task_load < rq->misfit_task_load))
> +					sgs->group_misfit_task_load = rq->misfit_task_load;
>  			}
>  		} else if (env->idle && sched_reduced_capacity(rq, env->sd)) {
>  			/* Check for a task running on a CPU with reduced capacity */
>

Reviewed-by: Christian Loehle <christian.loehle@arm.com>
* Re: [PATCH v2 2/4] sched/fair: Skip misfit load accounting when the destination CPU cannot help
  2026-05-06 11:39 ` [PATCH v2 2/4] sched/fair: Skip misfit load accounting when the destination CPU cannot help Christian Loehle
@ 2026-05-06 23:47   ` Ricardo Neri
  0 siblings, 0 replies; 6+ messages in thread

From: Ricardo Neri @ 2026-05-06 23:47 UTC (permalink / raw)
  To: Christian Loehle
  Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
      Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
      Valentin Schneider, Tim C Chen, Chen Yu, Barry Song,
      Rafael J. Wysocki, Len Brown, ricardo.neri, linux-kernel

On Wed, May 06, 2026 at 12:39:23PM +0100, Christian Loehle wrote:
> On 4/29/26 22:19, Ricardo Neri wrote:
> > In domains with asymmetric capacity, identifying misfit load in a
> > scheduling group is not useful when the destination CPU cannot help (i.e.,
> > its capacity exceeds the group's maximum CPU capacity by less than ~5%). In
> > such cases, it also prevents load balance among clusters of equal capacity
> > when CONFIG_SCHED_CLUSTER is enabled. This happens because
> > update_sd_pick_busiest() skips candidate groups of type misfit_task if the
> > destination CPU has similar capacity.
> >
> > Skipping misfit load accounting in this situation allows the group to be
> > classified as has_spare or fully_busy and lets load balancing proceed. Keep
> > marking scheduling groups as overloaded when misfit tasks are present. This
> > flag propagates to the root domain and allows bigger CPUs in it to help
> > via newly idle balance.
> >
> > Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> > ---
> > Changes since v1:
> > * Moved the check of the destination CPU capacity inside the code block
> >   used for SD_ASYM_CPUCAPACITY. v1 inadvertently broke the mutual
> >   exclusion of the sched_reduced_capacity() path.
> > * Keep marking the root domain as overloaded to allow bigger CPUs to
> >   help. (sashiko)
> > * Fixed patch description to clarify that capacity_greater() looks at
> >   differences of 5% or more. (Christian)
> > * Reworded the patch description for clarity.
> > * I did not include the Reviewed-by tag from Christian since the patch
> >   changed functionally.
> > ---
> > kernel/sched/fair.c | 20 +++++++++++++++++---
> > 1 file changed, 17 insertions(+), 3 deletions(-)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 0dbed82aa63f..166a5b109e0e 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -10719,10 +10719,24 @@ static inline void update_sg_lb_stats(struct lb_env *env,
> >  			continue;
> >  
> >  		if (sd_flags & SD_ASYM_CPUCAPACITY) {
> > -			/* Check for a misfit task on the cpu */
> > -			if (sgs->group_misfit_task_load < rq->misfit_task_load) {
> > -				sgs->group_misfit_task_load = rq->misfit_task_load;
> > +			if (rq->misfit_task_load) {
> > +				/*
> > +				 * Always mark the domain overloaded so big CPUs
> > +				 * can pick up misfit tasks via newly idle
> > +				 * balance.
> > +				 */
> >  				*sg_overloaded = 1;
> > +
> > +				/*
> > +				 * Only account misfit load if @dst_cpu can
> > +				 * help, otherwise the group may be classified
> > +				 * as misfit_task and update_sd_pick_busiest()
> > +				 * will skip it.
> > +				 */
> > +				if (capacity_greater(capacity_of(env->dst_cpu),
> > +						     group->sgc->max_capacity) &&
> > +				    (sgs->group_misfit_task_load < rq->misfit_task_load))
> > +					sgs->group_misfit_task_load = rq->misfit_task_load;
> >  			}
> >  		} else if (env->idle && sched_reduced_capacity(rq, env->sd)) {
> >  			/* Check for a task running on a CPU with reduced capacity */
> >
>
> Reviewed-by: Christian Loehle <christian.loehle@arm.com>

Thank you for your review!
[parent not found: <20260429-rneri-fix-cas-clusters-v2-3-cd787de35cc6@linux.intel.com>]
* Re: [PATCH v2 3/4] sched/fair: Allow load balancing between CPUs of identical capacity
  [not found] ` <20260429-rneri-fix-cas-clusters-v2-3-cd787de35cc6@linux.intel.com>
@ 2026-05-06 13:10 ` Christian Loehle
  2026-05-08 12:52   ` Ricardo Neri
  0 siblings, 1 reply; 6+ messages in thread

From: Christian Loehle @ 2026-05-06 13:10 UTC (permalink / raw)
  To: Ricardo Neri, Ingo Molnar, Peter Zijlstra, Juri Lelli,
      Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
      Mel Gorman, Valentin Schneider, Tim C Chen, Chen Yu, Barry Song
  Cc: Rafael J. Wysocki, Len Brown, ricardo.neri, linux-kernel

On 4/29/26 22:19, Ricardo Neri wrote:
> sched_balance_find_src_rq() avoids selecting a runqueue with a single
> running task as busiest if doing so results in migrating the task to a
> CPU with less than ~5% of extra capacity. It also unintentionally
> prevents migrations between CPUs of identical capacity.
>
> When CONFIG_SCHED_CLUSTER is enabled, load should be balanced across
> clusters of CPUs with the same capacity. Allowing migration between CPUs
> of identical capacity is necessary to meet this goal.
>
> We are interested in the architectural capacity of the involved CPUs,
> excluding any reductions due to side activity or thermal pressure. Use
> arch_scale_cpu_capacity().
>
> While here, invert the check for runtime capacity for clarity.
>
> Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> ---
> Changes since v1:
> * Used arch_scale_cpu_capacity() instead of capacity_of() to ignore
>   runtime variability.
> * Inverted the check for runtime capacity. (Christian)
> * Reworded patch description for clarity.
> ---
> kernel/sched/fair.c | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 166a5b109e0e..4105717e64fe 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -11816,9 +11816,14 @@ static struct rq *sched_balance_find_src_rq(struct lb_env *env,
>  		 * eventually lead to active_balancing high->low capacity.
>  		 * Higher per-CPU capacity is considered better than balancing
>  		 * average load.
> +		 *
> +		 * Cluster scheduling requires balancing load across clusters
> +		 * of identical capacity. Use architectural capacity to ignore
> +		 * runtime variability.
>  		 */
>  		if (env->sd->flags & SD_ASYM_CPUCAPACITY &&
> -		    !capacity_greater(capacity_of(env->dst_cpu), capacity) &&
> +		    arch_scale_cpu_capacity(env->dst_cpu) != arch_scale_cpu_capacity(i) &&
> +		    capacity_greater(capacity, capacity_of(env->dst_cpu)) &&
>  		    nr_running == 1)
>  			continue;
>

I wonder if we shouldn't use the capacity_greater() margin for both, i.e.
capacity_greater(arch_scale_cpu_capacity(i), arch_scale_cpu_capacity(env->dst_cpu)) &&

For example, the Orion O6 has a cluster with 1024 and one with 984. If we allow
balancing 984->984, I think it's only consistent to also allow 984->1024.
* Re: [PATCH v2 3/4] sched/fair: Allow load balancing between CPUs of identical capacity
  2026-05-06 13:10 ` [PATCH v2 3/4] sched/fair: Allow load balancing between CPUs of identical capacity Christian Loehle
@ 2026-05-08 12:52   ` Ricardo Neri
  0 siblings, 0 replies; 6+ messages in thread

From: Ricardo Neri @ 2026-05-08 12:52 UTC (permalink / raw)
  To: Christian Loehle
  Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
      Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
      Valentin Schneider, Tim C Chen, Chen Yu, Barry Song,
      Rafael J. Wysocki, Len Brown, ricardo.neri, linux-kernel

On Wed, May 06, 2026 at 02:10:22PM +0100, Christian Loehle wrote:
> On 4/29/26 22:19, Ricardo Neri wrote:
> > sched_balance_find_src_rq() avoids selecting a runqueue with a single
> > running task as busiest if doing so results in migrating the task to a
> > CPU with less than ~5% of extra capacity. It also unintentionally
> > prevents migrations between CPUs of identical capacity.
> >
> > When CONFIG_SCHED_CLUSTER is enabled, load should be balanced across
> > clusters of CPUs with the same capacity. Allowing migration between CPUs
> > of identical capacity is necessary to meet this goal.
> >
> > We are interested in the architectural capacity of the involved CPUs,
> > excluding any reductions due to side activity or thermal pressure. Use
> > arch_scale_cpu_capacity().
> >
> > While here, invert the check for runtime capacity for clarity.
> >
> > Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> > ---
> > Changes since v1:
> > * Used arch_scale_cpu_capacity() instead of capacity_of() to ignore
> >   runtime variability.
> > * Inverted the check for runtime capacity. (Christian)
> > * Reworded patch description for clarity.
> > ---
> > kernel/sched/fair.c | 7 ++++++-
> > 1 file changed, 6 insertions(+), 1 deletion(-)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 166a5b109e0e..4105717e64fe 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -11816,9 +11816,14 @@ static struct rq *sched_balance_find_src_rq(struct lb_env *env,
> >  		 * eventually lead to active_balancing high->low capacity.
> >  		 * Higher per-CPU capacity is considered better than balancing
> >  		 * average load.
> > +		 *
> > +		 * Cluster scheduling requires balancing load across clusters
> > +		 * of identical capacity. Use architectural capacity to ignore
> > +		 * runtime variability.
> >  		 */
> >  		if (env->sd->flags & SD_ASYM_CPUCAPACITY &&
> > -		    !capacity_greater(capacity_of(env->dst_cpu), capacity) &&
> > +		    arch_scale_cpu_capacity(env->dst_cpu) != arch_scale_cpu_capacity(i) &&
> > +		    capacity_greater(capacity, capacity_of(env->dst_cpu)) &&
> > >  		    nr_running == 1)
> >  			continue;
> >
>
> I wonder if we shouldn't use the capacity_greater() margin for both, i.e.
> capacity_greater(arch_scale_cpu_capacity(i), arch_scale_cpu_capacity(env->dst_cpu)) &&
>
> For example, the Orion O6 has a cluster with 1024 and one with 984. If we allow
> balancing 984->984, I think it's only consistent to also allow 984->1024.

But that would be a change in the current policy, no? Today we allow a
984->1024 balance based on runtime capacity.

The scope of this patchset is to make SCHED_CLUSTER work as expected for
clusters of the same capacity. Perhaps your proposal of using architectural
capacity can be evaluated in a separate patchset?

By the way, in v3 I will have to undo the inversion of the runtime capacity
check. The original check allowed balance if dst_cpu had at least 5% more
capacity than src_cpu. The inverted check allows balance to CPUs of less
capacity if the difference is less than 5%.
end of thread, other threads:[~2026-05-08 12:43 UTC | newest]
Thread overview: 6+ messages
[not found] <20260429-rneri-fix-cas-clusters-v2-0-cd787de35cc6@linux.intel.com>
[not found] ` <20260429-rneri-fix-cas-clusters-v2-1-cd787de35cc6@linux.intel.com>
2026-05-06 10:38 ` [PATCH v2 1/4] sched/fair: Check CPU capacity before comparing group types during load balance Christian Loehle
2026-05-06 23:45 ` Ricardo Neri
[not found] ` <20260429-rneri-fix-cas-clusters-v2-2-cd787de35cc6@linux.intel.com>
2026-05-06 11:39 ` [PATCH v2 2/4] sched/fair: Skip misfit load accounting when the destination CPU cannot help Christian Loehle
2026-05-06 23:47 ` Ricardo Neri
[not found] ` <20260429-rneri-fix-cas-clusters-v2-3-cd787de35cc6@linux.intel.com>
2026-05-06 13:10 ` [PATCH v2 3/4] sched/fair: Allow load balancing between CPUs of identical capacity Christian Loehle
2026-05-08 12:52 ` Ricardo Neri