public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/2] sched/fair: SMT-aware asymmetric CPU capacity
@ 2026-04-03  5:31 Andrea Righi
  2026-04-03  5:31 ` [PATCH 1/2] sched/fair: Prefer fully-idle SMT cores in asym-capacity idle selection Andrea Righi
  2026-04-03  5:31 ` [PATCH 2/2] sched/fair: Reject misfit pulls onto busy SMT siblings on asym-capacity Andrea Righi
  0 siblings, 2 replies; 8+ messages in thread
From: Andrea Righi @ 2026-04-03  5:31 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot
  Cc: Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, K Prateek Nayak, Christian Loehle, Koba Ko,
	Felix Abecassis, Balbir Singh, Shrikanth Hegde, linux-kernel

This series attempts to improve SD_ASYM_CPUCAPACITY scheduling by introducing
SMT awareness.

= Problem =

Nominal per-logical-CPU capacity can overstate the usable compute when an SMT
sibling is busy, because the physical core does not deliver its full nominal
capacity. As a result, several asym-cpu-capacity paths may pick high-capacity
idle CPUs that are not actually good destinations.

= Solution =

This patch set aligns those paths with a simple rule already used elsewhere:
when SMT is active, prefer fully idle cores and avoid treating partially idle
SMT siblings as full-capacity targets where that would mislead load balancing.

Patch set summary:
 - Prefer fully-idle SMT cores in asym-capacity idle selection: in the wakeup
   fast path, extend select_idle_capacity() / asym_fits_cpu() so idle
   selection can prefer CPUs on fully idle cores.
 - Reject misfit pulls onto busy SMT siblings on SD_ASYM_CPUCAPACITY.

This patch set has been tested on the new Vera Rubin platform, where SMT is
enabled and the firmware exposes small frequency variations (+/-~5%) as
differences in CPU capacity, resulting in SD_ASYM_CPUCAPACITY being set.

Without these patches, performance can drop by up to ~2x with CPU-intensive
workloads, because the SD_ASYM_CPUCAPACITY idle selection policy does not
account for busy SMT siblings.

Alternative approaches have been evaluated, such as equalizing CPU capacities,
either by exposing uniform values via firmware or by normalizing them in the
kernel, grouping CPUs within a small capacity window (+/-5%).

However, the SMT-aware SD_ASYM_CPUCAPACITY approach has shown better results so
far. Improving this policy also seems worthwhile in general, as future platforms
may enable SMT with asymmetric CPU topologies.

Performance results on Vera Rubin with SD_ASYM_CPUCAPACITY (mainline) vs
SD_ASYM_CPUCAPACITY + SMT:

- NVBLAS benchblas (one task / SMT core):

 +---------------------------------+--------+
 | Configuration                   | gflops |
 +---------------------------------+--------+
 | ASYM (mainline) + SIS_UTIL      |  5478  |
 | ASYM (mainline) + NO_SIS_UTIL   |  5491  |
 |                                 |        |
 | NO ASYM + SIS_UTIL              |  8912  |
 | NO ASYM + NO_SIS_UTIL           |  8978  |
 |                                 |        |
 | ASYM + SMT + SIS_UTIL           |  9259  |
 | ASYM + SMT + NO_SIS_UTIL        |  9291  |
 +---------------------------------+--------+

- DCPerf MediaWiki (all CPUs):

 +---------------------------------+--------+--------+--------+--------+
 | Configuration                   |   rps  |  p50   |  p95   |  p99   |
 +---------------------------------+--------+--------+--------+--------+
 | ASYM (mainline) + SIS_UTIL      |  7994  |  0.052 |  0.223 |  0.246 |
 | ASYM (mainline) + NO_SIS_UTIL   |  7993  |  0.052 |  0.221 |  0.245 |
 |                                 |        |        |        |        |
 | NO ASYM + SIS_UTIL              |  8113  |  0.067 |  0.184 |  0.225 |
 | NO ASYM + NO_SIS_UTIL           |  8093  |  0.068 |  0.184 |  0.223 |
 |                                 |        |        |        |        |
 | ASYM + SMT + SIS_UTIL           |  8129  |  0.076 |  0.149 |  0.188 |
 | ASYM + SMT + NO_SIS_UTIL        |  8138  |  0.076 |  0.148 |  0.186 |
 +---------------------------------+--------+--------+--------+--------+

In the MediaWiki case, SMT awareness is less impactful (compared to equalizing
CPU capacities) because all CPUs are in use for most of the run, but it still
seems to provide some benefit in reducing tail latency.

See also:
 - https://lore.kernel.org/lkml/20260324005509.1134981-1-arighi@nvidia.com
 - https://lore.kernel.org/lkml/20260318092214.130908-1-arighi@nvidia.com

Changes in v2:
 - Rework SMT awareness logic in select_idle_capacity() (K Prateek Nayak)
 - Drop EAS and find_new_ilb() changes for now
 - Link to v1: https://lore.kernel.org/all/20260326151211.1862600-1-arighi@nvidia.com

Andrea Righi (2):
      sched/fair: Prefer fully-idle SMT cores in asym-capacity idle selection
      sched/fair: Reject misfit pulls onto busy SMT siblings on asym-capacity

 kernel/sched/fair.c | 44 +++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 39 insertions(+), 5 deletions(-)

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 1/2] sched/fair: Prefer fully-idle SMT cores in asym-capacity idle selection
  2026-04-03  5:31 [PATCH v2 0/2] sched/fair: SMT-aware asymmetric CPU capacity Andrea Righi
@ 2026-04-03  5:31 ` Andrea Righi
  2026-04-07 11:21   ` Dietmar Eggemann
  2026-04-17  9:39   ` Vincent Guittot
  2026-04-03  5:31 ` [PATCH 2/2] sched/fair: Reject misfit pulls onto busy SMT siblings on asym-capacity Andrea Righi
  1 sibling, 2 replies; 8+ messages in thread
From: Andrea Righi @ 2026-04-03  5:31 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot
  Cc: Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, K Prateek Nayak, Christian Loehle, Koba Ko,
	Felix Abecassis, Balbir Singh, Shrikanth Hegde, linux-kernel

On systems with asymmetric CPU capacity (e.g., ACPI/CPPC reporting
different per-core frequencies), the wakeup path uses
select_idle_capacity() and prioritizes idle CPUs with higher capacity
for better task placement.

However, when those CPUs belong to SMT cores, their effective capacity
can be much lower than the nominal capacity when the sibling thread is
busy: SMT siblings compete for shared resources, so a "high capacity"
CPU that is idle but whose sibling is busy does not deliver its full
capacity. This effective capacity reduction cannot be modeled by the
static capacity value alone.

When SMT is active, teach asym-capacity idle selection to treat a
logical CPU as a weaker target if its physical core is only partially
idle: select_idle_capacity() no longer returns on the first idle CPU
whose static capacity fits the task when that CPU still has a busy
sibling, it keeps scanning for an idle CPU on a fully-idle core and only
if none qualify does it fall back to partially-idle cores, using shifted
fit scores so fully-idle cores win ties; asym_fits_cpu() applies the
same fully-idle core requirement when asym capacity and SMT are both
active.

This improves task placement, since partially-idle SMT siblings deliver
less than their nominal capacity. Favoring fully idle cores, when
available, can significantly enhance both throughput and wakeup latency
on systems with both SMT and CPU asymmetry.

No functional changes on systems with only asymmetric CPUs or only SMT.

Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Christian Loehle <christian.loehle@arm.com>
Cc: Koba Ko <kobak@nvidia.com>
Reported-by: Felix Abecassis <fabecassis@nvidia.com>
Signed-off-by: Andrea Righi <arighi@nvidia.com>
---
 kernel/sched/fair.c | 36 ++++++++++++++++++++++++++++++++----
 1 file changed, 32 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index bf948db905ed1..7f09191014d18 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7774,6 +7774,7 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool
 static int
 select_idle_capacity(struct task_struct *p, struct sched_domain *sd, int target)
 {
+	bool prefers_idle_core = sched_smt_active() && test_idle_cores(target);
 	unsigned long task_util, util_min, util_max, best_cap = 0;
 	int fits, best_fits = 0;
 	int cpu, best_cpu = -1;
@@ -7787,6 +7788,7 @@ select_idle_capacity(struct task_struct *p, struct sched_domain *sd, int target)
 	util_max = uclamp_eff_value(p, UCLAMP_MAX);
 
 	for_each_cpu_wrap(cpu, cpus, target) {
+		bool preferred_core = !prefers_idle_core || is_core_idle(cpu);
 		unsigned long cpu_cap = capacity_of(cpu);
 
 		if (!available_idle_cpu(cpu) && !sched_idle_cpu(cpu))
@@ -7795,7 +7797,7 @@ select_idle_capacity(struct task_struct *p, struct sched_domain *sd, int target)
 		fits = util_fits_cpu(task_util, util_min, util_max, cpu);
 
 		/* This CPU fits with all requirements */
-		if (fits > 0)
+		if (fits > 0 && preferred_core)
 			return cpu;
 		/*
 		 * Only the min performance hint (i.e. uclamp_min) doesn't fit.
@@ -7803,9 +7805,30 @@ select_idle_capacity(struct task_struct *p, struct sched_domain *sd, int target)
 		 */
 		else if (fits < 0)
 			cpu_cap = get_actual_cpu_capacity(cpu);
+		/*
+		 * fits > 0 implies we are not on a preferred core
+		 * but the util fits CPU capacity. Set fits to -2 so
+		 * the effective range becomes [-2, 0] where:
+		 *    0 - does not fit
+		 *   -1 - fits with the exception of UCLAMP_MIN
+		 *   -2 - fits with the exception of preferred_core
+		 */
+		else if (fits > 0)
+			fits = -2;
+
+		/*
+		 * If we are on a preferred core, translate the range of fits
+		 * of [-1, 0] to [-4, -3]. This ensures that an idle core
+		 * is always given priority over (partially) busy core.
+		 *
+		 * A fully fitting idle core would have returned early and hence
+		 * fits > 0 for preferred_core need not be dealt with.
+		 */
+		if (preferred_core)
+			fits -= 3;
 
 		/*
-		 * First, select CPU which fits better (-1 being better than 0).
+		 * First, select CPU which fits better (lower is more preferred).
 		 * Then, select the one with best capacity at same level.
 		 */
 		if ((fits < best_fits) ||
@@ -7824,12 +7847,17 @@ static inline bool asym_fits_cpu(unsigned long util,
 				 unsigned long util_max,
 				 int cpu)
 {
-	if (sched_asym_cpucap_active())
+	if (sched_asym_cpucap_active()) {
 		/*
 		 * Return true only if the cpu fully fits the task requirements
 		 * which include the utilization and the performance hints.
+		 *
+		 * When SMT is active, also require that the core has no busy
+		 * siblings.
 		 */
-		return (util_fits_cpu(util, util_min, util_max, cpu) > 0);
+		return (!sched_smt_active() || is_core_idle(cpu)) &&
+		       (util_fits_cpu(util, util_min, util_max, cpu) > 0);
+	}
 
 	return true;
 }
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 2/2] sched/fair: Reject misfit pulls onto busy SMT siblings on asym-capacity
  2026-04-03  5:31 [PATCH v2 0/2] sched/fair: SMT-aware asymmetric CPU capacity Andrea Righi
  2026-04-03  5:31 ` [PATCH 1/2] sched/fair: Prefer fully-idle SMT cores in asym-capacity idle selection Andrea Righi
@ 2026-04-03  5:31 ` Andrea Righi
  1 sibling, 0 replies; 8+ messages in thread
From: Andrea Righi @ 2026-04-03  5:31 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot
  Cc: Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, K Prateek Nayak, Christian Loehle, Koba Ko,
	Felix Abecassis, Balbir Singh, Shrikanth Hegde, linux-kernel

When SD_ASYM_CPUCAPACITY load balancing considers pulling a misfit task,
capacity_of(dst_cpu) can overstate available compute if the SMT sibling
is busy: the core does not deliver its full nominal capacity.

If SMT is active and dst_cpu is not on a fully idle core, skip this
destination so we do not migrate a misfit expecting a capacity upgrade
we cannot actually provide.

No functional changes on systems with only asymmetric CPUs or only SMT.

Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Christian Loehle <christian.loehle@arm.com>
Cc: Koba Ko <kobak@nvidia.com>
Reported-by: Felix Abecassis <fabecassis@nvidia.com>
Signed-off-by: Andrea Righi <arighi@nvidia.com>
---
 kernel/sched/fair.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7f09191014d18..7bebceb5ed9df 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -10607,10 +10607,16 @@ static bool update_sd_pick_busiest(struct lb_env *env,
 	 * We can use max_capacity here as reduction in capacity on some
 	 * CPUs in the group should either be possible to resolve
 	 * internally or be covered by avg_load imbalance (eventually).
+	 *
+	 * When SMT is active, only pull a misfit to dst_cpu if it is on a
+	 * fully idle core; otherwise the effective capacity of the core is
+	 * reduced and we may not actually provide more capacity than the
+	 * source.
 	 */
 	if ((env->sd->flags & SD_ASYM_CPUCAPACITY) &&
 	    (sgs->group_type == group_misfit_task) &&
-	    (!capacity_greater(capacity_of(env->dst_cpu), sg->sgc->max_capacity) ||
+	    ((sched_smt_active() && !is_core_idle(env->dst_cpu)) ||
+	     !capacity_greater(capacity_of(env->dst_cpu), sg->sgc->max_capacity) ||
 	     sds->local_stat.group_type != group_has_spare))
 		return false;
 
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] sched/fair: Prefer fully-idle SMT cores in asym-capacity idle selection
  2026-04-03  5:31 ` [PATCH 1/2] sched/fair: Prefer fully-idle SMT cores in asym-capacity idle selection Andrea Righi
@ 2026-04-07 11:21   ` Dietmar Eggemann
  2026-04-18  8:24     ` Andrea Righi
  2026-04-17  9:39   ` Vincent Guittot
  1 sibling, 1 reply; 8+ messages in thread
From: Dietmar Eggemann @ 2026-04-07 11:21 UTC (permalink / raw)
  To: Andrea Righi, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot
  Cc: Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
	K Prateek Nayak, Christian Loehle, Koba Ko, Felix Abecassis,
	Balbir Singh, Shrikanth Hegde, linux-kernel



On 03.04.26 07:31, Andrea Righi wrote:
> On systems with asymmetric CPU capacity (e.g., ACPI/CPPC reporting
> different per-core frequencies), the wakeup path uses
> select_idle_capacity() and prioritizes idle CPUs with higher capacity
> for better task placement.
> 
> However, when those CPUs belong to SMT cores, their effective capacity
> can be much lower than the nominal capacity when the sibling thread is
> busy: SMT siblings compete for shared resources, so a "high capacity"
> CPU that is idle but whose sibling is busy does not deliver its full
> capacity. This effective capacity reduction cannot be modeled by the
> static capacity value alone.
> 
> When SMT is active, teach asym-capacity idle selection to treat a
> logical CPU as a weaker target if its physical core is only partially
> idle: select_idle_capacity() no longer returns on the first idle CPU
> whose static capacity fits the task when that CPU still has a busy
> sibling; instead, it keeps scanning for an idle CPU on a fully-idle core and only
> if none qualify does it fall back to partially-idle cores, using shifted
> fit scores so fully-idle cores win ties; asym_fits_cpu() applies the
> same fully-idle core requirement when asym capacity and SMT are both
> active.
> 
> This improves task placement, since partially-idle SMT siblings deliver
> less than their nominal capacity. Favoring fully idle cores, when
> available, can significantly enhance both throughput and wakeup latency
> on systems with both SMT and CPU asymmetry.
> 
> No functional changes on systems with only asymmetric CPUs or only SMT.
> 
> Cc: K Prateek Nayak <kprateek.nayak@amd.com>
> Cc: Vincent Guittot <vincent.guittot@linaro.org>
> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
> Cc: Christian Loehle <christian.loehle@arm.com>
> Cc: Koba Ko <kobak@nvidia.com>
> Reported-by: Felix Abecassis <fabecassis@nvidia.com>
> Signed-off-by: Andrea Righi <arighi@nvidia.com>
> ---
>  kernel/sched/fair.c | 36 ++++++++++++++++++++++++++++++++----
>  1 file changed, 32 insertions(+), 4 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index bf948db905ed1..7f09191014d18 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -7774,6 +7774,7 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool
>  static int
>  select_idle_capacity(struct task_struct *p, struct sched_domain *sd, int target)
>  {
> +	bool prefers_idle_core = sched_smt_active() && test_idle_cores(target);

Somehow I miss a:

    if (prefers_idle_core)
        set_idle_cores(target, false);

The one in select_idle_sibling() -> select_idle_cpu() isn't executed
anymore with ASYM_CPUCAPACITY.


Another thing is that sic() iterates over the CPUs of sd_asym_cpucapacity
whereas the idle core tracking lives in sd_llc/sd_llc_shared. Both sd's are
probably the same on your system.


>  	unsigned long task_util, util_min, util_max, best_cap = 0;
>  	int fits, best_fits = 0;
>  	int cpu, best_cpu = -1;
[...]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] sched/fair: Prefer fully-idle SMT cores in asym-capacity idle selection
  2026-04-03  5:31 ` [PATCH 1/2] sched/fair: Prefer fully-idle SMT cores in asym-capacity idle selection Andrea Righi
  2026-04-07 11:21   ` Dietmar Eggemann
@ 2026-04-17  9:39   ` Vincent Guittot
  2026-04-18  6:02     ` Andrea Righi
  1 sibling, 1 reply; 8+ messages in thread
From: Vincent Guittot @ 2026-04-17  9:39 UTC (permalink / raw)
  To: Andrea Righi
  Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
	K Prateek Nayak, Christian Loehle, Koba Ko, Felix Abecassis,
	Balbir Singh, Shrikanth Hegde, linux-kernel

On Fri, 3 Apr 2026 at 07:37, Andrea Righi <arighi@nvidia.com> wrote:
>
> On systems with asymmetric CPU capacity (e.g., ACPI/CPPC reporting
> different per-core frequencies), the wakeup path uses
> select_idle_capacity() and prioritizes idle CPUs with higher capacity
> for better task placement.
>
> However, when those CPUs belong to SMT cores, their effective capacity
> can be much lower than the nominal capacity when the sibling thread is
> busy: SMT siblings compete for shared resources, so a "high capacity"
> CPU that is idle but whose sibling is busy does not deliver its full
> capacity. This effective capacity reduction cannot be modeled by the
> static capacity value alone.
>
> When SMT is active, teach asym-capacity idle selection to treat a
> logical CPU as a weaker target if its physical core is only partially
> idle: select_idle_capacity() no longer returns on the first idle CPU
> whose static capacity fits the task when that CPU still has a busy
> sibling; instead, it keeps scanning for an idle CPU on a fully-idle core and only
> if none qualify does it fall back to partially-idle cores, using shifted
> fit scores so fully-idle cores win ties; asym_fits_cpu() applies the
> same fully-idle core requirement when asym capacity and SMT are both
> active.
>
> This improves task placement, since partially-idle SMT siblings deliver
> less than their nominal capacity. Favoring fully idle cores, when
> available, can significantly enhance both throughput and wakeup latency
> on systems with both SMT and CPU asymmetry.
>
> No functional changes on systems with only asymmetric CPUs or only SMT.
>
> Cc: K Prateek Nayak <kprateek.nayak@amd.com>
> Cc: Vincent Guittot <vincent.guittot@linaro.org>
> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
> Cc: Christian Loehle <christian.loehle@arm.com>
> Cc: Koba Ko <kobak@nvidia.com>
> Reported-by: Felix Abecassis <fabecassis@nvidia.com>
> Signed-off-by: Andrea Righi <arighi@nvidia.com>
> ---
>  kernel/sched/fair.c | 36 ++++++++++++++++++++++++++++++++----
>  1 file changed, 32 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index bf948db905ed1..7f09191014d18 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -7774,6 +7774,7 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool
>  static int
>  select_idle_capacity(struct task_struct *p, struct sched_domain *sd, int target)
>  {
> +       bool prefers_idle_core = sched_smt_active() && test_idle_cores(target);
>         unsigned long task_util, util_min, util_max, best_cap = 0;
>         int fits, best_fits = 0;
>         int cpu, best_cpu = -1;
> @@ -7787,6 +7788,7 @@ select_idle_capacity(struct task_struct *p, struct sched_domain *sd, int target)
>         util_max = uclamp_eff_value(p, UCLAMP_MAX);
>
>         for_each_cpu_wrap(cpu, cpus, target) {
> +               bool preferred_core = !prefers_idle_core || is_core_idle(cpu);
>                 unsigned long cpu_cap = capacity_of(cpu);
>
>                 if (!available_idle_cpu(cpu) && !sched_idle_cpu(cpu))
> @@ -7795,7 +7797,7 @@ select_idle_capacity(struct task_struct *p, struct sched_domain *sd, int target)
>                 fits = util_fits_cpu(task_util, util_min, util_max, cpu);
>
>                 /* This CPU fits with all requirements */
> -               if (fits > 0)
> +               if (fits > 0 && preferred_core)
>                         return cpu;
>                 /*
>                  * Only the min performance hint (i.e. uclamp_min) doesn't fit.
> @@ -7803,9 +7805,30 @@ select_idle_capacity(struct task_struct *p, struct sched_domain *sd, int target)
>                  */
>                 else if (fits < 0)
>                         cpu_cap = get_actual_cpu_capacity(cpu);
> +               /*
> +                * fits > 0 implies we are not on a preferred core
> +                * but the util fits CPU capacity. Set fits to -2 so
> +                * the effective range becomes [-2, 0] where:
> +                *    0 - does not fit
> +                *   -1 - fits with the exception of UCLAMP_MIN
> +                *   -2 - fits with the exception of preferred_core
> +                */
> +               else if (fits > 0)
> +                       fits = -2;
> +
> +               /*
> +                * If we are on a preferred core, translate the range of fits
> +                * of [-1, 0] to [-4, -3]. This ensures that an idle core
> +                * is always given priority over (partially) busy core.
> +                *
> +                * A fully fitting idle core would have returned early and hence
> +                * fits > 0 for preferred_core need not be dealt with.
> +                */
> +               if (preferred_core)
> +                       fits -= 3;
>
>                 /*
> -                * First, select CPU which fits better (-1 being better than 0).
> +                * First, select CPU which fits better (lower is more preferred).
>                  * Then, select the one with best capacity at same level.
>                  */
>                 if ((fits < best_fits) ||

You have to clear idle_core if you were looking for an idle core but
didn't find one while looping over the CPUs.

You need the following to clear idle core:

@@ -7739,6 +7739,11 @@ select_idle_capacity(struct task_struct *p,
struct sched_domain *sd, int target)
                }
        }

+       /*
+        * The range [-4, -3] implies at least one idle core; values above
+        * that imply we didn't find one while looping over the CPUs.
+        */
+       if (prefers_idle_core && fits > -3)
+               set_idle_cores(target, false);
+
        return best_cpu;
 }


> @@ -7824,12 +7847,17 @@ static inline bool asym_fits_cpu(unsigned long util,
>                                  unsigned long util_max,
>                                  int cpu)
>  {
> -       if (sched_asym_cpucap_active())
> +       if (sched_asym_cpucap_active()) {
>                 /*
>                  * Return true only if the cpu fully fits the task requirements
>                  * which include the utilization and the performance hints.
> +                *
> +                * When SMT is active, also require that the core has no busy
> +                * siblings.
>                  */
> -               return (util_fits_cpu(util, util_min, util_max, cpu) > 0);
> +               return (!sched_smt_active() || is_core_idle(cpu)) &&
> +                      (util_fits_cpu(util, util_min, util_max, cpu) > 0);
> +       }
>
>         return true;
>  }
> --
> 2.53.0
>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] sched/fair: Prefer fully-idle SMT cores in asym-capacity idle selection
  2026-04-17  9:39   ` Vincent Guittot
@ 2026-04-18  6:02     ` Andrea Righi
  2026-04-19 10:20       ` Vincent Guittot
  0 siblings, 1 reply; 8+ messages in thread
From: Andrea Righi @ 2026-04-18  6:02 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
	K Prateek Nayak, Christian Loehle, Koba Ko, Felix Abecassis,
	Balbir Singh, Shrikanth Hegde, linux-kernel

Hi Vincent,

On Fri, Apr 17, 2026 at 11:39:21AM +0200, Vincent Guittot wrote:
> On Fri, 3 Apr 2026 at 07:37, Andrea Righi <arighi@nvidia.com> wrote:
> >
> > On systems with asymmetric CPU capacity (e.g., ACPI/CPPC reporting
> > different per-core frequencies), the wakeup path uses
> > select_idle_capacity() and prioritizes idle CPUs with higher capacity
> > for better task placement.
> >
> > However, when those CPUs belong to SMT cores, their effective capacity
> > can be much lower than the nominal capacity when the sibling thread is
> > busy: SMT siblings compete for shared resources, so a "high capacity"
> > CPU that is idle but whose sibling is busy does not deliver its full
> > capacity. This effective capacity reduction cannot be modeled by the
> > static capacity value alone.
> >
> > When SMT is active, teach asym-capacity idle selection to treat a
> > logical CPU as a weaker target if its physical core is only partially
> > idle: select_idle_capacity() no longer returns on the first idle CPU
> > whose static capacity fits the task when that CPU still has a busy
> > sibling; instead, it keeps scanning for an idle CPU on a fully-idle core and only
> > if none qualify does it fall back to partially-idle cores, using shifted
> > fit scores so fully-idle cores win ties; asym_fits_cpu() applies the
> > same fully-idle core requirement when asym capacity and SMT are both
> > active.
> >
> > This improves task placement, since partially-idle SMT siblings deliver
> > less than their nominal capacity. Favoring fully idle cores, when
> > available, can significantly enhance both throughput and wakeup latency
> > on systems with both SMT and CPU asymmetry.
> >
> > No functional changes on systems with only asymmetric CPUs or only SMT.
> >
> > Cc: K Prateek Nayak <kprateek.nayak@amd.com>
> > Cc: Vincent Guittot <vincent.guittot@linaro.org>
> > Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
> > Cc: Christian Loehle <christian.loehle@arm.com>
> > Cc: Koba Ko <kobak@nvidia.com>
> > Reported-by: Felix Abecassis <fabecassis@nvidia.com>
> > Signed-off-by: Andrea Righi <arighi@nvidia.com>
> > ---
> >  kernel/sched/fair.c | 36 ++++++++++++++++++++++++++++++++----
> >  1 file changed, 32 insertions(+), 4 deletions(-)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index bf948db905ed1..7f09191014d18 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -7774,6 +7774,7 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool
> >  static int
> >  select_idle_capacity(struct task_struct *p, struct sched_domain *sd, int target)
> >  {
> > +       bool prefers_idle_core = sched_smt_active() && test_idle_cores(target);
> >         unsigned long task_util, util_min, util_max, best_cap = 0;
> >         int fits, best_fits = 0;
> >         int cpu, best_cpu = -1;
> > @@ -7787,6 +7788,7 @@ select_idle_capacity(struct task_struct *p, struct sched_domain *sd, int target)
> >         util_max = uclamp_eff_value(p, UCLAMP_MAX);
> >
> >         for_each_cpu_wrap(cpu, cpus, target) {
> > +               bool preferred_core = !prefers_idle_core || is_core_idle(cpu);
> >                 unsigned long cpu_cap = capacity_of(cpu);
> >
> >                 if (!available_idle_cpu(cpu) && !sched_idle_cpu(cpu))
> > @@ -7795,7 +7797,7 @@ select_idle_capacity(struct task_struct *p, struct sched_domain *sd, int target)
> >                 fits = util_fits_cpu(task_util, util_min, util_max, cpu);
> >
> >                 /* This CPU fits with all requirements */
> > -               if (fits > 0)
> > +               if (fits > 0 && preferred_core)
> >                         return cpu;
> >                 /*
> >                  * Only the min performance hint (i.e. uclamp_min) doesn't fit.
> > @@ -7803,9 +7805,30 @@ select_idle_capacity(struct task_struct *p, struct sched_domain *sd, int target)
> >                  */
> >                 else if (fits < 0)
> >                         cpu_cap = get_actual_cpu_capacity(cpu);
> > +               /*
> > +                * fits > 0 implies we are not on a preferred core
> > +                * but the util fits CPU capacity. Set fits to -2 so
> > +                * the effective range becomes [-2, 0] where:
> > +                *    0 - does not fit
> > +                *   -1 - fits with the exception of UCLAMP_MIN
> > +                *   -2 - fits with the exception of preferred_core
> > +                */
> > +               else if (fits > 0)
> > +                       fits = -2;
> > +
> > +               /*
> > +                * If we are on a preferred core, translate the range of fits
> > +                * of [-1, 0] to [-4, -3]. This ensures that an idle core
> > +                * is always given priority over (partially) busy core.
> > +                *
> > +                * A fully fitting idle core would have returned early and hence
> > +                * fits > 0 for preferred_core need not be dealt with.
> > +                */
> > +               if (preferred_core)
> > +                       fits -= 3;
> >
> >                 /*
> > -                * First, select CPU which fits better (-1 being better than 0).
> > +                * First, select CPU which fits better (lower is more preferred).
> >                  * Then, select the one with best capacity at same level.
> >                  */
> >                 if ((fits < best_fits) ||
> 
> You have to clear idle_core if you were looking for an idle core but
> didn't find one while looping over the CPUs.
> 
> You need the following to clear idle core:
> 
> @@ -7739,6 +7739,11 @@ select_idle_capacity(struct task_struct *p,
> struct sched_domain *sd, int target)
>                 }
>         }
> 
> +       /*
> +        * The range [-4, -3] implies at least one idle core; values above
> +        * that imply we didn't find one while looping over the CPUs.
> +        */
> +       if (prefers_idle_core && fits > -3)
> +               set_idle_cores(target, false);
> +
>         return best_cpu;
>  }

That makes sense! But it should be best_fits instead of fits, right?

Thanks,
-Andrea

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] sched/fair: Prefer fully-idle SMT cores in asym-capacity idle selection
  2026-04-07 11:21   ` Dietmar Eggemann
@ 2026-04-18  8:24     ` Andrea Righi
  0 siblings, 0 replies; 8+ messages in thread
From: Andrea Righi @ 2026-04-18  8:24 UTC (permalink / raw)
  To: Dietmar Eggemann
  Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
	K Prateek Nayak, Christian Loehle, Koba Ko, Felix Abecassis,
	Balbir Singh, Shrikanth Hegde, linux-kernel

Hi Dietmar,

On Tue, Apr 07, 2026 at 01:21:16PM +0200, Dietmar Eggemann wrote:
> 
> 
> On 03.04.26 07:31, Andrea Righi wrote:
> > On systems with asymmetric CPU capacity (e.g., ACPI/CPPC reporting
> > different per-core frequencies), the wakeup path uses
> > select_idle_capacity() and prioritizes idle CPUs with higher capacity
> > for better task placement.
> > 
> > However, when those CPUs belong to SMT cores, their effective capacity
> > can be much lower than the nominal capacity when the sibling thread is
> > busy: SMT siblings compete for shared resources, so a "high capacity"
> > CPU that is idle but whose sibling is busy does not deliver its full
> > capacity. This effective capacity reduction cannot be modeled by the
> > static capacity value alone.
> > 
> > When SMT is active, teach asym-capacity idle selection to treat a
> > logical CPU as a weaker target if its physical core is only partially
> > idle: select_idle_capacity() no longer returns on the first idle CPU
> > whose static capacity fits the task when that CPU still has a busy
> > sibling, it keeps scanning for an idle CPU on a fully-idle core and only
> > if none qualify does it fall back to partially-idle cores, using shifted
> > fit scores so fully-idle cores win ties; asym_fits_cpu() applies the
> > same fully-idle core requirement when asym capacity and SMT are both
> > active.
> > 
> > This improves task placement, since partially-idle SMT siblings deliver
> > less than their nominal capacity. Favoring fully idle cores, when
> > available, can significantly enhance both throughput and wakeup latency
> > on systems with both SMT and CPU asymmetry.
> > 
> > No functional changes on systems with only asymmetric CPUs or only SMT.
> > 
> > Cc: K Prateek Nayak <kprateek.nayak@amd.com>
> > Cc: Vincent Guittot <vincent.guittot@linaro.org>
> > Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
> > Cc: Christian Loehle <christian.loehle@arm.com>
> > Cc: Koba Ko <kobak@nvidia.com>
> > Reported-by: Felix Abecassis <fabecassis@nvidia.com>
> > Signed-off-by: Andrea Righi <arighi@nvidia.com>
> > ---
> >  kernel/sched/fair.c | 36 ++++++++++++++++++++++++++++++++----
> >  1 file changed, 32 insertions(+), 4 deletions(-)
> > 
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index bf948db905ed1..7f09191014d18 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -7774,6 +7774,7 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool
> >  static int
> >  select_idle_capacity(struct task_struct *p, struct sched_domain *sd, int target)
> >  {
> > +	bool prefers_idle_core = sched_smt_active() && test_idle_cores(target);
> 
> Somehow I miss a:
> 
>     if (prefers_idle_core)
>         set_idle_cores(target, false)
> 
> The one in select_idle_sibling() -> select_idle_cpu() isn't executed
> anymore with ASYM_CPUCAPACITY.
> 

Right, we need to add this, as also pointed out by Vincent.

> 
> Another thing is that sic() iterates over the CPUs of sd_asym_cpucapacity
> whereas the idle core thing lives in sd_llc/sd_llc_shared. Both sd's are
> probably the same on your system.

Hm... they're the same on my machine, but if they're different, clearing
has_idle_cores here is not right and it might lead to false positives. We
should only clear it when both domains span the same CPUs (or just check
that sd_asym_cpucapacity and sd_llc are the same).

However, if they're not the same, I'm not sure exactly what we should do...
maybe ignore has_idle_cores and always do the scan for now?

Thanks,
-Andrea


* Re: [PATCH 1/2] sched/fair: Prefer fully-idle SMT cores in asym-capacity idle selection
  2026-04-18  6:02     ` Andrea Righi
@ 2026-04-19 10:20       ` Vincent Guittot
  0 siblings, 0 replies; 8+ messages in thread
From: Vincent Guittot @ 2026-04-19 10:20 UTC (permalink / raw)
  To: Andrea Righi
  Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
	K Prateek Nayak, Christian Loehle, Koba Ko, Felix Abecassis,
	Balbir Singh, Shrikanth Hegde, linux-kernel

On Sat, 18 Apr 2026 at 08:02, Andrea Righi <arighi@nvidia.com> wrote:
>
> Hi Vincent,
>
> On Fri, Apr 17, 2026 at 11:39:21AM +0200, Vincent Guittot wrote:
> > On Fri, 3 Apr 2026 at 07:37, Andrea Righi <arighi@nvidia.com> wrote:
> > >
> > > On systems with asymmetric CPU capacity (e.g., ACPI/CPPC reporting
> > > different per-core frequencies), the wakeup path uses
> > > select_idle_capacity() and prioritizes idle CPUs with higher capacity
> > > for better task placement.
> > >
> > > However, when those CPUs belong to SMT cores, their effective capacity
> > > can be much lower than the nominal capacity when the sibling thread is
> > > busy: SMT siblings compete for shared resources, so a "high capacity"
> > > CPU that is idle but whose sibling is busy does not deliver its full
> > > capacity. This effective capacity reduction cannot be modeled by the
> > > static capacity value alone.
> > >
> > > When SMT is active, teach asym-capacity idle selection to treat a
> > > logical CPU as a weaker target if its physical core is only partially
> > > idle: select_idle_capacity() no longer returns on the first idle CPU
> > > whose static capacity fits the task when that CPU still has a busy
> > > sibling, it keeps scanning for an idle CPU on a fully-idle core and only
> > > if none qualify does it fall back to partially-idle cores, using shifted
> > > fit scores so fully-idle cores win ties; asym_fits_cpu() applies the
> > > same fully-idle core requirement when asym capacity and SMT are both
> > > active.
> > >
> > > This improves task placement, since partially-idle SMT siblings deliver
> > > less than their nominal capacity. Favoring fully idle cores, when
> > > available, can significantly enhance both throughput and wakeup latency
> > > on systems with both SMT and CPU asymmetry.
> > >
> > > No functional changes on systems with only asymmetric CPUs or only SMT.
> > >
> > > Cc: K Prateek Nayak <kprateek.nayak@amd.com>
> > > Cc: Vincent Guittot <vincent.guittot@linaro.org>
> > > Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
> > > Cc: Christian Loehle <christian.loehle@arm.com>
> > > Cc: Koba Ko <kobak@nvidia.com>
> > > Reported-by: Felix Abecassis <fabecassis@nvidia.com>
> > > Signed-off-by: Andrea Righi <arighi@nvidia.com>
> > > ---
> > >  kernel/sched/fair.c | 36 ++++++++++++++++++++++++++++++++----
> > >  1 file changed, 32 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > index bf948db905ed1..7f09191014d18 100644
> > > --- a/kernel/sched/fair.c
> > > +++ b/kernel/sched/fair.c
> > > @@ -7774,6 +7774,7 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool
> > >  static int
> > >  select_idle_capacity(struct task_struct *p, struct sched_domain *sd, int target)
> > >  {
> > > +       bool prefers_idle_core = sched_smt_active() && test_idle_cores(target);
> > >         unsigned long task_util, util_min, util_max, best_cap = 0;
> > >         int fits, best_fits = 0;
> > >         int cpu, best_cpu = -1;
> > > @@ -7787,6 +7788,7 @@ select_idle_capacity(struct task_struct *p, struct sched_domain *sd, int target)
> > >         util_max = uclamp_eff_value(p, UCLAMP_MAX);
> > >
> > >         for_each_cpu_wrap(cpu, cpus, target) {
> > > +               bool preferred_core = !prefers_idle_core || is_core_idle(cpu);
> > >                 unsigned long cpu_cap = capacity_of(cpu);
> > >
> > >                 if (!available_idle_cpu(cpu) && !sched_idle_cpu(cpu))
> > > @@ -7795,7 +7797,7 @@ select_idle_capacity(struct task_struct *p, struct sched_domain *sd, int target)
> > >                 fits = util_fits_cpu(task_util, util_min, util_max, cpu);
> > >
> > >                 /* This CPU fits with all requirements */
> > > -               if (fits > 0)
> > > +               if (fits > 0 && preferred_core)
> > >                         return cpu;
> > >                 /*
> > >                  * Only the min performance hint (i.e. uclamp_min) doesn't fit.
> > > @@ -7803,9 +7805,30 @@ select_idle_capacity(struct task_struct *p, struct sched_domain *sd, int target)
> > >                  */
> > >                 else if (fits < 0)
> > >                         cpu_cap = get_actual_cpu_capacity(cpu);
> > > +               /*
> > > +                * fits > 0 implies we are not on a preferred core
> > > +                * but the util fits CPU capacity. Set fits to -2 so
> > > +                * the effective range becomes [-2, 0] where:
> > > +                *    0 - does not fit
> > > +                *   -1 - fits with the exception of UCLAMP_MIN
> > > +                *   -2 - fits with the exception of preferred_core
> > > +                */
> > > +               else if (fits > 0)
> > > +                       fits = -2;
> > > +
> > > +               /*
> > > +                * If we are on a preferred core, translate the range of fits
> > > +                * of [-1, 0] to [-4, -3]. This ensures that an idle core
> > > +                * is always given priority over (partially) busy core.
> > > +                *
> > > +                * A fully fitting idle core would have returned early and hence
> > > +                * fits > 0 for preferred_core need not be dealt with.
> > > +                */
> > > +               if (preferred_core)
> > > +                       fits -= 3;
> > >
> > >                 /*
> > > -                * First, select CPU which fits better (-1 being better than 0).
> > > +                * First, select CPU which fits better (lower is more preferred).
> > >                  * Then, select the one with best capacity at same level.
> > >                  */
> > >                 if ((fits < best_fits) ||
> >
> > You have to clear idle_core if you were looking for an idle core but
> > didn't find one while looping over the CPUs.
> >
> > You need the following to clear idle core:
> >
> > @@ -7739,6 +7739,11 @@ select_idle_capacity(struct task_struct *p,
> > struct sched_domain *sd, int target)
> >                 }
> >         }
> >
> > +       /* The range [-4, -3] implies at least one idle core; values above
> > +        * that imply we didn't find one while looping over the CPUs. */
> > +       if (prefers_idle_core && fits > -3)
> > +               set_idle_cores(target, false);
> > +
> >         return best_cpu;
> >  }
>
> That makes sense! But it should be best_fits instead of fits, right?

yes, of course

>
> Thanks,
> -Andrea


end of thread, other threads:[~2026-04-19 10:20 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-04-03  5:31 [PATCH v2 0/2] sched/fair: SMT-aware asymmetric CPU capacity Andrea Righi
2026-04-03  5:31 ` [PATCH 1/2] sched/fair: Prefer fully-idle SMT cores in asym-capacity idle selection Andrea Righi
2026-04-07 11:21   ` Dietmar Eggemann
2026-04-18  8:24     ` Andrea Righi
2026-04-17  9:39   ` Vincent Guittot
2026-04-18  6:02     ` Andrea Righi
2026-04-19 10:20       ` Vincent Guittot
2026-04-03  5:31 ` [PATCH 2/2] sched/fair: Reject misfit pulls onto busy SMT siblings on asym-capacity Andrea Righi
