From: Andrea Righi <arighi@nvidia.com>
To: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Ingo Molnar <mingo@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Juri Lelli <juri.lelli@redhat.com>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Steven Rostedt <rostedt@goodmis.org>,
Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
Valentin Schneider <vschneid@redhat.com>,
K Prateek Nayak <kprateek.nayak@amd.com>,
Christian Loehle <christian.loehle@arm.com>,
Koba Ko <kobak@nvidia.com>,
Felix Abecassis <fabecassis@nvidia.com>,
Balbir Singh <balbirs@nvidia.com>,
Joel Fernandes <joelagnelf@nvidia.com>,
Shrikanth Hegde <sshegde@linux.ibm.com>,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH 3/5] sched/fair: Prefer fully-idle SMT cores in asym-capacity idle selection
Date: Wed, 6 May 2026 20:15:01 +0200
Message-ID: <afuFJUx_r2lm3y_q@gpd4>
In-Reply-To: <CAKfTPtDLDfACg+71-CCG_yqWqJUFjUmh6CRurkUzeBYh-KeEmg@mail.gmail.com>
Hi Vincent,
On Wed, May 06, 2026 at 12:29:10PM +0200, Vincent Guittot wrote:
> On Tue, 28 Apr 2026 at 16:44, Andrea Righi <arighi@nvidia.com> wrote:
> >
> > On systems with asymmetric CPU capacity (e.g., ACPI/CPPC reporting
> > different per-core frequencies), the wakeup path uses
> > select_idle_capacity() and prioritizes idle CPUs with higher capacity
> > for better task placement. However, when those CPUs belong to SMT cores,
> > their effective capacity can be much lower than the nominal capacity
> > when the sibling thread is busy: SMT siblings compete for shared
> > resources, so a "high capacity" CPU that is idle but whose sibling is
> > busy does not deliver its full capacity. This effective capacity
> > reduction cannot be modeled by the static capacity value alone.
> >
> > Introduce SMT awareness in the asym-capacity idle selection policy: when
> > SMT is active, always prefer fully-idle SMT cores over partially-idle
> > ones.
> >
> > Prioritizing fully-idle SMT cores yields better task placement because
> > the effective capacity of partially-idle SMT cores is reduced; always
> > preferring them when available leads to more accurate capacity usage on
> > task wakeup.
> >
> > On an SMT system with asymmetric CPU capacities, SMT-aware idle
> > selection has been shown to improve throughput by around 15-18% for
> > CPU-bound workloads running a number of tasks equal to the number of
> > SMT cores.
> >
> > Cc: Vincent Guittot <vincent.guittot@linaro.org>
> > Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
> > Cc: Christian Loehle <christian.loehle@arm.com>
> > Cc: Koba Ko <kobak@nvidia.com>
> > Reviewed-by: K Prateek Nayak <kprateek.nayak@amd.com>
> > Reported-by: Felix Abecassis <fabecassis@nvidia.com>
> > Signed-off-by: Andrea Righi <arighi@nvidia.com>
> > ---
> > kernel/sched/fair.c | 70 +++++++++++++++++++++++++++++++++++++++++----
> > 1 file changed, 65 insertions(+), 5 deletions(-)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index bbdf537f61154..6a7e4943804b5 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -7989,6 +7989,22 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool
> > return idle_cpu;
> > }
> >
> > +/*
> > + * Idle-capacity scan ranks transformed util_fits_cpu() outcomes; lower values
> > + * are more preferred (see select_idle_capacity()).
> > + */
> > +enum asym_fits_state {
> > + /* In descending order of preference */
> > + ASYM_IDLE_CORE_UCLAMP_MISFIT = -4,
> > + ASYM_IDLE_CORE_COMPLETE_MISFIT,
> > + ASYM_IDLE_THREAD_FITS,
> > + ASYM_IDLE_THREAD_UCLAMP_MISFIT,
> > + ASYM_IDLE_COMPLETE_MISFIT,
> > +
> > + /* util_fits_cpu() bias for an idle core. */
> > + ASYM_IDLE_CORE_BIAS = -3,
> > +};
> > +
> > /*
> > * Scan the asym_capacity domain for idle CPUs; pick the first idle one on which
> > * the task fits. If no CPU is big enough, but there are idle ones, try to
> > @@ -7997,8 +8013,9 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool
> > static int
> > select_idle_capacity(struct task_struct *p, struct sched_domain *sd, int target)
> > {
> > + bool prefers_idle_core = sched_smt_active() && test_idle_cores(target);
> > unsigned long task_util, util_min, util_max, best_cap = 0;
> > - int fits, best_fits = 0;
> > + int fits, best_fits = ASYM_IDLE_COMPLETE_MISFIT;
> > int cpu, best_cpu = -1;
> > struct cpumask *cpus;
> >
> > @@ -8010,6 +8027,7 @@ select_idle_capacity(struct task_struct *p, struct sched_domain *sd, int target)
> > util_max = uclamp_eff_value(p, UCLAMP_MAX);
> >
> > for_each_cpu_wrap(cpu, cpus, target) {
> > + bool preferred_core = !prefers_idle_core || is_core_idle(cpu);
> > unsigned long cpu_cap = capacity_of(cpu);
> >
> > if (!choose_idle_cpu(cpu, p))
> > @@ -8018,7 +8036,7 @@ select_idle_capacity(struct task_struct *p, struct sched_domain *sd, int target)
> > fits = util_fits_cpu(task_util, util_min, util_max, cpu);
> >
> > /* This CPU fits with all requirements */
> > - if (fits > 0)
> > + if (fits > 0 && preferred_core)
> > return cpu;
> > /*
> > * Only the min performance hint (i.e. uclamp_min) doesn't fit.
> > @@ -8026,9 +8044,33 @@ select_idle_capacity(struct task_struct *p, struct sched_domain *sd, int target)
> > */
> > else if (fits < 0)
> > cpu_cap = get_actual_cpu_capacity(cpu);
> > + /*
> > + * fits > 0 implies we are not on a preferred core
> > + * but the util fits CPU capacity. Set fits to ASYM_IDLE_THREAD_FITS
> > + * so the effective range becomes
> > + * [ASYM_IDLE_THREAD_FITS, ASYM_IDLE_COMPLETE_MISFIT], where:
> > + * ASYM_IDLE_COMPLETE_MISFIT - does not fit
> > + * ASYM_IDLE_THREAD_UCLAMP_MISFIT - fits with the exception of UCLAMP_MIN
> > + * ASYM_IDLE_THREAD_FITS - fits with the exception of preferred_core
> > + */
> > + else if (fits > 0)
> > + fits = ASYM_IDLE_THREAD_FITS;
> > +
> > + /*
> > + * If we are on a preferred core, translate the range of fits
> > + * of [ASYM_IDLE_THREAD_UCLAMP_MISFIT, ASYM_IDLE_COMPLETE_MISFIT] to
> > + * [ASYM_IDLE_CORE_UCLAMP_MISFIT, ASYM_IDLE_CORE_COMPLETE_MISFIT].
> > + * This ensures that an idle core is always given priority over
> > + * a (partially) busy core.
> > + *
> > + * A fully fitting idle core would have returned early and hence
> > + * fits > 0 for preferred_core need not be dealt with.
> > + */
> > + if (preferred_core)
> > + fits += ASYM_IDLE_CORE_BIAS;
>
> It might be good to add a comment stating that if the system doesn't
> have SMT, prefers_idle_core and preferred_core are always true.
>
> This is okay because CPU == Core in this case but the value differs
> from the default 0 or -1 of util_fits_cpu
Ack.
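For reference, the ranking arithmetic can be sketched as stand-alone
user-space C. The enum values are copied from the patch; rank_fit() is a
hypothetical stand-in for the transformation in select_idle_capacity(),
taking the raw util_fits_cpu() result directly as an int:

```c
#include <assert.h>

/* Values copied from the patch; lower means more preferred. */
enum asym_fits_state {
	ASYM_IDLE_CORE_UCLAMP_MISFIT = -4,
	ASYM_IDLE_CORE_COMPLETE_MISFIT = -3,
	ASYM_IDLE_THREAD_FITS = -2,
	ASYM_IDLE_THREAD_UCLAMP_MISFIT = -1,
	ASYM_IDLE_COMPLETE_MISFIT = 0,

	/* Offset applied when the CPU sits on a fully idle SMT core. */
	ASYM_IDLE_CORE_BIAS = -3,
};

/*
 * Hypothetical stand-in for the ranking in select_idle_capacity():
 * @fits is the raw util_fits_cpu() result (1 fits fully, 0 complete
 * misfit, -1 uclamp_min misfit), @preferred_core says whether the CPU
 * is on a fully idle SMT core. The fully fitting idle-core case returns
 * the CPU early in the kernel, so it never goes through this ranking.
 */
static int rank_fit(int fits, int preferred_core)
{
	if (fits > 0)
		fits = ASYM_IDLE_THREAD_FITS;
	if (preferred_core)
		fits += ASYM_IDLE_CORE_BIAS;
	return fits;
}
```

Note how even a completely misfitting idle core (-3) still ranks ahead of
a fully fitting CPU whose SMT sibling is busy (-2), which is exactly the
"always prefer fully-idle cores" policy of the patch.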
>
> >
> > /*
> > - * First, select CPU which fits better (-1 being better than 0).
> > + * First, select CPU which fits better (lower is more preferred).
> > * Then, select the one with best capacity at same level.
> > */
> > if ((fits < best_fits) ||
> > @@ -8039,6 +8081,19 @@ select_idle_capacity(struct task_struct *p, struct sched_domain *sd, int target)
> > }
> > }
> >
> > + /*
> > + * A value in the [ASYM_IDLE_CORE_UCLAMP_MISFIT, ASYM_IDLE_CORE_BIAS]
>
> s/ASYM_IDLE_CORE_BIAS/ASYM_IDLE_CORE_COMPLETE_MISFIT/
>
> ASYM_IDLE_CORE_BIAS is an offset to move an idle core that doesn't
> fully fit in the preferred range [ASYM_IDLE_CORE_UCLAMP_MISFIT,
> ASYM_IDLE_CORE_COMPLETE_MISFIT]
>
> Keeping in mind that ASYM_IDLE_CORE_BIAS == -3 == ASYM_IDLE_CORE_COMPLETE_MISFIT
Ah yes, using ASYM_IDLE_CORE_BIAS is just confusing, we should definitely use
[ASYM_IDLE_CORE_UCLAMP_MISFIT, ASYM_IDLE_CORE_COMPLETE_MISFIT]. Will fix this.
>
> > + * range means the chosen CPU is in a fully idle SMT core. Values above
> > + * ASYM_IDLE_CORE_BIAS mean we never ranked such a CPU best.
>
> s/ASYM_IDLE_CORE_BIAS/ASYM_IDLE_CORE_COMPLETE_MISFIT/
Ack.
>
> > + *
> > + * The asym-capacity wakeup path returns from select_idle_sibling()
> > + * after this function and never runs select_idle_cpu(), so the usual
> > + * select_idle_cpu() tail that clears idle cores must live here when the
> > + * idle-core preference did not win.
> > + */
> > + if (prefers_idle_core && best_fits > ASYM_IDLE_CORE_BIAS)
>
> s/ASYM_IDLE_CORE_BIAS/ASYM_IDLE_CORE_COMPLETE_MISFIT/
Ack.
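To make the clearing condition concrete, here is a hypothetical user-space
predicate (enum values copied from the patch) showing that, with the
suggested s/ASYM_IDLE_CORE_BIAS/ASYM_IDLE_CORE_COMPLETE_MISFIT/ applied,
best_fits > ASYM_IDLE_CORE_COMPLETE_MISFIT is exactly "the best-ranked CPU
was not on a fully idle core", since idle-core winners can only land on
-4 or -3:

```c
#include <assert.h>

/* Values copied from the patch; lower means more preferred. */
enum asym_fits_state {
	ASYM_IDLE_CORE_UCLAMP_MISFIT = -4,
	ASYM_IDLE_CORE_COMPLETE_MISFIT = -3,
	ASYM_IDLE_THREAD_FITS = -2,
	ASYM_IDLE_THREAD_UCLAMP_MISFIT = -1,
	ASYM_IDLE_COMPLETE_MISFIT = 0,
};

/*
 * Sketch of the select_idle_capacity() tail: the idle-cores hint is
 * cleared only when we were looking for fully idle cores and the best
 * ranked CPU fell outside the idle-core range [-4, -3]. (The fully
 * fitting idle-core case returns early and never reaches this check.)
 */
static int should_clear_idle_cores(int prefers_idle_core, int best_fits)
{
	return prefers_idle_core &&
	       best_fits > ASYM_IDLE_CORE_COMPLETE_MISFIT;
}
```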
>
> > + set_idle_cores(target, false);
> > +
> > return best_cpu;
> > }
> >
> > @@ -8047,12 +8102,17 @@ static inline bool asym_fits_cpu(unsigned long util,
> > unsigned long util_max,
> > int cpu)
> > {
> > - if (sched_asym_cpucap_active())
> > + if (sched_asym_cpucap_active()) {
> > /*
> > * Return true only if the cpu fully fits the task requirements
> > * which include the utilization and the performance hints.
> > + *
> > + * When SMT is active, also require that the core has no busy
> > + * siblings.
> > */
> > - return (util_fits_cpu(util, util_min, util_max, cpu) > 0);
> > + return (!sched_smt_active() || is_core_idle(cpu)) &&
> > + (util_fits_cpu(util, util_min, util_max, cpu) > 0);
> > + }
> >
> > return true;
> > }
> > --
> > 2.54.0
> >
Thanks,
-Andrea