From: Andrea Righi <arighi@nvidia.com>
To: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
	Valentin Schneider <vschneid@redhat.com>,
	K Prateek Nayak <kprateek.nayak@amd.com>,
	Christian Loehle <christian.loehle@arm.com>,
	Koba Ko <kobak@nvidia.com>,
	Felix Abecassis <fabecassis@nvidia.com>,
	Balbir Singh <balbirs@nvidia.com>,
	Shrikanth Hegde <sshegde@linux.ibm.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/2] sched/fair: Prefer fully-idle SMT cores in asym-capacity idle selection
Date: Sat, 18 Apr 2026 08:02:16 +0200
Message-ID: <aeMeaFiuEUhfJ5wg@gpd4>
In-Reply-To: <CAKfTPtCCXySFMG+bNM4f8LS-QRz99K1UsdAZMEqFTxq71SMKHQ@mail.gmail.com>

Hi Vincent,

On Fri, Apr 17, 2026 at 11:39:21AM +0200, Vincent Guittot wrote:
> On Fri, 3 Apr 2026 at 07:37, Andrea Righi <arighi@nvidia.com> wrote:
> >
> > On systems with asymmetric CPU capacity (e.g., ACPI/CPPC reporting
> > different per-core frequencies), the wakeup path uses
> > select_idle_capacity() and prioritizes idle CPUs with higher capacity
> > for better task placement.
> >
> > However, when those CPUs belong to SMT cores, their effective capacity
> > can be much lower than the nominal capacity when the sibling thread is
> > busy: SMT siblings compete for shared resources, so a "high capacity"
> > CPU that is idle but whose sibling is busy does not deliver its full
> > capacity. This effective capacity reduction cannot be modeled by the
> > static capacity value alone.
> >
> > When SMT is active, teach asym-capacity idle selection to treat a
> > logical CPU as a weaker target if its physical core is only partially
> > idle: select_idle_capacity() no longer returns on the first idle CPU
> > whose static capacity fits the task when that CPU still has a busy
> > sibling, it keeps scanning for an idle CPU on a fully-idle core and only
> > if none qualify does it fall back to partially-idle cores, using shifted
> > fit scores so fully-idle cores win ties; asym_fits_cpu() applies the
> > same fully-idle core requirement when asym capacity and SMT are both
> > active.
> >
> > This improves task placement, since partially-idle SMT siblings deliver
> > less than their nominal capacity. Favoring fully idle cores, when
> > available, can significantly enhance both throughput and wakeup latency
> > on systems with both SMT and CPU asymmetry.
> >
> > No functional changes on systems with only asymmetric CPUs or only SMT.
> >
> > Cc: K Prateek Nayak <kprateek.nayak@amd.com>
> > Cc: Vincent Guittot <vincent.guittot@linaro.org>
> > Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
> > Cc: Christian Loehle <christian.loehle@arm.com>
> > Cc: Koba Ko <kobak@nvidia.com>
> > Reported-by: Felix Abecassis <fabecassis@nvidia.com>
> > Signed-off-by: Andrea Righi <arighi@nvidia.com>
> > ---
> >  kernel/sched/fair.c | 36 ++++++++++++++++++++++++++++++++----
> >  1 file changed, 32 insertions(+), 4 deletions(-)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index bf948db905ed1..7f09191014d18 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -7774,6 +7774,7 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool
> >  static int
> >  select_idle_capacity(struct task_struct *p, struct sched_domain *sd, int target)
> >  {
> > +       bool prefers_idle_core = sched_smt_active() && test_idle_cores(target);
> >         unsigned long task_util, util_min, util_max, best_cap = 0;
> >         int fits, best_fits = 0;
> >         int cpu, best_cpu = -1;
> > @@ -7787,6 +7788,7 @@ select_idle_capacity(struct task_struct *p, struct sched_domain *sd, int target)
> >         util_max = uclamp_eff_value(p, UCLAMP_MAX);
> >
> >         for_each_cpu_wrap(cpu, cpus, target) {
> > +               bool preferred_core = !prefers_idle_core || is_core_idle(cpu);
> >                 unsigned long cpu_cap = capacity_of(cpu);
> >
> >                 if (!available_idle_cpu(cpu) && !sched_idle_cpu(cpu))
> > @@ -7795,7 +7797,7 @@ select_idle_capacity(struct task_struct *p, struct sched_domain *sd, int target)
> >                 fits = util_fits_cpu(task_util, util_min, util_max, cpu);
> >
> >                 /* This CPU fits with all requirements */
> > -               if (fits > 0)
> > +               if (fits > 0 && preferred_core)
> >                         return cpu;
> >                 /*
> >                  * Only the min performance hint (i.e. uclamp_min) doesn't fit.
> > @@ -7803,9 +7805,30 @@ select_idle_capacity(struct task_struct *p, struct sched_domain *sd, int target)
> >                  */
> >                 else if (fits < 0)
> >                         cpu_cap = get_actual_cpu_capacity(cpu);
> > +               /*
> > +                * fits > 0 implies we are not on a preferred core
> > +                * but the util fits CPU capacity. Set fits to -2 so
> > +                * the effective range becomes [-2, 0] where:
> > +                *    0 - does not fit
> > +                *   -1 - fits with the exception of UCLAMP_MIN
> > +                *   -2 - fits with the exception of preferred_core
> > +                */
> > +               else if (fits > 0)
> > +                       fits = -2;
> > +
> > +               /*
> > +                * If we are on a preferred core, translate the range of fits
> > +                * of [-1, 0] to [-4, -3]. This ensures that an idle core
> > +                * is always given priority over (partially) busy core.
> > +                *
> > +                * A fully fitting idle core would have returned early and hence
> > +                * fits > 0 for preferred_core need not be dealt with.
> > +                */
> > +               if (preferred_core)
> > +                       fits -= 3;
> >
> >                 /*
> > -                * First, select CPU which fits better (-1 being better than 0).
> > +                * First, select CPU which fits better (lower is more preferred).
> >                  * Then, select the one with best capacity at same level.
> >                  */
> >                 if ((fits < best_fits) ||
> 
> You have to clear idle_core if you were looking for an idle core but
> didn't find one while looping on CPUs.
> 
> You need the following to clear idle core:
> 
> @@ -7739,6 +7739,11 @@ select_idle_capacity(struct task_struct *p, struct sched_domain *sd, int target)
>                 }
>         }
> 
> +       /* The range [-4, -3] implies at least one idle core, the values above
> +        * imply that we didn't find anyone while looping CPUs */
> +       if (prefers_idle_core && fits > -3)
> +                       set_idle_cores(target, false);
> +
>         return best_cpu;
>  }

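That makes sense! But the check should use best_fits instead of fits,
right? After the loop, fits only holds the score of the last CPU that
was evaluated, not the best score found during the scan.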
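
To make the difference concrete, here is a minimal userspace sketch (not
kernel code) of the shifted fit scores from the patch. The names
cpu_sample, shifted_fits() and scan() are made up for illustration. The
sketch assumes prefers_idle_core is true, so preferred_core reduces to
"the core is fully idle", and it ignores the capacity tie-break.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one idle CPU seen by the scan; not a kernel type. */
struct cpu_sample {
	int fits;       /* util_fits_cpu()-style result: >0 fits, 0 no fit, -1 only uclamp_min misses */
	bool core_idle; /* true when all SMT siblings of this CPU's core are idle */
};

/*
 * Shifted score as described in the patch comments; lower is better.
 * A positive return value models the early "return cpu" case.
 */
static int shifted_fits(int fits, bool preferred_core)
{
	if (fits > 0 && preferred_core)
		return 1;
	if (fits > 0)
		fits = -2;      /* fits, but the core is only partially idle */
	if (preferred_core)
		fits -= 3;      /* [-1, 0] becomes [-4, -3] on a fully idle core */
	return fits;
}

struct scan_result {
	int best_fits;      /* best (lowest) score seen during the scan */
	int last_fits;      /* score of the last CPU evaluated */
	bool early_return;  /* a fully fitting CPU on an idle core was found */
};

/* Walk the samples in order, tracking both the best and the last score. */
static struct scan_result scan(const struct cpu_sample *cpus, size_t n)
{
	struct scan_result r = { .best_fits = 0, .last_fits = 0, .early_return = false };

	for (size_t i = 0; i < n; i++) {
		int fits = shifted_fits(cpus[i].fits, cpus[i].core_idle);

		r.last_fits = fits;
		if (fits > 0) {
			r.early_return = true;
			return r;
		}
		if (fits < r.best_fits)
			r.best_fits = fits;
	}
	return r;
}
```

With a scan order of { fits = 0, idle core } followed by { fits = 1, busy
sibling }, the best score lands in the idle-core range while the last
score does not, so a check on fits would clear the idle-core hint even
though an idle core was seen, and a check on best_fits would not.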

Thanks,
-Andrea
