Re: [PATCH 1/4] sched/fair: Prefer fully-idle SMT cores in asym-capacity idle selection

From: Andrea Righi <arighi@nvidia.com>
To: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
	Valentin Schneider <vschneid@redhat.com>,
	Christian Loehle <christian.loehle@arm.com>,
	Koba Ko <kobak@nvidia.com>,
	Felix Abecassis <fabecassis@nvidia.com>,
	Balbir Singh <balbirs@nvidia.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/4] sched/fair: Prefer fully-idle SMT cores in asym-capacity idle selection
Date: Fri, 27 Mar 2026 10:46:30 +0100
Message-ID: <acZR9i_9ezzKWUmT@gpd4>
In-Reply-To: <CAKfTPtCW29-TVzEB+274_+jM9Aiy76dAf4MnKWjZgA=kNu+6pg@mail.gmail.com>

Hi Vincent,

On Fri, Mar 27, 2026 at 09:09:24AM +0100, Vincent Guittot wrote:
> On Thu, 26 Mar 2026 at 16:12, Andrea Righi <arighi@nvidia.com> wrote:
> >
> > On systems with asymmetric CPU capacity (e.g., ACPI/CPPC reporting
> > different per-core frequencies), the wakeup path uses
> > select_idle_capacity() and prioritizes idle CPUs with higher capacity
> > for better task placement. However, when those CPUs belong to SMT cores,
> > their effective capacity can be much lower than the nominal capacity
> > when the sibling thread is busy: SMT siblings compete for shared
> > resources, so a "high capacity" CPU that is idle but whose sibling is
> > busy does not deliver its full capacity. This effective capacity
> > reduction cannot be modeled by the static capacity value alone.
> >
> > Introduce SMT awareness in the asym-capacity idle selection policy: when
> > SMT is active, prefer fully-idle SMT cores over partially-idle ones. A
> > two-phase selection first tries only CPUs on fully idle cores, then
> > falls back to any idle CPU if none fit.
> >
> > Prioritizing fully-idle SMT cores yields better task placement because
> > the effective capacity of partially-idle SMT cores is reduced; always
> > preferring fully-idle cores when available leads to more accurate
> > capacity usage on task wakeup.
> >
> > On an SMT system with asymmetric CPU capacities, SMT-aware idle
> > selection has been shown to improve throughput by around 15-18% for
> > CPU-bound workloads when running a number of tasks equal to the
> > number of SMT cores.
> >
> > Cc: Vincent Guittot <vincent.guittot@linaro.org>
> > Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
> > Cc: Christian Loehle <christian.loehle@arm.com>
> > Cc: Koba Ko <kobak@nvidia.com>
> > Reported-by: Felix Abecassis <fabecassis@nvidia.com>
> > Signed-off-by: Andrea Righi <arighi@nvidia.com>
> > ---
> >  kernel/sched/fair.c | 86 +++++++++++++++++++++++++++++++++++++++------
> >  1 file changed, 75 insertions(+), 11 deletions(-)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index d57c02e82f3a1..9a95628669851 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -7940,14 +7940,21 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool
> >   * Scan the asym_capacity domain for idle CPUs; pick the first idle one on which
> >   * the task fits. If no CPU is big enough, but there are idle ones, try to
> >   * maximize capacity.
> > + *
> > + * When @prefer_idle_cores is true (asym + SMT and idle cores exist), prefer
> > + * CPUs on fully-idle cores over partially-idle ones in a single pass: track
> > + * the best candidate among idle-core CPUs and the best among any idle CPU,
> > + * then return the idle-core candidate if found, else the best any-idle.
> >   */
> >  static int
> > -select_idle_capacity(struct task_struct *p, struct sched_domain *sd, int target)
> > +select_idle_capacity(struct task_struct *p, struct sched_domain *sd, int target,
> > +                    bool prefer_idle_cores)
> >  {
> > -       unsigned long task_util, util_min, util_max, best_cap = 0;
> > -       int fits, best_fits = 0;
> > -       int cpu, best_cpu = -1;
> > +       unsigned long task_util, util_min, util_max, best_cap = 0, best_cap_core = 0;
> > +       int fits, best_fits = 0, best_fits_core = 0;
> > +       int cpu, best_cpu = -1, best_cpu_core = -1;
> >         struct cpumask *cpus;
> > +       bool on_idle_core;
> >
> >         cpus = this_cpu_cpumask_var_ptr(select_rq_mask);
> >         cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
> > @@ -7962,16 +7969,58 @@ select_idle_capacity(struct task_struct *p, struct sched_domain *sd, int target)
> >                 if (!choose_idle_cpu(cpu, p))
> >                         continue;
> >
> > +               on_idle_core = is_core_idle(cpu);
> > +               if (prefer_idle_cores && !on_idle_core) {
> > +                       /* Track best among any idle CPU for fallback */
> > +                       fits = util_fits_cpu(task_util, util_min, util_max, cpu);
> 
> util_fits_cpu(task_util, util_min, util_max, cpu) is called on every
> path, so call it once above this if condition.
> 
> This will help factorize the selection of best_cpu and best_cpu_core.

Makes sense.
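
A rough sketch of how the factored loop could look, written as a
user-space model rather than as the actual next revision. Everything
prefixed with model_ is hypothetical: struct model_cpu stands in for the
per-CPU state, model_fits() is a simplified stand-in for util_fits_cpu()
(no capacity margin, no uclamp_max), and the get_actual_cpu_capacity()
adjustment for fits < 0 is left out. The point is only that the fit is
computed once per CPU and one comparison helper serves both best_cpu and
best_cpu_core.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-CPU state; not a kernel structure. */
struct model_cpu {
	unsigned long cap;	/* capacity of this CPU */
	bool idle;		/* this CPU is idle */
	bool core_idle;		/* every SMT sibling on the core is idle */
};

/* Candidate tracked while scanning; cpu < 0 means "none yet". */
struct model_cand {
	int cpu;
	int fits;
	unsigned long cap;
};

/* Simplified stand-in for util_fits_cpu(): 1 fits, -1 only util_min misses, 0 no fit. */
static int model_fits(unsigned long util, unsigned long util_min, unsigned long cap)
{
	if (util > cap)
		return 0;
	return util_min > cap ? -1 : 1;
}

/* One ranking rule shared by the any-idle and the idle-core candidate. */
static bool model_better(int fits, unsigned long cap, const struct model_cand *best)
{
	if (best->cpu < 0)
		return true;
	if (fits > 0)
		return best->fits <= 0 || cap > best->cap;
	if (best->fits > 0)
		return false;
	return fits < best->fits || (fits == best->fits && cap > best->cap);
}

int model_select_idle_capacity(const struct model_cpu *cpus, int nr,
			       unsigned long util, unsigned long util_min,
			       bool prefer_idle_cores)
{
	struct model_cand best = { .cpu = -1 }, best_core = { .cpu = -1 };

	assert(nr >= 0);

	for (int cpu = 0; cpu < nr; cpu++) {
		if (!cpus[cpu].idle)
			continue;

		/* Computed once, before any idle-core special casing. */
		int fits = model_fits(util, util_min, cpus[cpu].cap);
		bool on_idle_core = cpus[cpu].core_idle;

		/* A full fit is final unless we still hope for an idle core. */
		if (fits > 0 && (!prefer_idle_cores || on_idle_core))
			return cpu;

		if (model_better(fits, cpus[cpu].cap, &best))
			best = (struct model_cand){ cpu, fits, cpus[cpu].cap };
		if (prefer_idle_cores && on_idle_core &&
		    model_better(fits, cpus[cpu].cap, &best_core))
			best_core = (struct model_cand){ cpu, fits, cpus[cpu].cap };
	}

	/* As in the posted patch, an idle-core candidate wins the final pick. */
	return best_core.cpu >= 0 ? best_core.cpu : best.cpu;
}
```

In this model, a big CPU whose sibling is busy is skipped in favor of a
smaller CPU on a fully idle core when prefer_idle_cores is set, and is
picked immediately when it is not.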

> 
> > +                       if (fits > 0) {
> > +                               /*
> > +                                * Full fit: strictly better than fits 0 / -1;
> > +                                * among several, prefer higher capacity.
> > +                                */
> > +                               if (best_cpu < 0 || best_fits <= 0 ||
> > +                                   (best_fits > 0 && cpu_cap > best_cap)) {
> > +                                       best_cap = cpu_cap;
> > +                                       best_cpu = cpu;
> > +                                       best_fits = fits;
> > +                               }
> > +                               continue;
> > +                       }
> > +                       if (best_fits > 0)
> > +                               continue;
> > +                       if (fits < 0)
> > +                               cpu_cap = get_actual_cpu_capacity(cpu);
> > +                       if ((fits < best_fits) ||
> > +                           ((fits == best_fits) && (cpu_cap > best_cap))) {
> > +                               best_cap = cpu_cap;
> > +                               best_cpu = cpu;
> > +                               best_fits = fits;
> > +                       }
> > +                       continue;
> > +               }
> > +
> >                 fits = util_fits_cpu(task_util, util_min, util_max, cpu);
> >
> >                 /* This CPU fits with all requirements */
> > -               if (fits > 0)
> > -                       return cpu;
> > +               if (fits > 0) {
> > +                       if (prefer_idle_cores && on_idle_core)
> > +                               return cpu;
> > +                       if (!prefer_idle_cores)
> > +                               return cpu;
> > +                       /*
> > +                        * Prefer idle cores: record and keep looking for
> > +                        * idle-core fit.
> > +                        */
> > +                       best_cap = cpu_cap;
> > +                       best_cpu = cpu;
> > +                       best_fits = fits;
> > +                       continue;
> > +               }
> >                 /*
> >                  * Only the min performance hint (i.e. uclamp_min) doesn't fit.
> >                  * Look for the CPU with best capacity.
> >                  */
> > -               else if (fits < 0)
> > +               if (fits < 0)
> >                         cpu_cap = get_actual_cpu_capacity(cpu);
> >
> >                 /*
> > @@ -7984,8 +8033,17 @@ select_idle_capacity(struct task_struct *p, struct sched_domain *sd, int target)
> >                         best_cpu = cpu;
> >                         best_fits = fits;
> >                 }
> > +               if (prefer_idle_cores && on_idle_core &&
> > +                   ((fits < best_fits_core) ||
> > +                    ((fits == best_fits_core) && (cpu_cap > best_cap_core)))) {
> > +                       best_cap_core = cpu_cap;
> > +                       best_cpu_core = cpu;
> > +                       best_fits_core = fits;
> > +               }
> >         }
> >
> > +       if (prefer_idle_cores && best_cpu_core >= 0)
> > +               return best_cpu_core;
> >         return best_cpu;
> >  }
> >
> > @@ -7994,12 +8052,17 @@ static inline bool asym_fits_cpu(unsigned long util,
> >                                  unsigned long util_max,
> >                                  int cpu)
> >  {
> > -       if (sched_asym_cpucap_active())
> > +       if (sched_asym_cpucap_active()) {
> >                 /*
> >                  * Return true only if the cpu fully fits the task requirements
> >                  * which include the utilization and the performance hints.
> > +                *
> > +                * When SMT is active, also require that the core has no busy
> > +                * siblings.
> >                  */
> > -               return (util_fits_cpu(util, util_min, util_max, cpu) > 0);
> > +               return (!sched_smt_active() || is_core_idle(cpu)) &&
> > +                      (util_fits_cpu(util, util_min, util_max, cpu) > 0);
> > +       }
> >
> >         return true;
> >  }
> > @@ -8097,8 +8160,9 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
> >                  * capacity path.
> >                  */
> >                 if (sd) {
> > -                       i = select_idle_capacity(p, sd, target);
> > -                       return ((unsigned)i < nr_cpumask_bits) ? i : target;
> > +                       i = select_idle_capacity(p, sd, target,
> > +                               sched_smt_active() && test_idle_cores(target));
> 
> Move "sched_smt_active() && test_idle_cores(target)" inside
> select_idle_capacity(); I don't see the benefit of making it a
> parameter. Or use has_idle_core for the parameter, like the other
> SMT-related functions.

And also makes sense.
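
For illustration only, a minimal user-space model of the first option,
where the selection function derives the preference itself instead of
taking it as an argument. struct model_state and the model_ functions
are hypothetical; the two fields merely stand in for what
sched_smt_active() and test_idle_cores(target) would report, and the
selection itself is reduced to picking between two precomputed
candidates.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the state consulted at wakeup. */
struct model_state {
	bool smt_active;	/* models sched_smt_active() */
	bool has_idle_cores;	/* models test_idle_cores(target) */
};

/* The check lives with the callee, so callers need no extra argument. */
static bool model_prefer_idle_cores(const struct model_state *s)
{
	return s->smt_active && s->has_idle_cores;
}

/*
 * Reduced selection: return the idle-core candidate when the preference
 * applies and such a candidate exists, otherwise the any-idle candidate.
 */
int model_select(const struct model_state *s, int idle_core_cpu, int any_idle_cpu)
{
	if (model_prefer_idle_cores(s) && idle_core_cpu >= 0)
		return idle_core_cpu;
	return any_idle_cpu;
}
```

The alternative of passing a has_idle_core flag would keep the check in
the caller and only rename the parameter.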

> 
> 
> > +                       return ((unsigned int)i < nr_cpumask_bits) ? i : target;
> >                 }
> >         }
> >
> > --
> > 2.53.0
> >

Thanks,
-Andrea
