From: Andrea Righi <arighi@nvidia.com>
To: Balbir Singh <balbirs@nvidia.com>
Cc: Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
	Valentin Schneider <vschneid@redhat.com>,
	Christian Loehle <christian.loehle@arm.com>,
	Koba Ko <kobak@nvidia.com>,
	Felix Abecassis <fabecassis@nvidia.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 0/4] sched/fair: SMT-aware asymmetric CPU capacity
Date: Sat, 28 Mar 2026 23:50:09 +0100
Message-ID: <achbIaTOR9V5LW2F@gpd4>
In-Reply-To: <a3bce886-b4bb-4f5e-af04-930934fef50d@nvidia.com>

Hi Balbir,

On Sun, Mar 29, 2026 at 12:03:19AM +1100, Balbir Singh wrote:
> On 3/27/26 02:02, Andrea Righi wrote:
> > This series attempts to improve SD_ASYM_CPUCAPACITY scheduling by
> > introducing SMT awareness.
> > 
> > = Problem =
> > 
> > Nominal per-logical-CPU capacity can overstate usable compute when an SMT
> > sibling is busy, because the physical core doesn't deliver its full nominal
> > capacity. So, several SD_ASYM_CPUCAPACITY paths may pick high capacity CPUs
> > that are not actually good destinations.
> > 
> > = Proposed Solution =
> > 
> > This patch set aligns those paths with a simple rule already used
> > elsewhere: when SMT is active, prefer fully idle cores and avoid treating
> > partially idle SMT siblings as full-capacity targets where that would
> > mislead load balance.
> 
> In kernel/sched/topology.c
> 
> 	/* Don't attempt to spread across CPUs of different capacities. */
> 	if ((sd->flags & SD_ASYM_CPUCAPACITY) && sd->child)
> 		sd->child->flags &= ~SD_PREFER_SIBLING;
> 
> Should handle the selection, but I guess this does not work for SMT level sd's?

IIUC, SD_PREFER_SIBLING steers load balance toward sibling_imbalance()
(spreading runnables across the child/sibling domains); it doesn't encode
the "fully idle core first" logic. In practice it doesn't give us an
SMT-aware destination choice when a sibling is busy, and this series is
trying to cover that gap in the placement path.
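
Just to illustrate the idea (rough sketch, not the actual patch; the helper
name here is made up, and fair.c already has is_core_idle() doing
essentially this check for other paths):

  /*
   * Illustrative only: a core can deliver its full capacity when all
   * the SMT siblings of @cpu are idle.
   */
  static bool smt_core_fully_idle(int cpu)
  {
  	int sibling;

  	if (!sched_smt_active())
  		return true;

  	for_each_cpu(sibling, cpu_smt_mask(cpu)) {
  		if (sibling != cpu && !idle_cpu(sibling))
  			return false;
  	}

  	return true;
  }

The series essentially makes select_idle_capacity() prefer CPUs where both
asym_fits_cpu() and a check like the above hold, falling back to the
current behavior when no such CPU is available.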

BTW, on Vera the hierarchy is SMT -> MC -> NUMA:

root@localhost:~# grep . /sys/kernel/debug/sched/domains/cpu0/domain*/flags
/sys/kernel/debug/sched/domains/cpu0/domain0/flags:SD_BALANCE_NEWIDLE SD_BALANCE_EXEC SD_BALANCE_FORK SD_WAKE_AFFINE SD_SHARE_CPUCAPACITY SD_SHARE_LLC SD_PREFER_SIBLING
/sys/kernel/debug/sched/domains/cpu0/domain1/flags:SD_BALANCE_NEWIDLE SD_BALANCE_EXEC SD_BALANCE_FORK SD_WAKE_AFFINE SD_ASYM_CPUCAPACITY SD_SHARE_LLC
/sys/kernel/debug/sched/domains/cpu0/domain2/flags:SD_BALANCE_NEWIDLE SD_ASYM_CPUCAPACITY SD_ASYM_CPUCAPACITY_FULL SD_SERIALIZE SD_NUMA

And domain1/groups_flags (i.e., the flags of the child SMT domains as seen
by the groups at the MC level) still has SD_PREFER_SIBLING together with
SD_SHARE_CPUCAPACITY:

root@localhost:~# cat /sys/kernel/debug/sched/domains/cpu0/domain1/groups_flags
SD_BALANCE_NEWIDLE SD_BALANCE_EXEC SD_BALANCE_FORK SD_WAKE_AFFINE SD_SHARE_CPUCAPACITY SD_SHARE_LLC SD_PREFER_SIBLING

So, prefer-sibling is still in play for SMT (including via the MC
groups_flags). On machines where the asymmetric domain sits immediately
above SMT, the topology code may strip that flag from the SMT child and
make that path moot, but explicit SMT-aware placement still matters.

> > 
> > Patch set summary:
> > 
> >  - [PATCH 1/4] sched/fair: Prefer fully-idle SMT cores in asym-capacity idle selection
> > 
> >    Prefer fully-idle SMT cores in asym-capacity idle selection. In the
> >    wakeup fast path, extend select_idle_capacity() / asym_fits_cpu() so
> >    idle selection can prefer CPUs on fully idle cores, with a safe fallback.
> > 
> >  - [PATCH 2/4] sched/fair: Reject misfit pulls onto busy SMT siblings on asym-capacity
> > 
> >    Reject misfit pulls onto busy SMT siblings on SD_ASYM_CPUCAPACITY.
> >    Provided for consistency with PATCH 1/4.
> > 
> >  - [PATCH 3/4] sched/fair: Enable EAS with SMT on SD_ASYM_CPUCAPACITY systems
> > 
> >    Enable EAS with SD_ASYM_CPUCAPACITY and SMT. Also provided for
> >    consistency with PATCH 1/4. I've also tested with/without
> >    /proc/sys/kernel/sched_energy_aware enabled (same platform) and haven't
> >    noticed any regression.
> > 
> >  - [PATCH 4/4] sched/fair: Prefer fully-idle SMT core for NOHZ idle load balancer
> > 
> >    When choosing the housekeeping CPU that runs the idle load balancer,
> >    prefer an idle CPU on a fully idle core so migrated work lands where
> >    effective capacity is available.
> > 
> >    The change is still consistent with the same "avoid CPUs with busy
> >    sibling" logic and it shows some benefits on Vera, but could have
> >    negative impact on other systems, I'm including it for completeness
> >    (feedback is appreciated).
> > 
> > This patch set has been tested on the new NVIDIA Vera Rubin platform, where
> > SMT is enabled and the firmware exposes small frequency variations (+/-~5%)
> > as differences in CPU capacity, resulting in SD_ASYM_CPUCAPACITY being set.
> > 
> 
> Are you referring to nominal_freq?
> 

Correct.
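
To put numbers on it: a ~5% lower nominal_freq ends up as a capacity of
roughly 970-975 vs 1024 for the fastest CPUs (assuming capacity scales
linearly with that frequency), and any capacity difference within a domain
is enough to get SD_ASYM_CPUCAPACITY set there.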

Thanks,
-Andrea

Thread overview: 27+ messages
2026-03-26 15:02 [PATCH 0/4] sched/fair: SMT-aware asymmetric CPU capacity Andrea Righi
2026-03-26 15:02 ` [PATCH 1/4] sched/fair: Prefer fully-idle SMT cores in asym-capacity idle selection Andrea Righi
2026-03-27  8:09   ` Vincent Guittot
2026-03-27  9:46     ` Andrea Righi
2026-03-27 10:44   ` K Prateek Nayak
2026-03-27 10:58     ` Andrea Righi
2026-03-27 11:14       ` K Prateek Nayak
2026-03-27 16:39         ` Andrea Righi
2026-03-26 15:02 ` [PATCH 2/4] sched/fair: Reject misfit pulls onto busy SMT siblings on asym-capacity Andrea Righi
2026-03-26 15:02 ` [PATCH 3/4] sched/fair: Enable EAS with SMT on SD_ASYM_CPUCAPACITY systems Andrea Righi
2026-03-27  8:09   ` Vincent Guittot
2026-03-27  9:45     ` Andrea Righi
2026-03-26 15:02 ` [PATCH 4/4] sched/fair: Prefer fully-idle SMT core for NOHZ idle load balancer Andrea Righi
2026-03-27  8:45   ` Vincent Guittot
2026-03-27  9:44     ` Andrea Righi
2026-03-27 11:34       ` K Prateek Nayak
2026-03-27 20:36         ` Andrea Righi
2026-03-27 22:45           ` Andrea Righi
2026-03-27 13:44   ` Shrikanth Hegde
2026-03-26 16:33 ` [PATCH 0/4] sched/fair: SMT-aware asymmetric CPU capacity Christian Loehle
2026-03-27  6:52   ` Andrea Righi
2026-03-27 16:31 ` Shrikanth Hegde
2026-03-27 17:08   ` Andrea Righi
2026-03-28  6:51     ` Shrikanth Hegde
2026-03-28 13:03 ` Balbir Singh
2026-03-28 22:50   ` Andrea Righi [this message]
2026-03-29 21:36     ` Balbir Singh
