From: Andrea Righi <arighi@nvidia.com>
To: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>,
Ingo Molnar <mingo@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Juri Lelli <juri.lelli@redhat.com>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Steven Rostedt <rostedt@goodmis.org>,
Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
Valentin Schneider <vschneid@redhat.com>,
Christian Loehle <christian.loehle@arm.com>,
Koba Ko <kobak@nvidia.com>,
Felix Abecassis <fabecassis@nvidia.com>,
Balbir Singh <balbirs@nvidia.com>,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH 4/4] sched/fair: Prefer fully-idle SMT core for NOHZ idle load balancer
Date: Fri, 27 Mar 2026 21:36:04 +0100
Message-ID: <acbqNOWr37pMC1sG@gpd4>
In-Reply-To: <e7a24723-4cd0-4223-8dca-64825cf63a5e@amd.com>

On Fri, Mar 27, 2026 at 05:04:23PM +0530, K Prateek Nayak wrote:
> Hello Andrea,
>
> On 3/27/2026 3:14 PM, Andrea Righi wrote:
> > Hi Vincent,
> >
> > On Fri, Mar 27, 2026 at 09:45:56AM +0100, Vincent Guittot wrote:
> >> On Thu, 26 Mar 2026 at 16:12, Andrea Righi <arighi@nvidia.com> wrote:
> >>>
> >>> When choosing which idle housekeeping CPU runs the idle load balancer,
> >>> prefer one on a fully idle core if SMT is active, so balance can migrate
> >>> work onto a CPU that still offers full effective capacity. Fall back to
> >>> any idle candidate if none qualify.
> >>
> >> This one isn't straightforward for me. The ilb cpu will check all
> >> other idle CPUs 1st and finish with itself so unless the next CPU in
> >> the idle_cpus_mask is a sibling, this should not make a difference
> >>
> >> Did you see any perf diff ?
> >
> > I actually see a benefit, in particular, with the first patch applied I see
> > a ~1.76x speedup, if I add this on top I get ~1.9x speedup vs baseline,
> > which seems pretty consistent across runs (definitely not in error range).
> >
> > The intention with this change was to minimize SMT noise running the ILB
> > code on a fully-idle core when possible, but I also didn't expect to see
> > such big difference.
> >
> > I'll investigate more to better understand what's happening.
>
> Interesting! Either this "CPU-intensive workload" hates SMT turning
> busy (but to an extent where performance drops visibly?) or ILB
> keeps getting interrupted on an SMT sibling that is burdened by
> interrupts leading to slower balance (or IRQs driving the workload
> being delayed by rq_lock disabling them)
>
> Would it be possible to share the total SCHED_SOFTIRQ time, load
> balancing attempts, and utilization with and without the patch? I too
> will go queue up some runs to see if this makes a difference.

Quick update: I also tried this on a Vera machine with firmware that
exposes the same capacity for all CPUs (so SD_ASYM_CPUCAPACITY is
disabled, with SMT still on of course), and I see similar performance
benefits.

Looking at SCHED_SOFTIRQ and load balancing attempts, I don't see big
differences; everything is within error range (results produced using a
vibe-coded Python script):

- baseline (stats/sec):

  SCHED softirq count : 2,625
  LB attempts (total) : 69,832

  Per-domain breakdown:
    domain0 (SMT):
      lb_count (total) : 68,482 [balanced=68,472 failed=9]
      CPU_IDLE         : lb=1,408 imb(load=0 util=0 task=0 misfit=0) gained=0
      CPU_NEWLY_IDLE   : lb=67,041 imb(load=0 util=0 task=7 misfit=0) gained=0
      CPU_NOT_IDLE     : lb=33 imb(load=0 util=0 task=2 misfit=0) gained=0
    domain1 (MC):
      lb_count (total) : 902 [balanced=900 failed=2]
      CPU_NEWLY_IDLE   : lb=869 imb(load=0 util=0 task=0 misfit=0) gained=0
      CPU_NOT_IDLE     : lb=33 imb(load=0 util=0 task=2 misfit=0) gained=0
    domain2 (NUMA):
      lb_count (total) : 448 [balanced=441 failed=7]
      CPU_NEWLY_IDLE   : lb=415 imb(load=0 util=0 task=44 misfit=0) gained=0
      CPU_NOT_IDLE     : lb=33 imb(load=0 util=0 task=268 misfit=0) gained=0

- with ilb-smt (stats/sec):

  SCHED softirq count : 2,671
  LB attempts (total) : 68,572

  Per-domain breakdown:
    domain0 (SMT):
      lb_count (total) : 67,239 [balanced=67,197 failed=41]
      CPU_IDLE         : lb=1,419 imb(load=0 util=0 task=0 misfit=0) gained=0
      CPU_NEWLY_IDLE   : lb=65,783 imb(load=0 util=0 task=42 misfit=0) gained=1
      CPU_NOT_IDLE     : lb=37 imb(load=0 util=0 task=0 misfit=0) gained=0
    domain1 (MC):
      lb_count (total) : 833 [balanced=833 failed=0]
      CPU_NEWLY_IDLE   : lb=796 imb(load=0 util=0 task=0 misfit=0) gained=0
      CPU_NOT_IDLE     : lb=37 imb(load=0 util=0 task=0 misfit=0) gained=0
    domain2 (NUMA):
      lb_count (total) : 500 [balanced=488 failed=12]
      CPU_NEWLY_IDLE   : lb=463 imb(load=0 util=0 task=44 misfit=0) gained=0
      CPU_NOT_IDLE     : lb=37 imb(load=0 util=0 task=627 misfit=0) gained=0
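
For readers who want to reproduce a figure like the "SCHED softirq count" rate above, here is a minimal sketch in Python. It is not the script used for the numbers in this message; it only assumes the usual /proc/softirqs layout (a header row of CPUs followed by rows such as "SCHED:" with one counter per CPU). All function names are hypothetical, and the per-domain load balancing counters are not covered.

```python
import time


def sched_softirq_total(softirqs_text: str) -> int:
    """Sum the per-CPU counters of the SCHED row in /proc/softirqs-style text."""
    for line in softirqs_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "SCHED:":
            return sum(int(value) for value in fields[1:])
    raise ValueError("no SCHED: row found")


def rate_per_sec(before: int, after: int, seconds: float) -> float:
    """Convert two cumulative counter samples into an events/sec rate."""
    return (after - before) / seconds


def relative_change(baseline: float, patched: float) -> float:
    """Percentage change of the patched value relative to the baseline."""
    return (patched - baseline) / baseline * 100.0


def sample_sched_softirq_rate(interval: float = 1.0,
                              path: str = "/proc/softirqs") -> float:
    """Sample the SCHED softirq counter twice and return the rate per second.

    The default path is the standard Linux procfs location; the interval
    is an arbitrary choice for this sketch.
    """
    with open(path) as f:
        before = sched_softirq_total(f.read())
    start = time.monotonic()
    time.sleep(interval)
    with open(path) as f:
        after = sched_softirq_total(f.read())
    elapsed = time.monotonic() - start
    return rate_per_sec(before, after, elapsed)
```

Applied to the two system-wide figures reported above, relative_change(2625, 2671) is about +1.75% for the SCHED softirq count and relative_change(69832, 68572) is about -1.80% for total LB attempts.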

I'll add more direct instrumentation to check what ILB is doing
differently...

And I'll also repeat the test and collect the same metrics on the Vera
machine with the firmware that exposes different CPU capacities as soon as
I get access again.

Thanks,
-Andrea