public inbox for linux-kernel@vger.kernel.org
From: Andrea Righi <arighi@nvidia.com>
To: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>,
	Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
	Valentin Schneider <vschneid@redhat.com>,
	Christian Loehle <christian.loehle@arm.com>,
	Koba Ko <kobak@nvidia.com>,
	Felix Abecassis <fabecassis@nvidia.com>,
	Balbir Singh <balbirs@nvidia.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 4/4] sched/fair: Prefer fully-idle SMT core for NOHZ idle load balancer
Date: Fri, 27 Mar 2026 21:36:04 +0100	[thread overview]
Message-ID: <acbqNOWr37pMC1sG@gpd4> (raw)
In-Reply-To: <e7a24723-4cd0-4223-8dca-64825cf63a5e@amd.com>

On Fri, Mar 27, 2026 at 05:04:23PM +0530, K Prateek Nayak wrote:
> Hello Andrea,
> 
> On 3/27/2026 3:14 PM, Andrea Righi wrote:
> > Hi Vincent,
> > 
> > On Fri, Mar 27, 2026 at 09:45:56AM +0100, Vincent Guittot wrote:
> >> On Thu, 26 Mar 2026 at 16:12, Andrea Righi <arighi@nvidia.com> wrote:
> >>>
> >>> When choosing which idle housekeeping CPU runs the idle load balancer,
> >>> prefer one on a fully idle core if SMT is active, so balance can migrate
> >>> work onto a CPU that still offers full effective capacity. Fall back to
> >>> any idle candidate if none qualify.
> >>
> >> This one isn't straightforward for me. The ILB CPU will check all
> >> other idle CPUs first and finish with itself, so unless the next CPU
> >> in the idle_cpus_mask is a sibling, this should not make a difference.
> >>
> >> Did you see any perf diff ?
> > 
> > I actually see a benefit: with the first patch applied I see a ~1.76x
> > speedup, and if I add this on top I get a ~1.9x speedup vs baseline,
> > which seems pretty consistent across runs (definitely not within the
> > error margin).
> > 
> > The intention with this change was to minimize SMT noise by running the
> > ILB code on a fully-idle core when possible, but I also didn't expect to
> > see such a big difference.
> > 
> > I'll investigate more to better understand what's happening.
> 
> Interesting! Either this "CPU-intensive workload" hates an SMT sibling
> turning busy (but to an extent where performance drops visibly?), or the
> ILB keeps getting interrupted on an SMT sibling that is burdened by
> interrupts, leading to slower balancing (or the IRQs driving the workload
> being delayed by rq_lock disabling them).
> 
> Would it be possible to share the total SCHED_SOFTIRQ time, load
> balancing attempts, and utilization with and without the patch? I'll
> also queue up some runs to see if this makes a difference.

Quick update: I also tried this on a Vera machine with firmware that
exposes the same capacity for all CPUs (so SD_ASYM_CPUCAPACITY disabled,
with SMT still enabled of course) and I see similar performance benefits.

Looking at SCHED_SOFTIRQ and load balancing attempts I don't see big
differences; everything is within the error range (results produced using
a vibe-coded Python script):

 - baseline (stats/sec):

  SCHED softirq count  :        2,625
  LB attempts (total)  :       69,832

  Per-domain breakdown:
    domain0 (SMT):
      lb_count    (total)  :       68,482  [balanced=68,472  failed=9]
        CPU_IDLE         : lb=1,408  imb(load=0 util=0 task=0 misfit=0)  gained=0
        CPU_NEWLY_IDLE   : lb=67,041  imb(load=0 util=0 task=7 misfit=0)  gained=0
        CPU_NOT_IDLE     : lb=33  imb(load=0 util=0 task=2 misfit=0)  gained=0
    domain1 (MC):
      lb_count    (total)  :          902  [balanced=900  failed=2]
        CPU_NEWLY_IDLE   : lb=869  imb(load=0 util=0 task=0 misfit=0)  gained=0
        CPU_NOT_IDLE     : lb=33  imb(load=0 util=0 task=2 misfit=0)  gained=0
    domain2 (NUMA):
      lb_count    (total)  :          448  [balanced=441  failed=7]
        CPU_NEWLY_IDLE   : lb=415  imb(load=0 util=0 task=44 misfit=0)  gained=0
        CPU_NOT_IDLE     : lb=33  imb(load=0 util=0 task=268 misfit=0)  gained=0

 - with ilb-smt (stats/sec):

  SCHED softirq count  :        2,671
  LB attempts (total)  :       68,572

  Per-domain breakdown:
    domain0 (SMT):
      lb_count    (total)  :       67,239  [balanced=67,197  failed=41]
        CPU_IDLE         : lb=1,419  imb(load=0 util=0 task=0 misfit=0)  gained=0
        CPU_NEWLY_IDLE   : lb=65,783  imb(load=0 util=0 task=42 misfit=0)  gained=1
        CPU_NOT_IDLE     : lb=37  imb(load=0 util=0 task=0 misfit=0)  gained=0
    domain1 (MC):
      lb_count    (total)  :          833  [balanced=833  failed=0]
        CPU_NEWLY_IDLE   : lb=796  imb(load=0 util=0 task=0 misfit=0)  gained=0
        CPU_NOT_IDLE     : lb=37  imb(load=0 util=0 task=0 misfit=0)  gained=0
    domain2 (NUMA):
      lb_count    (total)  :          500  [balanced=488  failed=12]
        CPU_NEWLY_IDLE   : lb=463  imb(load=0 util=0 task=44 misfit=0)  gained=0
        CPU_NOT_IDLE     : lb=37  imb(load=0 util=0 task=627 misfit=0)  gained=0

I'll add more direct instrumentation to check what ILB is doing
differently...

And I'll also repeat the test and collect the same metrics on the Vera
machine with the firmware that exposes different CPU capacities as soon as
I get access again.

Thanks,
-Andrea


Thread overview: 24+ messages
2026-03-26 15:02 [PATCH 0/4] sched/fair: SMT-aware asymmetric CPU capacity Andrea Righi
2026-03-26 15:02 ` [PATCH 1/4] sched/fair: Prefer fully-idle SMT cores in asym-capacity idle selection Andrea Righi
2026-03-27  8:09   ` Vincent Guittot
2026-03-27  9:46     ` Andrea Righi
2026-03-27 10:44   ` K Prateek Nayak
2026-03-27 10:58     ` Andrea Righi
2026-03-27 11:14       ` K Prateek Nayak
2026-03-27 16:39         ` Andrea Righi
2026-03-26 15:02 ` [PATCH 2/4] sched/fair: Reject misfit pulls onto busy SMT siblings on asym-capacity Andrea Righi
2026-03-26 15:02 ` [PATCH 3/4] sched/fair: Enable EAS with SMT on SD_ASYM_CPUCAPACITY systems Andrea Righi
2026-03-27  8:09   ` Vincent Guittot
2026-03-27  9:45     ` Andrea Righi
2026-03-26 15:02 ` [PATCH 4/4] sched/fair: Prefer fully-idle SMT core for NOHZ idle load balancer Andrea Righi
2026-03-27  8:45   ` Vincent Guittot
2026-03-27  9:44     ` Andrea Righi
2026-03-27 11:34       ` K Prateek Nayak
2026-03-27 20:36         ` Andrea Righi [this message]
2026-03-27 22:45           ` Andrea Righi
2026-03-27 13:44   ` Shrikanth Hegde
2026-03-26 16:33 ` [PATCH 0/4] sched/fair: SMT-aware asymmetric CPU capacity Christian Loehle
2026-03-27  6:52   ` Andrea Righi
2026-03-27 16:31 ` Shrikanth Hegde
2026-03-27 17:08   ` Andrea Righi
2026-03-28  6:51     ` Shrikanth Hegde
