From: Andrea Righi <arighi@nvidia.com>
To: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>,
	Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
	Valentin Schneider <vschneid@redhat.com>,
	Christian Loehle <christian.loehle@arm.com>,
	Koba Ko <kobak@nvidia.com>,
	Felix Abecassis <fabecassis@nvidia.com>,
	Balbir Singh <balbirs@nvidia.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 0/4] sched/fair: SMT-aware asymmetric CPU capacity
Date: Fri, 3 Apr 2026 16:45:50 +0200
Message-ID: <ac_Sntcnqm8DrDwk@gpd4>
In-Reply-To: <9886a7d3-fb54-4637-8b4c-1f35272f4882@arm.com>

Hi Dietmar,

On Fri, Apr 03, 2026 at 01:47:17PM +0200, Dietmar Eggemann wrote:
> On 01.04.26 15:12, Andrea Righi wrote:
> > On Wed, Apr 01, 2026 at 02:42:34PM +0200, Andrea Righi wrote:
> >> On Wed, Apr 01, 2026 at 02:08:27PM +0200, Vincent Guittot wrote:
> >>> On Wed, 1 Apr 2026 at 13:57, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:
> >>>>
> >>>> On 31.03.26 11:04, Andrea Righi wrote:
> >>>>> Hi Dietmar,
> >>>>>
> >>>>> On Tue, Mar 31, 2026 at 12:30:55AM +0200, Dietmar Eggemann wrote:
> >>>>>> Hi Andrea,
> >>>>>>
> >>>>>> On 26.03.26 16:02, Andrea Righi wrote:
> 
> [...]
> 
> > Just finished running some tests with DCPerf MediaWiki on Vera as well
> > (sorry, it took a while, I did multiple runs to rule out potential flukes):
> > 
> >  +---------------------------------+--------+--------+--------+--------+
> >  | Configuration                   |   rps  |  p50   |  p95   |  p99   |
> 
> Just to make sure: rps -> "Wrk RPS" and pXX -> "Nginx PXX time" in
> run_details.json ?

Correct, rps == "Wrk RPS", p50 == "Nginx P50 time", etc.

> 
> >  +---------------------------------+--------+--------+--------+--------+
> >  | NO ASYM + SIS_UTIL              |  8113  |  0.067 |  0.184 |  0.225 |
> >  | NO ASYM + NO_SIS_UTIL           |  8093  |  0.068 |  0.184 |  0.223 |
> 
> Thanks for the test results! Ok, so SIS_UTIL doesn't seem to play a role
> here. This workload should have #runnable tasks > #CPUs.
> 
> Still trying to grasp why 'sic() + smt' is better than 'sis() + smt' for
> NVBLAS?

Same...

> 
> There is a subtle difference in the start cpu for iterating:
> 
> sis(): for_each_cpu_wrap(cpu, cpus, target + 1)
>                                            ^^^
> sic(): for_each_cpu_wrap(cpu, cpus, target)
> 
> Not sure if this makes all the difference?

I quickly tried matching the wrap start (both ways), but it still doesn't make
any difference: sic() is still slightly better than sis(). So the performance
gap doesn't seem to come from the wrap origin.
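
To make the iteration-order difference above concrete, here is a minimal
user-space sketch. It is not kernel code: wrap_order() is a hypothetical
helper that models a wrapping walk over a plain bitmask, under the
assumption that the walk visits each set CPU once, beginning at the start
index and wrapping around.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical user-space model of a wrapping CPU-mask walk (this is NOT
 * the kernel's for_each_cpu_wrap() macro): visit every set bit in 'mask'
 * once, starting at 'start' and wrapping around at 'nr_cpus'.
 * Stores the visit order in 'out' and returns the number of CPUs visited.
 */
static size_t wrap_order(unsigned long mask, int nr_cpus, int start, int *out)
{
	size_t n = 0;

	assert(nr_cpus > 0 && nr_cpus <= (int)(8 * sizeof(mask)));
	/* 'start' may be target + 1, so allow it to equal nr_cpus. */
	assert(start >= 0 && start <= nr_cpus);

	for (int i = 0; i < nr_cpus; i++) {
		int cpu = (start + i) % nr_cpus;

		if (mask & (1UL << cpu))
			out[n++] = cpu;
	}
	return n;
}
```

With 8 CPUs all set in the mask and target = 2, starting at target + 1
(the sis() form) gives 3 4 5 6 7 0 1 2, while starting at target (the
sic() form) gives 2 3 4 5 6 7 0 1. In this model the only difference is
whether target itself is examined last or first.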

> 
> >  |                                 |        |        |        |        |
> >  | ASYM + SMT + SIS_UTIL           |  8129  |  0.076 |  0.149 |  0.188 |
> >  | ASYM + SMT + NO_SIS_UTIL        |  8138  |  0.076 |  0.148 |  0.186 |
> 
> This should be the same, right? SIS_UTIL is only for sis() so when using
> sic() this shouldn't differ. Or did you code SIS_UTIL into sic()?

No, you're right, it should be the same: SIS_UTIL is irrelevant here.

> 
> >  |                                 |        |        |        |        |
> >  | ASYM + ILB SMT + SIS_UTIL       |  8189  |  0.075 |  0.150 |  0.189 |
> >  | ASYM + SMT + ILB SMT + SIS_UTIL |  8185  |  0.076 |  0.151 |  0.190 |
> >  +---------------------------------+--------+--------+--------+--------+
> 
> So with '#tasks > #CPUs' smt doesn't make a difference.

Correct. At saturation there's no benefit from the SMT awareness, which makes
sense: all CPUs/siblings are busy, so there's no fully-idle SMT core to
prioritize.
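
As a rough illustration of why the preference has nothing to act on at
saturation, here is a small user-space sketch. The names core_fully_idle()
and pick_fully_idle_core() are hypothetical and the bitmask model is an
assumption for illustration only; it does not reproduce the code in the
patches.

```c
#include <stdbool.h>

/*
 * Hypothetical model: each bit is one logical CPU. 'sibling_mask' holds
 * the SMT siblings of one core, 'idle_mask' holds the currently idle CPUs.
 */
static bool core_fully_idle(unsigned long idle_mask, unsigned long sibling_mask)
{
	/* A core counts as fully idle only if every sibling is idle. */
	return (idle_mask & sibling_mask) == sibling_mask;
}

/* Return the index of the first fully-idle core, or -1 if there is none. */
static int pick_fully_idle_core(unsigned long idle_mask,
				const unsigned long *cores, int nr_cores)
{
	for (int i = 0; i < nr_cores; i++) {
		/* Skip empty entries so they never match trivially. */
		if (cores[i] && core_fully_idle(idle_mask, cores[i]))
			return i;
	}
	return -1;
}
```

With two 2-way SMT cores (masks 0x3 and 0xC), an idle mask of 0xC selects
core 1. An idle mask of 0x5 (one idle sibling per core) or 0x0 (everything
busy) finds no fully-idle core, so in this model any SMT-aware placement
has no better candidate to choose.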

> 
> > Looking at the data:
> >  - SIS_UTIL doesn't seem relevant in this case (differences are within
> >    error range),
> >  - ASYM_CPU_CAPACITY seems to provide a small throughput gain, but it seems
> >    more beneficial for tail latency reduction,
> >  - the ILB SMT patch seems to slightly improve throughput, but the biggest
> >    benefit is still coming from ASYM_CPU_CAPACITY.
> 
> > Overall, also in this case it seems beneficial to use ASYM_CPU_CAPACITY
> > rather than equalizing the capacities.
> > 
> > That said, I'm still not sure why ASYM is helping. The frequency asymmetry
> 
> OK, I would still be more comfortable with this if I knew why
> this is :-)

Working on this. :)

Thanks,
-Andrea
