From: Andrea Righi <arighi@nvidia.com>
To: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Christian Loehle <christian.loehle@arm.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
	Valentin Schneider <vschneid@redhat.com>,
	linux-kernel@vger.kernel.org,
	Felix Abecassis <fabecassis@nvidia.com>
Subject: Re: [PATCH] sched/topology: Avoid spurious asymmetry from CPU capacity noise
Date: Wed, 25 Mar 2026 13:25:37 +0100
Message-ID: <acPUQUezuBE25PZ-@gpd4>
In-Reply-To: <1d9b4abf-4b70-4775-92b8-924ced316578@arm.com>

On Wed, Mar 25, 2026 at 12:16:59PM +0100, Dietmar Eggemann wrote:
> On 25.03.26 10:32, Andrea Righi wrote:
> > On Wed, Mar 25, 2026 at 10:23:09AM +0100, Dietmar Eggemann wrote:
> >> On 24.03.26 12:01, Andrea Righi wrote:
> >>> Hi Dietmar,
> >>>
> >>> On Tue, Mar 24, 2026 at 11:29:24AM +0100, Dietmar Eggemann wrote:
> >>>> On 24.03.26 10:46, Andrea Righi wrote:
> >>>>> Hi Christian,
> >>>>>
> >>>>> On Tue, Mar 24, 2026 at 08:08:22AM +0000, Christian Loehle wrote:
> >>>>>> On 3/24/26 07:55, Christian Loehle wrote:
> >>>>>>> On 3/24/26 07:39, Vincent Guittot wrote:
> >>>>>>>> On Tue, 24 Mar 2026 at 01:55, Andrea Righi <arighi@nvidia.com> wrote:
> >>
> >> [...]
> >>
> >>>> The first time we observed this on NVIDIA Grace, we wondered whether
> >>>> there might be functionality outside the task scheduler that makes use
> >>>> of these slightly heterogeneous CPU capacity values from CPPC—and
> >>>> whether the dependency on task scheduling was simply an overlooked
> >>>> phenomenon.
> >>>>
> >>>> And then there was DCPerf Mediawiki on a 72-CPU system always scoring
> >>>> better with sched_asym_cpucap_active() = TRUE (mentioned already by
> >>>> Chris L. in
> >>>> https://lore.kernel.org/r/15ffdeb3-a0f3-4b88-92c0-17ffb03b0574@arm.com ).
> >>>
> >>> Yeah, I think Chris' asym-packing approach might be the safest thing to do.
> >>>
> >>> At the same time it would be nice to improve asym-capacity to introduce
> >>> some concept of SMT awareness, that was my original attempt with
> >>> https://lore.kernel.org/all/20260318092214.130908-1-arighi@nvidia.com,
> >>> since we may see similar asym-capacity benefits on Vera (that has SMT,
> >>> unlike Grace). What do you think?
> >>
> >> We never found a good way to specify a CPU capacity in the SMT case (EAS
> >> and energy model included). So comparing CPU capacity w/ utilization, CPU
> >> overutilization detection etc. definitions get more blurry.
> > 
> > Hm... so should we just avoid calling select_idle_capacity() when SMT is
> > enabled to prevent waking up tasks on both SMT siblings when there are
> > fully-idle SMT cores?
> 
> Yeah, pretty much. So prefer (2) over (1).
> 
> IMHO, we do have a similar issue here. Can we say that a logical CPU is idle 
> if its SMT sibling isn't? But at least we don't have to use any CPU cap/util
> comparison there.
> 
> select_idle_sibling()
> 
>         if (sched_smt_active()) {
>                 has_idle_core = test_idle_cores(target);
>
>                 if (!has_idle_core && cpus_share_cache(prev, target)) { <-- (1)
>                         i = select_idle_smt(p, sd, prev);
>                         if ((unsigned int)i < nr_cpumask_bits)
>                                 return i;
>                 }
>         }
>
>         i = select_idle_cpu(p, sd, has_idle_core, target);              <-- (2a)
>         if ((unsigned)i < nr_cpumask_bits)
>                 return i;
> 
> select_idle_cpu()
> 
>         for_each_cpu_wrap(cpu, cpus, target + 1) {
>                 if (has_idle_core) {
>                         i = select_idle_core(p, cpu, cpus, &idle_cpu);  <-- (2b)
>                         if ((unsigned int)i < nr_cpumask_bits)
>                                 return i;
>
>                 } else {
>                         if (--nr <= 0)
>                                 return -1;
>                         idle_cpu = __select_idle_cpu(cpu, p);
>                         if ((unsigned int)idle_cpu < nr_cpumask_bits)
>                                 break;
>                 }
>         }

Exactly, with asym-capacity disabled we already prefer fully-idle cores over
partially-idle ones, and in that case the idle selection logic stays in a
world of idle bits, without cap/util math, so it's a bit easier. That's
probably fine also when we have both asym-capacity + SMT (at least it seems
better than what we have now, where the asym-capacity path ignores SMT).
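
To make the "idle bits only" point concrete, here is a rough userspace
sketch, not kernel code. Every name in it (sibling_of(), cpu_is_idle(),
pick_idle_cpu()) is hypothetical, and it assumes 2-way SMT with siblings
numbered 2k and 2k+1 and at most 64 CPUs. It only shows that a fully-idle
core can be preferred over a partially-idle one without any capacity or
utilization comparison:

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 2-way SMT, the siblings of a core are CPUs 2k and 2k+1. */
int sibling_of(int cpu)
{
	return cpu ^ 1;
}

/* Bit N of idle_mask is set when CPU N is idle. */
int cpu_is_idle(uint64_t idle_mask, int cpu)
{
	return (int)((idle_mask >> cpu) & 1);
}

/*
 * Scan nr_cpus CPUs starting at target and wrapping around. Return the
 * first idle CPU whose sibling is idle too (fully-idle core); if there is
 * none, return the first idle CPU seen (partially-idle core), else -1.
 * Only idle bits are consulted, no capacity/utilization math.
 */
int pick_idle_cpu(uint64_t idle_mask, int nr_cpus, int target)
{
	int fallback = -1;

	assert(nr_cpus > 0 && nr_cpus <= 64 && nr_cpus % 2 == 0);
	assert(target >= 0 && target < nr_cpus);

	for (int n = 0; n < nr_cpus; n++) {
		int cpu = (target + n) % nr_cpus;

		if (!cpu_is_idle(idle_mask, cpu))
			continue;
		if (cpu_is_idle(idle_mask, sibling_of(cpu)))
			return cpu;		/* fully-idle core */
		if (fallback < 0)
			fallback = cpu;		/* partially-idle core */
	}

	return fallback;
}
```

For example, with 8 CPUs where only CPUs 1, 4 and 5 are idle and target 0,
the sketch skips CPU 1 (its sibling CPU 0 is busy) and returns CPU 4.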

Essentially having something like the following (which already gives better
performance on Vera):

 kernel/sched/fair.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index d57c02e82f3a1..534634f813fca 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8086,7 +8086,7 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
 	 * For asymmetric CPU capacity systems, our domain of interest is
 	 * sd_asym_cpucapacity rather than sd_llc.
 	 */
-	if (sched_asym_cpucap_active()) {
+	if (sched_asym_cpucap_active() && !sched_smt_active()) {
 		sd = rcu_dereference_all(per_cpu(sd_asym_cpucapacity, target));
 		/*
 		 * On an asymmetric CPU capacity system where an exclusive
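
As a quick way to read the condition above, a tiny standalone model of the
gate (hypothetical names, not part of the patch): the capacity-aware path is
taken only when asymmetry is active and SMT is not, and every other
combination falls through to the regular idle-bits selection.

```c
#include <stdbool.h>

/* Hypothetical labels for the two wakeup paths discussed in this thread. */
enum wakeup_path {
	PATH_ASYM_CAPACITY,	/* sd_asym_cpucapacity + capacity fitting */
	PATH_IDLE_BITS,		/* idle-core / idle-CPU scan */
};

/* Mirrors "sched_asym_cpucap_active() && !sched_smt_active()". */
enum wakeup_path choose_wakeup_path(bool asym_cpucap_active, bool smt_active)
{
	if (asym_cpucap_active && !smt_active)
		return PATH_ASYM_CAPACITY;

	return PATH_IDLE_BITS;
}
```

So a system with asym-capacity and no SMT (Grace, as discussed above) keeps
the capacity-aware path, while asym-capacity plus SMT (Vera) takes the
idle-bits path.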

Thanks,
-Andrea
