The Linux Kernel Mailing List
From: Andrea Righi <arighi@nvidia.com>
To: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>,
	Juri Lelli <juri.lelli@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
	Valentin Schneider <vschneid@redhat.com>,
	Christian Loehle <christian.loehle@arm.com>,
	Phil Auld <pauld@redhat.com>, Koba Ko <kobak@nvidia.com>,
	Felix Abecassis <fabecassis@nvidia.com>,
	Balbir Singh <balbirs@nvidia.com>,
	Joel Fernandes <joelagnelf@nvidia.com>,
	Shrikanth Hegde <sshegde@linux.ibm.com>,
	linux-kernel@vger.kernel.org, tim.c.chen@linux.intel.com,
	yu.c.chen@intel.com
Subject: Re: [PATCH v2 2/5] sched/fair: Attach sched_domain_shared to sd_asym_cpucapacity
Date: Tue, 19 May 2026 09:54:02 +0200
Message-ID: <agwXGgTQEqF0sn6E@gpd4>
In-Reply-To: <55196e3b-ba1e-42c9-b80b-5c91306df452@amd.com>

On Tue, May 19, 2026 at 01:17:20PM +0530, K Prateek Nayak wrote:
> Hello Andrea,
> 
> Thank you for taking a look at the diff!

BTW I just re-ran the NVBLAS benchmark on a Vera Rubin machine using
queue:sched/core + this on top, all good!

Thanks,
-Andrea

> 
> On 5/19/2026 12:13 PM, Andrea Righi wrote:
> > Hi Prateek,
> > 
> > On Tue, May 19, 2026 at 11:22:32AM +0530, K Prateek Nayak wrote:
> >> Hello Peter, Andrea,
> >>
> >> On 5/19/2026 2:28 AM, Peter Zijlstra wrote:
> >>> @@@ -2775,20 -3049,16 +3107,15 @@@ build_sched_domains(const struct cpumas
> >>>   		if (!sd)
> >>>   			continue;
> >>>   
> >>>  +		if (has_asym)
> >>> - 			asym_claimed = claim_asym_sched_domain_shared(&d, i);
> >>> ++			claim_asym_sched_domain_shared(&d, i);
> >>>  +
> >>>   		/* First, find the topmost SD_SHARE_LLC domain */
> >>>   		while (sd->parent && (sd->parent->flags & SD_SHARE_LLC))
> >>>   			sd = sd->parent;
> >>>   
> >>>   		if (sd->flags & SD_SHARE_LLC) {
> >>> - 			/*
> >>> - 			 * Initialize the sd->shared for SD_SHARE_LLC unless
> >>> - 			 * the asym path above already claimed it.
> >>> - 			 */
> >>> - 			if (!asym_claimed)
> >>> - 				init_sched_domain_shared(&d, sd);
> >>>  -			int sd_id = cpumask_first(sched_domain_span(sd));
> >>>  -
> >>>  -			sd->shared = *per_cpu_ptr(d.sds, sd_id);
> >>>  -			atomic_set(&sd->shared->nr_busy_cpus, sd->span_weight);
> >>>  -			atomic_inc(&sd->shared->ref);
> >>> ++			init_sched_domain_shared(&d, sd);
> >>
> >> This will run into a small problem with "nr_idle_scan" if
> >> cpumask_first(sched_domain_span(sd)) is the same for both sd_asym and
> >> sd_llc.
> > 
> > Ah, good catch! When cpumask_first(asym_span) == cpumask_first(llc_span)
> > (the typical big.LITTLE case), both sd_asym->shared and sd_llc->shared would
> > alias to d->sds[0].
> > 
> >>
> >> The load balancer at different domain levels will populate "nr_idle_scan"
> >> with different values, and they alias to the same ->shared if one of them
> >> isn't degenerated. I also believe there is at least one way to hit the
> >> WARN_ON() in cpu_attach_domain() if the SD_ASYM_CPUCAPACITY_FULL domain
> >> comes before the last SD_SHARE_LLC domain and the latter is degenerated.
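To make the aliasing concrete, here is a minimal user-space sketch (not kernel code). It models each per-CPU sched_domain_shared candidate as a plain struct and attaches a domain to the object owned by the first CPU of its span, mirroring the per_cpu_ptr(d.sds, cpumask_first(...)) pattern in the quoted hunk. The names struct shared, struct domain, attach_first_cpu() and domains_alias(), and the contiguous [first, last] spans, are hypothetical simplifications.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 8

/* Hypothetical, heavily simplified stand-in for struct sched_domain_shared. */
struct shared {
	int nr_idle_scan;
};

/* Hypothetical domain: the span is the contiguous CPU range [first, last]. */
struct domain {
	int first, last;
	struct shared *shared;
};

/* One candidate object per CPU, mimicking the per-CPU d.sds allocation. */
static struct shared sds[NR_CPUS];

/* Attach the object owned by the first CPU of the span. */
static void attach_first_cpu(struct domain *sd)
{
	assert(sd->first >= 0 && sd->first < NR_CPUS);
	sd->shared = &sds[sd->first];
}

/* True when two domains would read and write the same nr_idle_scan. */
static bool domains_alias(const struct domain *a, const struct domain *b)
{
	return a->shared == b->shared;
}
```

With illustrative spans [0, 7] for the asym level and [0, 3] for the LLC level, both domains attach to sds[0], so whichever level writes nr_idle_scan last overwrites the value the other level computed.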
> >>
> >> How about this:
> >>
> >>   (On top of queue:sched/core; lightly tested on a !ASYM_CPUCAPACITY system)
> >>
> >> diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
> >> index fe09d3268bc9..1d2c98dca211 100644
> >> --- a/include/linux/sched/topology.h
> >> +++ b/include/linux/sched/topology.h
> >> @@ -67,7 +67,15 @@ struct sched_domain_shared {
> >>  	atomic_t	ref;
> >>  	atomic_t	nr_busy_cpus;
> >>  	int		has_idle_cores;
> >> -	int		nr_idle_scan;
> >> +	union {
> >> +		int	nr_idle_scan;
> >> +		/*
> >> +		 * Used during allocation to claim the
> >> +		 * sched_domain_shared object at
> >> +		 * multiple levels.
> > 
> > I think that between the build and the first LB tick, readers of
> > nr_idle_scan may observe leftover SD_* flag values. This shouldn't be a
> > problem and should self-heal soon, but maybe it's worth a comment? Something like:
> > 
> >   * Note: between build and the first periodic LB tick, which
> >   * rewrites the union via update_idle_cpu_scan(), readers of
> >   * nr_idle_scan may observe the transient SD_* flag value as
> >   * the scan bound. The flag bits are small positive integers,
> >   * so the effect is just a slightly relaxed scan bound for one
> >   * window and self-heals on the first tick.
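A minimal sketch of the transient window, assuming a simplified struct in which a build-time claim marker shares storage with nr_idle_scan through a union. The field name claim_flags, the FAKE_SD_* constants and both helpers are hypothetical: the real field name is cut off in the quoted hunk, the real SD_* values come from the kernel's sched-domain flag definitions, and update_scan_bound() only stands in for the periodic update_idle_cpu_scan() rewrite.

```c
#include <assert.h>

/*
 * Hypothetical stand-ins for two SD_* flag bits. The real values come from
 * the kernel's sched-domain flag definitions and are not reproduced here.
 */
#define FAKE_SD_SHARE_LLC             0x10
#define FAKE_SD_ASYM_CPUCAPACITY_FULL 0x80

/* Hypothetical, simplified stand-in for struct sched_domain_shared. */
struct shared {
	union {
		int nr_idle_scan; /* scan bound read by the idle-CPU scan */
		int claim_flags;  /* hypothetical build-time owner marker */
	};
};

/* Domain build: record which level owns this object. */
static void mark_claimed(struct shared *s, int level_flag)
{
	assert(level_flag != 0); /* 0 means "unclaimed" in this model */
	s->claim_flags = level_flag;
}

/* Stand-in for the periodic update_idle_cpu_scan() rewrite. */
static void update_scan_bound(struct shared *s, int nr)
{
	s->nr_idle_scan = nr;
}
```

Until update_scan_bound() runs, a reader of nr_idle_scan sees the marker value itself as the scan bound; the first update replaces it with a real bound.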
> 
> Ack! We start with 0 today, which isn't representative of the system
> state either, and we rely on the value eventually being corrected after
> a hotplug / cpuset change.
> 
> I can fold in the note and resend it as a formal patch.
> 
> Peter, would you prefer a formal patch or would you like to do this
> (or something similar) as a part of the conflict resolution itself?
> 
> >> +	BUG_ON(!sd->shared);
> > 
> > Unreachable in practice, but should we have a WARN_ON_ONCE() +
> > bail/early-return? That way we'd fall back to using the LLC's shared for
> > sd_balance_shared, which seems nicer than a BUG_ON().
> 
> Ack! As a backup, we can just use the last CPU's "sds" if we don't end
> up finding a free one. I only had the BUG_ON() to easily spot my VM
> crashing ;-)
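The claiming loop with a non-fatal fallback could look roughly like the following user-space sketch (not kernel code and not the patch itself). It assumes contiguous [first, last] spans, a nonzero per-level flag and a hypothetical claim_flags marker; a counter plus fprintf() stands in for WARN_ON_ONCE(), and the fallback returns the last CPU's object in place of BUG_ON().

```c
#include <assert.h>
#include <stdio.h>

#define NR_CPUS 8

/* Hypothetical, simplified stand-in for struct sched_domain_shared. */
struct shared {
	union {
		int nr_idle_scan;
		int claim_flags; /* hypothetical build-time owner marker; 0 = free */
	};
};

/* One candidate object per CPU, mimicking the per-CPU d.sds allocation. */
static struct shared sds[NR_CPUS];

/* Counts fallbacks; only the first one is reported, like WARN_ON_ONCE(). */
static int fallback_count;

/*
 * Claim an object for the domain level tagged @level_flag whose span is
 * the CPU range [first, last]. The span is walked in order and the first
 * object that is free or already owned by this level is taken, so in this
 * model repeated calls for the same span and level return the same object.
 */
static struct shared *claim_shared(int first, int last, int level_flag)
{
	assert(level_flag != 0);
	assert(first >= 0 && first <= last && last < NR_CPUS);

	for (int cpu = first; cpu <= last; cpu++) {
		struct shared *s = &sds[cpu];

		if (s->claim_flags == 0 || s->claim_flags == level_flag) {
			s->claim_flags = level_flag;
			return s;
		}
	}

	/*
	 * Should not happen: instead of BUG_ON(), report once and fall back
	 * to the last CPU's object, accepting that it may alias another level.
	 */
	if (fallback_count++ == 0)
		fprintf(stderr, "claim_shared: no free object in [%d, %d]\n",
			first, last);
	return &sds[last];
}
```

In this simplified model, claiming [0, 7] for the asym level and then [0, 3] for the LLC level yields sds[0] and sds[1], so the two levels stop sharing nr_idle_scan. A real implementation would also need the reference counting and ->shared bookkeeping visible in the quoted hunks, which this sketch leaves out.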
> 
> -- 
> Thanks and Regards,
> Prateek
> 

Thread overview: 38+ messages
2026-05-09 18:07 [PATCH v6 0/5 RESEND] sched/fair: SMT-aware asymmetric CPU capacity Andrea Righi
2026-05-09 18:07 ` [PATCH 1/5] sched/fair: Drop redundant RCU read lock in NOHZ kick path Andrea Righi
2026-05-11 13:04   ` Vincent Guittot
2026-05-15  6:49   ` Shrikanth Hegde
2026-05-16  5:45     ` Andrea Righi
2026-05-16 17:15       ` Shrikanth Hegde
2026-05-20  8:34   ` [tip: sched/core] " tip-bot2 for Andrea Righi
2026-05-21 19:47   ` [PATCH 1/5] " Marek Szyprowski
2026-05-21 20:13     ` Andrea Righi
2026-05-09 18:07 ` [PATCH 2/5] sched/fair: Attach sched_domain_shared to sd_asym_cpucapacity Andrea Righi
2026-05-11 13:04   ` Vincent Guittot
2026-05-15 10:05   ` Shrikanth Hegde
2026-05-16  5:58     ` [PATCH v2 " Andrea Righi
2026-05-16 17:19       ` Shrikanth Hegde
2026-05-18 20:58       ` Peter Zijlstra
2026-05-18 21:31         ` Andrea Righi
2026-05-19  5:52         ` K Prateek Nayak
2026-05-19  6:43           ` Andrea Righi
2026-05-19  7:47             ` K Prateek Nayak
2026-05-19  7:54               ` Andrea Righi [this message]
2026-05-19  8:46           ` Peter Zijlstra
2026-05-19 11:27             ` K Prateek Nayak
2026-05-19 11:47               ` Peter Zijlstra
2026-05-25  8:30                 ` Chen, Yu C
2026-05-20  8:34       ` [tip: sched/core] " tip-bot2 for K Prateek Nayak
2026-05-09 18:07 ` [PATCH 3/5] sched/fair: Prefer fully-idle SMT cores in asym-capacity idle selection Andrea Righi
2026-05-11 13:07   ` Vincent Guittot
2026-05-11 13:45     ` Andrea Righi
2026-05-11 14:25     ` [PATCH v2 " Andrea Righi
2026-05-20  8:34       ` [tip: sched/core] " tip-bot2 for Andrea Righi
2026-05-09 18:07 ` [PATCH 4/5] sched/fair: Reject misfit pulls onto busy SMT siblings on asym-capacity Andrea Righi
2026-05-11 13:07   ` Vincent Guittot
2026-05-15 10:09   ` Shrikanth Hegde
2026-05-16  9:04     ` Andrea Righi
2026-05-20  8:34   ` [tip: sched/core] " tip-bot2 for Andrea Righi
2026-05-09 18:07 ` [PATCH 5/5] sched/fair: Add SIS_UTIL support to select_idle_capacity() Andrea Righi
2026-05-11 13:08   ` Vincent Guittot
2026-05-20  8:34   ` [tip: sched/core] " tip-bot2 for K Prateek Nayak
