From: Dietmar Eggemann <dietmar.eggemann@arm.com>
To: Andrea Righi <arighi@nvidia.com>, Ingo Molnar <mingo@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Juri Lelli <juri.lelli@redhat.com>,
Vincent Guittot <vincent.guittot@linaro.org>
Cc: Steven Rostedt <rostedt@goodmis.org>,
Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
Valentin Schneider <vschneid@redhat.com>,
K Prateek Nayak <kprateek.nayak@amd.com>,
Christian Loehle <christian.loehle@arm.com>,
Koba Ko <kobak@nvidia.com>,
Felix Abecassis <fabecassis@nvidia.com>,
Balbir Singh <balbirs@nvidia.com>,
Joel Fernandes <joelagnelf@nvidia.com>,
Shrikanth Hegde <sshegde@linux.ibm.com>,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/5] sched/fair: Attach sched_domain_shared to sd_asym_cpucapacity
Date: Tue, 5 May 2026 14:48:30 +0200 [thread overview]
Message-ID: <c4ecf0ba-4455-49cf-8236-d3be672272d8@arm.com> (raw)
In-Reply-To: <20260428144352.3575863-3-arighi@nvidia.com>
On 28.04.26 16:41, Andrea Righi wrote:
> From: K Prateek Nayak <kprateek.nayak@amd.com>
>
> On asymmetric CPU capacity systems, the wakeup path uses
> select_idle_capacity(), which scans the span of sd_asym_cpucapacity
> rather than sd_llc.
>
> The has_idle_cores hint however lives on sd_llc->shared, so the
> wakeup-time read of has_idle_cores operates on an LLC-scoped blob while
> the actual scan/decision spans the asym domain; nr_busy_cpus also lives
> in the same shared sched_domain data, but it's never used in the asym
> CPU capacity scenario.
>
> Therefore, move the sched_domain_shared object to sd_asym_cpucapacity
> whenever the CPU has a SD_ASYM_CPUCAPACITY_FULL ancestor and that
> ancestor is non-overlapping (i.e., not built from SD_NUMA). In that case
> the scope of has_idle_cores matches the scope of the wakeup scan.
>
> Fall back to attaching the shared object to sd_llc in three cases:
>
> 1) plain symmetric systems (no SD_ASYM_CPUCAPACITY_FULL anywhere);
>
> 2) CPUs in an exclusive cpuset that carves out a symmetric capacity
> island: has_asym is system-wide but those CPUs have no
> SD_ASYM_CPUCAPACITY_FULL ancestor in their hierarchy and follow
> the symmetric LLC path in select_idle_sibling();
>
> 3) exotic topologies where SD_ASYM_CPUCAPACITY_FULL lands on an
> SD_NUMA-built domain. init_sched_domain_shared() keys the shared
> blob off cpumask_first(span), which on overlapping NUMA domains
> would alias unrelated spans onto the same blob. Keep the shared
> object on the LLC there; select_idle_capacity() gracefully skips
> the has_idle_cores preference when sd->shared is NULL.
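For reference, the claim decision described above can be modeled as the following standalone sketch (hypothetical, simplified; flag names mirror the kernel's but this is not the actual topology code, which walks the hierarchy in build_sched_domains()):

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel's SD_* topology flags. */
#define SD_ASYM_CPUCAPACITY_FULL (1 << 0)
#define SD_NUMA                  (1 << 1)
#define SD_SHARE_LLC             (1 << 2)

struct sd {
	int flags;
	struct sd *parent;
};

/*
 * Return the domain that should own sched_domain_shared for @base:
 * the innermost non-NUMA SD_ASYM_CPUCAPACITY_FULL ancestor if one
 * exists, otherwise the highest SD_SHARE_LLC domain (sd_llc).
 */
static struct sd *shared_owner(struct sd *base)
{
	struct sd *sd, *llc = NULL;

	for (sd = base; sd; sd = sd->parent) {
		if (sd->flags & SD_SHARE_LLC)
			llc = sd;
		if ((sd->flags & SD_ASYM_CPUCAPACITY_FULL) &&
		    !(sd->flags & SD_NUMA))
			return sd;	/* asym domain claims shared */
	}
	return llc;			/* fall back to sd_llc */
}
```

So in topology (1) below, MC carries SD_SHARE_LLC but PKG carries SD_ASYM_CPUCAPACITY_FULL, and PKG claims shared; a symmetric island (case 2 above) has no FULL ancestor and keeps shared on the LLC.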
Tested it with a couple of real & exotic topologies, seems to work nicely.
$ cat /sys/devices/system/cpu/cpu*/cpu_capacity
160
160
160
160
498
498
1024
1024
(1) grouping CPUs with the same CPU capacities
$ cat /sys/kernel/debug/sched/domains/cpu[0-7]/domain*/name
MC
PKG
$ cat /sys/kernel/debug/sched/domains/cpu[0-7]/domain*/flags
... SD_SHARE_LLC
... SD_ASYM_CPUCAPACITY SD_ASYM_CPUCAPACITY_FULL ...
PKG { 0-7 }
MC {0-3} {4,5} {6,7}
(2) flat
$ cat /sys/kernel/debug/sched/domains/cpu[0-7]/domain*/name
MC
$ cat /sys/kernel/debug/sched/domains/cpu[0-7]/domain*/flags
... SD_ASYM_CPUCAPACITY SD_ASYM_CPUCAPACITY_FULL ...
MC { 0-7 }
(3) flat, exotic, since w/ SMT
$ cat /sys/kernel/debug/sched/domains/cpu[0-7]/domain*/name
SMT
MC
... SD_SHARE_CPUCAPACITY SD_SHARE_LLC ...
... SD_ASYM_CPUCAPACITY SD_ASYM_CPUCAPACITY_FULL SD_SHARE_LLC ...
MC { 0-7 }
SMT {0-1} {2-3} {4-5} {6-7}
(4) exotic, since asymmetric and w/ SMT
$ cat /sys/kernel/debug/sched/domains/cpu[0-3]/domain*/name
SMT
MC
PKG
$ cat /sys/kernel/debug/sched/domains/cpu[0-3]/domain*/flags
... SD_SHARE_CPUCAPACITY SD_SHARE_LLC ...
... SD_SHARE_LLC
... SD_ASYM_CPUCAPACITY SD_ASYM_CPUCAPACITY_FULL ...
$ cat /sys/kernel/debug/sched/domains/cpu[4-7]/domain*/name
SMT
PKG
$ cat /sys/kernel/debug/sched/domains/cpu[4-7]/domain*/flags
... SD_SHARE_CPUCAPACITY SD_SHARE_LLC ...
... SD_ASYM_CPUCAPACITY SD_ASYM_CPUCAPACITY_FULL ...
PKG { 0-7 }
MC { 0-3 }
SMT {0-1} {2-3} {4-5} {6-7}
(5) same as (4) but partial CPU capacity asymmetry in MC { 0-3 }
$ cat /sys/devices/system/cpu/cpu*/cpu_capacity
160
160
498
498
160
160
1024
1024
$ cat /sys/kernel/debug/sched/domains/cpu[0-3]/domain*/flags
... SD_SHARE_CPUCAPACITY SD_SHARE_LLC ...
... SD_ASYM_CPUCAPACITY SD_SHARE_LLC ...
^^^^^^^^^^^^^^^^^^^
... SD_ASYM_CPUCAPACITY SD_ASYM_CPUCAPACITY_FULL ...
(6) (5) w/ exclusive cpusets with one symmetric island
cd /sys/fs/cgroup
echo +cpuset > cgroup.subtree_control
mkdir cs1
echo "threaded" > cs1/cgroup.type
echo 0-1,4-5 > cs1/cpuset.cpus
echo 0 > cs1/cpuset.mems
echo root > cs1/cpuset.cpus.partition
mkdir cs2
echo "threaded" > cs2/cgroup.type
echo 0 > cs2/cpuset.mems
echo 2-3,6-7 > cs2/cpuset.cpus
echo root > cs2/cpuset.cpus.partition
[ 0.006866] claim_asym_sched_domain_shared() (2) sd_asym=PKG cpu=0
[ 0.006868] claim_asym_sched_domain_shared() (2) sd_asym=PKG cpu=1
[ 0.006869] claim_asym_sched_domain_shared() (2) sd_asym=PKG cpu=2
[ 0.006869] claim_asym_sched_domain_shared() (2) sd_asym=PKG cpu=3
[ 0.006869] claim_asym_sched_domain_shared() (2) sd_asym=PKG cpu=4
[ 0.006869] claim_asym_sched_domain_shared() (2) sd_asym=PKG cpu=5
[ 0.006870] claim_asym_sched_domain_shared() (2) sd_asym=PKG cpu=6
[ 0.006870] claim_asym_sched_domain_shared() (2) sd_asym=PKG cpu=7
...
[ 222.767275] claim_asym_sched_domain_shared() (2) sd_asym=PKG cpu=2
[ 222.767324] claim_asym_sched_domain_shared() (2) sd_asym=PKG cpu=3
[ 222.767710] claim_asym_sched_domain_shared() (2) sd_asym=PKG cpu=6
[ 222.767789] claim_asym_sched_domain_shared() (2) sd_asym=PKG cpu=7
[ 222.781015] build_sched_domains() (3) sd=MC cpu=0
[ 222.781017] build_sched_domains() (3) sd=MC cpu=1
[ 222.781017] build_sched_domains() (3) sd=MC cpu=4
[ 222.781018] build_sched_domains() (3) sd=MC cpu=5
[...]
> @@ -2650,6 +2665,49 @@ static void adjust_numa_imbalance(struct sched_domain *sd_llc)
> }
> }
>
> +static void init_sched_domain_shared(struct s_data *d, struct sched_domain *sd)
> +{
> + int sd_id = cpumask_first(sched_domain_span(sd));
> +
> + sd->shared = *per_cpu_ptr(d->sds, sd_id);
> + atomic_set(&sd->shared->nr_busy_cpus, sd->span_weight);
Will be used only for sd_llc->shared, not for sd_asym, right?
> + atomic_inc(&sd->shared->ref);
> +}
> +
> +/*
> + * For asymmetric CPU capacity, attach sched_domain_shared on the innermost
> + * SD_ASYM_CPUCAPACITY_FULL ancestor of @cpu's base domain when that ancestor is
> + * not an overlapping NUMA-built domain (then LLC should claim shared).
> + *
> + * A CPU may lack any FULL ancestor (e.g., exclusive cpuset symmetric island),
> + * then LLC must claim shared instead.
> + *
> + * Note: SD_ASYM_CPUCAPACITY_FULL is only set when multiple distinct capacities
s/multiple/all ? We want to see all possible CPU capacity values in wakeup.
> + * exist in the domain span, so the asym domain we attach to cannot degenerate
> + * into a single-capacity group. The relevant edge cases are instead covered by
> + * the caveats above.
[...]