Date: Tue, 5 May 2026 14:48:30 +0200
Subject: Re: [PATCH 2/5] sched/fair: Attach sched_domain_shared to sd_asym_cpucapacity
From: Dietmar Eggemann
To: Andrea Righi, Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot
Cc: Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider, K Prateek Nayak, Christian Loehle, Koba Ko, Felix Abecassis, Balbir Singh, Joel Fernandes, Shrikanth Hegde, linux-kernel@vger.kernel.org
References: <20260428144352.3575863-1-arighi@nvidia.com> <20260428144352.3575863-3-arighi@nvidia.com>
In-Reply-To: <20260428144352.3575863-3-arighi@nvidia.com>

On 28.04.26 16:41, Andrea Righi wrote:
> From: K Prateek Nayak
>
> On asymmetric CPU capacity systems, the wakeup path uses
> select_idle_capacity(), which scans the span of sd_asym_cpucapacity
> rather than sd_llc.
>
> The has_idle_cores hint however lives on sd_llc->shared, so the
> wakeup-time read of has_idle_cores operates on an LLC-scoped blob while
> the actual scan/decision spans the asym domain; nr_busy_cpus also lives
> in the same shared sched_domain data, but it's never used in the asym
> CPU capacity scenario.
>
> Therefore, move the sched_domain_shared object to sd_asym_cpucapacity
> whenever the CPU has a SD_ASYM_CPUCAPACITY_FULL ancestor and that
> ancestor is non-overlapping (i.e., not built from SD_NUMA). In that case
> the scope of has_idle_cores matches the scope of the wakeup scan.
>
> Fall back to attaching the shared object to sd_llc in three cases:
>
> 1) plain symmetric systems (no SD_ASYM_CPUCAPACITY_FULL anywhere);
>
> 2) CPUs in an exclusive cpuset that carves out a symmetric capacity
>    island: has_asym is system-wide but those CPUs have no
>    SD_ASYM_CPUCAPACITY_FULL ancestor in their hierarchy and follow
>    the symmetric LLC path in select_idle_sibling();
>
> 3) exotic topologies where SD_ASYM_CPUCAPACITY_FULL lands on an
>    SD_NUMA-built domain. init_sched_domain_shared() keys the shared
>    blob off cpumask_first(span), which on overlapping NUMA domains
>    would alias unrelated spans onto the same blob. Keep the shared
>    object on the LLC there; select_idle_capacity() gracefully skips
>    the has_idle_cores preference when sd->shared is NULL.

Tested it with a couple of real & exotic topologies; seems to work
nicely.

$ cat /sys/devices/system/cpu/cpu*/cpu_capacity
160
160
160
160
498
498
1024
1024

(1) grouping CPUs with same CPU capacities

$ cat /sys/kernel/debug/sched/domains/cpu[0-7]/domain*/name
MC
PKG

$ cat /sys/kernel/debug/sched/domains/cpu[0-7]/domain*/flags
... SD_SHARE_LLC ...
... SD_ASYM_CPUCAPACITY SD_ASYM_CPUCAPACITY_FULL ...

PKG { 0-7 }
MC  {0-3} {4,5} {6,7}

(2) flat

$ cat /sys/kernel/debug/sched/domains/cpu[0-7]/domain*/name
MC

$ cat /sys/kernel/debug/sched/domains/cpu[0-7]/domain*/flags
... SD_ASYM_CPUCAPACITY SD_ASYM_CPUCAPACITY_FULL ...

MC { 0-7 }

(3) flat, exotic, since w/ SMT

$ cat /sys/kernel/debug/sched/domains/cpu[0-7]/domain*/name
SMT
MC

... SD_SHARE_CPUCAPACITY SD_SHARE_LLC ...
... SD_ASYM_CPUCAPACITY SD_ASYM_CPUCAPACITY_FULL SD_SHARE_LLC ...

MC  { 0-7 }
SMT {0-1} {2-3} {4-5} {6-7}

(4) exotic, since asymmetric and w/ SMT

$ cat /sys/kernel/debug/sched/domains/cpu[0-3]/domain*/name
SMT
MC
PKG

$ cat /sys/kernel/debug/sched/domains/cpu[0-3]/domain*/flags
... SD_SHARE_CPUCAPACITY SD_SHARE_LLC ...
... SD_SHARE_LLC ...
... SD_ASYM_CPUCAPACITY SD_ASYM_CPUCAPACITY_FULL ...
$ cat /sys/kernel/debug/sched/domains/cpu[4-7]/domain*/name
SMT
PKG

$ cat /sys/kernel/debug/sched/domains/cpu[4-7]/domain*/flags
... SD_SHARE_CPUCAPACITY SD_SHARE_LLC ...
... SD_ASYM_CPUCAPACITY SD_ASYM_CPUCAPACITY_FULL ...

PKG { 0-7 }
MC  { 0-3 }
SMT {0-1} {2-3} {4-5} {6-7}

(5) same as (4) but partial CPU capacity asymmetry in MC { 0-3 }

$ cat /sys/devices/system/cpu/cpu*/cpu_capacity
160
160
498
498
160
160
1024
1024

$ cat /sys/kernel/debug/sched/domains/cpu[0-3]/domain*/flags
... SD_SHARE_CPUCAPACITY SD_SHARE_LLC ...
... SD_ASYM_CPUCAPACITY SD_SHARE_LLC ...
    ^^^^^^^^^^^^^^^^^^^
... SD_ASYM_CPUCAPACITY SD_ASYM_CPUCAPACITY_FULL ...

(6) (5) w/ exclusive cpusets with one symmetric island

cd /sys/fs/cgroup
echo +cpuset > cgroup.subtree_control
mkdir cs1
echo "threaded" > cs1/cgroup.type
echo 0-1,4-5 > cs1/cpuset.cpus
echo 0 > cs1/cpuset.mems
echo root > cs1/cpuset.cpus.partition
mkdir cs2
echo "threaded" > cs2/cgroup.type
echo 0 > cs2/cpuset.mems
echo 2-3,6-7 > cs2/cpuset.cpus
echo root > cs2/cpuset.cpus.partition

[ 0.006866] claim_asym_sched_domain_shared() (2) sd_asym=PKG cpu=0
[ 0.006868] claim_asym_sched_domain_shared() (2) sd_asym=PKG cpu=1
[ 0.006869] claim_asym_sched_domain_shared() (2) sd_asym=PKG cpu=2
[ 0.006869] claim_asym_sched_domain_shared() (2) sd_asym=PKG cpu=3
[ 0.006869] claim_asym_sched_domain_shared() (2) sd_asym=PKG cpu=4
[ 0.006869] claim_asym_sched_domain_shared() (2) sd_asym=PKG cpu=5
[ 0.006870] claim_asym_sched_domain_shared() (2) sd_asym=PKG cpu=6
[ 0.006870] claim_asym_sched_domain_shared() (2) sd_asym=PKG cpu=7
...
[ 222.767275] claim_asym_sched_domain_shared() (2) sd_asym=PKG cpu=2
[ 222.767324] claim_asym_sched_domain_shared() (2) sd_asym=PKG cpu=3
[ 222.767710] claim_asym_sched_domain_shared() (2) sd_asym=PKG cpu=6
[ 222.767789] claim_asym_sched_domain_shared() (2) sd_asym=PKG cpu=7
[ 222.781015] build_sched_domains() (3) sd=MC cpu=0
[ 222.781017] build_sched_domains() (3) sd=MC cpu=1
[ 222.781017] build_sched_domains() (3) sd=MC cpu=4
[ 222.781018] build_sched_domains() (3) sd=MC cpu=5

[...]

> @@ -2650,6 +2665,49 @@ static void adjust_numa_imbalance(struct sched_domain *sd_llc)
>  	}
>  }
>
> +static void init_sched_domain_shared(struct s_data *d, struct sched_domain *sd)
> +{
> +	int sd_id = cpumask_first(sched_domain_span(sd));
> +
> +	sd->shared = *per_cpu_ptr(d->sds, sd_id);
> +	atomic_set(&sd->shared->nr_busy_cpus, sd->span_weight);

Will be used only for sd_llc->shared, not for sd_asym, right?

> +	atomic_inc(&sd->shared->ref);
> +}
> +
> +/*
> + * For asymmetric CPU capacity, attach sched_domain_shared on the innermost
> + * SD_ASYM_CPUCAPACITY_FULL ancestor of @cpu's base domain when that ancestor is
> + * not an overlapping NUMA-built domain (then LLC should claim shared).
> + *
> + * A CPU may lack any FULL ancestor (e.g., exclusive cpuset symmetric island),
> + * then LLC must claim shared instead.
> + *
> + * Note: SD_ASYM_CPUCAPACITY_FULL is only set when multiple distinct capacities

s/multiple/all ? We want to see all possible CPU capacity values in
wakeup.

> + * exist in the domain span, so the asym domain we attach to cannot degenerate
> + * into a single-capacity group. The relevant edge cases are instead covered by
> + * the caveats above.

[...]