From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 28 Apr 2026 10:47:22 +0200
From: Andrea Righi
To: Shrikanth Hegde
Cc: K Prateek Nayak, Dietmar Eggemann, Steven Rostedt, Ben Segall,
 Mel Gorman, Valentin Schneider, Christian Loehle, Koba Ko,
 Felix Abecassis, Balbir Singh, Joel Fernandes,
 linux-kernel@vger.kernel.org, Ingo Molnar, Peter Zijlstra,
 Juri Lelli, Vincent Guittot
Subject: Re: [PATCH 2/6] sched/fair: Attach sched_domain_shared to sd_asym_cpucapacity
References: <20260428051720.3180182-1-arighi@nvidia.com>
 <20260428051720.3180182-3-arighi@nvidia.com>
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Hi Shrikanth,

On Tue, Apr 28, 2026 at 12:15:15PM +0530, Shrikanth Hegde wrote:
> On 4/28/26 10:46 AM, Andrea Righi wrote:
> > From: K Prateek Nayak
> > 
> > On asymmetric CPU capacity systems, the wakeup path uses
> > select_idle_capacity(), which scans the span of sd_asym_cpucapacity
> > rather than sd_llc.
> > 
> > The has_idle_cores hint, however, lives on sd_llc->shared, so the
> > wakeup-time read of has_idle_cores operates on an LLC-scoped blob
> > while the actual scan/decision spans the asym domain; nr_busy_cpus
> > also lives in the same shared sched_domain data, but it is never
> > used in the asym CPU capacity scenario.
> > 
> > Therefore, move the sched_domain_shared object to sd_asym_cpucapacity
> > whenever the CPU has an SD_ASYM_CPUCAPACITY_FULL ancestor and that
> > ancestor is non-overlapping (i.e., not built from SD_NUMA). In that
> > case the scope of has_idle_cores matches the scope of the wakeup
> > scan.
> > 
> > Fall back to attaching the shared object to sd_llc in three cases:
> > 
> > 1) plain symmetric systems (no SD_ASYM_CPUCAPACITY_FULL anywhere);
> > 
> > 2) CPUs in an exclusive cpuset that carves out a symmetric-capacity
> >    island: has_asym is system-wide, but those CPUs have no
> >    SD_ASYM_CPUCAPACITY_FULL ancestor in their hierarchy and follow
> >    the symmetric LLC path in select_idle_sibling();
> > 
> > 3) exotic topologies where SD_ASYM_CPUCAPACITY_FULL lands on an
> >    SD_NUMA-built domain. init_sched_domain_shared() keys the shared
> >    blob off cpumask_first(span), which on overlapping NUMA domains
> >    would alias unrelated spans onto the same blob. Keep the shared
> >    object on the LLC there; select_idle_capacity() gracefully skips
> >    the has_idle_cores preference when sd->shared is NULL.
> > 
> 
> Can you share the example topology where this benefits?
I've tested this both on a system with 1 NUMA node, 1 LLC, 88 SMT cores
per LLC (176 CPUs total) and one with 2 NUMA nodes, 2 LLCs (one per
node), 88 SMT cores per LLC (352 CPUs total). The CPU capacities range
from 992 to 1024.

> 
> Is SD_ASYM_CPUCAPACITY_FULL one level above LLC but below NUMA?

In the system with a single node SD_ASYM_CPUCAPACITY_FULL is at the LLC
level; in the system with 2 nodes it's at the NUMA level.

> 
> > While at it, also rename the per-CPU sd_llc_shared to
> > sd_balance_shared, as it is no longer strictly tied to the LLC.
> > 
> 
> llc scans are at wakeup's. name sd_balance_shared indicates it is for
> load balance.

True, but sd_llc/balance_shared is used for the balancer kick logic. And
idle CPU scan is still a form of balancing at the end... but I'm open to
suggestions if we find a better name.

Thanks,
-Andrea

> 
> > Co-developed-by: Andrea Righi
> > Signed-off-by: Andrea Righi
> > Signed-off-by: K Prateek Nayak
> > ---
> >  kernel/sched/fair.c     | 20 +++++----
> >  kernel/sched/sched.h    |  2 +-
> >  kernel/sched/topology.c | 91 +++++++++++++++++++++++++++++++++++------
> >  3 files changed, 91 insertions(+), 22 deletions(-)
> > 
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index fc0828150c780..ece3a26f59c27 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -7790,7 +7790,7 @@ static inline void set_idle_cores(int cpu, int val)
> >  {
> >  	struct sched_domain_shared *sds;
> > 
> > -	sds = rcu_dereference_all(per_cpu(sd_llc_shared, cpu));
> > +	sds = rcu_dereference_all(per_cpu(sd_balance_shared, cpu));
> >  	if (sds)
> >  		WRITE_ONCE(sds->has_idle_cores, val);
> >  }
> > @@ -7799,7 +7799,7 @@ static inline bool test_idle_cores(int cpu)
> >  {
> >  	struct sched_domain_shared *sds;
> > 
> > -	sds = rcu_dereference_all(per_cpu(sd_llc_shared, cpu));
> > +	sds = rcu_dereference_all(per_cpu(sd_balance_shared, cpu));
> >  	if (sds)
> >  		return READ_ONCE(sds->has_idle_cores);
> > @@ -7808,7 +7808,7 @@ static inline bool test_idle_cores(int cpu)
> >  /*
> >   * Scans the local SMT mask to see if the entire core is idle, and records this
> > - * information in sd_llc_shared->has_idle_cores.
> > + * information in sd_balance_shared->has_idle_cores.
> >   *
> >   * Since SMT siblings share all cache levels, inspecting this limited remote
> >   * state should be fairly cheap.
> > @@ -7925,7 +7925,7 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool
> >  	struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_rq_mask);
> >  	int i, cpu, idle_cpu = -1, nr = INT_MAX;
> > 
> > -	if (sched_feat(SIS_UTIL)) {
> > +	if (sched_feat(SIS_UTIL) && sd->shared) {
> >  		/*
> >  		 * Increment because !--nr is the condition to stop scan.
> >  		 *
> > @@ -12759,7 +12759,7 @@ static bool nohz_balancer_needs_kick(struct rq *rq)
> >  		return false;
> >  	}
> > 
> > -	sds = rcu_dereference_all(per_cpu(sd_llc_shared, cpu));
> > +	sds = rcu_dereference_all(per_cpu(sd_balance_shared, cpu));
> >  	if (sds) {
> >  		/*
> >  		 * If there is an imbalance between LLC domains (IOW we could
> > @@ -12841,10 +12841,13 @@ static void set_cpu_sd_state_busy(int cpu)
> >  	guard(rcu)();
> >  	sd = rcu_dereference_all(per_cpu(sd_llc, cpu));
> > -	if (!sd || !sd->nohz_idle)
> > +	/*
> > +	 * sd->nohz_idle only pairs with nr_busy_cpus on sd->shared; if this LLC
> > +	 * domain has no shared object there is nothing to clear or account.
> > +	 */
> > +	if (!sd || !sd->shared || !sd->nohz_idle)
> >  		return;
> >  	sd->nohz_idle = 0;
> > -
> >  	atomic_inc(&sd->shared->nr_busy_cpus);
> >  }
> > @@ -12868,7 +12871,8 @@ static void set_cpu_sd_state_idle(int cpu)
> >  	guard(rcu)();
> >  	sd = rcu_dereference_all(per_cpu(sd_llc, cpu));
> > -	if (!sd || sd->nohz_idle)
> > +	/* See set_cpu_sd_state_busy(): nohz_idle is only used with sd->shared. */
> > +	if (!sd || !sd->shared || sd->nohz_idle)
> >  		return;
> >  	sd->nohz_idle = 1;
> > diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> > index 9f63b15d309d1..330f5893c4561 100644
> > --- a/kernel/sched/sched.h
> > +++ b/kernel/sched/sched.h
> > @@ -2170,7 +2170,7 @@ DECLARE_PER_CPU(struct sched_domain __rcu *, sd_llc);
> >  DECLARE_PER_CPU(int, sd_llc_size);
> >  DECLARE_PER_CPU(int, sd_llc_id);
> >  DECLARE_PER_CPU(int, sd_share_id);
> > -DECLARE_PER_CPU(struct sched_domain_shared __rcu *, sd_llc_shared);
> > +DECLARE_PER_CPU(struct sched_domain_shared __rcu *, sd_balance_shared);
> >  DECLARE_PER_CPU(struct sched_domain __rcu *, sd_numa);
> >  DECLARE_PER_CPU(struct sched_domain __rcu *, sd_asym_packing);
> >  DECLARE_PER_CPU(struct sched_domain __rcu *, sd_asym_cpucapacity);
> > diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> > index 5847b83d9d552..1e6ce369a4bbc 100644
> > --- a/kernel/sched/topology.c
> > +++ b/kernel/sched/topology.c
> > @@ -665,7 +665,7 @@ DEFINE_PER_CPU(struct sched_domain __rcu *, sd_llc);
> >  DEFINE_PER_CPU(int, sd_llc_size);
> >  DEFINE_PER_CPU(int, sd_llc_id);
> >  DEFINE_PER_CPU(int, sd_share_id);
> > -DEFINE_PER_CPU(struct sched_domain_shared __rcu *, sd_llc_shared);
> > +DEFINE_PER_CPU(struct sched_domain_shared __rcu *, sd_balance_shared);
> >  DEFINE_PER_CPU(struct sched_domain __rcu *, sd_numa);
> >  DEFINE_PER_CPU(struct sched_domain __rcu *, sd_asym_packing);
> >  DEFINE_PER_CPU(struct sched_domain __rcu *, sd_asym_cpucapacity);
> > @@ -680,20 +680,39 @@ static void update_top_cache_domain(int cpu)
> >  	int id = cpu;
> >  	int size = 1;
> > 
> > +	sd = lowest_flag_domain(cpu, SD_ASYM_CPUCAPACITY_FULL);
> > +	/*
> > +	 * The shared object is attached to sd_asym_cpucapacity only when the
> > +	 * asym domain is non-overlapping (i.e., not built from SD_NUMA).
> > +	 * On overlapping (NUMA) asym domains we fall back to letting the
> > +	 * SD_SHARE_LLC path own the shared object, so sd->shared may be NULL
> > +	 * here.
> > +	 */
> > +	if (sd && sd->shared)
> > +		sds = sd->shared;
> > +
> > +	rcu_assign_pointer(per_cpu(sd_asym_cpucapacity, cpu), sd);
> > +
> >  	sd = highest_flag_domain(cpu, SD_SHARE_LLC);
> >  	if (sd) {
> >  		id = cpumask_first(sched_domain_span(sd));
> >  		size = cpumask_weight(sched_domain_span(sd));
> > -		/* If sd_llc exists, sd_llc_shared should exist too. */
> > -		WARN_ON_ONCE(!sd->shared);
> > -		sds = sd->shared;
> > +		/*
> > +		 * If sd_asym_cpucapacity didn't claim the shared object,
> > +		 * sd_llc must have one linked.
> > +		 */
> > +		if (!sds) {
> > +			WARN_ON_ONCE(!sd->shared);
> > +			sds = sd->shared;
> > +		}
> >  	}
> > 
> >  	rcu_assign_pointer(per_cpu(sd_llc, cpu), sd);
> >  	per_cpu(sd_llc_size, cpu) = size;
> >  	per_cpu(sd_llc_id, cpu) = id;
> > -	rcu_assign_pointer(per_cpu(sd_llc_shared, cpu), sds);
> > +
> > +	rcu_assign_pointer(per_cpu(sd_balance_shared, cpu), sds);
> > 
> >  	sd = lowest_flag_domain(cpu, SD_CLUSTER);
> >  	if (sd)
> > @@ -711,9 +730,6 @@ static void update_top_cache_domain(int cpu)
> >  	sd = highest_flag_domain(cpu, SD_ASYM_PACKING);
> >  	rcu_assign_pointer(per_cpu(sd_asym_packing, cpu), sd);
> > -
> > -	sd = lowest_flag_domain(cpu, SD_ASYM_CPUCAPACITY_FULL);
> > -	rcu_assign_pointer(per_cpu(sd_asym_cpucapacity, cpu), sd);
> >  }
> > 
> >  /*
> > @@ -2650,6 +2666,49 @@ static void adjust_numa_imbalance(struct sched_domain *sd_llc)
> >  	}
> >  }
> > 
> > +static void init_sched_domain_shared(struct s_data *d, struct sched_domain *sd)
> > +{
> > +	int sd_id = cpumask_first(sched_domain_span(sd));
> > +
> > +	sd->shared = *per_cpu_ptr(d->sds, sd_id);
> > +	atomic_set(&sd->shared->nr_busy_cpus, sd->span_weight);
> > +	atomic_inc(&sd->shared->ref);
> > +}
> > +
> > +/*
> > + * For asymmetric CPU capacity, attach sched_domain_shared on the innermost
> > + * SD_ASYM_CPUCAPACITY_FULL ancestor of @cpu's base domain when that ancestor is
> > + * not an overlapping NUMA-built domain (then LLC should claim shared).
> > + *
> > + * A CPU may lack any FULL ancestor (e.g., exclusive cpuset symmetric island),
> > + * then LLC must claim shared instead.
> > + *
> > + * Note: SD_ASYM_CPUCAPACITY_FULL is only set when multiple distinct capacities
> > + * exist in the domain span, so the asym domain we attach to cannot degenerate
> > + * into a single-capacity group. The relevant edge cases are instead covered by
> > + * the caveats above.
> > + *
> > + * Return true if this CPU's asym path claimed sd->shared, false otherwise.
> > + */
> > +static bool claim_asym_sched_domain_shared(struct s_data *d, int cpu)
> > +{
> > +	struct sched_domain *sd = *per_cpu_ptr(d->sd, cpu);
> > +	struct sched_domain *sd_asym;
> > +
> > +	if (!sd)
> > +		return false;
> > +
> > +	sd_asym = sd;
> > +	while (sd_asym && !(sd_asym->flags & SD_ASYM_CPUCAPACITY_FULL))
> > +		sd_asym = sd_asym->parent;
> > +
> > +	if (!sd_asym || (sd_asym->flags & SD_NUMA))
> > +		return false;
> > +
> > +	init_sched_domain_shared(d, sd_asym);
> > +	return true;
> > +}
> > +
> >  /*
> >   * Build sched domains for a given set of CPUs and attach the sched domains
> >   * to the individual CPUs
> > @@ -2708,20 +2767,26 @@ build_sched_domains(const struct cpumask *cpu_map, struct sched_domain_attr *att
> >  	}
> > 
> >  	for_each_cpu(i, cpu_map) {
> > +		bool asym_claimed = false;
> > +
> >  		sd = *per_cpu_ptr(d.sd, i);
> >  		if (!sd)
> >  			continue;
> > 
> > +		if (has_asym)
> > +			asym_claimed = claim_asym_sched_domain_shared(&d, i);
> > +
> >  		/* First, find the topmost SD_SHARE_LLC domain */
> >  		while (sd->parent && (sd->parent->flags & SD_SHARE_LLC))
> >  			sd = sd->parent;
> > 
> >  		if (sd->flags & SD_SHARE_LLC) {
> > -			int sd_id = cpumask_first(sched_domain_span(sd));
> > -
> > -			sd->shared = *per_cpu_ptr(d.sds, sd_id);
> > -			atomic_set(&sd->shared->nr_busy_cpus, sd->span_weight);
> > -			atomic_inc(&sd->shared->ref);
> > +			/*
> > +			 * Initialize the sd->shared for SD_SHARE_LLC unless
> > +			 * the asym path above already claimed it.
> > +			 */
> > +			if (!asym_claimed)
> > +				init_sched_domain_shared(&d, sd);
> > 
> >  			/*
> >  			 * In presence of higher domains, adjust the