From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 28 Apr 2026 10:47:22 +0200
From: Andrea Righi
To: Shrikanth Hegde
Cc: K Prateek Nayak, Dietmar Eggemann, Steven Rostedt, Ben Segall,
 Mel Gorman, Valentin Schneider, Christian Loehle, Koba Ko,
 Felix Abecassis, Balbir Singh, Joel Fernandes,
 linux-kernel@vger.kernel.org, Ingo Molnar, Peter Zijlstra,
 Juri Lelli, Vincent Guittot
Subject: Re: [PATCH 2/6] sched/fair: Attach sched_domain_shared to sd_asym_cpucapacity
References: <20260428051720.3180182-1-arighi@nvidia.com>
 <20260428051720.3180182-3-arighi@nvidia.com>
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Hi Shrikanth,

On Tue, Apr 28, 2026 at 12:15:15PM +0530, Shrikanth Hegde wrote:
> On 4/28/26 10:46 AM, Andrea Righi wrote:
> > From: K Prateek Nayak
> > 
> > On asymmetric CPU capacity systems, the wakeup path uses
> > select_idle_capacity(), which scans the span of sd_asym_cpucapacity
> > rather than sd_llc.
> > 
> > The has_idle_cores hint, however, lives on sd_llc->shared, so the
> > wakeup-time read of has_idle_cores operates on an LLC-scoped blob
> > while the actual scan/decision spans the asym domain; nr_busy_cpus
> > also lives in the same shared sched_domain data, but it is never
> > used in the asym CPU capacity scenario.
> > 
> > Therefore, move the sched_domain_shared object to sd_asym_cpucapacity
> > whenever the CPU has an SD_ASYM_CPUCAPACITY_FULL ancestor and that
> > ancestor is non-overlapping (i.e., not built from SD_NUMA). In that
> > case the scope of has_idle_cores matches the scope of the wakeup
> > scan.
> > 
> > Fall back to attaching the shared object to sd_llc in three cases:
> > 
> > 1) plain symmetric systems (no SD_ASYM_CPUCAPACITY_FULL anywhere);
> > 
> > 2) CPUs in an exclusive cpuset that carves out a symmetric-capacity
> >    island: has_asym is system-wide, but those CPUs have no
> >    SD_ASYM_CPUCAPACITY_FULL ancestor in their hierarchy and follow
> >    the symmetric LLC path in select_idle_sibling();
> > 
> > 3) exotic topologies where SD_ASYM_CPUCAPACITY_FULL lands on an
> >    SD_NUMA-built domain. init_sched_domain_shared() keys the shared
> >    blob off cpumask_first(span), which on overlapping NUMA domains
> >    would alias unrelated spans onto the same blob. Keep the shared
> >    object on the LLC there; select_idle_capacity() gracefully skips
> >    the has_idle_cores preference when sd->shared is NULL.
> > 
> 
> Can you share the example topology where this benefits?
I've tested this both on a system with 1 NUMA node, 1 LLC, 88 SMT cores
per LLC (176 CPUs total) and one with 2 NUMA nodes, 2 LLCs (one per
node), 88 SMT cores per LLC (352 CPUs total). The CPU capacities range
from 992 to 1024.

> 
> Is SD_ASYM_CPUCAPACITY_FULL one level above LLC but below NUMA?

In the system with a single node SD_ASYM_CPUCAPACITY_FULL is at the LLC
level; in the system with 2 nodes it's at the NUMA level.

> 
> > While at it, also rename the per-CPU sd_llc_shared to
> > sd_balance_shared, as it is no longer strictly tied to the LLC.
> > 
> 
> llc scans are at wakeup's. name sd_balance_shared indicates it is for
> load balance.

True, but sd_llc/balance_shared is used for the balancer kick logic. And
idle CPU scan is still a form of balancing at the end... but I'm open to
suggestions if we find a better name.

Thanks,
-Andrea

> 
> > Co-developed-by: Andrea Righi
> > Signed-off-by: Andrea Righi
> > Signed-off-by: K Prateek Nayak
> > ---
> >  kernel/sched/fair.c     | 20 +++++----
> >  kernel/sched/sched.h    |  2 +-
> >  kernel/sched/topology.c | 91 +++++++++++++++++++++++++++++++++++------
> >  3 files changed, 91 insertions(+), 22 deletions(-)
> > 
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index fc0828150c780..ece3a26f59c27 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -7790,7 +7790,7 @@ static inline void set_idle_cores(int cpu, int val)
> >  {
> >  	struct sched_domain_shared *sds;
> > 
> > -	sds = rcu_dereference_all(per_cpu(sd_llc_shared, cpu));
> > +	sds = rcu_dereference_all(per_cpu(sd_balance_shared, cpu));
> >  	if (sds)
> >  		WRITE_ONCE(sds->has_idle_cores, val);
> >  }
> > @@ -7799,7 +7799,7 @@ static inline bool test_idle_cores(int cpu)
> >  {
> >  	struct sched_domain_shared *sds;
> > 
> > -	sds = rcu_dereference_all(per_cpu(sd_llc_shared, cpu));
> > +	sds = rcu_dereference_all(per_cpu(sd_balance_shared, cpu));
> >  	if (sds)
> >  		return READ_ONCE(sds->has_idle_cores);
> > @@ -7808,7 +7808,7 @@ static inline bool test_idle_cores(int cpu)
> >  /*
> >   * Scans the local SMT mask to see if the entire core is idle, and records this
> > - * information in sd_llc_shared->has_idle_cores.
> > + * information in sd_balance_shared->has_idle_cores.
> >   *
> >   * Since SMT siblings share all cache levels, inspecting this limited remote
> >   * state should be fairly cheap.
> > @@ -7925,7 +7925,7 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool
> >  	struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_rq_mask);
> >  	int i, cpu, idle_cpu = -1, nr = INT_MAX;
> > 
> > -	if (sched_feat(SIS_UTIL)) {
> > +	if (sched_feat(SIS_UTIL) && sd->shared) {
> >  		/*
> >  		 * Increment because !--nr is the condition to stop scan.
> >  		 *
> > @@ -12759,7 +12759,7 @@ static bool nohz_balancer_needs_kick(struct rq *rq)
> >  		return false;
> >  	}
> > 
> > -	sds = rcu_dereference_all(per_cpu(sd_llc_shared, cpu));
> > +	sds = rcu_dereference_all(per_cpu(sd_balance_shared, cpu));
> >  	if (sds) {
> >  		/*
> >  		 * If there is an imbalance between LLC domains (IOW we could
> > @@ -12841,10 +12841,13 @@ static void set_cpu_sd_state_busy(int cpu)
> >  	guard(rcu)();
> >  	sd = rcu_dereference_all(per_cpu(sd_llc, cpu));
> > -	if (!sd || !sd->nohz_idle)
> > +	/*
> > +	 * sd->nohz_idle only pairs with nr_busy_cpus on sd->shared; if this LLC
> > +	 * domain has no shared object there is nothing to clear or account.
> > +	 */
> > +	if (!sd || !sd->shared || !sd->nohz_idle)
> >  		return;
> >  	sd->nohz_idle = 0;
> > -
> >  	atomic_inc(&sd->shared->nr_busy_cpus);
> >  }
> > @@ -12868,7 +12871,8 @@ static void set_cpu_sd_state_idle(int cpu)
> >  	guard(rcu)();
> >  	sd = rcu_dereference_all(per_cpu(sd_llc, cpu));
> > -	if (!sd || sd->nohz_idle)
> > +	/* See set_cpu_sd_state_busy(): nohz_idle is only used with sd->shared. */
> > +	if (!sd || !sd->shared || sd->nohz_idle)
> >  		return;
> >  	sd->nohz_idle = 1;
> > diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> > index 9f63b15d309d1..330f5893c4561 100644
> > --- a/kernel/sched/sched.h
> > +++ b/kernel/sched/sched.h
> > @@ -2170,7 +2170,7 @@ DECLARE_PER_CPU(struct sched_domain __rcu *, sd_llc);
> >  DECLARE_PER_CPU(int, sd_llc_size);
> >  DECLARE_PER_CPU(int, sd_llc_id);
> >  DECLARE_PER_CPU(int, sd_share_id);
> > -DECLARE_PER_CPU(struct sched_domain_shared __rcu *, sd_llc_shared);
> > +DECLARE_PER_CPU(struct sched_domain_shared __rcu *, sd_balance_shared);
> >  DECLARE_PER_CPU(struct sched_domain __rcu *, sd_numa);
> >  DECLARE_PER_CPU(struct sched_domain __rcu *, sd_asym_packing);
> >  DECLARE_PER_CPU(struct sched_domain __rcu *, sd_asym_cpucapacity);
> > diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> > index 5847b83d9d552..1e6ce369a4bbc 100644
> > --- a/kernel/sched/topology.c
> > +++ b/kernel/sched/topology.c
> > @@ -665,7 +665,7 @@ DEFINE_PER_CPU(struct sched_domain __rcu *, sd_llc);
> >  DEFINE_PER_CPU(int, sd_llc_size);
> >  DEFINE_PER_CPU(int, sd_llc_id);
> >  DEFINE_PER_CPU(int, sd_share_id);
> > -DEFINE_PER_CPU(struct sched_domain_shared __rcu *, sd_llc_shared);
> > +DEFINE_PER_CPU(struct sched_domain_shared __rcu *, sd_balance_shared);
> >  DEFINE_PER_CPU(struct sched_domain __rcu *, sd_numa);
> >  DEFINE_PER_CPU(struct sched_domain __rcu *, sd_asym_packing);
> >  DEFINE_PER_CPU(struct sched_domain __rcu *, sd_asym_cpucapacity);
> > @@ -680,20 +680,39 @@ static void update_top_cache_domain(int cpu)
> >  	int id = cpu;
> >  	int size = 1;
> > 
> > +	sd = lowest_flag_domain(cpu, SD_ASYM_CPUCAPACITY_FULL);
> > +	/*
> > +	 * The shared object is attached to sd_asym_cpucapacity only when the
> > +	 * asym domain is non-overlapping (i.e., not built from SD_NUMA).
> > +	 * On overlapping (NUMA) asym domains we fall back to letting the
> > +	 * SD_SHARE_LLC path own the shared object, so sd->shared may be NULL
> > +	 * here.
> > +	 */
> > +	if (sd && sd->shared)
> > +		sds = sd->shared;
> > +
> > +	rcu_assign_pointer(per_cpu(sd_asym_cpucapacity, cpu), sd);
> > +
> >  	sd = highest_flag_domain(cpu, SD_SHARE_LLC);
> >  	if (sd) {
> >  		id = cpumask_first(sched_domain_span(sd));
> >  		size = cpumask_weight(sched_domain_span(sd));
> > -		/* If sd_llc exists, sd_llc_shared should exist too. */
> > -		WARN_ON_ONCE(!sd->shared);
> > -		sds = sd->shared;
> > +		/*
> > +		 * If sd_asym_cpucapacity didn't claim the shared object,
> > +		 * sd_llc must have one linked.
> > +		 */
> > +		if (!sds) {
> > +			WARN_ON_ONCE(!sd->shared);
> > +			sds = sd->shared;
> > +		}
> >  	}
> > 
> >  	rcu_assign_pointer(per_cpu(sd_llc, cpu), sd);
> >  	per_cpu(sd_llc_size, cpu) = size;
> >  	per_cpu(sd_llc_id, cpu) = id;
> > -	rcu_assign_pointer(per_cpu(sd_llc_shared, cpu), sds);
> > +
> > +	rcu_assign_pointer(per_cpu(sd_balance_shared, cpu), sds);
> > 
> >  	sd = lowest_flag_domain(cpu, SD_CLUSTER);
> >  	if (sd)
> > @@ -711,9 +730,6 @@ static void update_top_cache_domain(int cpu)
> >  	sd = highest_flag_domain(cpu, SD_ASYM_PACKING);
> >  	rcu_assign_pointer(per_cpu(sd_asym_packing, cpu), sd);
> > -
> > -	sd = lowest_flag_domain(cpu, SD_ASYM_CPUCAPACITY_FULL);
> > -	rcu_assign_pointer(per_cpu(sd_asym_cpucapacity, cpu), sd);
> >  }
> > 
> >  /*
> > @@ -2650,6 +2666,49 @@ static void adjust_numa_imbalance(struct sched_domain *sd_llc)
> >  	}
> >  }
> > 
> > +static void init_sched_domain_shared(struct s_data *d, struct sched_domain *sd)
> > +{
> > +	int sd_id = cpumask_first(sched_domain_span(sd));
> > +
> > +	sd->shared = *per_cpu_ptr(d->sds, sd_id);
> > +	atomic_set(&sd->shared->nr_busy_cpus, sd->span_weight);
> > +	atomic_inc(&sd->shared->ref);
> > +}
> > +
> > +/*
> > + * For asymmetric CPU capacity, attach sched_domain_shared on the innermost
> > + * SD_ASYM_CPUCAPACITY_FULL ancestor of @cpu's base domain when that ancestor is
> > + * not an overlapping NUMA-built domain (then LLC should claim shared).
> > + *
> > + * A CPU may lack any FULL ancestor (e.g., exclusive cpuset symmetric island),
> > + * then LLC must claim shared instead.
> > + *
> > + * Note: SD_ASYM_CPUCAPACITY_FULL is only set when multiple distinct capacities
> > + * exist in the domain span, so the asym domain we attach to cannot degenerate
> > + * into a single-capacity group. The relevant edge cases are instead covered by
> > + * the caveats above.
> > + *
> > + * Return true if this CPU's asym path claimed sd->shared, false otherwise.
> > + */
> > +static bool claim_asym_sched_domain_shared(struct s_data *d, int cpu)
> > +{
> > +	struct sched_domain *sd = *per_cpu_ptr(d->sd, cpu);
> > +	struct sched_domain *sd_asym;
> > +
> > +	if (!sd)
> > +		return false;
> > +
> > +	sd_asym = sd;
> > +	while (sd_asym && !(sd_asym->flags & SD_ASYM_CPUCAPACITY_FULL))
> > +		sd_asym = sd_asym->parent;
> > +
> > +	if (!sd_asym || (sd_asym->flags & SD_NUMA))
> > +		return false;
> > +
> > +	init_sched_domain_shared(d, sd_asym);
> > +	return true;
> > +}
> > +
> >  /*
> >   * Build sched domains for a given set of CPUs and attach the sched domains
> >   * to the individual CPUs
> > @@ -2708,20 +2767,26 @@ build_sched_domains(const struct cpumask *cpu_map, struct sched_domain_attr *att
> >  	}
> > 
> >  	for_each_cpu(i, cpu_map) {
> > +		bool asym_claimed = false;
> > +
> >  		sd = *per_cpu_ptr(d.sd, i);
> >  		if (!sd)
> >  			continue;
> > 
> > +		if (has_asym)
> > +			asym_claimed = claim_asym_sched_domain_shared(&d, i);
> > +
> >  		/* First, find the topmost SD_SHARE_LLC domain */
> >  		while (sd->parent && (sd->parent->flags & SD_SHARE_LLC))
> >  			sd = sd->parent;
> > 
> >  		if (sd->flags & SD_SHARE_LLC) {
> > -			int sd_id = cpumask_first(sched_domain_span(sd));
> > -
> > -			sd->shared = *per_cpu_ptr(d.sds, sd_id);
> > -			atomic_set(&sd->shared->nr_busy_cpus, sd->span_weight);
> > -			atomic_inc(&sd->shared->ref);
> > +			/*
> > +			 * Initialize the sd->shared for SD_SHARE_LLC unless
> > +			 * the asym path above already claimed it.
> > +			 */
> > +			if (!asym_claimed)
> > +				init_sched_domain_shared(&d, sd);
> > 
> >  			/*
> >  			 * In presence of higher domains, adjust the