From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andrea Righi
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot
Cc: Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, K Prateek Nayak, Christian Loehle, Phil Auld,
	Koba Ko, Felix Abecassis, Balbir Singh, Joel Fernandes,
	Shrikanth Hegde, linux-kernel@vger.kernel.org
Subject: [PATCH 2/5] sched/fair: Attach sched_domain_shared to sd_asym_cpucapacity
Date: Sat, 9 May 2026 20:07:26 +0200
Message-ID: <20260509180955.1840064-3-arighi@nvidia.com>
X-Mailer: git-send-email 2.54.0
In-Reply-To: <20260509180955.1840064-1-arighi@nvidia.com>
References: <20260509180955.1840064-1-arighi@nvidia.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain
Content-Transfer-Encoding: 8bit
From: K Prateek Nayak

On asymmetric CPU capacity systems, the wakeup path uses
select_idle_capacity(), which scans the span of sd_asym_cpucapacity
rather than sd_llc. The has_idle_cores hint, however, lives on
sd_llc->shared, so the wakeup-time read of has_idle_cores operates on an
LLC-scoped blob while the actual scan/decision spans the asym domain;
nr_busy_cpus also lives in the same shared sched_domain data, but it is
never used in the asym CPU capacity scenario.

Therefore, move the sched_domain_shared object to sd_asym_cpucapacity
whenever the CPU has an SD_ASYM_CPUCAPACITY_FULL ancestor and that
ancestor is non-overlapping (i.e., not built from SD_NUMA). In that case
the scope of has_idle_cores matches the scope of the wakeup scan.
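For illustration, a rough sketch of the pre-patch scope mismatch (not the
actual mainline code: helper and per-CPU variable names follow
kernel/sched/fair.c and kernel/sched/sched.h, with locking, RCU sections
and the SMT details elided):

	/* Illustrative sketch only: wakeup on an asym CPU capacity system. */
	static int sketch_wakeup_scan(struct task_struct *p, int target)
	{
		struct sched_domain_shared *sds;
		struct sched_domain *sd;

		/*
		 * The idle-core hint is read from the LLC-scoped shared
		 * object, so it only describes target's LLC...
		 */
		sds = rcu_dereference(per_cpu(sd_llc_shared, target));
		if (sds && READ_ONCE(sds->has_idle_cores))
			pr_debug("hint covers only target's LLC\n");

		/*
		 * ...while the candidate scan walks the whole (wider)
		 * sd_asym_cpucapacity span.
		 */
		sd = rcu_dereference(per_cpu(sd_asym_cpucapacity, target));
		if (sd)
			return select_idle_capacity(p, sd, target);

		return target;
	}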
Fall back to attaching the shared object to sd_llc in three cases:

 1) plain symmetric systems (no SD_ASYM_CPUCAPACITY_FULL anywhere);

 2) CPUs in an exclusive cpuset that carves out a symmetric capacity
    island: has_asym is system-wide but those CPUs have no
    SD_ASYM_CPUCAPACITY_FULL ancestor in their hierarchy and follow the
    symmetric LLC path in select_idle_sibling();

 3) exotic topologies where SD_ASYM_CPUCAPACITY_FULL lands on an
    SD_NUMA-built domain. init_sched_domain_shared() keys the shared
    blob off cpumask_first(span), which on overlapping NUMA domains
    would alias unrelated spans onto the same blob. Keep the shared
    object on the LLC there; select_idle_capacity() gracefully skips
    the has_idle_cores preference when sd->shared is NULL.

While at it, also rename the per-CPU sd_llc_shared to sd_balance_shared,
as it is no longer strictly tied to the LLC.

Cc: Vincent Guittot
Cc: Dietmar Eggemann
Co-developed-by: Andrea Righi
Signed-off-by: Andrea Righi
Signed-off-by: K Prateek Nayak
---
 kernel/sched/fair.c     | 19 ++++++---
 kernel/sched/sched.h    |  2 +-
 kernel/sched/topology.c | 95 +++++++++++++++++++++++++++++++++++------
 3 files changed, 95 insertions(+), 21 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 6b059ee80b631..960a1a9696b98 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7819,7 +7819,7 @@ static inline void set_idle_cores(int cpu, int val)
 {
 	struct sched_domain_shared *sds;
 
-	sds = rcu_dereference_all(per_cpu(sd_llc_shared, cpu));
+	sds = rcu_dereference_all(per_cpu(sd_balance_shared, cpu));
 	if (sds)
 		WRITE_ONCE(sds->has_idle_cores, val);
 }
@@ -7828,7 +7828,7 @@ static inline bool test_idle_cores(int cpu)
 {
 	struct sched_domain_shared *sds;
 
-	sds = rcu_dereference_all(per_cpu(sd_llc_shared, cpu));
+	sds = rcu_dereference_all(per_cpu(sd_balance_shared, cpu));
 	if (sds)
 		return READ_ONCE(sds->has_idle_cores);
 
@@ -7837,7 +7837,7 @@ static inline bool test_idle_cores(int cpu)
 
 /*
  * Scans the local SMT mask to see if the entire core is idle, and records this
- * information in sd_llc_shared->has_idle_cores.
+ * information in sd_balance_shared->has_idle_cores.
  *
  * Since SMT siblings share all cache levels, inspecting this limited remote
  * state should be fairly cheap.
@@ -7954,7 +7954,7 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool
 	struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_rq_mask);
 	int i, cpu, idle_cpu = -1, nr = INT_MAX;
 
-	if (sched_feat(SIS_UTIL)) {
+	if (sched_feat(SIS_UTIL) && sd->shared) {
 		/*
 		 * Increment because !--nr is the condition to stop scan.
 		 *
@@ -12834,7 +12834,7 @@ static void nohz_balancer_kick(struct rq *rq)
 		goto out;
 	}
 
-	sds = rcu_dereference_all(per_cpu(sd_llc_shared, cpu));
+	sds = rcu_dereference_all(per_cpu(sd_balance_shared, cpu));
 	if (sds) {
 		/*
 		 * If there is an imbalance between LLC domains (IOW we could
@@ -12862,7 +12862,11 @@ static void set_cpu_sd_state_busy(int cpu)
 	struct sched_domain *sd;
 
 	sd = rcu_dereference_all(per_cpu(sd_llc, cpu));
-	if (!sd || !sd->nohz_idle)
+	/*
+	 * sd->nohz_idle only pairs with nr_busy_cpus on sd->shared; if this
+	 * domain has no shared object there is nothing to clear or account.
+	 */
+	if (!sd || !sd->shared || !sd->nohz_idle)
 		return;
 
 	sd->nohz_idle = 0;
@@ -12887,7 +12891,8 @@ static void set_cpu_sd_state_idle(int cpu)
 	struct sched_domain *sd;
 
 	sd = rcu_dereference_all(per_cpu(sd_llc, cpu));
-	if (!sd || sd->nohz_idle)
+	/* See set_cpu_sd_state_busy(): nohz_idle is only used with sd->shared. */
+	if (!sd || !sd->shared || sd->nohz_idle)
 		return;
 
 	sd->nohz_idle = 1;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 9f63b15d309d1..330f5893c4561 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2170,7 +2170,7 @@ DECLARE_PER_CPU(struct sched_domain __rcu *, sd_llc);
 DECLARE_PER_CPU(int, sd_llc_size);
 DECLARE_PER_CPU(int, sd_llc_id);
 DECLARE_PER_CPU(int, sd_share_id);
-DECLARE_PER_CPU(struct sched_domain_shared __rcu *, sd_llc_shared);
+DECLARE_PER_CPU(struct sched_domain_shared __rcu *, sd_balance_shared);
 DECLARE_PER_CPU(struct sched_domain __rcu *, sd_numa);
 DECLARE_PER_CPU(struct sched_domain __rcu *, sd_asym_packing);
 DECLARE_PER_CPU(struct sched_domain __rcu *, sd_asym_cpucapacity);
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 5847b83d9d552..9bc4d11dd6a98 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -665,7 +665,7 @@ DEFINE_PER_CPU(struct sched_domain __rcu *, sd_llc);
 DEFINE_PER_CPU(int, sd_llc_size);
 DEFINE_PER_CPU(int, sd_llc_id);
 DEFINE_PER_CPU(int, sd_share_id);
-DEFINE_PER_CPU(struct sched_domain_shared __rcu *, sd_llc_shared);
+DEFINE_PER_CPU(struct sched_domain_shared __rcu *, sd_balance_shared);
 DEFINE_PER_CPU(struct sched_domain __rcu *, sd_numa);
 DEFINE_PER_CPU(struct sched_domain __rcu *, sd_asym_packing);
 DEFINE_PER_CPU(struct sched_domain __rcu *, sd_asym_cpucapacity);
@@ -680,20 +680,38 @@ static void update_top_cache_domain(int cpu)
 	int id = cpu;
 	int size = 1;
 
+	sd = lowest_flag_domain(cpu, SD_ASYM_CPUCAPACITY_FULL);
+	/*
+	 * The shared object is attached to sd_asym_cpucapacity only when the
+	 * asym domain is non-overlapping (i.e., not built from SD_NUMA).
+	 * On overlapping (NUMA) asym domains we fall back to letting the
+	 * SD_SHARE_LLC path own the shared object, so sd->shared may be NULL
+	 * here.
+	 */
+	if (sd && sd->shared)
+		sds = sd->shared;
+
+	rcu_assign_pointer(per_cpu(sd_asym_cpucapacity, cpu), sd);
+
 	sd = highest_flag_domain(cpu, SD_SHARE_LLC);
 	if (sd) {
 		id = cpumask_first(sched_domain_span(sd));
 		size = cpumask_weight(sched_domain_span(sd));
 
-		/* If sd_llc exists, sd_llc_shared should exist too. */
-		WARN_ON_ONCE(!sd->shared);
-		sds = sd->shared;
+		/*
+		 * If sd_asym_cpucapacity didn't claim the shared object,
+		 * sd_llc must have one linked.
+		 */
+		if (!sds) {
+			WARN_ON_ONCE(!sd->shared);
+			sds = sd->shared;
+		}
 	}
 
 	rcu_assign_pointer(per_cpu(sd_llc, cpu), sd);
 	per_cpu(sd_llc_size, cpu) = size;
 	per_cpu(sd_llc_id, cpu) = id;
-	rcu_assign_pointer(per_cpu(sd_llc_shared, cpu), sds);
+	rcu_assign_pointer(per_cpu(sd_balance_shared, cpu), sds);
 
 	sd = lowest_flag_domain(cpu, SD_CLUSTER);
 	if (sd)
@@ -711,9 +729,6 @@ static void update_top_cache_domain(int cpu)
 
 	sd = highest_flag_domain(cpu, SD_ASYM_PACKING);
 	rcu_assign_pointer(per_cpu(sd_asym_packing, cpu), sd);
-
-	sd = lowest_flag_domain(cpu, SD_ASYM_CPUCAPACITY_FULL);
-	rcu_assign_pointer(per_cpu(sd_asym_cpucapacity, cpu), sd);
 }
 
 /*
@@ -2650,6 +2665,54 @@ static void adjust_numa_imbalance(struct sched_domain *sd_llc)
 	}
 }
 
+static void init_sched_domain_shared(struct s_data *d, struct sched_domain *sd)
+{
+	int sd_id = cpumask_first(sched_domain_span(sd));
+
+	sd->shared = *per_cpu_ptr(d->sds, sd_id);
+	/*
+	 * nr_busy_cpus is consumed only by the NOHZ kick path via
+	 * sd_balance_shared; on the asym-capacity path it is initialized but
+	 * never read.
+	 */
+	atomic_set(&sd->shared->nr_busy_cpus, sd->span_weight);
+	atomic_inc(&sd->shared->ref);
+}
+
+/*
+ * For asymmetric CPU capacity, attach sched_domain_shared on the innermost
+ * SD_ASYM_CPUCAPACITY_FULL ancestor of @cpu's base domain when that ancestor is
+ * not an overlapping NUMA-built domain (then LLC should claim shared).
+ *
+ * A CPU may lack any FULL ancestor (e.g., exclusive cpuset symmetric island),
+ * then LLC must claim shared instead.
+ *
+ * Note: SD_ASYM_CPUCAPACITY_FULL is only set when all CPU capacity values
+ * are present in the domain span, so the asym domain we attach to cannot
+ * degenerate into a single-capacity group. The relevant edge cases are instead
+ * covered by the caveats above.
+ *
+ * Return true if this CPU's asym path claimed sd->shared, false otherwise.
+ */
+static bool claim_asym_sched_domain_shared(struct s_data *d, int cpu)
+{
+	struct sched_domain *sd = *per_cpu_ptr(d->sd, cpu);
+	struct sched_domain *sd_asym;
+
+	if (!sd)
+		return false;
+
+	sd_asym = sd;
+	while (sd_asym && !(sd_asym->flags & SD_ASYM_CPUCAPACITY_FULL))
+		sd_asym = sd_asym->parent;
+
+	if (!sd_asym || (sd_asym->flags & SD_NUMA))
+		return false;
+
+	init_sched_domain_shared(d, sd_asym);
+	return true;
+}
+
 /*
  * Build sched domains for a given set of CPUs and attach the sched domains
  * to the individual CPUs
@@ -2708,20 +2771,26 @@ build_sched_domains(const struct cpumask *cpu_map, struct sched_domain_attr *att
 	}
 
 	for_each_cpu(i, cpu_map) {
+		bool asym_claimed = false;
+
 		sd = *per_cpu_ptr(d.sd, i);
 		if (!sd)
 			continue;
 
+		if (has_asym)
+			asym_claimed = claim_asym_sched_domain_shared(&d, i);
+
 		/* First, find the topmost SD_SHARE_LLC domain */
 		while (sd->parent && (sd->parent->flags & SD_SHARE_LLC))
 			sd = sd->parent;
 
 		if (sd->flags & SD_SHARE_LLC) {
-			int sd_id = cpumask_first(sched_domain_span(sd));
-
-			sd->shared = *per_cpu_ptr(d.sds, sd_id);
-			atomic_set(&sd->shared->nr_busy_cpus, sd->span_weight);
-			atomic_inc(&sd->shared->ref);
+			/*
+			 * Initialize the sd->shared for SD_SHARE_LLC unless
+			 * the asym path above already claimed it.
+			 */
+			if (!asym_claimed)
+				init_sched_domain_shared(&d, sd);
 
 			/*
 			 * In presence of higher domains, adjust the
-- 
2.54.0