From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <75cf4fd1-2e80-4167-9113-954015ba63e1@amd.com>
Date: Fri, 24 Apr 2026 10:44:09 +0530
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Subject: Re: [PATCH 1/5] sched/fair: Attach sched_domain_shared to sd_asym_cpucapacity
To: Andrea Righi, Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot
CC: Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
 Valentin Schneider, Christian Loehle, Koba Ko, Felix Abecassis,
 Balbir Singh, Joel Fernandes, Shrikanth Hegde
References:
 <20260423074135.380390-1-arighi@nvidia.com>
 <20260423074135.380390-2-arighi@nvidia.com>
Content-Language: en-US
From: K Prateek Nayak
In-Reply-To: <20260423074135.380390-2-arighi@nvidia.com>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit

Hello Andrea,

On 4/23/2026 1:06 PM, Andrea Righi wrote:
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 69361c63353ad..934eb663f445e 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -7925,7 +7925,7 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool
>  	struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_rq_mask);
>  	int i, cpu, idle_cpu = -1, nr = INT_MAX;
>
> -	if (sched_feat(SIS_UTIL)) {
> +	if (sched_feat(SIS_UTIL) && sd->shared) {
>  		/*
>  		 * Increment because !--nr is the condition to stop scan.
>  		 *
> @@ -12840,7 +12840,8 @@ static void set_cpu_sd_state_busy(int cpu)
>  		goto unlock;
>  	sd->nohz_idle = 0;

I just realised this flag only matters for accounting to "nr_busy_cpus"
and we can bail out earlier if we don't have an sd->shared altogether.
You can probably adapt this to use guard(rcu)() while you are at it and
send these bits as a separate cleanup first saying that the assumption
of sd_llc->shared always existing will change with the coming patches
and you are introducing guard rails for the same.

>
> -	atomic_inc(&sd->shared->nr_busy_cpus);
> +	if (sd->shared)
> +		atomic_inc(&sd->shared->nr_busy_cpus);
>  unlock:
>  	rcu_read_unlock();
>  }
> @@ -12869,7 +12870,8 @@ static void set_cpu_sd_state_idle(int cpu)
>  		goto unlock;
>  	sd->nohz_idle = 1;
>
> -	atomic_dec(&sd->shared->nr_busy_cpus);
> +	if (sd->shared)
> +		atomic_dec(&sd->shared->nr_busy_cpus);
>  unlock:
>  	rcu_read_unlock();
>  }
> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> index 5847b83d9d552..dc50193b198c6 100644
> --- a/kernel/sched/topology.c
> +++ b/kernel/sched/topology.c
> @@ -680,19 +680,39 @@ static void update_top_cache_domain(int cpu)
>  	int id = cpu;
>  	int size = 1;
>
> +	sd = lowest_flag_domain(cpu, SD_ASYM_CPUCAPACITY_FULL);
> +	/*
> +	 * The shared object is attached to sd_asym_cpucapacity only when the
> +	 * asym domain is non-overlapping (i.e., not built from SD_NUMA).
> +	 * On overlapping (NUMA) asym domains we fall back to letting the
> +	 * SD_SHARE_LLC path own the shared object, so sd->shared may be NULL
> +	 * here.
> +	 */
> +	if (sd && sd->shared)
> +		sds = sd->shared;
> +
> +	rcu_assign_pointer(per_cpu(sd_asym_cpucapacity, cpu), sd);
> +
>  	sd = highest_flag_domain(cpu, SD_SHARE_LLC);
>  	if (sd) {
>  		id = cpumask_first(sched_domain_span(sd));
>  		size = cpumask_weight(sched_domain_span(sd));
>
> -		/* If sd_llc exists, sd_llc_shared should exist too. */
> -		WARN_ON_ONCE(!sd->shared);
> -		sds = sd->shared;
> +		/*
> +		 * If sd_asym_cpucapacity didn't claim the shared object,
> +		 * sd_llc must have one linked.
> +		 */
> +		if (!sds) {
> +			WARN_ON_ONCE(!sd->shared);
> +			sds = sd->shared;
> +		}
>  	}
>
>  	rcu_assign_pointer(per_cpu(sd_llc, cpu), sd);
>  	per_cpu(sd_llc_size, cpu) = size;
>  	per_cpu(sd_llc_id, cpu) = id;
> +
> +	/* TODO: Rename sd_llc_shared to fit the new role. */
>  	rcu_assign_pointer(per_cpu(sd_llc_shared, cpu), sds);

Would love for folks to chime in but IMO "sd_wakeup_shared" sounds
pretty reasonable since it is mainly the wakeup path that depends on
this except for one !ASYM load balancing trigger.

>
>  	sd = lowest_flag_domain(cpu, SD_CLUSTER);
> @@ -711,9 +731,6 @@ static void update_top_cache_domain(int cpu)
>
>  	sd = highest_flag_domain(cpu, SD_ASYM_PACKING);
>  	rcu_assign_pointer(per_cpu(sd_asym_packing, cpu), sd);
> -
> -	sd = lowest_flag_domain(cpu, SD_ASYM_CPUCAPACITY_FULL);
> -	rcu_assign_pointer(per_cpu(sd_asym_cpucapacity, cpu), sd);
>  }
>
>  /*
> @@ -2650,6 +2667,15 @@ static void adjust_numa_imbalance(struct sched_domain *sd_llc)
>  	}
>  }
>
> +static void init_sched_domain_shared(struct s_data *d, struct sched_domain *sd)
> +{
> +	int sd_id = cpumask_first(sched_domain_span(sd));
> +
> +	sd->shared = *per_cpu_ptr(d->sds, sd_id);
> +	atomic_set(&sd->shared->nr_busy_cpus, sd->span_weight);
> +	atomic_inc(&sd->shared->ref);
> +}
> +
>  /*
>   * Build sched domains for a given set of CPUs and attach the sched domains
>   * to the individual CPUs
> @@ -2708,20 +2734,53 @@ build_sched_domains(const struct cpumask *cpu_map, struct sched_domain_attr *att
>  	}
>
>  	for_each_cpu(i, cpu_map) {
> +		struct sched_domain *sd_asym = NULL;
> +		bool asym_claimed = false;
> +
>  		sd = *per_cpu_ptr(d.sd, i);
>  		if (!sd)
>  			continue;
>
> +		/*
> +		 * In case of ASYM_CPUCAPACITY, attach sd->shared to
> +		 * sd_asym_cpucapacity for wakeup stat tracking.
> +		 *
> +		 * Caveats:
> +		 *
> +		 * 1) has_asym is system-wide, but a given CPU may still
> +		 *    lack an SD_ASYM_CPUCAPACITY_FULL ancestor (e.g., an
> +		 *    exclusive cpuset carving out a symmetric capacity island).
> +		 *    Such CPUs must fall through to the LLC seeding path below.
> +		 *
> +		 * 2) Skip the asym attach if the asym ancestor is an
> +		 *    overlapping domain (SD_NUMA). On those topologies let the
> +		 *    LLC path own the shared object instead.
> +		 *
> +		 * XXX: This assumes SD_ASYM_CPUCAPACITY_FULL domain
> +		 * always has more than one group else it is prone to
> +		 * degeneration.

I looked into this and we only set SD_ASYM_CPUCAPACITY if we find more
than one capacity, and SD_ASYM_CPUCAPACITY_FULL implies there are at
least two CPUs covering different capacities in the span. The very first
SD_ASYM_CPUCAPACITY_FULL domain should be safe from degeneration when it
is non-overlapping.

> +		 */
> +		sd_asym = sd;
> +		while (sd_asym && !(sd_asym->flags & SD_ASYM_CPUCAPACITY_FULL))
> +			sd_asym = sd_asym->parent;
> +
> +		if (sd_asym && !(sd_asym->flags & SD_NUMA)) {
> +			init_sched_domain_shared(&d, sd_asym);
> +			asym_claimed = true;
> +		}

We should probably guard this behind a "has_asym" check. Maybe even
extract it into a separate helper if the nesting gets too deep.
Thoughts?

> +
>  		/* First, find the topmost SD_SHARE_LLC domain */
> +		sd = *per_cpu_ptr(d.sd, i);

nit. I think this reassignment is no longer required since you use a
separate "sd_asym" variable now.

>  		while (sd->parent && (sd->parent->flags & SD_SHARE_LLC))
>  			sd = sd->parent;
>
>  		if (sd->flags & SD_SHARE_LLC) {
> -			int sd_id = cpumask_first(sched_domain_span(sd));
> -
> -			sd->shared = *per_cpu_ptr(d.sds, sd_id);
> -			atomic_set(&sd->shared->nr_busy_cpus, sd->span_weight);
> -			atomic_inc(&sd->shared->ref);
> +			/*
> +			 * Initialize the sd->shared for SD_SHARE_LLC unless
> +			 * the asym path above already claimed it.
> +			 */
> +			if (!asym_claimed)
> +				init_sched_domain_shared(&d, sd);

Tbh, if "has_asym" is true, we probably don't even need this since the
nr_busy_cpus accounting gets us nothing.
Might save a little overhead and space on those systems but I would love
to hear if there are any concerns if we just drop the sd_llc->shared
when we detect asym capacities.

>
>  			/*
>  			 * In presence of higher domains, adjust the

-- 
Thanks and Regards,
Prateek