From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Sat, 28 Mar 2026 17:42:28 -0400
From: Yury Norov
To: Valentin Schneider
Cc: Peter Zijlstra, Ingo Molnar, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	linux-kernel@vger.kernel.org, Yury Norov
Subject: Re: [PATCH] sched/topology: optimize sched_numa_find_nth_cpu()
References: <20260319172607.926280-1-ynorov@nvidia.com>
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
MIME-Version: 1.0
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Mar 26, 2026 at 08:45:25PM +0100, Valentin Schneider wrote:
> On 26/03/26 20:09, Valentin Schneider wrote:
> > On 19/03/26 13:26, Yury Norov wrote:
> >> The binary search callback hop_cmp() uses cpumask_weight_and() on each
> >> iteration. Switch it to cpumask_nth_and() as it returns earlier, as
> >> soon as the required number of CPUs is found.
> >>
> >> Signed-off-by: Yury Norov
> >
> > Woopsie, forget about the empty reply.
> >
> > Took me a little while to get back on track with how
> > sched_numa_find_nth_cpu() works. Doesn't help that it and the
> > cpumask_nth*() family have a @cpu parameter when it isn't a CPU but an
> > offset from a search start (i.e. an index, as coined for cpumask_local_spread()).
> >
> > Comments would help, I'll try to come up with something tomorrow if I wake
> > up from whatever's brewing in my lungs :(
>
> I figured I'd give it a try before the fever kicks in:
> ---
> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> index 32dcddaead82d..7069179d5ee0c 100644
> --- a/kernel/sched/topology.c
> +++ b/kernel/sched/topology.c
> @@ -2267,6 +2267,7 @@ static int hop_cmp(const void *a, const void *b)
>  	struct cpumask **prev_hop, **cur_hop = *(struct cpumask ***)b;
>  	struct __cmp_key *k = (struct __cmp_key *)a;
>
> +	/* Not enough CPUs reachable in that many hops */
>  	if (cpumask_weight_and(k->cpus, cur_hop[k->node]) <= k->cpu)
>  		return 1;
>
> @@ -2275,6 +2276,10 @@ static int hop_cmp(const void *a, const void *b)
>  		return 0;
>  	}
>
> +	/*
> +	 * cur_hop spans enough CPUs to return an nth one, if the immediately
> +	 * preceding hop doesn't then we're done.
> +	 */
>  	prev_hop = *((struct cpumask ***)b - 1);
>  	k->w = cpumask_weight_and(k->cpus, prev_hop[k->node]);
>  	if (k->w <= k->cpu)

I'm not a fan of comments explaining how the code works, especially if
they pollute the code itself. Instead of increasing the function's
vertical length, can you add a top comment like:

	Comparator for the binary search in sched_numa_find_nth_cpu().

	Returns a positive value if the given hop doesn't contain enough CPUs.
	Returns zero if the current hop is the minimal hop containing enough CPUs.
	Returns a negative value if the current hop contains enough CPUs but is
	not the minimal such hop.

If you think the comparator is too complicated, it's always better to
add self-explaining helpers:

	if (not_enough_hops())
		return 1;

	if (just_enough_hops())
		return 0;

	/* Otherwise too many hops */
	return -1;

> @@ -2284,16 +2289,16 @@ static int hop_cmp(const void *a, const void *b)
>  }
>
>  /**
> - * sched_numa_find_nth_cpu() - given the NUMA topology, find the Nth closest CPU
> - *                             from @cpus to @cpu, taking into account distance
> - *                             from a given @node.
> + * sched_numa_find_nth_cpu() - given the NUMA topology, find the @nth_cpu in
> + *                             @cpus reachable from @node in the least amount
> + *                             of hops.
>   * @cpus: cpumask to find a cpu from
> - * @cpu: CPU to start searching
> - * @node: NUMA node to order CPUs by distance
> + * @nth_cpu: CPU offset to search for
> + * @node: NUMA node to start the search from
>   *
>   * Return: cpu, or nr_cpu_ids when nothing found.
>   */
> -int sched_numa_find_nth_cpu(const struct cpumask *cpus, int cpu, int node)
> +int sched_numa_find_nth_cpu(const struct cpumask *cpus, int nth_cpu, int node)

If you think that 'cpu' is confusing, then 'nth_cpu' would be even more
confusing: you're searching for the nth_cpu, and you pass it as a
parameter.

Let's name it just num or idx, with the reasoning:
1. This is not a CPU (contrary to cpumask_next(cpu), for example); and
2. This is just an index in a NUMA-based distance enumeration.

And the description should reflect that, like:

	@idx: index of a CPU to find in a node-based distance CPU enumeration

>  {
>  	struct __cmp_key k = { .cpus = cpus, .cpu = cpu };
>  	struct cpumask ***hop_masks;
> @@ -2315,8 +2320,21 @@ int sched_numa_find_nth_cpu(const struct cpumask *cpus, int cpu, int node)
>  	hop_masks = bsearch(&k, k.masks, sched_domains_numa_levels, sizeof(k.masks[0]), hop_cmp);
>  	if (!hop_masks)
>  		goto unlock;
> +	/*
> +	 * bsearch returned sched_domains_numa_masks[hop], with @hop being the
> +	 * smallest amount of hops it takes to reach an @nth_cpu from @node.
> +	 */

Please, no kernel-doc syntax outside of kernel-doc blocks.

>  	hop = hop_masks - k.masks;
>
> +	/*
> +	 * @hop is constructed by hop_cmp() such that sched_domains_numa_masks[hop][node]

hop_cmp() doesn't construct anything; it's a comparator for bsearch(),
which searches for the nearest hop containing at least N CPUs.

> +	 * spans enough CPUs to return an @nth_cpu, and sched_domains_numa_masks[hop-1][node]
> +	 * doesn't.
> +	 *
> +	 * Get a cpumask without the CPUs from sched_domains_numa_masks[hop-1][node]
> +	 * subtract how many CPUs that contains (@k.w), and fetch our @nth_cpu from
> +	 * the resulting mask.

It explains only the hop != 0 case. To me, this sentence confuses more
than it explains. The code below is quite self-explanatory.

> +	 */
>  	ret = hop ?
>  		cpumask_nth_and_andnot(cpu - k.w, cpus, k.masks[hop][node], k.masks[hop-1][node]) :
>  		cpumask_nth_and(cpu, cpus, k.masks[0][node]);
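
For illustration only, the comparator contract discussed in this thread
can be modeled in userspace roughly as below. This is a sketch under
stated assumptions, not the kernel implementation: a 64-bit word stands
in for a cpumask, hops[] holds the CPUs reachable from one fixed node
within 0, 1, 2, ... hops (so each mask contains the previous one), and
the names struct key, weight(), nth_bit(), hop_cmp_model() and
find_nth_model() are all hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

/* Userspace model only: one bit per CPU, at most 64 CPUs. */
struct key {
	uint64_t cpus;		/* candidate CPUs */
	int idx;		/* zero-based index in the distance enumeration */
	const uint64_t *hops;	/* base of the per-hop mask array */
	int w;			/* candidates already covered by the previous hop */
};

/* Count the set bits in m. */
static int weight(uint64_t m)
{
	int n = 0;

	for (; m; m &= m - 1)
		n++;
	return n;
}

/* Position of the n-th (zero-based) set bit in m, or 64 if there is none. */
static int nth_bit(uint64_t m, int n)
{
	for (int bit = 0; bit < 64; bit++)
		if (((m >> bit) & 1) && n-- == 0)
			return bit;
	return 64;
}

/*
 * Positive: this hop doesn't contain enough CPUs.
 * Zero:     this hop is the minimal hop containing enough CPUs.
 * Negative: the previous hop already contains enough CPUs.
 */
static int hop_cmp_model(const void *a, const void *b)
{
	struct key *k = (struct key *)a;
	const uint64_t *cur = b;

	if (weight(k->cpus & *cur) <= k->idx)
		return 1;

	if (cur == k->hops) {
		k->w = 0;
		return 0;
	}

	k->w = weight(k->cpus & cur[-1]);
	if (k->w <= k->idx)
		return 0;

	return -1;
}

/* Returns the CPU number found, or 64 when idx is out of range. */
static int find_nth_model(uint64_t cpus, int idx,
			  const uint64_t *hops, size_t nr_hops)
{
	struct key k = { .cpus = cpus, .idx = idx, .hops = hops };
	const uint64_t *hit;

	hit = bsearch(&k, hops, nr_hops, sizeof(*hops), hop_cmp_model);
	if (!hit)
		return 64;

	/* Hop 0 has no previous hop to exclude. */
	if (hit == hops)
		return nth_bit(cpus & hops[0], idx);

	/* Skip the k.w CPUs that the previous hop already accounts for. */
	return nth_bit(cpus & *hit & ~hit[-1], idx - k.w);
}
```

With hops[] = { 0x30, 0x3C, 0xFF } (CPUs 4-5 local, 2-5 within one hop,
0-7 within two) and all eight CPUs as candidates, indexes 0..7 enumerate
CPUs 4, 5, 2, 3, 0, 1, 6, 7 in that order, and index 8 finds nothing.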