From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from NAM11-BN8-obe.outbound.protection.outlook.com (mail-bn8nam11on2081.outbound.protection.outlook.com [40.107.236.81]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 80F804D5AB; Tue, 24 Dec 2024 09:33:03 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=fail smtp.client-ip=40.107.236.81 ARC-Seal:i=2; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1735032785; cv=fail; b=MPz7BMVysLVsa0N2ykTdl5pm+7Oqstu8adL8OwzbPxotoFfiKi4Q+BqV+p2n3jn5RhB0M6iu8tufSBKoxzz4Sx2vKs8q9tQCkZCLVwPNNKFeJH7hkJL+kxUwqbDDXgeboaVZkZzZPAxrSD/BufMjBP4XNt/hSF76PDJP2XXygRA= ARC-Message-Signature:i=2; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1735032785; c=relaxed/simple; bh=DmNzJdIgkUKxue+0eF22FkseZdE1hVHZo8DVrawD+vM=; h=Date:From:To:Cc:Subject:Message-ID:References:Content-Type: Content-Disposition:In-Reply-To:MIME-Version; b=awAMTrObVsINznEG7uvufqKGuDm5W1v29SSEK2PiIjwNkXlx88fkw2I0DLsx14Px2WoNu2srBjA+WpAoMNSPjTk69Lmc61aW1skaLW9wHI93rfYMq7wfsr4S2Me4i7gCNujDqc0ouQe6Cg9k75BP9Sg4jsTsyH9oya3jc5gSLsc= ARC-Authentication-Results:i=2; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=nvidia.com; spf=fail smtp.mailfrom=nvidia.com; dkim=pass (2048-bit key) header.d=Nvidia.com header.i=@Nvidia.com header.b=OepVQPOY; arc=fail smtp.client-ip=40.107.236.81 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=nvidia.com Authentication-Results: smtp.subspace.kernel.org; spf=fail smtp.mailfrom=nvidia.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=Nvidia.com header.i=@Nvidia.com header.b="OepVQPOY" ARC-Seal: i=1; a=rsa-sha256; s=arcselector10001; d=microsoft.com; cv=none; b=j7ULfVXEQco9XoQ1DcHWbmpKAJvZlHjTUHuowwPFYRJ4hZcDxnz3z2+ezUY844FFmNdE+YEZXt8ct2DUVe4JgjX5nomTTNj4XbDf8BBXO+KTyFM5tnQRu8DJ6v8YuU21UFOJL3/Mc7LOt3KbSR8E1RyyhY4pyASkL9ufpGwdQ+DwJUo2eut8ZGHOLuvwCQW8UcjvRLaRD+XsbWmP35LJ34IwG7/ch56V9pfrNSRivjzvccehk39kYLzttWQrVegvLk1ZnBSztHU9U1P6iLeGG6Lm5keX9lWPx+4ldFMS3v5IcquXJ4V2/3J5ltaYHIMKUz6TOmcUr0REhDKK5wcfwg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector10001; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=ym6g/8DUCt+roUVnNxFHKFwJaK+Rd+umVxTyGDovhg8=; b=bx5/1HtrhVIZu8+ipoRqgyHBdffJbPUYQlQ6UPjYR3GYwxNY3gWP3I1AKkGu2WjIeet57DHHi2HuQ8WmSNB8Z1gu8bLGu7I1nU3/mWgdNhxsIJAo1aWkPZ2iZnldygE2oHIuKeOakbazbbX3zY3ByVBQmh8MV/QJ9rX92DFs6uHu4V5al5RVZjUWs1cpVVQHxoCnxfrs10IxH8Pln4aGhfhs5JWvyIkbSZnu/ty6l/bCmzpkN5k+JnGtGX8q/a1z/kPEkFoq4Fui9ttt6L/wrEgvcqdeL/UQsC5OtEhMA7eARIf1cTxtLSiqKnMu2/rAH/oPw/mfRypxKrBS1oOj6w== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=nvidia.com; dmarc=pass action=none header.from=nvidia.com; dkim=pass header.d=nvidia.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=Nvidia.com; s=selector2; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=ym6g/8DUCt+roUVnNxFHKFwJaK+Rd+umVxTyGDovhg8=; b=OepVQPOYkqO2Y55rPv/i3JTTKPd9fSE0yU9lLJz87uKoWBJajQ6tSSAY6tlWyCquSSRlRh9C57rCR0e0OThypgKFOJvpwzx+trTQexl0JKFe37/U4CkZ13T8TiJ9RU67b2aTKRwGZhz+Sdd3wYWrvc/x8hfQw+8uyCwFLfHPp8Nh+MhzrRQk5ueBpMv0rb5c/zxFVfKOTHFdXpCyjRkoywoDtfqx7CqF8BPqVqk7CVa3PraYGD2k+K+43j9CnR07HjL1ha/7FMn677OPDASsdHMJ0fjhO5RRj2ynHVUjeao1NeM7jdFCc0eHKGi36qQAQBcJnwrR80rTQakHRNQSYQ== Authentication-Results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=nvidia.com; Received: from CY5PR12MB6405.namprd12.prod.outlook.com (2603:10b6:930:3e::17) by PH7PR12MB5853.namprd12.prod.outlook.com (2603:10b6:510:1d4::15) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.8272.19; Tue, 24 Dec 2024 09:32:55 +0000 Received: from CY5PR12MB6405.namprd12.prod.outlook.com ([fe80::2119:c96c:b455:53b5]) by CY5PR12MB6405.namprd12.prod.outlook.com ([fe80::2119:c96c:b455:53b5%3]) with mapi id 15.20.8272.013; Tue, 24 Dec 2024 09:32:54 +0000 Date: Tue, 24 Dec 2024 10:32:48 +0100 From: Andrea Righi To: Yury Norov Cc: Tejun Heo , David Vernet , Changwoo Min , Ingo Molnar , Peter Zijlstra , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman , Valentin Schneider , bpf@vger.kernel.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH 10/10] sched_ext: idle: Introduce NUMA aware idle cpu kfunc helpers Message-ID: References: <20241220154107.287478-1-arighi@nvidia.com> <20241220154107.287478-11-arighi@nvidia.com> Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-ClientProxiedBy: FR2P281CA0012.DEUP281.PROD.OUTLOOK.COM (2603:10a6:d10:a::22) To CY5PR12MB6405.namprd12.prod.outlook.com (2603:10b6:930:3e::17) Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-MS-PublicTrafficType: Email X-MS-TrafficTypeDiagnostic: CY5PR12MB6405:EE_|PH7PR12MB5853:EE_ X-MS-Office365-Filtering-Correlation-Id: e7223d6c-c386-4c1c-9b21-08dd23fdf196 X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0;ARA:13230040|376014|7416014|1800799024|366016; X-Microsoft-Antispam-Message-Info: =?us-ascii?Q?Gutg8SRkiPNcwCIEilIUz4fvDCZfXiDmBwxDCZRNklBiXbSCi/j+ELprOBGC?= =?us-ascii?Q?6G6ccrGepR4AXPqaIItvWFBoO5RN2137FMFF2d2EHyiphcVB+Fb6dDB4xG7y?= =?us-ascii?Q?udocRSwQvCCn27ipKtKtGnv4ZV5va7LI9cGmD5crtFrUtLqDyVJji59gv4o5?= =?us-ascii?Q?CDdhohWUPuhZJi4ZwHphS8MNyOZsHAA3e09TuoWkep/K+P6N78sz9iZpXBT9?= =?us-ascii?Q?UUXQwm0jb7DCrTW19I0Qc+8KEtKGnGBbdh1pfaVcBeRVt46U2WmnloVJAX1G?= =?us-ascii?Q?lYgxeXHehKItUi/fHw1K/g1teSCvHinPPNCL5G2P9uvDd2vk96WmMPyAEj6d?= =?us-ascii?Q?M/EiExWkwoyid7jUdBb20e8gH8KlFskaqfK8S3QxWrT7Kenp6ut5DFknPszx?= =?us-ascii?Q?qOUEEwJ03Ww25tGG+0/ApOZTqIkDe22aL+BBplPRACuQ/77uRz/VJSHZQPVe?= =?us-ascii?Q?+HdnZvIEbbjZ8T9Eck8AA0h1nJ1jLH/VZm1myWMwI+55qUFSL6UyZI7g0yMw?= =?us-ascii?Q?iclrmXLJ19auT7iXMf3HxSagurYSkA4iF89d2xOeevkrJo9iKRNUQn+iLAuM?= =?us-ascii?Q?3q1n2OFb/1/De7ZbYluSm4FcBsNZWjcuBc4bF8QL41hLzCn/mKIF4zItlEMQ?= =?us-ascii?Q?voLsooecA6wXk/p7+0deckjitdT+P2T6Gq4jqDjbubF0lYXQi+Kh2+rO6T5c?= =?us-ascii?Q?hh8DSpz55Qk6eA2d1L08+72Kugwm9kFyrD3+Qls2azgeurb5fsVmE6KJUSlW?= =?us-ascii?Q?ty6YltNqNa3T0ahsaZMyqlSfgTuIpG1s+HIHmKLIAzfgRsiLN1/inPK+48tP?= =?us-ascii?Q?GoK5nH2w6uLCRI770tFObueCqNcovmH48kbJVKkk8UMNmWlTdBmg7btdlXs0?= =?us-ascii?Q?lG+nGgiCoWEKCqMm95sB22jYi2jHuPemYgcLOGARDWKuEqsVMqtb2pKkSpk1?= =?us-ascii?Q?TgK9fuTb8hwGtpbQ17b/U8m/aoQpDtb7YeuSUKzjJY6NK6ps197HR9hlzDcy?= =?us-ascii?Q?wOATdb3ntxA6tJzbIqRIS9yrBOQ4Di/WL7WHCzR2K2pjPJhhD2K0VENGpZG6?= =?us-ascii?Q?axnOudyzosC7qqnOi31CDeUb22q7cidaWe7EbsCv3jGSvCW+PfeZp/VIQBpK?= =?us-ascii?Q?YM3mfbP7s33QNwvRtYMFb0OPDz5zTXW3HIMLWSz39xt0GTufDXsYYNw4aXzb?= =?us-ascii?Q?3dnN7di1qy9dunMi67s2qyRRZLz8pyRxVlyGczAhn2aazk3kux5t59HOUmvV?= =?us-ascii?Q?jsLFzQh++wWDvQ1puYbsZNv4O7gxAV4gjfvTYFEidlVdPZG58jzuYu5hb1TG?= =?us-ascii?Q?7UklpLQBbHvJEiS5S2lDqteD6BX5WQY2+5gtPWTMRY34ni2f9ST8twQxT6VQ?= =?us-ascii?Q?D0ozKCRzSqqEtQQiQk9v3FtFRt99?= X-Forefront-Antispam-Report: CIP:255.255.255.255;CTRY:;LANG:en;SCL:1;SRV:;IPV:NLI;SFV:NSPM;H:CY5PR12MB6405.namprd12.prod.outlook.com;PTR:;CAT:NONE;SFS:(13230040)(376014)(7416014)(1800799024)(366016);DIR:OUT;SFP:1101; X-MS-Exchange-AntiSpam-MessageData-ChunkCount: 1 X-MS-Exchange-AntiSpam-MessageData-0: =?us-ascii?Q?zjrcWFLNxnqDl2qfXZs9xIeEKESsNsY2HVzMEprFNxfz6+rguoya8ao43Avv?= =?us-ascii?Q?OApgH39HeiRStMIFYOiwz9cGpKVKIOdj7rOa6y/FKMbQGoqMhgaMS9NhiwbM?= =?us-ascii?Q?3LtCIc2HEa06DulRG7CAjedOgEWQhdhNK3whvp+6XzpG7sOUiZpJo4RP1Nuq?= =?us-ascii?Q?jYY7X23vYvPoQoiU9LL/g9HOD+ToH+HqohQ+N9jMlB0QZ0gX02yq8GAeolJg?= =?us-ascii?Q?yymcMdjtiXDtUwUjbRhacOFxYSKs6fwk5Pjql0zYdR11xHC6tD5zRBqZ18lg?= =?us-ascii?Q?U/+QQDTTOTMwsuu+6cbq1esyolfxkPzGZo3MIR1YchT2TQPH6dEdPtnl5/FK?= =?us-ascii?Q?YHEZKFNAfK0wzUqZlHufx5Mb4S43unT/teejfaq+eRNzCIcxI++stWJITi1t?= =?us-ascii?Q?6oE2xUQK1dOTjGNNNemOlIDS5rujt/vDz9YrXtgeZuxN7jDg4HxvTC3OHNrP?= =?us-ascii?Q?1ZB0e/4B6Cagr3RwJjBAaZHcpWTr85cMR1tDipRUdGTAWpdLAWruALNau/CS?= =?us-ascii?Q?HNrWU/I7m3Gbjfz2k+8MHxsOMwy2mBG3OENwSHgKmeup2fBHn7XZP1G8UQns?= =?us-ascii?Q?HStjKNd7gM90gEyQ1I2HfbATSddYaz/QOZ9Hbnk/u3jWyfyxLtVDNsfBswXw?= =?us-ascii?Q?f//gpFrh5pfVMvKlai2e9WGi1fCgrJoObs1yw57h9kfU5TyIeAB2N2LnhtSS?= =?us-ascii?Q?1wghPA8Fc27QWdJDXz8x6e0xce20aFvSAs/Esc7Mvs7r9vBO/QO3N54dNVmL?= =?us-ascii?Q?yFVff8CfN9EfLoVOCbeBzK1wZzUwIZRw01ZPl32cF+/cIA8xAAT58Iq/KOcX?= =?us-ascii?Q?s1Afwpie9RV9y7SOHyzE3vvrUwb3nGXKEM1hexYjDiO/rsZ52TwTTnSuOueW?= =?us-ascii?Q?ULofHmluKnXt2RAUjGyzro6QNaMoTS4TwDKc20i/dHP+bxvY4psEoxcnn2Ad?= =?us-ascii?Q?b36pJMUOuEx33kmRzl1oB0ETvkV/9ndaEWpQYSmRPWPsgef/wj4ggw4cDzmd?= =?us-ascii?Q?zPCZvjCoKuE9iEYv5TA58JdW4IXixmDF7+yD+KYhFSVlxJoeLyA6nvXuOy2h?= =?us-ascii?Q?M8DManWWNI3x7XsID9/cKwc2C8PmF923DzXEUpjeZZJn2Ykkq2uSJuvJrQ5d?= =?us-ascii?Q?/tSr6YRChOabP5WAcchu/u/99gcAjD8iFF39SNiaDCyXY7NtG6GXGACMFnad?= =?us-ascii?Q?BB27l2eKEhQ7QAeryX2TUkTfkvfYZ199uVtn6Z4wG0HNDtuMQ+sIP5uQH+al?= =?us-ascii?Q?wyH1vSOfWcIRw3DmZQjyTkxskJUZPzvVJMMTIBGCNg86O64+HWZgug/g/+Mr?= =?us-ascii?Q?LgjEqPtMyuzJpzrZiXR/8lIIQgLds9w11LCZCluXZ2fsbWwp3ab2h+FmcIIX?= =?us-ascii?Q?TveHSDyzHrJNZG8Y0eJjs2X0Qof7eueRji/DTECW/glMKTPlbFZTGKZdTmko?= =?us-ascii?Q?hhGmzAHYeBPK67RUE1QqRP3ab2vNDNBa6Vi2dgoqF0IHJDVkp9G8/ATLmxhP?= =?us-ascii?Q?Gr8GuknXT1vj6DTazpUQ5imIgk+kO1qkVOtG+ODmEktLozmFt66moJjt5HbL?= =?us-ascii?Q?hXiPQqvu489JCVyvslXdw9DLCIdWjghv+GFnwDqO?= X-OriginatorOrg: Nvidia.com X-MS-Exchange-CrossTenant-Network-Message-Id: e7223d6c-c386-4c1c-9b21-08dd23fdf196 X-MS-Exchange-CrossTenant-AuthSource: CY5PR12MB6405.namprd12.prod.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Internal X-MS-Exchange-CrossTenant-OriginalArrivalTime: 24 Dec 2024 09:32:54.5742 (UTC) X-MS-Exchange-CrossTenant-FromEntityHeader: Hosted X-MS-Exchange-CrossTenant-Id: 43083d15-7273-40c1-b7db-39efd9ccc17a X-MS-Exchange-CrossTenant-MailboxType: HOSTED X-MS-Exchange-CrossTenant-UserPrincipalName: 6xDbrI/s3vLaThiyIB0NDMlBBQz50CNVLGLf+Co/CDwtOT2G+Mhe3ADhVByNMunN3GTUaytxtBzR4jeKUVFBig== X-MS-Exchange-Transport-CrossTenantHeadersStamped: PH7PR12MB5853 On Mon, Dec 23, 2024 at 04:57:36PM -0800, Yury Norov wrote: > On Fri, Dec 20, 2024 at 04:11:42PM +0100, Andrea Righi wrote: > > Add the following kfunc's to provide scx schedulers direct access to > > per-node idle cpumasks information: > > > > const struct cpumask *scx_bpf_get_idle_cpumask_node(int node) > > const struct cpumask *scx_bpf_get_idle_smtmask_node(int node) > > s32 scx_bpf_pick_idle_cpu_node(const cpumask_t *cpus_allowed, > > int node, u64 flags) > > int scx_bpf_cpu_to_node(s32 cpu) > > > > Signed-off-by: Andrea Righi > > --- > > kernel/sched/ext_idle.c | 163 ++++++++++++++++++++--- > > tools/sched_ext/include/scx/common.bpf.h | 4 + > > tools/sched_ext/include/scx/compat.bpf.h | 19 +++ > > 3 files changed, 170 insertions(+), 16 deletions(-) > > > > diff --git a/kernel/sched/ext_idle.c b/kernel/sched/ext_idle.c > > index b36e93da1b75..0f8ccc1e290e 100644 > > --- a/kernel/sched/ext_idle.c > > +++ b/kernel/sched/ext_idle.c > > @@ -28,6 +28,60 @@ static bool check_builtin_idle_enabled(void) > > return false; > > } > > > > +static bool check_builtin_idle_per_node_enabled(void) > > +{ > > + if (static_branch_likely(&scx_builtin_idle_per_node)) > > + return true; > > return 0; > > > + > > + scx_ops_error("per-node idle tracking is disabled"); > > + return false; > > return -ENOTSUP; Ok. > > > +} > > + > > +/* > > + * Validate and resolve a NUMA node. > > + * > > + * Return the resolved node ID on success or a negative value otherwise. > > + */ > > +static int validate_node(int node) > > +{ > > + if (!check_builtin_idle_per_node_enabled()) > > + return -EINVAL; > > So the node may be valid, but this validator may fail. EINVAL is a > misleading error code for that. You need ENOTSUP. Ok. > > > + > > + /* If no node is specified, use the current one */ > > + if (node == NUMA_NO_NODE) > > + return numa_node_id(); > > + > > + /* Make sure node is in a valid range */ > > + if (node < 0 || node >= nr_node_ids) { > > + scx_ops_error("invalid node %d", node); > > + return -ENOENT; > > No such file or directory? Hmm... > > This should be EINVAL. I would join this one with node_possible() > check. We probably need bpf_node_possible() or something... Ok about EINVAL. About bpf_node_possible() I'm not sure, it'd be convenient to have a kfunc for the BPF code to validate a node, but then we may also need to introduce bpf_node_online(), or even bpf_node_state(), ...? This can be probably addressed in a separate patch. > > > + } > > + > > + /* Make sure the node is part of the set of possible nodes */ > > + if (!node_possible(node)) { > > + scx_ops_error("unavailable node %d", node); > > Not that it's unavailable. It just doesn't exist... I'd say: > > scx_ops_error("Non-existing node %d. The existing nodes are: %pbl", > node, nodemask_pr_args(node_states[N_POSSIBLE])); > > > + return -EINVAL; > > + } > > What if user provides offline or cpu-less nodes? Is that a normal usage? > If not, it would be nice to print warning, or even return an error... I think we're returning -EBUSY in this case, which might be a reasonable error already. Triggering an scx_ops_error() seems a bit too aggressive. > > > + > > + return node; > > +} > > + > > +/* > > + * Return the node id associated to a target idle CPU (used to determine > > + * the proper idle cpumask). > > + */ > > +static int idle_cpu_to_node(int cpu) > > +{ > > + int node; > > + > > + if (static_branch_maybe(CONFIG_NUMA, &scx_builtin_idle_per_node)) > > + node = cpu_to_node(cpu); > > + else > > + node = NUMA_FLAT_NODE; > > + > > + return node; > > +} > > + > > #ifdef CONFIG_SMP > > struct idle_cpumask { > > cpumask_var_t cpu; > > @@ -83,22 +137,6 @@ static void idle_masks_init(void) > > > > static DEFINE_STATIC_KEY_FALSE(scx_selcpu_topo_llc); > > > > -/* > > - * Return the node id associated to a target idle CPU (used to determine > > - * the proper idle cpumask). > > - */ > > -static int idle_cpu_to_node(int cpu) > > -{ > > - int node; > > - > > - if (static_branch_maybe(CONFIG_NUMA, &scx_builtin_idle_per_node)) > > - node = cpu_to_node(cpu); > > - else > > - node = NUMA_FLAT_NODE; > > - > > - return node; > > -} > > - > > static bool test_and_clear_cpu_idle(int cpu) > > { > > int node = idle_cpu_to_node(cpu); > > @@ -613,6 +651,17 @@ static void reset_idle_masks(void) {} > > */ > > __bpf_kfunc_start_defs(); > > > > +/** > > + * scx_bpf_cpu_to_node - Return the NUMA node the given @cpu belongs to > > + */ > > +__bpf_kfunc int scx_bpf_cpu_to_node(s32 cpu) > > +{ > > + if (cpu < 0 || cpu >= nr_cpu_ids) > > + return -EINVAL; > > + > > + return idle_cpu_to_node(cpu); > > +} > > + > > /** > > * scx_bpf_select_cpu_dfl - The default implementation of ops.select_cpu() > > * @p: task_struct to select a CPU for > > @@ -645,6 +694,28 @@ __bpf_kfunc s32 scx_bpf_select_cpu_dfl(struct task_struct *p, s32 prev_cpu, > > return prev_cpu; > > } > > > > +/** > > + * scx_bpf_get_idle_cpumask_node - Get a referenced kptr to the idle-tracking > > + * per-CPU cpumask of a target NUMA node. > > + * > > + * NUMA_NO_NODE is interpreted as the current node. > > + * > > + * Returns an empty cpumask if idle tracking is not enabled, if @node is not > > + * valid, or running on a UP kernel. > > + */ > > +__bpf_kfunc const struct cpumask *scx_bpf_get_idle_cpumask_node(int node) > > +{ > > + node = validate_node(node); > > + if (node < 0) > > + return cpu_none_mask; > > I think I commented this in v7. This simply hides an error. You need to > return ERR_PTR(node). And your user should check it with IS_ERR_VALUE(). > > This should be consistent with scx_bpf_pick_idle_cpu_node(), where you > return an actual error. I think I changed it... somewhere, but it looks like I missed this part. :) Will change this as well, thanks! > > > + > > +#ifdef CONFIG_SMP > > + return get_idle_cpumask(node); > > +#else > > + return cpu_none_mask; > > +#endif > > +} > > + > > /** > > * scx_bpf_get_idle_cpumask - Get a referenced kptr to the idle-tracking > > * per-CPU cpumask. > > @@ -664,6 +735,32 @@ __bpf_kfunc const struct cpumask *scx_bpf_get_idle_cpumask(void) > > return get_idle_cpumask(NUMA_FLAT_NODE); > > } > > > > +/** > > + * scx_bpf_get_idle_smtmask_node - Get a referenced kptr to the idle-tracking, > > + * per-physical-core cpumask of a target NUMA node. Can be used to determine > > + * if an entire physical core is free. > > If it goes to DOCs, it should have parameters section. Ok. > > > + * > > + * NUMA_NO_NODE is interpreted as the current node. > > + * > > + * Returns an empty cpumask if idle tracking is not enabled, if @node is not > > + * valid, or running on a UP kernel. > > + */ > > +__bpf_kfunc const struct cpumask *scx_bpf_get_idle_smtmask_node(int node) > > +{ > > + node = validate_node(node); > > + if (node < 0) > > + return cpu_none_mask; > > + > > +#ifdef CONFIG_SMP > > + if (sched_smt_active()) > > + return get_idle_smtmask(node); > > + else > > + return get_idle_cpumask(node); > > +#else > > + return cpu_none_mask; > > +#endif > > +} > > + > > /** > > * scx_bpf_get_idle_smtmask - Get a referenced kptr to the idle-tracking, > > * per-physical-core cpumask. Can be used to determine if an entire physical > > @@ -722,6 +819,36 @@ __bpf_kfunc bool scx_bpf_test_and_clear_cpu_idle(s32 cpu) > > return false; > > } > > > > +/** > > + * scx_bpf_pick_idle_cpu_node - Pick and claim an idle cpu from a NUMA node > > + * @cpus_allowed: Allowed cpumask > > + * @node: target NUMA node > > + * @flags: %SCX_PICK_IDLE_CPU_* flags > > + * > > + * Pick and claim an idle cpu in @cpus_allowed from the NUMA node @node. > > + * Returns the picked idle cpu number on success. -%EBUSY if no matching cpu > > + * was found. > > validate_node() returns more errors. > > > + * > > + * If @node is NUMA_NO_NODE, the search is restricted to the current NUMA > > + * node. Otherwise, the search starts from @node and proceeds to other > > + * online NUMA nodes in order of increasing distance (unless > > + * SCX_PICK_IDLE_NODE is specified, in which case the search is limited to > > + * the target @node). > > Can you reorder statements, like: > > Restricted to current node if NUMA_NO_NODE. > Restricted to @node if SCX_PICK_IDLE_NODE is specified > Otherwise ... > > What if NUMA_NO_NODE + SCX_PICK_IDLE_NODE? Seems to be OK, but looks > redundant and non-intuitive. Why not if NUMA_NO_NODE provided, start > from current node, but not restrict with it? The more I think about NUMA_NO_NODE behavior, the more I'm convinved we should just return -EBUSY (or a similar error). Implicitly assuming NUMA_NO_NODE == current node seems a bit confusing in some cases. Moreover, BPF already has the bpf_get_numa_node_id() helper, so there's no reason to introduce this NUMA_NO_NODE == current node assumption. > > > + * > > + * Unavailable if ops.update_idle() is implemented and > > + * %SCX_OPS_KEEP_BUILTIN_IDLE is not set or if %SCX_OPS_KEEP_BUILTIN_IDLE is > > + * not set. > > + */ > > +__bpf_kfunc s32 scx_bpf_pick_idle_cpu_node(const struct cpumask *cpus_allowed, > > + int node, u64 flags) > > +{ > > + node = validate_node(node); > > Hold on! This validate_node() replaces NO_NODE with current node but > doesn't touch flags. It means that scx_pick_idle_cpu() will never see > NO_NODE, and will not be able to restrict to current node. The comment > above is incorrect, right? Yes, the comment is incorrect, the logic here was to simply replace NUMA_NO_NODE with current node, the restriction is only determined by SCX_PICK_IDLE_NODE. However, as mentioned above, I think we should just get rid of this NO_NODE == current node assumption, this is yet another place where it adds unnecessary complexity and it makes the code harder to follow. Thanks, -Andrea