From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 18 Dec 2024 11:21:30 +0100
From: Andrea Righi
To: Tejun Heo
Cc: David Vernet, Changwoo Min, Yury Norov, Ingo Molnar, Peter Zijlstra,
 Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
 Mel Gorman, Valentin Schneider, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 3/6] sched_ext: Introduce per-node idle cpumasks
Message-ID:
References: <20241217094156.577262-1-arighi@nvidia.com>
 <20241217094156.577262-4-arighi@nvidia.com>
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Hi Tejun,

On Tue, Dec 17, 2024 at 01:22:26PM -1000, Tejun Heo wrote:
> On Tue, Dec 17, 2024 at 10:32:28AM +0100, Andrea Righi wrote:
> > +static int validate_node(int node)
> > +{
> > +	/* If no node is specified, return the current one */
> > +	if (node == NUMA_NO_NODE)
> > +		return numa_node_id();
> > +
> > +	/* Make sure node is in the range of possible nodes */
> > +	if (node < 0 || node >= num_possible_nodes())
> > +		return -EINVAL;
>
> Are node IDs guaranteed to be consecutive? Shouldn't it be `node >=
> nr_node_ids`? Also, should probably add node_possible(node)?

Or even better, add node_online(node): an offline NUMA node shouldn't be
used in this context.

> > +/*
> > + * cpumasks to track idle CPUs within each NUMA node.
> > + *
> > + * If SCX_OPS_BUILTIN_IDLE_PER_NODE is not specified, a single flat
> > + * cpumask from node 0 is used to track all idle CPUs system-wide.
> > + */
> > +static struct idle_cpumask **idle_masks CL_ALIGNED_IF_ONSTACK;
>
> As the masks are allocated separately anyway, the aligned attribute can
> be dropped. There's no reason to align the index array.

Right.

> > +static struct cpumask *get_idle_mask_node(int node, bool smt)
> > +{
> > +	if (!static_branch_maybe(CONFIG_NUMA, &scx_builtin_idle_per_node))
> > +		return smt ? idle_masks[0]->smt : idle_masks[0]->cpu;
> > +
> > +	node = validate_node(node);
>
> It's odd to validate input node in an internal function.
> If node is being passed from BPF side, we should validate it and
> trigger scx_ops_error() if invalid, but once the node number is inside
> the kernel, we should be able to trust it.

Makes sense, I'll move the validation into the kfuncs and trigger
scx_ops_error() if the validation fails.

> > +static struct cpumask *get_idle_cpumask_node(int node)
> > +{
> > +	return get_idle_mask_node(node, false);
>
> Maybe make the inner function return `struct idle_cpumasks *` so that
> the caller can pick between cpu and smt?

Ok.

> > +static void idle_masks_init(void)
> > +{
> > +	int node;
> > +
> > +	idle_masks = kcalloc(num_possible_nodes(), sizeof(*idle_masks), GFP_KERNEL);
>
> We probably want to use a variable name which is more qualified for a
> global variable - scx_idle_masks?

Ok.

> > @@ -3173,6 +3245,9 @@ bool scx_prio_less(const struct task_struct *a, const struct task_struct *b,
> >
> >  static bool test_and_clear_cpu_idle(int cpu)
> >  {
> > +	int node = cpu_to_node(cpu);
> > +	struct cpumask *idle_cpu = get_idle_cpumask_node(node);
>
> Can we use plurals for cpumask variables - idle_cpus here?

Ok.

> > -static s32 scx_pick_idle_cpu(const struct cpumask *cpus_allowed, u64 flags)
> > +static s32 scx_pick_idle_cpu_from_node(int node, const struct cpumask *cpus_allowed, u64 flags)
>
> Do we need "from_node"?
>
> >  {
> >  	int cpu;
> >
> >  retry:
> >  	if (sched_smt_active()) {
> > -		cpu = cpumask_any_and_distribute(idle_masks.smt, cpus_allowed);
> > +		cpu = cpumask_any_and_distribute(get_idle_smtmask_node(node), cpus_allowed);
>
> This too, would s/get_idle_smtmask_node(node)/idle_smtmask(node)/ work?
> There are no node-unaware counterparts to these functions, right?

Correct, we can just get rid of the _from_node() part.
> > +static s32
> > +scx_pick_idle_cpu_numa(const struct cpumask *cpus_allowed, s32 prev_cpu, u64 flags)
> > +{
> > +	nodemask_t hop_nodes = NODE_MASK_NONE;
> > +	int start_node = cpu_to_node(prev_cpu);
> > +	s32 cpu = -EBUSY;
> > +
> > +	/*
> > +	 * Traverse all online nodes in order of increasing distance,
> > +	 * starting from prev_cpu's node.
> > +	 */
> > +	rcu_read_lock();
>
> Is rcu_read_lock() necessary? Does lockdep warn if the explicit
> rcu_read_lock() is dropped?

Good point, the other for_each_numa_hop_mask() iterator requires it, but
only to access the cpumasks via rcu_dereference(). Since we are iterating
node IDs, I think we can get rid of rcu_read_lock/unlock() here. I'll
double-check whether lockdep complains without it.

> > @@ -3643,17 +3776,33 @@ static void set_cpus_allowed_scx(struct task_struct *p,
> >
> >  static void reset_idle_masks(void)
> >  {
> > +	int node;
> > +
> > +	if (!static_branch_maybe(CONFIG_NUMA, &scx_builtin_idle_per_node)) {
> > +		cpumask_copy(get_idle_cpumask_node(0), cpu_online_mask);
> > +		cpumask_copy(get_idle_smtmask_node(0), cpu_online_mask);
> > +		return;
> > +	}
> > +
> >  	/*
> >  	 * Consider all online cpus idle. Should converge to the actual state
> >  	 * quickly.
> >  	 */
> > -	cpumask_copy(idle_masks.cpu, cpu_online_mask);
> > -	cpumask_copy(idle_masks.smt, cpu_online_mask);
> > +	for_each_node_state(node, N_POSSIBLE) {
> > +		const struct cpumask *node_mask = cpumask_of_node(node);
> > +		struct cpumask *idle_cpu = get_idle_cpumask_node(node);
> > +		struct cpumask *idle_smt = get_idle_smtmask_node(node);
> > +
> > +		cpumask_and(idle_cpu, cpu_online_mask, node_mask);
> > +		cpumask_copy(idle_smt, idle_cpu);
>
> Can you do the same cpumask_and() here? I don't think it'll cause
> practical problems, but idle_cpus can be updated in between and e.g. we
> can end up with idle_smts that have different idle states between
> siblings.
Makes sense, the state should still converge to the right one in any
case, but I agree that it's more accurate to use cpumask_and() for
idle_smt as well. Will change that.

> >  /**
> >   * scx_bpf_get_idle_cpumask - Get a referenced kptr to the idle-tracking
> > - * per-CPU cpumask.
> > + * per-CPU cpumask of the current NUMA node.
>
> This is a bit misleading as it can be system-wide too.
>
> It's a bit confusing for scx_bpf_get_idle_cpu/smtmask() to return a
> per-node mask while scx_bpf_pick_idle_cpu() and friends are not scoped
> to the node. Also, scx_bpf_pick_idle_cpu() picking the local node as the
> origin probably doesn't make sense for most use cases as it's usually
> called from ops.select_cpu() and the waker won't necessarily run on the
> same node as the wakee.
>
> Maybe disallow scx_bpf_get_idle_cpu/smtmask() if idle_per_node is
> enabled and add scx_bpf_get_idle_cpu/smtmask_node()? Ditto for
> scx_bpf_pick_idle_cpu(), and we can add a PICK_IDLE flag to
> allow/inhibit CPUs outside the specified node.

Yeah, I also don't like much the idea of implicitly using the current
node when SCX_OPS_BUILTIN_IDLE_PER_NODE is enabled.

I think it's totally reasonable to disallow the system-wide
scx_bpf_get_idle_cpu/smtmask() when the flag is enabled. Ultimately, it's
the scheduler's responsibility to enable or disable this feature, and if
it's enabled, the scheduler is expected to implement NUMA-aware logic.

I'm also fine with adding SCX_PICK_IDLE_NODE (or similar) to restrict the
search for an idle CPU to the specified node.

Thanks!
-Andrea