From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andrea Righi
To: Tejun Heo, David Vernet, Changwoo Min
Cc: bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH 3/6] sched_ext: idle: Introduce the concept of allowed CPUs
Date: Fri, 7 Mar 2025 21:01:05 +0100
Message-ID: <20250307200502.253867-4-arighi@nvidia.com>
X-Mailer: git-send-email 2.48.1
In-Reply-To: <20250307200502.253867-1-arighi@nvidia.com>
References: <20250307200502.253867-1-arighi@nvidia.com>
MIME-Version: 1.0
Content-Type: text/plain
Content-Transfer-Encoding: 8bit

Many scx schedulers define their own concept of scheduling domains to
represent topology characteristics, such as heterogeneous architectures
(e.g., big.LITTLE, P-cores/E-cores), or to categorize tasks based on
specific properties (e.g., setting the soft-affinity of certain tasks to a
subset of CPUs).

Currently, there is no mechanism to share these domains with the built-in
idle CPU selection policy. As a result, schedulers often implement their
own idle CPU selection policies, which are typically similar to one
another, leading to a lot of code duplication.

To address this, introduce the concept of allowed domain (represented as a
cpumask) that can be used by the BPF schedulers to apply the built-in idle
CPU selection policy to a subset of preferred CPUs.

With this concept the idle CPU selection policy becomes the following:
 - always prioritize CPUs from fully idle SMT cores (if SMT is enabled),
 - select the same CPU if it's idle and in the allowed domain,
 - select an idle CPU within the same LLC domain, if the LLC domain is a
   subset of the allowed domain,
 - select an idle CPU within the same node, if the node domain is a
   subset of the allowed domain,
 - select an idle CPU within the allowed domain.

If the allowed domain is empty or NULL, the behavior of the built-in idle
CPU selection policy remains unchanged.

This only introduces the core concept of allowed domain. This
functionality will be exposed through a dedicated kfunc in a separate
patch.

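For illustration only, the selection order described above can be modeled
in plain userspace C. This is a rough sketch and not the kernel code:
cpumasks are reduced to 64-bit words, SMT handling is left out, and every
name here (mask_t, cpu_bit(), first_cpu(), llc_of(), node_of(),
domain_usable(), pick_idle_cpu()) is hypothetical, as is the fixed
topology of 4 CPUs per LLC and 8 CPUs per node. It assumes, as the diff
does, that a topology domain is considered only when it is a strict subset
of the allowed domain.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct cpumask: one bit per CPU, 64 CPUs max. */
typedef uint64_t mask_t;

static mask_t cpu_bit(int cpu)
{
	return (mask_t)1 << cpu;
}

/* Lowest CPU set in @m, or -1 if @m is empty. */
static int first_cpu(mask_t m)
{
	for (int cpu = 0; cpu < 64; cpu++)
		if (m & cpu_bit(cpu))
			return cpu;
	return -1;
}

/* Made-up topology: 4 CPUs share an LLC, 8 CPUs share a node. */
static mask_t llc_of(int cpu)
{
	return (mask_t)0xf << (cpu / 4 * 4);
}

static mask_t node_of(int cpu)
{
	return (mask_t)0xff << (cpu / 8 * 8);
}

/* A domain is considered only if it is a strict subset of @allowed. */
static int domain_usable(mask_t dom, mask_t allowed)
{
	return dom != allowed && (dom & ~allowed) == 0;
}

/*
 * Simplified selection order: prev_cpu, same LLC, same node, then any
 * idle CPU in the allowed domain. Returns -1 when nothing is found.
 */
int pick_idle_cpu(mask_t idle, mask_t affinity, mask_t cpus_allowed,
		  int prev_cpu)
{
	mask_t allowed = affinity & cpus_allowed;
	int cpu;

	assert(prev_cpu >= 0 && prev_cpu < 64);

	/* Empty intersection: the patch returns -EBUSY in this case. */
	if (!allowed)
		return -1;

	/* If prev_cpu is outside the allowed domain, restart from inside it. */
	if (!(allowed & cpu_bit(prev_cpu)))
		prev_cpu = first_cpu(allowed);

	if (idle & cpu_bit(prev_cpu))
		return prev_cpu;

	if (domain_usable(llc_of(prev_cpu), allowed)) {
		cpu = first_cpu(idle & llc_of(prev_cpu));
		if (cpu >= 0)
			return cpu;
	}

	if (domain_usable(node_of(prev_cpu), allowed)) {
		cpu = first_cpu(idle & node_of(prev_cpu));
		if (cpu >= 0)
			return cpu;
	}

	return first_cpu(idle & allowed);
}
```

In this model, with CPUs 0, 1, 4 and 5 idle, an allowed domain of CPUs
2-5 and prev_cpu = 2, the LLC of CPU 2 (CPUs 0-3) is not a subset of the
allowed domain, so idle CPUs 0 and 1 are skipped and CPU 4 is picked.
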
Signed-off-by: Andrea Righi
---
 kernel/sched/ext.c      |   2 +-
 kernel/sched/ext_idle.c | 128 +++++++++++++++++++++++++++++-----------
 kernel/sched/ext_idle.h |   3 +-
 3 files changed, 97 insertions(+), 36 deletions(-)

diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 8c9f36baf7dfd..1e9414ffeff01 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -3395,7 +3395,7 @@ static int select_task_rq_scx(struct task_struct *p, int prev_cpu, int wake_flag
 	} else {
 		s32 cpu;
 
-		cpu = scx_select_cpu_dfl(p, prev_cpu, wake_flags, 0);
+		cpu = scx_select_cpu_dfl(p, p->cpus_ptr, prev_cpu, wake_flags, 0);
 		if (cpu >= 0) {
 			p->scx.slice = SCX_SLICE_DFL;
 			p->scx.ddsp_dsq_id = SCX_DSQ_LOCAL;
diff --git a/kernel/sched/ext_idle.c b/kernel/sched/ext_idle.c
index 4f8a6e46a37a4..9469bf41fd571 100644
--- a/kernel/sched/ext_idle.c
+++ b/kernel/sched/ext_idle.c
@@ -46,6 +46,11 @@ static struct scx_idle_cpus scx_idle_global_masks;
  */
 static struct scx_idle_cpus **scx_idle_node_masks;
 
+/*
+ * Local per-CPU cpumasks (used to generate temporary idle cpumasks).
+ */
+static DEFINE_PER_CPU(cpumask_var_t, local_idle_cpumask);
+
 /*
  * Return the idle masks associated to a target @node.
  *
@@ -391,6 +396,21 @@ void scx_idle_update_selcpu_topology(struct sched_ext_ops *ops)
 		static_branch_disable_cpuslocked(&scx_selcpu_topo_numa);
 }
 
+static const struct cpumask *
+task_allowed_cpumask(const struct task_struct *p, const struct cpumask *cpus_allowed, s32 prev_cpu)
+{
+	struct cpumask *allowed;
+
+	if (cpus_allowed == p->cpus_ptr || p->nr_cpus_allowed >= num_possible_cpus())
+		return cpus_allowed;
+
+	allowed = this_cpu_cpumask_var_ptr(local_idle_cpumask);
+	if (!cpumask_and(allowed, p->cpus_ptr, cpus_allowed))
+		return NULL;
+
+	return allowed;
+}
+
 /*
  * Built-in CPU idle selection policy:
  *
@@ -403,50 +423,83 @@ void scx_idle_update_selcpu_topology(struct sched_ext_ops *ops)
  *     branch prediction optimizations.
  *
  * 3. Pick a CPU within the same LLC (Last-Level Cache):
- *   - if the above conditions aren't met, pick a CPU that shares the same LLC
- *     to maintain cache locality.
+ *   - if the above conditions aren't met, pick a CPU that shares the same
+ *     LLC, if the LLC domain is a subset of @cpus_allowed, to maintain
+ *     cache locality.
  *
  * 4. Pick a CPU within the same NUMA node, if enabled:
- *   - choose a CPU from the same NUMA node to reduce memory access latency.
+ *   - choose a CPU from the same NUMA node, if the node domain is a subset
+ *     of @cpus_allowed, to reduce memory access latency.
+ *
+ * 5. Pick any idle CPU within the @cpus_allowed domain.
  *
- * 5. Pick any idle CPU usable by the task.
+ * If @cpus_allowed is NULL, the task is allowed to run on any CPU.
  *
  * Step 3 and 4 are performed only if the system has, respectively, multiple
  * LLC domains / multiple NUMA nodes (see scx_selcpu_topo_llc and
- * scx_selcpu_topo_numa).
+ * scx_selcpu_topo_numa) and their domains don't overlap.
+ *
+ * If %SCX_OPS_BUILTIN_IDLE_PER_NODE is enabled, the search will always
+ * begin in @prev_cpu's node and proceed to other nodes in order of
+ * increasing distance.
+ *
+ * Return the picked CPU if idle, or a negative value otherwise.
  *
  * NOTE: tasks that can only run on 1 CPU are excluded by this logic, because
  * we never call ops.select_cpu() for them, see select_task_rq().
  */
-s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu, u64 wake_flags, u64 flags)
+s32 scx_select_cpu_dfl(struct task_struct *p, const struct cpumask *cpus_allowed,
+		       s32 prev_cpu, u64 wake_flags, u64 flags)
 {
-	const struct cpumask *llc_cpus = NULL;
-	const struct cpumask *numa_cpus = NULL;
-	int node = scx_cpu_node_if_enabled(prev_cpu);
+	const struct cpumask *llc_cpus = NULL, *numa_cpus = NULL;
+	const struct cpumask *allowed;
+	int node;
 	s32 cpu;
 
+	preempt_disable();
+
+	/*
+	 * Determine the allowed scheduling domain of the task.
+	 */
+	allowed = task_allowed_cpumask(p, cpus_allowed, prev_cpu);
+	if (!allowed) {
+		cpu = -EBUSY;
+		goto out_enable;
+	}
+
+	/*
+	 * If @prev_cpu is not in the allowed domain, try to assign a new
+	 * arbitrary CPU in the allowed domain.
+	 */
+	if (!cpumask_test_cpu(prev_cpu, allowed)) {
+		cpu = cpumask_any_and_distribute(p->cpus_ptr, allowed);
+		if (cpu < nr_cpu_ids)
+			prev_cpu = cpu;
+	}
+	node = scx_cpu_node_if_enabled(prev_cpu);
+
 	/*
 	 * This is necessary to protect llc_cpus.
 	 */
 	rcu_read_lock();
 
 	/*
-	 * Determine the scheduling domain only if the task is allowed to run
-	 * on all CPUs.
-	 *
-	 * This is done primarily for efficiency, as it avoids the overhead of
-	 * updating a cpumask every time we need to select an idle CPU (which
-	 * can be costly in large SMP systems), but it also aligns logically:
-	 * if a task's scheduling domain is restricted by user-space (through
-	 * CPU affinity), the task will simply use the flat scheduling domain
-	 * defined by user-space.
+	 * Consider node/LLC scheduling domains only if the allowed cpumask
+	 * contains all the CPUs of each particular domain and if the
+	 * domains don't overlap.
 	 */
-	if (p->nr_cpus_allowed >= num_possible_cpus()) {
-		if (static_branch_maybe(CONFIG_NUMA, &scx_selcpu_topo_numa))
-			numa_cpus = numa_span(prev_cpu);
+	if (static_branch_maybe(CONFIG_NUMA, &scx_selcpu_topo_numa)) {
+		const struct cpumask *cpus = numa_span(prev_cpu);
+
+		if (cpus && !cpumask_equal(cpus, allowed) && cpumask_subset(cpus, allowed))
+			numa_cpus = cpus;
+	}
+
+	if (static_branch_maybe(CONFIG_SCHED_MC, &scx_selcpu_topo_llc)) {
+		const struct cpumask *cpus = llc_span(prev_cpu);
 
-		if (static_branch_maybe(CONFIG_SCHED_MC, &scx_selcpu_topo_llc))
-			llc_cpus = llc_span(prev_cpu);
+		if (cpus && !cpumask_equal(cpus, allowed) && cpumask_subset(cpus, allowed))
+			llc_cpus = cpus;
 	}
 
 	/*
@@ -484,7 +537,7 @@ s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu, u64 wake_flags, u64
 		    cpu_rq(cpu)->scx.local_dsq.nr == 0 &&
 		    (!(flags & SCX_PICK_IDLE_IN_NODE) || (waker_node == node)) &&
 		    !cpumask_empty(idle_cpumask(waker_node)->cpu)) {
-			if (cpumask_test_cpu(cpu, p->cpus_ptr))
+			if (cpumask_test_cpu(cpu, allowed))
 				goto out_unlock;
 		}
 	}
@@ -529,7 +582,7 @@ s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu, u64 wake_flags, u64
 		 * begin in prev_cpu's node and proceed to other nodes in
 		 * order of increasing distance.
 		 */
-		cpu = scx_pick_idle_cpu(p->cpus_ptr, node, flags | SCX_PICK_IDLE_CORE);
+		cpu = scx_pick_idle_cpu(allowed, node, flags | SCX_PICK_IDLE_CORE);
 		if (cpu >= 0)
 			goto out_unlock;
 
@@ -577,12 +630,14 @@ s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu, u64 wake_flags, u64
 	 * in prev_cpu's node and proceed to other nodes in order of
 	 * increasing distance.
 	 */
-	cpu = scx_pick_idle_cpu(p->cpus_ptr, node, flags);
+	cpu = scx_pick_idle_cpu(allowed, node, flags);
 	if (cpu >= 0)
 		goto out_unlock;
 
 out_unlock:
 	rcu_read_unlock();
+out_enable:
+	preempt_enable();
 
 	return cpu;
 }
@@ -592,7 +647,7 @@ s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu, u64 wake_flags, u64
  */
 void scx_idle_init_masks(void)
 {
-	int node;
+	int i;
 
 	/* Allocate global idle cpumasks */
 	BUG_ON(!alloc_cpumask_var(&scx_idle_global_masks.cpu, GFP_KERNEL));
@@ -603,14 +658,19 @@ void scx_idle_init_masks(void)
 				      sizeof(*scx_idle_node_masks), GFP_KERNEL);
 	BUG_ON(!scx_idle_node_masks);
 
-	for_each_node(node) {
-		scx_idle_node_masks[node] = kzalloc_node(sizeof(**scx_idle_node_masks),
-							 GFP_KERNEL, node);
-		BUG_ON(!scx_idle_node_masks[node]);
+	for_each_node(i) {
+		scx_idle_node_masks[i] = kzalloc_node(sizeof(**scx_idle_node_masks),
+						      GFP_KERNEL, i);
+		BUG_ON(!scx_idle_node_masks[i]);
 
-		BUG_ON(!alloc_cpumask_var_node(&scx_idle_node_masks[node]->cpu, GFP_KERNEL, node));
-		BUG_ON(!alloc_cpumask_var_node(&scx_idle_node_masks[node]->smt, GFP_KERNEL, node));
+		BUG_ON(!alloc_cpumask_var_node(&scx_idle_node_masks[i]->cpu, GFP_KERNEL, i));
+		BUG_ON(!alloc_cpumask_var_node(&scx_idle_node_masks[i]->smt, GFP_KERNEL, i));
 	}
+
+	/* Allocate local per-cpu idle cpumasks */
+	for_each_possible_cpu(i)
+		BUG_ON(!alloc_cpumask_var_node(&per_cpu(local_idle_cpumask, i),
+					       GFP_KERNEL, cpu_to_node(i)));
 }
 
 static void update_builtin_idle(int cpu, bool idle)
@@ -825,7 +885,7 @@ __bpf_kfunc s32 scx_bpf_select_cpu_dfl(struct task_struct *p, s32 prev_cpu,
 		goto prev_cpu;
 
 #ifdef CONFIG_SMP
-	cpu = scx_select_cpu_dfl(p, prev_cpu, wake_flags, 0);
+	cpu = scx_select_cpu_dfl(p, p->cpus_ptr, prev_cpu, wake_flags, 0);
 	if (cpu >= 0) {
 		*is_idle = true;
 		return cpu;
diff --git a/kernel/sched/ext_idle.h b/kernel/sched/ext_idle.h
index 511cc2221f7a8..977f49905f2c7 100644
--- a/kernel/sched/ext_idle.h
+++ b/kernel/sched/ext_idle.h
@@ -27,7 +27,8 @@ static inline s32 scx_pick_idle_cpu(const struct cpumask *cpus_allowed, int node
 }
 #endif /* CONFIG_SMP */
 
-s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu, u64 wake_flags, u64 flags);
+s32 scx_select_cpu_dfl(struct task_struct *p, const struct cpumask *cpus_allowed,
+		       s32 prev_cpu, u64 wake_flags, u64 flags);
 void scx_idle_enable(struct sched_ext_ops *ops);
 void scx_idle_disable(void);
 int scx_idle_init(void);
-- 
2.48.1