Date: Wed, 18 Mar 2026 11:31:06 +0100
From: Andrea Righi
To: Vincent Guittot
Cc: Ingo Molnar, Peter Zijlstra, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Valentin Schneider, Joel Fernandes,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] sched/fair: Prefer fully-idle SMT cores in asym-capacity idle selection
References: <20260318092214.130908-1-arighi@nvidia.com>

Hi Vincent,

On Wed, Mar 18, 2026 at 10:41:15AM +0100, Vincent Guittot wrote:
> On Wed, 18 Mar 2026 at 10:22, Andrea Righi wrote:
> >
> > On systems with asymmetric CPU capacity (e.g., ACPI/CPPC reporting
> > different per-core frequencies), the wakeup path uses
> > select_idle_capacity() and prioritizes idle CPUs with higher capacity
> > for better task placement. However, when those CPUs belong to SMT cores,
>
> Interesting, which kind of system has both SMT and SD_ASYM_CPUCAPACITY
> ? I thought both were never set simultaneously and SD_ASYM_PACKING was
> used for system involving SMT like x86

It's an NVIDIA platform (not publicly available yet), where the firmware
exposes different CPU capacities and has SMT enabled, so both
SD_ASYM_CPUCAPACITY and SMT are present. I'm not sure whether the final
firmware release will keep this exact configuration (there's a good chance
it will), so I'm targeting it to be prepared.

>
> > their effective capacity can be much lower than the nominal capacity
> > when the sibling thread is busy: SMT siblings compete for shared
> > resources, so a "high capacity" CPU that is idle but whose sibling is
> > busy does not deliver its full capacity. This effective capacity
> > reduction cannot be modeled by the static capacity value alone.
> >
> > Introduce SMT awareness in the asym-capacity idle selection policy: when
> > SMT is active prefer fully-idle SMT cores over partially-idle ones. A
> > two-phase selection first tries only CPUs on fully idle cores, then
> > falls back to any idle CPU if none fit.
> >
> > Prioritizing fully-idle SMT cores yields better task placement because
> > the effective capacity of partially-idle SMT cores is reduced; always
> > preferring them when available leads to more accurate capacity usage on
> > task wakeup.
> >
> > On an SMT system with asymmetric CPU capacities, SMT-aware idle
> > selection has been shown to improve throughput by around 15-18% for
> > CPU-bound workloads, running an amount of tasks equal to the amount of
> > SMT cores.
> >
> > Signed-off-by: Andrea Righi
> > ---
> >  kernel/sched/fair.c | 24 +++++++++++++++++++++---
> >  1 file changed, 21 insertions(+), 3 deletions(-)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 0a35a82e47920..0f97c44d4606b 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -7945,9 +7945,13 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool
> >   * Scan the asym_capacity domain for idle CPUs; pick the first idle one on which
> >   * the task fits. If no CPU is big enough, but there are idle ones, try to
> >   * maximize capacity.
> > + *
> > + * When @smt_idle_only is true (asym + SMT), only consider CPUs on cores whose
> > + * SMT siblings are all idle, to avoid stacking and sharing SMT resources.
> >   */
> >  static int
> > -select_idle_capacity(struct task_struct *p, struct sched_domain *sd, int target)
> > +select_idle_capacity(struct task_struct *p, struct sched_domain *sd, int target,
> > +		     bool smt_idle_only)
> >  {
> >  	unsigned long task_util, util_min, util_max, best_cap = 0;
> >  	int fits, best_fits = 0;
> > @@ -7967,6 +7971,9 @@ select_idle_capacity(struct task_struct *p, struct sched_domain *sd, int target)
> >  		if (!choose_idle_cpu(cpu, p))
> >  			continue;
> >
> > +		if (smt_idle_only && !is_core_idle(cpu))
> > +			continue;
> > +
> >  		fits = util_fits_cpu(task_util, util_min, util_max, cpu);
> >
> >  		/* This CPU fits with all requirements */
> > @@ -8102,8 +8109,19 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
> >  		 * capacity path.
> >  		 */
> >  		if (sd) {
> > -			i = select_idle_capacity(p, sd, target);
> > -			return ((unsigned)i < nr_cpumask_bits) ? i : target;
> > +			/*
> > +			 * When asym + SMT and the hint says idle cores exist,
> > +			 * try idle cores first to avoid stacking on SMT; else
> > +			 * scan all idle CPUs.
> > +			 */
> > +			if (sched_smt_active() && test_idle_cores(target)) {
> > +				i = select_idle_capacity(p, sd, target, true);
> > +				if ((unsigned int)i >= nr_cpumask_bits)
> > +					i = select_idle_capacity(p, sd, target, false);
>
> Can't you make it one pass in select_idle_capacity ?

Oh yes, absolutely, we can select the best-fit CPU in the same pass and use
it as a fallback if we can't find any fully-idle SMT CPU. I'll change that.

>
> > +			} else {
> > +				i = select_idle_capacity(p, sd, target, false);
> > +			}
> > +			return ((unsigned int)i < nr_cpumask_bits) ? i : target;
> >  		}
> >  	}
> >
> > --
> > 2.53.0
> >

Thanks,
-Andrea
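
For reference, the one-pass idea discussed above could look roughly like
the user-space sketch below. This is only an illustration, not the kernel
code and not necessarily what the next version of the patch will do. The
struct cpu_state type and pick_idle_cpu_one_pass() are hypothetical names,
and a plain "capacity >= utilization" comparison stands in for
util_fits_cpu(), ignoring uclamp and any fitting margin. The ordering is an
assumption that mirrors the two-phase behavior of the posted patch: an idle
CPU on a fully-idle core wins, and an idle CPU whose sibling is busy is
only remembered as a fallback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-CPU snapshot; the kernel derives this from runqueue state. */
struct cpu_state {
	bool idle;		/* this CPU is idle */
	bool core_idle;		/* all SMT siblings of this CPU are idle too */
	unsigned long cap;	/* capacity of this CPU */
};

/*
 * Single scan over all CPUs (illustrative only):
 *  - return the first idle CPU on a fully-idle core that fits the task;
 *  - otherwise prefer the highest-capacity idle CPU on a fully-idle core;
 *  - otherwise fall back to the first fitting idle CPU with a busy sibling,
 *    or to the highest-capacity one if none fits.
 * Returns -1 when no CPU is idle.
 */
static int pick_idle_cpu_one_pass(const struct cpu_state *cpus, int ncpus,
				  unsigned long task_util)
{
	int best_core = -1, fallback = -1;
	unsigned long best_core_cap = 0, fallback_cap = 0;
	bool fallback_fits = false;

	assert(cpus != NULL || ncpus == 0);

	for (int cpu = 0; cpu < ncpus; cpu++) {
		const struct cpu_state *c = &cpus[cpu];
		bool fits;

		if (!c->idle)
			continue;

		/* Simplified stand-in for util_fits_cpu(). */
		fits = c->cap >= task_util;

		if (c->core_idle) {
			if (fits)
				return cpu;
			if (best_core < 0 || c->cap > best_core_cap) {
				best_core = cpu;
				best_core_cap = c->cap;
			}
			continue;
		}

		/* Sibling is busy: only track this CPU as a fallback. */
		if (fallback_fits)
			continue;
		if (fits || fallback < 0 || c->cap > fallback_cap) {
			fallback = cpu;
			fallback_cap = c->cap;
			fallback_fits = fits;
		}
	}

	return best_core >= 0 ? best_core : fallback;
}
```

With this shape the caller would not need the second select_idle_capacity()
call: the fallback candidate is already known when the scan ends.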