From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Sat, 18 Apr 2026 08:02:16 +0200
From: Andrea Righi
To: Vincent Guittot
Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
	K Prateek Nayak, Christian Loehle, Koba Ko, Felix Abecassis,
	Balbir Singh, Shrikanth Hegde, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/2] sched/fair: Prefer fully-idle SMT cores in asym-capacity idle selection
References: <20260403053654.1559142-1-arighi@nvidia.com>
 <20260403053654.1559142-2-arighi@nvidia.com>
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset=us-ascii

Hi Vincent,

On Fri, Apr 17, 2026 at 11:39:21AM +0200, Vincent Guittot wrote:
> On Fri, 3 Apr 2026 at 07:37, Andrea Righi wrote:
> >
> > On systems with asymmetric CPU capacity (e.g., ACPI/CPPC reporting
> > different per-core frequencies), the wakeup path uses
> > select_idle_capacity() and prioritizes idle CPUs with higher capacity
> > for better task placement.
> >
> > However, when those CPUs belong to SMT cores, their effective capacity
> > can be much lower than the nominal capacity when the sibling thread is
> > busy: SMT siblings compete for shared resources, so a "high capacity"
> > CPU that is idle but whose sibling is busy does not deliver its full
> > capacity. This effective capacity reduction cannot be modeled by the
> > static capacity value alone.
> >
> > When SMT is active, teach asym-capacity idle selection to treat a
> > logical CPU as a weaker target if its physical core is only partially
> > idle: select_idle_capacity() no longer returns on the first idle CPU
> > whose static capacity fits the task when that CPU still has a busy
> > sibling, it keeps scanning for an idle CPU on a fully-idle core and only
> > if none qualify does it fall back to partially-idle cores, using shifted
> > fit scores so fully-idle cores win ties; asym_fits_cpu() applies the
> > same fully-idle core requirement when asym capacity and SMT are both
> > active.
> >
> > This improves task placement, since partially-idle SMT siblings deliver
> > less than their nominal capacity. Favoring fully idle cores, when
> > available, can significantly enhance both throughput and wakeup latency
> > on systems with both SMT and CPU asymmetry.
> >
> > No functional changes on systems with only asymmetric CPUs or only SMT.
> >
> > Cc: K Prateek Nayak
> > Cc: Vincent Guittot
> > Cc: Dietmar Eggemann
> > Cc: Christian Loehle
> > Cc: Koba Ko
> > Reported-by: Felix Abecassis
> > Signed-off-by: Andrea Righi
> > ---
> >  kernel/sched/fair.c | 36 ++++++++++++++++++++++++++++++++----
> >  1 file changed, 32 insertions(+), 4 deletions(-)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index bf948db905ed1..7f09191014d18 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -7774,6 +7774,7 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool
> >  static int
> >  select_idle_capacity(struct task_struct *p, struct sched_domain *sd, int target)
> >  {
> > +	bool prefers_idle_core = sched_smt_active() && test_idle_cores(target);
> >  	unsigned long task_util, util_min, util_max, best_cap = 0;
> >  	int fits, best_fits = 0;
> >  	int cpu, best_cpu = -1;
> > @@ -7787,6 +7788,7 @@ select_idle_capacity(struct task_struct *p, struct sched_domain *sd, int target)
> >  	util_max = uclamp_eff_value(p, UCLAMP_MAX);
> >
> >  	for_each_cpu_wrap(cpu, cpus, target) {
> > +		bool preferred_core = !prefers_idle_core || is_core_idle(cpu);
> >  		unsigned long cpu_cap = capacity_of(cpu);
> >
> >  		if (!available_idle_cpu(cpu) && !sched_idle_cpu(cpu))
> > @@ -7795,7 +7797,7 @@ select_idle_capacity(struct task_struct *p, struct sched_domain *sd, int target)
> >  		fits = util_fits_cpu(task_util, util_min, util_max, cpu);
> >
> >  		/* This CPU fits with all requirements */
> > -		if (fits > 0)
> > +		if (fits > 0 && preferred_core)
> >  			return cpu;
> >  		/*
> >  		 * Only the min performance hint (i.e. uclamp_min) doesn't fit.
> > @@ -7803,9 +7805,30 @@ select_idle_capacity(struct task_struct *p, struct sched_domain *sd, int target)
> >  		 */
> >  		else if (fits < 0)
> >  			cpu_cap = get_actual_cpu_capacity(cpu);
> > +		/*
> > +		 * fits > 0 implies we are not on a preferred core
> > +		 * but the util fits CPU capacity. Set fits to -2 so
> > +		 * the effective range becomes [-2, 0] where:
> > +		 *  0 - does not fit
> > +		 * -1 - fits with the exception of UCLAMP_MIN
> > +		 * -2 - fits with the exception of preferred_core
> > +		 */
> > +		else if (fits > 0)
> > +			fits = -2;
> > +
> > +		/*
> > +		 * If we are on a preferred core, translate the range of fits
> > +		 * of [-1, 0] to [-4, -3]. This ensures that an idle core
> > +		 * is always given priority over (partially) busy core.
> > +		 *
> > +		 * A fully fitting idle core would have returned early and hence
> > +		 * fits > 0 for preferred_core need not be dealt with.
> > +		 */
> > +		if (preferred_core)
> > +			fits -= 3;
> >
> >  		/*
> > -		 * First, select CPU which fits better (-1 being better than 0).
> > +		 * First, select CPU which fits better (lower is more preferred).
> >  		 * Then, select the one with best capacity at same level.
> >  		 */
> >  		if ((fits < best_fits) ||
>
> You have to clear idle_core if you were looking of an idle core but
> didn't find one while looping on CPUs.
>
> You need the following to clear idle core:
>
> @@ -7739,6 +7739,11 @@ select_idle_capacity(struct task_struct *p,
> struct sched_domain *sd, int target)
>                 }
>         }
>
> +       /* The range [-4, -3] implies at least one idle core, the values above
> +        * imply that we didn't find anyone while looping CPUs */
> +       if (prefers_idle_core && fits > -3)
> +               set_idle_cores(target, false);
> +
>         return best_cpu;
>  }

That makes sense! But it should be best_fits instead of fits, right?

Thanks,
-Andrea
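A small userspace model may help check the scoring argument above. It is a sketch under stated assumptions, not kernel code: struct cpu_state, struct pick_result, pick_cpu() and should_clear_idle_cores() are hypothetical names; idleness, is_core_idle() and util_fits_cpu() results are passed in as plain fields; and the capacity tie-break ignores the get_actual_cpu_capacity() adjustment the real loop applies when only uclamp_min does not fit. It mirrors the patched ranking (a fully fitting CPU on a preferred core returns early, a fit on a core with a busy sibling maps to -2, preferred cores shift down by 3) and records both the best score and the score of the last idle CPU scanned. That shows why the post-loop test needs best_fits: fits only holds the last idle CPU's score, and in the quoted code it is declared without an initializer, so it is never assigned at all if no CPU passes the idle check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical userspace model of the candidate ranking in the patched
 * select_idle_capacity(). Not kernel code.
 */
struct cpu_state {
	int cpu;
	bool idle;		/* stands in for available_idle_cpu() || sched_idle_cpu() */
	bool core_idle;		/* stands in for is_core_idle(): all SMT siblings idle */
	int raw_fits;		/* util_fits_cpu(): 1 fits, 0 no, -1 fits except uclamp_min */
	unsigned long cap;	/* capacity used to break ties at equal score */
};

struct pick_result {
	int cpu;		/* selected CPU, or -1 if none */
	int best_fits;		/* best (lowest) score seen */
	int last_fits;		/* score of the last idle CPU scanned */
	bool early_return;	/* fully fitting CPU on a preferred core found */
};

struct pick_result pick_cpu(const struct cpu_state *cs, size_t n,
			    bool prefers_idle_core)
{
	struct pick_result r = { .cpu = -1, .best_fits = 0, .last_fits = 0 };
	unsigned long best_cap = 0;

	for (size_t i = 0; i < n; i++) {
		bool preferred_core = !prefers_idle_core || cs[i].core_idle;
		int fits;

		if (!cs[i].idle)
			continue;

		fits = cs[i].raw_fits;
		if (fits > 0 && preferred_core) {
			/* Fits everything on a preferred core: take it now. */
			r.cpu = cs[i].cpu;
			r.early_return = true;
			return r;
		} else if (fits > 0) {
			/* Fits, but the core still has a busy sibling. */
			fits = -2;
		}

		/* Move preferred cores to [-4, -3] so they always rank first. */
		if (preferred_core)
			fits -= 3;
		assert(fits >= -4 && fits <= 0);
		r.last_fits = fits;

		/* Lower score wins; higher capacity breaks ties. */
		if (fits < r.best_fits ||
		    (fits == r.best_fits && cs[i].cap > best_cap)) {
			r.best_fits = fits;
			best_cap = cs[i].cap;
			r.cpu = cs[i].cpu;
		}
	}
	return r;
}

/*
 * The review's post-loop check, evaluated on best_fits. On an early
 * return the kernel never reaches this code, so it reports false.
 */
bool should_clear_idle_cores(bool prefers_idle_core, struct pick_result r)
{
	return prefers_idle_core && !r.early_return && r.best_fits > -3;
}
```

For example, if cpu0 sits on a fully-idle core but does not fit (score -3) and is scanned before cpu2, which fits but has a busy sibling (score -2), pick_cpu() returns cpu0 with best_fits == -3 and the idle-cores hint is kept, whereas testing the last score (-2 > -3) would clear it.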