Date: Sat, 28 Mar 2026 23:50:09 +0100
From: Andrea Righi
To: Balbir Singh
Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider, Christian Loehle, Koba Ko, Felix Abecassis, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 0/4] sched/fair: SMT-aware asymmetric CPU capacity
References: <20260326151211.1862600-1-arighi@nvidia.com>

Hi Balbir,

On Sun, Mar 29, 2026 at 12:03:19AM +1100, Balbir Singh wrote:
> On 3/27/26 02:02, Andrea Righi wrote:
> > This series attempts to improve SD_ASYM_CPUCAPACITY scheduling by
> > introducing SMT awareness.
> >
> > = Problem =
> >
> > Nominal per-logical-CPU capacity can overstate usable compute when an SMT
> > sibling is busy, because the physical core doesn't deliver its full nominal
> > capacity. So, several SD_ASYM_CPUCAPACITY paths may pick high capacity CPUs
> > that are not actually good destinations.
> >
> > = Proposed Solution =
> >
> > This patch set aligns those paths with a simple rule already used
> > elsewhere: when SMT is active, prefer fully idle cores and avoid treating
> > partially idle SMT siblings as full-capacity targets where that would
> > mislead load balance.
>
> In kernel/sched/topology.c
>
> 	/* Don't attempt to spread across CPUs of different capacities. */
> 	if ((sd->flags & SD_ASYM_CPUCAPACITY) && sd->child)
> 		sd->child->flags &= ~SD_PREFER_SIBLING;
>
> Should handle the selection, but I guess this does not work for SMT level sd's?

IIUC, SD_PREFER_SIBLING steers load balance toward sibling_imbalance()
(spread runnables across child/sibling domains); it doesn't encode the
"fully idle core first" logic. In practice it doesn't give us SMT-aware
destination choice when a sibling is busy, and this series is trying to
cover that gap in the placement path.

BTW, on Vera the hierarchy is SMT -> MC -> NUMA:

root@localhost:~# grep . /sys/kernel/debug/sched/domains/cpu0/domain*/flags
/sys/kernel/debug/sched/domains/cpu0/domain0/flags:SD_BALANCE_NEWIDLE SD_BALANCE_EXEC SD_BALANCE_FORK SD_WAKE_AFFINE SD_SHARE_CPUCAPACITY SD_SHARE_LLC SD_PREFER_SIBLING
/sys/kernel/debug/sched/domains/cpu0/domain1/flags:SD_BALANCE_NEWIDLE SD_BALANCE_EXEC SD_BALANCE_FORK SD_WAKE_AFFINE SD_ASYM_CPUCAPACITY SD_SHARE_LLC
/sys/kernel/debug/sched/domains/cpu0/domain2/flags:SD_BALANCE_NEWIDLE SD_ASYM_CPUCAPACITY SD_ASYM_CPUCAPACITY_FULL SD_SERIALIZE SD_NUMA

And domain1/groups_flags (child / SMT flags on the sched groups used at the
MC level) still has SD_PREFER_SIBLING together with SD_SHARE_CPUCAPACITY.

root@localhost:~# cat /sys/kernel/debug/sched/domains/cpu0/domain1/groups_flags
SD_BALANCE_NEWIDLE SD_BALANCE_EXEC SD_BALANCE_FORK SD_WAKE_AFFINE SD_SHARE_CPUCAPACITY SD_SHARE_LLC SD_PREFER_SIBLING

So, prefer-sibling is still in play for SMT (including via MC groups_flags).
On machines where asymmetry attaches immediately above SMT, topology may
strip that flag and reduce this branch of behavior, but explicit SMT-aware
placement still matters.

> >
> > Patch set summary:
> >
> > - [PATCH 1/4] sched/fair: Prefer fully-idle SMT cores in asym-capacity idle selection
> >
> >   Prefer fully-idle SMT cores in asym-capacity idle selection. In the
> >   wakeup fast path, extend select_idle_capacity() / asym_fits_cpu() so
> >   idle selection can prefer CPUs on fully idle cores, with a safe fallback.
> >
> > - [PATCH 2/4] sched/fair: Reject misfit pulls onto busy SMT siblings on asym-capacity
> >
> >   Reject misfit pulls onto busy SMT siblings on SD_ASYM_CPUCAPACITY.
> >   Provided for consistency with PATCH 1/4.
> >
> > - [PATCH 3/4] sched/fair: Enable EAS with SMT on SD_ASYM_CPUCAPACITY systems
> >
> >   Enable EAS with SD_ASYM_CPUCAPACITY and SMT. Also provided for
> >   consistency with PATCH 1/4. I've also tested with/without
> >   /proc/sys/kernel/sched_energy_aware enabled (same platform) and haven't
> >   noticed any regression.
> >
> > - [PATCH 4/4] sched/fair: Prefer fully-idle SMT core for NOHZ idle load balancer
> >
> >   When choosing the housekeeping CPU that runs the idle load balancer,
> >   prefer an idle CPU on a fully idle core so migrated work lands where
> >   effective capacity is available.
> >
> >   The change is still consistent with the same "avoid CPUs with busy
> >   sibling" logic and it shows some benefits on Vera, but could have
> >   negative impact on other systems, I'm including it for completeness
> >   (feedback is appreciated).
> >
> > This patch set has been tested on the new NVIDIA Vera Rubin platform, where
> > SMT is enabled and the firmware exposes small frequency variations (+/-~5%)
> > as differences in CPU capacity, resulting in SD_ASYM_CPUCAPACITY being set.
> >
>
> Are you referring to nominal_freq?
>

Correct.

Thanks,
-Andrea
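
To make the "prefer fully idle cores, with a safe fallback" rule from the
cover letter concrete, here is a minimal userspace sketch. It is only a toy
model and not the code from the series: core_of[], idle[], core_fully_idle()
and pick_idle_cpu() are hypothetical names, and capacity fitting, cpumasks
and locking are all left out.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Illustrative userspace model, not kernel code.
 * core_of[cpu] is the physical core id of each logical CPU,
 * idle[cpu] says whether that logical CPU is currently idle.
 */

/* True if every logical CPU sharing cpu's core is idle. */
static bool core_fully_idle(int cpu, const int *core_of, const bool *idle,
			    int nr_cpus)
{
	for (int i = 0; i < nr_cpus; i++) {
		if (core_of[i] == core_of[cpu] && !idle[i])
			return false;
	}
	return true;
}

/*
 * Return an idle CPU, preferring one whose SMT siblings are all idle.
 * Falls back to the first idle CPU that has a busy sibling, or returns
 * -1 if no CPU is idle at all.
 */
static int pick_idle_cpu(const int *core_of, const bool *idle, int nr_cpus)
{
	int fallback = -1;

	assert(core_of && idle);

	for (int cpu = 0; cpu < nr_cpus; cpu++) {
		if (!idle[cpu])
			continue;
		if (core_fully_idle(cpu, core_of, idle, nr_cpus))
			return cpu;
		if (fallback < 0)
			fallback = cpu;
	}
	return fallback;
}
```

With two cores of two SMT threads each (core_of = {0, 0, 1, 1}) and only
CPU 0 busy, the sketch skips idle CPU 1, whose sibling is busy, and returns
CPU 2. If CPU 3 is busy as well, no fully idle core is left and it falls
back to CPU 1.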