From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Sat, 28 Mar 2026 23:50:09 +0100
From: Andrea Righi <arighi@nvidia.com>
To: Balbir Singh <balbirs@nvidia.com>
Cc: Ingo Molnar <mingo@redhat.com>, Peter Zijlstra <peterz@infradead.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
	Valentin Schneider <vschneid@redhat.com>,
	Christian Loehle <christian.loehle@arm.com>,
	Koba Ko <kobak@nvidia.com>, Felix Abecassis <fabecassis@nvidia.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 0/4] sched/fair: SMT-aware asymmetric CPU capacity
Message-ID: <achbIaTOR9V5LW2F@gpd4>
References: <20260326151211.1862600-1-arighi@nvidia.com>
 <a3bce886-b4bb-4f5e-af04-930934fef50d@nvidia.com>
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <a3bce886-b4bb-4f5e-af04-930934fef50d@nvidia.com>
X-ClientProxiedBy: MI1P293CA0015.ITAP293.PROD.OUTLOOK.COM
 (2603:10a6:290:2::14) To LV8PR12MB9620.namprd12.prod.outlook.com
 (2603:10b6:408:2a1::19)
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0

Hi Balbir,

On Sun, Mar 29, 2026 at 12:03:19AM +1100, Balbir Singh wrote:
> On 3/27/26 02:02, Andrea Righi wrote:
> > This series attempts to improve SD_ASYM_CPUCAPACITY scheduling by
> > introducing SMT awareness.
> > 
> > = Problem =
> > 
> > Nominal per-logical-CPU capacity can overstate usable compute when an SMT
> > sibling is busy, because the physical core doesn't deliver its full nominal
> > capacity. So, several SD_ASYM_CPUCAPACITY paths may pick high capacity CPUs
> > that are not actually good destinations.
> > 
> > = Proposed Solution =
> > 
> > This patch set aligns those paths with a simple rule already used
> > elsewhere: when SMT is active, prefer fully idle cores and avoid treating
> > partially idle SMT siblings as full-capacity targets where that would
> > mislead load balance.
> 
> In kernel/sched/topology.c
> 
> 	/* Don't attempt to spread across CPUs of different capacities. */
> 	if ((sd->flags & SD_ASYM_CPUCAPACITY) && sd->child)
> 		sd->child->flags &= ~SD_PREFER_SIBLING;
> 
> Should handle the selection, but I guess this does not work for SMT level sd's?

IIUC, SD_PREFER_SIBLING steers load balance toward sibling_imbalance()
(spreading runnable tasks across child/sibling domains), but it doesn't
encode the fully-idle-core-first logic. In practice it doesn't give us an
SMT-aware destination choice when a sibling is busy, and this series is
trying to cover that gap in the placement path.
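
To make the distinction concrete, here is a minimal userspace sketch of
what "fully-idle core first" means as a selection rule. It is not kernel
code and not the implementation in this series: struct toy_cpu,
toy_core_fully_idle() and toy_pick_idle_cpu() are hypothetical names used
only for illustration, and capacity fitting is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model: one entry per logical CPU. Not kernel code. */
struct toy_cpu {
	int core_id;	/* physical core this SMT thread belongs to */
	bool idle;	/* true if this logical CPU is idle */
};

/* A core is fully idle when every logical CPU sharing core_id is idle. */
static bool toy_core_fully_idle(const struct toy_cpu *cpus, size_t n,
				int core_id)
{
	for (size_t i = 0; i < n; i++)
		if (cpus[i].core_id == core_id && !cpus[i].idle)
			return false;
	return true;
}

/*
 * Pick a destination: the first idle CPU on a fully idle core if one
 * exists, otherwise the first idle CPU (its sibling is busy), otherwise
 * -1 when no CPU is idle.
 */
static int toy_pick_idle_cpu(const struct toy_cpu *cpus, size_t n)
{
	int fallback = -1;

	assert(cpus != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (!cpus[i].idle)
			continue;
		if (toy_core_fully_idle(cpus, n, cpus[i].core_id))
			return (int)i;
		if (fallback < 0)
			fallback = (int)i;
	}
	return fallback;
}
```

With CPUs 0/1 on core 0 and CPUs 2/3 on core 1, and only CPU 1 busy, the
sketch picks CPU 2 rather than CPU 0, even though CPU 0 is idle. Spreading
load across sibling groups does not express that preference by itself.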

BTW, on Vera the hierarchy is SMT -> MC -> NUMA:

root@localhost:~# grep . /sys/kernel/debug/sched/domains/cpu0/domain*/flags
/sys/kernel/debug/sched/domains/cpu0/domain0/flags:SD_BALANCE_NEWIDLE SD_BALANCE_EXEC SD_BALANCE_FORK SD_WAKE_AFFINE SD_SHARE_CPUCAPACITY SD_SHARE_LLC SD_PREFER_SIBLING
/sys/kernel/debug/sched/domains/cpu0/domain1/flags:SD_BALANCE_NEWIDLE SD_BALANCE_EXEC SD_BALANCE_FORK SD_WAKE_AFFINE SD_ASYM_CPUCAPACITY SD_SHARE_LLC
/sys/kernel/debug/sched/domains/cpu0/domain2/flags:SD_BALANCE_NEWIDLE SD_ASYM_CPUCAPACITY SD_ASYM_CPUCAPACITY_FULL SD_SERIALIZE SD_NUMA

And domain1/groups_flags (the child/SMT flags on the sched groups used at
the MC level) still has SD_PREFER_SIBLING together with SD_SHARE_CPUCAPACITY:

root@localhost:~# cat /sys/kernel/debug/sched/domains/cpu0/domain1/groups_flags
SD_BALANCE_NEWIDLE SD_BALANCE_EXEC SD_BALANCE_FORK SD_WAKE_AFFINE SD_SHARE_CPUCAPACITY SD_SHARE_LLC SD_PREFER_SIBLING

So, prefer-sibling is still in play for SMT (including via the MC
groups_flags). On machines where the asymmetry attaches immediately above
SMT, the topology code may strip that flag and reduce this behavior, but
explicit SMT-aware placement still matters.
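
If anyone wants to check these flags programmatically on other machines,
a small helper along these lines may be useful. This is only a sketch:
sd_flags_has() is a hypothetical name, and it assumes the debugfs flags
text has already been read into a string. It matches whole tokens, since
a plain substring search would also find SD_ASYM_CPUCAPACITY inside
SD_ASYM_CPUCAPACITY_FULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* True if c separates two flag names (or ends the string). */
static bool sd_flags_is_sep(char c)
{
	return c == '\0' || c == ' ' || c == '\t' || c == '\n';
}

/*
 * Return true if 'name' appears in 'flags' as a whole
 * whitespace-separated token, not merely as part of a longer name.
 */
static bool sd_flags_has(const char *flags, const char *name)
{
	size_t len;
	const char *p;

	assert(flags != NULL && name != NULL);
	len = strlen(name);
	if (len == 0)
		return false;

	/* Visit every substring match and keep only whole-token ones. */
	for (p = flags; (p = strstr(p, name)) != NULL; p += len) {
		bool starts = (p == flags) || sd_flags_is_sep(p[-1]);
		bool ends = sd_flags_is_sep(p[len]);

		if (starts && ends)
			return true;
	}
	return false;
}
```

For the domain0 line shown above, sd_flags_has(line, "SD_PREFER_SIBLING")
is true and sd_flags_has(line, "SD_ASYM_CPUCAPACITY") is false.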

> > 
> > Patch set summary:
> > 
> >  - [PATCH 1/4] sched/fair: Prefer fully-idle SMT cores in asym-capacity idle selection
> > 
> >    Prefer fully-idle SMT cores in asym-capacity idle selection. In the
> >    wakeup fast path, extend select_idle_capacity() / asym_fits_cpu() so
> >    idle selection can prefer CPUs on fully idle cores, with a safe fallback.
> > 
> >  - [PATCH 2/4] sched/fair: Reject misfit pulls onto busy SMT siblings on asym-capacity
> > 
> >    Reject misfit pulls onto busy SMT siblings on SD_ASYM_CPUCAPACITY.
> >    Provided for consistency with PATCH 1/4.
> > 
> >  - [PATCH 3/4] sched/fair: Enable EAS with SMT on SD_ASYM_CPUCAPACITY systems
> > 
> >    Enable EAS with SD_ASYM_CPUCAPACITY and SMT. Also provided for
> >    consistency with PATCH 1/4. I've also tested with/without
> >    /proc/sys/kernel/sched_energy_aware enabled (same platform) and haven't
> >    noticed any regression.
> > 
> >  - [PATCH 4/4] sched/fair: Prefer fully-idle SMT core for NOHZ idle load balancer
> > 
> >    When choosing the housekeeping CPU that runs the idle load balancer,
> >    prefer an idle CPU on a fully idle core so migrated work lands where
> >    effective capacity is available.
> > 
> >    The change is still consistent with the same "avoid CPUs with busy
> >    sibling" logic and it shows some benefits on Vera, but could have
> >    negative impact on other systems, I'm including it for completeness
> >    (feedback is appreciated).
> > 
> > This patch set has been tested on the new NVIDIA Vera Rubin platform, where
> > SMT is enabled and the firmware exposes small frequency variations (+/-~5%)
> > as differences in CPU capacity, resulting in SD_ASYM_CPUCAPACITY being set.
> > 
> 
> Are you referring to nominal_freq?
> 

Correct.

Thanks,
-Andrea