Date: Fri, 27 Mar 2026 21:36:04 +0100
From: Andrea Righi
To: K Prateek Nayak
Cc: Vincent Guittot, Ingo Molnar, Peter Zijlstra, Juri Lelli, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider, Christian Loehle, Koba Ko, Felix Abecassis, Balbir Singh, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 4/4] sched/fair: Prefer fully-idle SMT core for NOHZ idle load balancer
References: <20260326151211.1862600-1-arighi@nvidia.com> <20260326151211.1862600-5-arighi@nvidia.com>

On Fri, Mar 27, 2026 at 05:04:23PM +0530, K Prateek Nayak wrote:
> Hello Andrea,
>
> On 3/27/2026 3:14 PM, Andrea Righi wrote:
> > Hi Vincent,
> >
> > On Fri, Mar 27, 2026 at 09:45:56AM +0100, Vincent Guittot wrote:
> >> On Thu, 26 Mar 2026 at 16:12, Andrea Righi wrote:
> >>>
> >>> When choosing which idle housekeeping CPU runs the idle load balancer,
> >>> prefer one on a fully idle core if SMT is active, so balance can migrate
> >>> work onto a CPU that still offers full effective capacity. Fall back to
> >>> any idle candidate if none qualify.
> >>
> >> This one isn't straightforward for me. The ilb cpu will check all
> >> other idle CPUs 1st and finish with itself so unless the next CPU in
> >> the idle_cpus_mask is a sibling, this should not make a difference
> >>
> >> Did you see any perf diff ?
> >
> > I actually see a benefit, in particular, with the first patch applied I see
> > a ~1.76x speedup, if I add this on top I get ~1.9x speedup vs baseline,
> > which seems pretty consistent across runs (definitely not in error range).
> >
> > The intention with this change was to minimize SMT noise running the ILB
> > code on a fully-idle core when possible, but I also didn't expect to see
> > such big difference.
> >
> > I'll investigate more to better understand what's happening.
>
> Interesting! Either this "CPU-intensive workload" hates SMT turning
> busy (but to an extent where performance drops visibly?) or ILB
> keeps getting interrupted on an SMT sibling that is burdened by
> interrupts leading to slower balance (or IRQs driving the workload
> being delayed by rq_lock disabling them)
>
> Would it be possible to share the total SCHED_SOFTIRQ time, load
> balancing attempts, and utlization with and without the patch? I too
> will go queue up some runs to see if this makes a difference.

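For anyone following along, the selection policy described in the quoted changelog can be modelled roughly as below. This is only a user-space Python sketch under my own assumptions, not the kernel code: pick_ilb_cpu(), the sibling map and the "lowest CPU number first" scan order are all made up for illustration.

```python
from typing import Dict, Iterable, Optional, Set


def pick_ilb_cpu(idle_cpus: Iterable[int],
                 housekeeping: Set[int],
                 siblings: Dict[int, Set[int]],
                 smt_active: bool = True) -> Optional[int]:
    """Pick an idle housekeeping CPU, preferring one on a fully idle core.

    Hypothetical model: `siblings` maps each CPU to the set of CPUs sharing
    its core (including itself). Returns None if there is no candidate.
    """
    idle = set(idle_cpus)
    fallback = None
    for cpu in sorted(idle):
        if cpu not in housekeeping:
            continue
        if not smt_active:
            # Without SMT every idle candidate is equally good.
            return cpu
        # The core is fully idle when all SMT siblings are idle too.
        if siblings.get(cpu, {cpu}) <= idle:
            return cpu
        # Remember the first partially-idle candidate as a fallback.
        if fallback is None:
            fallback = cpu
    return fallback


if __name__ == "__main__":
    # Two cores with two SMT threads each: (0, 1) and (2, 3).
    topo = {0: {0, 1}, 1: {0, 1}, 2: {2, 3}, 3: {2, 3}}
    # CPU 0 is busy, so CPU 1 sits on a half-busy core; CPU 2 is preferred.
    print(pick_ilb_cpu([1, 2, 3], {0, 1, 2, 3}, topo))
```

With CPU 0 busy, the model skips CPU 1 (its sibling is busy) and picks CPU 2; if only CPU 1 were idle it would still be returned as the fallback.
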
Quick update: I also tried this on a Vera machine with a firmware that
exposes the same capacity for all the CPUs (so with SD_ASYM_CPUCAPACITY
disabled and SMT still on of course) and I see similar performance
benefits.

Looking at SCHED_SOFTIRQ and load balancing attempts I don't see big
differences, all within error range (results produced using a vibe-coded
python script):

 - baseline (stats/sec):

   SCHED softirq count : 2,625
   LB attempts (total) : 69,832

   Per-domain breakdown:
     domain0 (SMT):
       lb_count (total) : 68,482 [balanced=68,472 failed=9]
       CPU_IDLE         : lb=1,408 imb(load=0 util=0 task=0 misfit=0) gained=0
       CPU_NEWLY_IDLE   : lb=67,041 imb(load=0 util=0 task=7 misfit=0) gained=0
       CPU_NOT_IDLE     : lb=33 imb(load=0 util=0 task=2 misfit=0) gained=0
     domain1 (MC):
       lb_count (total) : 902 [balanced=900 failed=2]
       CPU_NEWLY_IDLE   : lb=869 imb(load=0 util=0 task=0 misfit=0) gained=0
       CPU_NOT_IDLE     : lb=33 imb(load=0 util=0 task=2 misfit=0) gained=0
     domain2 (NUMA):
       lb_count (total) : 448 [balanced=441 failed=7]
       CPU_NEWLY_IDLE   : lb=415 imb(load=0 util=0 task=44 misfit=0) gained=0
       CPU_NOT_IDLE     : lb=33 imb(load=0 util=0 task=268 misfit=0) gained=0

 - with ilb-smt (stats/sec):

   SCHED softirq count : 2,671
   LB attempts (total) : 68,572

   Per-domain breakdown:
     domain0 (SMT):
       lb_count (total) : 67,239 [balanced=67,197 failed=41]
       CPU_IDLE         : lb=1,419 imb(load=0 util=0 task=0 misfit=0) gained=0
       CPU_NEWLY_IDLE   : lb=65,783 imb(load=0 util=0 task=42 misfit=0) gained=1
       CPU_NOT_IDLE     : lb=37 imb(load=0 util=0 task=0 misfit=0) gained=0
     domain1 (MC):
       lb_count (total) : 833 [balanced=833 failed=0]
       CPU_NEWLY_IDLE   : lb=796 imb(load=0 util=0 task=0 misfit=0) gained=0
       CPU_NOT_IDLE     : lb=37 imb(load=0 util=0 task=0 misfit=0) gained=0
     domain2 (NUMA):
       lb_count (total) : 500 [balanced=488 failed=12]
       CPU_NEWLY_IDLE   : lb=463 imb(load=0 util=0 task=44 misfit=0) gained=0
       CPU_NOT_IDLE     : lb=37 imb(load=0 util=0 task=627 misfit=0) gained=0

I'll add more direct instrumentation to check what ILB is doing differently...
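
To put the two top-level counters side by side, the relative change can be computed with something like the snippet below. It is just a small sketch: relative_change() and the dictionary keys are names I made up here, and the values are copied from the stats above.

```python
def relative_change(baseline: float, patched: float) -> float:
    """Return the change from baseline to patched, in percent."""
    return (patched - baseline) / baseline * 100.0


# Per-second rates copied from the two runs above.
baseline = {"sched_softirq": 2625, "lb_attempts": 69832}
ilb_smt = {"sched_softirq": 2671, "lb_attempts": 68572}

for key in baseline:
    delta = relative_change(baseline[key], ilb_smt[key])
    print(f"{key}: {baseline[key]} -> {ilb_smt[key]} ({delta:+.2f}%)")
```

Both counters move by a bit under 2% between the two runs, in opposite directions.
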
And I'll also repeat the test and collect the same metrics on the Vera
machine with the firmware that exposes different CPU capacities as soon as
I get access again.

Thanks,
-Andrea