From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andrea Righi
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot
Cc: Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
    Valentin Schneider, K Prateek Nayak, Christian Loehle, Koba Ko,
    Felix Abecassis, Balbir Singh, Joel Fernandes, Shrikanth Hegde,
    linux-kernel@vger.kernel.org
Subject: [PATCH v5 0/5] sched/fair: SMT-aware asymmetric CPU capacity
Date: Tue, 28 Apr 2026 16:41:06 +0200
Message-ID: <20260428144352.3575863-1-arighi@nvidia.com>
X-Mailer: git-send-email 2.54.0

This series attempts to improve SD_ASYM_CPUCAPACITY scheduling by
introducing SMT awareness.

= Problem =

Nominal per-logical-CPU capacity can overstate usable compute when an SMT
sibling is busy, because the physical core doesn't deliver its full nominal
capacity. So, several asym-cpu-capacity paths may pick high capacity idle
CPUs that are not actually good destinations.

= Solution =

This patch set aligns those paths with a simple rule already used elsewhere:
when SMT is active, prefer fully idle cores and avoid treating partially idle
SMT siblings as full-capacity targets where that would mislead load balance.

Patch set summary:

 - Attach sched_domain_shared to sd_asym_cpucapacity in SD_ASYM_CPUCAPACITY
   to use has_idle_cores hint consistently in the wakeup idle scan and
   rename sd_llc_shared -> sd_balance_shared.

 - Prefer fully-idle SMT cores in asym-capacity idle selection: in the
   wakeup fast path, extend select_idle_capacity() / asym_fits_cpu() so idle
   selection can prefer CPUs on fully idle cores.

 - Reject misfit pulls onto busy SMT siblings on SD_ASYM_CPUCAPACITY.

 - Add SIS_UTIL support to select_idle_capacity(): add to
   select_idle_capacity() the same SIS_UTIL-controlled idle-scan mechanism,
   already used by select_idle_cpu().

This patch set has been tested on the new NVIDIA Vera Rubin platform, where
SMT is enabled and the firmware exposes small frequency variations (+/-~5%)
as differences in CPU capacity, resulting in SD_ASYM_CPUCAPACITY being set.

Without these patches, performance can drop by up to ~2x with CPU-intensive
workloads, because the SD_ASYM_CPUCAPACITY idle selection policy does not
account for busy SMT siblings.
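The rule from the Solution section (prefer idle CPUs on fully idle cores,
fall back to partially idle SMT siblings) can be illustrated with a small
user-space model. This is only a sketch under stated assumptions, not the
code in this series: struct toy_cpu and both toy_* helpers are hypothetical,
"fits" is reduced to a plain capacity >= utilization check, and the fallback
ordering is chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model of one logical CPU; not a kernel structure. */
struct toy_cpu {
	unsigned long capacity;	/* nominal capacity of this logical CPU */
	int sibling;		/* index of the SMT sibling, -1 without SMT */
	bool idle;		/* true if this logical CPU is idle */
};

/* A core is fully idle when the CPU and its SMT sibling (if any) are idle. */
static bool toy_core_idle(const struct toy_cpu *cpus, int cpu)
{
	int sib = cpus[cpu].sibling;

	return cpus[cpu].idle && (sib < 0 || cpus[sib].idle);
}

/*
 * Pick an idle CPU for a task with utilization task_util.
 *
 * Assumed preference order (illustration only):
 *   1. first idle CPU that fits and sits on a fully idle core;
 *   2. first idle CPU that fits, even if its sibling is busy;
 *   3. idle CPU with the highest nominal capacity.
 * Returns -1 when no CPU is idle.
 */
static int toy_select_idle_capacity(const struct toy_cpu *cpus,
				    size_t nr_cpus, unsigned long task_util)
{
	int best_fit = -1, best_any = -1;

	assert(cpus != NULL);

	for (size_t i = 0; i < nr_cpus; i++) {
		bool fits;

		if (!cpus[i].idle)
			continue;

		fits = cpus[i].capacity >= task_util;
		if (fits && toy_core_idle(cpus, (int)i))
			return (int)i;
		if (fits && best_fit < 0)
			best_fit = (int)i;
		if (best_any < 0 ||
		    cpus[i].capacity > cpus[best_any].capacity)
			best_any = (int)i;
	}

	return best_fit >= 0 ? best_fit : best_any;
}
```

In this model, with CPU 0 idle next to a busy sibling (capacity 1024) and
CPUs 2 and 3 both idle on another core (capacity 980), a task that fits
everywhere is placed on CPU 2 rather than on the nominally larger CPU 0.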

Alternative approaches have been evaluated, such as equalizing CPU
capacities, either by exposing uniform values via firmware or normalizing
them in the kernel by grouping CPUs within a small capacity window (+/-5%).

However, the SMT-aware SD_ASYM_CPUCAPACITY approach has shown better results
so far. Improving this policy also seems worthwhile in general, as future
platforms may enable SMT with asymmetric CPU topologies.

Performance results on Vera Rubin with SD_ASYM_CPUCAPACITY (mainline) vs
SD_ASYM_CPUCAPACITY + SMT

 - NVBLAS benchblas (one task / SMT core):

 +---------------------------------+--------+
 | Configuration                   | gflops |
 +---------------------------------+--------+
 | ASYM (mainline) + SIS_UTIL      |  5478  |
 | ASYM (mainline) + NO_SIS_UTIL   |  5491  |
 |                                 |        |
 | NO ASYM + SIS_UTIL              |  8912  |
 | NO ASYM + NO_SIS_UTIL           |  8978  |
 |                                 |        |
 | ASYM + SMT + SIS_UTIL           |  9259  |
 | ASYM + SMT + NO_SIS_UTIL        |  9291  |
 +---------------------------------+--------+

 - DCPerf MediaWiki (all CPUs):

 +---------------------------------+--------+--------+--------+--------+
 | Configuration                   |  rps   |  p50   |  p95   |  p99   |
 +---------------------------------+--------+--------+--------+--------+
 | ASYM (mainline) + SIS_UTIL      |  7994  | 0.052  | 0.223  | 0.246  |
 | ASYM (mainline) + NO_SIS_UTIL   |  7993  | 0.052  | 0.221  | 0.245  |
 |                                 |        |        |        |        |
 | NO ASYM + SIS_UTIL              |  8113  | 0.067  | 0.184  | 0.225  |
 | NO ASYM + NO_SIS_UTIL           |  8093  | 0.068  | 0.184  | 0.223  |
 |                                 |        |        |        |        |
 | ASYM + SMT + SIS_UTIL           |  8129  | 0.076  | 0.149  | 0.188  |
 | ASYM + SMT + NO_SIS_UTIL        |  8138  | 0.076  | 0.148  | 0.186  |
 +---------------------------------+--------+--------+--------+--------+

In the MediaWiki case SMT awareness is less impactful, because all CPUs are
in use for the majority of the run, but it still seems to provide some
benefit in reducing tail latency.
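
To compare rows of the tables above, the relative change can be computed
directly. The following is a minimal sketch; the helper names are
hypothetical and the inputs are meant to be the figures reported in the
tables.

```c
/*
 * Ratio of a candidate result to a baseline result, for metrics where
 * higher is better (gflops, rps).
 */
static double speedup(double candidate, double baseline)
{
	return candidate / baseline;
}

/*
 * Percentage reduction of a candidate result relative to a baseline, for
 * metrics where lower is better (latency percentiles).
 */
static double reduction_pct(double candidate, double baseline)
{
	return (1.0 - candidate / baseline) * 100.0;
}
```

With the SIS_UTIL rows, speedup(9259, 5478) is about 1.69 for benchblas
(ASYM + SMT vs ASYM mainline), and reduction_pct(0.188, 0.246) is about
23.6 for the MediaWiki p99 latency.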

Tests have also been conducted on NVIDIA Grace (which does not support SMT)
to ensure that SIS_UTIL support in select_idle_capacity() does not introduce
regressions and results show slight improvements under the same workloads.

See also:
 - https://lore.kernel.org/lkml/20260324005509.1134981-1-arighi@nvidia.com
 - https://lore.kernel.org/lkml/20260318092214.130908-1-arighi@nvidia.com

Changes in v5:
 - Drop redundant RCU protection in nohz_balancer_kick() (Prateek Nayak)
 - Do not remove CPU capacity asymmetry / SMT warning (Prateek Nayak)
 - Link to v4: https://lore.kernel.org/all/20260428051720.3180182-1-arighi@nvidia.com

Changes in v4:
 - Rename sd_llc_shared -> sd_balance_shared
 - Add preliminary cleanup patch to use guard(rcu)() for sched_domain RCU
   (Prateek Nayak)
 - Apply SIS_UTIL scan cap only with !prefers_idle_core, matching
   select_idle_cpu() / has_idle_core logic (Vincent Guittot)
 - Cache env->dst_cpu idle state to reduce is_core_idle() calls
   (Prateek Nayak)
 - Remove warning about CPU capacity asymmetry not supporting SMT
 - Link to v3: https://lore.kernel.org/all/20260423074135.380390-1-arighi@nvidia.com

Changes in v3:
 - Add SIS_UTIL support to select_idle_capacity() (K Prateek Nayak)
 - Attach sched_domain_shared to sd_asym_cpucapacity (K Prateek Nayak)
 - Add enum for the different fit state (K Prateek Nayak)
 - Update has_idle_cores hint (Vincent Guittot)
 - Link to v2: https://lore.kernel.org/all/20260403053654.1559142-1-arighi@nvidia.com

Changes in v2:
 - Rework SMT awareness logic in select_idle_capacity() (K Prateek Nayak)
 - Drop EAS and find_new_ilb() changes for now
 - Link to v1: https://lore.kernel.org/all/20260326151211.1862600-1-arighi@nvidia.com

Git tree:
  git://git.kernel.org/pub/scm/linux/kernel/git/arighi/linux.git sched-asym-smt-v5

Andrea Righi (3):
  sched/fair: Drop redundant RCU read lock in NOHZ kick path
  sched/fair: Prefer fully-idle SMT cores in asym-capacity idle selection
  sched/fair: Reject misfit pulls onto busy SMT siblings on asym-capacity

K Prateek Nayak (2):
  sched/fair: Attach sched_domain_shared to sd_asym_cpucapacity
  sched/fair: Add SIS_UTIL support to select_idle_capacity()

 kernel/sched/fair.c     | 157 ++++++++++++++++++++++++++++++++++++------------
 kernel/sched/sched.h    |   2 +-
 kernel/sched/topology.c |  90 +++++++++++++++++++++++----
 3 files changed, 195 insertions(+), 54 deletions(-)