Date: Thu, 19 Mar 2026 15:00:21 +0100
From: Andrea Righi
To: Christian Loehle
Cc: Dietmar Eggemann, Vincent Guittot, Ingo Molnar, Peter Zijlstra,
    Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
    Joel Fernandes, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] sched/fair: Prefer fully-idle SMT cores in asym-capacity idle selection
References: <20260318092214.130908-1-arighi@nvidia.com>
    <4830a5aa-0682-4501-af92-8a2e7858b1d3@arm.com>
    <15ffdeb3-a0f3-4b88-92c0-17ffb03b0574@arm.com>
In-Reply-To: <15ffdeb3-a0f3-4b88-92c0-17ffb03b0574@arm.com>

Hi Christian,

On Thu, Mar 19, 2026 at 11:58:39AM +0000, Christian Loehle wrote:
> On 3/18/26 17:09, Andrea Righi wrote:
> > Hi Christian,
> >
> > On Wed, Mar 18, 2026 at 03:43:26PM +0000, Christian Loehle wrote:
> >> On 3/18/26 10:31, Andrea Righi wrote:
> >>> Hi Vincent,
> >>>
> >>> On Wed, Mar 18, 2026 at 10:41:15AM +0100, Vincent Guittot wrote:
> >>>> On Wed, 18 Mar 2026 at 10:22, Andrea Righi wrote:
> >>>>>
> >>>>> On systems with asymmetric CPU capacity (e.g., ACPI/CPPC reporting
> >>>>> different per-core frequencies), the wakeup path uses
> >>>>> select_idle_capacity() and prioritizes idle CPUs with higher capacity
> >>>>> for better task placement. However, when those CPUs belong to SMT cores,
> >>>>
> >>>> Interesting, which kind of system has both SMT and SD_ASYM_CPUCAPACITY
> >>>> ? I thought both were never set simultaneously and SD_ASYM_PACKING was
> >>>> used for system involving SMT like x86
> >>>
> >>> It's an NVIDIA platform (not publicly available yet), where the firmware
> >>> exposes different CPU capacities and has SMT enabled, so both
> >>> SD_ASYM_CPUCAPACITY and SMT are present. I'm not sure whether the final
> >>> firmware release will keep this exact configuration (there's a good chance
> >>> it will), so I'm targeting it to be prepared.
> >>
> >>
> >> Andrea,
> >> that makes me think, I've played with a nvidia grace available to me recently,
> >> which sets slightly different CPPC highest_perf values (~2%) which automatically
> >> will set SD_ASYM_CPUCAPACITY and run the entire capacity-aware scheduling
> >> machinery for really almost negligible capacity differences, where it's
> >> questionable how sensible that is.
> >
> > That looks like the same system that I've been working with. I agree that
> > treating small CPPC differences as full asymmetry can be a bit overkill.
> >
> > I've been experimenting with flattening the capacities (to force the
> > "regular" idle CPU selection policy), which performs better than the
> > current asym-capacity CPU selection. However, adding the SMT awareness to
> > the asym-capacity, seems to give a consistent +2-3% (same set of
> > CPU-intensive benchmarks) compared to flattening alone, which is not bad.
> >
> >> I have an arm64 + CPPC implementation for asym-packing for this machine, maybe
> >> we can reuse that for here too?
> >
> > Sure, that sounds interesting, if it's available somewhere I'd be happy to
> > do some testing.
> >
>
> Hi Andrea,
>
> I will clean up the asympacking code a bit and share it with you for testing.
>
> Interestingly, when we looked at DCPerf MediaWiki, we found the exact opposite.
>
> On NVIDIA Grace, enabling CAS due to the small CPPC highest_perf differences was
> actually beneficial for the workload. More interestingly, we saw a similar uplift
> on a different arm64 server without ASYM_CPUCAPACITY when we force-enabled
> sched_asym_cpucap_active() even though the system was highest_perf-symmetric.
> That suggests the uplift on Grace may have come from CAS-specific behavior rather
> than from better selection of the highest_perf CPUs.

Which NVIDIA Grace in particular? On GB300 ASYM_CPUCAPACITY seems to be
enabled. I can try to disable / equalize the capacities and repeat the test
there as well.

>
> I'd be very curious whether something similar (i.e. the inverse) is happening in your
> case as well, i.e. flattening the capacities but still forcing
> select_idle_sibling() / sched_asym_cpucap_active() despite equal capacities. Of course,
> that will also depend on the workloads (what are you testing?)

I can definitely try that. I'm using an internal benchmark suite. The
benchmark showing the biggest improvements is based on the NVBLAS library
(running on the CPUs, not the GPUs). I'm not sure whether it's publicly
available; I'll check.

>
> Just to illustrate, below is one example where CAS improved both score and CPU utilization:
> +--------------------------+----------------------+-------------------------+-----------------------------------------+
> | Platform                 | default (v6.8)       | force all CPUs = 1024   | force sched_asym_cpucap_active() = TRUE |
> +--------------------------+----------------------+-------------------------+-----------------------------------------+
> | arm64 symmetric (72 CPUs)| 100% (90% CPU util)  | -------------           | 104.26% (99%)                           |
> | Grace (72 CPUs)          | 100% (99%)           | 99.49% (90%)            | -------------                           |
> +--------------------------+----------------------+-------------------------+-----------------------------------------+

I see, interesting. Now I'm curious to do the opposite on the GB300 that I
have access to: flattening the capacities to 1024 and seeing what I get.

Thanks,
-Andrea
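
As a rough illustration of what "flattening the capacities to 1024" means numerically, here is a small user-space sketch. It is not kernel code: normalize_capacity() and has_asym_capacity() are hypothetical helpers, the linear scaling to 1024 is an assumption, the perf values used with it (100 vs. 98, i.e. a ~2% spread) are made up, and it assumes any difference at all counts as asymmetry, as described in the thread.

```c
#include <assert.h>

/* Full-capacity value on the scheduler's capacity scale. */
#define SCHED_CAPACITY_SCALE 1024UL

/*
 * Hypothetical helper: linearly scale a raw performance value so that
 * the fastest CPU maps to SCHED_CAPACITY_SCALE (assumed normalization).
 */
static unsigned long normalize_capacity(unsigned long perf,
                                        unsigned long max_perf)
{
    assert(max_perf > 0);
    return perf * SCHED_CAPACITY_SCALE / max_perf;
}

/*
 * Hypothetical helper: return 1 if at least one CPU ends up below full
 * capacity after normalization, 0 if all CPUs are "flat" at 1024.
 */
static int has_asym_capacity(const unsigned long *perf, int nr_cpus)
{
    unsigned long max_perf = 0;
    int i;

    /* Find the highest raw performance value. */
    for (i = 0; i < nr_cpus; i++)
        if (perf[i] > max_perf)
            max_perf = perf[i];

    /* Any CPU that does not normalize to 1024 makes the set asymmetric. */
    for (i = 0; i < nr_cpus; i++)
        if (normalize_capacity(perf[i], max_perf) != SCHED_CAPACITY_SCALE)
            return 1;

    return 0;
}
```

With these assumptions, a CPU at 98 next to a CPU at 100 normalizes to 1003 against 1024, so the set is reported as asymmetric; once every CPU reports the same value, all of them normalize to 1024 and the asymmetry disappears.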