Date: Tue, 31 Mar 2026 11:04:19 +0200
From: Andrea Righi
To: Dietmar Eggemann
Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider, Christian Loehle, Koba Ko, Felix Abecassis, Balbir Singh, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 0/4] sched/fair: SMT-aware asymmetric CPU capacity
References: <20260326151211.1862600-1-arighi@nvidia.com>
Hi Dietmar,

On Tue, Mar 31, 2026 at 12:30:55AM +0200, Dietmar Eggemann wrote:
> Hi Andrea,
>
> On 26.03.26 16:02, Andrea Righi wrote:
>
> [...]
>
> > This patch set has been tested on the new NVIDIA Vera Rubin platform, where
> > SMT is enabled and the firmware exposes small frequency variations (+/-~5%)
> > as differences in CPU capacity, resulting in SD_ASYM_CPUCAPACITY being set.
> >
> > Without these patches, performance can drop up to ~2x with CPU-intensive
> > workloads, because the SD_ASYM_CPUCAPACITY idle selection policy does not
> > account for busy SMT siblings.
> >
> > Alternative approaches have been evaluated, such as equalizing CPU
> > capacities, either by exposing uniform values via firmware (ACPI/CPPC) or
> > normalizing them in the kernel by grouping CPUs within a small capacity
> > window (+-5%) [1][2], or enabling asympacking [3].
> >
> > However, adding SMT awareness to SD_ASYM_CPUCAPACITY has shown better
> > results so far. Improving this policy also seems worthwhile in general, as
> > other platforms in the future may enable SMT with asymmetric CPU
> > topologies.
>
> I still wonder whether we really need select_idle_capacity() (plus the
> smt part) for asymmetric CPU capacity systems where the CPU capacity
> differences are < 5% of SCHED_CAPACITY_SCALE.
>
> The known example would be the NVIDIA Grace (!smt) server with its
> slightly different perf_caps.highest_perf values.
>
> We did run DCPerf Mediawiki on this thing with:
>
> (1) ASYM_CPUCAPACITY (default)
>
> (2) NO ASYM_CPUCAPACITY
>
> We also ran on a comparable ARM64 server (!smt) for comparison:
>
> (1) ASYM_CPUCAPACITY
>
> (2) NO ASYM_CPUCAPACITY (default)
>
> Both systems have 72 CPUs, run v6.8 and have a single MC sched domain
> with LLC spanning over all 72 CPUs.
> During the tests there were ~750 tasks, among them the workload-related
> ones:
>
> #hhvmworker      147
> #mariadbd        204
> #memcached        11
> #nginx             8
> #wrk             144
> #ProxygenWorker    1
>
> load_balance:
>
>   not_idle     3x more on (2)
>   idle         2x more on (2)
>   newly_idle   2-10x more on (2)
>
> wakeup:
>
>   move_affine  2-3x more on (1)
>   ttwu_local   1.5-2x more on (2)
>
> We also instrumented all the bailout conditions in select_idle_sibling()
> (sis())->select_idle_cpu() and select_idle_capacity() (sic()).
>
> In (1) almost all wakeups end up in select_idle_cpu() returning -1 due
> to the fact that 'sd->shared->nr_idle_scan' under SIS_UTIL is 0. So
> sis() in (1) almost always returns target (this_cpu or prev_cpu). sic()
> doesn't do this.
>
> What I haven't done is to try (1) with SIS_UTIL or (2) with NO_SIS_UTIL.
>
> I wonder whether this is the underlying reason for the benefit of (1)
> over (2) we see here with smt now?
>
> So IMHO before adding smt support to (1) for these small CPPC based CPU
> capacity differences we should make sure that the same can't be achieved
> by disabling SIS_UTIL or softening it a bit.
>
> So does (2) with NO_SIS_UTIL perform worse than (1) with your smt
> related add-ons in sic()?

Thanks for running these experiments and sharing the data; this is very
useful!

I did a quick test on Vera using the NVBLAS benchmark, comparing NO
ASYM_CPUCAPACITY with and without SIS_UTIL, but the difference seems to be
within the margin of error. I'll also run DCPerf MediaWiki with all the
different configurations to see if I get similar results.

More generally, I agree that for small capacity differences (e.g., within
~5%) the benefits of using ASYM_CPUCAPACITY are questionable.
And I'm also fine with going back to the idea of grouping together CPUs
within the 5% capacity window, if we think it's a safer approach. The
results in your case are quite evident and, BTW, they mean that we also
shouldn't have ASYM_CPUCAPACITY on Grace, so in theory the 5% threshold
should also improve performance on Grace, which doesn't have SMT.

That said, I still think there's value in adding SMT awareness to
select_idle_capacity(). Even if we decide to avoid ASYM_CPUCAPACITY for
small capacity deltas, we should ensure that the behavior remains
reasonable if both SMT and ASYM_CPUCAPACITY are enabled, for any reason.
Right now, there are cases where the current behavior leads to significant
performance degradation (~2x), so having a mechanism to prevent clearly
suboptimal task placement still seems worthwhile. Essentially, what I'm
saying is that one thing doesn't exclude the other.

Thanks,
-Andrea