Date: Mon, 20 Apr 2026 23:42:13 +0200
From: Andrea Righi
To: K Prateek Nayak
Cc: Dietmar Eggemann, Ingo Molnar, Peter Zijlstra, Juri Lelli,
    Vincent Guittot, Steven Rostedt, Ben Segall, Mel Gorman,
    Valentin Schneider, Christian Loehle, Koba Ko, Felix Abecassis,
    Balbir Singh, Shrikanth Hegde, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/2] sched/fair: Prefer fully-idle SMT cores in asym-capacity idle selection
References: <20260403053654.1559142-1-arighi@nvidia.com>
    <20260403053654.1559142-2-arighi@nvidia.com>
    <64fe32e0-d428-42bb-beb4-2656d8781b0f@arm.com>
    <7313ba07-7b87-447c-9c48-2f6b2b53ac94@amd.com>
    <1230f5df-470a-4e59-8c8e-fa159a6fc093@amd.com>
In-Reply-To: <1230f5df-470a-4e59-8c8e-fa159a6fc093@amd.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Prateek,

On Mon, Apr 20, 2026 at 03:09:17PM +0530, K Prateek Nayak wrote:
> On 4/20/2026 2:06 PM, Andrea Righi wrote:
> >> With your changes, only two places actually care about test_idle_cores():
> >>
> >> - select_idle_capacity()
> >> - select_idle_cpu()
> >>
> >> If we go into select_idle_capacity(), we don't do select_idle_cpu() so
> >> the two paths are mutually exclusive.
> >>
> >> In nohz_balancer_kick(), if we find, sd_asym_cpucapacity, we simply
> >> don't care about the sd_llc_shared->nr_busy_cpus during balancing so
> >> that begs the question if we can simply track idle_cores at
> >> sd_asym_cpucapacity for these systems?
> >
> > Yeah, makes sense to me. I was planning to test something similar, so thanks for
> > sharing this patch. :) I'll give it a try and report back.
>
> Thank you for taking it for a spin!

I've tested this extensively on Vera and haven't encountered any issues.
Performance-wise I get similar results (with vs without), which was expected,
as sd_llc matches sd_asym_cpucapacity in my case.

>
> >> I still have one question: Can first SD_ASYM_CPUCAPACITY_FULL be set at
> >> a SD_NUMA?
> >>
> >> We'll need to deal with overlapping domains then but seems like it could
> >> be possible with weird cpusets :-(
> >>
> >> But in that case, do we even want to search CPUs outside the NUMA in
> >> select_idle_capacity()? I don't think anything stops this currently but
> >> I might be wrong.
> >
> > My $0.02 on this.
> >
> > In theory it could happen with unusual topologies or constrained cpusets,
> > although it should be quite rare. That said, select_idle_capacity() already
> > operates on the span of sd_asym_cpucapacity, so if that domain crosses NUMA
> > boundaries, we're already scanning across NUMA today. This patch doesn't
> > fundamentally alter this behavior.
>
> Ack! I was just thinking loud from the topology standpoint since
> sd->shared is not designed to handle the overlapping domains like
> sg->sgc does but we can probably figure some way to make it work.
>
> Using the ring topology example from topology.c:
>
>   0 ----- 1
>   |       |
>   |       |
>   |       |
>   3 ----- 2
>
> Consider NUMA-1 below gets the SD_ASYM_CPUCAPACITY_FULL flag:
>
>   NUMA-2    0-3             0-3             0-3             0-3
>    groups:  {0-1,3},{1-3}   {0-2},{0,2-3}   {1-3},{0-1,3}   {0,2-3},{0-2}
>
>   NUMA-1    0-1,3           0-2             1-3             0,2-3
>    groups:  {0},{1},{3}     {0},{1},{2}     {1},{2},{3}     {0},{2},{3}
>
>   NUMA-0    0               1               2               3
>
>
> The "sd->shared" assignments at NUMA-1 will put first, second, and the
> last domain in the same "shared" range by today's logic since the first
> CPU in their span is the same although their spans are slightly
> different.
>
> The third will be standalone since the first CPU of the domain span
> will be different.

Yeah, makes sense. I'm wondering if we should attach the shared blob to
sd_asym_cpucapacity only when asym is a non-overlapping domain, and otherwise
fall back to sd_llc and, in that case, ignore has_idle_cores in
select_idle_capacity().

This might not be the best in terms of efficiency on those exotic topologies,
but it'd eliminate the overlap/aliasing risk, while still being correct.

What do you think?

Thanks,
-Andrea
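
To make the aliasing concern concrete, here is a minimal userspace sketch (not
kernel code) that models the four NUMA-1 spans from the ring example as CPU
bitmasks and keys each domain by the first CPU in its span, as described above.
All names here (first_cpu(), shared_aliases(), pick_idle_core_source() and the
enum) are hypothetical and exist only for illustration; the fallback rule is
just the idea floated in this mail, not a tested implementation.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_DOMAINS 4

/* NUMA-1 spans from the ring example, one per CPU, as CPU bitmasks. */
static const unsigned int numa1_span[NR_DOMAINS] = {
	0x0b,	/* CPU0's domain: 0-1,3 */
	0x07,	/* CPU1's domain: 0-2   */
	0x0e,	/* CPU2's domain: 1-3   */
	0x0d,	/* CPU3's domain: 0,2-3 */
};

/* Lowest set bit; stands in for "first CPU of the domain span". */
static int first_cpu(unsigned int span)
{
	int cpu = 0;

	assert(span != 0);
	while (!(span & 1u)) {
		span >>= 1;
		cpu++;
	}
	return cpu;
}

/*
 * True if two domains would be keyed to the same "shared" object
 * (same first CPU) while covering different sets of CPUs.
 */
static bool shared_aliases(const unsigned int *span, int nr)
{
	for (int i = 0; i < nr; i++)
		for (int j = i + 1; j < nr; j++)
			if (first_cpu(span[i]) == first_cpu(span[j]) &&
			    span[i] != span[j])
				return true;
	return false;
}

/* Hypothetical outcome of the proposed rule. */
enum idle_core_src {
	USE_ASYM_SHARED,	/* track idle cores at the asym domain */
	USE_LLC_NO_HINT,	/* fall back to LLC, ignore has_idle_cores */
};

static enum idle_core_src pick_idle_core_source(const unsigned int *span, int nr)
{
	return shared_aliases(span, nr) ? USE_LLC_NO_HINT : USE_ASYM_SHARED;
}
```

With the ring spans, the domains of CPU0, CPU1 and CPU3 all get key 0 despite
having different spans, while CPU2's domain gets key 1, so this model would
pick the sd_llc fallback. With disjoint spans such as {0-1} and {2-3} there is
no aliasing and the asym-level shared state could be used.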