From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 27 Jan 2026 08:55:07 +0530
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Subject: Re: [PATCH v3] sched/fair: Cache NUMA node statistics to avoid O(N) scanning
To: Qiliang Yuan
References: <20260126110250.1060512-1-realwujing@gmail.com>
From: K Prateek Nayak
In-Reply-To: <20260126110250.1060512-1-realwujing@gmail.com>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit

Hello Qiliang,

On 1/26/2026 4:32 PM, Qiliang Yuan wrote:
> Optimize update_numa_stats() by leveraging pre-calculated node
> statistics cached during the load balancing process. This reduces the
> complexity of NUMA balancing overhead from O(CPUs_per_node) to O(1)
> when statistics for the source node are fresh.
>
> Signed-off-by: Qiliang Yuan
> Signed-off-by: Qiliang Yuan
> ---

Missing a changelog and the performance numbers that justify this
change.

>  kernel/sched/fair.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 44 insertions(+)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index e71302282671..070b61f65b6d 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -2094,6 +2094,17 @@ static inline int numa_idle_core(int idle_core, int cpu)
>   * borrows code and logic from update_sg_lb_stats but sharing a
>   * common implementation is impractical.
>   */
> +struct numa_stats_cache {
> +	unsigned long load;
> +	unsigned long runnable;
> +	unsigned long util;
> +	unsigned long nr_running;
> +	unsigned long capacity;
> +	unsigned long last_update;
> +};
> +
> +static struct numa_stats_cache node_stats_cache[MAX_NUMNODES];

MAX_NUMNODES is a very large value. Why do you need to have this all
up front instead of dynamically allocating it during sched domain
build?

Speaking of sched domains, partitioning the system can split the NUMA
domain across multiple partitions, which makes these numbers
partition specific. Tasks running in one partition cannot use the
cached values from another partition.

If there is really a noticeable benefit, I would suggest using the
previous method to cache it somewhere in the sched domain hierarchy -
but only if there is a noticeable benefit.
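
To put a rough number on the footprint concern, here is a minimal
userspace sketch. It is not kernel code: SKETCH_MAX_NODES (a node
shift of 10) and alloc_node_cache() are hypothetical names chosen for
illustration, and calloc() merely stands in for whatever allocation a
sched domain build would actually use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Mirrors the fields of the struct numa_stats_cache quoted above. */
struct numa_stats_cache {
	unsigned long load;
	unsigned long runnable;
	unsigned long util;
	unsigned long nr_running;
	unsigned long capacity;
	unsigned long last_update;
};

/* Hypothetical compile-time bound standing in for MAX_NUMNODES. */
#define SKETCH_MAX_NODES (1UL << 10)

/* Bytes consumed by a static array sized for the compile-time bound. */
static size_t static_cache_bytes(void)
{
	return SKETCH_MAX_NODES * sizeof(struct numa_stats_cache);
}

/*
 * Hypothetical helper: allocate one zeroed entry per node that is
 * actually present. Returns NULL on failure or when nr_nodes is 0.
 */
static struct numa_stats_cache *alloc_node_cache(size_t nr_nodes)
{
	if (nr_nodes == 0)
		return NULL;
	return calloc(nr_nodes, sizeof(struct numa_stats_cache));
}
```

With 8-byte longs each entry is 48 bytes, so the static array in this
sketch costs 48 KiB regardless of how many nodes exist, whereas
alloc_node_cache(2) on a two-node machine would need 96 bytes.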
> +
>  static void update_numa_stats(struct task_numa_env *env,
>  			      struct numa_stats *ns, int nid,
>  			      bool find_idle)
> @@ -2104,6 +2115,24 @@ static void update_numa_stats(struct task_numa_env *env,
>  	ns->idle_cpu = -1;
>
>  	rcu_read_lock();
> +	/*
> +	 * Algorithmic Optimization: Avoid O(N) scan by using cached stats.
> +	 * Only applicable for the source node where we don't need to find
> +	 * an idle CPU.
> +	 */
> +	if (!find_idle && nid == env->src_nid) {
> +		struct numa_stats_cache *cache = &node_stats_cache[nid];
> +
> +		if (time_before(jiffies, cache->last_update + msecs_to_jiffies(10))) {
> +			ns->load = READ_ONCE(cache->load);
> +			ns->runnable = READ_ONCE(cache->runnable);
> +			ns->util = READ_ONCE(cache->util);
> +			ns->nr_running = READ_ONCE(cache->nr_running);
> +			ns->compute_capacity = READ_ONCE(cache->capacity);

So READ_ONCE()/WRITE_ONCE() don't solve the issue I was highlighting
in the last version. Say the following happens:

  CPU0                                            CPU1
  ====                                            ====

  update_numa_stats()
    /* Working on current numa_stats_cache */
    ns->load = READ_ONCE(cache->load);
    ns->runnable = READ_ONCE(cache->runnable);
    ... interrupted                               update_sg_lb_stats()
                                                    ...
                                                    ... updates the entire numa_stats_cache
    ...
    ns->util = READ_ONCE(cache->util); /* Sees new data. */

Can this cause an issue? If not, please highlight in the commit log
why it is not an issue. There can be cases where we see
util > capacity, util > runnable, etc. which might lead to incorrect
calculations later on.
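
For illustration only, and not something proposed in this thread: one
common way to detect such a torn snapshot is a sequence counter that
the reader checks before and after copying the fields. The sketch
below is plain userspace C11, not kernel code; struct stats_cache,
cache_init(), cache_write() and cache_try_read() are hypothetical
names, and the writer is assumed to be serialized by its caller. In
the kernel, the seqcount_t helpers (read_seqcount_begin() and
read_seqcount_retry()) serve the same purpose.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical cache entry: a sequence counter guards the statistics. */
struct stats_cache {
	atomic_uint seq;	/* odd while an update is in progress */
	atomic_ulong load;
	atomic_ulong runnable;
	atomic_ulong util;
};

/* Plain copy of the statistics handed back to the reader. */
struct stats_snapshot {
	unsigned long load;
	unsigned long runnable;
	unsigned long util;
};

static void cache_init(struct stats_cache *c)
{
	atomic_init(&c->seq, 0);
	atomic_init(&c->load, 0);
	atomic_init(&c->runnable, 0);
	atomic_init(&c->util, 0);
}

/* Writer side; assumes concurrent writers are serialized by the caller. */
static void cache_write(struct stats_cache *c, unsigned long load,
			unsigned long runnable, unsigned long util)
{
	unsigned int seq = atomic_load_explicit(&c->seq, memory_order_relaxed);

	/* Make the counter odd before touching the data. */
	atomic_store_explicit(&c->seq, seq + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);

	atomic_store_explicit(&c->load, load, memory_order_relaxed);
	atomic_store_explicit(&c->runnable, runnable, memory_order_relaxed);
	atomic_store_explicit(&c->util, util, memory_order_relaxed);

	/* Make the counter even again; publishes the completed update. */
	atomic_store_explicit(&c->seq, seq + 2, memory_order_release);
}

/*
 * One read attempt. Returns false when a writer was active or the
 * counter changed while copying, i.e. the snapshot may be torn.
 */
static bool cache_try_read(struct stats_cache *c, struct stats_snapshot *out)
{
	unsigned int s1 = atomic_load_explicit(&c->seq, memory_order_acquire);

	if (s1 & 1)
		return false;

	out->load = atomic_load_explicit(&c->load, memory_order_relaxed);
	out->runnable = atomic_load_explicit(&c->runnable, memory_order_relaxed);
	out->util = atomic_load_explicit(&c->util, memory_order_relaxed);

	atomic_thread_fence(memory_order_acquire);
	return atomic_load_explicit(&c->seq, memory_order_relaxed) == s1;
}
```

A reader that gets false from cache_try_read() can either retry or
fall back to the full per-CPU scan, so it never mixes fields from two
different updates.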
> +			goto skip_scan;
> +		}
> +	}
> +
>  	for_each_cpu(cpu, cpumask_of_node(nid)) {
>  		struct rq *rq = cpu_rq(cpu);
>
> @@ -2124,6 +2153,8 @@ static void update_numa_stats(struct task_numa_env *env,
>  			idle_core = numa_idle_core(idle_core, cpu);
>  		}
>  	}
> +
> +skip_scan:
>  	rcu_read_unlock();
>
>  	ns->weight = cpumask_weight(cpumask_of_node(nid));
> @@ -10488,6 +10519,19 @@ static inline void update_sg_lb_stats(struct lb_env *env,
>  	if (sgs->group_type == group_overloaded)
>  		sgs->avg_load = (sgs->group_load * SCHED_CAPACITY_SCALE) /
>  				sgs->group_capacity;
> +
> +	/* Algorithmic Optimization: Cache node stats for O(1) NUMA lookups */
> +	if (env->sd->flags & SD_NUMA) {

Also you'll need to think about partitions.

-- 
Thanks and Regards,
Prateek