From: Bharata B Rao
Date: Thu, 26 Mar 2026 16:11:57 +0530
Subject: Re: [RFC PATCH v6 4/5] mm: pghot: Precision mode for pghot
X-Mailing-List: linux-kernel@vger.kernel.org
References: <20260323095104.238982-1-bharata@amd.com> <20260323095104.238982-5-bharata@amd.com>
In-Reply-To: <20260323095104.238982-5-bharata@amd.com>

Replies to review from Sashiko
[https://sashiko.dev/#/patchset/20260323095104.238982-1-bharata%40amd.com]

>> diff --git a/mm/Kconfig b/mm/Kconfig
>> index 4aeab6aee535..14383bb1d890 100644
>> --- a/mm/Kconfig
>> +++ b/mm/Kconfig
>> @@ -1485,6 +1485,17 @@ config PGHOT
>>  	  This adds 1 byte of metadata overhead per page in lower-tier
>>  	  memory nodes.
>>
>> +config PGHOT_PRECISE
>> +	bool "Hot page tracking precision mode"
>> +	def_bool n
> This isn't a bug, but having both bool with a prompt and def_bool is
> redundant and can be confusing. Should this use default n instead?

Sure.

>> +	depends on PGHOT
>> +	help
>> +	  Enables precision mode for tracking hot pages with pghot sub-system.
>[ ... ]
>> diff --git a/mm/pghot-precise.c b/mm/pghot-precise.c
>> new file mode 100644
>> index 000000000000..9e8007adfff9
>> --- /dev/null
>> +++ b/mm/pghot-precise.c
>> @@ -0,0 +1,81 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/*
>> + * pghot: Precision mode
>> + *
>> + * 4 byte hotness record per PFN (u32)
>> + * NID, time and frequency tracked as part of the record.
>> + */
>> +
>> +#include
>> +#include
>> +
>> +bool pghot_nid_valid(int nid)
>> +{
>> +	/*
>> +	 * TODO: Add node_online() and node_is_toptier() checks?
>> +	 */
>> +	if (nid != NUMA_NO_NODE && (nid < 0 || nid >= PGHOT_NID_MAX))
>> +		return false;
> Could this validation allow node IDs that exceed the system's actual
> MAX_NUMNODES?

Unlikely, as the nid gets passed down from other sub-systems such as
NUMA Balancing.

> If PGHOT_NID_MAX is 1023 but the system supports fewer nodes, could passing
> a large node ID bypass this check and cause out-of-bounds accesses in NUMA
> node arrays like node_data or node_states?
> Also, is there an off-by-one regression here? PGHOT_NID_MAX is the maximum
> value that fits in the bitfield, so should this check be nid > PGHOT_NID_MAX
> to allow the maximum node ID?

Yes, will adjust this in the next iteration.

>> +
>> +	return true;
>> +}
>> +
>> +unsigned long pghot_access_latency(unsigned long old_time, unsigned long time)
>> +{
>> +	return jiffies_to_msecs((time - old_time) & PGHOT_TIME_MASK);
>> +}
> Does this timestamp wraparound allow cold pages to be improperly promoted?
> If a page is accessed exactly at intervals matching the wraparound of the
> 14-bit time mask (e.g., every 16.384 seconds at HZ=1000), the latency would
> evaluate to 0. Would this bypass the reset of the frequency counter and
> falsely trigger migration?
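
For reference, the aliasing case raised in the question can be reproduced
in user space. This is only a minimal sketch, not the patch code:
TIME_BITS, TIME_MASK and masked_latency() are illustrative names, the
14-bit width is taken from the discussion, and the jiffies_to_msecs()
conversion is left out by assuming HZ=1000, so that one jiffy is one
millisecond.

```c
#include <stdint.h>

/* Assumed for illustration: 14-bit timestamp field, HZ=1000. */
#define TIME_BITS 14u
#define TIME_MASK ((1u << TIME_BITS) - 1u)

/*
 * Hypothetical stand-in for pghot_access_latency(): the elapsed time
 * between two truncated timestamps, computed modulo 2^14 jiffies.
 */
static uint32_t masked_latency(uint32_t old_time, uint32_t now)
{
	return (now - old_time) & TIME_MASK;
}
```

With these definitions, masked_latency(100, 100 + 16384) is 0 and
masked_latency(16380, 6) is 10. An interval of exactly one wrap period is
indistinguishable from no elapsed time, while an interval shorter than the
wrap period comes out right even when the counter wraps in between.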

The hotness timestamp is stored in a limited number of bits (14 bits in
precise mode, 5 bucketed bits in default mode), so it can only represent
a finite window of time before wrapping around: approximately 16 seconds
for precise mode and 4 seconds for default mode at HZ=1000.

Latency calculations rely on modular subtraction of these truncated
timestamps. This produces the correct elapsed time as long as the true
interval between two events is shorter than the wrap period. When the
interval exceeds the wrap period, the computed latency is the true
interval modulo the wrap period, so it underestimates the real gap.

In practice, this should be acceptable because the system has two
independent gates that a page must pass through before migration
actually occurs:

1. Access frequency gate (recording time): pghot_update_record()
   determines whether consecutive accesses to a page fall within the
   same frequency-counting window. A page must be accessed at least
   pghot_freq_threshold times within sysctl_pghot_freq_window (default
   5 seconds) to be marked migration-ready.

2. Recency gate (migration time): kmigrated_should_migrate_memory()
   checks whether the page's last recorded access is recent enough
   relative to the current time before actually initiating migration.

>> +
>> +bool pghot_update_record(phi_t *phi, int nid, unsigned long now)
>> +{
>> +	phi_t freq, old_freq, hotness, old_hotness, old_time;
>> +	phi_t time = now & PGHOT_TIME_MASK;
>> +
>> +	nid = (nid == NUMA_NO_NODE) ? pghot_target_nid : nid;
>> +	old_hotness = READ_ONCE(*phi);
>> +
>> +	do {
>> +		bool new_window = false;
>> +
>> +		hotness = old_hotness;
>> +		old_freq = (hotness >> PGHOT_FREQ_SHIFT) & PGHOT_FREQ_MASK;
>> +		old_time = (hotness >> PGHOT_TIME_SHIFT) & PGHOT_TIME_MASK;
>> +
>> +		if (pghot_access_latency(old_time, time) > sysctl_pghot_freq_window)
>> +			new_window = true;
> Is there a missing upper bound on the sysctl_pghot_freq_window configuration?
> The time is tracked in a 14-bit field, which represents a maximum latency of
> 16,383 jiffies.
> If a user configures a window larger than this maximum
> representable latency, could this condition permanently evaluate to false,
> causing sparse accesses to accumulate over arbitrarily long periods?

I can add an upper bound check.

Regards,
Bharata.
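
One possible shape for such an upper bound check, as a rough user-space
sketch rather than the eventual kernel change: TIME_MASK, MAX_LATENCY_MS
and clamp_freq_window_ms() are hypothetical names, and HZ=1000 is assumed
so that the largest representable latency is 16383 ms. The bound is set
one below that maximum, on the assumption that the window test stays a
strict "latency > window" comparison and must remain satisfiable.

```c
/* Assumed for illustration: 14-bit timestamp field and HZ=1000. */
#define TIME_MASK      ((1u << 14) - 1u)
#define MAX_LATENCY_MS TIME_MASK	/* largest latency the record can report */

/*
 * Hypothetical helper: limit a requested frequency window so that the
 * "latency > window" test can still become true for some latency.
 */
static unsigned int clamp_freq_window_ms(unsigned int requested_ms)
{
	unsigned int max_window_ms = MAX_LATENCY_MS - 1u;

	return requested_ms > max_window_ms ? max_window_ms : requested_ms;
}
```

Here clamp_freq_window_ms(5000) leaves the default window untouched, while
clamp_freq_window_ms(60000) returns 16382. In the kernel the same limit
could instead be enforced when the sysctl is written, rejecting
out-of-range values rather than silently clamping them.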