From: Sidhartha Kumar
To: linux-kernel@vger.kernel.org, maple-tree@lists.infradead.org
Cc: linux-mm@kvack.org, akpm@linux-foundation.org, liam.howlett@oracle.com, richard.weiyang@gmail.com, Sidhartha Kumar
Subject: [PATCH v3 0/6] Track node vacancy to reduce worst case allocation counts
Date: Thu, 27 Feb 2025 20:48:17 +0000
Message-ID: <20250227204823.758784-1-sidhartha.kumar@oracle.com>

v2[2] -> v3:
- add r-b to patches 1, 4, and 6
- update function parameter comments in patch 2
- remove line that sets mas->depth in patch 2
- remove redundant code for checking for a spanning write in patch 3
- rewrite commit message of patch 5 for additional context and clarity

v1[1] -> v2:
- fix comment for vacant_height which refers to depth per Wei
- add a patch to reorder switch case statements in
  mas_prealloc_calc() and mas_wr_store_entry()
- use sufficient height in spanning stores
- modify patch 2 to use a counter to track ascending the tree rather than
  overloading mas->depth to have this function.

================ overview ========================
Currently, the maple tree preallocates the worst-case number of nodes for a
given store type by taking into account the whole height of the tree. This
comes from a worst-case scenario in which every node in the tree is full,
so node allocation has to propagate upwards until it reaches the root of
the tree. This can be optimized if there are vacancies in nodes at a lower
depth than the root node. This series tracks the level at which there is a
vacant node, so we only need to allocate until that level is reached,
rather than always using the full height of the tree. A field is added to
the ma_wr_state struct that keeps track of the vacant height and is updated
during walks of the tree. This value is then read in mas_prealloc_calc()
when deciding how many nodes to allocate.

For rebalancing and spanning stores, we also need to track the lowest
height at which a node has 1 more entry than the minimum sufficient number
of entries. This is because rebalancing can cause a parent node to become
insufficient, which results in further node allocations. In this case, we
need to use the sufficient height as the worst case rather than the vacant
height.

patch 1-2: preparatory patches
patch 3: implement vacant height tracking + update the tests
patch 4: support vacant height tracking for rebalancing writes
patch 5: implement sufficient height tracking
patch 6: reorder switch case statements

================ results =========================
Bpftrace was used to profile the allocation path for requesting new maple
nodes while running the stress-ng mmap stressor for 120s. The histograms
below represent requests to kmem_cache_alloc_bulk() and show the count
argument.
This represents how many maple nodes the caller requests in each
kmem_cache_alloc_bulk() call.

command: stress-ng --mmap 4 --timeout 120

mm-unstable

@bulk_alloc_req:
[3, 4)          4 |                                                    |
[4, 5)      54170 |@                                                   |
[5, 6)          0 |                                                    |
[6, 7)     893057 |@@@@@@@@@@@@@@@@@@@@                                |
[7, 8)          4 |                                                    |
[8, 9)    2230287 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[9, 10)     55811 |@                                                   |
[10, 11)    77834 |@                                                   |
[11, 12)        0 |                                                    |
[12, 13)  1368684 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                     |
[13, 14)        0 |                                                    |
[14, 15)        0 |                                                    |
[15, 16)   367197 |@@@@@@@@                                            |

@maple_node_total: 46,630,160
@total_vmas: 46184591

mm-unstable + this series

@bulk_alloc_req:
[2, 3)        198 |                                                    |
[3, 4)          4 |                                                    |
[4, 5)         43 |                                                    |
[5, 6)          0 |                                                    |
[6, 7)    1069503 |@@@@@@@@@@@@@@@@@@@@@                               |
[7, 8)          4 |                                                    |
[8, 9)    2597268 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[9, 10)    472191 |@@@@@@@@@                                           |
[10, 11)   191904 |@@@                                                 |
[11, 12)        0 |                                                    |
[12, 13)   247316 |@@@@                                                |
[13, 14)        0 |                                                    |
[14, 15)        0 |                                                    |
[15, 16)    98769 |@                                                   |

@maple_node_total: 37,813,856
@total_vmas: 43493287

This represents a ~19% reduction in the number of bulk maple nodes
allocated (46,630,160 -> 37,813,856).
For more reproducible results, below is a histogram of the return value of
mas_prealloc_calc() while running the maple_tree_tests, which have a
deterministic store pattern.

mas_prealloc_calc() return value, mm-unstable

 1 :                                                    (12068)
 3 :                                                    (11836)
 5 : *****                                              (271192)
 7 : ************************************************** (2329329)
 9 : ***********                                        (534186)
10 :                                                    (435)
11 : ***************                                    (704306)
13 : ********                                           (409781)

mas_prealloc_calc() return value, mm-unstable + this series

 1 :                                                    (12070)
 3 : ************************************************** (3548777)
 5 : ********                                           (633458)
 7 :                                                    (65081)
 9 :                                                    (11224)
10 :                                                    (341)
11 :                                                    (2973)
13 :                                                    (68)

do_mmap latency was also measured to check for regressions:

command: stress-ng --mmap 4 --timeout 120

mm-unstable:
  avg = 7162 nsecs, total: 16101821292 nsecs, count: 2248034

mm-unstable + this series:
  avg = 6689 nsecs, total: 15135391764 nsecs, count: 2262726

[1]: https://lore.kernel.org/lkml/20241114170524.64391-1-sidhartha.kumar@oracle.com/T/
[2]: https://lore.kernel.org/lkml/20250221163610.578409-1-sidhartha.kumar@oracle.com/

Sidhartha Kumar (6):
  maple_tree: convert mas_prealloc_calc() to take in a maple write state
  maple_tree: use height and depth consistently
  maple_tree: use vacant nodes to reduce worst case allocations
  maple_tree: break on convergence in mas_spanning_rebalance()
  maple_tree: add sufficient height
  maple_tree: reorder mas->store_type case statements

 include/linux/maple_tree.h       |   4 +
 lib/maple_tree.c                 | 193 ++++++++++++++++++-------------
 tools/testing/radix-tree/maple.c | 107 +++++++++++++++--
 3 files changed, 218 insertions(+), 86 deletions(-)

-- 
2.43.0