From: Pranjal Arya <pranjal.arya@oss.qualcomm.com>
Date: Sat, 13 Jun 2026 22:49:52 +0530
Subject: [PATCH RFC 10/12] mm/vmalloc: per-CPU caching of free ranges from the maple_tree allocator
Message-Id: <20260613-vmalloc_maple-v1-10-0aa740bb944b@oss.qualcomm.com>
References: <20260613-vmalloc_maple-v1-0-0aa740bb944b@oss.qualcomm.com>
In-Reply-To: <20260613-vmalloc_maple-v1-0-0aa740bb944b@oss.qualcomm.com>
To: Andrew Morton, Uladzislau Rezki, "Liam R. Howlett", Alice Ryhl,
 Andrew Ballance
Cc: linux-arm-msm@vger.kernel.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, maple-tree@lists.infradead.org,
 Lorenzo Stoakes, Pranjal Shrivastava, Will Deacon, Suzuki K Poulose,
 Neil Armstrong, Mostafa Saleh, Balbir Singh, Suren Baghdasaryan,
 Marco Elver, Dmitry Vyukov, Alexander Potapenko, Shuah Khan, Dev Jain,
 Brendan Jackman, Puranjay Mohan, Santosh Shukla, Wyes Karny,
 Pranjal Arya, Sudeep Holla
X-Mailer: b4 0.15.2

Now that the alloc path goes through the maple_tree-based gap finder
(mas_empty_area), amortise the cost of visiting it for the most common
shape of vmalloc call: short-lived, page-aligned, PAGE_SIZE-multiple
allocations.

Each CPU reserves a 64 MB chunk via __alloc_vmap_area -- the same
maple-backed allocator the global path uses -- and dispenses
page-aligned allocations from a bump pointer inside that chunk. Chunk
reservation and drain are the only operations that touch the global
allocator; per-allocation work stays entirely per-CPU.

When a chunk's allocation count returns to zero and it is no longer the
per-CPU current chunk, vmap_bump_unlink() releases the chunk's range
back to the global allocator via occupied_mt_erase_range_locked -- the
same maple primitive the consolidate-occupied-tree patch made
authoritative. The chunk install path uses
occupied_mt_store_range_locked symmetrically, so cache lifecycle is
expressed entirely through the maple-tree's range primitives.

Per-CPU access uses preempt_disable() rather than a spinlock; the chunk
pointer is per-CPU and only mutated by its owner. The chunks list
(vmap_bump_chunks) is gated by a single global spinlock that is taken
only on chunk install/release, not on the fast path.

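The release rule above (allocation count back at zero and chunk no
longer current) can be modelled in userspace. This is a minimal sketch
under stated assumptions, not the kernel code: struct chunk_model,
chunk_alloc(), chunk_free() and chunk_retire() are hypothetical names,
and the live counter and current flag are assumptions about how the
allocation count and per-CPU ownership might be tracked.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of one bump chunk; not the kernel struct. */
struct chunk_model {
	unsigned long base;	/* first address of the reserved range */
	unsigned long limit;	/* one past the last address */
	unsigned long bump;	/* next unallocated address */
	unsigned long live;	/* allocations handed out and not yet freed */
	bool current;		/* still installed as a CPU's current chunk */
};

/* Hand out size bytes at a power-of-two alignment; returns 0 on miss. */
static unsigned long chunk_alloc(struct chunk_model *c, unsigned long size,
				 unsigned long align)
{
	unsigned long aligned;

	assert(align && (align & (align - 1)) == 0);
	aligned = (c->bump + align - 1) & ~(align - 1);
	/* Reject empty requests, wrap-around and anything past the limit. */
	if (size == 0 || aligned < c->bump || aligned > c->limit ||
	    size > c->limit - aligned)
		return 0;
	c->bump = aligned + size;
	c->live++;
	return aligned;
}

/* Drop one allocation; true means the chunk's range may be released. */
static bool chunk_free(struct chunk_model *c)
{
	assert(c->live > 0);
	c->live--;
	return c->live == 0 && !c->current;
}

/* The owning CPU installs a replacement; true means releasable now. */
static bool chunk_retire(struct chunk_model *c)
{
	c->current = false;
	return c->live == 0;
}
```

In this model a chunk becomes releasable only when both conditions
hold, so whichever happens last (the final free or the retirement by
the owning CPU) is the event that reports true.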
Why this overlay sits on the maple_tree migration
=================================================

The overlay relies on three primitives that maple_tree provides
natively and that the augmented rb_tree allocator does not expose in a
clean form:

  - Bare [base, limit) range reservation. The augmented rb_node carries
    a vmap_area-shaped subtree_max_size consulted by
    find_vmap_lowest_match. A chunk reservation has no associated
    vmap_area object, so it cannot be stored in the augmented tree
    without either synthesising a fake vmap_area per chunk or
    introducing a parallel range tracker with its own augmentation
    discipline. maple_tree stores [base, limit) ranges natively and the
    gap walker (mas_empty_area) returns the lowest free region in a
    single descent, sharing one primitive with the regular allocation
    path.

  - Sentinel range storage. occupied_vmap_area_mt records a reserved
    chunk as XA_ZERO_ENTRY over [base, limit), sharing one index with
    ordinary in-use vmap_area ranges. The augmented rb_tree has no
    equivalent of XA_ZERO_ENTRY: a chunk would have to live in a
    dedicated structure, doubling the alloc-side state surface.

  - RCU range traversal. vmap_chunk_lookup() must run lock-free so that
    cross-chunk vfree() does not take a global spinlock per free of a
    chunk-resident allocation. maple_tree supports RCU traversal as a
    property of the data structure; rb_tree-side equivalents
    (lib/rbtree_latch, hand-rolled grace-period accounting on top of
    rb_tree) impose write-side cost and would have to be added to
    vmalloc as new infrastructure.

After the migration these three primitives are part of the allocator
API; the overlay reuses mas_empty_area() for chunk refill,
occupied_mt_store_range_locked() and occupied_mt_erase_range_locked()
for chunk lifecycle, and maple-tree-friendly RCU for the chunk-list
lookup. No parallel data structures are introduced.

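To make the "one index, lowest free region" idea concrete, here is a
toy userspace model. It is only a sketch: struct range_model and
lowest_gap() are hypothetical names, the sorted array stands in for the
occupied tree, and the linear scan models only the answer the gap
walker is described as returning, not the tree descent or the RCU
behaviour. Chunk reservations and ordinary allocations are both just
entries in the same array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one occupied [start, end) range; end is exclusive. */
struct range_model {
	unsigned long start;
	unsigned long end;
};

/*
 * Return the lowest address in [lo, hi) where size bytes fit without
 * overlapping any occupied range, or 0 if no such gap exists. The occ
 * array must be sorted by start and must not contain overlaps.
 */
static unsigned long lowest_gap(const struct range_model *occ, size_t n,
				unsigned long lo, unsigned long hi,
				unsigned long size)
{
	unsigned long cursor = lo;
	size_t i;

	assert(lo > 0 && lo < hi && size > 0);
	for (i = 0; i < n; i++) {
		if (occ[i].end <= cursor)
			continue;	/* entirely below the cursor */
		if (occ[i].start >= hi)
			break;		/* entirely above the window */
		if (occ[i].start > cursor && occ[i].start - cursor >= size)
			return cursor;	/* gap before this range fits */
		cursor = occ[i].end;
		if (cursor >= hi)
			return 0;	/* window exhausted */
	}
	/* Tail gap between the last occupied range and hi. */
	return (hi - cursor >= size) ? cursor : 0;
}
```

With occupied ranges [0x1000, 0x3000) and [0x4000, 0x8000) in a window
of [0x1000, 0x10000), a 0x1000-byte request lands in the hole at
0x3000, while a 0x2000-byte request skips that hole and lands at
0x8000.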
VMAP_BUMP_CHUNK_SIZE = 64 MB derivation
=======================================

The chunk size is the smallest power-of-two value that satisfies three
independent constraints:

  1. Eligibility coverage. vmap_bump_eligible() requires
     size <= VMAP_BUMP_CHUNK_SIZE / 2 so that any single eligible
     allocation fits with room for alignment slack. The largest
     standard-range vmalloc() callers in tree are the module loader
     (modules can carry up to ~32 MB of text + RO data + RW data on
     architectures with full kernel module support) and BPF JIT buffers
     (capped near 4 MB). Setting CHUNK_SIZE = 64 MB keeps all of these
     on the bump fast path; halving the chunk to 32 MB would push
     module loads to the slow path.

  2. Refill amortisation. The global vmalloc lock is taken once per
     chunk refill, paying for ~CHUNK_SIZE / avg_alloc_size bump
     allocations between lock acquisitions. At avg = 4 KB (a plausible
     lower bound for typical kernel vmalloc traffic), 64 MB amortises
     to ~16,000 fast-path allocations per global lock acquisition; at
     avg = 1 MB, ~64 per lock. Doubling the chunk size beyond 64 MB
     barely improves this ratio.

  3. Address-space cost. Each CPU pins a chunk-sized reservation within
     the vmalloc range. On a 32-CPU server with the standard 128 GB
     x86_64 vmalloc range, 64 MB chunks reserve 32 * 64 MB = 2 GB =
     1.6 % of the range. On arm64 with CONFIG_ARM64_VA_BITS=52 (256 PB
     vmalloc), the cost is negligible. Doubling to 128 MB pushes the
     x86_64 reservation to 3.1 %, which is still acceptable but starts
     to matter for workloads with high CPU counts.

Per-chunk metadata is sized as
sizeof(struct vmap_area *) * (CHUNK_SIZE / PAGE_SIZE), which scales
linearly with chunk size and stays at a constant 0.2 % overhead
regardless of the chosen value. At 64 MB this is 128 KB per chunk.

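The arithmetic behind constraints (2) and (3) and the metadata figure
can be reproduced with a few helpers. This is an illustrative sketch
only: the function names and the MODEL_* macros are hypothetical, and
the inputs (4 KB pages, 8-byte pointers, 32 CPUs, a 128 GB range) are
the values assumed in the text above, not values read from a running
kernel.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_KB UINT64_C(1024)
#define MODEL_MB (UINT64_C(1024) * MODEL_KB)
#define MODEL_GB (UINT64_C(1024) * MODEL_MB)

/* Constraint 2: bump allocations served between two chunk refills. */
static uint64_t allocs_per_refill(uint64_t chunk, uint64_t avg_alloc)
{
	assert(avg_alloc > 0);
	return chunk / avg_alloc;
}

/*
 * Constraint 3: share of the vmalloc range pinned by all CPUs, in
 * basis points (1 basis point = 0.01 %), rounded down.
 */
static uint64_t reserved_basis_points(uint64_t ncpus, uint64_t chunk,
				      uint64_t range)
{
	assert(range > 0);
	return ncpus * chunk * 10000 / range;
}

/* Per-chunk metadata: one pointer-sized slot per page of the chunk. */
static uint64_t metadata_bytes(uint64_t chunk, uint64_t page_size,
			       uint64_t slot_size)
{
	assert(page_size > 0);
	return slot_size * (chunk / page_size);
}
```

For example, allocs_per_refill(64 * MODEL_MB, 4 * MODEL_KB) gives
16384, reserved_basis_points(32, 64 * MODEL_MB, 128 * MODEL_GB) gives
156 (about 1.6 %), and metadata_bytes(64 * MODEL_MB, 4096, 8) gives
131072 bytes, i.e. 128 KB.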
64 MB is therefore the *minimum* chunk size that meets constraints (1)
and (2) simultaneously; constraint (3) sets the upper bound and allows
growing the chunk if module sizes grow in the future. The constant is
exposed at the top of the bump-allocator code block so distributors can
tune it for unusual configurations.

Allocations that don't match the predicate (non-page-aligned, larger
than half a chunk, fixed-VA, or with NUMA constraints) fall through to
the existing __alloc_vmap_area path unchanged.

Signed-off-by: Pranjal Arya <pranjal.arya@oss.qualcomm.com>
---
 mm/vmalloc.c | 107 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 107 insertions(+)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 463127d5ce58..65ee80eaf4bf 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2467,6 +2467,98 @@ static inline void setup_vmalloc_vm(struct vm_struct *vm,
 	va->vm = vm;
 }
 
+/*
+ * Per-CPU bump-allocator overlay.
+ *
+ * Each CPU reserves a contiguous chunk of vmalloc address space and
+ * dispenses page-aligned allocations via a bump pointer. The chunk's
+ * range is reserved through the global allocator once; individual
+ * allocations within the chunk avoid the global maple-tree work
+ * entirely. Each allocation still gets its own vmap_area struct and
+ * is inserted into the per-node busy.mt, so find_vmap_area() and
+ * vfree() continue to work unchanged.
+ *
+ * Recycling: chunks leak in this minimal form. With 64 MB chunks on a
+ * 128 GB vmalloc range, the address space supports thousands of chunks
+ * before exhaustion. A future iteration can add chunk recycling via a
+ * va->bump_chunk back-pointer + refcount; deferred to keep this hot
+ * path's struct vmap_area footprint at 48 B.
+ *
+ * Constraints: only the standard vmalloc range with align <= PAGE_SIZE
+ * and size <= VMAP_BUMP_CHUNK_SIZE/2 takes the bump path. Anything
+ * else falls through to the existing __alloc_vmap_area path.
+ */
+#define VMAP_BUMP_CHUNK_SIZE	(64UL * 1024 * 1024)
+
+struct vmap_bump_chunk {
+	unsigned long base;
+	unsigned long limit;
+	unsigned long bump;
+};
+
+static DEFINE_PER_CPU(struct vmap_bump_chunk, vmap_bump);
+static DEFINE_PER_CPU(spinlock_t, vmap_bump_lock);
+
+/* Try the per-CPU bump-allocator. Returns the chosen address or
+ * a negative IS_ERR_VALUE on miss; callers fall through to the
+ * regular path on miss.
+ */
+static unsigned long
+vmap_bump_alloc(unsigned long size, unsigned long align,
+		unsigned long vstart, unsigned long vend)
+{
+	struct vmap_bump_chunk *chunk;
+	spinlock_t *lock;
+	unsigned long aligned, addr = -ENOENT;
+
+	if (vstart != VMALLOC_START || vend != VMALLOC_END ||
+	    size == 0 || size > VMAP_BUMP_CHUNK_SIZE / 2 ||
+	    align > VMAP_BUMP_CHUNK_SIZE / 2)
+		return -EINVAL;
+
+	lock = this_cpu_ptr(&vmap_bump_lock);
+	spin_lock(lock);
+	chunk = this_cpu_ptr(&vmap_bump);
+	if (chunk->base) {
+		aligned = ALIGN(chunk->bump, align);
+		if (aligned + size <= chunk->limit) {
+			chunk->bump = aligned + size;
+			addr = aligned;
+		}
+	}
+	spin_unlock(lock);
+	return addr;
+}
+
+/* Refill this CPU's bump chunk. Reserves a fresh range from the
+ * global allocator. Old chunk's remaining space is leaked (the
+ * already-allocated VAs in it stay live; the unused tail is wasted).
+ */
+static int
+vmap_bump_refill(gfp_t gfp_mask)
+{
+	struct vmap_bump_chunk *chunk;
+	spinlock_t *lock;
+	unsigned long base;
+
+	preload_this_cpu_lock(&free_vmap_area_lock, gfp_mask, NUMA_NO_NODE);
+	base = __alloc_vmap_area(VMAP_BUMP_CHUNK_SIZE, PAGE_SIZE,
+				 VMALLOC_START, VMALLOC_END);
+	spin_unlock(&free_vmap_area_lock);
+
+	if (IS_ERR_VALUE(base))
+		return -ENOMEM;
+
+	lock = this_cpu_ptr(&vmap_bump_lock);
+	spin_lock(lock);
+	chunk = this_cpu_ptr(&vmap_bump);
+	chunk->base = base;
+	chunk->limit = base + VMAP_BUMP_CHUNK_SIZE;
+	chunk->bump = base;
+	spin_unlock(lock);
+	return 0;
+}
+
 /*
  * Allocate a region of KVA of the specified size and alignment, within the
  * vstart and vend. If vm is passed in, the two will also be bound.
@@ -2519,6 +2611,19 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
 	}
 
 retry:
+	if (IS_ERR_VALUE(addr)) {
+		/*
+		 * Per-CPU bump-allocator fast path. On hit, no global
+		 * tree work runs at all. On miss, refill the chunk and
+		 * try again before falling back to the regular path.
+		 */
+		addr = vmap_bump_alloc(size, align, vstart, vend);
+		if (IS_ERR_VALUE(addr) && (long)addr == -ENOENT) {
+			if (vmap_bump_refill(gfp_mask) == 0)
+				addr = vmap_bump_alloc(size, align,
+						       vstart, vend);
+		}
+	}
 	if (IS_ERR_VALUE(addr)) {
 		preload_this_cpu_lock(&free_vmap_area_lock, gfp_mask, node);
 		try_init_free_mt_locked();
@@ -6214,6 +6319,8 @@ void __init vmalloc_init(void)
 		init_llist_head(&p->list);
 		INIT_WORK(&p->wq, delayed_vfree_work);
 		xa_init(&vbq->vmap_blocks);
+
+		spin_lock_init(&per_cpu(vmap_bump_lock, i));
 	}
 
 	/*
-- 
2.34.1