From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 9 Jan 2025 11:57:32 +0530
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Subject: Re: [RFC 00/11] khugepaged: mTHP support
To: Nico Pache, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: ryan.roberts@arm.com, anshuman.khandual@arm.com,
 catalin.marinas@arm.com, cl@gentwo.org, vbabka@suse.cz, mhocko@suse.com,
 apopple@nvidia.com, dave.hansen@linux.intel.com, will@kernel.org,
 baohua@kernel.org, jack@suse.cz, srivatsa@csail.mit.edu,
 haowenchao22@gmail.com, hughd@google.com, aneesh.kumar@kernel.org,
 yang@os.amperecomputing.com, peterx@redhat.com, ioworker0@gmail.com,
 wangkefeng.wang@huawei.com, ziy@nvidia.com, jglisse@google.com,
 surenb@google.com, vishal.moola@gmail.com, zokeefe@google.com,
 zhengqi.arch@bytedance.com, jhubbard@nvidia.com, 21cnbao@gmail.com,
 willy@infradead.org, kirill.shutemov@linux.intel.com, david@redhat.com,
 aarcange@redhat.com, raquini@redhat.com, sunnanyong@huawei.com,
 usamaarif642@gmail.com, audra@redhat.com, akpm@linux-foundation.org
References: <20250108233128.14484-1-npache@redhat.com>
Content-Language: en-US
From: Dev Jain
In-Reply-To: <20250108233128.14484-1-npache@redhat.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit

On 09/01/25 5:01 am, Nico Pache wrote:
> The following series provides khugepaged and madvise collapse with the
> capability to collapse regions to mTHPs.
>
> To achieve this we generalize the khugepaged functions to no longer depend
> on PMD_ORDER. Then during the PMD scan, we keep track of chunks of pages
> (defined by MTHP_MIN_ORDER) that are fully utilized. This info is tracked
> using a bitmap. After the PMD scan is done, we do binary recursion on the
> bitmap to find the optimal mTHP sizes for the PMD range. The restriction
> on max_ptes_none is removed during the scan, to make sure we account for
> the whole PMD range. max_ptes_none is mapped to a 0-100 range to
> determine how full an mTHP order needs to be before collapsing it.
>
> Some design choices to note:
> - Bitmap structures are allocated dynamically because on some arches
>   (like PowerPC) the value of MTHP_BITMAP_SIZE cannot be computed at
>   compile time, leading to warnings.
> - The recursion is masked through a stack structure.
> - A MTHP_MIN_ORDER was added to compress the bitmap, and ensure it is
>   64 bits on x86. This provides some optimization on the bitmap
>   operations. If other arches/configs with more than 512 PTEs per PMD
>   want to compress their bitmap further, we can change this value per
>   arch.
>
> Patch 1-2: Some refactoring to combine madvise_collapse and khugepaged
> Patch 3: A minor "fix"/optimization
> Patch 4: Refactor/rename hpage_collapse
> Patch 5-7: Generalize khugepaged functions for arbitrary orders
> Patch 8-11: The mTHP patches
>
> This series acts as an alternative to Dev Jain's approach [1]. The two
> series differ in a few ways:
> - My approach uses a bitmap to store the state of the linear scan_pmd to
>   then determine potential mTHP batches. Dev incorporates his directly
>   into the scan, and will try each available order.
> - Dev is attempting to optimize the locking, while my approach keeps the
>   locking changes to a minimum. I believe his changes are not safe for
>   uffd.
> - Dev's changes only work for khugepaged, not madvise_collapse (although
>   I think that was by choice and it could easily support madvise).
> - Dev scales all khugepaged sysfs tunables by order, while I'm removing
>   the restriction of max_ptes_none and converting it to a scale to
>   determine a (m)THP threshold.
> - Dev turns on khugepaged if any order is available, while mine still
>   only runs if PMDs are enabled. I like Dev's approach and will most
>   likely do the same in my PATCH posting.
> - mTHPs need their ref count updated to 1<
>
> Patch 11 was inspired by one of Dev's changes.
>
> [1] https://lore.kernel.org/lkml/20241216165105.56185-1-dev.jain@arm.com/
>
> Nico Pache (11):
>   introduce khugepaged_collapse_single_pmd to collapse a single pmd
>   khugepaged: refactor madvise_collapse and khugepaged_scan_mm_slot
>   khugepaged: Don't allocate khugepaged mm_slot early
>   khugepaged: rename hpage_collapse_* to khugepaged_*
>   khugepaged: generalize hugepage_vma_revalidate for mTHP support
>   khugepaged: generalize alloc_charge_folio for mTHP support
>   khugepaged: generalize __collapse_huge_page_* for mTHP support
>   khugepaged: introduce khugepaged_scan_bitmap for mTHP support
>   khugepaged: add mTHP support
>   khugepaged: remove max_ptes_none restriction on the pmd scan
>   khugepaged: skip collapsing mTHP to smaller orders
>
>  include/linux/khugepaged.h |   4 +-
>  mm/huge_memory.c           |   3 +-
>  mm/khugepaged.c            | 436 +++++++++++++++++++++++++------------
>  3 files changed, 306 insertions(+), 137 deletions(-)
>
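
For readers following the quoted description, the bitmap scan can be
pictured with a small user-space sketch. This is not code from the series.
It assumes 512 PTEs per PMD and a minimum chunk order of 3, so that one
bit covers 8 PTEs and the bitmap is 64 bits wide, which matches the
"64bit on x86" remark. All names here (struct region, scan_bitmap,
threshold_from_max_ptes_none, the SKETCH_* macros) are hypothetical, and
the linear mapping of max_ptes_none onto a 0-100 threshold is only one
plausible reading of the cover letter. Checks for which orders are enabled
are left out.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PTES_PER_PMD 512u  /* assumption: x86-64, 4K base pages */
#define SKETCH_MIN_ORDER    3u    /* assumption: one bit covers 8 PTEs */
#define SKETCH_NR_CHUNKS    (SKETCH_PTES_PER_PMD >> SKETCH_MIN_ORDER) /* 64 */
#define SKETCH_TOP_ORDER    6u    /* 2^6 chunks span the whole PMD range */

/* A candidate range, measured in chunks (bitmap bits), not in PTEs. */
struct region {
	unsigned int offset; /* first chunk of the range */
	unsigned int order;  /* range covers 1 << order chunks */
};

/* Hypothetical mapping: max_ptes_none 0 -> 100% full, 511 -> 0%. */
static unsigned int threshold_from_max_ptes_none(unsigned int max_ptes_none)
{
	return 100u - (max_ptes_none * 100u) / (SKETCH_PTES_PER_PMD - 1u);
}

/* Count the set bits in [offset, offset + nbits) of the bitmap. */
static unsigned int count_set(uint64_t bitmap, unsigned int offset,
			      unsigned int nbits)
{
	unsigned int i, n = 0;

	for (i = 0; i < nbits; i++)
		n += (unsigned int)((bitmap >> (offset + i)) & 1u);
	return n;
}

/*
 * Binary "recursion" driven by an explicit stack: start with the whole
 * PMD range, accept a range once it is full enough, otherwise split it
 * into two halves. Returns the number of ranges written to out[].
 */
static unsigned int scan_bitmap(uint64_t bitmap, unsigned int threshold_pct,
				struct region *out, unsigned int out_max)
{
	struct region stack[SKETCH_NR_CHUNKS];
	unsigned int top = 0, found = 0;

	stack[top++] = (struct region){ 0u, SKETCH_TOP_ORDER };
	while (top) {
		struct region r = stack[--top];
		unsigned int nbits = 1u << r.order;
		unsigned int set = count_set(bitmap, r.offset, nbits);

		if (!set)
			continue; /* nothing utilized in this range */
		if (set * 100u >= threshold_pct * nbits) {
			if (found < out_max)
				out[found++] = r; /* would be collapsed */
			continue;
		}
		if (r.order == 0)
			continue;
		assert(top + 2 <= SKETCH_NR_CHUNKS);
		/* Push the right half first so the left half is visited first. */
		stack[top++] = (struct region){ r.offset + nbits / 2, r.order - 1 };
		stack[top++] = (struct region){ r.offset, r.order - 1 };
	}
	return found;
}
```

In this sketch the PTE order of an accepted range is r.order +
SKETCH_MIN_ORDER, so a range with chunk order 4 corresponds to an order-7
(128-page) candidate. A bitmap with only the first 16 bits set and a 100%
threshold yields exactly that one range. Lowering the threshold to 50%
accepts the enclosing order-8 range instead.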