From: Nico Pache
To: linux-mm@kvack.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org
Cc: david@redhat.com, ziy@nvidia.com, baolin.wang@linux.alibaba.com,
	lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com,
	ryan.roberts@arm.com, dev.jain@arm.com, corbet@lwn.net,
	rostedt@goodmis.org, mhiramat@kernel.org,
	mathieu.desnoyers@efficios.com, akpm@linux-foundation.org,
	baohua@kernel.org, willy@infradead.org, peterx@redhat.com,
	wangkefeng.wang@huawei.com, usamaarif642@gmail.com,
	sunnanyong@huawei.com, vishal.moola@gmail.com,
	thomas.hellstrom@linux.intel.com, yang@os.amperecomputing.com,
	kirill.shutemov@linux.intel.com, aarcange@redhat.com,
	raquini@redhat.com, anshuman.khandual@arm.com,
	catalin.marinas@arm.com, tiwai@suse.de, will@kernel.org,
	dave.hansen@linux.intel.com, jack@suse.cz, cl@gentwo.org,
	jglisse@google.com, surenb@google.com, zokeefe@google.com,
	hannes@cmpxchg.org, rientjes@google.com, mhocko@suse.com,
	rdunlap@infradead.org
Subject: [PATCH v8 06/15] khugepaged: introduce khugepaged_scan_bitmap for mTHP support
Date: Tue, 1 Jul 2025 23:57:33 -0600
Message-ID: <20250702055742.102808-7-npache@redhat.com>
In-Reply-To: <20250702055742.102808-1-npache@redhat.com>
References: <20250702055742.102808-1-npache@redhat.com>

khugepaged scans anon PMD ranges for potential collapse to a hugepage.
To add mTHP support, we use this scan to instead record chunks of utilized
sections of the PMD.

khugepaged_scan_bitmap uses a stack struct to recursively scan a bitmap
that represents chunks of utilized regions.
We can then determine what mTHP size fits best and, in the following
patch, we set this bitmap while scanning the anon PMD.

A minimum collapse order of 2 is used as this is the lowest order
supported by anon memory.

max_ptes_none is used as a scale to determine how "full" an order must be
before being considered for collapse.

When attempting to collapse an order that is set to "always", let's
always collapse to that order in a greedy manner without considering the
number of bits set.

Signed-off-by: Nico Pache
---
 include/linux/khugepaged.h |  4 ++
 mm/khugepaged.c            | 96 ++++++++++++++++++++++++++++++++++----
 2 files changed, 90 insertions(+), 10 deletions(-)

diff --git a/include/linux/khugepaged.h b/include/linux/khugepaged.h
index ff6120463745..0f957711a117 100644
--- a/include/linux/khugepaged.h
+++ b/include/linux/khugepaged.h
@@ -1,6 +1,10 @@
 /* SPDX-License-Identifier: GPL-2.0 */
 #ifndef _LINUX_KHUGEPAGED_H
 #define _LINUX_KHUGEPAGED_H
+#define KHUGEPAGED_MIN_MTHP_ORDER	2
+#define KHUGEPAGED_MIN_MTHP_NR	(1<mthp_bitmap_stack[++top] = (struct scan_bit_state)
+		{ HPAGE_PMD_ORDER - KHUGEPAGED_MIN_MTHP_ORDER, 0 };
+
+	while (top >= 0) {
+		state = cc->mthp_bitmap_stack[top--];
+		order = state.order + KHUGEPAGED_MIN_MTHP_ORDER;
+		offset = state.offset;
+		num_chunks = 1 << (state.order);
+		// Skip mTHP orders that are not enabled
+		if (!test_bit(order, &enabled_orders))
+			goto next;
+
+		// copy the relevant section to a new bitmap
+		bitmap_shift_right(cc->mthp_bitmap_temp, cc->mthp_bitmap, offset,
+				  MTHP_BITMAP_SIZE);
+
+		bits_set = bitmap_weight(cc->mthp_bitmap_temp, num_chunks);
+		threshold_bits = (HPAGE_PMD_NR - khugepaged_max_ptes_none - 1)
+				>> (HPAGE_PMD_ORDER - state.order);
+
+		// Check if the region is "almost full" based on the threshold
+		if (bits_set > threshold_bits || is_pmd_only
+			|| test_bit(order, &huge_anon_orders_always)) {
+			ret = collapse_huge_page(mm, address, referenced, unmapped, cc,
+					mmap_locked, order, offset * KHUGEPAGED_MIN_MTHP_NR);
+			if (ret == SCAN_SUCCEED) {
+				collapsed += (1 << order);
+				continue;
+			}
+		}
+
+next:
+		if (state.order > 0) {
+			next_order = state.order - 1;
+			mid_offset = offset + (num_chunks / 2);
+			cc->mthp_bitmap_stack[++top] = (struct scan_bit_state)
+				{ next_order, mid_offset };
+			cc->mthp_bitmap_stack[++top] = (struct scan_bit_state)
+				{ next_order, offset };
+		}
+	}
+	return collapsed;
+}
+
 static int khugepaged_scan_pmd(struct mm_struct *mm,
 				   struct vm_area_struct *vma,
 				   unsigned long address, bool *mmap_locked,
@@ -1435,9 +1513,7 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
 	pte_unmap_unlock(pte, ptl);
 	if (result == SCAN_SUCCEED) {
 		result = collapse_huge_page(mm, address, referenced,
-					    unmapped, cc);
-		/* collapse_huge_page will return with the mmap_lock released */
-		*mmap_locked = false;
+					    unmapped, cc, mmap_locked, HPAGE_PMD_ORDER, 0);
 	}
 out:
 	trace_mm_khugepaged_scan_pmd(mm, folio, writable, referenced,
@@ -2385,7 +2461,7 @@ static int khugepaged_collapse_single_pmd(unsigned long addr,
 	int result = SCAN_FAIL;
 	struct mm_struct *mm = vma->vm_mm;
 
-	if (IS_ENABLED(CONFIG_SHMEM) && !vma_is_anonymous(vma)) {
+	if (!vma_is_anonymous(vma)) {
 		struct file *file = get_file(vma->vm_file);
 		pgoff_t pgoff = linear_page_index(vma, addr);
 
-- 
2.49.0
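
For readers following the traversal in khugepaged_scan_bitmap, the loop
can be approximated in user space. What follows is a minimal sketch, not
the kernel code. It assumes 4K base pages with a PMD order of 9, that each
bitmap bit covers 1 << 2 pages, that every order is enabled, and that each
attempted collapse succeeds. The "always" and is_pmd_only shortcuts are
left out. The names scan_chunks, count_set and struct collapse are
hypothetical and exist only for this illustration; a plain bool array
stands in for the kernel bitmap.

```c
#include <assert.h>
#include <stdbool.h>

#define PMD_ORDER   9                               /* assumed: 4K pages, 2M PMD */
#define MIN_ORDER   2                               /* lowest anon mTHP order */
#define PMD_NR      (1 << PMD_ORDER)                /* pages per PMD */
#define NR_CHUNKS   (1 << (PMD_ORDER - MIN_ORDER))  /* one flag per 4-page chunk */
#define STACK_DEPTH (PMD_ORDER - MIN_ORDER + 1)     /* deepest the DFS stack gets */

struct scan_state { int order; int offset; };       /* order is relative to MIN_ORDER */
struct collapse   { int order; int page_offset; };  /* hypothetical result record */

/* Count the utilized chunks in [offset, offset + n). */
static int count_set(const bool *chunks, int offset, int n)
{
	int i, set = 0;

	for (i = 0; i < n; i++)
		set += chunks[offset + i] ? 1 : 0;
	return set;
}

/*
 * Walk the chunk array from the largest order down. A region that is
 * full enough is recorded and not split further; otherwise its two
 * halves are pushed, with the lower half on top so it is visited first.
 * Returns the number of records written to out.
 */
static int scan_chunks(const bool *chunks, int max_ptes_none,
		       struct collapse *out, int out_cap)
{
	struct scan_state stack[STACK_DEPTH];
	int top = -1, n_out = 0;

	stack[++top] = (struct scan_state){ PMD_ORDER - MIN_ORDER, 0 };

	while (top >= 0) {
		struct scan_state s = stack[top--];
		int num_chunks = 1 << s.order;
		int bits_set = count_set(chunks, s.offset, num_chunks);
		/* Scale the PMD-wide threshold down to this region's size. */
		int threshold = (PMD_NR - max_ptes_none - 1)
				>> (PMD_ORDER - s.order);

		if (bits_set > threshold) {
			if (n_out < out_cap)
				out[n_out++] = (struct collapse){
					s.order + MIN_ORDER,
					s.offset * (1 << MIN_ORDER) };
			continue;
		}

		if (s.order > 0) {
			int mid = s.offset + num_chunks / 2;

			assert(top + 2 < STACK_DEPTH);
			stack[++top] = (struct scan_state){ s.order - 1, mid };
			stack[++top] = (struct scan_state){ s.order - 1, s.offset };
		}
	}
	return n_out;
}
```

With chunks 0 through 64 marked utilized and max_ptes_none set to 0, the
sketch records an order-8 region at page offset 0 and an order-2 region at
page offset 256. With max_ptes_none set to 511 the threshold is 0 at every
level, so the same input yields a single order-9 region.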