From: Andrew Morton <akpm@linux-foundation.org>
To: mm-commits@vger.kernel.org,zokeefe@google.com,ziy@nvidia.com,willy@infradead.org,will@kernel.org,wangkefeng.wang@huawei.com,vishal.moola@gmail.com,usamaarif642@gmail.com,tiwai@suse.de,thomas.hellstrom@linux.intel.com,surenb@google.com,sunnanyong@huawei.com,ryan.roberts@arm.com,rostedt@goodmis.org,rientjes@google.com,rdunlap@infradead.org,raquini@redhat.com,peterx@redhat.com,mhocko@suse.com,mhiramat@kernel.org,mathieu.desnoyers@efficios.com,lorenzo.stoakes@oracle.com,liam.howlett@oracle.com,kirill.shutemov@linux.intel.com,jack@suse.cz,hannes@cmpxchg.org,dev.jain@arm.com,david@redhat.com,corbet@lwn.net,cl@gentwo.org,catalin.marinas@arm.com,baolin.wang@linux.alibaba.com,baohua@kernel.org,bagasdotme@gmail.com,anshuman.khandual@arm.com,aarcange@redhat.com,npache@redhat.com,akpm@linux-foundation.org
Subject: + khugepaged-introduce-khugepaged_scan_bitmap-for-mthp-support.patch added to mm-new branch
Date: Thu, 03 Jul 2025 17:59:17 -0700 [thread overview]
Message-ID: <20250704005917.DB315C4CEE3@smtp.kernel.org> (raw)
The patch titled
Subject: khugepaged: introduce khugepaged_scan_bitmap for mTHP support
has been added to the -mm mm-new branch. Its filename is
khugepaged-introduce-khugepaged_scan_bitmap-for-mthp-support.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/khugepaged-introduce-khugepaged_scan_bitmap-for-mthp-support.patch
This patch will later appear in the mm-new branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Note, mm-new is a provisional staging ground for work-in-progress
patches, and acceptance into mm-new is a notification for others take
notice and to finish up reviews. Please do not hesitate to respond to
review feedback and post updated versions to replace or incrementally
fixup patches in mm-new.
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Nico Pache <npache@redhat.com>
Subject: khugepaged: introduce khugepaged_scan_bitmap for mTHP support
Date: Tue, 1 Jul 2025 23:57:33 -0600
khugepaged scans anons PMD ranges for potential collapse to a hugepage.
To add mTHP support we use this scan to instead record chunks of utilized
sections of the PMD.
khugepaged_scan_bitmap uses a stack struct to recursively scan a bitmap
that represents chunks of utilized regions. We can then determine what
mTHP size fits best and in the following patch, we set this bitmap while
scanning the anon PMD. A minimum collapse order of 2 is used as this is
the lowest order supported by anon memory.
max_ptes_none is used as a scale to determine how "full" an order must be
before being considered for collapse.
When attempting to collapse an order that has its order set to "always"
lets always collapse to that order in a greedy manner without considering
the number of bits set.
Link: https://lkml.kernel.org/r/20250702055742.102808-7-npache@redhat.com
Signed-off-by: Nico Pache <npache@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Bagas Sanjaya <bagasdotme@gmail.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Lameter (Ampere) <cl@gentwo.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Kirill A. Shuemov <kirill.shutemov@linux.intel.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Nanyong Sun <sunnanyong@huawei.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Rafael Aquini <raquini@redhat.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Reported-by:Takashi Iwai <tiwai@suse.de>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Thomas Hellstrom <thomas.hellstrom@linux.intel.com>
Cc: Usama Arif <usamaarif642@gmail.com>
Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Cc: Will Deacon <will@kernel.org>
Cc: Zach O'Keefe <zokeefe@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
include/linux/khugepaged.h | 4 +
mm/khugepaged.c | 96 +++++++++++++++++++++++++++++++----
2 files changed, 90 insertions(+), 10 deletions(-)
--- a/include/linux/khugepaged.h~khugepaged-introduce-khugepaged_scan_bitmap-for-mthp-support
+++ a/include/linux/khugepaged.h
@@ -1,6 +1,10 @@
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef _LINUX_KHUGEPAGED_H
#define _LINUX_KHUGEPAGED_H
+#define KHUGEPAGED_MIN_MTHP_ORDER 2
+#define KHUGEPAGED_MIN_MTHP_NR (1<<KHUGEPAGED_MIN_MTHP_ORDER)
+#define MAX_MTHP_BITMAP_SIZE (1 << (ilog2(MAX_PTRS_PER_PTE) - KHUGEPAGED_MIN_MTHP_ORDER))
+#define MTHP_BITMAP_SIZE (1 << (HPAGE_PMD_ORDER - KHUGEPAGED_MIN_MTHP_ORDER))
extern unsigned int khugepaged_max_ptes_none __read_mostly;
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
--- a/mm/khugepaged.c~khugepaged-introduce-khugepaged_scan_bitmap-for-mthp-support
+++ a/mm/khugepaged.c
@@ -94,6 +94,11 @@ static DEFINE_READ_MOSTLY_HASHTABLE(mm_s
static struct kmem_cache *mm_slot_cache __ro_after_init;
+struct scan_bit_state {
+ u8 order;
+ u16 offset;
+};
+
struct collapse_control {
bool is_khugepaged;
@@ -102,6 +107,18 @@ struct collapse_control {
/* nodemask for allocation fallback */
nodemask_t alloc_nmask;
+
+ /*
+ * bitmap used to collapse mTHP sizes.
+ * 1bit = order KHUGEPAGED_MIN_MTHP_ORDER mTHP
+ */
+ DECLARE_BITMAP(mthp_bitmap, MAX_MTHP_BITMAP_SIZE);
+ DECLARE_BITMAP(mthp_bitmap_temp, MAX_MTHP_BITMAP_SIZE);
+ struct scan_bit_state mthp_bitmap_stack[MAX_MTHP_BITMAP_SIZE];
+};
+
+struct collapse_control khugepaged_collapse_control = {
+ .is_khugepaged = true,
};
/**
@@ -838,10 +855,6 @@ static void khugepaged_alloc_sleep(void)
remove_wait_queue(&khugepaged_wait, &wait);
}
-struct collapse_control khugepaged_collapse_control = {
- .is_khugepaged = true,
-};
-
static bool khugepaged_scan_abort(int nid, struct collapse_control *cc)
{
int i;
@@ -1106,7 +1119,8 @@ static int alloc_charge_folio(struct fol
static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
int referenced, int unmapped,
- struct collapse_control *cc)
+ struct collapse_control *cc, bool *mmap_locked,
+ u8 order, u16 offset)
{
LIST_HEAD(compound_pagelist);
pmd_t *pmd, _pmd;
@@ -1125,8 +1139,12 @@ static int collapse_huge_page(struct mm_
* The allocation can take potentially a long time if it involves
* sync compaction, and we do not need to hold the mmap_lock during
* that. We will recheck the vma after taking it again in write mode.
+ * If collapsing mTHPs we may have already released the read_lock.
*/
- mmap_read_unlock(mm);
+ if (*mmap_locked) {
+ mmap_read_unlock(mm);
+ *mmap_locked = false;
+ }
result = alloc_charge_folio(&folio, mm, cc, HPAGE_PMD_ORDER);
if (result != SCAN_SUCCEED)
@@ -1263,12 +1281,72 @@ static int collapse_huge_page(struct mm_
out_up_write:
mmap_write_unlock(mm);
out_nolock:
+ *mmap_locked = false;
if (folio)
folio_put(folio);
trace_mm_collapse_huge_page(mm, result == SCAN_SUCCEED, result);
return result;
}
+/* Recursive function to consume the bitmap */
+static int khugepaged_scan_bitmap(struct mm_struct *mm, unsigned long address,
+ int referenced, int unmapped, struct collapse_control *cc,
+ bool *mmap_locked, unsigned long enabled_orders)
+{
+ u8 order, next_order;
+ u16 offset, mid_offset;
+ int num_chunks;
+ int bits_set, threshold_bits;
+ int top = -1;
+ int collapsed = 0;
+ int ret;
+ struct scan_bit_state state;
+ bool is_pmd_only = (enabled_orders == (1 << HPAGE_PMD_ORDER));
+
+ cc->mthp_bitmap_stack[++top] = (struct scan_bit_state)
+ { HPAGE_PMD_ORDER - KHUGEPAGED_MIN_MTHP_ORDER, 0 };
+
+ while (top >= 0) {
+ state = cc->mthp_bitmap_stack[top--];
+ order = state.order + KHUGEPAGED_MIN_MTHP_ORDER;
+ offset = state.offset;
+ num_chunks = 1 << (state.order);
+ // Skip mTHP orders that are not enabled
+ if (!test_bit(order, &enabled_orders))
+ goto next;
+
+ // copy the relavant section to a new bitmap
+ bitmap_shift_right(cc->mthp_bitmap_temp, cc->mthp_bitmap, offset,
+ MTHP_BITMAP_SIZE);
+
+ bits_set = bitmap_weight(cc->mthp_bitmap_temp, num_chunks);
+ threshold_bits = (HPAGE_PMD_NR - khugepaged_max_ptes_none - 1)
+ >> (HPAGE_PMD_ORDER - state.order);
+
+ //Check if the region is "almost full" based on the threshold
+ if (bits_set > threshold_bits || is_pmd_only
+ || test_bit(order, &huge_anon_orders_always)) {
+ ret = collapse_huge_page(mm, address, referenced, unmapped, cc,
+ mmap_locked, order, offset * KHUGEPAGED_MIN_MTHP_NR);
+ if (ret == SCAN_SUCCEED) {
+ collapsed += (1 << order);
+ continue;
+ }
+ }
+
+next:
+ if (state.order > 0) {
+ next_order = state.order - 1;
+ mid_offset = offset + (num_chunks / 2);
+ cc->mthp_bitmap_stack[++top] = (struct scan_bit_state)
+ { next_order, mid_offset };
+ cc->mthp_bitmap_stack[++top] = (struct scan_bit_state)
+ { next_order, offset };
+ }
+ }
+ return collapsed;
+}
+
static int khugepaged_scan_pmd(struct mm_struct *mm,
struct vm_area_struct *vma,
unsigned long address, bool *mmap_locked,
@@ -1435,9 +1513,7 @@ out_unmap:
pte_unmap_unlock(pte, ptl);
if (result == SCAN_SUCCEED) {
result = collapse_huge_page(mm, address, referenced,
- unmapped, cc);
- /* collapse_huge_page will return with the mmap_lock released */
- *mmap_locked = false;
+ unmapped, cc, mmap_locked, HPAGE_PMD_ORDER, 0);
}
out:
trace_mm_khugepaged_scan_pmd(mm, folio, writable, referenced,
@@ -2385,7 +2461,7 @@ static int khugepaged_collapse_single_pm
int result = SCAN_FAIL;
struct mm_struct *mm = vma->vm_mm;
- if (IS_ENABLED(CONFIG_SHMEM) && !vma_is_anonymous(vma)) {
+ if (!vma_is_anonymous(vma)) {
struct file *file = get_file(vma->vm_file);
pgoff_t pgoff = linear_page_index(vma, addr);
_
Patches currently in -mm which might be from npache@redhat.com are
khugepaged-rename-hpage_collapse_-to-khugepaged_.patch
introduce-khugepaged_collapse_single_pmd-to-unify-khugepaged-and-madvise_collapse.patch
khugepaged-generalize-hugepage_vma_revalidate-for-mthp-support.patch
khugepaged-generalize-__collapse_huge_page_-for-mthp-support.patch
khugepaged-introduce-khugepaged_scan_bitmap-for-mthp-support.patch
khugepaged-add-mthp-support.patch
khugepaged-skip-collapsing-mthp-to-smaller-orders.patch
khugepaged-avoid-unnecessary-mthp-collapse-attempts.patch
khugepaged-allow-madvise_collapse-to-check-all-anonymous-mthp-orders.patch
khugepaged-improve-tracepoints-for-mthp-orders.patch
khugepaged-add-per-order-mthp-khugepaged-stats.patch
documentation-mm-update-the-admin-guide-for-mthp-collapse.patch
next reply other threads:[~2025-07-04 0:59 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-07-04 0:59 Andrew Morton [this message]
-- strict thread matches above, loose matches on Subject: below --
2025-04-28 19:01 + khugepaged-introduce-khugepaged_scan_bitmap-for-mthp-support.patch added to mm-new branch Andrew Morton
2025-04-17 23:22 Andrew Morton
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250704005917.DB315C4CEE3@smtp.kernel.org \
--to=akpm@linux-foundation.org \
--cc=aarcange@redhat.com \
--cc=anshuman.khandual@arm.com \
--cc=bagasdotme@gmail.com \
--cc=baohua@kernel.org \
--cc=baolin.wang@linux.alibaba.com \
--cc=catalin.marinas@arm.com \
--cc=cl@gentwo.org \
--cc=corbet@lwn.net \
--cc=david@redhat.com \
--cc=dev.jain@arm.com \
--cc=hannes@cmpxchg.org \
--cc=jack@suse.cz \
--cc=kirill.shutemov@linux.intel.com \
--cc=liam.howlett@oracle.com \
--cc=lorenzo.stoakes@oracle.com \
--cc=mathieu.desnoyers@efficios.com \
--cc=mhiramat@kernel.org \
--cc=mhocko@suse.com \
--cc=mm-commits@vger.kernel.org \
--cc=npache@redhat.com \
--cc=peterx@redhat.com \
--cc=raquini@redhat.com \
--cc=rdunlap@infradead.org \
--cc=rientjes@google.com \
--cc=rostedt@goodmis.org \
--cc=ryan.roberts@arm.com \
--cc=sunnanyong@huawei.com \
--cc=surenb@google.com \
--cc=thomas.hellstrom@linux.intel.com \
--cc=tiwai@suse.de \
--cc=usamaarif642@gmail.com \
--cc=vishal.moola@gmail.com \
--cc=wangkefeng.wang@huawei.com \
--cc=will@kernel.org \
--cc=willy@infradead.org \
--cc=ziy@nvidia.com \
--cc=zokeefe@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.