From: Nico Pache <npache@redhat.com>
To: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-mm@kvack.org, linux-trace-kernel@vger.kernel.org
Cc: aarcange@redhat.com, akpm@linux-foundation.org,
anshuman.khandual@arm.com, apopple@nvidia.com, baohua@kernel.org,
baolin.wang@linux.alibaba.com, byungchul@sk.com,
catalin.marinas@arm.com, cl@gentwo.org, corbet@lwn.net,
dave.hansen@linux.intel.com, david@kernel.org, dev.jain@arm.com,
gourry@gourry.net, hannes@cmpxchg.org, hughd@google.com,
jack@suse.cz, jackmanb@google.com, jannh@google.com,
jglisse@google.com, joshua.hahnjy@gmail.com, kas@kernel.org,
lance.yang@linux.dev, liam@infradead.org, ljs@kernel.org,
mathieu.desnoyers@efficios.com, matthew.brost@intel.com,
mhiramat@kernel.org, mhocko@suse.com, npache@redhat.com,
peterx@redhat.com, pfalcato@suse.de, rakie.kim@sk.com,
raquini@redhat.com, rdunlap@infradead.org,
richard.weiyang@gmail.com, rientjes@google.com,
rostedt@goodmis.org, rppt@kernel.org, ryan.roberts@arm.com,
shivankg@amd.com, sunnanyong@huawei.com, surenb@google.com,
thomas.hellstrom@linux.intel.com, tiwai@suse.de,
usamaarif642@gmail.com, vbabka@suse.cz, vishal.moola@gmail.com,
wangkefeng.wang@huawei.com, will@kernel.org, willy@infradead.org,
yang@os.amperecomputing.com, ying.huang@linux.alibaba.com,
ziy@nvidia.com, zokeefe@google.com,
Usama Arif <usama.arif@linux.dev>
Subject: [PATCH mm-unstable v19 01/14] mm/khugepaged: generalize hugepage_vma_revalidate for mTHP support
Date: Fri, 5 Jun 2026 10:14:08 -0600
Message-ID: <20260605161422.213817-2-npache@redhat.com>
In-Reply-To: <20260605161422.213817-1-npache@redhat.com>

For khugepaged to support different mTHP orders, we must generalize
hugepage_vma_revalidate() to check that the PMD is not shared by another
VMA and that the requested order is enabled.

We cannot collapse VMA regions that do not span the full PMD, because the
PMD could be shared with another VMA, which leaves us vulnerable to race
conditions if neighboring VMAs are resized. Always check suitability at
PMD order to ensure the PMD is not shared by another VMA. Supporting
VMAs that cover only part of a PMD would require locking all VMAs in the
PMD range, which may lead to increased lock contention and code
complexity.

No functional change in this patch. Also correct a comment describing
when the revalidation happens and fix a double-space issue.

Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: Lance Yang <lance.yang@linux.dev>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Lorenzo Stoakes <ljs@kernel.org>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Acked-by: Usama Arif <usama.arif@linux.dev>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Co-developed-by: Dev Jain <dev.jain@arm.com>
Signed-off-by: Dev Jain <dev.jain@arm.com>
Signed-off-by: Nico Pache <npache@redhat.com>
---
mm/khugepaged.c | 26 ++++++++++++++++++--------
1 file changed, 18 insertions(+), 8 deletions(-)
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index a4b97ec8ce56..b3910042bbf7 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -905,12 +905,13 @@ static int collapse_find_target_node(struct collapse_control *cc)
/*
* If mmap_lock temporarily dropped, revalidate vma
- * before taking mmap_lock.
+ * after taking the mmap_lock again.
* Returns enum scan_result value.
*/
static enum scan_result hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
- bool expect_anon, struct vm_area_struct **vmap, struct collapse_control *cc)
+ bool expect_anon, struct vm_area_struct **vmap,
+ struct collapse_control *cc, unsigned int order)
{
struct vm_area_struct *vma;
enum tva_type type = cc->is_khugepaged ? TVA_KHUGEPAGED :
@@ -923,15 +924,22 @@ static enum scan_result hugepage_vma_revalidate(struct mm_struct *mm, unsigned l
if (!vma)
return SCAN_VMA_NULL;
+ /*
+ * We cannot collapse VMA regions that do not span the full PMD. This is
+ * due to the potential of the PMD being shared by another VMA leaving
+ * us vulnerable to a race condition. Always check the PMD order here to
+ * ensure it's not shared by another VMA. We'd need to lock all VMAs in
+ * the PMD range to support this.
+ */
if (!thp_vma_suitable_order(vma, address, PMD_ORDER))
return SCAN_ADDRESS_RANGE;
- if (!thp_vma_allowable_order(vma, vma->vm_flags, type, PMD_ORDER))
+ if (!thp_vma_allowable_orders(vma, vma->vm_flags, type, BIT(order)))
return SCAN_VMA_CHECK;
/*
* Anon VMA expected, the address may be unmapped then
* remapped to file after khugepaged reaquired the mmap_lock.
*
- * thp_vma_allowable_order may return true for qualified file
+ * thp_vma_allowable_orders may return true for qualified file
* vmas.
*/
if (expect_anon && (!(*vmap)->anon_vma || !vma_is_anonymous(*vmap)))
@@ -1124,7 +1132,8 @@ static enum scan_result collapse_huge_page(struct mm_struct *mm, unsigned long a
goto out_nolock;
mmap_read_lock(mm);
- result = hugepage_vma_revalidate(mm, address, true, &vma, cc);
+ result = hugepage_vma_revalidate(mm, address, true, &vma, cc,
+ HPAGE_PMD_ORDER);
if (result != SCAN_SUCCEED) {
mmap_read_unlock(mm);
goto out_nolock;
@@ -1158,7 +1167,8 @@ static enum scan_result collapse_huge_page(struct mm_struct *mm, unsigned long a
* mmap_lock.
*/
mmap_write_lock(mm);
- result = hugepage_vma_revalidate(mm, address, true, &vma, cc);
+ result = hugepage_vma_revalidate(mm, address, true, &vma, cc,
+ HPAGE_PMD_ORDER);
if (result != SCAN_SUCCEED)
goto out_up_write;
/* check if the pmd is still valid */
@@ -2861,8 +2871,8 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
mmap_unlocked = false;
*lock_dropped = true;
result = hugepage_vma_revalidate(mm, addr, false, &vma,
- cc);
- if (result != SCAN_SUCCEED) {
+ cc, HPAGE_PMD_ORDER);
+ if (result != SCAN_SUCCEED) {
last_fail = result;
goto out_nolock;
}
--
2.54.0
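
For readers who want to see the two-step check in isolation, here is a
minimal userspace sketch of the logic described in the commit message: the
span test is always done at PMD order, and only the "is this order enabled"
test uses the requested order. It is not kernel code. Every model_* and
MODEL_* name is hypothetical, the 4 KiB page size and PMD order of 9 are
assumptions, and the span test is a simplification that ignores the extra
checks the real helpers perform (for example file-offset alignment).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
#define MODEL_PAGE_SHIFT 12UL          /* assumes 4 KiB base pages */
#define MODEL_PMD_ORDER  9U            /* assumes 2 MiB PMD mappings */
#define MODEL_BIT(n)     (1UL << (n))

struct model_vma {
	unsigned long start;   /* inclusive */
	unsigned long end;     /* exclusive */
};

enum model_result { MODEL_SUCCEED, MODEL_ADDRESS_RANGE, MODEL_VMA_CHECK };

/* Size in bytes of a naturally aligned region of the given order. */
static unsigned long model_order_size(unsigned int order)
{
	return 1UL << (MODEL_PAGE_SHIFT + order);
}

/* True if the order-aligned region containing addr lies inside the VMA. */
static bool model_vma_spans_order(const struct model_vma *vma,
				  unsigned long addr, unsigned int order)
{
	unsigned long size = model_order_size(order);
	unsigned long base = addr & ~(size - 1);

	return base >= vma->start && base + size <= vma->end;
}

/*
 * Model of the generalized revalidation: enabled_orders is a bitmask with
 * bit N set when order N is allowed for this VMA.
 */
enum model_result model_revalidate(const struct model_vma *vma,
				   unsigned long addr,
				   unsigned long enabled_orders,
				   unsigned int order)
{
	/* A collapse order above PMD order makes no sense in this model. */
	assert(order <= MODEL_PMD_ORDER);

	/* Always test the span at PMD order, whatever order is requested. */
	if (!model_vma_spans_order(vma, addr, MODEL_PMD_ORDER))
		return MODEL_ADDRESS_RANGE;

	/* Only the requested order is tested against the enabled set. */
	if (!(enabled_orders & MODEL_BIT(order)))
		return MODEL_VMA_CHECK;

	return MODEL_SUCCEED;
}
```

In this model, a VMA covering [0x200000, 0x400000) with orders 4 and 9
enabled passes for order 4 and fails the enabled-order test for order 2. A
VMA covering [0x200000, 0x3ff000) fails the span test even for order 4,
because the PMD range it sits in is not fully covered by that one VMA.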