public inbox for linux-mm@kvack.org
* [PATCH mm-unstable v4 0/5] mm: khugepaged cleanups and mTHP prerequisites
@ 2026-03-25 11:40 Nico Pache
  2026-03-25 11:40 ` [PATCH mm-unstable v4 1/5] mm: consolidate anonymous folio PTE mapping into helpers Nico Pache
                   ` (6 more replies)
  0 siblings, 7 replies; 35+ messages in thread
From: Nico Pache @ 2026-03-25 11:40 UTC (permalink / raw)
  To: linux-kernel, linux-mm
  Cc: aarcange, akpm, anshuman.khandual, apopple, baohua, baolin.wang,
	byungchul, catalin.marinas, cl, corbet, dave.hansen, david,
	dev.jain, gourry, hannes, hughd, jackmanb, jack, jannh, jglisse,
	joshua.hahnjy, kas, lance.yang, Liam.Howlett, lorenzo.stoakes,
	mathieu.desnoyers, matthew.brost, mhiramat, mhocko, npache,
	peterx, pfalcato, rakie.kim, raquini, rdunlap, richard.weiyang,
	rientjes, rostedt, rppt, ryan.roberts, shivankg, sunnanyong,
	surenb, thomas.hellstrom, tiwai, usamaarif642, vbabka,
	vishal.moola, wangkefeng.wang, will, willy, yang, ying.huang, ziy,
	zokeefe

MAINTAINER NOTE: This is based on mm-unstable with the corresponding
patches reverted then reapplied.

The following series contains cleanups and prerequisites for my work on
khugepaged mTHP support [1]. These have been separated out to ease review.

The first patch in the series refactors the page-fault folio-to-PTE
mapping code, following the convention established by
map_anon_folio_pmd_(no)pf(). This not only cleans up the current
implementation of do_anonymous_page(), but also allows for reuse later in
the khugepaged mTHP implementation.

The second patch adds a small is_pmd_order() helper to check whether an
order is the PMD order. This check is open-coded in a number of places;
the patch cleans those up, and the helper will see further use in the
khugepaged mTHP work. The third patch adds a small define for
(HPAGE_PMD_NR - 1), a value used often across the khugepaged code.

The fourth and fifth patches come from the khugepaged mTHP patchset [1].
They rename the hpage_collapse_* function prefixes and unify khugepaged
and madvise_collapse via a new collapse_single_pmd() function.

Patch 1:     Refactor do_anonymous_page() into map_anon_folio_pte_(no)pf()
Patch 2:     Add is_pmd_order() helper
Patch 3:     Add define for (HPAGE_PMD_NR - 1)
Patch 4:     Refactor/rename hpage_collapse_*
Patch 5:     Combine madvise_collapse and khugepaged via collapse_single_pmd()

Testing:
- Built for x86_64, aarch64, ppc64le, and s390x
- Ran the test suites provided by the kernel-tests project on all arches
- Ran the mm selftests

V4 Changes:
 - Added RB and SB tags
 - Patch 1: commit message cleanups/additions
 - Patch 1: constify two variables, and change 1 << order to 1L << order
 - Patch 1: change the zero-page read path to use the update_mmu_cache()
   variant
 - Patch 5: remove the dead-code switch statement (SCAN_PTE_MAPPED_HUGEPAGE)
 - Patch 5: remove the local mmap_locked from madvise_collapse()
 - Patch 5: rename mmap_locked to lock_dropped in ..scan_mm_slot() and
   invert the logic; the madvise/khugepaged code now shares the same
   naming convention across both functions
 - Patch 5: add an assertion to collapse_single_pmd() so both
   madvise_collapse and khugepaged assert the lock
 - Patch 5: convert one of the VM_BUG_ON()s to a VM_WARN_ON()

V3 - https://lore.kernel.org/all/20260311211315.450947-1-npache@redhat.com
V2 - https://lore.kernel.org/all/20260226012929.169479-1-npache@redhat.com
V1 - https://lore.kernel.org/all/20260212021835.17755-1-npache@redhat.com

A big thanks to everyone who has reviewed, tested, and participated in
the development process.

[1] - https://lore.kernel.org/all/20260122192841.128719-1-npache@redhat.com
[2] - https://lore.kernel.org/all/7334b702-f6a0-4ccf-8ac6-8426a90d1846@kernel.org/
[3] - https://lore.kernel.org/all/25723c0f-c702-44ad-93e9-1056313680cd@kernel.org/
[4] - https://lore.kernel.org/all/81ff9caa-50f2-4951-8d82-2c8dcdf3db91@kernel.org/

Nico Pache (5):
  mm: consolidate anonymous folio PTE mapping into helpers
  mm: introduce is_pmd_order helper
  mm/khugepaged: define KHUGEPAGED_MAX_PTES_LIMIT as HPAGE_PMD_NR - 1
  mm/khugepaged: rename hpage_collapse_* to collapse_*
  mm/khugepaged: unify khugepaged and madv_collapse with
    collapse_single_pmd()

 include/linux/huge_mm.h |   5 +
 include/linux/mm.h      |   4 +
 mm/huge_memory.c        |   2 +-
 mm/khugepaged.c         | 207 ++++++++++++++++++++--------------------
 mm/memory.c             |  63 ++++++++----
 mm/mempolicy.c          |   2 +-
 mm/mremap.c             |   2 +-
 mm/page_alloc.c         |   4 +-
 mm/shmem.c              |   3 +-
 9 files changed, 161 insertions(+), 131 deletions(-)

-- 
2.53.0



^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH mm-unstable v4 1/5] mm: consolidate anonymous folio PTE mapping into helpers
  2026-03-25 11:40 [PATCH mm-unstable v4 0/5] mm: khugepaged cleanups and mTHP prerequisites Nico Pache
@ 2026-03-25 11:40 ` Nico Pache
  2026-03-25 11:40 ` [PATCH mm-unstable v4 2/5] mm: introduce is_pmd_order helper Nico Pache
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 35+ messages in thread
From: Nico Pache @ 2026-03-25 11:40 UTC (permalink / raw)
  To: linux-kernel, linux-mm
  Cc: aarcange, akpm, anshuman.khandual, apopple, baohua, baolin.wang,
	byungchul, catalin.marinas, cl, corbet, dave.hansen, david,
	dev.jain, gourry, hannes, hughd, jackmanb, jack, jannh, jglisse,
	joshua.hahnjy, kas, lance.yang, Liam.Howlett, lorenzo.stoakes,
	mathieu.desnoyers, matthew.brost, mhiramat, mhocko, npache,
	peterx, pfalcato, rakie.kim, raquini, rdunlap, richard.weiyang,
	rientjes, rostedt, rppt, ryan.roberts, shivankg, sunnanyong,
	surenb, thomas.hellstrom, tiwai, usamaarif642, vbabka,
	vishal.moola, wangkefeng.wang, will, willy, yang, ying.huang, ziy,
	zokeefe, Lorenzo Stoakes (Oracle)

The anonymous page fault handler in do_anonymous_page() open-codes the
sequence to map a newly allocated anonymous folio at the PTE level:
	- construct the PTE entry
	- add rmap
	- add to LRU
	- set the PTEs
	- update the MMU cache.

Introduce two helpers to consolidate this duplicated logic, mirroring the
existing map_anon_folio_pmd_nopf() pattern for PMD-level mappings:

map_anon_folio_pte_nopf(): constructs the PTE entry, takes folio
references, adds the anon rmap, and adds the folio to the LRU. It also
handles the uffd_wp marker, which can occur in the pf variant. The future
khugepaged mTHP code will call this to map the newly collapsed mTHP
folio.

map_anon_folio_pte_pf(): extends the nopf variant with the MM_ANONPAGES
counter update and the mTHP fault-allocation statistics needed by the
page fault path.

The zero-page read path in do_anonymous_page() is also untangled from the
shared setpte label, since it does not allocate a folio and should not
share the same mapping sequence as the write path. nr_pages can now be
left uninitialized at its declaration, and the zero-page update uses the
single-page update_mmu_cache() variant.

This refactoring reduces code duplication between mm/memory.c and
mm/khugepaged.c, and provides a clean API for PTE-level anonymous folio
mapping that can be reused by future callers (such as khugepaged mTHP
support).

Suggested-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Reviewed-by: Dev Jain <dev.jain@arm.com>
Reviewed-by: Lance Yang <lance.yang@linux.dev>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Signed-off-by: Nico Pache <npache@redhat.com>
---
 include/linux/mm.h |  4 +++
 mm/memory.c        | 61 +++++++++++++++++++++++++++++++---------------
 2 files changed, 45 insertions(+), 20 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index eebf940058da..7edebadb2cb2 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -5226,4 +5226,8 @@ static inline bool snapshot_page_is_faithful(const struct page_snapshot *ps)
 
 void snapshot_page(struct page_snapshot *ps, const struct page *page);
 
+void map_anon_folio_pte_nopf(struct folio *folio, pte_t *pte,
+		struct vm_area_struct *vma, unsigned long addr,
+		bool uffd_wp);
+
 #endif /* _LINUX_MM_H */
diff --git a/mm/memory.c b/mm/memory.c
index bd93f34b6120..6396d32c348a 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5292,6 +5292,37 @@ static struct folio *alloc_anon_folio(struct vm_fault *vmf)
 	return folio_prealloc(vma->vm_mm, vma, vmf->address, true);
 }
 
+void map_anon_folio_pte_nopf(struct folio *folio, pte_t *pte,
+		struct vm_area_struct *vma, unsigned long addr,
+		bool uffd_wp)
+{
+	const unsigned int nr_pages = folio_nr_pages(folio);
+	pte_t entry = folio_mk_pte(folio, vma->vm_page_prot);
+
+	entry = pte_sw_mkyoung(entry);
+
+	if (vma->vm_flags & VM_WRITE)
+		entry = pte_mkwrite(pte_mkdirty(entry), vma);
+	if (uffd_wp)
+		entry = pte_mkuffd_wp(entry);
+
+	folio_ref_add(folio, nr_pages - 1);
+	folio_add_new_anon_rmap(folio, vma, addr, RMAP_EXCLUSIVE);
+	folio_add_lru_vma(folio, vma);
+	set_ptes(vma->vm_mm, addr, pte, entry, nr_pages);
+	update_mmu_cache_range(NULL, vma, addr, pte, nr_pages);
+}
+
+static void map_anon_folio_pte_pf(struct folio *folio, pte_t *pte,
+		struct vm_area_struct *vma, unsigned long addr, bool uffd_wp)
+{
+	const unsigned int order = folio_order(folio);
+
+	map_anon_folio_pte_nopf(folio, pte, vma, addr, uffd_wp);
+	add_mm_counter(vma->vm_mm, MM_ANONPAGES, 1L << order);
+	count_mthp_stat(order, MTHP_STAT_ANON_FAULT_ALLOC);
+}
+
 /*
  * We enter with non-exclusive mmap_lock (to exclude vma changes,
  * but allow concurrent faults), and pte mapped but not yet locked.
@@ -5303,7 +5334,7 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
 	unsigned long addr = vmf->address;
 	struct folio *folio;
 	vm_fault_t ret = 0;
-	int nr_pages = 1;
+	int nr_pages;
 	pte_t entry;
 
 	/* File mapping without ->vm_ops ? */
@@ -5338,7 +5369,13 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
 			pte_unmap_unlock(vmf->pte, vmf->ptl);
 			return handle_userfault(vmf, VM_UFFD_MISSING);
 		}
-		goto setpte;
+		if (vmf_orig_pte_uffd_wp(vmf))
+			entry = pte_mkuffd_wp(entry);
+		set_pte_at(vma->vm_mm, addr, vmf->pte, entry);
+
+		/* No need to invalidate - it was non-present before */
+		update_mmu_cache(vma, addr, vmf->pte);
+		goto unlock;
 	}
 
 	/* Allocate our own private page. */
@@ -5362,11 +5399,6 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
 	 */
 	__folio_mark_uptodate(folio);
 
-	entry = folio_mk_pte(folio, vma->vm_page_prot);
-	entry = pte_sw_mkyoung(entry);
-	if (vma->vm_flags & VM_WRITE)
-		entry = pte_mkwrite(pte_mkdirty(entry), vma);
-
 	vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, addr, &vmf->ptl);
 	if (!vmf->pte)
 		goto release;
@@ -5388,19 +5420,8 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
 		folio_put(folio);
 		return handle_userfault(vmf, VM_UFFD_MISSING);
 	}
-
-	folio_ref_add(folio, nr_pages - 1);
-	add_mm_counter(vma->vm_mm, MM_ANONPAGES, nr_pages);
-	count_mthp_stat(folio_order(folio), MTHP_STAT_ANON_FAULT_ALLOC);
-	folio_add_new_anon_rmap(folio, vma, addr, RMAP_EXCLUSIVE);
-	folio_add_lru_vma(folio, vma);
-setpte:
-	if (vmf_orig_pte_uffd_wp(vmf))
-		entry = pte_mkuffd_wp(entry);
-	set_ptes(vma->vm_mm, addr, vmf->pte, entry, nr_pages);
-
-	/* No need to invalidate - it was non-present before */
-	update_mmu_cache_range(vmf, vma, addr, vmf->pte, nr_pages);
+	map_anon_folio_pte_pf(folio, vmf->pte, vma, addr,
+			      vmf_orig_pte_uffd_wp(vmf));
 unlock:
 	if (vmf->pte)
 		pte_unmap_unlock(vmf->pte, vmf->ptl);
-- 
2.53.0




* [PATCH mm-unstable v4 2/5] mm: introduce is_pmd_order helper
  2026-03-25 11:40 [PATCH mm-unstable v4 0/5] mm: khugepaged cleanups and mTHP prerequisites Nico Pache
  2026-03-25 11:40 ` [PATCH mm-unstable v4 1/5] mm: consolidate anonymous folio PTE mapping into helpers Nico Pache
@ 2026-03-25 11:40 ` Nico Pache
  2026-03-25 12:11   ` Lorenzo Stoakes (Oracle)
  2026-03-25 11:40 ` [PATCH mm-unstable v4 3/5] mm/khugepaged: define KHUGEPAGED_MAX_PTES_LIMIT as HPAGE_PMD_NR - 1 Nico Pache
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 35+ messages in thread
From: Nico Pache @ 2026-03-25 11:40 UTC (permalink / raw)
  To: linux-kernel, linux-mm
  Cc: aarcange, akpm, anshuman.khandual, apopple, baohua, baolin.wang,
	byungchul, catalin.marinas, cl, corbet, dave.hansen, david,
	dev.jain, gourry, hannes, hughd, jackmanb, jack, jannh, jglisse,
	joshua.hahnjy, kas, lance.yang, Liam.Howlett, lorenzo.stoakes,
	mathieu.desnoyers, matthew.brost, mhiramat, mhocko, npache,
	peterx, pfalcato, rakie.kim, raquini, rdunlap, richard.weiyang,
	rientjes, rostedt, rppt, ryan.roberts, shivankg, sunnanyong,
	surenb, thomas.hellstrom, tiwai, usamaarif642, vbabka,
	vishal.moola, wangkefeng.wang, will, willy, yang, ying.huang, ziy,
	zokeefe

In order to add mTHP support to khugepaged, we will often be checking
whether a given order is (or is not) the PMD order. Some places in the
kernel already open-code this check, so let's create a simple helper
function to keep the code clean and readable.

Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Dev Jain <dev.jain@arm.com>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: Lance Yang <lance.yang@linux.dev>
Reviewed-by: Barry Song <baohua@kernel.org>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Suggested-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Signed-off-by: Nico Pache <npache@redhat.com>
---
 include/linux/huge_mm.h | 5 +++++
 mm/huge_memory.c        | 2 +-
 mm/khugepaged.c         | 6 +++---
 mm/memory.c             | 2 +-
 mm/mempolicy.c          | 2 +-
 mm/page_alloc.c         | 4 ++--
 mm/shmem.c              | 3 +--
 7 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index c8799dca3b60..1258fa37e85b 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -769,6 +769,11 @@ static inline bool pmd_is_huge(pmd_t pmd)
 }
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
+static inline bool is_pmd_order(unsigned int order)
+{
+	return order == HPAGE_PMD_ORDER;
+}
+
 static inline int split_folio_to_list_to_order(struct folio *folio,
 		struct list_head *list, int new_order)
 {
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 2833b06d7498..b2a6060b3c20 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -4118,7 +4118,7 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
 		i_mmap_unlock_read(mapping);
 out:
 	xas_destroy(&xas);
-	if (old_order == HPAGE_PMD_ORDER)
+	if (is_pmd_order(old_order))
 		count_vm_event(!ret ? THP_SPLIT_PAGE : THP_SPLIT_PAGE_FAILED);
 	count_mthp_stat(old_order, !ret ? MTHP_STAT_SPLIT : MTHP_STAT_SPLIT_FAILED);
 	return ret;
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 6bd7a7c0632a..1f4609761294 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1547,7 +1547,7 @@ static enum scan_result try_collapse_pte_mapped_thp(struct mm_struct *mm, unsign
 	if (IS_ERR(folio))
 		return SCAN_PAGE_NULL;
 
-	if (folio_order(folio) != HPAGE_PMD_ORDER) {
+	if (!is_pmd_order(folio_order(folio))) {
 		result = SCAN_PAGE_COMPOUND;
 		goto drop_folio;
 	}
@@ -2030,7 +2030,7 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
 		 * we locked the first folio, then a THP might be there already.
 		 * This will be discovered on the first iteration.
 		 */
-		if (folio_order(folio) == HPAGE_PMD_ORDER) {
+		if (is_pmd_order(folio_order(folio))) {
 			result = SCAN_PTE_MAPPED_HUGEPAGE;
 			goto out_unlock;
 		}
@@ -2358,7 +2358,7 @@ static enum scan_result hpage_collapse_scan_file(struct mm_struct *mm,
 			continue;
 		}
 
-		if (folio_order(folio) == HPAGE_PMD_ORDER) {
+		if (is_pmd_order(folio_order(folio))) {
 			result = SCAN_PTE_MAPPED_HUGEPAGE;
 			/*
 			 * PMD-sized THP implies that we can only try
diff --git a/mm/memory.c b/mm/memory.c
index 6396d32c348a..e44469f9cf65 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5573,7 +5573,7 @@ vm_fault_t do_set_pmd(struct vm_fault *vmf, struct folio *folio, struct page *pa
 	if (!thp_vma_suitable_order(vma, haddr, PMD_ORDER))
 		return ret;
 
-	if (folio_order(folio) != HPAGE_PMD_ORDER)
+	if (!is_pmd_order(folio_order(folio)))
 		return ret;
 	page = &folio->page;
 
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index ff52fb94ff27..fd08771e2057 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -2449,7 +2449,7 @@ static struct page *alloc_pages_mpol(gfp_t gfp, unsigned int order,
 
 	if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) &&
 	    /* filter "hugepage" allocation, unless from alloc_pages() */
-	    order == HPAGE_PMD_ORDER && ilx != NO_INTERLEAVE_INDEX) {
+	    is_pmd_order(order) && ilx != NO_INTERLEAVE_INDEX) {
 		/*
 		 * For hugepage allocation and non-interleave policy which
 		 * allows the current node (or other explicitly preferred
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 915b6aef55d0..ee81f5c67c18 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -652,7 +652,7 @@ static inline unsigned int order_to_pindex(int migratetype, int order)
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 	bool movable;
 	if (order > PAGE_ALLOC_COSTLY_ORDER) {
-		VM_BUG_ON(order != HPAGE_PMD_ORDER);
+		VM_BUG_ON(!is_pmd_order(order));
 
 		movable = migratetype == MIGRATE_MOVABLE;
 
@@ -684,7 +684,7 @@ static inline bool pcp_allowed_order(unsigned int order)
 	if (order <= PAGE_ALLOC_COSTLY_ORDER)
 		return true;
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
-	if (order == HPAGE_PMD_ORDER)
+	if (is_pmd_order(order))
 		return true;
 #endif
 	return false;
diff --git a/mm/shmem.c b/mm/shmem.c
index d00044257401..4ecefe02881d 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -5532,8 +5532,7 @@ static ssize_t thpsize_shmem_enabled_store(struct kobject *kobj,
 		spin_unlock(&huge_shmem_orders_lock);
 	} else if (sysfs_streq(buf, "inherit")) {
 		/* Do not override huge allocation policy with non-PMD sized mTHP */
-		if (shmem_huge == SHMEM_HUGE_FORCE &&
-		    order != HPAGE_PMD_ORDER)
+		if (shmem_huge == SHMEM_HUGE_FORCE && !is_pmd_order(order))
 			return -EINVAL;
 
 		spin_lock(&huge_shmem_orders_lock);
-- 
2.53.0




* [PATCH mm-unstable v4 3/5] mm/khugepaged: define KHUGEPAGED_MAX_PTES_LIMIT as HPAGE_PMD_NR - 1
  2026-03-25 11:40 [PATCH mm-unstable v4 0/5] mm: khugepaged cleanups and mTHP prerequisites Nico Pache
  2026-03-25 11:40 ` [PATCH mm-unstable v4 1/5] mm: consolidate anonymous folio PTE mapping into helpers Nico Pache
  2026-03-25 11:40 ` [PATCH mm-unstable v4 2/5] mm: introduce is_pmd_order helper Nico Pache
@ 2026-03-25 11:40 ` Nico Pache
  2026-03-25 11:40 ` [PATCH mm-unstable v4 4/5] mm/khugepaged: rename hpage_collapse_* to collapse_* Nico Pache
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 35+ messages in thread
From: Nico Pache @ 2026-03-25 11:40 UTC (permalink / raw)
  To: linux-kernel, linux-mm
  Cc: aarcange, akpm, anshuman.khandual, apopple, baohua, baolin.wang,
	byungchul, catalin.marinas, cl, corbet, dave.hansen, david,
	dev.jain, gourry, hannes, hughd, jackmanb, jack, jannh, jglisse,
	joshua.hahnjy, kas, lance.yang, Liam.Howlett, lorenzo.stoakes,
	mathieu.desnoyers, matthew.brost, mhiramat, mhocko, npache,
	peterx, pfalcato, rakie.kim, raquini, rdunlap, richard.weiyang,
	rientjes, rostedt, rppt, ryan.roberts, shivankg, sunnanyong,
	surenb, thomas.hellstrom, tiwai, usamaarif642, vbabka,
	vishal.moola, wangkefeng.wang, will, willy, yang, ying.huang, ziy,
	zokeefe, Lorenzo Stoakes (Oracle)

The value (HPAGE_PMD_NR - 1) is used often in the khugepaged code to
signify the limit of the max_ptes_* values. Add a define for this to
increase code readability and reuse.

Acked-by: Pedro Falcato <pfalcato@suse.de>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Suggested-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Signed-off-by: Nico Pache <npache@redhat.com>
---
 mm/khugepaged.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 1f4609761294..c9c17bfccf0d 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -89,6 +89,7 @@ static DECLARE_WAIT_QUEUE_HEAD(khugepaged_wait);
  *
  * Note that these are only respected if collapse was initiated by khugepaged.
  */
+#define KHUGEPAGED_MAX_PTES_LIMIT (HPAGE_PMD_NR - 1)
 unsigned int khugepaged_max_ptes_none __read_mostly;
 static unsigned int khugepaged_max_ptes_swap __read_mostly;
 static unsigned int khugepaged_max_ptes_shared __read_mostly;
@@ -259,7 +260,7 @@ static ssize_t max_ptes_none_store(struct kobject *kobj,
 	unsigned long max_ptes_none;
 
 	err = kstrtoul(buf, 10, &max_ptes_none);
-	if (err || max_ptes_none > HPAGE_PMD_NR - 1)
+	if (err || max_ptes_none > KHUGEPAGED_MAX_PTES_LIMIT)
 		return -EINVAL;
 
 	khugepaged_max_ptes_none = max_ptes_none;
@@ -284,7 +285,7 @@ static ssize_t max_ptes_swap_store(struct kobject *kobj,
 	unsigned long max_ptes_swap;
 
 	err  = kstrtoul(buf, 10, &max_ptes_swap);
-	if (err || max_ptes_swap > HPAGE_PMD_NR - 1)
+	if (err || max_ptes_swap > KHUGEPAGED_MAX_PTES_LIMIT)
 		return -EINVAL;
 
 	khugepaged_max_ptes_swap = max_ptes_swap;
@@ -310,7 +311,7 @@ static ssize_t max_ptes_shared_store(struct kobject *kobj,
 	unsigned long max_ptes_shared;
 
 	err  = kstrtoul(buf, 10, &max_ptes_shared);
-	if (err || max_ptes_shared > HPAGE_PMD_NR - 1)
+	if (err || max_ptes_shared > KHUGEPAGED_MAX_PTES_LIMIT)
 		return -EINVAL;
 
 	khugepaged_max_ptes_shared = max_ptes_shared;
@@ -382,7 +383,7 @@ int __init khugepaged_init(void)
 		return -ENOMEM;
 
 	khugepaged_pages_to_scan = HPAGE_PMD_NR * 8;
-	khugepaged_max_ptes_none = HPAGE_PMD_NR - 1;
+	khugepaged_max_ptes_none = KHUGEPAGED_MAX_PTES_LIMIT;
 	khugepaged_max_ptes_swap = HPAGE_PMD_NR / 8;
 	khugepaged_max_ptes_shared = HPAGE_PMD_NR / 2;
 
-- 
2.53.0




* [PATCH mm-unstable v4 4/5] mm/khugepaged: rename hpage_collapse_* to collapse_*
  2026-03-25 11:40 [PATCH mm-unstable v4 0/5] mm: khugepaged cleanups and mTHP prerequisites Nico Pache
                   ` (2 preceding siblings ...)
  2026-03-25 11:40 ` [PATCH mm-unstable v4 3/5] mm/khugepaged: define KHUGEPAGED_MAX_PTES_LIMIT as HPAGE_PMD_NR - 1 Nico Pache
@ 2026-03-25 11:40 ` Nico Pache
  2026-03-25 12:08   ` Lorenzo Stoakes (Oracle)
  2026-03-25 11:40 ` [PATCH mm-unstable v4 5/5] mm/khugepaged: unify khugepaged and madv_collapse with collapse_single_pmd() Nico Pache
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 35+ messages in thread
From: Nico Pache @ 2026-03-25 11:40 UTC (permalink / raw)
  To: linux-kernel, linux-mm
  Cc: aarcange, akpm, anshuman.khandual, apopple, baohua, baolin.wang,
	byungchul, catalin.marinas, cl, corbet, dave.hansen, david,
	dev.jain, gourry, hannes, hughd, jackmanb, jack, jannh, jglisse,
	joshua.hahnjy, kas, lance.yang, Liam.Howlett, lorenzo.stoakes,
	mathieu.desnoyers, matthew.brost, mhiramat, mhocko, npache,
	peterx, pfalcato, rakie.kim, raquini, rdunlap, richard.weiyang,
	rientjes, rostedt, rppt, ryan.roberts, shivankg, sunnanyong,
	surenb, thomas.hellstrom, tiwai, usamaarif642, vbabka,
	vishal.moola, wangkefeng.wang, will, willy, yang, ying.huang, ziy,
	zokeefe

The hpage_collapse_* prefix marks functions used by both madvise_collapse
and khugepaged. Remove the unnecessary hpage prefix to shorten the
function names.

Reviewed-by: Dev Jain <dev.jain@arm.com>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: Lance Yang <lance.yang@linux.dev>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Signed-off-by: Nico Pache <npache@redhat.com>
---
 mm/khugepaged.c | 60 ++++++++++++++++++++++++-------------------------
 mm/mremap.c     |  2 +-
 2 files changed, 30 insertions(+), 32 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index c9c17bfccf0d..3728a2cf133c 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -395,14 +395,14 @@ void __init khugepaged_destroy(void)
 	kmem_cache_destroy(mm_slot_cache);
 }
 
-static inline int hpage_collapse_test_exit(struct mm_struct *mm)
+static inline int collapse_test_exit(struct mm_struct *mm)
 {
 	return atomic_read(&mm->mm_users) == 0;
 }
 
-static inline int hpage_collapse_test_exit_or_disable(struct mm_struct *mm)
+static inline int collapse_test_exit_or_disable(struct mm_struct *mm)
 {
-	return hpage_collapse_test_exit(mm) ||
+	return collapse_test_exit(mm) ||
 		mm_flags_test(MMF_DISABLE_THP_COMPLETELY, mm);
 }
 
@@ -436,7 +436,7 @@ void __khugepaged_enter(struct mm_struct *mm)
 	int wakeup;
 
 	/* __khugepaged_exit() must not run from under us */
-	VM_BUG_ON_MM(hpage_collapse_test_exit(mm), mm);
+	VM_BUG_ON_MM(collapse_test_exit(mm), mm);
 	if (unlikely(mm_flags_test_and_set(MMF_VM_HUGEPAGE, mm)))
 		return;
 
@@ -490,7 +490,7 @@ void __khugepaged_exit(struct mm_struct *mm)
 	} else if (slot) {
 		/*
 		 * This is required to serialize against
-		 * hpage_collapse_test_exit() (which is guaranteed to run
+		 * collapse_test_exit() (which is guaranteed to run
 		 * under mmap sem read mode). Stop here (after we return all
 		 * pagetables will be destroyed) until khugepaged has finished
 		 * working on the pagetables under the mmap_lock.
@@ -589,7 +589,7 @@ static enum scan_result __collapse_huge_page_isolate(struct vm_area_struct *vma,
 			goto out;
 		}
 
-		/* See hpage_collapse_scan_pmd(). */
+		/* See collapse_scan_pmd(). */
 		if (folio_maybe_mapped_shared(folio)) {
 			++shared;
 			if (cc->is_khugepaged &&
@@ -840,7 +840,7 @@ static struct collapse_control khugepaged_collapse_control = {
 	.is_khugepaged = true,
 };
 
-static bool hpage_collapse_scan_abort(int nid, struct collapse_control *cc)
+static bool collapse_scan_abort(int nid, struct collapse_control *cc)
 {
 	int i;
 
@@ -875,7 +875,7 @@ static inline gfp_t alloc_hugepage_khugepaged_gfpmask(void)
 }
 
 #ifdef CONFIG_NUMA
-static int hpage_collapse_find_target_node(struct collapse_control *cc)
+static int collapse_find_target_node(struct collapse_control *cc)
 {
 	int nid, target_node = 0, max_value = 0;
 
@@ -894,7 +894,7 @@ static int hpage_collapse_find_target_node(struct collapse_control *cc)
 	return target_node;
 }
 #else
-static int hpage_collapse_find_target_node(struct collapse_control *cc)
+static int collapse_find_target_node(struct collapse_control *cc)
 {
 	return 0;
 }
@@ -913,7 +913,7 @@ static enum scan_result hugepage_vma_revalidate(struct mm_struct *mm, unsigned l
 	enum tva_type type = cc->is_khugepaged ? TVA_KHUGEPAGED :
 				 TVA_FORCED_COLLAPSE;
 
-	if (unlikely(hpage_collapse_test_exit_or_disable(mm)))
+	if (unlikely(collapse_test_exit_or_disable(mm)))
 		return SCAN_ANY_PROCESS;
 
 	*vmap = vma = find_vma(mm, address);
@@ -984,7 +984,7 @@ static enum scan_result check_pmd_still_valid(struct mm_struct *mm,
 
 /*
  * Bring missing pages in from swap, to complete THP collapse.
- * Only done if hpage_collapse_scan_pmd believes it is worthwhile.
+ * Only done if collapse_scan_pmd believes it is worthwhile.
  *
  * Called and returns without pte mapped or spinlocks held.
  * Returns result: if not SCAN_SUCCEED, mmap_lock has been released.
@@ -1070,7 +1070,7 @@ static enum scan_result alloc_charge_folio(struct folio **foliop, struct mm_stru
 {
 	gfp_t gfp = (cc->is_khugepaged ? alloc_hugepage_khugepaged_gfpmask() :
 		     GFP_TRANSHUGE);
-	int node = hpage_collapse_find_target_node(cc);
+	int node = collapse_find_target_node(cc);
 	struct folio *folio;
 
 	folio = __folio_alloc(gfp, HPAGE_PMD_ORDER, node, &cc->alloc_nmask);
@@ -1255,7 +1255,7 @@ static enum scan_result collapse_huge_page(struct mm_struct *mm, unsigned long a
 	return result;
 }
 
-static enum scan_result hpage_collapse_scan_pmd(struct mm_struct *mm,
+static enum scan_result collapse_scan_pmd(struct mm_struct *mm,
 		struct vm_area_struct *vma, unsigned long start_addr,
 		bool *mmap_locked, struct collapse_control *cc)
 {
@@ -1380,7 +1380,7 @@ static enum scan_result hpage_collapse_scan_pmd(struct mm_struct *mm,
 		 * hit record.
 		 */
 		node = folio_nid(folio);
-		if (hpage_collapse_scan_abort(node, cc)) {
+		if (collapse_scan_abort(node, cc)) {
 			result = SCAN_SCAN_ABORT;
 			goto out_unmap;
 		}
@@ -1446,7 +1446,7 @@ static void collect_mm_slot(struct mm_slot *slot)
 
 	lockdep_assert_held(&khugepaged_mm_lock);
 
-	if (hpage_collapse_test_exit(mm)) {
+	if (collapse_test_exit(mm)) {
 		/* free mm_slot */
 		hash_del(&slot->hash);
 		list_del(&slot->mm_node);
@@ -1801,7 +1801,7 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
 		if (find_pmd_or_thp_or_none(mm, addr, &pmd) != SCAN_SUCCEED)
 			continue;
 
-		if (hpage_collapse_test_exit(mm))
+		if (collapse_test_exit(mm))
 			continue;
 
 		if (!file_backed_vma_is_retractable(vma))
@@ -2317,7 +2317,7 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
 	return result;
 }
 
-static enum scan_result hpage_collapse_scan_file(struct mm_struct *mm,
+static enum scan_result collapse_scan_file(struct mm_struct *mm,
 		unsigned long addr, struct file *file, pgoff_t start,
 		struct collapse_control *cc)
 {
@@ -2370,7 +2370,7 @@ static enum scan_result hpage_collapse_scan_file(struct mm_struct *mm,
 		}
 
 		node = folio_nid(folio);
-		if (hpage_collapse_scan_abort(node, cc)) {
+		if (collapse_scan_abort(node, cc)) {
 			result = SCAN_SCAN_ABORT;
 			folio_put(folio);
 			break;
@@ -2424,7 +2424,7 @@ static enum scan_result hpage_collapse_scan_file(struct mm_struct *mm,
 	return result;
 }
 
-static void khugepaged_scan_mm_slot(unsigned int progress_max,
+static void collapse_scan_mm_slot(unsigned int progress_max,
 		enum scan_result *result, struct collapse_control *cc)
 	__releases(&khugepaged_mm_lock)
 	__acquires(&khugepaged_mm_lock)
@@ -2458,7 +2458,7 @@ static void khugepaged_scan_mm_slot(unsigned int progress_max,
 		goto breakouterloop_mmap_lock;
 
 	cc->progress++;
-	if (unlikely(hpage_collapse_test_exit_or_disable(mm)))
+	if (unlikely(collapse_test_exit_or_disable(mm)))
 		goto breakouterloop;
 
 	vma_iter_init(&vmi, mm, khugepaged_scan.address);
@@ -2466,7 +2466,7 @@ static void khugepaged_scan_mm_slot(unsigned int progress_max,
 		unsigned long hstart, hend;
 
 		cond_resched();
-		if (unlikely(hpage_collapse_test_exit_or_disable(mm))) {
+		if (unlikely(collapse_test_exit_or_disable(mm))) {
 			cc->progress++;
 			break;
 		}
@@ -2488,7 +2488,7 @@ static void khugepaged_scan_mm_slot(unsigned int progress_max,
 			bool mmap_locked = true;
 
 			cond_resched();
-			if (unlikely(hpage_collapse_test_exit_or_disable(mm)))
+			if (unlikely(collapse_test_exit_or_disable(mm)))
 				goto breakouterloop;
 
 			VM_BUG_ON(khugepaged_scan.address < hstart ||
@@ -2501,12 +2501,12 @@ static void khugepaged_scan_mm_slot(unsigned int progress_max,
 
 				mmap_read_unlock(mm);
 				mmap_locked = false;
-				*result = hpage_collapse_scan_file(mm,
+				*result = collapse_scan_file(mm,
 					khugepaged_scan.address, file, pgoff, cc);
 				fput(file);
 				if (*result == SCAN_PTE_MAPPED_HUGEPAGE) {
 					mmap_read_lock(mm);
-					if (hpage_collapse_test_exit_or_disable(mm))
+					if (collapse_test_exit_or_disable(mm))
 						goto breakouterloop;
 					*result = try_collapse_pte_mapped_thp(mm,
 						khugepaged_scan.address, false);
@@ -2515,7 +2515,7 @@ static void khugepaged_scan_mm_slot(unsigned int progress_max,
 					mmap_read_unlock(mm);
 				}
 			} else {
-				*result = hpage_collapse_scan_pmd(mm, vma,
+				*result = collapse_scan_pmd(mm, vma,
 					khugepaged_scan.address, &mmap_locked, cc);
 			}
 
@@ -2547,7 +2547,7 @@ static void khugepaged_scan_mm_slot(unsigned int progress_max,
 	 * Release the current mm_slot if this mm is about to die, or
 	 * if we scanned all vmas of this mm, or THP got disabled.
 	 */
-	if (hpage_collapse_test_exit_or_disable(mm) || !vma) {
+	if (collapse_test_exit_or_disable(mm) || !vma) {
 		/*
 		 * Make sure that if mm_users is reaching zero while
 		 * khugepaged runs here, khugepaged_exit will find
@@ -2600,7 +2600,7 @@ static void khugepaged_do_scan(struct collapse_control *cc)
 			pass_through_head++;
 		if (khugepaged_has_work() &&
 		    pass_through_head < 2)
-			khugepaged_scan_mm_slot(progress_max, &result, cc);
+			collapse_scan_mm_slot(progress_max, &result, cc);
 		else
 			cc->progress = progress_max;
 		spin_unlock(&khugepaged_mm_lock);
@@ -2845,8 +2845,7 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
 			mmap_read_unlock(mm);
 			mmap_locked = false;
 			*lock_dropped = true;
-			result = hpage_collapse_scan_file(mm, addr, file, pgoff,
-							  cc);
+			result = collapse_scan_file(mm, addr, file, pgoff, cc);
 
 			if (result == SCAN_PAGE_DIRTY_OR_WRITEBACK && !triggered_wb &&
 			    mapping_can_writeback(file->f_mapping)) {
@@ -2860,8 +2859,7 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
 			}
 			fput(file);
 		} else {
-			result = hpage_collapse_scan_pmd(mm, vma, addr,
-							 &mmap_locked, cc);
+			result = collapse_scan_pmd(mm, vma, addr, &mmap_locked, cc);
 		}
 		if (!mmap_locked)
 			*lock_dropped = true;
diff --git a/mm/mremap.c b/mm/mremap.c
index 8566e32d58d9..e9c8b1d05832 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -244,7 +244,7 @@ static int move_ptes(struct pagetable_move_control *pmc,
 		goto out;
 	}
 	/*
-	 * Now new_pte is none, so hpage_collapse_scan_file() path can not find
+	 * Now new_pte is none, so collapse_scan_file() path can not find
 	 * this by traversing file->f_mapping, so there is no concurrency with
 	 * retract_page_tables(). In addition, we already hold the exclusive
 	 * mmap_lock, so this new_pte page is stable, so there is no need to get
-- 
2.53.0



^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH mm-unstable v4 5/5] mm/khugepaged: unify khugepaged and madv_collapse with collapse_single_pmd()
  2026-03-25 11:40 [PATCH mm-unstable v4 0/5] mm: khugepaged cleanups and mTHP prerequisites Nico Pache
                   ` (3 preceding siblings ...)
  2026-03-25 11:40 ` [PATCH mm-unstable v4 4/5] mm/khugepaged: rename hpage_collapse_* to collapse_* Nico Pache
@ 2026-03-25 11:40 ` Nico Pache
  2026-03-31 14:01   ` Lorenzo Stoakes (Oracle)
  2026-03-25 11:44 ` [PATCH mm-unstable v4 0/5] mm: khugepaged cleanups and mTHP prerequisites Lorenzo Stoakes (Oracle)
  2026-03-26  0:25 ` Andrew Morton
  6 siblings, 1 reply; 35+ messages in thread
From: Nico Pache @ 2026-03-25 11:40 UTC (permalink / raw)
  To: linux-kernel, linux-mm
  Cc: aarcange, akpm, anshuman.khandual, apopple, baohua, baolin.wang,
	byungchul, catalin.marinas, cl, corbet, dave.hansen, david,
	dev.jain, gourry, hannes, hughd, jackmanb, jack, jannh, jglisse,
	joshua.hahnjy, kas, lance.yang, Liam.Howlett, lorenzo.stoakes,
	mathieu.desnoyers, matthew.brost, mhiramat, mhocko, npache,
	peterx, pfalcato, rakie.kim, raquini, rdunlap, richard.weiyang,
	rientjes, rostedt, rppt, ryan.roberts, shivankg, sunnanyong,
	surenb, thomas.hellstrom, tiwai, usamaarif642, vbabka,
	vishal.moola, wangkefeng.wang, will, willy, yang, ying.huang, ziy,
	zokeefe, Lorenzo Stoakes (Oracle)

The khugepaged daemon and madvise_collapse have two different
implementations that do almost the same thing. Create
collapse_single_pmd() to increase code reuse and provide a single
entry point for these two users.

Refactor madvise_collapse and collapse_scan_mm_slot to use the new
collapse_single_pmd function. To help reduce confusion around the
mmap_locked variable, we rename mmap_locked to lock_dropped in the
collapse_scan_mm_slot() function, and remove the now-redundant
mmap_locked in madvise_collapse(); this further improves code
readability. The SCAN_PTE_MAPPED_HUGEPAGE enum is no longer reachable
in the madvise_collapse() function, so we drop it from the list of
"continuing" enums.

This introduces a minor behavioral change that most likely fixes an
undiscovered bug. The current implementation of khugepaged tests
collapse_test_exit_or_disable() before calling
try_collapse_pte_mapped_thp(), but we weren't doing this in the
madvise_collapse case. By unifying these two callers, madvise_collapse
now also performs this check. We also modify the return value in that
case to SCAN_ANY_PROCESS, which properly indicates that this process
is no longer valid to operate on.

By moving the madvise_collapse writeback-retry logic into the helper
function we can also avoid having to revalidate the VMA.

We guard the khugepaged_pages_collapsed counter to ensure it's only
incremented for khugepaged.

As requested, we also convert a VM_BUG_ON() to a VM_WARN_ON_ONCE().

Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Reviewed-by: Lance Yang <lance.yang@linux.dev>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Signed-off-by: Nico Pache <npache@redhat.com>
---
 mm/khugepaged.c | 142 ++++++++++++++++++++++++------------------------
 1 file changed, 72 insertions(+), 70 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 3728a2cf133c..d06d84219e1b 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1257,7 +1257,7 @@ static enum scan_result collapse_huge_page(struct mm_struct *mm, unsigned long a
 
 static enum scan_result collapse_scan_pmd(struct mm_struct *mm,
 		struct vm_area_struct *vma, unsigned long start_addr,
-		bool *mmap_locked, struct collapse_control *cc)
+		bool *lock_dropped, struct collapse_control *cc)
 {
 	pmd_t *pmd;
 	pte_t *pte, *_pte;
@@ -1432,7 +1432,7 @@ static enum scan_result collapse_scan_pmd(struct mm_struct *mm,
 		result = collapse_huge_page(mm, start_addr, referenced,
 					    unmapped, cc);
 		/* collapse_huge_page will return with the mmap_lock released */
-		*mmap_locked = false;
+		*lock_dropped = true;
 	}
 out:
 	trace_mm_khugepaged_scan_pmd(mm, folio, referenced,
@@ -2424,6 +2424,67 @@ static enum scan_result collapse_scan_file(struct mm_struct *mm,
 	return result;
 }
 
+/*
+ * Try to collapse a single PMD starting at a PMD aligned addr, and return
+ * the results.
+ */
+static enum scan_result collapse_single_pmd(unsigned long addr,
+		struct vm_area_struct *vma, bool *lock_dropped,
+		struct collapse_control *cc)
+{
+	struct mm_struct *mm = vma->vm_mm;
+	bool triggered_wb = false;
+	enum scan_result result;
+	struct file *file;
+	pgoff_t pgoff;
+
+	mmap_assert_locked(mm);
+
+	if (vma_is_anonymous(vma)) {
+		result = collapse_scan_pmd(mm, vma, addr, lock_dropped, cc);
+		goto end;
+	}
+
+	file = get_file(vma->vm_file);
+	pgoff = linear_page_index(vma, addr);
+
+	mmap_read_unlock(mm);
+	*lock_dropped = true;
+retry:
+	result = collapse_scan_file(mm, addr, file, pgoff, cc);
+
+	/*
+	 * For MADV_COLLAPSE, when encountering dirty pages, try to writeback,
+	 * then retry the collapse one time.
+	 */
+	if (!cc->is_khugepaged && result == SCAN_PAGE_DIRTY_OR_WRITEBACK &&
+	    !triggered_wb && mapping_can_writeback(file->f_mapping)) {
+		const loff_t lstart = (loff_t)pgoff << PAGE_SHIFT;
+		const loff_t lend = lstart + HPAGE_PMD_SIZE - 1;
+
+		filemap_write_and_wait_range(file->f_mapping, lstart, lend);
+		triggered_wb = true;
+		goto retry;
+	}
+	fput(file);
+
+	if (result == SCAN_PTE_MAPPED_HUGEPAGE) {
+		mmap_read_lock(mm);
+		if (collapse_test_exit_or_disable(mm))
+			result = SCAN_ANY_PROCESS;
+		else
+			result = try_collapse_pte_mapped_thp(mm, addr,
+							     !cc->is_khugepaged);
+		if (result == SCAN_PMD_MAPPED)
+			result = SCAN_SUCCEED;
+		mmap_read_unlock(mm);
+	}
+end:
+	if (cc->is_khugepaged && result == SCAN_SUCCEED)
+		++khugepaged_pages_collapsed;
+	return result;
+}
+
 static void collapse_scan_mm_slot(unsigned int progress_max,
 		enum scan_result *result, struct collapse_control *cc)
 	__releases(&khugepaged_mm_lock)
@@ -2485,46 +2546,21 @@ static void collapse_scan_mm_slot(unsigned int progress_max,
 		VM_BUG_ON(khugepaged_scan.address & ~HPAGE_PMD_MASK);
 
 		while (khugepaged_scan.address < hend) {
-			bool mmap_locked = true;
+			bool lock_dropped = false;
 
 			cond_resched();
 			if (unlikely(collapse_test_exit_or_disable(mm)))
 				goto breakouterloop;
 
-			VM_BUG_ON(khugepaged_scan.address < hstart ||
+			VM_WARN_ON_ONCE(khugepaged_scan.address < hstart ||
 				  khugepaged_scan.address + HPAGE_PMD_SIZE >
 				  hend);
-			if (!vma_is_anonymous(vma)) {
-				struct file *file = get_file(vma->vm_file);
-				pgoff_t pgoff = linear_page_index(vma,
-						khugepaged_scan.address);
-
-				mmap_read_unlock(mm);
-				mmap_locked = false;
-				*result = collapse_scan_file(mm,
-					khugepaged_scan.address, file, pgoff, cc);
-				fput(file);
-				if (*result == SCAN_PTE_MAPPED_HUGEPAGE) {
-					mmap_read_lock(mm);
-					if (collapse_test_exit_or_disable(mm))
-						goto breakouterloop;
-					*result = try_collapse_pte_mapped_thp(mm,
-						khugepaged_scan.address, false);
-					if (*result == SCAN_PMD_MAPPED)
-						*result = SCAN_SUCCEED;
-					mmap_read_unlock(mm);
-				}
-			} else {
-				*result = collapse_scan_pmd(mm, vma,
-					khugepaged_scan.address, &mmap_locked, cc);
-			}
-
-			if (*result == SCAN_SUCCEED)
-				++khugepaged_pages_collapsed;
 
+			*result = collapse_single_pmd(khugepaged_scan.address,
+						      vma, &lock_dropped, cc);
 			/* move to next address */
 			khugepaged_scan.address += HPAGE_PMD_SIZE;
-			if (!mmap_locked)
+			if (lock_dropped)
 				/*
 				 * We released mmap_lock so break loop.  Note
 				 * that we drop mmap_lock before all hugepage
@@ -2799,7 +2835,6 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
 	unsigned long hstart, hend, addr;
 	enum scan_result last_fail = SCAN_FAIL;
 	int thps = 0;
-	bool mmap_locked = true;
 
 	BUG_ON(vma->vm_start > start);
 	BUG_ON(vma->vm_end < end);
@@ -2821,13 +2856,11 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
 
 	for (addr = hstart; addr < hend; addr += HPAGE_PMD_SIZE) {
 		enum scan_result result = SCAN_FAIL;
-		bool triggered_wb = false;
 
-retry:
-		if (!mmap_locked) {
+		if (*lock_dropped) {
 			cond_resched();
 			mmap_read_lock(mm);
-			mmap_locked = true;
+			*lock_dropped = false;
 			result = hugepage_vma_revalidate(mm, addr, false, &vma,
 							 cc);
 			if (result  != SCAN_SUCCEED) {
@@ -2837,45 +2870,14 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
 
 			hend = min(hend, vma->vm_end & HPAGE_PMD_MASK);
 		}
-		mmap_assert_locked(mm);
-		if (!vma_is_anonymous(vma)) {
-			struct file *file = get_file(vma->vm_file);
-			pgoff_t pgoff = linear_page_index(vma, addr);
 
-			mmap_read_unlock(mm);
-			mmap_locked = false;
-			*lock_dropped = true;
-			result = collapse_scan_file(mm, addr, file, pgoff, cc);
-
-			if (result == SCAN_PAGE_DIRTY_OR_WRITEBACK && !triggered_wb &&
-			    mapping_can_writeback(file->f_mapping)) {
-				loff_t lstart = (loff_t)pgoff << PAGE_SHIFT;
-				loff_t lend = lstart + HPAGE_PMD_SIZE - 1;
-
-				filemap_write_and_wait_range(file->f_mapping, lstart, lend);
-				triggered_wb = true;
-				fput(file);
-				goto retry;
-			}
-			fput(file);
-		} else {
-			result = collapse_scan_pmd(mm, vma, addr, &mmap_locked, cc);
-		}
-		if (!mmap_locked)
-			*lock_dropped = true;
+		result = collapse_single_pmd(addr, vma, lock_dropped, cc);
 
-handle_result:
 		switch (result) {
 		case SCAN_SUCCEED:
 		case SCAN_PMD_MAPPED:
 			++thps;
 			break;
-		case SCAN_PTE_MAPPED_HUGEPAGE:
-			BUG_ON(mmap_locked);
-			mmap_read_lock(mm);
-			result = try_collapse_pte_mapped_thp(mm, addr, true);
-			mmap_read_unlock(mm);
-			goto handle_result;
 		/* Whitelisted set of results where continuing OK */
 		case SCAN_NO_PTE_TABLE:
 		case SCAN_PTE_NON_PRESENT:
@@ -2898,7 +2900,7 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
 
 out_maybelock:
 	/* Caller expects us to hold mmap_lock on return */
-	if (!mmap_locked)
+	if (*lock_dropped)
 		mmap_read_lock(mm);
 out_nolock:
 	mmap_assert_locked(mm);
-- 
2.53.0



^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH mm-unstable v4 0/5] mm: khugepaged cleanups and mTHP prerequisites
  2026-03-25 11:40 [PATCH mm-unstable v4 0/5] mm: khugepaged cleanups and mTHP prerequisites Nico Pache
                   ` (4 preceding siblings ...)
  2026-03-25 11:40 ` [PATCH mm-unstable v4 5/5] mm/khugepaged: unify khugepaged and madv_collapse with collapse_single_pmd() Nico Pache
@ 2026-03-25 11:44 ` Lorenzo Stoakes (Oracle)
  2026-03-26  0:25 ` Andrew Morton
  6 siblings, 0 replies; 35+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-25 11:44 UTC (permalink / raw)
  To: Nico Pache
  Cc: linux-kernel, linux-mm, aarcange, akpm, anshuman.khandual,
	apopple, baohua, baolin.wang, byungchul, catalin.marinas, cl,
	corbet, dave.hansen, david, dev.jain, gourry, hannes, hughd,
	jackmanb, jack, jannh, jglisse, joshua.hahnjy, kas, lance.yang,
	Liam.Howlett, mathieu.desnoyers, matthew.brost, mhiramat, mhocko,
	peterx, pfalcato, rakie.kim, raquini, rdunlap, richard.weiyang,
	rientjes, rostedt, rppt, ryan.roberts, shivankg, sunnanyong,
	surenb, thomas.hellstrom, tiwai, usamaarif642, vbabka,
	vishal.moola, wangkefeng.wang, will, willy, yang, ying.huang, ziy,
	zokeefe

-cc old email

Gentle reminder to please send new stuff to ljs@kernel.org!

Thanks, Lorenzo

On Wed, Mar 25, 2026 at 05:40:17AM -0600, Nico Pache wrote:
> MAINTAINER NOTE: This is based on mm-unstable with the coresponding
> patches reverted then reapplied.
>
> The following series contains cleanups and prerequisites for my work on
> khugepaged mTHP support [1]. These have been separated out to ease review.
>
> The first patch in the series refactors the page fault folio to pte mapping
> and follows a similar convention as defined by map_anon_folio_pmd_(no)pf().
> This not only cleans up the current implementation of do_anonymous_page(),
> but will allow for reuse later in the khugepaged mTHP implementation.
>
> The second patch adds a small is_pmd_order() helper to check if an order is
> the PMD order. This check is open-coded in a number of places. This patch
> aims to clean this up and will be used more in the khugepaged mTHP work.
> The third patch also adds a small DEFINE for (HPAGE_PMD_NR - 1) which is
> used often across the khugepaged code.
>
> The fourth and fifth patch come from the khugepaged mTHP patchset [1].
> These two patches include the rename of function prefixes, and the
> unification of khugepaged and madvise_collapse via a new
> collapse_single_pmd function.
>
> Patch 1:     refactor do_anonymous_page into map_anon_folio_pte_(no)pf
> Patch 2:     add is_pmd_order helper
> Patch 3:     Add define for (HPAGE_PMD_NR - 1)
> Patch 4:     Refactor/rename hpage_collapse
> Patch 5:     Refactoring to combine madvise_collapse and khugepaged
>
> Testing:
> - Built for x86_64, aarch64, ppc64le, and s390x
> - ran all arches on test suites provided by the kernel-tests project
> - selftests mm
>
> V4 Changes:
>  - added RB and SB tags
>  - Patch1: commit message cleanup/additions
>  - Patch1: constify two variables, and change 1<<order to 1L<<..
>  - Patch1: change zero-page read path to use update_mmu_cache varient
>  - Patch5: remove dead code switch statement (SCAN_PTE_MAPPED_HUGEPAGE)
>  - Patch5: remove local mmap_locked from madvise_collapse()
>  - Patch5: rename mmap_locked to lock_dropped in ..scan_mm_slot() and
>    invert the logic. the madvise|khugepaged code now share the same
>    naming convention across both functions.
>  - Patch5: add assertion to collapse_single_pmd() so both madvise_collapse
>    and khugepaged assert the lock.
>  - Patch5: Convert one of the VM_BUG_ON's to VM_WARN_ON
>
> V3 - https://lore.kernel.org/all/20260311211315.450947-1-npache@redhat.com
> V2 - https://lore.kernel.org/all/20260226012929.169479-1-npache@redhat.com
> V1 - https://lore.kernel.org/all/20260212021835.17755-1-npache@redhat.com
>
> A big thanks to everyone that has reviewed, tested, and participated in
> the development process.
>
> [1] - https://lore.kernel.org/all/20260122192841.128719-1-npache@redhat.com
> [2] - https://lore.kernel.org/all/7334b702-f6a0-4ccf-8ac6-8426a90d1846@kernel.org/
> [3] - https://lore.kernel.org/all/25723c0f-c702-44ad-93e9-1056313680cd@kernel.org/
> [4] - https://lore.kernel.org/all/81ff9caa-50f2-4951-8d82-2c8dcdf3db91@kernel.org/
>
> Nico Pache (5):
>   mm: consolidate anonymous folio PTE mapping into helpers
>   mm: introduce is_pmd_order helper
>   mm/khugepaged: define KHUGEPAGED_MAX_PTES_LIMIT as HPAGE_PMD_NR - 1
>   mm/khugepaged: rename hpage_collapse_* to collapse_*
>   mm/khugepaged: unify khugepaged and madv_collapse with
>     collapse_single_pmd()
>
>  include/linux/huge_mm.h |   5 +
>  include/linux/mm.h      |   4 +
>  mm/huge_memory.c        |   2 +-
>  mm/khugepaged.c         | 207 ++++++++++++++++++++--------------------
>  mm/memory.c             |  63 ++++++++----
>  mm/mempolicy.c          |   2 +-
>  mm/mremap.c             |   2 +-
>  mm/page_alloc.c         |   4 +-
>  mm/shmem.c              |   3 +-
>  9 files changed, 161 insertions(+), 131 deletions(-)
>
> --
> 2.53.0
>


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH mm-unstable v4 4/5] mm/khugepaged: rename hpage_collapse_* to collapse_*
  2026-03-25 11:40 ` [PATCH mm-unstable v4 4/5] mm/khugepaged: rename hpage_collapse_* to collapse_* Nico Pache
@ 2026-03-25 12:08   ` Lorenzo Stoakes (Oracle)
  0 siblings, 0 replies; 35+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-25 12:08 UTC (permalink / raw)
  To: Nico Pache
  Cc: linux-kernel, linux-mm, aarcange, akpm, anshuman.khandual,
	apopple, baohua, baolin.wang, byungchul, catalin.marinas, cl,
	corbet, dave.hansen, david, dev.jain, gourry, hannes, hughd,
	jackmanb, jack, jannh, jglisse, joshua.hahnjy, kas, lance.yang,
	Liam.Howlett, lorenzo.stoakes, mathieu.desnoyers, matthew.brost,
	mhiramat, mhocko, peterx, pfalcato, rakie.kim, raquini, rdunlap,
	richard.weiyang, rientjes, rostedt, rppt, ryan.roberts, shivankg,
	sunnanyong, surenb, thomas.hellstrom, tiwai, usamaarif642, vbabka,
	vishal.moola, wangkefeng.wang, will, willy, yang, ying.huang, ziy,
	zokeefe

On Wed, Mar 25, 2026 at 05:40:21AM -0600, Nico Pache wrote:
> The hpage_collapse functions describe functions used by madvise_collapse
> and khugepaged. remove the unnecessary hpage prefix to shorten the
> function name.
>
> Reviewed-by: Dev Jain <dev.jain@arm.com>
> Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
> Reviewed-by: Lance Yang <lance.yang@linux.dev>
> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
> Reviewed-by: Zi Yan <ziy@nvidia.com>
> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>

Nit, but could we please update this to:

Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>

Thanks, Lorenzo

> Acked-by: David Hildenbrand (Arm) <david@kernel.org>
> Signed-off-by: Nico Pache <npache@redhat.com>
> ---
>  mm/khugepaged.c | 60 ++++++++++++++++++++++++-------------------------
>  mm/mremap.c     |  2 +-
>  2 files changed, 30 insertions(+), 32 deletions(-)
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index c9c17bfccf0d..3728a2cf133c 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -395,14 +395,14 @@ void __init khugepaged_destroy(void)
>  	kmem_cache_destroy(mm_slot_cache);
>  }
>
> -static inline int hpage_collapse_test_exit(struct mm_struct *mm)
> +static inline int collapse_test_exit(struct mm_struct *mm)
>  {
>  	return atomic_read(&mm->mm_users) == 0;
>  }
>
> -static inline int hpage_collapse_test_exit_or_disable(struct mm_struct *mm)
> +static inline int collapse_test_exit_or_disable(struct mm_struct *mm)
>  {
> -	return hpage_collapse_test_exit(mm) ||
> +	return collapse_test_exit(mm) ||
>  		mm_flags_test(MMF_DISABLE_THP_COMPLETELY, mm);
>  }
>
> @@ -436,7 +436,7 @@ void __khugepaged_enter(struct mm_struct *mm)
>  	int wakeup;
>
>  	/* __khugepaged_exit() must not run from under us */
> -	VM_BUG_ON_MM(hpage_collapse_test_exit(mm), mm);
> +	VM_BUG_ON_MM(collapse_test_exit(mm), mm);
>  	if (unlikely(mm_flags_test_and_set(MMF_VM_HUGEPAGE, mm)))
>  		return;
>
> @@ -490,7 +490,7 @@ void __khugepaged_exit(struct mm_struct *mm)
>  	} else if (slot) {
>  		/*
>  		 * This is required to serialize against
> -		 * hpage_collapse_test_exit() (which is guaranteed to run
> +		 * collapse_test_exit() (which is guaranteed to run
>  		 * under mmap sem read mode). Stop here (after we return all
>  		 * pagetables will be destroyed) until khugepaged has finished
>  		 * working on the pagetables under the mmap_lock.
> @@ -589,7 +589,7 @@ static enum scan_result __collapse_huge_page_isolate(struct vm_area_struct *vma,
>  			goto out;
>  		}
>
> -		/* See hpage_collapse_scan_pmd(). */
> +		/* See collapse_scan_pmd(). */
>  		if (folio_maybe_mapped_shared(folio)) {
>  			++shared;
>  			if (cc->is_khugepaged &&
> @@ -840,7 +840,7 @@ static struct collapse_control khugepaged_collapse_control = {
>  	.is_khugepaged = true,
>  };
>
> -static bool hpage_collapse_scan_abort(int nid, struct collapse_control *cc)
> +static bool collapse_scan_abort(int nid, struct collapse_control *cc)
>  {
>  	int i;
>
> @@ -875,7 +875,7 @@ static inline gfp_t alloc_hugepage_khugepaged_gfpmask(void)
>  }
>
>  #ifdef CONFIG_NUMA
> -static int hpage_collapse_find_target_node(struct collapse_control *cc)
> +static int collapse_find_target_node(struct collapse_control *cc)
>  {
>  	int nid, target_node = 0, max_value = 0;
>
> @@ -894,7 +894,7 @@ static int hpage_collapse_find_target_node(struct collapse_control *cc)
>  	return target_node;
>  }
>  #else
> -static int hpage_collapse_find_target_node(struct collapse_control *cc)
> +static int collapse_find_target_node(struct collapse_control *cc)
>  {
>  	return 0;
>  }
> @@ -913,7 +913,7 @@ static enum scan_result hugepage_vma_revalidate(struct mm_struct *mm, unsigned l
>  	enum tva_type type = cc->is_khugepaged ? TVA_KHUGEPAGED :
>  				 TVA_FORCED_COLLAPSE;
>
> -	if (unlikely(hpage_collapse_test_exit_or_disable(mm)))
> +	if (unlikely(collapse_test_exit_or_disable(mm)))
>  		return SCAN_ANY_PROCESS;
>
>  	*vmap = vma = find_vma(mm, address);
> @@ -984,7 +984,7 @@ static enum scan_result check_pmd_still_valid(struct mm_struct *mm,
>
>  /*
>   * Bring missing pages in from swap, to complete THP collapse.
> - * Only done if hpage_collapse_scan_pmd believes it is worthwhile.
> + * Only done if khugepaged_scan_pmd believes it is worthwhile.
>   *
>   * Called and returns without pte mapped or spinlocks held.
>   * Returns result: if not SCAN_SUCCEED, mmap_lock has been released.
> @@ -1070,7 +1070,7 @@ static enum scan_result alloc_charge_folio(struct folio **foliop, struct mm_stru
>  {
>  	gfp_t gfp = (cc->is_khugepaged ? alloc_hugepage_khugepaged_gfpmask() :
>  		     GFP_TRANSHUGE);
> -	int node = hpage_collapse_find_target_node(cc);
> +	int node = collapse_find_target_node(cc);
>  	struct folio *folio;
>
>  	folio = __folio_alloc(gfp, HPAGE_PMD_ORDER, node, &cc->alloc_nmask);
> @@ -1255,7 +1255,7 @@ static enum scan_result collapse_huge_page(struct mm_struct *mm, unsigned long a
>  	return result;
>  }
>
> -static enum scan_result hpage_collapse_scan_pmd(struct mm_struct *mm,
> +static enum scan_result collapse_scan_pmd(struct mm_struct *mm,
>  		struct vm_area_struct *vma, unsigned long start_addr,
>  		bool *mmap_locked, struct collapse_control *cc)
>  {
> @@ -1380,7 +1380,7 @@ static enum scan_result hpage_collapse_scan_pmd(struct mm_struct *mm,
>  		 * hit record.
>  		 */
>  		node = folio_nid(folio);
> -		if (hpage_collapse_scan_abort(node, cc)) {
> +		if (collapse_scan_abort(node, cc)) {
>  			result = SCAN_SCAN_ABORT;
>  			goto out_unmap;
>  		}
> @@ -1446,7 +1446,7 @@ static void collect_mm_slot(struct mm_slot *slot)
>
>  	lockdep_assert_held(&khugepaged_mm_lock);
>
> -	if (hpage_collapse_test_exit(mm)) {
> +	if (collapse_test_exit(mm)) {
>  		/* free mm_slot */
>  		hash_del(&slot->hash);
>  		list_del(&slot->mm_node);
> @@ -1801,7 +1801,7 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
>  		if (find_pmd_or_thp_or_none(mm, addr, &pmd) != SCAN_SUCCEED)
>  			continue;
>
> -		if (hpage_collapse_test_exit(mm))
> +		if (collapse_test_exit(mm))
>  			continue;
>
>  		if (!file_backed_vma_is_retractable(vma))
> @@ -2317,7 +2317,7 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
>  	return result;
>  }
>
> -static enum scan_result hpage_collapse_scan_file(struct mm_struct *mm,
> +static enum scan_result collapse_scan_file(struct mm_struct *mm,
>  		unsigned long addr, struct file *file, pgoff_t start,
>  		struct collapse_control *cc)
>  {
> @@ -2370,7 +2370,7 @@ static enum scan_result hpage_collapse_scan_file(struct mm_struct *mm,
>  		}
>
>  		node = folio_nid(folio);
> -		if (hpage_collapse_scan_abort(node, cc)) {
> +		if (collapse_scan_abort(node, cc)) {
>  			result = SCAN_SCAN_ABORT;
>  			folio_put(folio);
>  			break;
> @@ -2424,7 +2424,7 @@ static enum scan_result hpage_collapse_scan_file(struct mm_struct *mm,
>  	return result;
>  }
>
> -static void khugepaged_scan_mm_slot(unsigned int progress_max,
> +static void collapse_scan_mm_slot(unsigned int progress_max,
>  		enum scan_result *result, struct collapse_control *cc)
>  	__releases(&khugepaged_mm_lock)
>  	__acquires(&khugepaged_mm_lock)
> @@ -2458,7 +2458,7 @@ static void khugepaged_scan_mm_slot(unsigned int progress_max,
>  		goto breakouterloop_mmap_lock;
>
>  	cc->progress++;
> -	if (unlikely(hpage_collapse_test_exit_or_disable(mm)))
> +	if (unlikely(collapse_test_exit_or_disable(mm)))
>  		goto breakouterloop;
>
>  	vma_iter_init(&vmi, mm, khugepaged_scan.address);
> @@ -2466,7 +2466,7 @@ static void khugepaged_scan_mm_slot(unsigned int progress_max,
>  		unsigned long hstart, hend;
>
>  		cond_resched();
> -		if (unlikely(hpage_collapse_test_exit_or_disable(mm))) {
> +		if (unlikely(collapse_test_exit_or_disable(mm))) {
>  			cc->progress++;
>  			break;
>  		}
> @@ -2488,7 +2488,7 @@ static void khugepaged_scan_mm_slot(unsigned int progress_max,
>  			bool mmap_locked = true;
>
>  			cond_resched();
> -			if (unlikely(hpage_collapse_test_exit_or_disable(mm)))
> +			if (unlikely(collapse_test_exit_or_disable(mm)))
>  				goto breakouterloop;
>
>  			VM_BUG_ON(khugepaged_scan.address < hstart ||
> @@ -2501,12 +2501,12 @@ static void khugepaged_scan_mm_slot(unsigned int progress_max,
>
>  				mmap_read_unlock(mm);
>  				mmap_locked = false;
> -				*result = hpage_collapse_scan_file(mm,
> +				*result = collapse_scan_file(mm,
>  					khugepaged_scan.address, file, pgoff, cc);
>  				fput(file);
>  				if (*result == SCAN_PTE_MAPPED_HUGEPAGE) {
>  					mmap_read_lock(mm);
> -					if (hpage_collapse_test_exit_or_disable(mm))
> +					if (collapse_test_exit_or_disable(mm))
>  						goto breakouterloop;
>  					*result = try_collapse_pte_mapped_thp(mm,
>  						khugepaged_scan.address, false);
> @@ -2515,7 +2515,7 @@ static void khugepaged_scan_mm_slot(unsigned int progress_max,
>  					mmap_read_unlock(mm);
>  				}
>  			} else {
> -				*result = hpage_collapse_scan_pmd(mm, vma,
> +				*result = collapse_scan_pmd(mm, vma,
>  					khugepaged_scan.address, &mmap_locked, cc);
>  			}
>
> @@ -2547,7 +2547,7 @@ static void khugepaged_scan_mm_slot(unsigned int progress_max,
>  	 * Release the current mm_slot if this mm is about to die, or
>  	 * if we scanned all vmas of this mm, or THP got disabled.
>  	 */
> -	if (hpage_collapse_test_exit_or_disable(mm) || !vma) {
> +	if (collapse_test_exit_or_disable(mm) || !vma) {
>  		/*
>  		 * Make sure that if mm_users is reaching zero while
>  		 * khugepaged runs here, khugepaged_exit will find
> @@ -2600,7 +2600,7 @@ static void khugepaged_do_scan(struct collapse_control *cc)
>  			pass_through_head++;
>  		if (khugepaged_has_work() &&
>  		    pass_through_head < 2)
> -			khugepaged_scan_mm_slot(progress_max, &result, cc);
> +			collapse_scan_mm_slot(progress_max, &result, cc);
>  		else
>  			cc->progress = progress_max;
>  		spin_unlock(&khugepaged_mm_lock);
> @@ -2845,8 +2845,7 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
>  			mmap_read_unlock(mm);
>  			mmap_locked = false;
>  			*lock_dropped = true;
> -			result = hpage_collapse_scan_file(mm, addr, file, pgoff,
> -							  cc);
> +			result = collapse_scan_file(mm, addr, file, pgoff, cc);
>
>  			if (result == SCAN_PAGE_DIRTY_OR_WRITEBACK && !triggered_wb &&
>  			    mapping_can_writeback(file->f_mapping)) {
> @@ -2860,8 +2859,7 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
>  			}
>  			fput(file);
>  		} else {
> -			result = hpage_collapse_scan_pmd(mm, vma, addr,
> -							 &mmap_locked, cc);
> +			result = collapse_scan_pmd(mm, vma, addr, &mmap_locked, cc);
>  		}
>  		if (!mmap_locked)
>  			*lock_dropped = true;
> diff --git a/mm/mremap.c b/mm/mremap.c
> index 8566e32d58d9..e9c8b1d05832 100644
> --- a/mm/mremap.c
> +++ b/mm/mremap.c
> @@ -244,7 +244,7 @@ static int move_ptes(struct pagetable_move_control *pmc,
>  		goto out;
>  	}
>  	/*
> -	 * Now new_pte is none, so hpage_collapse_scan_file() path can not find
> +	 * Now new_pte is none, so collapse_scan_file() path can not find
>  	 * this by traversing file->f_mapping, so there is no concurrency with
>  	 * retract_page_tables(). In addition, we already hold the exclusive
>  	 * mmap_lock, so this new_pte page is stable, so there is no need to get
> --
> 2.53.0
>


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH mm-unstable v4 2/5] mm: introduce is_pmd_order helper
  2026-03-25 11:40 ` [PATCH mm-unstable v4 2/5] mm: introduce is_pmd_order helper Nico Pache
@ 2026-03-25 12:11   ` Lorenzo Stoakes (Oracle)
  2026-03-25 14:45     ` Andrew Morton
  0 siblings, 1 reply; 35+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-25 12:11 UTC (permalink / raw)
  To: Nico Pache
  Cc: linux-kernel, linux-mm, aarcange, akpm, anshuman.khandual,
	apopple, baohua, baolin.wang, byungchul, catalin.marinas, cl,
	corbet, dave.hansen, david, dev.jain, gourry, hannes, hughd,
	jackmanb, jack, jannh, jglisse, joshua.hahnjy, kas, lance.yang,
	Liam.Howlett, lorenzo.stoakes, mathieu.desnoyers, matthew.brost,
	mhiramat, mhocko, peterx, pfalcato, rakie.kim, raquini, rdunlap,
	richard.weiyang, rientjes, rostedt, rppt, ryan.roberts, shivankg,
	sunnanyong, surenb, thomas.hellstrom, tiwai, usamaarif642, vbabka,
	vishal.moola, wangkefeng.wang, will, willy, yang, ying.huang, ziy,
	zokeefe

On Wed, Mar 25, 2026 at 05:40:19AM -0600, Nico Pache wrote:
> In order to add mTHP support to khugepaged, we will often be checking if a
> given order is (or is not) a PMD order. Some places in the kernel already
> use this check, so lets create a simple helper function to keep the code
> clean and readable.
>
> Acked-by: David Hildenbrand (Arm) <david@kernel.org>
> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> Reviewed-by: Dev Jain <dev.jain@arm.com>
> Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
> Reviewed-by: Lance Yang <lance.yang@linux.dev>
> Reviewed-by: Barry Song <baohua@kernel.org>
> Reviewed-by: Zi Yan <ziy@nvidia.com>
> Reviewed-by: Pedro Falcato <pfalcato@suse.de>
> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> Suggested-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>

Nit, but could we please update both to:

Lorenzo Stoakes (Oracle) <ljs@kernel.org>

Thanks :), Lorenzo

> Signed-off-by: Nico Pache <npache@redhat.com>
> ---
>  include/linux/huge_mm.h | 5 +++++
>  mm/huge_memory.c        | 2 +-
>  mm/khugepaged.c         | 6 +++---
>  mm/memory.c             | 2 +-
>  mm/mempolicy.c          | 2 +-
>  mm/page_alloc.c         | 4 ++--
>  mm/shmem.c              | 3 +--
>  7 files changed, 14 insertions(+), 10 deletions(-)
>
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index c8799dca3b60..1258fa37e85b 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -769,6 +769,11 @@ static inline bool pmd_is_huge(pmd_t pmd)
>  }
>  #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
>
> +static inline bool is_pmd_order(unsigned int order)
> +{
> +	return order == HPAGE_PMD_ORDER;
> +}
> +
>  static inline int split_folio_to_list_to_order(struct folio *folio,
>  		struct list_head *list, int new_order)
>  {
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 2833b06d7498..b2a6060b3c20 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -4118,7 +4118,7 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
>  		i_mmap_unlock_read(mapping);
>  out:
>  	xas_destroy(&xas);
> -	if (old_order == HPAGE_PMD_ORDER)
> +	if (is_pmd_order(old_order))
>  		count_vm_event(!ret ? THP_SPLIT_PAGE : THP_SPLIT_PAGE_FAILED);
>  	count_mthp_stat(old_order, !ret ? MTHP_STAT_SPLIT : MTHP_STAT_SPLIT_FAILED);
>  	return ret;
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 6bd7a7c0632a..1f4609761294 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -1547,7 +1547,7 @@ static enum scan_result try_collapse_pte_mapped_thp(struct mm_struct *mm, unsign
>  	if (IS_ERR(folio))
>  		return SCAN_PAGE_NULL;
>
> -	if (folio_order(folio) != HPAGE_PMD_ORDER) {
> +	if (!is_pmd_order(folio_order(folio))) {
>  		result = SCAN_PAGE_COMPOUND;
>  		goto drop_folio;
>  	}
> @@ -2030,7 +2030,7 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
>  		 * we locked the first folio, then a THP might be there already.
>  		 * This will be discovered on the first iteration.
>  		 */
> -		if (folio_order(folio) == HPAGE_PMD_ORDER) {
> +		if (is_pmd_order(folio_order(folio))) {
>  			result = SCAN_PTE_MAPPED_HUGEPAGE;
>  			goto out_unlock;
>  		}
> @@ -2358,7 +2358,7 @@ static enum scan_result hpage_collapse_scan_file(struct mm_struct *mm,
>  			continue;
>  		}
>
> -		if (folio_order(folio) == HPAGE_PMD_ORDER) {
> +		if (is_pmd_order(folio_order(folio))) {
>  			result = SCAN_PTE_MAPPED_HUGEPAGE;
>  			/*
>  			 * PMD-sized THP implies that we can only try
> diff --git a/mm/memory.c b/mm/memory.c
> index 6396d32c348a..e44469f9cf65 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -5573,7 +5573,7 @@ vm_fault_t do_set_pmd(struct vm_fault *vmf, struct folio *folio, struct page *pa
>  	if (!thp_vma_suitable_order(vma, haddr, PMD_ORDER))
>  		return ret;
>
> -	if (folio_order(folio) != HPAGE_PMD_ORDER)
> +	if (!is_pmd_order(folio_order(folio)))
>  		return ret;
>  	page = &folio->page;
>
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index ff52fb94ff27..fd08771e2057 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -2449,7 +2449,7 @@ static struct page *alloc_pages_mpol(gfp_t gfp, unsigned int order,
>
>  	if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) &&
>  	    /* filter "hugepage" allocation, unless from alloc_pages() */
> -	    order == HPAGE_PMD_ORDER && ilx != NO_INTERLEAVE_INDEX) {
> +	    is_pmd_order(order) && ilx != NO_INTERLEAVE_INDEX) {
>  		/*
>  		 * For hugepage allocation and non-interleave policy which
>  		 * allows the current node (or other explicitly preferred
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 915b6aef55d0..ee81f5c67c18 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -652,7 +652,7 @@ static inline unsigned int order_to_pindex(int migratetype, int order)
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>  	bool movable;
>  	if (order > PAGE_ALLOC_COSTLY_ORDER) {
> -		VM_BUG_ON(order != HPAGE_PMD_ORDER);
> +		VM_BUG_ON(!is_pmd_order(order));
>
>  		movable = migratetype == MIGRATE_MOVABLE;
>
> @@ -684,7 +684,7 @@ static inline bool pcp_allowed_order(unsigned int order)
>  	if (order <= PAGE_ALLOC_COSTLY_ORDER)
>  		return true;
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> -	if (order == HPAGE_PMD_ORDER)
> +	if (is_pmd_order(order))
>  		return true;
>  #endif
>  	return false;
> diff --git a/mm/shmem.c b/mm/shmem.c
> index d00044257401..4ecefe02881d 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -5532,8 +5532,7 @@ static ssize_t thpsize_shmem_enabled_store(struct kobject *kobj,
>  		spin_unlock(&huge_shmem_orders_lock);
>  	} else if (sysfs_streq(buf, "inherit")) {
>  		/* Do not override huge allocation policy with non-PMD sized mTHP */
> -		if (shmem_huge == SHMEM_HUGE_FORCE &&
> -		    order != HPAGE_PMD_ORDER)
> +		if (shmem_huge == SHMEM_HUGE_FORCE && !is_pmd_order(order))
>  			return -EINVAL;
>
>  		spin_lock(&huge_shmem_orders_lock);
> --
> 2.53.0
>


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH mm-unstable v4 2/5] mm: introduce is_pmd_order helper
  2026-03-25 12:11   ` Lorenzo Stoakes (Oracle)
@ 2026-03-25 14:45     ` Andrew Morton
  2026-03-25 14:49       ` Lorenzo Stoakes (Oracle)
  0 siblings, 1 reply; 35+ messages in thread
From: Andrew Morton @ 2026-03-25 14:45 UTC (permalink / raw)
  To: Lorenzo Stoakes (Oracle)
  Cc: Nico Pache, linux-kernel, linux-mm, aarcange, anshuman.khandual,
	apopple, baohua, baolin.wang, byungchul, catalin.marinas, cl,
	corbet, dave.hansen, david, dev.jain, gourry, hannes, hughd,
	jackmanb, jack, jannh, jglisse, joshua.hahnjy, kas, lance.yang,
	Liam.Howlett, lorenzo.stoakes, mathieu.desnoyers, matthew.brost,
	mhiramat, mhocko, peterx, pfalcato, rakie.kim, raquini, rdunlap,
	richard.weiyang, rientjes, rostedt, rppt, ryan.roberts, shivankg,
	sunnanyong, surenb, thomas.hellstrom, tiwai, usamaarif642, vbabka,
	vishal.moola, wangkefeng.wang, will, willy, yang, ying.huang, ziy,
	zokeefe

On Wed, 25 Mar 2026 12:11:43 +0000 "Lorenzo Stoakes (Oracle)" <ljs@kernel.org> wrote:

> On Wed, Mar 25, 2026 at 05:40:19AM -0600, Nico Pache wrote:
> > In order to add mTHP support to khugepaged, we will often be checking if a
> > given order is (or is not) a PMD order. Some places in the kernel already
> > use this check, so let's create a simple helper function to keep the code
> > clean and readable.
> >
> > Acked-by: David Hildenbrand (Arm) <david@kernel.org>
> > Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> > Reviewed-by: Dev Jain <dev.jain@arm.com>
> > Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
> > Reviewed-by: Lance Yang <lance.yang@linux.dev>
> > Reviewed-by: Barry Song <baohua@kernel.org>
> > Reviewed-by: Zi Yan <ziy@nvidia.com>
> > Reviewed-by: Pedro Falcato <pfalcato@suse.de>
> > Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> > Suggested-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> 
> Nit, but could we please update both to:
> 
> Lorenzo Stoakes (Oracle) <ljs@kernel.org>

hp2:/usr/src/25> grep lorenzo.stoakes@oracle.com patches/*.patch|wc -l
105

Please lmk if you'd prefer that I go through these and update the email
address.


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH mm-unstable v4 2/5] mm: introduce is_pmd_order helper
  2026-03-25 14:45     ` Andrew Morton
@ 2026-03-25 14:49       ` Lorenzo Stoakes (Oracle)
  2026-03-25 16:05         ` Andrew Morton
  0 siblings, 1 reply; 35+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-25 14:49 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Nico Pache, linux-kernel, linux-mm, aarcange, anshuman.khandual,
	apopple, baohua, baolin.wang, byungchul, catalin.marinas, cl,
	corbet, dave.hansen, david, dev.jain, gourry, hannes, hughd,
	jackmanb, jack, jannh, jglisse, joshua.hahnjy, kas, lance.yang,
	Liam.Howlett, lorenzo.stoakes, mathieu.desnoyers, matthew.brost,
	mhiramat, mhocko, peterx, pfalcato, rakie.kim, raquini, rdunlap,
	richard.weiyang, rientjes, rostedt, rppt, ryan.roberts, shivankg,
	sunnanyong, surenb, thomas.hellstrom, tiwai, usamaarif642, vbabka,
	vishal.moola, wangkefeng.wang, will, willy, yang, ying.huang, ziy,
	zokeefe

On Wed, Mar 25, 2026 at 07:45:16AM -0700, Andrew Morton wrote:
> On Wed, 25 Mar 2026 12:11:43 +0000 "Lorenzo Stoakes (Oracle)" <ljs@kernel.org> wrote:
>
> > On Wed, Mar 25, 2026 at 05:40:19AM -0600, Nico Pache wrote:
> > > In order to add mTHP support to khugepaged, we will often be checking if a
> > > given order is (or is not) a PMD order. Some places in the kernel already
> > > use this check, so let's create a simple helper function to keep the code
> > > clean and readable.
> > >
> > > Acked-by: David Hildenbrand (Arm) <david@kernel.org>
> > > Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> > > Reviewed-by: Dev Jain <dev.jain@arm.com>
> > > Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
> > > Reviewed-by: Lance Yang <lance.yang@linux.dev>
> > > Reviewed-by: Barry Song <baohua@kernel.org>
> > > Reviewed-by: Zi Yan <ziy@nvidia.com>
> > > Reviewed-by: Pedro Falcato <pfalcato@suse.de>
> > > Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> > > Suggested-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> >
> > Nit, but could we please update both to:
> >
> > Lorenzo Stoakes (Oracle) <ljs@kernel.org>
>
> hp2:/usr/src/25> grep lorenzo.stoakes@oracle.com patches/*.patch|wc -l
> 105
>
> Please lmk if you'd prefer that I go through these and update the email
> address.

That'd be useful thanks.

It's not a huge deal, but trying to gently nudge the transition forwards a
bit...

Cheers, Lorenzo


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH mm-unstable v4 2/5] mm: introduce is_pmd_order helper
  2026-03-25 14:49       ` Lorenzo Stoakes (Oracle)
@ 2026-03-25 16:05         ` Andrew Morton
  0 siblings, 0 replies; 35+ messages in thread
From: Andrew Morton @ 2026-03-25 16:05 UTC (permalink / raw)
  To: Lorenzo Stoakes (Oracle)
  Cc: Nico Pache, linux-kernel, linux-mm, aarcange, anshuman.khandual,
	apopple, baohua, baolin.wang, byungchul, catalin.marinas, cl,
	corbet, dave.hansen, david, dev.jain, gourry, hannes, hughd,
	jackmanb, jack, jannh, jglisse, joshua.hahnjy, kas, lance.yang,
	Liam.Howlett, lorenzo.stoakes, mathieu.desnoyers, matthew.brost,
	mhiramat, mhocko, peterx, pfalcato, rakie.kim, raquini, rdunlap,
	richard.weiyang, rientjes, rostedt, rppt, ryan.roberts, shivankg,
	sunnanyong, surenb, thomas.hellstrom, tiwai, usamaarif642, vbabka,
	vishal.moola, wangkefeng.wang, will, willy, yang, ying.huang, ziy,
	zokeefe

On Wed, 25 Mar 2026 14:49:30 +0000 "Lorenzo Stoakes (Oracle)" <ljs@kernel.org> wrote:

> On Wed, Mar 25, 2026 at 07:45:16AM -0700, Andrew Morton wrote:
> > On Wed, 25 Mar 2026 12:11:43 +0000 "Lorenzo Stoakes (Oracle)" <ljs@kernel.org> wrote:
> >
> > > On Wed, Mar 25, 2026 at 05:40:19AM -0600, Nico Pache wrote:
> > > > In order to add mTHP support to khugepaged, we will often be checking if a
> > > > given order is (or is not) a PMD order. Some places in the kernel already
> > > > use this check, so let's create a simple helper function to keep the code
> > > > clean and readable.
> > > >
> > > > Acked-by: David Hildenbrand (Arm) <david@kernel.org>
> > > > Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> > > > Reviewed-by: Dev Jain <dev.jain@arm.com>
> > > > Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
> > > > Reviewed-by: Lance Yang <lance.yang@linux.dev>
> > > > Reviewed-by: Barry Song <baohua@kernel.org>
> > > > Reviewed-by: Zi Yan <ziy@nvidia.com>
> > > > Reviewed-by: Pedro Falcato <pfalcato@suse.de>
> > > > Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> > > > Suggested-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> > >
> > > Nit, but could we please update both to:
> > >
> > > Lorenzo Stoakes (Oracle) <ljs@kernel.org>
> >
> > hp2:/usr/src/25> grep lorenzo.stoakes@oracle.com patches/*.patch|wc -l
> > 105
> >
> > Please lmk if you'd prefer that I go through these and update the email
> > address.
> 
> That'd be useful thanks.
> 
> It's not a huge deal, but trying to gently nudge the transition forwards a
> bit...

Done.


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH mm-unstable v4 0/5] mm: khugepaged cleanups and mTHP prerequisites
  2026-03-25 11:40 [PATCH mm-unstable v4 0/5] mm: khugepaged cleanups and mTHP prerequisites Nico Pache
                   ` (5 preceding siblings ...)
  2026-03-25 11:44 ` [PATCH mm-unstable v4 0/5] mm: khugepaged cleanups and mTHP prerequisites Lorenzo Stoakes (Oracle)
@ 2026-03-26  0:25 ` Andrew Morton
  2026-03-26  4:44   ` Roman Gushchin
  6 siblings, 1 reply; 35+ messages in thread
From: Andrew Morton @ 2026-03-26  0:25 UTC (permalink / raw)
  To: Nico Pache
  Cc: linux-kernel, linux-mm, aarcange, anshuman.khandual, apopple,
	baohua, baolin.wang, byungchul, catalin.marinas, cl, corbet,
	dave.hansen, david, dev.jain, gourry, hannes, hughd, jackmanb,
	jack, jannh, jglisse, joshua.hahnjy, kas, lance.yang,
	Liam.Howlett, lorenzo.stoakes, mathieu.desnoyers, matthew.brost,
	mhiramat, mhocko, peterx, pfalcato, rakie.kim, raquini, rdunlap,
	richard.weiyang, rientjes, rostedt, rppt, ryan.roberts, shivankg,
	sunnanyong, surenb, thomas.hellstrom, tiwai, usamaarif642, vbabka,
	vishal.moola, wangkefeng.wang, will, willy, yang, ying.huang, ziy,
	zokeefe, Roman Gushchin

On Wed, 25 Mar 2026 05:40:17 -0600 Nico Pache <npache@redhat.com> wrote:

> MAINTAINER NOTE: This is based on mm-unstable with the corresponding
> patches reverted then reapplied.

Unfortunately the update-in-place trick fooled AI review, which might
have been useful.  Oh well.  In retrospect we could have avoided that
by you asking me to drop v3 a couple of days before mailing out v4.

otoh, this series *does* apply to the mm-stable branch.  Roman, I
thought Sashiko is attempting that?

> The following series contains cleanups and prerequisites for my work on
> khugepaged mTHP support [1]. These have been separated out to ease review.

And boy that's a lot of reviewers!  Aren't you a lucky ducky ;)

> The first patch in the series refactors the page fault folio to pte mapping
> and follows a similar convention as defined by map_anon_folio_pmd_(no)pf().
> This not only cleans up the current implementation of do_anonymous_page(),
> but will allow for reuse later in the khugepaged mTHP implementation.
> 
> The second patch adds a small is_pmd_order() helper to check if an order is
> the PMD order. This check is open-coded in a number of places. This patch
> aims to clean this up and will be used more in the khugepaged mTHP work.
> The third patch also adds a small DEFINE for (HPAGE_PMD_NR - 1) which is
> used often across the khugepaged code.
> 
> The fourth and fifth patch come from the khugepaged mTHP patchset [1].
> These two patches include the rename of function prefixes, and the
> unification of khugepaged and madvise_collapse via a new
> collapse_single_pmd function.
> 
> Patch 1:     refactor do_anonymous_page into map_anon_folio_pte_(no)pf
> Patch 2:     add is_pmd_order helper
> Patch 3:     Add define for (HPAGE_PMD_NR - 1)
> Patch 4:     Refactor/rename hpage_collapse
> Patch 5:     Refactoring to combine madvise_collapse and khugepaged
> 

Thanks, I updated mm.git's mm-unstable branch to this version.

> V4 Changes:
>  - added RB and SB tags
>  - Patch1: commit message cleanup/additions
>  - Patch1: constify two variables, and change 1<<order to 1L<<..
>  - Patch1: change zero-page read path to use update_mmu_cache variant
>  - Patch5: remove dead code switch statement (SCAN_PTE_MAPPED_HUGEPAGE)
>  - Patch5: remove local mmap_locked from madvise_collapse()
>  - Patch5: rename mmap_locked to lock_dropped in ..scan_mm_slot() and
>    invert the logic. The madvise|khugepaged code now shares the same
>    naming convention across both functions.
>  - Patch5: add assertion to collapse_single_pmd() so both madvise_collapse
>    and khugepaged assert the lock.
>  - Patch5: Convert one of the VM_BUG_ON's to VM_WARN_ON

Below is how v4 altered mm,git:

 mm/khugepaged.c |   34 +++++++++++++++-------------------
 mm/memory.c     |   11 +++++------
 2 files changed, 20 insertions(+), 25 deletions(-)

--- a/mm/khugepaged.c~b
+++ a/mm/khugepaged.c
@@ -1250,7 +1250,7 @@ out_nolock:
 
 static enum scan_result collapse_scan_pmd(struct mm_struct *mm,
 		struct vm_area_struct *vma, unsigned long start_addr,
-		bool *mmap_locked, struct collapse_control *cc)
+		bool *lock_dropped, struct collapse_control *cc)
 {
 	pmd_t *pmd;
 	pte_t *pte, *_pte;
@@ -1425,7 +1425,7 @@ out_unmap:
 		result = collapse_huge_page(mm, start_addr, referenced,
 					    unmapped, cc);
 		/* collapse_huge_page will return with the mmap_lock released */
-		*mmap_locked = false;
+		*lock_dropped = true;
 	}
 out:
 	trace_mm_khugepaged_scan_pmd(mm, folio, referenced,
@@ -2422,7 +2422,7 @@ static enum scan_result collapse_scan_fi
  * the results.
  */
 static enum scan_result collapse_single_pmd(unsigned long addr,
-		struct vm_area_struct *vma, bool *mmap_locked,
+		struct vm_area_struct *vma, bool *lock_dropped,
 		struct collapse_control *cc)
 {
 	struct mm_struct *mm = vma->vm_mm;
@@ -2431,8 +2431,10 @@ static enum scan_result collapse_single_
 	struct file *file;
 	pgoff_t pgoff;
 
+	mmap_assert_locked(mm);
+
 	if (vma_is_anonymous(vma)) {
-		result = collapse_scan_pmd(mm, vma, addr, mmap_locked, cc);
+		result = collapse_scan_pmd(mm, vma, addr, lock_dropped, cc);
 		goto end;
 	}
 
@@ -2440,7 +2442,7 @@ static enum scan_result collapse_single_
 	pgoff = linear_page_index(vma, addr);
 
 	mmap_read_unlock(mm);
-	*mmap_locked = false;
+	*lock_dropped = true;
 retry:
 	result = collapse_scan_file(mm, addr, file, pgoff, cc);
 
@@ -2537,21 +2539,21 @@ static void collapse_scan_mm_slot(unsign
 		VM_BUG_ON(khugepaged_scan.address & ~HPAGE_PMD_MASK);
 
 		while (khugepaged_scan.address < hend) {
-			bool mmap_locked = true;
+			bool lock_dropped = false;
 
 			cond_resched();
 			if (unlikely(collapse_test_exit_or_disable(mm)))
 				goto breakouterloop;
 
-			VM_BUG_ON(khugepaged_scan.address < hstart ||
+			VM_WARN_ON_ONCE(khugepaged_scan.address < hstart ||
 				  khugepaged_scan.address + HPAGE_PMD_SIZE >
 				  hend);
 
 			*result = collapse_single_pmd(khugepaged_scan.address,
-						      vma, &mmap_locked, cc);
+						      vma, &lock_dropped, cc);
 			/* move to next address */
 			khugepaged_scan.address += HPAGE_PMD_SIZE;
-			if (!mmap_locked)
+			if (lock_dropped)
 				/*
 				 * We released mmap_lock so break loop.  Note
 				 * that we drop mmap_lock before all hugepage
@@ -2826,7 +2828,6 @@ int madvise_collapse(struct vm_area_stru
 	unsigned long hstart, hend, addr;
 	enum scan_result last_fail = SCAN_FAIL;
 	int thps = 0;
-	bool mmap_locked = true;
 
 	BUG_ON(vma->vm_start > start);
 	BUG_ON(vma->vm_end < end);
@@ -2849,10 +2850,10 @@ int madvise_collapse(struct vm_area_stru
 	for (addr = hstart; addr < hend; addr += HPAGE_PMD_SIZE) {
 		enum scan_result result = SCAN_FAIL;
 
-		if (!mmap_locked) {
+		if (*lock_dropped) {
 			cond_resched();
 			mmap_read_lock(mm);
-			mmap_locked = true;
+			*lock_dropped = false;
 			result = hugepage_vma_revalidate(mm, addr, false, &vma,
 							 cc);
 			if (result  != SCAN_SUCCEED) {
@@ -2862,12 +2863,8 @@ int madvise_collapse(struct vm_area_stru
 
 			hend = min(hend, vma->vm_end & HPAGE_PMD_MASK);
 		}
-		mmap_assert_locked(mm);
-
-		result = collapse_single_pmd(addr, vma, &mmap_locked, cc);
 
-		if (!mmap_locked)
-			*lock_dropped = true;
+		result = collapse_single_pmd(addr, vma, lock_dropped, cc);
 
 		switch (result) {
 		case SCAN_SUCCEED:
@@ -2876,7 +2873,6 @@ int madvise_collapse(struct vm_area_stru
 			break;
 		/* Whitelisted set of results where continuing OK */
 		case SCAN_NO_PTE_TABLE:
-		case SCAN_PTE_MAPPED_HUGEPAGE:
 		case SCAN_PTE_NON_PRESENT:
 		case SCAN_PTE_UFFD_WP:
 		case SCAN_LACK_REFERENCED_PAGE:
@@ -2897,7 +2893,7 @@ int madvise_collapse(struct vm_area_stru
 
 out_maybelock:
 	/* Caller expects us to hold mmap_lock on return */
-	if (!mmap_locked)
+	if (*lock_dropped)
 		mmap_read_lock(mm);
 out_nolock:
 	mmap_assert_locked(mm);
--- a/mm/memory.c~b
+++ a/mm/memory.c
@@ -5201,7 +5201,7 @@ void map_anon_folio_pte_nopf(struct foli
 		struct vm_area_struct *vma, unsigned long addr,
 		bool uffd_wp)
 {
-	unsigned int nr_pages = folio_nr_pages(folio);
+	const unsigned int nr_pages = folio_nr_pages(folio);
 	pte_t entry = folio_mk_pte(folio, vma->vm_page_prot);
 
 	entry = pte_sw_mkyoung(entry);
@@ -5221,10 +5221,10 @@ void map_anon_folio_pte_nopf(struct foli
 static void map_anon_folio_pte_pf(struct folio *folio, pte_t *pte,
 		struct vm_area_struct *vma, unsigned long addr, bool uffd_wp)
 {
-	unsigned int order = folio_order(folio);
+	const unsigned int order = folio_order(folio);
 
 	map_anon_folio_pte_nopf(folio, pte, vma, addr, uffd_wp);
-	add_mm_counter(vma->vm_mm, MM_ANONPAGES, 1 << order);
+	add_mm_counter(vma->vm_mm, MM_ANONPAGES, 1L << order);
 	count_mthp_stat(order, MTHP_STAT_ANON_FAULT_ALLOC);
 }
 
@@ -5239,7 +5239,7 @@ static vm_fault_t do_anonymous_page(stru
 	unsigned long addr = vmf->address;
 	struct folio *folio;
 	vm_fault_t ret = 0;
-	int nr_pages = 1;
+	int nr_pages;
 	pte_t entry;
 
 	/* File mapping without ->vm_ops ? */
@@ -5279,8 +5279,7 @@ static vm_fault_t do_anonymous_page(stru
 		set_pte_at(vma->vm_mm, addr, vmf->pte, entry);
 
 		/* No need to invalidate - it was non-present before */
-		update_mmu_cache_range(vmf, vma, addr, vmf->pte,
-				       /*nr_pages=*/ 1);
+		update_mmu_cache(vma, addr, vmf->pte);
 		goto unlock;
 	}
 
_



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH mm-unstable v4 0/5] mm: khugepaged cleanups and mTHP prerequisites
  2026-03-26  0:25 ` Andrew Morton
@ 2026-03-26  4:44   ` Roman Gushchin
  2026-03-26 16:48     ` Nico Pache
  0 siblings, 1 reply; 35+ messages in thread
From: Roman Gushchin @ 2026-03-26  4:44 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Nico Pache, linux-kernel, linux-mm, aarcange, anshuman.khandual,
	apopple, baohua, baolin.wang, byungchul, catalin.marinas, cl,
	corbet, dave.hansen, david, dev.jain, gourry, hannes, hughd,
	jackmanb, jack, jannh, jglisse, joshua.hahnjy, kas, lance.yang,
	Liam.Howlett, lorenzo.stoakes, mathieu.desnoyers, matthew.brost,
	mhiramat, mhocko, peterx, pfalcato, rakie.kim, raquini, rdunlap,
	richard.weiyang, rientjes, rostedt, rppt, ryan.roberts, shivankg,
	sunnanyong, surenb, thomas.hellstrom, tiwai, usamaarif642, vbabka,
	vishal.moola, wangkefeng.wang, will, willy, yang, ying.huang, ziy,
	zokeefe

Andrew Morton <akpm@linux-foundation.org> writes:

> On Wed, 25 Mar 2026 05:40:17 -0600 Nico Pache <npache@redhat.com> wrote:
>
> >> MAINTAINER NOTE: This is based on mm-unstable with the corresponding
>> patches reverted then reapplied.
>
> Unfortunately the update-in-place trick fooled AI review, which might
> have been useful.  Oh well.  In retrospect we could have avoided that
> by you asking me to drop v3 a couple of days before mailing out v4.
>
> otoh, this series *does* apply to the mm-stable branch.  Roman, I
> > thought Sashiko is attempting that?

It did, but for some reason it failed.
You can actually see which trees/branches it tried under the Baseline
section for each patchset. And yes, I need to improve it to log specific
sha's, not only something like mm/mm-new. Now it's kinda hard to say why
it failed.


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH mm-unstable v4 0/5] mm: khugepaged cleanups and mTHP prerequisites
  2026-03-26  4:44   ` Roman Gushchin
@ 2026-03-26 16:48     ` Nico Pache
  2026-03-26 17:35       ` Andrew Morton
  0 siblings, 1 reply; 35+ messages in thread
From: Nico Pache @ 2026-03-26 16:48 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Andrew Morton, linux-kernel, linux-mm, aarcange,
	anshuman.khandual, apopple, baohua, baolin.wang, byungchul,
	catalin.marinas, cl, corbet, dave.hansen, david, dev.jain, gourry,
	hannes, hughd, jackmanb, jack, jannh, jglisse, joshua.hahnjy, kas,
	lance.yang, Liam.Howlett, lorenzo.stoakes, mathieu.desnoyers,
	matthew.brost, mhiramat, mhocko, peterx, pfalcato, rakie.kim,
	raquini, rdunlap, richard.weiyang, rientjes, rostedt, rppt,
	ryan.roberts, shivankg, sunnanyong, surenb, thomas.hellstrom,
	tiwai, usamaarif642, vbabka, vishal.moola, wangkefeng.wang, will,
	willy, yang, ying.huang, ziy, zokeefe

On Wed, Mar 25, 2026 at 10:45 PM Roman Gushchin
<roman.gushchin@linux.dev> wrote:
>
> Andrew Morton <akpm@linux-foundation.org> writes:
>
> > On Wed, 25 Mar 2026 05:40:17 -0600 Nico Pache <npache@redhat.com> wrote:
> >
> > >> MAINTAINER NOTE: This is based on mm-unstable with the corresponding
> >> patches reverted then reapplied.
> >
> > Unfortunately the update-in-place trick fooled AI review, which might
> > have been useful.  Oh well.  In retrospect we could have avoided that
> > by you asking me to drop v3 a couple of days before mailing out v4.
> >
> > otoh, this series *does* apply to the mm-stable branch.  Roman, I
> > > thought Sashiko is attempting that?
>
> It did, but for some reason it failed.
> You can actually see which trees/branches it tried under the Baseline
> section for each patchset. And yes, I need to improve it to log specific
> sha's, not only something like mm/mm-new. Now it's kinda hard to say why
> it failed.

I don't think this was Sashiko's fault; this doesn't apply cleanly to
anything other than mm-unstable with my patches first reverted. I
first tried an mm-stable base but my code is now dependent on some
code in mm-unstable. As Andrew noted, I should have requested that he
pull the patches from mm-unstable before resending them.

-- Nico

>



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH mm-unstable v4 0/5] mm: khugepaged cleanups and mTHP prerequisites
  2026-03-26 16:48     ` Nico Pache
@ 2026-03-26 17:35       ` Andrew Morton
  0 siblings, 0 replies; 35+ messages in thread
From: Andrew Morton @ 2026-03-26 17:35 UTC (permalink / raw)
  To: Nico Pache
  Cc: Roman Gushchin, linux-kernel, linux-mm, aarcange,
	anshuman.khandual, apopple, baohua, baolin.wang, byungchul,
	catalin.marinas, cl, corbet, dave.hansen, david, dev.jain, gourry,
	hannes, hughd, jackmanb, jack, jannh, jglisse, joshua.hahnjy, kas,
	lance.yang, Liam.Howlett, lorenzo.stoakes, mathieu.desnoyers,
	matthew.brost, mhiramat, mhocko, peterx, pfalcato, rakie.kim,
	raquini, rdunlap, richard.weiyang, rientjes, rostedt, rppt,
	ryan.roberts, shivankg, sunnanyong, surenb, thomas.hellstrom,
	tiwai, usamaarif642, vbabka, vishal.moola, wangkefeng.wang, will,
	willy, yang, ying.huang, ziy, zokeefe

On Thu, 26 Mar 2026 10:48:13 -0600 Nico Pache <npache@redhat.com> wrote:

> > It did, but for some reason it failed.
> > You can actually see which trees/branches it tried under the Baseline
> > section for each patchset. And yes, I need to improve it to log specific
> > sha's, not only something like mm/mm-new. Now it's kinda hard to say why
> > it failed.
> 
> I don't think this was Sashiko's fault; this doesn't apply cleanly to
> anything other than mm-unstable with my patches first reverted. I
> first tried an mm-stable base but my code is now dependent on some
> code in mm-unstable. As Andrew noted, I should have requested that he
> pull the patches from mm-unstable before resending them.

Well, you'd be the first person to do this.  We're still figuring out
how to use this tool, taking it day-by-day.

It would be nice for you to have the opportunity - most authors are
appreciating the AI checking.  It just caused Brendan to describe his
own patch as "complete rubbish" ;)

But don't bust a gut over this - let's ease into it rather than aiming
for some possibly premature step transition.



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH mm-unstable v4 5/5] mm/khugepaged: unify khugepaged and madv_collapse with collapse_single_pmd()
  2026-03-25 11:40 ` [PATCH mm-unstable v4 5/5] mm/khugepaged: unify khugepaged and madv_collapse with collapse_single_pmd() Nico Pache
@ 2026-03-31 14:01   ` Lorenzo Stoakes (Oracle)
  2026-03-31 14:13     ` David Hildenbrand (Arm)
                       ` (2 more replies)
  0 siblings, 3 replies; 35+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-31 14:01 UTC (permalink / raw)
  To: Nico Pache
  Cc: linux-kernel, linux-mm, aarcange, akpm, anshuman.khandual,
	apopple, baohua, baolin.wang, byungchul, catalin.marinas, cl,
	corbet, dave.hansen, david, dev.jain, gourry, hannes, hughd,
	jackmanb, jack, jannh, jglisse, joshua.hahnjy, kas, lance.yang,
	Liam.Howlett, lorenzo.stoakes, mathieu.desnoyers, matthew.brost,
	mhiramat, mhocko, peterx, pfalcato, rakie.kim, raquini, rdunlap,
	richard.weiyang, rientjes, rostedt, rppt, ryan.roberts, shivankg,
	sunnanyong, surenb, thomas.hellstrom, tiwai, usamaarif642, vbabka,
	vishal.moola, wangkefeng.wang, will, willy, yang, ying.huang, ziy,
	zokeefe

OK we need a fairly urgent fix for this as this has triggered a syzbot. See [0]
for an analysis.

I show inline where the issue is, and attach a fix-patch for the bug.

[0]: https://lore.kernel.org/all/e1cb33b8-c1f7-4972-8628-3a2169077d6e@lucifer.local/

See below for details.

Cheers, Lorenzo

On Wed, Mar 25, 2026 at 05:40:22AM -0600, Nico Pache wrote:
> The khugepaged daemon and madvise_collapse have two different
> implementations that do almost the same thing. Create collapse_single_pmd
> to increase code reuse and create an entry point to these two users.
>
> Refactor madvise_collapse and collapse_scan_mm_slot to use the new
> collapse_single_pmd function. To help reduce confusion around the
> mmap_locked variable, we rename mmap_locked to lock_dropped in the
> collapse_scan_mm_slot() function, and remove the redundant mmap_locked
> in madvise_collapse(); this further unifies the code readability. The
> SCAN_PTE_MAPPED_HUGEPAGE enum is no longer reachable in the
> madvise_collapse() function, so we drop it from the list of "continuing"
> enums.
>
> This introduces a minor behavioral change that is most likely an
> undiscovered bug. The current implementation of khugepaged tests
> collapse_test_exit_or_disable() before calling collapse_pte_mapped_thp,
> but we weren't doing it in the madvise_collapse case. By unifying these
> two callers madvise_collapse now also performs this check. We also modify
> the return value to be SCAN_ANY_PROCESS which properly indicates that this
> process is no longer valid to operate on.
>
> By moving the madvise_collapse writeback-retry logic into the helper
> function we can also avoid having to revalidate the VMA.
>
> We guard the khugepaged_pages_collapsed variable to ensure it's only
> incremented for khugepaged.
>
> As requested we also convert a VM_BUG_ON to a VM_WARN_ON.
>
> Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
> Reviewed-by: Lance Yang <lance.yang@linux.dev>
> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> Acked-by: David Hildenbrand (Arm) <david@kernel.org>
> Signed-off-by: Nico Pache <npache@redhat.com>
> ---
>  mm/khugepaged.c | 142 ++++++++++++++++++++++++------------------------
>  1 file changed, 72 insertions(+), 70 deletions(-)
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 3728a2cf133c..d06d84219e1b 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -1257,7 +1257,7 @@ static enum scan_result collapse_huge_page(struct mm_struct *mm, unsigned long a
>
>  static enum scan_result collapse_scan_pmd(struct mm_struct *mm,
>  		struct vm_area_struct *vma, unsigned long start_addr,
> -		bool *mmap_locked, struct collapse_control *cc)
> +		bool *lock_dropped, struct collapse_control *cc)
>  {
>  	pmd_t *pmd;
>  	pte_t *pte, *_pte;
> @@ -1432,7 +1432,7 @@ static enum scan_result collapse_scan_pmd(struct mm_struct *mm,
>  		result = collapse_huge_page(mm, start_addr, referenced,
>  					    unmapped, cc);
>  		/* collapse_huge_page will return with the mmap_lock released */
> -		*mmap_locked = false;
> +		*lock_dropped = true;
>  	}
>  out:
>  	trace_mm_khugepaged_scan_pmd(mm, folio, referenced,
> @@ -2424,6 +2424,67 @@ static enum scan_result collapse_scan_file(struct mm_struct *mm,
>  	return result;
>  }
>
> +/*
> + * Try to collapse a single PMD starting at a PMD aligned addr, and return
> + * the results.
> + */
> +static enum scan_result collapse_single_pmd(unsigned long addr,
> +		struct vm_area_struct *vma, bool *lock_dropped,
> +		struct collapse_control *cc)
> +{
> +	struct mm_struct *mm = vma->vm_mm;
> +	bool triggered_wb = false;
> +	enum scan_result result;
> +	struct file *file;
> +	pgoff_t pgoff;
> +
> +	mmap_assert_locked(mm);
> +
> +	if (vma_is_anonymous(vma)) {
> +		result = collapse_scan_pmd(mm, vma, addr, lock_dropped, cc);
> +		goto end;
> +	}
> +
> +	file = get_file(vma->vm_file);
> +	pgoff = linear_page_index(vma, addr);
> +
> +	mmap_read_unlock(mm);
> +	*lock_dropped = true;
> +retry:
> +	result = collapse_scan_file(mm, addr, file, pgoff, cc);
> +
> +	/*
> +	 * For MADV_COLLAPSE, when encountering dirty pages, try to writeback,
> +	 * then retry the collapse one time.
> +	 */
> +	if (!cc->is_khugepaged && result == SCAN_PAGE_DIRTY_OR_WRITEBACK &&
> +	    !triggered_wb && mapping_can_writeback(file->f_mapping)) {
> +		const loff_t lstart = (loff_t)pgoff << PAGE_SHIFT;
> +		const loff_t lend = lstart + HPAGE_PMD_SIZE - 1;
> +
> +		filemap_write_and_wait_range(file->f_mapping, lstart, lend);
> +		triggered_wb = true;
> +		goto retry;
> +	}
> +	fput(file);
> +
> +	if (result == SCAN_PTE_MAPPED_HUGEPAGE) {
> +		mmap_read_lock(mm);
> +		if (collapse_test_exit_or_disable(mm))
> +			result = SCAN_ANY_PROCESS;
> +		else
> +			result = try_collapse_pte_mapped_thp(mm, addr,
> +							     !cc->is_khugepaged);
> +		if (result == SCAN_PMD_MAPPED)
> +			result = SCAN_SUCCEED;
> +		mmap_read_unlock(mm);
> +	}
> +end:
> +	if (cc->is_khugepaged && result == SCAN_SUCCEED)
> +		++khugepaged_pages_collapsed;
> +	return result;
> +}
> +
>  static void collapse_scan_mm_slot(unsigned int progress_max,
>  		enum scan_result *result, struct collapse_control *cc)
>  	__releases(&khugepaged_mm_lock)
> @@ -2485,46 +2546,21 @@ static void collapse_scan_mm_slot(unsigned int progress_max,
>  		VM_BUG_ON(khugepaged_scan.address & ~HPAGE_PMD_MASK);
>
>  		while (khugepaged_scan.address < hend) {
> -			bool mmap_locked = true;
> +			bool lock_dropped = false;
>
>  			cond_resched();
>  			if (unlikely(collapse_test_exit_or_disable(mm)))
>  				goto breakouterloop;
>
> -			VM_BUG_ON(khugepaged_scan.address < hstart ||
> +			VM_WARN_ON_ONCE(khugepaged_scan.address < hstart ||
>  				  khugepaged_scan.address + HPAGE_PMD_SIZE >
>  				  hend);
> -			if (!vma_is_anonymous(vma)) {
> -				struct file *file = get_file(vma->vm_file);
> -				pgoff_t pgoff = linear_page_index(vma,
> -						khugepaged_scan.address);
> -
> -				mmap_read_unlock(mm);
> -				mmap_locked = false;
> -				*result = collapse_scan_file(mm,
> -					khugepaged_scan.address, file, pgoff, cc);
> -				fput(file);
> -				if (*result == SCAN_PTE_MAPPED_HUGEPAGE) {
> -					mmap_read_lock(mm);
> -					if (collapse_test_exit_or_disable(mm))
> -						goto breakouterloop;
> -					*result = try_collapse_pte_mapped_thp(mm,
> -						khugepaged_scan.address, false);
> -					if (*result == SCAN_PMD_MAPPED)
> -						*result = SCAN_SUCCEED;
> -					mmap_read_unlock(mm);
> -				}
> -			} else {
> -				*result = collapse_scan_pmd(mm, vma,
> -					khugepaged_scan.address, &mmap_locked, cc);
> -			}
> -
> -			if (*result == SCAN_SUCCEED)
> -				++khugepaged_pages_collapsed;
>
> +			*result = collapse_single_pmd(khugepaged_scan.address,
> +						      vma, &lock_dropped, cc);
>  			/* move to next address */
>  			khugepaged_scan.address += HPAGE_PMD_SIZE;
> -			if (!mmap_locked)
> +			if (lock_dropped)
>  				/*
>  				 * We released mmap_lock so break loop.  Note
>  				 * that we drop mmap_lock before all hugepage
> @@ -2799,7 +2835,6 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
>  	unsigned long hstart, hend, addr;
>  	enum scan_result last_fail = SCAN_FAIL;
>  	int thps = 0;
> -	bool mmap_locked = true;
>
>  	BUG_ON(vma->vm_start > start);
>  	BUG_ON(vma->vm_end < end);
> @@ -2821,13 +2856,11 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
>
>  	for (addr = hstart; addr < hend; addr += HPAGE_PMD_SIZE) {
>  		enum scan_result result = SCAN_FAIL;
> -		bool triggered_wb = false;
>
> -retry:
> -		if (!mmap_locked) {
> +		if (*lock_dropped) {
>  			cond_resched();
>  			mmap_read_lock(mm);
> -			mmap_locked = true;
> +			*lock_dropped = false;

So this is the bug. 'lock_dropped' needs to record if the lock was _ever_
dropped, not if it is _currently_ dropped.

This is probably a mea culpa on my part on review, so apologies.

See below for a fix-patch.
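To make the failure mode concrete, here is a minimal userspace model of the two semantics (a sketch with illustrative names, not the kernel code; the "drops the lock on even iterations" rule simply stands in for collapse_single_pmd() releasing mmap_lock):

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Buggy version: one flag serves both as "is the lock currently
 * dropped?" (cleared on re-lock) and as the caller-visible "was the
 * lock ever dropped?", so re-locking clobbers the record.
 */
static void buggy_loop(int n, bool *lock_dropped)
{
	for (int i = 0; i < n; i++) {
		if (*lock_dropped) {
			/* mmap_read_lock(mm); */
			*lock_dropped = false;	/* loses the "ever dropped" record */
		}
		/* pretend the scan drops the lock on even iterations */
		if (i % 2 == 0)
			*lock_dropped = true;
	}
	/* exit path re-takes the lock without touching the flag */
}

/*
 * Fixed version: a local flag tracks the current lock state, while the
 * out-parameter only ever transitions false -> true.
 */
static void fixed_loop(int n, bool *lock_dropped)
{
	bool mmap_unlocked = false;

	for (int i = 0; i < n; i++) {
		if (mmap_unlocked) {
			/* mmap_read_lock(mm); */
			mmap_unlocked = false;
			*lock_dropped = true;	/* latch that it happened */
		}
		if (i % 2 == 0)
			mmap_unlocked = true;
	}
	if (mmap_unlocked)
		*lock_dropped = true;	/* mmap_read_lock(mm); */
}
```

With two iterations the buggy loop drops the lock at i == 0, re-locks and clears the flag at i == 1, and returns false to the caller even though the lock was dropped; the fixed loop returns true.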

>  			result = hugepage_vma_revalidate(mm, addr, false, &vma,
>  							 cc);
>  			if (result  != SCAN_SUCCEED) {
> @@ -2837,45 +2870,14 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
>
>  			hend = min(hend, vma->vm_end & HPAGE_PMD_MASK);
>  		}
> -		mmap_assert_locked(mm);
> -		if (!vma_is_anonymous(vma)) {
> -			struct file *file = get_file(vma->vm_file);
> -			pgoff_t pgoff = linear_page_index(vma, addr);
>
> -			mmap_read_unlock(mm);
> -			mmap_locked = false;
> -			*lock_dropped = true;
> -			result = collapse_scan_file(mm, addr, file, pgoff, cc);
> -
> -			if (result == SCAN_PAGE_DIRTY_OR_WRITEBACK && !triggered_wb &&
> -			    mapping_can_writeback(file->f_mapping)) {
> -				loff_t lstart = (loff_t)pgoff << PAGE_SHIFT;
> -				loff_t lend = lstart + HPAGE_PMD_SIZE - 1;
> -
> -				filemap_write_and_wait_range(file->f_mapping, lstart, lend);
> -				triggered_wb = true;
> -				fput(file);
> -				goto retry;
> -			}
> -			fput(file);
> -		} else {
> -			result = collapse_scan_pmd(mm, vma, addr, &mmap_locked, cc);
> -		}
> -		if (!mmap_locked)
> -			*lock_dropped = true;
> +		result = collapse_single_pmd(addr, vma, lock_dropped, cc);
>
> -handle_result:
>  		switch (result) {
>  		case SCAN_SUCCEED:
>  		case SCAN_PMD_MAPPED:
>  			++thps;
>  			break;
> -		case SCAN_PTE_MAPPED_HUGEPAGE:
> -			BUG_ON(mmap_locked);
> -			mmap_read_lock(mm);
> -			result = try_collapse_pte_mapped_thp(mm, addr, true);
> -			mmap_read_unlock(mm);
> -			goto handle_result;
>  		/* Whitelisted set of results where continuing OK */
>  		case SCAN_NO_PTE_TABLE:
>  		case SCAN_PTE_NON_PRESENT:
> @@ -2898,7 +2900,7 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
>
>  out_maybelock:
>  	/* Caller expects us to hold mmap_lock on return */
> -	if (!mmap_locked)
> +	if (*lock_dropped)
>  		mmap_read_lock(mm);
>  out_nolock:
>  	mmap_assert_locked(mm);
> --
> 2.53.0
>

Fix patch follows:

----8<----
From a4dfc7718a15035449f344a0bc7f58e449366405 Mon Sep 17 00:00:00 2001
From: "Lorenzo Stoakes (Oracle)" <ljs@kernel.org>
Date: Tue, 31 Mar 2026 13:11:18 +0100
Subject: [PATCH] mm/khugepaged: fix issue with tracking lock

We are incorrectly treating lock_dropped to track both whether the lock is
currently held and whether or not the lock was ever dropped.

Update this change to account for this.

Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
---
 mm/khugepaged.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index d21348b85a59..b8452dbdb043 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -2828,6 +2828,7 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
 	unsigned long hstart, hend, addr;
 	enum scan_result last_fail = SCAN_FAIL;
 	int thps = 0;
+	bool mmap_unlocked = false;

 	BUG_ON(vma->vm_start > start);
 	BUG_ON(vma->vm_end < end);
@@ -2850,10 +2851,11 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
 	for (addr = hstart; addr < hend; addr += HPAGE_PMD_SIZE) {
 		enum scan_result result = SCAN_FAIL;

-		if (*lock_dropped) {
+		if (mmap_unlocked) {
 			cond_resched();
 			mmap_read_lock(mm);
-			*lock_dropped = false;
+			mmap_unlocked = false;
+			*lock_dropped = true;
 			result = hugepage_vma_revalidate(mm, addr, false, &vma,
 							cc);
 			if (result  != SCAN_SUCCEED) {
@@ -2864,7 +2866,7 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
 			hend = min(hend, vma->vm_end & HPAGE_PMD_MASK);
 		}

-		result = collapse_single_pmd(addr, vma, lock_dropped, cc);
+		result = collapse_single_pmd(addr, vma, &mmap_unlocked, cc);

 		switch (result) {
 		case SCAN_SUCCEED:
@@ -2893,8 +2895,10 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,

 out_maybelock:
 	/* Caller expects us to hold mmap_lock on return */
-	if (*lock_dropped)
+	if (mmap_unlocked) {
+		*lock_dropped = true;
 		mmap_read_lock(mm);
+	}
 out_nolock:
 	mmap_assert_locked(mm);
 	mmdrop(mm);
--
2.53.0


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH mm-unstable v4 5/5] mm/khugepaged: unify khugepaged and madv_collapse with collapse_single_pmd()
  2026-03-31 14:01   ` Lorenzo Stoakes (Oracle)
@ 2026-03-31 14:13     ` David Hildenbrand (Arm)
  2026-03-31 14:15       ` Lorenzo Stoakes (Oracle)
  2026-03-31 19:46       ` Nico Pache
  2026-03-31 16:29     ` Lance Yang
  2026-03-31 19:59     ` Nico Pache
  2 siblings, 2 replies; 35+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-31 14:13 UTC (permalink / raw)
  To: Lorenzo Stoakes (Oracle), Nico Pache
  Cc: linux-kernel, linux-mm, aarcange, akpm, anshuman.khandual,
	apopple, baohua, baolin.wang, byungchul, catalin.marinas, cl,
	corbet, dave.hansen, dev.jain, gourry, hannes, hughd, jackmanb,
	jack, jannh, jglisse, joshua.hahnjy, kas, lance.yang,
	Liam.Howlett, lorenzo.stoakes, mathieu.desnoyers, matthew.brost,
	mhiramat, mhocko, peterx, pfalcato, rakie.kim, raquini, rdunlap,
	richard.weiyang, rientjes, rostedt, rppt, ryan.roberts, shivankg,
	sunnanyong, surenb, thomas.hellstrom, tiwai, usamaarif642, vbabka,
	vishal.moola, wangkefeng.wang, will, willy, yang, ying.huang, ziy,
	zokeefe

>> +		if (*lock_dropped) {
>>  			cond_resched();
>>  			mmap_read_lock(mm);
>> -			mmap_locked = true;
>> +			*lock_dropped = false;
> 
> So this is the bug. 'lock_dropped' needs to record if the lock was _ever_
> dropped, not if it is _currently_ dropped.

Ah, well spotted. I thought we discussed that at some point during
review ...

Thanks for tackling this!

[...]

> 
> ----8<----
> From a4dfc7718a15035449f344a0bc7f58e449366405 Mon Sep 17 00:00:00 2001
> From: "Lorenzo Stoakes (Oracle)" <ljs@kernel.org>
> Date: Tue, 31 Mar 2026 13:11:18 +0100
> Subject: [PATCH] mm/khugepaged: fix issue with tracking lock
> 
> We are incorrectly treating lock_dropped to track both whether the lock is
> currently held and whether or not the lock was ever dropped.
> 
> Update this change to account for this.
> 
> Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
> ---
>  mm/khugepaged.c | 12 ++++++++----
>  1 file changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index d21348b85a59..b8452dbdb043 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -2828,6 +2828,7 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
>  	unsigned long hstart, hend, addr;
>  	enum scan_result last_fail = SCAN_FAIL;
>  	int thps = 0;
> +	bool mmap_unlocked = false;
> 
>  	BUG_ON(vma->vm_start > start);
>  	BUG_ON(vma->vm_end < end);
> @@ -2850,10 +2851,11 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
>  	for (addr = hstart; addr < hend; addr += HPAGE_PMD_SIZE) {
>  		enum scan_result result = SCAN_FAIL;
> 
> -		if (*lock_dropped) {
> +		if (mmap_unlocked) {
>  			cond_resched();
>  			mmap_read_lock(mm);
> -			*lock_dropped = false;
> +			mmap_unlocked = false;
> +			*lock_dropped = true;
>  			result = hugepage_vma_revalidate(mm, addr, false, &vma,
>  							cc);
>  			if (result  != SCAN_SUCCEED) {
> @@ -2864,7 +2866,7 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,

(in hurry so I might be wrong)

Do we have to handle when collapse_single_pmd() dropped the lock as well?

-- 
Cheers,

David



* Re: [PATCH mm-unstable v4 5/5] mm/khugepaged: unify khugepaged and madv_collapse with collapse_single_pmd()
  2026-03-31 14:13     ` David Hildenbrand (Arm)
@ 2026-03-31 14:15       ` Lorenzo Stoakes (Oracle)
  2026-03-31 14:46         ` David Hildenbrand (Arm)
  2026-03-31 20:00         ` David Hildenbrand (Arm)
  2026-03-31 19:46       ` Nico Pache
  1 sibling, 2 replies; 35+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-31 14:15 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: Nico Pache, linux-kernel, linux-mm, aarcange, akpm,
	anshuman.khandual, apopple, baohua, baolin.wang, byungchul,
	catalin.marinas, cl, corbet, dave.hansen, dev.jain, gourry,
	hannes, hughd, jackmanb, jack, jannh, jglisse, joshua.hahnjy, kas,
	lance.yang, Liam.Howlett, lorenzo.stoakes, mathieu.desnoyers,
	matthew.brost, mhiramat, mhocko, peterx, pfalcato, rakie.kim,
	raquini, rdunlap, richard.weiyang, rientjes, rostedt, rppt,
	ryan.roberts, shivankg, sunnanyong, surenb, thomas.hellstrom,
	tiwai, usamaarif642, vbabka, vishal.moola, wangkefeng.wang, will,
	willy, yang, ying.huang, ziy, zokeefe

On Tue, Mar 31, 2026 at 04:13:29PM +0200, David Hildenbrand (Arm) wrote:
> >> +		if (*lock_dropped) {
> >>  			cond_resched();
> >>  			mmap_read_lock(mm);
> >> -			mmap_locked = true;
> >> +			*lock_dropped = false;
> >
> > So this is the bug. 'lock_dropped' needs to record if the lock was _ever_
> > dropped, not if it is _currently_ dropped.
>
> Ah, well spotted. I thought we discussed that at some point during
> review ...
>
> Thanks for tackling this!
>
> [...]
>
> >
> > ----8<----
> > From a4dfc7718a15035449f344a0bc7f58e449366405 Mon Sep 17 00:00:00 2001
> > From: "Lorenzo Stoakes (Oracle)" <ljs@kernel.org>
> > Date: Tue, 31 Mar 2026 13:11:18 +0100
> > Subject: [PATCH] mm/khugepaged: fix issue with tracking lock
> >
> > We are incorrectly treating lock_dropped to track both whether the lock is
> > currently held and whether or not the lock was ever dropped.
> >
> > Update this change to account for this.
> >
> > Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
> > ---
> >  mm/khugepaged.c | 12 ++++++++----
> >  1 file changed, 8 insertions(+), 4 deletions(-)
> >
> > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > index d21348b85a59..b8452dbdb043 100644
> > --- a/mm/khugepaged.c
> > +++ b/mm/khugepaged.c
> > @@ -2828,6 +2828,7 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
> >  	unsigned long hstart, hend, addr;
> >  	enum scan_result last_fail = SCAN_FAIL;
> >  	int thps = 0;
> > +	bool mmap_unlocked = false;
> >
> >  	BUG_ON(vma->vm_start > start);
> >  	BUG_ON(vma->vm_end < end);
> > @@ -2850,10 +2851,11 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
> >  	for (addr = hstart; addr < hend; addr += HPAGE_PMD_SIZE) {
> >  		enum scan_result result = SCAN_FAIL;
> >
> > -		if (*lock_dropped) {
> > +		if (mmap_unlocked) {
> >  			cond_resched();
> >  			mmap_read_lock(mm);
> > -			*lock_dropped = false;
> > +			mmap_unlocked = false;
> > +			*lock_dropped = true;
> >  			result = hugepage_vma_revalidate(mm, addr, false, &vma,
> >  							cc);
> >  			if (result  != SCAN_SUCCEED) {
> > @@ -2864,7 +2866,7 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
>
> (in hurry so I might be wrong)
>
> Do we have to handle when collapse_single_pmd() dropped the lock as well?

That updates mmap_unlocked which will be handled in the loop or on exit:

		if (mmap_unlocked) {
			...
			mmap_read_lock(mm);
			mmap_unlocked = false;
			*lock_dropped = true;
			...
		}

		result = collapse_single_pmd(addr, vma, &mmap_unlocked, cc);

...

out_maybelock:
	/* Caller expects us to hold mmap_lock on return */
	if (mmap_unlocked) {
		*lock_dropped = true;
		mmap_read_lock(mm);
	}

>
> --
> Cheers,
>
> David

Thanks, Lorenzo



* Re: [PATCH mm-unstable v4 5/5] mm/khugepaged: unify khugepaged and madv_collapse with collapse_single_pmd()
  2026-03-31 14:15       ` Lorenzo Stoakes (Oracle)
@ 2026-03-31 14:46         ` David Hildenbrand (Arm)
  2026-03-31 20:00         ` David Hildenbrand (Arm)
  1 sibling, 0 replies; 35+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-31 14:46 UTC (permalink / raw)
  To: Lorenzo Stoakes (Oracle)
  Cc: Nico Pache, linux-kernel, linux-mm, aarcange, akpm,
	anshuman.khandual, apopple, baohua, baolin.wang, byungchul,
	catalin.marinas, cl, corbet, dave.hansen, dev.jain, gourry,
	hannes, hughd, jackmanb, jack, jannh, jglisse, joshua.hahnjy, kas,
	lance.yang, Liam.Howlett, lorenzo.stoakes, mathieu.desnoyers,
	matthew.brost, mhiramat, mhocko, peterx, pfalcato, rakie.kim,
	raquini, rdunlap, richard.weiyang, rientjes, rostedt, rppt,
	ryan.roberts, shivankg, sunnanyong, surenb, thomas.hellstrom,
	tiwai, usamaarif642, vbabka, vishal.moola, wangkefeng.wang, will,
	willy, yang, ying.huang, ziy, zokeefe

On 3/31/26 16:15, Lorenzo Stoakes (Oracle) wrote:
> On Tue, Mar 31, 2026 at 04:13:29PM +0200, David Hildenbrand (Arm) wrote:
>>>
>>> So this is the bug. 'lock_dropped' needs to record if the lock was _ever_
>>> dropped, not if it is _currently_ dropped.
>>
>> Ah, well spotted. I thought we discussed that at some point during
>> review ...
>>
>> Thanks for tackling this!
>>
>> [...]
>>
>>>
>>> ----8<----
>>> From a4dfc7718a15035449f344a0bc7f58e449366405 Mon Sep 17 00:00:00 2001
>>> From: "Lorenzo Stoakes (Oracle)" <ljs@kernel.org>
>>> Date: Tue, 31 Mar 2026 13:11:18 +0100
>>> Subject: [PATCH] mm/khugepaged: fix issue with tracking lock
>>>
>>> We are incorrectly treating lock_dropped to track both whether the lock is
>>> currently held and whether or not the lock was ever dropped.
>>>
>>> Update this change to account for this.
>>>
>>> Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
>>> ---
>>>  mm/khugepaged.c | 12 ++++++++----
>>>  1 file changed, 8 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>>> index d21348b85a59..b8452dbdb043 100644
>>> --- a/mm/khugepaged.c
>>> +++ b/mm/khugepaged.c
>>> @@ -2828,6 +2828,7 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
>>>  	unsigned long hstart, hend, addr;
>>>  	enum scan_result last_fail = SCAN_FAIL;
>>>  	int thps = 0;
>>> +	bool mmap_unlocked = false;
>>>
>>>  	BUG_ON(vma->vm_start > start);
>>>  	BUG_ON(vma->vm_end < end);
>>> @@ -2850,10 +2851,11 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
>>>  	for (addr = hstart; addr < hend; addr += HPAGE_PMD_SIZE) {
>>>  		enum scan_result result = SCAN_FAIL;
>>>
>>> -		if (*lock_dropped) {
>>> +		if (mmap_unlocked) {
>>>  			cond_resched();
>>>  			mmap_read_lock(mm);
>>> -			*lock_dropped = false;
>>> +			mmap_unlocked = false;
>>> +			*lock_dropped = true;
>>>  			result = hugepage_vma_revalidate(mm, addr, false, &vma,
>>>  							cc);
>>>  			if (result  != SCAN_SUCCEED) {
>>> @@ -2864,7 +2866,7 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
>>
>> (in hurry so I might be wrong)
>>
>> Do we have to handle when collapse_single_pmd() dropped the lock as well?

Great, thanks!

-- 
Cheers,

David



* Re: [PATCH mm-unstable v4 5/5] mm/khugepaged: unify khugepaged and madv_collapse with collapse_single_pmd()
  2026-03-31 14:01   ` Lorenzo Stoakes (Oracle)
  2026-03-31 14:13     ` David Hildenbrand (Arm)
@ 2026-03-31 16:29     ` Lance Yang
  2026-03-31 19:59     ` Nico Pache
  2 siblings, 0 replies; 35+ messages in thread
From: Lance Yang @ 2026-03-31 16:29 UTC (permalink / raw)
  To: Lorenzo Stoakes (Oracle), Nico Pache
  Cc: linux-kernel, linux-mm, aarcange, akpm, anshuman.khandual,
	apopple, baohua, baolin.wang, byungchul, catalin.marinas, cl,
	corbet, dave.hansen, david, dev.jain, gourry, hannes, hughd,
	jackmanb, jack, jannh, jglisse, joshua.hahnjy, kas, Liam.Howlett,
	lorenzo.stoakes, mathieu.desnoyers, matthew.brost, mhiramat,
	mhocko, peterx, pfalcato, rakie.kim, raquini, rdunlap,
	richard.weiyang, rientjes, rostedt, rppt, ryan.roberts, shivankg,
	sunnanyong, surenb, thomas.hellstrom, tiwai, usamaarif642, vbabka,
	vishal.moola, wangkefeng.wang, will, willy, yang, ying.huang, ziy,
	zokeefe



On 2026/3/31 22:01, Lorenzo Stoakes (Oracle) wrote:
> OK we need a fairly urgent fix for this as this has triggered a syzbot. See [0]
> for an analysis.
> 
> I show inline where the issue is, and attach a fix-patch for the bug.
> 
> [0]: https://lore.kernel.org/all/e1cb33b8-c1f7-4972-8628-3a2169077d6e@lucifer.local/
> 
> See below for details.
> 
> Cheers, Lorenzo
> 
[...]
> 
> Fix patch follows:
> 
> ----8<----
>  From a4dfc7718a15035449f344a0bc7f58e449366405 Mon Sep 17 00:00:00 2001
> From: "Lorenzo Stoakes (Oracle)" <ljs@kernel.org>
> Date: Tue, 31 Mar 2026 13:11:18 +0100
> Subject: [PATCH] mm/khugepaged: fix issue with tracking lock
> 
> We are incorrectly treating lock_dropped to track both whether the lock is
> currently held and whether or not the lock was ever dropped.

Good catch!

Right, lock_dropped is not supposed to mean "is the mmap lock currently
unlocked?", it should mean "was the mmap lock dropped at any point
during MADV_COLLAPSE?"

> 
> Update this change to account for this.
> 
> Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
> ---

Thanks for the fix!
Reviewed-by: Lance Yang <lance.yang@linux.dev>

>   mm/khugepaged.c | 12 ++++++++----
>   1 file changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index d21348b85a59..b8452dbdb043 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -2828,6 +2828,7 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
>   	unsigned long hstart, hend, addr;
>   	enum scan_result last_fail = SCAN_FAIL;
>   	int thps = 0;
> +	bool mmap_unlocked = false;
> 
>   	BUG_ON(vma->vm_start > start);
>   	BUG_ON(vma->vm_end < end);
> @@ -2850,10 +2851,11 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
>   	for (addr = hstart; addr < hend; addr += HPAGE_PMD_SIZE) {
>   		enum scan_result result = SCAN_FAIL;
> 
> -		if (*lock_dropped) {
> +		if (mmap_unlocked) {
>   			cond_resched();
>   			mmap_read_lock(mm);
> -			*lock_dropped = false;
> +			mmap_unlocked = false;
> +			*lock_dropped = true;
>   			result = hugepage_vma_revalidate(mm, addr, false, &vma,
>   							cc);
>   			if (result  != SCAN_SUCCEED) {
> @@ -2864,7 +2866,7 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
>   			hend = min(hend, vma->vm_end & HPAGE_PMD_MASK);
>   		}
> 
> -		result = collapse_single_pmd(addr, vma, lock_dropped, cc);
> +		result = collapse_single_pmd(addr, vma, &mmap_unlocked, cc);
> 
>   		switch (result) {
>   		case SCAN_SUCCEED:
> @@ -2893,8 +2895,10 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
> 
>   out_maybelock:
>   	/* Caller expects us to hold mmap_lock on return */
> -	if (*lock_dropped)
> +	if (mmap_unlocked) {
> +		*lock_dropped = true;
>   		mmap_read_lock(mm);
> +	}
>   out_nolock:
>   	mmap_assert_locked(mm);
>   	mmdrop(mm);
> --
> 2.53.0




* Re: [PATCH mm-unstable v4 5/5] mm/khugepaged: unify khugepaged and madv_collapse with collapse_single_pmd()
  2026-03-31 14:13     ` David Hildenbrand (Arm)
  2026-03-31 14:15       ` Lorenzo Stoakes (Oracle)
@ 2026-03-31 19:46       ` Nico Pache
  2026-03-31 19:59         ` Lorenzo Stoakes (Oracle)
  1 sibling, 1 reply; 35+ messages in thread
From: Nico Pache @ 2026-03-31 19:46 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: Lorenzo Stoakes (Oracle), linux-kernel, linux-mm, aarcange, akpm,
	anshuman.khandual, apopple, baohua, baolin.wang, byungchul,
	catalin.marinas, cl, corbet, dave.hansen, dev.jain, gourry,
	hannes, hughd, jackmanb, jack, jannh, jglisse, joshua.hahnjy, kas,
	lance.yang, Liam.Howlett, lorenzo.stoakes, mathieu.desnoyers,
	matthew.brost, mhiramat, mhocko, peterx, pfalcato, rakie.kim,
	raquini, rdunlap, richard.weiyang, rientjes, rostedt, rppt,
	ryan.roberts, shivankg, sunnanyong, surenb, thomas.hellstrom,
	tiwai, usamaarif642, vbabka, vishal.moola, wangkefeng.wang, will,
	willy, yang, ying.huang, ziy, zokeefe

On Tue, Mar 31, 2026 at 8:13 AM David Hildenbrand (Arm)
<david@kernel.org> wrote:
>
> >> +            if (*lock_dropped) {
> >>                      cond_resched();
> >>                      mmap_read_lock(mm);
> >> -                    mmap_locked = true;
> >> +                    *lock_dropped = false;
> >
> > So this is the bug. 'lock_dropped' needs to record if the lock was _ever_
> > dropped, not if it is _currently_ dropped.
>
> Ah, well spotted. I thought we discussed that at some point during
> review ...

Yes I think we've discussed this before, and IIRC the outcome was, "We
weren't sure why this semantic was in place, or if we truly needed to
track if it was dropped at any point, or simply if it is currently
dropped.". The code is rather confusing, and changed mid-flight during
my series.

Lorenzo asked me to unify this semantic across the two functions to
further simplify readability, but it looks like we indeed needed that
extra tracking.


Cheers,
 -- Nico

>
> Thanks for tackling this!
>
> [...]
>
> >
> > ----8<----
> > From a4dfc7718a15035449f344a0bc7f58e449366405 Mon Sep 17 00:00:00 2001
> > From: "Lorenzo Stoakes (Oracle)" <ljs@kernel.org>
> > Date: Tue, 31 Mar 2026 13:11:18 +0100
> > Subject: [PATCH] mm/khugepaged: fix issue with tracking lock
> >
> > We are incorrectly treating lock_dropped to track both whether the lock is
> > currently held and whether or not the lock was ever dropped.
> >
> > Update this change to account for this.
> >
> > Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
> > ---
> >  mm/khugepaged.c | 12 ++++++++----
> >  1 file changed, 8 insertions(+), 4 deletions(-)
> >
> > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > index d21348b85a59..b8452dbdb043 100644
> > --- a/mm/khugepaged.c
> > +++ b/mm/khugepaged.c
> > @@ -2828,6 +2828,7 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
> >       unsigned long hstart, hend, addr;
> >       enum scan_result last_fail = SCAN_FAIL;
> >       int thps = 0;
> > +     bool mmap_unlocked = false;
> >
> >       BUG_ON(vma->vm_start > start);
> >       BUG_ON(vma->vm_end < end);
> > @@ -2850,10 +2851,11 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
> >       for (addr = hstart; addr < hend; addr += HPAGE_PMD_SIZE) {
> >               enum scan_result result = SCAN_FAIL;
> >
> > -             if (*lock_dropped) {
> > +             if (mmap_unlocked) {
> >                       cond_resched();
> >                       mmap_read_lock(mm);
> > -                     *lock_dropped = false;
> > +                     mmap_unlocked = false;
> > +                     *lock_dropped = true;
> >                       result = hugepage_vma_revalidate(mm, addr, false, &vma,
> >                                                       cc);
> >                       if (result  != SCAN_SUCCEED) {
> > @@ -2864,7 +2866,7 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
>
> (in hurry so I might be wrong)
>
> Do we have to handle when collapse_single_pmd() dropped the lock as well?

There are some oddities about madvise that sadly I don't fully
understand. This is one of them. However, I don't believe khugepaged
had these lock_dropped semantics and just tracked the state of the
lock.

>
> --
> Cheers,
>
> David
>




* Re: [PATCH mm-unstable v4 5/5] mm/khugepaged: unify khugepaged and madv_collapse with collapse_single_pmd()
  2026-03-31 19:46       ` Nico Pache
@ 2026-03-31 19:59         ` Lorenzo Stoakes (Oracle)
  0 siblings, 0 replies; 35+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-31 19:59 UTC (permalink / raw)
  To: Nico Pache
  Cc: David Hildenbrand (Arm), linux-kernel, linux-mm, aarcange, akpm,
	anshuman.khandual, apopple, baohua, baolin.wang, byungchul,
	catalin.marinas, cl, corbet, dave.hansen, dev.jain, gourry,
	hannes, hughd, jackmanb, jack, jannh, jglisse, joshua.hahnjy, kas,
	lance.yang, Liam.Howlett, lorenzo.stoakes, mathieu.desnoyers,
	matthew.brost, mhiramat, mhocko, peterx, pfalcato, rakie.kim,
	raquini, rdunlap, richard.weiyang, rientjes, rostedt, rppt,
	ryan.roberts, shivankg, sunnanyong, surenb, thomas.hellstrom,
	tiwai, usamaarif642, vbabka, vishal.moola, wangkefeng.wang, will,
	willy, yang, ying.huang, ziy, zokeefe

On Tue, Mar 31, 2026 at 01:46:02PM -0600, Nico Pache wrote:
> On Tue, Mar 31, 2026 at 8:13 AM David Hildenbrand (Arm)
> <david@kernel.org> wrote:
> >
> > >> +            if (*lock_dropped) {
> > >>                      cond_resched();
> > >>                      mmap_read_lock(mm);
> > >> -                    mmap_locked = true;
> > >> +                    *lock_dropped = false;
> > >
> > > So this is the bug. 'lock_dropped' needs to record if the lock was _ever_
> > > dropped, not if it is _currently_ dropped.
> >
> > Ah, well spotted. I thought we discussed that at some point during
> > review ...
>
> Yes I think we've discussed this before, and IIRC the outcome was, "We
> weren't sure why this semantic was in place, or if we truly needed to
> track if it was dropped at any point, or simply if it is currently
> dropped.". The code is rather confusing, and changed mid-flight during
> my series.
>
> Lorenzo asked me to unify this semantic across the two functions to
> further simplify readability, but it looks like we indeed needed that
> extra tracking.
>
>
> Cheers,
>  -- Nico
>
> >
> > Thanks for tackling this!
> >
> > [...]
> >
> > >
> > > ----8<----
> > > From a4dfc7718a15035449f344a0bc7f58e449366405 Mon Sep 17 00:00:00 2001
> > > From: "Lorenzo Stoakes (Oracle)" <ljs@kernel.org>
> > > Date: Tue, 31 Mar 2026 13:11:18 +0100
> > > Subject: [PATCH] mm/khugepaged: fix issue with tracking lock
> > >
> > > We are incorrectly treating lock_dropped to track both whether the lock is
> > > currently held and whether or not the lock was ever dropped.
> > >
> > > Update this change to account for this.
> > >
> > > Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
> > > ---
> > >  mm/khugepaged.c | 12 ++++++++----
> > >  1 file changed, 8 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > > index d21348b85a59..b8452dbdb043 100644
> > > --- a/mm/khugepaged.c
> > > +++ b/mm/khugepaged.c
> > > @@ -2828,6 +2828,7 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
> > >       unsigned long hstart, hend, addr;
> > >       enum scan_result last_fail = SCAN_FAIL;
> > >       int thps = 0;
> > > +     bool mmap_unlocked = false;
> > >
> > >       BUG_ON(vma->vm_start > start);
> > >       BUG_ON(vma->vm_end < end);
> > > @@ -2850,10 +2851,11 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
> > >       for (addr = hstart; addr < hend; addr += HPAGE_PMD_SIZE) {
> > >               enum scan_result result = SCAN_FAIL;
> > >
> > > -             if (*lock_dropped) {
> > > +             if (mmap_unlocked) {
> > >                       cond_resched();
> > >                       mmap_read_lock(mm);
> > > -                     *lock_dropped = false;
> > > +                     mmap_unlocked = false;
> > > +                     *lock_dropped = true;
> > >                       result = hugepage_vma_revalidate(mm, addr, false, &vma,
> > >                                                       cc);
> > >                       if (result  != SCAN_SUCCEED) {
> > > @@ -2864,7 +2866,7 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
> >
> > (in hurry so I might be wrong)
> >
> > Do we have to handle when collapse_single_pmd() dropped the lock as well?
>
> There are some oddities about madvise that sadly I don't fully
> understand. This is one of them. However, I don't believe khugepaged
> had these lock_dropped semantics and just tracked the state of the
> lock.

(As said to David) with my fix-patch we pass a pointer to mmap_unlocked and will
set the upstream lock_dropped appropriately as a result :)

So we cover that case also.

But yeah the handling in madvise is weird, and we may need to rethink it a bit
at some point.

>
> >
> > --
> > Cheers,
> >
> > David
> >
>

Cheers, Lorenzo


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH mm-unstable v4 5/5] mm/khugepaged: unify khugepaged and madv_collapse with collapse_single_pmd()
  2026-03-31 14:01   ` Lorenzo Stoakes (Oracle)
  2026-03-31 14:13     ` David Hildenbrand (Arm)
  2026-03-31 16:29     ` Lance Yang
@ 2026-03-31 19:59     ` Nico Pache
  2 siblings, 0 replies; 35+ messages in thread
From: Nico Pache @ 2026-03-31 19:59 UTC (permalink / raw)
  To: Lorenzo Stoakes (Oracle)
  Cc: linux-kernel, linux-mm, aarcange, akpm, anshuman.khandual,
	apopple, baohua, baolin.wang, byungchul, catalin.marinas, cl,
	corbet, dave.hansen, david, dev.jain, gourry, hannes, hughd,
	jackmanb, jack, jannh, jglisse, joshua.hahnjy, kas, lance.yang,
	Liam.Howlett, lorenzo.stoakes, mathieu.desnoyers, matthew.brost,
	mhiramat, mhocko, peterx, pfalcato, rakie.kim, raquini, rdunlap,
	richard.weiyang, rientjes, rostedt, rppt, ryan.roberts, shivankg,
	sunnanyong, surenb, thomas.hellstrom, tiwai, usamaarif642, vbabka,
	vishal.moola, wangkefeng.wang, will, willy, yang, ying.huang, ziy,
	zokeefe

On Tue, Mar 31, 2026 at 8:01 AM Lorenzo Stoakes (Oracle) <ljs@kernel.org> wrote:
>
> OK we need a fairly urgent fix for this as this has triggered a syzbot. See [0]
> for an analysis.
>
> I show inline where the issue is, and attach a fix-patch for the bug.
>
> [0]: https://lore.kernel.org/all/e1cb33b8-c1f7-4972-8628-3a2169077d6e@lucifer.local/
>
> See below for details.
>
> Cheers, Lorenzo
>
> On Wed, Mar 25, 2026 at 05:40:22AM -0600, Nico Pache wrote:
> > The khugepaged daemon and madvise_collapse have two different
> > implementations that do almost the same thing. Create collapse_single_pmd
> > to increase code reuse and create an entry point to these two users.
> >
> > Refactor madvise_collapse and collapse_scan_mm_slot to use the new
> > collapse_single_pmd function. To help reduce confusion around the
> > mmap_locked variable, we rename mmap_locked to lock_dropped in the
> > collapse_scan_mm_slot() function, and remove the redundant mmap_locked
> > in madvise_collapse(); this further unifies the code readability. The
> > SCAN_PTE_MAPPED_HUGEPAGE enum is no longer reachable in the
> > madvise_collapse() function, so we drop it from the list of "continuing"
> > enums.
> >
> > This introduces a minor behavioral change that is most likely an
> > undiscovered bug. The current implementation of khugepaged tests
> > collapse_test_exit_or_disable() before calling collapse_pte_mapped_thp,
> > but we weren't doing it in the madvise_collapse case. By unifying these
> > two callers madvise_collapse now also performs this check. We also modify
> > the return value to be SCAN_ANY_PROCESS which properly indicates that this
> > process is no longer valid to operate on.
> >
> > By moving the madvise_collapse writeback-retry logic into the helper
> > function we can also avoid having to revalidate the VMA.
> >
> > We guard the khugepaged_pages_collapsed variable to ensure it's only
> > incremented for khugepaged.
> >
> > As requested we also convert a VM_BUG_ON to a VM_WARN_ON.
> >
> > Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
> > Reviewed-by: Lance Yang <lance.yang@linux.dev>
> > Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> > Acked-by: David Hildenbrand (Arm) <david@kernel.org>
> > Signed-off-by: Nico Pache <npache@redhat.com>
> > ---
> >  mm/khugepaged.c | 142 ++++++++++++++++++++++++------------------------
> >  1 file changed, 72 insertions(+), 70 deletions(-)
> >
> > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > index 3728a2cf133c..d06d84219e1b 100644
> > --- a/mm/khugepaged.c
> > +++ b/mm/khugepaged.c
> > @@ -1257,7 +1257,7 @@ static enum scan_result collapse_huge_page(struct mm_struct *mm, unsigned long a
> >
> >  static enum scan_result collapse_scan_pmd(struct mm_struct *mm,
> >               struct vm_area_struct *vma, unsigned long start_addr,
> > -             bool *mmap_locked, struct collapse_control *cc)
> > +             bool *lock_dropped, struct collapse_control *cc)
> >  {
> >       pmd_t *pmd;
> >       pte_t *pte, *_pte;
> > @@ -1432,7 +1432,7 @@ static enum scan_result collapse_scan_pmd(struct mm_struct *mm,
> >               result = collapse_huge_page(mm, start_addr, referenced,
> >                                           unmapped, cc);
> >               /* collapse_huge_page will return with the mmap_lock released */
> > -             *mmap_locked = false;
> > +             *lock_dropped = true;
> >       }
> >  out:
> >       trace_mm_khugepaged_scan_pmd(mm, folio, referenced,
> > @@ -2424,6 +2424,67 @@ static enum scan_result collapse_scan_file(struct mm_struct *mm,
> >       return result;
> >  }
> >
> > +/*
> > + * Try to collapse a single PMD starting at a PMD aligned addr, and return
> > + * the results.
> > + */
> > +static enum scan_result collapse_single_pmd(unsigned long addr,
> > +             struct vm_area_struct *vma, bool *lock_dropped,
> > +             struct collapse_control *cc)
> > +{
> > +     struct mm_struct *mm = vma->vm_mm;
> > +     bool triggered_wb = false;
> > +     enum scan_result result;
> > +     struct file *file;
> > +     pgoff_t pgoff;
> > +
> > +     mmap_assert_locked(mm);
> > +
> > +     if (vma_is_anonymous(vma)) {
> > +             result = collapse_scan_pmd(mm, vma, addr, lock_dropped, cc);
> > +             goto end;
> > +     }
> > +
> > +     file = get_file(vma->vm_file);
> > +     pgoff = linear_page_index(vma, addr);
> > +
> > +     mmap_read_unlock(mm);
> > +     *lock_dropped = true;
> > +retry:
> > +     result = collapse_scan_file(mm, addr, file, pgoff, cc);
> > +
> > +     /*
> > +      * For MADV_COLLAPSE, when encountering dirty pages, try to writeback,
> > +      * then retry the collapse one time.
> > +      */
> > +     if (!cc->is_khugepaged && result == SCAN_PAGE_DIRTY_OR_WRITEBACK &&
> > +         !triggered_wb && mapping_can_writeback(file->f_mapping)) {
> > +             const loff_t lstart = (loff_t)pgoff << PAGE_SHIFT;
> > +             const loff_t lend = lstart + HPAGE_PMD_SIZE - 1;
> > +
> > +             filemap_write_and_wait_range(file->f_mapping, lstart, lend);
> > +             triggered_wb = true;
> > +             goto retry;
> > +     }
> > +     fput(file);
> > +
> > +     if (result == SCAN_PTE_MAPPED_HUGEPAGE) {
> > +             mmap_read_lock(mm);
> > +             if (collapse_test_exit_or_disable(mm))
> > +                     result = SCAN_ANY_PROCESS;
> > +             else
> > +                     result = try_collapse_pte_mapped_thp(mm, addr,
> > +                                                          !cc->is_khugepaged);
> > +             if (result == SCAN_PMD_MAPPED)
> > +                     result = SCAN_SUCCEED;
> > +             mmap_read_unlock(mm);
> > +     }
> > +end:
> > +     if (cc->is_khugepaged && result == SCAN_SUCCEED)
> > +             ++khugepaged_pages_collapsed;
> > +     return result;
> > +}
> > +
> >  static void collapse_scan_mm_slot(unsigned int progress_max,
> >               enum scan_result *result, struct collapse_control *cc)
> >       __releases(&khugepaged_mm_lock)
> > @@ -2485,46 +2546,21 @@ static void collapse_scan_mm_slot(unsigned int progress_max,
> >               VM_BUG_ON(khugepaged_scan.address & ~HPAGE_PMD_MASK);
> >
> >               while (khugepaged_scan.address < hend) {
> > -                     bool mmap_locked = true;
> > +                     bool lock_dropped = false;
> >
> >                       cond_resched();
> >                       if (unlikely(collapse_test_exit_or_disable(mm)))
> >                               goto breakouterloop;
> >
> > -                     VM_BUG_ON(khugepaged_scan.address < hstart ||
> > +                     VM_WARN_ON_ONCE(khugepaged_scan.address < hstart ||
> >                                 khugepaged_scan.address + HPAGE_PMD_SIZE >
> >                                 hend);
> > -                     if (!vma_is_anonymous(vma)) {
> > -                             struct file *file = get_file(vma->vm_file);
> > -                             pgoff_t pgoff = linear_page_index(vma,
> > -                                             khugepaged_scan.address);
> > -
> > -                             mmap_read_unlock(mm);
> > -                             mmap_locked = false;
> > -                             *result = collapse_scan_file(mm,
> > -                                     khugepaged_scan.address, file, pgoff, cc);
> > -                             fput(file);
> > -                             if (*result == SCAN_PTE_MAPPED_HUGEPAGE) {
> > -                                     mmap_read_lock(mm);
> > -                                     if (collapse_test_exit_or_disable(mm))
> > -                                             goto breakouterloop;
> > -                                     *result = try_collapse_pte_mapped_thp(mm,
> > -                                             khugepaged_scan.address, false);
> > -                                     if (*result == SCAN_PMD_MAPPED)
> > -                                             *result = SCAN_SUCCEED;
> > -                                     mmap_read_unlock(mm);
> > -                             }
> > -                     } else {
> > -                             *result = collapse_scan_pmd(mm, vma,
> > -                                     khugepaged_scan.address, &mmap_locked, cc);
> > -                     }
> > -
> > -                     if (*result == SCAN_SUCCEED)
> > -                             ++khugepaged_pages_collapsed;
> >
> > +                     *result = collapse_single_pmd(khugepaged_scan.address,
> > +                                                   vma, &lock_dropped, cc);
> >                       /* move to next address */
> >                       khugepaged_scan.address += HPAGE_PMD_SIZE;
> > -                     if (!mmap_locked)
> > +                     if (lock_dropped)
> >                               /*
> >                                * We released mmap_lock so break loop.  Note
> >                                * that we drop mmap_lock before all hugepage
> > @@ -2799,7 +2835,6 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
> >       unsigned long hstart, hend, addr;
> >       enum scan_result last_fail = SCAN_FAIL;
> >       int thps = 0;
> > -     bool mmap_locked = true;
> >
> >       BUG_ON(vma->vm_start > start);
> >       BUG_ON(vma->vm_end < end);
> > @@ -2821,13 +2856,11 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
> >
> >       for (addr = hstart; addr < hend; addr += HPAGE_PMD_SIZE) {
> >               enum scan_result result = SCAN_FAIL;
> > -             bool triggered_wb = false;
> >
> > -retry:
> > -             if (!mmap_locked) {
> > +             if (*lock_dropped) {
> >                       cond_resched();
> >                       mmap_read_lock(mm);
> > -                     mmap_locked = true;
> > +                     *lock_dropped = false;
>
> So this is the bug. 'lock_dropped' needs to record if the lock was _ever_
> dropped, not if it is _currently_ dropped.
>
> This is probably a mea culpa on my part on review, so apologies.

All good! That code is rather confusing.

>
> See below for a fix-patch.
>
> >                       result = hugepage_vma_revalidate(mm, addr, false, &vma,
> >                                                        cc);
> >                       if (result  != SCAN_SUCCEED) {
> > @@ -2837,45 +2870,14 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
> >
> >                       hend = min(hend, vma->vm_end & HPAGE_PMD_MASK);
> >               }
> > -             mmap_assert_locked(mm);
> > -             if (!vma_is_anonymous(vma)) {
> > -                     struct file *file = get_file(vma->vm_file);
> > -                     pgoff_t pgoff = linear_page_index(vma, addr);
> >
> > -                     mmap_read_unlock(mm);
> > -                     mmap_locked = false;
> > -                     *lock_dropped = true;
> > -                     result = collapse_scan_file(mm, addr, file, pgoff, cc);
> > -
> > -                     if (result == SCAN_PAGE_DIRTY_OR_WRITEBACK && !triggered_wb &&
> > -                         mapping_can_writeback(file->f_mapping)) {
> > -                             loff_t lstart = (loff_t)pgoff << PAGE_SHIFT;
> > -                             loff_t lend = lstart + HPAGE_PMD_SIZE - 1;
> > -
> > -                             filemap_write_and_wait_range(file->f_mapping, lstart, lend);
> > -                             triggered_wb = true;
> > -                             fput(file);
> > -                             goto retry;
> > -                     }
> > -                     fput(file);
> > -             } else {
> > -                     result = collapse_scan_pmd(mm, vma, addr, &mmap_locked, cc);
> > -             }
> > -             if (!mmap_locked)
> > -                     *lock_dropped = true;
> > +             result = collapse_single_pmd(addr, vma, lock_dropped, cc);
> >
> > -handle_result:
> >               switch (result) {
> >               case SCAN_SUCCEED:
> >               case SCAN_PMD_MAPPED:
> >                       ++thps;
> >                       break;
> > -             case SCAN_PTE_MAPPED_HUGEPAGE:
> > -                     BUG_ON(mmap_locked);
> > -                     mmap_read_lock(mm);
> > -                     result = try_collapse_pte_mapped_thp(mm, addr, true);
> > -                     mmap_read_unlock(mm);
> > -                     goto handle_result;
> >               /* Whitelisted set of results where continuing OK */
> >               case SCAN_NO_PTE_TABLE:
> >               case SCAN_PTE_NON_PRESENT:
> > @@ -2898,7 +2900,7 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
> >
> >  out_maybelock:
> >       /* Caller expects us to hold mmap_lock on return */
> > -     if (!mmap_locked)
> > +     if (*lock_dropped)
> >               mmap_read_lock(mm);
> >  out_nolock:
> >       mmap_assert_locked(mm);
> > --
> > 2.53.0
> >
>
> Fix patch follows:
>
> ----8<----
> From a4dfc7718a15035449f344a0bc7f58e449366405 Mon Sep 17 00:00:00 2001
> From: "Lorenzo Stoakes (Oracle)" <ljs@kernel.org>
> Date: Tue, 31 Mar 2026 13:11:18 +0100
> Subject: [PATCH] mm/khugepaged: fix issue with tracking lock
>
> We are incorrectly treating lock_dropped to track both whether the lock is
> currently held and whether or not the lock was ever dropped.
>
> Update this change to account for this.
>
> Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
> ---

Thanks for fixing this so quickly :) Syzbot didn't send this report to me.
Sadly it looks like we do indeed need the dual "locked" semantics.
Thank you for the very good explanation in reply to the syzbot report; that
really cleared up some confusion for me.

Reviewed-by: Nico Pache <npache@redhat.com>

>  mm/khugepaged.c | 12 ++++++++----
>  1 file changed, 8 insertions(+), 4 deletions(-)
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index d21348b85a59..b8452dbdb043 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -2828,6 +2828,7 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
>         unsigned long hstart, hend, addr;
>         enum scan_result last_fail = SCAN_FAIL;
>         int thps = 0;
> +       bool mmap_unlocked = false;
>
>         BUG_ON(vma->vm_start > start);
>         BUG_ON(vma->vm_end < end);
> @@ -2850,10 +2851,11 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
>         for (addr = hstart; addr < hend; addr += HPAGE_PMD_SIZE) {
>                 enum scan_result result = SCAN_FAIL;
>
> -               if (*lock_dropped) {
> +               if (mmap_unlocked) {
>                         cond_resched();
>                         mmap_read_lock(mm);
> -                       *lock_dropped = false;
> +                       mmap_unlocked = false;
> +                       *lock_dropped = true;
>                         result = hugepage_vma_revalidate(mm, addr, false, &vma,
>                                                         cc);
>                         if (result  != SCAN_SUCCEED) {
> @@ -2864,7 +2866,7 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
>                         hend = min(hend, vma->vm_end & HPAGE_PMD_MASK);
>                 }
>
> -               result = collapse_single_pmd(addr, vma, lock_dropped, cc);
> +               result = collapse_single_pmd(addr, vma, &mmap_unlocked, cc);
>
>                 switch (result) {
>                 case SCAN_SUCCEED:
> @@ -2893,8 +2895,10 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
>
>  out_maybelock:
>         /* Caller expects us to hold mmap_lock on return */
> -       if (*lock_dropped)
> +       if (mmap_unlocked) {
> +               *lock_dropped = true;
>                 mmap_read_lock(mm);
> +       }
>  out_nolock:
>         mmap_assert_locked(mm);
>         mmdrop(mm);
> --
> 2.53.0
>




* Re: [PATCH mm-unstable v4 5/5] mm/khugepaged: unify khugepaged and madv_collapse with collapse_single_pmd()
  2026-03-31 14:15       ` Lorenzo Stoakes (Oracle)
  2026-03-31 14:46         ` David Hildenbrand (Arm)
@ 2026-03-31 20:00         ` David Hildenbrand (Arm)
  2026-03-31 20:06           ` Lorenzo Stoakes (Oracle)
  2026-03-31 21:35           ` Andrew Morton
  1 sibling, 2 replies; 35+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-31 20:00 UTC (permalink / raw)
  To: Lorenzo Stoakes (Oracle)
  Cc: Nico Pache, linux-kernel, linux-mm, aarcange, akpm,
	anshuman.khandual, apopple, baohua, baolin.wang, byungchul,
	catalin.marinas, cl, corbet, dave.hansen, dev.jain, gourry,
	hannes, hughd, jackmanb, jack, jannh, jglisse, joshua.hahnjy, kas,
	lance.yang, Liam.Howlett, lorenzo.stoakes, mathieu.desnoyers,
	matthew.brost, mhiramat, mhocko, peterx, pfalcato, rakie.kim,
	raquini, rdunlap, richard.weiyang, rientjes, rostedt, rppt,
	ryan.roberts, shivankg, sunnanyong, surenb, thomas.hellstrom,
	tiwai, usamaarif642, vbabka, vishal.moola, wangkefeng.wang, will,
	willy, yang, ying.huang, ziy, zokeefe

On 3/31/26 16:15, Lorenzo Stoakes (Oracle) wrote:
> On Tue, Mar 31, 2026 at 04:13:29PM +0200, David Hildenbrand (Arm) wrote:
>>>
>>> So this is the bug. 'lock_dropped' needs to record if the lock was _ever_
>>> dropped, not if it is _currently_ dropped.
>>
>> Ah, well spotted. I thought we discussed that at some point during
>> review ...
>>
>> Thanks for tackling this!
>>
>> [...]
>>
>>>
>>> ----8<----
>>> From a4dfc7718a15035449f344a0bc7f58e449366405 Mon Sep 17 00:00:00 2001
>>> From: "Lorenzo Stoakes (Oracle)" <ljs@kernel.org>
>>> Date: Tue, 31 Mar 2026 13:11:18 +0100
>>> Subject: [PATCH] mm/khugepaged: fix issue with tracking lock
>>>
>>> We are incorrectly treating lock_dropped to track both whether the lock is
>>> currently held and whether or not the lock was ever dropped.
>>>
>>> Update this change to account for this.
>>>
>>> Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
>>> ---
>>>  mm/khugepaged.c | 12 ++++++++----
>>>  1 file changed, 8 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>>> index d21348b85a59..b8452dbdb043 100644
>>> --- a/mm/khugepaged.c
>>> +++ b/mm/khugepaged.c
>>> @@ -2828,6 +2828,7 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
>>>  	unsigned long hstart, hend, addr;
>>>  	enum scan_result last_fail = SCAN_FAIL;
>>>  	int thps = 0;
>>> +	bool mmap_unlocked = false;
>>>
>>>  	BUG_ON(vma->vm_start > start);
>>>  	BUG_ON(vma->vm_end < end);
>>> @@ -2850,10 +2851,11 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
>>>  	for (addr = hstart; addr < hend; addr += HPAGE_PMD_SIZE) {
>>>  		enum scan_result result = SCAN_FAIL;
>>>
>>> -		if (*lock_dropped) {
>>> +		if (mmap_unlocked) {
>>>  			cond_resched();
>>>  			mmap_read_lock(mm);
>>> -			*lock_dropped = false;
>>> +			mmap_unlocked = false;
>>> +			*lock_dropped = true;
>>>  			result = hugepage_vma_revalidate(mm, addr, false, &vma,
>>>  							cc);
>>>  			if (result  != SCAN_SUCCEED) {
>>> @@ -2864,7 +2866,7 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
>>
>> (in hurry so I might be wrong)
>>
>> Do we have to handle when collapse_single_pmd() dropped the lock as well?
> 
> That updates mmap_unlocked which will be handled in the loop or on exit:
> 
> 		if (mmap_unlocked) {
> 			...
> 			mmap_read_lock(mm);
> 			mmap_unlocked = false;
> 			*lock_dropped = true;
> 			...
> 		}
> 
> 		result = collapse_single_pmd(addr, vma, &mmap_unlocked, cc);
> 
> ...
> 
> out_maybelock:
> 	/* Caller expects us to hold mmap_lock on return */
> 	if (mmap_unlocked) {
> 		*lock_dropped = true;
> 		mmap_read_lock(mm);
> 	}
> 

Right.

The original code used

	bool mmap_locked = true;

If the fix gets squashed, maybe we should revert to that handling to
minimize the churn?

(this is not in stable, right?)

-- 
Cheers,

David



* Re: [PATCH mm-unstable v4 5/5] mm/khugepaged: unify khugepaged and madv_collapse with collapse_single_pmd()
  2026-03-31 20:00         ` David Hildenbrand (Arm)
@ 2026-03-31 20:06           ` Lorenzo Stoakes (Oracle)
  2026-03-31 20:50             ` David Hildenbrand (Arm)
  2026-03-31 21:35           ` Andrew Morton
  1 sibling, 1 reply; 35+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-31 20:06 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: Nico Pache, linux-kernel, linux-mm, aarcange, akpm,
	anshuman.khandual, apopple, baohua, baolin.wang, byungchul,
	catalin.marinas, cl, corbet, dave.hansen, dev.jain, gourry,
	hannes, hughd, jackmanb, jack, jannh, jglisse, joshua.hahnjy, kas,
	lance.yang, Liam.Howlett, lorenzo.stoakes, mathieu.desnoyers,
	matthew.brost, mhiramat, mhocko, peterx, pfalcato, rakie.kim,
	raquini, rdunlap, richard.weiyang, rientjes, rostedt, rppt,
	ryan.roberts, shivankg, sunnanyong, surenb, thomas.hellstrom,
	tiwai, usamaarif642, vbabka, vishal.moola, wangkefeng.wang, will,
	willy, yang, ying.huang, ziy, zokeefe

On Tue, Mar 31, 2026 at 10:00:58PM +0200, David Hildenbrand (Arm) wrote:
> On 3/31/26 16:15, Lorenzo Stoakes (Oracle) wrote:
> > On Tue, Mar 31, 2026 at 04:13:29PM +0200, David Hildenbrand (Arm) wrote:
> >>>
> >>> So this is the bug. 'lock_dropped' needs to record if the lock was _ever_
> >>> dropped, not if it is _currently_ dropped.
> >>
> >> Ah, well spotted. I thought we discussed that at some point during
> >> review ...
> >>
> >> Thanks for tackling this!
> >>
> >> [...]
> >>
> >>>
> >>> ----8<----
> >>> From a4dfc7718a15035449f344a0bc7f58e449366405 Mon Sep 17 00:00:00 2001
> >>> From: "Lorenzo Stoakes (Oracle)" <ljs@kernel.org>
> >>> Date: Tue, 31 Mar 2026 13:11:18 +0100
> >>> Subject: [PATCH] mm/khugepaged: fix issue with tracking lock
> >>>
> >>> We are incorrectly treating lock_dropped to track both whether the lock is
> >>> currently held and whether or not the lock was ever dropped.
> >>>
> >>> Update this change to account for this.
> >>>
> >>> Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
> >>> ---
> >>>  mm/khugepaged.c | 12 ++++++++----
> >>>  1 file changed, 8 insertions(+), 4 deletions(-)
> >>>
> >>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> >>> index d21348b85a59..b8452dbdb043 100644
> >>> --- a/mm/khugepaged.c
> >>> +++ b/mm/khugepaged.c
> >>> @@ -2828,6 +2828,7 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
> >>>  	unsigned long hstart, hend, addr;
> >>>  	enum scan_result last_fail = SCAN_FAIL;
> >>>  	int thps = 0;
> >>> +	bool mmap_unlocked = false;
> >>>
> >>>  	BUG_ON(vma->vm_start > start);
> >>>  	BUG_ON(vma->vm_end < end);
> >>> @@ -2850,10 +2851,11 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
> >>>  	for (addr = hstart; addr < hend; addr += HPAGE_PMD_SIZE) {
> >>>  		enum scan_result result = SCAN_FAIL;
> >>>
> >>> -		if (*lock_dropped) {
> >>> +		if (mmap_unlocked) {
> >>>  			cond_resched();
> >>>  			mmap_read_lock(mm);
> >>> -			*lock_dropped = false;
> >>> +			mmap_unlocked = false;
> >>> +			*lock_dropped = true;
> >>>  			result = hugepage_vma_revalidate(mm, addr, false, &vma,
> >>>  							cc);
> >>>  			if (result  != SCAN_SUCCEED) {
> >>> @@ -2864,7 +2866,7 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
> >>
> >> (in hurry so I might be wrong)
> >>
> >> Do we have to handle when collapse_single_pmd() dropped the lock as well?
> >
> > That updates mmap_unlocked which will be handled in the loop or on exit:
> >
> > 		if (mmap_unlocked) {
> > 			...
> > 			mmap_read_lock(mm);
> > 			mmap_unlocked = false;
> > 			*lock_dropped = true;
> > 			...
> > 		}
> >
> > 		result = collapse_single_pmd(addr, vma, &mmap_unlocked, cc);
> >
> > ...
> >
> > out_maybelock:
> > 	/* Caller expects us to hold mmap_lock on return */
> > 	if (mmap_unlocked) {
> > 		*lock_dropped = true;
> > 		mmap_read_lock(mm);
> > 	}
> >
>
> Right.
>
> The original code used
>
> 	bool mmap_locked = true;
>
> If the fix gets squashed, maybe we should revert to that handling to
> minimize the churn?

Well collapse_single_pmd() and all descendants would need to be updated to
invert the boolean, not sure it's worth it.

>
> (this is not in stable, right?)
>
> --
> Cheers,
>
> David



* Re: [PATCH mm-unstable v4 5/5] mm/khugepaged: unify khugepaged and madv_collapse with collapse_single_pmd()
  2026-03-31 20:06           ` Lorenzo Stoakes (Oracle)
@ 2026-03-31 20:50             ` David Hildenbrand (Arm)
  2026-03-31 21:03               ` David Hildenbrand (Arm)
  0 siblings, 1 reply; 35+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-31 20:50 UTC (permalink / raw)
  To: Lorenzo Stoakes (Oracle)
  Cc: Nico Pache, linux-kernel, linux-mm, aarcange, akpm,
	anshuman.khandual, apopple, baohua, baolin.wang, byungchul,
	catalin.marinas, cl, corbet, dave.hansen, dev.jain, gourry,
	hannes, hughd, jackmanb, jack, jannh, jglisse, joshua.hahnjy, kas,
	lance.yang, Liam.Howlett, lorenzo.stoakes, mathieu.desnoyers,
	matthew.brost, mhiramat, mhocko, peterx, pfalcato, rakie.kim,
	raquini, rdunlap, richard.weiyang, rientjes, rostedt, rppt,
	ryan.roberts, shivankg, sunnanyong, surenb, thomas.hellstrom,
	tiwai, usamaarif642, vbabka, vishal.moola, wangkefeng.wang, will,
	willy, yang, ying.huang, ziy, zokeefe

On 3/31/26 22:06, Lorenzo Stoakes (Oracle) wrote:
> On Tue, Mar 31, 2026 at 10:00:58PM +0200, David Hildenbrand (Arm) wrote:
>> On 3/31/26 16:15, Lorenzo Stoakes (Oracle) wrote:
>>>
>>> That updates mmap_unlocked which will be handled in the loop or on exit:
>>>
>>> 		if (mmap_unlocked) {
>>> 			...
>>> 			mmap_read_lock(mm);
>>> 			mmap_unlocked = false;
>>> 			*lock_dropped = true;
>>> 			...
>>> 		}
>>>
>>> 		result = collapse_single_pmd(addr, vma, &mmap_unlocked, cc);
>>>
>>> ...
>>>
>>> out_maybelock:
>>> 	/* Caller expects us to hold mmap_lock on return */
>>> 	if (mmap_unlocked) {
>>> 		*lock_dropped = true;
>>> 		mmap_read_lock(mm);
>>> 	}
>>>
>>
>> Right.
>>
>> The original code used
>>
>> 	bool mmap_locked = true;
>>
>> If the fix gets squashed, maybe we should revert to that handling to
>> minimize the churn?
> 
> Well collapse_single_pmd() and all descendants would need to be updated to
> invert the boolean, not sure it's worth it.

The code is confusing me.

In the old code, collapse_single_pmd() was called with "mmap_locked ==
true" and set "mmap_locked=false".

Now we call it with "mmap_unlocked=false" and it sets "mmap_unlocked=true".

So far so good.

However, collapse_scan_pmd() consumed "mmap_locked", now it effectively
consumes "mmap_unlocked".

I think I have to squash your fix into Nico's patch to make sense of this. :)


-- 
Cheers,

David



* Re: [PATCH mm-unstable v4 5/5] mm/khugepaged: unify khugepaged and madv_collapse with collapse_single_pmd()
  2026-03-31 20:50             ` David Hildenbrand (Arm)
@ 2026-03-31 21:03               ` David Hildenbrand (Arm)
  2026-03-31 21:09                 ` Nico Pache
  0 siblings, 1 reply; 35+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-31 21:03 UTC (permalink / raw)
  To: Lorenzo Stoakes (Oracle)
  Cc: Nico Pache, linux-kernel, linux-mm, aarcange, akpm,
	anshuman.khandual, apopple, baohua, baolin.wang, byungchul,
	catalin.marinas, cl, corbet, dave.hansen, dev.jain, gourry,
	hannes, hughd, jackmanb, jack, jannh, jglisse, joshua.hahnjy, kas,
	lance.yang, Liam.Howlett, lorenzo.stoakes, mathieu.desnoyers,
	matthew.brost, mhiramat, mhocko, peterx, pfalcato, rakie.kim,
	raquini, rdunlap, richard.weiyang, rientjes, rostedt, rppt,
	ryan.roberts, shivankg, sunnanyong, surenb, thomas.hellstrom,
	tiwai, usamaarif642, vbabka, vishal.moola, wangkefeng.wang, will,
	willy, yang, ying.huang, ziy, zokeefe

On 3/31/26 22:50, David Hildenbrand (Arm) wrote:
> On 3/31/26 22:06, Lorenzo Stoakes (Oracle) wrote:
>> On Tue, Mar 31, 2026 at 10:00:58PM +0200, David Hildenbrand (Arm) wrote:
>>>
>>> Right.
>>>
>>> The original code used
>>>
>>> 	bool mmap_locked = true;
>>>
>>> If the fix gets squashed, maybe we should revert to that handling to
>>> minimize the churn?
>>
>> Well collapse_single_pmd() and all descendants would need to be updated to
>> invert the boolean, not sure it's worth it.
> 
> The code is confusing me.
> 
> In the old code, collapse_single_pmd() was called with "mmap_locked ==
> true" and set "mmap_locked=false".
> 
> Now we call it with "mmap_unlocked=false" and it sets "mmap_unlocked=true".
> 
> So far so good.
> 
> However, collapse_scan_pmd() consumed "mmap_locked", now it effectively
> consumes "mmap_unlocked".

Okay, I'm too confused for today and give up :)

FWIW, this is the effective change, and the effective changes to
mmap_unlocked and lock_dropped ... confuse me:


 mm/khugepaged.c | 146 +++++++++++++++++++++++++-----------------------
 1 file changed, 76 insertions(+), 70 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 26c38934e829..3ffff9f98b2f 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1253,7 +1253,7 @@ static enum scan_result collapse_huge_page(struct mm_struct *mm, unsigned long a
 
 static enum scan_result collapse_scan_pmd(struct mm_struct *mm,
 		struct vm_area_struct *vma, unsigned long start_addr,
-		bool *mmap_locked, struct collapse_control *cc)
+		bool *lock_dropped, struct collapse_control *cc)
 {
 	pmd_t *pmd;
 	pte_t *pte, *_pte;
@@ -1428,7 +1428,7 @@ static enum scan_result collapse_scan_pmd(struct mm_struct *mm,
 		result = collapse_huge_page(mm, start_addr, referenced,
 					    unmapped, cc);
 		/* collapse_huge_page will return with the mmap_lock released */
-		*mmap_locked = false;
+		*lock_dropped = true;
 	}
 out:
 	trace_mm_khugepaged_scan_pmd(mm, folio, referenced,
@@ -2420,6 +2420,67 @@ static enum scan_result collapse_scan_file(struct mm_struct *mm,
 	return result;
 }
 
+/*
+ * Try to collapse a single PMD starting at a PMD aligned addr, and return
+ * the results.
+ */
+static enum scan_result collapse_single_pmd(unsigned long addr,
+		struct vm_area_struct *vma, bool *lock_dropped,
+		struct collapse_control *cc)
+{
+	struct mm_struct *mm = vma->vm_mm;
+	bool triggered_wb = false;
+	enum scan_result result;
+	struct file *file;
+	pgoff_t pgoff;
+
+	mmap_assert_locked(mm);
+
+	if (vma_is_anonymous(vma)) {
+		result = collapse_scan_pmd(mm, vma, addr, lock_dropped, cc);
+		goto end;
+	}
+
+	file = get_file(vma->vm_file);
+	pgoff = linear_page_index(vma, addr);
+
+	mmap_read_unlock(mm);
+	*lock_dropped = true;
+retry:
+	result = collapse_scan_file(mm, addr, file, pgoff, cc);
+
+	/*
+	 * For MADV_COLLAPSE, when encountering dirty pages, try to writeback,
+	 * then retry the collapse one time.
+	 */
+	if (!cc->is_khugepaged && result == SCAN_PAGE_DIRTY_OR_WRITEBACK &&
+	    !triggered_wb && mapping_can_writeback(file->f_mapping)) {
+		const loff_t lstart = (loff_t)pgoff << PAGE_SHIFT;
+		const loff_t lend = lstart + HPAGE_PMD_SIZE - 1;
+
+		filemap_write_and_wait_range(file->f_mapping, lstart, lend);
+		triggered_wb = true;
+		goto retry;
+	}
+	fput(file);
+
+	if (result == SCAN_PTE_MAPPED_HUGEPAGE) {
+		mmap_read_lock(mm);
+		if (collapse_test_exit_or_disable(mm))
+			result = SCAN_ANY_PROCESS;
+		else
+			result = try_collapse_pte_mapped_thp(mm, addr,
+							     !cc->is_khugepaged);
+		if (result == SCAN_PMD_MAPPED)
+			result = SCAN_SUCCEED;
+		mmap_read_unlock(mm);
+	}
+end:
+	if (cc->is_khugepaged && result == SCAN_SUCCEED)
+		++khugepaged_pages_collapsed;
+	return result;
+}
+
 static void collapse_scan_mm_slot(unsigned int progress_max,
 		enum scan_result *result, struct collapse_control *cc)
 	__releases(&khugepaged_mm_lock)
@@ -2481,46 +2542,21 @@ static void collapse_scan_mm_slot(unsigned int progress_max,
 		VM_BUG_ON(khugepaged_scan.address & ~HPAGE_PMD_MASK);
 
 		while (khugepaged_scan.address < hend) {
-			bool mmap_locked = true;
+			bool lock_dropped = false;
 
 			cond_resched();
 			if (unlikely(collapse_test_exit_or_disable(mm)))
 				goto breakouterloop;
 
-			VM_BUG_ON(khugepaged_scan.address < hstart ||
+			VM_WARN_ON_ONCE(khugepaged_scan.address < hstart ||
 				  khugepaged_scan.address + HPAGE_PMD_SIZE >
 				  hend);
-			if (!vma_is_anonymous(vma)) {
-				struct file *file = get_file(vma->vm_file);
-				pgoff_t pgoff = linear_page_index(vma,
-						khugepaged_scan.address);
-
-				mmap_read_unlock(mm);
-				mmap_locked = false;
-				*result = collapse_scan_file(mm,
-					khugepaged_scan.address, file, pgoff, cc);
-				fput(file);
-				if (*result == SCAN_PTE_MAPPED_HUGEPAGE) {
-					mmap_read_lock(mm);
-					if (collapse_test_exit_or_disable(mm))
-						goto breakouterloop;
-					*result = try_collapse_pte_mapped_thp(mm,
-						khugepaged_scan.address, false);
-					if (*result == SCAN_PMD_MAPPED)
-						*result = SCAN_SUCCEED;
-					mmap_read_unlock(mm);
-				}
-			} else {
-				*result = collapse_scan_pmd(mm, vma,
-					khugepaged_scan.address, &mmap_locked, cc);
-			}
-
-			if (*result == SCAN_SUCCEED)
-				++khugepaged_pages_collapsed;
 
+			*result = collapse_single_pmd(khugepaged_scan.address,
+						      vma, &lock_dropped, cc);
 			/* move to next address */
 			khugepaged_scan.address += HPAGE_PMD_SIZE;
-			if (!mmap_locked)
+			if (lock_dropped)
 				/*
 				 * We released mmap_lock so break loop.  Note
 				 * that we drop mmap_lock before all hugepage
@@ -2795,7 +2831,7 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
 	unsigned long hstart, hend, addr;
 	enum scan_result last_fail = SCAN_FAIL;
 	int thps = 0;
-	bool mmap_locked = true;
+	bool mmap_unlocked = false;
 
 	BUG_ON(vma->vm_start > start);
 	BUG_ON(vma->vm_end < end);
@@ -2817,13 +2853,12 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
 
 	for (addr = hstart; addr < hend; addr += HPAGE_PMD_SIZE) {
 		enum scan_result result = SCAN_FAIL;
-		bool triggered_wb = false;
 
-retry:
-		if (!mmap_locked) {
+		if (mmap_unlocked) {
 			cond_resched();
 			mmap_read_lock(mm);
-			mmap_locked = true;
+			mmap_unlocked = false;
+			*lock_dropped = true;
 			result = hugepage_vma_revalidate(mm, addr, false, &vma,
 							 cc);
 			if (result  != SCAN_SUCCEED) {
@@ -2833,45 +2868,14 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
 
 			hend = min(hend, vma->vm_end & HPAGE_PMD_MASK);
 		}
-		mmap_assert_locked(mm);
-		if (!vma_is_anonymous(vma)) {
-			struct file *file = get_file(vma->vm_file);
-			pgoff_t pgoff = linear_page_index(vma, addr);
 
-			mmap_read_unlock(mm);
-			mmap_locked = false;
-			*lock_dropped = true;
-			result = collapse_scan_file(mm, addr, file, pgoff, cc);
-
-			if (result == SCAN_PAGE_DIRTY_OR_WRITEBACK && !triggered_wb &&
-			    mapping_can_writeback(file->f_mapping)) {
-				loff_t lstart = (loff_t)pgoff << PAGE_SHIFT;
-				loff_t lend = lstart + HPAGE_PMD_SIZE - 1;
+		result = collapse_single_pmd(addr, vma, &mmap_unlocked, cc);
 
-				filemap_write_and_wait_range(file->f_mapping, lstart, lend);
-				triggered_wb = true;
-				fput(file);
-				goto retry;
-			}
-			fput(file);
-		} else {
-			result = collapse_scan_pmd(mm, vma, addr, &mmap_locked, cc);
-		}
-		if (!mmap_locked)
-			*lock_dropped = true;
-
-handle_result:
 		switch (result) {
 		case SCAN_SUCCEED:
 		case SCAN_PMD_MAPPED:
 			++thps;
 			break;
-		case SCAN_PTE_MAPPED_HUGEPAGE:
-			BUG_ON(mmap_locked);
-			mmap_read_lock(mm);
-			result = try_collapse_pte_mapped_thp(mm, addr, true);
-			mmap_read_unlock(mm);
-			goto handle_result;
 		/* Whitelisted set of results where continuing OK */
 		case SCAN_NO_PTE_TABLE:
 		case SCAN_PTE_NON_PRESENT:
@@ -2894,8 +2898,10 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
 
 out_maybelock:
 	/* Caller expects us to hold mmap_lock on return */
-	if (!mmap_locked)
+	if (mmap_unlocked) {
+		*lock_dropped = false;
 		mmap_read_lock(mm);
+	}
 out_nolock:
 	mmap_assert_locked(mm);
 	mmdrop(mm);
-- 
2.43.0


-- 
Cheers,

David


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH mm-unstable v4 5/5] mm/khugepaged: unify khugepaged and madv_collapse with collapse_single_pmd()
  2026-03-31 21:03               ` David Hildenbrand (Arm)
@ 2026-03-31 21:09                 ` Nico Pache
  2026-04-01  8:14                   ` Lorenzo Stoakes (Oracle)
  0 siblings, 1 reply; 35+ messages in thread
From: Nico Pache @ 2026-03-31 21:09 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: Lorenzo Stoakes (Oracle), linux-kernel, linux-mm, aarcange, akpm,
	anshuman.khandual, apopple, baohua, baolin.wang, byungchul,
	catalin.marinas, cl, corbet, dave.hansen, dev.jain, gourry,
	hannes, hughd, jackmanb, jack, jannh, jglisse, joshua.hahnjy, kas,
	lance.yang, Liam.Howlett, lorenzo.stoakes, mathieu.desnoyers,
	matthew.brost, mhiramat, mhocko, peterx, pfalcato, rakie.kim,
	raquini, rdunlap, richard.weiyang, rientjes, rostedt, rppt,
	ryan.roberts, shivankg, sunnanyong, surenb, thomas.hellstrom,
	tiwai, usamaarif642, vbabka, vishal.moola, wangkefeng.wang, will,
	willy, yang, ying.huang, ziy, zokeefe

On Tue, Mar 31, 2026 at 3:04 PM David Hildenbrand (Arm)
<david@kernel.org> wrote:
>
> On 3/31/26 22:50, David Hildenbrand (Arm) wrote:
> > On 3/31/26 22:06, Lorenzo Stoakes (Oracle) wrote:
> >> On Tue, Mar 31, 2026 at 10:00:58PM +0200, David Hildenbrand (Arm) wrote:
> >>>
> >>> Right.
> >>>
> >>> The original code used
> >>>
> >>>     bool mmap_locked = true;
> >>>
> >>> If the fix get squashed, maybe we should revert to that handling to
> >>> minimize the churn?
> >>
> >> Well collapse_single_pmd() and all descendents would need to be updated to
> >> invert the boolean, not sure it's worth it.
> >
> > The code is confusing me.
> >
> > In the old code, collapse_single_pmd() was called with "mmap_locked ==
> > true" and set "mmap_locked=false".
> >
> > Now we call it with "mmap_unlocked=false" and it sets "mmap_unlocked=true".
> >
> > So far so good.
> >
> > However, collapse_scan_pmd() consumed "mmap_locked", now it effectively
> > consumes "mmap_unlocked".
>
> Okay, I'm too confused for today and give up :)
>
> FWIW, this is the effective change, and the effective changes to
> mmap_unlocked and lock_dropped ... confuse me:

basically all the logic was inverted.

>
>
>  mm/khugepaged.c | 146 +++++++++++++++++++++++++-----------------------
>  1 file changed, 76 insertions(+), 70 deletions(-)
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 26c38934e829..3ffff9f98b2f 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -1253,7 +1253,7 @@ static enum scan_result collapse_huge_page(struct mm_struct *mm, unsigned long a
>
>  static enum scan_result collapse_scan_pmd(struct mm_struct *mm,
>                 struct vm_area_struct *vma, unsigned long start_addr,
> -               bool *mmap_locked, struct collapse_control *cc)
> +               bool *lock_dropped, struct collapse_control *cc)
>  {
>         pmd_t *pmd;
>         pte_t *pte, *_pte;
> @@ -1428,7 +1428,7 @@ static enum scan_result collapse_scan_pmd(struct mm_struct *mm,
>                 result = collapse_huge_page(mm, start_addr, referenced,
>                                             unmapped, cc);
>                 /* collapse_huge_page will return with the mmap_lock released */
> -               *mmap_locked = false;
> +               *lock_dropped = true;
>         }
>  out:
>         trace_mm_khugepaged_scan_pmd(mm, folio, referenced,
> @@ -2420,6 +2420,67 @@ static enum scan_result collapse_scan_file(struct mm_struct *mm,
>         return result;
>  }
>
> +/*
> + * Try to collapse a single PMD starting at a PMD aligned addr, and return
> + * the results.
> + */
> +static enum scan_result collapse_single_pmd(unsigned long addr,
> +               struct vm_area_struct *vma, bool *lock_dropped,
> +               struct collapse_control *cc)
> +{
> +       struct mm_struct *mm = vma->vm_mm;
> +       bool triggered_wb = false;
> +       enum scan_result result;
> +       struct file *file;
> +       pgoff_t pgoff;
> +
> +       mmap_assert_locked(mm);
> +
> +       if (vma_is_anonymous(vma)) {
> +               result = collapse_scan_pmd(mm, vma, addr, lock_dropped, cc);
> +               goto end;
> +       }
> +
> +       file = get_file(vma->vm_file);
> +       pgoff = linear_page_index(vma, addr);
> +
> +       mmap_read_unlock(mm);
> +       *lock_dropped = true;
> +retry:
> +       result = collapse_scan_file(mm, addr, file, pgoff, cc);
> +
> +       /*
> +        * For MADV_COLLAPSE, when encountering dirty pages, try to writeback,
> +        * then retry the collapse one time.
> +        */
> +       if (!cc->is_khugepaged && result == SCAN_PAGE_DIRTY_OR_WRITEBACK &&
> +           !triggered_wb && mapping_can_writeback(file->f_mapping)) {
> +               const loff_t lstart = (loff_t)pgoff << PAGE_SHIFT;
> +               const loff_t lend = lstart + HPAGE_PMD_SIZE - 1;
> +
> +               filemap_write_and_wait_range(file->f_mapping, lstart, lend);
> +               triggered_wb = true;
> +               goto retry;
> +       }
> +       fput(file);
> +
> +       if (result == SCAN_PTE_MAPPED_HUGEPAGE) {
> +               mmap_read_lock(mm);
> +               if (collapse_test_exit_or_disable(mm))
> +                       result = SCAN_ANY_PROCESS;
> +               else
> +                       result = try_collapse_pte_mapped_thp(mm, addr,
> +                                                            !cc->is_khugepaged);
> +               if (result == SCAN_PMD_MAPPED)
> +                       result = SCAN_SUCCEED;
> +               mmap_read_unlock(mm);
> +       }
> +end:
> +       if (cc->is_khugepaged && result == SCAN_SUCCEED)
> +               ++khugepaged_pages_collapsed;
> +       return result;
> +}
> +
>  static void collapse_scan_mm_slot(unsigned int progress_max,
>                 enum scan_result *result, struct collapse_control *cc)
>         __releases(&khugepaged_mm_lock)
> @@ -2481,46 +2542,21 @@ static void collapse_scan_mm_slot(unsigned int progress_max,
>                 VM_BUG_ON(khugepaged_scan.address & ~HPAGE_PMD_MASK);
>
>                 while (khugepaged_scan.address < hend) {
> -                       bool mmap_locked = true;
> +                       bool lock_dropped = false;
>
>                         cond_resched();
>                         if (unlikely(collapse_test_exit_or_disable(mm)))
>                                 goto breakouterloop;
>
> -                       VM_BUG_ON(khugepaged_scan.address < hstart ||
> +                       VM_WARN_ON_ONCE(khugepaged_scan.address < hstart ||
>                                   khugepaged_scan.address + HPAGE_PMD_SIZE >
>                                   hend);
> -                       if (!vma_is_anonymous(vma)) {
> -                               struct file *file = get_file(vma->vm_file);
> -                               pgoff_t pgoff = linear_page_index(vma,
> -                                               khugepaged_scan.address);
> -
> -                               mmap_read_unlock(mm);
> -                               mmap_locked = false;
> -                               *result = collapse_scan_file(mm,
> -                                       khugepaged_scan.address, file, pgoff, cc);
> -                               fput(file);
> -                               if (*result == SCAN_PTE_MAPPED_HUGEPAGE) {
> -                                       mmap_read_lock(mm);
> -                                       if (collapse_test_exit_or_disable(mm))
> -                                               goto breakouterloop;
> -                                       *result = try_collapse_pte_mapped_thp(mm,
> -                                               khugepaged_scan.address, false);
> -                                       if (*result == SCAN_PMD_MAPPED)
> -                                               *result = SCAN_SUCCEED;
> -                                       mmap_read_unlock(mm);
> -                               }
> -                       } else {
> -                               *result = collapse_scan_pmd(mm, vma,
> -                                       khugepaged_scan.address, &mmap_locked, cc);
> -                       }
> -
> -                       if (*result == SCAN_SUCCEED)
> -                               ++khugepaged_pages_collapsed;
>
> +                       *result = collapse_single_pmd(khugepaged_scan.address,
> +                                                     vma, &lock_dropped, cc);
>                         /* move to next address */
>                         khugepaged_scan.address += HPAGE_PMD_SIZE;
> -                       if (!mmap_locked)
> +                       if (lock_dropped)
>                                 /*
>                                  * We released mmap_lock so break loop.  Note
>                                  * that we drop mmap_lock before all hugepage
> @@ -2795,7 +2831,7 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
>         unsigned long hstart, hend, addr;
>         enum scan_result last_fail = SCAN_FAIL;
>         int thps = 0;
> -       bool mmap_locked = true;
> +       bool mmap_unlocked = false;
>
>         BUG_ON(vma->vm_start > start);
>         BUG_ON(vma->vm_end < end);
> @@ -2817,13 +2853,12 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
>
>         for (addr = hstart; addr < hend; addr += HPAGE_PMD_SIZE) {
>                 enum scan_result result = SCAN_FAIL;
> -               bool triggered_wb = false;
>
> -retry:
> -               if (!mmap_locked) {
> +               if (mmap_unlocked) {
>                         cond_resched();
>                         mmap_read_lock(mm);
> -                       mmap_locked = true;
> +                       mmap_unlocked = false;
> +                       *lock_dropped = true;
>                         result = hugepage_vma_revalidate(mm, addr, false, &vma,
>                                                          cc);
>                         if (result  != SCAN_SUCCEED) {
> @@ -2833,45 +2868,14 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
>
>                         hend = min(hend, vma->vm_end & HPAGE_PMD_MASK);
>                 }
> -               mmap_assert_locked(mm);
> -               if (!vma_is_anonymous(vma)) {
> -                       struct file *file = get_file(vma->vm_file);
> -                       pgoff_t pgoff = linear_page_index(vma, addr);
>
> -                       mmap_read_unlock(mm);
> -                       mmap_locked = false;
> -                       *lock_dropped = true;
> -                       result = collapse_scan_file(mm, addr, file, pgoff, cc);
> -
> -                       if (result == SCAN_PAGE_DIRTY_OR_WRITEBACK && !triggered_wb &&
> -                           mapping_can_writeback(file->f_mapping)) {
> -                               loff_t lstart = (loff_t)pgoff << PAGE_SHIFT;
> -                               loff_t lend = lstart + HPAGE_PMD_SIZE - 1;
> +               result = collapse_single_pmd(addr, vma, &mmap_unlocked, cc);
>
> -                               filemap_write_and_wait_range(file->f_mapping, lstart, lend);
> -                               triggered_wb = true;
> -                               fput(file);
> -                               goto retry;
> -                       }
> -                       fput(file);
> -               } else {
> -                       result = collapse_scan_pmd(mm, vma, addr, &mmap_locked, cc);
> -               }
> -               if (!mmap_locked)
> -                       *lock_dropped = true;
> -
> -handle_result:
>                 switch (result) {
>                 case SCAN_SUCCEED:
>                 case SCAN_PMD_MAPPED:
>                         ++thps;
>                         break;
> -               case SCAN_PTE_MAPPED_HUGEPAGE:
> -                       BUG_ON(mmap_locked);
> -                       mmap_read_lock(mm);
> -                       result = try_collapse_pte_mapped_thp(mm, addr, true);
> -                       mmap_read_unlock(mm);
> -                       goto handle_result;
>                 /* Whitelisted set of results where continuing OK */
>                 case SCAN_NO_PTE_TABLE:
>                 case SCAN_PTE_NON_PRESENT:
> @@ -2894,8 +2898,10 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
>
>  out_maybelock:
>         /* Caller expects us to hold mmap_lock on return */
> -       if (!mmap_locked)
> +       if (mmap_unlocked) {
> +               *lock_dropped = false;

Lorenzo, is this correct? At first glance, shouldn't this be
`*lock_dropped = true`?

-- Nico

>                 mmap_read_lock(mm);
> +       }
>  out_nolock:
>         mmap_assert_locked(mm);
>         mmdrop(mm);
> --
> 2.43.0
>
>
> --
> Cheers,
>
> David
>




* Re: [PATCH mm-unstable v4 5/5] mm/khugepaged: unify khugepaged and madv_collapse with collapse_single_pmd()
  2026-03-31 20:00         ` David Hildenbrand (Arm)
  2026-03-31 20:06           ` Lorenzo Stoakes (Oracle)
@ 2026-03-31 21:35           ` Andrew Morton
  2026-03-31 21:49             ` Nico Pache
  1 sibling, 1 reply; 35+ messages in thread
From: Andrew Morton @ 2026-03-31 21:35 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: Lorenzo Stoakes (Oracle), Nico Pache, linux-kernel, linux-mm,
	aarcange, anshuman.khandual, apopple, baohua, baolin.wang,
	byungchul, catalin.marinas, cl, corbet, dave.hansen, dev.jain,
	gourry, hannes, hughd, jackmanb, jack, jannh, jglisse,
	joshua.hahnjy, kas, lance.yang, Liam.Howlett, lorenzo.stoakes,
	mathieu.desnoyers, matthew.brost, mhiramat, mhocko, peterx,
	pfalcato, rakie.kim, raquini, rdunlap, richard.weiyang, rientjes,
	rostedt, rppt, ryan.roberts, shivankg, sunnanyong, surenb,
	thomas.hellstrom, tiwai, usamaarif642, vbabka, vishal.moola,
	wangkefeng.wang, will, willy, yang, ying.huang, ziy, zokeefe

On Tue, 31 Mar 2026 22:00:58 +0200 "David Hildenbrand (Arm)" <david@kernel.org> wrote:

> > out_maybelock:
> > 	/* Caller expects us to hold mmap_lock on return */
> > 	if (mmap_unlocked) {
> > 		*lock_dropped = true;
> > 		mmap_read_lock(mm);
> > 	}
> > 
> 
> Right.
> 
> The original code used
> 
> 	bool mmap_locked = true;
> 
> If the fix get squashed, maybe we should revert to that handling to
> minimize the churn?
> 
> (this is not in stable, right?)

"mm/khugepaged: unify khugepaged and madv_collapse with
collapse_single_pmd(): is presently in mm-stable.

Doing a big rebase&rework isn't the done thing, I believe.  So queue it
afterwards, add the Fixes: and tolerate the minor runtime bisection
hole.  It ends up looking like a later hotfix.


Below is what I added to mm-unstable.  I added a note-to-self to check
in on Nico's question at
https://lore.kernel.org/all/CAA1CXcCYLGQBy9nT-MSTPVK=XgWWtD6Af-QiA_6pXtM_2pybPg@mail.gmail.com/T/#u


From: "Lorenzo Stoakes (Oracle)" <ljs@kernel.org>
Subject: mm/khugepaged: fix issue with tracking lock
Date: Tue, 31 Mar 2026 13:11:18 +0100

We are incorrectly treating lock_dropped as tracking both whether the lock is
currently held and whether or not the lock was ever dropped.

Update the code to track these two states separately.

Link: https://lkml.kernel.org/r/7760c811-e100-4d40-9217-0813c28314be@lucifer.local
Fixes: 330f3758a3bc ("mm/khugepaged: unify khugepaged and madv_collapse with collapse_single_pmd()")
Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Reviewed-by: Lance Yang <lance.yang@linux.dev>
Reviewed-by: Nico Pache <npache@redhat.com>
...
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/khugepaged.c |   12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

--- a/mm/khugepaged.c~mm-khugepaged-fix-issue-with-tracking-lock
+++ a/mm/khugepaged.c
@@ -2828,6 +2828,7 @@ int madvise_collapse(struct vm_area_stru
 	unsigned long hstart, hend, addr;
 	enum scan_result last_fail = SCAN_FAIL;
 	int thps = 0;
+	bool mmap_unlocked = false;
 
 	BUG_ON(vma->vm_start > start);
 	BUG_ON(vma->vm_end < end);
@@ -2850,10 +2851,11 @@ int madvise_collapse(struct vm_area_stru
 	for (addr = hstart; addr < hend; addr += HPAGE_PMD_SIZE) {
 		enum scan_result result = SCAN_FAIL;
 
-		if (*lock_dropped) {
+		if (mmap_unlocked) {
 			cond_resched();
 			mmap_read_lock(mm);
-			*lock_dropped = false;
+			mmap_unlocked = false;
+			*lock_dropped = true;
 			result = hugepage_vma_revalidate(mm, addr, false, &vma,
 							 cc);
 			if (result  != SCAN_SUCCEED) {
@@ -2864,7 +2866,7 @@ int madvise_collapse(struct vm_area_stru
 			hend = min(hend, vma->vm_end & HPAGE_PMD_MASK);
 		}
 
-		result = collapse_single_pmd(addr, vma, lock_dropped, cc);
+		result = collapse_single_pmd(addr, vma, &mmap_unlocked, cc);
 
 		switch (result) {
 		case SCAN_SUCCEED:
@@ -2893,8 +2895,10 @@ int madvise_collapse(struct vm_area_stru
 
 out_maybelock:
 	/* Caller expects us to hold mmap_lock on return */
-	if (*lock_dropped)
+	if (mmap_unlocked) {
+		*lock_dropped = true;
 		mmap_read_lock(mm);
+	}
 out_nolock:
 	mmap_assert_locked(mm);
 	mmdrop(mm);
_




* Re: [PATCH mm-unstable v4 5/5] mm/khugepaged: unify khugepaged and madv_collapse with collapse_single_pmd()
  2026-03-31 21:35           ` Andrew Morton
@ 2026-03-31 21:49             ` Nico Pache
  2026-04-01  7:05               ` David Hildenbrand (Arm)
  0 siblings, 1 reply; 35+ messages in thread
From: Nico Pache @ 2026-03-31 21:49 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand (Arm), Lorenzo Stoakes (Oracle), linux-kernel,
	linux-mm, aarcange, anshuman.khandual, apopple, baohua,
	baolin.wang, byungchul, catalin.marinas, cl, corbet, dave.hansen,
	dev.jain, gourry, hannes, hughd, jackmanb, jack, jannh, jglisse,
	joshua.hahnjy, kas, lance.yang, Liam.Howlett, lorenzo.stoakes,
	mathieu.desnoyers, matthew.brost, mhiramat, mhocko, peterx,
	pfalcato, rakie.kim, raquini, rdunlap, richard.weiyang, rientjes,
	rostedt, rppt, ryan.roberts, shivankg, sunnanyong, surenb,
	thomas.hellstrom, tiwai, usamaarif642, vbabka, vishal.moola,
	wangkefeng.wang, will, willy, yang, ying.huang, ziy, zokeefe

On Tue, Mar 31, 2026 at 3:35 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Tue, 31 Mar 2026 22:00:58 +0200 "David Hildenbrand (Arm)" <david@kernel.org> wrote:
>
> > > out_maybelock:
> > >     /* Caller expects us to hold mmap_lock on return */
> > >     if (mmap_unlocked) {
> > >             *lock_dropped = true;
> > >             mmap_read_lock(mm);
> > >     }
> > >
> >
> > Right.
> >
> > The original code used
> >
> >       bool mmap_locked = true;
> >
> > If the fix get squashed, maybe we should revert to that handling to
> > minimize the churn?
> >
> > (this is not in stable, right?)
>
> "mm/khugepaged: unify khugepaged and madv_collapse with
> collapse_single_pmd(): is presently in mm-stable.
>
> Doing a big rebase&rework isn't the done thing, I believe.  So queue it
> afterwards, add the Fixes: and tolerate the minor runtime bisection
> hole.  It ends up looking like a later hotfix.
>
>
> Below is what I added to mm-unstable.  I added a note-to-self to check
> in on Nico's question at
> https://lore.kernel.org/all/CAA1CXcCYLGQBy9nT-MSTPVK=XgWWtD6Af-QiA_6pXtM_2pybPg@mail.gmail.com/T/#u
>
>
> From: "Lorenzo Stoakes (Oracle)" <ljs@kernel.org>
> Subject: mm/khugepaged: fix issue with tracking lock
> Date: Tue, 31 Mar 2026 13:11:18 +0100
>
> We are incorrectly treating lock_dropped as tracking both whether the lock is
> currently held and whether or not the lock was ever dropped.
>
> Update the code to track these two states separately.
>
> Link: https://lkml.kernel.org/r/7760c811-e100-4d40-9217-0813c28314be@lucifer.local
> Fixes: 330f3758a3bc ("mm/khugepaged: unify khugepaged and madv_collapse with collapse_single_pmd()")
> Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
> Reviewed-by: Lance Yang <lance.yang@linux.dev>
> Reviewed-by: Nico Pache <npache@redhat.com>
> ...
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> ---
>
>  mm/khugepaged.c |   12 ++++++++----
>  1 file changed, 8 insertions(+), 4 deletions(-)
>
> --- a/mm/khugepaged.c~mm-khugepaged-fix-issue-with-tracking-lock
> +++ a/mm/khugepaged.c
> @@ -2828,6 +2828,7 @@ int madvise_collapse(struct vm_area_stru
>         unsigned long hstart, hend, addr;
>         enum scan_result last_fail = SCAN_FAIL;
>         int thps = 0;
> +       bool mmap_unlocked = false;
>
>         BUG_ON(vma->vm_start > start);
>         BUG_ON(vma->vm_end < end);
> @@ -2850,10 +2851,11 @@ int madvise_collapse(struct vm_area_stru
>         for (addr = hstart; addr < hend; addr += HPAGE_PMD_SIZE) {
>                 enum scan_result result = SCAN_FAIL;
>
> -               if (*lock_dropped) {
> +               if (mmap_unlocked) {
>                         cond_resched();
>                         mmap_read_lock(mm);
> -                       *lock_dropped = false;
> +                       mmap_unlocked = false;
> +                       *lock_dropped = true;
>                         result = hugepage_vma_revalidate(mm, addr, false, &vma,
>                                                          cc);
>                         if (result  != SCAN_SUCCEED) {
> @@ -2864,7 +2866,7 @@ int madvise_collapse(struct vm_area_stru
>                         hend = min(hend, vma->vm_end & HPAGE_PMD_MASK);
>                 }
>
> -               result = collapse_single_pmd(addr, vma, lock_dropped, cc);
> +               result = collapse_single_pmd(addr, vma, &mmap_unlocked, cc);
>
>                 switch (result) {
>                 case SCAN_SUCCEED:
> @@ -2893,8 +2895,10 @@ int madvise_collapse(struct vm_area_stru
>
>  out_maybelock:
>         /* Caller expects us to hold mmap_lock on return */
> -       if (*lock_dropped)
> +       if (mmap_unlocked) {
> +               *lock_dropped = true;

Hmm, this was `= false` in David's example and `= true` here? Now I am
also confused, haha. It seems Lorenzo's code is correct. I think I
responded to my own code changes when I thought David was replying to
Lorenzo's.

I think we are all good with Lorenzo's changes.

Sorry for the confusion

-- Nico

>                 mmap_read_lock(mm);
> +       }
>  out_nolock:
>         mmap_assert_locked(mm);
>         mmdrop(mm);
> _
>




* Re: [PATCH mm-unstable v4 5/5] mm/khugepaged: unify khugepaged and madv_collapse with collapse_single_pmd()
  2026-03-31 21:49             ` Nico Pache
@ 2026-04-01  7:05               ` David Hildenbrand (Arm)
  2026-04-01  8:17                 ` Lorenzo Stoakes (Oracle)
  0 siblings, 1 reply; 35+ messages in thread
From: David Hildenbrand (Arm) @ 2026-04-01  7:05 UTC (permalink / raw)
  To: Nico Pache, Andrew Morton
  Cc: Lorenzo Stoakes (Oracle), linux-kernel, linux-mm, aarcange,
	anshuman.khandual, apopple, baohua, baolin.wang, byungchul,
	catalin.marinas, cl, corbet, dave.hansen, dev.jain, gourry,
	hannes, hughd, jackmanb, jack, jannh, jglisse, joshua.hahnjy, kas,
	lance.yang, Liam.Howlett, lorenzo.stoakes, mathieu.desnoyers,
	matthew.brost, mhiramat, mhocko, peterx, pfalcato, rakie.kim,
	raquini, rdunlap, richard.weiyang, rientjes, rostedt, rppt,
	ryan.roberts, shivankg, sunnanyong, surenb, thomas.hellstrom,
	tiwai, usamaarif642, vbabka, vishal.moola, wangkefeng.wang, will,
	willy, yang, ying.huang, ziy, zokeefe

On 3/31/26 23:49, Nico Pache wrote:
> On Tue, Mar 31, 2026 at 3:35 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>>
>> On Tue, 31 Mar 2026 22:00:58 +0200 "David Hildenbrand (Arm)" <david@kernel.org> wrote:
>>
>>>
>>> Right.
>>>
>>> The original code used
>>>
>>>       bool mmap_locked = true;
>>>
>>> If the fix get squashed, maybe we should revert to that handling to
>>> minimize the churn?
>>>
>>> (this is not in stable, right?)
>>
>> "mm/khugepaged: unify khugepaged and madv_collapse with
>> collapse_single_pmd(): is presently in mm-stable.
>>
>> Doing a big rebase&rework isn't the done thing, I believe.  So queue it
>> afterwards, add the Fixes: and tolerate the minor runtime bisection
>> hole.  It ends up looking like a later hotfix.
>>
>>
>> Below is what I added to mm-unstable.  I added a note-to-self to check
>> in on Nico's question at
>> https://lore.kernel.org/all/CAA1CXcCYLGQBy9nT-MSTPVK=XgWWtD6Af-QiA_6pXtM_2pybPg@mail.gmail.com/T/#u
>>
>>
>> From: "Lorenzo Stoakes (Oracle)" <ljs@kernel.org>
>> Subject: mm/khugepaged: fix issue with tracking lock
>> Date: Tue, 31 Mar 2026 13:11:18 +0100
>>
>> We are incorrectly treating lock_dropped as tracking both whether the lock is
>> currently held and whether or not the lock was ever dropped.
>>
>> Update the code to track these two states separately.
>>
>> Link: https://lkml.kernel.org/r/7760c811-e100-4d40-9217-0813c28314be@lucifer.local
>> Fixes: 330f3758a3bc ("mm/khugepaged: unify khugepaged and madv_collapse with collapse_single_pmd()")
>> Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
>> Reviewed-by: Lance Yang <lance.yang@linux.dev>
>> Reviewed-by: Nico Pache <npache@redhat.com>
>> ...
>> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
>> ---
>>
>>  mm/khugepaged.c |   12 ++++++++----
>>  1 file changed, 8 insertions(+), 4 deletions(-)
>>
>> --- a/mm/khugepaged.c~mm-khugepaged-fix-issue-with-tracking-lock
>> +++ a/mm/khugepaged.c
>> @@ -2828,6 +2828,7 @@ int madvise_collapse(struct vm_area_stru
>>         unsigned long hstart, hend, addr;
>>         enum scan_result last_fail = SCAN_FAIL;
>>         int thps = 0;
>> +       bool mmap_unlocked = false;
>>
>>         BUG_ON(vma->vm_start > start);
>>         BUG_ON(vma->vm_end < end);
>> @@ -2850,10 +2851,11 @@ int madvise_collapse(struct vm_area_stru
>>         for (addr = hstart; addr < hend; addr += HPAGE_PMD_SIZE) {
>>                 enum scan_result result = SCAN_FAIL;
>>
>> -               if (*lock_dropped) {
>> +               if (mmap_unlocked) {
>>                         cond_resched();
>>                         mmap_read_lock(mm);
>> -                       *lock_dropped = false;
>> +                       mmap_unlocked = false;
>> +                       *lock_dropped = true;
>>                         result = hugepage_vma_revalidate(mm, addr, false, &vma,
>>                                                          cc);
>>                         if (result  != SCAN_SUCCEED) {
>> @@ -2864,7 +2866,7 @@ int madvise_collapse(struct vm_area_stru
>>                         hend = min(hend, vma->vm_end & HPAGE_PMD_MASK);
>>                 }
>>
>> -               result = collapse_single_pmd(addr, vma, lock_dropped, cc);
>> +               result = collapse_single_pmd(addr, vma, &mmap_unlocked, cc);
>>
>>                 switch (result) {
>>                 case SCAN_SUCCEED:
>> @@ -2893,8 +2895,10 @@ int madvise_collapse(struct vm_area_stru
>>
>>  out_maybelock:
>>         /* Caller expects us to hold mmap_lock on return */
>> -       if (*lock_dropped)
>> +       if (mmap_unlocked) {
>> +               *lock_dropped = true;
> 
> Hmm this was =false in David's example and =true here? Now I am also
> confused hahaha. It seems Lorenzo's code is correct. I think I
> responded to my own code changes when I thought David was replying to
> Lorenzo's.

I think I messed up that in my posting when manually applying :)

So this here is correct.

-- 
Cheers,

David


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH mm-unstable v4 5/5] mm/khugepaged: unify khugepaged and madv_collapse with collapse_single_pmd()
  2026-03-31 21:09                 ` Nico Pache
@ 2026-04-01  8:14                   ` Lorenzo Stoakes (Oracle)
  2026-04-01 20:31                     ` Andrew Morton
  0 siblings, 1 reply; 35+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-04-01  8:14 UTC (permalink / raw)
  To: Nico Pache
  Cc: David Hildenbrand (Arm), linux-kernel, linux-mm, aarcange, akpm,
	anshuman.khandual, apopple, baohua, baolin.wang, byungchul,
	catalin.marinas, cl, corbet, dave.hansen, dev.jain, gourry,
	hannes, hughd, jackmanb, jack, jannh, jglisse, joshua.hahnjy, kas,
	lance.yang, Liam.Howlett, lorenzo.stoakes, mathieu.desnoyers,
	matthew.brost, mhiramat, mhocko, peterx, pfalcato, rakie.kim,
	raquini, rdunlap, richard.weiyang, rientjes, rostedt, rppt,
	ryan.roberts, shivankg, sunnanyong, surenb, thomas.hellstrom,
	tiwai, usamaarif642, vbabka, vishal.moola, wangkefeng.wang, will,
	willy, yang, ying.huang, ziy, zokeefe

On Tue, Mar 31, 2026 at 03:09:08PM -0600, Nico Pache wrote:
> On Tue, Mar 31, 2026 at 3:04 PM David Hildenbrand (Arm)
> <david@kernel.org> wrote:
> >
> > On 3/31/26 22:50, David Hildenbrand (Arm) wrote:
> > > On 3/31/26 22:06, Lorenzo Stoakes (Oracle) wrote:
> > >> On Tue, Mar 31, 2026 at 10:00:58PM +0200, David Hildenbrand (Arm) wrote:
> > >>>
> > >>> Right.
> > >>>
> > >>> The original code used
> > >>>
> > >>>     bool mmap_locked = true;
> > >>>
> > >>> If the fix gets squashed, maybe we should revert to that handling to
> > >>> minimize the churn?
> > >>
> > >> Well collapse_single_pmd() and all descendents would need to be updated to
> > >> invert the boolean, not sure it's worth it.
> > >
> > > The code is confusing me.
> > >
> > > In the old code, collapse_single_pmd() was called with "mmap_locked ==
> > > true" and set "mmap_locked=false".
> > >
> > > Now we call it with "mmap_unlocked=false" and it sets "mmap_unlocked=true".
> > >
> > > So far so good.
> > >
> > > However, collapse_scan_pmd() consumed "mmap_locked", now it effectively
> > > consumes "mmap_unlocked".
> >
> > Okay, I'm too confused for today and give up :)
> >
> > FWIW, this is the effective change, and the effective changes to
> > mmap_unlocked and lock_dropped ... confuse me:
>
> basically all the logic was inverted.

Yup.

> >  out_maybelock:
> >         /* Caller expects us to hold mmap_lock on return */
> > -       if (!mmap_locked)
> > +       if (mmap_unlocked) {
> > +               *lock_dropped = false;
>
> Lorenzo, is this correct? At first glance, shouldn't this be
> `locked_dropped = true`?

It is :) you're looking at the old code from your patch maybe?

I include my fix-patch again below to make life easier, but that code is:

+		if (mmap_unlocked) {
 			cond_resched();
 			mmap_read_lock(mm);
-			*lock_dropped = false;
+			mmap_unlocked = false;
+			*lock_dropped = true;

Having one on top of the other is causing the confusion :)

I also asked Claude to check this btw for correctness. I'm pretty confident it's
ok.

Cheers, Lorenzo

----8<----
From a4dfc7718a15035449f344a0bc7f58e449366405 Mon Sep 17 00:00:00 2001
From: "Lorenzo Stoakes (Oracle)" <ljs@kernel.org>
Date: Tue, 31 Mar 2026 13:11:18 +0100
Subject: [PATCH] mm/khugepaged: fix issue with tracking lock

We are incorrectly treating lock_dropped to track both whether the lock is
currently held and whether or not the lock was ever dropped.

Update this change to account for this.

Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
---
 mm/khugepaged.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index d21348b85a59..b8452dbdb043 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -2828,6 +2828,7 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
 	unsigned long hstart, hend, addr;
 	enum scan_result last_fail = SCAN_FAIL;
 	int thps = 0;
+	bool mmap_unlocked = false;

 	BUG_ON(vma->vm_start > start);
 	BUG_ON(vma->vm_end < end);
@@ -2850,10 +2851,11 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
 	for (addr = hstart; addr < hend; addr += HPAGE_PMD_SIZE) {
 		enum scan_result result = SCAN_FAIL;

-		if (*lock_dropped) {
+		if (mmap_unlocked) {
 			cond_resched();
 			mmap_read_lock(mm);
-			*lock_dropped = false;
+			mmap_unlocked = false;
+			*lock_dropped = true;
 			result = hugepage_vma_revalidate(mm, addr, false, &vma,
 							 cc);
 			if (result  != SCAN_SUCCEED) {
@@ -2864,7 +2866,7 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
 			hend = min(hend, vma->vm_end & HPAGE_PMD_MASK);
 		}

-		result = collapse_single_pmd(addr, vma, lock_dropped, cc);
+		result = collapse_single_pmd(addr, vma, &mmap_unlocked, cc);

 		switch (result) {
 		case SCAN_SUCCEED:
@@ -2893,8 +2895,10 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,

 out_maybelock:
 	/* Caller expects us to hold mmap_lock on return */
-	if (*lock_dropped)
+	if (mmap_unlocked) {
+		*lock_dropped = true;
 		mmap_read_lock(mm);
+	}
 out_nolock:
 	mmap_assert_locked(mm);
 	mmdrop(mm);
--
2.53.0
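For anyone following the thread, the semantics the fix settles on can be sketched in plain userspace C. This is a hypothetical illustration (names `lock_state`, `step`, and `finish` are invented, not kernel code): one local flag tracks whether the lock is dropped *right now*, so the loop knows when to re-take it, while the caller-visible flag only records whether the lock was *ever* dropped. Conflating the two roles in the single `lock_dropped` flag was the original bug.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical sketch of the two-flag pattern in the fix above.
 * `unlocked` plays the role of the local `mmap_unlocked`;
 * `ever_dropped` plays the role of the caller's `*lock_dropped`.
 */
struct lock_state {
	bool unlocked;      /* lock currently dropped?  (local bookkeeping) */
	bool ever_dropped;  /* dropped at any point?    (reported to caller) */
};

/* One loop iteration: re-acquire if needed, then maybe drop the lock. */
static void step(struct lock_state *s, bool drops_lock)
{
	if (s->unlocked) {
		s->unlocked = false;     /* stands in for mmap_read_lock() */
		s->ever_dropped = true;  /* remember the earlier drop      */
	}
	if (drops_lock)
		s->unlocked = true;      /* callee released the lock       */
}

/* Function exit: the caller expects the lock held on return. */
static void finish(struct lock_state *s)
{
	if (s->unlocked) {
		s->unlocked = false;
		s->ever_dropped = true;
	}
}
```

With a single flag, re-acquiring the lock in the loop would have to clear it, losing the "was ever dropped" information the caller needs.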



* Re: [PATCH mm-unstable v4 5/5] mm/khugepaged: unify khugepaged and madv_collapse with collapse_single_pmd()
  2026-04-01  7:05               ` David Hildenbrand (Arm)
@ 2026-04-01  8:17                 ` Lorenzo Stoakes (Oracle)
  0 siblings, 0 replies; 35+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-04-01  8:17 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: Nico Pache, Andrew Morton, linux-kernel, linux-mm, aarcange,
	anshuman.khandual, apopple, baohua, baolin.wang, byungchul,
	catalin.marinas, cl, corbet, dave.hansen, dev.jain, gourry,
	hannes, hughd, jackmanb, jack, jannh, jglisse, joshua.hahnjy, kas,
	lance.yang, Liam.Howlett, lorenzo.stoakes, mathieu.desnoyers,
	matthew.brost, mhiramat, mhocko, peterx, pfalcato, rakie.kim,
	raquini, rdunlap, richard.weiyang, rientjes, rostedt, rppt,
	ryan.roberts, shivankg, sunnanyong, surenb, thomas.hellstrom,
	tiwai, usamaarif642, vbabka, vishal.moola, wangkefeng.wang, will,
	willy, yang, ying.huang, ziy, zokeefe

On Wed, Apr 01, 2026 at 09:05:39AM +0200, David Hildenbrand (Arm) wrote:
> On 3/31/26 23:49, Nico Pache wrote:
> > On Tue, Mar 31, 2026 at 3:35 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> >>
> >> On Tue, 31 Mar 2026 22:00:58 +0200 "David Hildenbrand (Arm)" <david@kernel.org> wrote:
> >>
> >>>
> >>> Right.
> >>>
> >>> The original code used
> >>>
> >>>       bool mmap_locked = true;
> >>>
> >>> If the fix gets squashed, maybe we should revert to that handling to
> >>> minimize the churn?
> >>>
> >>> (this is not in stable, right?)
> >>
> >> "mm/khugepaged: unify khugepaged and madv_collapse with
> >> collapse_single_pmd(): is presently in mm-stable.
> >>
> >> Doing a big rebase&rework isn't the done thing, I believe.  So queue it
> >> afterwards, add the Fixes: and tolerate the minor runtime bisection
> >> hole.  It ends up looking like a later hotfix.
> >>
> >>
> >> Below is what I added to mm-unstable.  I added a note-to-self to check
> >> in on Nico's question at
> >> https://lore.kernel.org/all/CAA1CXcCYLGQBy9nT-MSTPVK=XgWWtD6Af-QiA_6pXtM_2pybPg@mail.gmail.com/T/#u
> >>
> >>
> >> From: "Lorenzo Stoakes (Oracle)" <ljs@kernel.org>
> >> Subject: mm/khugepaged: fix issue with tracking lock
> >> Date: Tue, 31 Mar 2026 13:11:18 +0100
> >>
> >> We are incorrectly treating lock_dropped to track both whether the lock is
> >> currently held and whether or not the lock was ever dropped.
> >>
> >> Update this change to account for this.
> >>
> >> Link: https://lkml.kernel.org/r/7760c811-e100-4d40-9217-0813c28314be@lucifer.local
> >> Fixes: 330f3758a3bc ("mm/khugepaged: unify khugepaged and madv_collapse with collapse_single_pmd()")
> >> Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
> >> Reviewed-by: Lance Yang <lance.yang@linux.dev>
> >> Reviewed-by: Nico Pache <npache@redhat.com>
> >> ...
> >> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> >> ---
> >>
> >>  mm/khugepaged.c |   12 ++++++++----
> >>  1 file changed, 8 insertions(+), 4 deletions(-)
> >>
> >> --- a/mm/khugepaged.c~mm-khugepaged-fix-issue-with-tracking-lock
> >> +++ a/mm/khugepaged.c
> >> @@ -2828,6 +2828,7 @@ int madvise_collapse(struct vm_area_stru
> >>         unsigned long hstart, hend, addr;
> >>         enum scan_result last_fail = SCAN_FAIL;
> >>         int thps = 0;
> >> +       bool mmap_unlocked = false;
> >>
> >>         BUG_ON(vma->vm_start > start);
> >>         BUG_ON(vma->vm_end < end);
> >> @@ -2850,10 +2851,11 @@ int madvise_collapse(struct vm_area_stru
> >>         for (addr = hstart; addr < hend; addr += HPAGE_PMD_SIZE) {
> >>                 enum scan_result result = SCAN_FAIL;
> >>
> >> -               if (*lock_dropped) {
> >> +               if (mmap_unlocked) {
> >>                         cond_resched();
> >>                         mmap_read_lock(mm);
> >> -                       *lock_dropped = false;
> >> +                       mmap_unlocked = false;
> >> +                       *lock_dropped = true;
> >>                         result = hugepage_vma_revalidate(mm, addr, false, &vma,
> >>                                                          cc);
> >>                         if (result  != SCAN_SUCCEED) {
> >> @@ -2864,7 +2866,7 @@ int madvise_collapse(struct vm_area_stru
> >>                         hend = min(hend, vma->vm_end & HPAGE_PMD_MASK);
> >>                 }
> >>
> >> -               result = collapse_single_pmd(addr, vma, lock_dropped, cc);
> >> +               result = collapse_single_pmd(addr, vma, &mmap_unlocked, cc);
> >>
> >>                 switch (result) {
> >>                 case SCAN_SUCCEED:
> >> @@ -2893,8 +2895,10 @@ int madvise_collapse(struct vm_area_stru
> >>
> >>  out_maybelock:
> >>         /* Caller expects us to hold mmap_lock on return */
> >> -       if (*lock_dropped)
> >> +       if (mmap_unlocked) {
> >> +               *lock_dropped = true;
> >
> > Hmm this was =false in David's example and =true here? Now I am also
> > confused hahaha. It seems Lorenzo's code is correct. I think I
> > responded to my own code changes when I thought David was replying to
> > Lorenzo's.
>
> I think I messed up that in my posting when manually applying :)
>
> So this here is correct.

Yeah the fix-patch on top made it all confusing :)

FWIW :P I also had claude do its thing to check it on this, and it gave the
thumbs up.

>
> --
> Cheers,
>
> David

Thanks, Lorenzo



* Re: [PATCH mm-unstable v4 5/5] mm/khugepaged: unify khugepaged and madv_collapse with collapse_single_pmd()
  2026-04-01  8:14                   ` Lorenzo Stoakes (Oracle)
@ 2026-04-01 20:31                     ` Andrew Morton
  0 siblings, 0 replies; 35+ messages in thread
From: Andrew Morton @ 2026-04-01 20:31 UTC (permalink / raw)
  To: Lorenzo Stoakes (Oracle)
  Cc: Nico Pache, David Hildenbrand (Arm), linux-kernel, linux-mm,
	aarcange, anshuman.khandual, apopple, baohua, baolin.wang,
	byungchul, catalin.marinas, cl, corbet, dave.hansen, dev.jain,
	gourry, hannes, hughd, jackmanb, jack, jannh, jglisse,
	joshua.hahnjy, kas, lance.yang, Liam.Howlett, lorenzo.stoakes,
	mathieu.desnoyers, matthew.brost, mhiramat, mhocko, peterx,
	pfalcato, rakie.kim, raquini, rdunlap, richard.weiyang, rientjes,
	rostedt, rppt, ryan.roberts, shivankg, sunnanyong, surenb,
	thomas.hellstrom, tiwai, usamaarif642, vbabka, vishal.moola,
	wangkefeng.wang, will, willy, yang, ying.huang, ziy, zokeefe

On Wed, 1 Apr 2026 09:14:35 +0100 "Lorenzo Stoakes (Oracle)" <ljs@kernel.org> wrote:

> From: "Lorenzo Stoakes (Oracle)" <ljs@kernel.org>
> Date: Tue, 31 Mar 2026 13:11:18 +0100
> Subject: [PATCH] mm/khugepaged: fix issue with tracking lock
> 
> We are incorrectly treating lock_dropped to track both whether the lock is
> currently held and whether or not the lock was ever dropped.
> 
> Update this change to account for this.
> 
> Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>

Not sure if I should add this?

Should it have

Fixes: 330f3758a3bc ("mm/khugepaged: unify khugepaged and madv_collapse
with collapse_single_pmd()")?



end of thread, other threads:[~2026-04-01 20:31 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz, follow: Atom feed)
-- links below jump to the message on this page --
2026-03-25 11:40 [PATCH mm-unstable v4 0/5] mm: khugepaged cleanups and mTHP prerequisites Nico Pache
2026-03-25 11:40 ` [PATCH mm-unstable v4 1/5] mm: consolidate anonymous folio PTE mapping into helpers Nico Pache
2026-03-25 11:40 ` [PATCH mm-unstable v4 2/5] mm: introduce is_pmd_order helper Nico Pache
2026-03-25 12:11   ` Lorenzo Stoakes (Oracle)
2026-03-25 14:45     ` Andrew Morton
2026-03-25 14:49       ` Lorenzo Stoakes (Oracle)
2026-03-25 16:05         ` Andrew Morton
2026-03-25 11:40 ` [PATCH mm-unstable v4 3/5] mm/khugepaged: define KHUGEPAGED_MAX_PTES_LIMIT as HPAGE_PMD_NR - 1 Nico Pache
2026-03-25 11:40 ` [PATCH mm-unstable v4 4/5] mm/khugepaged: rename hpage_collapse_* to collapse_* Nico Pache
2026-03-25 12:08   ` Lorenzo Stoakes (Oracle)
2026-03-25 11:40 ` [PATCH mm-unstable v4 5/5] mm/khugepaged: unify khugepaged and madv_collapse with collapse_single_pmd() Nico Pache
2026-03-31 14:01   ` Lorenzo Stoakes (Oracle)
2026-03-31 14:13     ` David Hildenbrand (Arm)
2026-03-31 14:15       ` Lorenzo Stoakes (Oracle)
2026-03-31 14:46         ` David Hildenbrand (Arm)
2026-03-31 20:00         ` David Hildenbrand (Arm)
2026-03-31 20:06           ` Lorenzo Stoakes (Oracle)
2026-03-31 20:50             ` David Hildenbrand (Arm)
2026-03-31 21:03               ` David Hildenbrand (Arm)
2026-03-31 21:09                 ` Nico Pache
2026-04-01  8:14                   ` Lorenzo Stoakes (Oracle)
2026-04-01 20:31                     ` Andrew Morton
2026-03-31 21:35           ` Andrew Morton
2026-03-31 21:49             ` Nico Pache
2026-04-01  7:05               ` David Hildenbrand (Arm)
2026-04-01  8:17                 ` Lorenzo Stoakes (Oracle)
2026-03-31 19:46       ` Nico Pache
2026-03-31 19:59         ` Lorenzo Stoakes (Oracle)
2026-03-31 16:29     ` Lance Yang
2026-03-31 19:59     ` Nico Pache
2026-03-25 11:44 ` [PATCH mm-unstable v4 0/5] mm: khugepaged cleanups and mTHP prerequisites Lorenzo Stoakes (Oracle)
2026-03-26  0:25 ` Andrew Morton
2026-03-26  4:44   ` Roman Gushchin
2026-03-26 16:48     ` Nico Pache
2026-03-26 17:35       ` Andrew Morton
