Linux-mm Archive on lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH mm-new v3 0/3] refactor and merge PTE scanning logic
@ 2025-10-08  4:37 Lance Yang
  2025-10-08  4:37 ` [PATCH mm-new v3 1/3] mm/khugepaged: optimize PTE scanning with if-else-if-else-if chain Lance Yang
                   ` (2 more replies)
  0 siblings, 3 replies; 23+ messages in thread
From: Lance Yang @ 2025-10-08  4:37 UTC (permalink / raw)
  To: akpm, david, lorenzo.stoakes
  Cc: ziy, baolin.wang, Liam.Howlett, npache, ryan.roberts, dev.jain,
	baohua, ioworker0, richard.weiyang, linux-kernel, linux-mm

Hi all,

This series cleans up the almost-duplicated PTE scanning logic in the
collapse path.

The first one is a preparatory step that refactors both loops to use
a single if-else-if-else-if chain for checking disjoint PTEs.

The second one replaces VM_BUG_ON_FOLIO() with a more graceful
VM_WARN_ON_FOLIO() for handling non-anonymous folios.

The last one then extracts the common logic into a shared helper.

Thanks,
Lance

---
This series applies on top of patch[1], which is based on mm-new.
[1] https://lore.kernel.org/linux-mm/20251008032657.72406-1-lance.yang@linux.dev

v2 -> v3:
- #02 Collect Reviewed-by from Wei and Dev - thanks!
- #03 Use vm_normal_folio() and drop struct page altogether (per Dev)
- #03 Move the cc parameter to the end (per Dev)
- https://lore.kernel.org/linux-mm/20251006144338.96519-1-lance.yang@linux.dev

v1 -> v2:
- #01 Update the changelog (per Dev)
- #01 Collect Reviewed-by from Wei, Dev and Zi - thanks!
- #03 Make more of the scanning logic common between scan_pmd() and
      _isolate() (per Dev)
- https://lore.kernel.org/linux-mm/20251002073255.14867-1-lance.yang@linux.dev

Lance Yang (3):
  mm/khugepaged: optimize PTE scanning with if-else-if-else-if chain
  mm/khugepaged: use VM_WARN_ON_FOLIO instead of VM_BUG_ON_FOLIO for
    non-anon folios
  mm/khugepaged: merge PTE scanning logic into a new helper

 mm/khugepaged.c | 242 ++++++++++++++++++++++++++----------------------
 1 file changed, 130 insertions(+), 112 deletions(-)

-- 
2.49.0



^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH mm-new v3 1/3] mm/khugepaged: optimize PTE scanning with if-else-if-else-if chain
  2025-10-08  4:37 [PATCH mm-new v3 0/3] refactor and merge PTE scanning logic Lance Yang
@ 2025-10-08  4:37 ` Lance Yang
  2025-10-14 12:17   ` Lorenzo Stoakes
  2025-10-08  4:37 ` [PATCH mm-new v3 2/3] mm/khugepaged: use VM_WARN_ON_FOLIO instead of VM_BUG_ON_FOLIO for non-anon folios Lance Yang
  2025-10-08  4:37 ` [PATCH mm-new v3 3/3] mm/khugepaged: merge PTE scanning logic into a new helper Lance Yang
  2 siblings, 1 reply; 23+ messages in thread
From: Lance Yang @ 2025-10-08  4:37 UTC (permalink / raw)
  To: akpm, david, lorenzo.stoakes
  Cc: ziy, baolin.wang, Liam.Howlett, npache, ryan.roberts, dev.jain,
	baohua, ioworker0, richard.weiyang, linux-kernel, linux-mm,
	Lance Yang

From: Lance Yang <lance.yang@linux.dev>

As pointed out by Dev, the PTE checks for disjoint conditions in the
scanning loops can be optimized. is_swap_pte, (pte_none && is_zero_pfn),
and pte_uffd_wp are mutually exclusive.

This patch refactors the loops in both __collapse_huge_page_isolate() and
hpage_collapse_scan_pmd() to use a continuous if-else-if-else-if chain
instead of separate if blocks. While at it, the redundant pte_present()
check before is_zero_pfn() is also removed.

Also, this is a preparatory step to make it easier to merge the
almost-duplicated scanning logic in these two functions, as suggested
by David.

Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: Dev Jain <dev.jain@arm.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Suggested-by: Dev Jain <dev.jain@arm.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Lance Yang <lance.yang@linux.dev>
---
 mm/khugepaged.c | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index bec3e268dc76..e3e27223137a 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -548,8 +548,7 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 	for (_pte = pte; _pte < pte + HPAGE_PMD_NR;
 	     _pte++, addr += PAGE_SIZE) {
 		pte_t pteval = ptep_get(_pte);
-		if (pte_none(pteval) || (pte_present(pteval) &&
-				is_zero_pfn(pte_pfn(pteval)))) {
+		if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
 			++none_or_zero;
 			if (!userfaultfd_armed(vma) &&
 			    (!cc->is_khugepaged ||
@@ -560,12 +559,10 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 				count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
 				goto out;
 			}
-		}
-		if (!pte_present(pteval)) {
+		} else if (!pte_present(pteval)) {
 			result = SCAN_PTE_NON_PRESENT;
 			goto out;
-		}
-		if (pte_uffd_wp(pteval)) {
+		} else if (pte_uffd_wp(pteval)) {
 			result = SCAN_PTE_UFFD_WP;
 			goto out;
 		}
@@ -1321,8 +1318,7 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
 				count_vm_event(THP_SCAN_EXCEED_SWAP_PTE);
 				goto out_unmap;
 			}
-		}
-		if (pte_uffd_wp(pteval)) {
+		} else if (pte_uffd_wp(pteval)) {
 			/*
 			 * Don't collapse the page if any of the small
 			 * PTEs are armed with uffd write protection.
-- 
2.49.0



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH mm-new v3 2/3] mm/khugepaged: use VM_WARN_ON_FOLIO instead of VM_BUG_ON_FOLIO for non-anon folios
  2025-10-08  4:37 [PATCH mm-new v3 0/3] refactor and merge PTE scanning logic Lance Yang
  2025-10-08  4:37 ` [PATCH mm-new v3 1/3] mm/khugepaged: optimize PTE scanning with if-else-if-else-if chain Lance Yang
@ 2025-10-08  4:37 ` Lance Yang
  2025-10-14 12:25   ` Lorenzo Stoakes
  2025-10-08  4:37 ` [PATCH mm-new v3 3/3] mm/khugepaged: merge PTE scanning logic into a new helper Lance Yang
  2 siblings, 1 reply; 23+ messages in thread
From: Lance Yang @ 2025-10-08  4:37 UTC (permalink / raw)
  To: akpm, david, lorenzo.stoakes
  Cc: ziy, baolin.wang, Liam.Howlett, npache, ryan.roberts, dev.jain,
	baohua, ioworker0, richard.weiyang, linux-kernel, linux-mm,
	Lance Yang

From: Lance Yang <lance.yang@linux.dev>

As Zi pointed out, we should avoid crashing the kernel for conditions
that can be handled gracefully. Encountering a non-anonymous folio in an
anonymous VMA is a bug, but a warning is sufficient.

This patch changes the VM_BUG_ON_FOLIO(!folio_test_anon(folio)) to a
VM_WARN_ON_FOLIO() in both __collapse_huge_page_isolate() and
hpage_collapse_scan_pmd(), and then aborts the scan with SCAN_PAGE_ANON.

Making more of the scanning logic common between hpage_collapse_scan_pmd()
and __collapse_huge_page_isolate(), as suggested by Dev.

Suggested-by: Dev Jain <dev.jain@arm.com>
Suggested-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: Dev Jain <dev.jain@arm.com>
Signed-off-by: Lance Yang <lance.yang@linux.dev>
---
 mm/khugepaged.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index e3e27223137a..b5c0295c3414 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -573,7 +573,11 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 		}
 
 		folio = page_folio(page);
-		VM_BUG_ON_FOLIO(!folio_test_anon(folio), folio);
+		if (!folio_test_anon(folio)) {
+			VM_WARN_ON_FOLIO(true, folio);
+			result = SCAN_PAGE_ANON;
+			goto out;
+		}
 
 		/* See hpage_collapse_scan_pmd(). */
 		if (folio_maybe_mapped_shared(folio)) {
@@ -1340,6 +1344,7 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
 		folio = page_folio(page);
 
 		if (!folio_test_anon(folio)) {
+			VM_WARN_ON_FOLIO(true, folio);
 			result = SCAN_PAGE_ANON;
 			goto out_unmap;
 		}
-- 
2.49.0



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH mm-new v3 3/3] mm/khugepaged: merge PTE scanning logic into a new helper
  2025-10-08  4:37 [PATCH mm-new v3 0/3] refactor and merge PTE scanning logic Lance Yang
  2025-10-08  4:37 ` [PATCH mm-new v3 1/3] mm/khugepaged: optimize PTE scanning with if-else-if-else-if chain Lance Yang
  2025-10-08  4:37 ` [PATCH mm-new v3 2/3] mm/khugepaged: use VM_WARN_ON_FOLIO instead of VM_BUG_ON_FOLIO for non-anon folios Lance Yang
@ 2025-10-08  4:37 ` Lance Yang
  2025-10-09  1:07   ` Andrew Morton
                     ` (4 more replies)
  2 siblings, 5 replies; 23+ messages in thread
From: Lance Yang @ 2025-10-08  4:37 UTC (permalink / raw)
  To: akpm, david, lorenzo.stoakes
  Cc: ziy, baolin.wang, Liam.Howlett, npache, ryan.roberts, dev.jain,
	baohua, ioworker0, richard.weiyang, linux-kernel, linux-mm,
	Lance Yang

From: Lance Yang <lance.yang@linux.dev>

As David suggested, the PTE scanning logic in hpage_collapse_scan_pmd()
and __collapse_huge_page_isolate() was almost duplicated.

This patch cleans things up by moving all the common PTE checking logic
into a new shared helper, thp_collapse_check_pte(). While at it, we use
vm_normal_folio() instead of vm_normal_page().

Suggested-by: David Hildenbrand <david@redhat.com>
Suggested-by: Dev Jain <dev.jain@arm.com>
Signed-off-by: Lance Yang <lance.yang@linux.dev>
---
 mm/khugepaged.c | 243 ++++++++++++++++++++++++++----------------------
 1 file changed, 130 insertions(+), 113 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index b5c0295c3414..7116caae1fa4 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -61,6 +61,12 @@ enum scan_result {
 	SCAN_PAGE_FILLED,
 };
 
+enum pte_check_result {
+	PTE_CHECK_SUCCEED,
+	PTE_CHECK_CONTINUE,
+	PTE_CHECK_FAIL,
+};
+
 #define CREATE_TRACE_POINTS
 #include <trace/events/huge_memory.h>
 
@@ -533,62 +539,139 @@ static void release_pte_pages(pte_t *pte, pte_t *_pte,
 	}
 }
 
+/*
+ * thp_collapse_check_pte - Check if a PTE is suitable for THP collapse
+ * @pte:           The PTE to check
+ * @vma:           The VMA the PTE belongs to
+ * @addr:          The virtual address corresponding to this PTE
+ * @foliop:        On success, used to return a pointer to the folio
+ *                 Must be non-NULL
+ * @none_or_zero:  Counter for none/zero PTEs. Must be non-NULL
+ * @unmapped:      Counter for swap PTEs. Can be NULL if not scanning swaps
+ * @shared:        Counter for shared pages. Must be non-NULL
+ * @scan_result:   Used to return the failure reason (SCAN_*) on a
+ *                 PTE_CHECK_FAIL return. Must be non-NULL
+ * @cc:            Collapse control settings
+ *
+ * Returns:
+ *   PTE_CHECK_SUCCEED  - PTE is suitable, proceed with further checks
+ *   PTE_CHECK_CONTINUE - Skip this PTE and continue scanning
+ *   PTE_CHECK_FAIL     - Abort collapse scan
+ */
+static inline int thp_collapse_check_pte(pte_t pte, struct vm_area_struct *vma,
+		unsigned long addr, struct folio **foliop, int *none_or_zero,
+		int *unmapped, int *shared, int *scan_result,
+		struct collapse_control *cc)
+{
+	struct folio *folio = NULL;
+
+	if (pte_none(pte) || is_zero_pfn(pte_pfn(pte))) {
+		(*none_or_zero)++;
+		if (!userfaultfd_armed(vma) &&
+		    (!cc->is_khugepaged ||
+		     *none_or_zero <= khugepaged_max_ptes_none)) {
+			return PTE_CHECK_CONTINUE;
+		} else {
+			*scan_result = SCAN_EXCEED_NONE_PTE;
+			count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
+			return PTE_CHECK_FAIL;
+		}
+	} else if (!pte_present(pte)) {
+		if (!unmapped) {
+			*scan_result = SCAN_PTE_NON_PRESENT;
+			return PTE_CHECK_FAIL;
+		}
+
+		if (non_swap_entry(pte_to_swp_entry(pte))) {
+			*scan_result = SCAN_PTE_NON_PRESENT;
+			return PTE_CHECK_FAIL;
+		}
+
+		(*unmapped)++;
+		if (!cc->is_khugepaged ||
+		    *unmapped <= khugepaged_max_ptes_swap) {
+			/*
+			 * Always be strict with uffd-wp enabled swap
+			 * entries. Please see comment below for
+			 * pte_uffd_wp().
+			 */
+			if (pte_swp_uffd_wp(pte)) {
+				*scan_result = SCAN_PTE_UFFD_WP;
+				return PTE_CHECK_FAIL;
+			}
+			return PTE_CHECK_CONTINUE;
+		} else {
+			*scan_result = SCAN_EXCEED_SWAP_PTE;
+			count_vm_event(THP_SCAN_EXCEED_SWAP_PTE);
+			return PTE_CHECK_FAIL;
+		}
+	} else if (pte_uffd_wp(pte)) {
+		/*
+		 * Don't collapse the page if any of the small PTEs are
+		 * armed with uffd write protection. Here we can also mark
+		 * the new huge pmd as write protected if any of the small
+		 * ones is marked but that could bring unknown userfault
+		 * messages that falls outside of the registered range.
+		 * So, just be simple.
+		 */
+		*scan_result = SCAN_PTE_UFFD_WP;
+		return PTE_CHECK_FAIL;
+	}
+
+	folio = vm_normal_folio(vma, addr, pte);
+	if (unlikely(!folio) || unlikely(folio_is_zone_device(folio))) {
+		*scan_result = SCAN_PAGE_NULL;
+		return PTE_CHECK_FAIL;
+	}
+
+	if (!folio_test_anon(folio)) {
+		VM_WARN_ON_FOLIO(true, folio);
+		*scan_result = SCAN_PAGE_ANON;
+		return PTE_CHECK_FAIL;
+	}
+
+	/*
+	 * We treat a single page as shared if any part of the THP
+	 * is shared.
+	 */
+	if (folio_maybe_mapped_shared(folio)) {
+		(*shared)++;
+		if (cc->is_khugepaged && *shared > khugepaged_max_ptes_shared) {
+			*scan_result = SCAN_EXCEED_SHARED_PTE;
+			count_vm_event(THP_SCAN_EXCEED_SHARED_PTE);
+			return PTE_CHECK_FAIL;
+		}
+	}
+
+	*foliop = folio;
+
+	return PTE_CHECK_SUCCEED;
+}
+
 static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 					unsigned long start_addr,
 					pte_t *pte,
 					struct collapse_control *cc,
 					struct list_head *compound_pagelist)
 {
-	struct page *page = NULL;
 	struct folio *folio = NULL;
 	unsigned long addr = start_addr;
 	pte_t *_pte;
 	int none_or_zero = 0, shared = 0, result = SCAN_FAIL, referenced = 0;
+	int pte_check_res;
 
 	for (_pte = pte; _pte < pte + HPAGE_PMD_NR;
 	     _pte++, addr += PAGE_SIZE) {
 		pte_t pteval = ptep_get(_pte);
-		if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
-			++none_or_zero;
-			if (!userfaultfd_armed(vma) &&
-			    (!cc->is_khugepaged ||
-			     none_or_zero <= khugepaged_max_ptes_none)) {
-				continue;
-			} else {
-				result = SCAN_EXCEED_NONE_PTE;
-				count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
-				goto out;
-			}
-		} else if (!pte_present(pteval)) {
-			result = SCAN_PTE_NON_PRESENT;
-			goto out;
-		} else if (pte_uffd_wp(pteval)) {
-			result = SCAN_PTE_UFFD_WP;
-			goto out;
-		}
-		page = vm_normal_page(vma, addr, pteval);
-		if (unlikely(!page) || unlikely(is_zone_device_page(page))) {
-			result = SCAN_PAGE_NULL;
-			goto out;
-		}
 
-		folio = page_folio(page);
-		if (!folio_test_anon(folio)) {
-			VM_WARN_ON_FOLIO(true, folio);
-			result = SCAN_PAGE_ANON;
-			goto out;
-		}
+		pte_check_res = thp_collapse_check_pte(pteval, vma, addr,
+					&folio, &none_or_zero, NULL, &shared,
+					&result, cc);
 
-		/* See hpage_collapse_scan_pmd(). */
-		if (folio_maybe_mapped_shared(folio)) {
-			++shared;
-			if (cc->is_khugepaged &&
-			    shared > khugepaged_max_ptes_shared) {
-				result = SCAN_EXCEED_SHARED_PTE;
-				count_vm_event(THP_SCAN_EXCEED_SHARED_PTE);
-				goto out;
-			}
-		}
+		if (pte_check_res == PTE_CHECK_CONTINUE)
+			continue;
+		else if (pte_check_res == PTE_CHECK_FAIL)
+			goto out;
 
 		if (folio_test_large(folio)) {
 			struct folio *f;
@@ -1264,11 +1347,11 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
 	pte_t *pte, *_pte;
 	int result = SCAN_FAIL, referenced = 0;
 	int none_or_zero = 0, shared = 0;
-	struct page *page = NULL;
 	struct folio *folio = NULL;
 	unsigned long addr;
 	spinlock_t *ptl;
 	int node = NUMA_NO_NODE, unmapped = 0;
+	int pte_check_res;
 
 	VM_BUG_ON(start_addr & ~HPAGE_PMD_MASK);
 
@@ -1287,81 +1370,15 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
 	for (addr = start_addr, _pte = pte; _pte < pte + HPAGE_PMD_NR;
 	     _pte++, addr += PAGE_SIZE) {
 		pte_t pteval = ptep_get(_pte);
-		if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
-			++none_or_zero;
-			if (!userfaultfd_armed(vma) &&
-			    (!cc->is_khugepaged ||
-			     none_or_zero <= khugepaged_max_ptes_none)) {
-				continue;
-			} else {
-				result = SCAN_EXCEED_NONE_PTE;
-				count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
-				goto out_unmap;
-			}
-		} else if (!pte_present(pteval)) {
-			if (non_swap_entry(pte_to_swp_entry(pteval))) {
-				result = SCAN_PTE_NON_PRESENT;
-				goto out_unmap;
-			}
 
-			++unmapped;
-			if (!cc->is_khugepaged ||
-			    unmapped <= khugepaged_max_ptes_swap) {
-				/*
-				 * Always be strict with uffd-wp
-				 * enabled swap entries.  Please see
-				 * comment below for pte_uffd_wp().
-				 */
-				if (pte_swp_uffd_wp(pteval)) {
-					result = SCAN_PTE_UFFD_WP;
-					goto out_unmap;
-				}
-				continue;
-			} else {
-				result = SCAN_EXCEED_SWAP_PTE;
-				count_vm_event(THP_SCAN_EXCEED_SWAP_PTE);
-				goto out_unmap;
-			}
-		} else if (pte_uffd_wp(pteval)) {
-			/*
-			 * Don't collapse the page if any of the small
-			 * PTEs are armed with uffd write protection.
-			 * Here we can also mark the new huge pmd as
-			 * write protected if any of the small ones is
-			 * marked but that could bring unknown
-			 * userfault messages that falls outside of
-			 * the registered range.  So, just be simple.
-			 */
-			result = SCAN_PTE_UFFD_WP;
-			goto out_unmap;
-		}
+		pte_check_res = thp_collapse_check_pte(pteval, vma, addr,
+					&folio, &none_or_zero, &unmapped,
+					&shared, &result, cc);
 
-		page = vm_normal_page(vma, addr, pteval);
-		if (unlikely(!page) || unlikely(is_zone_device_page(page))) {
-			result = SCAN_PAGE_NULL;
-			goto out_unmap;
-		}
-		folio = page_folio(page);
-
-		if (!folio_test_anon(folio)) {
-			VM_WARN_ON_FOLIO(true, folio);
-			result = SCAN_PAGE_ANON;
+		if (pte_check_res == PTE_CHECK_CONTINUE)
+			continue;
+		else if (pte_check_res == PTE_CHECK_FAIL)
 			goto out_unmap;
-		}
-
-		/*
-		 * We treat a single page as shared if any part of the THP
-		 * is shared.
-		 */
-		if (folio_maybe_mapped_shared(folio)) {
-			++shared;
-			if (cc->is_khugepaged &&
-			    shared > khugepaged_max_ptes_shared) {
-				result = SCAN_EXCEED_SHARED_PTE;
-				count_vm_event(THP_SCAN_EXCEED_SHARED_PTE);
-				goto out_unmap;
-			}
-		}
 
 		/*
 		 * Record which node the original page is from and save this
-- 
2.49.0



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH mm-new v3 3/3] mm/khugepaged: merge PTE scanning logic into a new helper
  2025-10-08  4:37 ` [PATCH mm-new v3 3/3] mm/khugepaged: merge PTE scanning logic into a new helper Lance Yang
@ 2025-10-09  1:07   ` Andrew Morton
  2025-10-09  1:49     ` Lance Yang
  2025-10-10  9:10   ` Dev Jain
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 23+ messages in thread
From: Andrew Morton @ 2025-10-09  1:07 UTC (permalink / raw)
  To: Lance Yang
  Cc: david, lorenzo.stoakes, ziy, baolin.wang, Liam.Howlett, npache,
	ryan.roberts, dev.jain, baohua, ioworker0, richard.weiyang,
	linux-kernel, linux-mm

On Wed,  8 Oct 2025 12:37:48 +0800 Lance Yang <lance.yang@linux.dev> wrote:

> +		if (!cc->is_khugepaged ||
> +		    *unmapped <= khugepaged_max_ptes_swap) {
> +			/*
> +			 * Always be strict with uffd-wp enabled swap
> +			 * entries. Please see comment below for
> +			 * pte_uffd_wp().
> +			 */
> +			if (pte_swp_uffd_wp(pte)) {
> +				*scan_result = SCAN_PTE_UFFD_WP;
> +				return PTE_CHECK_FAIL;
> +			}
> +			return PTE_CHECK_CONTINUE;
> +		} else {
> +			*scan_result = SCAN_EXCEED_SWAP_PTE;
> +			count_vm_event(THP_SCAN_EXCEED_SWAP_PTE);
> +			return PTE_CHECK_FAIL;
> +		}

I'm inclined to agree with checkpatch here.

WARNING: else is not generally useful after a break or return
#81: FILE: mm/khugepaged.c:574:
+			return PTE_CHECK_CONTINUE;
+		} else {

did you see this and disagree or did you forget to run checkpatch?

--- a/mm/khugepaged.c~mm-khugepaged-merge-pte-scanning-logic-into-a-new-helper-checkpatch-fixes
+++ a/mm/khugepaged.c
@@ -571,11 +571,10 @@ static inline int thp_collapse_check_pte
 		    (!cc->is_khugepaged ||
 		     *none_or_zero <= khugepaged_max_ptes_none)) {
 			return PTE_CHECK_CONTINUE;
-		} else {
-			*scan_result = SCAN_EXCEED_NONE_PTE;
-			count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
-			return PTE_CHECK_FAIL;
 		}
+		*scan_result = SCAN_EXCEED_NONE_PTE;
+		count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
+		return PTE_CHECK_FAIL;
 	} else if (!pte_present(pte)) {
 		if (!unmapped) {
 			*scan_result = SCAN_PTE_NON_PRESENT;
@@ -600,11 +599,10 @@ static inline int thp_collapse_check_pte
 				return PTE_CHECK_FAIL;
 			}
 			return PTE_CHECK_CONTINUE;
-		} else {
-			*scan_result = SCAN_EXCEED_SWAP_PTE;
-			count_vm_event(THP_SCAN_EXCEED_SWAP_PTE);
-			return PTE_CHECK_FAIL;
 		}
+		*scan_result = SCAN_EXCEED_SWAP_PTE;
+		count_vm_event(THP_SCAN_EXCEED_SWAP_PTE);
+		return PTE_CHECK_FAIL;
 	} else if (pte_uffd_wp(pte)) {
 		/*
 		 * Don't collapse the page if any of the small PTEs are
_



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH mm-new v3 3/3] mm/khugepaged: merge PTE scanning logic into a new helper
  2025-10-09  1:07   ` Andrew Morton
@ 2025-10-09  1:49     ` Lance Yang
  0 siblings, 0 replies; 23+ messages in thread
From: Lance Yang @ 2025-10-09  1:49 UTC (permalink / raw)
  To: Andrew Morton
  Cc: david, lorenzo.stoakes, ziy, baolin.wang, Liam.Howlett, npache,
	ryan.roberts, dev.jain, baohua, ioworker0, richard.weiyang,
	linux-kernel, linux-mm



On 2025/10/9 09:07, Andrew Morton wrote:
> On Wed,  8 Oct 2025 12:37:48 +0800 Lance Yang <lance.yang@linux.dev> wrote:
> 
>> +		if (!cc->is_khugepaged ||
>> +		    *unmapped <= khugepaged_max_ptes_swap) {
>> +			/*
>> +			 * Always be strict with uffd-wp enabled swap
>> +			 * entries. Please see comment below for
>> +			 * pte_uffd_wp().
>> +			 */
>> +			if (pte_swp_uffd_wp(pte)) {
>> +				*scan_result = SCAN_PTE_UFFD_WP;
>> +				return PTE_CHECK_FAIL;
>> +			}
>> +			return PTE_CHECK_CONTINUE;
>> +		} else {
>> +			*scan_result = SCAN_EXCEED_SWAP_PTE;
>> +			count_vm_event(THP_SCAN_EXCEED_SWAP_PTE);
>> +			return PTE_CHECK_FAIL;
>> +		}
> 
> I'm inclined to agree with checkpatch here.

Thanks!

> 
> WARNING: else is not generally useful after a break or return
> #81: FILE: mm/khugepaged.c:574:
> +			return PTE_CHECK_CONTINUE;
> +		} else {
> 
> did you see this and disagree or did you forget to run checkpatch?

Yes, I saw the warning. I kept the original style because this is just a 
code move ...

> 
> --- a/mm/khugepaged.c~mm-khugepaged-merge-pte-scanning-logic-into-a-new-helper-checkpatch-fixes
> +++ a/mm/khugepaged.c
> @@ -571,11 +571,10 @@ static inline int thp_collapse_check_pte
>   		    (!cc->is_khugepaged ||
>   		     *none_or_zero <= khugepaged_max_ptes_none)) {
>   			return PTE_CHECK_CONTINUE;
> -		} else {
> -			*scan_result = SCAN_EXCEED_NONE_PTE;
> -			count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
> -			return PTE_CHECK_FAIL;
>   		}
> +		*scan_result = SCAN_EXCEED_NONE_PTE;
> +		count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
> +		return PTE_CHECK_FAIL;
>   	} else if (!pte_present(pte)) {
>   		if (!unmapped) {
>   			*scan_result = SCAN_PTE_NON_PRESENT;
> @@ -600,11 +599,10 @@ static inline int thp_collapse_check_pte
>   				return PTE_CHECK_FAIL;
>   			}
>   			return PTE_CHECK_CONTINUE;
> -		} else {
> -			*scan_result = SCAN_EXCEED_SWAP_PTE;
> -			count_vm_event(THP_SCAN_EXCEED_SWAP_PTE);
> -			return PTE_CHECK_FAIL;
>   		}
> +		*scan_result = SCAN_EXCEED_SWAP_PTE;
> +		count_vm_event(THP_SCAN_EXCEED_SWAP_PTE);
> +		return PTE_CHECK_FAIL;
>   	} else if (pte_uffd_wp(pte)) {
>   		/*
>   		 * Don't collapse the page if any of the small PTEs are
> _
> 



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH mm-new v3 3/3] mm/khugepaged: merge PTE scanning logic into a new helper
  2025-10-08  4:37 ` [PATCH mm-new v3 3/3] mm/khugepaged: merge PTE scanning logic into a new helper Lance Yang
  2025-10-09  1:07   ` Andrew Morton
@ 2025-10-10  9:10   ` Dev Jain
  2025-10-10 10:42     ` Lance Yang
  2025-10-10 13:29   ` Wei Yang
                     ` (2 subsequent siblings)
  4 siblings, 1 reply; 23+ messages in thread
From: Dev Jain @ 2025-10-10  9:10 UTC (permalink / raw)
  To: Lance Yang, akpm, david, lorenzo.stoakes
  Cc: ziy, baolin.wang, Liam.Howlett, npache, ryan.roberts, baohua,
	ioworker0, richard.weiyang, linux-kernel, linux-mm

[-- Attachment #1: Type: text/plain, Size: 1387 bytes --]


On 08/10/25 10:07 am, Lance Yang wrote:
>   	}
>   }
>   
> +/*
> + * thp_collapse_check_pte - Check if a PTE is suitable for THP collapse
> + * @pte:           The PTE to check
> + * @vma:           The VMA the PTE belongs to
> + * @addr:          The virtual address corresponding to this PTE
> + * @foliop:        On success, used to return a pointer to the folio
> + *                 Must be non-NULL
> + * @none_or_zero:  Counter for none/zero PTEs. Must be non-NULL
> + * @unmapped:      Counter for swap PTEs. Can be NULL if not scanning swaps
> + * @shared:        Counter for shared pages. Must be non-NULL
> + * @scan_result:   Used to return the failure reason (SCAN_*) on a
> + *                 PTE_CHECK_FAIL return. Must be non-NULL
> + * @cc:            Collapse control settings
> + *
> + * Returns:
> + *   PTE_CHECK_SUCCEED  - PTE is suitable, proceed with further checks
> + *   PTE_CHECK_CONTINUE - Skip this PTE and continue scanning
> + *   PTE_CHECK_FAIL     - Abort collapse scan
> + */
> +static inline int thp_collapse_check_pte(pte_t pte, struct vm_area_struct *vma,
> +		unsigned long addr, struct folio **foliop, int *none_or_zero,
> +		int *unmapped, int *shared, int *scan_result,
> +		struct collapse_control *cc)
> +{
> +	struct folio *folio = NULL;

I think initialization is not needed here?Otherwise, LGTM. Reviewed-by: Dev Jain <dev.jain@arm.com>

[-- Attachment #2: Type: text/html, Size: 1869 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH mm-new v3 3/3] mm/khugepaged: merge PTE scanning logic into a new helper
  2025-10-10  9:10   ` Dev Jain
@ 2025-10-10 10:42     ` Lance Yang
  0 siblings, 0 replies; 23+ messages in thread
From: Lance Yang @ 2025-10-10 10:42 UTC (permalink / raw)
  To: Dev Jain
  Cc: ziy, baolin.wang, Liam.Howlett, npache, ryan.roberts, baohua,
	ioworker0, david, richard.weiyang, lorenzo.stoakes, linux-kernel,
	akpm, linux-mm



On 2025/10/10 17:10, Dev Jain wrote:
> 
> On 08/10/25 10:07 am, Lance Yang wrote:
>>   	}
>>   }
>>   
>> +/*
>> + * thp_collapse_check_pte - Check if a PTE is suitable for THP collapse
>> + * @pte:           The PTE to check
>> + * @vma:           The VMA the PTE belongs to
>> + * @addr:          The virtual address corresponding to this PTE
>> + * @foliop:        On success, used to return a pointer to the folio
>> + *                 Must be non-NULL
>> + * @none_or_zero:  Counter for none/zero PTEs. Must be non-NULL
>> + * @unmapped:      Counter for swap PTEs. Can be NULL if not scanning swaps
>> + * @shared:        Counter for shared pages. Must be non-NULL
>> + * @scan_result:   Used to return the failure reason (SCAN_*) on a
>> + *                 PTE_CHECK_FAIL return. Must be non-NULL
>> + * @cc:            Collapse control settings
>> + *
>> + * Returns:
>> + *   PTE_CHECK_SUCCEED  - PTE is suitable, proceed with further checks
>> + *   PTE_CHECK_CONTINUE - Skip this PTE and continue scanning
>> + *   PTE_CHECK_FAIL     - Abort collapse scan
>> + */
>> +static inline int thp_collapse_check_pte(pte_t pte, struct vm_area_struct *vma,
>> +		unsigned long addr, struct folio **foliop, int *none_or_zero,
>> +		int *unmapped, int *shared, int *scan_result,
>> +		struct collapse_control *cc)
>> +{
>> +	struct folio *folio = NULL;
> 
> I think initialization is not needed here?Otherwise, LGTM. Reviewed-by: Dev Jain <dev.jain@arm.com>

Yep. It's a minor thing, so I'll fold that in if a new version is 
needed. Thanks!


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH mm-new v3 3/3] mm/khugepaged: merge PTE scanning logic into a new helper
  2025-10-08  4:37 ` [PATCH mm-new v3 3/3] mm/khugepaged: merge PTE scanning logic into a new helper Lance Yang
  2025-10-09  1:07   ` Andrew Morton
  2025-10-10  9:10   ` Dev Jain
@ 2025-10-10 13:29   ` Wei Yang
  2025-10-10 13:55     ` Lance Yang
  2025-10-14 12:36   ` Lorenzo Stoakes
  2025-10-14 17:41   ` Lorenzo Stoakes
  4 siblings, 1 reply; 23+ messages in thread
From: Wei Yang @ 2025-10-10 13:29 UTC (permalink / raw)
  To: Lance Yang
  Cc: akpm, david, lorenzo.stoakes, ziy, baolin.wang, Liam.Howlett,
	npache, ryan.roberts, dev.jain, baohua, ioworker0,
	richard.weiyang, linux-kernel, linux-mm

On Wed, Oct 08, 2025 at 12:37:48PM +0800, Lance Yang wrote:
>From: Lance Yang <lance.yang@linux.dev>
>
>As David suggested, the PTE scanning logic in hpage_collapse_scan_pmd()
>and __collapse_huge_page_isolate() was almost duplicated.
>
>This patch cleans things up by moving all the common PTE checking logic
>into a new shared helper, thp_collapse_check_pte(). While at it, we use
>vm_normal_folio() instead of vm_normal_page().
>
>Suggested-by: David Hildenbrand <david@redhat.com>
>Suggested-by: Dev Jain <dev.jain@arm.com>
>Signed-off-by: Lance Yang <lance.yang@linux.dev>
>---
> mm/khugepaged.c | 243 ++++++++++++++++++++++++++----------------------
> 1 file changed, 130 insertions(+), 113 deletions(-)
>
>diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>index b5c0295c3414..7116caae1fa4 100644
>--- a/mm/khugepaged.c
>+++ b/mm/khugepaged.c
>@@ -61,6 +61,12 @@ enum scan_result {
> 	SCAN_PAGE_FILLED,
> };
> 
>+enum pte_check_result {
>+	PTE_CHECK_SUCCEED,
>+	PTE_CHECK_CONTINUE,
>+	PTE_CHECK_FAIL,
>+};
>+
> #define CREATE_TRACE_POINTS
> #include <trace/events/huge_memory.h>
> 
>@@ -533,62 +539,139 @@ static void release_pte_pages(pte_t *pte, pte_t *_pte,
> 	}
> }
> 
>+/*
>+ * thp_collapse_check_pte - Check if a PTE is suitable for THP collapse
>+ * @pte:           The PTE to check
>+ * @vma:           The VMA the PTE belongs to
>+ * @addr:          The virtual address corresponding to this PTE
>+ * @foliop:        On success, used to return a pointer to the folio
>+ *                 Must be non-NULL
>+ * @none_or_zero:  Counter for none/zero PTEs. Must be non-NULL
>+ * @unmapped:      Counter for swap PTEs. Can be NULL if not scanning swaps
>+ * @shared:        Counter for shared pages. Must be non-NULL
>+ * @scan_result:   Used to return the failure reason (SCAN_*) on a
>+ *                 PTE_CHECK_FAIL return. Must be non-NULL
>+ * @cc:            Collapse control settings
>+ *
>+ * Returns:
>+ *   PTE_CHECK_SUCCEED  - PTE is suitable, proceed with further checks
>+ *   PTE_CHECK_CONTINUE - Skip this PTE and continue scanning
>+ *   PTE_CHECK_FAIL     - Abort collapse scan
>+ */
>+static inline int thp_collapse_check_pte(pte_t pte, struct vm_area_struct *vma,
>+		unsigned long addr, struct folio **foliop, int *none_or_zero,
>+		int *unmapped, int *shared, int *scan_result,
>+		struct collapse_control *cc)
>+{
>+	struct folio *folio = NULL;
>+
>+	if (pte_none(pte) || is_zero_pfn(pte_pfn(pte))) {
>+		(*none_or_zero)++;
>+		if (!userfaultfd_armed(vma) &&
>+		    (!cc->is_khugepaged ||
>+		     *none_or_zero <= khugepaged_max_ptes_none)) {
>+			return PTE_CHECK_CONTINUE;
>+		} else {
>+			*scan_result = SCAN_EXCEED_NONE_PTE;
>+			count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
>+			return PTE_CHECK_FAIL;
>+		}
>+	} else if (!pte_present(pte)) {
>+		if (!unmapped) {
>+			*scan_result = SCAN_PTE_NON_PRESENT;
>+			return PTE_CHECK_FAIL;
>+		}
>+
>+		if (non_swap_entry(pte_to_swp_entry(pte))) {
>+			*scan_result = SCAN_PTE_NON_PRESENT;
>+			return PTE_CHECK_FAIL;
>+		}
>+
>+		(*unmapped)++;
>+		if (!cc->is_khugepaged ||
>+		    *unmapped <= khugepaged_max_ptes_swap) {
>+			/*
>+			 * Always be strict with uffd-wp enabled swap
>+			 * entries. Please see comment below for
>+			 * pte_uffd_wp().
>+			 */
>+			if (pte_swp_uffd_wp(pte)) {
>+				*scan_result = SCAN_PTE_UFFD_WP;
>+				return PTE_CHECK_FAIL;
>+			}
>+			return PTE_CHECK_CONTINUE;
>+		} else {
>+			*scan_result = SCAN_EXCEED_SWAP_PTE;
>+			count_vm_event(THP_SCAN_EXCEED_SWAP_PTE);
>+			return PTE_CHECK_FAIL;
>+		}
>+	} else if (pte_uffd_wp(pte)) {
>+		/*
>+		 * Don't collapse the page if any of the small PTEs are
>+		 * armed with uffd write protection. Here we can also mark
>+		 * the new huge pmd as write protected if any of the small
>+		 * ones is marked but that could bring unknown userfault
>+		 * messages that falls outside of the registered range.
>+		 * So, just be simple.
>+		 */
>+		*scan_result = SCAN_PTE_UFFD_WP;
>+		return PTE_CHECK_FAIL;
>+	}
>+
>+	folio = vm_normal_folio(vma, addr, pte);
>+	if (unlikely(!folio) || unlikely(folio_is_zone_device(folio))) {
>+		*scan_result = SCAN_PAGE_NULL;
>+		return PTE_CHECK_FAIL;
>+	}
>+
>+	if (!folio_test_anon(folio)) {
>+		VM_WARN_ON_FOLIO(true, folio);
>+		*scan_result = SCAN_PAGE_ANON;
>+		return PTE_CHECK_FAIL;
>+	}
>+
>+	/*
>+	 * We treat a single page as shared if any part of the THP
>+	 * is shared.
>+	 */
>+	if (folio_maybe_mapped_shared(folio)) {
>+		(*shared)++;
>+		if (cc->is_khugepaged && *shared > khugepaged_max_ptes_shared) {
>+			*scan_result = SCAN_EXCEED_SHARED_PTE;
>+			count_vm_event(THP_SCAN_EXCEED_SHARED_PTE);
>+			return PTE_CHECK_FAIL;
>+		}
>+	}
>+
>+	*foliop = folio;
>+
>+	return PTE_CHECK_SUCCEED;
>+}
>+

This one looks much better.

While my personal feeling is this is not a complete work to merge the scanning
logic. We still have folio_expected_ref_count() and pte_young() check present
both in __collapse_huge_page_isolate() and huge_collapse_scan_pmd().

-- 
Wei Yang
Help you, Help me


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH mm-new v3 3/3] mm/khugepaged: merge PTE scanning logic into a new helper
  2025-10-10 13:29   ` Wei Yang
@ 2025-10-10 13:55     ` Lance Yang
  0 siblings, 0 replies; 23+ messages in thread
From: Lance Yang @ 2025-10-10 13:55 UTC (permalink / raw)
  To: Wei Yang
  Cc: akpm, david, lorenzo.stoakes, ziy, baolin.wang, Liam.Howlett,
	npache, ryan.roberts, dev.jain, baohua, ioworker0, linux-kernel,
	linux-mm



On 2025/10/10 21:29, Wei Yang wrote:
> On Wed, Oct 08, 2025 at 12:37:48PM +0800, Lance Yang wrote:
>> From: Lance Yang <lance.yang@linux.dev>
>>
>> As David suggested, the PTE scanning logic in hpage_collapse_scan_pmd()
>> and __collapse_huge_page_isolate() was almost duplicated.
>>
>> This patch cleans things up by moving all the common PTE checking logic
>> into a new shared helper, thp_collapse_check_pte(). While at it, we use
>> vm_normal_folio() instead of vm_normal_page().
>>
>> Suggested-by: David Hildenbrand <david@redhat.com>
>> Suggested-by: Dev Jain <dev.jain@arm.com>
>> Signed-off-by: Lance Yang <lance.yang@linux.dev>
>> ---
>> mm/khugepaged.c | 243 ++++++++++++++++++++++++++----------------------
>> 1 file changed, 130 insertions(+), 113 deletions(-)
>>
>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>> index b5c0295c3414..7116caae1fa4 100644
>> --- a/mm/khugepaged.c
>> +++ b/mm/khugepaged.c
>> @@ -61,6 +61,12 @@ enum scan_result {
>> 	SCAN_PAGE_FILLED,
>> };
>>
>> +enum pte_check_result {
>> +	PTE_CHECK_SUCCEED,
>> +	PTE_CHECK_CONTINUE,
>> +	PTE_CHECK_FAIL,
>> +};
>> +
>> #define CREATE_TRACE_POINTS
>> #include <trace/events/huge_memory.h>
>>
>> @@ -533,62 +539,139 @@ static void release_pte_pages(pte_t *pte, pte_t *_pte,
>> 	}
>> }
>>
>> +/*
>> + * thp_collapse_check_pte - Check if a PTE is suitable for THP collapse
>> + * @pte:           The PTE to check
>> + * @vma:           The VMA the PTE belongs to
>> + * @addr:          The virtual address corresponding to this PTE
>> + * @foliop:        On success, used to return a pointer to the folio
>> + *                 Must be non-NULL
>> + * @none_or_zero:  Counter for none/zero PTEs. Must be non-NULL
>> + * @unmapped:      Counter for swap PTEs. Can be NULL if not scanning swaps
>> + * @shared:        Counter for shared pages. Must be non-NULL
>> + * @scan_result:   Used to return the failure reason (SCAN_*) on a
>> + *                 PTE_CHECK_FAIL return. Must be non-NULL
>> + * @cc:            Collapse control settings
>> + *
>> + * Returns:
>> + *   PTE_CHECK_SUCCEED  - PTE is suitable, proceed with further checks
>> + *   PTE_CHECK_CONTINUE - Skip this PTE and continue scanning
>> + *   PTE_CHECK_FAIL     - Abort collapse scan
>> + */
>> +static inline int thp_collapse_check_pte(pte_t pte, struct vm_area_struct *vma,
>> +		unsigned long addr, struct folio **foliop, int *none_or_zero,
>> +		int *unmapped, int *shared, int *scan_result,
>> +		struct collapse_control *cc)
>> +{
>> +	struct folio *folio = NULL;
>> +
>> +	if (pte_none(pte) || is_zero_pfn(pte_pfn(pte))) {
>> +		(*none_or_zero)++;
>> +		if (!userfaultfd_armed(vma) &&
>> +		    (!cc->is_khugepaged ||
>> +		     *none_or_zero <= khugepaged_max_ptes_none)) {
>> +			return PTE_CHECK_CONTINUE;
>> +		} else {
>> +			*scan_result = SCAN_EXCEED_NONE_PTE;
>> +			count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
>> +			return PTE_CHECK_FAIL;
>> +		}
>> +	} else if (!pte_present(pte)) {
>> +		if (!unmapped) {
>> +			*scan_result = SCAN_PTE_NON_PRESENT;
>> +			return PTE_CHECK_FAIL;
>> +		}
>> +
>> +		if (non_swap_entry(pte_to_swp_entry(pte))) {
>> +			*scan_result = SCAN_PTE_NON_PRESENT;
>> +			return PTE_CHECK_FAIL;
>> +		}
>> +
>> +		(*unmapped)++;
>> +		if (!cc->is_khugepaged ||
>> +		    *unmapped <= khugepaged_max_ptes_swap) {
>> +			/*
>> +			 * Always be strict with uffd-wp enabled swap
>> +			 * entries. Please see comment below for
>> +			 * pte_uffd_wp().
>> +			 */
>> +			if (pte_swp_uffd_wp(pte)) {
>> +				*scan_result = SCAN_PTE_UFFD_WP;
>> +				return PTE_CHECK_FAIL;
>> +			}
>> +			return PTE_CHECK_CONTINUE;
>> +		} else {
>> +			*scan_result = SCAN_EXCEED_SWAP_PTE;
>> +			count_vm_event(THP_SCAN_EXCEED_SWAP_PTE);
>> +			return PTE_CHECK_FAIL;
>> +		}
>> +	} else if (pte_uffd_wp(pte)) {
>> +		/*
>> +		 * Don't collapse the page if any of the small PTEs are
>> +		 * armed with uffd write protection. Here we can also mark
>> +		 * the new huge pmd as write protected if any of the small
>> +		 * ones is marked but that could bring unknown userfault
>> +		 * messages that falls outside of the registered range.
>> +		 * So, just be simple.
>> +		 */
>> +		*scan_result = SCAN_PTE_UFFD_WP;
>> +		return PTE_CHECK_FAIL;
>> +	}
>> +
>> +	folio = vm_normal_folio(vma, addr, pte);
>> +	if (unlikely(!folio) || unlikely(folio_is_zone_device(folio))) {
>> +		*scan_result = SCAN_PAGE_NULL;
>> +		return PTE_CHECK_FAIL;
>> +	}
>> +
>> +	if (!folio_test_anon(folio)) {
>> +		VM_WARN_ON_FOLIO(true, folio);
>> +		*scan_result = SCAN_PAGE_ANON;
>> +		return PTE_CHECK_FAIL;
>> +	}
>> +
>> +	/*
>> +	 * We treat a single page as shared if any part of the THP
>> +	 * is shared.
>> +	 */
>> +	if (folio_maybe_mapped_shared(folio)) {
>> +		(*shared)++;
>> +		if (cc->is_khugepaged && *shared > khugepaged_max_ptes_shared) {
>> +			*scan_result = SCAN_EXCEED_SHARED_PTE;
>> +			count_vm_event(THP_SCAN_EXCEED_SHARED_PTE);
>> +			return PTE_CHECK_FAIL;
>> +		}
>> +	}
>> +
>> +	*foliop = folio;
>> +
>> +	return PTE_CHECK_SUCCEED;
>> +}
>> +
> 
> This one looks much better.
> 
> While my personal feeling is this is not a complete work to merge the scanning
> logic. We still have folio_expected_ref_count() and pte_young() check present
> both in __collapse_huge_page_isolate() and huge_collapse_scan_pmd().

Yep, good catch!

There's definitely more that can be done. For now, let's keep this patch
as-is to avoid making it too complex. Let's get this one in first, and
then we can work on the next step :)



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH mm-new v3 1/3] mm/khugepaged: optimize PTE scanning with if-else-if-else-if chain
  2025-10-08  4:37 ` [PATCH mm-new v3 1/3] mm/khugepaged: optimize PTE scanning with if-else-if-else-if chain Lance Yang
@ 2025-10-14 12:17   ` Lorenzo Stoakes
  2025-10-14 12:27     ` David Hildenbrand
  0 siblings, 1 reply; 23+ messages in thread
From: Lorenzo Stoakes @ 2025-10-14 12:17 UTC (permalink / raw)
  To: Lance Yang
  Cc: akpm, david, ziy, baolin.wang, Liam.Howlett, npache, ryan.roberts,
	dev.jain, baohua, ioworker0, richard.weiyang, linux-kernel,
	linux-mm

On Wed, Oct 08, 2025 at 12:37:46PM +0800, Lance Yang wrote:
> From: Lance Yang <lance.yang@linux.dev>
>
> As pointed out by Dev, the PTE checks for disjoint conditions in the
> scanning loops can be optimized. is_swap_pte, (pte_none && is_zero_pfn),
> and pte_uffd_wp are mutually exclusive.

But you're not using is_swap_pte anywhere :) This comes back to my review
quesiotn on the series this is dependent upon.

>
> This patch refactors the loops in both __collapse_huge_page_isolate() and
> hpage_collapse_scan_pmd() to use a continuous if-else-if-else-if chain
> instead of separate if blocks. While at it, the redundant pte_present()
> check before is_zero_pfn() is also removed.

I mean see review below, I don't see why you're doing this and I am
unconvinced by how redundant that check is.

Also this just feels like it should be part of the series where you change
these? I'm not sure why this is separate?

>
> Also, this is a preparatory step to make it easier to merge the
> almost-duplicated scanning logic in these two functions, as suggested
> by David.
>
> Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
> Reviewed-by: Dev Jain <dev.jain@arm.com>
> Reviewed-by: Zi Yan <ziy@nvidia.com>
> Suggested-by: Dev Jain <dev.jain@arm.com>
> Suggested-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Lance Yang <lance.yang@linux.dev>
> ---
>  mm/khugepaged.c | 12 ++++--------
>  1 file changed, 4 insertions(+), 8 deletions(-)
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index bec3e268dc76..e3e27223137a 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -548,8 +548,7 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
>  	for (_pte = pte; _pte < pte + HPAGE_PMD_NR;
>  	     _pte++, addr += PAGE_SIZE) {
>  		pte_t pteval = ptep_get(_pte);
> -		if (pte_none(pteval) || (pte_present(pteval) &&
> -				is_zero_pfn(pte_pfn(pteval)))) {
> +		if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {

You can have non-pte_none() non-present entries no? Isn't pte_present() a
prerequisite for pfe_pfn() to be valid? If it's a swap entry couldn't you
end up accidentally (unlikely but still) hitting this?

Seems like this is required isn't it? I may be missing something here...

>  			++none_or_zero;
>  			if (!userfaultfd_armed(vma) &&
>  			    (!cc->is_khugepaged ||
> @@ -560,12 +559,10 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
>  				count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
>  				goto out;
>  			}
> -		}
> -		if (!pte_present(pteval)) {
> +		} else if (!pte_present(pteval)) {

This seems pointless, since either the above logic will continue or goto
out right?

>  			result = SCAN_PTE_NON_PRESENT;
>  			goto out;
> -		}
> -		if (pte_uffd_wp(pteval)) {
> +		} else if (pte_uffd_wp(pteval)) {

Again, what is the point of an else when the if() branch unconditionally
->out?

>  			result = SCAN_PTE_UFFD_WP;
>  			goto out;
>  		}
> @@ -1321,8 +1318,7 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
>  				count_vm_event(THP_SCAN_EXCEED_SWAP_PTE);
>  				goto out_unmap;
>  			}
> -		}
> -		if (pte_uffd_wp(pteval)) {
> +		} else if (pte_uffd_wp(pteval)) {

Same comment as above, I'm really confused about the purpose of this logic?


>  			/*
>  			 * Don't collapse the page if any of the small
>  			 * PTEs are armed with uffd write protection.
> --
> 2.49.0
>


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH mm-new v3 2/3] mm/khugepaged: use VM_WARN_ON_FOLIO instead of VM_BUG_ON_FOLIO for non-anon folios
  2025-10-08  4:37 ` [PATCH mm-new v3 2/3] mm/khugepaged: use VM_WARN_ON_FOLIO instead of VM_BUG_ON_FOLIO for non-anon folios Lance Yang
@ 2025-10-14 12:25   ` Lorenzo Stoakes
  0 siblings, 0 replies; 23+ messages in thread
From: Lorenzo Stoakes @ 2025-10-14 12:25 UTC (permalink / raw)
  To: Lance Yang
  Cc: akpm, david, ziy, baolin.wang, Liam.Howlett, npache, ryan.roberts,
	dev.jain, baohua, ioworker0, richard.weiyang, linux-kernel,
	linux-mm

On Wed, Oct 08, 2025 at 12:37:47PM +0800, Lance Yang wrote:
> From: Lance Yang <lance.yang@linux.dev>
>
> As Zi pointed out, we should avoid crashing the kernel for conditions
> that can be handled gracefully. Encountering a non-anonymous folio in an
> anonymous VMA is a bug, but a warning is sufficient.
>
> This patch changes the VM_BUG_ON_FOLIO(!folio_test_anon(folio)) to a
> VM_WARN_ON_FOLIO() in both __collapse_huge_page_isolate() and
> hpage_collapse_scan_pmd(), and then aborts the scan with SCAN_PAGE_ANON.

Well no, in hpage_collapse_scan_pmd() there is no warning at all.

>
> Making more of the scanning logic common between hpage_collapse_scan_pmd()
> and __collapse_huge_page_isolate(), as suggested by Dev.

I mean I guess it's fine but I'm not sure it's really necessary to give a
blow-by-blow of who suggested what in the actual commit message :) This
isn't really useful information for somebody looking at this code in the
future.

>
> Suggested-by: Dev Jain <dev.jain@arm.com>
> Suggested-by: Zi Yan <ziy@nvidia.com>
> Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
> Reviewed-by: Dev Jain <dev.jain@arm.com>
> Signed-off-by: Lance Yang <lance.yang@linux.dev>
> ---
>  mm/khugepaged.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index e3e27223137a..b5c0295c3414 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -573,7 +573,11 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
>  		}
>
>  		folio = page_folio(page);
> -		VM_BUG_ON_FOLIO(!folio_test_anon(folio), folio);
> +		if (!folio_test_anon(folio)) {
> +			VM_WARN_ON_FOLIO(true, folio);
> +			result = SCAN_PAGE_ANON;
> +			goto out;

Hmm this is iffy I'm not sure I agree with Zi here - the purpose of
VM_WARN_ON() etc. is for things that programmatically _should not
happen_.

Now every single time we run this code we're doing this check.

AND it implies that it is an actual possiblity, at run time, for this to be
the case.

I really don't like this.

Also if it's a runtime check this should be a WARN_ON_ONCE() not a
VM_WARN_ON_ONCE(). Of course you lose the folio output then. So this code
is very confused.

In general I don't think we should be doing this at all, rather we should
just convert the VM_BUG_ON_FOLIO() to a VM_WARN_ON_FOLIO().


> +		}
>
>  		/* See hpage_collapse_scan_pmd(). */
>  		if (folio_maybe_mapped_shared(folio)) {
> @@ -1340,6 +1344,7 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
>  		folio = page_folio(page);
>
>  		if (!folio_test_anon(folio)) {
> +			VM_WARN_ON_FOLIO(true, folio);

Err, what? This is a condition that should never, ever happen to the point
that we warn on it?

This surely is a condition that we expect to happen sometimes otherwise we
wouldn't do this no?

Either way the comments above still apply. Also VM_WARN_ON_FOLIO(true, ...)
is kinda gross... if this is an actual pattern that exists, VM_WARN_FOLIO()
would be preferable.

>  			result = SCAN_PAGE_ANON;
>  			goto out_unmap;
>  		}
> --
> 2.49.0
>


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH mm-new v3 1/3] mm/khugepaged: optimize PTE scanning with if-else-if-else-if chain
  2025-10-14 12:17   ` Lorenzo Stoakes
@ 2025-10-14 12:27     ` David Hildenbrand
  2025-10-15  4:49       ` Lance Yang
  0 siblings, 1 reply; 23+ messages in thread
From: David Hildenbrand @ 2025-10-14 12:27 UTC (permalink / raw)
  To: Lorenzo Stoakes, Lance Yang
  Cc: akpm, ziy, baolin.wang, Liam.Howlett, npache, ryan.roberts,
	dev.jain, baohua, ioworker0, richard.weiyang, linux-kernel,
	linux-mm

On 14.10.25 14:17, Lorenzo Stoakes wrote:
> On Wed, Oct 08, 2025 at 12:37:46PM +0800, Lance Yang wrote:
>> From: Lance Yang <lance.yang@linux.dev>
>>
>> As pointed out by Dev, the PTE checks for disjoint conditions in the
>> scanning loops can be optimized. is_swap_pte, (pte_none && is_zero_pfn),
>> and pte_uffd_wp are mutually exclusive.
> 
> But you're not using is_swap_pte anywhere :) This comes back to my review
> quesiotn on the series this is dependent upon.
> 
>>
>> This patch refactors the loops in both __collapse_huge_page_isolate() and
>> hpage_collapse_scan_pmd() to use a continuous if-else-if-else-if chain
>> instead of separate if blocks. While at it, the redundant pte_present()
>> check before is_zero_pfn() is also removed.
> 
> I mean see review below, I don't see why you're doing this and I am
> unconvinced by how redundant that check is.
> 
> Also this just feels like it should be part of the series where you change
> these? I'm not sure why this is separate?

I think Lance is trying to unify both scanning functions to look alike, 
such that when he refactors them out in patch #3 it looks more straight 
forward.

The missing pte_present() check in hpage_collapse_scan_pmd() is interesting

Likely there is one such check missing there?

-- 
Cheers

David / dhildenb



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH mm-new v3 3/3] mm/khugepaged: merge PTE scanning logic into a new helper
  2025-10-08  4:37 ` [PATCH mm-new v3 3/3] mm/khugepaged: merge PTE scanning logic into a new helper Lance Yang
                     ` (2 preceding siblings ...)
  2025-10-10 13:29   ` Wei Yang
@ 2025-10-14 12:36   ` Lorenzo Stoakes
  2026-06-18 12:22     ` Nico Pache
  2025-10-14 17:41   ` Lorenzo Stoakes
  4 siblings, 1 reply; 23+ messages in thread
From: Lorenzo Stoakes @ 2025-10-14 12:36 UTC (permalink / raw)
  To: Lance Yang
  Cc: akpm, david, ziy, baolin.wang, Liam.Howlett, npache, ryan.roberts,
	dev.jain, baohua, ioworker0, richard.weiyang, linux-kernel,
	linux-mm

On Wed, Oct 08, 2025 at 12:37:48PM +0800, Lance Yang wrote:
> From: Lance Yang <lance.yang@linux.dev>
>
> As David suggested, the PTE scanning logic in hpage_collapse_scan_pmd()
> and __collapse_huge_page_isolate() was almost duplicated.
>
> This patch cleans things up by moving all the common PTE checking logic
> into a new shared helper, thp_collapse_check_pte(). While at it, we use
> vm_normal_folio() instead of vm_normal_page().
>
> Suggested-by: David Hildenbrand <david@redhat.com>
> Suggested-by: Dev Jain <dev.jain@arm.com>
> Signed-off-by: Lance Yang <lance.yang@linux.dev>

In general I like the idea, the implementation needs work though.

Will check in more detail on respin

> ---
>  mm/khugepaged.c | 243 ++++++++++++++++++++++++++----------------------
>  1 file changed, 130 insertions(+), 113 deletions(-)
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index b5c0295c3414..7116caae1fa4 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -61,6 +61,12 @@ enum scan_result {
>  	SCAN_PAGE_FILLED,
>  };
>
> +enum pte_check_result {
> +	PTE_CHECK_SUCCEED,
> +	PTE_CHECK_CONTINUE,

Don't love this logic - this feels like we're essentially abstracting
control flow, I mean what does 'continue' mean here? Other than you're in a
loop and please continue, which is relying a little too much on what the
caller does.

if we retain this logic something like PTE_CHECK_SKIP would make more sense.


> +	PTE_CHECK_FAIL,
> +};
> +
>  #define CREATE_TRACE_POINTS
>  #include <trace/events/huge_memory.h>
>
> @@ -533,62 +539,139 @@ static void release_pte_pages(pte_t *pte, pte_t *_pte,
>  	}
>  }
>
> +/*
> + * thp_collapse_check_pte - Check if a PTE is suitable for THP collapse
> + * @pte:           The PTE to check
> + * @vma:           The VMA the PTE belongs to
> + * @addr:          The virtual address corresponding to this PTE
> + * @foliop:        On success, used to return a pointer to the folio
> + *                 Must be non-NULL
> + * @none_or_zero:  Counter for none/zero PTEs. Must be non-NULL
> + * @unmapped:      Counter for swap PTEs. Can be NULL if not scanning swaps
> + * @shared:        Counter for shared pages. Must be non-NULL
> + * @scan_result:   Used to return the failure reason (SCAN_*) on a
> + *                 PTE_CHECK_FAIL return. Must be non-NULL
> + * @cc:            Collapse control settings

Do we really need this many parameters? THis is hard to follow.

Of course it being me, I'd immediately prefer a helper struct :)

> + *
> + * Returns:
> + *   PTE_CHECK_SUCCEED  - PTE is suitable, proceed with further checks
> + *   PTE_CHECK_CONTINUE - Skip this PTE and continue scanning
> + *   PTE_CHECK_FAIL     - Abort collapse scan
> + */
> +static inline int thp_collapse_check_pte(pte_t pte, struct vm_area_struct *vma,

There's no need for inline in an internal static function in a file.

> +		unsigned long addr, struct folio **foliop, int *none_or_zero,
> +		int *unmapped, int *shared, int *scan_result,
> +		struct collapse_control *cc)
> +{
> +	struct folio *folio = NULL;
> +
> +	if (pte_none(pte) || is_zero_pfn(pte_pfn(pte))) {
> +		(*none_or_zero)++;
> +		if (!userfaultfd_armed(vma) &&
> +		    (!cc->is_khugepaged ||
> +		     *none_or_zero <= khugepaged_max_ptes_none)) {
> +			return PTE_CHECK_CONTINUE;
> +		} else {
> +			*scan_result = SCAN_EXCEED_NONE_PTE;
> +			count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
> +			return PTE_CHECK_FAIL;
> +		}
> +	} else if (!pte_present(pte)) {

You're now making the if-else issues with previous patches worse by
returning which gets even checkpatch to warn you.

	if (pte_none(pte) || is_zero_pfn(pte_pfn(pte))) {

(of course note that I am not convinced you can drop the pte_present()
check here)

		(*none_or_zero)++;
		if (!userfaultfd_armed(vma) &&
		    (!cc->is_khugepaged ||
		     *none_or_zero <= khugepaged_max_ptes_none))
			return PTE_CHECK_CONTINUE;

		*scan_result = SCAN_EXCEED_NONE_PTE;
		count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
		return PTE_CHECK_FAIL;
	}

	if (!pte_present(pte)) {
		...
	}

But even that really needs seriously more refactoring - that condition
above is horrible for instance so, e.g.:

	if (pte_none(pte) || is_zero_pfn(pte_pfn(pte))) {
		(*none_or_zero)++;

		/* Cases where this is acceptable. */
		if (!userfaultfd_armed(vma))
			return PTE_CHECK_SKIP;
		if (!cc->is_khugepaged)
			return PTE_CHECK_SKIP;
		if (*none_or_zero <= khugepaged_max_ptes_none)
			return PTE_CHECK_SKIP;

		/* Otherwise, we must abort the scan. */
		*scan_result = SCAN_EXCEED_NONE_PTE;
		count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
		return PTE_CHECK_FAIL;
	}

	if (!pte_present(pte)) {
		...
	}

Improves things a lot.

I do however _hate_ this (*blah)++; thing. A helper struct would help :)



> +		if (!unmapped) {
> +			*scan_result = SCAN_PTE_NON_PRESENT;
> +			return PTE_CHECK_FAIL;

Can't we have a helper that sets the result, e.g.:

	return pte_check_fail(scan_result, SCAN_PTE_NON_PRESENT);

static enum pte_check_result pte_check_fail(int *scan_result,
		enum pte_check_result result)
{
	*scan_result = result;
	return PTE_CHECK_FAIL;
}

And same for success/skip

Then all of these horrible if (blah) { *foo = bar; return baz; } can be
made into

	if (blah)
		return pte_check_xxx(..., SCAN_PTE_...);

> +		}
> +
> +		if (non_swap_entry(pte_to_swp_entry(pte))) {
> +			*scan_result = SCAN_PTE_NON_PRESENT;
> +			return PTE_CHECK_FAIL;
> +		}
> +
> +		(*unmapped)++;
> +		if (!cc->is_khugepaged ||
> +		    *unmapped <= khugepaged_max_ptes_swap) {
> +			/*
> +			 * Always be strict with uffd-wp enabled swap
> +			 * entries. Please see comment below for
> +			 * pte_uffd_wp().
> +			 */
> +			if (pte_swp_uffd_wp(pte)) {
> +				*scan_result = SCAN_PTE_UFFD_WP;
> +				return PTE_CHECK_FAIL;
> +			}
> +			return PTE_CHECK_CONTINUE;
> +		} else {
> +			*scan_result = SCAN_EXCEED_SWAP_PTE;
> +			count_vm_event(THP_SCAN_EXCEED_SWAP_PTE);
> +			return PTE_CHECK_FAIL;
> +		}
> +	} else if (pte_uffd_wp(pte)) {
> +		/*
> +		 * Don't collapse the page if any of the small PTEs are
> +		 * armed with uffd write protection. Here we can also mark
> +		 * the new huge pmd as write protected if any of the small
> +		 * ones is marked but that could bring unknown userfault
> +		 * messages that falls outside of the registered range.
> +		 * So, just be simple.
> +		 */
> +		*scan_result = SCAN_PTE_UFFD_WP;
> +		return PTE_CHECK_FAIL;
> +	}
> +

Again as all of the comments for previous series still apply here so
obviously I don't like this as-is :)

And as Andrew has pointed out, now checkpatch is finding the 'pointless
else' issue (as mentioned above also).

> +	folio = vm_normal_folio(vma, addr, pte);
> +	if (unlikely(!folio) || unlikely(folio_is_zone_device(folio))) {
> +		*scan_result = SCAN_PAGE_NULL;
> +		return PTE_CHECK_FAIL;
> +	}
> +
> +	if (!folio_test_anon(folio)) {
> +		VM_WARN_ON_FOLIO(true, folio);
> +		*scan_result = SCAN_PAGE_ANON;
> +		return PTE_CHECK_FAIL;
> +	}
> +
> +	/*
> +	 * We treat a single page as shared if any part of the THP
> +	 * is shared.
> +	 */
> +	if (folio_maybe_mapped_shared(folio)) {
> +		(*shared)++;
> +		if (cc->is_khugepaged && *shared > khugepaged_max_ptes_shared) {
> +			*scan_result = SCAN_EXCEED_SHARED_PTE;
> +			count_vm_event(THP_SCAN_EXCEED_SHARED_PTE);
> +			return PTE_CHECK_FAIL;
> +		}
> +	}
> +
> +	*foliop = folio;
> +
> +	return PTE_CHECK_SUCCEED;
> +}
> +
>  static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
>  					unsigned long start_addr,
>  					pte_t *pte,
>  					struct collapse_control *cc,
>  					struct list_head *compound_pagelist)
>  {
> -	struct page *page = NULL;
>  	struct folio *folio = NULL;
>  	unsigned long addr = start_addr;
>  	pte_t *_pte;
>  	int none_or_zero = 0, shared = 0, result = SCAN_FAIL, referenced = 0;
> +	int pte_check_res;
>
>  	for (_pte = pte; _pte < pte + HPAGE_PMD_NR;
>  	     _pte++, addr += PAGE_SIZE) {
>  		pte_t pteval = ptep_get(_pte);
> -		if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
> -			++none_or_zero;
> -			if (!userfaultfd_armed(vma) &&
> -			    (!cc->is_khugepaged ||
> -			     none_or_zero <= khugepaged_max_ptes_none)) {
> -				continue;
> -			} else {
> -				result = SCAN_EXCEED_NONE_PTE;
> -				count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
> -				goto out;
> -			}
> -		} else if (!pte_present(pteval)) {
> -			result = SCAN_PTE_NON_PRESENT;
> -			goto out;
> -		} else if (pte_uffd_wp(pteval)) {
> -			result = SCAN_PTE_UFFD_WP;
> -			goto out;
> -		}
> -		page = vm_normal_page(vma, addr, pteval);
> -		if (unlikely(!page) || unlikely(is_zone_device_page(page))) {
> -			result = SCAN_PAGE_NULL;
> -			goto out;
> -		}
>
> -		folio = page_folio(page);
> -		if (!folio_test_anon(folio)) {
> -			VM_WARN_ON_FOLIO(true, folio);
> -			result = SCAN_PAGE_ANON;
> -			goto out;
> -		}
> +		pte_check_res = thp_collapse_check_pte(pteval, vma, addr,
> +					&folio, &none_or_zero, NULL, &shared,
> +					&result, cc);
>
> -		/* See hpage_collapse_scan_pmd(). */
> -		if (folio_maybe_mapped_shared(folio)) {
> -			++shared;
> -			if (cc->is_khugepaged &&
> -			    shared > khugepaged_max_ptes_shared) {
> -				result = SCAN_EXCEED_SHARED_PTE;
> -				count_vm_event(THP_SCAN_EXCEED_SHARED_PTE);
> -				goto out;
> -			}
> -		}
> +		if (pte_check_res == PTE_CHECK_CONTINUE)
> +			continue;
> +		else if (pte_check_res == PTE_CHECK_FAIL)
> +			goto out;
>
>  		if (folio_test_large(folio)) {
>  			struct folio *f;
> @@ -1264,11 +1347,11 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
>  	pte_t *pte, *_pte;
>  	int result = SCAN_FAIL, referenced = 0;
>  	int none_or_zero = 0, shared = 0;
> -	struct page *page = NULL;
>  	struct folio *folio = NULL;
>  	unsigned long addr;
>  	spinlock_t *ptl;
>  	int node = NUMA_NO_NODE, unmapped = 0;
> +	int pte_check_res;
>
>  	VM_BUG_ON(start_addr & ~HPAGE_PMD_MASK);
>
> @@ -1287,81 +1370,15 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
>  	for (addr = start_addr, _pte = pte; _pte < pte + HPAGE_PMD_NR;
>  	     _pte++, addr += PAGE_SIZE) {
>  		pte_t pteval = ptep_get(_pte);
> -		if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
> -			++none_or_zero;
> -			if (!userfaultfd_armed(vma) &&
> -			    (!cc->is_khugepaged ||
> -			     none_or_zero <= khugepaged_max_ptes_none)) {
> -				continue;
> -			} else {
> -				result = SCAN_EXCEED_NONE_PTE;
> -				count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
> -				goto out_unmap;
> -			}
> -		} else if (!pte_present(pteval)) {
> -			if (non_swap_entry(pte_to_swp_entry(pteval))) {
> -				result = SCAN_PTE_NON_PRESENT;
> -				goto out_unmap;
> -			}
>
> -			++unmapped;
> -			if (!cc->is_khugepaged ||
> -			    unmapped <= khugepaged_max_ptes_swap) {
> -				/*
> -				 * Always be strict with uffd-wp
> -				 * enabled swap entries.  Please see
> -				 * comment below for pte_uffd_wp().
> -				 */
> -				if (pte_swp_uffd_wp(pteval)) {
> -					result = SCAN_PTE_UFFD_WP;
> -					goto out_unmap;
> -				}
> -				continue;
> -			} else {
> -				result = SCAN_EXCEED_SWAP_PTE;
> -				count_vm_event(THP_SCAN_EXCEED_SWAP_PTE);
> -				goto out_unmap;
> -			}
> -		} else if (pte_uffd_wp(pteval)) {
> -			/*
> -			 * Don't collapse the page if any of the small
> -			 * PTEs are armed with uffd write protection.
> -			 * Here we can also mark the new huge pmd as
> -			 * write protected if any of the small ones is
> -			 * marked but that could bring unknown
> -			 * userfault messages that falls outside of
> -			 * the registered range.  So, just be simple.
> -			 */
> -			result = SCAN_PTE_UFFD_WP;
> -			goto out_unmap;
> -		}
> +		pte_check_res = thp_collapse_check_pte(pteval, vma, addr,
> +					&folio, &none_or_zero, &unmapped,
> +					&shared, &result, cc);
>
> -		page = vm_normal_page(vma, addr, pteval);
> -		if (unlikely(!page) || unlikely(is_zone_device_page(page))) {
> -			result = SCAN_PAGE_NULL;
> -			goto out_unmap;
> -		}
> -		folio = page_folio(page);
> -
> -		if (!folio_test_anon(folio)) {
> -			VM_WARN_ON_FOLIO(true, folio);
> -			result = SCAN_PAGE_ANON;
> +		if (pte_check_res == PTE_CHECK_CONTINUE)
> +			continue;
> +		else if (pte_check_res == PTE_CHECK_FAIL)
>  			goto out_unmap;
> -		}
> -
> -		/*
> -		 * We treat a single page as shared if any part of the THP
> -		 * is shared.
> -		 */
> -		if (folio_maybe_mapped_shared(folio)) {
> -			++shared;
> -			if (cc->is_khugepaged &&
> -			    shared > khugepaged_max_ptes_shared) {
> -				result = SCAN_EXCEED_SHARED_PTE;
> -				count_vm_event(THP_SCAN_EXCEED_SHARED_PTE);
> -				goto out_unmap;
> -			}
> -		}
>
>  		/*
>  		 * Record which node the original page is from and save this
> --
> 2.49.0
>


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH mm-new v3 3/3] mm/khugepaged: merge PTE scanning logic into a new helper
  2025-10-08  4:37 ` [PATCH mm-new v3 3/3] mm/khugepaged: merge PTE scanning logic into a new helper Lance Yang
                     ` (3 preceding siblings ...)
  2025-10-14 12:36   ` Lorenzo Stoakes
@ 2025-10-14 17:41   ` Lorenzo Stoakes
  2025-10-15  1:48     ` Lance Yang
  4 siblings, 1 reply; 23+ messages in thread
From: Lorenzo Stoakes @ 2025-10-14 17:41 UTC (permalink / raw)
  To: Lance Yang
  Cc: akpm, david, ziy, baolin.wang, Liam.Howlett, npache, ryan.roberts,
	dev.jain, baohua, ioworker0, richard.weiyang, linux-kernel,
	linux-mm

OK so this patch _is_ based on [0] already being applied then?

I don't see any is_swap_pte() here so presumably so right?

Can we just respin these series and put them together? Because now I've
reviewed a bunch of stuff in [0], which this series depends upon, and
you're saying we should now review on this instead of that and it's a bit
of a mess.

Once review here is dealt with can you combine [0] into this series please?

For [0] I would like you to reinstate the is_swap_pte() conditional as
(copiously :) discussed over there.

[0}: https://lore.kernel.org/all/20251008032657.72406-1-lance.yang@linux.dev/

Thanks, Lorenzo

On Wed, Oct 08, 2025 at 12:37:48PM +0800, Lance Yang wrote:
> From: Lance Yang <lance.yang@linux.dev>
>
> As David suggested, the PTE scanning logic in hpage_collapse_scan_pmd()
> and __collapse_huge_page_isolate() was almost duplicated.
>
> This patch cleans things up by moving all the common PTE checking logic
> into a new shared helper, thp_collapse_check_pte(). While at it, we use
> vm_normal_folio() instead of vm_normal_page().
>
> Suggested-by: David Hildenbrand <david@redhat.com>
> Suggested-by: Dev Jain <dev.jain@arm.com>
> Signed-off-by: Lance Yang <lance.yang@linux.dev>
> ---
>  mm/khugepaged.c | 243 ++++++++++++++++++++++++++----------------------
>  1 file changed, 130 insertions(+), 113 deletions(-)
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index b5c0295c3414..7116caae1fa4 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -61,6 +61,12 @@ enum scan_result {
>  	SCAN_PAGE_FILLED,
>  };
>
> +enum pte_check_result {
> +	PTE_CHECK_SUCCEED,
> +	PTE_CHECK_CONTINUE,
> +	PTE_CHECK_FAIL,
> +};
> +
>  #define CREATE_TRACE_POINTS
>  #include <trace/events/huge_memory.h>
>
> @@ -533,62 +539,139 @@ static void release_pte_pages(pte_t *pte, pte_t *_pte,
>  	}
>  }
>
> +/*
> + * thp_collapse_check_pte - Check if a PTE is suitable for THP collapse
> + * @pte:           The PTE to check
> + * @vma:           The VMA the PTE belongs to
> + * @addr:          The virtual address corresponding to this PTE
> + * @foliop:        On success, used to return a pointer to the folio
> + *                 Must be non-NULL
> + * @none_or_zero:  Counter for none/zero PTEs. Must be non-NULL
> + * @unmapped:      Counter for swap PTEs. Can be NULL if not scanning swaps
> + * @shared:        Counter for shared pages. Must be non-NULL
> + * @scan_result:   Used to return the failure reason (SCAN_*) on a
> + *                 PTE_CHECK_FAIL return. Must be non-NULL
> + * @cc:            Collapse control settings
> + *
> + * Returns:
> + *   PTE_CHECK_SUCCEED  - PTE is suitable, proceed with further checks
> + *   PTE_CHECK_CONTINUE - Skip this PTE and continue scanning
> + *   PTE_CHECK_FAIL     - Abort collapse scan
> + */
> +static inline int thp_collapse_check_pte(pte_t pte, struct vm_area_struct *vma,
> +		unsigned long addr, struct folio **foliop, int *none_or_zero,
> +		int *unmapped, int *shared, int *scan_result,
> +		struct collapse_control *cc)
> +{
> +	struct folio *folio = NULL;
> +
> +	if (pte_none(pte) || is_zero_pfn(pte_pfn(pte))) {
> +		(*none_or_zero)++;
> +		if (!userfaultfd_armed(vma) &&
> +		    (!cc->is_khugepaged ||
> +		     *none_or_zero <= khugepaged_max_ptes_none)) {
> +			return PTE_CHECK_CONTINUE;
> +		} else {
> +			*scan_result = SCAN_EXCEED_NONE_PTE;
> +			count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
> +			return PTE_CHECK_FAIL;
> +		}
> +	} else if (!pte_present(pte)) {
> +		if (!unmapped) {
> +			*scan_result = SCAN_PTE_NON_PRESENT;
> +			return PTE_CHECK_FAIL;
> +		}
> +
> +		if (non_swap_entry(pte_to_swp_entry(pte))) {
> +			*scan_result = SCAN_PTE_NON_PRESENT;
> +			return PTE_CHECK_FAIL;
> +		}
> +
> +		(*unmapped)++;
> +		if (!cc->is_khugepaged ||
> +		    *unmapped <= khugepaged_max_ptes_swap) {
> +			/*
> +			 * Always be strict with uffd-wp enabled swap
> +			 * entries. Please see comment below for
> +			 * pte_uffd_wp().
> +			 */
> +			if (pte_swp_uffd_wp(pte)) {
> +				*scan_result = SCAN_PTE_UFFD_WP;
> +				return PTE_CHECK_FAIL;
> +			}
> +			return PTE_CHECK_CONTINUE;
> +		} else {
> +			*scan_result = SCAN_EXCEED_SWAP_PTE;
> +			count_vm_event(THP_SCAN_EXCEED_SWAP_PTE);
> +			return PTE_CHECK_FAIL;
> +		}
> +	} else if (pte_uffd_wp(pte)) {
> +		/*
> +		 * Don't collapse the page if any of the small PTEs are
> +		 * armed with uffd write protection. Here we can also mark
> +		 * the new huge pmd as write protected if any of the small
> +		 * ones is marked but that could bring unknown userfault
> +		 * messages that falls outside of the registered range.
> +		 * So, just be simple.
> +		 */
> +		*scan_result = SCAN_PTE_UFFD_WP;
> +		return PTE_CHECK_FAIL;
> +	}
> +
> +	folio = vm_normal_folio(vma, addr, pte);
> +	if (unlikely(!folio) || unlikely(folio_is_zone_device(folio))) {
> +		*scan_result = SCAN_PAGE_NULL;
> +		return PTE_CHECK_FAIL;
> +	}
> +
> +	if (!folio_test_anon(folio)) {
> +		VM_WARN_ON_FOLIO(true, folio);
> +		*scan_result = SCAN_PAGE_ANON;
> +		return PTE_CHECK_FAIL;
> +	}
> +
> +	/*
> +	 * We treat a single page as shared if any part of the THP
> +	 * is shared.
> +	 */
> +	if (folio_maybe_mapped_shared(folio)) {
> +		(*shared)++;
> +		if (cc->is_khugepaged && *shared > khugepaged_max_ptes_shared) {
> +			*scan_result = SCAN_EXCEED_SHARED_PTE;
> +			count_vm_event(THP_SCAN_EXCEED_SHARED_PTE);
> +			return PTE_CHECK_FAIL;
> +		}
> +	}
> +
> +	*foliop = folio;
> +
> +	return PTE_CHECK_SUCCEED;
> +}
> +
>  static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
>  					unsigned long start_addr,
>  					pte_t *pte,
>  					struct collapse_control *cc,
>  					struct list_head *compound_pagelist)
>  {
> -	struct page *page = NULL;
>  	struct folio *folio = NULL;
>  	unsigned long addr = start_addr;
>  	pte_t *_pte;
>  	int none_or_zero = 0, shared = 0, result = SCAN_FAIL, referenced = 0;
> +	int pte_check_res;
>
>  	for (_pte = pte; _pte < pte + HPAGE_PMD_NR;
>  	     _pte++, addr += PAGE_SIZE) {
>  		pte_t pteval = ptep_get(_pte);
> -		if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
> -			++none_or_zero;
> -			if (!userfaultfd_armed(vma) &&
> -			    (!cc->is_khugepaged ||
> -			     none_or_zero <= khugepaged_max_ptes_none)) {
> -				continue;
> -			} else {
> -				result = SCAN_EXCEED_NONE_PTE;
> -				count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
> -				goto out;
> -			}
> -		} else if (!pte_present(pteval)) {
> -			result = SCAN_PTE_NON_PRESENT;
> -			goto out;
> -		} else if (pte_uffd_wp(pteval)) {
> -			result = SCAN_PTE_UFFD_WP;
> -			goto out;
> -		}
> -		page = vm_normal_page(vma, addr, pteval);
> -		if (unlikely(!page) || unlikely(is_zone_device_page(page))) {
> -			result = SCAN_PAGE_NULL;
> -			goto out;
> -		}
>
> -		folio = page_folio(page);
> -		if (!folio_test_anon(folio)) {
> -			VM_WARN_ON_FOLIO(true, folio);
> -			result = SCAN_PAGE_ANON;
> -			goto out;
> -		}
> +		pte_check_res = thp_collapse_check_pte(pteval, vma, addr,
> +					&folio, &none_or_zero, NULL, &shared,
> +					&result, cc);
>
> -		/* See hpage_collapse_scan_pmd(). */
> -		if (folio_maybe_mapped_shared(folio)) {
> -			++shared;
> -			if (cc->is_khugepaged &&
> -			    shared > khugepaged_max_ptes_shared) {
> -				result = SCAN_EXCEED_SHARED_PTE;
> -				count_vm_event(THP_SCAN_EXCEED_SHARED_PTE);
> -				goto out;
> -			}
> -		}
> +		if (pte_check_res == PTE_CHECK_CONTINUE)
> +			continue;
> +		else if (pte_check_res == PTE_CHECK_FAIL)
> +			goto out;
>
>  		if (folio_test_large(folio)) {
>  			struct folio *f;
> @@ -1264,11 +1347,11 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
>  	pte_t *pte, *_pte;
>  	int result = SCAN_FAIL, referenced = 0;
>  	int none_or_zero = 0, shared = 0;
> -	struct page *page = NULL;
>  	struct folio *folio = NULL;
>  	unsigned long addr;
>  	spinlock_t *ptl;
>  	int node = NUMA_NO_NODE, unmapped = 0;
> +	int pte_check_res;
>
>  	VM_BUG_ON(start_addr & ~HPAGE_PMD_MASK);
>
> @@ -1287,81 +1370,15 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
>  	for (addr = start_addr, _pte = pte; _pte < pte + HPAGE_PMD_NR;
>  	     _pte++, addr += PAGE_SIZE) {
>  		pte_t pteval = ptep_get(_pte);
> -		if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
> -			++none_or_zero;
> -			if (!userfaultfd_armed(vma) &&
> -			    (!cc->is_khugepaged ||
> -			     none_or_zero <= khugepaged_max_ptes_none)) {
> -				continue;
> -			} else {
> -				result = SCAN_EXCEED_NONE_PTE;
> -				count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
> -				goto out_unmap;
> -			}
> -		} else if (!pte_present(pteval)) {
> -			if (non_swap_entry(pte_to_swp_entry(pteval))) {
> -				result = SCAN_PTE_NON_PRESENT;
> -				goto out_unmap;
> -			}
>
> -			++unmapped;
> -			if (!cc->is_khugepaged ||
> -			    unmapped <= khugepaged_max_ptes_swap) {
> -				/*
> -				 * Always be strict with uffd-wp
> -				 * enabled swap entries.  Please see
> -				 * comment below for pte_uffd_wp().
> -				 */
> -				if (pte_swp_uffd_wp(pteval)) {
> -					result = SCAN_PTE_UFFD_WP;
> -					goto out_unmap;
> -				}
> -				continue;
> -			} else {
> -				result = SCAN_EXCEED_SWAP_PTE;
> -				count_vm_event(THP_SCAN_EXCEED_SWAP_PTE);
> -				goto out_unmap;
> -			}
> -		} else if (pte_uffd_wp(pteval)) {
> -			/*
> -			 * Don't collapse the page if any of the small
> -			 * PTEs are armed with uffd write protection.
> -			 * Here we can also mark the new huge pmd as
> -			 * write protected if any of the small ones is
> -			 * marked but that could bring unknown
> -			 * userfault messages that falls outside of
> -			 * the registered range.  So, just be simple.
> -			 */
> -			result = SCAN_PTE_UFFD_WP;
> -			goto out_unmap;
> -		}
> +		pte_check_res = thp_collapse_check_pte(pteval, vma, addr,
> +					&folio, &none_or_zero, &unmapped,
> +					&shared, &result, cc);
>
> -		page = vm_normal_page(vma, addr, pteval);
> -		if (unlikely(!page) || unlikely(is_zone_device_page(page))) {
> -			result = SCAN_PAGE_NULL;
> -			goto out_unmap;
> -		}
> -		folio = page_folio(page);
> -
> -		if (!folio_test_anon(folio)) {
> -			VM_WARN_ON_FOLIO(true, folio);
> -			result = SCAN_PAGE_ANON;
> +		if (pte_check_res == PTE_CHECK_CONTINUE)
> +			continue;
> +		else if (pte_check_res == PTE_CHECK_FAIL)
>  			goto out_unmap;
> -		}
> -
> -		/*
> -		 * We treat a single page as shared if any part of the THP
> -		 * is shared.
> -		 */
> -		if (folio_maybe_mapped_shared(folio)) {
> -			++shared;
> -			if (cc->is_khugepaged &&
> -			    shared > khugepaged_max_ptes_shared) {
> -				result = SCAN_EXCEED_SHARED_PTE;
> -				count_vm_event(THP_SCAN_EXCEED_SHARED_PTE);
> -				goto out_unmap;
> -			}
> -		}
>
>  		/*
>  		 * Record which node the original page is from and save this
> --
> 2.49.0
>


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH mm-new v3 3/3] mm/khugepaged: merge PTE scanning logic into a new helper
  2025-10-14 17:41   ` Lorenzo Stoakes
@ 2025-10-15  1:48     ` Lance Yang
  0 siblings, 0 replies; 23+ messages in thread
From: Lance Yang @ 2025-10-15  1:48 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: akpm, david, ziy, baolin.wang, Liam.Howlett, npache, ryan.roberts,
	dev.jain, baohua, ioworker0, richard.weiyang, linux-kernel,
	linux-mm



On 2025/10/15 01:41, Lorenzo Stoakes wrote:
> OK so this patch _is_ based on [0] already being applied then?
> 
> I don't see any is_swap_pte() here so presumably so right?
> 
> Can we just respin these series and put them together? Because now I've
> reviewed a bunch of stuff in [0], which this series depends upon, and
> you're saying we should now review on this instead of that and it's a bit
> of a mess.
> 
> Once review here is dealt with can you combine [0] into this series please?

Got it, thanks!

> 
> For [0] I would like you to reinstate the is_swap_pte() conditional as
> (copiously :) discussed over there.
> 
> [0}: https://lore.kernel.org/all/20251008032657.72406-1-lance.yang@linux.dev/
> 
> Thanks, Lorenzo
> 
> On Wed, Oct 08, 2025 at 12:37:48PM +0800, Lance Yang wrote:
>> From: Lance Yang <lance.yang@linux.dev>
>>
>> As David suggested, the PTE scanning logic in hpage_collapse_scan_pmd()
>> and __collapse_huge_page_isolate() was almost duplicated.
>>
>> This patch cleans things up by moving all the common PTE checking logic
>> into a new shared helper, thp_collapse_check_pte(). While at it, we use
>> vm_normal_folio() instead of vm_normal_page().
>>
>> Suggested-by: David Hildenbrand <david@redhat.com>
>> Suggested-by: Dev Jain <dev.jain@arm.com>
>> Signed-off-by: Lance Yang <lance.yang@linux.dev>
>> ---
>>   mm/khugepaged.c | 243 ++++++++++++++++++++++++++----------------------
>>   1 file changed, 130 insertions(+), 113 deletions(-)
>>
>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>> index b5c0295c3414..7116caae1fa4 100644
>> --- a/mm/khugepaged.c
>> +++ b/mm/khugepaged.c
>> @@ -61,6 +61,12 @@ enum scan_result {
>>   	SCAN_PAGE_FILLED,
>>   };
>>
>> +enum pte_check_result {
>> +	PTE_CHECK_SUCCEED,
>> +	PTE_CHECK_CONTINUE,
>> +	PTE_CHECK_FAIL,
>> +};
>> +
>>   #define CREATE_TRACE_POINTS
>>   #include <trace/events/huge_memory.h>
>>
>> @@ -533,62 +539,139 @@ static void release_pte_pages(pte_t *pte, pte_t *_pte,
>>   	}
>>   }
>>
>> +/*
>> + * thp_collapse_check_pte - Check if a PTE is suitable for THP collapse
>> + * @pte:           The PTE to check
>> + * @vma:           The VMA the PTE belongs to
>> + * @addr:          The virtual address corresponding to this PTE
>> + * @foliop:        On success, used to return a pointer to the folio
>> + *                 Must be non-NULL
>> + * @none_or_zero:  Counter for none/zero PTEs. Must be non-NULL
>> + * @unmapped:      Counter for swap PTEs. Can be NULL if not scanning swaps
>> + * @shared:        Counter for shared pages. Must be non-NULL
>> + * @scan_result:   Used to return the failure reason (SCAN_*) on a
>> + *                 PTE_CHECK_FAIL return. Must be non-NULL
>> + * @cc:            Collapse control settings
>> + *
>> + * Returns:
>> + *   PTE_CHECK_SUCCEED  - PTE is suitable, proceed with further checks
>> + *   PTE_CHECK_CONTINUE - Skip this PTE and continue scanning
>> + *   PTE_CHECK_FAIL     - Abort collapse scan
>> + */
>> +static inline int thp_collapse_check_pte(pte_t pte, struct vm_area_struct *vma,
>> +		unsigned long addr, struct folio **foliop, int *none_or_zero,
>> +		int *unmapped, int *shared, int *scan_result,
>> +		struct collapse_control *cc)
>> +{
>> +	struct folio *folio = NULL;
>> +
>> +	if (pte_none(pte) || is_zero_pfn(pte_pfn(pte))) {
>> +		(*none_or_zero)++;
>> +		if (!userfaultfd_armed(vma) &&
>> +		    (!cc->is_khugepaged ||
>> +		     *none_or_zero <= khugepaged_max_ptes_none)) {
>> +			return PTE_CHECK_CONTINUE;
>> +		} else {
>> +			*scan_result = SCAN_EXCEED_NONE_PTE;
>> +			count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
>> +			return PTE_CHECK_FAIL;
>> +		}
>> +	} else if (!pte_present(pte)) {
>> +		if (!unmapped) {
>> +			*scan_result = SCAN_PTE_NON_PRESENT;
>> +			return PTE_CHECK_FAIL;
>> +		}
>> +
>> +		if (non_swap_entry(pte_to_swp_entry(pte))) {
>> +			*scan_result = SCAN_PTE_NON_PRESENT;
>> +			return PTE_CHECK_FAIL;
>> +		}
>> +
>> +		(*unmapped)++;
>> +		if (!cc->is_khugepaged ||
>> +		    *unmapped <= khugepaged_max_ptes_swap) {
>> +			/*
>> +			 * Always be strict with uffd-wp enabled swap
>> +			 * entries. Please see comment below for
>> +			 * pte_uffd_wp().
>> +			 */
>> +			if (pte_swp_uffd_wp(pte)) {
>> +				*scan_result = SCAN_PTE_UFFD_WP;
>> +				return PTE_CHECK_FAIL;
>> +			}
>> +			return PTE_CHECK_CONTINUE;
>> +		} else {
>> +			*scan_result = SCAN_EXCEED_SWAP_PTE;
>> +			count_vm_event(THP_SCAN_EXCEED_SWAP_PTE);
>> +			return PTE_CHECK_FAIL;
>> +		}
>> +	} else if (pte_uffd_wp(pte)) {
>> +		/*
>> +		 * Don't collapse the page if any of the small PTEs are
>> +		 * armed with uffd write protection. Here we can also mark
>> +		 * the new huge pmd as write protected if any of the small
>> +		 * ones is marked but that could bring unknown userfault
>> +		 * messages that falls outside of the registered range.
>> +		 * So, just be simple.
>> +		 */
>> +		*scan_result = SCAN_PTE_UFFD_WP;
>> +		return PTE_CHECK_FAIL;
>> +	}
>> +
>> +	folio = vm_normal_folio(vma, addr, pte);
>> +	if (unlikely(!folio) || unlikely(folio_is_zone_device(folio))) {
>> +		*scan_result = SCAN_PAGE_NULL;
>> +		return PTE_CHECK_FAIL;
>> +	}
>> +
>> +	if (!folio_test_anon(folio)) {
>> +		VM_WARN_ON_FOLIO(true, folio);
>> +		*scan_result = SCAN_PAGE_ANON;
>> +		return PTE_CHECK_FAIL;
>> +	}
>> +
>> +	/*
>> +	 * We treat a single page as shared if any part of the THP
>> +	 * is shared.
>> +	 */
>> +	if (folio_maybe_mapped_shared(folio)) {
>> +		(*shared)++;
>> +		if (cc->is_khugepaged && *shared > khugepaged_max_ptes_shared) {
>> +			*scan_result = SCAN_EXCEED_SHARED_PTE;
>> +			count_vm_event(THP_SCAN_EXCEED_SHARED_PTE);
>> +			return PTE_CHECK_FAIL;
>> +		}
>> +	}
>> +
>> +	*foliop = folio;
>> +
>> +	return PTE_CHECK_SUCCEED;
>> +}
>> +
>>   static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
>>   					unsigned long start_addr,
>>   					pte_t *pte,
>>   					struct collapse_control *cc,
>>   					struct list_head *compound_pagelist)
>>   {
>> -	struct page *page = NULL;
>>   	struct folio *folio = NULL;
>>   	unsigned long addr = start_addr;
>>   	pte_t *_pte;
>>   	int none_or_zero = 0, shared = 0, result = SCAN_FAIL, referenced = 0;
>> +	int pte_check_res;
>>
>>   	for (_pte = pte; _pte < pte + HPAGE_PMD_NR;
>>   	     _pte++, addr += PAGE_SIZE) {
>>   		pte_t pteval = ptep_get(_pte);
>> -		if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
>> -			++none_or_zero;
>> -			if (!userfaultfd_armed(vma) &&
>> -			    (!cc->is_khugepaged ||
>> -			     none_or_zero <= khugepaged_max_ptes_none)) {
>> -				continue;
>> -			} else {
>> -				result = SCAN_EXCEED_NONE_PTE;
>> -				count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
>> -				goto out;
>> -			}
>> -		} else if (!pte_present(pteval)) {
>> -			result = SCAN_PTE_NON_PRESENT;
>> -			goto out;
>> -		} else if (pte_uffd_wp(pteval)) {
>> -			result = SCAN_PTE_UFFD_WP;
>> -			goto out;
>> -		}
>> -		page = vm_normal_page(vma, addr, pteval);
>> -		if (unlikely(!page) || unlikely(is_zone_device_page(page))) {
>> -			result = SCAN_PAGE_NULL;
>> -			goto out;
>> -		}
>>
>> -		folio = page_folio(page);
>> -		if (!folio_test_anon(folio)) {
>> -			VM_WARN_ON_FOLIO(true, folio);
>> -			result = SCAN_PAGE_ANON;
>> -			goto out;
>> -		}
>> +		pte_check_res = thp_collapse_check_pte(pteval, vma, addr,
>> +					&folio, &none_or_zero, NULL, &shared,
>> +					&result, cc);
>>
>> -		/* See hpage_collapse_scan_pmd(). */
>> -		if (folio_maybe_mapped_shared(folio)) {
>> -			++shared;
>> -			if (cc->is_khugepaged &&
>> -			    shared > khugepaged_max_ptes_shared) {
>> -				result = SCAN_EXCEED_SHARED_PTE;
>> -				count_vm_event(THP_SCAN_EXCEED_SHARED_PTE);
>> -				goto out;
>> -			}
>> -		}
>> +		if (pte_check_res == PTE_CHECK_CONTINUE)
>> +			continue;
>> +		else if (pte_check_res == PTE_CHECK_FAIL)
>> +			goto out;
>>
>>   		if (folio_test_large(folio)) {
>>   			struct folio *f;
>> @@ -1264,11 +1347,11 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
>>   	pte_t *pte, *_pte;
>>   	int result = SCAN_FAIL, referenced = 0;
>>   	int none_or_zero = 0, shared = 0;
>> -	struct page *page = NULL;
>>   	struct folio *folio = NULL;
>>   	unsigned long addr;
>>   	spinlock_t *ptl;
>>   	int node = NUMA_NO_NODE, unmapped = 0;
>> +	int pte_check_res;
>>
>>   	VM_BUG_ON(start_addr & ~HPAGE_PMD_MASK);
>>
>> @@ -1287,81 +1370,15 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
>>   	for (addr = start_addr, _pte = pte; _pte < pte + HPAGE_PMD_NR;
>>   	     _pte++, addr += PAGE_SIZE) {
>>   		pte_t pteval = ptep_get(_pte);
>> -		if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
>> -			++none_or_zero;
>> -			if (!userfaultfd_armed(vma) &&
>> -			    (!cc->is_khugepaged ||
>> -			     none_or_zero <= khugepaged_max_ptes_none)) {
>> -				continue;
>> -			} else {
>> -				result = SCAN_EXCEED_NONE_PTE;
>> -				count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
>> -				goto out_unmap;
>> -			}
>> -		} else if (!pte_present(pteval)) {
>> -			if (non_swap_entry(pte_to_swp_entry(pteval))) {
>> -				result = SCAN_PTE_NON_PRESENT;
>> -				goto out_unmap;
>> -			}
>>
>> -			++unmapped;
>> -			if (!cc->is_khugepaged ||
>> -			    unmapped <= khugepaged_max_ptes_swap) {
>> -				/*
>> -				 * Always be strict with uffd-wp
>> -				 * enabled swap entries.  Please see
>> -				 * comment below for pte_uffd_wp().
>> -				 */
>> -				if (pte_swp_uffd_wp(pteval)) {
>> -					result = SCAN_PTE_UFFD_WP;
>> -					goto out_unmap;
>> -				}
>> -				continue;
>> -			} else {
>> -				result = SCAN_EXCEED_SWAP_PTE;
>> -				count_vm_event(THP_SCAN_EXCEED_SWAP_PTE);
>> -				goto out_unmap;
>> -			}
>> -		} else if (pte_uffd_wp(pteval)) {
>> -			/*
>> -			 * Don't collapse the page if any of the small
>> -			 * PTEs are armed with uffd write protection.
>> -			 * Here we can also mark the new huge pmd as
>> -			 * write protected if any of the small ones is
>> -			 * marked but that could bring unknown
>> -			 * userfault messages that falls outside of
>> -			 * the registered range.  So, just be simple.
>> -			 */
>> -			result = SCAN_PTE_UFFD_WP;
>> -			goto out_unmap;
>> -		}
>> +		pte_check_res = thp_collapse_check_pte(pteval, vma, addr,
>> +					&folio, &none_or_zero, &unmapped,
>> +					&shared, &result, cc);
>>
>> -		page = vm_normal_page(vma, addr, pteval);
>> -		if (unlikely(!page) || unlikely(is_zone_device_page(page))) {
>> -			result = SCAN_PAGE_NULL;
>> -			goto out_unmap;
>> -		}
>> -		folio = page_folio(page);
>> -
>> -		if (!folio_test_anon(folio)) {
>> -			VM_WARN_ON_FOLIO(true, folio);
>> -			result = SCAN_PAGE_ANON;
>> +		if (pte_check_res == PTE_CHECK_CONTINUE)
>> +			continue;
>> +		else if (pte_check_res == PTE_CHECK_FAIL)
>>   			goto out_unmap;
>> -		}
>> -
>> -		/*
>> -		 * We treat a single page as shared if any part of the THP
>> -		 * is shared.
>> -		 */
>> -		if (folio_maybe_mapped_shared(folio)) {
>> -			++shared;
>> -			if (cc->is_khugepaged &&
>> -			    shared > khugepaged_max_ptes_shared) {
>> -				result = SCAN_EXCEED_SHARED_PTE;
>> -				count_vm_event(THP_SCAN_EXCEED_SHARED_PTE);
>> -				goto out_unmap;
>> -			}
>> -		}
>>
>>   		/*
>>   		 * Record which node the original page is from and save this
>> --
>> 2.49.0
>>



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH mm-new v3 1/3] mm/khugepaged: optimize PTE scanning with if-else-if-else-if chain
  2025-10-14 12:27     ` David Hildenbrand
@ 2025-10-15  4:49       ` Lance Yang
  2025-10-15  9:16         ` Lorenzo Stoakes
  0 siblings, 1 reply; 23+ messages in thread
From: Lance Yang @ 2025-10-15  4:49 UTC (permalink / raw)
  To: David Hildenbrand, Lorenzo Stoakes
  Cc: akpm, ziy, baolin.wang, Liam.Howlett, npache, ryan.roberts,
	dev.jain, baohua, ioworker0, richard.weiyang, linux-kernel,
	linux-mm



On 2025/10/14 20:27, David Hildenbrand wrote:
> On 14.10.25 14:17, Lorenzo Stoakes wrote:
>> On Wed, Oct 08, 2025 at 12:37:46PM +0800, Lance Yang wrote:
>>> From: Lance Yang <lance.yang@linux.dev>
>>>
>>> As pointed out by Dev, the PTE checks for disjoint conditions in the
>>> scanning loops can be optimized. is_swap_pte, (pte_none && is_zero_pfn),
>>> and pte_uffd_wp are mutually exclusive.
>>
>> But you're not using is_swap_pte anywhere :) This comes back to my review
>> quesiotn on the series this is dependent upon.
>>
>>>
>>> This patch refactors the loops in both __collapse_huge_page_isolate() 
>>> and
>>> hpage_collapse_scan_pmd() to use a continuous if-else-if-else-if chain
>>> instead of separate if blocks. While at it, the redundant pte_present()
>>> check before is_zero_pfn() is also removed.
>>
>> I mean see review below, I don't see why you're doing this and I am
>> unconvinced by how redundant that check is.

Ah, good catch! Lorenzo, thanks!!!

>>
>> Also this just feels like it should be part of the series where you 
>> change
>> these? I'm not sure why this is separate?
> 
> I think Lance is trying to unify both scanning functions to look alike, 
> such that when he refactors them out in patch #3 it looks more straight 
> forward.
> 
> The missing pte_present() check in hpage_collapse_scan_pmd() is interesting

Yep, indeed ;)

> 
> Likely there is one such check missing there?

I think the risk is exactly how pte_pfn() would handle a swap PTE ...

A swap PTE contains completely different data(swap type and offset).
pte_pfn() doesn't know this, so if we feed a swap entry to it, it will
spit out a junk PFN :)

What if that junk PFN happens to match the zeropage's PFN by sheer
chance? IHMO, it's really unlikely, but it would be really bad if it did.

Clearly, pte_present() prevents this ;)

By the way, I noticed there are other places in khugepaged.c that
call pte_pfn() without being under a pte_present() check.

Perhaps those should all be fixed as well in a separate patch?


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH mm-new v3 1/3] mm/khugepaged: optimize PTE scanning with if-else-if-else-if chain
  2025-10-15  4:49       ` Lance Yang
@ 2025-10-15  9:16         ` Lorenzo Stoakes
  2025-10-15  9:31           ` Lance Yang
  0 siblings, 1 reply; 23+ messages in thread
From: Lorenzo Stoakes @ 2025-10-15  9:16 UTC (permalink / raw)
  To: Lance Yang
  Cc: David Hildenbrand, akpm, ziy, baolin.wang, Liam.Howlett, npache,
	ryan.roberts, dev.jain, baohua, ioworker0, richard.weiyang,
	linux-kernel, linux-mm

On Wed, Oct 15, 2025 at 12:49:26PM +0800, Lance Yang wrote:
>
>
> On 2025/10/14 20:27, David Hildenbrand wrote:
> > On 14.10.25 14:17, Lorenzo Stoakes wrote:
> > > On Wed, Oct 08, 2025 at 12:37:46PM +0800, Lance Yang wrote:
> > > > From: Lance Yang <lance.yang@linux.dev>
> > > >
> > > > As pointed out by Dev, the PTE checks for disjoint conditions in the
> > > > scanning loops can be optimized. is_swap_pte, (pte_none && is_zero_pfn),
> > > > and pte_uffd_wp are mutually exclusive.
> > >
> > > But you're not using is_swap_pte anywhere :) This comes back to my review
> > > quesiotn on the series this is dependent upon.
> > >
> > > >
> > > > This patch refactors the loops in both
> > > > __collapse_huge_page_isolate() and
> > > > hpage_collapse_scan_pmd() to use a continuous if-else-if-else-if chain
> > > > instead of separate if blocks. While at it, the redundant pte_present()
> > > > check before is_zero_pfn() is also removed.
> > >
> > > I mean see review below, I don't see why you're doing this and I am
> > > unconvinced by how redundant that check is.
>
> Ah, good catch! Lorenzo, thanks!!!
>
> > >
> > > Also this just feels like it should be part of the series where you
> > > change
> > > these? I'm not sure why this is separate?
> >
> > I think Lance is trying to unify both scanning functions to look alike,
> > such that when he refactors them out in patch #3 it looks more straight
> > forward.
> >
> > The missing pte_present() check in hpage_collapse_scan_pmd() is interesting
>
> Yep, indeed ;)
>
> >
> > Likely there is one such check missing there?
>
> I think the risk is exactly how pte_pfn() would handle a swap PTE ...
>
> A swap PTE contains completely different data(swap type and offset).
> pte_pfn() doesn't know this, so if we feed a swap entry to it, it will
> spit out a junk PFN :)
>
> What if that junk PFN happens to match the zeropage's PFN by sheer
> chance? IHMO, it's really unlikely, but it would be really bad if it did.
>
> Clearly, pte_present() prevents this ;)

Yeah, not so clearly kinda the whole point I'm making here. I mean all this code
sucks because we have deeply nested conditions and you have to keep in your mind
that 'oh we already checked for X so we can do Y'.

But the THP code is horrible in general.

Anyway let's stay focused (so I can stand a chance of catching up with my
post-vacation backlog :), I will check the respin once you send!

>
> By the way, I noticed there are other places in khugepaged.c that
> call pte_pfn() without being under a pte_present() check.
>
> Perhaps those should all be fixed as well in a separate patch?

Yeah I'm not surprised and sure indeed.

Thanks, Lorenzo


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH mm-new v3 1/3] mm/khugepaged: optimize PTE scanning with if-else-if-else-if chain
  2025-10-15  9:16         ` Lorenzo Stoakes
@ 2025-10-15  9:31           ` Lance Yang
  0 siblings, 0 replies; 23+ messages in thread
From: Lance Yang @ 2025-10-15  9:31 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: David Hildenbrand, akpm, ziy, baolin.wang, Liam.Howlett, npache,
	ryan.roberts, dev.jain, baohua, ioworker0, richard.weiyang,
	linux-kernel, linux-mm



On 2025/10/15 17:16, Lorenzo Stoakes wrote:
> On Wed, Oct 15, 2025 at 12:49:26PM +0800, Lance Yang wrote:
>>
>>
>> On 2025/10/14 20:27, David Hildenbrand wrote:
>>> On 14.10.25 14:17, Lorenzo Stoakes wrote:
>>>> On Wed, Oct 08, 2025 at 12:37:46PM +0800, Lance Yang wrote:
>>>>> From: Lance Yang <lance.yang@linux.dev>
>>>>>
>>>>> As pointed out by Dev, the PTE checks for disjoint conditions in the
>>>>> scanning loops can be optimized. is_swap_pte, (pte_none && is_zero_pfn),
>>>>> and pte_uffd_wp are mutually exclusive.
>>>>
>>>> But you're not using is_swap_pte anywhere :) This comes back to my review
>>>> quesiotn on the series this is dependent upon.
>>>>
>>>>>
>>>>> This patch refactors the loops in both
>>>>> __collapse_huge_page_isolate() and
>>>>> hpage_collapse_scan_pmd() to use a continuous if-else-if-else-if chain
>>>>> instead of separate if blocks. While at it, the redundant pte_present()
>>>>> check before is_zero_pfn() is also removed.
>>>>
>>>> I mean see review below, I don't see why you're doing this and I am
>>>> unconvinced by how redundant that check is.
>>
>> Ah, good catch! Lorenzo, thanks!!!
>>
>>>>
>>>> Also this just feels like it should be part of the series where you
>>>> change
>>>> these? I'm not sure why this is separate?
>>>
>>> I think Lance is trying to unify both scanning functions to look alike,
>>> such that when he refactors them out in patch #3 it looks more straight
>>> forward.
>>>
>>> The missing pte_present() check in hpage_collapse_scan_pmd() is interesting
>>
>> Yep, indeed ;)
>>
>>>
>>> Likely there is one such check missing there?
>>
>> I think the risk is exactly how pte_pfn() would handle a swap PTE ...
>>
>> A swap PTE contains completely different data(swap type and offset).
>> pte_pfn() doesn't know this, so if we feed a swap entry to it, it will
>> spit out a junk PFN :)
>>
>> What if that junk PFN happens to match the zeropage's PFN by sheer
>> chance? IHMO, it's really unlikely, but it would be really bad if it did.
>>
>> Clearly, pte_present() prevents this ;)
> 
> Yeah, not so clearly kinda the whole point I'm making here. I mean all this code
> sucks because we have deeply nested conditions and you have to keep in your mind
> that 'oh we already checked for X so we can do Y'.

I see :)

> 
> But the THP code is horrible in general.

Yep, you'll get no argument from me on that one ;p

The code is indeed tricky ...

> 
> Anyway let's stay focused (so I can stand a chance of catching up with my
> post-vacation backlog :), I will check the respin once you send!

Cheers!

This series has been dropped from mm-new. I'm going to take a bit to
rethink the approach based on the feedback.

> 
>>
>> By the way, I noticed there are other places in khugepaged.c that
>> call pte_pfn() without being under a pte_present() check.
>>
>> Perhaps those should all be fixed as well in a separate patch?
> 
> Yeah I'm not surprised and sure indeed.
> 
> Thanks, Lorenzo

In the meantime, I'll send up a separate patch for the missing
pte_present() checks first ;)

Thanks,
Lance


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH mm-new v3 3/3] mm/khugepaged: merge PTE scanning logic into a new helper
  2025-10-14 12:36   ` Lorenzo Stoakes
@ 2026-06-18 12:22     ` Nico Pache
  2026-06-18 13:10       ` Lance Yang
  0 siblings, 1 reply; 23+ messages in thread
From: Nico Pache @ 2026-06-18 12:22 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Lance Yang, akpm, david, ziy, baolin.wang, Liam.Howlett,
	ryan.roberts, dev.jain, baohua, ioworker0, richard.weiyang,
	linux-kernel, linux-mm

On Tue, Oct 14, 2025 at 6:37 AM Lorenzo Stoakes
<lorenzo.stoakes@oracle.com> wrote:
>
> On Wed, Oct 08, 2025 at 12:37:48PM +0800, Lance Yang wrote:
> > From: Lance Yang <lance.yang@linux.dev>
> >
> > As David suggested, the PTE scanning logic in hpage_collapse_scan_pmd()
> > and __collapse_huge_page_isolate() was almost duplicated.
> >
> > This patch cleans things up by moving all the common PTE checking logic
> > into a new shared helper, thp_collapse_check_pte(). While at it, we use
> > vm_normal_folio() instead of vm_normal_page().
> >
> > Suggested-by: David Hildenbrand <david@redhat.com>
> > Suggested-by: Dev Jain <dev.jain@arm.com>
> > Signed-off-by: Lance Yang <lance.yang@linux.dev>
>
> In general I like the idea, the implementation needs work though.
>
> Will check in more detail on respin
>
> > ---
> >  mm/khugepaged.c | 243 ++++++++++++++++++++++++++----------------------
> >  1 file changed, 130 insertions(+), 113 deletions(-)
> >
> > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > index b5c0295c3414..7116caae1fa4 100644
> > --- a/mm/khugepaged.c
> > +++ b/mm/khugepaged.c
> > @@ -61,6 +61,12 @@ enum scan_result {
> >       SCAN_PAGE_FILLED,
> >  };
> >
> > +enum pte_check_result {
> > +     PTE_CHECK_SUCCEED,
> > +     PTE_CHECK_CONTINUE,
>
> Don't love this logic - this feels like we're essentially abstracting
> control flow, I mean what does 'continue' mean here? Other than you're in a
> loop and please continue, which is relying a little too much on what the
> caller does.
>
> if we retain this logic something like PTE_CHECK_SKIP would make more sense.
>
>
> > +     PTE_CHECK_FAIL,
> > +};
> > +
> >  #define CREATE_TRACE_POINTS
> >  #include <trace/events/huge_memory.h>
> >
> > @@ -533,62 +539,139 @@ static void release_pte_pages(pte_t *pte, pte_t *_pte,
> >       }
> >  }
> >
> > +/*
> > + * thp_collapse_check_pte - Check if a PTE is suitable for THP collapse
> > + * @pte:           The PTE to check
> > + * @vma:           The VMA the PTE belongs to
> > + * @addr:          The virtual address corresponding to this PTE
> > + * @foliop:        On success, used to return a pointer to the folio
> > + *                 Must be non-NULL
> > + * @none_or_zero:  Counter for none/zero PTEs. Must be non-NULL
> > + * @unmapped:      Counter for swap PTEs. Can be NULL if not scanning swaps
> > + * @shared:        Counter for shared pages. Must be non-NULL
> > + * @scan_result:   Used to return the failure reason (SCAN_*) on a
> > + *                 PTE_CHECK_FAIL return. Must be non-NULL
> > + * @cc:            Collapse control settings
>
> Do we really need this many parameters? THis is hard to follow.
>
> Of course it being me, I'd immediately prefer a helper struct :)

Hey!

Ive been trying to reimplement this patch, and started converting this
to a helper struct... At first glance it seems like the right
approach, but because the scan/isolate share two slightly different
implementations, we need to leave some logic in the parent. This
results in polluting the rest of the code significantly just to save
one ugly function call with many parameters.

not sure what to do about that tbh.

e.g., (just one of many examples)


-                               NR_ISOLATED_ANON + folio_is_file_lru(folio),
-                               folio_nr_pages(folio));
-               VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
-               VM_BUG_ON_FOLIO(folio_test_lru(folio), folio);
+                               NR_ISOLATED_ANON + folio_is_file_lru(ctx.folio),
+                               folio_nr_pages(ctx.folio));
+               VM_BUG_ON_FOLIO(!folio_test_locked(ctx.folio), ctx.folio);
+               VM_BUG_ON_FOLIO(folio_test_lru(ctx.folio), ctx.folio);

-               if (folio_test_large(folio))
-                       list_add_tail(&folio->lru, compound_pagelist);
+               if (folio_test_large(ctx.folio))
+                       list_add_tail(ctx.folio->lru, compound_pagelist);

Are we sure this helper structure is the right approach?

-- Nico

>
> > + *
> > + * Returns:
> > + *   PTE_CHECK_SUCCEED  - PTE is suitable, proceed with further checks
> > + *   PTE_CHECK_CONTINUE - Skip this PTE and continue scanning
> > + *   PTE_CHECK_FAIL     - Abort collapse scan
> > + */
> > +static inline int thp_collapse_check_pte(pte_t pte, struct vm_area_struct *vma,
>
> There's no need for inline in an internal static function in a file.
>
> > +             unsigned long addr, struct folio **foliop, int *none_or_zero,
> > +             int *unmapped, int *shared, int *scan_result,
> > +             struct collapse_control *cc)
> > +{
> > +     struct folio *folio = NULL;
> > +
> > +     if (pte_none(pte) || is_zero_pfn(pte_pfn(pte))) {
> > +             (*none_or_zero)++;
> > +             if (!userfaultfd_armed(vma) &&
> > +                 (!cc->is_khugepaged ||
> > +                  *none_or_zero <= khugepaged_max_ptes_none)) {
> > +                     return PTE_CHECK_CONTINUE;
> > +             } else {
> > +                     *scan_result = SCAN_EXCEED_NONE_PTE;
> > +                     count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
> > +                     return PTE_CHECK_FAIL;
> > +             }
> > +     } else if (!pte_present(pte)) {
>
> You're now making the if-else issues with previous patches worse by
> returning which gets even checkpatch to warn you.
>
>         if (pte_none(pte) || is_zero_pfn(pte_pfn(pte))) {
>
> (of course note that I am not convinced you can drop the pte_present()
> check here)
>
>                 (*none_or_zero)++;
>                 if (!userfaultfd_armed(vma) &&
>                     (!cc->is_khugepaged ||
>                      *none_or_zero <= khugepaged_max_ptes_none))
>                         return PTE_CHECK_CONTINUE;
>
>                 *scan_result = SCAN_EXCEED_NONE_PTE;
>                 count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
>                 return PTE_CHECK_FAIL;
>         }
>
>         if (!pte_present(pte)) {
>                 ...
>         }
>
> But even that really needs seriously more refactoring - that condition
> above is horrible for instance so, e.g.:
>
>         if (pte_none(pte) || is_zero_pfn(pte_pfn(pte))) {
>                 (*none_or_zero)++;
>
>                 /* Cases where this is acceptable. */
>                 if (!userfaultfd_armed(vma))
>                         return PTE_CHECK_SKIP;
>                 if (!cc->is_khugepaged)
>                         return PTE_CHECK_SKIP;
>                 if (*none_or_zero <= khugepaged_max_ptes_none)
>                         return PTE_CHECK_SKIP;
>
>                 /* Otherwise, we must abort the scan. */
>                 *scan_result = SCAN_EXCEED_NONE_PTE;
>                 count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
>                 return PTE_CHECK_FAIL;
>         }
>
>         if (!pte_present(pte)) {
>                 ...
>         }
>
> Improves things a lot.
>
> I do however _hate_ this (*blah)++; thing. A helper struct would help :)
>
>
>
> > +             if (!unmapped) {
> > +                     *scan_result = SCAN_PTE_NON_PRESENT;
> > +                     return PTE_CHECK_FAIL;
>
> Can't we have a helper that sets the result, e.g.:
>
>         return pte_check_fail(scan_result, SCAN_PTE_NON_PRESENT);
>
> static enum pte_check_result pte_check_fail(int *scan_result,
>                 enum pte_check_result result)
> {
>         *scan_result = result;
>         return PTE_CHECK_FAIL;
> }
>
> And same for success/skip
>
> Then all of these horrible if (blah) { *foo = bar; return baz; } can be
> made into
>
>         if (blah)
>                 return pte_check_xxx(..., SCAN_PTE_...);
>
> > +             }
> > +
> > +             if (non_swap_entry(pte_to_swp_entry(pte))) {
> > +                     *scan_result = SCAN_PTE_NON_PRESENT;
> > +                     return PTE_CHECK_FAIL;
> > +             }
> > +
> > +             (*unmapped)++;
> > +             if (!cc->is_khugepaged ||
> > +                 *unmapped <= khugepaged_max_ptes_swap) {
> > +                     /*
> > +                      * Always be strict with uffd-wp enabled swap
> > +                      * entries. Please see comment below for
> > +                      * pte_uffd_wp().
> > +                      */
> > +                     if (pte_swp_uffd_wp(pte)) {
> > +                             *scan_result = SCAN_PTE_UFFD_WP;
> > +                             return PTE_CHECK_FAIL;
> > +                     }
> > +                     return PTE_CHECK_CONTINUE;
> > +             } else {
> > +                     *scan_result = SCAN_EXCEED_SWAP_PTE;
> > +                     count_vm_event(THP_SCAN_EXCEED_SWAP_PTE);
> > +                     return PTE_CHECK_FAIL;
> > +             }
> > +     } else if (pte_uffd_wp(pte)) {
> > +             /*
> > +              * Don't collapse the page if any of the small PTEs are
> > +              * armed with uffd write protection. Here we can also mark
> > +              * the new huge pmd as write protected if any of the small
> > +              * ones is marked but that could bring unknown userfault
> > +              * messages that falls outside of the registered range.
> > +              * So, just be simple.
> > +              */
> > +             *scan_result = SCAN_PTE_UFFD_WP;
> > +             return PTE_CHECK_FAIL;
> > +     }
> > +
>
> Again as all of the comments for previous series still apply here so
> obviously I don't like this as-is :)
>
> And as Andrew has pointed out, now checkpatch is finding the 'pointless
> else' issue (as mentioned above also).
>
> > +     folio = vm_normal_folio(vma, addr, pte);
> > +     if (unlikely(!folio) || unlikely(folio_is_zone_device(folio))) {
> > +             *scan_result = SCAN_PAGE_NULL;
> > +             return PTE_CHECK_FAIL;
> > +     }
> > +
> > +     if (!folio_test_anon(folio)) {
> > +             VM_WARN_ON_FOLIO(true, folio);
> > +             *scan_result = SCAN_PAGE_ANON;
> > +             return PTE_CHECK_FAIL;
> > +     }
> > +
> > +     /*
> > +      * We treat a single page as shared if any part of the THP
> > +      * is shared.
> > +      */
> > +     if (folio_maybe_mapped_shared(folio)) {
> > +             (*shared)++;
> > +             if (cc->is_khugepaged && *shared > khugepaged_max_ptes_shared) {
> > +                     *scan_result = SCAN_EXCEED_SHARED_PTE;
> > +                     count_vm_event(THP_SCAN_EXCEED_SHARED_PTE);
> > +                     return PTE_CHECK_FAIL;
> > +             }
> > +     }
> > +
> > +     *foliop = folio;
> > +
> > +     return PTE_CHECK_SUCCEED;
> > +}
> > +
> >  static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
> >                                       unsigned long start_addr,
> >                                       pte_t *pte,
> >                                       struct collapse_control *cc,
> >                                       struct list_head *compound_pagelist)
> >  {
> > -     struct page *page = NULL;
> >       struct folio *folio = NULL;
> >       unsigned long addr = start_addr;
> >       pte_t *_pte;
> >       int none_or_zero = 0, shared = 0, result = SCAN_FAIL, referenced = 0;
> > +     int pte_check_res;
> >
> >       for (_pte = pte; _pte < pte + HPAGE_PMD_NR;
> >            _pte++, addr += PAGE_SIZE) {
> >               pte_t pteval = ptep_get(_pte);
> > -             if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
> > -                     ++none_or_zero;
> > -                     if (!userfaultfd_armed(vma) &&
> > -                         (!cc->is_khugepaged ||
> > -                          none_or_zero <= khugepaged_max_ptes_none)) {
> > -                             continue;
> > -                     } else {
> > -                             result = SCAN_EXCEED_NONE_PTE;
> > -                             count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
> > -                             goto out;
> > -                     }
> > -             } else if (!pte_present(pteval)) {
> > -                     result = SCAN_PTE_NON_PRESENT;
> > -                     goto out;
> > -             } else if (pte_uffd_wp(pteval)) {
> > -                     result = SCAN_PTE_UFFD_WP;
> > -                     goto out;
> > -             }
> > -             page = vm_normal_page(vma, addr, pteval);
> > -             if (unlikely(!page) || unlikely(is_zone_device_page(page))) {
> > -                     result = SCAN_PAGE_NULL;
> > -                     goto out;
> > -             }
> >
> > -             folio = page_folio(page);
> > -             if (!folio_test_anon(folio)) {
> > -                     VM_WARN_ON_FOLIO(true, folio);
> > -                     result = SCAN_PAGE_ANON;
> > -                     goto out;
> > -             }
> > +             pte_check_res = thp_collapse_check_pte(pteval, vma, addr,
> > +                                     &folio, &none_or_zero, NULL, &shared,
> > +                                     &result, cc);
> >
> > -             /* See hpage_collapse_scan_pmd(). */
> > -             if (folio_maybe_mapped_shared(folio)) {
> > -                     ++shared;
> > -                     if (cc->is_khugepaged &&
> > -                         shared > khugepaged_max_ptes_shared) {
> > -                             result = SCAN_EXCEED_SHARED_PTE;
> > -                             count_vm_event(THP_SCAN_EXCEED_SHARED_PTE);
> > -                             goto out;
> > -                     }
> > -             }
> > +             if (pte_check_res == PTE_CHECK_CONTINUE)
> > +                     continue;
> > +             else if (pte_check_res == PTE_CHECK_FAIL)
> > +                     goto out;
> >
> >               if (folio_test_large(folio)) {
> >                       struct folio *f;
> > @@ -1264,11 +1347,11 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
> >       pte_t *pte, *_pte;
> >       int result = SCAN_FAIL, referenced = 0;
> >       int none_or_zero = 0, shared = 0;
> > -     struct page *page = NULL;
> >       struct folio *folio = NULL;
> >       unsigned long addr;
> >       spinlock_t *ptl;
> >       int node = NUMA_NO_NODE, unmapped = 0;
> > +     int pte_check_res;
> >
> >       VM_BUG_ON(start_addr & ~HPAGE_PMD_MASK);
> >
> > @@ -1287,81 +1370,15 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
> >       for (addr = start_addr, _pte = pte; _pte < pte + HPAGE_PMD_NR;
> >            _pte++, addr += PAGE_SIZE) {
> >               pte_t pteval = ptep_get(_pte);
> > -             if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
> > -                     ++none_or_zero;
> > -                     if (!userfaultfd_armed(vma) &&
> > -                         (!cc->is_khugepaged ||
> > -                          none_or_zero <= khugepaged_max_ptes_none)) {
> > -                             continue;
> > -                     } else {
> > -                             result = SCAN_EXCEED_NONE_PTE;
> > -                             count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
> > -                             goto out_unmap;
> > -                     }
> > -             } else if (!pte_present(pteval)) {
> > -                     if (non_swap_entry(pte_to_swp_entry(pteval))) {
> > -                             result = SCAN_PTE_NON_PRESENT;
> > -                             goto out_unmap;
> > -                     }
> >
> > -                     ++unmapped;
> > -                     if (!cc->is_khugepaged ||
> > -                         unmapped <= khugepaged_max_ptes_swap) {
> > -                             /*
> > -                              * Always be strict with uffd-wp
> > -                              * enabled swap entries.  Please see
> > -                              * comment below for pte_uffd_wp().
> > -                              */
> > -                             if (pte_swp_uffd_wp(pteval)) {
> > -                                     result = SCAN_PTE_UFFD_WP;
> > -                                     goto out_unmap;
> > -                             }
> > -                             continue;
> > -                     } else {
> > -                             result = SCAN_EXCEED_SWAP_PTE;
> > -                             count_vm_event(THP_SCAN_EXCEED_SWAP_PTE);
> > -                             goto out_unmap;
> > -                     }
> > -             } else if (pte_uffd_wp(pteval)) {
> > -                     /*
> > -                      * Don't collapse the page if any of the small
> > -                      * PTEs are armed with uffd write protection.
> > -                      * Here we can also mark the new huge pmd as
> > -                      * write protected if any of the small ones is
> > -                      * marked but that could bring unknown
> > -                      * userfault messages that falls outside of
> > -                      * the registered range.  So, just be simple.
> > -                      */
> > -                     result = SCAN_PTE_UFFD_WP;
> > -                     goto out_unmap;
> > -             }
> > +             pte_check_res = thp_collapse_check_pte(pteval, vma, addr,
> > +                                     &folio, &none_or_zero, &unmapped,
> > +                                     &shared, &result, cc);
> >
> > -             page = vm_normal_page(vma, addr, pteval);
> > -             if (unlikely(!page) || unlikely(is_zone_device_page(page))) {
> > -                     result = SCAN_PAGE_NULL;
> > -                     goto out_unmap;
> > -             }
> > -             folio = page_folio(page);
> > -
> > -             if (!folio_test_anon(folio)) {
> > -                     VM_WARN_ON_FOLIO(true, folio);
> > -                     result = SCAN_PAGE_ANON;
> > +             if (pte_check_res == PTE_CHECK_CONTINUE)
> > +                     continue;
> > +             else if (pte_check_res == PTE_CHECK_FAIL)
> >                       goto out_unmap;
> > -             }
> > -
> > -             /*
> > -              * We treat a single page as shared if any part of the THP
> > -              * is shared.
> > -              */
> > -             if (folio_maybe_mapped_shared(folio)) {
> > -                     ++shared;
> > -                     if (cc->is_khugepaged &&
> > -                         shared > khugepaged_max_ptes_shared) {
> > -                             result = SCAN_EXCEED_SHARED_PTE;
> > -                             count_vm_event(THP_SCAN_EXCEED_SHARED_PTE);
> > -                             goto out_unmap;
> > -                     }
> > -             }
> >
> >               /*
> >                * Record which node the original page is from and save this
> > --
> > 2.49.0
> >
>



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH mm-new v3 3/3] mm/khugepaged: merge PTE scanning logic into a new helper
  2026-06-18 12:22     ` Nico Pache
@ 2026-06-18 13:10       ` Lance Yang
  2026-06-18 13:19         ` Nico Pache
  0 siblings, 1 reply; 23+ messages in thread
From: Lance Yang @ 2026-06-18 13:10 UTC (permalink / raw)
  To: npache
  Cc: lance.yang, akpm, ziy, baolin.wang, Liam.Howlett, ryan.roberts,
	dev.jain, baohua, richard.weiyang, linux-kernel, linux-mm, david,
	ljs

+Cc David and Lorenzo, address oops fixed :)

On Thu, Jun 18, 2026 at 06:22:49AM -0600, Nico Pache wrote:
>On Tue, Oct 14, 2025 at 6:37 AM Lorenzo Stoakes
><lorenzo.stoakes@oracle.com> wrote:
>>
>> On Wed, Oct 08, 2025 at 12:37:48PM +0800, Lance Yang wrote:
>> > From: Lance Yang <lance.yang@linux.dev>
>> >
>> > As David suggested, the PTE scanning logic in hpage_collapse_scan_pmd()
>> > and __collapse_huge_page_isolate() was almost duplicated.
>> >
>> > This patch cleans things up by moving all the common PTE checking logic
>> > into a new shared helper, thp_collapse_check_pte(). While at it, we use
>> > vm_normal_folio() instead of vm_normal_page().
>> >
>> > Suggested-by: David Hildenbrand <david@redhat.com>
>> > Suggested-by: Dev Jain <dev.jain@arm.com>
>> > Signed-off-by: Lance Yang <lance.yang@linux.dev>
>>
>> In general I like the idea, the implementation needs work though.
>>
>> Will check in more detail on respin
>>
>> > ---
>> >  mm/khugepaged.c | 243 ++++++++++++++++++++++++++----------------------
>> >  1 file changed, 130 insertions(+), 113 deletions(-)
>> >
>> > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>> > index b5c0295c3414..7116caae1fa4 100644
>> > --- a/mm/khugepaged.c
>> > +++ b/mm/khugepaged.c
>> > @@ -61,6 +61,12 @@ enum scan_result {
>> >       SCAN_PAGE_FILLED,
>> >  };
>> >
>> > +enum pte_check_result {
>> > +     PTE_CHECK_SUCCEED,
>> > +     PTE_CHECK_CONTINUE,
>>
>> Don't love this logic - this feels like we're essentially abstracting
>> control flow, I mean what does 'continue' mean here? Other than you're in a
>> loop and please continue, which is relying a little too much on what the
>> caller does.
>>
>> if we retain this logic something like PTE_CHECK_SKIP would make more sense.
>>
>>
>> > +     PTE_CHECK_FAIL,
>> > +};
>> > +
>> >  #define CREATE_TRACE_POINTS
>> >  #include <trace/events/huge_memory.h>
>> >
>> > @@ -533,62 +539,139 @@ static void release_pte_pages(pte_t *pte, pte_t *_pte,
>> >       }
>> >  }
>> >
>> > +/*
>> > + * thp_collapse_check_pte - Check if a PTE is suitable for THP collapse
>> > + * @pte:           The PTE to check
>> > + * @vma:           The VMA the PTE belongs to
>> > + * @addr:          The virtual address corresponding to this PTE
>> > + * @foliop:        On success, used to return a pointer to the folio
>> > + *                 Must be non-NULL
>> > + * @none_or_zero:  Counter for none/zero PTEs. Must be non-NULL
>> > + * @unmapped:      Counter for swap PTEs. Can be NULL if not scanning swaps
>> > + * @shared:        Counter for shared pages. Must be non-NULL
>> > + * @scan_result:   Used to return the failure reason (SCAN_*) on a
>> > + *                 PTE_CHECK_FAIL return. Must be non-NULL
>> > + * @cc:            Collapse control settings
>>
>> Do we really need this many parameters? THis is hard to follow.
>>
>> Of course it being me, I'd immediately prefer a helper struct :)
>
>Hey!
>
>Ive been trying to reimplement this patch, and started converting this
>to a helper struct... At first glance it seems like the right
>approach, but because the scan/isolate share two slightly different
>implementations, we need to leave some logic in the parent. This
>results in polluting the rest of the code significantly just to save
>one ugly function call with many parameters.

Been a while ... so I don't fully recall anymore :)

If you already have a patch or series, maybe just post it and we can look
at the actual diff.

Probably easier than talking about it without seeing the code :D

Cheers, Lance

>
>not sure what to do about that tbh.
>
>e.g., (just one of many examples)
>
>
>-                               NR_ISOLATED_ANON + folio_is_file_lru(folio),
>-                               folio_nr_pages(folio));
>-               VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
>-               VM_BUG_ON_FOLIO(folio_test_lru(folio), folio);
>+                               NR_ISOLATED_ANON + folio_is_file_lru(ctx.folio),
>+                               folio_nr_pages(ctx.folio));
>+               VM_BUG_ON_FOLIO(!folio_test_locked(ctx.folio), ctx.folio);
>+               VM_BUG_ON_FOLIO(folio_test_lru(ctx.folio), ctx.folio);
>
>-               if (folio_test_large(folio))
>-                       list_add_tail(&folio->lru, compound_pagelist);
>+               if (folio_test_large(ctx.folio))
>+                       list_add_tail(ctx.folio->lru, compound_pagelist);
>
>Are we sure this helper structure is the right approach?
>
>-- Nico
>
>>
>> > + *
>> > + * Returns:
>> > + *   PTE_CHECK_SUCCEED  - PTE is suitable, proceed with further checks
>> > + *   PTE_CHECK_CONTINUE - Skip this PTE and continue scanning
>> > + *   PTE_CHECK_FAIL     - Abort collapse scan
>> > + */
>> > +static inline int thp_collapse_check_pte(pte_t pte, struct vm_area_struct *vma,
>>
>> There's no need for inline in an internal static function in a file.
>>
>> > +             unsigned long addr, struct folio **foliop, int *none_or_zero,
>> > +             int *unmapped, int *shared, int *scan_result,
>> > +             struct collapse_control *cc)
>> > +{
>> > +     struct folio *folio = NULL;
>> > +
>> > +     if (pte_none(pte) || is_zero_pfn(pte_pfn(pte))) {
>> > +             (*none_or_zero)++;
>> > +             if (!userfaultfd_armed(vma) &&
>> > +                 (!cc->is_khugepaged ||
>> > +                  *none_or_zero <= khugepaged_max_ptes_none)) {
>> > +                     return PTE_CHECK_CONTINUE;
>> > +             } else {
>> > +                     *scan_result = SCAN_EXCEED_NONE_PTE;
>> > +                     count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
>> > +                     return PTE_CHECK_FAIL;
>> > +             }
>> > +     } else if (!pte_present(pte)) {
>>
>> You're now making the if-else issues with previous patches worse by
>> returning which gets even checkpatch to warn you.
>>
>>         if (pte_none(pte) || is_zero_pfn(pte_pfn(pte))) {
>>
>> (of course note that I am not convinced you can drop the pte_present()
>> check here)
>>
>>                 (*none_or_zero)++;
>>                 if (!userfaultfd_armed(vma) &&
>>                     (!cc->is_khugepaged ||
>>                      *none_or_zero <= khugepaged_max_ptes_none))
>>                         return PTE_CHECK_CONTINUE;
>>
>>                 *scan_result = SCAN_EXCEED_NONE_PTE;
>>                 count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
>>                 return PTE_CHECK_FAIL;
>>         }
>>
>>         if (!pte_present(pte)) {
>>                 ...
>>         }
>>
>> But even that really needs seriously more refactoring - that condition
>> above is horrible for instance so, e.g.:
>>
>>         if (pte_none(pte) || is_zero_pfn(pte_pfn(pte))) {
>>                 (*none_or_zero)++;
>>
>>                 /* Cases where this is acceptable. */
>>                 if (!userfaultfd_armed(vma))
>>                         return PTE_CHECK_SKIP;
>>                 if (!cc->is_khugepaged)
>>                         return PTE_CHECK_SKIP;
>>                 if (*none_or_zero <= khugepaged_max_ptes_none)
>>                         return PTE_CHECK_SKIP;
>>
>>                 /* Otherwise, we must abort the scan. */
>>                 *scan_result = SCAN_EXCEED_NONE_PTE;
>>                 count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
>>                 return PTE_CHECK_FAIL;
>>         }
>>
>>         if (!pte_present(pte)) {
>>                 ...
>>         }
>>
>> Improves things a lot.
>>
>> I do however _hate_ this (*blah)++; thing. A helper struct would help :)
>>
>>
>>
>> > +             if (!unmapped) {
>> > +                     *scan_result = SCAN_PTE_NON_PRESENT;
>> > +                     return PTE_CHECK_FAIL;
>>
>> Can't we have a helper that sets the result, e.g.:
>>
>>         return pte_check_fail(scan_result, SCAN_PTE_NON_PRESENT);
>>
>> static enum pte_check_result pte_check_fail(int *scan_result,
>>                 enum pte_check_result result)
>> {
>>         *scan_result = result;
>>         return PTE_CHECK_FAIL;
>> }
>>
>> And same for success/skip
>>
>> Then all of these horrible if (blah) { *foo = bar; return baz; } can be
>> made into
>>
>>         if (blah)
>>                 return pte_check_xxx(..., SCAN_PTE_...);
>>
>> > +             }
>> > +
>> > +             if (non_swap_entry(pte_to_swp_entry(pte))) {
>> > +                     *scan_result = SCAN_PTE_NON_PRESENT;
>> > +                     return PTE_CHECK_FAIL;
>> > +             }
>> > +
>> > +             (*unmapped)++;
>> > +             if (!cc->is_khugepaged ||
>> > +                 *unmapped <= khugepaged_max_ptes_swap) {
>> > +                     /*
>> > +                      * Always be strict with uffd-wp enabled swap
>> > +                      * entries. Please see comment below for
>> > +                      * pte_uffd_wp().
>> > +                      */
>> > +                     if (pte_swp_uffd_wp(pte)) {
>> > +                             *scan_result = SCAN_PTE_UFFD_WP;
>> > +                             return PTE_CHECK_FAIL;
>> > +                     }
>> > +                     return PTE_CHECK_CONTINUE;
>> > +             } else {
>> > +                     *scan_result = SCAN_EXCEED_SWAP_PTE;
>> > +                     count_vm_event(THP_SCAN_EXCEED_SWAP_PTE);
>> > +                     return PTE_CHECK_FAIL;
>> > +             }
>> > +     } else if (pte_uffd_wp(pte)) {
>> > +             /*
>> > +              * Don't collapse the page if any of the small PTEs are
>> > +              * armed with uffd write protection. Here we can also mark
>> > +              * the new huge pmd as write protected if any of the small
>> > +              * ones is marked but that could bring unknown userfault
>> > +              * messages that falls outside of the registered range.
>> > +              * So, just be simple.
>> > +              */
>> > +             *scan_result = SCAN_PTE_UFFD_WP;
>> > +             return PTE_CHECK_FAIL;
>> > +     }
>> > +
>>
>> Again as all of the comments for previous series still apply here so
>> obviously I don't like this as-is :)
>>
>> And as Andrew has pointed out, now checkpatch is finding the 'pointless
>> else' issue (as mentioned above also).
>>
>> > +     folio = vm_normal_folio(vma, addr, pte);
>> > +     if (unlikely(!folio) || unlikely(folio_is_zone_device(folio))) {
>> > +             *scan_result = SCAN_PAGE_NULL;
>> > +             return PTE_CHECK_FAIL;
>> > +     }
>> > +
>> > +     if (!folio_test_anon(folio)) {
>> > +             VM_WARN_ON_FOLIO(true, folio);
>> > +             *scan_result = SCAN_PAGE_ANON;
>> > +             return PTE_CHECK_FAIL;
>> > +     }
>> > +
>> > +     /*
>> > +      * We treat a single page as shared if any part of the THP
>> > +      * is shared.
>> > +      */
>> > +     if (folio_maybe_mapped_shared(folio)) {
>> > +             (*shared)++;
>> > +             if (cc->is_khugepaged && *shared > khugepaged_max_ptes_shared) {
>> > +                     *scan_result = SCAN_EXCEED_SHARED_PTE;
>> > +                     count_vm_event(THP_SCAN_EXCEED_SHARED_PTE);
>> > +                     return PTE_CHECK_FAIL;
>> > +             }
>> > +     }
>> > +
>> > +     *foliop = folio;
>> > +
>> > +     return PTE_CHECK_SUCCEED;
>> > +}
>> > +
>> >  static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
>> >                                       unsigned long start_addr,
>> >                                       pte_t *pte,
>> >                                       struct collapse_control *cc,
>> >                                       struct list_head *compound_pagelist)
>> >  {
>> > -     struct page *page = NULL;
>> >       struct folio *folio = NULL;
>> >       unsigned long addr = start_addr;
>> >       pte_t *_pte;
>> >       int none_or_zero = 0, shared = 0, result = SCAN_FAIL, referenced = 0;
>> > +     int pte_check_res;
>> >
>> >       for (_pte = pte; _pte < pte + HPAGE_PMD_NR;
>> >            _pte++, addr += PAGE_SIZE) {
>> >               pte_t pteval = ptep_get(_pte);
>> > -             if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
>> > -                     ++none_or_zero;
>> > -                     if (!userfaultfd_armed(vma) &&
>> > -                         (!cc->is_khugepaged ||
>> > -                          none_or_zero <= khugepaged_max_ptes_none)) {
>> > -                             continue;
>> > -                     } else {
>> > -                             result = SCAN_EXCEED_NONE_PTE;
>> > -                             count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
>> > -                             goto out;
>> > -                     }
>> > -             } else if (!pte_present(pteval)) {
>> > -                     result = SCAN_PTE_NON_PRESENT;
>> > -                     goto out;
>> > -             } else if (pte_uffd_wp(pteval)) {
>> > -                     result = SCAN_PTE_UFFD_WP;
>> > -                     goto out;
>> > -             }
>> > -             page = vm_normal_page(vma, addr, pteval);
>> > -             if (unlikely(!page) || unlikely(is_zone_device_page(page))) {
>> > -                     result = SCAN_PAGE_NULL;
>> > -                     goto out;
>> > -             }
>> >
>> > -             folio = page_folio(page);
>> > -             if (!folio_test_anon(folio)) {
>> > -                     VM_WARN_ON_FOLIO(true, folio);
>> > -                     result = SCAN_PAGE_ANON;
>> > -                     goto out;
>> > -             }
>> > +             pte_check_res = thp_collapse_check_pte(pteval, vma, addr,
>> > +                                     &folio, &none_or_zero, NULL, &shared,
>> > +                                     &result, cc);
>> >
>> > -             /* See hpage_collapse_scan_pmd(). */
>> > -             if (folio_maybe_mapped_shared(folio)) {
>> > -                     ++shared;
>> > -                     if (cc->is_khugepaged &&
>> > -                         shared > khugepaged_max_ptes_shared) {
>> > -                             result = SCAN_EXCEED_SHARED_PTE;
>> > -                             count_vm_event(THP_SCAN_EXCEED_SHARED_PTE);
>> > -                             goto out;
>> > -                     }
>> > -             }
>> > +             if (pte_check_res == PTE_CHECK_CONTINUE)
>> > +                     continue;
>> > +             else if (pte_check_res == PTE_CHECK_FAIL)
>> > +                     goto out;
>> >
>> >               if (folio_test_large(folio)) {
>> >                       struct folio *f;
>> > @@ -1264,11 +1347,11 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
>> >       pte_t *pte, *_pte;
>> >       int result = SCAN_FAIL, referenced = 0;
>> >       int none_or_zero = 0, shared = 0;
>> > -     struct page *page = NULL;
>> >       struct folio *folio = NULL;
>> >       unsigned long addr;
>> >       spinlock_t *ptl;
>> >       int node = NUMA_NO_NODE, unmapped = 0;
>> > +     int pte_check_res;
>> >
>> >       VM_BUG_ON(start_addr & ~HPAGE_PMD_MASK);
>> >
>> > @@ -1287,81 +1370,15 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
>> >       for (addr = start_addr, _pte = pte; _pte < pte + HPAGE_PMD_NR;
>> >            _pte++, addr += PAGE_SIZE) {
>> >               pte_t pteval = ptep_get(_pte);
>> > -             if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
>> > -                     ++none_or_zero;
>> > -                     if (!userfaultfd_armed(vma) &&
>> > -                         (!cc->is_khugepaged ||
>> > -                          none_or_zero <= khugepaged_max_ptes_none)) {
>> > -                             continue;
>> > -                     } else {
>> > -                             result = SCAN_EXCEED_NONE_PTE;
>> > -                             count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
>> > -                             goto out_unmap;
>> > -                     }
>> > -             } else if (!pte_present(pteval)) {
>> > -                     if (non_swap_entry(pte_to_swp_entry(pteval))) {
>> > -                             result = SCAN_PTE_NON_PRESENT;
>> > -                             goto out_unmap;
>> > -                     }
>> >
>> > -                     ++unmapped;
>> > -                     if (!cc->is_khugepaged ||
>> > -                         unmapped <= khugepaged_max_ptes_swap) {
>> > -                             /*
>> > -                              * Always be strict with uffd-wp
>> > -                              * enabled swap entries.  Please see
>> > -                              * comment below for pte_uffd_wp().
>> > -                              */
>> > -                             if (pte_swp_uffd_wp(pteval)) {
>> > -                                     result = SCAN_PTE_UFFD_WP;
>> > -                                     goto out_unmap;
>> > -                             }
>> > -                             continue;
>> > -                     } else {
>> > -                             result = SCAN_EXCEED_SWAP_PTE;
>> > -                             count_vm_event(THP_SCAN_EXCEED_SWAP_PTE);
>> > -                             goto out_unmap;
>> > -                     }
>> > -             } else if (pte_uffd_wp(pteval)) {
>> > -                     /*
>> > -                      * Don't collapse the page if any of the small
>> > -                      * PTEs are armed with uffd write protection.
>> > -                      * Here we can also mark the new huge pmd as
>> > -                      * write protected if any of the small ones is
>> > -                      * marked but that could bring unknown
>> > -                      * userfault messages that falls outside of
>> > -                      * the registered range.  So, just be simple.
>> > -                      */
>> > -                     result = SCAN_PTE_UFFD_WP;
>> > -                     goto out_unmap;
>> > -             }
>> > +             pte_check_res = thp_collapse_check_pte(pteval, vma, addr,
>> > +                                     &folio, &none_or_zero, &unmapped,
>> > +                                     &shared, &result, cc);
>> >
>> > -             page = vm_normal_page(vma, addr, pteval);
>> > -             if (unlikely(!page) || unlikely(is_zone_device_page(page))) {
>> > -                     result = SCAN_PAGE_NULL;
>> > -                     goto out_unmap;
>> > -             }
>> > -             folio = page_folio(page);
>> > -
>> > -             if (!folio_test_anon(folio)) {
>> > -                     VM_WARN_ON_FOLIO(true, folio);
>> > -                     result = SCAN_PAGE_ANON;
>> > +             if (pte_check_res == PTE_CHECK_CONTINUE)
>> > +                     continue;
>> > +             else if (pte_check_res == PTE_CHECK_FAIL)
>> >                       goto out_unmap;
>> > -             }
>> > -
>> > -             /*
>> > -              * We treat a single page as shared if any part of the THP
>> > -              * is shared.
>> > -              */
>> > -             if (folio_maybe_mapped_shared(folio)) {
>> > -                     ++shared;
>> > -                     if (cc->is_khugepaged &&
>> > -                         shared > khugepaged_max_ptes_shared) {
>> > -                             result = SCAN_EXCEED_SHARED_PTE;
>> > -                             count_vm_event(THP_SCAN_EXCEED_SHARED_PTE);
>> > -                             goto out_unmap;
>> > -                     }
>> > -             }
>> >
>> >               /*
>> >                * Record which node the original page is from and save this
>> > --
>> > 2.49.0
>> >
>>
>
>


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH mm-new v3 3/3] mm/khugepaged: merge PTE scanning logic into a new helper
  2026-06-18 13:10       ` Lance Yang
@ 2026-06-18 13:19         ` Nico Pache
  2026-06-18 14:34           ` Lorenzo Stoakes
  0 siblings, 1 reply; 23+ messages in thread
From: Nico Pache @ 2026-06-18 13:19 UTC (permalink / raw)
  To: Lance Yang
  Cc: akpm, ziy, baolin.wang, Liam.Howlett, ryan.roberts, dev.jain,
	baohua, richard.weiyang, linux-kernel, linux-mm, david, ljs

On Thu, Jun 18, 2026 at 7:10 AM Lance Yang <lance.yang@linux.dev> wrote:
>
> +Cc David and Lorenzo, address oops fixed :)
>
> On Thu, Jun 18, 2026 at 06:22:49AM -0600, Nico Pache wrote:
> >On Tue, Oct 14, 2025 at 6:37 AM Lorenzo Stoakes
> ><lorenzo.stoakes@oracle.com> wrote:
> >>
> >> On Wed, Oct 08, 2025 at 12:37:48PM +0800, Lance Yang wrote:
> >> > From: Lance Yang <lance.yang@linux.dev>
> >> >
> >> > As David suggested, the PTE scanning logic in hpage_collapse_scan_pmd()
> >> > and __collapse_huge_page_isolate() was almost duplicated.
> >> >
> >> > This patch cleans things up by moving all the common PTE checking logic
> >> > into a new shared helper, thp_collapse_check_pte(). While at it, we use
> >> > vm_normal_folio() instead of vm_normal_page().
> >> >
> >> > Suggested-by: David Hildenbrand <david@redhat.com>
> >> > Suggested-by: Dev Jain <dev.jain@arm.com>
> >> > Signed-off-by: Lance Yang <lance.yang@linux.dev>
> >>
> >> In general I like the idea, the implementation needs work though.
> >>
> >> Will check in more detail on respin
> >>
> >> > ---
> >> >  mm/khugepaged.c | 243 ++++++++++++++++++++++++++----------------------
> >> >  1 file changed, 130 insertions(+), 113 deletions(-)
> >> >
> >> > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> >> > index b5c0295c3414..7116caae1fa4 100644
> >> > --- a/mm/khugepaged.c
> >> > +++ b/mm/khugepaged.c
> >> > @@ -61,6 +61,12 @@ enum scan_result {
> >> >       SCAN_PAGE_FILLED,
> >> >  };
> >> >
> >> > +enum pte_check_result {
> >> > +     PTE_CHECK_SUCCEED,
> >> > +     PTE_CHECK_CONTINUE,
> >>
> >> Don't love this logic - this feels like we're essentially abstracting
> >> control flow, I mean what does 'continue' mean here? Other than you're in a
> >> loop and please continue, which is relying a little too much on what the
> >> caller does.
> >>
> >> if we retain this logic something like PTE_CHECK_SKIP would make more sense.
> >>
> >>
> >> > +     PTE_CHECK_FAIL,
> >> > +};
> >> > +
> >> >  #define CREATE_TRACE_POINTS
> >> >  #include <trace/events/huge_memory.h>
> >> >
> >> > @@ -533,62 +539,139 @@ static void release_pte_pages(pte_t *pte, pte_t *_pte,
> >> >       }
> >> >  }
> >> >
> >> > +/*
> >> > + * thp_collapse_check_pte - Check if a PTE is suitable for THP collapse
> >> > + * @pte:           The PTE to check
> >> > + * @vma:           The VMA the PTE belongs to
> >> > + * @addr:          The virtual address corresponding to this PTE
> >> > + * @foliop:        On success, used to return a pointer to the folio
> >> > + *                 Must be non-NULL
> >> > + * @none_or_zero:  Counter for none/zero PTEs. Must be non-NULL
> >> > + * @unmapped:      Counter for swap PTEs. Can be NULL if not scanning swaps
> >> > + * @shared:        Counter for shared pages. Must be non-NULL
> >> > + * @scan_result:   Used to return the failure reason (SCAN_*) on a
> >> > + *                 PTE_CHECK_FAIL return. Must be non-NULL
> >> > + * @cc:            Collapse control settings
> >>
> >> Do we really need this many parameters? THis is hard to follow.
> >>
> >> Of course it being me, I'd immediately prefer a helper struct :)
> >
> >Hey!
> >
> >Ive been trying to reimplement this patch, and started converting this
> >to a helper struct... At first glance it seems like the right
> >approach, but because the scan/isolate share two slightly different
> >implementations, we need to leave some logic in the parent. This
> >results in polluting the rest of the code significantly just to save
> >one ugly function call with many parameters.
>
> Been a while ... so I don't fully recall anymore :)
>
> If you already have a patch or series, maybe just post it and we can look
> at the actual diff.
>
> Probably easier than talking about it without seeing the code :D

Yeah fair point!

I think I found a good intermediate solution after all. Which was to
keep the folio local (not part of the helper), as that was the major
source of the noise.

I have a work-in-progress series to address a number of cleanups, the
code should be ready at the start of next week. I just want to test
and verify things first, and it's best not to post at the end of the
week :)

-- Nico

>
> Cheers, Lance
>
> >
> >not sure what to do about that tbh.
> >
> >e.g., (just one of many examples)
> >
> >
> >-                               NR_ISOLATED_ANON + folio_is_file_lru(folio),
> >-                               folio_nr_pages(folio));
> >-               VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
> >-               VM_BUG_ON_FOLIO(folio_test_lru(folio), folio);
> >+                               NR_ISOLATED_ANON + folio_is_file_lru(ctx.folio),
> >+                               folio_nr_pages(ctx.folio));
> >+               VM_BUG_ON_FOLIO(!folio_test_locked(ctx.folio), ctx.folio);
> >+               VM_BUG_ON_FOLIO(folio_test_lru(ctx.folio), ctx.folio);
> >
> >-               if (folio_test_large(folio))
> >-                       list_add_tail(&folio->lru, compound_pagelist);
> >+               if (folio_test_large(ctx.folio))
> >+                       list_add_tail(ctx.folio->lru, compound_pagelist);
> >
> >Are we sure this helper structure is the right approach?
> >
> >-- Nico
> >
> >>
> >> > + *
> >> > + * Returns:
> >> > + *   PTE_CHECK_SUCCEED  - PTE is suitable, proceed with further checks
> >> > + *   PTE_CHECK_CONTINUE - Skip this PTE and continue scanning
> >> > + *   PTE_CHECK_FAIL     - Abort collapse scan
> >> > + */
> >> > +static inline int thp_collapse_check_pte(pte_t pte, struct vm_area_struct *vma,
> >>
> >> There's no need for inline in an internal static function in a file.
> >>
> >> > +             unsigned long addr, struct folio **foliop, int *none_or_zero,
> >> > +             int *unmapped, int *shared, int *scan_result,
> >> > +             struct collapse_control *cc)
> >> > +{
> >> > +     struct folio *folio = NULL;
> >> > +
> >> > +     if (pte_none(pte) || is_zero_pfn(pte_pfn(pte))) {
> >> > +             (*none_or_zero)++;
> >> > +             if (!userfaultfd_armed(vma) &&
> >> > +                 (!cc->is_khugepaged ||
> >> > +                  *none_or_zero <= khugepaged_max_ptes_none)) {
> >> > +                     return PTE_CHECK_CONTINUE;
> >> > +             } else {
> >> > +                     *scan_result = SCAN_EXCEED_NONE_PTE;
> >> > +                     count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
> >> > +                     return PTE_CHECK_FAIL;
> >> > +             }
> >> > +     } else if (!pte_present(pte)) {
> >>
> >> You're now making the if-else issues with previous patches worse by
> >> returning which gets even checkpatch to warn you.
> >>
> >>         if (pte_none(pte) || is_zero_pfn(pte_pfn(pte))) {
> >>
> >> (of course note that I am not convinced you can drop the pte_present()
> >> check here)
> >>
> >>                 (*none_or_zero)++;
> >>                 if (!userfaultfd_armed(vma) &&
> >>                     (!cc->is_khugepaged ||
> >>                      *none_or_zero <= khugepaged_max_ptes_none))
> >>                         return PTE_CHECK_CONTINUE;
> >>
> >>                 *scan_result = SCAN_EXCEED_NONE_PTE;
> >>                 count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
> >>                 return PTE_CHECK_FAIL;
> >>         }
> >>
> >>         if (!pte_present(pte)) {
> >>                 ...
> >>         }
> >>
> >> But even that really needs seriously more refactoring - that condition
> >> above is horrible for instance so, e.g.:
> >>
> >>         if (pte_none(pte) || is_zero_pfn(pte_pfn(pte))) {
> >>                 (*none_or_zero)++;
> >>
> >>                 /* Cases where this is acceptable. */
> >>                 if (!userfaultfd_armed(vma))
> >>                         return PTE_CHECK_SKIP;
> >>                 if (!cc->is_khugepaged)
> >>                         return PTE_CHECK_SKIP;
> >>                 if (*none_or_zero <= khugepaged_max_ptes_none)
> >>                         return PTE_CHECK_SKIP;
> >>
> >>                 /* Otherwise, we must abort the scan. */
> >>                 *scan_result = SCAN_EXCEED_NONE_PTE;
> >>                 count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
> >>                 return PTE_CHECK_FAIL;
> >>         }
> >>
> >>         if (!pte_present(pte)) {
> >>                 ...
> >>         }
> >>
> >> Improves things a lot.
> >>
> >> I do however _hate_ this (*blah)++; thing. A helper struct would help :)
> >>
> >>
> >>
> >> > +             if (!unmapped) {
> >> > +                     *scan_result = SCAN_PTE_NON_PRESENT;
> >> > +                     return PTE_CHECK_FAIL;
> >>
> >> Can't we have a helper that sets the result, e.g.:
> >>
> >>         return pte_check_fail(scan_result, SCAN_PTE_NON_PRESENT);
> >>
> >> static enum pte_check_result pte_check_fail(int *scan_result,
> >>                 enum pte_check_result result)
> >> {
> >>         *scan_result = result;
> >>         return PTE_CHECK_FAIL;
> >> }
> >>
> >> And same for success/skip
> >>
> >> Then all of these horrible if (blah) { *foo = bar; return baz; } can be
> >> made into
> >>
> >>         if (blah)
> >>                 return pte_check_xxx(..., SCAN_PTE_...);
> >>
> >> > +             }
> >> > +
> >> > +             if (non_swap_entry(pte_to_swp_entry(pte))) {
> >> > +                     *scan_result = SCAN_PTE_NON_PRESENT;
> >> > +                     return PTE_CHECK_FAIL;
> >> > +             }
> >> > +
> >> > +             (*unmapped)++;
> >> > +             if (!cc->is_khugepaged ||
> >> > +                 *unmapped <= khugepaged_max_ptes_swap) {
> >> > +                     /*
> >> > +                      * Always be strict with uffd-wp enabled swap
> >> > +                      * entries. Please see comment below for
> >> > +                      * pte_uffd_wp().
> >> > +                      */
> >> > +                     if (pte_swp_uffd_wp(pte)) {
> >> > +                             *scan_result = SCAN_PTE_UFFD_WP;
> >> > +                             return PTE_CHECK_FAIL;
> >> > +                     }
> >> > +                     return PTE_CHECK_CONTINUE;
> >> > +             } else {
> >> > +                     *scan_result = SCAN_EXCEED_SWAP_PTE;
> >> > +                     count_vm_event(THP_SCAN_EXCEED_SWAP_PTE);
> >> > +                     return PTE_CHECK_FAIL;
> >> > +             }
> >> > +     } else if (pte_uffd_wp(pte)) {
> >> > +             /*
> >> > +              * Don't collapse the page if any of the small PTEs are
> >> > +              * armed with uffd write protection. Here we can also mark
> >> > +              * the new huge pmd as write protected if any of the small
> >> > +              * ones is marked but that could bring unknown userfault
> >> > +              * messages that falls outside of the registered range.
> >> > +              * So, just be simple.
> >> > +              */
> >> > +             *scan_result = SCAN_PTE_UFFD_WP;
> >> > +             return PTE_CHECK_FAIL;
> >> > +     }
> >> > +
> >>
> >> Again as all of the comments for previous series still apply here so
> >> obviously I don't like this as-is :)
> >>
> >> And as Andrew has pointed out, now checkpatch is finding the 'pointless
> >> else' issue (as mentioned above also).
> >>
> >> > +     folio = vm_normal_folio(vma, addr, pte);
> >> > +     if (unlikely(!folio) || unlikely(folio_is_zone_device(folio))) {
> >> > +             *scan_result = SCAN_PAGE_NULL;
> >> > +             return PTE_CHECK_FAIL;
> >> > +     }
> >> > +
> >> > +     if (!folio_test_anon(folio)) {
> >> > +             VM_WARN_ON_FOLIO(true, folio);
> >> > +             *scan_result = SCAN_PAGE_ANON;
> >> > +             return PTE_CHECK_FAIL;
> >> > +     }
> >> > +
> >> > +     /*
> >> > +      * We treat a single page as shared if any part of the THP
> >> > +      * is shared.
> >> > +      */
> >> > +     if (folio_maybe_mapped_shared(folio)) {
> >> > +             (*shared)++;
> >> > +             if (cc->is_khugepaged && *shared > khugepaged_max_ptes_shared) {
> >> > +                     *scan_result = SCAN_EXCEED_SHARED_PTE;
> >> > +                     count_vm_event(THP_SCAN_EXCEED_SHARED_PTE);
> >> > +                     return PTE_CHECK_FAIL;
> >> > +             }
> >> > +     }
> >> > +
> >> > +     *foliop = folio;
> >> > +
> >> > +     return PTE_CHECK_SUCCEED;
> >> > +}
> >> > +
> >> >  static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
> >> >                                       unsigned long start_addr,
> >> >                                       pte_t *pte,
> >> >                                       struct collapse_control *cc,
> >> >                                       struct list_head *compound_pagelist)
> >> >  {
> >> > -     struct page *page = NULL;
> >> >       struct folio *folio = NULL;
> >> >       unsigned long addr = start_addr;
> >> >       pte_t *_pte;
> >> >       int none_or_zero = 0, shared = 0, result = SCAN_FAIL, referenced = 0;
> >> > +     int pte_check_res;
> >> >
> >> >       for (_pte = pte; _pte < pte + HPAGE_PMD_NR;
> >> >            _pte++, addr += PAGE_SIZE) {
> >> >               pte_t pteval = ptep_get(_pte);
> >> > -             if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
> >> > -                     ++none_or_zero;
> >> > -                     if (!userfaultfd_armed(vma) &&
> >> > -                         (!cc->is_khugepaged ||
> >> > -                          none_or_zero <= khugepaged_max_ptes_none)) {
> >> > -                             continue;
> >> > -                     } else {
> >> > -                             result = SCAN_EXCEED_NONE_PTE;
> >> > -                             count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
> >> > -                             goto out;
> >> > -                     }
> >> > -             } else if (!pte_present(pteval)) {
> >> > -                     result = SCAN_PTE_NON_PRESENT;
> >> > -                     goto out;
> >> > -             } else if (pte_uffd_wp(pteval)) {
> >> > -                     result = SCAN_PTE_UFFD_WP;
> >> > -                     goto out;
> >> > -             }
> >> > -             page = vm_normal_page(vma, addr, pteval);
> >> > -             if (unlikely(!page) || unlikely(is_zone_device_page(page))) {
> >> > -                     result = SCAN_PAGE_NULL;
> >> > -                     goto out;
> >> > -             }
> >> >
> >> > -             folio = page_folio(page);
> >> > -             if (!folio_test_anon(folio)) {
> >> > -                     VM_WARN_ON_FOLIO(true, folio);
> >> > -                     result = SCAN_PAGE_ANON;
> >> > -                     goto out;
> >> > -             }
> >> > +             pte_check_res = thp_collapse_check_pte(pteval, vma, addr,
> >> > +                                     &folio, &none_or_zero, NULL, &shared,
> >> > +                                     &result, cc);
> >> >
> >> > -             /* See hpage_collapse_scan_pmd(). */
> >> > -             if (folio_maybe_mapped_shared(folio)) {
> >> > -                     ++shared;
> >> > -                     if (cc->is_khugepaged &&
> >> > -                         shared > khugepaged_max_ptes_shared) {
> >> > -                             result = SCAN_EXCEED_SHARED_PTE;
> >> > -                             count_vm_event(THP_SCAN_EXCEED_SHARED_PTE);
> >> > -                             goto out;
> >> > -                     }
> >> > -             }
> >> > +             if (pte_check_res == PTE_CHECK_CONTINUE)
> >> > +                     continue;
> >> > +             else if (pte_check_res == PTE_CHECK_FAIL)
> >> > +                     goto out;
> >> >
> >> >               if (folio_test_large(folio)) {
> >> >                       struct folio *f;
> >> > @@ -1264,11 +1347,11 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
> >> >       pte_t *pte, *_pte;
> >> >       int result = SCAN_FAIL, referenced = 0;
> >> >       int none_or_zero = 0, shared = 0;
> >> > -     struct page *page = NULL;
> >> >       struct folio *folio = NULL;
> >> >       unsigned long addr;
> >> >       spinlock_t *ptl;
> >> >       int node = NUMA_NO_NODE, unmapped = 0;
> >> > +     int pte_check_res;
> >> >
> >> >       VM_BUG_ON(start_addr & ~HPAGE_PMD_MASK);
> >> >
> >> > @@ -1287,81 +1370,15 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
> >> >       for (addr = start_addr, _pte = pte; _pte < pte + HPAGE_PMD_NR;
> >> >            _pte++, addr += PAGE_SIZE) {
> >> >               pte_t pteval = ptep_get(_pte);
> >> > -             if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
> >> > -                     ++none_or_zero;
> >> > -                     if (!userfaultfd_armed(vma) &&
> >> > -                         (!cc->is_khugepaged ||
> >> > -                          none_or_zero <= khugepaged_max_ptes_none)) {
> >> > -                             continue;
> >> > -                     } else {
> >> > -                             result = SCAN_EXCEED_NONE_PTE;
> >> > -                             count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
> >> > -                             goto out_unmap;
> >> > -                     }
> >> > -             } else if (!pte_present(pteval)) {
> >> > -                     if (non_swap_entry(pte_to_swp_entry(pteval))) {
> >> > -                             result = SCAN_PTE_NON_PRESENT;
> >> > -                             goto out_unmap;
> >> > -                     }
> >> >
> >> > -                     ++unmapped;
> >> > -                     if (!cc->is_khugepaged ||
> >> > -                         unmapped <= khugepaged_max_ptes_swap) {
> >> > -                             /*
> >> > -                              * Always be strict with uffd-wp
> >> > -                              * enabled swap entries.  Please see
> >> > -                              * comment below for pte_uffd_wp().
> >> > -                              */
> >> > -                             if (pte_swp_uffd_wp(pteval)) {
> >> > -                                     result = SCAN_PTE_UFFD_WP;
> >> > -                                     goto out_unmap;
> >> > -                             }
> >> > -                             continue;
> >> > -                     } else {
> >> > -                             result = SCAN_EXCEED_SWAP_PTE;
> >> > -                             count_vm_event(THP_SCAN_EXCEED_SWAP_PTE);
> >> > -                             goto out_unmap;
> >> > -                     }
> >> > -             } else if (pte_uffd_wp(pteval)) {
> >> > -                     /*
> >> > -                      * Don't collapse the page if any of the small
> >> > -                      * PTEs are armed with uffd write protection.
> >> > -                      * Here we can also mark the new huge pmd as
> >> > -                      * write protected if any of the small ones is
> >> > -                      * marked but that could bring unknown
> >> > -                      * userfault messages that falls outside of
> >> > -                      * the registered range.  So, just be simple.
> >> > -                      */
> >> > -                     result = SCAN_PTE_UFFD_WP;
> >> > -                     goto out_unmap;
> >> > -             }
> >> > +             pte_check_res = thp_collapse_check_pte(pteval, vma, addr,
> >> > +                                     &folio, &none_or_zero, &unmapped,
> >> > +                                     &shared, &result, cc);
> >> >
> >> > -             page = vm_normal_page(vma, addr, pteval);
> >> > -             if (unlikely(!page) || unlikely(is_zone_device_page(page))) {
> >> > -                     result = SCAN_PAGE_NULL;
> >> > -                     goto out_unmap;
> >> > -             }
> >> > -             folio = page_folio(page);
> >> > -
> >> > -             if (!folio_test_anon(folio)) {
> >> > -                     VM_WARN_ON_FOLIO(true, folio);
> >> > -                     result = SCAN_PAGE_ANON;
> >> > +             if (pte_check_res == PTE_CHECK_CONTINUE)
> >> > +                     continue;
> >> > +             else if (pte_check_res == PTE_CHECK_FAIL)
> >> >                       goto out_unmap;
> >> > -             }
> >> > -
> >> > -             /*
> >> > -              * We treat a single page as shared if any part of the THP
> >> > -              * is shared.
> >> > -              */
> >> > -             if (folio_maybe_mapped_shared(folio)) {
> >> > -                     ++shared;
> >> > -                     if (cc->is_khugepaged &&
> >> > -                         shared > khugepaged_max_ptes_shared) {
> >> > -                             result = SCAN_EXCEED_SHARED_PTE;
> >> > -                             count_vm_event(THP_SCAN_EXCEED_SHARED_PTE);
> >> > -                             goto out_unmap;
> >> > -                     }
> >> > -             }
> >> >
> >> >               /*
> >> >                * Record which node the original page is from and save this
> >> > --
> >> > 2.49.0
> >> >
> >>
> >
> >
>



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH mm-new v3 3/3] mm/khugepaged: merge PTE scanning logic into a new helper
  2026-06-18 13:19         ` Nico Pache
@ 2026-06-18 14:34           ` Lorenzo Stoakes
  0 siblings, 0 replies; 23+ messages in thread
From: Lorenzo Stoakes @ 2026-06-18 14:34 UTC (permalink / raw)
  To: Nico Pache
  Cc: Lance Yang, akpm, ziy, baolin.wang, Liam.Howlett, ryan.roberts,
	dev.jain, baohua, richard.weiyang, linux-kernel, linux-mm, david

On Thu, Jun 18, 2026 at 07:19:41AM -0600, Nico Pache wrote:
>
> I have a work-in-progress series to address a number of cleanups, the
> code should be ready at the start of next week. I just want to test
> and verify things first, and it's best not to post at the end of the
> week :)

Woo! Thanks :) Cleanups very welcome.

But yeah best to send on week of -rc1 please. I'm holding off on a pretty
large series myself and we need to get into the habit of treating the merge
window as quiet time.

I think Andrew is not taking anything atm anyway and asking people to
resend in -rc1 anyway.

(There's a debate as to - send early, get early feedback, resend in -rc1
vs. tidal wave in -rc1, but I'm kinda in favour of the tidal wave, as it at
least gives some breathing space. I am very very far behind on review rn :)

Thanks, Lorenzo


^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2026-06-18 14:34 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-10-08  4:37 [PATCH mm-new v3 0/3] refactor and merge PTE scanning logic Lance Yang
2025-10-08  4:37 ` [PATCH mm-new v3 1/3] mm/khugepaged: optimize PTE scanning with if-else-if-else-if chain Lance Yang
2025-10-14 12:17   ` Lorenzo Stoakes
2025-10-14 12:27     ` David Hildenbrand
2025-10-15  4:49       ` Lance Yang
2025-10-15  9:16         ` Lorenzo Stoakes
2025-10-15  9:31           ` Lance Yang
2025-10-08  4:37 ` [PATCH mm-new v3 2/3] mm/khugepaged: use VM_WARN_ON_FOLIO instead of VM_BUG_ON_FOLIO for non-anon folios Lance Yang
2025-10-14 12:25   ` Lorenzo Stoakes
2025-10-08  4:37 ` [PATCH mm-new v3 3/3] mm/khugepaged: merge PTE scanning logic into a new helper Lance Yang
2025-10-09  1:07   ` Andrew Morton
2025-10-09  1:49     ` Lance Yang
2025-10-10  9:10   ` Dev Jain
2025-10-10 10:42     ` Lance Yang
2025-10-10 13:29   ` Wei Yang
2025-10-10 13:55     ` Lance Yang
2025-10-14 12:36   ` Lorenzo Stoakes
2026-06-18 12:22     ` Nico Pache
2026-06-18 13:10       ` Lance Yang
2026-06-18 13:19         ` Nico Pache
2026-06-18 14:34           ` Lorenzo Stoakes
2025-10-14 17:41   ` Lorenzo Stoakes
2025-10-15  1:48     ` Lance Yang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox