public inbox for linux-mm@kvack.org
* [PATCH v3 01/13] mm/huge_memory: simplify vma_is_special_huge()
  2026-03-20 18:14 [PATCH v3 00/13] mm/huge_memory: refactor zap_huge_pmd() Lorenzo Stoakes (Oracle)
@ 2026-03-20 18:07 ` Lorenzo Stoakes (Oracle)
  2026-03-20 18:07 ` [PATCH v3 02/13] mm/huge: avoid big else branch in zap_huge_pmd() Lorenzo Stoakes (Oracle)
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 24+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-20 18:07 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Zi Yan, Baolin Wang, Liam R . Howlett,
	Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	Kiryl Shutsemau, linux-mm, linux-kernel

This function is confusing - it overloads the term 'special' yet again,
and checks for DAX even though in many cases the code explicitly excludes
DAX before invoking the predicate.

It also unnecessarily checks for vma->vm_file - this has to be present for
a driver to have set VMA_MIXEDMAP_BIT or VMA_PFNMAP_BIT.

In fact, a far simpler form of this is to reverse the DAX predicate and
return false if DAX is set.

This makes sense from the point of view of 'special' as in
vm_normal_page(), as DAX actually does potentially have retrievable
folios.

Also there's no need to have this in mm.h, so move it to huge_memory.c.

No functional change intended.
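
As a rough userspace model of the predicate change (struct fields and bit
values below are stand-ins, not the kernel's definitions), the old and new
forms agree everywhere except DAX, which callers now check explicitly via
vma_is_dax():

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in flag bits; the kernel's actual values differ. */
#define VM_PFNMAP   (1UL << 0)
#define VM_MIXEDMAP (1UL << 1)

struct vma_model {
	bool is_dax;
	bool has_file;		/* models vma->vm_file != NULL */
	unsigned long vm_flags;
};

/* Old form: special if DAX, or a file-backed PFN/mixed map. */
static bool old_vma_is_special_huge(const struct vma_model *vma)
{
	return vma->is_dax || (vma->has_file &&
			       (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP)));
}

/*
 * New form: DAX returns false (its folios are potentially retrievable),
 * and the vm_file check is dropped since a driver must have a file in
 * order to set the PFN/mixed map bits.
 */
static bool new_vma_is_special_huge(const struct vma_model *vma)
{
	if (vma->is_dax)
		return false;
	return vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP);
}
```

Callers that still care about DAX pair the new predicate with an explicit
vma_is_dax() check, as the diff does in __thp_vma_allowable_orders().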

Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
---
 include/linux/huge_mm.h |  4 ++--
 include/linux/mm.h      | 16 ----------------
 mm/huge_memory.c        | 30 +++++++++++++++++++++++-------
 3 files changed, 25 insertions(+), 25 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 8d801ed378db..af726f0aa30d 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -83,7 +83,7 @@ extern struct kobj_attribute thpsize_shmem_enabled_attr;
  * file is never split and the MAX_PAGECACHE_ORDER limit does not apply to
  * it.  Same to PFNMAPs where there's neither page* nor pagecache.
  */
-#define THP_ORDERS_ALL_SPECIAL		\
+#define THP_ORDERS_ALL_SPECIAL_DAX	\
 	(BIT(PMD_ORDER) | BIT(PUD_ORDER))
 #define THP_ORDERS_ALL_FILE_DEFAULT	\
 	((BIT(MAX_PAGECACHE_ORDER + 1) - 1) & ~BIT(0))
@@ -92,7 +92,7 @@ extern struct kobj_attribute thpsize_shmem_enabled_attr;
  * Mask of all large folio orders supported for THP.
  */
 #define THP_ORDERS_ALL	\
-	(THP_ORDERS_ALL_ANON | THP_ORDERS_ALL_SPECIAL | THP_ORDERS_ALL_FILE_DEFAULT)
+	(THP_ORDERS_ALL_ANON | THP_ORDERS_ALL_SPECIAL_DAX | THP_ORDERS_ALL_FILE_DEFAULT)
 
 enum tva_type {
 	TVA_SMAPS,		/* Exposing "THPeligible:" in smaps. */
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 8aadf115278e..6b07ee99b38b 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -5078,22 +5078,6 @@ long copy_folio_from_user(struct folio *dst_folio,
 			   const void __user *usr_src,
 			   bool allow_pagefault);
 
-/**
- * vma_is_special_huge - Are transhuge page-table entries considered special?
- * @vma: Pointer to the struct vm_area_struct to consider
- *
- * Whether transhuge page-table entries are considered "special" following
- * the definition in vm_normal_page().
- *
- * Return: true if transhuge page-table entries should be considered special,
- * false otherwise.
- */
-static inline bool vma_is_special_huge(const struct vm_area_struct *vma)
-{
-	return vma_is_dax(vma) || (vma->vm_file &&
-				   (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP)));
-}
-
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE || CONFIG_HUGETLBFS */
 
 #if MAX_NUMNODES > 1
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index e90d08db219d..2775309b317a 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -103,6 +103,14 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
 	return !inode_is_open_for_write(inode) && S_ISREG(inode->i_mode);
 }
 
+/* If this returns true, we are unable to access the VMA's folios. */
+static bool vma_is_special_huge(const struct vm_area_struct *vma)
+{
+	if (vma_is_dax(vma))
+		return false;
+	return vma_test_any(vma, VMA_PFNMAP_BIT, VMA_MIXEDMAP_BIT);
+}
+
 unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
 					 vm_flags_t vm_flags,
 					 enum tva_type type,
@@ -116,8 +124,8 @@ unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
 	/* Check the intersection of requested and supported orders. */
 	if (vma_is_anonymous(vma))
 		supported_orders = THP_ORDERS_ALL_ANON;
-	else if (vma_is_special_huge(vma))
-		supported_orders = THP_ORDERS_ALL_SPECIAL;
+	else if (vma_is_dax(vma) || vma_is_special_huge(vma))
+		supported_orders = THP_ORDERS_ALL_SPECIAL_DAX;
 	else
 		supported_orders = THP_ORDERS_ALL_FILE_DEFAULT;
 
@@ -2338,7 +2346,7 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 						tlb->fullmm);
 	arch_check_zapped_pmd(vma, orig_pmd);
 	tlb_remove_pmd_tlb_entry(tlb, pmd, addr);
-	if (!vma_is_dax(vma) && vma_is_special_huge(vma)) {
+	if (vma_is_special_huge(vma)) {
 		if (arch_needs_pgtable_deposit())
 			zap_deposited_table(tlb->mm, pmd);
 		spin_unlock(ptl);
@@ -2840,7 +2848,7 @@ int zap_huge_pud(struct mmu_gather *tlb, struct vm_area_struct *vma,
 	orig_pud = pudp_huge_get_and_clear_full(vma, addr, pud, tlb->fullmm);
 	arch_check_zapped_pud(vma, orig_pud);
 	tlb_remove_pud_tlb_entry(tlb, pud, addr);
-	if (!vma_is_dax(vma) && vma_is_special_huge(vma)) {
+	if (vma_is_special_huge(vma)) {
 		spin_unlock(ptl);
 		/* No zero page support yet */
 	} else {
@@ -2991,7 +2999,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 		 */
 		if (arch_needs_pgtable_deposit())
 			zap_deposited_table(mm, pmd);
-		if (!vma_is_dax(vma) && vma_is_special_huge(vma))
+		if (vma_is_special_huge(vma))
 			return;
 		if (unlikely(pmd_is_migration_entry(old_pmd))) {
 			const softleaf_t old_entry = softleaf_from_pmd(old_pmd);
@@ -4517,8 +4525,16 @@ static void split_huge_pages_all(void)
 
 static inline bool vma_not_suitable_for_thp_split(struct vm_area_struct *vma)
 {
-	return vma_is_special_huge(vma) || (vma->vm_flags & VM_IO) ||
-		    is_vm_hugetlb_page(vma);
+	if (vma_is_dax(vma))
+		return true;
+	if (vma_is_special_huge(vma))
+		return true;
+	if (vma_test(vma, VMA_IO_BIT))
+		return true;
+	if (is_vm_hugetlb_page(vma))
+		return true;
+
+	return false;
 }
 
 static int split_huge_pages_pid(int pid, unsigned long vaddr_start,
-- 
2.53.0



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v3 02/13] mm/huge: avoid big else branch in zap_huge_pmd()
  2026-03-20 18:14 [PATCH v3 00/13] mm/huge_memory: refactor zap_huge_pmd() Lorenzo Stoakes (Oracle)
  2026-03-20 18:07 ` [PATCH v3 01/13] mm/huge_memory: simplify vma_is_special_huge() Lorenzo Stoakes (Oracle)
@ 2026-03-20 18:07 ` Lorenzo Stoakes (Oracle)
  2026-03-20 18:07 ` [PATCH v3 03/13] mm/huge_memory: have zap_huge_pmd return a boolean, add kdoc Lorenzo Stoakes (Oracle)
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 24+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-20 18:07 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Zi Yan, Baolin Wang, Liam R . Howlett,
	Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	Kiryl Shutsemau, linux-mm, linux-kernel

We don't need an extra level of indentation; we can simply exit early in
the first two branches.

No functional change intended.
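
The shape of the refactor is the usual guard-clause transformation; a toy
illustration (names hypothetical, unrelated to the kernel code):

```c
#include <assert.h>

/* Before: the main path is buried in an else branch. */
static int classify_nested(int x)
{
	if (x < 0) {
		return -1;
	} else if (x == 0) {
		return 0;
	} else {
		/* imagine many lines here, all indented one level deeper */
		return 1;
	}
}

/* After: terminal cases exit early; the main path sits at function scope. */
static int classify_early_exit(int x)
{
	if (x < 0)
		return -1;
	if (x == 0)
		return 0;
	/* the same many lines, now one level shallower */
	return 1;
}
```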

Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Acked-by: Qi Zheng <zhengqi.arch@bytedance.com>
Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
---
 mm/huge_memory.c | 87 +++++++++++++++++++++++++-----------------------
 1 file changed, 45 insertions(+), 42 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 2775309b317a..4e8df3a35cab 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2328,8 +2328,10 @@ static inline void zap_deposited_table(struct mm_struct *mm, pmd_t *pmd)
 int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 		 pmd_t *pmd, unsigned long addr)
 {
-	pmd_t orig_pmd;
+	struct folio *folio = NULL;
+	int flush_needed = 1;
 	spinlock_t *ptl;
+	pmd_t orig_pmd;
 
 	tlb_change_page_size(tlb, HPAGE_PMD_SIZE);
 
@@ -2350,59 +2352,60 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 		if (arch_needs_pgtable_deposit())
 			zap_deposited_table(tlb->mm, pmd);
 		spin_unlock(ptl);
-	} else if (is_huge_zero_pmd(orig_pmd)) {
+		return 1;
+	}
+	if (is_huge_zero_pmd(orig_pmd)) {
 		if (!vma_is_dax(vma) || arch_needs_pgtable_deposit())
 			zap_deposited_table(tlb->mm, pmd);
 		spin_unlock(ptl);
-	} else {
-		struct folio *folio = NULL;
-		int flush_needed = 1;
+		return 1;
+	}
 
-		if (pmd_present(orig_pmd)) {
-			struct page *page = pmd_page(orig_pmd);
+	if (pmd_present(orig_pmd)) {
+		struct page *page = pmd_page(orig_pmd);
 
-			folio = page_folio(page);
-			folio_remove_rmap_pmd(folio, page, vma);
-			WARN_ON_ONCE(folio_mapcount(folio) < 0);
-			VM_BUG_ON_PAGE(!PageHead(page), page);
-		} else if (pmd_is_valid_softleaf(orig_pmd)) {
-			const softleaf_t entry = softleaf_from_pmd(orig_pmd);
+		folio = page_folio(page);
+		folio_remove_rmap_pmd(folio, page, vma);
+		WARN_ON_ONCE(folio_mapcount(folio) < 0);
+		VM_BUG_ON_PAGE(!PageHead(page), page);
+	} else if (pmd_is_valid_softleaf(orig_pmd)) {
+		const softleaf_t entry = softleaf_from_pmd(orig_pmd);
 
-			folio = softleaf_to_folio(entry);
-			flush_needed = 0;
+		folio = softleaf_to_folio(entry);
+		flush_needed = 0;
 
-			if (!thp_migration_supported())
-				WARN_ONCE(1, "Non present huge pmd without pmd migration enabled!");
-		}
+		if (!thp_migration_supported())
+			WARN_ONCE(1, "Non present huge pmd without pmd migration enabled!");
+	}
 
-		if (folio_test_anon(folio)) {
+	if (folio_test_anon(folio)) {
+		zap_deposited_table(tlb->mm, pmd);
+		add_mm_counter(tlb->mm, MM_ANONPAGES, -HPAGE_PMD_NR);
+	} else {
+		if (arch_needs_pgtable_deposit())
 			zap_deposited_table(tlb->mm, pmd);
-			add_mm_counter(tlb->mm, MM_ANONPAGES, -HPAGE_PMD_NR);
-		} else {
-			if (arch_needs_pgtable_deposit())
-				zap_deposited_table(tlb->mm, pmd);
-			add_mm_counter(tlb->mm, mm_counter_file(folio),
-				       -HPAGE_PMD_NR);
-
-			/*
-			 * Use flush_needed to indicate whether the PMD entry
-			 * is present, instead of checking pmd_present() again.
-			 */
-			if (flush_needed && pmd_young(orig_pmd) &&
-			    likely(vma_has_recency(vma)))
-				folio_mark_accessed(folio);
-		}
+		add_mm_counter(tlb->mm, mm_counter_file(folio),
+			       -HPAGE_PMD_NR);
 
-		if (folio_is_device_private(folio)) {
-			folio_remove_rmap_pmd(folio, &folio->page, vma);
-			WARN_ON_ONCE(folio_mapcount(folio) < 0);
-			folio_put(folio);
-		}
+		/*
+		 * Use flush_needed to indicate whether the PMD entry
+		 * is present, instead of checking pmd_present() again.
+		 */
+		if (flush_needed && pmd_young(orig_pmd) &&
+		    likely(vma_has_recency(vma)))
+			folio_mark_accessed(folio);
+	}
 
-		spin_unlock(ptl);
-		if (flush_needed)
-			tlb_remove_page_size(tlb, &folio->page, HPAGE_PMD_SIZE);
+	if (folio_is_device_private(folio)) {
+		folio_remove_rmap_pmd(folio, &folio->page, vma);
+		WARN_ON_ONCE(folio_mapcount(folio) < 0);
+		folio_put(folio);
 	}
+
+	spin_unlock(ptl);
+	if (flush_needed)
+		tlb_remove_page_size(tlb, &folio->page, HPAGE_PMD_SIZE);
+
 	return 1;
 }
 
-- 
2.53.0




* [PATCH v3 03/13] mm/huge_memory: have zap_huge_pmd return a boolean, add kdoc
  2026-03-20 18:14 [PATCH v3 00/13] mm/huge_memory: refactor zap_huge_pmd() Lorenzo Stoakes (Oracle)
  2026-03-20 18:07 ` [PATCH v3 01/13] mm/huge_memory: simplify vma_is_special_huge() Lorenzo Stoakes (Oracle)
  2026-03-20 18:07 ` [PATCH v3 02/13] mm/huge: avoid big else branch in zap_huge_pmd() Lorenzo Stoakes (Oracle)
@ 2026-03-20 18:07 ` Lorenzo Stoakes (Oracle)
  2026-03-20 18:07 ` [PATCH v3 04/13] mm/huge_memory: handle buggy PMD entry in zap_huge_pmd() Lorenzo Stoakes (Oracle)
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 24+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-20 18:07 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Zi Yan, Baolin Wang, Liam R . Howlett,
	Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	Kiryl Shutsemau, linux-mm, linux-kernel

There's no need to use the ancient approach of returning an integer here;
just return a boolean.

Also update flush_needed to be a boolean, similarly.

Also add a kdoc comment describing the function.

No functional change intended.

Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Acked-by: Qi Zheng <zhengqi.arch@bytedance.com>
Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
---
 include/linux/huge_mm.h |  4 ++--
 mm/huge_memory.c        | 23 ++++++++++++++++-------
 2 files changed, 18 insertions(+), 9 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index af726f0aa30d..1258fa37e85b 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -27,8 +27,8 @@ static inline void huge_pud_set_accessed(struct vm_fault *vmf, pud_t orig_pud)
 vm_fault_t do_huge_pmd_wp_page(struct vm_fault *vmf);
 bool madvise_free_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 			   pmd_t *pmd, unsigned long addr, unsigned long next);
-int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, pmd_t *pmd,
-		 unsigned long addr);
+bool zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, pmd_t *pmd,
+		  unsigned long addr);
 int zap_huge_pud(struct mmu_gather *tlb, struct vm_area_struct *vma, pud_t *pud,
 		 unsigned long addr);
 bool move_huge_pmd(struct vm_area_struct *vma, unsigned long old_addr,
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 4e8df3a35cab..3c9e2ebaacfa 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2325,11 +2325,20 @@ static inline void zap_deposited_table(struct mm_struct *mm, pmd_t *pmd)
 	mm_dec_nr_ptes(mm);
 }
 
-int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
+/**
+ * zap_huge_pmd - Zap a huge THP which is of PMD size.
+ * @tlb: The MMU gather TLB state associated with the operation.
+ * @vma: The VMA containing the range to zap.
+ * @pmd: A pointer to the leaf PMD entry.
+ * @addr: The virtual address for the range to zap.
+ *
+ * Returns: %true on success, %false otherwise.
+ */
+bool zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 		 pmd_t *pmd, unsigned long addr)
 {
 	struct folio *folio = NULL;
-	int flush_needed = 1;
+	bool flush_needed = true;
 	spinlock_t *ptl;
 	pmd_t orig_pmd;
 
@@ -2337,7 +2346,7 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 
 	ptl = __pmd_trans_huge_lock(pmd, vma);
 	if (!ptl)
-		return 0;
+		return false;
 	/*
 	 * For architectures like ppc64 we look at deposited pgtable
 	 * when calling pmdp_huge_get_and_clear. So do the
@@ -2352,13 +2361,13 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 		if (arch_needs_pgtable_deposit())
 			zap_deposited_table(tlb->mm, pmd);
 		spin_unlock(ptl);
-		return 1;
+		return true;
 	}
 	if (is_huge_zero_pmd(orig_pmd)) {
 		if (!vma_is_dax(vma) || arch_needs_pgtable_deposit())
 			zap_deposited_table(tlb->mm, pmd);
 		spin_unlock(ptl);
-		return 1;
+		return true;
 	}
 
 	if (pmd_present(orig_pmd)) {
@@ -2372,7 +2381,7 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 		const softleaf_t entry = softleaf_from_pmd(orig_pmd);
 
 		folio = softleaf_to_folio(entry);
-		flush_needed = 0;
+		flush_needed = false;
 
 		if (!thp_migration_supported())
 			WARN_ONCE(1, "Non present huge pmd without pmd migration enabled!");
@@ -2406,7 +2415,7 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 	if (flush_needed)
 		tlb_remove_page_size(tlb, &folio->page, HPAGE_PMD_SIZE);
 
-	return 1;
+	return true;
 }
 
 #ifndef pmd_move_must_withdraw
-- 
2.53.0




* [PATCH v3 04/13] mm/huge_memory: handle buggy PMD entry in zap_huge_pmd()
  2026-03-20 18:14 [PATCH v3 00/13] mm/huge_memory: refactor zap_huge_pmd() Lorenzo Stoakes (Oracle)
                   ` (2 preceding siblings ...)
  2026-03-20 18:07 ` [PATCH v3 03/13] mm/huge_memory: have zap_huge_pmd return a boolean, add kdoc Lorenzo Stoakes (Oracle)
@ 2026-03-20 18:07 ` Lorenzo Stoakes (Oracle)
  2026-03-20 18:07 ` [PATCH v3 05/13] mm/huge_memory: add a common exit path to zap_huge_pmd() Lorenzo Stoakes (Oracle)
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 24+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-20 18:07 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Zi Yan, Baolin Wang, Liam R . Howlett,
	Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	Kiryl Shutsemau, linux-mm, linux-kernel

A recent bug I analysed managed, through a bug in the userfaultfd
implementation, to reach an invalid point in the zap_huge_pmd() code where
the PMD was none of:

- A non-DAX, PFN or mixed map.
- The huge zero folio
- A present PMD entry
- A softleaf entry

The code at this point calls folio_test_anon() on a known-NULL folio.
Having code that knowingly dereferences NULL like this is hard to
understand, and makes debugging potentially more difficult.

Add an else branch to handle this case and WARN().

No functional change intended.
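
The defensive shape (make the impossible case explicit and warn, rather
than letting a NULL folio reach folio_test_anon()) can be modelled in
userspace like this; warn-once mechanics are simplified and all names are
hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

struct folio_model { bool anon; };

static int warn_hits;

/* Simplified WARN_ON_ONCE(): count every hit, report only the first. */
static void warn_once_model(void)
{
	if (warn_hits++ == 0)
		fprintf(stderr, "unexpected PMD state\n");
}

/*
 * Recognised entry kinds resolve to a folio; anything else now takes an
 * explicit else branch and bails before folio is dereferenced.
 */
static bool zap_warn_model(int entry_kind, struct folio_model *known)
{
	struct folio_model *folio = NULL;

	if (entry_kind == 0)		/* models a present PMD */
		folio = known;
	else if (entry_kind == 1)	/* models a softleaf entry */
		folio = known;
	else {
		warn_once_model();
		return true;		/* "unlock and return" analogue */
	}

	return folio->anon;		/* safe: folio is non-NULL here */
}
```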

Link: https://lore.kernel.org/all/6b3d7ad7-49e1-407a-903d-3103704160d8@lucifer.local/
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
---
 mm/huge_memory.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 3c9e2ebaacfa..0056ac27ec9a 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2385,6 +2385,10 @@ bool zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 
 		if (!thp_migration_supported())
 			WARN_ONCE(1, "Non present huge pmd without pmd migration enabled!");
+	} else {
+		WARN_ON_ONCE(true);
+		spin_unlock(ptl);
+		return true;
 	}
 
 	if (folio_test_anon(folio)) {
-- 
2.53.0




* [PATCH v3 05/13] mm/huge_memory: add a common exit path to zap_huge_pmd()
  2026-03-20 18:14 [PATCH v3 00/13] mm/huge_memory: refactor zap_huge_pmd() Lorenzo Stoakes (Oracle)
                   ` (3 preceding siblings ...)
  2026-03-20 18:07 ` [PATCH v3 04/13] mm/huge_memory: handle buggy PMD entry in zap_huge_pmd() Lorenzo Stoakes (Oracle)
@ 2026-03-20 18:07 ` Lorenzo Stoakes (Oracle)
  2026-03-20 18:07 ` [PATCH v3 06/13] mm/huge_memory: remove unnecessary VM_BUG_ON_PAGE() Lorenzo Stoakes (Oracle)
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 24+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-20 18:07 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Zi Yan, Baolin Wang, Liam R . Howlett,
	Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	Kiryl Shutsemau, linux-mm, linux-kernel

Other than when we fail to acquire the PTL, we always need to unlock the
PTL, and optionally need to flush on exit.

The code is currently very duplicated in this respect, so default
flush_needed to false, set it true in the case in which it's required,
then share the same logic for all exit paths.

This also makes flush_needed make more sense as a function-scope value (we
don't need to flush for the PFN map/mixed map, huge zero folio, or error
cases, for instance).
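
The single 'out:' label is the kernel's usual centralised-exit idiom; a
toy model with the lock and flush replaced by counters (all names
hypothetical):

```c
#include <assert.h>
#include <stdbool.h>

static int unlock_count, flush_count;

/*
 * Every path funnels through 'out': unlock always, flush only if the
 * main path marked it necessary.
 */
static bool zap_exit_model(int kind)
{
	bool flush_needed = false;

	/* "PTL acquired" here */
	if (kind == 1)			/* special mapping: no flush */
		goto out;
	if (kind == 2)			/* huge zero folio: no flush */
		goto out;

	flush_needed = true;		/* present entry: flush required */
out:
	unlock_count++;			/* spin_unlock() analogue, once */
	if (flush_needed)
		flush_count++;
	return true;
}
```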

Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
---
 mm/huge_memory.c | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 0056ac27ec9a..b9d9acfef147 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2338,7 +2338,7 @@ bool zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 		 pmd_t *pmd, unsigned long addr)
 {
 	struct folio *folio = NULL;
-	bool flush_needed = true;
+	bool flush_needed = false;
 	spinlock_t *ptl;
 	pmd_t orig_pmd;
 
@@ -2360,19 +2360,18 @@ bool zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 	if (vma_is_special_huge(vma)) {
 		if (arch_needs_pgtable_deposit())
 			zap_deposited_table(tlb->mm, pmd);
-		spin_unlock(ptl);
-		return true;
+		goto out;
 	}
 	if (is_huge_zero_pmd(orig_pmd)) {
 		if (!vma_is_dax(vma) || arch_needs_pgtable_deposit())
 			zap_deposited_table(tlb->mm, pmd);
-		spin_unlock(ptl);
-		return true;
+		goto out;
 	}
 
 	if (pmd_present(orig_pmd)) {
 		struct page *page = pmd_page(orig_pmd);
 
+		flush_needed = true;
 		folio = page_folio(page);
 		folio_remove_rmap_pmd(folio, page, vma);
 		WARN_ON_ONCE(folio_mapcount(folio) < 0);
@@ -2381,14 +2380,12 @@ bool zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 		const softleaf_t entry = softleaf_from_pmd(orig_pmd);
 
 		folio = softleaf_to_folio(entry);
-		flush_needed = false;
 
 		if (!thp_migration_supported())
 			WARN_ONCE(1, "Non present huge pmd without pmd migration enabled!");
 	} else {
 		WARN_ON_ONCE(true);
-		spin_unlock(ptl);
-		return true;
+		goto out;
 	}
 
 	if (folio_test_anon(folio)) {
@@ -2415,10 +2412,10 @@ bool zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 		folio_put(folio);
 	}
 
+out:
 	spin_unlock(ptl);
 	if (flush_needed)
 		tlb_remove_page_size(tlb, &folio->page, HPAGE_PMD_SIZE);
-
 	return true;
 }
 
-- 
2.53.0




* [PATCH v3 06/13] mm/huge_memory: remove unnecessary VM_BUG_ON_PAGE()
  2026-03-20 18:14 [PATCH v3 00/13] mm/huge_memory: refactor zap_huge_pmd() Lorenzo Stoakes (Oracle)
                   ` (4 preceding siblings ...)
  2026-03-20 18:07 ` [PATCH v3 05/13] mm/huge_memory: add a common exit path to zap_huge_pmd() Lorenzo Stoakes (Oracle)
@ 2026-03-20 18:07 ` Lorenzo Stoakes (Oracle)
  2026-03-20 18:07 ` [PATCH v3 07/13] mm/huge_memory: deduplicate zap deposited table call Lorenzo Stoakes (Oracle)
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 24+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-20 18:07 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Zi Yan, Baolin Wang, Liam R . Howlett,
	Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	Kiryl Shutsemau, linux-mm, linux-kernel

This has been around since the beginnings of the THP implementation.  I
think we can safely assume that, if we have a THP folio, it will have a
head page.

Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
---
 mm/huge_memory.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index b9d9acfef147..4add863cd18f 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2375,7 +2375,6 @@ bool zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 		folio = page_folio(page);
 		folio_remove_rmap_pmd(folio, page, vma);
 		WARN_ON_ONCE(folio_mapcount(folio) < 0);
-		VM_BUG_ON_PAGE(!PageHead(page), page);
 	} else if (pmd_is_valid_softleaf(orig_pmd)) {
 		const softleaf_t entry = softleaf_from_pmd(orig_pmd);
 
-- 
2.53.0




* [PATCH v3 07/13] mm/huge_memory: deduplicate zap deposited table call
  2026-03-20 18:14 [PATCH v3 00/13] mm/huge_memory: refactor zap_huge_pmd() Lorenzo Stoakes (Oracle)
                   ` (5 preceding siblings ...)
  2026-03-20 18:07 ` [PATCH v3 06/13] mm/huge_memory: remove unnecessary VM_BUG_ON_PAGE() Lorenzo Stoakes (Oracle)
@ 2026-03-20 18:07 ` Lorenzo Stoakes (Oracle)
  2026-03-21  5:39   ` Baolin Wang
  2026-03-20 18:07 ` [PATCH v3 08/13] mm/huge_memory: remove unnecessary sanity checks Lorenzo Stoakes (Oracle)
                   ` (7 subsequent siblings)
  14 siblings, 1 reply; 24+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-20 18:07 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Zi Yan, Baolin Wang, Liam R . Howlett,
	Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	Kiryl Shutsemau, linux-mm, linux-kernel

Rather than having separate logic for each case determining whether to zap
the deposited table, simply track this via a boolean.

We default this to whether the architecture requires it, and update it as
required elsewhere.
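
The boolean-tracking shape, sketched as a userspace toy (names
hypothetical; arch_needs plays the role of arch_needs_pgtable_deposit()):

```c
#include <assert.h>
#include <stdbool.h>

static int zap_table_calls;

static void zap_deposited_table_model(void)
{
	zap_table_calls++;
}

/*
 * Branches only flip the flag; the single call site lives at the common
 * exit, replacing three separate zap_deposited_table() invocations.
 */
static void zap_deposit_model(bool arch_needs, bool huge_zero_non_dax,
			      bool anon_folio)
{
	bool has_deposit = arch_needs;

	if (huge_zero_non_dax)
		has_deposit = true;
	if (anon_folio)
		has_deposit = true;

	if (has_deposit)
		zap_deposited_table_model();
}
```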

Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
---
 mm/huge_memory.c | 22 ++++++++++------------
 1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 4add863cd18f..fca44aec6022 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2337,6 +2337,7 @@ static inline void zap_deposited_table(struct mm_struct *mm, pmd_t *pmd)
 bool zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 		 pmd_t *pmd, unsigned long addr)
 {
+	bool has_deposit = arch_needs_pgtable_deposit();
 	struct folio *folio = NULL;
 	bool flush_needed = false;
 	spinlock_t *ptl;
@@ -2357,23 +2358,19 @@ bool zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 						tlb->fullmm);
 	arch_check_zapped_pmd(vma, orig_pmd);
 	tlb_remove_pmd_tlb_entry(tlb, pmd, addr);
-	if (vma_is_special_huge(vma)) {
-		if (arch_needs_pgtable_deposit())
-			zap_deposited_table(tlb->mm, pmd);
+	if (vma_is_special_huge(vma))
 		goto out;
-	}
 	if (is_huge_zero_pmd(orig_pmd)) {
-		if (!vma_is_dax(vma) || arch_needs_pgtable_deposit())
-			zap_deposited_table(tlb->mm, pmd);
+		if (!vma_is_dax(vma))
+			has_deposit = true;
 		goto out;
 	}
 
 	if (pmd_present(orig_pmd)) {
-		struct page *page = pmd_page(orig_pmd);
+		folio = pmd_folio(orig_pmd);
 
 		flush_needed = true;
-		folio = page_folio(page);
-		folio_remove_rmap_pmd(folio, page, vma);
+		folio_remove_rmap_pmd(folio, &folio->page, vma);
 		WARN_ON_ONCE(folio_mapcount(folio) < 0);
 	} else if (pmd_is_valid_softleaf(orig_pmd)) {
 		const softleaf_t entry = softleaf_from_pmd(orig_pmd);
@@ -2388,11 +2385,9 @@ bool zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 	}
 
 	if (folio_test_anon(folio)) {
-		zap_deposited_table(tlb->mm, pmd);
+		has_deposit = true;
 		add_mm_counter(tlb->mm, MM_ANONPAGES, -HPAGE_PMD_NR);
 	} else {
-		if (arch_needs_pgtable_deposit())
-			zap_deposited_table(tlb->mm, pmd);
 		add_mm_counter(tlb->mm, mm_counter_file(folio),
 			       -HPAGE_PMD_NR);
 
@@ -2412,6 +2407,9 @@ bool zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 	}
 
 out:
+	if (has_deposit)
+		zap_deposited_table(tlb->mm, pmd);
+
 	spin_unlock(ptl);
 	if (flush_needed)
 		tlb_remove_page_size(tlb, &folio->page, HPAGE_PMD_SIZE);
-- 
2.53.0




* [PATCH v3 08/13] mm/huge_memory: remove unnecessary sanity checks
  2026-03-20 18:14 [PATCH v3 00/13] mm/huge_memory: refactor zap_huge_pmd() Lorenzo Stoakes (Oracle)
                   ` (6 preceding siblings ...)
  2026-03-20 18:07 ` [PATCH v3 07/13] mm/huge_memory: deduplicate zap deposited table call Lorenzo Stoakes (Oracle)
@ 2026-03-20 18:07 ` Lorenzo Stoakes (Oracle)
  2026-03-20 18:07 ` [PATCH v3 09/13] mm/huge_memory: use mm instead of tlb->mm Lorenzo Stoakes (Oracle)
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 24+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-20 18:07 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Zi Yan, Baolin Wang, Liam R . Howlett,
	Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	Kiryl Shutsemau, linux-mm, linux-kernel

These checks have been in place since 2014; I think we can safely assume
that we are at a point where we don't need them as runtime checks.

In addition there are 4 other invocations of folio_remove_rmap_pmd(), none
of which make this assertion.

If we need to add this assertion, it should be in folio_remove_rmap_pmd(),
and as a VM_WARN_ON_ONCE(), however these seem superfluous so just remove
them.

Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
---
 mm/huge_memory.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index fca44aec6022..c5b16c218900 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2371,7 +2371,6 @@ bool zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 
 		flush_needed = true;
 		folio_remove_rmap_pmd(folio, &folio->page, vma);
-		WARN_ON_ONCE(folio_mapcount(folio) < 0);
 	} else if (pmd_is_valid_softleaf(orig_pmd)) {
 		const softleaf_t entry = softleaf_from_pmd(orig_pmd);
 
@@ -2402,7 +2401,6 @@ bool zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 
 	if (folio_is_device_private(folio)) {
 		folio_remove_rmap_pmd(folio, &folio->page, vma);
-		WARN_ON_ONCE(folio_mapcount(folio) < 0);
 		folio_put(folio);
 	}
 
-- 
2.53.0




* [PATCH v3 09/13] mm/huge_memory: use mm instead of tlb->mm
  2026-03-20 18:14 [PATCH v3 00/13] mm/huge_memory: refactor zap_huge_pmd() Lorenzo Stoakes (Oracle)
                   ` (7 preceding siblings ...)
  2026-03-20 18:07 ` [PATCH v3 08/13] mm/huge_memory: remove unnecessary sanity checks Lorenzo Stoakes (Oracle)
@ 2026-03-20 18:07 ` Lorenzo Stoakes (Oracle)
  2026-03-21  5:42   ` Baolin Wang
  2026-03-20 18:07 ` [PATCH v3 10/13] mm/huge_memory: separate out the folio part of zap_huge_pmd() Lorenzo Stoakes (Oracle)
                   ` (5 subsequent siblings)
  14 siblings, 1 reply; 24+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-20 18:07 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Zi Yan, Baolin Wang, Liam R . Howlett,
	Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	Kiryl Shutsemau, linux-mm, linux-kernel

Reduce the repetition, and lay the ground for further refactorings by
keeping this variable separate.

Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
---
 mm/huge_memory.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index c5b16c218900..673d0c4734ad 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2338,6 +2338,7 @@ bool zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 		 pmd_t *pmd, unsigned long addr)
 {
 	bool has_deposit = arch_needs_pgtable_deposit();
+	struct mm_struct *mm = tlb->mm;
 	struct folio *folio = NULL;
 	bool flush_needed = false;
 	spinlock_t *ptl;
@@ -2385,9 +2386,9 @@ bool zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 
 	if (folio_test_anon(folio)) {
 		has_deposit = true;
-		add_mm_counter(tlb->mm, MM_ANONPAGES, -HPAGE_PMD_NR);
+		add_mm_counter(mm, MM_ANONPAGES, -HPAGE_PMD_NR);
 	} else {
-		add_mm_counter(tlb->mm, mm_counter_file(folio),
+		add_mm_counter(mm, mm_counter_file(folio),
 			       -HPAGE_PMD_NR);
 
 		/*
@@ -2406,7 +2407,7 @@ bool zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 
 out:
 	if (has_deposit)
-		zap_deposited_table(tlb->mm, pmd);
+		zap_deposited_table(mm, pmd);
 
 	spin_unlock(ptl);
 	if (flush_needed)
-- 
2.53.0




* [PATCH v3 10/13] mm/huge_memory: separate out the folio part of zap_huge_pmd()
  2026-03-20 18:14 [PATCH v3 00/13] mm/huge_memory: refactor zap_huge_pmd() Lorenzo Stoakes (Oracle)
                   ` (8 preceding siblings ...)
  2026-03-20 18:07 ` [PATCH v3 09/13] mm/huge_memory: use mm instead of tlb->mm Lorenzo Stoakes (Oracle)
@ 2026-03-20 18:07 ` Lorenzo Stoakes (Oracle)
  2026-03-21  5:59   ` Baolin Wang
  2026-03-20 18:07 ` [PATCH v3 11/13] mm: add softleaf_is_valid_pmd_entry(), pmd_to_softleaf_folio() Lorenzo Stoakes (Oracle)
                   ` (4 subsequent siblings)
  14 siblings, 1 reply; 24+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-20 18:07 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Zi Yan, Baolin Wang, Liam R . Howlett,
	Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	Kiryl Shutsemau, linux-mm, linux-kernel

Place the part of the logic that manipulates counters and possibly updates
the accessed bit of the folio into its own function to make zap_huge_pmd()
more readable.

Also rename flush_needed to is_present as we only require a flush for
present entries.

Additionally, add comments explaining why we handle softleaf entries the
way we do.

This also lays the groundwork for further refactoring.

Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
---
 mm/huge_memory.c | 61 +++++++++++++++++++++++++++---------------------
 1 file changed, 35 insertions(+), 26 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 673d0c4734ad..9ddf38d68406 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2325,6 +2325,37 @@ static inline void zap_deposited_table(struct mm_struct *mm, pmd_t *pmd)
 	mm_dec_nr_ptes(mm);
 }
 
+static void zap_huge_pmd_folio(struct mm_struct *mm, struct vm_area_struct *vma,
+		pmd_t pmdval, struct folio *folio, bool is_present,
+		bool *has_deposit)
+{
+	const bool is_device_private = folio_is_device_private(folio);
+
+	/* Present and device private folios are rmappable. */
+	if (is_present || is_device_private)
+		folio_remove_rmap_pmd(folio, &folio->page, vma);
+
+	if (folio_test_anon(folio)) {
+		*has_deposit = true;
+		add_mm_counter(mm, MM_ANONPAGES, -HPAGE_PMD_NR);
+	} else {
+		add_mm_counter(mm, mm_counter_file(folio),
+			       -HPAGE_PMD_NR);
+
+		/*
+		 * Use flush_needed to indicate whether the PMD entry
+		 * is present, instead of checking pmd_present() again.
+		 */
+		if (is_present && pmd_young(pmdval) &&
+		    likely(vma_has_recency(vma)))
+			folio_mark_accessed(folio);
+	}
+
+	/* Device private folios are pinned. */
+	if (is_device_private)
+		folio_put(folio);
+}
+
 /**
  * zap_huge_pmd - Zap a huge THP which is of PMD size.
  * @tlb: The MMU gather TLB state associated with the operation.
@@ -2340,7 +2371,7 @@ bool zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 	bool has_deposit = arch_needs_pgtable_deposit();
 	struct mm_struct *mm = tlb->mm;
 	struct folio *folio = NULL;
-	bool flush_needed = false;
+	bool is_present = false;
 	spinlock_t *ptl;
 	pmd_t orig_pmd;
 
@@ -2369,14 +2400,11 @@ bool zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 
 	if (pmd_present(orig_pmd)) {
 		folio = pmd_folio(orig_pmd);
-
-		flush_needed = true;
-		folio_remove_rmap_pmd(folio, &folio->page, vma);
+		is_present = true;
 	} else if (pmd_is_valid_softleaf(orig_pmd)) {
 		const softleaf_t entry = softleaf_from_pmd(orig_pmd);
 
 		folio = softleaf_to_folio(entry);
-
 		if (!thp_migration_supported())
 			WARN_ONCE(1, "Non present huge pmd without pmd migration enabled!");
 	} else {
@@ -2384,33 +2412,14 @@ bool zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 		goto out;
 	}
 
-	if (folio_test_anon(folio)) {
-		has_deposit = true;
-		add_mm_counter(mm, MM_ANONPAGES, -HPAGE_PMD_NR);
-	} else {
-		add_mm_counter(mm, mm_counter_file(folio),
-			       -HPAGE_PMD_NR);
-
-		/*
-		 * Use flush_needed to indicate whether the PMD entry
-		 * is present, instead of checking pmd_present() again.
-		 */
-		if (flush_needed && pmd_young(orig_pmd) &&
-		    likely(vma_has_recency(vma)))
-			folio_mark_accessed(folio);
-	}
-
-	if (folio_is_device_private(folio)) {
-		folio_remove_rmap_pmd(folio, &folio->page, vma);
-		folio_put(folio);
-	}
+	zap_huge_pmd_folio(mm, vma, orig_pmd, folio, is_present, &has_deposit);
 
 out:
 	if (has_deposit)
 		zap_deposited_table(mm, pmd);
 
 	spin_unlock(ptl);
-	if (flush_needed)
+	if (is_present)
 		tlb_remove_page_size(tlb, &folio->page, HPAGE_PMD_SIZE);
 	return true;
 }
-- 
2.53.0
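[Editorial sketch] The counter and refcount handling that the extracted helper covers can be modelled as a small standalone C program. This is a toy with stand-in types and an illustrative HPAGE_PMD_NR value, not kernel code: it mirrors the shape of zap_huge_pmd_folio() (anon vs. file counter, device-private reference drop), with recency marking elided.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for kernel types -- purely illustrative. */
struct folio { bool anon; bool device_private; int refcount; };
struct mm_counters { long anon_pages; long file_pages; };

#define HPAGE_PMD_NR 512 /* e.g. a 2M PMD of 4K pages */

/*
 * Mirrors the control flow of the extracted zap_huge_pmd_folio():
 * decrement the counter matching the folio type, and drop the extra
 * reference that pins device private folios.
 */
static void zap_folio_model(struct mm_counters *mm, struct folio *folio)
{
	if (folio->anon)
		mm->anon_pages -= HPAGE_PMD_NR;
	else
		mm->file_pages -= HPAGE_PMD_NR;

	if (folio->device_private)
		folio->refcount--; /* device private folios are pinned */
}
```

Factoring this logic out leaves zap_huge_pmd() itself with only the entry decoding, deposit, and TLB concerns.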



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v3 11/13] mm: add softleaf_is_valid_pmd_entry(), pmd_to_softleaf_folio()
  2026-03-20 18:14 [PATCH v3 00/13] mm/huge_memory: refactor zap_huge_pmd() Lorenzo Stoakes (Oracle)
                   ` (9 preceding siblings ...)
  2026-03-20 18:07 ` [PATCH v3 10/13] mm/huge_memory: separate out the folio part of zap_huge_pmd() Lorenzo Stoakes (Oracle)
@ 2026-03-20 18:07 ` Lorenzo Stoakes (Oracle)
  2026-03-20 18:07 ` [PATCH v3 12/13] mm/huge_memory: add and use normal_or_softleaf_folio_pmd() Lorenzo Stoakes (Oracle)
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 24+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-20 18:07 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Zi Yan, Baolin Wang, Liam R . Howlett,
	Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	Kiryl Shutsemau, linux-mm, linux-kernel

Split pmd_is_valid_softleaf() into separate components, then use the new
softleaf_is_valid_pmd_entry() predicate to implement pmd_to_softleaf_folio().

This returns the folio associated with a softleaf entry at PMD level, and
expects the entry to be a valid PMD softleaf entry.

If the entry is invalid, assert if CONFIG_DEBUG_VM is set, and either way
return NULL.

This lays the groundwork for further refactoring.

Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
---
 include/linux/leafops.h | 39 +++++++++++++++++++++++++++++++++++----
 1 file changed, 35 insertions(+), 4 deletions(-)

diff --git a/include/linux/leafops.h b/include/linux/leafops.h
index dd4130b7cb7f..65957283fa9f 100644
--- a/include/linux/leafops.h
+++ b/include/linux/leafops.h
@@ -603,7 +603,20 @@ static inline bool pmd_is_migration_entry(pmd_t pmd)
 }
 
 /**
- * pmd_is_valid_softleaf() - Is this PMD entry a valid leaf entry?
+ * softleaf_is_valid_pmd_entry() - Is the specified softleaf entry obtained from
+ * a PMD one that we support at PMD level?
+ * @entry: Entry to check.
+ * Returns: true if the softleaf entry is valid at PMD, otherwise false.
+ */
+static inline bool softleaf_is_valid_pmd_entry(softleaf_t entry)
+{
+	/* Only device private, migration entries valid for PMD. */
+	return softleaf_is_device_private(entry) ||
+		softleaf_is_migration(entry);
+}
+
+/**
+ * pmd_is_valid_softleaf() - Is this PMD entry a valid softleaf entry?
  * @pmd: PMD entry.
  *
  * PMD leaf entries are valid only if they are device private or migration
@@ -616,9 +629,27 @@ static inline bool pmd_is_valid_softleaf(pmd_t pmd)
 {
 	const softleaf_t entry = softleaf_from_pmd(pmd);
 
-	/* Only device private, migration entries valid for PMD. */
-	return softleaf_is_device_private(entry) ||
-		softleaf_is_migration(entry);
+	return softleaf_is_valid_pmd_entry(entry);
+}
+
+/**
+ * pmd_to_softleaf_folio() - Convert the PMD entry to a folio.
+ * @pmd: PMD entry.
+ *
+ * The PMD entry is expected to be a valid PMD softleaf entry.
+ *
+ * Returns: the folio the softleaf entry references if this is a valid softleaf
+ * entry, otherwise NULL.
+ */
+static inline struct folio *pmd_to_softleaf_folio(pmd_t pmd)
+{
+	const softleaf_t entry = softleaf_from_pmd(pmd);
+
+	if (!softleaf_is_valid_pmd_entry(entry)) {
+		VM_WARN_ON_ONCE(true);
+		return NULL;
+	}
+	return softleaf_to_folio(entry);
 }
 
 #endif  /* CONFIG_MMU */
-- 
2.53.0
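[Editorial sketch] The predicate split plus the warn-and-return-NULL pattern can be illustrated with a toy model in plain C. The types here are stand-ins, not the kernel's softleaf_t, and the names are invented for illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy softleaf entry -- stand-in types, not the kernel's. */
enum leaf_kind { LEAF_MIGRATION, LEAF_DEVICE_PRIVATE, LEAF_SWAP };

struct softleaf { enum leaf_kind kind; int folio_id; };

/* Component predicate, analogous to softleaf_is_valid_pmd_entry(). */
static bool leaf_valid_for_pmd(const struct softleaf *entry)
{
	/* Only device private and migration entries are valid at PMD level. */
	return entry->kind == LEAF_DEVICE_PRIVATE ||
	       entry->kind == LEAF_MIGRATION;
}

/*
 * Analogous to pmd_to_softleaf_folio(): validate first, and return NULL
 * rather than decoding an entry we don't support (the kernel version also
 * fires VM_WARN_ON_ONCE() under CONFIG_DEBUG_VM).
 */
static const int *leaf_folio_or_null(const struct softleaf *entry)
{
	if (!leaf_valid_for_pmd(entry))
		return NULL;
	return &entry->folio_id;
}
```

Keeping validation inside the accessor means every caller gets the same NULL-on-garbage behaviour instead of each open-coding the check.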



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v3 12/13] mm/huge_memory: add and use normal_or_softleaf_folio_pmd()
  2026-03-20 18:14 [PATCH v3 00/13] mm/huge_memory: refactor zap_huge_pmd() Lorenzo Stoakes (Oracle)
                   ` (10 preceding siblings ...)
  2026-03-20 18:07 ` [PATCH v3 11/13] mm: add softleaf_is_valid_pmd_entry(), pmd_to_softleaf_folio() Lorenzo Stoakes (Oracle)
@ 2026-03-20 18:07 ` Lorenzo Stoakes (Oracle)
  2026-03-23 11:24   ` Lorenzo Stoakes (Oracle)
  2026-03-20 18:07 ` [PATCH v3 13/13] mm/huge_memory: add and use has_deposited_pgtable() Lorenzo Stoakes (Oracle)
                   ` (2 subsequent siblings)
  14 siblings, 1 reply; 24+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-20 18:07 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Zi Yan, Baolin Wang, Liam R . Howlett,
	Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	Kiryl Shutsemau, linux-mm, linux-kernel

Now that pmd_to_softleaf_folio() is available to us (which also raises a
CONFIG_DEBUG_VM warning if it unexpectedly encounters an invalid softleaf
entry), we can abstract the folio handling altogether.

vm_normal_folio_pmd() deals with the huge zero page (which is present), as
well as PFN map/mixed map mappings, in both cases returning NULL.

Otherwise, we try to obtain the softleaf folio.

This makes the logic far easier to comprehend and has it use the standard
vm_normal_folio_pmd() path for decoding of present entries.

Finally, we have to update the flushing logic to only do so if a folio is
established.

This patch also makes the 'is_present' value more accurate - PFN map, mixed
map and huge zero page entries are present, just not 'normal'.

Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
---
 mm/huge_memory.c | 47 +++++++++++++++++++----------------------------
 1 file changed, 19 insertions(+), 28 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 9ddf38d68406..5831966391bd 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2342,10 +2342,6 @@ static void zap_huge_pmd_folio(struct mm_struct *mm, struct vm_area_struct *vma,
 		add_mm_counter(mm, mm_counter_file(folio),
 			       -HPAGE_PMD_NR);
 
-		/*
-		 * Use flush_needed to indicate whether the PMD entry
-		 * is present, instead of checking pmd_present() again.
-		 */
 		if (is_present && pmd_young(pmdval) &&
 		    likely(vma_has_recency(vma)))
 			folio_mark_accessed(folio);
@@ -2356,6 +2352,17 @@ static void zap_huge_pmd_folio(struct mm_struct *mm, struct vm_area_struct *vma,
 		folio_put(folio);
 }
 
+static struct folio *normal_or_softleaf_folio_pmd(struct vm_area_struct *vma,
+		unsigned long addr, pmd_t pmdval, bool is_present)
+{
+	if (is_present)
+		return vm_normal_folio_pmd(vma, addr, pmdval);
+
+	if (!thp_migration_supported())
+		WARN_ONCE(1, "Non present huge pmd without pmd migration enabled!");
+	return pmd_to_softleaf_folio(pmdval);
+}
+
 /**
  * zap_huge_pmd - Zap a huge THP which is of PMD size.
  * @tlb: The MMU gather TLB state associated with the operation.
@@ -2390,36 +2397,20 @@ bool zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 						tlb->fullmm);
 	arch_check_zapped_pmd(vma, orig_pmd);
 	tlb_remove_pmd_tlb_entry(tlb, pmd, addr);
-	if (vma_is_special_huge(vma))
-		goto out;
-	if (is_huge_zero_pmd(orig_pmd)) {
-		if (!vma_is_dax(vma))
-			has_deposit = true;
-		goto out;
-	}
 
-	if (pmd_present(orig_pmd)) {
-		folio = pmd_folio(orig_pmd);
-		is_present = true;
-	} else if (pmd_is_valid_softleaf(orig_pmd)) {
-		const softleaf_t entry = softleaf_from_pmd(orig_pmd);
+	is_present = pmd_present(orig_pmd);
+	folio = normal_or_softleaf_folio_pmd(vma, addr, orig_pmd, is_present);
+	if (folio)
+		zap_huge_pmd_folio(mm, vma, orig_pmd, folio, is_present,
+				   &has_deposit);
+	else if (is_huge_zero_pmd(orig_pmd))
+		has_deposit = !vma_is_dax(vma);
 
-		folio = softleaf_to_folio(entry);
-		if (!thp_migration_supported())
-			WARN_ONCE(1, "Non present huge pmd without pmd migration enabled!");
-	} else {
-		WARN_ON_ONCE(true);
-		goto out;
-	}
-
-	zap_huge_pmd_folio(mm, vma, orig_pmd, folio, is_present, &has_deposit);
-
-out:
 	if (has_deposit)
 		zap_deposited_table(mm, pmd);
 
 	spin_unlock(ptl);
-	if (is_present)
+	if (is_present && folio)
 		tlb_remove_page_size(tlb, &folio->page, HPAGE_PMD_SIZE);
 	return true;
 }
-- 
2.53.0
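[Editorial sketch] The single dispatch on presence that normal_or_softleaf_folio_pmd() introduces can be modelled in a few lines of plain C. Entry kinds and lookup functions below are invented stand-ins; the point is that both lookup paths return NULL for entries with no retrievable folio, so the caller only needs one NULL check:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy entry kinds -- stand-ins, not kernel state. */
enum pmd_kind {
	PMD_NORMAL_FOLIO, /* present, backed by an ordinary folio */
	PMD_SPECIAL,      /* present but PFN/mixed map or huge zero page */
	PMD_MIGRATION,    /* non-present, valid softleaf entry */
	PMD_BOGUS,        /* non-present, invalid entry */
};

static int normal_folio = 1, softleaf_folio = 2;

/* vm_normal_folio_pmd() analogue: NULL for "special" present entries. */
static int *normal_lookup(enum pmd_kind k)
{
	return k == PMD_NORMAL_FOLIO ? &normal_folio : NULL;
}

/* pmd_to_softleaf_folio() analogue: NULL for unsupported entries. */
static int *softleaf_lookup(enum pmd_kind k)
{
	return k == PMD_MIGRATION ? &softleaf_folio : NULL;
}

/* Mirrors normal_or_softleaf_folio_pmd(): one dispatch on presence. */
static int *folio_for_pmd(enum pmd_kind k, bool is_present)
{
	return is_present ? normal_lookup(k) : softleaf_lookup(k);
}
```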



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v3 13/13] mm/huge_memory: add and use has_deposited_pgtable()
  2026-03-20 18:14 [PATCH v3 00/13] mm/huge_memory: refactor zap_huge_pmd() Lorenzo Stoakes (Oracle)
                   ` (11 preceding siblings ...)
  2026-03-20 18:07 ` [PATCH v3 12/13] mm/huge_memory: add and use normal_or_softleaf_folio_pmd() Lorenzo Stoakes (Oracle)
@ 2026-03-20 18:07 ` Lorenzo Stoakes (Oracle)
  2026-03-23 11:45   ` Lorenzo Stoakes (Oracle)
  2026-03-20 18:42 ` [PATCH v3 00/13] mm/huge_memory: refactor zap_huge_pmd() Andrew Morton
  2026-03-23 12:08 ` Lorenzo Stoakes (Oracle)
  14 siblings, 1 reply; 24+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-20 18:07 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Zi Yan, Baolin Wang, Liam R . Howlett,
	Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	Kiryl Shutsemau, linux-mm, linux-kernel

Rather than threading has_deposit through zap_huge_pmd(), make things
clearer by adding has_deposited_pgtable(), with comments describing the
reasoning in each case.

Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
---
 mm/huge_memory.c | 33 ++++++++++++++++++++++++---------
 1 file changed, 24 insertions(+), 9 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 5831966391bd..610a6184e92c 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2326,8 +2326,7 @@ static inline void zap_deposited_table(struct mm_struct *mm, pmd_t *pmd)
 }
 
 static void zap_huge_pmd_folio(struct mm_struct *mm, struct vm_area_struct *vma,
-		pmd_t pmdval, struct folio *folio, bool is_present,
-		bool *has_deposit)
+		pmd_t pmdval, struct folio *folio, bool is_present)
 {
 	const bool is_device_private = folio_is_device_private(folio);
 
@@ -2336,7 +2335,6 @@ static void zap_huge_pmd_folio(struct mm_struct *mm, struct vm_area_struct *vma,
 		folio_remove_rmap_pmd(folio, &folio->page, vma);
 
 	if (folio_test_anon(folio)) {
-		*has_deposit = true;
 		add_mm_counter(mm, MM_ANONPAGES, -HPAGE_PMD_NR);
 	} else {
 		add_mm_counter(mm, mm_counter_file(folio),
@@ -2363,6 +2361,27 @@ static struct folio *normal_or_softleaf_folio_pmd(struct vm_area_struct *vma,
 	return pmd_to_softleaf_folio(pmdval);
 }
 
+static bool has_deposited_pgtable(struct vm_area_struct *vma, pmd_t pmdval,
+		struct folio *folio)
+{
+	/* Some architectures require unconditional depositing. */
+	if (arch_needs_pgtable_deposit())
+		return true;
+
+	/*
+	 * Huge zero always deposited except for DAX which handles itself, see
+	 * set_huge_zero_folio().
+	 */
+	if (is_huge_zero_pmd(pmdval))
+		return !vma_is_dax(vma);
+
+	/*
+	 * Otherwise, only anonymous folios are deposited, see
+	 * __do_huge_pmd_anonymous_page().
+	 */
+	return folio && folio_test_anon(folio);
+}
+
 /**
  * zap_huge_pmd - Zap a huge THP which is of PMD size.
  * @tlb: The MMU gather TLB state associated with the operation.
@@ -2375,7 +2394,6 @@ static struct folio *normal_or_softleaf_folio_pmd(struct vm_area_struct *vma,
 bool zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 		 pmd_t *pmd, unsigned long addr)
 {
-	bool has_deposit = arch_needs_pgtable_deposit();
 	struct mm_struct *mm = tlb->mm;
 	struct folio *folio = NULL;
 	bool is_present = false;
@@ -2401,12 +2419,9 @@ bool zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 	is_present = pmd_present(orig_pmd);
 	folio = normal_or_softleaf_folio_pmd(vma, addr, orig_pmd, is_present);
 	if (folio)
-		zap_huge_pmd_folio(mm, vma, orig_pmd, folio, is_present,
-				   &has_deposit);
-	else if (is_huge_zero_pmd(orig_pmd))
-		has_deposit = !vma_is_dax(vma);
+		zap_huge_pmd_folio(mm, vma, orig_pmd, folio, is_present);
 
-	if (has_deposit)
+	if (has_deposited_pgtable(vma, orig_pmd, folio))
 		zap_deposited_table(mm, pmd);
 
 	spin_unlock(ptl);
-- 
2.53.0
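[Editorial sketch] The deposit decision that has_deposited_pgtable() centralises is a three-way table, which a toy model makes easy to see at a glance. The struct below is a stand-in bundling the inputs the kernel function reads from the VMA, PMD value and folio:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy decision table (not kernel code) mirroring has_deposited_pgtable():
 * an arch-mandated deposit always wins; the huge zero page deposits unless
 * the VMA is DAX; otherwise only anonymous folios carry a deposit.
 */
struct zap_state {
	bool arch_needs_deposit;
	bool is_huge_zero;
	bool vma_is_dax;
	bool has_folio;
	bool folio_is_anon;
};

static bool has_deposit_model(const struct zap_state *s)
{
	if (s->arch_needs_deposit)
		return true;
	if (s->is_huge_zero)
		return !s->vma_is_dax;
	return s->has_folio && s->folio_is_anon;
}
```

Because each branch is now a documented predicate rather than scattered boolean updates, a new case only needs one new branch here.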



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v3 00/13] mm/huge_memory: refactor zap_huge_pmd()
@ 2026-03-20 18:14 Lorenzo Stoakes (Oracle)
  2026-03-20 18:07 ` [PATCH v3 01/13] mm/huge_memory: simplify vma_is_special_huge() Lorenzo Stoakes (Oracle)
                   ` (14 more replies)
  0 siblings, 15 replies; 24+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-20 18:14 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Zi Yan, Baolin Wang, Liam R . Howlett,
	Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	Kiryl Shutsemau, linux-mm, linux-kernel

The zap_huge_pmd() function is overly complicated, clean it up and also add
an assert in the case that we encounter a buggy PMD entry that doesn't
match expectations.

This is motivated by a bug discovered [0] where the PMD entry was none of:

* A non-DAX, PFN or mixed map.
* The huge zero folio
* A present PMD entry
* A softleaf entry

In zap_huge_pmd(), but due to the bug we managed to reach this code.

It is useful to explicitly call this out rather than have an arbitrary NULL
pointer dereference happen, which also improves understanding of what's
going on.

The series goes further to make use of vm_normal_folio_pmd() rather than
implementing custom logic for retrieving the folio, and extends softleaf
functionality to provide and use an equivalent softleaf function.

[0]: https://lore.kernel.org/all/6b3d7ad7-49e1-407a-903d-3103704160d8@lucifer.local/


v3:
* Propagated tags, thanks everybody!
* Fixed const vma parameter in vma_is_special_huge() in 1/13 as per
  Sashiko.
* Renamed needs_deposit -> has_deposit as per Kiryl, better describing the
  situation as we're zapping deposited tables, not depositing them.
* Initialised has_deposit to arch_needs_pgtable_deposit(), and updated huge
  zero page case to account for that as per Kiryl.
* Dropped separated logic approach as per Baolin.
* Added 'No functional change intended.' caveats.
* Removed seemingly superfluous, inconsistent post-folio_remove_rmap_pmd()
  mapcount sanity checks.
* De-duplicated tlb->mm's.
* Separated folio-specific logic into another function.
* Added softleaf_is_valid_pmd_entry(), pmd_to_softleaf_folio() functions.
* Add and use normal_or_softleaf_folio_pmd() to make use of
  vm_normal_folio_pmd() and pmd_to_softleaf_folio() for obtaining the
  folio.
* Add and use has_deposited_pgtable() to figure out deposits.
* Added a bunch of explanatory comments as per Baolin.

v2:
* Added tags thanks everybody!
* Fixed issue with returning false on bug case potentially looping forever as
  per Baolin.
* Fixed further issue in bug path in 5/8 with double pte unlock.
* Add patch to use vm_normal_folio_pmd() as per David.
https://lore.kernel.org/all/cover.1773924928.git.ljs@kernel.org/
v1:
https://lore.kernel.org/all/cover.1773865827.git.ljs@kernel.org/

Lorenzo Stoakes (Oracle) (13):
  mm/huge_memory: simplify vma_is_special_huge()
  mm/huge: avoid big else branch in zap_huge_pmd()
  mm/huge_memory: have zap_huge_pmd return a boolean, add kdoc
  mm/huge_memory: handle buggy PMD entry in zap_huge_pmd()
  mm/huge_memory: add a common exit path to zap_huge_pmd()
  mm/huge_memory: remove unnecessary VM_BUG_ON_PAGE()
  mm/huge_memory: deduplicate zap deposited table call
  mm/huge_memory: remove unnecessary sanity checks
  mm/huge_memory: use mm instead of tlb->mm
  mm/huge_memory: separate out the folio part of zap_huge_pmd()
  mm: add softleaf_is_valid_pmd_entry(), pmd_to_softleaf_folio()
  mm/huge_memory: add and use normal_or_softleaf_folio_pmd()
  mm/huge_memory: add and use has_deposited_pgtable()

 include/linux/huge_mm.h |   8 +-
 include/linux/leafops.h |  39 +++++++++-
 include/linux/mm.h      |  16 ----
 mm/huge_memory.c        | 168 +++++++++++++++++++++++++---------------
 4 files changed, 143 insertions(+), 88 deletions(-)

--
2.53.0


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 00/13] mm/huge_memory: refactor zap_huge_pmd()
  2026-03-20 18:14 [PATCH v3 00/13] mm/huge_memory: refactor zap_huge_pmd() Lorenzo Stoakes (Oracle)
                   ` (12 preceding siblings ...)
  2026-03-20 18:07 ` [PATCH v3 13/13] mm/huge_memory: add and use has_deposited_pgtable() Lorenzo Stoakes (Oracle)
@ 2026-03-20 18:42 ` Andrew Morton
  2026-03-23 12:08 ` Lorenzo Stoakes (Oracle)
  14 siblings, 0 replies; 24+ messages in thread
From: Andrew Morton @ 2026-03-20 18:42 UTC (permalink / raw)
  To: Lorenzo Stoakes (Oracle)
  Cc: David Hildenbrand, Zi Yan, Baolin Wang, Liam R . Howlett,
	Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	Kiryl Shutsemau, linux-mm, linux-kernel

On Fri, 20 Mar 2026 18:14:50 +0000 "Lorenzo Stoakes (Oracle)" <ljs@kernel.org> wrote:

> The zap_huge_pmd() function is overly complicated, clean it up and also add
> an assert in the case that we encounter a buggy PMD entry that doesn't
> match expectations.
> 
> This is motivated by a bug discovered [0] where the PMD entry was none of:
> 
> * A non-DAX, PFN or mixed map.
> * The huge zero folio
> * A present PMD entry
> * A softleaf entry
> 
> In zap_huge_pmd(), but due to the bug we managed to reach this code.
> 
> It is useful to explicitly call this out rather than have an arbitrary NULL
> pointer dereference happen, which also improves understanding of what's
> going on.
> 
> The series goes further to make use of vm_normal_folio_pmd() rather than
> implementing custom logic for retrieving the folio, and extends softleaf
> functionality to provide and use an equivalent softleaf function.
> 

Thanks, I updated mm-unstable to this version.

> v3:
> * Propagated tags, thanks everybody!
> * Fixed const vma parameter in vma_is_special_huge() in 1/13 as per
>   Sashiko.
> * Renamed needs_deposit -> has_deposit as per Kiryl, better describing the
>   situation as we're zapping deposited tables, not depositing them.
> * Initialised has_deposit to arch_needs_pgtable_deposit(), and updated huge
>   zero page case to account for that as per Kiryl.
> * Dropped separated logic approach as per Baolin.
> * Added 'No functional change intended.' caveats.
> * Removed seemingly superfluous, inconsistent post-folio_remove_rmap_pmd()
>   mapcount sanity checks.
> * De-duplicated tlb->mm's.
> * Separated folio-specific logic into another function.
> * Added softleaf_is_valid_pmd_entry(), pmd_to_softleaf_folio() functions.
> * Add and use normal_or_softleaf_folio_pmd() to make use of
>   vm_normal_folio_pmd() and pmd_to_softleaf_folio() for obtaining the
>   folio.
> * Add and use has_deposited_pgtable() to figure out deposits.
> * Added a bunch of explanatory comments as per Baolin.

Here's how v3 altered mm.git:



 include/linux/leafops.h |   39 +++++++++++-
 mm/huge_memory.c        |  115 ++++++++++++++++++++++----------------
 2 files changed, 102 insertions(+), 52 deletions(-)

--- a/include/linux/leafops.h~b
+++ a/include/linux/leafops.h
@@ -603,7 +603,20 @@ static inline bool pmd_is_migration_entr
 }
 
 /**
- * pmd_is_valid_softleaf() - Is this PMD entry a valid leaf entry?
+ * softleaf_is_valid_pmd_entry() - Is the specified softleaf entry obtained from
+ * a PMD one that we support at PMD level?
+ * @entry: Entry to check.
+ * Returns: true if the softleaf entry is valid at PMD, otherwise false.
+ */
+static inline bool softleaf_is_valid_pmd_entry(softleaf_t entry)
+{
+	/* Only device private, migration entries valid for PMD. */
+	return softleaf_is_device_private(entry) ||
+		softleaf_is_migration(entry);
+}
+
+/**
+ * pmd_is_valid_softleaf() - Is this PMD entry a valid softleaf entry?
  * @pmd: PMD entry.
  *
  * PMD leaf entries are valid only if they are device private or migration
@@ -616,9 +629,27 @@ static inline bool pmd_is_valid_softleaf
 {
 	const softleaf_t entry = softleaf_from_pmd(pmd);
 
-	/* Only device private, migration entries valid for PMD. */
-	return softleaf_is_device_private(entry) ||
-		softleaf_is_migration(entry);
+	return softleaf_is_valid_pmd_entry(entry);
+}
+
+/**
+ * pmd_to_softleaf_folio() - Convert the PMD entry to a folio.
+ * @pmd: PMD entry.
+ *
+ * The PMD entry is expected to be a valid PMD softleaf entry.
+ *
+ * Returns: the folio the softleaf entry references if this is a valid softleaf
+ * entry, otherwise NULL.
+ */
+static inline struct folio *pmd_to_softleaf_folio(pmd_t pmd)
+{
+	const softleaf_t entry = softleaf_from_pmd(pmd);
+
+	if (!softleaf_is_valid_pmd_entry(entry)) {
+		VM_WARN_ON_ONCE(true);
+		return NULL;
+	}
+	return softleaf_to_folio(entry);
 }
 
 #endif  /* CONFIG_MMU */
--- a/mm/huge_memory.c~b
+++ a/mm/huge_memory.c
@@ -104,7 +104,7 @@ static inline bool file_thp_enabled(stru
 }
 
 /* If returns true, we are unable to access the VMA's folios. */
-static bool vma_is_special_huge(struct vm_area_struct *vma)
+static bool vma_is_special_huge(const struct vm_area_struct *vma)
 {
 	if (vma_is_dax(vma))
 		return false;
@@ -2325,6 +2325,63 @@ static inline void zap_deposited_table(s
 	mm_dec_nr_ptes(mm);
 }
 
+static void zap_huge_pmd_folio(struct mm_struct *mm, struct vm_area_struct *vma,
+		pmd_t pmdval, struct folio *folio, bool is_present)
+{
+	const bool is_device_private = folio_is_device_private(folio);
+
+	/* Present and device private folios are rmappable. */
+	if (is_present || is_device_private)
+		folio_remove_rmap_pmd(folio, &folio->page, vma);
+
+	if (folio_test_anon(folio)) {
+		add_mm_counter(mm, MM_ANONPAGES, -HPAGE_PMD_NR);
+	} else {
+		add_mm_counter(mm, mm_counter_file(folio),
+			       -HPAGE_PMD_NR);
+
+		if (is_present && pmd_young(pmdval) &&
+		    likely(vma_has_recency(vma)))
+			folio_mark_accessed(folio);
+	}
+
+	/* Device private folios are pinned. */
+	if (is_device_private)
+		folio_put(folio);
+}
+
+static struct folio *normal_or_softleaf_folio_pmd(struct vm_area_struct *vma,
+		unsigned long addr, pmd_t pmdval, bool is_present)
+{
+	if (is_present)
+		return vm_normal_folio_pmd(vma, addr, pmdval);
+
+	if (!thp_migration_supported())
+		WARN_ONCE(1, "Non present huge pmd without pmd migration enabled!");
+	return pmd_to_softleaf_folio(pmdval);
+}
+
+static bool has_deposited_pgtable(struct vm_area_struct *vma, pmd_t pmdval,
+		struct folio *folio)
+{
+	/* Some architectures require unconditional depositing. */
+	if (arch_needs_pgtable_deposit())
+		return true;
+
+	/*
+	 * Huge zero always deposited except for DAX which handles itself, see
+	 * set_huge_zero_folio().
+	 */
+	if (is_huge_zero_pmd(pmdval))
+		return !vma_is_dax(vma);
+
+	/*
+	 * Otherwise, only anonymous folios are deposited, see
+	 * __do_huge_pmd_anonymous_page().
+	 */
+	return folio && folio_test_anon(folio);
+}
+
 /**
  * zap_huge_pmd - Zap a huge THP which is of PMD size.
  * @tlb: The MMU gather TLB state associated with the operation.
@@ -2337,10 +2394,9 @@ static inline void zap_deposited_table(s
 bool zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 		 pmd_t *pmd, unsigned long addr)
 {
-	bool needs_remove_rmap = false;
-	bool needs_deposit = false;
+	struct mm_struct *mm = tlb->mm;
+	struct folio *folio = NULL;
 	bool is_present = false;
-	struct folio *folio;
 	spinlock_t *ptl;
 	pmd_t orig_pmd;
 
@@ -2357,56 +2413,19 @@ bool zap_huge_pmd(struct mmu_gather *tlb
 	 */
 	orig_pmd = pmdp_huge_get_and_clear_full(vma, addr, pmd,
 						tlb->fullmm);
-
 	arch_check_zapped_pmd(vma, orig_pmd);
 	tlb_remove_pmd_tlb_entry(tlb, pmd, addr);
 
-	if (pmd_present(orig_pmd)) {
-		folio = vm_normal_folio_pmd(vma, addr, orig_pmd);
-		if (folio) {
-			needs_remove_rmap = true;
-			is_present = true;
-		} else if (is_huge_zero_pmd(orig_pmd)) {
-			needs_deposit = !vma_is_dax(vma);
-		}
-	} else if (pmd_is_valid_softleaf(orig_pmd)) {
-		folio = softleaf_to_folio(softleaf_from_pmd(orig_pmd));
-		needs_remove_rmap = folio_is_device_private(folio);
-		if (!thp_migration_supported())
-			WARN_ONCE(1, "Non present huge pmd without pmd migration enabled!");
-	} else {
-		WARN_ON_ONCE(true);
-		folio = NULL;
-	}
-	if (!folio)
-		goto out;
-
-	if (folio_test_anon(folio)) {
-		needs_deposit = true;
-		add_mm_counter(tlb->mm, MM_ANONPAGES, -HPAGE_PMD_NR);
-	} else {
-		add_mm_counter(tlb->mm, mm_counter_file(folio),
-			       -HPAGE_PMD_NR);
-
-		if (is_present && pmd_young(orig_pmd) &&
-		    likely(vma_has_recency(vma)))
-			folio_mark_accessed(folio);
-	}
+	is_present = pmd_present(orig_pmd);
+	folio = normal_or_softleaf_folio_pmd(vma, addr, orig_pmd, is_present);
+	if (folio)
+		zap_huge_pmd_folio(mm, vma, orig_pmd, folio, is_present);
 
-	if (needs_remove_rmap) {
-		folio_remove_rmap_pmd(folio, &folio->page, vma);
-		WARN_ON_ONCE(folio_mapcount(folio) < 0);
-	}
-
-out:
-	if (arch_needs_pgtable_deposit() || needs_deposit)
-		zap_deposited_table(tlb->mm, pmd);
-
-	if (needs_remove_rmap && !is_present)
-		folio_put(folio);
+	if (has_deposited_pgtable(vma, orig_pmd, folio))
+		zap_deposited_table(mm, pmd);
 
 	spin_unlock(ptl);
-	if (is_present)
+	if (is_present && folio)
 		tlb_remove_page_size(tlb, &folio->page, HPAGE_PMD_SIZE);
 	return true;
 }
_



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 07/13] mm/huge_memory: deduplicate zap deposited table call
  2026-03-20 18:07 ` [PATCH v3 07/13] mm/huge_memory: deduplicate zap deposited table call Lorenzo Stoakes (Oracle)
@ 2026-03-21  5:39   ` Baolin Wang
  0 siblings, 0 replies; 24+ messages in thread
From: Baolin Wang @ 2026-03-21  5:39 UTC (permalink / raw)
  To: Lorenzo Stoakes (Oracle), Andrew Morton
  Cc: David Hildenbrand, Zi Yan, Liam R . Howlett, Nico Pache,
	Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Kiryl Shutsemau,
	linux-mm, linux-kernel



On 3/21/26 2:07 AM, Lorenzo Stoakes (Oracle) wrote:
> Rather than having separate logic for each case determining whether to zap
> the deposited table, simply track this via a boolean.
> 
> We default this to whether the architecture requires it, and update it as
> required elsewhere.
> 
> Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
> ---

Thanks. The logic is much clearer now.
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 09/13] mm/huge_memory: use mm instead of tlb->mm
  2026-03-20 18:07 ` [PATCH v3 09/13] mm/huge_memory: use mm instead of tlb->mm Lorenzo Stoakes (Oracle)
@ 2026-03-21  5:42   ` Baolin Wang
  0 siblings, 0 replies; 24+ messages in thread
From: Baolin Wang @ 2026-03-21  5:42 UTC (permalink / raw)
  To: Lorenzo Stoakes (Oracle), Andrew Morton
  Cc: David Hildenbrand, Zi Yan, Liam R . Howlett, Nico Pache,
	Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Kiryl Shutsemau,
	linux-mm, linux-kernel



On 3/21/26 2:07 AM, Lorenzo Stoakes (Oracle) wrote:
> Reduce the repetition, and lay the ground for further refactorings by
> keeping this variable separate.
> 
> Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
> ---

LGTM.
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 10/13] mm/huge_memory: separate out the folio part of zap_huge_pmd()
  2026-03-20 18:07 ` [PATCH v3 10/13] mm/huge_memory: separate out the folio part of zap_huge_pmd() Lorenzo Stoakes (Oracle)
@ 2026-03-21  5:59   ` Baolin Wang
  2026-03-23 10:42     ` Lorenzo Stoakes (Oracle)
  0 siblings, 1 reply; 24+ messages in thread
From: Baolin Wang @ 2026-03-21  5:59 UTC (permalink / raw)
  To: Lorenzo Stoakes (Oracle), Andrew Morton
  Cc: David Hildenbrand, Zi Yan, Liam R . Howlett, Nico Pache,
	Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Kiryl Shutsemau,
	linux-mm, linux-kernel



On 3/21/26 2:07 AM, Lorenzo Stoakes (Oracle) wrote:
> Place the part of the logic that manipulates counters and possibly updates
> the accessed bit of the folio into its own function to make zap_huge_pmd()
> more readable.
> 
> Also rename flush_needed to is_present as we only require a flush for
> present entries.
> 
> Additionally add comments as to why we're doing what we're doing with
> respect to softleaf entries.
> 
> This also lays the ground for further refactoring.
> 
> Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
> ---
>   mm/huge_memory.c | 61 +++++++++++++++++++++++++++---------------------
>   1 file changed, 35 insertions(+), 26 deletions(-)
> 
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 673d0c4734ad..9ddf38d68406 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2325,6 +2325,37 @@ static inline void zap_deposited_table(struct mm_struct *mm, pmd_t *pmd)
>   	mm_dec_nr_ptes(mm);
>   }
>   
> +static void zap_huge_pmd_folio(struct mm_struct *mm, struct vm_area_struct *vma,
> +		pmd_t pmdval, struct folio *folio, bool is_present,
> +		bool *has_deposit)
> +{
> +	const bool is_device_private = folio_is_device_private(folio);
> +
> +	/* Present and device private folios are rmappable. */
> +	if (is_present || is_device_private)
> +		folio_remove_rmap_pmd(folio, &folio->page, vma);
> +
> +	if (folio_test_anon(folio)) {
> +		*has_deposit = true;
> +		add_mm_counter(mm, MM_ANONPAGES, -HPAGE_PMD_NR);
> +	} else {
> +		add_mm_counter(mm, mm_counter_file(folio),
> +			       -HPAGE_PMD_NR);
> +
> +		/*
> +		 * Use flush_needed to indicate whether the PMD entry
> +		 * is present, instead of checking pmd_present() again.
> +		 */
> +		if (is_present && pmd_young(pmdval) &&
> +		    likely(vma_has_recency(vma)))
> +			folio_mark_accessed(folio);

Nit: these comments were added by me to explain why 'flush_needed' was 
used:). Since it has been renamed to the more readable 'is_present', 
these comments are now redundant and can be removed.

With that,
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 10/13] mm/huge_memory: separate out the folio part of zap_huge_pmd()
  2026-03-21  5:59   ` Baolin Wang
@ 2026-03-23 10:42     ` Lorenzo Stoakes (Oracle)
  2026-03-24 12:42       ` Baolin Wang
  0 siblings, 1 reply; 24+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-23 10:42 UTC (permalink / raw)
  To: Baolin Wang
  Cc: Andrew Morton, David Hildenbrand, Zi Yan, Liam R . Howlett,
	Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	Kiryl Shutsemau, linux-mm, linux-kernel

On Sat, Mar 21, 2026 at 01:59:34PM +0800, Baolin Wang wrote:
>
>
> On 3/21/26 2:07 AM, Lorenzo Stoakes (Oracle) wrote:
> > Place the part of the logic that manipulates counters and possibly updates
> > the accessed bit of the folio into its own function to make zap_huge_pmd()
> > more readable.
> >
> > Also rename flush_needed to is_present as we only require a flush for
> > present entries.
> >
> > Additionally add comments as to why we're doing what we're doing with
> > respect to softleaf entries.
> >
> > This also lays the ground for further refactoring.
> >
> > Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
> > ---
> >   mm/huge_memory.c | 61 +++++++++++++++++++++++++++---------------------
> >   1 file changed, 35 insertions(+), 26 deletions(-)
> >
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index 673d0c4734ad..9ddf38d68406 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -2325,6 +2325,37 @@ static inline void zap_deposited_table(struct mm_struct *mm, pmd_t *pmd)
> >   	mm_dec_nr_ptes(mm);
> >   }
> > +static void zap_huge_pmd_folio(struct mm_struct *mm, struct vm_area_struct *vma,
> > +		pmd_t pmdval, struct folio *folio, bool is_present,
> > +		bool *has_deposit)
> > +{
> > +	const bool is_device_private = folio_is_device_private(folio);
> > +
> > +	/* Present and device private folios are rmappable. */
> > +	if (is_present || is_device_private)
> > +		folio_remove_rmap_pmd(folio, &folio->page, vma);
> > +
> > +	if (folio_test_anon(folio)) {
> > +		*has_deposit = true;
> > +		add_mm_counter(mm, MM_ANONPAGES, -HPAGE_PMD_NR);
> > +	} else {
> > +		add_mm_counter(mm, mm_counter_file(folio),
> > +			       -HPAGE_PMD_NR);
> > +
> > +		/*
> > +		 * Use flush_needed to indicate whether the PMD entry
> > +		 * is present, instead of checking pmd_present() again.
> > +		 */
> > +		if (is_present && pmd_young(pmdval) &&
> > +		    likely(vma_has_recency(vma)))
> > +			folio_mark_accessed(folio);
>
> Nit: these comments were added by me to explain why 'flush_needed' was
> used:). Since it has been renamed to the more readable 'is_present', these
> comments are now redundant and can be removed.

Ack, I think it's _probably_ ok to leave that as a later commit removes it
anyway, if that works for you?

>
> With that,
> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>

Thanks!

Cheers, Lorenzo


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 12/13] mm/huge_memory: add and use normal_or_softleaf_folio_pmd()
  2026-03-20 18:07 ` [PATCH v3 12/13] mm/huge_memory: add and use normal_or_softleaf_folio_pmd() Lorenzo Stoakes (Oracle)
@ 2026-03-23 11:24   ` Lorenzo Stoakes (Oracle)
  0 siblings, 0 replies; 24+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-23 11:24 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Zi Yan, Baolin Wang, Liam R . Howlett,
	Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	Kiryl Shutsemau, linux-mm, linux-kernel

Hi Andrew,

Just a quick fix-patch for 12/13 to avoid a bisection hazard, as per
Sashiko. The next patch eliminates this case anyway.

Thanks, Lorenzo

----8<----
From 7c5c0a66517efba0e675e707f931cf8a3f1a4d87 Mon Sep 17 00:00:00 2001
From: "Lorenzo Stoakes (Oracle)" <ljs@kernel.org>
Date: Mon, 23 Mar 2026 11:22:32 +0000
Subject: [PATCH] fix

Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
---
 mm/huge_memory.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 5831966391bd..5d5d9ddca6ff 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2404,7 +2404,7 @@ bool zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 		zap_huge_pmd_folio(mm, vma, orig_pmd, folio, is_present,
 				  &has_deposit);
 	else if (is_huge_zero_pmd(orig_pmd))
-		has_deposit = !vma_is_dax(vma);
+		has_deposit = has_deposit || !vma_is_dax(vma);

 	if (has_deposit)
 		zap_deposited_table(mm, pmd);
--
2.53.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 13/13] mm/huge_memory: add and use has_deposited_pgtable()
  2026-03-20 18:07 ` [PATCH v3 13/13] mm/huge_memory: add and use has_deposited_pgtable() Lorenzo Stoakes (Oracle)
@ 2026-03-23 11:45   ` Lorenzo Stoakes (Oracle)
  2026-03-23 12:25     ` Lorenzo Stoakes (Oracle)
  0 siblings, 1 reply; 24+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-23 11:45 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Zi Yan, Baolin Wang, Liam R . Howlett,
	Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	Kiryl Shutsemau, linux-mm, linux-kernel

Hi Andrew,

Could you apply the below fix-patch to resolve an issue with us performing
folio_put() on a folio before checking it again to see if a table was
deposited, as per Sashiko.

This patch resolves the issue by storing whether or not this is the case in
a has_deposit local variable (as used previously) before invoking
zap_huge_pmd_folio(), then using this boolean to determine whether or not
to zap any deposited table.

Thanks, Lorenzo

----8<----
From 009f8abba834b49f8285b03a680dbd04d953a528 Mon Sep 17 00:00:00 2001
From: "Lorenzo Stoakes (Oracle)" <ljs@kernel.org>
Date: Mon, 23 Mar 2026 11:42:01 +0000
Subject: [PATCH] fix

Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
---
 mm/huge_memory.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 610a6184e92c..4585465eda0c 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2397,6 +2397,7 @@ bool zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 	struct mm_struct *mm = tlb->mm;
 	struct folio *folio = NULL;
 	bool is_present = false;
+	bool has_deposit;
 	spinlock_t *ptl;
 	pmd_t orig_pmd;

@@ -2420,8 +2421,8 @@ bool zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 	folio = normal_or_softleaf_folio_pmd(vma, addr, orig_pmd, is_present);
 	if (folio)
 		zap_huge_pmd_folio(mm, vma, orig_pmd, folio, is_present);
-
-	if (has_deposited_pgtable(vma, orig_pmd, folio))
+	has_deposit = has_deposited_pgtable(vma, orig_pmd, folio);
+	if (has_deposit)
 		zap_deposited_table(mm, pmd);

 	spin_unlock(ptl);
--
2.53.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 00/13] mm/huge_memory: refactor zap_huge_pmd()
  2026-03-20 18:14 [PATCH v3 00/13] mm/huge_memory: refactor zap_huge_pmd() Lorenzo Stoakes (Oracle)
                   ` (13 preceding siblings ...)
  2026-03-20 18:42 ` [PATCH v3 00/13] mm/huge_memory: refactor zap_huge_pmd() Andrew Morton
@ 2026-03-23 12:08 ` Lorenzo Stoakes (Oracle)
  14 siblings, 0 replies; 24+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-23 12:08 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Zi Yan, Baolin Wang, Liam R . Howlett,
	Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	Kiryl Shutsemau, linux-mm, linux-kernel

I really don't want to have to reply to every bit of noise generated by
Sashiko, but for the record, here's the stuff I'm not changing:

4/13

>Are other PMD handlers vulnerable to this same userfaultfd bug?

That's completely out of scope for the series. It's not a regression, it's
a suggestion for future work, and it should be labelled as such rather than
being listed as a 'high' priority regression.

7/13

>Is it possible for the error path immediately preceding this block to trigger
>a NULL pointer dereference?

You're already at the point where something has gone horribly wrong and a
bug in the kernel renders it unstable. It's not really worth trying to
defend against every possible bad outcome here in case of kernel bugs;
that's not practical, and not worth the maintenance load.

Also if I add this, it breaks further refactorings for the sake of
defending against a specific class of kernel bug - that's not worthwhile.

10/13

>Could evaluating folio_is_device_private(folio) cause issues if the PMD
>contains a migration entry rather than a device private entry?

If this is even possible (I don't think it is), that was an existing
issue. This is a refactoring; it wouldn't be appropriate for me to
fundamentally change behaviour at the same time.

>This isn't a bug, but the comment still refers to flush_needed, which was
> renamed to is_present in this patch.

Baolin already raised this, and I don't think it really matters to leave
that comment in there, as it's removed in a later commit, and a comment
isn't a bisection hazard :)

11/13

>This isn't a bug, but the kernel-doc for pmd_is_valid_softleaf() states
>that it asserts the validity of the entry, while the function strictly
>returns a boolean without triggering any warnings or bugs.

This is a turn of phrase...! Anybody wondering can go read the function.

>Would it be better to update this comment to reflect the actual behavior,
>especially now that an actual assertion has been added to the neighboring
>pmd_to_softleaf_folio() function?

I think the CONFIG_DEBUG_VM assert itself is pretty good documentation of
there being a CONFIG_DEBUG_VM assert, honestly. Should the kdoc comments
list the code too?

>Could this warning be written to evaluate the condition directly?
>if (VM_WARN_ON_ONCE(!softleaf_is_valid_pmd_entry(entry))) {
>        return NULL;
>}
>When VM_WARN_ON_ONCE(true) is placed inside an if block, the kernel's
>warning machinery stringifies and prints "true" as the failing condition
>in the backtrace, which makes debugging more difficult. Wrapping the actual
>condition inside the warning macro ensures the specific violated constraint
>is visible in the console output.

This is a silly comment anyway; you can figure out why the thing failed
very easily, and this is a common pattern in the kernel.

But this is also a hallucination, VM_WARN_ON_ONCE() is defined as:

#define VM_WARN_ON_ONCE(cond) (void)WARN_ON_ONCE(cond)

So you know, that won't work, and even if it did, it's a silly and
pedantic comment. Plus you don't use {} for single-line branches...

12/13:

While the comment about deposit was valid, the comment:

>For non-DAX special VMAs, this also forces has_deposit to true even if
>the architecture does not need a deposit, potentially attempting to free a
>non-existent deposit.

is another hallucination:

	else if (is_huge_zero_pmd(orig_pmd))
		has_deposit = !vma_is_dax(vma);

This is the line it's discussing. So we're explicitly gating on
is_huge_zero_pmd(). It's not any 'special' VMA.

And in the original code:

	} else if (is_huge_zero_pmd(orig_pmd)) {
		if (!vma_is_dax(vma) || arch_needs_pgtable_deposit())
			zap_deposited_table(tlb->mm, pmd);
		...
	}

With the fix-patch applied this is 'has_deposit = has_deposit ||
!vma_is_dax()' where has_deposit is initialised with
arch_needs_pgtable_deposit(), so the logic matches.

--

Dealing with the above has taken a lot of time that I would rather spend
doing other things.

AI can asymmetrically drown us with this kind of stuff, radically
increasing workload.

This continues to be my primary concern with the application of AI, and the
only acceptable use of it will be in cases where we are able to filter
things well enough to avoid wasting people's time like this.

As I've said before, burnout continues to be a major hazard that is simply
being ignored in mm, and I don't think that's a healthy or good thing.

Let's please be considerate as to the opinions of those actually doing the
work.

Thanks, Lorenzo


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 13/13] mm/huge_memory: add and use has_deposited_pgtable()
  2026-03-23 11:45   ` Lorenzo Stoakes (Oracle)
@ 2026-03-23 12:25     ` Lorenzo Stoakes (Oracle)
  0 siblings, 0 replies; 24+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-23 12:25 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Zi Yan, Baolin Wang, Liam R . Howlett,
	Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	Kiryl Shutsemau, linux-mm, linux-kernel

On Mon, Mar 23, 2026 at 11:45:23AM +0000, Lorenzo Stoakes (Oracle) wrote:
> Hi Andrew,
>
> Could you apply the below fix-patch to resolve an issue with us performing
> folio_put() on a folio before checking it again to see if a table was
> deposited, as per Sashiko.
>
> This patch resolves the issue by storing whether or not this is the case in
> a has_deposit local variable (as used previously) before invoking
> zap_huge_pmd_folio(), then using this boolean to determine whether or not
> to zap any deposited table.

Oops this is wrong...!

Please apply the below instead :)

Thanks, Lorenzo

----8<----
From e6d58747d00dd954c605201e97f8b769b2ba8cf8 Mon Sep 17 00:00:00 2001
From: "Lorenzo Stoakes (Oracle)" <ljs@kernel.org>
Date: Mon, 23 Mar 2026 11:42:01 +0000
Subject: [PATCH] fix

Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
---
 mm/huge_memory.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 610a6184e92c..b2a6060b3c20 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2397,6 +2397,7 @@ bool zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 	struct mm_struct *mm = tlb->mm;
 	struct folio *folio = NULL;
 	bool is_present = false;
+	bool has_deposit;
 	spinlock_t *ptl;
 	pmd_t orig_pmd;

@@ -2418,10 +2419,10 @@ bool zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,

 	is_present = pmd_present(orig_pmd);
 	folio = normal_or_softleaf_folio_pmd(vma, addr, orig_pmd, is_present);
+	has_deposit = has_deposited_pgtable(vma, orig_pmd, folio);
 	if (folio)
 		zap_huge_pmd_folio(mm, vma, orig_pmd, folio, is_present);
-
-	if (has_deposited_pgtable(vma, orig_pmd, folio))
+	if (has_deposit)
 		zap_deposited_table(mm, pmd);

 	spin_unlock(ptl);
--
2.53.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 10/13] mm/huge_memory: separate out the folio part of zap_huge_pmd()
  2026-03-23 10:42     ` Lorenzo Stoakes (Oracle)
@ 2026-03-24 12:42       ` Baolin Wang
  0 siblings, 0 replies; 24+ messages in thread
From: Baolin Wang @ 2026-03-24 12:42 UTC (permalink / raw)
  To: Lorenzo Stoakes (Oracle)
  Cc: Andrew Morton, David Hildenbrand, Zi Yan, Liam R . Howlett,
	Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang,
	Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
	Kiryl Shutsemau, linux-mm, linux-kernel



On 3/23/26 6:42 PM, Lorenzo Stoakes (Oracle) wrote:
> On Sat, Mar 21, 2026 at 01:59:34PM +0800, Baolin Wang wrote:
>>
>>
>> On 3/21/26 2:07 AM, Lorenzo Stoakes (Oracle) wrote:
>>> Place the part of the logic that manipulates counters and possibly updates
>>> the accessed bit of the folio into its own function to make zap_huge_pmd()
>>> more readable.
>>>
>>> Also rename flush_needed to is_present as we only require a flush for
>>> present entries.
>>>
>>> Additionally add comments as to why we're doing what we're doing with
>>> respect to softleaf entries.
>>>
>>> This also lays the ground for further refactoring.
>>>
>>> Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
>>> ---
>>>    mm/huge_memory.c | 61 +++++++++++++++++++++++++++---------------------
>>>    1 file changed, 35 insertions(+), 26 deletions(-)
>>>
>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>> index 673d0c4734ad..9ddf38d68406 100644
>>> --- a/mm/huge_memory.c
>>> +++ b/mm/huge_memory.c
>>> @@ -2325,6 +2325,37 @@ static inline void zap_deposited_table(struct mm_struct *mm, pmd_t *pmd)
>>>    	mm_dec_nr_ptes(mm);
>>>    }
>>> +static void zap_huge_pmd_folio(struct mm_struct *mm, struct vm_area_struct *vma,
>>> +		pmd_t pmdval, struct folio *folio, bool is_present,
>>> +		bool *has_deposit)
>>> +{
>>> +	const bool is_device_private = folio_is_device_private(folio);
>>> +
>>> +	/* Present and device private folios are rmappable. */
>>> +	if (is_present || is_device_private)
>>> +		folio_remove_rmap_pmd(folio, &folio->page, vma);
>>> +
>>> +	if (folio_test_anon(folio)) {
>>> +		*has_deposit = true;
>>> +		add_mm_counter(mm, MM_ANONPAGES, -HPAGE_PMD_NR);
>>> +	} else {
>>> +		add_mm_counter(mm, mm_counter_file(folio),
>>> +			       -HPAGE_PMD_NR);
>>> +
>>> +		/*
>>> +		 * Use flush_needed to indicate whether the PMD entry
>>> +		 * is present, instead of checking pmd_present() again.
>>> +		 */
>>> +		if (is_present && pmd_young(pmdval) &&
>>> +		    likely(vma_has_recency(vma)))
>>> +			folio_mark_accessed(folio);
>>
>> Nit: these comments were added by me to explain why 'flush_needed' was
>> used:). Since it has been renamed to the more readable 'is_present', these
>> comments are now redundant and can be removed.
> 
> Ack, I think it's _probably_ ok to leave that as a later commit removes it
> anyway, if that works for you?

Ah, I saw you removed them in the following patch. Looks fine to me. Thanks.


^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2026-03-24 12:42 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed
-- links below jump to the message on this page --)
2026-03-20 18:14 [PATCH v3 00/13] mm/huge_memory: refactor zap_huge_pmd() Lorenzo Stoakes (Oracle)
2026-03-20 18:07 ` [PATCH v3 01/13] mm/huge_memory: simplify vma_is_specal_huge() Lorenzo Stoakes (Oracle)
2026-03-20 18:07 ` [PATCH v3 02/13] mm/huge: avoid big else branch in zap_huge_pmd() Lorenzo Stoakes (Oracle)
2026-03-20 18:07 ` [PATCH v3 03/13] mm/huge_memory: have zap_huge_pmd return a boolean, add kdoc Lorenzo Stoakes (Oracle)
2026-03-20 18:07 ` [PATCH v3 04/13] mm/huge_memory: handle buggy PMD entry in zap_huge_pmd() Lorenzo Stoakes (Oracle)
2026-03-20 18:07 ` [PATCH v3 05/13] mm/huge_memory: add a common exit path to zap_huge_pmd() Lorenzo Stoakes (Oracle)
2026-03-20 18:07 ` [PATCH v3 06/13] mm/huge_memory: remove unnecessary VM_BUG_ON_PAGE() Lorenzo Stoakes (Oracle)
2026-03-20 18:07 ` [PATCH v3 07/13] mm/huge_memory: deduplicate zap deposited table call Lorenzo Stoakes (Oracle)
2026-03-21  5:39   ` Baolin Wang
2026-03-20 18:07 ` [PATCH v3 08/13] mm/huge_memory: remove unnecessary sanity checks Lorenzo Stoakes (Oracle)
2026-03-20 18:07 ` [PATCH v3 09/13] mm/huge_memory: use mm instead of tlb->mm Lorenzo Stoakes (Oracle)
2026-03-21  5:42   ` Baolin Wang
2026-03-20 18:07 ` [PATCH v3 10/13] mm/huge_memory: separate out the folio part of zap_huge_pmd() Lorenzo Stoakes (Oracle)
2026-03-21  5:59   ` Baolin Wang
2026-03-23 10:42     ` Lorenzo Stoakes (Oracle)
2026-03-24 12:42       ` Baolin Wang
2026-03-20 18:07 ` [PATCH v3 11/13] mm: add softleaf_is_valid_pmd_entry(), pmd_to_softleaf_folio() Lorenzo Stoakes (Oracle)
2026-03-20 18:07 ` [PATCH v3 12/13] mm/huge_memory: add and use normal_or_softleaf_folio_pmd() Lorenzo Stoakes (Oracle)
2026-03-23 11:24   ` Lorenzo Stoakes (Oracle)
2026-03-20 18:07 ` [PATCH v3 13/13] mm/huge_memory: add and use has_deposited_pgtable() Lorenzo Stoakes (Oracle)
2026-03-23 11:45   ` Lorenzo Stoakes (Oracle)
2026-03-23 12:25     ` Lorenzo Stoakes (Oracle)
2026-03-20 18:42 ` [PATCH v3 00/13] mm/huge_memory: refactor zap_huge_pmd() Andrew Morton
2026-03-23 12:08 ` Lorenzo Stoakes (Oracle)

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox