* [PATCH 0/8] mm/huge_memory: refactor zap_huge_pmd()
@ 2026-03-18 20:39 Lorenzo Stoakes (Oracle)
2026-03-18 20:39 ` [PATCH 1/8] mm/huge_memory: simplify vma_is_special_huge() Lorenzo Stoakes (Oracle)
` (7 more replies)
0 siblings, 8 replies; 22+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-18 20:39 UTC (permalink / raw)
To: Andrew Morton
Cc: David Hildenbrand, Zi Yan, Baolin Wang, Liam R . Howlett,
Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
linux-mm, linux-kernel
The zap_huge_pmd() function is overly complicated; clean it up, and also add
an assert for the case in which we encounter a buggy PMD entry that doesn't
match expectations.
This is motivated by a bug discovered [0] in which, in zap_huge_pmd(), the
PMD entry was none of:
- A non-DAX PFN or mixed map.
- The huge zero folio
- A present PMD entry
- A softleaf entry
Due to the bug, we nonetheless managed to reach this code.
It is useful to call this case out explicitly rather than allow an arbitrary
NULL pointer dereference to happen; doing so also improves understanding of
what's going on.
[0]: https://lore.kernel.org/all/6b3d7ad7-49e1-407a-903d-3103704160d8@lucifer.local/
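For illustration, the overall shape the series converges on can be sketched
in simplified, stand-alone user-space C. The enum and zap_huge_pmd_sketch()
helper below are hypothetical stand-ins, not the actual kernel code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Stubbed stand-ins for the kinds of PMD entry zap_huge_pmd() can see. */
enum pmd_kind { PMD_SPECIAL, PMD_HUGE_ZERO, PMD_PRESENT, PMD_SOFTLEAF, PMD_BOGUS };

/* Sketch of the early-return structure the refactor arrives at. */
static bool zap_huge_pmd_sketch(enum pmd_kind kind)
{
	switch (kind) {
	case PMD_SPECIAL:	/* non-DAX PFN/mixed map: no folio to unmap */
	case PMD_HUGE_ZERO:	/* the huge zero folio */
		return true;
	case PMD_PRESENT:	/* a present THP entry */
	case PMD_SOFTLEAF:	/* a migration/device-private entry */
		/* ... folio accounting and rmap removal elided ... */
		return true;
	default:
		/* Buggy entry: warn loudly rather than NULL-dereference. */
		fprintf(stderr, "unexpected PMD entry\n");
		return false;
	}
}
```

The key point is the default branch: the buggy case becomes an explicit,
warned-about failure instead of an implicit NULL dereference further down.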
Lorenzo Stoakes (Oracle) (8):
mm/huge_memory: simplify vma_is_special_huge()
mm/huge: avoid big else branch in zap_huge_pmd()
mm/huge_memory: have zap_huge_pmd return a boolean, add kdoc
mm/huge_memory: handle buggy PMD entry in zap_huge_pmd()
mm/huge_memory: add a common exit path to zap_huge_pmd()
mm/huge_memory: remove unnecessary VM_BUG_ON_PAGE()
mm/huge_memory: deduplicate zap deposited table call
mm/huge_memory: deduplicate zap_huge_pmd() further by tracking state
include/linux/huge_mm.h | 8 +--
include/linux/mm.h | 16 -----
mm/huge_memory.c | 145 +++++++++++++++++++++++-----------------
3 files changed, 89 insertions(+), 80 deletions(-)
--
2.53.0
^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH 1/8] mm/huge_memory: simplify vma_is_special_huge()
2026-03-18 20:39 [PATCH 0/8] mm/huge_memory: refactor zap_huge_pmd() Lorenzo Stoakes (Oracle)
@ 2026-03-18 20:39 ` Lorenzo Stoakes (Oracle)
2026-03-18 20:45 ` David Hildenbrand (Arm)
2026-03-19 3:16 ` Qi Zheng
2026-03-18 20:39 ` [PATCH 2/8] mm/huge: avoid big else branch in zap_huge_pmd() Lorenzo Stoakes (Oracle)
` (6 subsequent siblings)
7 siblings, 2 replies; 22+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-18 20:39 UTC (permalink / raw)
To: Andrew Morton
Cc: David Hildenbrand, Zi Yan, Baolin Wang, Liam R . Howlett,
Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
linux-mm, linux-kernel
This function is confused - it overloads the term 'special' yet again, and
checks for DAX even though in many cases the code explicitly excludes DAX
before invoking the predicate.
It also unnecessarily checks for vma->vm_file - this has to be present for
a driver to have set VMA_MIXEDMAP_BIT or VMA_PFNMAP_BIT.
In fact, a far simpler form of this is to reverse the DAX predicate and
return false if DAX is set.
This makes sense from the point of view of 'special' as in
vm_normal_page(), as DAX mappings actually do potentially have retrievable
folios.
Also, there's no need to have this in mm.h, so move it to huge_memory.c.
Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
---
include/linux/huge_mm.h | 4 ++--
include/linux/mm.h | 16 ----------------
mm/huge_memory.c | 30 +++++++++++++++++++++++-------
3 files changed, 25 insertions(+), 25 deletions(-)
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index bd7f0e1d8094..61fda1672b29 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -83,7 +83,7 @@ extern struct kobj_attribute thpsize_shmem_enabled_attr;
* file is never split and the MAX_PAGECACHE_ORDER limit does not apply to
* it. Same to PFNMAPs where there's neither page* nor pagecache.
*/
-#define THP_ORDERS_ALL_SPECIAL \
+#define THP_ORDERS_ALL_SPECIAL_DAX \
(BIT(PMD_ORDER) | BIT(PUD_ORDER))
#define THP_ORDERS_ALL_FILE_DEFAULT \
((BIT(MAX_PAGECACHE_ORDER + 1) - 1) & ~BIT(0))
@@ -92,7 +92,7 @@ extern struct kobj_attribute thpsize_shmem_enabled_attr;
* Mask of all large folio orders supported for THP.
*/
#define THP_ORDERS_ALL \
- (THP_ORDERS_ALL_ANON | THP_ORDERS_ALL_SPECIAL | THP_ORDERS_ALL_FILE_DEFAULT)
+ (THP_ORDERS_ALL_ANON | THP_ORDERS_ALL_SPECIAL_DAX | THP_ORDERS_ALL_FILE_DEFAULT)
enum tva_type {
TVA_SMAPS, /* Exposing "THPeligible:" in smaps. */
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 6f0a3edb24e1..50d68b092204 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -5077,22 +5077,6 @@ long copy_folio_from_user(struct folio *dst_folio,
const void __user *usr_src,
bool allow_pagefault);
-/**
- * vma_is_special_huge - Are transhuge page-table entries considered special?
- * @vma: Pointer to the struct vm_area_struct to consider
- *
- * Whether transhuge page-table entries are considered "special" following
- * the definition in vm_normal_page().
- *
- * Return: true if transhuge page-table entries should be considered special,
- * false otherwise.
- */
-static inline bool vma_is_special_huge(const struct vm_area_struct *vma)
-{
- return vma_is_dax(vma) || (vma->vm_file &&
- (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP)));
-}
-
#endif /* CONFIG_TRANSPARENT_HUGEPAGE || CONFIG_HUGETLBFS */
#if MAX_NUMNODES > 1
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 3fc02913b63e..f76edfa91e96 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -100,6 +100,14 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
return !inode_is_open_for_write(inode) && S_ISREG(inode->i_mode);
}
+/* If returns true, we are unable to access the VMA's folios. */
+static bool vma_is_special_huge(struct vm_area_struct *vma)
+{
+ if (vma_is_dax(vma))
+ return false;
+ return vma_test_any(vma, VMA_PFNMAP_BIT, VMA_MIXEDMAP_BIT);
+}
+
unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
vm_flags_t vm_flags,
enum tva_type type,
@@ -113,8 +121,8 @@ unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
/* Check the intersection of requested and supported orders. */
if (vma_is_anonymous(vma))
supported_orders = THP_ORDERS_ALL_ANON;
- else if (vma_is_special_huge(vma))
- supported_orders = THP_ORDERS_ALL_SPECIAL;
+ else if (vma_is_dax(vma) || vma_is_special_huge(vma))
+ supported_orders = THP_ORDERS_ALL_SPECIAL_DAX;
else
supported_orders = THP_ORDERS_ALL_FILE_DEFAULT;
@@ -2431,7 +2439,7 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
tlb->fullmm);
arch_check_zapped_pmd(vma, orig_pmd);
tlb_remove_pmd_tlb_entry(tlb, pmd, addr);
- if (!vma_is_dax(vma) && vma_is_special_huge(vma)) {
+ if (vma_is_special_huge(vma)) {
if (arch_needs_pgtable_deposit())
zap_deposited_table(tlb->mm, pmd);
spin_unlock(ptl);
@@ -2933,7 +2941,7 @@ int zap_huge_pud(struct mmu_gather *tlb, struct vm_area_struct *vma,
orig_pud = pudp_huge_get_and_clear_full(vma, addr, pud, tlb->fullmm);
arch_check_zapped_pud(vma, orig_pud);
tlb_remove_pud_tlb_entry(tlb, pud, addr);
- if (!vma_is_dax(vma) && vma_is_special_huge(vma)) {
+ if (vma_is_special_huge(vma)) {
spin_unlock(ptl);
/* No zero page support yet */
} else {
@@ -3084,7 +3092,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
*/
if (arch_needs_pgtable_deposit())
zap_deposited_table(mm, pmd);
- if (!vma_is_dax(vma) && vma_is_special_huge(vma))
+ if (vma_is_special_huge(vma))
return;
if (unlikely(pmd_is_migration_entry(old_pmd))) {
const softleaf_t old_entry = softleaf_from_pmd(old_pmd);
@@ -4645,8 +4653,16 @@ static void split_huge_pages_all(void)
static inline bool vma_not_suitable_for_thp_split(struct vm_area_struct *vma)
{
- return vma_is_special_huge(vma) || (vma->vm_flags & VM_IO) ||
- is_vm_hugetlb_page(vma);
+ if (vma_is_dax(vma))
+ return true;
+ if (vma_is_special_huge(vma))
+ return true;
+ if (vma_test(vma, VMA_IO_BIT))
+ return true;
+ if (is_vm_hugetlb_page(vma))
+ return true;
+
+ return false;
}
static int split_huge_pages_pid(int pid, unsigned long vaddr_start,
--
2.53.0
* [PATCH 2/8] mm/huge: avoid big else branch in zap_huge_pmd()
2026-03-18 20:39 [PATCH 0/8] mm/huge_memory: refactor zap_huge_pmd() Lorenzo Stoakes (Oracle)
2026-03-18 20:39 ` [PATCH 1/8] mm/huge_memory: simplify vma_is_special_huge() Lorenzo Stoakes (Oracle)
@ 2026-03-18 20:39 ` Lorenzo Stoakes (Oracle)
2026-03-19 3:26 ` Qi Zheng
2026-03-19 6:38 ` Baolin Wang
2026-03-18 20:39 ` [PATCH 3/8] mm/huge_memory: have zap_huge_pmd return a boolean, add kdoc Lorenzo Stoakes (Oracle)
` (5 subsequent siblings)
7 siblings, 2 replies; 22+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-18 20:39 UTC (permalink / raw)
To: Andrew Morton
Cc: David Hildenbrand, Zi Yan, Baolin Wang, Liam R . Howlett,
Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
linux-mm, linux-kernel
We don't need an extra level of indentation here; we can simply exit early
in the first two branches.
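To illustrate the transformation in isolation, here is a minimal
stand-alone C sketch; the classify_*() helpers are hypothetical and
unrelated to the kernel code:

```c
#include <assert.h>

/* Nested form: the long final case sits one indentation level deep. */
static int classify_nested(int v)
{
	int r;

	if (v == 0) {
		r = 10;
	} else if (v == 1) {
		r = 20;
	} else {
		/* long tail of work at an extra indentation level */
		r = 30;
	}
	return r;
}

/* Early-return form: same logic, main path at function-scope indentation. */
static int classify_early_return(int v)
{
	if (v == 0)
		return 10;
	if (v == 1)
		return 20;
	/* long tail of work, no longer nested inside an else branch */
	return 30;
}
```

Both functions are equivalent; the second form is what this patch applies
to zap_huge_pmd().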
Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
---
mm/huge_memory.c | 87 +++++++++++++++++++++++++-----------------------
1 file changed, 45 insertions(+), 42 deletions(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index f76edfa91e96..4ebe1f19341e 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2421,8 +2421,10 @@ static inline void zap_deposited_table(struct mm_struct *mm, pmd_t *pmd)
int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
pmd_t *pmd, unsigned long addr)
{
- pmd_t orig_pmd;
+ struct folio *folio = NULL;
+ int flush_needed = 1;
spinlock_t *ptl;
+ pmd_t orig_pmd;
tlb_change_page_size(tlb, HPAGE_PMD_SIZE);
@@ -2443,59 +2445,60 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
if (arch_needs_pgtable_deposit())
zap_deposited_table(tlb->mm, pmd);
spin_unlock(ptl);
- } else if (is_huge_zero_pmd(orig_pmd)) {
+ return 1;
+ }
+ if (is_huge_zero_pmd(orig_pmd)) {
if (!vma_is_dax(vma) || arch_needs_pgtable_deposit())
zap_deposited_table(tlb->mm, pmd);
spin_unlock(ptl);
- } else {
- struct folio *folio = NULL;
- int flush_needed = 1;
+ return 1;
+ }
- if (pmd_present(orig_pmd)) {
- struct page *page = pmd_page(orig_pmd);
+ if (pmd_present(orig_pmd)) {
+ struct page *page = pmd_page(orig_pmd);
- folio = page_folio(page);
- folio_remove_rmap_pmd(folio, page, vma);
- WARN_ON_ONCE(folio_mapcount(folio) < 0);
- VM_BUG_ON_PAGE(!PageHead(page), page);
- } else if (pmd_is_valid_softleaf(orig_pmd)) {
- const softleaf_t entry = softleaf_from_pmd(orig_pmd);
+ folio = page_folio(page);
+ folio_remove_rmap_pmd(folio, page, vma);
+ WARN_ON_ONCE(folio_mapcount(folio) < 0);
+ VM_BUG_ON_PAGE(!PageHead(page), page);
+ } else if (pmd_is_valid_softleaf(orig_pmd)) {
+ const softleaf_t entry = softleaf_from_pmd(orig_pmd);
- folio = softleaf_to_folio(entry);
- flush_needed = 0;
+ folio = softleaf_to_folio(entry);
+ flush_needed = 0;
- if (!thp_migration_supported())
- WARN_ONCE(1, "Non present huge pmd without pmd migration enabled!");
- }
+ if (!thp_migration_supported())
+ WARN_ONCE(1, "Non present huge pmd without pmd migration enabled!");
+ }
- if (folio_test_anon(folio)) {
+ if (folio_test_anon(folio)) {
+ zap_deposited_table(tlb->mm, pmd);
+ add_mm_counter(tlb->mm, MM_ANONPAGES, -HPAGE_PMD_NR);
+ } else {
+ if (arch_needs_pgtable_deposit())
zap_deposited_table(tlb->mm, pmd);
- add_mm_counter(tlb->mm, MM_ANONPAGES, -HPAGE_PMD_NR);
- } else {
- if (arch_needs_pgtable_deposit())
- zap_deposited_table(tlb->mm, pmd);
- add_mm_counter(tlb->mm, mm_counter_file(folio),
- -HPAGE_PMD_NR);
-
- /*
- * Use flush_needed to indicate whether the PMD entry
- * is present, instead of checking pmd_present() again.
- */
- if (flush_needed && pmd_young(orig_pmd) &&
- likely(vma_has_recency(vma)))
- folio_mark_accessed(folio);
- }
+ add_mm_counter(tlb->mm, mm_counter_file(folio),
+ -HPAGE_PMD_NR);
- if (folio_is_device_private(folio)) {
- folio_remove_rmap_pmd(folio, &folio->page, vma);
- WARN_ON_ONCE(folio_mapcount(folio) < 0);
- folio_put(folio);
- }
+ /*
+ * Use flush_needed to indicate whether the PMD entry
+ * is present, instead of checking pmd_present() again.
+ */
+ if (flush_needed && pmd_young(orig_pmd) &&
+ likely(vma_has_recency(vma)))
+ folio_mark_accessed(folio);
+ }
- spin_unlock(ptl);
- if (flush_needed)
- tlb_remove_page_size(tlb, &folio->page, HPAGE_PMD_SIZE);
+ if (folio_is_device_private(folio)) {
+ folio_remove_rmap_pmd(folio, &folio->page, vma);
+ WARN_ON_ONCE(folio_mapcount(folio) < 0);
+ folio_put(folio);
}
+
+ spin_unlock(ptl);
+ if (flush_needed)
+ tlb_remove_page_size(tlb, &folio->page, HPAGE_PMD_SIZE);
+
return 1;
}
--
2.53.0
* [PATCH 3/8] mm/huge_memory: have zap_huge_pmd return a boolean, add kdoc
2026-03-18 20:39 [PATCH 0/8] mm/huge_memory: refactor zap_huge_pmd() Lorenzo Stoakes (Oracle)
2026-03-18 20:39 ` [PATCH 1/8] mm/huge_memory: simplify vma_is_special_huge() Lorenzo Stoakes (Oracle)
2026-03-18 20:39 ` [PATCH 2/8] mm/huge: avoid big else branch in zap_huge_pmd() Lorenzo Stoakes (Oracle)
@ 2026-03-18 20:39 ` Lorenzo Stoakes (Oracle)
2026-03-19 3:29 ` Qi Zheng
2026-03-19 6:41 ` Baolin Wang
2026-03-18 20:39 ` [PATCH 4/8] mm/huge_memory: handle buggy PMD entry in zap_huge_pmd() Lorenzo Stoakes (Oracle)
` (4 subsequent siblings)
7 siblings, 2 replies; 22+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-18 20:39 UTC (permalink / raw)
To: Andrew Morton
Cc: David Hildenbrand, Zi Yan, Baolin Wang, Liam R . Howlett,
Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
linux-mm, linux-kernel
There's no need to use the ancient approach of returning an integer here;
just return a boolean.
Similarly, update flush_needed to be a boolean.
Also add a kdoc comment describing the function.
Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
---
include/linux/huge_mm.h | 4 ++--
mm/huge_memory.c | 23 ++++++++++++++++-------
2 files changed, 18 insertions(+), 9 deletions(-)
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 61fda1672b29..2949e5acff35 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -27,8 +27,8 @@ static inline void huge_pud_set_accessed(struct vm_fault *vmf, pud_t orig_pud)
vm_fault_t do_huge_pmd_wp_page(struct vm_fault *vmf);
bool madvise_free_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
pmd_t *pmd, unsigned long addr, unsigned long next);
-int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, pmd_t *pmd,
- unsigned long addr);
+bool zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, pmd_t *pmd,
+ unsigned long addr);
int zap_huge_pud(struct mmu_gather *tlb, struct vm_area_struct *vma, pud_t *pud,
unsigned long addr);
bool move_huge_pmd(struct vm_area_struct *vma, unsigned long old_addr,
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 4ebe1f19341e..bba1ba1f6b67 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2418,11 +2418,20 @@ static inline void zap_deposited_table(struct mm_struct *mm, pmd_t *pmd)
mm_dec_nr_ptes(mm);
}
-int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
+/**
+ * zap_huge_pmd - Zap a huge THP which is of PMD size.
+ * @tlb: The MMU gather TLB state associated with the operation.
+ * @vma: The VMA containing the range to zap.
+ * @pmd: A pointer to the leaf PMD entry.
+ * @addr: The virtual address for the range to zap.
+ *
+ * Returns: %true on success, %false otherwise.
+ */
+bool zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
pmd_t *pmd, unsigned long addr)
{
struct folio *folio = NULL;
- int flush_needed = 1;
+ bool flush_needed = true;
spinlock_t *ptl;
pmd_t orig_pmd;
@@ -2430,7 +2439,7 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
ptl = __pmd_trans_huge_lock(pmd, vma);
if (!ptl)
- return 0;
+ return false;
/*
* For architectures like ppc64 we look at deposited pgtable
* when calling pmdp_huge_get_and_clear. So do the
@@ -2445,13 +2454,13 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
if (arch_needs_pgtable_deposit())
zap_deposited_table(tlb->mm, pmd);
spin_unlock(ptl);
- return 1;
+ return true;
}
if (is_huge_zero_pmd(orig_pmd)) {
if (!vma_is_dax(vma) || arch_needs_pgtable_deposit())
zap_deposited_table(tlb->mm, pmd);
spin_unlock(ptl);
- return 1;
+ return true;
}
if (pmd_present(orig_pmd)) {
@@ -2465,7 +2474,7 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
const softleaf_t entry = softleaf_from_pmd(orig_pmd);
folio = softleaf_to_folio(entry);
- flush_needed = 0;
+ flush_needed = false;
if (!thp_migration_supported())
WARN_ONCE(1, "Non present huge pmd without pmd migration enabled!");
@@ -2499,7 +2508,7 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
if (flush_needed)
tlb_remove_page_size(tlb, &folio->page, HPAGE_PMD_SIZE);
- return 1;
+ return true;
}
#ifndef pmd_move_must_withdraw
--
2.53.0
* [PATCH 4/8] mm/huge_memory: handle buggy PMD entry in zap_huge_pmd()
2026-03-18 20:39 [PATCH 0/8] mm/huge_memory: refactor zap_huge_pmd() Lorenzo Stoakes (Oracle)
` (2 preceding siblings ...)
2026-03-18 20:39 ` [PATCH 3/8] mm/huge_memory: have zap_huge_pmd return a boolean, add kdoc Lorenzo Stoakes (Oracle)
@ 2026-03-18 20:39 ` Lorenzo Stoakes (Oracle)
2026-03-19 7:00 ` Baolin Wang
2026-03-18 20:39 ` [PATCH 5/8] mm/huge_memory: add a common exit path to zap_huge_pmd() Lorenzo Stoakes (Oracle)
` (3 subsequent siblings)
7 siblings, 1 reply; 22+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-18 20:39 UTC (permalink / raw)
To: Andrew Morton
Cc: David Hildenbrand, Zi Yan, Baolin Wang, Liam R . Howlett,
Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
linux-mm, linux-kernel
A bug I recently analysed [0], caused by a flaw in the userfaultfd
implementation, resulted in reaching an invalid point in the zap_huge_pmd()
code where the PMD was none of:
- A non-DAX PFN or mixed map.
- The huge zero folio
- A present PMD entry
- A softleaf entry
The code at this point calls folio_test_anon() on a known-NULL
folio. Having the code explicitly NULL-dereference like this is hard to
understand, and potentially makes debugging more difficult.
Add an else branch to handle this case, which warns and exits indicating
failure.
[0]: https://lore.kernel.org/all/6b3d7ad7-49e1-407a-903d-3103704160d8@lucifer.local/
Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
---
mm/huge_memory.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index bba1ba1f6b67..8e6b7ba11448 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2478,6 +2478,10 @@ bool zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
if (!thp_migration_supported())
WARN_ONCE(1, "Non present huge pmd without pmd migration enabled!");
+ } else {
+ WARN_ON_ONCE(true);
+ spin_unlock(ptl);
+ return false;
}
if (folio_test_anon(folio)) {
--
2.53.0
* [PATCH 5/8] mm/huge_memory: add a common exit path to zap_huge_pmd()
2026-03-18 20:39 [PATCH 0/8] mm/huge_memory: refactor zap_huge_pmd() Lorenzo Stoakes (Oracle)
` (3 preceding siblings ...)
2026-03-18 20:39 ` [PATCH 4/8] mm/huge_memory: handle buggy PMD entry in zap_huge_pmd() Lorenzo Stoakes (Oracle)
@ 2026-03-18 20:39 ` Lorenzo Stoakes (Oracle)
2026-03-18 20:39 ` [PATCH 6/8] mm/huge_memory: remove unnecessary VM_BUG_ON_PAGE() Lorenzo Stoakes (Oracle)
` (2 subsequent siblings)
7 siblings, 0 replies; 22+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-18 20:39 UTC (permalink / raw)
To: Andrew Morton
Cc: David Hildenbrand, Zi Yan, Baolin Wang, Liam R . Howlett,
Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
linux-mm, linux-kernel
Other than when we fail to acquire the PTL, we always need to unlock the
PTL on exit, and optionally need to flush.
The code is currently very duplicated in this respect, so default
flush_needed to false, set it true in the case in which it's required, then
share the same logic for all exit paths.
This also makes flush_needed make more sense as a function-scope value (we
don't need to flush in the PFN map/mixed map, huge zero folio, or error
cases, for instance).
Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
---
mm/huge_memory.c | 19 +++++++++----------
1 file changed, 9 insertions(+), 10 deletions(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 8e6b7ba11448..724e1de74367 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2431,7 +2431,8 @@ bool zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
pmd_t *pmd, unsigned long addr)
{
struct folio *folio = NULL;
- bool flush_needed = true;
+ bool flush_needed = false;
+ bool ret = true;
spinlock_t *ptl;
pmd_t orig_pmd;
@@ -2453,19 +2454,18 @@ bool zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
if (vma_is_special_huge(vma)) {
if (arch_needs_pgtable_deposit())
zap_deposited_table(tlb->mm, pmd);
- spin_unlock(ptl);
- return true;
+ goto out;
}
if (is_huge_zero_pmd(orig_pmd)) {
if (!vma_is_dax(vma) || arch_needs_pgtable_deposit())
zap_deposited_table(tlb->mm, pmd);
- spin_unlock(ptl);
- return true;
+ goto out;
}
if (pmd_present(orig_pmd)) {
struct page *page = pmd_page(orig_pmd);
+ flush_needed = true;
folio = page_folio(page);
folio_remove_rmap_pmd(folio, page, vma);
WARN_ON_ONCE(folio_mapcount(folio) < 0);
@@ -2474,14 +2474,13 @@ bool zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
const softleaf_t entry = softleaf_from_pmd(orig_pmd);
folio = softleaf_to_folio(entry);
- flush_needed = false;
if (!thp_migration_supported())
WARN_ONCE(1, "Non present huge pmd without pmd migration enabled!");
} else {
WARN_ON_ONCE(true);
- spin_unlock(ptl);
- return false;
+ ret = false;
+ goto out;
}
if (folio_test_anon(folio)) {
@@ -2508,11 +2507,11 @@ bool zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
folio_put(folio);
}
+out:
spin_unlock(ptl);
if (flush_needed)
tlb_remove_page_size(tlb, &folio->page, HPAGE_PMD_SIZE);
-
- return true;
+ return ret;
}
#ifndef pmd_move_must_withdraw
--
2.53.0
* [PATCH 6/8] mm/huge_memory: remove unnecessary VM_BUG_ON_PAGE()
2026-03-18 20:39 [PATCH 0/8] mm/huge_memory: refactor zap_huge_pmd() Lorenzo Stoakes (Oracle)
` (4 preceding siblings ...)
2026-03-18 20:39 ` [PATCH 5/8] mm/huge_memory: add a common exit path to zap_huge_pmd() Lorenzo Stoakes (Oracle)
@ 2026-03-18 20:39 ` Lorenzo Stoakes (Oracle)
2026-03-19 7:12 ` Baolin Wang
2026-03-18 20:39 ` [PATCH 7/8] mm/huge_memory: deduplicate zap deposited table call Lorenzo Stoakes (Oracle)
2026-03-18 20:39 ` [PATCH 8/8] mm/huge_memory: deduplicate zap_huge_pmd() further by tracking state Lorenzo Stoakes (Oracle)
7 siblings, 1 reply; 22+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-18 20:39 UTC (permalink / raw)
To: Andrew Morton
Cc: David Hildenbrand, Zi Yan, Baolin Wang, Liam R . Howlett,
Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
linux-mm, linux-kernel
This has been around since the beginnings of the THP implementation. I
think we can safely assume that, if we have a THP folio, it will have a
head page.
Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
---
mm/huge_memory.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 724e1de74367..015f6d679d26 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2469,7 +2469,6 @@ bool zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
folio = page_folio(page);
folio_remove_rmap_pmd(folio, page, vma);
WARN_ON_ONCE(folio_mapcount(folio) < 0);
- VM_BUG_ON_PAGE(!PageHead(page), page);
} else if (pmd_is_valid_softleaf(orig_pmd)) {
const softleaf_t entry = softleaf_from_pmd(orig_pmd);
--
2.53.0
* [PATCH 7/8] mm/huge_memory: deduplicate zap deposited table call
2026-03-18 20:39 [PATCH 0/8] mm/huge_memory: refactor zap_huge_pmd() Lorenzo Stoakes (Oracle)
` (5 preceding siblings ...)
2026-03-18 20:39 ` [PATCH 6/8] mm/huge_memory: remove unnecessary VM_BUG_ON_PAGE() Lorenzo Stoakes (Oracle)
@ 2026-03-18 20:39 ` Lorenzo Stoakes (Oracle)
2026-03-18 20:39 ` [PATCH 8/8] mm/huge_memory: deduplicate zap_huge_pmd() further by tracking state Lorenzo Stoakes (Oracle)
7 siblings, 0 replies; 22+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-18 20:39 UTC (permalink / raw)
To: Andrew Morton
Cc: David Hildenbrand, Zi Yan, Baolin Wang, Liam R . Howlett,
Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
linux-mm, linux-kernel
Rather than having separate logic in each case to determine whether to zap
the deposited page table, simply track this via a boolean.
We check separately whether the architecture requires it.
Also use pmd_folio() directly in the present case.
Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
---
mm/huge_memory.c | 21 +++++++++------------
1 file changed, 9 insertions(+), 12 deletions(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 015f6d679d26..bcc74b0172fa 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2432,6 +2432,7 @@ bool zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
{
struct folio *folio = NULL;
bool flush_needed = false;
+ bool needs_deposit = false;
bool ret = true;
spinlock_t *ptl;
pmd_t orig_pmd;
@@ -2451,23 +2452,18 @@ bool zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
tlb->fullmm);
arch_check_zapped_pmd(vma, orig_pmd);
tlb_remove_pmd_tlb_entry(tlb, pmd, addr);
- if (vma_is_special_huge(vma)) {
- if (arch_needs_pgtable_deposit())
- zap_deposited_table(tlb->mm, pmd);
+ if (vma_is_special_huge(vma))
goto out;
- }
if (is_huge_zero_pmd(orig_pmd)) {
- if (!vma_is_dax(vma) || arch_needs_pgtable_deposit())
- zap_deposited_table(tlb->mm, pmd);
+ needs_deposit = !vma_is_dax(vma);
goto out;
}
if (pmd_present(orig_pmd)) {
- struct page *page = pmd_page(orig_pmd);
+ folio = pmd_folio(orig_pmd);
flush_needed = true;
- folio = page_folio(page);
- folio_remove_rmap_pmd(folio, page, vma);
+ folio_remove_rmap_pmd(folio, &folio->page, vma);
WARN_ON_ONCE(folio_mapcount(folio) < 0);
} else if (pmd_is_valid_softleaf(orig_pmd)) {
const softleaf_t entry = softleaf_from_pmd(orig_pmd);
@@ -2483,11 +2479,9 @@ bool zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
}
if (folio_test_anon(folio)) {
- zap_deposited_table(tlb->mm, pmd);
+ needs_deposit = true;
add_mm_counter(tlb->mm, MM_ANONPAGES, -HPAGE_PMD_NR);
} else {
- if (arch_needs_pgtable_deposit())
- zap_deposited_table(tlb->mm, pmd);
add_mm_counter(tlb->mm, mm_counter_file(folio),
-HPAGE_PMD_NR);
@@ -2507,6 +2501,9 @@ bool zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
}
out:
+ if (arch_needs_pgtable_deposit() || needs_deposit)
+ zap_deposited_table(tlb->mm, pmd);
+
spin_unlock(ptl);
if (flush_needed)
tlb_remove_page_size(tlb, &folio->page, HPAGE_PMD_SIZE);
--
2.53.0
* [PATCH 8/8] mm/huge_memory: deduplicate zap_huge_pmd() further by tracking state
2026-03-18 20:39 [PATCH 0/8] mm/huge_memory: refactor zap_huge_pmd() Lorenzo Stoakes (Oracle)
` (6 preceding siblings ...)
2026-03-18 20:39 ` [PATCH 7/8] mm/huge_memory: deduplicate zap deposited table call Lorenzo Stoakes (Oracle)
@ 2026-03-18 20:39 ` Lorenzo Stoakes (Oracle)
7 siblings, 0 replies; 22+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-18 20:39 UTC (permalink / raw)
To: Andrew Morton
Cc: David Hildenbrand, Zi Yan, Baolin Wang, Liam R . Howlett,
Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
linux-mm, linux-kernel
The flush_needed boolean is really tracking whether a PMD entry is present,
so use it that way directly and rename it to is_present.
Deduplicate the folio_remove_rmap_pmd() call and folio mapcount warning
between the present and device-private cases by tracking whether we need to
remove the rmap.
We can also remove the comment about using flush_needed to track whether
the PMD entry is present, as it's now irrelevant.
Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
---
mm/huge_memory.c | 28 +++++++++++++---------------
1 file changed, 13 insertions(+), 15 deletions(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index bcc74b0172fa..f6caa6d35659 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2430,9 +2430,10 @@ static inline void zap_deposited_table(struct mm_struct *mm, pmd_t *pmd)
bool zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
pmd_t *pmd, unsigned long addr)
{
+ bool needs_remove_rmap = false;
struct folio *folio = NULL;
- bool flush_needed = false;
bool needs_deposit = false;
+ bool is_present = false;
bool ret = true;
spinlock_t *ptl;
pmd_t orig_pmd;
@@ -2450,6 +2451,7 @@ bool zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
*/
orig_pmd = pmdp_huge_get_and_clear_full(vma, addr, pmd,
tlb->fullmm);
+
arch_check_zapped_pmd(vma, orig_pmd);
tlb_remove_pmd_tlb_entry(tlb, pmd, addr);
if (vma_is_special_huge(vma))
@@ -2459,17 +2461,15 @@ bool zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
goto out;
}
- if (pmd_present(orig_pmd)) {
+ is_present = pmd_present(orig_pmd);
+ if (is_present) {
folio = pmd_folio(orig_pmd);
-
- flush_needed = true;
- folio_remove_rmap_pmd(folio, &folio->page, vma);
- WARN_ON_ONCE(folio_mapcount(folio) < 0);
+ needs_remove_rmap = true;
} else if (pmd_is_valid_softleaf(orig_pmd)) {
const softleaf_t entry = softleaf_from_pmd(orig_pmd);
folio = softleaf_to_folio(entry);
-
+ needs_remove_rmap = folio_is_device_private(folio);
if (!thp_migration_supported())
WARN_ONCE(1, "Non present huge pmd without pmd migration enabled!");
} else {
@@ -2485,27 +2485,25 @@ bool zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
add_mm_counter(tlb->mm, mm_counter_file(folio),
-HPAGE_PMD_NR);
- /*
- * Use flush_needed to indicate whether the PMD entry
- * is present, instead of checking pmd_present() again.
- */
- if (flush_needed && pmd_young(orig_pmd) &&
+ if (is_present && pmd_young(orig_pmd) &&
likely(vma_has_recency(vma)))
folio_mark_accessed(folio);
}
- if (folio_is_device_private(folio)) {
+ if (needs_remove_rmap) {
folio_remove_rmap_pmd(folio, &folio->page, vma);
WARN_ON_ONCE(folio_mapcount(folio) < 0);
- folio_put(folio);
}
out:
if (arch_needs_pgtable_deposit() || needs_deposit)
zap_deposited_table(tlb->mm, pmd);
+ if (needs_remove_rmap && !is_present)
+ folio_put(folio);
+
spin_unlock(ptl);
- if (flush_needed)
+ if (is_present)
tlb_remove_page_size(tlb, &folio->page, HPAGE_PMD_SIZE);
return ret;
}
--
2.53.0
* Re: [PATCH 1/8] mm/huge_memory: simplify vma_is_special_huge()
2026-03-18 20:39 ` [PATCH 1/8] mm/huge_memory: simplify vma_is_special_huge() Lorenzo Stoakes (Oracle)
@ 2026-03-18 20:45 ` David Hildenbrand (Arm)
2026-03-19 10:34 ` Lorenzo Stoakes (Oracle)
2026-03-19 3:16 ` Qi Zheng
1 sibling, 1 reply; 22+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-18 20:45 UTC (permalink / raw)
To: Lorenzo Stoakes (Oracle), Andrew Morton
Cc: Zi Yan, Baolin Wang, Liam R . Howlett, Nico Pache, Ryan Roberts,
Dev Jain, Barry Song, Lance Yang, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, linux-mm, linux-kernel
> +++ b/mm/huge_memory.c
> @@ -100,6 +100,14 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
> return !inode_is_open_for_write(inode) && S_ISREG(inode->i_mode);
> }
>
> +/* If returns true, we are unable to access the VMA's folios. */
> +static bool vma_is_special_huge(struct vm_area_struct *vma)
> +{
> + if (vma_is_dax(vma))
> + return false;
> + return vma_test_any(vma, VMA_PFNMAP_BIT, VMA_MIXEDMAP_BIT);
> +}
I was hoping that we could make this whole code look more like PTE code
by using vm_normal_page_pmd() ... :)
--
Cheers,
David
* Re: [PATCH 1/8] mm/huge_memory: simplify vma_is_specal_huge()
2026-03-18 20:39 ` [PATCH 1/8] mm/huge_memory: simplify vma_is_specal_huge() Lorenzo Stoakes (Oracle)
2026-03-18 20:45 ` David Hildenbrand (Arm)
@ 2026-03-19 3:16 ` Qi Zheng
2026-03-19 10:39 ` Lorenzo Stoakes (Oracle)
1 sibling, 1 reply; 22+ messages in thread
From: Qi Zheng @ 2026-03-19 3:16 UTC (permalink / raw)
To: Lorenzo Stoakes (Oracle)
Cc: David Hildenbrand, Zi Yan, Baolin Wang, Liam R . Howlett,
Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
linux-mm, linux-kernel, Andrew Morton
On 3/19/26 4:39 AM, Lorenzo Stoakes (Oracle) wrote:
> This function is confused - it overloads the term 'special' yet again,
> checks for DAX but in many cases the code explicitly excludes DAX before
> invoking the predicate.
>
> It also unnecessarily checks for vma->vm_file - this has to be present for
> a driver to have set VMA_MIXEDMAP_BIT or VMA_PFNMAP_BIT.
>
> In fact, a far simpler form of this is to reverse the DAX predicate and
> return false if DAX is set.
>
> This makes sense from the point of view of 'special' as in
> vm_normal_page(), as DAX actually does potentially have retrievable folios.
>
> Also there's no need to have this in mm.h so move it to huge_memory.c.
>
> Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
> ---
> include/linux/huge_mm.h | 4 ++--
> include/linux/mm.h | 16 ----------------
> mm/huge_memory.c | 30 +++++++++++++++++++++++-------
> 3 files changed, 25 insertions(+), 25 deletions(-)
>
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index bd7f0e1d8094..61fda1672b29 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -83,7 +83,7 @@ extern struct kobj_attribute thpsize_shmem_enabled_attr;
> * file is never split and the MAX_PAGECACHE_ORDER limit does not apply to
> * it. Same to PFNMAPs where there's neither page* nor pagecache.
> */
> -#define THP_ORDERS_ALL_SPECIAL \
> +#define THP_ORDERS_ALL_SPECIAL_DAX \
As mentioned in the comments, the pfnmap case is also included in the
'special' case, right?
> (BIT(PMD_ORDER) | BIT(PUD_ORDER))
> #define THP_ORDERS_ALL_FILE_DEFAULT \
> ((BIT(MAX_PAGECACHE_ORDER + 1) - 1) & ~BIT(0))
> @@ -92,7 +92,7 @@ extern struct kobj_attribute thpsize_shmem_enabled_attr;
> * Mask of all large folio orders supported for THP.
> */
> #define THP_ORDERS_ALL \
> - (THP_ORDERS_ALL_ANON | THP_ORDERS_ALL_SPECIAL | THP_ORDERS_ALL_FILE_DEFAULT)
> + (THP_ORDERS_ALL_ANON | THP_ORDERS_ALL_SPECIAL_DAX | THP_ORDERS_ALL_FILE_DEFAULT)
>
> enum tva_type {
> TVA_SMAPS, /* Exposing "THPeligible:" in smaps. */
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 6f0a3edb24e1..50d68b092204 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -5077,22 +5077,6 @@ long copy_folio_from_user(struct folio *dst_folio,
> const void __user *usr_src,
> bool allow_pagefault);
>
> -/**
> - * vma_is_special_huge - Are transhuge page-table entries considered special?
> - * @vma: Pointer to the struct vm_area_struct to consider
> - *
> - * Whether transhuge page-table entries are considered "special" following
> - * the definition in vm_normal_page().
> - *
> - * Return: true if transhuge page-table entries should be considered special,
> - * false otherwise.
> - */
> -static inline bool vma_is_special_huge(const struct vm_area_struct *vma)
> -{
> - return vma_is_dax(vma) || (vma->vm_file &&
> - (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP)));
> -}
> -
> #endif /* CONFIG_TRANSPARENT_HUGEPAGE || CONFIG_HUGETLBFS */
>
> #if MAX_NUMNODES > 1
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 3fc02913b63e..f76edfa91e96 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -100,6 +100,14 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
> return !inode_is_open_for_write(inode) && S_ISREG(inode->i_mode);
> }
>
> +/* If returns true, we are unable to access the VMA's folios. */
> +static bool vma_is_special_huge(struct vm_area_struct *vma)
> +{
> + if (vma_is_dax(vma))
> + return false;
> + return vma_test_any(vma, VMA_PFNMAP_BIT, VMA_MIXEDMAP_BIT);
> +}
> +
> unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
> vm_flags_t vm_flags,
> enum tva_type type,
> @@ -113,8 +121,8 @@ unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
> /* Check the intersection of requested and supported orders. */
> if (vma_is_anonymous(vma))
> supported_orders = THP_ORDERS_ALL_ANON;
> - else if (vma_is_special_huge(vma))
> - supported_orders = THP_ORDERS_ALL_SPECIAL;
> + else if (vma_is_dax(vma) || vma_is_special_huge(vma))
> + supported_orders = THP_ORDERS_ALL_SPECIAL_DAX;
> else
> supported_orders = THP_ORDERS_ALL_FILE_DEFAULT;
>
> @@ -2431,7 +2439,7 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
> tlb->fullmm);
> arch_check_zapped_pmd(vma, orig_pmd);
> tlb_remove_pmd_tlb_entry(tlb, pmd, addr);
> - if (!vma_is_dax(vma) && vma_is_special_huge(vma)) {
> + if (vma_is_special_huge(vma)) {
> if (arch_needs_pgtable_deposit())
> zap_deposited_table(tlb->mm, pmd);
> spin_unlock(ptl);
> @@ -2933,7 +2941,7 @@ int zap_huge_pud(struct mmu_gather *tlb, struct vm_area_struct *vma,
> orig_pud = pudp_huge_get_and_clear_full(vma, addr, pud, tlb->fullmm);
> arch_check_zapped_pud(vma, orig_pud);
> tlb_remove_pud_tlb_entry(tlb, pud, addr);
> - if (!vma_is_dax(vma) && vma_is_special_huge(vma)) {
> + if (vma_is_special_huge(vma)) {
> spin_unlock(ptl);
> /* No zero page support yet */
> } else {
> @@ -3084,7 +3092,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
> */
> if (arch_needs_pgtable_deposit())
> zap_deposited_table(mm, pmd);
> - if (!vma_is_dax(vma) && vma_is_special_huge(vma))
> + if (vma_is_special_huge(vma))
> return;
> if (unlikely(pmd_is_migration_entry(old_pmd))) {
> const softleaf_t old_entry = softleaf_from_pmd(old_pmd);
> @@ -4645,8 +4653,16 @@ static void split_huge_pages_all(void)
>
> static inline bool vma_not_suitable_for_thp_split(struct vm_area_struct *vma)
> {
> - return vma_is_special_huge(vma) || (vma->vm_flags & VM_IO) ||
> - is_vm_hugetlb_page(vma);
> + if (vma_is_dax(vma))
> + return true;
> + if (vma_is_special_huge(vma))
> + return true;
> + if (vma_test(vma, VMA_IO_BIT))
> + return true;
> + if (is_vm_hugetlb_page(vma))
> + return true;
> +
> + return false;
> }
>
> static int split_huge_pages_pid(int pid, unsigned long vaddr_start,
* Re: [PATCH 2/8] mm/huge: avoid big else branch in zap_huge_pmd()
2026-03-18 20:39 ` [PATCH 2/8] mm/huge: avoid big else branch in zap_huge_pmd() Lorenzo Stoakes (Oracle)
@ 2026-03-19 3:26 ` Qi Zheng
2026-03-19 6:38 ` Baolin Wang
1 sibling, 0 replies; 22+ messages in thread
From: Qi Zheng @ 2026-03-19 3:26 UTC (permalink / raw)
To: Lorenzo Stoakes (Oracle)
Cc: David Hildenbrand, Zi Yan, Baolin Wang, Liam R . Howlett,
Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
linux-mm, linux-kernel, Andrew Morton
On 3/19/26 4:39 AM, Lorenzo Stoakes (Oracle) wrote:
> We don't need to have an extra level of indentation, we can simply exit
> early in the first two branches.
>
> Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
> ---
> mm/huge_memory.c | 87 +++++++++++++++++++++++++-----------------------
> 1 file changed, 45 insertions(+), 42 deletions(-)
>
The code here isn't too complex; otherwise we could introduce a
function like do_zap_pte_range().
Acked-by: Qi Zheng <zhengqi.arch@bytedance.com>
* Re: [PATCH 3/8] mm/huge_memory: have zap_huge_pmd return a boolean, add kdoc
2026-03-18 20:39 ` [PATCH 3/8] mm/huge_memory: have zap_huge_pmd return a boolean, add kdoc Lorenzo Stoakes (Oracle)
@ 2026-03-19 3:29 ` Qi Zheng
2026-03-19 6:41 ` Baolin Wang
1 sibling, 0 replies; 22+ messages in thread
From: Qi Zheng @ 2026-03-19 3:29 UTC (permalink / raw)
To: Lorenzo Stoakes (Oracle)
Cc: David Hildenbrand, Zi Yan, Baolin Wang, Liam R . Howlett,
Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
linux-mm, linux-kernel, Andrew Morton
On 3/19/26 4:39 AM, Lorenzo Stoakes (Oracle) wrote:
> There's no need to use the ancient approach of returning an integer here,
> just return a boolean.
>
> Also update flush_needed to be a boolean, similarly.
>
> Also add a kdoc comment describing the function.
>
> Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
> ---
> include/linux/huge_mm.h | 4 ++--
> mm/huge_memory.c | 23 ++++++++++++++++-------
> 2 files changed, 18 insertions(+), 9 deletions(-)
>
Acked-by: Qi Zheng <zhengqi.arch@bytedance.com>
* Re: [PATCH 2/8] mm/huge: avoid big else branch in zap_huge_pmd()
2026-03-18 20:39 ` [PATCH 2/8] mm/huge: avoid big else branch in zap_huge_pmd() Lorenzo Stoakes (Oracle)
2026-03-19 3:26 ` Qi Zheng
@ 2026-03-19 6:38 ` Baolin Wang
1 sibling, 0 replies; 22+ messages in thread
From: Baolin Wang @ 2026-03-19 6:38 UTC (permalink / raw)
To: Lorenzo Stoakes (Oracle), Andrew Morton
Cc: David Hildenbrand, Zi Yan, Liam R . Howlett, Nico Pache,
Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Vlastimil Babka,
Mike Rapoport, Suren Baghdasaryan, Michal Hocko, linux-mm,
linux-kernel
On 3/19/26 4:39 AM, Lorenzo Stoakes (Oracle) wrote:
> We don't need to have an extra level of indentation, we can simply exit
> early in the first two branches.
>
> Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
> ---
LGTM.
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
* Re: [PATCH 3/8] mm/huge_memory: have zap_huge_pmd return a boolean, add kdoc
2026-03-18 20:39 ` [PATCH 3/8] mm/huge_memory: have zap_huge_pmd return a boolean, add kdoc Lorenzo Stoakes (Oracle)
2026-03-19 3:29 ` Qi Zheng
@ 2026-03-19 6:41 ` Baolin Wang
1 sibling, 0 replies; 22+ messages in thread
From: Baolin Wang @ 2026-03-19 6:41 UTC (permalink / raw)
To: Lorenzo Stoakes (Oracle), Andrew Morton
Cc: David Hildenbrand, Zi Yan, Liam R . Howlett, Nico Pache,
Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Vlastimil Babka,
Mike Rapoport, Suren Baghdasaryan, Michal Hocko, linux-mm,
linux-kernel
On 3/19/26 4:39 AM, Lorenzo Stoakes (Oracle) wrote:
> There's no need to use the ancient approach of returning an integer here,
> just return a boolean.
>
> Also update flush_needed to be a boolean, similarly.
>
> Also add a kdoc comment describing the function.
>
> Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
> ---
Make sense to me.
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
* Re: [PATCH 4/8] mm/huge_memory: handle buggy PMD entry in zap_huge_pmd()
2026-03-18 20:39 ` [PATCH 4/8] mm/huge_memory: handle buggy PMD entry in zap_huge_pmd() Lorenzo Stoakes (Oracle)
@ 2026-03-19 7:00 ` Baolin Wang
2026-03-19 10:58 ` Lorenzo Stoakes (Oracle)
0 siblings, 1 reply; 22+ messages in thread
From: Baolin Wang @ 2026-03-19 7:00 UTC (permalink / raw)
To: Lorenzo Stoakes (Oracle), Andrew Morton
Cc: David Hildenbrand, Zi Yan, Liam R . Howlett, Nico Pache,
Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Vlastimil Babka,
Mike Rapoport, Suren Baghdasaryan, Michal Hocko, linux-mm,
linux-kernel
On 3/19/26 4:39 AM, Lorenzo Stoakes (Oracle) wrote:
> A recent bug I analysed [0] managed to, through a bug in the userfaultfd
> implementation, reach an invalid point in the zap_huge_pmd() code where the
> PMD was none of:
>
> - A non-DAX, PFN or mixed map.
> - The huge zero folio
> - A present PMD entry
> - A softleaf entry
>
> The code at this point calls folio_test_anon() on a known-NULL
> folio. Having logic that explicitly NULL-dereferences like this is hard
> to understand, and makes debugging potentially more difficult.
>
> Add an else branch to handle this case and WARN() and exit indicating
> failure.
>
> [0]:https://lore.kernel.org/all/6b3d7ad7-49e1-407a-903d-3103704160d8@lucifer.local/
>
> Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
> ---
> mm/huge_memory.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index bba1ba1f6b67..8e6b7ba11448 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2478,6 +2478,10 @@ bool zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
>
> if (!thp_migration_supported())
> WARN_ONCE(1, "Non present huge pmd without pmd migration enabled!");
> + } else {
> + WARN_ON_ONCE(true);
> + spin_unlock(ptl);
The warning looks reasonable to me, but ...
> + return false;
IIUC, if we return false here, the caller zap_pmd_range() will fall back
to calling zap_pte_range(). Since pmd_trans_huge(pmd) returns true,
zap_pte_range() will simply return 'addr', causing an infinite loop in
zap_pmd_range(), right?
* Re: [PATCH 6/8] mm/huge_memory: remove unnecessary VM_BUG_ON_PAGE()
2026-03-18 20:39 ` [PATCH 6/8] mm/huge_memory: remove unnecessary VM_BUG_ON_PAGE() Lorenzo Stoakes (Oracle)
@ 2026-03-19 7:12 ` Baolin Wang
0 siblings, 0 replies; 22+ messages in thread
From: Baolin Wang @ 2026-03-19 7:12 UTC (permalink / raw)
To: Lorenzo Stoakes (Oracle), Andrew Morton
Cc: David Hildenbrand, Zi Yan, Liam R . Howlett, Nico Pache,
Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Vlastimil Babka,
Mike Rapoport, Suren Baghdasaryan, Michal Hocko, linux-mm,
linux-kernel
On 3/19/26 4:39 AM, Lorenzo Stoakes (Oracle) wrote:
> This has been around since the beginnings of the THP implementation. I
> think we can safely assume that, if we have a THP folio, it will have a
> head page.
>
> Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
> ---
Yes, it is under the PTL, so the PMD entry should be stable. I also
can't think of any case where it could be anything other than a head page.
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
* Re: [PATCH 1/8] mm/huge_memory: simplify vma_is_specal_huge()
2026-03-18 20:45 ` David Hildenbrand (Arm)
@ 2026-03-19 10:34 ` Lorenzo Stoakes (Oracle)
2026-03-19 13:03 ` David Hildenbrand (Arm)
0 siblings, 1 reply; 22+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-19 10:34 UTC (permalink / raw)
To: David Hildenbrand (Arm)
Cc: Andrew Morton, Zi Yan, Baolin Wang, Liam R . Howlett, Nico Pache,
Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Vlastimil Babka,
Mike Rapoport, Suren Baghdasaryan, Michal Hocko, linux-mm,
linux-kernel
On Wed, Mar 18, 2026 at 09:45:31PM +0100, David Hildenbrand (Arm) wrote:
> > +++ b/mm/huge_memory.c
> > @@ -100,6 +100,14 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
> > return !inode_is_open_for_write(inode) && S_ISREG(inode->i_mode);
> > }
> >
> > +/* If returns true, we are unable to access the VMA's folios. */
> > +static bool vma_is_special_huge(struct vm_area_struct *vma)
> > +{
> > + if (vma_is_dax(vma))
> > + return false;
> > + return vma_test_any(vma, VMA_PFNMAP_BIT, VMA_MIXEDMAP_BIT);
> > +}
>
> I was hoping that we could make this whole code look more like PTE code
> by using vm_normal_page_pmd() ... :)
Could that be a follow up?
I did want to do something where we get the folio and use that to figure things
out, but there's a lot of special cases esp. around DAX that make that tricky.
>
> --
> Cheers,
>
> David
Thanks, Lorenzo
* Re: [PATCH 1/8] mm/huge_memory: simplify vma_is_specal_huge()
2026-03-19 3:16 ` Qi Zheng
@ 2026-03-19 10:39 ` Lorenzo Stoakes (Oracle)
0 siblings, 0 replies; 22+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-19 10:39 UTC (permalink / raw)
To: Qi Zheng
Cc: David Hildenbrand, Zi Yan, Baolin Wang, Liam R . Howlett,
Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
linux-mm, linux-kernel, Andrew Morton
On Thu, Mar 19, 2026 at 11:16:20AM +0800, Qi Zheng wrote:
>
>
> On 3/19/26 4:39 AM, Lorenzo Stoakes (Oracle) wrote:
> > This function is confused - it overloads the term 'special' yet again,
> > checks for DAX but in many cases the code explicitly excludes DAX before
> > invoking the predicate.
> >
> > It also unnecessarily checks for vma->vm_file - this has to be present for
> > a driver to have set VMA_MIXEDMAP_BIT or VMA_PFNMAP_BIT.
> >
> > In fact, a far simpler form of this is to reverse the DAX predicate and
> > return false if DAX is set.
> >
> > This makes sense from the point of view of 'special' as in
> > vm_normal_page(), as DAX actually does potentially have retrievable folios.
> >
> > Also there's no need to have this in mm.h so move it to huge_memory.c.
> >
> > Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
> > ---
> > include/linux/huge_mm.h | 4 ++--
> > include/linux/mm.h | 16 ----------------
> > mm/huge_memory.c | 30 +++++++++++++++++++++++-------
> > 3 files changed, 25 insertions(+), 25 deletions(-)
> >
> > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> > index bd7f0e1d8094..61fda1672b29 100644
> > --- a/include/linux/huge_mm.h
> > +++ b/include/linux/huge_mm.h
> > @@ -83,7 +83,7 @@ extern struct kobj_attribute thpsize_shmem_enabled_attr;
> > * file is never split and the MAX_PAGECACHE_ORDER limit does not apply to
> > * it. Same to PFNMAPs where there's neither page* nor pagecache.
> > */
> > -#define THP_ORDERS_ALL_SPECIAL \
> > +#define THP_ORDERS_ALL_SPECIAL_DAX \
>
> As mentioned in the comments, the pfnmap case is also include in the
> 'special' case, right?
Yeah, special = pfnmap, mixedmap. So renaming to SPECIAL_DAX makes it
clear that it's either DAX or 'special' in the meaning of vm_normal_page().
>
> > (BIT(PMD_ORDER) | BIT(PUD_ORDER))
> > #define THP_ORDERS_ALL_FILE_DEFAULT \
> > ((BIT(MAX_PAGECACHE_ORDER + 1) - 1) & ~BIT(0))
> > @@ -92,7 +92,7 @@ extern struct kobj_attribute thpsize_shmem_enabled_attr;
> > * Mask of all large folio orders supported for THP.
> > */
> > #define THP_ORDERS_ALL \
> > - (THP_ORDERS_ALL_ANON | THP_ORDERS_ALL_SPECIAL | THP_ORDERS_ALL_FILE_DEFAULT)
> > + (THP_ORDERS_ALL_ANON | THP_ORDERS_ALL_SPECIAL_DAX | THP_ORDERS_ALL_FILE_DEFAULT)
> > enum tva_type {
> > TVA_SMAPS, /* Exposing "THPeligible:" in smaps. */
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index 6f0a3edb24e1..50d68b092204 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -5077,22 +5077,6 @@ long copy_folio_from_user(struct folio *dst_folio,
> > const void __user *usr_src,
> > bool allow_pagefault);
> > -/**
> > - * vma_is_special_huge - Are transhuge page-table entries considered special?
> > - * @vma: Pointer to the struct vm_area_struct to consider
> > - *
> > - * Whether transhuge page-table entries are considered "special" following
> > - * the definition in vm_normal_page().
> > - *
> > - * Return: true if transhuge page-table entries should be considered special,
> > - * false otherwise.
> > - */
> > -static inline bool vma_is_special_huge(const struct vm_area_struct *vma)
> > -{
> > - return vma_is_dax(vma) || (vma->vm_file &&
> > - (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP)));
> > -}
> > -
> > #endif /* CONFIG_TRANSPARENT_HUGEPAGE || CONFIG_HUGETLBFS */
> > #if MAX_NUMNODES > 1
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index 3fc02913b63e..f76edfa91e96 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -100,6 +100,14 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
> > return !inode_is_open_for_write(inode) && S_ISREG(inode->i_mode);
> > }
> > +/* If returns true, we are unable to access the VMA's folios. */
> > +static bool vma_is_special_huge(struct vm_area_struct *vma)
> > +{
> > + if (vma_is_dax(vma))
> > + return false;
> > + return vma_test_any(vma, VMA_PFNMAP_BIT, VMA_MIXEDMAP_BIT);
> > +}
> > +
> > unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
> > vm_flags_t vm_flags,
> > enum tva_type type,
> > @@ -113,8 +121,8 @@ unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
> > /* Check the intersection of requested and supported orders. */
> > if (vma_is_anonymous(vma))
> > supported_orders = THP_ORDERS_ALL_ANON;
> > - else if (vma_is_special_huge(vma))
> > - supported_orders = THP_ORDERS_ALL_SPECIAL;
> > + else if (vma_is_dax(vma) || vma_is_special_huge(vma))
> > + supported_orders = THP_ORDERS_ALL_SPECIAL_DAX;
> > else
> > supported_orders = THP_ORDERS_ALL_FILE_DEFAULT;
> > @@ -2431,7 +2439,7 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
> > tlb->fullmm);
> > arch_check_zapped_pmd(vma, orig_pmd);
> > tlb_remove_pmd_tlb_entry(tlb, pmd, addr);
> > - if (!vma_is_dax(vma) && vma_is_special_huge(vma)) {
> > + if (vma_is_special_huge(vma)) {
> > if (arch_needs_pgtable_deposit())
> > zap_deposited_table(tlb->mm, pmd);
> > spin_unlock(ptl);
> > @@ -2933,7 +2941,7 @@ int zap_huge_pud(struct mmu_gather *tlb, struct vm_area_struct *vma,
> > orig_pud = pudp_huge_get_and_clear_full(vma, addr, pud, tlb->fullmm);
> > arch_check_zapped_pud(vma, orig_pud);
> > tlb_remove_pud_tlb_entry(tlb, pud, addr);
> > - if (!vma_is_dax(vma) && vma_is_special_huge(vma)) {
> > + if (vma_is_special_huge(vma)) {
> > spin_unlock(ptl);
> > /* No zero page support yet */
> > } else {
> > @@ -3084,7 +3092,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
> > */
> > if (arch_needs_pgtable_deposit())
> > zap_deposited_table(mm, pmd);
> > - if (!vma_is_dax(vma) && vma_is_special_huge(vma))
> > + if (vma_is_special_huge(vma))
> > return;
> > if (unlikely(pmd_is_migration_entry(old_pmd))) {
> > const softleaf_t old_entry = softleaf_from_pmd(old_pmd);
> > @@ -4645,8 +4653,16 @@ static void split_huge_pages_all(void)
> > static inline bool vma_not_suitable_for_thp_split(struct vm_area_struct *vma)
> > {
> > - return vma_is_special_huge(vma) || (vma->vm_flags & VM_IO) ||
> > - is_vm_hugetlb_page(vma);
> > + if (vma_is_dax(vma))
> > + return true;
> > + if (vma_is_special_huge(vma))
> > + return true;
> > + if (vma_test(vma, VMA_IO_BIT))
> > + return true;
> > + if (is_vm_hugetlb_page(vma))
> > + return true;
> > +
> > + return false;
> > }
> > static int split_huge_pages_pid(int pid, unsigned long vaddr_start,
>
* Re: [PATCH 4/8] mm/huge_memory: handle buggy PMD entry in zap_huge_pmd()
2026-03-19 7:00 ` Baolin Wang
@ 2026-03-19 10:58 ` Lorenzo Stoakes (Oracle)
0 siblings, 0 replies; 22+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-19 10:58 UTC (permalink / raw)
To: Baolin Wang
Cc: Andrew Morton, David Hildenbrand, Zi Yan, Liam R . Howlett,
Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
linux-mm, linux-kernel
On Thu, Mar 19, 2026 at 03:00:17PM +0800, Baolin Wang wrote:
>
>
> On 3/19/26 4:39 AM, Lorenzo Stoakes (Oracle) wrote:
> > A recent bug I analysed [0] managed to, through a bug in the userfaultfd
> > implementation, reach an invalid point in the zap_huge_pmd() code where the
> > PMD was none of:
> >
> > - A non-DAX, PFN or mixed map.
> > - The huge zero folio
> > - A present PMD entry
> > - A softleaf entry
> >
> > The code at this point calls folio_test_anon() on a known-NULL
> > folio. Having logic like this explicitly NULL dereference in the code is
> > hard to understand, and makes debugging potentially more difficult.
> >
> > Add an else branch to handle this case and WARN() and exit indicating
> > failure.
> >
> > [0]:https://lore.kernel.org/all/6b3d7ad7-49e1-407a-903d-3103704160d8@lucifer.local/
> >
> > Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
> > ---
> > mm/huge_memory.c | 4 ++++
> > 1 file changed, 4 insertions(+)
> >
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index bba1ba1f6b67..8e6b7ba11448 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -2478,6 +2478,10 @@ bool zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
> > if (!thp_migration_supported())
> > WARN_ONCE(1, "Non present huge pmd without pmd migration enabled!");
> > + } else {
> > + WARN_ON_ONCE(true);
> > + spin_unlock(ptl);
>
> The warning looks reasonable to me, but ...
>
> > + return false;
>
> IIUC, if we return false here, the caller zap_pmd_range() will fall back to
> call zap_pte_range(). Since pmd_trans_huge(pmd) returns true,
> zap_pte_range() will simply return 'addr', causing an infinite loop in
> zap_pmd_range(), right?
You mean because:

	start_pte = pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
	if (!pte)
		return addr;

In any case it looks like returning false degrades to potentially carrying
on forever:

	addr = zap_pte_range(tlb, vma, pmd, addr, next, details);
	if (addr != next)
		pmd--;

So yeah, this should return true, annoyingly.
Will fix thanks!
Cheers, Lorenzo
* Re: [PATCH 1/8] mm/huge_memory: simplify vma_is_specal_huge()
2026-03-19 10:34 ` Lorenzo Stoakes (Oracle)
@ 2026-03-19 13:03 ` David Hildenbrand (Arm)
2026-03-19 14:07 ` Lorenzo Stoakes (Oracle)
0 siblings, 1 reply; 22+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-19 13:03 UTC (permalink / raw)
To: Lorenzo Stoakes (Oracle)
Cc: Andrew Morton, Zi Yan, Baolin Wang, Liam R . Howlett, Nico Pache,
Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Vlastimil Babka,
Mike Rapoport, Suren Baghdasaryan, Michal Hocko, linux-mm,
linux-kernel
On 3/19/26 11:34, Lorenzo Stoakes (Oracle) wrote:
> On Wed, Mar 18, 2026 at 09:45:31PM +0100, David Hildenbrand (Arm) wrote:
>>> +++ b/mm/huge_memory.c
>>> @@ -100,6 +100,14 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
>>> return !inode_is_open_for_write(inode) && S_ISREG(inode->i_mode);
>>> }
>>>
>>> +/* If returns true, we are unable to access the VMA's folios. */
>>> +static bool vma_is_special_huge(struct vm_area_struct *vma)
>>> +{
>>> + if (vma_is_dax(vma))
>>> + return false;
>>> + return vma_test_any(vma, VMA_PFNMAP_BIT, VMA_MIXEDMAP_BIT);
>>> +}
>>
>> I was hoping that we could make this whole code look more like PTE code
>> by using vm_normal_page_pmd() ... :)
>
> Could that be a follow up?
I guess so, but the code is so awful :(
--
Cheers,
David
* Re: [PATCH 1/8] mm/huge_memory: simplify vma_is_specal_huge()
2026-03-19 13:03 ` David Hildenbrand (Arm)
@ 2026-03-19 14:07 ` Lorenzo Stoakes (Oracle)
0 siblings, 0 replies; 22+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-03-19 14:07 UTC (permalink / raw)
To: David Hildenbrand (Arm)
Cc: Andrew Morton, Zi Yan, Baolin Wang, Liam R . Howlett, Nico Pache,
Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Vlastimil Babka,
Mike Rapoport, Suren Baghdasaryan, Michal Hocko, linux-mm,
linux-kernel
On Thu, Mar 19, 2026 at 02:03:04PM +0100, David Hildenbrand (Arm) wrote:
> On 3/19/26 11:34, Lorenzo Stoakes (Oracle) wrote:
> > On Wed, Mar 18, 2026 at 09:45:31PM +0100, David Hildenbrand (Arm) wrote:
> >>> +++ b/mm/huge_memory.c
> >>> @@ -100,6 +100,14 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
> >>> return !inode_is_open_for_write(inode) && S_ISREG(inode->i_mode);
> >>> }
> >>>
> >>> +/* If returns true, we are unable to access the VMA's folios. */
> >>> +static bool vma_is_special_huge(struct vm_area_struct *vma)
> >>> +{
> >>> + if (vma_is_dax(vma))
> >>> + return false;
> >>> + return vma_test_any(vma, VMA_PFNMAP_BIT, VMA_MIXEDMAP_BIT);
> >>> +}
> >>
> >> I was hoping that we could make this whole code look more like PTE code
> >> by using vm_normal_page_pmd() ... :)
> >
> > Could that be a follow up?
>
> I guess so, but the code is so awful :(
Well, I updated the code to use vm_normal_folio_pmd() :) At least it
should get us closer to common logic.
Cheers, Lorenzo
end of thread, other threads:[~2026-03-19 14:07 UTC | newest]
Thread overview: 22+ messages
2026-03-18 20:39 [PATCH 0/8] mm/huge_memory: refactor zap_huge_pmd() Lorenzo Stoakes (Oracle)
2026-03-18 20:39 ` [PATCH 1/8] mm/huge_memory: simplify vma_is_specal_huge() Lorenzo Stoakes (Oracle)
2026-03-18 20:45 ` David Hildenbrand (Arm)
2026-03-19 10:34 ` Lorenzo Stoakes (Oracle)
2026-03-19 13:03 ` David Hildenbrand (Arm)
2026-03-19 14:07 ` Lorenzo Stoakes (Oracle)
2026-03-19 3:16 ` Qi Zheng
2026-03-19 10:39 ` Lorenzo Stoakes (Oracle)
2026-03-18 20:39 ` [PATCH 2/8] mm/huge: avoid big else branch in zap_huge_pmd() Lorenzo Stoakes (Oracle)
2026-03-19 3:26 ` Qi Zheng
2026-03-19 6:38 ` Baolin Wang
2026-03-18 20:39 ` [PATCH 3/8] mm/huge_memory: have zap_huge_pmd return a boolean, add kdoc Lorenzo Stoakes (Oracle)
2026-03-19 3:29 ` Qi Zheng
2026-03-19 6:41 ` Baolin Wang
2026-03-18 20:39 ` [PATCH 4/8] mm/huge_memory: handle buggy PMD entry in zap_huge_pmd() Lorenzo Stoakes (Oracle)
2026-03-19 7:00 ` Baolin Wang
2026-03-19 10:58 ` Lorenzo Stoakes (Oracle)
2026-03-18 20:39 ` [PATCH 5/8] mm/huge_memory: add a common exit path to zap_huge_pmd() Lorenzo Stoakes (Oracle)
2026-03-18 20:39 ` [PATCH 6/8] mm/huge_memory: remove unnecessary VM_BUG_ON_PAGE() Lorenzo Stoakes (Oracle)
2026-03-19 7:12 ` Baolin Wang
2026-03-18 20:39 ` [PATCH 7/8] mm/huge_memory: deduplicate zap deposited table call Lorenzo Stoakes (Oracle)
2026-03-18 20:39 ` [PATCH 8/8] mm/huge_memory: deduplicate zap_huge_pmd() further by tracking state Lorenzo Stoakes (Oracle)