Linux-mm Archive on lore.kernel.org
* [PATCH v2 0/4] mm: convert to walk_page_range_vma() to eliminate find_vma()
@ 2026-06-18  9:28 Kefeng Wang
  2026-06-18  9:28 ` [PATCH v2 1/4] mm: mincore: use walk_page_range_vma() in do_mincore() Kefeng Wang
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Kefeng Wang @ 2026-06-18  9:28 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Zi Yan, Liam R. Howlett, Lorenzo Stoakes,
	Vlastimil Babka, Suren Baghdasaryan, linux-mm, Kefeng Wang

walk_page_range() performs a find_vma() lookup on each page table walk.
For callers that already hold a valid VMA and operate on a known
single-VMA range, this lookup is redundant. Replace walk_page_range()
with walk_page_range_vma() where the caller guarantees single-VMA
semantics.
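
As a rough illustration of the redundancy described above, here is a
user-space toy model in C. It is not kernel code: toy_vma, toy_mm,
toy_find_vma(), toy_walk_range() and toy_walk_range_vma() are hypothetical
names that only mimic the shape of the real interfaces, and the model
assumes a small sorted array of VMAs and 4 KiB pages.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel structures; not the real definitions. */
struct toy_vma {
	unsigned long start;	/* inclusive */
	unsigned long end;	/* exclusive */
};

struct toy_mm {
	const struct toy_vma *vmas;	/* sorted, non-overlapping */
	size_t nr_vmas;
	unsigned long nr_lookups;	/* number of lookups performed so far */
};

/* Toy lookup: first VMA whose end lies above addr, or NULL. */
const struct toy_vma *toy_find_vma(struct toy_mm *mm, unsigned long addr)
{
	size_t i;

	mm->nr_lookups++;
	for (i = 0; i < mm->nr_vmas; i++)
		if (addr < mm->vmas[i].end)
			return &mm->vmas[i];
	return NULL;
}

/* Counts the pages of [start, end), which must lie inside vma. */
unsigned long toy_walk_one(const struct toy_vma *vma,
			   unsigned long start, unsigned long end)
{
	assert(start >= vma->start && end <= vma->end && start <= end);
	return (end - start) / 4096;	/* 4096: assumed page size */
}

/* Range-based walker: has to discover the VMA(s) itself. */
unsigned long toy_walk_range(struct toy_mm *mm,
			     unsigned long start, unsigned long end)
{
	unsigned long pages = 0;

	while (start < end) {
		const struct toy_vma *vma = toy_find_vma(mm, start);
		unsigned long lo, hi;

		if (!vma || vma->start >= end)
			break;
		lo = start > vma->start ? start : vma->start;
		hi = end < vma->end ? end : vma->end;
		pages += toy_walk_one(vma, lo, hi);
		start = hi;
	}
	return pages;
}

/* VMA-based walker: the caller already holds the VMA, so no lookup. */
unsigned long toy_walk_range_vma(const struct toy_vma *vma,
				 unsigned long start, unsigned long end)
{
	return toy_walk_one(vma, start, end);
}
```

In this model, a caller that already holds the VMA pays for one lookup
with toy_walk_range() and for none with toy_walk_range_vma().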

v2:
- Address comments from Zi and David
  - per-vma optimization is separated out
  - Remove unneeded prot_none_test()
  - Fix some spelling errors and collect Acked-by/Reviewed-by tags

Kefeng Wang (4):
  mm: mincore: use walk_page_range_vma() in do_mincore()
  mm: mprotect: use walk_page_range_vma() in mprotect_fixup()
  mm: mlock: use walk_page_range_vma() in mlock_vma_pages_range()
  mm: migrate_device: use walk_page_range_vma() in migrate_vma_collect()

 mm/migrate_device.c |  2 +-
 mm/mincore.c        | 16 +++++++++++++++-
 mm/mlock.c          |  2 +-
 mm/mprotect.c       |  9 +--------
 4 files changed, 18 insertions(+), 11 deletions(-)

-- 
2.27.0




* [PATCH v2 1/4] mm: mincore: use walk_page_range_vma() in do_mincore()
  2026-06-18  9:28 [PATCH v2 0/4] mm: convert to walk_page_range_vma() to eliminate find_vma() Kefeng Wang
@ 2026-06-18  9:28 ` Kefeng Wang
  2026-06-18 11:34   ` David Hildenbrand (Arm)
  2026-06-18 11:49   ` Pedro Falcato
  2026-06-18  9:28 ` [PATCH v2 2/4] mm: mprotect: use walk_page_range_vma() in mprotect_fixup() Kefeng Wang
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 13+ messages in thread
From: Kefeng Wang @ 2026-06-18  9:28 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Zi Yan, Liam R. Howlett, Lorenzo Stoakes,
	Vlastimil Babka, Suren Baghdasaryan, linux-mm, Kefeng Wang

do_mincore() uses walk_page_range() to walk the page table.
Fortunately, the caller always passes a start/end range that falls
within a single VMA, so it is safe to use walk_page_range_vma() in
do_mincore() and eliminate an unnecessary find_vma() lookup.

Unlike walk_page_range(), walk_page_range_vma() does not call
walk_page_test(), which handles VM_PFNMAP by invoking ->pte_hole()
to skip the page table walk. Without this check, PFNMAP PTEs
would be treated as present by mincore_pte_range(), changing
the returned residency status. Handle VM_PFNMAP explicitly in
do_mincore() to preserve the original behavior.
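
The behavioral difference can be sketched with a user-space toy model in
C. This is not the kernel code: toy_vma, TOY_VM_PFNMAP, toy_walk_present()
and toy_do_mincore() are hypothetical names. The model assumes that every
PTE in the range is present (the case where the two behaviors diverge) and
that the unmapped-range helper reports such pages as not resident.

```c
#include <assert.h>
#include <string.h>

#define TOY_PAGE_SHIFT	12		/* assumed 4 KiB pages */
#define TOY_VM_PFNMAP	0x1UL		/* hypothetical flag value */

/* Hypothetical stand-in for the kernel VMA; only the flags matter here. */
struct toy_vma {
	unsigned long flags;
};

/* Stand-in for the PTE walk: reports every page in the range as resident. */
void toy_walk_present(unsigned long nr_pages, unsigned char *vec)
{
	memset(vec, 1, nr_pages);
}

/*
 * Toy version of the decision added by the patch: a PFNMAP VMA is
 * reported as not resident without walking; anything else is walked.
 * Returns the number of pages described in vec.
 */
long toy_do_mincore(const struct toy_vma *vma, unsigned long addr,
		    unsigned long end, unsigned char *vec)
{
	unsigned long nr_pages;

	assert(addr <= end);
	nr_pages = (end - addr) >> TOY_PAGE_SHIFT;

	if (vma->flags & TOY_VM_PFNMAP) {
		memset(vec, 0, nr_pages);	/* historical "not resident" */
		return (long)nr_pages;
	}

	toy_walk_present(nr_pages, vec);
	return (long)nr_pages;
}
```

Dropping the TOY_VM_PFNMAP branch in this model flips the reported bytes
from 0 to 1, which is the user-visible change the explicit check avoids.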

Acked-by: Zi Yan <ziy@nvidia.com>
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
 mm/mincore.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/mm/mincore.c b/mm/mincore.c
index 296f2e3922b5..0c6731ae6c4d 100644
--- a/mm/mincore.c
+++ b/mm/mincore.c
@@ -259,7 +259,21 @@ static long do_mincore(unsigned long addr, unsigned long pages, unsigned char *v
 		memset(vec, 1, pages);
 		return pages;
 	}
-	err = walk_page_range(vma->vm_mm, addr, end, &mincore_walk_ops, vec);
+
+	/*
+	 * walk_page_range_vma() does not call walk_page_test(), which
+	 * handles VM_PFNMAP VMA by invoking ->pte_hole() to skip the
+	 * page table walk. Without this check, PFNMAP PTEs would be
+	 * treated as present by mincore_pte_range(), changing the returned
+	 * residency status from the historical "not resident" to "resident".
+	 * Handle VM_PFNMAP explicitly to preserve the original behavior.
+	 */
+	if (vma->vm_flags & VM_PFNMAP) {
+		__mincore_unmapped_range(addr, end, vma, vec);
+		return (end - addr) >> PAGE_SHIFT;
+	}
+
+	err = walk_page_range_vma(vma, addr, end, &mincore_walk_ops, vec);
 	if (err < 0)
 		return err;
 	return (end - addr) >> PAGE_SHIFT;
-- 
2.27.0




* [PATCH v2 2/4] mm: mprotect: use walk_page_range_vma() in mprotect_fixup()
  2026-06-18  9:28 [PATCH v2 0/4] mm: convert to walk_page_range_vma() to eliminate find_vma() Kefeng Wang
  2026-06-18  9:28 ` [PATCH v2 1/4] mm: mincore: use walk_page_range_vma() in do_mincore() Kefeng Wang
@ 2026-06-18  9:28 ` Kefeng Wang
  2026-06-18 11:52   ` Pedro Falcato
  2026-06-18  9:28 ` [PATCH v2 3/4] mm: mlock: use walk_page_range_vma() in mlock_vma_pages_range() Kefeng Wang
  2026-06-18  9:28 ` [PATCH v2 4/4] mm: migrate_device: use walk_page_range_vma() in migrate_vma_collect() Kefeng Wang
  3 siblings, 1 reply; 13+ messages in thread
From: Kefeng Wang @ 2026-06-18  9:28 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Zi Yan, Liam R. Howlett, Lorenzo Stoakes,
	Vlastimil Babka, Suren Baghdasaryan, linux-mm, Kefeng Wang

In mprotect_fixup(), the PROT_NONE PFN permission check uses
walk_page_range() to walk the page table. Fortunately, the callers
always pass a start/end range that falls within a single VMA:
do_mprotect_pkey() iterates per-VMA via for_each_vma_range(),
and setup_arg_pages() passes the whole VMA.

Note that walk_page_range_vma() does not call walk_page_test().
However, prot_none_test() in prot_none_walk_ops always returns 0,
so it is safe to replace walk_page_range() with walk_page_range_vma()
to eliminate an unnecessary find_vma() lookup. Also remove the
now-unneeded prot_none_test().
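
The equivalence argued above can be sketched with a user-space toy model
in C. It is not the kernel implementation: all toy_* names are
hypothetical, the callback signature is simplified to take only the VMA,
and the default "skip PFNMAP VMAs" rule reflects my reading of the
range-based walker rather than a verbatim copy of it.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_VM_PFNMAP	0x1UL	/* hypothetical flag value */

struct toy_vma {
	unsigned long flags;
};

struct toy_walk_ops {
	/* Optional; 0 means "walk this VMA", positive means "skip it". */
	int (*test_walk)(const struct toy_vma *vma);
};

/* Toy model of the per-VMA test done by the range-based walker. */
int toy_walk_page_test(const struct toy_vma *vma,
		       const struct toy_walk_ops *ops)
{
	if (ops->test_walk)
		return ops->test_walk(vma);	/* callback overrides the default */
	if (vma->flags & TOY_VM_PFNMAP)
		return 1;			/* default: skip PFNMAP VMAs */
	return 0;
}

/* Mirrors the removed prot_none_test(): never skip. */
int toy_always_walk(const struct toy_vma *vma)
{
	(void)vma;
	return 0;
}

/* The VMA-based walker performs no test at all, i.e. it always walks. */
int toy_walk_vma_test(const struct toy_vma *vma,
		      const struct toy_walk_ops *ops)
{
	(void)vma;
	(void)ops;
	return 0;
}
```

With a callback that always returns 0, both toy tests agree for every
VMA, so the callback becomes dead weight once the VMA-based walker is used.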

Reviewed-by: Zi Yan <ziy@nvidia.com>
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
 mm/mprotect.c | 9 +--------
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/mm/mprotect.c b/mm/mprotect.c
index 9cbf932b028c..b1595450e241 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -708,16 +708,9 @@ static int prot_none_hugetlb_entry(pte_t *pte, unsigned long hmask,
 		0 : -EACCES;
 }
 
-static int prot_none_test(unsigned long addr, unsigned long next,
-			  struct mm_walk *walk)
-{
-	return 0;
-}
-
 static const struct mm_walk_ops prot_none_walk_ops = {
 	.pte_entry		= prot_none_pte_entry,
 	.hugetlb_entry		= prot_none_hugetlb_entry,
-	.test_walk		= prot_none_test,
 	.walk_lock		= PGWALK_WRLOCK,
 };
 
@@ -753,7 +746,7 @@ mprotect_fixup(struct vma_iterator *vmi, struct mmu_gather *tlb,
 	    !vma_flags_test_any_mask(&new_vma_flags, VMA_ACCESS_FLAGS)) {
 		pgprot_t new_pgprot = vm_get_page_prot(newflags);
 
-		error = walk_page_range(current->mm, start, end,
+		error = walk_page_range_vma(vma, start, end,
 				&prot_none_walk_ops, &new_pgprot);
 		if (error)
 			return error;
-- 
2.27.0




* [PATCH v2 3/4] mm: mlock: use walk_page_range_vma() in mlock_vma_pages_range()
  2026-06-18  9:28 [PATCH v2 0/4] mm: convert to walk_page_range_vma() to eliminate find_vma() Kefeng Wang
  2026-06-18  9:28 ` [PATCH v2 1/4] mm: mincore: use walk_page_range_vma() in do_mincore() Kefeng Wang
  2026-06-18  9:28 ` [PATCH v2 2/4] mm: mprotect: use walk_page_range_vma() in mprotect_fixup() Kefeng Wang
@ 2026-06-18  9:28 ` Kefeng Wang
  2026-06-18 11:53   ` Pedro Falcato
  2026-06-18  9:28 ` [PATCH v2 4/4] mm: migrate_device: use walk_page_range_vma() in migrate_vma_collect() Kefeng Wang
  3 siblings, 1 reply; 13+ messages in thread
From: Kefeng Wang @ 2026-06-18  9:28 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Zi Yan, Liam R. Howlett, Lorenzo Stoakes,
	Vlastimil Babka, Suren Baghdasaryan, linux-mm, Kefeng Wang

mlock_vma_pages_range() uses walk_page_range() to walk the
page table. Fortunately, the callers always pass a start/end range that
falls within a single VMA: apply_vma_lock_flags() iterates per-VMA,
and apply_mlockall_flags() passes the whole VMA.

Since there is no .test_walk in mlock_walk_ops and VM_PFNMAP
is filtered out by vma_supports_mlock(), it is safe to replace
walk_page_range() with walk_page_range_vma() to eliminate an
unnecessary find_vma() lookup.

Reviewed-by: Zi Yan <ziy@nvidia.com>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
 mm/mlock.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/mlock.c b/mm/mlock.c
index 8c227fefa2df..97e49038d8d3 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -446,7 +446,7 @@ static void mlock_vma_pages_range(struct vm_area_struct *vma,
 	vma_flags_reset_once(vma, new_vma_flags);
 
 	lru_add_drain();
-	walk_page_range(vma->vm_mm, start, end, &mlock_walk_ops, NULL);
+	walk_page_range_vma(vma, start, end, &mlock_walk_ops, NULL);
 	lru_add_drain();
 
 	if (vma_flags_test(new_vma_flags, VMA_IO_BIT)) {
-- 
2.27.0




* [PATCH v2 4/4] mm: migrate_device: use walk_page_range_vma() in migrate_vma_collect()
  2026-06-18  9:28 [PATCH v2 0/4] mm: convert to walk_page_range_vma() to eliminate find_vma() Kefeng Wang
                   ` (2 preceding siblings ...)
  2026-06-18  9:28 ` [PATCH v2 3/4] mm: mlock: use walk_page_range_vma() in mlock_vma_pages_range() Kefeng Wang
@ 2026-06-18  9:28 ` Kefeng Wang
  2026-06-18 11:53   ` Pedro Falcato
  3 siblings, 1 reply; 13+ messages in thread
From: Kefeng Wang @ 2026-06-18  9:28 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Zi Yan, Liam R. Howlett, Lorenzo Stoakes,
	Vlastimil Babka, Suren Baghdasaryan, linux-mm, Kefeng Wang

migrate_vma_collect() uses walk_page_range() to walk the page
table. Fortunately, migrate_vma_setup() already validates that the
entire range falls within a single VMA.

Since there is no .test_walk in migrate_vma_walk_ops and VM_PFNMAP
is filtered out by migrate_vma_setup(), it is safe to replace
walk_page_range() with walk_page_range_vma() to eliminate an
unnecessary find_vma() lookup.

Reviewed-by: Zi Yan <ziy@nvidia.com>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
 mm/migrate_device.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/migrate_device.c b/mm/migrate_device.c
index 554754eb26ff..ae39173d6a0e 100644
--- a/mm/migrate_device.c
+++ b/mm/migrate_device.c
@@ -513,7 +513,7 @@ static void migrate_vma_collect(struct migrate_vma *migrate)
 		migrate->pgmap_owner);
 	mmu_notifier_invalidate_range_start(&range);
 
-	walk_page_range(migrate->vma->vm_mm, migrate->start, migrate->end,
+	walk_page_range_vma(migrate->vma, migrate->start, migrate->end,
 			&migrate_vma_walk_ops, migrate);
 
 	mmu_notifier_invalidate_range_end(&range);
-- 
2.27.0




* Re: [PATCH v2 1/4] mm: mincore: use walk_page_range_vma() in do_mincore()
  2026-06-18  9:28 ` [PATCH v2 1/4] mm: mincore: use walk_page_range_vma() in do_mincore() Kefeng Wang
@ 2026-06-18 11:34   ` David Hildenbrand (Arm)
  2026-06-18 11:49   ` Pedro Falcato
  1 sibling, 0 replies; 13+ messages in thread
From: David Hildenbrand (Arm) @ 2026-06-18 11:34 UTC (permalink / raw)
  To: Kefeng Wang, Andrew Morton
  Cc: Zi Yan, Liam R. Howlett, Lorenzo Stoakes, Vlastimil Babka,
	Suren Baghdasaryan, linux-mm

On 6/18/26 11:28, Kefeng Wang wrote:
> The do_mincore() uses walk_page_range() to walk the page table.
> Fortunately, the caller always passes start/end that falls within
> a single VMA, so it's safe to use the walk_page_range_vma() in
> do_mincore() to eliminate an unnecessary find_vma() lookup.
> 
> Unlike walk_page_range(), walk_page_range_vma() does not call
> walk_page_test(), which handles VM_PFNMAP by invoking ->pte_hole()
> to skip the page table walk. Without this check, PFNMAP PTEs
> would be treated as present by mincore_pte_range(), changing
> the returned residency status. Handle VM_PFNMAP explicitly in
> do_mincore() to preserve the original behavior.
> 
> Acked-by: Zi Yan <ziy@nvidia.com>
> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
> ---
>  mm/mincore.c | 16 +++++++++++++++-
>  1 file changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/mincore.c b/mm/mincore.c
> index 296f2e3922b5..0c6731ae6c4d 100644
> --- a/mm/mincore.c
> +++ b/mm/mincore.c
> @@ -259,7 +259,21 @@ static long do_mincore(unsigned long addr, unsigned long pages, unsigned char *v
>  		memset(vec, 1, pages);
>  		return pages;
>  	}
> -	err = walk_page_range(vma->vm_mm, addr, end, &mincore_walk_ops, vec);
> +
> +	/*
> +	 * walk_page_range_vma() does not call walk_page_test(), which
> +	 * handles VM_PFNMAP VMA by invoking ->pte_hole() to skip the
> +	 * page table walk. Without this check, PFNMAP PTEs would be
> +	 * treated as present by mincore_pte_range(), changing the returned
> +	 * residency status from the historical "not resident" to "resident".
> +	 * Handle VM_PFNMAP explicitly to preserve the original behavior.
> +	 */
> +	if (vma->vm_flags & VM_PFNMAP) {
> +		__mincore_unmapped_range(addr, end, vma, vec);
> +		return (end - addr) >> PAGE_SHIFT;
> +	}

This is fine to leave behavior unchanged for now.

Acked-by: David Hildenbrand (Arm) <david@kernel.org>

We could consider removing the special handling in a separate
patch, though. Would just do the right thing IMHO, and it's hard to believe that
someone depends on pages in VM_PFNMAP to *not* be present.

(could even contain anonymous memory!)

So I would suggest having a follow-up patch where we remove that special
handling and see if anyone screams (nobody will).

-- 
Cheers,

David



* Re: [PATCH v2 1/4] mm: mincore: use walk_page_range_vma() in do_mincore()
  2026-06-18  9:28 ` [PATCH v2 1/4] mm: mincore: use walk_page_range_vma() in do_mincore() Kefeng Wang
  2026-06-18 11:34   ` David Hildenbrand (Arm)
@ 2026-06-18 11:49   ` Pedro Falcato
  2026-06-18 12:58     ` David Hildenbrand (Arm)
  2026-06-18 13:01     ` Kefeng Wang
  1 sibling, 2 replies; 13+ messages in thread
From: Pedro Falcato @ 2026-06-18 11:49 UTC (permalink / raw)
  To: Kefeng Wang
  Cc: Andrew Morton, David Hildenbrand, Zi Yan, Liam R. Howlett,
	Lorenzo Stoakes, Vlastimil Babka, Suren Baghdasaryan, linux-mm

Please CC reviewers properly!

On Thu, Jun 18, 2026 at 05:28:42PM +0800, Kefeng Wang wrote:
> The do_mincore() uses walk_page_range() to walk the page table.
> Fortunately, the caller always passes start/end that falls within
> a single VMA, so it's safe to use the walk_page_range_vma() in
> do_mincore() to eliminate an unnecessary find_vma() lookup.
> 
> Unlike walk_page_range(), walk_page_range_vma() does not call
> walk_page_test(), which handles VM_PFNMAP by invoking ->pte_hole()

Why not? Can we fix that instead? I really don't like having this open
coded in callers. Are there callers of walk_page_range_vma() that expect
to look at PFNMAP mappings as well? From what I can see, the callers all
seem to operate on folios (and/or anonymous memory).

> to skip the page table walk. Without this check, PFNMAP PTEs
> would be treated as present by mincore_pte_range(), changing
> the returned residency status. Handle VM_PFNMAP explicitly in
> do_mincore() to preserve the original behavior.
> 
> Acked-by: Zi Yan <ziy@nvidia.com>
> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
> ---
>  mm/mincore.c | 16 +++++++++++++++-
>  1 file changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/mincore.c b/mm/mincore.c
> index 296f2e3922b5..0c6731ae6c4d 100644
> --- a/mm/mincore.c
> +++ b/mm/mincore.c
> @@ -259,7 +259,21 @@ static long do_mincore(unsigned long addr, unsigned long pages, unsigned char *v
>  		memset(vec, 1, pages);
>  		return pages;
>  	}
> -	err = walk_page_range(vma->vm_mm, addr, end, &mincore_walk_ops, vec);
> +
> +	/*
> +	 * walk_page_range_vma() does not call walk_page_test(), which
> +	 * handles VM_PFNMAP VMA by invoking ->pte_hole() to skip the
> +	 * page table walk. Without this check, PFNMAP PTEs would be
> +	 * treated as present by mincore_pte_range(), changing the returned
> +	 * residency status from the historical "not resident" to "resident".
> +	 * Handle VM_PFNMAP explicitly to preserve the original behavior.
> +	 */

This whole comment looks poised to rot very very quickly.

> +	if (vma->vm_flags & VM_PFNMAP) {
> +		__mincore_unmapped_range(addr, end, vma, vec);
> +		return (end - addr) >> PAGE_SHIFT;
> +	}
> +
> +	err = walk_page_range_vma(vma, addr, end, &mincore_walk_ops, vec);
>  	if (err < 0)
>  		return err;
>  	return (end - addr) >> PAGE_SHIFT;
> -- 
> 2.27.0
> 
> 
> 

-- 
Pedro



* Re: [PATCH v2 2/4] mm: mprotect: use walk_page_range_vma() in mprotect_fixup()
  2026-06-18  9:28 ` [PATCH v2 2/4] mm: mprotect: use walk_page_range_vma() in mprotect_fixup() Kefeng Wang
@ 2026-06-18 11:52   ` Pedro Falcato
  0 siblings, 0 replies; 13+ messages in thread
From: Pedro Falcato @ 2026-06-18 11:52 UTC (permalink / raw)
  To: Kefeng Wang
  Cc: Andrew Morton, David Hildenbrand, Zi Yan, Liam R. Howlett,
	Lorenzo Stoakes, Vlastimil Babka, Suren Baghdasaryan, linux-mm

On Thu, Jun 18, 2026 at 05:28:43PM +0800, Kefeng Wang wrote:
> In mprotect_fixup(), the PROT_NONE PFN permission check uses
> walk_page_range() to walk the page table. Fortunately, the caller
> always passes start/end that falls within a single VMA, the
> do_mprotect_pkey() iterates per-VMA via for_each_vma_range(),
> and setup_arg_pages() passes the whole VMA.
> 
> Note, walk_page_test() isn't called in walk_page_range_vma(),
> however, prot_none_test() in prot_none_walk_ops always return 0,
> so it's safe to replace walk_page_range() with walk_page_range_vma()
> to eliminate an unnecessary find_vma() lookup, also remove
> unneeded prot_none_test() too.

Again, I strongly prefer walk_page_range_vma() to be consistent with
walk_page_range() and others. But the change itself (apart from that)
is fairly uncontroversial, LGTM.

Reviewed-by: Pedro Falcato <pfalcato@suse.de>

> Reviewed-by: Zi Yan <ziy@nvidia.com>
> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
> ---
>  mm/mprotect.c | 9 +--------
>  1 file changed, 1 insertion(+), 8 deletions(-)
> 
> diff --git a/mm/mprotect.c b/mm/mprotect.c
> index 9cbf932b028c..b1595450e241 100644
> --- a/mm/mprotect.c
> +++ b/mm/mprotect.c
> @@ -708,16 +708,9 @@ static int prot_none_hugetlb_entry(pte_t *pte, unsigned long hmask,
>  		0 : -EACCES;
>  }
>  
> -static int prot_none_test(unsigned long addr, unsigned long next,
> -			  struct mm_walk *walk)
> -{
> -	return 0;
> -}
> -
>  static const struct mm_walk_ops prot_none_walk_ops = {
>  	.pte_entry		= prot_none_pte_entry,
>  	.hugetlb_entry		= prot_none_hugetlb_entry,
> -	.test_walk		= prot_none_test,
>  	.walk_lock		= PGWALK_WRLOCK,
>  };
>  
> @@ -753,7 +746,7 @@ mprotect_fixup(struct vma_iterator *vmi, struct mmu_gather *tlb,
>  	    !vma_flags_test_any_mask(&new_vma_flags, VMA_ACCESS_FLAGS)) {
>  		pgprot_t new_pgprot = vm_get_page_prot(newflags);
>  
> -		error = walk_page_range(current->mm, start, end,
> +		error = walk_page_range_vma(vma, start, end,
>  				&prot_none_walk_ops, &new_pgprot);
>  		if (error)
>  			return error;
> -- 
> 2.27.0
> 
> 
> 

-- 
Pedro



* Re: [PATCH v2 3/4] mm: mlock: use walk_page_range_vma() in mlock_vma_pages_range()
  2026-06-18  9:28 ` [PATCH v2 3/4] mm: mlock: use walk_page_range_vma() in mlock_vma_pages_range() Kefeng Wang
@ 2026-06-18 11:53   ` Pedro Falcato
  0 siblings, 0 replies; 13+ messages in thread
From: Pedro Falcato @ 2026-06-18 11:53 UTC (permalink / raw)
  To: Kefeng Wang
  Cc: Andrew Morton, David Hildenbrand, Zi Yan, Liam R. Howlett,
	Lorenzo Stoakes, Vlastimil Babka, Suren Baghdasaryan, linux-mm

On Thu, Jun 18, 2026 at 05:28:44PM +0800, Kefeng Wang wrote:
> The mlock_vma_pages_range() uses walk_page_range() to walk the
> page table. Fortunately, the caller always passes start/end that
> falls within a single VMA, apply_vma_lock_flags() iterates per-VMA,
> and apply_mlockall_flags() passes the whole VMA.
> 
> Since there is no .test_walk in mlock_walk_ops and VM_PFNMAP
> was filtered by vma_supports_mlock(), it's safe to replace
> walk_page_range() with walk_page_range_vma() to eliminate an
> unnecessary find_vma() lookup.
> 
> Reviewed-by: Zi Yan <ziy@nvidia.com>
> Acked-by: David Hildenbrand (Arm) <david@kernel.org>
> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>

Reviewed-by: Pedro Falcato <pfalcato@suse.de>

-- 
Pedro



* Re: [PATCH v2 4/4] mm: migrate_device: use walk_page_range_vma() in migrate_vma_collect()
  2026-06-18  9:28 ` [PATCH v2 4/4] mm: migrate_device: use walk_page_range_vma() in migrate_vma_collect() Kefeng Wang
@ 2026-06-18 11:53   ` Pedro Falcato
  0 siblings, 0 replies; 13+ messages in thread
From: Pedro Falcato @ 2026-06-18 11:53 UTC (permalink / raw)
  To: Kefeng Wang
  Cc: Andrew Morton, David Hildenbrand, Zi Yan, Liam R. Howlett,
	Lorenzo Stoakes, Vlastimil Babka, Suren Baghdasaryan, linux-mm

On Thu, Jun 18, 2026 at 05:28:45PM +0800, Kefeng Wang wrote:
> The migrate_vma_collect() uses walk_page_range() to walk the page
> table. Fortunately, migrate_vma_setup() already validates that the
> entire range falls within a single VMA.
> 
> Since there is no .test_walk in migrate_vma_walk_ops and VM_PFNMAP
> was filtered by migrate_vma_setup(), it's safe to replace
> walk_page_range() with walk_page_range_vma() to eliminate an
> unnecessary find_vma() lookup.
> 
> Reviewed-by: Zi Yan <ziy@nvidia.com>
> Acked-by: David Hildenbrand (Arm) <david@kernel.org>
> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>

Acked-by: Pedro Falcato <pfalcato@suse.de>

-- 
Pedro



* Re: [PATCH v2 1/4] mm: mincore: use walk_page_range_vma() in do_mincore()
  2026-06-18 11:49   ` Pedro Falcato
@ 2026-06-18 12:58     ` David Hildenbrand (Arm)
  2026-06-18 13:01     ` Kefeng Wang
  1 sibling, 0 replies; 13+ messages in thread
From: David Hildenbrand (Arm) @ 2026-06-18 12:58 UTC (permalink / raw)
  To: Pedro Falcato, Kefeng Wang
  Cc: Andrew Morton, Zi Yan, Liam R. Howlett, Lorenzo Stoakes,
	Vlastimil Babka, Suren Baghdasaryan, linux-mm

On 6/18/26 13:49, Pedro Falcato wrote:
> Please CC reviewers properly!
> 
> On Thu, Jun 18, 2026 at 05:28:42PM +0800, Kefeng Wang wrote:
>> The do_mincore() uses walk_page_range() to walk the page table.
>> Fortunately, the caller always passes start/end that falls within
>> a single VMA, so it's safe to use the walk_page_range_vma() in
>> do_mincore() to eliminate an unnecessary find_vma() lookup.
>>
>> Unlike walk_page_range(), walk_page_range_vma() does not call
>> walk_page_test(), which handles VM_PFNMAP by invoking ->pte_hole()
> 
> Why not? Can we fix that instead? I really don't like having this open
> coded in callers. Are there callers of walk_page_range_vma() that expect
> to look at PFNMAP mappings as well? From what I can see, the callers all
> seem to operate on folios (and/or anonymous memory).

I'd rather not.

commit c31783eeae7b22dc3f6edde7339de6112959225d
Author: David Hildenbrand <david@kernel.org>
Date:   Fri Oct 21 12:11:38 2022 +0200

    mm/pagewalk: don't trigger test_walk() in walk_page_vma()

    As Peter points out, the caller passes a single VMA and can just do that
    check itself.

    And in fact, no existing users rely on test_walk() getting called.  So
    let's just remove it and make the implementation slightly more efficient.


-- 
Cheers,

David



* Re: [PATCH v2 1/4] mm: mincore: use walk_page_range_vma() in do_mincore()
  2026-06-18 11:49   ` Pedro Falcato
  2026-06-18 12:58     ` David Hildenbrand (Arm)
@ 2026-06-18 13:01     ` Kefeng Wang
  2026-06-18 15:02       ` Pedro Falcato
  1 sibling, 1 reply; 13+ messages in thread
From: Kefeng Wang @ 2026-06-18 13:01 UTC (permalink / raw)
  To: Pedro Falcato
  Cc: Andrew Morton, David Hildenbrand, Zi Yan, Liam R. Howlett,
	Lorenzo Stoakes, Vlastimil Babka, Suren Baghdasaryan, linux-mm



On 6/18/2026 7:49 PM, Pedro Falcato wrote:
> Please CC reviewers properly!
> 

Oh, I will add more reviewers to the Cc list.

> On Thu, Jun 18, 2026 at 05:28:42PM +0800, Kefeng Wang wrote:
>> The do_mincore() uses walk_page_range() to walk the page table.
>> Fortunately, the caller always passes start/end that falls within
>> a single VMA, so it's safe to use the walk_page_range_vma() in
>> do_mincore() to eliminate an unnecessary find_vma() lookup.
>>
>> Unlike walk_page_range(), walk_page_range_vma() does not call
>> walk_page_test(), which handles VM_PFNMAP by invoking ->pte_hole()
> 
> Why not? Can we fix that instead? I really don't like having this open
> coded in callers. Are there callers of walk_page_range_vma() that expect
> to look at PFNMAP mappings as well? From what I can see, the callers all
> seem to operate on folios (and/or anonymous memory).

As you said, none of the other callers operate on VM_PFNMAP mappings, so
we don't want to add walk_page_test() to walk_page_range_vma(). This hack
preserves the original behavior, but as David said[1], we could add a
follow-up patch that removes the special handling to see if anyone screams.
That would indeed change behavior, so it's better to handle it in a
separate patch.

[1] https://lore.kernel.org/linux-mm/0e619d71-1c3d-4534-8376-2982c7348c31@kernel.org/


> 
>> to skip the page table walk. Without this check, PFNMAP PTEs
>> would be treated as present by mincore_pte_range(), changing
>> the returned residency status. Handle VM_PFNMAP explicitly in
>> do_mincore() to preserve the original behavior.
>>
>> Acked-by: Zi Yan <ziy@nvidia.com>
>> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
>> ---
>>   mm/mincore.c | 16 +++++++++++++++-
>>   1 file changed, 15 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/mincore.c b/mm/mincore.c
>> index 296f2e3922b5..0c6731ae6c4d 100644
>> --- a/mm/mincore.c
>> +++ b/mm/mincore.c
>> @@ -259,7 +259,21 @@ static long do_mincore(unsigned long addr, unsigned long pages, unsigned char *v
>>   		memset(vec, 1, pages);
>>   		return pages;
>>   	}
>> -	err = walk_page_range(vma->vm_mm, addr, end, &mincore_walk_ops, vec);
>> +
>> +	/*
>> +	 * walk_page_range_vma() does not call walk_page_test(), which
>> +	 * handles VM_PFNMAP VMA by invoking ->pte_hole() to skip the
>> +	 * page table walk. Without this check, PFNMAP PTEs would be
>> +	 * treated as present by mincore_pte_range(), changing the returned
>> +	 * residency status from the historical "not resident" to "resident".
>> +	 * Handle VM_PFNMAP explicitly to preserve the original behavior.
>> +	 */
> 
> This whole comment looks poised to rot very very quickly.
> 
>> +	if (vma->vm_flags & VM_PFNMAP) {
>> +		__mincore_unmapped_range(addr, end, vma, vec);
>> +		return (end - addr) >> PAGE_SHIFT;
>> +	}
>> +
>> +	err = walk_page_range_vma(vma, addr, end, &mincore_walk_ops, vec);
>>   	if (err < 0)
>>   		return err;
>>   	return (end - addr) >> PAGE_SHIFT;
>> -- 
>> 2.27.0
>>
>>
>>
> 




* Re: [PATCH v2 1/4] mm: mincore: use walk_page_range_vma() in do_mincore()
  2026-06-18 13:01     ` Kefeng Wang
@ 2026-06-18 15:02       ` Pedro Falcato
  0 siblings, 0 replies; 13+ messages in thread
From: Pedro Falcato @ 2026-06-18 15:02 UTC (permalink / raw)
  To: Kefeng Wang, David Hildenbrand (Arm)
  Cc: Andrew Morton, Zi Yan, Liam R. Howlett, Lorenzo Stoakes,
	Vlastimil Babka, Suren Baghdasaryan, linux-mm

On Thu, Jun 18, 2026 at 09:01:08PM +0800, Kefeng Wang wrote:
> 
> 
> On 6/18/2026 7:49 PM, Pedro Falcato wrote:
> > Please CC reviewers properly!
> > 
> 
> Oh, I will add more reviewers to the Cc list.
> 
> > On Thu, Jun 18, 2026 at 05:28:42PM +0800, Kefeng Wang wrote:
> > > The do_mincore() uses walk_page_range() to walk the page table.
> > > Fortunately, the caller always passes start/end that falls within
> > > a single VMA, so it's safe to use the walk_page_range_vma() in
> > > do_mincore() to eliminate an unnecessary find_vma() lookup.
> > > 
> > > Unlike walk_page_range(), walk_page_range_vma() does not call
> > > walk_page_test(), which handles VM_PFNMAP by invoking ->pte_hole()
> > 
> > Why not? Can we fix that instead? I really don't like having this open
> > coded in callers. Are there callers of walk_page_range_vma() that expect
> > to look at PFNMAP mappings as well? From what I can see, the callers all
> > seem to operate on folios (and/or anonymous memory).
> 
> As you said, all the other callers don't operate VM_PFNMAP, so we don't
> want to add walk_page_test() into walk_page_range_vma(). This hack is to
> preserve the original behavior, but as David said[1], we could add a
> follow-up patch to remove the special handling to see if anyone screams,
> and this indeed changed some behaviors, so it's better to handle it with
> another patch.

Yes, I agree, I'm 95% sure no one is invoking mincore() on PFNMAP mappings.

So, if we're keeping this check for this patch:

> > > 
> > > diff --git a/mm/mincore.c b/mm/mincore.c
> > > index 296f2e3922b5..0c6731ae6c4d 100644
> > > --- a/mm/mincore.c
> > > +++ b/mm/mincore.c
> > > @@ -259,7 +259,21 @@ static long do_mincore(unsigned long addr, unsigned long pages, unsigned char *v
> > >   		memset(vec, 1, pages);
> > >   		return pages;
> > >   	}
> > > -	err = walk_page_range(vma->vm_mm, addr, end, &mincore_walk_ops, vec);
> > > +
> > > +	/*
> > > +	 * walk_page_range_vma() does not call walk_page_test(), which
> > > +	 * handles VM_PFNMAP VMA by invoking ->pte_hole() to skip the
> > > +	 * page table walk. Without this check, PFNMAP PTEs would be
> > > +	 * treated as present by mincore_pte_range(), changing the returned
> > > +	 * residency status from the historical "not resident" to "resident".
> > > +	 * Handle VM_PFNMAP explicitly to preserve the original behavior.
> > > +	 */

I would rather we amend this comment to something like:

	/* mincore (historically) reports PFNMAP mappings as non-resident. */

because we don't need to explain internal differences in walk_page_range
functions in a random comment in mincore. And perhaps attempt a separate
PFNMAP check removal patch as part of the series, or as a follow up (so if
it does matter, we can simply revert that patch instead of this conversion).

In any case,

Reviewed-by: Pedro Falcato <pfalcato@suse.de>

-- 
Pedro


