* [PATCH v3 0/8] mm/gup: Unify hugetlb, speed up thp
@ 2023-06-23 14:29 Peter Xu
  2023-06-23 14:29 ` [PATCH v3 1/8] mm/hugetlb: Handle FOLL_DUMP well in follow_page_mask() Peter Xu
                   ` (7 more replies)
  0 siblings, 8 replies; 17+ messages in thread
From: Peter Xu @ 2023-06-23 14:29 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Lorenzo Stoakes, John Hubbard, Andrew Morton, Mike Rapoport,
	David Hildenbrand, peterx, Yang Shi, Andrea Arcangeli,
	Vlastimil Babka, Kirill A . Shutemov, James Houghton,
	Matthew Wilcox, Mike Kravetz, Hugh Dickins, Jason Gunthorpe

v1: https://lore.kernel.org/r/20230613215346.1022773-1-peterx@redhat.com
v2: https://lore.kernel.org/r/20230619231044.112894-1-peterx@redhat.com

v3:
- Added R-bs and A-bs
- Patch 2:
  - s/huge_pte_write/pte_write/ [David]
  - Add back the WARN_ON_ONCE I used to have in v1; change retval to
    the real errno from try_grab_page(), rather than NULL
- Patch 3: make the page_mask calculation slightly prettier [David]
- Other small cosmetic changes

Hugetlb has a special path for slow gup in which follow_page_mask() is
skipped entirely, along with faultin_page().  That is not only confusing,
but also duplicates a lot of logic that generic gup already has, making
hugetlb slightly special.

This patchset tries to dedup the logic: first touch up the slow gup code
so that it can handle hugetlb pages correctly with the current follow page
and faultin routines (we're mostly there already; 10 years ago we did try
to optimize thp, but only got half way; more below), then drop the special
path (patch 6), after which hugetlb gup always goes through the generic
routine too, via faultin_page().

Note that hugetlb is still special for gup, mostly due to the pgtable
walking (hugetlb_walk()) that we rely on, which is currently per-arch.
But this is still one small step forward, and the diffstat may be proof
enough that it is worthwhile.

Then for the "speed up thp" side: as a side effect, while looking at this
chunk of code, I found that thp support is actually only partially done.
It doesn't mean that thp won't work for gup, but as long as a **pages
pointer is passed in, the optimization is skipped too.  Patch 5 addresses
that, so for thp we now get full speed gup.

For a quick number, "chrt -f 1 ./gup_test -m 512 -t -L -n 1024 -r 10" gives
me 13992.50us -> 378.50us.  gup_test is an extreme case, but it shows how
much this affects thp gups.

Patches 1-5: prepare for the switch
Patch 6:     switch over to the new code and remove the old
Patches 7-8: add a gup test matrix to run_vmtests.sh

Please review, thanks.

Peter Xu (8):
  mm/hugetlb: Handle FOLL_DUMP well in follow_page_mask()
  mm/hugetlb: Prepare hugetlb_follow_page_mask() for FOLL_PIN
  mm/hugetlb: Add page_mask for hugetlb_follow_page_mask()
  mm/gup: Cleanup next_page handling
  mm/gup: Accelerate thp gup even for "pages != NULL"
  mm/gup: Retire follow_hugetlb_page()
  selftests/mm: Add -a to run_vmtests.sh
  selftests/mm: Add gup test matrix in run_vmtests.sh

 fs/userfaultfd.c                          |   2 +-
 include/linux/hugetlb.h                   |  20 +-
 mm/gup.c                                  |  83 ++++---
 mm/hugetlb.c                              | 263 +++-------------------
 tools/testing/selftests/mm/run_vmtests.sh |  48 +++-
 5 files changed, 124 insertions(+), 292 deletions(-)

-- 
2.40.1




* [PATCH v3 1/8] mm/hugetlb: Handle FOLL_DUMP well in follow_page_mask()
  2023-06-23 14:29 [PATCH v3 0/8] mm/gup: Unify hugetlb, speed up thp Peter Xu
@ 2023-06-23 14:29 ` Peter Xu
  2023-06-23 14:29 ` [PATCH v3 2/8] mm/hugetlb: Prepare hugetlb_follow_page_mask() for FOLL_PIN Peter Xu
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Peter Xu @ 2023-06-23 14:29 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Lorenzo Stoakes, John Hubbard, Andrew Morton, Mike Rapoport,
	David Hildenbrand, peterx, Yang Shi, Andrea Arcangeli,
	Vlastimil Babka, Kirill A . Shutemov, James Houghton,
	Matthew Wilcox, Mike Kravetz, Hugh Dickins, Jason Gunthorpe

Firstly, no_page_table() is meaningless for hugetlb, where it is a no-op,
because a hugetlb page always satisfies:

  - vma_is_anonymous() == false
  - vma->vm_ops->fault != NULL

So we can already safely remove it in hugetlb_follow_page_mask(), alongside
the page* variable.
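
For reference, no_page_table() boils down to roughly the following (a
simplified sketch of the mm/gup.c helper; details may differ):

	static struct page *no_page_table(struct vm_area_struct *vma,
					  unsigned int flags)
	{
		/*
		 * Only a FOLL_DUMP lookup on a vma that could never fault
		 * in zero-filled pages gets an error; everyone else gets
		 * a plain NULL.
		 */
		if ((flags & FOLL_DUMP) &&
		    (vma_is_anonymous(vma) || !vma->vm_ops->fault))
			return ERR_PTR(-EFAULT);
		return NULL;
	}

Since a hugetlb vma has vma_is_anonymous() == false and vm_ops->fault set,
it can never take the ERR_PTR(-EFAULT) branch, hence the no-op.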

Meanwhile, what we do in follow_hugetlb_page() actually makes sense for a
dump: we try to fault in the page only if the page cache is already
allocated.  Let's do the same here for follow_page_mask() on hugetlb.

It should so far have zero effect on real dumps, because those still go
into follow_hugetlb_page().  But it may start to influence follow_page()
users who mimic a "dump page" scenario a bit, hopefully in a good way.
This also paves the way for unifying the hugetlb gup-slow path.

Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 mm/gup.c     | 9 ++-------
 mm/hugetlb.c | 9 +++++++++
 2 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index ce14d4d28503..abcd841d94b7 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -767,7 +767,6 @@ static struct page *follow_page_mask(struct vm_area_struct *vma,
 			      struct follow_page_context *ctx)
 {
 	pgd_t *pgd;
-	struct page *page;
 	struct mm_struct *mm = vma->vm_mm;
 
 	ctx->page_mask = 0;
@@ -780,12 +779,8 @@ static struct page *follow_page_mask(struct vm_area_struct *vma,
 	 * hugetlb_follow_page_mask is only for follow_page() handling here.
 	 * Ordinary GUP uses follow_hugetlb_page for hugetlb processing.
 	 */
-	if (is_vm_hugetlb_page(vma)) {
-		page = hugetlb_follow_page_mask(vma, address, flags);
-		if (!page)
-			page = no_page_table(vma, flags);
-		return page;
-	}
+	if (is_vm_hugetlb_page(vma))
+		return hugetlb_follow_page_mask(vma, address, flags);
 
 	pgd = pgd_offset(mm, address);
 
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index d76574425da3..f75f5e78ff0b 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6498,6 +6498,15 @@ struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
 	spin_unlock(ptl);
 out_unlock:
 	hugetlb_vma_unlock_read(vma);
+
+	/*
+	 * Fixup retval for dump requests: if pagecache doesn't exist,
+	 * don't try to allocate a new page but just skip it.
+	 */
+	if (!page && (flags & FOLL_DUMP) &&
+	    !hugetlbfs_pagecache_present(h, vma, address))
+		page = ERR_PTR(-EFAULT);
+
 	return page;
 }
 
-- 
2.40.1




* [PATCH v3 2/8] mm/hugetlb: Prepare hugetlb_follow_page_mask() for FOLL_PIN
  2023-06-23 14:29 [PATCH v3 0/8] mm/gup: Unify hugetlb, speed up thp Peter Xu
  2023-06-23 14:29 ` [PATCH v3 1/8] mm/hugetlb: Handle FOLL_DUMP well in follow_page_mask() Peter Xu
@ 2023-06-23 14:29 ` Peter Xu
  2023-06-26  8:06   ` David Hildenbrand
  2023-06-23 14:29 ` [PATCH v3 3/8] mm/hugetlb: Add page_mask for hugetlb_follow_page_mask() Peter Xu
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 17+ messages in thread
From: Peter Xu @ 2023-06-23 14:29 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Lorenzo Stoakes, John Hubbard, Andrew Morton, Mike Rapoport,
	David Hildenbrand, peterx, Yang Shi, Andrea Arcangeli,
	Vlastimil Babka, Kirill A . Shutemov, James Houghton,
	Matthew Wilcox, Mike Kravetz, Hugh Dickins, Jason Gunthorpe

follow_page() doesn't use FOLL_PIN, and hugetlb doesn't seem to be a
target of FOLL_WRITE either.  However, add the checks anyway.

Namely, check for either the need to CoW due to a missing write bit, or
the need for proper unsharing of a !AnonExclusive page over a R/O pin, and
reject the follow page in those cases.  That brings this function closer
to follow_hugetlb_page().

So we didn't care before, and we still don't for now.  But we will care
once slow-gup is switched over to use hugetlb_follow_page_mask().  We'll
also care about returning -EMLINK properly, as that's the gup-internal API
for "we should unshare".  It's not really needed for the follow_page()
path, though.

While at it, switch try_grab_page() to use WARN_ON_ONCE(), to make clear
that it should simply never fail.  When an error happens, capture the
errno instead of setting page==NULL.

Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 mm/hugetlb.c | 31 ++++++++++++++++++++-----------
 1 file changed, 20 insertions(+), 11 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index f75f5e78ff0b..27367edf5c72 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6462,13 +6462,7 @@ struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
 	struct page *page = NULL;
 	spinlock_t *ptl;
 	pte_t *pte, entry;
-
-	/*
-	 * FOLL_PIN is not supported for follow_page(). Ordinary GUP goes via
-	 * follow_hugetlb_page().
-	 */
-	if (WARN_ON_ONCE(flags & FOLL_PIN))
-		return NULL;
+	int ret;
 
 	hugetlb_vma_lock_read(vma);
 	pte = hugetlb_walk(vma, haddr, huge_page_size(h));
@@ -6478,8 +6472,21 @@ struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
 	ptl = huge_pte_lock(h, mm, pte);
 	entry = huge_ptep_get(pte);
 	if (pte_present(entry)) {
-		page = pte_page(entry) +
-				((address & ~huge_page_mask(h)) >> PAGE_SHIFT);
+		page = pte_page(entry);
+
+		if ((flags & FOLL_WRITE) && !huge_pte_write(entry)) {
+			page = NULL;
+			goto out;
+		}
+
+		if (gup_must_unshare(vma, flags, page)) {
+			/* Tell the caller to do unsharing */
+			page = ERR_PTR(-EMLINK);
+			goto out;
+		}
+
+		page += ((address & ~huge_page_mask(h)) >> PAGE_SHIFT);
+
 		/*
 		 * Note that page may be a sub-page, and with vmemmap
 		 * optimizations the page struct may be read only.
@@ -6489,8 +6496,10 @@ struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
 		 * try_grab_page() should always be able to get the page here,
 		 * because we hold the ptl lock and have verified pte_present().
 		 */
-		if (try_grab_page(page, flags)) {
-			page = NULL;
+		ret = try_grab_page(page, flags);
+
+		if (WARN_ON_ONCE(ret)) {
+			page = ERR_PTR(ret);
 			goto out;
 		}
 	}
-- 
2.40.1




* [PATCH v3 3/8] mm/hugetlb: Add page_mask for hugetlb_follow_page_mask()
  2023-06-23 14:29 [PATCH v3 0/8] mm/gup: Unify hugetlb, speed up thp Peter Xu
  2023-06-23 14:29 ` [PATCH v3 1/8] mm/hugetlb: Handle FOLL_DUMP well in follow_page_mask() Peter Xu
  2023-06-23 14:29 ` [PATCH v3 2/8] mm/hugetlb: Prepare hugetlb_follow_page_mask() for FOLL_PIN Peter Xu
@ 2023-06-23 14:29 ` Peter Xu
  2023-06-23 14:29 ` [PATCH v3 4/8] mm/gup: Cleanup next_page handling Peter Xu
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Peter Xu @ 2023-06-23 14:29 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Lorenzo Stoakes, John Hubbard, Andrew Morton, Mike Rapoport,
	David Hildenbrand, peterx, Yang Shi, Andrea Arcangeli,
	Vlastimil Babka, Kirill A . Shutemov, James Houghton,
	Matthew Wilcox, Mike Kravetz, Hugh Dickins, Jason Gunthorpe

follow_page() doesn't need it, but we'll start to need it when unifying gup
for hugetlb.
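
As a reference point (a worked example assuming 4K base pages, as on
x86-64): a 2M hugetlb page has huge_page_order(h) == 9, so the reported
mask is

	*page_mask = (1U << 9) - 1 = 511

and a 1G page (order 18) reports 262143.  This lets the follow_page_mask()
caller batch over the whole huge page instead of one base page at a time,
which is what the later gup patches in this series rely on.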

Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 include/linux/hugetlb.h | 8 +++++---
 mm/gup.c                | 3 ++-
 mm/hugetlb.c            | 5 ++++-
 3 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index beb7c63d2871..2e2d89e79d6c 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -131,7 +131,8 @@ int move_hugetlb_page_tables(struct vm_area_struct *vma,
 int copy_hugetlb_page_range(struct mm_struct *, struct mm_struct *,
 			    struct vm_area_struct *, struct vm_area_struct *);
 struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
-				unsigned long address, unsigned int flags);
+				      unsigned long address, unsigned int flags,
+				      unsigned int *page_mask);
 long follow_hugetlb_page(struct mm_struct *, struct vm_area_struct *,
 			 struct page **, unsigned long *, unsigned long *,
 			 long, unsigned int, int *);
@@ -297,8 +298,9 @@ static inline void adjust_range_if_pmd_sharing_possible(
 {
 }
 
-static inline struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
-				unsigned long address, unsigned int flags)
+static inline struct page *hugetlb_follow_page_mask(
+    struct vm_area_struct *vma, unsigned long address, unsigned int flags,
+    unsigned int *page_mask)
 {
 	BUILD_BUG(); /* should never be compiled in if !CONFIG_HUGETLB_PAGE*/
 }
diff --git a/mm/gup.c b/mm/gup.c
index abcd841d94b7..9fc9271cba8d 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -780,7 +780,8 @@ static struct page *follow_page_mask(struct vm_area_struct *vma,
 	 * Ordinary GUP uses follow_hugetlb_page for hugetlb processing.
 	 */
 	if (is_vm_hugetlb_page(vma))
-		return hugetlb_follow_page_mask(vma, address, flags);
+		return hugetlb_follow_page_mask(vma, address, flags,
+						&ctx->page_mask);
 
 	pgd = pgd_offset(mm, address);
 
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 27367edf5c72..b4973edef9f2 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6454,7 +6454,8 @@ static inline bool __follow_hugetlb_must_fault(struct vm_area_struct *vma,
 }
 
 struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
-				unsigned long address, unsigned int flags)
+				      unsigned long address, unsigned int flags,
+				      unsigned int *page_mask)
 {
 	struct hstate *h = hstate_vma(vma);
 	struct mm_struct *mm = vma->vm_mm;
@@ -6502,6 +6503,8 @@ struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
 			page = ERR_PTR(ret);
 			goto out;
 		}
+
+		*page_mask = (1U << huge_page_order(h)) - 1;
 	}
 out:
 	spin_unlock(ptl);
-- 
2.40.1




* [PATCH v3 4/8] mm/gup: Cleanup next_page handling
  2023-06-23 14:29 [PATCH v3 0/8] mm/gup: Unify hugetlb, speed up thp Peter Xu
                   ` (2 preceding siblings ...)
  2023-06-23 14:29 ` [PATCH v3 3/8] mm/hugetlb: Add page_mask for hugetlb_follow_page_mask() Peter Xu
@ 2023-06-23 14:29 ` Peter Xu
  2023-06-23 14:29 ` [PATCH v3 5/8] mm/gup: Accelerate thp gup even for "pages != NULL" Peter Xu
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Peter Xu @ 2023-06-23 14:29 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Lorenzo Stoakes, John Hubbard, Andrew Morton, Mike Rapoport,
	David Hildenbrand, peterx, Yang Shi, Andrea Arcangeli,
	Vlastimil Babka, Kirill A . Shutemov, James Houghton,
	Matthew Wilcox, Mike Kravetz, Hugh Dickins, Jason Gunthorpe

The only path that doesn't use the generic "**pages" handling is the gate
vma.  Make it use the same path, and meanwhile move the next_page label up
so that it covers the "**pages" handling.  This prepares for THP handling
of "**pages".

Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 mm/gup.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index 9fc9271cba8d..4a00d609033e 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1124,7 +1124,7 @@ static long __get_user_pages(struct mm_struct *mm,
 			if (!vma && in_gate_area(mm, start)) {
 				ret = get_gate_page(mm, start & PAGE_MASK,
 						gup_flags, &vma,
-						pages ? &pages[i] : NULL);
+						pages ? &page : NULL);
 				if (ret)
 					goto out;
 				ctx.page_mask = 0;
@@ -1194,19 +1194,18 @@ static long __get_user_pages(struct mm_struct *mm,
 				ret = PTR_ERR(page);
 				goto out;
 			}
-
-			goto next_page;
 		} else if (IS_ERR(page)) {
 			ret = PTR_ERR(page);
 			goto out;
 		}
+next_page:
 		if (pages) {
 			pages[i] = page;
 			flush_anon_page(vma, page, start);
 			flush_dcache_page(page);
 			ctx.page_mask = 0;
 		}
-next_page:
+
 		page_increm = 1 + (~(start >> PAGE_SHIFT) & ctx.page_mask);
 		if (page_increm > nr_pages)
 			page_increm = nr_pages;
-- 
2.40.1




* [PATCH v3 5/8] mm/gup: Accelerate thp gup even for "pages != NULL"
  2023-06-23 14:29 [PATCH v3 0/8] mm/gup: Unify hugetlb, speed up thp Peter Xu
                   ` (3 preceding siblings ...)
  2023-06-23 14:29 ` [PATCH v3 4/8] mm/gup: Cleanup next_page handling Peter Xu
@ 2023-06-23 14:29 ` Peter Xu
  2023-06-23 14:29 ` [PATCH v3 6/8] mm/gup: Retire follow_hugetlb_page() Peter Xu
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Peter Xu @ 2023-06-23 14:29 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Lorenzo Stoakes, John Hubbard, Andrew Morton, Mike Rapoport,
	David Hildenbrand, peterx, Yang Shi, Andrea Arcangeli,
	Vlastimil Babka, Kirill A . Shutemov, James Houghton,
	Matthew Wilcox, Mike Kravetz, Hugh Dickins, Jason Gunthorpe

The acceleration of THP was done with ctx.page_mask; however, it is
ignored if the **pages pointer is non-NULL.

The old optimization was introduced in 2013 in commit 240aadeedc4a ("mm:
accelerate mm_populate() treatment of THP pages").  It didn't explain why
the **pages non-NULL case can't be optimized.  It's possible that at the
time the major goal was mm_populate(), for which this was enough.

Optimize thp for all cases by properly looping over each subpage, doing
cache flushes, and boosting refcounts / pincounts where needed in one go.
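
As a worked example of the accounting (assuming x86-64 with a 2M THP, so
ctx.page_mask == 511): if gup starts at the second subpage of the folio,
the low bits of (start >> PAGE_SHIFT) are 0x1, so

	page_increm = 1 + (~1 & 511) = 511

i.e. the remaining 511 subpages are filled into **pages in one iteration.
One folio reference was already taken when the page was looked up, and
try_grab_folio(page, page_increm - 1, foll_flags) adds the other 510 in a
single call.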

This can be verified using gup_test below:

  # chrt -f 1 ./gup_test -m 512 -t -L -n 1024 -r 10

Before:    13992.50 ( +-8.75%)
After:       378.50 (+-69.62%)

Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 mm/gup.c | 51 ++++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 44 insertions(+), 7 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index 4a00d609033e..22e32cff9ac7 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1199,16 +1199,53 @@ static long __get_user_pages(struct mm_struct *mm,
 			goto out;
 		}
 next_page:
-		if (pages) {
-			pages[i] = page;
-			flush_anon_page(vma, page, start);
-			flush_dcache_page(page);
-			ctx.page_mask = 0;
-		}
-
 		page_increm = 1 + (~(start >> PAGE_SHIFT) & ctx.page_mask);
 		if (page_increm > nr_pages)
 			page_increm = nr_pages;
+
+		if (pages) {
+			struct page *subpage;
+			unsigned int j;
+
+			/*
+			 * This must be a large folio (and doesn't need to
+			 * be the whole folio; it can be part of it), do
+			 * the refcount work for all the subpages too.
+			 *
+			 * NOTE: here the page may not be the head page
+			 * e.g. when start addr is not thp-size aligned.
+			 * try_grab_folio() should have taken care of tail
+			 * pages.
+			 */
+			if (page_increm > 1) {
+				struct folio *folio;
+
+				/*
+				 * Since we already hold refcount on the
+				 * large folio, this should never fail.
+				 */
+				folio = try_grab_folio(page, page_increm - 1,
+						       foll_flags);
+				if (WARN_ON_ONCE(!folio)) {
+					/*
+					 * Release the 1st page ref if the
+					 * folio is problematic, fail hard.
+					 */
+					gup_put_folio(page_folio(page), 1,
+						      foll_flags);
+					ret = -EFAULT;
+					goto out;
+				}
+			}
+
+			for (j = 0; j < page_increm; j++) {
+				subpage = nth_page(page, j);
+				pages[i + j] = subpage;
+				flush_anon_page(vma, subpage, start + j * PAGE_SIZE);
+				flush_dcache_page(subpage);
+			}
+		}
+
 		i += page_increm;
 		start += page_increm * PAGE_SIZE;
 		nr_pages -= page_increm;
-- 
2.40.1




* [PATCH v3 6/8] mm/gup: Retire follow_hugetlb_page()
  2023-06-23 14:29 [PATCH v3 0/8] mm/gup: Unify hugetlb, speed up thp Peter Xu
                   ` (4 preceding siblings ...)
  2023-06-23 14:29 ` [PATCH v3 5/8] mm/gup: Accelerate thp gup even for "pages != NULL" Peter Xu
@ 2023-06-23 14:29 ` Peter Xu
  2023-06-26  8:08   ` David Hildenbrand
  2023-06-23 14:29 ` [PATCH v3 7/8] selftests/mm: Add -a to run_vmtests.sh Peter Xu
  2023-06-23 14:29 ` [PATCH v3 8/8] selftests/mm: Add gup test matrix in run_vmtests.sh Peter Xu
  7 siblings, 1 reply; 17+ messages in thread
From: Peter Xu @ 2023-06-23 14:29 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Lorenzo Stoakes, John Hubbard, Andrew Morton, Mike Rapoport,
	David Hildenbrand, peterx, Yang Shi, Andrea Arcangeli,
	Vlastimil Babka, Kirill A . Shutemov, James Houghton,
	Matthew Wilcox, Mike Kravetz, Hugh Dickins, Jason Gunthorpe

Now __get_user_pages() should be well prepared to handle thp completely,
as well as hugetlb gup requests, even without hugetlb's special path.

Time to retire follow_hugetlb_page().

Tweak misc comments to reflect reality of follow_hugetlb_page()'s removal.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 fs/userfaultfd.c        |   2 +-
 include/linux/hugetlb.h |  12 ---
 mm/gup.c                |  19 ----
 mm/hugetlb.c            | 224 ----------------------------------------
 4 files changed, 1 insertion(+), 256 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 7cecd49e078b..ae711f1d7a83 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -427,7 +427,7 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason)
 	 *
 	 * We also don't do userfault handling during
 	 * coredumping. hugetlbfs has the special
-	 * follow_hugetlb_page() to skip missing pages in the
+	 * hugetlb_follow_page_mask() to skip missing pages in the
 	 * FOLL_DUMP case, anon memory also checks for FOLL_DUMP with
 	 * the no_page_table() helper in follow_page_mask(), but the
 	 * shmem_vm_ops->fault method is invoked even during
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 2e2d89e79d6c..bb5024718fc1 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -133,9 +133,6 @@ int copy_hugetlb_page_range(struct mm_struct *, struct mm_struct *,
 struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
 				      unsigned long address, unsigned int flags,
 				      unsigned int *page_mask);
-long follow_hugetlb_page(struct mm_struct *, struct vm_area_struct *,
-			 struct page **, unsigned long *, unsigned long *,
-			 long, unsigned int, int *);
 void unmap_hugepage_range(struct vm_area_struct *,
 			  unsigned long, unsigned long, struct page *,
 			  zap_flags_t);
@@ -305,15 +302,6 @@ static inline struct page *hugetlb_follow_page_mask(
 	BUILD_BUG(); /* should never be compiled in if !CONFIG_HUGETLB_PAGE*/
 }
 
-static inline long follow_hugetlb_page(struct mm_struct *mm,
-			struct vm_area_struct *vma, struct page **pages,
-			unsigned long *position, unsigned long *nr_pages,
-			long i, unsigned int flags, int *nonblocking)
-{
-	BUG();
-	return 0;
-}
-
 static inline int copy_hugetlb_page_range(struct mm_struct *dst,
 					  struct mm_struct *src,
 					  struct vm_area_struct *dst_vma,
diff --git a/mm/gup.c b/mm/gup.c
index 22e32cff9ac7..ac61244e6fca 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -775,9 +775,6 @@ static struct page *follow_page_mask(struct vm_area_struct *vma,
 	 * Call hugetlb_follow_page_mask for hugetlb vmas as it will use
 	 * special hugetlb page table walking code.  This eliminates the
 	 * need to check for hugetlb entries in the general walking code.
-	 *
-	 * hugetlb_follow_page_mask is only for follow_page() handling here.
-	 * Ordinary GUP uses follow_hugetlb_page for hugetlb processing.
 	 */
 	if (is_vm_hugetlb_page(vma))
 		return hugetlb_follow_page_mask(vma, address, flags,
@@ -1138,22 +1135,6 @@ static long __get_user_pages(struct mm_struct *mm,
 			ret = check_vma_flags(vma, gup_flags);
 			if (ret)
 				goto out;
-
-			if (is_vm_hugetlb_page(vma)) {
-				i = follow_hugetlb_page(mm, vma, pages,
-							&start, &nr_pages, i,
-							gup_flags, locked);
-				if (!*locked) {
-					/*
-					 * We've got a VM_FAULT_RETRY
-					 * and we've lost mmap_lock.
-					 * We must stop here.
-					 */
-					BUG_ON(gup_flags & FOLL_NOWAIT);
-					goto out;
-				}
-				continue;
-			}
 		}
 retry:
 		/*
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index b4973edef9f2..50a3579782a5 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5721,7 +5721,6 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,
 
 /*
  * Return whether there is a pagecache page to back given address within VMA.
- * Caller follow_hugetlb_page() holds page_table_lock so we cannot lock_page.
  */
 static bool hugetlbfs_pagecache_present(struct hstate *h,
 			struct vm_area_struct *vma, unsigned long address)
@@ -6422,37 +6421,6 @@ int hugetlb_mfill_atomic_pte(pte_t *dst_pte,
 }
 #endif /* CONFIG_USERFAULTFD */
 
-static void record_subpages(struct page *page, struct vm_area_struct *vma,
-			    int refs, struct page **pages)
-{
-	int nr;
-
-	for (nr = 0; nr < refs; nr++) {
-		if (likely(pages))
-			pages[nr] = nth_page(page, nr);
-	}
-}
-
-static inline bool __follow_hugetlb_must_fault(struct vm_area_struct *vma,
-					       unsigned int flags, pte_t *pte,
-					       bool *unshare)
-{
-	pte_t pteval = huge_ptep_get(pte);
-
-	*unshare = false;
-	if (is_swap_pte(pteval))
-		return true;
-	if (huge_pte_write(pteval))
-		return false;
-	if (flags & FOLL_WRITE)
-		return true;
-	if (gup_must_unshare(vma, flags, pte_page(pteval))) {
-		*unshare = true;
-		return true;
-	}
-	return false;
-}
-
 struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
 				      unsigned long address, unsigned int flags,
 				      unsigned int *page_mask)
@@ -6522,198 +6490,6 @@ struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
 	return page;
 }
 
-long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
-			 struct page **pages, unsigned long *position,
-			 unsigned long *nr_pages, long i, unsigned int flags,
-			 int *locked)
-{
-	unsigned long pfn_offset;
-	unsigned long vaddr = *position;
-	unsigned long remainder = *nr_pages;
-	struct hstate *h = hstate_vma(vma);
-	int err = -EFAULT, refs;
-
-	while (vaddr < vma->vm_end && remainder) {
-		pte_t *pte;
-		spinlock_t *ptl = NULL;
-		bool unshare = false;
-		int absent;
-		struct page *page;
-
-		/*
-		 * If we have a pending SIGKILL, don't keep faulting pages and
-		 * potentially allocating memory.
-		 */
-		if (fatal_signal_pending(current)) {
-			remainder = 0;
-			break;
-		}
-
-		hugetlb_vma_lock_read(vma);
-		/*
-		 * Some archs (sparc64, sh*) have multiple pte_ts to
-		 * each hugepage.  We have to make sure we get the
-		 * first, for the page indexing below to work.
-		 *
-		 * Note that page table lock is not held when pte is null.
-		 */
-		pte = hugetlb_walk(vma, vaddr & huge_page_mask(h),
-				   huge_page_size(h));
-		if (pte)
-			ptl = huge_pte_lock(h, mm, pte);
-		absent = !pte || huge_pte_none(huge_ptep_get(pte));
-
-		/*
-		 * When coredumping, it suits get_dump_page if we just return
-		 * an error where there's an empty slot with no huge pagecache
-		 * to back it.  This way, we avoid allocating a hugepage, and
-		 * the sparse dumpfile avoids allocating disk blocks, but its
-		 * huge holes still show up with zeroes where they need to be.
-		 */
-		if (absent && (flags & FOLL_DUMP) &&
-		    !hugetlbfs_pagecache_present(h, vma, vaddr)) {
-			if (pte)
-				spin_unlock(ptl);
-			hugetlb_vma_unlock_read(vma);
-			remainder = 0;
-			break;
-		}
-
-		/*
-		 * We need call hugetlb_fault for both hugepages under migration
-		 * (in which case hugetlb_fault waits for the migration,) and
-		 * hwpoisoned hugepages (in which case we need to prevent the
-		 * caller from accessing to them.) In order to do this, we use
-		 * here is_swap_pte instead of is_hugetlb_entry_migration and
-		 * is_hugetlb_entry_hwpoisoned. This is because it simply covers
-		 * both cases, and because we can't follow correct pages
-		 * directly from any kind of swap entries.
-		 */
-		if (absent ||
-		    __follow_hugetlb_must_fault(vma, flags, pte, &unshare)) {
-			vm_fault_t ret;
-			unsigned int fault_flags = 0;
-
-			if (pte)
-				spin_unlock(ptl);
-			hugetlb_vma_unlock_read(vma);
-
-			if (flags & FOLL_WRITE)
-				fault_flags |= FAULT_FLAG_WRITE;
-			else if (unshare)
-				fault_flags |= FAULT_FLAG_UNSHARE;
-			if (locked) {
-				fault_flags |= FAULT_FLAG_ALLOW_RETRY |
-					FAULT_FLAG_KILLABLE;
-				if (flags & FOLL_INTERRUPTIBLE)
-					fault_flags |= FAULT_FLAG_INTERRUPTIBLE;
-			}
-			if (flags & FOLL_NOWAIT)
-				fault_flags |= FAULT_FLAG_ALLOW_RETRY |
-					FAULT_FLAG_RETRY_NOWAIT;
-			if (flags & FOLL_TRIED) {
-				/*
-				 * Note: FAULT_FLAG_ALLOW_RETRY and
-				 * FAULT_FLAG_TRIED can co-exist
-				 */
-				fault_flags |= FAULT_FLAG_TRIED;
-			}
-			ret = hugetlb_fault(mm, vma, vaddr, fault_flags);
-			if (ret & VM_FAULT_ERROR) {
-				err = vm_fault_to_errno(ret, flags);
-				remainder = 0;
-				break;
-			}
-			if (ret & VM_FAULT_RETRY) {
-				if (locked &&
-				    !(fault_flags & FAULT_FLAG_RETRY_NOWAIT))
-					*locked = 0;
-				*nr_pages = 0;
-				/*
-				 * VM_FAULT_RETRY must not return an
-				 * error, it will return zero
-				 * instead.
-				 *
-				 * No need to update "position" as the
-				 * caller will not check it after
-				 * *nr_pages is set to 0.
-				 */
-				return i;
-			}
-			continue;
-		}
-
-		pfn_offset = (vaddr & ~huge_page_mask(h)) >> PAGE_SHIFT;
-		page = pte_page(huge_ptep_get(pte));
-
-		VM_BUG_ON_PAGE((flags & FOLL_PIN) && PageAnon(page) &&
-			       !PageAnonExclusive(page), page);
-
-		/*
-		 * If subpage information not requested, update counters
-		 * and skip the same_page loop below.
-		 */
-		if (!pages && !pfn_offset &&
-		    (vaddr + huge_page_size(h) < vma->vm_end) &&
-		    (remainder >= pages_per_huge_page(h))) {
-			vaddr += huge_page_size(h);
-			remainder -= pages_per_huge_page(h);
-			i += pages_per_huge_page(h);
-			spin_unlock(ptl);
-			hugetlb_vma_unlock_read(vma);
-			continue;
-		}
-
-		/* vaddr may not be aligned to PAGE_SIZE */
-		refs = min3(pages_per_huge_page(h) - pfn_offset, remainder,
-		    (vma->vm_end - ALIGN_DOWN(vaddr, PAGE_SIZE)) >> PAGE_SHIFT);
-
-		if (pages)
-			record_subpages(nth_page(page, pfn_offset),
-					vma, refs,
-					likely(pages) ? pages + i : NULL);
-
-		if (pages) {
-			/*
-			 * try_grab_folio() should always succeed here,
-			 * because: a) we hold the ptl lock, and b) we've just
-			 * checked that the huge page is present in the page
-			 * tables. If the huge page is present, then the tail
-			 * pages must also be present. The ptl prevents the
-			 * head page and tail pages from being rearranged in
-			 * any way. As this is hugetlb, the pages will never
-			 * be p2pdma or not longterm pinable. So this page
-			 * must be available at this point, unless the page
-			 * refcount overflowed:
-			 */
-			if (WARN_ON_ONCE(!try_grab_folio(pages[i], refs,
-							 flags))) {
-				spin_unlock(ptl);
-				hugetlb_vma_unlock_read(vma);
-				remainder = 0;
-				err = -ENOMEM;
-				break;
-			}
-		}
-
-		vaddr += (refs << PAGE_SHIFT);
-		remainder -= refs;
-		i += refs;
-
-		spin_unlock(ptl);
-		hugetlb_vma_unlock_read(vma);
-	}
-	*nr_pages = remainder;
-	/*
-	 * setting position is actually required only if remainder is
-	 * not zero but it's faster not to add a "if (remainder)"
-	 * branch.
-	 */
-	*position = vaddr;
-
-	return i ? i : err;
-}
-
 long hugetlb_change_protection(struct vm_area_struct *vma,
 		unsigned long address, unsigned long end,
 		pgprot_t newprot, unsigned long cp_flags)
-- 
2.40.1




* [PATCH v3 7/8] selftests/mm: Add -a to run_vmtests.sh
  2023-06-23 14:29 [PATCH v3 0/8] mm/gup: Unify hugetlb, speed up thp Peter Xu
                   ` (5 preceding siblings ...)
  2023-06-23 14:29 ` [PATCH v3 6/8] mm/gup: Retire follow_hugetlb_page() Peter Xu
@ 2023-06-23 14:29 ` Peter Xu
  2023-06-26  8:10   ` David Hildenbrand
  2023-06-23 14:29 ` [PATCH v3 8/8] selftests/mm: Add gup test matrix in run_vmtests.sh Peter Xu
  7 siblings, 1 reply; 17+ messages in thread
From: Peter Xu @ 2023-06-23 14:29 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Lorenzo Stoakes, John Hubbard, Andrew Morton, Mike Rapoport,
	David Hildenbrand, peterx, Yang Shi, Andrea Arcangeli,
	Vlastimil Babka, Kirill A . Shutemov, James Houghton,
	Matthew Wilcox, Mike Kravetz, Hugh Dickins, Jason Gunthorpe

Allow specifying optional tests in run_vmtests.sh, so that the
time-consuming test matrix only runs when the user specifies "-a".
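
For illustration, usage looks like this (example invocations only):

	# default: run only the required tests
	./run_vmtests.sh
	# also run the extra, time-consuming tests
	./run_vmtests.sh -a
	# combine with a category filter, e.g. only the gup tests
	./run_vmtests.sh -a -t gup_test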

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 tools/testing/selftests/mm/run_vmtests.sh | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/mm/run_vmtests.sh b/tools/testing/selftests/mm/run_vmtests.sh
index 3f26f6e15b2a..824e651f62f4 100644
--- a/tools/testing/selftests/mm/run_vmtests.sh
+++ b/tools/testing/selftests/mm/run_vmtests.sh
@@ -12,11 +12,14 @@ exitcode=0
 
 usage() {
 	cat <<EOF
-usage: ${BASH_SOURCE[0]:-$0} [ -h | -t "<categories>"]
+usage: ${BASH_SOURCE[0]:-$0} [ options ]
+
+  -a: run all tests, including extra ones
   -t: specify specific categories to tests to run
   -h: display this message
 
-The default behavior is to run all tests.
+The default behavior is to run required tests only.  If -a is specified,
+will run all tests.
 
 Alternatively, specific groups tests can be run by passing a string
 to the -t argument containing one or more of the following categories
@@ -60,9 +63,11 @@ EOF
 	exit 0
 }
 
+RUN_ALL=false
 
-while getopts "ht:" OPT; do
+while getopts "aht:" OPT; do
 	case ${OPT} in
+		"a") RUN_ALL=true ;;
 		"h") usage ;;
 		"t") VM_SELFTEST_ITEMS=${OPTARG} ;;
 	esac
-- 
2.40.1




* [PATCH v3 8/8] selftests/mm: Add gup test matrix in run_vmtests.sh
  2023-06-23 14:29 [PATCH v3 0/8] mm/gup: Unify hugetlb, speed up thp Peter Xu
                   ` (6 preceding siblings ...)
  2023-06-23 14:29 ` [PATCH v3 7/8] selftests/mm: Add -a to run_vmtests.sh Peter Xu
@ 2023-06-23 14:29 ` Peter Xu
  2023-06-26  8:11   ` David Hildenbrand
  7 siblings, 1 reply; 17+ messages in thread
From: Peter Xu @ 2023-06-23 14:29 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Lorenzo Stoakes, John Hubbard, Andrew Morton, Mike Rapoport,
	David Hildenbrand, peterx, Yang Shi, Andrea Arcangeli,
	Vlastimil Babka, Kirill A . Shutemov, James Houghton,
	Matthew Wilcox, Mike Kravetz, Hugh Dickins, Jason Gunthorpe

Add a matrix for testing gup based on the current gup_test.  Only run the
matrix when -a is specified because it's a bit slow.

It covers:

  - Different types of huge pages: thp, hugetlb, or no huge page
  - Permissions: Write / Read-only
  - Fast-gup, with/without
  - Types of the GUP: pin / gup / longterm pins
  - Shared / Private memories
  - GUP size: 1 / 512 / random page sizes
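
Each matrix entry expands into one gup_test invocation, for example (flag
meanings taken from the comments in run_gup_matrix below):

	./gup_test -t -u -w -S -n 512

i.e. thp=on, gup-fast, write=1, shared memory, 512 pages per call.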

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 tools/testing/selftests/mm/run_vmtests.sh | 37 ++++++++++++++++++++---
 1 file changed, 32 insertions(+), 5 deletions(-)

diff --git a/tools/testing/selftests/mm/run_vmtests.sh b/tools/testing/selftests/mm/run_vmtests.sh
index 824e651f62f4..9666c0c171ab 100644
--- a/tools/testing/selftests/mm/run_vmtests.sh
+++ b/tools/testing/selftests/mm/run_vmtests.sh
@@ -90,6 +90,30 @@ test_selected() {
 	fi
 }
 
+run_gup_matrix() {
+    # -t: thp=on, -T: thp=off, -H: hugetlb=on
+    local hugetlb_mb=$(( needmem_KB / 1024 ))
+
+    for huge in -t -T "-H -m $hugetlb_mb"; do
+        # -u: gup-fast, -U: gup-basic, -a: pin-fast, -b: pin-basic, -L: pin-longterm
+        for test_cmd in -u -U -a -b -L; do
+            # -w: write=1, -W: write=0
+            for write in -w -W; do
+                # -S: shared
+                for share in -S " "; do
+                    # -n: How many pages to fetch together?  512 is special
+                    # because it's default thp size (or 2M on x86), 123 to
+                    # just test partial gup when hit a huge in whatever form
+                    for num in "-n 1" "-n 512" "-n 123"; do
+                        CATEGORY="gup_test" run_test ./gup_test \
+                                $huge $test_cmd $write $share $num
+                    done
+                done
+            done
+        done
+    done
+}
+
 # get huge pagesize and freepages from /proc/meminfo
 while read -r name size unit; do
 	if [ "$name" = "HugePages_Free:" ]; then
@@ -194,13 +218,16 @@ fi
 
 CATEGORY="mmap" run_test ./map_fixed_noreplace
 
-# get_user_pages_fast() benchmark
-CATEGORY="gup_test" run_test ./gup_test -u
-# pin_user_pages_fast() benchmark
-CATEGORY="gup_test" run_test ./gup_test -a
+if $RUN_ALL; then
+    run_gup_matrix
+else
+    # get_user_pages_fast() benchmark
+    CATEGORY="gup_test" run_test ./gup_test -u
+    # pin_user_pages_fast() benchmark
+    CATEGORY="gup_test" run_test ./gup_test -a
+fi
 # Dump pages 0, 19, and 4096, using pin_user_pages:
 CATEGORY="gup_test" run_test ./gup_test -ct -F 0x1 0 19 0x1000
-
 CATEGORY="gup_test" run_test ./gup_longterm
 
 CATEGORY="userfaultfd" run_test ./uffd-unit-tests
-- 
2.40.1




* Re: [PATCH v3 2/8] mm/hugetlb: Prepare hugetlb_follow_page_mask() for FOLL_PIN
  2023-06-23 14:29 ` [PATCH v3 2/8] mm/hugetlb: Prepare hugetlb_follow_page_mask() for FOLL_PIN Peter Xu
@ 2023-06-26  8:06   ` David Hildenbrand
  2023-06-26 16:23     ` Peter Xu
  0 siblings, 1 reply; 17+ messages in thread
From: David Hildenbrand @ 2023-06-26  8:06 UTC (permalink / raw)
  To: Peter Xu, linux-mm, linux-kernel
  Cc: Lorenzo Stoakes, John Hubbard, Andrew Morton, Mike Rapoport,
	Yang Shi, Andrea Arcangeli, Vlastimil Babka, Kirill A . Shutemov,
	James Houghton, Matthew Wilcox, Mike Kravetz, Hugh Dickins,
	Jason Gunthorpe

On 23.06.23 16:29, Peter Xu wrote:
> follow_page() doesn't use FOLL_PIN, meanwhile hugetlb seems to not be the
> target of FOLL_WRITE either.  However add the checks.
> 
> Namely, either the need to CoW due to missing write bit, or proper
> unsharing on !AnonExclusive pages over R/O pins to reject the follow page.
> That brings this function closer to follow_hugetlb_page().
> 
> So we don't care before, and also for now.  But we'll care if we switch
> over slow-gup to use hugetlb_follow_page_mask().  We'll also care when to
> return -EMLINK properly, as that's the gup internal api to mean "we should
> unshare".  Not really needed for follow page path, though.
> 
> When at it, switching the try_grab_page() to use WARN_ON_ONCE(), to be
> clear that it just should never fail.  When error happens, instead of
> setting page==NULL, capture the errno instead.
> 
> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>   mm/hugetlb.c | 31 ++++++++++++++++++++-----------
>   1 file changed, 20 insertions(+), 11 deletions(-)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index f75f5e78ff0b..27367edf5c72 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -6462,13 +6462,7 @@ struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
>   	struct page *page = NULL;
>   	spinlock_t *ptl;
>   	pte_t *pte, entry;
> -
> -	/*
> -	 * FOLL_PIN is not supported for follow_page(). Ordinary GUP goes via
> -	 * follow_hugetlb_page().
> -	 */
> -	if (WARN_ON_ONCE(flags & FOLL_PIN))
> -		return NULL;
> +	int ret;
>   
>   	hugetlb_vma_lock_read(vma);
>   	pte = hugetlb_walk(vma, haddr, huge_page_size(h));
> @@ -6478,8 +6472,21 @@ struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
>   	ptl = huge_pte_lock(h, mm, pte);
>   	entry = huge_ptep_get(pte);
>   	if (pte_present(entry)) {
> -		page = pte_page(entry) +
> -				((address & ~huge_page_mask(h)) >> PAGE_SHIFT);
> +		page = pte_page(entry);
> +
> +		if ((flags & FOLL_WRITE) && !huge_pte_write(entry)) {
> +			page = NULL;
> +			goto out;
> +		}
> +
> +		if (gup_must_unshare(vma, flags, page)) {
> +			/* Tell the caller to do unsharing */
> +			page = ERR_PTR(-EMLINK);
> +			goto out;
> +		}


No need to check if the page is writable (like all other callers and as 
gup_must_unshare() documents -- "for which pages that are 
write-protected in the page table")

if (!huge_pte_write(entry) && gup_must_unshare(vma, flags, page)) {


With that

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 
Cheers,

David / dhildenb




* Re: [PATCH v3 6/8] mm/gup: Retire follow_hugetlb_page()
  2023-06-23 14:29 ` [PATCH v3 6/8] mm/gup: Retire follow_hugetlb_page() Peter Xu
@ 2023-06-26  8:08   ` David Hildenbrand
  0 siblings, 0 replies; 17+ messages in thread
From: David Hildenbrand @ 2023-06-26  8:08 UTC (permalink / raw)
  To: Peter Xu, linux-mm, linux-kernel
  Cc: Lorenzo Stoakes, John Hubbard, Andrew Morton, Mike Rapoport,
	Yang Shi, Andrea Arcangeli, Vlastimil Babka, Kirill A . Shutemov,
	James Houghton, Matthew Wilcox, Mike Kravetz, Hugh Dickins,
	Jason Gunthorpe

On 23.06.23 16:29, Peter Xu wrote:
> Now __get_user_pages() should be well prepared to handle thp completely,
> as long as hugetlb gup requests even without the hugetlb's special path.
> 
> Time to retire follow_hugetlb_page().
> 
> Tweak misc comments to reflect reality of follow_hugetlb_page()'s removal.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>   fs/userfaultfd.c        |   2 +-
>   include/linux/hugetlb.h |  12 ---
>   mm/gup.c                |  19 ----
>   mm/hugetlb.c            | 224 ----------------------------------------
>   4 files changed, 1 insertion(+), 256 deletions(-)

Acked-by: David Hildenbrand <david@redhat.com>

-- 
Cheers,

David / dhildenb




* Re: [PATCH v3 7/8] selftests/mm: Add -a to run_vmtests.sh
  2023-06-23 14:29 ` [PATCH v3 7/8] selftests/mm: Add -a to run_vmtests.sh Peter Xu
@ 2023-06-26  8:10   ` David Hildenbrand
  2023-06-26 16:24     ` Peter Xu
  0 siblings, 1 reply; 17+ messages in thread
From: David Hildenbrand @ 2023-06-26  8:10 UTC (permalink / raw)
  To: Peter Xu, linux-mm, linux-kernel
  Cc: Lorenzo Stoakes, John Hubbard, Andrew Morton, Mike Rapoport,
	Yang Shi, Andrea Arcangeli, Vlastimil Babka, Kirill A . Shutemov,
	James Houghton, Matthew Wilcox, Mike Kravetz, Hugh Dickins,
	Jason Gunthorpe

On 23.06.23 16:29, Peter Xu wrote:
> Allows to specify optional tests in run_vmtests.sh, where we can run time
> consuming test matrix only when user specified "-a".

I'd have used something like "-e: extended tests that might be more 
time-consuming".

Acked-by: David Hildenbrand <david@redhat.com>

-- 
Cheers,

David / dhildenb




* Re: [PATCH v3 8/8] selftests/mm: Add gup test matrix in run_vmtests.sh
  2023-06-23 14:29 ` [PATCH v3 8/8] selftests/mm: Add gup test matrix in run_vmtests.sh Peter Xu
@ 2023-06-26  8:11   ` David Hildenbrand
  0 siblings, 0 replies; 17+ messages in thread
From: David Hildenbrand @ 2023-06-26  8:11 UTC (permalink / raw)
  To: Peter Xu, linux-mm, linux-kernel
  Cc: Lorenzo Stoakes, John Hubbard, Andrew Morton, Mike Rapoport,
	Yang Shi, Andrea Arcangeli, Vlastimil Babka, Kirill A . Shutemov,
	James Houghton, Matthew Wilcox, Mike Kravetz, Hugh Dickins,
	Jason Gunthorpe

On 23.06.23 16:29, Peter Xu wrote:
> Add a matrix for testing gup based on the current gup_test.  Only run the
> matrix when -a is specified because it's a bit slow.
> 
> It covers:
> 
>    - Different types of huge pages: thp, hugetlb, or no huge page
>    - Permissions: Write / Read-only
>    - Fast-gup, with/without
>    - Types of the GUP: pin / gup / longterm pins
>    - Shared / Private memories
>    - GUP size: 1 / 512 / random page sizes
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---

Acked-by: David Hildenbrand <david@redhat.com>

-- 
Cheers,

David / dhildenb




* Re: [PATCH v3 2/8] mm/hugetlb: Prepare hugetlb_follow_page_mask() for FOLL_PIN
  2023-06-26  8:06   ` David Hildenbrand
@ 2023-06-26 16:23     ` Peter Xu
  0 siblings, 0 replies; 17+ messages in thread
From: Peter Xu @ 2023-06-26 16:23 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-mm, linux-kernel, Lorenzo Stoakes, John Hubbard,
	Andrew Morton, Mike Rapoport, Yang Shi, Andrea Arcangeli,
	Vlastimil Babka, Kirill A . Shutemov, James Houghton,
	Matthew Wilcox, Mike Kravetz, Hugh Dickins, Jason Gunthorpe

On Mon, Jun 26, 2023 at 10:06:24AM +0200, David Hildenbrand wrote:
> On 23.06.23 16:29, Peter Xu wrote:
> > follow_page() doesn't use FOLL_PIN, meanwhile hugetlb seems to not be the
> > target of FOLL_WRITE either.  However add the checks.
> > 
> > Namely, either the need to CoW due to missing write bit, or proper
> > unsharing on !AnonExclusive pages over R/O pins to reject the follow page.
> > That brings this function closer to follow_hugetlb_page().
> > 
> > So we don't care before, and also for now.  But we'll care if we switch
> > over slow-gup to use hugetlb_follow_page_mask().  We'll also care when to
> > return -EMLINK properly, as that's the gup internal api to mean "we should
> > unshare".  Not really needed for follow page path, though.
> > 
> > When at it, switching the try_grab_page() to use WARN_ON_ONCE(), to be
> > clear that it just should never fail.  When error happens, instead of
> > setting page==NULL, capture the errno instead.
> > 
> > Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >   mm/hugetlb.c | 31 ++++++++++++++++++++-----------
> >   1 file changed, 20 insertions(+), 11 deletions(-)
> > 
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index f75f5e78ff0b..27367edf5c72 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -6462,13 +6462,7 @@ struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
> >   	struct page *page = NULL;
> >   	spinlock_t *ptl;
> >   	pte_t *pte, entry;
> > -
> > -	/*
> > -	 * FOLL_PIN is not supported for follow_page(). Ordinary GUP goes via
> > -	 * follow_hugetlb_page().
> > -	 */
> > -	if (WARN_ON_ONCE(flags & FOLL_PIN))
> > -		return NULL;
> > +	int ret;
> >   	hugetlb_vma_lock_read(vma);
> >   	pte = hugetlb_walk(vma, haddr, huge_page_size(h));
> > @@ -6478,8 +6472,21 @@ struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
> >   	ptl = huge_pte_lock(h, mm, pte);
> >   	entry = huge_ptep_get(pte);
> >   	if (pte_present(entry)) {
> > -		page = pte_page(entry) +
> > -				((address & ~huge_page_mask(h)) >> PAGE_SHIFT);
> > +		page = pte_page(entry);
> > +
> > +		if ((flags & FOLL_WRITE) && !huge_pte_write(entry)) {
> > +			page = NULL;
> > +			goto out;
> > +		}
> > +
> > +		if (gup_must_unshare(vma, flags, page)) {
> > +			/* Tell the caller to do unsharing */
> > +			page = ERR_PTR(-EMLINK);
> > +			goto out;
> > +		}
> 
> 
> No need to check if the page is writable (like all other callers and as
> gup_must_unshare() documents -- "for which pages that are write-protected in
> the page table")
> 
> if (!huge_pte_write(entry) && gup_must_unshare(vma, flags, page)) {

Sure.

I was wondering whether we should just allow passing "write" into
gup_must_unshare(); it'd be a bit weird that it returns false directly
for writes, but hopefully it would make it easier to stay consistent.
I'll leave that as-is for now, anyway.

For this one I'll just merge it into:

		if (!huge_pte_write(entry)) {
			if (flags & FOLL_WRITE) {
				page = NULL;
				goto out;
			}

			if (gup_must_unshare(vma, flags, page)) {
				/* Tell the caller to do unsharing */
				page = ERR_PTR(-EMLINK);
				goto out;
			}
		}

> 
> 
> With that
> 
> Reviewed-by: David Hildenbrand <david@redhat.com>

Thanks,

-- 
Peter Xu




* Re: [PATCH v3 7/8] selftests/mm: Add -a to run_vmtests.sh
  2023-06-26  8:10   ` David Hildenbrand
@ 2023-06-26 16:24     ` Peter Xu
  2023-06-26 20:24       ` John Hubbard
  0 siblings, 1 reply; 17+ messages in thread
From: Peter Xu @ 2023-06-26 16:24 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-mm, linux-kernel, Lorenzo Stoakes, John Hubbard,
	Andrew Morton, Mike Rapoport, Yang Shi, Andrea Arcangeli,
	Vlastimil Babka, Kirill A . Shutemov, James Houghton,
	Matthew Wilcox, Mike Kravetz, Hugh Dickins, Jason Gunthorpe

On Mon, Jun 26, 2023 at 10:10:26AM +0200, David Hildenbrand wrote:
> On 23.06.23 16:29, Peter Xu wrote:
> > Allows to specify optional tests in run_vmtests.sh, where we can run time
> > consuming test matrix only when user specified "-a".
> 
> I'd have used something like "-e: extended tests that might be more
> time-consuming".

'-a' stands for "all" here; I worry '-e' may be misread as running "extra
tests only".

> 
> Acked-by: David Hildenbrand <david@redhat.com>

Thanks,

-- 
Peter Xu




* Re: [PATCH v3 7/8] selftests/mm: Add -a to run_vmtests.sh
  2023-06-26 16:24     ` Peter Xu
@ 2023-06-26 20:24       ` John Hubbard
  2023-06-26 20:40         ` Peter Xu
  0 siblings, 1 reply; 17+ messages in thread
From: John Hubbard @ 2023-06-26 20:24 UTC (permalink / raw)
  To: Peter Xu, David Hildenbrand
  Cc: linux-mm, linux-kernel, Lorenzo Stoakes, Andrew Morton,
	Mike Rapoport, Yang Shi, Andrea Arcangeli, Vlastimil Babka,
	Kirill A . Shutemov, James Houghton, Matthew Wilcox, Mike Kravetz,
	Hugh Dickins, Jason Gunthorpe

On 6/26/23 09:24, Peter Xu wrote:
> On Mon, Jun 26, 2023 at 10:10:26AM +0200, David Hildenbrand wrote:
>> On 23.06.23 16:29, Peter Xu wrote:
>>> Allows to specify optional tests in run_vmtests.sh, where we can run time
>>> consuming test matrix only when user specified "-a".
>>
>> I'd have used something like "-e: extended tests that might be more
>> time-consuming".
> 
> '-a' stands for "all" here, I worry '-e' may be misread into running "extra
> tests only".
> 

It's getting to the point where long options would help. This one
could be "--stress-tests" or "--all", or "--include-long-running",
for example.

I hesitate to suggest it because it's extra work, but at some point
it would be nice. The options are already hopelessly un-guessable.
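
(For illustration only, not a patch: since getopts alone doesn't handle
long options, one common bash shape for accepting both forms would be
something like

	while [ -n "$1" ]; do
		case "$1" in
			-a|--all) RUN_ALL=true ;;
			-h|--help) usage ;;
			-t|--tests) VM_SELFTEST_ITEMS="$2"; shift ;;
			*) usage ;;
		esac
		shift
	done

keeping the short options for compatibility.)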


thanks,
-- 
John Hubbard
NVIDIA




* Re: [PATCH v3 7/8] selftests/mm: Add -a to run_vmtests.sh
  2023-06-26 20:24       ` John Hubbard
@ 2023-06-26 20:40         ` Peter Xu
  0 siblings, 0 replies; 17+ messages in thread
From: Peter Xu @ 2023-06-26 20:40 UTC (permalink / raw)
  To: John Hubbard
  Cc: David Hildenbrand, linux-mm, linux-kernel, Lorenzo Stoakes,
	Andrew Morton, Mike Rapoport, Yang Shi, Andrea Arcangeli,
	Vlastimil Babka, Kirill A . Shutemov, James Houghton,
	Matthew Wilcox, Mike Kravetz, Hugh Dickins, Jason Gunthorpe

On Mon, Jun 26, 2023 at 01:24:45PM -0700, John Hubbard wrote:
> On 6/26/23 09:24, Peter Xu wrote:
> > On Mon, Jun 26, 2023 at 10:10:26AM +0200, David Hildenbrand wrote:
> > > On 23.06.23 16:29, Peter Xu wrote:
> > > > Allows to specify optional tests in run_vmtests.sh, where we can run time
> > > > consuming test matrix only when user specified "-a".
> > > 
> > > I'd have used something like "-e: extended tests that might be more
> > > time-consuming".
> > 
> > '-a' stands for "all" here, I worry '-e' may be misread into running "extra
> > tests only".
> > 
> 
> It's getting to the point where long options would help. This one
> could be "--stress-tests" or "--all", or "--include-long-running",
> for example.
> 
> I hesitate to suggest it because it's extra work, but at some point
> it would be nice. The options are already hopelessly un-guessable.

Thanks for being considerate, John.  Yes, I do hope this series can
converge soon; hopefully any further test patches can be built on top.

To me, either option (-a/-e) would be fine, as long as there's an easy way
to dump a help message; "run_vmtests.sh -h" always works for me.

What's more "hopeless" to me as a user is that there's no help message for
gup_test.c even though its parameter list is crazily long.  I used to
"convince" myself that gup is just so special that one should simply know
which option goes to which ioctl, or better, read the .c file.  But I know
that's an excuse.

Thanks,

-- 
Peter Xu



