* [PATCH v3 1/8] mm/hugetlb: Handle FOLL_DUMP well in follow_page_mask()
2023-06-23 14:29 [PATCH v3 0/8] mm/gup: Unify hugetlb, speed up thp Peter Xu
@ 2023-06-23 14:29 ` Peter Xu
2023-06-23 14:29 ` [PATCH v3 2/8] mm/hugetlb: Prepare hugetlb_follow_page_mask() for FOLL_PIN Peter Xu
` (6 subsequent siblings)
7 siblings, 0 replies; 17+ messages in thread
From: Peter Xu @ 2023-06-23 14:29 UTC (permalink / raw)
To: linux-mm, linux-kernel
Cc: Lorenzo Stoakes, John Hubbard, Andrew Morton, Mike Rapoport,
David Hildenbrand, peterx, Yang Shi, Andrea Arcangeli,
Vlastimil Babka, Kirill A . Shutemov, James Houghton,
Matthew Wilcox, Mike Kravetz, Hugh Dickins, Jason Gunthorpe
Firstly, calling no_page_table() is meaningless for hugetlb: it is always a
no-op there, because a hugetlb vma always satisfies:

  - vma_is_anonymous() == false
  - vma->vm_ops->fault != NULL

So we can already safely drop that call from the hugetlb branch of
follow_page_mask(), along with the page* variable.

Meanwhile, what we do in follow_hugetlb_page() actually makes sense for a
dump: we try to fault in the page only if the page cache is already
allocated. Let's do the same here for follow_page_mask() on hugetlb.

So far this should have zero effect on real dumps, because those still go
through follow_hugetlb_page(). But it may start to influence follow_page()
users who mimic a "dump page" scenario, hopefully in a good way. This also
paves the way for unifying the hugetlb gup-slow path.
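To illustrate the resulting convention, a dump-style follow_page() user
would do roughly the below. Only a sketch: the emit_*() helpers are made
up for illustration and are not real kernel APIs.

	struct page *page;

	page = follow_page(vma, addr, FOLL_GET | FOLL_DUMP);
	if (IS_ERR(page)) {
		/*
		 * No page mapped and no hugetlb pagecache backing it
		 * either: treat it as a hole and emit zeroes, without
		 * allocating a new hugetlb page just for the dump.
		 */
		emit_zero_range(addr, PAGE_SIZE);
	} else if (page) {
		/* Mapped page: dump its contents and drop the ref. */
		emit_page_range(page, addr, PAGE_SIZE);
		put_page(page);
	}
	/* page == NULL: pagecache exists; fault it in first, then retry. */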
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
mm/gup.c | 9 ++-------
mm/hugetlb.c | 9 +++++++++
2 files changed, 11 insertions(+), 7 deletions(-)
diff --git a/mm/gup.c b/mm/gup.c
index ce14d4d28503..abcd841d94b7 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -767,7 +767,6 @@ static struct page *follow_page_mask(struct vm_area_struct *vma,
struct follow_page_context *ctx)
{
pgd_t *pgd;
- struct page *page;
struct mm_struct *mm = vma->vm_mm;
ctx->page_mask = 0;
@@ -780,12 +779,8 @@ static struct page *follow_page_mask(struct vm_area_struct *vma,
* hugetlb_follow_page_mask is only for follow_page() handling here.
* Ordinary GUP uses follow_hugetlb_page for hugetlb processing.
*/
- if (is_vm_hugetlb_page(vma)) {
- page = hugetlb_follow_page_mask(vma, address, flags);
- if (!page)
- page = no_page_table(vma, flags);
- return page;
- }
+ if (is_vm_hugetlb_page(vma))
+ return hugetlb_follow_page_mask(vma, address, flags);
pgd = pgd_offset(mm, address);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index d76574425da3..f75f5e78ff0b 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6498,6 +6498,15 @@ struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
spin_unlock(ptl);
out_unlock:
hugetlb_vma_unlock_read(vma);
+
+ /*
+ * Fixup retval for dump requests: if pagecache doesn't exist,
+ * don't try to allocate a new page but just skip it.
+ */
+ if (!page && (flags & FOLL_DUMP) &&
+ !hugetlbfs_pagecache_present(h, vma, address))
+ page = ERR_PTR(-EFAULT);
+
return page;
}
--
2.40.1
* [PATCH v3 2/8] mm/hugetlb: Prepare hugetlb_follow_page_mask() for FOLL_PIN
2023-06-23 14:29 [PATCH v3 0/8] mm/gup: Unify hugetlb, speed up thp Peter Xu
2023-06-23 14:29 ` [PATCH v3 1/8] mm/hugetlb: Handle FOLL_DUMP well in follow_page_mask() Peter Xu
@ 2023-06-23 14:29 ` Peter Xu
2023-06-26 8:06 ` David Hildenbrand
2023-06-23 14:29 ` [PATCH v3 3/8] mm/hugetlb: Add page_mask for hugetlb_follow_page_mask() Peter Xu
` (5 subsequent siblings)
7 siblings, 1 reply; 17+ messages in thread
From: Peter Xu @ 2023-06-23 14:29 UTC (permalink / raw)
To: linux-mm, linux-kernel
Cc: Lorenzo Stoakes, John Hubbard, Andrew Morton, Mike Rapoport,
David Hildenbrand, peterx, Yang Shi, Andrea Arcangeli,
Vlastimil Babka, Kirill A . Shutemov, James Houghton,
Matthew Wilcox, Mike Kravetz, Hugh Dickins, Jason Gunthorpe
follow_page() doesn't use FOLL_PIN, and hugetlb doesn't seem to be a
target of FOLL_WRITE either. However, add the checks anyway.

Namely, check for the need to CoW when the write bit is missing, and for
proper unsharing of !AnonExclusive pages over R/O pins, rejecting the
followed page in those cases. That brings this function closer to
follow_hugetlb_page().

So we didn't care before, and strictly don't need to for now. But we'll
care once slow-gup is switched over to hugetlb_follow_page_mask(). We'll
also care about returning -EMLINK properly, as that's the gup-internal API
meaning "we should unshare". It's not really needed for the follow_page()
path, though.
While at it, switch the try_grab_page() check to WARN_ON_ONCE(), to make
clear that it should just never fail. When an error does happen, capture
the errno instead of setting page==NULL.
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
mm/hugetlb.c | 31 ++++++++++++++++++++-----------
1 file changed, 20 insertions(+), 11 deletions(-)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index f75f5e78ff0b..27367edf5c72 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6462,13 +6462,7 @@ struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
struct page *page = NULL;
spinlock_t *ptl;
pte_t *pte, entry;
-
- /*
- * FOLL_PIN is not supported for follow_page(). Ordinary GUP goes via
- * follow_hugetlb_page().
- */
- if (WARN_ON_ONCE(flags & FOLL_PIN))
- return NULL;
+ int ret;
hugetlb_vma_lock_read(vma);
pte = hugetlb_walk(vma, haddr, huge_page_size(h));
@@ -6478,8 +6472,21 @@ struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
ptl = huge_pte_lock(h, mm, pte);
entry = huge_ptep_get(pte);
if (pte_present(entry)) {
- page = pte_page(entry) +
- ((address & ~huge_page_mask(h)) >> PAGE_SHIFT);
+ page = pte_page(entry);
+
+ if ((flags & FOLL_WRITE) && !huge_pte_write(entry)) {
+ page = NULL;
+ goto out;
+ }
+
+ if (gup_must_unshare(vma, flags, page)) {
+ /* Tell the caller to do unsharing */
+ page = ERR_PTR(-EMLINK);
+ goto out;
+ }
+
+ page += ((address & ~huge_page_mask(h)) >> PAGE_SHIFT);
+
/*
* Note that page may be a sub-page, and with vmemmap
* optimizations the page struct may be read only.
@@ -6489,8 +6496,10 @@ struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
* try_grab_page() should always be able to get the page here,
* because we hold the ptl lock and have verified pte_present().
*/
- if (try_grab_page(page, flags)) {
- page = NULL;
+ ret = try_grab_page(page, flags);
+
+ if (WARN_ON_ONCE(ret)) {
+ page = ERR_PTR(ret);
goto out;
}
}
--
2.40.1
* Re: [PATCH v3 2/8] mm/hugetlb: Prepare hugetlb_follow_page_mask() for FOLL_PIN
2023-06-23 14:29 ` [PATCH v3 2/8] mm/hugetlb: Prepare hugetlb_follow_page_mask() for FOLL_PIN Peter Xu
@ 2023-06-26 8:06 ` David Hildenbrand
2023-06-26 16:23 ` Peter Xu
0 siblings, 1 reply; 17+ messages in thread
From: David Hildenbrand @ 2023-06-26 8:06 UTC (permalink / raw)
To: Peter Xu, linux-mm, linux-kernel
Cc: Lorenzo Stoakes, John Hubbard, Andrew Morton, Mike Rapoport,
Yang Shi, Andrea Arcangeli, Vlastimil Babka, Kirill A . Shutemov,
James Houghton, Matthew Wilcox, Mike Kravetz, Hugh Dickins,
Jason Gunthorpe
On 23.06.23 16:29, Peter Xu wrote:
> follow_page() doesn't use FOLL_PIN, and hugetlb doesn't seem to be a
> target of FOLL_WRITE either. However, add the checks anyway.
>
> Namely, check for the need to CoW when the write bit is missing, and for
> proper unsharing of !AnonExclusive pages over R/O pins, rejecting the
> followed page in those cases. That brings this function closer to
> follow_hugetlb_page().
>
> So we didn't care before, and strictly don't need to for now. But we'll
> care once slow-gup is switched over to hugetlb_follow_page_mask(). We'll
> also care about returning -EMLINK properly, as that's the gup-internal API
> meaning "we should unshare". It's not really needed for the follow_page()
> path, though.
>
> While at it, switch the try_grab_page() check to WARN_ON_ONCE(), to make
> clear that it should just never fail. When an error does happen, capture
> the errno instead of setting page==NULL.
>
> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
> mm/hugetlb.c | 31 ++++++++++++++++++++-----------
> 1 file changed, 20 insertions(+), 11 deletions(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index f75f5e78ff0b..27367edf5c72 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -6462,13 +6462,7 @@ struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
> struct page *page = NULL;
> spinlock_t *ptl;
> pte_t *pte, entry;
> -
> - /*
> - * FOLL_PIN is not supported for follow_page(). Ordinary GUP goes via
> - * follow_hugetlb_page().
> - */
> - if (WARN_ON_ONCE(flags & FOLL_PIN))
> - return NULL;
> + int ret;
>
> hugetlb_vma_lock_read(vma);
> pte = hugetlb_walk(vma, haddr, huge_page_size(h));
> @@ -6478,8 +6472,21 @@ struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
> ptl = huge_pte_lock(h, mm, pte);
> entry = huge_ptep_get(pte);
> if (pte_present(entry)) {
> - page = pte_page(entry) +
> - ((address & ~huge_page_mask(h)) >> PAGE_SHIFT);
> + page = pte_page(entry);
> +
> + if ((flags & FOLL_WRITE) && !huge_pte_write(entry)) {
> + page = NULL;
> + goto out;
> + }
> +
> + if (gup_must_unshare(vma, flags, page)) {
> + /* Tell the caller to do unsharing */
> + page = ERR_PTR(-EMLINK);
> + goto out;
> + }
No need to check if the page is writable (like all other callers and as
gup_must_unshare() documents -- "for which pages that are
write-protected in the page table")
if (!huge_pte_write(entry) && gup_must_unshare(vma, flags, page)) {
With that
Reviewed-by: David Hildenbrand <david@redhat.com>
--
Cheers,
David / dhildenb
2023-06-26 8:06 ` David Hildenbrand
@ 2023-06-26 16:23 ` Peter Xu
0 siblings, 0 replies; 17+ messages in thread
From: Peter Xu @ 2023-06-26 16:23 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-mm, linux-kernel, Lorenzo Stoakes, John Hubbard,
Andrew Morton, Mike Rapoport, Yang Shi, Andrea Arcangeli,
Vlastimil Babka, Kirill A . Shutemov, James Houghton,
Matthew Wilcox, Mike Kravetz, Hugh Dickins, Jason Gunthorpe
On Mon, Jun 26, 2023 at 10:06:24AM +0200, David Hildenbrand wrote:
> On 23.06.23 16:29, Peter Xu wrote:
> > follow_page() doesn't use FOLL_PIN, and hugetlb doesn't seem to be a
> > target of FOLL_WRITE either. However, add the checks anyway.
> >
> > Namely, check for the need to CoW when the write bit is missing, and for
> > proper unsharing of !AnonExclusive pages over R/O pins, rejecting the
> > followed page in those cases. That brings this function closer to
> > follow_hugetlb_page().
> >
> > So we didn't care before, and strictly don't need to for now. But we'll
> > care once slow-gup is switched over to hugetlb_follow_page_mask(). We'll
> > also care about returning -EMLINK properly, as that's the gup-internal API
> > meaning "we should unshare". It's not really needed for the follow_page()
> > path, though.
> >
> > While at it, switch the try_grab_page() check to WARN_ON_ONCE(), to make
> > clear that it should just never fail. When an error does happen, capture
> > the errno instead of setting page==NULL.
> >
> > Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> > mm/hugetlb.c | 31 ++++++++++++++++++++-----------
> > 1 file changed, 20 insertions(+), 11 deletions(-)
> >
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index f75f5e78ff0b..27367edf5c72 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -6462,13 +6462,7 @@ struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
> > struct page *page = NULL;
> > spinlock_t *ptl;
> > pte_t *pte, entry;
> > -
> > - /*
> > - * FOLL_PIN is not supported for follow_page(). Ordinary GUP goes via
> > - * follow_hugetlb_page().
> > - */
> > - if (WARN_ON_ONCE(flags & FOLL_PIN))
> > - return NULL;
> > + int ret;
> > hugetlb_vma_lock_read(vma);
> > pte = hugetlb_walk(vma, haddr, huge_page_size(h));
> > @@ -6478,8 +6472,21 @@ struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
> > ptl = huge_pte_lock(h, mm, pte);
> > entry = huge_ptep_get(pte);
> > if (pte_present(entry)) {
> > - page = pte_page(entry) +
> > - ((address & ~huge_page_mask(h)) >> PAGE_SHIFT);
> > + page = pte_page(entry);
> > +
> > + if ((flags & FOLL_WRITE) && !huge_pte_write(entry)) {
> > + page = NULL;
> > + goto out;
> > + }
> > +
> > + if (gup_must_unshare(vma, flags, page)) {
> > + /* Tell the caller to do unsharing */
> > + page = ERR_PTR(-EMLINK);
> > + goto out;
> > + }
>
>
> No need to check if the page is writable (like all other callers and as
> gup_must_unshare() documents -- "for which pages that are write-protected in
> the page table")
>
> if (!huge_pte_write(entry) && gup_must_unshare(vma, flags, page)) {
Sure.
I was wondering whether we should just allow passing "write" into
gup_must_unshare(). It would be a bit weird that it returns false directly
for writes, but hopefully that would make it easier to stay consistent.
I'll leave that as-is for now, anyway.
For this one I'll just merge it into:
if (!huge_pte_write(entry)) {
if (flags & FOLL_WRITE) {
page = NULL;
goto out;
}
if (gup_must_unshare(vma, flags, page)) {
/* Tell the caller to do unsharing */
page = ERR_PTR(-EMLINK);
goto out;
}
}
>
>
> With that
>
> Reviewed-by: David Hildenbrand <david@redhat.com>
Thanks,
--
Peter Xu
* [PATCH v3 3/8] mm/hugetlb: Add page_mask for hugetlb_follow_page_mask()
2023-06-23 14:29 [PATCH v3 0/8] mm/gup: Unify hugetlb, speed up thp Peter Xu
2023-06-23 14:29 ` [PATCH v3 1/8] mm/hugetlb: Handle FOLL_DUMP well in follow_page_mask() Peter Xu
2023-06-23 14:29 ` [PATCH v3 2/8] mm/hugetlb: Prepare hugetlb_follow_page_mask() for FOLL_PIN Peter Xu
@ 2023-06-23 14:29 ` Peter Xu
2023-06-23 14:29 ` [PATCH v3 4/8] mm/gup: Cleanup next_page handling Peter Xu
` (4 subsequent siblings)
7 siblings, 0 replies; 17+ messages in thread
From: Peter Xu @ 2023-06-23 14:29 UTC (permalink / raw)
To: linux-mm, linux-kernel
Cc: Lorenzo Stoakes, John Hubbard, Andrew Morton, Mike Rapoport,
David Hildenbrand, peterx, Yang Shi, Andrea Arcangeli,
Vlastimil Babka, Kirill A . Shutemov, James Houghton,
Matthew Wilcox, Mike Kravetz, Hugh Dickins, Jason Gunthorpe
follow_page() doesn't need it, but we'll start to need it when unifying gup
for hugetlb.
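Once slow-gup is switched over to this path later in the series, the mask
feeds the same stride computation the generic walker already uses for thp;
roughly:

	/* hugetlb side, e.g. 511 for a 2M page: */
	*page_mask = (1U << huge_page_order(h)) - 1;

	/* __get_user_pages() side: */
	page_increm = 1 + (~(start >> PAGE_SHIFT) & ctx.page_mask);
	if (page_increm > nr_pages)
		page_increm = nr_pages;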
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
include/linux/hugetlb.h | 8 +++++---
mm/gup.c | 3 ++-
mm/hugetlb.c | 5 ++++-
3 files changed, 11 insertions(+), 5 deletions(-)
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index beb7c63d2871..2e2d89e79d6c 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -131,7 +131,8 @@ int move_hugetlb_page_tables(struct vm_area_struct *vma,
int copy_hugetlb_page_range(struct mm_struct *, struct mm_struct *,
struct vm_area_struct *, struct vm_area_struct *);
struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
- unsigned long address, unsigned int flags);
+ unsigned long address, unsigned int flags,
+ unsigned int *page_mask);
long follow_hugetlb_page(struct mm_struct *, struct vm_area_struct *,
struct page **, unsigned long *, unsigned long *,
long, unsigned int, int *);
@@ -297,8 +298,9 @@ static inline void adjust_range_if_pmd_sharing_possible(
{
}
-static inline struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
- unsigned long address, unsigned int flags)
+static inline struct page *hugetlb_follow_page_mask(
+ struct vm_area_struct *vma, unsigned long address, unsigned int flags,
+ unsigned int *page_mask)
{
BUILD_BUG(); /* should never be compiled in if !CONFIG_HUGETLB_PAGE*/
}
diff --git a/mm/gup.c b/mm/gup.c
index abcd841d94b7..9fc9271cba8d 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -780,7 +780,8 @@ static struct page *follow_page_mask(struct vm_area_struct *vma,
* Ordinary GUP uses follow_hugetlb_page for hugetlb processing.
*/
if (is_vm_hugetlb_page(vma))
- return hugetlb_follow_page_mask(vma, address, flags);
+ return hugetlb_follow_page_mask(vma, address, flags,
+ &ctx->page_mask);
pgd = pgd_offset(mm, address);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 27367edf5c72..b4973edef9f2 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6454,7 +6454,8 @@ static inline bool __follow_hugetlb_must_fault(struct vm_area_struct *vma,
}
struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
- unsigned long address, unsigned int flags)
+ unsigned long address, unsigned int flags,
+ unsigned int *page_mask)
{
struct hstate *h = hstate_vma(vma);
struct mm_struct *mm = vma->vm_mm;
@@ -6502,6 +6503,8 @@ struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
page = ERR_PTR(ret);
goto out;
}
+
+ *page_mask = (1U << huge_page_order(h)) - 1;
}
out:
spin_unlock(ptl);
--
2.40.1
* [PATCH v3 4/8] mm/gup: Cleanup next_page handling
2023-06-23 14:29 [PATCH v3 0/8] mm/gup: Unify hugetlb, speed up thp Peter Xu
` (2 preceding siblings ...)
2023-06-23 14:29 ` [PATCH v3 3/8] mm/hugetlb: Add page_mask for hugetlb_follow_page_mask() Peter Xu
@ 2023-06-23 14:29 ` Peter Xu
2023-06-23 14:29 ` [PATCH v3 5/8] mm/gup: Accelerate thp gup even for "pages != NULL" Peter Xu
` (3 subsequent siblings)
7 siblings, 0 replies; 17+ messages in thread
From: Peter Xu @ 2023-06-23 14:29 UTC (permalink / raw)
To: linux-mm, linux-kernel
Cc: Lorenzo Stoakes, John Hubbard, Andrew Morton, Mike Rapoport,
David Hildenbrand, peterx, Yang Shi, Andrea Arcangeli,
Vlastimil Babka, Kirill A . Shutemov, James Houghton,
Matthew Wilcox, Mike Kravetz, Hugh Dickins, Jason Gunthorpe
The only path that doesn't use the generic "**pages" handling is the gate
vma. Make it use the same path, and move the next_page label up so that it
covers the "**pages" handling. This prepares for THP handling of "**pages".
Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
mm/gup.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/mm/gup.c b/mm/gup.c
index 9fc9271cba8d..4a00d609033e 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1124,7 +1124,7 @@ static long __get_user_pages(struct mm_struct *mm,
if (!vma && in_gate_area(mm, start)) {
ret = get_gate_page(mm, start & PAGE_MASK,
gup_flags, &vma,
- pages ? &pages[i] : NULL);
+ pages ? &page : NULL);
if (ret)
goto out;
ctx.page_mask = 0;
@@ -1194,19 +1194,18 @@ static long __get_user_pages(struct mm_struct *mm,
ret = PTR_ERR(page);
goto out;
}
-
- goto next_page;
} else if (IS_ERR(page)) {
ret = PTR_ERR(page);
goto out;
}
+next_page:
if (pages) {
pages[i] = page;
flush_anon_page(vma, page, start);
flush_dcache_page(page);
ctx.page_mask = 0;
}
-next_page:
+
page_increm = 1 + (~(start >> PAGE_SHIFT) & ctx.page_mask);
if (page_increm > nr_pages)
page_increm = nr_pages;
--
2.40.1
* [PATCH v3 5/8] mm/gup: Accelerate thp gup even for "pages != NULL"
2023-06-23 14:29 [PATCH v3 0/8] mm/gup: Unify hugetlb, speed up thp Peter Xu
` (3 preceding siblings ...)
2023-06-23 14:29 ` [PATCH v3 4/8] mm/gup: Cleanup next_page handling Peter Xu
@ 2023-06-23 14:29 ` Peter Xu
2023-06-23 14:29 ` [PATCH v3 6/8] mm/gup: Retire follow_hugetlb_page() Peter Xu
` (2 subsequent siblings)
7 siblings, 0 replies; 17+ messages in thread
From: Peter Xu @ 2023-06-23 14:29 UTC (permalink / raw)
To: linux-mm, linux-kernel
Cc: Lorenzo Stoakes, John Hubbard, Andrew Morton, Mike Rapoport,
David Hildenbrand, peterx, Yang Shi, Andrea Arcangeli,
Vlastimil Babka, Kirill A . Shutemov, James Houghton,
Matthew Wilcox, Mike Kravetz, Hugh Dickins, Jason Gunthorpe
The acceleration of THP was done with ctx.page_mask; however, it is
ignored if **pages is non-NULL.

The old optimization was introduced in 2013 in commit 240aadeedc4a ("mm:
accelerate mm_populate() treatment of THP pages"). It didn't explain why
the **pages non-NULL case can't be optimized. It's possible that at that
time the major goal was mm_populate(), for which that was enough.

Optimize thp for all cases, by properly looping over each subpage, doing
cache flushes, and boosting refcounts / pincounts where needed in one go.
This can be verified using gup_test below:
# chrt -f 1 ./gup_test -m 512 -t -L -n 1024 -r 10
Before: 13992.50 ( +-8.75%)
After: 378.50 (+-69.62%)
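Concretely, for a PMD-sized (2M on x86) thp with an aligned start address,
one iteration now does roughly the following (illustrative values,
simplified from the hunk in this patch):

	page_increm = 512;	/* 2M worth of 4K subpages */

	/* One extra grab covers subpages 1..511; subpage 0 already holds a ref. */
	folio = try_grab_folio(page, page_increm - 1, foll_flags);

	for (j = 0; j < page_increm; j++) {
		pages[i + j] = nth_page(page, j);
		flush_anon_page(vma, pages[i + j], start + j * PAGE_SIZE);
		flush_dcache_page(pages[i + j]);
	}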
Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
mm/gup.c | 51 ++++++++++++++++++++++++++++++++++++++++++++-------
1 file changed, 44 insertions(+), 7 deletions(-)
diff --git a/mm/gup.c b/mm/gup.c
index 4a00d609033e..22e32cff9ac7 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1199,16 +1199,53 @@ static long __get_user_pages(struct mm_struct *mm,
goto out;
}
next_page:
- if (pages) {
- pages[i] = page;
- flush_anon_page(vma, page, start);
- flush_dcache_page(page);
- ctx.page_mask = 0;
- }
-
page_increm = 1 + (~(start >> PAGE_SHIFT) & ctx.page_mask);
if (page_increm > nr_pages)
page_increm = nr_pages;
+
+ if (pages) {
+ struct page *subpage;
+ unsigned int j;
+
+ /*
+ * This must be a large folio (and doesn't need to
+ * be the whole folio; it can be part of it), do
+ * the refcount work for all the subpages too.
+ *
+ * NOTE: here the page may not be the head page
+ * e.g. when start addr is not thp-size aligned.
+ * try_grab_folio() should have taken care of tail
+ * pages.
+ */
+ if (page_increm > 1) {
+ struct folio *folio;
+
+ /*
+ * Since we already hold refcount on the
+ * large folio, this should never fail.
+ */
+ folio = try_grab_folio(page, page_increm - 1,
+ foll_flags);
+ if (WARN_ON_ONCE(!folio)) {
+ /*
+ * Release the 1st page ref if the
+ * folio is problematic, fail hard.
+ */
+ gup_put_folio(page_folio(page), 1,
+ foll_flags);
+ ret = -EFAULT;
+ goto out;
+ }
+ }
+
+ for (j = 0; j < page_increm; j++) {
+ subpage = nth_page(page, j);
+ pages[i + j] = subpage;
+ flush_anon_page(vma, subpage, start + j * PAGE_SIZE);
+ flush_dcache_page(subpage);
+ }
+ }
+
i += page_increm;
start += page_increm * PAGE_SIZE;
nr_pages -= page_increm;
--
2.40.1
* [PATCH v3 6/8] mm/gup: Retire follow_hugetlb_page()
2023-06-23 14:29 [PATCH v3 0/8] mm/gup: Unify hugetlb, speed up thp Peter Xu
` (4 preceding siblings ...)
2023-06-23 14:29 ` [PATCH v3 5/8] mm/gup: Accelerate thp gup even for "pages != NULL" Peter Xu
@ 2023-06-23 14:29 ` Peter Xu
2023-06-26 8:08 ` David Hildenbrand
2023-06-23 14:29 ` [PATCH v3 7/8] selftests/mm: Add -a to run_vmtests.sh Peter Xu
2023-06-23 14:29 ` [PATCH v3 8/8] selftests/mm: Add gup test matrix in run_vmtests.sh Peter Xu
7 siblings, 1 reply; 17+ messages in thread
From: Peter Xu @ 2023-06-23 14:29 UTC (permalink / raw)
To: linux-mm, linux-kernel
Cc: Lorenzo Stoakes, John Hubbard, Andrew Morton, Mike Rapoport,
David Hildenbrand, peterx, Yang Shi, Andrea Arcangeli,
Vlastimil Babka, Kirill A . Shutemov, James Houghton,
Matthew Wilcox, Mike Kravetz, Hugh Dickins, Jason Gunthorpe
Now __get_user_pages() should be well prepared to handle thp completely,
as well as hugetlb gup requests, even without hugetlb's special path.
Time to retire follow_hugetlb_page().
Tweak misc comments to reflect reality of follow_hugetlb_page()'s removal.
Signed-off-by: Peter Xu <peterx@redhat.com>
---
fs/userfaultfd.c | 2 +-
include/linux/hugetlb.h | 12 ---
mm/gup.c | 19 ----
mm/hugetlb.c | 224 ----------------------------------------
4 files changed, 1 insertion(+), 256 deletions(-)
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 7cecd49e078b..ae711f1d7a83 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -427,7 +427,7 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason)
*
* We also don't do userfault handling during
* coredumping. hugetlbfs has the special
- * follow_hugetlb_page() to skip missing pages in the
+ * hugetlb_follow_page_mask() to skip missing pages in the
* FOLL_DUMP case, anon memory also checks for FOLL_DUMP with
* the no_page_table() helper in follow_page_mask(), but the
* shmem_vm_ops->fault method is invoked even during
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 2e2d89e79d6c..bb5024718fc1 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -133,9 +133,6 @@ int copy_hugetlb_page_range(struct mm_struct *, struct mm_struct *,
struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
unsigned long address, unsigned int flags,
unsigned int *page_mask);
-long follow_hugetlb_page(struct mm_struct *, struct vm_area_struct *,
- struct page **, unsigned long *, unsigned long *,
- long, unsigned int, int *);
void unmap_hugepage_range(struct vm_area_struct *,
unsigned long, unsigned long, struct page *,
zap_flags_t);
@@ -305,15 +302,6 @@ static inline struct page *hugetlb_follow_page_mask(
BUILD_BUG(); /* should never be compiled in if !CONFIG_HUGETLB_PAGE*/
}
-static inline long follow_hugetlb_page(struct mm_struct *mm,
- struct vm_area_struct *vma, struct page **pages,
- unsigned long *position, unsigned long *nr_pages,
- long i, unsigned int flags, int *nonblocking)
-{
- BUG();
- return 0;
-}
-
static inline int copy_hugetlb_page_range(struct mm_struct *dst,
struct mm_struct *src,
struct vm_area_struct *dst_vma,
diff --git a/mm/gup.c b/mm/gup.c
index 22e32cff9ac7..ac61244e6fca 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -775,9 +775,6 @@ static struct page *follow_page_mask(struct vm_area_struct *vma,
* Call hugetlb_follow_page_mask for hugetlb vmas as it will use
* special hugetlb page table walking code. This eliminates the
* need to check for hugetlb entries in the general walking code.
- *
- * hugetlb_follow_page_mask is only for follow_page() handling here.
- * Ordinary GUP uses follow_hugetlb_page for hugetlb processing.
*/
if (is_vm_hugetlb_page(vma))
return hugetlb_follow_page_mask(vma, address, flags,
@@ -1138,22 +1135,6 @@ static long __get_user_pages(struct mm_struct *mm,
ret = check_vma_flags(vma, gup_flags);
if (ret)
goto out;
-
- if (is_vm_hugetlb_page(vma)) {
- i = follow_hugetlb_page(mm, vma, pages,
- &start, &nr_pages, i,
- gup_flags, locked);
- if (!*locked) {
- /*
- * We've got a VM_FAULT_RETRY
- * and we've lost mmap_lock.
- * We must stop here.
- */
- BUG_ON(gup_flags & FOLL_NOWAIT);
- goto out;
- }
- continue;
- }
}
retry:
/*
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index b4973edef9f2..50a3579782a5 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5721,7 +5721,6 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,
/*
* Return whether there is a pagecache page to back given address within VMA.
- * Caller follow_hugetlb_page() holds page_table_lock so we cannot lock_page.
*/
static bool hugetlbfs_pagecache_present(struct hstate *h,
struct vm_area_struct *vma, unsigned long address)
@@ -6422,37 +6421,6 @@ int hugetlb_mfill_atomic_pte(pte_t *dst_pte,
}
#endif /* CONFIG_USERFAULTFD */
-static void record_subpages(struct page *page, struct vm_area_struct *vma,
- int refs, struct page **pages)
-{
- int nr;
-
- for (nr = 0; nr < refs; nr++) {
- if (likely(pages))
- pages[nr] = nth_page(page, nr);
- }
-}
-
-static inline bool __follow_hugetlb_must_fault(struct vm_area_struct *vma,
- unsigned int flags, pte_t *pte,
- bool *unshare)
-{
- pte_t pteval = huge_ptep_get(pte);
-
- *unshare = false;
- if (is_swap_pte(pteval))
- return true;
- if (huge_pte_write(pteval))
- return false;
- if (flags & FOLL_WRITE)
- return true;
- if (gup_must_unshare(vma, flags, pte_page(pteval))) {
- *unshare = true;
- return true;
- }
- return false;
-}
-
struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
unsigned long address, unsigned int flags,
unsigned int *page_mask)
@@ -6522,198 +6490,6 @@ struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
return page;
}
-long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
- struct page **pages, unsigned long *position,
- unsigned long *nr_pages, long i, unsigned int flags,
- int *locked)
-{
- unsigned long pfn_offset;
- unsigned long vaddr = *position;
- unsigned long remainder = *nr_pages;
- struct hstate *h = hstate_vma(vma);
- int err = -EFAULT, refs;
-
- while (vaddr < vma->vm_end && remainder) {
- pte_t *pte;
- spinlock_t *ptl = NULL;
- bool unshare = false;
- int absent;
- struct page *page;
-
- /*
- * If we have a pending SIGKILL, don't keep faulting pages and
- * potentially allocating memory.
- */
- if (fatal_signal_pending(current)) {
- remainder = 0;
- break;
- }
-
- hugetlb_vma_lock_read(vma);
- /*
- * Some archs (sparc64, sh*) have multiple pte_ts to
- * each hugepage. We have to make sure we get the
- * first, for the page indexing below to work.
- *
- * Note that page table lock is not held when pte is null.
- */
- pte = hugetlb_walk(vma, vaddr & huge_page_mask(h),
- huge_page_size(h));
- if (pte)
- ptl = huge_pte_lock(h, mm, pte);
- absent = !pte || huge_pte_none(huge_ptep_get(pte));
-
- /*
- * When coredumping, it suits get_dump_page if we just return
- * an error where there's an empty slot with no huge pagecache
- * to back it. This way, we avoid allocating a hugepage, and
- * the sparse dumpfile avoids allocating disk blocks, but its
- * huge holes still show up with zeroes where they need to be.
- */
- if (absent && (flags & FOLL_DUMP) &&
- !hugetlbfs_pagecache_present(h, vma, vaddr)) {
- if (pte)
- spin_unlock(ptl);
- hugetlb_vma_unlock_read(vma);
- remainder = 0;
- break;
- }
-
- /*
- * We need call hugetlb_fault for both hugepages under migration
- * (in which case hugetlb_fault waits for the migration,) and
- * hwpoisoned hugepages (in which case we need to prevent the
- * caller from accessing to them.) In order to do this, we use
- * here is_swap_pte instead of is_hugetlb_entry_migration and
- * is_hugetlb_entry_hwpoisoned. This is because it simply covers
- * both cases, and because we can't follow correct pages
- * directly from any kind of swap entries.
- */
- if (absent ||
- __follow_hugetlb_must_fault(vma, flags, pte, &unshare)) {
- vm_fault_t ret;
- unsigned int fault_flags = 0;
-
- if (pte)
- spin_unlock(ptl);
- hugetlb_vma_unlock_read(vma);
-
- if (flags & FOLL_WRITE)
- fault_flags |= FAULT_FLAG_WRITE;
- else if (unshare)
- fault_flags |= FAULT_FLAG_UNSHARE;
- if (locked) {
- fault_flags |= FAULT_FLAG_ALLOW_RETRY |
- FAULT_FLAG_KILLABLE;
- if (flags & FOLL_INTERRUPTIBLE)
- fault_flags |= FAULT_FLAG_INTERRUPTIBLE;
- }
- if (flags & FOLL_NOWAIT)
- fault_flags |= FAULT_FLAG_ALLOW_RETRY |
- FAULT_FLAG_RETRY_NOWAIT;
- if (flags & FOLL_TRIED) {
- /*
- * Note: FAULT_FLAG_ALLOW_RETRY and
- * FAULT_FLAG_TRIED can co-exist
- */
- fault_flags |= FAULT_FLAG_TRIED;
- }
- ret = hugetlb_fault(mm, vma, vaddr, fault_flags);
- if (ret & VM_FAULT_ERROR) {
- err = vm_fault_to_errno(ret, flags);
- remainder = 0;
- break;
- }
- if (ret & VM_FAULT_RETRY) {
- if (locked &&
- !(fault_flags & FAULT_FLAG_RETRY_NOWAIT))
- *locked = 0;
- *nr_pages = 0;
- /*
- * VM_FAULT_RETRY must not return an
- * error, it will return zero
- * instead.
- *
- * No need to update "position" as the
- * caller will not check it after
- * *nr_pages is set to 0.
- */
- return i;
- }
- continue;
- }
-
- pfn_offset = (vaddr & ~huge_page_mask(h)) >> PAGE_SHIFT;
- page = pte_page(huge_ptep_get(pte));
-
- VM_BUG_ON_PAGE((flags & FOLL_PIN) && PageAnon(page) &&
- !PageAnonExclusive(page), page);
-
- /*
- * If subpage information not requested, update counters
- * and skip the same_page loop below.
- */
- if (!pages && !pfn_offset &&
- (vaddr + huge_page_size(h) < vma->vm_end) &&
- (remainder >= pages_per_huge_page(h))) {
- vaddr += huge_page_size(h);
- remainder -= pages_per_huge_page(h);
- i += pages_per_huge_page(h);
- spin_unlock(ptl);
- hugetlb_vma_unlock_read(vma);
- continue;
- }
-
- /* vaddr may not be aligned to PAGE_SIZE */
- refs = min3(pages_per_huge_page(h) - pfn_offset, remainder,
- (vma->vm_end - ALIGN_DOWN(vaddr, PAGE_SIZE)) >> PAGE_SHIFT);
-
- if (pages)
- record_subpages(nth_page(page, pfn_offset),
- vma, refs,
- likely(pages) ? pages + i : NULL);
-
- if (pages) {
- /*
- * try_grab_folio() should always succeed here,
- * because: a) we hold the ptl lock, and b) we've just
- * checked that the huge page is present in the page
- * tables. If the huge page is present, then the tail
- * pages must also be present. The ptl prevents the
- * head page and tail pages from being rearranged in
- * any way. As this is hugetlb, the pages will never
- * be p2pdma or not longterm pinable. So this page
- * must be available at this point, unless the page
- * refcount overflowed:
- */
- if (WARN_ON_ONCE(!try_grab_folio(pages[i], refs,
- flags))) {
- spin_unlock(ptl);
- hugetlb_vma_unlock_read(vma);
- remainder = 0;
- err = -ENOMEM;
- break;
- }
- }
-
- vaddr += (refs << PAGE_SHIFT);
- remainder -= refs;
- i += refs;
-
- spin_unlock(ptl);
- hugetlb_vma_unlock_read(vma);
- }
- *nr_pages = remainder;
- /*
- * setting position is actually required only if remainder is
- * not zero but it's faster not to add a "if (remainder)"
- * branch.
- */
- *position = vaddr;
-
- return i ? i : err;
-}
-
long hugetlb_change_protection(struct vm_area_struct *vma,
unsigned long address, unsigned long end,
pgprot_t newprot, unsigned long cp_flags)
--
2.40.1
* Re: [PATCH v3 6/8] mm/gup: Retire follow_hugetlb_page()
2023-06-23 14:29 ` [PATCH v3 6/8] mm/gup: Retire follow_hugetlb_page() Peter Xu
@ 2023-06-26 8:08 ` David Hildenbrand
0 siblings, 0 replies; 17+ messages in thread
From: David Hildenbrand @ 2023-06-26 8:08 UTC (permalink / raw)
To: Peter Xu, linux-mm, linux-kernel
Cc: Lorenzo Stoakes, John Hubbard, Andrew Morton, Mike Rapoport,
Yang Shi, Andrea Arcangeli, Vlastimil Babka, Kirill A . Shutemov,
James Houghton, Matthew Wilcox, Mike Kravetz, Hugh Dickins,
Jason Gunthorpe
On 23.06.23 16:29, Peter Xu wrote:
> Now __get_user_pages() should be well prepared to handle thp completely,
> as well as hugetlb gup requests, even without hugetlb's special path.
>
> Time to retire follow_hugetlb_page().
>
> Tweak misc comments to reflect reality of follow_hugetlb_page()'s removal.
>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
> fs/userfaultfd.c | 2 +-
> include/linux/hugetlb.h | 12 ---
> mm/gup.c | 19 ----
> mm/hugetlb.c | 224 ----------------------------------------
> 4 files changed, 1 insertion(+), 256 deletions(-)
Acked-by: David Hildenbrand <david@redhat.com>
--
Cheers,
David / dhildenb
* [PATCH v3 7/8] selftests/mm: Add -a to run_vmtests.sh
2023-06-23 14:29 [PATCH v3 0/8] mm/gup: Unify hugetlb, speed up thp Peter Xu
` (5 preceding siblings ...)
2023-06-23 14:29 ` [PATCH v3 6/8] mm/gup: Retire follow_hugetlb_page() Peter Xu
@ 2023-06-23 14:29 ` Peter Xu
2023-06-26 8:10 ` David Hildenbrand
2023-06-23 14:29 ` [PATCH v3 8/8] selftests/mm: Add gup test matrix in run_vmtests.sh Peter Xu
7 siblings, 1 reply; 17+ messages in thread
From: Peter Xu @ 2023-06-23 14:29 UTC (permalink / raw)
To: linux-mm, linux-kernel
Cc: Lorenzo Stoakes, John Hubbard, Andrew Morton, Mike Rapoport,
David Hildenbrand, peterx, Yang Shi, Andrea Arcangeli,
Vlastimil Babka, Kirill A . Shutemov, James Houghton,
Matthew Wilcox, Mike Kravetz, Hugh Dickins, Jason Gunthorpe
Allows to specify optional tests in run_vmtests.sh, where we can run time
consuming test matrix only when user specified "-a".
Signed-off-by: Peter Xu <peterx@redhat.com>
---
tools/testing/selftests/mm/run_vmtests.sh | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/mm/run_vmtests.sh b/tools/testing/selftests/mm/run_vmtests.sh
index 3f26f6e15b2a..824e651f62f4 100644
--- a/tools/testing/selftests/mm/run_vmtests.sh
+++ b/tools/testing/selftests/mm/run_vmtests.sh
@@ -12,11 +12,14 @@ exitcode=0
usage() {
cat <<EOF
-usage: ${BASH_SOURCE[0]:-$0} [ -h | -t "<categories>"]
+usage: ${BASH_SOURCE[0]:-$0} [ options ]
+
+ -a: run all tests, including extra ones
-t: specify specific categories to tests to run
-h: display this message
-The default behavior is to run all tests.
+The default behavior is to run required tests only. If -a is specified,
+will run all tests.
Alternatively, specific groups tests can be run by passing a string
to the -t argument containing one or more of the following categories
@@ -60,9 +63,11 @@ EOF
exit 0
}
+RUN_ALL=false
-while getopts "ht:" OPT; do
+while getopts "aht:" OPT; do
case ${OPT} in
+ "a") RUN_ALL=true ;;
"h") usage ;;
"t") VM_SELFTEST_ITEMS=${OPTARG} ;;
esac
--
2.40.1
* Re: [PATCH v3 7/8] selftests/mm: Add -a to run_vmtests.sh
2023-06-23 14:29 ` [PATCH v3 7/8] selftests/mm: Add -a to run_vmtests.sh Peter Xu
@ 2023-06-26 8:10 ` David Hildenbrand
2023-06-26 16:24 ` Peter Xu
0 siblings, 1 reply; 17+ messages in thread
From: David Hildenbrand @ 2023-06-26 8:10 UTC (permalink / raw)
To: Peter Xu, linux-mm, linux-kernel
Cc: Lorenzo Stoakes, John Hubbard, Andrew Morton, Mike Rapoport,
Yang Shi, Andrea Arcangeli, Vlastimil Babka, Kirill A . Shutemov,
James Houghton, Matthew Wilcox, Mike Kravetz, Hugh Dickins,
Jason Gunthorpe
On 23.06.23 16:29, Peter Xu wrote:
> Allows to specify optional tests in run_vmtests.sh, where we can run time
> consuming test matrix only when user specified "-a".
I'd have used something like "-e: extended tests that might be more
time-consuming".
Acked-by: David Hildenbrand <david@redhat.com>
--
Cheers,
David / dhildenb
* Re: [PATCH v3 7/8] selftests/mm: Add -a to run_vmtests.sh
2023-06-26 8:10 ` David Hildenbrand
@ 2023-06-26 16:24 ` Peter Xu
2023-06-26 20:24 ` John Hubbard
0 siblings, 1 reply; 17+ messages in thread
From: Peter Xu @ 2023-06-26 16:24 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-mm, linux-kernel, Lorenzo Stoakes, John Hubbard,
Andrew Morton, Mike Rapoport, Yang Shi, Andrea Arcangeli,
Vlastimil Babka, Kirill A . Shutemov, James Houghton,
Matthew Wilcox, Mike Kravetz, Hugh Dickins, Jason Gunthorpe
On Mon, Jun 26, 2023 at 10:10:26AM +0200, David Hildenbrand wrote:
> On 23.06.23 16:29, Peter Xu wrote:
> > Allows to specify optional tests in run_vmtests.sh, where we can run time
> > consuming test matrix only when user specified "-a".
>
> I'd have used something like "-e: extended tests that might be more
> time-consuming".
'-a' stands for "all" here, I worry '-e' may be misread into running "extra
tests only".
>
> Acked-by: David Hildenbrand <david@redhat.com>
Thanks,
--
Peter Xu
* Re: [PATCH v3 7/8] selftests/mm: Add -a to run_vmtests.sh
2023-06-26 16:24 ` Peter Xu
@ 2023-06-26 20:24 ` John Hubbard
2023-06-26 20:40 ` Peter Xu
0 siblings, 1 reply; 17+ messages in thread
From: John Hubbard @ 2023-06-26 20:24 UTC (permalink / raw)
To: Peter Xu, David Hildenbrand
Cc: linux-mm, linux-kernel, Lorenzo Stoakes, Andrew Morton,
Mike Rapoport, Yang Shi, Andrea Arcangeli, Vlastimil Babka,
Kirill A . Shutemov, James Houghton, Matthew Wilcox, Mike Kravetz,
Hugh Dickins, Jason Gunthorpe
On 6/26/23 09:24, Peter Xu wrote:
> On Mon, Jun 26, 2023 at 10:10:26AM +0200, David Hildenbrand wrote:
>> On 23.06.23 16:29, Peter Xu wrote:
>>> Allows to specify optional tests in run_vmtests.sh, where we can run time
>>> consuming test matrix only when user specified "-a".
>>
>> I'd have used something like "-e: extended tests that might be more
>> time-consuming".
>
> '-a' stands for "all" here, I worry '-e' may be misread into running "extra
> tests only".
>
It's getting to the point where long options would help. This one
could be "--stress-tests" or "--all", or "--include-long-running",
for example.
I hesitate to suggest it because it's extra work, but at some point
it would be nice. The options are already hopelessly un-guessable.
thanks,
--
John Hubbard
NVIDIA
* Re: [PATCH v3 7/8] selftests/mm: Add -a to run_vmtests.sh
2023-06-26 20:24 ` John Hubbard
@ 2023-06-26 20:40 ` Peter Xu
0 siblings, 0 replies; 17+ messages in thread
From: Peter Xu @ 2023-06-26 20:40 UTC (permalink / raw)
To: John Hubbard
Cc: David Hildenbrand, linux-mm, linux-kernel, Lorenzo Stoakes,
Andrew Morton, Mike Rapoport, Yang Shi, Andrea Arcangeli,
Vlastimil Babka, Kirill A . Shutemov, James Houghton,
Matthew Wilcox, Mike Kravetz, Hugh Dickins, Jason Gunthorpe
On Mon, Jun 26, 2023 at 01:24:45PM -0700, John Hubbard wrote:
> On 6/26/23 09:24, Peter Xu wrote:
> > On Mon, Jun 26, 2023 at 10:10:26AM +0200, David Hildenbrand wrote:
> > > On 23.06.23 16:29, Peter Xu wrote:
> > > > Allows to specify optional tests in run_vmtests.sh, where we can run time
> > > > consuming test matrix only when user specified "-a".
> > >
> > > I'd have used something like "-e: extended tests that might be more
> > > time-consuming".
> >
> > '-a' stands for "all" here, I worry '-e' may be misread into running "extra
> > tests only".
> >
>
> It's getting to the point where long options would help. This one
> could be "--stress-tests" or "--all", or "--include-long-running",
> for example.
>
> I hesitate to suggest it because it's extra work, but at some point
> it would be nice. The options are already hopelessly un-guessable.
Thanks for being considerate, John. Yes, I do hope this series can
converge soon; hopefully any further test patches can be built on top.

I would be fine with either option (-a/-e), as long as there's an easy way
to dump a help message, and "run_vmtests.sh -h" always works for me there.

What's more "hopeless" to me as a user is that there's no help message for
gup_test.c even though its parameter list is crazily long. I used to
"convince" myself that gup is just so special that one should simply know
which option goes to which ioctl, or better, read the .c file. But I know
that's an excuse.
Thanks,
--
Peter Xu
* [PATCH v3 8/8] selftests/mm: Add gup test matrix in run_vmtests.sh
2023-06-23 14:29 [PATCH v3 0/8] mm/gup: Unify hugetlb, speed up thp Peter Xu
` (6 preceding siblings ...)
2023-06-23 14:29 ` [PATCH v3 7/8] selftests/mm: Add -a to run_vmtests.sh Peter Xu
@ 2023-06-23 14:29 ` Peter Xu
2023-06-26 8:11 ` David Hildenbrand
7 siblings, 1 reply; 17+ messages in thread
From: Peter Xu @ 2023-06-23 14:29 UTC (permalink / raw)
To: linux-mm, linux-kernel
Cc: Lorenzo Stoakes, John Hubbard, Andrew Morton, Mike Rapoport,
David Hildenbrand, peterx, Yang Shi, Andrea Arcangeli,
Vlastimil Babka, Kirill A . Shutemov, James Houghton,
Matthew Wilcox, Mike Kravetz, Hugh Dickins, Jason Gunthorpe
Add a matrix for testing gup based on the current gup_test. Only run the
matrix when -a is specified because it's a bit slow.
It covers:
- Different types of huge pages: thp, hugetlb, or no huge page
- Permissions: Write / Read-only
- Fast-gup, with/without
- Types of the GUP: pin / gup / longterm pins
- Shared / Private memories
- GUP size: 1 / 512 / random page sizes
Signed-off-by: Peter Xu <peterx@redhat.com>
---
tools/testing/selftests/mm/run_vmtests.sh | 37 ++++++++++++++++++++---
1 file changed, 32 insertions(+), 5 deletions(-)
diff --git a/tools/testing/selftests/mm/run_vmtests.sh b/tools/testing/selftests/mm/run_vmtests.sh
index 824e651f62f4..9666c0c171ab 100644
--- a/tools/testing/selftests/mm/run_vmtests.sh
+++ b/tools/testing/selftests/mm/run_vmtests.sh
@@ -90,6 +90,30 @@ test_selected() {
fi
}
+run_gup_matrix() {
+ # -t: thp=on, -T: thp=off, -H: hugetlb=on
+ local hugetlb_mb=$(( needmem_KB / 1024 ))
+
+ for huge in -t -T "-H -m $hugetlb_mb"; do
+ # -u: gup-fast, -U: gup-basic, -a: pin-fast, -b: pin-basic, -L: pin-longterm
+ for test_cmd in -u -U -a -b -L; do
+ # -w: write=1, -W: write=0
+ for write in -w -W; do
+ # -S: shared
+ for share in -S " "; do
+ # -n: How many pages to fetch together? 512 is special
+ # because it's default thp size (or 2M on x86), 123 to
+ # just test partial gup when hit a huge in whatever form
+ for num in "-n 1" "-n 512" "-n 123"; do
+ CATEGORY="gup_test" run_test ./gup_test \
+ $huge $test_cmd $write $share $num
+ done
+ done
+ done
+ done
+ done
+}
+
# get huge pagesize and freepages from /proc/meminfo
while read -r name size unit; do
if [ "$name" = "HugePages_Free:" ]; then
@@ -194,13 +218,16 @@ fi
CATEGORY="mmap" run_test ./map_fixed_noreplace
-# get_user_pages_fast() benchmark
-CATEGORY="gup_test" run_test ./gup_test -u
-# pin_user_pages_fast() benchmark
-CATEGORY="gup_test" run_test ./gup_test -a
+if $RUN_ALL; then
+ run_gup_matrix
+else
+ # get_user_pages_fast() benchmark
+ CATEGORY="gup_test" run_test ./gup_test -u
+ # pin_user_pages_fast() benchmark
+ CATEGORY="gup_test" run_test ./gup_test -a
+fi
# Dump pages 0, 19, and 4096, using pin_user_pages:
CATEGORY="gup_test" run_test ./gup_test -ct -F 0x1 0 19 0x1000
-
CATEGORY="gup_test" run_test ./gup_longterm
CATEGORY="userfaultfd" run_test ./uffd-unit-tests
--
2.40.1
* Re: [PATCH v3 8/8] selftests/mm: Add gup test matrix in run_vmtests.sh
2023-06-23 14:29 ` [PATCH v3 8/8] selftests/mm: Add gup test matrix in run_vmtests.sh Peter Xu
@ 2023-06-26 8:11 ` David Hildenbrand
0 siblings, 0 replies; 17+ messages in thread
From: David Hildenbrand @ 2023-06-26 8:11 UTC (permalink / raw)
To: Peter Xu, linux-mm, linux-kernel
Cc: Lorenzo Stoakes, John Hubbard, Andrew Morton, Mike Rapoport,
Yang Shi, Andrea Arcangeli, Vlastimil Babka, Kirill A . Shutemov,
James Houghton, Matthew Wilcox, Mike Kravetz, Hugh Dickins,
Jason Gunthorpe
On 23.06.23 16:29, Peter Xu wrote:
> Add a matrix for testing gup based on the current gup_test. Only run the
> matrix when -a is specified because it's a bit slow.
>
> It covers:
>
> - Different types of huge pages: thp, hugetlb, or no huge page
> - Permissions: Write / Read-only
> - Fast-gup, with/without
> - Types of the GUP: pin / gup / longterm pins
> - Shared / Private memories
> - GUP size: 1 / 512 / random page sizes
>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
Acked-by: David Hildenbrand <david@redhat.com>
--
Cheers,
David / dhildenb