* [PATCH v2 00/28] Convert GUP to folios
@ 2022-01-10  4:23 Matthew Wilcox (Oracle)
  2022-01-10  4:23 ` [PATCH v2 01/28] gup: Remove for_each_compound_range() Matthew Wilcox (Oracle)
                   ` (29 more replies)
  0 siblings, 30 replies; 96+ messages in thread
From: Matthew Wilcox (Oracle) @ 2022-01-10  4:23 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox (Oracle), John Hubbard, Christoph Hellwig,
	William Kucharski, linux-kernel, Jason Gunthorpe
This patch series is against my current folio for-next branch.  I know
it won't apply to sfr's next tree, and it's not for-next material yet.
I intend to submit it for 5.18 after I've rebased it to one of the
5.17-rc releases.
The overall effect of this (ignoring the primary "preparing for folios
that are not PAGE or PMD sized" purpose) is to reduce the size of gup.o
by ~700 bytes in the config I normally test with.
This patchset just converts existing implementations to use folios.
There's no new API for consumers here to provide information in a more
efficient (address, length) format.  That will be a separate patchset.
In v2, I've tried to address all the comments from the reviews I got
on v1.  Apologies if I missed anything; I got a lot of good feedback.
Primarily I separated out the folio changes (later) from the "While
I'm looking at this ..." changes (earlier).  I'm not sure the story
of the patchset is necessarily coherent this way, but it should be
easier to review.
Another big change is that compound_pincount is now available to all
compound pages, not just those that are order-2-or-higher.  Patch 11.
I did notice one bug in my original patchset which I'm disappointed GCC
didn't diagnose:
		pages[nr++] = nth_page(page, nr);
Given the massive reorg of the patchset, I didn't feel right using
anyone's SoB from v1 on any of the patches.  But, despite the increased
number of patches, I hope it's easier to review than v1.
And I'd dearly love a better name than 'folio_nth'.  page_nth() is
a temporary construct, so doesn't need a better name.  If you need
context, it's in the gup_folio_range_next() patch and its job
is to tell you, given a page and a folio, what # page it is within
a folio (so a number between [0 and folio_nr_pages())).
Matthew Wilcox (Oracle) (28):
  gup: Remove for_each_compound_range()
  gup: Remove for_each_compound_head()
  gup: Change the calling convention for compound_range_next()
  gup: Optimise compound_range_next()
  gup: Change the calling convention for compound_next()
  gup: Fix some contiguous memmap assumptions
  gup: Remove an assumption of a contiguous memmap
  gup: Handle page split race more efficiently
  gup: Turn hpage_pincount_add() into page_pincount_add()
  gup: Turn hpage_pincount_sub() into page_pincount_sub()
  mm: Make compound_pincount always available
  mm: Add folio_put_refs()
  mm: Add folio_pincount_ptr()
  mm: Convert page_maybe_dma_pinned() to use a folio
  gup: Add try_get_folio() and try_grab_folio()
  mm: Remove page_cache_add_speculative() and
    page_cache_get_speculative()
  gup: Add gup_put_folio()
  hugetlb: Use try_grab_folio() instead of try_grab_compound_head()
  gup: Convert try_grab_page() to call try_grab_folio()
  gup: Convert gup_pte_range() to use a folio
  gup: Convert gup_hugepte() to use a folio
  gup: Convert gup_huge_pmd() to use a folio
  gup: Convert gup_huge_pud() to use a folio
  gup: Convert gup_huge_pgd() to use a folio
  gup: Convert compound_next() to gup_folio_next()
  gup: Convert compound_range_next() to gup_folio_range_next()
  mm: Add isolate_lru_folio()
  gup: Convert check_and_migrate_movable_pages() to use a folio
 Documentation/core-api/pin_user_pages.rst |  18 +-
 arch/powerpc/include/asm/mmu_context.h    |   1 -
 include/linux/mm.h                        |  70 +++--
 include/linux/mm_types.h                  |  13 +-
 include/linux/pagemap.h                   |  11 -
 mm/debug.c                                |  14 +-
 mm/folio-compat.c                         |   8 +
 mm/gup.c                                  | 359 ++++++++++------------
 mm/hugetlb.c                              |   7 +-
 mm/internal.h                             |   8 +-
 mm/page_alloc.c                           |   3 +-
 mm/rmap.c                                 |   6 +-
 mm/vmscan.c                               |  43 ++-
 13 files changed, 263 insertions(+), 298 deletions(-)
-- 
2.33.0
^ permalink raw reply	[flat|nested] 96+ messages in thread
* [PATCH v2 01/28] gup: Remove for_each_compound_range()
  2022-01-10  4:23 [PATCH v2 00/28] Convert GUP to folios Matthew Wilcox (Oracle)
@ 2022-01-10  4:23 ` Matthew Wilcox (Oracle)
  2022-01-10  8:22   ` Christoph Hellwig
  2022-01-11  1:07   ` John Hubbard
  2022-01-10  4:23 ` [PATCH v2 02/28] gup: Remove for_each_compound_head() Matthew Wilcox (Oracle)
                   ` (28 subsequent siblings)
  29 siblings, 2 replies; 96+ messages in thread
From: Matthew Wilcox (Oracle) @ 2022-01-10  4:23 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox (Oracle), John Hubbard, Christoph Hellwig,
	William Kucharski, linux-kernel, Jason Gunthorpe
This macro doesn't simplify the users; it's easier to just call
compound_range_next() inside the loop.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/gup.c | 12 ++----------
 1 file changed, 2 insertions(+), 10 deletions(-)
diff --git a/mm/gup.c b/mm/gup.c
index 2c51e9748a6a..7a07e0c00bf5 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -236,9 +236,6 @@ static inline void compound_range_next(unsigned long i, unsigned long npages,
 	struct page *next, *page;
 	unsigned int nr = 1;
 
-	if (i >= npages)
-		return;
-
 	next = *list + i;
 	page = compound_head(next);
 	if (PageCompound(page) && compound_order(page) >= 1)
@@ -249,12 +246,6 @@ static inline void compound_range_next(unsigned long i, unsigned long npages,
 	*ntails = nr;
 }
 
-#define for_each_compound_range(__i, __list, __npages, __head, __ntails) \
-	for (__i = 0, \
-	     compound_range_next(__i, __npages, __list, &(__head), &(__ntails)); \
-	     __i < __npages; __i += __ntails, \
-	     compound_range_next(__i, __npages, __list, &(__head), &(__ntails)))
-
 static inline void compound_next(unsigned long i, unsigned long npages,
 				 struct page **list, struct page **head,
 				 unsigned int *ntails)
@@ -371,7 +362,8 @@ void unpin_user_page_range_dirty_lock(struct page *page, unsigned long npages,
 	struct page *head;
 	unsigned int ntails;
 
-	for_each_compound_range(index, &page, npages, head, ntails) {
+	for (index = 0; index < npages; index += ntails) {
+		compound_range_next(index, npages, &page, &head, &ntails);
 		if (make_dirty && !PageDirty(head))
 			set_page_dirty_lock(head);
 		put_compound_head(head, ntails, FOLL_PIN);
-- 
2.33.0
^ permalink raw reply related	[flat|nested] 96+ messages in thread
* [PATCH v2 02/28] gup: Remove for_each_compound_head()
  2022-01-10  4:23 [PATCH v2 00/28] Convert GUP to folios Matthew Wilcox (Oracle)
  2022-01-10  4:23 ` [PATCH v2 01/28] gup: Remove for_each_compound_range() Matthew Wilcox (Oracle)
@ 2022-01-10  4:23 ` Matthew Wilcox (Oracle)
  2022-01-10  8:23   ` Christoph Hellwig
  2022-01-11  1:11   ` John Hubbard
  2022-01-10  4:23 ` [PATCH v2 03/28] gup: Change the calling convention for compound_range_next() Matthew Wilcox (Oracle)
                   ` (27 subsequent siblings)
  29 siblings, 2 replies; 96+ messages in thread
From: Matthew Wilcox (Oracle) @ 2022-01-10  4:23 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox (Oracle), John Hubbard, Christoph Hellwig,
	William Kucharski, linux-kernel, Jason Gunthorpe
This macro doesn't simplify the users; it's easier to just call
compound_next() inside a standard loop.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/gup.c | 16 +++++-----------
 1 file changed, 5 insertions(+), 11 deletions(-)
diff --git a/mm/gup.c b/mm/gup.c
index 7a07e0c00bf5..86f8d843de72 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -253,9 +253,6 @@ static inline void compound_next(unsigned long i, unsigned long npages,
 	struct page *page;
 	unsigned int nr;
 
-	if (i >= npages)
-		return;
-
 	page = compound_head(list[i]);
 	for (nr = i + 1; nr < npages; nr++) {
 		if (compound_head(list[nr]) != page)
@@ -266,12 +263,6 @@ static inline void compound_next(unsigned long i, unsigned long npages,
 	*ntails = nr - i;
 }
 
-#define for_each_compound_head(__i, __list, __npages, __head, __ntails) \
-	for (__i = 0, \
-	     compound_next(__i, __npages, __list, &(__head), &(__ntails)); \
-	     __i < __npages; __i += __ntails, \
-	     compound_next(__i, __npages, __list, &(__head), &(__ntails)))
-
 /**
  * unpin_user_pages_dirty_lock() - release and optionally dirty gup-pinned pages
  * @pages:  array of pages to be maybe marked dirty, and definitely released.
@@ -306,7 +297,8 @@ void unpin_user_pages_dirty_lock(struct page **pages, unsigned long npages,
 		return;
 	}
 
-	for_each_compound_head(index, pages, npages, head, ntails) {
+	for (index = 0; index < npages; index += ntails) {
+		compound_next(index, npages, pages, &head, &ntails);
 		/*
 		 * Checking PageDirty at this point may race with
 		 * clear_page_dirty_for_io(), but that's OK. Two key
@@ -394,8 +386,10 @@ void unpin_user_pages(struct page **pages, unsigned long npages)
 	if (WARN_ON(IS_ERR_VALUE(npages)))
 		return;
 
-	for_each_compound_head(index, pages, npages, head, ntails)
+	for (index = 0; index < npages; index += ntails) {
+		compound_next(index, npages, pages, &head, &ntails);
 		put_compound_head(head, ntails, FOLL_PIN);
+	}
 }
 EXPORT_SYMBOL(unpin_user_pages);
 
-- 
2.33.0
^ permalink raw reply related	[flat|nested] 96+ messages in thread
* [PATCH v2 03/28] gup: Change the calling convention for compound_range_next()
  2022-01-10  4:23 [PATCH v2 00/28] Convert GUP to folios Matthew Wilcox (Oracle)
  2022-01-10  4:23 ` [PATCH v2 01/28] gup: Remove for_each_compound_range() Matthew Wilcox (Oracle)
  2022-01-10  4:23 ` [PATCH v2 02/28] gup: Remove for_each_compound_head() Matthew Wilcox (Oracle)
@ 2022-01-10  4:23 ` Matthew Wilcox (Oracle)
  2022-01-10  8:25   ` Christoph Hellwig
  2022-01-11  1:14   ` John Hubbard
  2022-01-10  4:23 ` [PATCH v2 04/28] gup: Optimise compound_range_next() Matthew Wilcox (Oracle)
                   ` (26 subsequent siblings)
  29 siblings, 2 replies; 96+ messages in thread
From: Matthew Wilcox (Oracle) @ 2022-01-10  4:23 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox (Oracle), John Hubbard, Christoph Hellwig,
	William Kucharski, linux-kernel, Jason Gunthorpe
Return the head page instead of storing it to a passed parameter.
Pass the start page directly instead of passing a pointer to it.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/gup.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)
diff --git a/mm/gup.c b/mm/gup.c
index 86f8d843de72..3c93d2fdf4da 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -229,21 +229,20 @@ void unpin_user_page(struct page *page)
 }
 EXPORT_SYMBOL(unpin_user_page);
 
-static inline void compound_range_next(unsigned long i, unsigned long npages,
-				       struct page **list, struct page **head,
-				       unsigned int *ntails)
+static inline struct page *compound_range_next(unsigned long i,
+		unsigned long npages, struct page *start, unsigned int *ntails)
 {
 	struct page *next, *page;
 	unsigned int nr = 1;
 
-	next = *list + i;
+	next = start + i;
 	page = compound_head(next);
 	if (PageCompound(page) && compound_order(page) >= 1)
 		nr = min_t(unsigned int,
 			   page + compound_nr(page) - next, npages - i);
 
-	*head = page;
 	*ntails = nr;
+	return page;
 }
 
 static inline void compound_next(unsigned long i, unsigned long npages,
@@ -355,7 +354,7 @@ void unpin_user_page_range_dirty_lock(struct page *page, unsigned long npages,
 	unsigned int ntails;
 
 	for (index = 0; index < npages; index += ntails) {
-		compound_range_next(index, npages, &page, &head, &ntails);
+		head = compound_range_next(index, npages, page, &ntails);
 		if (make_dirty && !PageDirty(head))
 			set_page_dirty_lock(head);
 		put_compound_head(head, ntails, FOLL_PIN);
-- 
2.33.0
^ permalink raw reply related	[flat|nested] 96+ messages in thread
* [PATCH v2 04/28] gup: Optimise compound_range_next()
  2022-01-10  4:23 [PATCH v2 00/28] Convert GUP to folios Matthew Wilcox (Oracle)
                   ` (2 preceding siblings ...)
  2022-01-10  4:23 ` [PATCH v2 03/28] gup: Change the calling convention for compound_range_next() Matthew Wilcox (Oracle)
@ 2022-01-10  4:23 ` Matthew Wilcox (Oracle)
  2022-01-10  8:26   ` Christoph Hellwig
  2022-01-11  1:16   ` John Hubbard
  2022-01-10  4:23 ` [PATCH v2 05/28] gup: Change the calling convention for compound_next() Matthew Wilcox (Oracle)
                   ` (25 subsequent siblings)
  29 siblings, 2 replies; 96+ messages in thread
From: Matthew Wilcox (Oracle) @ 2022-01-10  4:23 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox (Oracle), John Hubbard, Christoph Hellwig,
	William Kucharski, linux-kernel, Jason Gunthorpe
By definition, a compound page has an order >= 1, so the second half
of the test was redundant.  Also, this cannot be a tail page since
it's the result of calling compound_head(), so use PageHead() instead
of PageCompound().
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/gup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/gup.c b/mm/gup.c
index 3c93d2fdf4da..6eedca605b3d 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -237,7 +237,7 @@ static inline struct page *compound_range_next(unsigned long i,
 
 	next = start + i;
 	page = compound_head(next);
-	if (PageCompound(page) && compound_order(page) >= 1)
+	if (PageHead(page))
 		nr = min_t(unsigned int,
 			   page + compound_nr(page) - next, npages - i);
 
-- 
2.33.0
^ permalink raw reply related	[flat|nested] 96+ messages in thread
* [PATCH v2 05/28] gup: Change the calling convention for compound_next()
  2022-01-10  4:23 [PATCH v2 00/28] Convert GUP to folios Matthew Wilcox (Oracle)
                   ` (3 preceding siblings ...)
  2022-01-10  4:23 ` [PATCH v2 04/28] gup: Optimise compound_range_next() Matthew Wilcox (Oracle)
@ 2022-01-10  4:23 ` Matthew Wilcox (Oracle)
  2022-01-10  8:27   ` Christoph Hellwig
  2022-01-11  1:18   ` John Hubbard
  2022-01-10  4:23 ` [PATCH v2 06/28] gup: Fix some contiguous memmap assumptions Matthew Wilcox (Oracle)
                   ` (24 subsequent siblings)
  29 siblings, 2 replies; 96+ messages in thread
From: Matthew Wilcox (Oracle) @ 2022-01-10  4:23 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox (Oracle), John Hubbard, Christoph Hellwig,
	William Kucharski, linux-kernel, Jason Gunthorpe
Return the head page instead of storing it to a passed parameter.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/gup.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)
diff --git a/mm/gup.c b/mm/gup.c
index 6eedca605b3d..8a0ea220ced1 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -245,9 +245,8 @@ static inline struct page *compound_range_next(unsigned long i,
 	return page;
 }
 
-static inline void compound_next(unsigned long i, unsigned long npages,
-				 struct page **list, struct page **head,
-				 unsigned int *ntails)
+static inline struct page *compound_next(unsigned long i,
+		unsigned long npages, struct page **list, unsigned int *ntails)
 {
 	struct page *page;
 	unsigned int nr;
@@ -258,8 +257,8 @@ static inline void compound_next(unsigned long i, unsigned long npages,
 			break;
 	}
 
-	*head = page;
 	*ntails = nr - i;
+	return page;
 }
 
 /**
@@ -297,7 +296,7 @@ void unpin_user_pages_dirty_lock(struct page **pages, unsigned long npages,
 	}
 
 	for (index = 0; index < npages; index += ntails) {
-		compound_next(index, npages, pages, &head, &ntails);
+		head = compound_next(index, npages, pages, &ntails);
 		/*
 		 * Checking PageDirty at this point may race with
 		 * clear_page_dirty_for_io(), but that's OK. Two key
@@ -386,7 +385,7 @@ void unpin_user_pages(struct page **pages, unsigned long npages)
 		return;
 
 	for (index = 0; index < npages; index += ntails) {
-		compound_next(index, npages, pages, &head, &ntails);
+		head = compound_next(index, npages, pages, &ntails);
 		put_compound_head(head, ntails, FOLL_PIN);
 	}
 }
-- 
2.33.0
^ permalink raw reply related	[flat|nested] 96+ messages in thread
* [PATCH v2 06/28] gup: Fix some contiguous memmap assumptions
  2022-01-10  4:23 [PATCH v2 00/28] Convert GUP to folios Matthew Wilcox (Oracle)
                   ` (4 preceding siblings ...)
  2022-01-10  4:23 ` [PATCH v2 05/28] gup: Change the calling convention for compound_next() Matthew Wilcox (Oracle)
@ 2022-01-10  4:23 ` Matthew Wilcox (Oracle)
  2022-01-10  8:29   ` Christoph Hellwig
  2022-01-11  1:47   ` John Hubbard
  2022-01-10  4:23 ` [PATCH v2 07/28] gup: Remove an assumption of a contiguous memmap Matthew Wilcox (Oracle)
                   ` (23 subsequent siblings)
  29 siblings, 2 replies; 96+ messages in thread
From: Matthew Wilcox (Oracle) @ 2022-01-10  4:23 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox (Oracle), John Hubbard, Christoph Hellwig,
	William Kucharski, linux-kernel, Jason Gunthorpe
Several functions in gup.c assume that a compound page has virtually
contiguous page structs.  This isn't true for SPARSEMEM configs unless
SPARSEMEM_VMEMMAP is also set.  Fix them by using nth_page() instead of
plain pointer arithmetic.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/gup.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/mm/gup.c b/mm/gup.c
index 8a0ea220ced1..9c0a702a4e03 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -235,7 +235,7 @@ static inline struct page *compound_range_next(unsigned long i,
 	struct page *next, *page;
 	unsigned int nr = 1;
 
-	next = start + i;
+	next = nth_page(start, i);
 	page = compound_head(next);
 	if (PageHead(page))
 		nr = min_t(unsigned int,
@@ -2430,8 +2430,8 @@ static int record_subpages(struct page *page, unsigned long addr,
 {
 	int nr;
 
-	for (nr = 0; addr != end; addr += PAGE_SIZE)
-		pages[nr++] = page++;
+	for (nr = 0; addr != end; nr++, addr += PAGE_SIZE)
+		pages[nr] = nth_page(page, nr);
 
 	return nr;
 }
@@ -2466,7 +2466,7 @@ static int gup_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr,
 	VM_BUG_ON(!pfn_valid(pte_pfn(pte)));
 
 	head = pte_page(pte);
-	page = head + ((addr & (sz-1)) >> PAGE_SHIFT);
+	page = nth_page(head, (addr & (sz-1)) >> PAGE_SHIFT);
 	refs = record_subpages(page, addr, end, pages + *nr);
 
 	head = try_grab_compound_head(head, refs, flags);
@@ -2526,7 +2526,7 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr,
 					     pages, nr);
 	}
 
-	page = pmd_page(orig) + ((addr & ~PMD_MASK) >> PAGE_SHIFT);
+	page = nth_page(pmd_page(orig), (addr & ~PMD_MASK) >> PAGE_SHIFT);
 	refs = record_subpages(page, addr, end, pages + *nr);
 
 	head = try_grab_compound_head(pmd_page(orig), refs, flags);
@@ -2560,7 +2560,7 @@ static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr,
 					     pages, nr);
 	}
 
-	page = pud_page(orig) + ((addr & ~PUD_MASK) >> PAGE_SHIFT);
+	page = nth_page(pud_page(orig), (addr & ~PUD_MASK) >> PAGE_SHIFT);
 	refs = record_subpages(page, addr, end, pages + *nr);
 
 	head = try_grab_compound_head(pud_page(orig), refs, flags);
@@ -2589,7 +2589,7 @@ static int gup_huge_pgd(pgd_t orig, pgd_t *pgdp, unsigned long addr,
 
 	BUILD_BUG_ON(pgd_devmap(orig));
 
-	page = pgd_page(orig) + ((addr & ~PGDIR_MASK) >> PAGE_SHIFT);
+	page = nth_page(pgd_page(orig), (addr & ~PGDIR_MASK) >> PAGE_SHIFT);
 	refs = record_subpages(page, addr, end, pages + *nr);
 
 	head = try_grab_compound_head(pgd_page(orig), refs, flags);
-- 
2.33.0
^ permalink raw reply related	[flat|nested] 96+ messages in thread
* [PATCH v2 07/28] gup: Remove an assumption of a contiguous memmap
  2022-01-10  4:23 [PATCH v2 00/28] Convert GUP to folios Matthew Wilcox (Oracle)
                   ` (5 preceding siblings ...)
  2022-01-10  4:23 ` [PATCH v2 06/28] gup: Fix some contiguous memmap assumptions Matthew Wilcox (Oracle)
@ 2022-01-10  4:23 ` Matthew Wilcox (Oracle)
  2022-01-10  8:30   ` Christoph Hellwig
  2022-01-11  3:27   ` John Hubbard
  2022-01-10  4:23 ` [PATCH v2 08/28] gup: Handle page split race more efficiently Matthew Wilcox (Oracle)
                   ` (22 subsequent siblings)
  29 siblings, 2 replies; 96+ messages in thread
From: Matthew Wilcox (Oracle) @ 2022-01-10  4:23 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox (Oracle), John Hubbard, Christoph Hellwig,
	William Kucharski, linux-kernel, Jason Gunthorpe
This assumption needs the inverse of nth_page(), which I've temporarily
named page_nth() until someone comes up with a better name.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/mm.h | 2 ++
 mm/gup.c           | 4 ++--
 2 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index d8b7d7ed14dd..f2f3400665a4 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -216,8 +216,10 @@ int overcommit_policy_handler(struct ctl_table *, int, void *, size_t *,
 
 #if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
 #define nth_page(page,n) pfn_to_page(page_to_pfn((page)) + (n))
+#define page_nth(head, tail)	(page_to_pfn(tail) - page_to_pfn(head))
 #else
 #define nth_page(page,n) ((page) + (n))
+#define page_nth(head, tail)	((tail) - (head))
 #endif
 
 /* to align the pointer to the (next) page boundary */
diff --git a/mm/gup.c b/mm/gup.c
index 9c0a702a4e03..afb638a30e44 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -238,8 +238,8 @@ static inline struct page *compound_range_next(unsigned long i,
 	next = nth_page(start, i);
 	page = compound_head(next);
 	if (PageHead(page))
-		nr = min_t(unsigned int,
-			   page + compound_nr(page) - next, npages - i);
+		nr = min_t(unsigned int, npages - i,
+			   compound_nr(page) - page_nth(page, next));
 
 	*ntails = nr;
 	return page;
-- 
2.33.0
^ permalink raw reply related	[flat|nested] 96+ messages in thread
* [PATCH v2 08/28] gup: Handle page split race more efficiently
  2022-01-10  4:23 [PATCH v2 00/28] Convert GUP to folios Matthew Wilcox (Oracle)
                   ` (6 preceding siblings ...)
  2022-01-10  4:23 ` [PATCH v2 07/28] gup: Remove an assumption of a contiguous memmap Matthew Wilcox (Oracle)
@ 2022-01-10  4:23 ` Matthew Wilcox (Oracle)
  2022-01-10  8:31   ` Christoph Hellwig
  2022-01-11  3:30   ` John Hubbard
  2022-01-10  4:23 ` [PATCH v2 09/28] gup: Turn hpage_pincount_add() into page_pincount_add() Matthew Wilcox (Oracle)
                   ` (21 subsequent siblings)
  29 siblings, 2 replies; 96+ messages in thread
From: Matthew Wilcox (Oracle) @ 2022-01-10  4:23 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox (Oracle), John Hubbard, Christoph Hellwig,
	William Kucharski, linux-kernel, Jason Gunthorpe
If we hit the page split race, the current code returns NULL which will
presumably trigger a retry under the mmap_lock.  This isn't necessary;
we can just retry the compound_head() lookup.  This is a very minor
optimisation of an unlikely path, but conceptually it matches (eg)
the page cache RCU-protected lookup.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/gup.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/mm/gup.c b/mm/gup.c
index afb638a30e44..dbb1b54d0def 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -68,7 +68,10 @@ static void put_page_refs(struct page *page, int refs)
  */
 static inline struct page *try_get_compound_head(struct page *page, int refs)
 {
-	struct page *head = compound_head(page);
+	struct page *head;
+
+retry:
+	head = compound_head(page);
 
 	if (WARN_ON_ONCE(page_ref_count(head) < 0))
 		return NULL;
@@ -86,7 +89,7 @@ static inline struct page *try_get_compound_head(struct page *page, int refs)
 	 */
 	if (unlikely(compound_head(page) != head)) {
 		put_page_refs(head, refs);
-		return NULL;
+		goto retry;
 	}
 
 	return head;
-- 
2.33.0
^ permalink raw reply related	[flat|nested] 96+ messages in thread
* [PATCH v2 09/28] gup: Turn hpage_pincount_add() into page_pincount_add()
  2022-01-10  4:23 [PATCH v2 00/28] Convert GUP to folios Matthew Wilcox (Oracle)
                   ` (7 preceding siblings ...)
  2022-01-10  4:23 ` [PATCH v2 08/28] gup: Handle page split race more efficiently Matthew Wilcox (Oracle)
@ 2022-01-10  4:23 ` Matthew Wilcox (Oracle)
  2022-01-10  8:31   ` Christoph Hellwig
                     ` (2 more replies)
  2022-01-10  4:23 ` [PATCH v2 10/28] gup: Turn hpage_pincount_sub() into page_pincount_sub() Matthew Wilcox (Oracle)
                   ` (20 subsequent siblings)
  29 siblings, 3 replies; 96+ messages in thread
From: Matthew Wilcox (Oracle) @ 2022-01-10  4:23 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox (Oracle), John Hubbard, Christoph Hellwig,
	William Kucharski, linux-kernel, Jason Gunthorpe
Simplify try_grab_compound_head() and remove an unnecessary
VM_BUG_ON by handling pages both with and without a pincount field in
page_pincount_add().
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/gup.c | 33 +++++++++++++++------------------
 1 file changed, 15 insertions(+), 18 deletions(-)
diff --git a/mm/gup.c b/mm/gup.c
index dbb1b54d0def..3ed9907f3c8d 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -29,12 +29,23 @@ struct follow_page_context {
 	unsigned int page_mask;
 };
 
-static void hpage_pincount_add(struct page *page, int refs)
+/*
+ * When pinning a compound page of order > 1 (which is what
+ * hpage_pincount_available() checks for), use an exact count to track
+ * it, via page_pincount_add/_sub().
+ *
+ * However, be sure to *also* increment the normal page refcount field
+ * at least once, so that the page really is pinned.  That's why the
+ * refcount from the earlier try_get_compound_head() is left intact.
+ */
+static void page_pincount_add(struct page *page, int refs)
 {
-	VM_BUG_ON_PAGE(!hpage_pincount_available(page), page);
 	VM_BUG_ON_PAGE(page != compound_head(page), page);
 
-	atomic_add(refs, compound_pincount_ptr(page));
+	if (hpage_pincount_available(page))
+		atomic_add(refs, compound_pincount_ptr(page));
+	else
+		page_ref_add(page, refs * (GUP_PIN_COUNTING_BIAS - 1));
 }
 
 static void hpage_pincount_sub(struct page *page, int refs)
@@ -150,21 +161,7 @@ struct page *try_grab_compound_head(struct page *page,
 		if (!page)
 			return NULL;
 
-		/*
-		 * When pinning a compound page of order > 1 (which is what
-		 * hpage_pincount_available() checks for), use an exact count to
-		 * track it, via hpage_pincount_add/_sub().
-		 *
-		 * However, be sure to *also* increment the normal page refcount
-		 * field at least once, so that the page really is pinned.
-		 * That's why the refcount from the earlier
-		 * try_get_compound_head() is left intact.
-		 */
-		if (hpage_pincount_available(page))
-			hpage_pincount_add(page, refs);
-		else
-			page_ref_add(page, refs * (GUP_PIN_COUNTING_BIAS - 1));
-
+		page_pincount_add(page, refs);
 		mod_node_page_state(page_pgdat(page), NR_FOLL_PIN_ACQUIRED,
 				    refs);
 
-- 
2.33.0
^ permalink raw reply related	[flat|nested] 96+ messages in thread
* [PATCH v2 10/28] gup: Turn hpage_pincount_sub() into page_pincount_sub()
  2022-01-10  4:23 [PATCH v2 00/28] Convert GUP to folios Matthew Wilcox (Oracle)
                   ` (8 preceding siblings ...)
  2022-01-10  4:23 ` [PATCH v2 09/28] gup: Turn hpage_pincount_add() into page_pincount_add() Matthew Wilcox (Oracle)
@ 2022-01-10  4:23 ` Matthew Wilcox (Oracle)
  2022-01-10  8:32   ` Christoph Hellwig
  2022-01-11  6:40   ` John Hubbard
  2022-01-10  4:23 ` [PATCH v2 11/28] mm: Make compound_pincount always available Matthew Wilcox (Oracle)
                   ` (19 subsequent siblings)
  29 siblings, 2 replies; 96+ messages in thread
From: Matthew Wilcox (Oracle) @ 2022-01-10  4:23 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox (Oracle), John Hubbard, Christoph Hellwig,
	William Kucharski, linux-kernel, Jason Gunthorpe
Remove an unnecessary VM_BUG_ON by handling pages both with and without
a pincount field in page_pincount_sub().
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/gup.c | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)
diff --git a/mm/gup.c b/mm/gup.c
index 3ed9907f3c8d..aed48de3912e 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -48,12 +48,15 @@ static void page_pincount_add(struct page *page, int refs)
 		page_ref_add(page, refs * (GUP_PIN_COUNTING_BIAS - 1));
 }
 
-static void hpage_pincount_sub(struct page *page, int refs)
+static int page_pincount_sub(struct page *page, int refs)
 {
-	VM_BUG_ON_PAGE(!hpage_pincount_available(page), page);
 	VM_BUG_ON_PAGE(page != compound_head(page), page);
 
-	atomic_sub(refs, compound_pincount_ptr(page));
+	if (hpage_pincount_available(page))
+		atomic_sub(refs, compound_pincount_ptr(page));
+	else
+		refs *= GUP_PIN_COUNTING_BIAS;
+	return refs;
 }
 
 /* Equivalent to calling put_page() @refs times. */
@@ -177,11 +180,7 @@ static void put_compound_head(struct page *page, int refs, unsigned int flags)
 	if (flags & FOLL_PIN) {
 		mod_node_page_state(page_pgdat(page), NR_FOLL_PIN_RELEASED,
 				    refs);
-
-		if (hpage_pincount_available(page))
-			hpage_pincount_sub(page, refs);
-		else
-			refs *= GUP_PIN_COUNTING_BIAS;
+		refs = page_pincount_sub(page, refs);
 	}
 
 	put_page_refs(page, refs);
-- 
2.33.0
^ permalink raw reply related	[flat|nested] 96+ messages in thread
* [PATCH v2 11/28] mm: Make compound_pincount always available
  2022-01-10  4:23 [PATCH v2 00/28] Convert GUP to folios Matthew Wilcox (Oracle)
                   ` (9 preceding siblings ...)
  2022-01-10  4:23 ` [PATCH v2 10/28] gup: Turn hpage_pincount_sub() into page_pincount_sub() Matthew Wilcox (Oracle)
@ 2022-01-10  4:23 ` Matthew Wilcox (Oracle)
  2022-01-11  4:06   ` John Hubbard
  2022-01-20  9:15   ` Christoph Hellwig
  2022-01-10  4:23 ` [PATCH v2 12/28] mm: Add folio_put_refs() Matthew Wilcox (Oracle)
                   ` (18 subsequent siblings)
  29 siblings, 2 replies; 96+ messages in thread
From: Matthew Wilcox (Oracle) @ 2022-01-10  4:23 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox (Oracle), John Hubbard, Christoph Hellwig,
	William Kucharski, linux-kernel, Jason Gunthorpe
Move compound_pincount from the third page to the second page, which
means it's available for all compound pages.  That lets us delete
hpage_pincount_available().
On 32-bit systems, there isn't enough space for both compound_pincount
and compound_nr in the second page (it would collide with page->private,
which is in use for pages in the swap cache), so revert the optimisation
of storing both compound_order and compound_nr on 32-bit systems.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 Documentation/core-api/pin_user_pages.rst | 18 +++++++++---------
 include/linux/mm.h                        | 21 ++++++++-------------
 include/linux/mm_types.h                  |  7 +++++--
 mm/debug.c                                | 14 ++++----------
 mm/gup.c                                  | 18 ++++++++----------
 mm/page_alloc.c                           |  3 +--
 mm/rmap.c                                 |  6 ++----
 7 files changed, 37 insertions(+), 50 deletions(-)
diff --git a/Documentation/core-api/pin_user_pages.rst b/Documentation/core-api/pin_user_pages.rst
index fcf605be43d0..b18416f4500f 100644
--- a/Documentation/core-api/pin_user_pages.rst
+++ b/Documentation/core-api/pin_user_pages.rst
@@ -55,18 +55,18 @@ flags the caller provides. The caller is required to pass in a non-null struct
 pages* array, and the function then pins pages by incrementing each by a special
 value: GUP_PIN_COUNTING_BIAS.
 
-For huge pages (and in fact, any compound page of more than 2 pages), the
-GUP_PIN_COUNTING_BIAS scheme is not used. Instead, an exact form of pin counting
-is achieved, by using the 3rd struct page in the compound page. A new struct
-page field, hpage_pinned_refcount, has been added in order to support this.
+For compound pages, the GUP_PIN_COUNTING_BIAS scheme is not used. Instead,
+an exact form of pin counting is achieved, by using the 2nd struct page
+in the compound page. A new struct page field, compound_pincount, has
+been added in order to support this.
 
 This approach for compound pages avoids the counting upper limit problems that
 are discussed below. Those limitations would have been aggravated severely by
 huge pages, because each tail page adds a refcount to the head page. And in
-fact, testing revealed that, without a separate hpage_pinned_refcount field,
+fact, testing revealed that, without a separate compound_pincount field,
 page overflows were seen in some huge page stress tests.
 
-This also means that huge pages and compound pages (of order > 1) do not suffer
+This also means that huge pages and compound pages do not suffer
 from the false positives problem that is mentioned below.::
 
  Function
@@ -264,9 +264,9 @@ place.)
 Other diagnostics
 =================
 
-dump_page() has been enhanced slightly, to handle these new counting fields, and
-to better report on compound pages in general. Specifically, for compound pages
-with order > 1, the exact (hpage_pinned_refcount) pincount is reported.
+dump_page() has been enhanced slightly, to handle these new counting
+fields, and to better report on compound pages in general. Specifically,
+for compound pages, the exact (compound_pincount) pincount is reported.
 
 References
 ==========
diff --git a/include/linux/mm.h b/include/linux/mm.h
index f2f3400665a4..598be27d4d2e 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -929,17 +929,6 @@ static inline void destroy_compound_page(struct page *page)
 	compound_page_dtors[page[1].compound_dtor](page);
 }
 
-static inline bool hpage_pincount_available(struct page *page)
-{
-	/*
-	 * Can the page->hpage_pinned_refcount field be used? That field is in
-	 * the 3rd page of the compound page, so the smallest (2-page) compound
-	 * pages cannot support it.
-	 */
-	page = compound_head(page);
-	return PageCompound(page) && compound_order(page) > 1;
-}
-
 static inline int head_compound_pincount(struct page *head)
 {
 	return atomic_read(compound_pincount_ptr(head));
@@ -947,7 +936,7 @@ static inline int head_compound_pincount(struct page *head)
 
 static inline int compound_pincount(struct page *page)
 {
-	VM_BUG_ON_PAGE(!hpage_pincount_available(page), page);
+	VM_BUG_ON_PAGE(!PageCompound(page), page);
 	page = compound_head(page);
 	return head_compound_pincount(page);
 }
@@ -955,7 +944,9 @@ static inline int compound_pincount(struct page *page)
 static inline void set_compound_order(struct page *page, unsigned int order)
 {
 	page[1].compound_order = order;
+#ifdef CONFIG_64BIT
 	page[1].compound_nr = 1U << order;
+#endif
 }
 
 /* Returns the number of pages in this potentially compound page. */
@@ -963,7 +954,11 @@ static inline unsigned long compound_nr(struct page *page)
 {
 	if (!PageHead(page))
 		return 1;
+#ifdef CONFIG_64BIT
 	return page[1].compound_nr;
+#else
+	return 1UL << compound_order(page);
+#endif
 }
 
 /* Returns the number of bytes in this potentially compound page. */
@@ -1325,7 +1320,7 @@ void unpin_user_pages(struct page **pages, unsigned long npages);
  */
 static inline bool page_maybe_dma_pinned(struct page *page)
 {
-	if (hpage_pincount_available(page))
+	if (PageCompound(page))
 		return compound_pincount(page) > 0;
 
 	/*
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index c3a6e6209600..60e4595eaf63 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -150,11 +150,14 @@ struct page {
 			unsigned char compound_dtor;
 			unsigned char compound_order;
 			atomic_t compound_mapcount;
+			atomic_t compound_pincount;
+#ifdef CONFIG_64BIT
 			unsigned int compound_nr; /* 1 << compound_order */
+#endif
 		};
 		struct {	/* Second tail page of compound page */
 			unsigned long _compound_pad_1;	/* compound_head */
-			atomic_t hpage_pinned_refcount;
+			unsigned long _compound_pad_2;
 			/* For both global and memcg */
 			struct list_head deferred_list;
 		};
@@ -311,7 +314,7 @@ static inline atomic_t *compound_mapcount_ptr(struct page *page)
 
 static inline atomic_t *compound_pincount_ptr(struct page *page)
 {
-	return &page[2].hpage_pinned_refcount;
+	return &page[1].compound_pincount;
 }
 
 /*
diff --git a/mm/debug.c b/mm/debug.c
index a05a39ff8fe4..7925fac2bd8e 100644
--- a/mm/debug.c
+++ b/mm/debug.c
@@ -92,16 +92,10 @@ static void __dump_page(struct page *page)
 			page, page_ref_count(head), mapcount, mapping,
 			page_to_pgoff(page), page_to_pfn(page));
 	if (compound) {
-		if (hpage_pincount_available(page)) {
-			pr_warn("head:%p order:%u compound_mapcount:%d compound_pincount:%d\n",
-					head, compound_order(head),
-					head_compound_mapcount(head),
-					head_compound_pincount(head));
-		} else {
-			pr_warn("head:%p order:%u compound_mapcount:%d\n",
-					head, compound_order(head),
-					head_compound_mapcount(head));
-		}
+		pr_warn("head:%p order:%u compound_mapcount:%d compound_pincount:%d\n",
+				head, compound_order(head),
+				head_compound_mapcount(head),
+				head_compound_pincount(head));
 	}
 
 #ifdef CONFIG_MEMCG
diff --git a/mm/gup.c b/mm/gup.c
index aed48de3912e..1282d29357b7 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -30,9 +30,8 @@ struct follow_page_context {
 };
 
 /*
- * When pinning a compound page of order > 1 (which is what
- * hpage_pincount_available() checks for), use an exact count to track
- * it, via page_pincount_add/_sub().
+ * When pinning a compound page, use an exact count to track it, via
+ * page_pincount_add/_sub().
  *
  * However, be sure to *also* increment the normal page refcount field
  * at least once, so that the page really is pinned.  That's why the
@@ -42,7 +41,7 @@ static void page_pincount_add(struct page *page, int refs)
 {
 	VM_BUG_ON_PAGE(page != compound_head(page), page);
 
-	if (hpage_pincount_available(page))
+	if (PageHead(page))
 		atomic_add(refs, compound_pincount_ptr(page));
 	else
 		page_ref_add(page, refs * (GUP_PIN_COUNTING_BIAS - 1));
@@ -52,7 +51,7 @@ static int page_pincount_sub(struct page *page, int refs)
 {
 	VM_BUG_ON_PAGE(page != compound_head(page), page);
 
-	if (hpage_pincount_available(page))
+	if (PageHead(page))
 		atomic_sub(refs, compound_pincount_ptr(page));
 	else
 		refs *= GUP_PIN_COUNTING_BIAS;
@@ -129,12 +128,11 @@ static inline struct page *try_get_compound_head(struct page *page, int refs)
  *
  *    FOLL_GET: page's refcount will be incremented by @refs.
  *
- *    FOLL_PIN on compound pages that are > two pages long: page's refcount will
- *    be incremented by @refs, and page[2].hpage_pinned_refcount will be
- *    incremented by @refs * GUP_PIN_COUNTING_BIAS.
+ *    FOLL_PIN on compound pages: page's refcount will be incremented by
+ *    @refs, and page[1].compound_pincount will be incremented by @refs.
  *
- *    FOLL_PIN on normal pages, or compound pages that are two pages long:
- *    page's refcount will be incremented by @refs * GUP_PIN_COUNTING_BIAS.
+ *    FOLL_PIN on normal pages: page's refcount will be incremented by
+ *    @refs * GUP_PIN_COUNTING_BIAS.
  *
  * Return: head page (with refcount appropriately incremented) for success, or
  * NULL upon failure. If neither FOLL_GET nor FOLL_PIN was set, that's
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c5952749ad40..6b030c0cb207 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -741,8 +741,7 @@ void prep_compound_page(struct page *page, unsigned int order)
 	set_compound_page_dtor(page, COMPOUND_PAGE_DTOR);
 	set_compound_order(page, order);
 	atomic_set(compound_mapcount_ptr(page), -1);
-	if (hpage_pincount_available(page))
-		atomic_set(compound_pincount_ptr(page), 0);
+	atomic_set(compound_pincount_ptr(page), 0);
 }
 
 #ifdef CONFIG_DEBUG_PAGEALLOC
diff --git a/mm/rmap.c b/mm/rmap.c
index 163ac4e6bcee..a44a32db4803 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1187,8 +1187,7 @@ void page_add_new_anon_rmap(struct page *page,
 		VM_BUG_ON_PAGE(!PageTransHuge(page), page);
 		/* increment count (starts at -1) */
 		atomic_set(compound_mapcount_ptr(page), 0);
-		if (hpage_pincount_available(page))
-			atomic_set(compound_pincount_ptr(page), 0);
+		atomic_set(compound_pincount_ptr(page), 0);
 
 		__mod_lruvec_page_state(page, NR_ANON_THPS, nr);
 	} else {
@@ -2410,8 +2409,7 @@ void hugepage_add_new_anon_rmap(struct page *page,
 {
 	BUG_ON(address < vma->vm_start || address >= vma->vm_end);
 	atomic_set(compound_mapcount_ptr(page), 0);
-	if (hpage_pincount_available(page))
-		atomic_set(compound_pincount_ptr(page), 0);
+	atomic_set(compound_pincount_ptr(page), 0);
 
 	__page_set_anon_rmap(page, vma, address, 1);
 }
-- 
2.33.0
^ permalink raw reply related	[flat|nested] 96+ messages in thread
* [PATCH v2 12/28] mm: Add folio_put_refs()
  2022-01-10  4:23 [PATCH v2 00/28] Convert GUP to folios Matthew Wilcox (Oracle)
                   ` (10 preceding siblings ...)
  2022-01-10  4:23 ` [PATCH v2 11/28] mm: Make compound_pincount always available Matthew Wilcox (Oracle)
@ 2022-01-10  4:23 ` Matthew Wilcox (Oracle)
  2022-01-10  8:32   ` Christoph Hellwig
  2022-01-11  4:14   ` John Hubbard
  2022-01-10  4:23 ` [PATCH v2 13/28] mm: Add folio_pincount_ptr() Matthew Wilcox (Oracle)
                   ` (17 subsequent siblings)
  29 siblings, 2 replies; 96+ messages in thread
From: Matthew Wilcox (Oracle) @ 2022-01-10  4:23 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox (Oracle), John Hubbard, Christoph Hellwig,
	William Kucharski, linux-kernel, Jason Gunthorpe
This is like folio_put(), but puts N references at once instead of
just one.  It's like put_page_refs(), but does one atomic operation
instead of two, and is available to more than just gup.c.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/mm.h | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 598be27d4d2e..bf9624ca61c3 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1234,6 +1234,26 @@ static inline void folio_put(struct folio *folio)
 		__put_page(&folio->page);
 }
 
+/**
+ * folio_put_refs - Reduce the reference count on a folio.
+ * @folio: The folio.
+ * @refs: The number of references to reduce.
+ *
+ * If the folio's reference count reaches zero, the memory will be
+ * released back to the page allocator and may be used by another
+ * allocation immediately.  Do not access the memory or the struct folio
+ * after calling folio_put_refs() unless you can be sure that these weren't
+ * the last references.
+ *
+ * Context: May be called in process or interrupt context, but not in NMI
+ * context.  May be called while holding a spinlock.
+ */
+static inline void folio_put_refs(struct folio *folio, int refs)
+{
+	if (folio_ref_sub_and_test(folio, refs))
+		__put_page(&folio->page);
+}
+
 static inline void put_page(struct page *page)
 {
 	struct folio *folio = page_folio(page);
-- 
2.33.0
^ permalink raw reply related	[flat|nested] 96+ messages in thread
* [PATCH v2 13/28] mm: Add folio_pincount_ptr()
  2022-01-10  4:23 [PATCH v2 00/28] Convert GUP to folios Matthew Wilcox (Oracle)
                   ` (11 preceding siblings ...)
  2022-01-10  4:23 ` [PATCH v2 12/28] mm: Add folio_put_refs() Matthew Wilcox (Oracle)
@ 2022-01-10  4:23 ` Matthew Wilcox (Oracle)
  2022-01-10  8:33   ` Christoph Hellwig
  2022-01-11  4:22   ` John Hubbard
  2022-01-10  4:23 ` [PATCH v2 14/28] mm: Convert page_maybe_dma_pinned() to use a folio Matthew Wilcox (Oracle)
                   ` (16 subsequent siblings)
  29 siblings, 2 replies; 96+ messages in thread
From: Matthew Wilcox (Oracle) @ 2022-01-10  4:23 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox (Oracle), John Hubbard, Christoph Hellwig,
	William Kucharski, linux-kernel, Jason Gunthorpe
This is the folio equivalent of compound_pincount_ptr().
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/mm_types.h | 6 ++++++
 1 file changed, 6 insertions(+)
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 60e4595eaf63..34c7114ea9e9 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -312,6 +312,12 @@ static inline atomic_t *compound_mapcount_ptr(struct page *page)
 	return &page[1].compound_mapcount;
 }
 
+static inline atomic_t *folio_pincount_ptr(struct folio *folio)
+{
+	struct page *tail = &folio->page + 1;
+	return &tail->compound_pincount;
+}
+
 static inline atomic_t *compound_pincount_ptr(struct page *page)
 {
 	return &page[1].compound_pincount;
-- 
2.33.0
^ permalink raw reply related	[flat|nested] 96+ messages in thread
* [PATCH v2 14/28] mm: Convert page_maybe_dma_pinned() to use a folio
  2022-01-10  4:23 [PATCH v2 00/28] Convert GUP to folios Matthew Wilcox (Oracle)
                   ` (12 preceding siblings ...)
  2022-01-10  4:23 ` [PATCH v2 13/28] mm: Add folio_pincount_ptr() Matthew Wilcox (Oracle)
@ 2022-01-10  4:23 ` Matthew Wilcox (Oracle)
  2022-01-10  8:33   ` Christoph Hellwig
  2022-01-11  4:27   ` John Hubbard
  2022-01-10  4:23 ` [PATCH v2 15/28] gup: Add try_get_folio() and try_grab_folio() Matthew Wilcox (Oracle)
                   ` (15 subsequent siblings)
  29 siblings, 2 replies; 96+ messages in thread
From: Matthew Wilcox (Oracle) @ 2022-01-10  4:23 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox (Oracle), John Hubbard, Christoph Hellwig,
	William Kucharski, linux-kernel, Jason Gunthorpe
Replace three calls to compound_head() with one.  This removes the last
user of compound_pincount(), so remove that helper too.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/mm.h | 21 ++++++++-------------
 1 file changed, 8 insertions(+), 13 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index bf9624ca61c3..d3769897c8ac 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -934,13 +934,6 @@ static inline int head_compound_pincount(struct page *head)
 	return atomic_read(compound_pincount_ptr(head));
 }
 
-static inline int compound_pincount(struct page *page)
-{
-	VM_BUG_ON_PAGE(!PageCompound(page), page);
-	page = compound_head(page);
-	return head_compound_pincount(page);
-}
-
 static inline void set_compound_order(struct page *page, unsigned int order)
 {
 	page[1].compound_order = order;
@@ -1340,18 +1333,20 @@ void unpin_user_pages(struct page **pages, unsigned long npages);
  */
 static inline bool page_maybe_dma_pinned(struct page *page)
 {
-	if (PageCompound(page))
-		return compound_pincount(page) > 0;
+	struct folio *folio = page_folio(page);
+
+	if (folio_test_large(folio))
+		return atomic_read(folio_pincount_ptr(folio)) > 0;
 
 	/*
-	 * page_ref_count() is signed. If that refcount overflows, then
-	 * page_ref_count() returns a negative value, and callers will avoid
+	 * folio_ref_count() is signed. If that refcount overflows, then
+	 * folio_ref_count() returns a negative value, and callers will avoid
 	 * further incrementing the refcount.
 	 *
-	 * Here, for that overflow case, use the signed bit to count a little
+	 * Here, for that overflow case, use the sign bit to count a little
 	 * bit higher via unsigned math, and thus still get an accurate result.
 	 */
-	return ((unsigned int)page_ref_count(compound_head(page))) >=
+	return ((unsigned int)folio_ref_count(folio)) >=
 		GUP_PIN_COUNTING_BIAS;
 }
 
-- 
2.33.0
^ permalink raw reply related	[flat|nested] 96+ messages in thread
* [PATCH v2 15/28] gup: Add try_get_folio() and try_grab_folio()
  2022-01-10  4:23 [PATCH v2 00/28] Convert GUP to folios Matthew Wilcox (Oracle)
                   ` (13 preceding siblings ...)
  2022-01-10  4:23 ` [PATCH v2 14/28] mm: Convert page_maybe_dma_pinned() to use a folio Matthew Wilcox (Oracle)
@ 2022-01-10  4:23 ` Matthew Wilcox (Oracle)
  2022-01-10  8:34   ` Christoph Hellwig
  2022-01-11  5:00   ` John Hubbard
  2022-01-10  4:23 ` [PATCH v2 16/28] mm: Remove page_cache_add_speculative() and page_cache_get_speculative() Matthew Wilcox (Oracle)
                   ` (14 subsequent siblings)
  29 siblings, 2 replies; 96+ messages in thread
From: Matthew Wilcox (Oracle) @ 2022-01-10  4:23 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox (Oracle), John Hubbard, Christoph Hellwig,
	William Kucharski, linux-kernel, Jason Gunthorpe
Convert try_get_compound_head() into try_get_folio() and convert
try_grab_compound_head() into try_grab_folio().  Also convert
hpage_pincount_add() to folio_pincount_add().  Add a temporary
try_grab_compound_head() wrapper around try_grab_folio() to
let us convert callers individually.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/gup.c      | 104 +++++++++++++++++++++++++-------------------------
 mm/internal.h |   5 +++
 2 files changed, 56 insertions(+), 53 deletions(-)
diff --git a/mm/gup.c b/mm/gup.c
index 1282d29357b7..9e581201d679 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -30,21 +30,19 @@ struct follow_page_context {
 };
 
 /*
- * When pinning a compound page, use an exact count to track it, via
- * page_pincount_add/_sub().
+ * When pinning a large folio, use an exact count to track it.
  *
- * However, be sure to *also* increment the normal page refcount field
- * at least once, so that the page really is pinned.  That's why the
- * refcount from the earlier try_get_compound_head() is left intact.
+ * However, be sure to *also* increment the normal folio refcount
+ * field at least once, so that the folio really is pinned.
+ * That's why the refcount from the earlier
+ * try_get_folio() is left intact.
  */
-static void page_pincount_add(struct page *page, int refs)
+static void folio_pincount_add(struct folio *folio, int refs)
 {
-	VM_BUG_ON_PAGE(page != compound_head(page), page);
-
-	if (PageHead(page))
-		atomic_add(refs, compound_pincount_ptr(page));
+	if (folio_test_large(folio))
+		atomic_add(refs, folio_pincount_ptr(folio));
 	else
-		page_ref_add(page, refs * (GUP_PIN_COUNTING_BIAS - 1));
+		folio_ref_add(folio, refs * (GUP_PIN_COUNTING_BIAS - 1));
 }
 
 static int page_pincount_sub(struct page *page, int refs)
@@ -76,75 +74,70 @@ static void put_page_refs(struct page *page, int refs)
 }
 
 /*
- * Return the compound head page with ref appropriately incremented,
+ * Return the folio with ref appropriately incremented,
  * or NULL if that failed.
  */
-static inline struct page *try_get_compound_head(struct page *page, int refs)
+static inline struct folio *try_get_folio(struct page *page, int refs)
 {
-	struct page *head;
+	struct folio *folio;
 
 retry:
-	head = compound_head(page);
-
-	if (WARN_ON_ONCE(page_ref_count(head) < 0))
+	folio = page_folio(page);
+	if (WARN_ON_ONCE(folio_ref_count(folio) < 0))
 		return NULL;
-	if (unlikely(!page_cache_add_speculative(head, refs)))
+	if (unlikely(!folio_ref_try_add_rcu(folio, refs)))
 		return NULL;
 
 	/*
-	 * At this point we have a stable reference to the head page; but it
-	 * could be that between the compound_head() lookup and the refcount
-	 * increment, the compound page was split, in which case we'd end up
-	 * holding a reference on a page that has nothing to do with the page
+	 * At this point we have a stable reference to the folio; but it
+	 * could be that between calling page_folio() and the refcount
+	 * increment, the folio was split, in which case we'd end up
+	 * holding a reference on a folio that has nothing to do with the page
 	 * we were given anymore.
-	 * So now that the head page is stable, recheck that the pages still
-	 * belong together.
+	 * So now that the folio is stable, recheck that the page still
+	 * belongs to this folio.
 	 */
-	if (unlikely(compound_head(page) != head)) {
-		put_page_refs(head, refs);
+	if (unlikely(page_folio(page) != folio)) {
+		folio_put_refs(folio, refs);
 		goto retry;
 	}
 
-	return head;
+	return folio;
 }
 
 /**
- * try_grab_compound_head() - attempt to elevate a page's refcount, by a
- * flags-dependent amount.
- *
- * Even though the name includes "compound_head", this function is still
- * appropriate for callers that have a non-compound @page to get.
- *
+ * try_grab_folio() - Attempt to get or pin a folio.
  * @page:  pointer to page to be grabbed
- * @refs:  the value to (effectively) add to the page's refcount
+ * @refs:  the value to (effectively) add to the folio's refcount
  * @flags: gup flags: these are the FOLL_* flag values.
  *
  * "grab" names in this file mean, "look at flags to decide whether to use
- * FOLL_PIN or FOLL_GET behavior, when incrementing the page's refcount.
+ * FOLL_PIN or FOLL_GET behavior, when incrementing the folio's refcount.
  *
  * Either FOLL_PIN or FOLL_GET (or neither) must be set, but not both at the
  * same time. (That's true throughout the get_user_pages*() and
  * pin_user_pages*() APIs.) Cases:
  *
- *    FOLL_GET: page's refcount will be incremented by @refs.
+ *    FOLL_GET: folio's refcount will be incremented by @refs.
  *
- *    FOLL_PIN on compound pages: page's refcount will be incremented by
- *    @refs, and page[1].compound_pincount will be incremented by @refs.
+ *    FOLL_PIN on large folios: folio's refcount will be incremented by
+ *    @refs, and its compound_pincount will be incremented by @refs.
  *
- *    FOLL_PIN on normal pages: page's refcount will be incremented by
+ *    FOLL_PIN on single-page folios: folio's refcount will be incremented by
  *    @refs * GUP_PIN_COUNTING_BIAS.
  *
- * Return: head page (with refcount appropriately incremented) for success, or
- * NULL upon failure. If neither FOLL_GET nor FOLL_PIN was set, that's
- * considered failure, and furthermore, a likely bug in the caller, so a warning
- * is also emitted.
+ * Return: The folio containing @page (with refcount appropriately
+ * incremented) for success, or NULL upon failure. If neither FOLL_GET
+ * nor FOLL_PIN was set, that's considered failure, and furthermore,
+ * a likely bug in the caller, so a warning is also emitted.
  */
-struct page *try_grab_compound_head(struct page *page,
-				    int refs, unsigned int flags)
+struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags)
 {
 	if (flags & FOLL_GET)
-		return try_get_compound_head(page, refs);
+		return try_get_folio(page, refs);
 	else if (flags & FOLL_PIN) {
+		struct folio *folio;
+
 		/*
 		 * Can't do FOLL_LONGTERM + FOLL_PIN gup fast path if not in a
 		 * right zone, so fail and let the caller fall back to the slow
@@ -158,21 +151,26 @@ struct page *try_grab_compound_head(struct page *page,
 		 * CAUTION: Don't use compound_head() on the page before this
 		 * point, the result won't be stable.
 		 */
-		page = try_get_compound_head(page, refs);
-		if (!page)
+		folio = try_get_folio(page, refs);
+		if (!folio)
 			return NULL;
 
-		page_pincount_add(page, refs);
-		mod_node_page_state(page_pgdat(page), NR_FOLL_PIN_ACQUIRED,
-				    refs);
+		folio_pincount_add(folio, refs);
+		node_stat_mod_folio(folio, NR_FOLL_PIN_ACQUIRED, refs);
 
-		return page;
+		return folio;
 	}
 
 	WARN_ON_ONCE(1);
 	return NULL;
 }
 
+struct page *try_grab_compound_head(struct page *page,
+		int refs, unsigned int flags)
+{
+	return &try_grab_folio(page, refs, flags)->page;
+}
+
 static void put_compound_head(struct page *page, int refs, unsigned int flags)
 {
 	if (flags & FOLL_PIN) {
@@ -196,7 +194,7 @@ static void put_compound_head(struct page *page, int refs, unsigned int flags)
  * @flags:   gup flags: these are the FOLL_* flag values.
  *
  * Either FOLL_PIN or FOLL_GET (or neither) may be set, but not both at the same
- * time. Cases: please see the try_grab_compound_head() documentation, with
+ * time. Cases: please see the try_grab_folio() documentation, with
  * "refs=1".
  *
  * Return: true for success, or if no action was required (if neither FOLL_PIN
diff --git a/mm/internal.h b/mm/internal.h
index 26af8a5a5be3..9a72d1ecdab4 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -723,4 +723,9 @@ void vunmap_range_noflush(unsigned long start, unsigned long end);
 int numa_migrate_prep(struct page *page, struct vm_area_struct *vma,
 		      unsigned long addr, int page_nid, int *flags);
 
+/*
+ * mm/gup.c
+ */
+struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags);
+
 #endif	/* __MM_INTERNAL_H */
-- 
2.33.0
^ permalink raw reply related	[flat|nested] 96+ messages in thread
* [PATCH v2 16/28] mm: Remove page_cache_add_speculative() and page_cache_get_speculative()
  2022-01-10  4:23 [PATCH v2 00/28] Convert GUP to folios Matthew Wilcox (Oracle)
                   ` (14 preceding siblings ...)
  2022-01-10  4:23 ` [PATCH v2 15/28] gup: Add try_get_folio() and try_grab_folio() Matthew Wilcox (Oracle)
@ 2022-01-10  4:23 ` Matthew Wilcox (Oracle)
  2022-01-10  8:35   ` Christoph Hellwig
  2022-01-11  5:14   ` John Hubbard
  2022-01-10  4:23 ` [PATCH v2 17/28] gup: Add gup_put_folio() Matthew Wilcox (Oracle)
                   ` (13 subsequent siblings)
  29 siblings, 2 replies; 96+ messages in thread
From: Matthew Wilcox (Oracle) @ 2022-01-10  4:23 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox (Oracle), John Hubbard, Christoph Hellwig,
	William Kucharski, linux-kernel, Jason Gunthorpe
These wrappers have no more callers, so delete them.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/mm.h      |  7 +++----
 include/linux/pagemap.h | 11 -----------
 2 files changed, 3 insertions(+), 15 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index d3769897c8ac..b249156f7cf1 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1291,10 +1291,9 @@ static inline void put_page(struct page *page)
  * applications that don't have huge page reference counts, this won't be an
  * issue.
  *
- * Locking: the lockless algorithm described in page_cache_get_speculative()
- * and page_cache_gup_pin_speculative() provides safe operation for
- * get_user_pages and page_mkclean and other calls that race to set up page
- * table entries.
+ * Locking: the lockless algorithm described in folio_try_get_rcu()
+ * provides safe operation for get_user_pages(), page_mkclean() and
+ * other calls that race to set up page table entries.
  */
 #define GUP_PIN_COUNTING_BIAS (1U << 10)
 
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 704cb1b4b15d..4a63176b6417 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -283,17 +283,6 @@ static inline struct inode *folio_inode(struct folio *folio)
 	return folio->mapping->host;
 }
 
-static inline bool page_cache_add_speculative(struct page *page, int count)
-{
-	VM_BUG_ON_PAGE(PageTail(page), page);
-	return folio_ref_try_add_rcu((struct folio *)page, count);
-}
-
-static inline bool page_cache_get_speculative(struct page *page)
-{
-	return page_cache_add_speculative(page, 1);
-}
-
 /**
  * folio_attach_private - Attach private data to a folio.
  * @folio: Folio to attach data to.
-- 
2.33.0
^ permalink raw reply related	[flat|nested] 96+ messages in thread
* [PATCH v2 17/28] gup: Add gup_put_folio()
  2022-01-10  4:23 [PATCH v2 00/28] Convert GUP to folios Matthew Wilcox (Oracle)
                   ` (15 preceding siblings ...)
  2022-01-10  4:23 ` [PATCH v2 16/28] mm: Remove page_cache_add_speculative() and page_cache_get_speculative() Matthew Wilcox (Oracle)
@ 2022-01-10  4:23 ` Matthew Wilcox (Oracle)
  2022-01-10  8:35   ` Christoph Hellwig
  2022-01-11  6:44   ` John Hubbard
  2022-01-10  4:23 ` [PATCH v2 18/28] hugetlb: Use try_grab_folio() instead of try_grab_compound_head() Matthew Wilcox (Oracle)
                   ` (12 subsequent siblings)
  29 siblings, 2 replies; 96+ messages in thread
From: Matthew Wilcox (Oracle) @ 2022-01-10  4:23 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox (Oracle), John Hubbard, Christoph Hellwig,
	William Kucharski, linux-kernel, Jason Gunthorpe
Convert put_compound_head() to gup_put_folio() and hpage_pincount_sub()
to folio_pincount_sub().  This removes the last call to put_page_refs(),
so delete it.  Add a temporary put_compound_head() wrapper which will
be deleted by the end of this series.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/gup.c | 42 ++++++++++++++----------------------------
 1 file changed, 14 insertions(+), 28 deletions(-)
diff --git a/mm/gup.c b/mm/gup.c
index 9e581201d679..719252fa0402 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -45,34 +45,15 @@ static void folio_pincount_add(struct folio *folio, int refs)
 		folio_ref_add(folio, refs * (GUP_PIN_COUNTING_BIAS - 1));
 }
 
-static int page_pincount_sub(struct page *page, int refs)
+static int folio_pincount_sub(struct folio *folio, int refs)
 {
-	VM_BUG_ON_PAGE(page != compound_head(page), page);
-
-	if (PageHead(page))
-		atomic_sub(refs, compound_pincount_ptr(page));
+	if (folio_test_large(folio))
+		atomic_sub(refs, folio_pincount_ptr(folio));
 	else
 		refs *= GUP_PIN_COUNTING_BIAS;
 	return refs;
 }
 
-/* Equivalent to calling put_page() @refs times. */
-static void put_page_refs(struct page *page, int refs)
-{
-#ifdef CONFIG_DEBUG_VM
-	if (VM_WARN_ON_ONCE_PAGE(page_ref_count(page) < refs, page))
-		return;
-#endif
-
-	/*
-	 * Calling put_page() for each ref is unnecessarily slow. Only the last
-	 * ref needs a put_page().
-	 */
-	if (refs > 1)
-		page_ref_sub(page, refs - 1);
-	put_page(page);
-}
-
 /*
  * Return the folio with ref appropriately incremented,
  * or NULL if that failed.
@@ -171,15 +152,20 @@ struct page *try_grab_compound_head(struct page *page,
 	return &try_grab_folio(page, refs, flags)->page;
 }
 
-static void put_compound_head(struct page *page, int refs, unsigned int flags)
+static void gup_put_folio(struct folio *folio, int refs, unsigned int flags)
 {
 	if (flags & FOLL_PIN) {
-		mod_node_page_state(page_pgdat(page), NR_FOLL_PIN_RELEASED,
-				    refs);
-		refs = page_pincount_sub(page, refs);
+		node_stat_mod_folio(folio, NR_FOLL_PIN_RELEASED, refs);
+		refs = folio_pincount_sub(folio, refs);
 	}
 
-	put_page_refs(page, refs);
+	folio_put_refs(folio, refs);
+}
+
+static void put_compound_head(struct page *page, int refs, unsigned int flags)
+{
+	VM_BUG_ON_PAGE(PageTail(page), page);
+	gup_put_folio((struct folio *)page, refs, flags);
 }
 
 /**
@@ -220,7 +206,7 @@ bool __must_check try_grab_page(struct page *page, unsigned int flags)
  */
 void unpin_user_page(struct page *page)
 {
-	put_compound_head(compound_head(page), 1, FOLL_PIN);
+	gup_put_folio(page_folio(page), 1, FOLL_PIN);
 }
 EXPORT_SYMBOL(unpin_user_page);
 
-- 
2.33.0
^ permalink raw reply related	[flat|nested] 96+ messages in thread
* [PATCH v2 18/28] hugetlb: Use try_grab_folio() instead of try_grab_compound_head()
  2022-01-10  4:23 [PATCH v2 00/28] Convert GUP to folios Matthew Wilcox (Oracle)
                   ` (16 preceding siblings ...)
  2022-01-10  4:23 ` [PATCH v2 17/28] gup: Add gup_put_folio() Matthew Wilcox (Oracle)
@ 2022-01-10  4:23 ` Matthew Wilcox (Oracle)
  2022-01-10  8:36   ` Christoph Hellwig
  2022-01-11  6:47   ` John Hubbard
  2022-01-10  4:23 ` [PATCH v2 19/28] gup: Convert try_grab_page() to call try_grab_folio() Matthew Wilcox (Oracle)
                   ` (11 subsequent siblings)
  29 siblings, 2 replies; 96+ messages in thread
From: Matthew Wilcox (Oracle) @ 2022-01-10  4:23 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox (Oracle), John Hubbard, Christoph Hellwig,
	William Kucharski, linux-kernel, Jason Gunthorpe
follow_hugetlb_page() only cares about success or failure, so it doesn't
need to know the type of the returned pointer, only whether it's NULL
or not.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/mm.h | 3 ---
 mm/gup.c           | 2 +-
 mm/hugetlb.c       | 7 +++----
 3 files changed, 4 insertions(+), 8 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index b249156f7cf1..c103c6401ecd 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1195,9 +1195,6 @@ static inline void get_page(struct page *page)
 }
 
 bool __must_check try_grab_page(struct page *page, unsigned int flags);
-struct page *try_grab_compound_head(struct page *page, int refs,
-				    unsigned int flags);
-
 
 static inline __must_check bool try_get_page(struct page *page)
 {
diff --git a/mm/gup.c b/mm/gup.c
index 719252fa0402..20703de2f107 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -146,7 +146,7 @@ struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags)
 	return NULL;
 }
 
-struct page *try_grab_compound_head(struct page *page,
+static inline struct page *try_grab_compound_head(struct page *page,
 		int refs, unsigned int flags)
 {
 	return &try_grab_folio(page, refs, flags)->page;
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index abcd1785c629..ab67b13c4a71 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6072,7 +6072,7 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 
 		if (pages) {
 			/*
-			 * try_grab_compound_head() should always succeed here,
+			 * try_grab_folio() should always succeed here,
 			 * because: a) we hold the ptl lock, and b) we've just
 			 * checked that the huge page is present in the page
 			 * tables. If the huge page is present, then the tail
@@ -6081,9 +6081,8 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 			 * any way. So this page must be available at this
 			 * point, unless the page refcount overflowed:
 			 */
-			if (WARN_ON_ONCE(!try_grab_compound_head(pages[i],
-								 refs,
-								 flags))) {
+			if (WARN_ON_ONCE(!try_grab_folio(pages[i], refs,
+							 flags))) {
 				spin_unlock(ptl);
 				remainder = 0;
 				err = -ENOMEM;
-- 
2.33.0
^ permalink raw reply related	[flat|nested] 96+ messages in thread
* [PATCH v2 19/28] gup: Convert try_grab_page() to call try_grab_folio()
  2022-01-10  4:23 [PATCH v2 00/28] Convert GUP to folios Matthew Wilcox (Oracle)
                   ` (17 preceding siblings ...)
  2022-01-10  4:23 ` [PATCH v2 18/28] hugetlb: Use try_grab_folio() instead of try_grab_compound_head() Matthew Wilcox (Oracle)
@ 2022-01-10  4:23 ` Matthew Wilcox (Oracle)
  2022-01-10  8:36   ` Christoph Hellwig
  2022-01-11  7:01   ` John Hubbard
  2022-01-10  4:23 ` [PATCH v2 20/28] gup: Convert gup_pte_range() to use a folio Matthew Wilcox (Oracle)
                   ` (10 subsequent siblings)
  29 siblings, 2 replies; 96+ messages in thread
From: Matthew Wilcox (Oracle) @ 2022-01-10  4:23 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox (Oracle), John Hubbard, Christoph Hellwig,
	William Kucharski, linux-kernel, Jason Gunthorpe
try_grab_page() only cares about success or failure, not about the
type returned.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/gup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/gup.c b/mm/gup.c
index 20703de2f107..c3e514172eaf 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -192,7 +192,7 @@ bool __must_check try_grab_page(struct page *page, unsigned int flags)
 	if (!(flags & (FOLL_GET | FOLL_PIN)))
 		return true;
 
-	return try_grab_compound_head(page, 1, flags);
+	return try_grab_folio(page, 1, flags);
 }
 
 /**
-- 
2.33.0
^ permalink raw reply related	[flat|nested] 96+ messages in thread
* [PATCH v2 20/28] gup: Convert gup_pte_range() to use a folio
  2022-01-10  4:23 [PATCH v2 00/28] Convert GUP to folios Matthew Wilcox (Oracle)
                   ` (18 preceding siblings ...)
  2022-01-10  4:23 ` [PATCH v2 19/28] gup: Convert try_grab_page() to call try_grab_folio() Matthew Wilcox (Oracle)
@ 2022-01-10  4:23 ` Matthew Wilcox (Oracle)
  2022-01-10  8:37   ` Christoph Hellwig
  2022-01-11  7:06   ` John Hubbard
  2022-01-10  4:23 ` [PATCH v2 21/28] gup: Convert gup_hugepte() " Matthew Wilcox (Oracle)
                   ` (9 subsequent siblings)
  29 siblings, 2 replies; 96+ messages in thread
From: Matthew Wilcox (Oracle) @ 2022-01-10  4:23 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox (Oracle), John Hubbard, Christoph Hellwig,
	William Kucharski, linux-kernel, Jason Gunthorpe
We still call try_grab_folio() once per PTE; a future patch could
optimise to just adjust the reference count for each page within
the folio.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/gup.c | 16 +++++++---------
 1 file changed, 7 insertions(+), 9 deletions(-)
diff --git a/mm/gup.c b/mm/gup.c
index c3e514172eaf..27cc097ec05d 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2235,7 +2235,8 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
 	ptem = ptep = pte_offset_map(&pmd, addr);
 	do {
 		pte_t pte = ptep_get_lockless(ptep);
-		struct page *head, *page;
+		struct page *page;
+		struct folio *folio;
 
 		/*
 		 * Similar to the PMD case below, NUMA hinting must take slow
@@ -2262,22 +2263,20 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
 		VM_BUG_ON(!pfn_valid(pte_pfn(pte)));
 		page = pte_page(pte);
 
-		head = try_grab_compound_head(page, 1, flags);
-		if (!head)
+		folio = try_grab_folio(page, 1, flags);
+		if (!folio)
 			goto pte_unmap;
 
 		if (unlikely(page_is_secretmem(page))) {
-			put_compound_head(head, 1, flags);
+			gup_put_folio(folio, 1, flags);
 			goto pte_unmap;
 		}
 
 		if (unlikely(pte_val(pte) != pte_val(*ptep))) {
-			put_compound_head(head, 1, flags);
+			gup_put_folio(folio, 1, flags);
 			goto pte_unmap;
 		}
 
-		VM_BUG_ON_PAGE(compound_head(page) != head, page);
-
 		/*
 		 * We need to make the page accessible if and only if we are
 		 * going to access its content (the FOLL_PIN case).  Please
@@ -2291,10 +2290,9 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
 				goto pte_unmap;
 			}
 		}
-		SetPageReferenced(page);
+		folio_set_referenced(folio);
 		pages[*nr] = page;
 		(*nr)++;
-
 	} while (ptep++, addr += PAGE_SIZE, addr != end);
 
 	ret = 1;
-- 
2.33.0
^ permalink raw reply related	[flat|nested] 96+ messages in thread
* [PATCH v2 21/28] gup: Convert gup_hugepte() to use a folio
  2022-01-10  4:23 [PATCH v2 00/28] Convert GUP to folios Matthew Wilcox (Oracle)
                   ` (19 preceding siblings ...)
  2022-01-10  4:23 ` [PATCH v2 20/28] gup: Convert gup_pte_range() to use a folio Matthew Wilcox (Oracle)
@ 2022-01-10  4:23 ` Matthew Wilcox (Oracle)
  2022-01-10  8:37   ` Christoph Hellwig
  2022-01-11  7:33   ` John Hubbard
  2022-01-10  4:24 ` [PATCH v2 22/28] gup: Convert gup_huge_pmd() " Matthew Wilcox (Oracle)
                   ` (8 subsequent siblings)
  29 siblings, 2 replies; 96+ messages in thread
From: Matthew Wilcox (Oracle) @ 2022-01-10  4:23 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox (Oracle), John Hubbard, Christoph Hellwig,
	William Kucharski, linux-kernel, Jason Gunthorpe
There should be little to no effect from this patch; just removing
uses of some old APIs.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/gup.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/mm/gup.c b/mm/gup.c
index 27cc097ec05d..250326458df6 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2428,7 +2428,8 @@ static int gup_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr,
 		       struct page **pages, int *nr)
 {
 	unsigned long pte_end;
-	struct page *head, *page;
+	struct page *page;
+	struct folio *folio;
 	pte_t pte;
 	int refs;
 
@@ -2444,21 +2445,20 @@ static int gup_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr,
 	/* hugepages are never "special" */
 	VM_BUG_ON(!pfn_valid(pte_pfn(pte)));
 
-	head = pte_page(pte);
-	page = nth_page(head, (addr & (sz-1)) >> PAGE_SHIFT);
+	page = nth_page(pte_page(pte), (addr & (sz - 1)) >> PAGE_SHIFT);
 	refs = record_subpages(page, addr, end, pages + *nr);
 
-	head = try_grab_compound_head(head, refs, flags);
-	if (!head)
+	folio = try_grab_folio(page, refs, flags);
+	if (!folio)
 		return 0;
 
 	if (unlikely(pte_val(pte) != pte_val(*ptep))) {
-		put_compound_head(head, refs, flags);
+		gup_put_folio(folio, refs, flags);
 		return 0;
 	}
 
 	*nr += refs;
-	SetPageReferenced(head);
+	folio_set_referenced(folio);
 	return 1;
 }
 
-- 
2.33.0
^ permalink raw reply related	[flat|nested] 96+ messages in thread
* [PATCH v2 22/28] gup: Convert gup_huge_pmd() to use a folio
  2022-01-10  4:23 [PATCH v2 00/28] Convert GUP to folios Matthew Wilcox (Oracle)
                   ` (20 preceding siblings ...)
  2022-01-10  4:23 ` [PATCH v2 21/28] gup: Convert gup_hugepte() " Matthew Wilcox (Oracle)
@ 2022-01-10  4:24 ` Matthew Wilcox (Oracle)
  2022-01-10  8:37   ` Christoph Hellwig
  2022-01-11  7:36   ` John Hubbard
  2022-01-10  4:24 ` [PATCH v2 23/28] gup: Convert gup_huge_pud() " Matthew Wilcox (Oracle)
                   ` (7 subsequent siblings)
  29 siblings, 2 replies; 96+ messages in thread
From: Matthew Wilcox (Oracle) @ 2022-01-10  4:24 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox (Oracle), John Hubbard, Christoph Hellwig,
	William Kucharski, linux-kernel, Jason Gunthorpe
Use the new folio-based APIs.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/gup.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/mm/gup.c b/mm/gup.c
index 250326458df6..a006bce2d47b 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2492,7 +2492,8 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr,
 			unsigned long end, unsigned int flags,
 			struct page **pages, int *nr)
 {
-	struct page *head, *page;
+	struct page *page;
+	struct folio *folio;
 	int refs;
 
 	if (!pmd_access_permitted(orig, flags & FOLL_WRITE))
@@ -2508,17 +2509,17 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr,
 	page = nth_page(pmd_page(orig), (addr & ~PMD_MASK) >> PAGE_SHIFT);
 	refs = record_subpages(page, addr, end, pages + *nr);
 
-	head = try_grab_compound_head(pmd_page(orig), refs, flags);
-	if (!head)
+	folio = try_grab_folio(page, refs, flags);
+	if (!folio)
 		return 0;
 
 	if (unlikely(pmd_val(orig) != pmd_val(*pmdp))) {
-		put_compound_head(head, refs, flags);
+		gup_put_folio(folio, refs, flags);
 		return 0;
 	}
 
 	*nr += refs;
-	SetPageReferenced(head);
+	folio_set_referenced(folio);
 	return 1;
 }
 
-- 
2.33.0
^ permalink raw reply related	[flat|nested] 96+ messages in thread
* [PATCH v2 23/28] gup: Convert gup_huge_pud() to use a folio
  2022-01-10  4:23 [PATCH v2 00/28] Convert GUP to folios Matthew Wilcox (Oracle)
                   ` (21 preceding siblings ...)
  2022-01-10  4:24 ` [PATCH v2 22/28] gup: Convert gup_huge_pmd() " Matthew Wilcox (Oracle)
@ 2022-01-10  4:24 ` Matthew Wilcox (Oracle)
  2022-01-10  8:38   ` Christoph Hellwig
  2022-01-11  7:38   ` John Hubbard
  2022-01-10  4:24 ` [PATCH v2 24/28] gup: Convert gup_huge_pgd() " Matthew Wilcox (Oracle)
                   ` (6 subsequent siblings)
  29 siblings, 2 replies; 96+ messages in thread
From: Matthew Wilcox (Oracle) @ 2022-01-10  4:24 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox (Oracle), John Hubbard, Christoph Hellwig,
	William Kucharski, linux-kernel, Jason Gunthorpe
Use the new folio-based APIs.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/gup.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/mm/gup.c b/mm/gup.c
index a006bce2d47b..7b7bf8361558 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2527,7 +2527,8 @@ static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr,
 			unsigned long end, unsigned int flags,
 			struct page **pages, int *nr)
 {
-	struct page *head, *page;
+	struct page *page;
+	struct folio *folio;
 	int refs;
 
 	if (!pud_access_permitted(orig, flags & FOLL_WRITE))
@@ -2543,17 +2544,17 @@ static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr,
 	page = nth_page(pud_page(orig), (addr & ~PUD_MASK) >> PAGE_SHIFT);
 	refs = record_subpages(page, addr, end, pages + *nr);
 
-	head = try_grab_compound_head(pud_page(orig), refs, flags);
-	if (!head)
+	folio = try_grab_folio(page, refs, flags);
+	if (!folio)
 		return 0;
 
 	if (unlikely(pud_val(orig) != pud_val(*pudp))) {
-		put_compound_head(head, refs, flags);
+		gup_put_folio(folio, refs, flags);
 		return 0;
 	}
 
 	*nr += refs;
-	SetPageReferenced(head);
+	folio_set_referenced(folio);
 	return 1;
 }
 
-- 
2.33.0
^ permalink raw reply related	[flat|nested] 96+ messages in thread
* [PATCH v2 24/28] gup: Convert gup_huge_pgd() to use a folio
  2022-01-10  4:23 [PATCH v2 00/28] Convert GUP to folios Matthew Wilcox (Oracle)
                   ` (22 preceding siblings ...)
  2022-01-10  4:24 ` [PATCH v2 23/28] gup: Convert gup_huge_pud() " Matthew Wilcox (Oracle)
@ 2022-01-10  4:24 ` Matthew Wilcox (Oracle)
  2022-01-10  8:38   ` Christoph Hellwig
  2022-01-11  7:38   ` John Hubbard
  2022-01-10  4:24 ` [PATCH v2 25/28] gup: Convert compound_next() to gup_folio_next() Matthew Wilcox (Oracle)
                   ` (5 subsequent siblings)
  29 siblings, 2 replies; 96+ messages in thread
From: Matthew Wilcox (Oracle) @ 2022-01-10  4:24 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox (Oracle), John Hubbard, Christoph Hellwig,
	William Kucharski, linux-kernel, Jason Gunthorpe
Use the new folio-based APIs.  This was the last user of
try_grab_compound_head(), so remove it.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/gup.c | 17 ++++++-----------
 1 file changed, 6 insertions(+), 11 deletions(-)
diff --git a/mm/gup.c b/mm/gup.c
index 7b7bf8361558..b5786e83c418 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -146,12 +146,6 @@ struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags)
 	return NULL;
 }
 
-static inline struct page *try_grab_compound_head(struct page *page,
-		int refs, unsigned int flags)
-{
-	return &try_grab_folio(page, refs, flags)->page;
-}
-
 static void gup_put_folio(struct folio *folio, int refs, unsigned int flags)
 {
 	if (flags & FOLL_PIN) {
@@ -2563,7 +2557,8 @@ static int gup_huge_pgd(pgd_t orig, pgd_t *pgdp, unsigned long addr,
 			struct page **pages, int *nr)
 {
 	int refs;
-	struct page *head, *page;
+	struct page *page;
+	struct folio *folio;
 
 	if (!pgd_access_permitted(orig, flags & FOLL_WRITE))
 		return 0;
@@ -2573,17 +2568,17 @@ static int gup_huge_pgd(pgd_t orig, pgd_t *pgdp, unsigned long addr,
 	page = nth_page(pgd_page(orig), (addr & ~PGDIR_MASK) >> PAGE_SHIFT);
 	refs = record_subpages(page, addr, end, pages + *nr);
 
-	head = try_grab_compound_head(pgd_page(orig), refs, flags);
-	if (!head)
+	folio = try_grab_folio(page, refs, flags);
+	if (!folio)
 		return 0;
 
 	if (unlikely(pgd_val(orig) != pgd_val(*pgdp))) {
-		put_compound_head(head, refs, flags);
+		gup_put_folio(folio, refs, flags);
 		return 0;
 	}
 
 	*nr += refs;
-	SetPageReferenced(head);
+	folio_set_referenced(folio);
 	return 1;
 }
 
-- 
2.33.0
^ permalink raw reply related	[flat|nested] 96+ messages in thread
* [PATCH v2 25/28] gup: Convert compound_next() to gup_folio_next()
  2022-01-10  4:23 [PATCH v2 00/28] Convert GUP to folios Matthew Wilcox (Oracle)
                   ` (23 preceding siblings ...)
  2022-01-10  4:24 ` [PATCH v2 24/28] gup: Convert gup_huge_pgd() " Matthew Wilcox (Oracle)
@ 2022-01-10  4:24 ` Matthew Wilcox (Oracle)
  2022-01-10  8:39   ` Christoph Hellwig
  2022-01-11  7:41   ` John Hubbard
  2022-01-10  4:24 ` [PATCH v2 26/28] gup: Convert compound_range_next() to gup_folio_range_next() Matthew Wilcox (Oracle)
                   ` (4 subsequent siblings)
  29 siblings, 2 replies; 96+ messages in thread
From: Matthew Wilcox (Oracle) @ 2022-01-10  4:24 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox (Oracle), John Hubbard, Christoph Hellwig,
	William Kucharski, linux-kernel, Jason Gunthorpe
Convert both callers to work on folios instead of pages.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/gup.c | 41 ++++++++++++++++++++++-------------------
 1 file changed, 22 insertions(+), 19 deletions(-)
diff --git a/mm/gup.c b/mm/gup.c
index b5786e83c418..0cf2d5fd8d2d 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -220,20 +220,20 @@ static inline struct page *compound_range_next(unsigned long i,
 	return page;
 }
 
-static inline struct page *compound_next(unsigned long i,
+static inline struct folio *gup_folio_next(unsigned long i,
 		unsigned long npages, struct page **list, unsigned int *ntails)
 {
-	struct page *page;
+	struct folio *folio;
 	unsigned int nr;
 
-	page = compound_head(list[i]);
+	folio = page_folio(list[i]);
 	for (nr = i + 1; nr < npages; nr++) {
-		if (compound_head(list[nr]) != page)
+		if (page_folio(list[nr]) != folio)
 			break;
 	}
 
 	*ntails = nr - i;
-	return page;
+	return folio;
 }
 
 /**
@@ -261,17 +261,17 @@ static inline struct page *compound_next(unsigned long i,
 void unpin_user_pages_dirty_lock(struct page **pages, unsigned long npages,
 				 bool make_dirty)
 {
-	unsigned long index;
-	struct page *head;
-	unsigned int ntails;
+	unsigned long i;
+	struct folio *folio;
+	unsigned int nr;
 
 	if (!make_dirty) {
 		unpin_user_pages(pages, npages);
 		return;
 	}
 
-	for (index = 0; index < npages; index += ntails) {
-		head = compound_next(index, npages, pages, &ntails);
+	for (i = 0; i < npages; i += nr) {
+		folio = gup_folio_next(i, npages, pages, &nr);
 		/*
 		 * Checking PageDirty at this point may race with
 		 * clear_page_dirty_for_io(), but that's OK. Two key
@@ -292,9 +292,12 @@ void unpin_user_pages_dirty_lock(struct page **pages, unsigned long npages,
 		 * written back, so it gets written back again in the
 		 * next writeback cycle. This is harmless.
 		 */
-		if (!PageDirty(head))
-			set_page_dirty_lock(head);
-		put_compound_head(head, ntails, FOLL_PIN);
+		if (!folio_test_dirty(folio)) {
+			folio_lock(folio);
+			folio_mark_dirty(folio);
+			folio_unlock(folio);
+		}
+		gup_put_folio(folio, nr, FOLL_PIN);
 	}
 }
 EXPORT_SYMBOL(unpin_user_pages_dirty_lock);
@@ -347,9 +350,9 @@ EXPORT_SYMBOL(unpin_user_page_range_dirty_lock);
  */
 void unpin_user_pages(struct page **pages, unsigned long npages)
 {
-	unsigned long index;
-	struct page *head;
-	unsigned int ntails;
+	unsigned long i;
+	struct folio *folio;
+	unsigned int nr;
 
 	/*
 	 * If this WARN_ON() fires, then the system *might* be leaking pages (by
@@ -359,9 +362,9 @@ void unpin_user_pages(struct page **pages, unsigned long npages)
 	if (WARN_ON(IS_ERR_VALUE(npages)))
 		return;
 
-	for (index = 0; index < npages; index += ntails) {
-		head = compound_next(index, npages, pages, &ntails);
-		put_compound_head(head, ntails, FOLL_PIN);
+	for (i = 0; i < npages; i += nr) {
+		folio = gup_folio_next(i, npages, pages, &nr);
+		gup_put_folio(folio, nr, FOLL_PIN);
 	}
 }
 EXPORT_SYMBOL(unpin_user_pages);
-- 
2.33.0
^ permalink raw reply related	[flat|nested] 96+ messages in thread
* [PATCH v2 26/28] gup: Convert compound_range_next() to gup_folio_range_next()
  2022-01-10  4:23 [PATCH v2 00/28] Convert GUP to folios Matthew Wilcox (Oracle)
                   ` (24 preceding siblings ...)
  2022-01-10  4:24 ` [PATCH v2 25/28] gup: Convert compound_next() to gup_folio_next() Matthew Wilcox (Oracle)
@ 2022-01-10  4:24 ` Matthew Wilcox (Oracle)
  2022-01-10  8:41   ` Christoph Hellwig
  2022-01-11  7:44   ` John Hubbard
  2022-01-10  4:24 ` [PATCH v2 27/28] mm: Add isolate_lru_folio() Matthew Wilcox (Oracle)
                   ` (3 subsequent siblings)
  29 siblings, 2 replies; 96+ messages in thread
From: Matthew Wilcox (Oracle) @ 2022-01-10  4:24 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox (Oracle), John Hubbard, Christoph Hellwig,
	William Kucharski, linux-kernel, Jason Gunthorpe
Convert the only caller to work on folios instead of pages.
This removes the last caller of put_compound_head(), so delete it.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/mm.h |  4 ++--
 mm/gup.c           | 38 ++++++++++++++++++--------------------
 2 files changed, 20 insertions(+), 22 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index c103c6401ecd..1ddb0a55b5ca 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -216,10 +216,10 @@ int overcommit_policy_handler(struct ctl_table *, int, void *, size_t *,
 
 #if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
 #define nth_page(page,n) pfn_to_page(page_to_pfn((page)) + (n))
-#define page_nth(head, tail)	(page_to_pfn(tail) - page_to_pfn(head))
+#define folio_nth(folio, page)	(page_to_pfn(page) - folio_pfn(folio))
 #else
 #define nth_page(page,n) ((page) + (n))
-#define page_nth(head, tail)	((tail) - (head))
+#define folio_nth(folio, tail)	((tail) - &(folio)->page)
 #endif
 
 /* to align the pointer to the (next) page boundary */
diff --git a/mm/gup.c b/mm/gup.c
index 0cf2d5fd8d2d..1cdd5f2887a8 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -156,12 +156,6 @@ static void gup_put_folio(struct folio *folio, int refs, unsigned int flags)
 	folio_put_refs(folio, refs);
 }
 
-static void put_compound_head(struct page *page, int refs, unsigned int flags)
-{
-	VM_BUG_ON_PAGE(PageTail(page), page);
-	gup_put_folio((struct folio *)page, refs, flags);
-}
-
 /**
  * try_grab_page() - elevate a page's refcount by a flag-dependent amount
  *
@@ -204,20 +198,21 @@ void unpin_user_page(struct page *page)
 }
 EXPORT_SYMBOL(unpin_user_page);
 
-static inline struct page *compound_range_next(unsigned long i,
+static inline struct folio *gup_folio_range_next(unsigned long i,
 		unsigned long npages, struct page *start, unsigned int *ntails)
 {
-	struct page *next, *page;
+	struct page *next;
+	struct folio *folio;
 	unsigned int nr = 1;
 
 	next = nth_page(start, i);
-	page = compound_head(next);
-	if (PageHead(page))
+	folio = page_folio(next);
+	if (folio_test_large(folio))
 		nr = min_t(unsigned int, npages - i,
-			   compound_nr(page) - page_nth(page, next));
+			   folio_nr_pages(folio) - folio_nth(folio, next));
 
 	*ntails = nr;
-	return page;
+	return folio;
 }
 
 static inline struct folio *gup_folio_next(unsigned long i,
@@ -326,15 +321,18 @@ EXPORT_SYMBOL(unpin_user_pages_dirty_lock);
 void unpin_user_page_range_dirty_lock(struct page *page, unsigned long npages,
 				      bool make_dirty)
 {
-	unsigned long index;
-	struct page *head;
-	unsigned int ntails;
+	unsigned long i;
+	struct folio *folio;
+	unsigned int nr;
 
-	for (index = 0; index < npages; index += ntails) {
-		head = compound_range_next(index, npages, page, &ntails);
-		if (make_dirty && !PageDirty(head))
-			set_page_dirty_lock(head);
-		put_compound_head(head, ntails, FOLL_PIN);
+	for (i = 0; i < npages; i += nr) {
+		folio = gup_folio_range_next(i, npages, page, &nr);
+		if (make_dirty && !folio_test_dirty(folio)) {
+			folio_lock(folio);
+			folio_mark_dirty(folio);
+			folio_unlock(folio);
+		}
+		gup_put_folio(folio, nr, FOLL_PIN);
 	}
 }
 EXPORT_SYMBOL(unpin_user_page_range_dirty_lock);
-- 
2.33.0
^ permalink raw reply related	[flat|nested] 96+ messages in thread
* [PATCH v2 27/28] mm: Add isolate_lru_folio()
  2022-01-10  4:23 [PATCH v2 00/28] Convert GUP to folios Matthew Wilcox (Oracle)
                   ` (25 preceding siblings ...)
  2022-01-10  4:24 ` [PATCH v2 26/28] gup: Convert compound_range_next() to gup_folio_range_next() Matthew Wilcox (Oracle)
@ 2022-01-10  4:24 ` Matthew Wilcox (Oracle)
  2022-01-10  8:42   ` Christoph Hellwig
  2022-01-11  7:49   ` John Hubbard
  2022-01-10  4:24 ` [PATCH v2 28/28] gup: Convert check_and_migrate_movable_pages() to use a folio Matthew Wilcox (Oracle)
                   ` (2 subsequent siblings)
  29 siblings, 2 replies; 96+ messages in thread
From: Matthew Wilcox (Oracle) @ 2022-01-10  4:24 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox (Oracle), John Hubbard, Christoph Hellwig,
	William Kucharski, linux-kernel, Jason Gunthorpe
Turn isolate_lru_page() into a wrapper around isolate_lru_folio().
TestClearPageLRU() would have always failed on a tail page, so
returning -EBUSY is the same behaviour.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 arch/powerpc/include/asm/mmu_context.h |  1 -
 mm/folio-compat.c                      |  8 +++++
 mm/internal.h                          |  3 +-
 mm/vmscan.c                            | 43 ++++++++++++--------------
 4 files changed, 29 insertions(+), 26 deletions(-)
diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
index 9ba6b585337f..b9cab0a11421 100644
--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -21,7 +21,6 @@ extern void destroy_context(struct mm_struct *mm);
 #ifdef CONFIG_SPAPR_TCE_IOMMU
 struct mm_iommu_table_group_mem_t;
 
-extern int isolate_lru_page(struct page *page);	/* from internal.h */
 extern bool mm_iommu_preregistered(struct mm_struct *mm);
 extern long mm_iommu_new(struct mm_struct *mm,
 		unsigned long ua, unsigned long entries,
diff --git a/mm/folio-compat.c b/mm/folio-compat.c
index 749555a232a8..782e766cd1ee 100644
--- a/mm/folio-compat.c
+++ b/mm/folio-compat.c
@@ -7,6 +7,7 @@
 #include <linux/migrate.h>
 #include <linux/pagemap.h>
 #include <linux/swap.h>
+#include "internal.h"
 
 struct address_space *page_mapping(struct page *page)
 {
@@ -151,3 +152,10 @@ int try_to_release_page(struct page *page, gfp_t gfp)
 	return filemap_release_folio(page_folio(page), gfp);
 }
 EXPORT_SYMBOL(try_to_release_page);
+
+int isolate_lru_page(struct page *page)
+{
+	if (WARN_RATELIMIT(PageTail(page), "trying to isolate tail page"))
+		return -EBUSY;
+	return isolate_lru_folio((struct folio *)page);
+}
diff --git a/mm/internal.h b/mm/internal.h
index 9a72d1ecdab4..8b90db90e7f2 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -157,7 +157,8 @@ extern unsigned long highest_memmap_pfn;
 /*
  * in mm/vmscan.c:
  */
-extern int isolate_lru_page(struct page *page);
+int isolate_lru_page(struct page *page);
+int isolate_lru_folio(struct folio *folio);
 extern void putback_lru_page(struct page *page);
 extern void reclaim_throttle(pg_data_t *pgdat, enum vmscan_throttle_state reason);
 
diff --git a/mm/vmscan.c b/mm/vmscan.c
index fb9584641ac7..ac2f5b76cdb2 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2168,45 +2168,40 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
 }
 
 /**
- * isolate_lru_page - tries to isolate a page from its LRU list
- * @page: page to isolate from its LRU list
+ * isolate_lru_folio - Try to isolate a folio from its LRU list.
+ * @folio: Folio to isolate from its LRU list.
  *
- * Isolates a @page from an LRU list, clears PageLRU and adjusts the
- * vmstat statistic corresponding to whatever LRU list the page was on.
+ * Isolate a @folio from an LRU list and adjust the vmstat statistic
+ * corresponding to whatever LRU list the folio was on.
  *
- * Returns 0 if the page was removed from an LRU list.
- * Returns -EBUSY if the page was not on an LRU list.
- *
- * The returned page will have PageLRU() cleared.  If it was found on
- * the active list, it will have PageActive set.  If it was found on
- * the unevictable list, it will have the PageUnevictable bit set. That flag
+ * The folio will have its LRU flag cleared.  If it was found on the
+ * active list, it will have the Active flag set.  If it was found on the
+ * unevictable list, it will have the Unevictable flag set.  These flags
  * may need to be cleared by the caller before letting the page go.
  *
- * The vmstat statistic corresponding to the list on which the page was
- * found will be decremented.
- *
- * Restrictions:
+ * Context:
  *
  * (1) Must be called with an elevated refcount on the page. This is a
- *     fundamental difference from isolate_lru_pages (which is called
+ *     fundamental difference from isolate_lru_pages() (which is called
  *     without a stable reference).
- * (2) the lru_lock must not be held.
- * (3) interrupts must be enabled.
+ * (2) The lru_lock must not be held.
+ * (3) Interrupts must be enabled.
+ *
+ * Return: 0 if the folio was removed from an LRU list.
+ * -EBUSY if the folio was not on an LRU list.
  */
-int isolate_lru_page(struct page *page)
+int isolate_lru_folio(struct folio *folio)
 {
-	struct folio *folio = page_folio(page);
 	int ret = -EBUSY;
 
-	VM_BUG_ON_PAGE(!page_count(page), page);
-	WARN_RATELIMIT(PageTail(page), "trying to isolate tail page");
+	VM_BUG_ON_FOLIO(!folio_ref_count(folio), folio);
 
-	if (TestClearPageLRU(page)) {
+	if (folio_test_clear_lru(folio)) {
 		struct lruvec *lruvec;
 
-		get_page(page);
+		folio_get(folio);
 		lruvec = folio_lruvec_lock_irq(folio);
-		del_page_from_lru_list(page, lruvec);
+		lruvec_del_folio(lruvec, folio);
 		unlock_page_lruvec_irq(lruvec);
 		ret = 0;
 	}
-- 
2.33.0
^ permalink raw reply related	[flat|nested] 96+ messages in thread
* [PATCH v2 28/28] gup: Convert check_and_migrate_movable_pages() to use a folio
  2022-01-10  4:23 [PATCH v2 00/28] Convert GUP to folios Matthew Wilcox (Oracle)
                   ` (26 preceding siblings ...)
  2022-01-10  4:24 ` [PATCH v2 27/28] mm: Add isolate_lru_folio() Matthew Wilcox (Oracle)
@ 2022-01-10  4:24 ` Matthew Wilcox (Oracle)
  2022-01-10  8:42   ` Christoph Hellwig
  2022-01-11  7:52   ` John Hubbard
  2022-01-10 15:31 ` [PATCH v2 00/28] Convert GUP to folios Jason Gunthorpe
  2022-01-10 17:26 ` William Kucharski
  29 siblings, 2 replies; 96+ messages in thread
From: Matthew Wilcox (Oracle) @ 2022-01-10  4:24 UTC (permalink / raw)
  To: linux-mm
  Cc: Matthew Wilcox (Oracle), John Hubbard, Christoph Hellwig,
	William Kucharski, linux-kernel, Jason Gunthorpe
Switch from head pages to folios.  This removes an assumption that
THPs are the only way to have a high-order page.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/gup.c | 28 ++++++++++++++--------------
 1 file changed, 14 insertions(+), 14 deletions(-)
diff --git a/mm/gup.c b/mm/gup.c
index 1cdd5f2887a8..b2d109626c44 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1801,41 +1801,41 @@ static long check_and_migrate_movable_pages(unsigned long nr_pages,
 	bool drain_allow = true;
 	LIST_HEAD(movable_page_list);
 	long ret = 0;
-	struct page *prev_head = NULL;
-	struct page *head;
+	struct folio *folio, *prev_folio = NULL;
 	struct migration_target_control mtc = {
 		.nid = NUMA_NO_NODE,
 		.gfp_mask = GFP_USER | __GFP_NOWARN,
 	};
 
 	for (i = 0; i < nr_pages; i++) {
-		head = compound_head(pages[i]);
-		if (head == prev_head)
+		folio = page_folio(pages[i]);
+		if (folio == prev_folio)
 			continue;
-		prev_head = head;
+		prev_folio = folio;
 		/*
 		 * If we get a movable page, since we are going to be pinning
 		 * these entries, try to move them out if possible.
 		 */
-		if (!is_pinnable_page(head)) {
-			if (PageHuge(head)) {
-				if (!isolate_huge_page(head, &movable_page_list))
+		if (!is_pinnable_page(&folio->page)) {
+			if (folio_test_hugetlb(folio)) {
+				if (!isolate_huge_page(&folio->page,
+							&movable_page_list))
 					isolation_error_count++;
 			} else {
-				if (!PageLRU(head) && drain_allow) {
+				if (!folio_test_lru(folio) && drain_allow) {
 					lru_add_drain_all();
 					drain_allow = false;
 				}
 
-				if (isolate_lru_page(head)) {
+				if (isolate_lru_folio(folio)) {
 					isolation_error_count++;
 					continue;
 				}
-				list_add_tail(&head->lru, &movable_page_list);
-				mod_node_page_state(page_pgdat(head),
+				list_add_tail(&folio->lru, &movable_page_list);
+				node_stat_mod_folio(folio,
 						    NR_ISOLATED_ANON +
-						    page_is_file_lru(head),
-						    thp_nr_pages(head));
+						    folio_is_file_lru(folio),
+						    folio_nr_pages(folio));
 			}
 		}
 	}
-- 
2.33.0
^ permalink raw reply related	[flat|nested] 96+ messages in thread
* Re: [PATCH v2 01/28] gup: Remove for_each_compound_range()
  2022-01-10  4:23 ` [PATCH v2 01/28] gup: Remove for_each_compound_range() Matthew Wilcox (Oracle)
@ 2022-01-10  8:22   ` Christoph Hellwig
  2022-01-11  1:07   ` John Hubbard
  1 sibling, 0 replies; 96+ messages in thread
From: Christoph Hellwig @ 2022-01-10  8:22 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-mm, John Hubbard, Christoph Hellwig, William Kucharski,
	linux-kernel, Jason Gunthorpe
On Mon, Jan 10, 2022 at 04:23:39AM +0000, Matthew Wilcox (Oracle) wrote:
> This macro doesn't simplify the users; it's easier to just call
> compound_range_next() inside the loop.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Looks good:
Reviewed-by: Christoph Hellwig <hch@lst.de>
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [PATCH v2 02/28] gup: Remove for_each_compound_head()
  2022-01-10  4:23 ` [PATCH v2 02/28] gup: Remove for_each_compound_head() Matthew Wilcox (Oracle)
@ 2022-01-10  8:23   ` Christoph Hellwig
  2022-01-11  1:11   ` John Hubbard
  1 sibling, 0 replies; 96+ messages in thread
From: Christoph Hellwig @ 2022-01-10  8:23 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-mm, John Hubbard, Christoph Hellwig, William Kucharski,
	linux-kernel, Jason Gunthorpe
Looks good,
Reviewed-by: Christoph Hellwig <hch@lst.de>
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [PATCH v2 03/28] gup: Change the calling convention for compound_range_next()
  2022-01-10  4:23 ` [PATCH v2 03/28] gup: Change the calling convention for compound_range_next() Matthew Wilcox (Oracle)
@ 2022-01-10  8:25   ` Christoph Hellwig
  2022-01-11  1:14   ` John Hubbard
  1 sibling, 0 replies; 96+ messages in thread
From: Christoph Hellwig @ 2022-01-10  8:25 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-mm, John Hubbard, Christoph Hellwig, William Kucharski,
	linux-kernel, Jason Gunthorpe
On Mon, Jan 10, 2022 at 04:23:41AM +0000, Matthew Wilcox (Oracle) wrote:
> Return the head page instead of storing it to a passed parameter.
> Pass the start page directly instead of passing a pointer to it.
Looks good, but when we're changing the calling conventions anyway:
> -static inline void compound_range_next(unsigned long i, unsigned long npages,
> -				       struct page **list, struct page **head,
> -				       unsigned int *ntails)
> +static inline struct page *compound_range_next(unsigned long i,
> +		unsigned long npages, struct page *start, unsigned int *ntails)
To me the logical argument order would be something like:
static inline struct page *compound_range_next(struct page *start,
		unsigned long npages,, unsigned long i, unsigned int *ntails)
where the two first arguments pass in what is worked on and match the 
calling conventions of the caller.
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [PATCH v2 04/28] gup: Optimise compound_range_next()
  2022-01-10  4:23 ` [PATCH v2 04/28] gup: Optimise compound_range_next() Matthew Wilcox (Oracle)
@ 2022-01-10  8:26   ` Christoph Hellwig
  2022-01-11  1:16   ` John Hubbard
  1 sibling, 0 replies; 96+ messages in thread
From: Christoph Hellwig @ 2022-01-10  8:26 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-mm, John Hubbard, Christoph Hellwig, William Kucharski,
	linux-kernel, Jason Gunthorpe
On Mon, Jan 10, 2022 at 04:23:42AM +0000, Matthew Wilcox (Oracle) wrote:
> By definition, a compound page has an order >= 1, so the second half
> of the test was redundant.  Also, this cannot be a tail page since
> it's the result of calling compound_head(), so use PageHead() instead
> of PageCompound().
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Looks good,
Reviewed-by: Christoph Hellwig <hch@lst.de>
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [PATCH v2 05/28] gup: Change the calling convention for compound_next()
  2022-01-10  4:23 ` [PATCH v2 05/28] gup: Change the calling convention for compound_next() Matthew Wilcox (Oracle)
@ 2022-01-10  8:27   ` Christoph Hellwig
  2022-01-10 13:28     ` Matthew Wilcox
  2022-01-11  1:18   ` John Hubbard
  1 sibling, 1 reply; 96+ messages in thread
From: Christoph Hellwig @ 2022-01-10  8:27 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-mm, John Hubbard, Christoph Hellwig, William Kucharski,
	linux-kernel, Jason Gunthorpe
On Mon, Jan 10, 2022 at 04:23:43AM +0000, Matthew Wilcox (Oracle) wrote:
> Return the head page instead of storing it to a passed parameter.
Looks good, but same comment about maybe passing list and npages first.
Reviewed-by: Christoph Hellwig <hch@lst.de>
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [PATCH v2 06/28] gup: Fix some contiguous memmap assumptions
  2022-01-10  4:23 ` [PATCH v2 06/28] gup: Fix some contiguous memmap assumptions Matthew Wilcox (Oracle)
@ 2022-01-10  8:29   ` Christoph Hellwig
  2022-01-10 13:37     ` Matthew Wilcox
  2022-01-11  1:47   ` John Hubbard
  1 sibling, 1 reply; 96+ messages in thread
From: Christoph Hellwig @ 2022-01-10  8:29 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-mm, John Hubbard, Christoph Hellwig, William Kucharski,
	linux-kernel, Jason Gunthorpe
On Mon, Jan 10, 2022 at 04:23:44AM +0000, Matthew Wilcox (Oracle) wrote:
> Several functions in gup.c assume that a compound page has virtually
> contiguous page structs.  This isn't true for SPARSEMEM configs unless
> SPARSEMEM_VMEMMAP is also set.  Fix them by using nth_page() instead of
> plain pointer arithmetic.
So is this an actualy bug that need a Fixes tag, or do all architectures
that support THP and sparsemem use SPARSEMEM_VMEMMAP?
> +	page = nth_page(head, (addr & (sz-1)) >> PAGE_SHIFT);
Would be nice to fix the indeation for sz - 1 while you're at it.
Otherwise looks good:
Reviewed-by: Christoph Hellwig <hch@lst.de>
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [PATCH v2 07/28] gup: Remove an assumption of a contiguous memmap
  2022-01-10  4:23 ` [PATCH v2 07/28] gup: Remove an assumption of a contiguous memmap Matthew Wilcox (Oracle)
@ 2022-01-10  8:30   ` Christoph Hellwig
  2022-01-11  3:27   ` John Hubbard
  1 sibling, 0 replies; 96+ messages in thread
From: Christoph Hellwig @ 2022-01-10  8:30 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-mm, John Hubbard, Christoph Hellwig, William Kucharski,
	linux-kernel, Jason Gunthorpe
Looks good,
Reviewed-by: Christoph Hellwig <hch@lst.de>
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [PATCH v2 08/28] gup: Handle page split race more efficiently
  2022-01-10  4:23 ` [PATCH v2 08/28] gup: Handle page split race more efficiently Matthew Wilcox (Oracle)
@ 2022-01-10  8:31   ` Christoph Hellwig
  2022-01-11  3:30   ` John Hubbard
  1 sibling, 0 replies; 96+ messages in thread
From: Christoph Hellwig @ 2022-01-10  8:31 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-mm, John Hubbard, Christoph Hellwig, William Kucharski,
	linux-kernel, Jason Gunthorpe
On Mon, Jan 10, 2022 at 04:23:46AM +0000, Matthew Wilcox (Oracle) wrote:
> If we hit the page split race, the current code returns NULL which will
> presumably trigger a retry under the mmap_lock.  This isn't necessary;
> we can just retry the compound_head() lookup.  This is a very minor
> optimisation of an unlikely path, but conceptually it matches (eg)
> the page cache RCU-protected lookup.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Looks good,
Reviewed-by: Christoph Hellwig <hch@lst.de>
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [PATCH v2 09/28] gup: Turn hpage_pincount_add() into page_pincount_add()
  2022-01-10  4:23 ` [PATCH v2 09/28] gup: Turn hpage_pincount_add() into page_pincount_add() Matthew Wilcox (Oracle)
@ 2022-01-10  8:31   ` Christoph Hellwig
  2022-01-11  3:35   ` John Hubbard
  2022-01-11  4:32   ` John Hubbard
  2 siblings, 0 replies; 96+ messages in thread
From: Christoph Hellwig @ 2022-01-10  8:31 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-mm, John Hubbard, Christoph Hellwig, William Kucharski,
	linux-kernel, Jason Gunthorpe
On Mon, Jan 10, 2022 at 04:23:47AM +0000, Matthew Wilcox (Oracle) wrote:
> Simplify try_grab_compound_head() and remove an unnecessary
> VM_BUG_ON by handling pages both with and without a pincount field in
> page_pincount_add().
Nice:
Reviewed-by: Christoph Hellwig <hch@lst.de>
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [PATCH v2 10/28] gup: Turn hpage_pincount_sub() into page_pincount_sub()
  2022-01-10  4:23 ` [PATCH v2 10/28] gup: Turn hpage_pincount_sub() into page_pincount_sub() Matthew Wilcox (Oracle)
@ 2022-01-10  8:32   ` Christoph Hellwig
  2022-01-11  6:40   ` John Hubbard
  1 sibling, 0 replies; 96+ messages in thread
From: Christoph Hellwig @ 2022-01-10  8:32 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-mm, John Hubbard, Christoph Hellwig, William Kucharski,
	linux-kernel, Jason Gunthorpe
On Mon, Jan 10, 2022 at 04:23:48AM +0000, Matthew Wilcox (Oracle) wrote:
> Remove an unnecessary VM_BUG_ON by handling pages both with and without
> a pincount field in page_pincount_sub().
Looks good:
Reviewed-by: Christoph Hellwig <hch@lst.de>
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [PATCH v2 12/28] mm: Add folio_put_refs()
  2022-01-10  4:23 ` [PATCH v2 12/28] mm: Add folio_put_refs() Matthew Wilcox (Oracle)
@ 2022-01-10  8:32   ` Christoph Hellwig
  2022-01-11  4:14   ` John Hubbard
  1 sibling, 0 replies; 96+ messages in thread
From: Christoph Hellwig @ 2022-01-10  8:32 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-mm, John Hubbard, Christoph Hellwig, William Kucharski,
	linux-kernel, Jason Gunthorpe
On Mon, Jan 10, 2022 at 04:23:50AM +0000, Matthew Wilcox (Oracle) wrote:
> This is like folio_put(), but puts N references at once instead of
> just one.  It's like put_page_refs(), but does one atomic operation
> instead of two, and is available to more than just gup.c.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Looks good,
Reviewed-by: Christoph Hellwig <hch@lst.de>
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [PATCH v2 13/28] mm: Add folio_pincount_ptr()
  2022-01-10  4:23 ` [PATCH v2 13/28] mm: Add folio_pincount_ptr() Matthew Wilcox (Oracle)
@ 2022-01-10  8:33   ` Christoph Hellwig
  2022-01-11  4:22   ` John Hubbard
  1 sibling, 0 replies; 96+ messages in thread
From: Christoph Hellwig @ 2022-01-10  8:33 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-mm, John Hubbard, Christoph Hellwig, William Kucharski,
	linux-kernel, Jason Gunthorpe
On Mon, Jan 10, 2022 at 04:23:51AM +0000, Matthew Wilcox (Oracle) wrote:
> +static inline atomic_t *folio_pincount_ptr(struct folio *folio)
> +{
> +	struct page *tail = &folio->page + 1;
> +	return &tail->compound_pincount;
> +}
And empty line after the declaration would be nice here.
Otherwise looks good:
Reviewed-by: Christoph Hellwig <hch@lst.de>
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [PATCH v2 14/28] mm: Convert page_maybe_dma_pinned() to use a folio
  2022-01-10  4:23 ` [PATCH v2 14/28] mm: Convert page_maybe_dma_pinned() to use a folio Matthew Wilcox (Oracle)
@ 2022-01-10  8:33   ` Christoph Hellwig
  2022-01-11  4:27   ` John Hubbard
  1 sibling, 0 replies; 96+ messages in thread
From: Christoph Hellwig @ 2022-01-10  8:33 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-mm, John Hubbard, Christoph Hellwig, William Kucharski,
	linux-kernel, Jason Gunthorpe
On Mon, Jan 10, 2022 at 04:23:52AM +0000, Matthew Wilcox (Oracle) wrote:
> Replace three calls to compound_head() with one.  This removes the last
> user of compound_pincount(), so remove that helper too.
Looks good,
Reviewed-by: Christoph Hellwig <hch@lst.de>
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [PATCH v2 15/28] gup: Add try_get_folio() and try_grab_folio()
  2022-01-10  4:23 ` [PATCH v2 15/28] gup: Add try_get_folio() and try_grab_folio() Matthew Wilcox (Oracle)
@ 2022-01-10  8:34   ` Christoph Hellwig
  2022-01-11  5:00   ` John Hubbard
  1 sibling, 0 replies; 96+ messages in thread
From: Christoph Hellwig @ 2022-01-10  8:34 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-mm, John Hubbard, Christoph Hellwig, William Kucharski,
	linux-kernel, Jason Gunthorpe
Looks good,
Reviewed-by: Christoph Hellwig <hch@lst.de>
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [PATCH v2 16/28] mm: Remove page_cache_add_speculative() and page_cache_get_speculative()
  2022-01-10  4:23 ` [PATCH v2 16/28] mm: Remove page_cache_add_speculative() and page_cache_get_speculative() Matthew Wilcox (Oracle)
@ 2022-01-10  8:35   ` Christoph Hellwig
  2022-01-11  5:14   ` John Hubbard
  1 sibling, 0 replies; 96+ messages in thread
From: Christoph Hellwig @ 2022-01-10  8:35 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-mm, John Hubbard, Christoph Hellwig, William Kucharski,
	linux-kernel, Jason Gunthorpe
On Mon, Jan 10, 2022 at 04:23:54AM +0000, Matthew Wilcox (Oracle) wrote:
> These wrappers have no more callers, so delete them.
Looks good,
Reviewed-by: Christoph Hellwig <hch@lst.de>
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [PATCH v2 17/28] gup: Add gup_put_folio()
  2022-01-10  4:23 ` [PATCH v2 17/28] gup: Add gup_put_folio() Matthew Wilcox (Oracle)
@ 2022-01-10  8:35   ` Christoph Hellwig
  2022-01-11  6:44   ` John Hubbard
  1 sibling, 0 replies; 96+ messages in thread
From: Christoph Hellwig @ 2022-01-10  8:35 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-mm, John Hubbard, Christoph Hellwig, William Kucharski,
	linux-kernel, Jason Gunthorpe
Looks good,
Reviewed-by: Christoph Hellwig <hch@lst.de>
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [PATCH v2 18/28] hugetlb: Use try_grab_folio() instead of try_grab_compound_head()
  2022-01-10  4:23 ` [PATCH v2 18/28] hugetlb: Use try_grab_folio() instead of try_grab_compound_head() Matthew Wilcox (Oracle)
@ 2022-01-10  8:36   ` Christoph Hellwig
  2022-01-11  6:47   ` John Hubbard
  1 sibling, 0 replies; 96+ messages in thread
From: Christoph Hellwig @ 2022-01-10  8:36 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-mm, John Hubbard, Christoph Hellwig, William Kucharski,
	linux-kernel, Jason Gunthorpe
On Mon, Jan 10, 2022 at 04:23:56AM +0000, Matthew Wilcox (Oracle) wrote:
> follow_hugetlb_page() only cares about success or failure, so it doesn't
> need to know the type of the returned pointer, only whether it's NULL
> or not.
Looks good,
Reviewed-by: Christoph Hellwig <hch@lst.de>
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [PATCH v2 19/28] gup: Convert try_grab_page() to call try_grab_folio()
  2022-01-10  4:23 ` [PATCH v2 19/28] gup: Convert try_grab_page() to call try_grab_folio() Matthew Wilcox (Oracle)
@ 2022-01-10  8:36   ` Christoph Hellwig
  2022-01-11  7:01   ` John Hubbard
  1 sibling, 0 replies; 96+ messages in thread
From: Christoph Hellwig @ 2022-01-10  8:36 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-mm, John Hubbard, Christoph Hellwig, William Kucharski,
	linux-kernel, Jason Gunthorpe
On Mon, Jan 10, 2022 at 04:23:57AM +0000, Matthew Wilcox (Oracle) wrote:
> try_grab_page() only cares about success or failure, not about the
> type returned.
Looks good,
Reviewed-by: Christoph Hellwig <hch@lst.de>
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [PATCH v2 20/28] gup: Convert gup_pte_range() to use a folio
  2022-01-10  4:23 ` [PATCH v2 20/28] gup: Convert gup_pte_range() to use a folio Matthew Wilcox (Oracle)
@ 2022-01-10  8:37   ` Christoph Hellwig
  2022-01-11  7:06   ` John Hubbard
  1 sibling, 0 replies; 96+ messages in thread
From: Christoph Hellwig @ 2022-01-10  8:37 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-mm, John Hubbard, Christoph Hellwig, William Kucharski,
	linux-kernel, Jason Gunthorpe
On Mon, Jan 10, 2022 at 04:23:58AM +0000, Matthew Wilcox (Oracle) wrote:
> We still call try_grab_folio() once per PTE; a future patch could
> optimise to just adjust the reference count for each page within
> the folio.
Looks good,
Reviewed-by: Christoph Hellwig <hch@lst.de>
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [PATCH v2 21/28] gup: Convert gup_hugepte() to use a folio
  2022-01-10  4:23 ` [PATCH v2 21/28] gup: Convert gup_hugepte() " Matthew Wilcox (Oracle)
@ 2022-01-10  8:37   ` Christoph Hellwig
  2022-01-11  7:33   ` John Hubbard
  1 sibling, 0 replies; 96+ messages in thread
From: Christoph Hellwig @ 2022-01-10  8:37 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-mm, John Hubbard, Christoph Hellwig, William Kucharski,
	linux-kernel, Jason Gunthorpe
On Mon, Jan 10, 2022 at 04:23:59AM +0000, Matthew Wilcox (Oracle) wrote:
> There should be little to no effect from this patch; just removing
> uses of some old APIs.
Looks good,
Reviewed-by: Christoph Hellwig <hch@lst.de>
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [PATCH v2 22/28] gup: Convert gup_huge_pmd() to use a folio
  2022-01-10  4:24 ` [PATCH v2 22/28] gup: Convert gup_huge_pmd() " Matthew Wilcox (Oracle)
@ 2022-01-10  8:37   ` Christoph Hellwig
  2022-01-11  7:36   ` John Hubbard
  1 sibling, 0 replies; 96+ messages in thread
From: Christoph Hellwig @ 2022-01-10  8:37 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-mm, John Hubbard, Christoph Hellwig, William Kucharski,
	linux-kernel, Jason Gunthorpe
Looks good,
Reviewed-by: Christoph Hellwig <hch@lst.de>
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [PATCH v2 23/28] gup: Convert gup_huge_pud() to use a folio
  2022-01-10  4:24 ` [PATCH v2 23/28] gup: Convert gup_huge_pud() " Matthew Wilcox (Oracle)
@ 2022-01-10  8:38   ` Christoph Hellwig
  2022-01-11  7:38   ` John Hubbard
  1 sibling, 0 replies; 96+ messages in thread
From: Christoph Hellwig @ 2022-01-10  8:38 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-mm, John Hubbard, Christoph Hellwig, William Kucharski,
	linux-kernel, Jason Gunthorpe
Looks good,
Reviewed-by: Christoph Hellwig <hch@lst.de>
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [PATCH v2 24/28] gup: Convert gup_huge_pgd() to use a folio
  2022-01-10  4:24 ` [PATCH v2 24/28] gup: Convert gup_huge_pgd() " Matthew Wilcox (Oracle)
@ 2022-01-10  8:38   ` Christoph Hellwig
  2022-01-11  7:38   ` John Hubbard
  1 sibling, 0 replies; 96+ messages in thread
From: Christoph Hellwig @ 2022-01-10  8:38 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-mm, John Hubbard, Christoph Hellwig, William Kucharski,
	linux-kernel, Jason Gunthorpe
Looks good,
Reviewed-by: Christoph Hellwig <hch@lst.de>
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [PATCH v2 25/28] gup: Convert compound_next() to gup_folio_next()
  2022-01-10  4:24 ` [PATCH v2 25/28] gup: Convert compound_next() to gup_folio_next() Matthew Wilcox (Oracle)
@ 2022-01-10  8:39   ` Christoph Hellwig
  2022-01-11  7:41   ` John Hubbard
  1 sibling, 0 replies; 96+ messages in thread
From: Christoph Hellwig @ 2022-01-10  8:39 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-mm, John Hubbard, Christoph Hellwig, William Kucharski,
	linux-kernel, Jason Gunthorpe
Looks good,
Reviewed-by: Christoph Hellwig <hch@lst.de>
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [PATCH v2 26/28] gup: Convert compound_range_next() to gup_folio_range_next()
  2022-01-10  4:24 ` [PATCH v2 26/28] gup: Convert compound_range_next() to gup_folio_range_next() Matthew Wilcox (Oracle)
@ 2022-01-10  8:41   ` Christoph Hellwig
  2022-01-10 13:41     ` Matthew Wilcox
  2022-01-11  7:44   ` John Hubbard
  1 sibling, 1 reply; 96+ messages in thread
From: Christoph Hellwig @ 2022-01-10  8:41 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-mm, John Hubbard, Christoph Hellwig, William Kucharski,
	linux-kernel, Jason Gunthorpe
On Mon, Jan 10, 2022 at 04:24:04AM +0000, Matthew Wilcox (Oracle) wrote:
> +static inline struct folio *gup_folio_range_next(unsigned long i,
>  		unsigned long npages, struct page *start, unsigned int *ntails)
>  {
> -	struct page *next, *page;
> +	struct page *next;
> +	struct folio *folio;
>  	unsigned int nr = 1;
>  
>  	next = nth_page(start, i);
> +	folio = page_folio(next);
Superficial nit:  initialization next and folio at declaration time
would reada little better.
Otherwise looks good:
Reviewed-by: Christoph Hellwig <hch@lst.de>
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [PATCH v2 27/28] mm: Add isolate_lru_folio()
  2022-01-10  4:24 ` [PATCH v2 27/28] mm: Add isolate_lru_folio() Matthew Wilcox (Oracle)
@ 2022-01-10  8:42   ` Christoph Hellwig
  2022-01-11  7:49   ` John Hubbard
  1 sibling, 0 replies; 96+ messages in thread
From: Christoph Hellwig @ 2022-01-10  8:42 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-mm, John Hubbard, Christoph Hellwig, William Kucharski,
	linux-kernel, Jason Gunthorpe
Looks good,
Reviewed-by: Christoph Hellwig <hch@lst.de>
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [PATCH v2 28/28] gup: Convert check_and_migrate_movable_pages() to use a folio
  2022-01-10  4:24 ` [PATCH v2 28/28] gup: Convert check_and_migrate_movable_pages() to use a folio Matthew Wilcox (Oracle)
@ 2022-01-10  8:42   ` Christoph Hellwig
  2022-01-11  7:52   ` John Hubbard
  1 sibling, 0 replies; 96+ messages in thread
From: Christoph Hellwig @ 2022-01-10  8:42 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-mm, John Hubbard, Christoph Hellwig, William Kucharski,
	linux-kernel, Jason Gunthorpe
Looks good,
Reviewed-by: Christoph Hellwig <hch@lst.de>
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [PATCH v2 05/28] gup: Change the calling convention for compound_next()
  2022-01-10  8:27   ` Christoph Hellwig
@ 2022-01-10 13:28     ` Matthew Wilcox
  0 siblings, 0 replies; 96+ messages in thread
From: Matthew Wilcox @ 2022-01-10 13:28 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-mm, John Hubbard, William Kucharski, linux-kernel,
	Jason Gunthorpe
On Mon, Jan 10, 2022 at 12:27:29AM -0800, Christoph Hellwig wrote:
> On Mon, Jan 10, 2022 at 04:23:43AM +0000, Matthew Wilcox (Oracle) wrote:
> > Return the head page instead of storing it to a passed parameter.
> 
> Looks good, but same comment about maybe passing list and npages first.
Done for both.
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [PATCH v2 06/28] gup: Fix some contiguous memmap assumptions
  2022-01-10  8:29   ` Christoph Hellwig
@ 2022-01-10 13:37     ` Matthew Wilcox
  2022-01-10 19:05       ` [External] : " Mike Kravetz
  0 siblings, 1 reply; 96+ messages in thread
From: Matthew Wilcox @ 2022-01-10 13:37 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-mm, John Hubbard, William Kucharski, linux-kernel,
	Jason Gunthorpe, Mike Kravetz
On Mon, Jan 10, 2022 at 12:29:58AM -0800, Christoph Hellwig wrote:
> On Mon, Jan 10, 2022 at 04:23:44AM +0000, Matthew Wilcox (Oracle) wrote:
> > Several functions in gup.c assume that a compound page has virtually
> > contiguous page structs.  This isn't true for SPARSEMEM configs unless
> > SPARSEMEM_VMEMMAP is also set.  Fix them by using nth_page() instead of
> > plain pointer arithmetic.
> 
> So is this an actualy bug that need a Fixes tag, or do all architectures
> that support THP and sparsemem use SPARSEMEM_VMEMMAP?
As far as I can tell (and I am by no means an expert in this area),
this problem only affects pages of order MAX_ORDER or higher.  That is,
somebody using regular 2MB hugepages on x86 won't see a problem, whether
they're using VMEMMAP or not.  It only starts to become a problem for
1GB hugepages.
Since THPs are (currently) only allocated from the page allocator, it's
never a problem for THPs, only hugetlbfs.  Correcting the places which
can't see a 1GB page is just defense against copy-and-paste programming.
So I'll defer to Mike -- does this ever affect real systems and thus
warrant a backport?  I know this doesn't affect UEK because we enable
SPARSEMEM_VMEMMAP.
> > +	page = nth_page(head, (addr & (sz-1)) >> PAGE_SHIFT);
> 
> Would be nice to fix the indeation for sz - 1 while you're at it.
Done.
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [PATCH v2 26/28] gup: Convert compound_range_next() to gup_folio_range_next()
  2022-01-10  8:41   ` Christoph Hellwig
@ 2022-01-10 13:41     ` Matthew Wilcox
  0 siblings, 0 replies; 96+ messages in thread
From: Matthew Wilcox @ 2022-01-10 13:41 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-mm, John Hubbard, William Kucharski, linux-kernel,
	Jason Gunthorpe
On Mon, Jan 10, 2022 at 12:41:22AM -0800, Christoph Hellwig wrote:
> On Mon, Jan 10, 2022 at 04:24:04AM +0000, Matthew Wilcox (Oracle) wrote:
> > +static inline struct folio *gup_folio_range_next(unsigned long i,
> >  		unsigned long npages, struct page *start, unsigned int *ntails)
> >  {
> > -	struct page *next, *page;
> > +	struct page *next;
> > +	struct folio *folio;
> >  	unsigned int nr = 1;
> >  
> >  	next = nth_page(start, i);
> > +	folio = page_folio(next);
> 
> Superficial nit:  initialization next and folio at declaration time
> would reada little better.
Done.  Also the previous patch:
-static inline struct page *compound_next(struct page **list,
+static inline struct folio *gup_folio_next(struct page **list,
                unsigned long npages, unsigned long i, unsigned int *ntails)
 {
-       struct page *page;
+       struct folio *folio = page_folio(list[i]);
        unsigned int nr;
-       page = compound_head(list[i]);
        for (nr = i + 1; nr < npages; nr++) {
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [PATCH v2 00/28] Convert GUP to folios
  2022-01-10  4:23 [PATCH v2 00/28] Convert GUP to folios Matthew Wilcox (Oracle)
                   ` (27 preceding siblings ...)
  2022-01-10  4:24 ` [PATCH v2 28/28] gup: Convert check_and_migrate_movable_pages() to use a folio Matthew Wilcox (Oracle)
@ 2022-01-10 15:31 ` Jason Gunthorpe
  2022-01-10 16:09   ` Matthew Wilcox
  2022-01-10 17:26 ` William Kucharski
  29 siblings, 1 reply; 96+ messages in thread
From: Jason Gunthorpe @ 2022-01-10 15:31 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-mm, John Hubbard, Christoph Hellwig, William Kucharski,
	linux-kernel
On Mon, Jan 10, 2022 at 04:23:38AM +0000, Matthew Wilcox (Oracle) wrote:
> This patch series is against my current folio for-next branch.  I know
> it won't apply to sfr's next tree, and it's not for-next material yet.
> I intend to submit it for 5.18 after I've rebased it to one of the
> 5.17-rc releases.
> 
> The overall effect of this (ignoring the primary "preparing for folios
> that are not PAGE or PMD sized" purpose) is to reduce the size of gup.o
> by ~700 bytes in the config I normally test with.
> 
> This patchset just converts existing implementations to use folios.
> There's no new API for consumers here to provide information in a more
> efficient (address, length) format.  That will be a separate patchset.
> 
> In v2, I've tried to address all the comments from the reviews I got
> on v1.  Apologies if I missed anything; I got a lot of good feedback.
> Primarily I separated out the folio changes (later) from the "While
> I'm looking at this ..." changes (earlier).  I'm not sure the story
> of the patchset is necessarily coherent this way, but it should be
> easier to review.
> 
> Another big change is that compound_pincount is now available to all
> compound pages, not just those that are order-2-or-higher.  Patch 11.
> 
> I did notice one bug in my original patchset which I'm disappointed GCC
> didn't diagnose:
> 
> 		pages[nr++] = nth_page(page, nr);
> 
> Given the massive reorg of the patchset, I didn't feel right using
> anyone's SoB from v1 on any of the patches.  But, despite the increased
> number of patches, I hope it's easier to review than v1.
> 
> And I'd dearly love a better name than 'folio_nth'.  page_nth() is
> a temporary construct, so doesn't need a better name.  If you need
> context, it's in the gup_folio_range_next() patch and its job
> is to tell you, given a page and a folio, what # page it is within
> a folio (so a number between [0 and folio_nr_pages())).
folio_tail_index() ?
Series still looks good to me, though I checked each patch
carefully than the prior series:
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Jason
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [PATCH v2 00/28] Convert GUP to folios
  2022-01-10 15:31 ` [PATCH v2 00/28] Convert GUP to folios Jason Gunthorpe
@ 2022-01-10 16:09   ` Matthew Wilcox
  0 siblings, 0 replies; 96+ messages in thread
From: Matthew Wilcox @ 2022-01-10 16:09 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-mm, John Hubbard, Christoph Hellwig, William Kucharski,
	linux-kernel
On Mon, Jan 10, 2022 at 11:31:03AM -0400, Jason Gunthorpe wrote:
> 
> On Mon, Jan 10, 2022 at 04:23:38AM +0000, Matthew Wilcox (Oracle) wrote:
> > This patch series is against my current folio for-next branch.  I know
> > it won't apply to sfr's next tree, and it's not for-next material yet.
> > I intend to submit it for 5.18 after I've rebased it to one of the
> > 5.17-rc releases.
> > 
> > The overall effect of this (ignoring the primary "preparing for folios
> > that are not PAGE or PMD sized" purpose) is to reduce the size of gup.o
> > by ~700 bytes in the config I normally test with.
> > 
> > This patchset just converts existing implementations to use folios.
> > There's no new API for consumers here to provide information in a more
> > efficient (address, length) format.  That will be a separate patchset.
> > 
> > In v2, I've tried to address all the comments from the reviews I got
> > on v1.  Apologies if I missed anything; I got a lot of good feedback.
> > Primarily I separated out the folio changes (later) from the "While
> > I'm looking at this ..." changes (earlier).  I'm not sure the story
> > of the patchset is necessarily coherent this way, but it should be
> > easier to review.
> > 
> > Another big change is that compound_pincount is now available to all
> > compound pages, not just those that are order-2-or-higher.  Patch 11.
> > 
> > I did notice one bug in my original patchset which I'm disappointed GCC
> > didn't diagnose:
> > 
> > 		pages[nr++] = nth_page(page, nr);
> > 
> > Given the massive reorg of the patchset, I didn't feel right using
> > anyone's SoB from v1 on any of the patches.  But, despite the increased
> > number of patches, I hope it's easier to review than v1.
> > 
> > And I'd dearly love a better name than 'folio_nth'.  page_nth() is
> > a temporary construct, so doesn't need a better name.  If you need
> > context, it's in the gup_folio_range_next() patch and its job
> > is to tell you, given a page and a folio, what # page it is within
> > a folio (so a number between [0 and folio_nr_pages())).
> 
> folio_tail_index() ?
I'm a little wary of "index" because folios are used to cache file data,
and folio->index means offset of the folio within the file.  We could
make the argument for folio_page_index() (since it might be the head
page, not a tail page), and argue this is the index into the folio,
not the index into the file.
It's better than folio_nth ;-)
> Series still looks good to me, though I checked each patch
> carefully than the prior series:
> 
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Thanks!
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [PATCH v2 00/28] Convert GUP to folios
  2022-01-10  4:23 [PATCH v2 00/28] Convert GUP to folios Matthew Wilcox (Oracle)
                   ` (28 preceding siblings ...)
  2022-01-10 15:31 ` [PATCH v2 00/28] Convert GUP to folios Jason Gunthorpe
@ 2022-01-10 17:26 ` William Kucharski
  29 siblings, 0 replies; 96+ messages in thread
From: William Kucharski @ 2022-01-10 17:26 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-mm@kvack.org, John Hubbard, Christoph Hellwig,
	linux-kernel@vger.kernel.org, Jason Gunthorpe
Series still looks good.
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
> On Jan 9, 2022, at 9:23 PM, Matthew Wilcox (Oracle) <willy@infradead.org> wrote:
> 
> This patch series is against my current folio for-next branch.  I know
> it won't apply to sfr's next tree, and it's not for-next material yet.
> I intend to submit it for 5.18 after I've rebased it to one of the
> 5.17-rc releases.
> 
> The overall effect of this (ignoring the primary "preparing for folios
> that are not PAGE or PMD sized" purpose) is to reduce the size of gup.o
> by ~700 bytes in the config I normally test with.
> 
> This patchset just converts existing implementations to use folios.
> There's no new API for consumers here to provide information in a more
> efficient (address, length) format.  That will be a separate patchset.
> 
> In v2, I've tried to address all the comments from the reviews I got
> on v1.  Apologies if I missed anything; I got a lot of good feedback.
> Primarily I separated out the folio changes (later) from the "While
> I'm looking at this ..." changes (earlier).  I'm not sure the story
> of the patchset is necessarily coherent this way, but it should be
> easier to review.
> 
> Another big change is that compound_pincount is now available to all
> compound pages, not just those that are order-2-or-higher.  Patch 11.
> 
> I did notice one bug in my original patchset which I'm disappointed GCC
> didn't diagnose:
> 
> 		pages[nr++] = nth_page(page, nr);
> 
> Given the massive reorg of the patchset, I didn't feel right using
> anyone's SoB from v1 on any of the patches.  But, despite the increased
> number of patches, I hope it's easier to review than v1.
> 
> And I'd dearly love a better name than 'folio_nth'.  page_nth() is
> a temporary construct, so doesn't need a better name.  If you need
> context, it's in the gup_folio_range_next() patch and its job
> is to tell you, given a page and a folio, what # page it is within
> a folio (so a number between [0 and folio_nr_pages())).
> 
> Matthew Wilcox (Oracle) (28):
>  gup: Remove for_each_compound_range()
>  gup: Remove for_each_compound_head()
>  gup: Change the calling convention for compound_range_next()
>  gup: Optimise compound_range_next()
>  gup: Change the calling convention for compound_next()
>  gup: Fix some contiguous memmap assumptions
>  gup: Remove an assumption of a contiguous memmap
>  gup: Handle page split race more efficiently
>  gup: Turn hpage_pincount_add() into page_pincount_add()
>  gup: Turn hpage_pincount_sub() into page_pincount_sub()
>  mm: Make compound_pincount always available
>  mm: Add folio_put_refs()
>  mm: Add folio_pincount_ptr()
>  mm: Convert page_maybe_dma_pinned() to use a folio
>  gup: Add try_get_folio() and try_grab_folio()
>  mm: Remove page_cache_add_speculative() and
>    page_cache_get_speculative()
>  gup: Add gup_put_folio()
>  hugetlb: Use try_grab_folio() instead of try_grab_compound_head()
>  gup: Convert try_grab_page() to call try_grab_folio()
>  gup: Convert gup_pte_range() to use a folio
>  gup: Convert gup_hugepte() to use a folio
>  gup: Convert gup_huge_pmd() to use a folio
>  gup: Convert gup_huge_pud() to use a folio
>  gup: Convert gup_huge_pgd() to use a folio
>  gup: Convert compound_next() to gup_folio_next()
>  gup: Convert compound_range_next() to gup_folio_range_next()
>  mm: Add isolate_lru_folio()
>  gup: Convert check_and_migrate_movable_pages() to use a folio
> 
> Documentation/core-api/pin_user_pages.rst |  18 +-
> arch/powerpc/include/asm/mmu_context.h    |   1 -
> include/linux/mm.h                        |  70 +++--
> include/linux/mm_types.h                  |  13 +-
> include/linux/pagemap.h                   |  11 -
> mm/debug.c                                |  14 +-
> mm/folio-compat.c                         |   8 +
> mm/gup.c                                  | 359 ++++++++++------------
> mm/hugetlb.c                              |   7 +-
> mm/internal.h                             |   8 +-
> mm/page_alloc.c                           |   3 +-
> mm/rmap.c                                 |   6 +-
> mm/vmscan.c                               |  43 ++-
> 13 files changed, 263 insertions(+), 298 deletions(-)
> 
> -- 
> 2.33.0
> 
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [External] : Re: [PATCH v2 06/28] gup: Fix some contiguous memmap assumptions
  2022-01-10 13:37     ` Matthew Wilcox
@ 2022-01-10 19:05       ` Mike Kravetz
  0 siblings, 0 replies; 96+ messages in thread
From: Mike Kravetz @ 2022-01-10 19:05 UTC (permalink / raw)
  To: Matthew Wilcox, Christoph Hellwig
  Cc: linux-mm, John Hubbard, William Kucharski, linux-kernel,
	Jason Gunthorpe
On 1/10/22 05:37, Matthew Wilcox wrote:
> On Mon, Jan 10, 2022 at 12:29:58AM -0800, Christoph Hellwig wrote:
>> On Mon, Jan 10, 2022 at 04:23:44AM +0000, Matthew Wilcox (Oracle) wrote:
>>> Several functions in gup.c assume that a compound page has virtually
>>> contiguous page structs.  This isn't true for SPARSEMEM configs unless
>>> SPARSEMEM_VMEMMAP is also set.  Fix them by using nth_page() instead of
>>> plain pointer arithmetic.
>>
>> So is this an actualy bug that need a Fixes tag, or do all architectures
>> that support THP and sparsemem use SPARSEMEM_VMEMMAP?
> 
> As far as I can tell (and I am by no means an expert in this area),
> this problem only affects pages of order MAX_ORDER or higher.  That is,
> somebody using regular 2MB hugepages on x86 won't see a problem, whether
> they're using VMEMMAP or not.  It only starts to become a problem for
> 1GB hugepages.
> 
> Since THPs are (currently) only allocated from the page allocator, it's
> never a problem for THPs, only hugetlbfs.  Correcting the places which
> can't see a 1GB page is just defense against copy-and-paste programming.
> 
> So I'll defer to Mike -- does this ever affect real systems and thus
> warrant a backport?  I know this doesn't affect UEK because we enable
> SPARSEMEM_VMEMMAP.
I guess it all depends on your definition of 'real' systems.  I am unaware
of any distros that disable SPARSEMEM_VMEMMAP, but I do not know or have
access to them all.
In arch specific Kconfig files, SPARSEMEM_VMEMMAP is enabled by default
(if sparsemem is enabled).  However, it is 'possible' to configure a kernel
with SPARSEMEM and without SPARSEMEM_VMEMMAP.
This issue came up almost a year ago in this thread:
https://lore.kernel.org/linux-mm/20210217184926.33567-1-mike.kravetz@oracle.com/
In practice, I do not recall ever seeing this outside of debug environments
specifically trying to hit the issue.
-- 
Mike Kravetz
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [PATCH v2 01/28] gup: Remove for_each_compound_range()
  2022-01-10  4:23 ` [PATCH v2 01/28] gup: Remove for_each_compound_range() Matthew Wilcox (Oracle)
  2022-01-10  8:22   ` Christoph Hellwig
@ 2022-01-11  1:07   ` John Hubbard
  1 sibling, 0 replies; 96+ messages in thread
From: John Hubbard @ 2022-01-11  1:07 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-mm
  Cc: Christoph Hellwig, William Kucharski, linux-kernel,
	Jason Gunthorpe
On 1/9/22 20:23, Matthew Wilcox (Oracle) wrote:
> This macro doesn't simplify the users; it's easier to just call
> compound_range_next() inside the loop.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>   mm/gup.c | 12 ++----------
>   1 file changed, 2 insertions(+), 10 deletions(-)
> 
> diff --git a/mm/gup.c b/mm/gup.c
> index 2c51e9748a6a..7a07e0c00bf5 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -236,9 +236,6 @@ static inline void compound_range_next(unsigned long i, unsigned long npages,
>   	struct page *next, *page;
>   	unsigned int nr = 1;
>   
> -	if (i >= npages)
> -		return;
> -
>   	next = *list + i;
>   	page = compound_head(next);
>   	if (PageCompound(page) && compound_order(page) >= 1)
> @@ -249,12 +246,6 @@ static inline void compound_range_next(unsigned long i, unsigned long npages,
>   	*ntails = nr;
>   }
>   
> -#define for_each_compound_range(__i, __list, __npages, __head, __ntails) \
> -	for (__i = 0, \
> -	     compound_range_next(__i, __npages, __list, &(__head), &(__ntails)); \
> -	     __i < __npages; __i += __ntails, \
> -	     compound_range_next(__i, __npages, __list, &(__head), &(__ntails)))
> -
Yes!! Thank you.
>   static inline void compound_next(unsigned long i, unsigned long npages,
>   				 struct page **list, struct page **head,
>   				 unsigned int *ntails)
> @@ -371,7 +362,8 @@ void unpin_user_page_range_dirty_lock(struct page *page, unsigned long npages,
>   	struct page *head;
>   	unsigned int ntails;
>   
> -	for_each_compound_range(index, &page, npages, head, ntails) {
> +	for (index = 0; index < npages; index += ntails) {
> +		compound_range_next(index, npages, &page, &head, &ntails);
And now the code is more "honest and up front": items that are changed are
passed by reference, and one can see right here what those are.
>   		if (make_dirty && !PageDirty(head))
>   			set_page_dirty_lock(head);
>   		put_compound_head(head, ntails, FOLL_PIN);
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
thanks,
-- 
John Hubbard
NVIDIA
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [PATCH v2 02/28] gup: Remove for_each_compound_head()
  2022-01-10  4:23 ` [PATCH v2 02/28] gup: Remove for_each_compound_head() Matthew Wilcox (Oracle)
  2022-01-10  8:23   ` Christoph Hellwig
@ 2022-01-11  1:11   ` John Hubbard
  1 sibling, 0 replies; 96+ messages in thread
From: John Hubbard @ 2022-01-11  1:11 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-mm
  Cc: Christoph Hellwig, William Kucharski, linux-kernel,
	Jason Gunthorpe
On 1/9/22 20:23, Matthew Wilcox (Oracle) wrote:
> This macro doesn't simplify the users; it's easier to just call
> compound_next() inside a standard loop.
Yes it is. :)
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
Looks great.
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
thanks,
-- 
John Hubbard
NVIDIA
>   mm/gup.c | 16 +++++-----------
>   1 file changed, 5 insertions(+), 11 deletions(-)
> 
> diff --git a/mm/gup.c b/mm/gup.c
> index 7a07e0c00bf5..86f8d843de72 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -253,9 +253,6 @@ static inline void compound_next(unsigned long i, unsigned long npages,
>   	struct page *page;
>   	unsigned int nr;
>   
> -	if (i >= npages)
> -		return;
> -
>   	page = compound_head(list[i]);
>   	for (nr = i + 1; nr < npages; nr++) {
>   		if (compound_head(list[nr]) != page)
> @@ -266,12 +263,6 @@ static inline void compound_next(unsigned long i, unsigned long npages,
>   	*ntails = nr - i;
>   }
>   
> -#define for_each_compound_head(__i, __list, __npages, __head, __ntails) \
> -	for (__i = 0, \
> -	     compound_next(__i, __npages, __list, &(__head), &(__ntails)); \
> -	     __i < __npages; __i += __ntails, \
> -	     compound_next(__i, __npages, __list, &(__head), &(__ntails)))
> -
>   /**
>    * unpin_user_pages_dirty_lock() - release and optionally dirty gup-pinned pages
>    * @pages:  array of pages to be maybe marked dirty, and definitely released.
> @@ -306,7 +297,8 @@ void unpin_user_pages_dirty_lock(struct page **pages, unsigned long npages,
>   		return;
>   	}
>   
> -	for_each_compound_head(index, pages, npages, head, ntails) {
> +	for (index = 0; index < npages; index += ntails) {
> +		compound_next(index, npages, pages, &head, &ntails);
>   		/*
>   		 * Checking PageDirty at this point may race with
>   		 * clear_page_dirty_for_io(), but that's OK. Two key
> @@ -394,8 +386,10 @@ void unpin_user_pages(struct page **pages, unsigned long npages)
>   	if (WARN_ON(IS_ERR_VALUE(npages)))
>   		return;
>   
> -	for_each_compound_head(index, pages, npages, head, ntails)
> +	for (index = 0; index < npages; index += ntails) {
> +		compound_next(index, npages, pages, &head, &ntails);
>   		put_compound_head(head, ntails, FOLL_PIN);
> +	}
>   }
>   EXPORT_SYMBOL(unpin_user_pages);
>   
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [PATCH v2 03/28] gup: Change the calling convention for compound_range_next()
  2022-01-10  4:23 ` [PATCH v2 03/28] gup: Change the calling convention for compound_range_next() Matthew Wilcox (Oracle)
  2022-01-10  8:25   ` Christoph Hellwig
@ 2022-01-11  1:14   ` John Hubbard
  1 sibling, 0 replies; 96+ messages in thread
From: John Hubbard @ 2022-01-11  1:14 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-mm
  Cc: Christoph Hellwig, William Kucharski, linux-kernel,
	Jason Gunthorpe
On 1/9/22 20:23, Matthew Wilcox (Oracle) wrote:
> Return the head page instead of storing it to a passed parameter.
> Pass the start page directly instead of passing a pointer to it.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
thanks,
-- 
John Hubbard
NVIDIA
>   mm/gup.c | 11 +++++------
>   1 file changed, 5 insertions(+), 6 deletions(-)
> 
> diff --git a/mm/gup.c b/mm/gup.c
> index 86f8d843de72..3c93d2fdf4da 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -229,21 +229,20 @@ void unpin_user_page(struct page *page)
>   }
>   EXPORT_SYMBOL(unpin_user_page);
>   
> -static inline void compound_range_next(unsigned long i, unsigned long npages,
> -				       struct page **list, struct page **head,
> -				       unsigned int *ntails)
> +static inline struct page *compound_range_next(unsigned long i,
> +		unsigned long npages, struct page *start, unsigned int *ntails)
>   {
>   	struct page *next, *page;
>   	unsigned int nr = 1;
>   
> -	next = *list + i;
> +	next = start + i;
>   	page = compound_head(next);
>   	if (PageCompound(page) && compound_order(page) >= 1)
>   		nr = min_t(unsigned int,
>   			   page + compound_nr(page) - next, npages - i);
>   
> -	*head = page;
>   	*ntails = nr;
> +	return page;
>   }
>   
>   static inline void compound_next(unsigned long i, unsigned long npages,
> @@ -355,7 +354,7 @@ void unpin_user_page_range_dirty_lock(struct page *page, unsigned long npages,
>   	unsigned int ntails;
>   
>   	for (index = 0; index < npages; index += ntails) {
> -		compound_range_next(index, npages, &page, &head, &ntails);
> +		head = compound_range_next(index, npages, page, &ntails);
>   		if (make_dirty && !PageDirty(head))
>   			set_page_dirty_lock(head);
>   		put_compound_head(head, ntails, FOLL_PIN);
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [PATCH v2 04/28] gup: Optimise compound_range_next()
  2022-01-10  4:23 ` [PATCH v2 04/28] gup: Optimise compound_range_next() Matthew Wilcox (Oracle)
  2022-01-10  8:26   ` Christoph Hellwig
@ 2022-01-11  1:16   ` John Hubbard
  1 sibling, 0 replies; 96+ messages in thread
From: John Hubbard @ 2022-01-11  1:16 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-mm
  Cc: Christoph Hellwig, William Kucharski, linux-kernel,
	Jason Gunthorpe
On 1/9/22 20:23, Matthew Wilcox (Oracle) wrote:
> By definition, a compound page has an order >= 1, so the second half
> of the test was redundant.  Also, this cannot be a tail page since
> it's the result of calling compound_head(), so use PageHead() instead
> of PageCompound().
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
thanks,
-- 
John Hubbard
NVIDIA
>   mm/gup.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/gup.c b/mm/gup.c
> index 3c93d2fdf4da..6eedca605b3d 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -237,7 +237,7 @@ static inline struct page *compound_range_next(unsigned long i,
>   
>   	next = start + i;
>   	page = compound_head(next);
> -	if (PageCompound(page) && compound_order(page) >= 1)
> +	if (PageHead(page))
>   		nr = min_t(unsigned int,
>   			   page + compound_nr(page) - next, npages - i);
>   
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [PATCH v2 05/28] gup: Change the calling convention for compound_next()
  2022-01-10  4:23 ` [PATCH v2 05/28] gup: Change the calling convention for compound_next() Matthew Wilcox (Oracle)
  2022-01-10  8:27   ` Christoph Hellwig
@ 2022-01-11  1:18   ` John Hubbard
  1 sibling, 0 replies; 96+ messages in thread
From: John Hubbard @ 2022-01-11  1:18 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-mm
  Cc: Christoph Hellwig, William Kucharski, linux-kernel,
	Jason Gunthorpe
On 1/9/22 20:23, Matthew Wilcox (Oracle) wrote:
> Return the head page instead of storing it to a passed parameter.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>   mm/gup.c | 11 +++++------
>   1 file changed, 5 insertions(+), 6 deletions(-)
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
thanks,
-- 
John Hubbard
NVIDIA
> 
> diff --git a/mm/gup.c b/mm/gup.c
> index 6eedca605b3d..8a0ea220ced1 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -245,9 +245,8 @@ static inline struct page *compound_range_next(unsigned long i,
>   	return page;
>   }
>   
> -static inline void compound_next(unsigned long i, unsigned long npages,
> -				 struct page **list, struct page **head,
> -				 unsigned int *ntails)
> +static inline struct page *compound_next(unsigned long i,
> +		unsigned long npages, struct page **list, unsigned int *ntails)
>   {
>   	struct page *page;
>   	unsigned int nr;
> @@ -258,8 +257,8 @@ static inline void compound_next(unsigned long i, unsigned long npages,
>   			break;
>   	}
>   
> -	*head = page;
>   	*ntails = nr - i;
> +	return page;
>   }
>   
>   /**
> @@ -297,7 +296,7 @@ void unpin_user_pages_dirty_lock(struct page **pages, unsigned long npages,
>   	}
>   
>   	for (index = 0; index < npages; index += ntails) {
> -		compound_next(index, npages, pages, &head, &ntails);
> +		head = compound_next(index, npages, pages, &ntails);
>   		/*
>   		 * Checking PageDirty at this point may race with
>   		 * clear_page_dirty_for_io(), but that's OK. Two key
> @@ -386,7 +385,7 @@ void unpin_user_pages(struct page **pages, unsigned long npages)
>   		return;
>   
>   	for (index = 0; index < npages; index += ntails) {
> -		compound_next(index, npages, pages, &head, &ntails);
> +		head = compound_next(index, npages, pages, &ntails);
>   		put_compound_head(head, ntails, FOLL_PIN);
>   	}
>   }
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [PATCH v2 06/28] gup: Fix some contiguous memmap assumptions
  2022-01-10  4:23 ` [PATCH v2 06/28] gup: Fix some contiguous memmap assumptions Matthew Wilcox (Oracle)
  2022-01-10  8:29   ` Christoph Hellwig
@ 2022-01-11  1:47   ` John Hubbard
  1 sibling, 0 replies; 96+ messages in thread
From: John Hubbard @ 2022-01-11  1:47 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-mm
  Cc: Christoph Hellwig, William Kucharski, linux-kernel,
	Jason Gunthorpe
On 1/9/22 20:23, Matthew Wilcox (Oracle) wrote:
> Several functions in gup.c assume that a compound page has virtually
> contiguous page structs.  This isn't true for SPARSEMEM configs unless
> SPARSEMEM_VMEMMAP is also set.  Fix them by using nth_page() instead of
> plain pointer arithmetic.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>   mm/gup.c | 14 +++++++-------
>   1 file changed, 7 insertions(+), 7 deletions(-)
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
thanks,
-- 
John Hubbard
NVIDIA
> 
> diff --git a/mm/gup.c b/mm/gup.c
> index 8a0ea220ced1..9c0a702a4e03 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -235,7 +235,7 @@ static inline struct page *compound_range_next(unsigned long i,
>   	struct page *next, *page;
>   	unsigned int nr = 1;
>   
> -	next = start + i;
> +	next = nth_page(start, i);
>   	page = compound_head(next);
>   	if (PageHead(page))
>   		nr = min_t(unsigned int,
> @@ -2430,8 +2430,8 @@ static int record_subpages(struct page *page, unsigned long addr,
>   {
>   	int nr;
>   
> -	for (nr = 0; addr != end; addr += PAGE_SIZE)
> -		pages[nr++] = page++;
> +	for (nr = 0; addr != end; nr++, addr += PAGE_SIZE)
> +		pages[nr] = nth_page(page, nr);
>   
>   	return nr;
>   }
> @@ -2466,7 +2466,7 @@ static int gup_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr,
>   	VM_BUG_ON(!pfn_valid(pte_pfn(pte)));
>   
>   	head = pte_page(pte);
> -	page = head + ((addr & (sz-1)) >> PAGE_SHIFT);
> +	page = nth_page(head, (addr & (sz-1)) >> PAGE_SHIFT);
>   	refs = record_subpages(page, addr, end, pages + *nr);
>   
>   	head = try_grab_compound_head(head, refs, flags);
> @@ -2526,7 +2526,7 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr,
>   					     pages, nr);
>   	}
>   
> -	page = pmd_page(orig) + ((addr & ~PMD_MASK) >> PAGE_SHIFT);
> +	page = nth_page(pmd_page(orig), (addr & ~PMD_MASK) >> PAGE_SHIFT);
>   	refs = record_subpages(page, addr, end, pages + *nr);
>   
>   	head = try_grab_compound_head(pmd_page(orig), refs, flags);
> @@ -2560,7 +2560,7 @@ static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr,
>   					     pages, nr);
>   	}
>   
> -	page = pud_page(orig) + ((addr & ~PUD_MASK) >> PAGE_SHIFT);
> +	page = nth_page(pud_page(orig), (addr & ~PUD_MASK) >> PAGE_SHIFT);
>   	refs = record_subpages(page, addr, end, pages + *nr);
>   
>   	head = try_grab_compound_head(pud_page(orig), refs, flags);
> @@ -2589,7 +2589,7 @@ static int gup_huge_pgd(pgd_t orig, pgd_t *pgdp, unsigned long addr,
>   
>   	BUILD_BUG_ON(pgd_devmap(orig));
>   
> -	page = pgd_page(orig) + ((addr & ~PGDIR_MASK) >> PAGE_SHIFT);
> +	page = nth_page(pgd_page(orig), (addr & ~PGDIR_MASK) >> PAGE_SHIFT);
>   	refs = record_subpages(page, addr, end, pages + *nr);
>   
>   	head = try_grab_compound_head(pgd_page(orig), refs, flags);
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [PATCH v2 07/28] gup: Remove an assumption of a contiguous memmap
  2022-01-10  4:23 ` [PATCH v2 07/28] gup: Remove an assumption of a contiguous memmap Matthew Wilcox (Oracle)
  2022-01-10  8:30   ` Christoph Hellwig
@ 2022-01-11  3:27   ` John Hubbard
  1 sibling, 0 replies; 96+ messages in thread
From: John Hubbard @ 2022-01-11  3:27 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-mm
  Cc: Christoph Hellwig, William Kucharski, linux-kernel,
	Jason Gunthorpe
On 1/9/22 20:23, Matthew Wilcox (Oracle) wrote:
> This assumption needs the inverse of nth_page(), which I've temporarily
> named page_nth() until someone comes up with a better name.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>   include/linux/mm.h | 2 ++
>   mm/gup.c           | 4 ++--
>   2 files changed, 4 insertions(+), 2 deletions(-)
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Maybe a better name for page_nth() will come to mind by the time I
reach the end of this series, we'll see. :)
thanks,
-- 
John Hubbard
NVIDIA
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index d8b7d7ed14dd..f2f3400665a4 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -216,8 +216,10 @@ int overcommit_policy_handler(struct ctl_table *, int, void *, size_t *,
>   
>   #if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
>   #define nth_page(page,n) pfn_to_page(page_to_pfn((page)) + (n))
> +#define page_nth(head, tail)	(page_to_pfn(tail) - page_to_pfn(head))
>   #else
>   #define nth_page(page,n) ((page) + (n))
> +#define page_nth(head, tail)	((tail) - (head))
>   #endif
>   
>   /* to align the pointer to the (next) page boundary */
> diff --git a/mm/gup.c b/mm/gup.c
> index 9c0a702a4e03..afb638a30e44 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -238,8 +238,8 @@ static inline struct page *compound_range_next(unsigned long i,
>   	next = nth_page(start, i);
>   	page = compound_head(next);
>   	if (PageHead(page))
> -		nr = min_t(unsigned int,
> -			   page + compound_nr(page) - next, npages - i);
> +		nr = min_t(unsigned int, npages - i,
> +			   compound_nr(page) - page_nth(page, next));
>   
>   	*ntails = nr;
>   	return page;
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [PATCH v2 08/28] gup: Handle page split race more efficiently
  2022-01-10  4:23 ` [PATCH v2 08/28] gup: Handle page split race more efficiently Matthew Wilcox (Oracle)
  2022-01-10  8:31   ` Christoph Hellwig
@ 2022-01-11  3:30   ` John Hubbard
  1 sibling, 0 replies; 96+ messages in thread
From: John Hubbard @ 2022-01-11  3:30 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-mm
  Cc: Christoph Hellwig, William Kucharski, linux-kernel,
	Jason Gunthorpe
On 1/9/22 20:23, Matthew Wilcox (Oracle) wrote:
> If we hit the page split race, the current code returns NULL which will
> presumably trigger a retry under the mmap_lock.  This isn't necessary;
> we can just retry the compound_head() lookup.  This is a very minor
> optimisation of an unlikely path, but conceptually it matches (eg)
> the page cache RCU-protected lookup.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>   mm/gup.c | 7 +++++--
>   1 file changed, 5 insertions(+), 2 deletions(-)
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
thanks,
-- 
John Hubbard
NVIDIA
> 
> diff --git a/mm/gup.c b/mm/gup.c
> index afb638a30e44..dbb1b54d0def 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -68,7 +68,10 @@ static void put_page_refs(struct page *page, int refs)
>    */
>   static inline struct page *try_get_compound_head(struct page *page, int refs)
>   {
> -	struct page *head = compound_head(page);
> +	struct page *head;
> +
> +retry:
> +	head = compound_head(page);
>   
>   	if (WARN_ON_ONCE(page_ref_count(head) < 0))
>   		return NULL;
> @@ -86,7 +89,7 @@ static inline struct page *try_get_compound_head(struct page *page, int refs)
>   	 */
>   	if (unlikely(compound_head(page) != head)) {
>   		put_page_refs(head, refs);
> -		return NULL;
> +		goto retry;
>   	}
>   
>   	return head;
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [PATCH v2 09/28] gup: Turn hpage_pincount_add() into page_pincount_add()
  2022-01-10  4:23 ` [PATCH v2 09/28] gup: Turn hpage_pincount_add() into page_pincount_add() Matthew Wilcox (Oracle)
  2022-01-10  8:31   ` Christoph Hellwig
@ 2022-01-11  3:35   ` John Hubbard
  2022-01-11  4:32   ` John Hubbard
  2 siblings, 0 replies; 96+ messages in thread
From: John Hubbard @ 2022-01-11  3:35 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-mm
  Cc: Christoph Hellwig, William Kucharski, linux-kernel,
	Jason Gunthorpe
On 1/9/22 20:23, Matthew Wilcox (Oracle) wrote:
> Simplify try_grab_compound_head() and remove an unnecessary
> VM_BUG_ON by handling pages both with and without a pincount field in
> page_pincount_add().
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>   mm/gup.c | 33 +++++++++++++++------------------
>   1 file changed, 15 insertions(+), 18 deletions(-)
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
thanks,
-- 
John Hubbard
NVIDIA
> 
> diff --git a/mm/gup.c b/mm/gup.c
> index dbb1b54d0def..3ed9907f3c8d 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -29,12 +29,23 @@ struct follow_page_context {
>   	unsigned int page_mask;
>   };
>   
> -static void hpage_pincount_add(struct page *page, int refs)
> +/*
> + * When pinning a compound page of order > 1 (which is what
> + * hpage_pincount_available() checks for), use an exact count to track
> + * it, via page_pincount_add/_sub().
> + *
> + * However, be sure to *also* increment the normal page refcount field
> + * at least once, so that the page really is pinned.  That's why the
> + * refcount from the earlier try_get_compound_head() is left intact.
> + */
> +static void page_pincount_add(struct page *page, int refs)
>   {
> -	VM_BUG_ON_PAGE(!hpage_pincount_available(page), page);
>   	VM_BUG_ON_PAGE(page != compound_head(page), page);
>   
> -	atomic_add(refs, compound_pincount_ptr(page));
> +	if (hpage_pincount_available(page))
> +		atomic_add(refs, compound_pincount_ptr(page));
> +	else
> +		page_ref_add(page, refs * (GUP_PIN_COUNTING_BIAS - 1));
>   }
>   
>   static void hpage_pincount_sub(struct page *page, int refs)
> @@ -150,21 +161,7 @@ struct page *try_grab_compound_head(struct page *page,
>   		if (!page)
>   			return NULL;
>   
> -		/*
> -		 * When pinning a compound page of order > 1 (which is what
> -		 * hpage_pincount_available() checks for), use an exact count to
> -		 * track it, via hpage_pincount_add/_sub().
> -		 *
> -		 * However, be sure to *also* increment the normal page refcount
> -		 * field at least once, so that the page really is pinned.
> -		 * That's why the refcount from the earlier
> -		 * try_get_compound_head() is left intact.
> -		 */
> -		if (hpage_pincount_available(page))
> -			hpage_pincount_add(page, refs);
> -		else
> -			page_ref_add(page, refs * (GUP_PIN_COUNTING_BIAS - 1));
> -
> +		page_pincount_add(page, refs);
>   		mod_node_page_state(page_pgdat(page), NR_FOLL_PIN_ACQUIRED,
>   				    refs);
>   
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [PATCH v2 11/28] mm: Make compound_pincount always available
  2022-01-10  4:23 ` [PATCH v2 11/28] mm: Make compound_pincount always available Matthew Wilcox (Oracle)
@ 2022-01-11  4:06   ` John Hubbard
  2022-01-11  4:38     ` Matthew Wilcox
  2022-01-20  9:15   ` Christoph Hellwig
  1 sibling, 1 reply; 96+ messages in thread
From: John Hubbard @ 2022-01-11  4:06 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-mm
  Cc: Christoph Hellwig, William Kucharski, linux-kernel,
	Jason Gunthorpe
On 1/9/22 20:23, Matthew Wilcox (Oracle) wrote:
> Move compound_pincount from the third page to the second page, which
> means it's available for all compound pages.  That lets us delete
> hpage_pincount_available().
Wow, OK. That's a welcome simplification. Looks good. A couple comments
below, too.
...
> @@ -955,7 +944,9 @@ static inline int compound_pincount(struct page *page)
>   static inline void set_compound_order(struct page *page, unsigned int order)
>   {
>   	page[1].compound_order = order;
> +#ifdef CONFIG_64BIT
>   	page[1].compound_nr = 1U << order;
> +#endif
>   }
>   
>   /* Returns the number of pages in this potentially compound page. */
> @@ -963,7 +954,11 @@ static inline unsigned long compound_nr(struct page *page)
>   {
>   	if (!PageHead(page))
>   		return 1;
> +#ifdef CONFIG_64BIT
>   	return page[1].compound_nr;
> +#else
> +	return 1UL << compound_order(page);
> +#endif
Now that you are highlighting this, I have this persistent feeling (not
yet confirmed by any testing) that compound_nr is a micro-optimization
that is actually invisible at runtime--but is now slicing up our code
with ifdefs, and using space in a fairly valuable location.
Not for this patch or series, but maybe a separate patch or series
should just remove the compound_nr field entirely, yes? It is
surprising to carry around both compound_order and (1 <<
compound_order), right next to each other. It would be different if this
were an expensive calculation, but it's just a shift.
Maybe testing would prove that that's a bad idea, and maybe someone has
already looked into it, but I wanted to point it out.
...
> @@ -42,7 +41,7 @@ static void page_pincount_add(struct page *page, int refs)
>   {
>   	VM_BUG_ON_PAGE(page != compound_head(page), page);
>   
> -	if (hpage_pincount_available(page))
> +	if (PageHead(page))
>   		atomic_add(refs, compound_pincount_ptr(page));
>   	else
>   		page_ref_add(page, refs * (GUP_PIN_COUNTING_BIAS - 1));
> @@ -52,7 +51,7 @@ static int page_pincount_sub(struct page *page, int refs)
>   {
>   	VM_BUG_ON_PAGE(page != compound_head(page), page);
>   
> -	if (hpage_pincount_available(page))
> +	if (PageHead(page))
OK, so we just verified (via VM_BUG_ON_PAGE(), which is not always active)
that this is not a tail page. And so PageHead() effectively means PageCompound().
I wonder if it would be better to just use PageCompound() here and in similar
cases. Because that's what is logically being checked, after all. It seems
slightly more accurate.
>   		atomic_sub(refs, compound_pincount_ptr(page));
>   	else
>   		refs *= GUP_PIN_COUNTING_BIAS;
> @@ -129,12 +128,11 @@ static inline struct page *try_get_compound_head(struct page *page, int refs)
>    *
>    *    FOLL_GET: page's refcount will be incremented by @refs.
>    *
> - *    FOLL_PIN on compound pages that are > two pages long: page's refcount will
> - *    be incremented by @refs, and page[2].hpage_pinned_refcount will be
> - *    incremented by @refs * GUP_PIN_COUNTING_BIAS.
> + *    FOLL_PIN on compound pages: page's refcount will be incremented by
> + *    @refs, and page[1].compound_pincount will be incremented by @refs.
ha, thanks for fixing that documentation bug!
This all looks good, the above are very minor questions,
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
thanks,
-- 
John Hubbard
NVIDIA
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [PATCH v2 12/28] mm: Add folio_put_refs()
  2022-01-10  4:23 ` [PATCH v2 12/28] mm: Add folio_put_refs() Matthew Wilcox (Oracle)
  2022-01-10  8:32   ` Christoph Hellwig
@ 2022-01-11  4:14   ` John Hubbard
  1 sibling, 0 replies; 96+ messages in thread
From: John Hubbard @ 2022-01-11  4:14 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-mm
  Cc: Christoph Hellwig, William Kucharski, linux-kernel,
	Jason Gunthorpe
On 1/9/22 20:23, Matthew Wilcox (Oracle) wrote:
> This is like folio_put(), but puts N references at once instead of
> just one.  It's like put_page_refs(), but does one atomic operation
> instead of two, and is available to more than just gup.c.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>   include/linux/mm.h | 20 ++++++++++++++++++++
>   1 file changed, 20 insertions(+)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 598be27d4d2e..bf9624ca61c3 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1234,6 +1234,26 @@ static inline void folio_put(struct folio *folio)
>   		__put_page(&folio->page);
>   }
>   
> +/**
> + * folio_put_refs - Reduce the reference count on a folio.
> + * @folio: The folio.
> + * @refs: The number of references to reduce.
Tiny rewording suggestion, if you end up sending another version:
  * @refs: The amount to subtract from the folio's reference count.
> + *
> + * If the folio's reference count reaches zero, the memory will be
> + * released back to the page allocator and may be used by another
> + * allocation immediately.  Do not access the memory or the struct folio
> + * after calling folio_put_refs() unless you can be sure that these weren't
> + * the last references.
> + *
> + * Context: May be called in process or interrupt context, but not in NMI
> + * context.  May be called while holding a spinlock.
> + */
> +static inline void folio_put_refs(struct folio *folio, int refs)
> +{
> +	if (folio_ref_sub_and_test(folio, refs))
> +		__put_page(&folio->page);
> +}
> +
>   static inline void put_page(struct page *page)
>   {
>   	struct folio *folio = page_folio(page);
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
thanks,
-- 
John Hubbard
NVIDIA
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [PATCH v2 13/28] mm: Add folio_pincount_ptr()
  2022-01-10  4:23 ` [PATCH v2 13/28] mm: Add folio_pincount_ptr() Matthew Wilcox (Oracle)
  2022-01-10  8:33   ` Christoph Hellwig
@ 2022-01-11  4:22   ` John Hubbard
  1 sibling, 0 replies; 96+ messages in thread
From: John Hubbard @ 2022-01-11  4:22 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-mm
  Cc: Christoph Hellwig, William Kucharski, linux-kernel,
	Jason Gunthorpe
On 1/9/22 20:23, Matthew Wilcox (Oracle) wrote:
> This is the folio equivalent of compound_pincount_ptr().
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>   include/linux/mm_types.h | 6 ++++++
>   1 file changed, 6 insertions(+)
> 
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 60e4595eaf63..34c7114ea9e9 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -312,6 +312,12 @@ static inline atomic_t *compound_mapcount_ptr(struct page *page)
>   	return &page[1].compound_mapcount;
>   }
>   
> +static inline atomic_t *folio_pincount_ptr(struct folio *folio)
> +{
> +	struct page *tail = &folio->page + 1;
> +	return &tail->compound_pincount;
> +}
> +
Should we make this code (the various folio map/pin count inline
functions) more uniform? I see that you prefer "+1" over page[1], sure,
but having a mix seems like it's trying to call out a difference for
which the reader would search in vain. :)
After the whole series is applied, this area ends up with a 50/50 mix of the
two.
Or am I overlooking something, and they really *should* look different?
Just a very minor point, so either way,
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
thanks,
-- 
John Hubbard
NVIDIA
>   static inline atomic_t *compound_pincount_ptr(struct page *page)
>   {
>   	return &page[1].compound_pincount;
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [PATCH v2 14/28] mm: Convert page_maybe_dma_pinned() to use a folio
  2022-01-10  4:23 ` [PATCH v2 14/28] mm: Convert page_maybe_dma_pinned() to use a folio Matthew Wilcox (Oracle)
  2022-01-10  8:33   ` Christoph Hellwig
@ 2022-01-11  4:27   ` John Hubbard
  1 sibling, 0 replies; 96+ messages in thread
From: John Hubbard @ 2022-01-11  4:27 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-mm
  Cc: Christoph Hellwig, William Kucharski, linux-kernel,
	Jason Gunthorpe
On 1/9/22 20:23, Matthew Wilcox (Oracle) wrote:
> Replace three calls to compound_head() with one.  This removes the last
> user of compound_pincount(), so remove that helper too.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>   include/linux/mm.h | 21 ++++++++-------------
>   1 file changed, 8 insertions(+), 13 deletions(-)
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
thanks,
-- 
John Hubbard
NVIDIA
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index bf9624ca61c3..d3769897c8ac 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -934,13 +934,6 @@ static inline int head_compound_pincount(struct page *head)
>   	return atomic_read(compound_pincount_ptr(head));
>   }
>   
> -static inline int compound_pincount(struct page *page)
> -{
> -	VM_BUG_ON_PAGE(!PageCompound(page), page);
> -	page = compound_head(page);
> -	return head_compound_pincount(page);
> -}
> -
>   static inline void set_compound_order(struct page *page, unsigned int order)
>   {
>   	page[1].compound_order = order;
> @@ -1340,18 +1333,20 @@ void unpin_user_pages(struct page **pages, unsigned long npages);
>    */
>   static inline bool page_maybe_dma_pinned(struct page *page)
>   {
> -	if (PageCompound(page))
> -		return compound_pincount(page) > 0;
> +	struct folio *folio = page_folio(page);
> +
> +	if (folio_test_large(folio))
> +		return atomic_read(folio_pincount_ptr(folio)) > 0;
>   
>   	/*
> -	 * page_ref_count() is signed. If that refcount overflows, then
> -	 * page_ref_count() returns a negative value, and callers will avoid
> +	 * folio_ref_count() is signed. If that refcount overflows, then
> +	 * folio_ref_count() returns a negative value, and callers will avoid
>   	 * further incrementing the refcount.
>   	 *
> -	 * Here, for that overflow case, use the signed bit to count a little
> +	 * Here, for that overflow case, use the sign bit to count a little
>   	 * bit higher via unsigned math, and thus still get an accurate result.
>   	 */
> -	return ((unsigned int)page_ref_count(compound_head(page))) >=
> +	return ((unsigned int)folio_ref_count(folio)) >=
>   		GUP_PIN_COUNTING_BIAS;
>   }
>   
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [PATCH v2 09/28] gup: Turn hpage_pincount_add() into page_pincount_add()
  2022-01-10  4:23 ` [PATCH v2 09/28] gup: Turn hpage_pincount_add() into page_pincount_add() Matthew Wilcox (Oracle)
  2022-01-10  8:31   ` Christoph Hellwig
  2022-01-11  3:35   ` John Hubbard
@ 2022-01-11  4:32   ` John Hubbard
  2022-01-11 13:46     ` Matthew Wilcox
  2 siblings, 1 reply; 96+ messages in thread
From: John Hubbard @ 2022-01-11  4:32 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-mm
  Cc: Christoph Hellwig, William Kucharski, linux-kernel,
	Jason Gunthorpe
On 1/9/22 20:23, Matthew Wilcox (Oracle) wrote:
...
> diff --git a/mm/gup.c b/mm/gup.c
> index dbb1b54d0def..3ed9907f3c8d 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -29,12 +29,23 @@ struct follow_page_context {
>   	unsigned int page_mask;
>   };
>   
> -static void hpage_pincount_add(struct page *page, int refs)
> +/*
> + * When pinning a compound page of order > 1 (which is what
> + * hpage_pincount_available() checks for), use an exact count to track
> + * it, via page_pincount_add/_sub().
> + *
> + * However, be sure to *also* increment the normal page refcount field
> + * at least once, so that the page really is pinned.  That's why the
> + * refcount from the earlier try_get_compound_head() is left intact.
> + */
I just realized, after looking at this again in a later patch, that the
last paragraph, above, is now misplaced. It refers to the behavior of the
caller, not to this routine. So it needs to be airlifted back to the
caller.
> +static void page_pincount_add(struct page *page, int refs)
>   {
> -	VM_BUG_ON_PAGE(!hpage_pincount_available(page), page);
>   	VM_BUG_ON_PAGE(page != compound_head(page), page);
>   
> -	atomic_add(refs, compound_pincount_ptr(page));
> +	if (hpage_pincount_available(page))
> +		atomic_add(refs, compound_pincount_ptr(page));
> +	else
> +		page_ref_add(page, refs * (GUP_PIN_COUNTING_BIAS - 1));
>   }
>   
>   static void hpage_pincount_sub(struct page *page, int refs)
> @@ -150,21 +161,7 @@ struct page *try_grab_compound_head(struct page *page,
>   		if (!page)
>   			return NULL;
>   
> -		/*
> -		 * When pinning a compound page of order > 1 (which is what
> -		 * hpage_pincount_available() checks for), use an exact count to
> -		 * track it, via hpage_pincount_add/_sub().
> -		 *
> -		 * However, be sure to *also* increment the normal page refcount
> -		 * field at least once, so that the page really is pinned.
> -		 * That's why the refcount from the earlier
> -		 * try_get_compound_head() is left intact.
> -		 */
...here.
> -		if (hpage_pincount_available(page))
> -			hpage_pincount_add(page, refs);
> -		else
> -			page_ref_add(page, refs * (GUP_PIN_COUNTING_BIAS - 1));
> -
> +		page_pincount_add(page, refs);
>   		mod_node_page_state(page_pgdat(page), NR_FOLL_PIN_ACQUIRED,
>   				    refs);
>   
thanks,
-- 
John Hubbard
NVIDIA
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [PATCH v2 11/28] mm: Make compound_pincount always available
  2022-01-11  4:06   ` John Hubbard
@ 2022-01-11  4:38     ` Matthew Wilcox
  2022-01-11  5:10       ` John Hubbard
  0 siblings, 1 reply; 96+ messages in thread
From: Matthew Wilcox @ 2022-01-11  4:38 UTC (permalink / raw)
  To: John Hubbard
  Cc: linux-mm, Christoph Hellwig, William Kucharski, linux-kernel,
	Jason Gunthorpe
On Mon, Jan 10, 2022 at 08:06:54PM -0800, John Hubbard wrote:
> > +#ifdef CONFIG_64BIT
> >   	return page[1].compound_nr;
> > +#else
> > +	return 1UL << compound_order(page);
> > +#endif
> 
> Now that you are highlighting this, I have this persistent feeling (not
> yet confirmed by any testing) that compound_nr is a micro-optimization
> that is actually invisible at runtime--but is now slicing up our code
> with ifdefs, and using space in a fairly valuable location.
> 
> Not for this patch or series, but maybe a separate patch or series
> should just remove the compound_nr field entirely, yes? It is
> surprising to carry around both compound_order and (1 <<
> compound_order), right next to each other. It would be different if this
> were an expensive calculation, but it's just a shift.
> 
> Maybe testing would prove that that's a bad idea, and maybe someone has
> already looked into it, but I wanted to point it out.
It' probably worth looking at the patch which added it ... 1378a5ee451a
in August 2020.  I didn't provide any performance numbers, but code size
definitely went down.
> > @@ -52,7 +51,7 @@ static int page_pincount_sub(struct page *page, int refs)
> >   {
> >   	VM_BUG_ON_PAGE(page != compound_head(page), page);
> > -	if (hpage_pincount_available(page))
> > +	if (PageHead(page))
> 
> OK, so we just verified (via VM_BUG_ON_PAGE(), which is not always active)
> that this is not a tail page. And so PageHead() effectively means PageCompound().
> 
> I wonder if it would be better to just use PageCompound() here and in similar
> cases. Because that's what is logically being checked, after all. It seems
> slightly more accurate.
Well PageCompound() is defined as PageHead() || PageTail().  I don't
think the intent was for people to always ask "Is this a compound page",
more "This is a good shorthand to replace PageHead() || PageTail()".
It's kind of moot anyway because this gets replaced with
folio_test_large() further down the patch series.
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [PATCH v2 15/28] gup: Add try_get_folio() and try_grab_folio()
  2022-01-10  4:23 ` [PATCH v2 15/28] gup: Add try_get_folio() and try_grab_folio() Matthew Wilcox (Oracle)
  2022-01-10  8:34   ` Christoph Hellwig
@ 2022-01-11  5:00   ` John Hubbard
  1 sibling, 0 replies; 96+ messages in thread
From: John Hubbard @ 2022-01-11  5:00 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-mm
  Cc: Christoph Hellwig, William Kucharski, linux-kernel,
	Jason Gunthorpe
On 1/9/22 20:23, Matthew Wilcox (Oracle) wrote:
> Convert try_get_compound_head() into try_get_folio() and convert
> try_grab_compound_head() into try_grab_folio().  Also convert
> hpage_pincount_add() to folio_pincount_add().  Add a temporary
> try_grab_compound_head() wrapper around try_grab_folio() to
> let us convert callers individually.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>   mm/gup.c      | 104 +++++++++++++++++++++++++-------------------------
>   mm/internal.h |   5 +++
>   2 files changed, 56 insertions(+), 53 deletions(-)
Still looks good.
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
thanks,
-- 
John Hubbard
NVIDIA
> 
> diff --git a/mm/gup.c b/mm/gup.c
> index 1282d29357b7..9e581201d679 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -30,21 +30,19 @@ struct follow_page_context {
>   };
>   
>   /*
> - * When pinning a compound page, use an exact count to track it, via
> - * page_pincount_add/_sub().
> + * When pinning a large folio, use an exact count to track it.
>    *
> - * However, be sure to *also* increment the normal page refcount field
> - * at least once, so that the page really is pinned.  That's why the
> - * refcount from the earlier try_get_compound_head() is left intact.
> + * However, be sure to *also* increment the normal folio refcount
> + * field at least once, so that the folio really is pinned.
> + * That's why the refcount from the earlier
> + * try_get_folio() is left intact.
>    */
> -static void page_pincount_add(struct page *page, int refs)
> +static void folio_pincount_add(struct folio *folio, int refs)
>   {
> -	VM_BUG_ON_PAGE(page != compound_head(page), page);
> -
> -	if (PageHead(page))
> -		atomic_add(refs, compound_pincount_ptr(page));
> +	if (folio_test_large(folio))
> +		atomic_add(refs, folio_pincount_ptr(folio));
>   	else
> -		page_ref_add(page, refs * (GUP_PIN_COUNTING_BIAS - 1));
> +		folio_ref_add(folio, refs * (GUP_PIN_COUNTING_BIAS - 1));
>   }
>   
>   static int page_pincount_sub(struct page *page, int refs)
> @@ -76,75 +74,70 @@ static void put_page_refs(struct page *page, int refs)
>   }
>   
>   /*
> - * Return the compound head page with ref appropriately incremented,
> + * Return the folio with ref appropriately incremented,
>    * or NULL if that failed.
>    */
> -static inline struct page *try_get_compound_head(struct page *page, int refs)
> +static inline struct folio *try_get_folio(struct page *page, int refs)
>   {
> -	struct page *head;
> +	struct folio *folio;
>   
>   retry:
> -	head = compound_head(page);
> -
> -	if (WARN_ON_ONCE(page_ref_count(head) < 0))
> +	folio = page_folio(page);
> +	if (WARN_ON_ONCE(folio_ref_count(folio) < 0))
>   		return NULL;
> -	if (unlikely(!page_cache_add_speculative(head, refs)))
> +	if (unlikely(!folio_ref_try_add_rcu(folio, refs)))
>   		return NULL;
>   
>   	/*
> -	 * At this point we have a stable reference to the head page; but it
> -	 * could be that between the compound_head() lookup and the refcount
> -	 * increment, the compound page was split, in which case we'd end up
> -	 * holding a reference on a page that has nothing to do with the page
> +	 * At this point we have a stable reference to the folio; but it
> +	 * could be that between calling page_folio() and the refcount
> +	 * increment, the folio was split, in which case we'd end up
> +	 * holding a reference on a folio that has nothing to do with the page
>   	 * we were given anymore.
> -	 * So now that the head page is stable, recheck that the pages still
> -	 * belong together.
> +	 * So now that the folio is stable, recheck that the page still
> +	 * belongs to this folio.
>   	 */
> -	if (unlikely(compound_head(page) != head)) {
> -		put_page_refs(head, refs);
> +	if (unlikely(page_folio(page) != folio)) {
> +		folio_put_refs(folio, refs);
>   		goto retry;
>   	}
>   
> -	return head;
> +	return folio;
>   }
>   
>   /**
> - * try_grab_compound_head() - attempt to elevate a page's refcount, by a
> - * flags-dependent amount.
> - *
> - * Even though the name includes "compound_head", this function is still
> - * appropriate for callers that have a non-compound @page to get.
> - *
> + * try_grab_folio() - Attempt to get or pin a folio.
>    * @page:  pointer to page to be grabbed
> - * @refs:  the value to (effectively) add to the page's refcount
> + * @refs:  the value to (effectively) add to the folio's refcount
>    * @flags: gup flags: these are the FOLL_* flag values.
>    *
>    * "grab" names in this file mean, "look at flags to decide whether to use
> - * FOLL_PIN or FOLL_GET behavior, when incrementing the page's refcount.
> + * FOLL_PIN or FOLL_GET behavior, when incrementing the folio's refcount.
>    *
>    * Either FOLL_PIN or FOLL_GET (or neither) must be set, but not both at the
>    * same time. (That's true throughout the get_user_pages*() and
>    * pin_user_pages*() APIs.) Cases:
>    *
> - *    FOLL_GET: page's refcount will be incremented by @refs.
> + *    FOLL_GET: folio's refcount will be incremented by @refs.
>    *
> - *    FOLL_PIN on compound pages: page's refcount will be incremented by
> - *    @refs, and page[1].compound_pincount will be incremented by @refs.
> + *    FOLL_PIN on large folios: folio's refcount will be incremented by
> + *    @refs, and its compound_pincount will be incremented by @refs.
>    *
> - *    FOLL_PIN on normal pages: page's refcount will be incremented by
> + *    FOLL_PIN on single-page folios: folio's refcount will be incremented by
>    *    @refs * GUP_PIN_COUNTING_BIAS.
>    *
> - * Return: head page (with refcount appropriately incremented) for success, or
> - * NULL upon failure. If neither FOLL_GET nor FOLL_PIN was set, that's
> - * considered failure, and furthermore, a likely bug in the caller, so a warning
> - * is also emitted.
> + * Return: The folio containing @page (with refcount appropriately
> + * incremented) for success, or NULL upon failure. If neither FOLL_GET
> + * nor FOLL_PIN was set, that's considered failure, and furthermore,
> + * a likely bug in the caller, so a warning is also emitted.
>    */
> -struct page *try_grab_compound_head(struct page *page,
> -				    int refs, unsigned int flags)
> +struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags)
>   {
>   	if (flags & FOLL_GET)
> -		return try_get_compound_head(page, refs);
> +		return try_get_folio(page, refs);
>   	else if (flags & FOLL_PIN) {
> +		struct folio *folio;
> +
>   		/*
>   		 * Can't do FOLL_LONGTERM + FOLL_PIN gup fast path if not in a
>   		 * right zone, so fail and let the caller fall back to the slow
> @@ -158,21 +151,26 @@ struct page *try_grab_compound_head(struct page *page,
>   		 * CAUTION: Don't use compound_head() on the page before this
>   		 * point, the result won't be stable.
>   		 */
> -		page = try_get_compound_head(page, refs);
> -		if (!page)
> +		folio = try_get_folio(page, refs);
> +		if (!folio)
>   			return NULL;
>   
> -		page_pincount_add(page, refs);
> -		mod_node_page_state(page_pgdat(page), NR_FOLL_PIN_ACQUIRED,
> -				    refs);
> +		folio_pincount_add(folio, refs);
> +		node_stat_mod_folio(folio, NR_FOLL_PIN_ACQUIRED, refs);
>   
> -		return page;
> +		return folio;
>   	}
>   
>   	WARN_ON_ONCE(1);
>   	return NULL;
>   }
>   
> +struct page *try_grab_compound_head(struct page *page,
> +		int refs, unsigned int flags)
> +{
> +	return &try_grab_folio(page, refs, flags)->page;
> +}
> +
>   static void put_compound_head(struct page *page, int refs, unsigned int flags)
>   {
>   	if (flags & FOLL_PIN) {
> @@ -196,7 +194,7 @@ static void put_compound_head(struct page *page, int refs, unsigned int flags)
>    * @flags:   gup flags: these are the FOLL_* flag values.
>    *
>    * Either FOLL_PIN or FOLL_GET (or neither) may be set, but not both at the same
> - * time. Cases: please see the try_grab_compound_head() documentation, with
> + * time. Cases: please see the try_grab_folio() documentation, with
>    * "refs=1".
>    *
>    * Return: true for success, or if no action was required (if neither FOLL_PIN
> diff --git a/mm/internal.h b/mm/internal.h
> index 26af8a5a5be3..9a72d1ecdab4 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -723,4 +723,9 @@ void vunmap_range_noflush(unsigned long start, unsigned long end);
>   int numa_migrate_prep(struct page *page, struct vm_area_struct *vma,
>   		      unsigned long addr, int page_nid, int *flags);
>   
> +/*
> + * mm/gup.c
> + */
> +struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags);
> +
>   #endif	/* __MM_INTERNAL_H */
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [PATCH v2 11/28] mm: Make compound_pincount always available
  2022-01-11  4:38     ` Matthew Wilcox
@ 2022-01-11  5:10       ` John Hubbard
  0 siblings, 0 replies; 96+ messages in thread
From: John Hubbard @ 2022-01-11  5:10 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-mm, Christoph Hellwig, William Kucharski, linux-kernel,
	Jason Gunthorpe
On 1/10/22 20:38, Matthew Wilcox wrote:
> On Mon, Jan 10, 2022 at 08:06:54PM -0800, John Hubbard wrote:
>>> +#ifdef CONFIG_64BIT
>>>    	return page[1].compound_nr;
>>> +#else
>>> +	return 1UL << compound_order(page);
>>> +#endif
>>
>> Now that you are highlighting this, I have this persistent feeling (not
>> yet confirmed by any testing) that compound_nr is a micro-optimization
>> that is actually invisible at runtime--but is now slicing up our code
>> with ifdefs, and using space in a fairly valuable location.
>>
>> Not for this patch or series, but maybe a separate patch or series
>> should just remove the compound_nr field entirely, yes? It is
>> surprising to carry around both compound_order and (1 <<
>> compound_order), right next to each other. It would be different if this
>> were an expensive calculation, but it's just a shift.
>>
>> Maybe testing would prove that that's a bad idea, and maybe someone has
>> already looked into it, but I wanted to point it out.
> 
> It' probably worth looking at the patch which added it ... 1378a5ee451a
> in August 2020.  I didn't provide any performance numbers, but code size
> definitely went down.
I looked at that, and the lore link for the conversation, but failed to learn
anything additional. Of course if you recall that there was in fact a measurable
performance improvement, then as of now, it's recorded somewhere. :)
It's far from clear whether we'll need or want this space in page[1] in the
future anyway, just wanted to poke at it though.
> 
>>> @@ -52,7 +51,7 @@ static int page_pincount_sub(struct page *page, int refs)
>>>    {
>>>    	VM_BUG_ON_PAGE(page != compound_head(page), page);
>>> -	if (hpage_pincount_available(page))
>>> +	if (PageHead(page))
>>
>> OK, so we just verified (via VM_BUG_ON_PAGE(), which is not always active)
>> that this is not a tail page. And so PageHead() effectively means PageCompound().
>>
>> I wonder if it would be better to just use PageCompound() here and in similar
>> cases. Because that's what is logically being checked, after all. It seems
>> slightly more accurate.
> 
> Well PageCompound() is defined as PageHead() || PageTail().  I don't
> think the intent was for people to always ask "Is this a compound page",
> more "This is a good shorthand to replace PageHead() || PageTail()".
> It's kind of moot anyway because this gets replaced with
> folio_test_large() further down the patch series.
> 
OK.
thanks,
-- 
John Hubbard
NVIDIA
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [PATCH v2 16/28] mm: Remove page_cache_add_speculative() and page_cache_get_speculative()
  2022-01-10  4:23 ` [PATCH v2 16/28] mm: Remove page_cache_add_speculative() and page_cache_get_speculative() Matthew Wilcox (Oracle)
  2022-01-10  8:35   ` Christoph Hellwig
@ 2022-01-11  5:14   ` John Hubbard
  1 sibling, 0 replies; 96+ messages in thread
From: John Hubbard @ 2022-01-11  5:14 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-mm
  Cc: Christoph Hellwig, William Kucharski, linux-kernel,
	Jason Gunthorpe
On 1/9/22 20:23, Matthew Wilcox (Oracle) wrote:
> These wrappers have no more callers, so delete them.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>   include/linux/mm.h      |  7 +++----
>   include/linux/pagemap.h | 11 -----------
>   2 files changed, 3 insertions(+), 15 deletions(-)
Confirmed that at this point in the series there are no more callers.
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
thanks,
-- 
John Hubbard
NVIDIA
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index d3769897c8ac..b249156f7cf1 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1291,10 +1291,9 @@ static inline void put_page(struct page *page)
>    * applications that don't have huge page reference counts, this won't be an
>    * issue.
>    *
> - * Locking: the lockless algorithm described in page_cache_get_speculative()
> - * and page_cache_gup_pin_speculative() provides safe operation for
> - * get_user_pages and page_mkclean and other calls that race to set up page
> - * table entries.
> + * Locking: the lockless algorithm described in folio_try_get_rcu()
> + * provides safe operation for get_user_pages(), page_mkclean() and
> + * other calls that race to set up page table entries.
>    */
>   #define GUP_PIN_COUNTING_BIAS (1U << 10)
>   
> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> index 704cb1b4b15d..4a63176b6417 100644
> --- a/include/linux/pagemap.h
> +++ b/include/linux/pagemap.h
> @@ -283,17 +283,6 @@ static inline struct inode *folio_inode(struct folio *folio)
>   	return folio->mapping->host;
>   }
>   
> -static inline bool page_cache_add_speculative(struct page *page, int count)
> -{
> -	VM_BUG_ON_PAGE(PageTail(page), page);
> -	return folio_ref_try_add_rcu((struct folio *)page, count);
> -}
> -
> -static inline bool page_cache_get_speculative(struct page *page)
> -{
> -	return page_cache_add_speculative(page, 1);
> -}
> -
>   /**
>    * folio_attach_private - Attach private data to a folio.
>    * @folio: Folio to attach data to.
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [PATCH v2 10/28] gup: Turn hpage_pincount_sub() into page_pincount_sub()
  2022-01-10  4:23 ` [PATCH v2 10/28] gup: Turn hpage_pincount_sub() into page_pincount_sub() Matthew Wilcox (Oracle)
  2022-01-10  8:32   ` Christoph Hellwig
@ 2022-01-11  6:40   ` John Hubbard
  1 sibling, 0 replies; 96+ messages in thread
From: John Hubbard @ 2022-01-11  6:40 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-mm
  Cc: Christoph Hellwig, William Kucharski, linux-kernel,
	Jason Gunthorpe
On 1/9/22 20:23, Matthew Wilcox (Oracle) wrote:
> Remove an unnecessary VM_BUG_ON by handling pages both with and without
> a pincount field in page_pincount_sub().
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>   mm/gup.c | 15 +++++++--------
>   1 file changed, 7 insertions(+), 8 deletions(-)
> 
> diff --git a/mm/gup.c b/mm/gup.c
> index 3ed9907f3c8d..aed48de3912e 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -48,12 +48,15 @@ static void page_pincount_add(struct page *page, int refs)
>   		page_ref_add(page, refs * (GUP_PIN_COUNTING_BIAS - 1));
>   }
>   
> -static void hpage_pincount_sub(struct page *page, int refs)
OK, this is a correct transformation. But it does make the function, as
small as it is, rather hard to understand. It is now factored out at a
strange place, as evidenced in the difficulty in documenting it. Here's
my best attempt so far at documenting it here:
/*
  * Subtract @refs from the compound_pincount, if applicable. Otherwise, do
  * nothing. And in all cases, return the number of pages that should be passed
  * to put_page_refs().
  */
...which I think is worth adding, if we can't find a better factoring.
Either way, it's correct, so:
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
thanks,
-- 
John Hubbard
NVIDIA
> +static int page_pincount_sub(struct page *page, int refs)
>   {
> -	VM_BUG_ON_PAGE(!hpage_pincount_available(page), page);
>   	VM_BUG_ON_PAGE(page != compound_head(page), page);
>   
> -	atomic_sub(refs, compound_pincount_ptr(page));
> +	if (hpage_pincount_available(page))
> +		atomic_sub(refs, compound_pincount_ptr(page));
> +	else
> +		refs *= GUP_PIN_COUNTING_BIAS;
> +	return refs;
>   }
>   
>   /* Equivalent to calling put_page() @refs times. */
> @@ -177,11 +180,7 @@ static void put_compound_head(struct page *page, int refs, unsigned int flags)
>   	if (flags & FOLL_PIN) {
>   		mod_node_page_state(page_pgdat(page), NR_FOLL_PIN_RELEASED,
>   				    refs);
> -
> -		if (hpage_pincount_available(page))
> -			hpage_pincount_sub(page, refs);
> -		else
> -			refs *= GUP_PIN_COUNTING_BIAS;
> +		refs = page_pincount_sub(page, refs);
>   	}
>   
>   	put_page_refs(page, refs);
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [PATCH v2 17/28] gup: Add gup_put_folio()
  2022-01-10  4:23 ` [PATCH v2 17/28] gup: Add gup_put_folio() Matthew Wilcox (Oracle)
  2022-01-10  8:35   ` Christoph Hellwig
@ 2022-01-11  6:44   ` John Hubbard
  1 sibling, 0 replies; 96+ messages in thread
From: John Hubbard @ 2022-01-11  6:44 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-mm
  Cc: Christoph Hellwig, William Kucharski, linux-kernel,
	Jason Gunthorpe
On 1/9/22 20:23, Matthew Wilcox (Oracle) wrote:
> Convert put_compound_head() to gup_put_folio() and hpage_pincount_sub()
> to folio_pincount_sub().  This removes the last call to put_page_refs(),
> so delete it.  Add a temporary put_compound_head() wrapper which will
> be deleted by the end of this series.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>   mm/gup.c | 42 ++++++++++++++----------------------------
>   1 file changed, 14 insertions(+), 28 deletions(-)
> 
> diff --git a/mm/gup.c b/mm/gup.c
> index 9e581201d679..719252fa0402 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -45,34 +45,15 @@ static void folio_pincount_add(struct folio *folio, int refs)
>   		folio_ref_add(folio, refs * (GUP_PIN_COUNTING_BIAS - 1));
>   }
>   
> -static int page_pincount_sub(struct page *page, int refs)
> +static int folio_pincount_sub(struct folio *folio, int refs)
I just noticed that this return value was a little hard to reason about, which
prompted me to notice that I'd skipped reviewing patch 10. So I went back
and added a note there. :)
Anyway, this one looks good,
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
thanks,
-- 
John Hubbard
NVIDIA
>   {
> -	VM_BUG_ON_PAGE(page != compound_head(page), page);
> -
> -	if (PageHead(page))
> -		atomic_sub(refs, compound_pincount_ptr(page));
> +	if (folio_test_large(folio))
> +		atomic_sub(refs, folio_pincount_ptr(folio));
>   	else
>   		refs *= GUP_PIN_COUNTING_BIAS;
>   	return refs;
>   }
>   
> -/* Equivalent to calling put_page() @refs times. */
> -static void put_page_refs(struct page *page, int refs)
> -{
> -#ifdef CONFIG_DEBUG_VM
> -	if (VM_WARN_ON_ONCE_PAGE(page_ref_count(page) < refs, page))
> -		return;
> -#endif
> -
> -	/*
> -	 * Calling put_page() for each ref is unnecessarily slow. Only the last
> -	 * ref needs a put_page().
> -	 */
> -	if (refs > 1)
> -		page_ref_sub(page, refs - 1);
> -	put_page(page);
> -}
> -
>   /*
>    * Return the folio with ref appropriately incremented,
>    * or NULL if that failed.
> @@ -171,15 +152,20 @@ struct page *try_grab_compound_head(struct page *page,
>   	return &try_grab_folio(page, refs, flags)->page;
>   }
>   
> -static void put_compound_head(struct page *page, int refs, unsigned int flags)
> +static void gup_put_folio(struct folio *folio, int refs, unsigned int flags)
>   {
>   	if (flags & FOLL_PIN) {
> -		mod_node_page_state(page_pgdat(page), NR_FOLL_PIN_RELEASED,
> -				    refs);
> -		refs = page_pincount_sub(page, refs);
> +		node_stat_mod_folio(folio, NR_FOLL_PIN_RELEASED, refs);
> +		refs = folio_pincount_sub(folio, refs);
>   	}
>   
> -	put_page_refs(page, refs);
> +	folio_put_refs(folio, refs);
> +}
> +
> +static void put_compound_head(struct page *page, int refs, unsigned int flags)
> +{
> +	VM_BUG_ON_PAGE(PageTail(page), page);
> +	gup_put_folio((struct folio *)page, refs, flags);
>   }
>   
>   /**
> @@ -220,7 +206,7 @@ bool __must_check try_grab_page(struct page *page, unsigned int flags)
>    */
>   void unpin_user_page(struct page *page)
>   {
> -	put_compound_head(compound_head(page), 1, FOLL_PIN);
> +	gup_put_folio(page_folio(page), 1, FOLL_PIN);
>   }
>   EXPORT_SYMBOL(unpin_user_page);
>   
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [PATCH v2 18/28] hugetlb: Use try_grab_folio() instead of try_grab_compound_head()
  2022-01-10  4:23 ` [PATCH v2 18/28] hugetlb: Use try_grab_folio() instead of try_grab_compound_head() Matthew Wilcox (Oracle)
  2022-01-10  8:36   ` Christoph Hellwig
@ 2022-01-11  6:47   ` John Hubbard
  1 sibling, 0 replies; 96+ messages in thread
From: John Hubbard @ 2022-01-11  6:47 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-mm
  Cc: Christoph Hellwig, William Kucharski, linux-kernel,
	Jason Gunthorpe
On 1/9/22 20:23, Matthew Wilcox (Oracle) wrote:
> follow_hugetlb_page() only cares about success or failure, so it doesn't
> need to know the type of the returned pointer, only whether it's NULL
> or not.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>   include/linux/mm.h | 3 ---
>   mm/gup.c           | 2 +-
>   mm/hugetlb.c       | 7 +++----
>   3 files changed, 4 insertions(+), 8 deletions(-)
> 
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
thanks,
-- 
John Hubbard
NVIDIA
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index b249156f7cf1..c103c6401ecd 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1195,9 +1195,6 @@ static inline void get_page(struct page *page)
>   }
>   
>   bool __must_check try_grab_page(struct page *page, unsigned int flags);
> -struct page *try_grab_compound_head(struct page *page, int refs,
> -				    unsigned int flags);
> -
>   
>   static inline __must_check bool try_get_page(struct page *page)
>   {
> diff --git a/mm/gup.c b/mm/gup.c
> index 719252fa0402..20703de2f107 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -146,7 +146,7 @@ struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags)
>   	return NULL;
>   }
>   
> -struct page *try_grab_compound_head(struct page *page,
> +static inline struct page *try_grab_compound_head(struct page *page,
>   		int refs, unsigned int flags)
>   {
>   	return &try_grab_folio(page, refs, flags)->page;
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index abcd1785c629..ab67b13c4a71 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -6072,7 +6072,7 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
>   
>   		if (pages) {
>   			/*
> -			 * try_grab_compound_head() should always succeed here,
> +			 * try_grab_folio() should always succeed here,
>   			 * because: a) we hold the ptl lock, and b) we've just
>   			 * checked that the huge page is present in the page
>   			 * tables. If the huge page is present, then the tail
> @@ -6081,9 +6081,8 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
>   			 * any way. So this page must be available at this
>   			 * point, unless the page refcount overflowed:
>   			 */
> -			if (WARN_ON_ONCE(!try_grab_compound_head(pages[i],
> -								 refs,
> -								 flags))) {
> +			if (WARN_ON_ONCE(!try_grab_folio(pages[i], refs,
> +							 flags))) {
>   				spin_unlock(ptl);
>   				remainder = 0;
>   				err = -ENOMEM;
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [PATCH v2 19/28] gup: Convert try_grab_page() to call try_grab_folio()
  2022-01-10  4:23 ` [PATCH v2 19/28] gup: Convert try_grab_page() to call try_grab_folio() Matthew Wilcox (Oracle)
  2022-01-10  8:36   ` Christoph Hellwig
@ 2022-01-11  7:01   ` John Hubbard
  1 sibling, 0 replies; 96+ messages in thread
From: John Hubbard @ 2022-01-11  7:01 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-mm
  Cc: Christoph Hellwig, William Kucharski, linux-kernel,
	Jason Gunthorpe
On 1/9/22 20:23, Matthew Wilcox (Oracle) wrote:
> try_grab_page() only cares about success or failure, not about the
> type returned.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>   mm/gup.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
thanks,
-- 
John Hubbard
NVIDIA
> 
> diff --git a/mm/gup.c b/mm/gup.c
> index 20703de2f107..c3e514172eaf 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -192,7 +192,7 @@ bool __must_check try_grab_page(struct page *page, unsigned int flags)
>   	if (!(flags & (FOLL_GET | FOLL_PIN)))
>   		return true;
>   
> -	return try_grab_compound_head(page, 1, flags);
> +	return try_grab_folio(page, 1, flags);
>   }
>   
>   /**
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [PATCH v2 20/28] gup: Convert gup_pte_range() to use a folio
  2022-01-10  4:23 ` [PATCH v2 20/28] gup: Convert gup_pte_range() to use a folio Matthew Wilcox (Oracle)
  2022-01-10  8:37   ` Christoph Hellwig
@ 2022-01-11  7:06   ` John Hubbard
  1 sibling, 0 replies; 96+ messages in thread
From: John Hubbard @ 2022-01-11  7:06 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-mm
  Cc: Christoph Hellwig, William Kucharski, linux-kernel,
	Jason Gunthorpe
On 1/9/22 20:23, Matthew Wilcox (Oracle) wrote:
> We still call try_grab_folio() once per PTE; a future patch could
> optimise to just adjust the reference count for each page within
> the folio.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>   mm/gup.c | 16 +++++++---------
>   1 file changed, 7 insertions(+), 9 deletions(-)
Still looks good.
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
thanks,
-- 
John Hubbard
NVIDIA
> 
> diff --git a/mm/gup.c b/mm/gup.c
> index c3e514172eaf..27cc097ec05d 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -2235,7 +2235,8 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
>   	ptem = ptep = pte_offset_map(&pmd, addr);
>   	do {
>   		pte_t pte = ptep_get_lockless(ptep);
> -		struct page *head, *page;
> +		struct page *page;
> +		struct folio *folio;
>   
>   		/*
>   		 * Similar to the PMD case below, NUMA hinting must take slow
> @@ -2262,22 +2263,20 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
>   		VM_BUG_ON(!pfn_valid(pte_pfn(pte)));
>   		page = pte_page(pte);
>   
> -		head = try_grab_compound_head(page, 1, flags);
> -		if (!head)
> +		folio = try_grab_folio(page, 1, flags);
> +		if (!folio)
>   			goto pte_unmap;
>   
>   		if (unlikely(page_is_secretmem(page))) {
> -			put_compound_head(head, 1, flags);
> +			gup_put_folio(folio, 1, flags);
>   			goto pte_unmap;
>   		}
>   
>   		if (unlikely(pte_val(pte) != pte_val(*ptep))) {
> -			put_compound_head(head, 1, flags);
> +			gup_put_folio(folio, 1, flags);
>   			goto pte_unmap;
>   		}
>   
> -		VM_BUG_ON_PAGE(compound_head(page) != head, page);
> -
>   		/*
>   		 * We need to make the page accessible if and only if we are
>   		 * going to access its content (the FOLL_PIN case).  Please
> @@ -2291,10 +2290,9 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
>   				goto pte_unmap;
>   			}
>   		}
> -		SetPageReferenced(page);
> +		folio_set_referenced(folio);
>   		pages[*nr] = page;
>   		(*nr)++;
> -
>   	} while (ptep++, addr += PAGE_SIZE, addr != end);
>   
>   	ret = 1;
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [PATCH v2 21/28] gup: Convert gup_hugepte() to use a folio
  2022-01-10  4:23 ` [PATCH v2 21/28] gup: Convert gup_hugepte() " Matthew Wilcox (Oracle)
  2022-01-10  8:37   ` Christoph Hellwig
@ 2022-01-11  7:33   ` John Hubbard
  1 sibling, 0 replies; 96+ messages in thread
From: John Hubbard @ 2022-01-11  7:33 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-mm
  Cc: Christoph Hellwig, William Kucharski, linux-kernel,
	Jason Gunthorpe
On 1/9/22 20:23, Matthew Wilcox (Oracle) wrote:
> There should be little to no effect from this patch; just removing
> uses of some old APIs.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>   mm/gup.c | 14 +++++++-------
>   1 file changed, 7 insertions(+), 7 deletions(-)
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
thanks,
-- 
John Hubbard
NVIDIA
> 
> diff --git a/mm/gup.c b/mm/gup.c
> index 27cc097ec05d..250326458df6 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -2428,7 +2428,8 @@ static int gup_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr,
>   		       struct page **pages, int *nr)
>   {
>   	unsigned long pte_end;
> -	struct page *head, *page;
> +	struct page *page;
> +	struct folio *folio;
>   	pte_t pte;
>   	int refs;
>   
> @@ -2444,21 +2445,20 @@ static int gup_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr,
>   	/* hugepages are never "special" */
>   	VM_BUG_ON(!pfn_valid(pte_pfn(pte)));
>   
> -	head = pte_page(pte);
> -	page = nth_page(head, (addr & (sz-1)) >> PAGE_SHIFT);
> +	page = nth_page(pte_page(pte), (addr & (sz - 1)) >> PAGE_SHIFT);
>   	refs = record_subpages(page, addr, end, pages + *nr);
>   
> -	head = try_grab_compound_head(head, refs, flags);
> -	if (!head)
> +	folio = try_grab_folio(page, refs, flags);
> +	if (!folio)
>   		return 0;
>   
>   	if (unlikely(pte_val(pte) != pte_val(*ptep))) {
> -		put_compound_head(head, refs, flags);
> +		gup_put_folio(folio, refs, flags);
>   		return 0;
>   	}
>   
>   	*nr += refs;
> -	SetPageReferenced(head);
> +	folio_set_referenced(folio);
>   	return 1;
>   }
>   
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [PATCH v2 22/28] gup: Convert gup_huge_pmd() to use a folio
  2022-01-10  4:24 ` [PATCH v2 22/28] gup: Convert gup_huge_pmd() " Matthew Wilcox (Oracle)
  2022-01-10  8:37   ` Christoph Hellwig
@ 2022-01-11  7:36   ` John Hubbard
  1 sibling, 0 replies; 96+ messages in thread
From: John Hubbard @ 2022-01-11  7:36 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-mm
  Cc: Christoph Hellwig, William Kucharski, linux-kernel,
	Jason Gunthorpe
On 1/9/22 20:24, Matthew Wilcox (Oracle) wrote:
> Use the new folio-based APIs.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>   mm/gup.c | 11 ++++++-----
>   1 file changed, 6 insertions(+), 5 deletions(-)
> 
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
thanks,
-- 
John Hubbard
NVIDIA
> diff --git a/mm/gup.c b/mm/gup.c
> index 250326458df6..a006bce2d47b 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -2492,7 +2492,8 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr,
>   			unsigned long end, unsigned int flags,
>   			struct page **pages, int *nr)
>   {
> -	struct page *head, *page;
> +	struct page *page;
> +	struct folio *folio;
>   	int refs;
>   
>   	if (!pmd_access_permitted(orig, flags & FOLL_WRITE))
> @@ -2508,17 +2509,17 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr,
>   	page = nth_page(pmd_page(orig), (addr & ~PMD_MASK) >> PAGE_SHIFT);
>   	refs = record_subpages(page, addr, end, pages + *nr);
>   
> -	head = try_grab_compound_head(pmd_page(orig), refs, flags);
> -	if (!head)
> +	folio = try_grab_folio(page, refs, flags);
> +	if (!folio)
>   		return 0;
>   
>   	if (unlikely(pmd_val(orig) != pmd_val(*pmdp))) {
> -		put_compound_head(head, refs, flags);
> +		gup_put_folio(folio, refs, flags);
>   		return 0;
>   	}
>   
>   	*nr += refs;
> -	SetPageReferenced(head);
> +	folio_set_referenced(folio);
>   	return 1;
>   }
>   
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [PATCH v2 23/28] gup: Convert gup_huge_pud() to use a folio
  2022-01-10  4:24 ` [PATCH v2 23/28] gup: Convert gup_huge_pud() " Matthew Wilcox (Oracle)
  2022-01-10  8:38   ` Christoph Hellwig
@ 2022-01-11  7:38   ` John Hubbard
  1 sibling, 0 replies; 96+ messages in thread
From: John Hubbard @ 2022-01-11  7:38 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-mm
  Cc: Christoph Hellwig, William Kucharski, linux-kernel,
	Jason Gunthorpe
On 1/9/22 20:24, Matthew Wilcox (Oracle) wrote:
> Use the new folio-based APIs.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>   mm/gup.c | 11 ++++++-----
>   1 file changed, 6 insertions(+), 5 deletions(-)
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
thanks,
-- 
John Hubbard
NVIDIA
> 
> diff --git a/mm/gup.c b/mm/gup.c
> index a006bce2d47b..7b7bf8361558 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -2527,7 +2527,8 @@ static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr,
>   			unsigned long end, unsigned int flags,
>   			struct page **pages, int *nr)
>   {
> -	struct page *head, *page;
> +	struct page *page;
> +	struct folio *folio;
>   	int refs;
>   
>   	if (!pud_access_permitted(orig, flags & FOLL_WRITE))
> @@ -2543,17 +2544,17 @@ static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr,
>   	page = nth_page(pud_page(orig), (addr & ~PUD_MASK) >> PAGE_SHIFT);
>   	refs = record_subpages(page, addr, end, pages + *nr);
>   
> -	head = try_grab_compound_head(pud_page(orig), refs, flags);
> -	if (!head)
> +	folio = try_grab_folio(page, refs, flags);
> +	if (!folio)
>   		return 0;
>   
>   	if (unlikely(pud_val(orig) != pud_val(*pudp))) {
> -		put_compound_head(head, refs, flags);
> +		gup_put_folio(folio, refs, flags);
>   		return 0;
>   	}
>   
>   	*nr += refs;
> -	SetPageReferenced(head);
> +	folio_set_referenced(folio);
>   	return 1;
>   }
>   
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [PATCH v2 24/28] gup: Convert gup_huge_pgd() to use a folio
  2022-01-10  4:24 ` [PATCH v2 24/28] gup: Convert gup_huge_pgd() " Matthew Wilcox (Oracle)
  2022-01-10  8:38   ` Christoph Hellwig
@ 2022-01-11  7:38   ` John Hubbard
  1 sibling, 0 replies; 96+ messages in thread
From: John Hubbard @ 2022-01-11  7:38 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-mm
  Cc: Christoph Hellwig, William Kucharski, linux-kernel,
	Jason Gunthorpe
On 1/9/22 20:24, Matthew Wilcox (Oracle) wrote:
> Use the new folio-based APIs.  This was the last user of
> try_grab_compound_head(), so remove it.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>   mm/gup.c | 17 ++++++-----------
>   1 file changed, 6 insertions(+), 11 deletions(-)
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
thanks,
-- 
John Hubbard
NVIDIA
> 
> diff --git a/mm/gup.c b/mm/gup.c
> index 7b7bf8361558..b5786e83c418 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -146,12 +146,6 @@ struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags)
>   	return NULL;
>   }
>   
> -static inline struct page *try_grab_compound_head(struct page *page,
> -		int refs, unsigned int flags)
> -{
> -	return &try_grab_folio(page, refs, flags)->page;
> -}
> -
>   static void gup_put_folio(struct folio *folio, int refs, unsigned int flags)
>   {
>   	if (flags & FOLL_PIN) {
> @@ -2563,7 +2557,8 @@ static int gup_huge_pgd(pgd_t orig, pgd_t *pgdp, unsigned long addr,
>   			struct page **pages, int *nr)
>   {
>   	int refs;
> -	struct page *head, *page;
> +	struct page *page;
> +	struct folio *folio;
>   
>   	if (!pgd_access_permitted(orig, flags & FOLL_WRITE))
>   		return 0;
> @@ -2573,17 +2568,17 @@ static int gup_huge_pgd(pgd_t orig, pgd_t *pgdp, unsigned long addr,
>   	page = nth_page(pgd_page(orig), (addr & ~PGDIR_MASK) >> PAGE_SHIFT);
>   	refs = record_subpages(page, addr, end, pages + *nr);
>   
> -	head = try_grab_compound_head(pgd_page(orig), refs, flags);
> -	if (!head)
> +	folio = try_grab_folio(page, refs, flags);
> +	if (!folio)
>   		return 0;
>   
>   	if (unlikely(pgd_val(orig) != pgd_val(*pgdp))) {
> -		put_compound_head(head, refs, flags);
> +		gup_put_folio(folio, refs, flags);
>   		return 0;
>   	}
>   
>   	*nr += refs;
> -	SetPageReferenced(head);
> +	folio_set_referenced(folio);
>   	return 1;
>   }
>   
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [PATCH v2 25/28] gup: Convert compound_next() to gup_folio_next()
  2022-01-10  4:24 ` [PATCH v2 25/28] gup: Convert compound_next() to gup_folio_next() Matthew Wilcox (Oracle)
  2022-01-10  8:39   ` Christoph Hellwig
@ 2022-01-11  7:41   ` John Hubbard
  1 sibling, 0 replies; 96+ messages in thread
From: John Hubbard @ 2022-01-11  7:41 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-mm
  Cc: Christoph Hellwig, William Kucharski, linux-kernel,
	Jason Gunthorpe
On 1/9/22 20:24, Matthew Wilcox (Oracle) wrote:
> Convert both callers to work on folios instead of pages.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>   mm/gup.c | 41 ++++++++++++++++++++++-------------------
>   1 file changed, 22 insertions(+), 19 deletions(-)
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
thanks,
-- 
John Hubbard
NVIDIA
> 
> diff --git a/mm/gup.c b/mm/gup.c
> index b5786e83c418..0cf2d5fd8d2d 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -220,20 +220,20 @@ static inline struct page *compound_range_next(unsigned long i,
>   	return page;
>   }
>   
> -static inline struct page *compound_next(unsigned long i,
> +static inline struct folio *gup_folio_next(unsigned long i,
>   		unsigned long npages, struct page **list, unsigned int *ntails)
>   {
> -	struct page *page;
> +	struct folio *folio;
>   	unsigned int nr;
>   
> -	page = compound_head(list[i]);
> +	folio = page_folio(list[i]);
>   	for (nr = i + 1; nr < npages; nr++) {
> -		if (compound_head(list[nr]) != page)
> +		if (page_folio(list[nr]) != folio)
>   			break;
>   	}
>   
>   	*ntails = nr - i;
> -	return page;
> +	return folio;
>   }
>   
>   /**
> @@ -261,17 +261,17 @@ static inline struct page *compound_next(unsigned long i,
>   void unpin_user_pages_dirty_lock(struct page **pages, unsigned long npages,
>   				 bool make_dirty)
>   {
> -	unsigned long index;
> -	struct page *head;
> -	unsigned int ntails;
> +	unsigned long i;
> +	struct folio *folio;
> +	unsigned int nr;
>   
>   	if (!make_dirty) {
>   		unpin_user_pages(pages, npages);
>   		return;
>   	}
>   
> -	for (index = 0; index < npages; index += ntails) {
> -		head = compound_next(index, npages, pages, &ntails);
> +	for (i = 0; i < npages; i += nr) {
> +		folio = gup_folio_next(i, npages, pages, &nr);
>   		/*
>   		 * Checking PageDirty at this point may race with
>   		 * clear_page_dirty_for_io(), but that's OK. Two key
> @@ -292,9 +292,12 @@ void unpin_user_pages_dirty_lock(struct page **pages, unsigned long npages,
>   		 * written back, so it gets written back again in the
>   		 * next writeback cycle. This is harmless.
>   		 */
> -		if (!PageDirty(head))
> -			set_page_dirty_lock(head);
> -		put_compound_head(head, ntails, FOLL_PIN);
> +		if (!folio_test_dirty(folio)) {
> +			folio_lock(folio);
> +			folio_mark_dirty(folio);
> +			folio_unlock(folio);
> +		}
> +		gup_put_folio(folio, nr, FOLL_PIN);
>   	}
>   }
>   EXPORT_SYMBOL(unpin_user_pages_dirty_lock);
> @@ -347,9 +350,9 @@ EXPORT_SYMBOL(unpin_user_page_range_dirty_lock);
>    */
>   void unpin_user_pages(struct page **pages, unsigned long npages)
>   {
> -	unsigned long index;
> -	struct page *head;
> -	unsigned int ntails;
> +	unsigned long i;
> +	struct folio *folio;
> +	unsigned int nr;
>   
>   	/*
>   	 * If this WARN_ON() fires, then the system *might* be leaking pages (by
> @@ -359,9 +362,9 @@ void unpin_user_pages(struct page **pages, unsigned long npages)
>   	if (WARN_ON(IS_ERR_VALUE(npages)))
>   		return;
>   
> -	for (index = 0; index < npages; index += ntails) {
> -		head = compound_next(index, npages, pages, &ntails);
> -		put_compound_head(head, ntails, FOLL_PIN);
> +	for (i = 0; i < npages; i += nr) {
> +		folio = gup_folio_next(i, npages, pages, &nr);
> +		gup_put_folio(folio, nr, FOLL_PIN);
>   	}
>   }
>   EXPORT_SYMBOL(unpin_user_pages);
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [PATCH v2 26/28] gup: Convert compound_range_next() to gup_folio_range_next()
  2022-01-10  4:24 ` [PATCH v2 26/28] gup: Convert compound_range_next() to gup_folio_range_next() Matthew Wilcox (Oracle)
  2022-01-10  8:41   ` Christoph Hellwig
@ 2022-01-11  7:44   ` John Hubbard
  1 sibling, 0 replies; 96+ messages in thread
From: John Hubbard @ 2022-01-11  7:44 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-mm
  Cc: Christoph Hellwig, William Kucharski, linux-kernel,
	Jason Gunthorpe
On 1/9/22 20:24, Matthew Wilcox (Oracle) wrote:
> Convert the only caller to work on folios instead of pages.
> This removes the last caller of put_compound_head(), so delete it.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>   include/linux/mm.h |  4 ++--
>   mm/gup.c           | 38 ++++++++++++++++++--------------------
>   2 files changed, 20 insertions(+), 22 deletions(-)
> 
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
thanks,
-- 
John Hubbard
NVIDIA
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index c103c6401ecd..1ddb0a55b5ca 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -216,10 +216,10 @@ int overcommit_policy_handler(struct ctl_table *, int, void *, size_t *,
>   
>   #if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
>   #define nth_page(page,n) pfn_to_page(page_to_pfn((page)) + (n))
> -#define page_nth(head, tail)	(page_to_pfn(tail) - page_to_pfn(head))
> +#define folio_nth(folio, page)	(page_to_pfn(page) - folio_pfn(folio))
>   #else
>   #define nth_page(page,n) ((page) + (n))
> -#define page_nth(head, tail)	((tail) - (head))
> +#define folio_nth(folio, tail)	((tail) - &(folio)->page)
>   #endif
>   
>   /* to align the pointer to the (next) page boundary */
> diff --git a/mm/gup.c b/mm/gup.c
> index 0cf2d5fd8d2d..1cdd5f2887a8 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -156,12 +156,6 @@ static void gup_put_folio(struct folio *folio, int refs, unsigned int flags)
>   	folio_put_refs(folio, refs);
>   }
>   
> -static void put_compound_head(struct page *page, int refs, unsigned int flags)
> -{
> -	VM_BUG_ON_PAGE(PageTail(page), page);
> -	gup_put_folio((struct folio *)page, refs, flags);
> -}
> -
>   /**
>    * try_grab_page() - elevate a page's refcount by a flag-dependent amount
>    *
> @@ -204,20 +198,21 @@ void unpin_user_page(struct page *page)
>   }
>   EXPORT_SYMBOL(unpin_user_page);
>   
> -static inline struct page *compound_range_next(unsigned long i,
> +static inline struct folio *gup_folio_range_next(unsigned long i,
>   		unsigned long npages, struct page *start, unsigned int *ntails)
>   {
> -	struct page *next, *page;
> +	struct page *next;
> +	struct folio *folio;
>   	unsigned int nr = 1;
>   
>   	next = nth_page(start, i);
> -	page = compound_head(next);
> -	if (PageHead(page))
> +	folio = page_folio(next);
> +	if (folio_test_large(folio))
>   		nr = min_t(unsigned int, npages - i,
> -			   compound_nr(page) - page_nth(page, next));
> +			   folio_nr_pages(folio) - folio_nth(folio, next));
>   
>   	*ntails = nr;
> -	return page;
> +	return folio;
>   }
>   
>   static inline struct folio *gup_folio_next(unsigned long i,
> @@ -326,15 +321,18 @@ EXPORT_SYMBOL(unpin_user_pages_dirty_lock);
>   void unpin_user_page_range_dirty_lock(struct page *page, unsigned long npages,
>   				      bool make_dirty)
>   {
> -	unsigned long index;
> -	struct page *head;
> -	unsigned int ntails;
> +	unsigned long i;
> +	struct folio *folio;
> +	unsigned int nr;
>   
> -	for (index = 0; index < npages; index += ntails) {
> -		head = compound_range_next(index, npages, page, &ntails);
> -		if (make_dirty && !PageDirty(head))
> -			set_page_dirty_lock(head);
> -		put_compound_head(head, ntails, FOLL_PIN);
> +	for (i = 0; i < npages; i += nr) {
> +		folio = gup_folio_range_next(i, npages, page, &nr);
> +		if (make_dirty && !folio_test_dirty(folio)) {
> +			folio_lock(folio);
> +			folio_mark_dirty(folio);
> +			folio_unlock(folio);
> +		}
> +		gup_put_folio(folio, nr, FOLL_PIN);
>   	}
>   }
>   EXPORT_SYMBOL(unpin_user_page_range_dirty_lock);
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [PATCH v2 27/28] mm: Add isolate_lru_folio()
  2022-01-10  4:24 ` [PATCH v2 27/28] mm: Add isolate_lru_folio() Matthew Wilcox (Oracle)
  2022-01-10  8:42   ` Christoph Hellwig
@ 2022-01-11  7:49   ` John Hubbard
  1 sibling, 0 replies; 96+ messages in thread
From: John Hubbard @ 2022-01-11  7:49 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-mm
  Cc: Christoph Hellwig, William Kucharski, linux-kernel,
	Jason Gunthorpe
On 1/9/22 20:24, Matthew Wilcox (Oracle) wrote:
> Turn isolate_lru_page() into a wrapper around isolate_lru_folio().
> TestClearPageLRU() would have always failed on a tail page, so
> returning -EBUSY is the same behaviour.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>   arch/powerpc/include/asm/mmu_context.h |  1 -
>   mm/folio-compat.c                      |  8 +++++
>   mm/internal.h                          |  3 +-
>   mm/vmscan.c                            | 43 ++++++++++++--------------
>   4 files changed, 29 insertions(+), 26 deletions(-)
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
thanks,
-- 
John Hubbard
NVIDIA
> 
> diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
> index 9ba6b585337f..b9cab0a11421 100644
> --- a/arch/powerpc/include/asm/mmu_context.h
> +++ b/arch/powerpc/include/asm/mmu_context.h
> @@ -21,7 +21,6 @@ extern void destroy_context(struct mm_struct *mm);
>   #ifdef CONFIG_SPAPR_TCE_IOMMU
>   struct mm_iommu_table_group_mem_t;
>   
> -extern int isolate_lru_page(struct page *page);	/* from internal.h */
>   extern bool mm_iommu_preregistered(struct mm_struct *mm);
>   extern long mm_iommu_new(struct mm_struct *mm,
>   		unsigned long ua, unsigned long entries,
> diff --git a/mm/folio-compat.c b/mm/folio-compat.c
> index 749555a232a8..782e766cd1ee 100644
> --- a/mm/folio-compat.c
> +++ b/mm/folio-compat.c
> @@ -7,6 +7,7 @@
>   #include <linux/migrate.h>
>   #include <linux/pagemap.h>
>   #include <linux/swap.h>
> +#include "internal.h"
>   
>   struct address_space *page_mapping(struct page *page)
>   {
> @@ -151,3 +152,10 @@ int try_to_release_page(struct page *page, gfp_t gfp)
>   	return filemap_release_folio(page_folio(page), gfp);
>   }
>   EXPORT_SYMBOL(try_to_release_page);
> +
> +int isolate_lru_page(struct page *page)
> +{
> +	if (WARN_RATELIMIT(PageTail(page), "trying to isolate tail page"))
> +		return -EBUSY;
> +	return isolate_lru_folio((struct folio *)page);
> +}
> diff --git a/mm/internal.h b/mm/internal.h
> index 9a72d1ecdab4..8b90db90e7f2 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -157,7 +157,8 @@ extern unsigned long highest_memmap_pfn;
>   /*
>    * in mm/vmscan.c:
>    */
> -extern int isolate_lru_page(struct page *page);
> +int isolate_lru_page(struct page *page);
> +int isolate_lru_folio(struct folio *folio);
>   extern void putback_lru_page(struct page *page);
>   extern void reclaim_throttle(pg_data_t *pgdat, enum vmscan_throttle_state reason);
>   
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index fb9584641ac7..ac2f5b76cdb2 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2168,45 +2168,40 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
>   }
>   
>   /**
> - * isolate_lru_page - tries to isolate a page from its LRU list
> - * @page: page to isolate from its LRU list
> + * isolate_lru_folio - Try to isolate a folio from its LRU list.
> + * @folio: Folio to isolate from its LRU list.
>    *
> - * Isolates a @page from an LRU list, clears PageLRU and adjusts the
> - * vmstat statistic corresponding to whatever LRU list the page was on.
> + * Isolate a @folio from an LRU list and adjust the vmstat statistic
> + * corresponding to whatever LRU list the folio was on.
>    *
> - * Returns 0 if the page was removed from an LRU list.
> - * Returns -EBUSY if the page was not on an LRU list.
> - *
> - * The returned page will have PageLRU() cleared.  If it was found on
> - * the active list, it will have PageActive set.  If it was found on
> - * the unevictable list, it will have the PageUnevictable bit set. That flag
> + * The folio will have its LRU flag cleared.  If it was found on the
> + * active list, it will have the Active flag set.  If it was found on the
> + * unevictable list, it will have the Unevictable flag set.  These flags
>    * may need to be cleared by the caller before letting the page go.
>    *
> - * The vmstat statistic corresponding to the list on which the page was
> - * found will be decremented.
> - *
> - * Restrictions:
> + * Context:
>    *
>    * (1) Must be called with an elevated refcount on the page. This is a
> - *     fundamental difference from isolate_lru_pages (which is called
> + *     fundamental difference from isolate_lru_pages() (which is called
>    *     without a stable reference).
> - * (2) the lru_lock must not be held.
> - * (3) interrupts must be enabled.
> + * (2) The lru_lock must not be held.
> + * (3) Interrupts must be enabled.
> + *
> + * Return: 0 if the folio was removed from an LRU list.
> + * -EBUSY if the folio was not on an LRU list.
>    */
> -int isolate_lru_page(struct page *page)
> +int isolate_lru_folio(struct folio *folio)
>   {
> -	struct folio *folio = page_folio(page);
>   	int ret = -EBUSY;
>   
> -	VM_BUG_ON_PAGE(!page_count(page), page);
> -	WARN_RATELIMIT(PageTail(page), "trying to isolate tail page");
> +	VM_BUG_ON_FOLIO(!folio_ref_count(folio), folio);
>   
> -	if (TestClearPageLRU(page)) {
> +	if (folio_test_clear_lru(folio)) {
>   		struct lruvec *lruvec;
>   
> -		get_page(page);
> +		folio_get(folio);
>   		lruvec = folio_lruvec_lock_irq(folio);
> -		del_page_from_lru_list(page, lruvec);
> +		lruvec_del_folio(lruvec, folio);
>   		unlock_page_lruvec_irq(lruvec);
>   		ret = 0;
>   	}
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [PATCH v2 28/28] gup: Convert check_and_migrate_movable_pages() to use a folio
  2022-01-10  4:24 ` [PATCH v2 28/28] gup: Convert check_and_migrate_movable_pages() to use a folio Matthew Wilcox (Oracle)
  2022-01-10  8:42   ` Christoph Hellwig
@ 2022-01-11  7:52   ` John Hubbard
  1 sibling, 0 replies; 96+ messages in thread
From: John Hubbard @ 2022-01-11  7:52 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-mm
  Cc: Christoph Hellwig, William Kucharski, linux-kernel,
	Jason Gunthorpe
On 1/9/22 20:24, Matthew Wilcox (Oracle) wrote:
> Switch from head pages to folios.  This removes an assumption that
> THPs are the only way to have a high-order page.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>   mm/gup.c | 28 ++++++++++++++--------------
>   1 file changed, 14 insertions(+), 14 deletions(-)
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
thanks,
-- 
John Hubbard
NVIDIA
> 
> diff --git a/mm/gup.c b/mm/gup.c
> index 1cdd5f2887a8..b2d109626c44 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -1801,41 +1801,41 @@ static long check_and_migrate_movable_pages(unsigned long nr_pages,
>   	bool drain_allow = true;
>   	LIST_HEAD(movable_page_list);
>   	long ret = 0;
> -	struct page *prev_head = NULL;
> -	struct page *head;
> +	struct folio *folio, *prev_folio = NULL;
>   	struct migration_target_control mtc = {
>   		.nid = NUMA_NO_NODE,
>   		.gfp_mask = GFP_USER | __GFP_NOWARN,
>   	};
>   
>   	for (i = 0; i < nr_pages; i++) {
> -		head = compound_head(pages[i]);
> -		if (head == prev_head)
> +		folio = page_folio(pages[i]);
> +		if (folio == prev_folio)
>   			continue;
> -		prev_head = head;
> +		prev_folio = folio;
>   		/*
>   		 * If we get a movable page, since we are going to be pinning
>   		 * these entries, try to move them out if possible.
>   		 */
> -		if (!is_pinnable_page(head)) {
> -			if (PageHuge(head)) {
> -				if (!isolate_huge_page(head, &movable_page_list))
> +		if (!is_pinnable_page(&folio->page)) {
> +			if (folio_test_hugetlb(folio)) {
> +				if (!isolate_huge_page(&folio->page,
> +							&movable_page_list))
>   					isolation_error_count++;
>   			} else {
> -				if (!PageLRU(head) && drain_allow) {
> +				if (!folio_test_lru(folio) && drain_allow) {
>   					lru_add_drain_all();
>   					drain_allow = false;
>   				}
>   
> -				if (isolate_lru_page(head)) {
> +				if (isolate_lru_folio(folio)) {
>   					isolation_error_count++;
>   					continue;
>   				}
> -				list_add_tail(&head->lru, &movable_page_list);
> -				mod_node_page_state(page_pgdat(head),
> +				list_add_tail(&folio->lru, &movable_page_list);
> +				node_stat_mod_folio(folio,
>   						    NR_ISOLATED_ANON +
> -						    page_is_file_lru(head),
> -						    thp_nr_pages(head));
> +						    folio_is_file_lru(folio),
> +						    folio_nr_pages(folio));
>   			}
>   		}
>   	}
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [PATCH v2 09/28] gup: Turn hpage_pincount_add() into page_pincount_add()
  2022-01-11  4:32   ` John Hubbard
@ 2022-01-11 13:46     ` Matthew Wilcox
  0 siblings, 0 replies; 96+ messages in thread
From: Matthew Wilcox @ 2022-01-11 13:46 UTC (permalink / raw)
  To: John Hubbard
  Cc: linux-mm, Christoph Hellwig, William Kucharski, linux-kernel,
	Jason Gunthorpe
On Mon, Jan 10, 2022 at 08:32:20PM -0800, John Hubbard wrote:
> On 1/9/22 20:23, Matthew Wilcox (Oracle) wrote:
> ...
> > diff --git a/mm/gup.c b/mm/gup.c
> > index dbb1b54d0def..3ed9907f3c8d 100644
> > --- a/mm/gup.c
> > +++ b/mm/gup.c
> > @@ -29,12 +29,23 @@ struct follow_page_context {
> >   	unsigned int page_mask;
> >   };
> > -static void hpage_pincount_add(struct page *page, int refs)
> > +/*
> > + * When pinning a compound page of order > 1 (which is what
> > + * hpage_pincount_available() checks for), use an exact count to track
> > + * it, via page_pincount_add/_sub().
> > + *
> > + * However, be sure to *also* increment the normal page refcount field
> > + * at least once, so that the page really is pinned.  That's why the
> > + * refcount from the earlier try_get_compound_head() is left intact.
> > + */
> 
> I just realized, after looking at this again in a later patch, that the
> last paragraph, above, is now misplaced. It refers to the behavior of the
> caller, not to this routine. So it needs to be airlifted back to the
> caller.
I really do think it fits better here.  The thing is there's just one
caller, so it's a little hard to decide what "all callers" need when
there's only one.  Maybe I can wordsmith this a bit to read better.
> > +static void page_pincount_add(struct page *page, int refs)
> >   {
> > -	VM_BUG_ON_PAGE(!hpage_pincount_available(page), page);
> >   	VM_BUG_ON_PAGE(page != compound_head(page), page);
> > -	atomic_add(refs, compound_pincount_ptr(page));
> > +	if (hpage_pincount_available(page))
> > +		atomic_add(refs, compound_pincount_ptr(page));
> > +	else
> > +		page_ref_add(page, refs * (GUP_PIN_COUNTING_BIAS - 1));
> >   }
> >   static void hpage_pincount_sub(struct page *page, int refs)
> > @@ -150,21 +161,7 @@ struct page *try_grab_compound_head(struct page *page,
> >   		if (!page)
> >   			return NULL;
> > -		/*
> > -		 * When pinning a compound page of order > 1 (which is what
> > -		 * hpage_pincount_available() checks for), use an exact count to
> > -		 * track it, via hpage_pincount_add/_sub().
> > -		 *
> > -		 * However, be sure to *also* increment the normal page refcount
> > -		 * field at least once, so that the page really is pinned.
> > -		 * That's why the refcount from the earlier
> > -		 * try_get_compound_head() is left intact.
> > -		 */
> 
> ...here.
> 
> > -		if (hpage_pincount_available(page))
> > -			hpage_pincount_add(page, refs);
> > -		else
> > -			page_ref_add(page, refs * (GUP_PIN_COUNTING_BIAS - 1));
> > -
> > +		page_pincount_add(page, refs);
> >   		mod_node_page_state(page_pgdat(page), NR_FOLL_PIN_ACQUIRED,
> >   				    refs);
> 
> thanks,
> -- 
> John Hubbard
> NVIDIA
^ permalink raw reply	[flat|nested] 96+ messages in thread
* Re: [PATCH v2 11/28] mm: Make compound_pincount always available
  2022-01-10  4:23 ` [PATCH v2 11/28] mm: Make compound_pincount always available Matthew Wilcox (Oracle)
  2022-01-11  4:06   ` John Hubbard
@ 2022-01-20  9:15   ` Christoph Hellwig
  1 sibling, 0 replies; 96+ messages in thread
From: Christoph Hellwig @ 2022-01-20  9:15 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: linux-mm, John Hubbard, Christoph Hellwig, William Kucharski,
	linux-kernel, Jason Gunthorpe
Looks good,
Reviewed-by: Christoph Hellwig <hch@lst.de>
^ permalink raw reply	[flat|nested] 96+ messages in thread
end of thread, other threads:[~2022-01-20  9:15 UTC | newest]
Thread overview: 96+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-01-10  4:23 [PATCH v2 00/28] Convert GUP to folios Matthew Wilcox (Oracle)
2022-01-10  4:23 ` [PATCH v2 01/28] gup: Remove for_each_compound_range() Matthew Wilcox (Oracle)
2022-01-10  8:22   ` Christoph Hellwig
2022-01-11  1:07   ` John Hubbard
2022-01-10  4:23 ` [PATCH v2 02/28] gup: Remove for_each_compound_head() Matthew Wilcox (Oracle)
2022-01-10  8:23   ` Christoph Hellwig
2022-01-11  1:11   ` John Hubbard
2022-01-10  4:23 ` [PATCH v2 03/28] gup: Change the calling convention for compound_range_next() Matthew Wilcox (Oracle)
2022-01-10  8:25   ` Christoph Hellwig
2022-01-11  1:14   ` John Hubbard
2022-01-10  4:23 ` [PATCH v2 04/28] gup: Optimise compound_range_next() Matthew Wilcox (Oracle)
2022-01-10  8:26   ` Christoph Hellwig
2022-01-11  1:16   ` John Hubbard
2022-01-10  4:23 ` [PATCH v2 05/28] gup: Change the calling convention for compound_next() Matthew Wilcox (Oracle)
2022-01-10  8:27   ` Christoph Hellwig
2022-01-10 13:28     ` Matthew Wilcox
2022-01-11  1:18   ` John Hubbard
2022-01-10  4:23 ` [PATCH v2 06/28] gup: Fix some contiguous memmap assumptions Matthew Wilcox (Oracle)
2022-01-10  8:29   ` Christoph Hellwig
2022-01-10 13:37     ` Matthew Wilcox
2022-01-10 19:05       ` [External] : " Mike Kravetz
2022-01-11  1:47   ` John Hubbard
2022-01-10  4:23 ` [PATCH v2 07/28] gup: Remove an assumption of a contiguous memmap Matthew Wilcox (Oracle)
2022-01-10  8:30   ` Christoph Hellwig
2022-01-11  3:27   ` John Hubbard
2022-01-10  4:23 ` [PATCH v2 08/28] gup: Handle page split race more efficiently Matthew Wilcox (Oracle)
2022-01-10  8:31   ` Christoph Hellwig
2022-01-11  3:30   ` John Hubbard
2022-01-10  4:23 ` [PATCH v2 09/28] gup: Turn hpage_pincount_add() into page_pincount_add() Matthew Wilcox (Oracle)
2022-01-10  8:31   ` Christoph Hellwig
2022-01-11  3:35   ` John Hubbard
2022-01-11  4:32   ` John Hubbard
2022-01-11 13:46     ` Matthew Wilcox
2022-01-10  4:23 ` [PATCH v2 10/28] gup: Turn hpage_pincount_sub() into page_pincount_sub() Matthew Wilcox (Oracle)
2022-01-10  8:32   ` Christoph Hellwig
2022-01-11  6:40   ` John Hubbard
2022-01-10  4:23 ` [PATCH v2 11/28] mm: Make compound_pincount always available Matthew Wilcox (Oracle)
2022-01-11  4:06   ` John Hubbard
2022-01-11  4:38     ` Matthew Wilcox
2022-01-11  5:10       ` John Hubbard
2022-01-20  9:15   ` Christoph Hellwig
2022-01-10  4:23 ` [PATCH v2 12/28] mm: Add folio_put_refs() Matthew Wilcox (Oracle)
2022-01-10  8:32   ` Christoph Hellwig
2022-01-11  4:14   ` John Hubbard
2022-01-10  4:23 ` [PATCH v2 13/28] mm: Add folio_pincount_ptr() Matthew Wilcox (Oracle)
2022-01-10  8:33   ` Christoph Hellwig
2022-01-11  4:22   ` John Hubbard
2022-01-10  4:23 ` [PATCH v2 14/28] mm: Convert page_maybe_dma_pinned() to use a folio Matthew Wilcox (Oracle)
2022-01-10  8:33   ` Christoph Hellwig
2022-01-11  4:27   ` John Hubbard
2022-01-10  4:23 ` [PATCH v2 15/28] gup: Add try_get_folio() and try_grab_folio() Matthew Wilcox (Oracle)
2022-01-10  8:34   ` Christoph Hellwig
2022-01-11  5:00   ` John Hubbard
2022-01-10  4:23 ` [PATCH v2 16/28] mm: Remove page_cache_add_speculative() and page_cache_get_speculative() Matthew Wilcox (Oracle)
2022-01-10  8:35   ` Christoph Hellwig
2022-01-11  5:14   ` John Hubbard
2022-01-10  4:23 ` [PATCH v2 17/28] gup: Add gup_put_folio() Matthew Wilcox (Oracle)
2022-01-10  8:35   ` Christoph Hellwig
2022-01-11  6:44   ` John Hubbard
2022-01-10  4:23 ` [PATCH v2 18/28] hugetlb: Use try_grab_folio() instead of try_grab_compound_head() Matthew Wilcox (Oracle)
2022-01-10  8:36   ` Christoph Hellwig
2022-01-11  6:47   ` John Hubbard
2022-01-10  4:23 ` [PATCH v2 19/28] gup: Convert try_grab_page() to call try_grab_folio() Matthew Wilcox (Oracle)
2022-01-10  8:36   ` Christoph Hellwig
2022-01-11  7:01   ` John Hubbard
2022-01-10  4:23 ` [PATCH v2 20/28] gup: Convert gup_pte_range() to use a folio Matthew Wilcox (Oracle)
2022-01-10  8:37   ` Christoph Hellwig
2022-01-11  7:06   ` John Hubbard
2022-01-10  4:23 ` [PATCH v2 21/28] gup: Convert gup_hugepte() " Matthew Wilcox (Oracle)
2022-01-10  8:37   ` Christoph Hellwig
2022-01-11  7:33   ` John Hubbard
2022-01-10  4:24 ` [PATCH v2 22/28] gup: Convert gup_huge_pmd() " Matthew Wilcox (Oracle)
2022-01-10  8:37   ` Christoph Hellwig
2022-01-11  7:36   ` John Hubbard
2022-01-10  4:24 ` [PATCH v2 23/28] gup: Convert gup_huge_pud() " Matthew Wilcox (Oracle)
2022-01-10  8:38   ` Christoph Hellwig
2022-01-11  7:38   ` John Hubbard
2022-01-10  4:24 ` [PATCH v2 24/28] gup: Convert gup_huge_pgd() " Matthew Wilcox (Oracle)
2022-01-10  8:38   ` Christoph Hellwig
2022-01-11  7:38   ` John Hubbard
2022-01-10  4:24 ` [PATCH v2 25/28] gup: Convert compound_next() to gup_folio_next() Matthew Wilcox (Oracle)
2022-01-10  8:39   ` Christoph Hellwig
2022-01-11  7:41   ` John Hubbard
2022-01-10  4:24 ` [PATCH v2 26/28] gup: Convert compound_range_next() to gup_folio_range_next() Matthew Wilcox (Oracle)
2022-01-10  8:41   ` Christoph Hellwig
2022-01-10 13:41     ` Matthew Wilcox
2022-01-11  7:44   ` John Hubbard
2022-01-10  4:24 ` [PATCH v2 27/28] mm: Add isolate_lru_folio() Matthew Wilcox (Oracle)
2022-01-10  8:42   ` Christoph Hellwig
2022-01-11  7:49   ` John Hubbard
2022-01-10  4:24 ` [PATCH v2 28/28] gup: Convert check_and_migrate_movable_pages() to use a folio Matthew Wilcox (Oracle)
2022-01-10  8:42   ` Christoph Hellwig
2022-01-11  7:52   ` John Hubbard
2022-01-10 15:31 ` [PATCH v2 00/28] Convert GUP to folios Jason Gunthorpe
2022-01-10 16:09   ` Matthew Wilcox
2022-01-10 17:26 ` William Kucharski
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).