Re: [PATCH v2 5/8] mm/gup: Accelerate thp gup even for "pages != NULL"


From: Peter Xu <peterx@redhat.com>
To: David Hildenbrand <david@redhat.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Andrea Arcangeli <aarcange@redhat.com>,
	Mike Rapoport <rppt@kernel.org>,
	Matthew Wilcox <willy@infradead.org>,
	Vlastimil Babka <vbabka@suse.cz>,
	John Hubbard <jhubbard@nvidia.com>,
	"Kirill A . Shutemov" <kirill@shutemov.name>,
	James Houghton <jthoughton@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Lorenzo Stoakes <lstoakes@gmail.com>,
	Hugh Dickins <hughd@google.com>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	Jason Gunthorpe <jgg@nvidia.com>
Subject: Re: [PATCH v2 5/8] mm/gup: Accelerate thp gup even for "pages != NULL"
Date: Tue, 20 Jun 2023 12:23:55 -0400
Message-ID: <ZJHSm/UbEy3JndZ4@x1n>
In-Reply-To: <02a057a3-3d9e-4013-8762-25ceb1beec86@redhat.com>

On Tue, Jun 20, 2023 at 05:43:35PM +0200, David Hildenbrand wrote:
> On 20.06.23 01:10, Peter Xu wrote:
> > The acceleration of THP was done with ctx.page_mask, however it'll be
> > ignored if **pages is non-NULL.
> > 
> > The old optimization was introduced in 2013 in 240aadeedc4a ("mm:
> > accelerate mm_populate() treatment of THP pages").  It didn't explain why
> > we can't optimize the **pages non-NULL case.  It's possible that at that
> > time the major goal was for mm_populate() which should be enough back then.
> 
> In the past we had these sub-page refcounts for THP. My best guess (and I
> didn't check if that was still the case in 2013) would be that it was
> simpler regarding refcount handling to do it one subpage at a time.
> 
> But I might be just wrong.
> 
> > 
> > Optimize thp for all cases by properly looping over each subpage, doing
> > cache flushes, and boosting refcounts / pincounts where needed in one go.
> > 
> > This can be verified using gup_test below:
> > 
> >    # chrt -f 1 ./gup_test -m 512 -t -L -n 1024 -r 10
> > 
> > Before:    13992.50 ( +-8.75%)
> > After:       378.50 (+-69.62%)
> > 
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >   mm/gup.c | 51 ++++++++++++++++++++++++++++++++++++++++++++-------
> >   1 file changed, 44 insertions(+), 7 deletions(-)
> > 
> > diff --git a/mm/gup.c b/mm/gup.c
> > index 4a00d609033e..b50272012e49 100644
> > --- a/mm/gup.c
> > +++ b/mm/gup.c
> > @@ -1199,16 +1199,53 @@ static long __get_user_pages(struct mm_struct *mm,
> >   			goto out;
> >   		}
> >   next_page:
> > -		if (pages) {
> > -			pages[i] = page;
> > -			flush_anon_page(vma, page, start);
> > -			flush_dcache_page(page);
> > -			ctx.page_mask = 0;
> > -		}
> > -
> >   		page_increm = 1 + (~(start >> PAGE_SHIFT) & ctx.page_mask);
> >   		if (page_increm > nr_pages)
> >   			page_increm = nr_pages;
> > +
> > +		if (pages) {
> > +			struct page *subpage;
> > +			unsigned int j;
> > +
> > +			/*
> > +			 * This must be a large folio (and doesn't need to
> > +			 * be the whole folio; it can be part of it), do
> > +			 * the refcount work for all the subpages too.
> > +			 *
> > +			 * NOTE: here the page may not be the head page
> > +			 * e.g. when start addr is not thp-size aligned.
> > +			 * try_grab_folio() should have taken care of tail
> > +			 * pages.
> > +			 */
> > +			if (page_increm > 1) {
> > +				struct folio *folio;
> > +
> > +				/*
> > +				 * Since we already hold refcount on the
> > +				 * large folio, this should never fail.
> > +				 */
> > +				folio = try_grab_folio(page, page_increm - 1,
> > +						       foll_flags);
> > +				if (WARN_ON_ONCE(!folio)) {
> > +					/*
> > +					 * Release the 1st page ref if the
> > +					 * folio is problematic, fail hard.
> > +					 */
> > +					gup_put_folio(page_folio(page), 1,
> > +						      foll_flags);
> > +					ret = -EFAULT;
> > +					goto out;
> > +				}
> > +			}
> > +
> > +			for (j = 0; j < page_increm; j++) {
> > +				subpage = nth_page(page, j);
> > +				pages[i+j] = subpage;
> 
> Does checkpatch like pages[i+j]? I'd have used spaces around the +.

Can do.

> 
> > +				flush_anon_page(vma, subpage, start + j * PAGE_SIZE);
> > +				flush_dcache_page(subpage);
> > +			}
> > +		}
> > +
> >   		i += page_increm;
> >   		start += page_increm * PAGE_SIZE;
> >   		nr_pages -= page_increm;
> 
> 
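For anyone following along, a small user-space model of the page_increm
arithmetic in the hunk above. It is only a sketch: it assumes 4 KiB base
pages and 2 MiB THPs (page_mask = 511) and keeps page_mask constant across
iterations, whereas in __get_user_pages() it comes from each per-page
lookup. The sketch_* names are hypothetical, not kernel helpers.

```c
#include <assert.h>

/* Assumed sizes: 4 KiB base pages, 2 MiB PMD-mapped THPs. */
#define SKETCH_PAGE_SHIFT	12
#define SKETCH_THP_MASK		511UL	/* base pages per THP minus one */

/*
 * Same expression as in __get_user_pages(): advance to the end of the
 * current THP (or by one page when page_mask is 0), but never past the
 * number of pages still requested.
 */
static unsigned long sketch_page_increm(unsigned long start,
					unsigned long page_mask,
					unsigned long nr_pages)
{
	unsigned long increm;

	/* page_mask + 1 must be a power of two (page_mask is 0 for a small page) */
	assert(((page_mask + 1) & page_mask) == 0);

	increm = 1 + (~(start >> SKETCH_PAGE_SHIFT) & page_mask);
	if (increm > nr_pages)
		increm = nr_pages;
	return increm;
}

/*
 * Count loop iterations needed to cover nr_pages from start, assuming
 * every lookup reports the same page_mask.
 */
static unsigned long sketch_iterations(unsigned long start,
				       unsigned long page_mask,
				       unsigned long nr_pages)
{
	unsigned long iters = 0;

	while (nr_pages) {
		unsigned long increm;

		increm = sketch_page_increm(start, page_mask, nr_pages);
		start += increm << SKETCH_PAGE_SHIFT;
		nr_pages -= increm;
		iters++;
	}
	return iters;
}
```

With a 2 MiB-aligned start, 1024 pages take 2 iterations instead of 1024;
starting three pages into a THP, the batches are 509, 512 and 3 pages.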
> So, we did the first try_grab_folio() while our page was PMD-mapped under
> the PT lock and we had sufficient permissions (e.g., mapped writable, no
> unsharing required). With FOLL_PIN, we incremented the pincount.
> 
> 
> I was wondering whether something could have happened after we dropped the
> PT lock (e.g., the THP getting PTE-mapped) ... but as it's already
> pinned, it cannot get shared during fork() [will stay exclusive].
> 
> So we can just take additional pins on that folio.
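To illustrate why the remaining pins can be taken in one batch, here is a
heavily simplified user-space model. struct toy_folio, toy_grab() and
toy_put() are hypothetical stand-ins for the folio counters,
try_grab_folio() and gup_put_folio(). The model assumes the large-folio
FOLL_PIN accounting, where each pin adds one to both the refcount and the
pincount, and ignores everything else (atomics, FOLL_GET, small folios).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified model of a large folio's counters. */
struct toy_folio {
	int refcount;
	int pincount;
	bool force_fail;	/* test hook: make every grab fail */
};

/* Simplified analogue of try_grab_folio() with FOLL_PIN on a large folio. */
static bool toy_grab(struct toy_folio *f, int refs)
{
	if (f->force_fail || f->refcount <= 0)
		return false;
	f->refcount += refs;
	f->pincount += refs;
	return true;
}

/* Simplified analogue of gup_put_folio(): undo `refs` pins. */
static void toy_put(struct toy_folio *f, int refs)
{
	assert(f->pincount >= refs);
	f->refcount -= refs;
	f->pincount -= refs;
}

/*
 * The caller already holds one pin (taken while the THP was PMD-mapped
 * under the PT lock). Take page_increm - 1 more in a single call; if that
 * fails, also drop the first pin and report -EFAULT.
 */
static int toy_pin_remaining(struct toy_folio *f, int page_increm)
{
	if (page_increm > 1 && !toy_grab(f, page_increm - 1)) {
		toy_put(f, 1);
		return -EFAULT;
	}
	return 0;
}
```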
> 
> 
> LGTM, although I do like the GUP-fast way of recording+ref'ing it at a
> central place (see gup_huge_pmd() with record_subpages() and friends), not
> after the fact.
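The GUP-fast approach mentioned here records every subpage of the range
into the pages array in one place. A rough user-space analogue of that idea
follows; the toy_* names and the 4 KiB page size are assumptions, and plain
array indexing stands in for nth_page(). It is not the kernel code.

```c
#include <assert.h>

#define SKETCH_PAGE_SIZE 4096UL

/* Hypothetical stand-in for struct page; only its array position matters. */
struct toy_page {
	unsigned long pfn;
};

/*
 * Record every subpage covering [addr, end) into pages[], starting from
 * the subpage that maps addr; return how many entries were written.
 */
static int toy_record_subpages(struct toy_page *page, unsigned long addr,
			       unsigned long end, struct toy_page **pages)
{
	int nr;

	assert(addr <= end && (end - addr) % SKETCH_PAGE_SIZE == 0);
	for (nr = 0; addr != end; nr++, addr += SKETCH_PAGE_SIZE)
		pages[nr] = page + nr;	/* nth_page(page, nr) in the kernel */
	return nr;
}
```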

My read on this is that follow_page_mask() is also used by follow_page(),
which does not need the pages array.

No strong opinion here. Maybe we leave this as a follow-up, even if it can
be justified?  This patch is probably still the smallest (and still clean)
change to speed this whole thing up for either thp or hugetlb.

-- 
Peter Xu


Thread overview: 22+ messages
2023-06-19 23:10 [PATCH v2 0/8] mm/gup: Unify hugetlb, speed up thp Peter Xu
2023-06-19 23:10 ` [PATCH v2 1/8] mm/hugetlb: Handle FOLL_DUMP well in follow_page_mask() Peter Xu
2023-06-19 23:10 ` [PATCH v2 2/8] mm/hugetlb: Prepare hugetlb_follow_page_mask() for FOLL_PIN Peter Xu
2023-06-20 15:22   ` David Hildenbrand
2023-06-20 16:03     ` Peter Xu
2023-06-20 15:28   ` David Hildenbrand
2023-06-20 16:06     ` Peter Xu
2023-06-19 23:10 ` [PATCH v2 3/8] mm/hugetlb: Add page_mask for hugetlb_follow_page_mask() Peter Xu
2023-06-20 15:23   ` David Hildenbrand
2023-06-20 16:28     ` Peter Xu
2023-06-20 17:54       ` David Hildenbrand
2023-06-19 23:10 ` [PATCH v2 4/8] mm/gup: Cleanup next_page handling Peter Xu
2023-06-20 15:23   ` David Hildenbrand
2023-06-19 23:10 ` [PATCH v2 5/8] mm/gup: Accelerate thp gup even for "pages != NULL" Peter Xu
2023-06-20 15:43   ` David Hildenbrand
2023-06-20 16:23     ` Peter Xu [this message]
2023-06-20 18:02       ` David Hildenbrand
2023-06-20 20:12         ` Peter Xu
2023-06-20 21:43   ` Lorenzo Stoakes
2023-06-19 23:10 ` [PATCH v2 6/8] mm/gup: Retire follow_hugetlb_page() Peter Xu
2023-06-19 23:10 ` [PATCH v2 7/8] selftests/mm: Add -a to run_vmtests.sh Peter Xu
2023-06-19 23:10 ` [PATCH v2 8/8] selftests/mm: Add gup test matrix in run_vmtests.sh Peter Xu
