Re: [PATCH v8 07/15] mm/gup: migrate device coherent pages when pinning instead of failing

From: Alistair Popple <apopple@nvidia.com>
To: David Hildenbrand <david@redhat.com>
Cc: Alex Sierra <alex.sierra@amd.com>,
	rcampbell@nvidia.com, willy@infradead.org,
	Felix.Kuehling@amd.com, amd-gfx@lists.freedesktop.org,
	linux-xfs@vger.kernel.org, linux-mm@kvack.org,
	jglisse@redhat.com, dri-devel@lists.freedesktop.org,
	jgg@nvidia.com, akpm@linux-foundation.org,
	linux-ext4@vger.kernel.org, hch@lst.de
Subject: Re: [PATCH v8 07/15] mm/gup: migrate device coherent pages when pinning instead of failing
Date: Thu, 14 Jul 2022 15:39:49 +1000
Message-ID: <87sfn4cj8u.fsf@nvdebian.thelocal>
In-Reply-To: <2c4dd559-4fa9-f874-934f-d6b674543d0f@redhat.com>


David Hildenbrand <david@redhat.com> writes:

> On 07.07.22 21:03, Alex Sierra wrote:
>> From: Alistair Popple <apopple@nvidia.com>
>>
>> Currently any attempts to pin a device coherent page will fail. This is
>> because device coherent pages need to be managed by a device driver, and
>> pinning them would prevent a driver from migrating them off the device.
>>
>> However this is no reason to fail pinning of these pages. These are
>> coherent and accessible from the CPU so can be migrated just like
>> pinning ZONE_MOVABLE pages. So instead of failing all attempts to pin
>> them first try migrating them out of ZONE_DEVICE.
>>
>> Signed-off-by: Alistair Popple <apopple@nvidia.com>
>> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
>> [hch: rebased to the split device memory checks,
>>       moved migrate_device_page to migrate_device.c]
>> Signed-off-by: Christoph Hellwig <hch@lst.de>
>> ---
>>  mm/gup.c            | 47 +++++++++++++++++++++++++++++++++++-----
>>  mm/internal.h       |  1 +
>>  mm/migrate_device.c | 53 +++++++++++++++++++++++++++++++++++++++++++++
>>  3 files changed, 96 insertions(+), 5 deletions(-)
>>
>> diff --git a/mm/gup.c b/mm/gup.c
>> index b65fe8bf5af4..9b6b9923d22d 100644
>> --- a/mm/gup.c
>> +++ b/mm/gup.c
>> @@ -1891,9 +1891,43 @@ static long check_and_migrate_movable_pages(unsigned long nr_pages,
>>  			continue;
>>  		prev_folio = folio;
>>
>> -		if (folio_is_longterm_pinnable(folio))
>> +		/*
>> +		 * Device private pages will get faulted in during gup so it
>> +		 * shouldn't be possible to see one here.
>> +		 */
>> +		if (WARN_ON_ONCE(folio_is_device_private(folio))) {
>> +			ret = -EFAULT;
>> +			goto unpin_pages;
>> +		}
>
> I'd just drop that. Device private pages are never part of a present PTE. So if we
> could actually get a grab of one via GUP we would be in bigger trouble ...
> already before this patch.

Fair.

>> +
>> +		/*
>> +		 * Device coherent pages are managed by a driver and should not
>> +		 * be pinned indefinitely as it prevents the driver moving the
>> +		 * page. So when trying to pin with FOLL_LONGTERM instead try
>> +		 * to migrate the page out of device memory.
>> +		 */
>> +		if (folio_is_device_coherent(folio)) {
>> +			WARN_ON_ONCE(PageCompound(&folio->page));
>
> Maybe that belongs into migrate_device_page()?

Ok (noting Matthew's comment there as well).

>> +
>> +			/*
>> +			 * Migration will fail if the page is pinned, so convert
>
> [...]
>
>>  /*
>>   * mm/gup.c
>> diff --git a/mm/migrate_device.c b/mm/migrate_device.c
>> index cf9668376c5a..5decd26dd551 100644
>> --- a/mm/migrate_device.c
>> +++ b/mm/migrate_device.c
>> @@ -794,3 +794,56 @@ void migrate_vma_finalize(struct migrate_vma *migrate)
>>  	}
>>  }
>>  EXPORT_SYMBOL(migrate_vma_finalize);
>> +
>> +/*
>> + * Migrate a device coherent page back to normal memory.  The caller should have
>> + * a reference on page which will be copied to the new page if migration is
>> + * successful or dropped on failure.
>> + */
>> +struct page *migrate_device_page(struct page *page, unsigned int gup_flags)
>
> Function name should most probably indicate that we're dealing with coherent pages here?

Ok.

>> +{
>> +	unsigned long src_pfn, dst_pfn = 0;
>> +	struct migrate_vma args;
>> +	struct page *dpage;
>> +
>> +	lock_page(page);
>> +	src_pfn = migrate_pfn(page_to_pfn(page)) | MIGRATE_PFN_MIGRATE;
>> +	args.src = &src_pfn;
>> +	args.dst = &dst_pfn;
>> +	args.cpages = 1;
>> +	args.npages = 1;
>> +	args.vma = NULL;
>> +	migrate_vma_setup(&args);
>> +	if (!(src_pfn & MIGRATE_PFN_MIGRATE))
>> +		return NULL;
>
> Wow, these refcount and page locking/unlocking rules with this migrate_* API are
> confusing now. And sometimes returning early and sometimes falling through
> doesn't make it any easier to understand here.
>
> I'm not 100% happy about reusing migrate_vma_setup() when there *is no VMA*.
> That's just absolutely confusing, because usually migrate_vma_setup() itself
> would do the collection step and ref+lock pages. :/
>
> In general, I can see why/how we're reusing the migrate_vma_* API here, but there
> is absolutely no VMA ... not sure what to improve besides providing a second API
> that does a simple single-page migration. But that can be changed later ...

Yeah, as noted in your other response I think it should be ok to just
call migrate_vma_unmap() directly from migrate_device_page() so I assume
that would adequately deal with this.

>> +
>> +	dpage = alloc_pages(GFP_USER | __GFP_NOWARN, 0);
>> +
>
> alloc_page()
>
>> +	/*
>> +	 * get/pin the new page now so we don't have to retry gup after
>> +	 * migrating. We already have a reference so this should never fail.
>> +	 */
>> +	if (dpage && WARN_ON_ONCE(!try_grab_page(dpage, gup_flags))) {
>> +		__free_pages(dpage, 0);
>
> __free_page()
>
>> +		dpage = NULL;
>> +	}
>
> Hm, this means that we are not pinning via the PTE at hand, but via something
> we expect migration to put into the PTE. I'm not really happy about this.
>
> Ideally, we'd make the pinning decision only on the actual GUP path, not in here.
> Just like in the migrate_pages() case, where we end up dropping all refs/pins
> and looking up again via GUP from the PTE.
>
> For example, I wonder if something nasty could happen if the PTE got mapped
> R/O in the meantime and you're pinning R/W here ...
>
> TBH, all this special casing on gup_flags here is nasty. Please, let's just do
> it like migrate_pages() and do another GUP walk. Absolutely no need to optimize.

The only reason to pass gup_flags is to check FOLL_PIN vs. FOLL_GET so
that we take the right kind of reference on the destination page. I did
the optimisation because we already hold a reference on the destination
page and GUP/PUP does not make any guarantees about the current PTE
state anyway.
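
To make the FOLL_PIN vs. FOLL_GET distinction concrete, here is a
minimal userspace sketch. It is not kernel code: struct toy_page and
the toy_* helpers are hypothetical names. The only thing it takes from
the kernel is the idea that, for small (order-0) pages, a pin is
recorded by adding a bias of 1024 (GUP_PIN_COUNTING_BIAS) to the
refcount, while a plain get adds 1.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only; every toy_* name here is hypothetical. */
#define TOY_PIN_BIAS 1024	/* mirrors GUP_PIN_COUNTING_BIAS */

enum toy_ref { TOY_GET, TOY_PIN };

struct toy_page {
	int refcount;
};

/* Take either a plain reference (+1) or a pin (+bias). */
static void toy_grab(struct toy_page *p, enum toy_ref kind)
{
	p->refcount += (kind == TOY_PIN) ? TOY_PIN_BIAS : 1;
}

/* Drop a reference of the same kind that was taken. */
static void toy_release(struct toy_page *p, enum toy_ref kind)
{
	int n = (kind == TOY_PIN) ? TOY_PIN_BIAS : 1;

	assert(p->refcount >= n);	/* releasing more than was taken is a bug */
	p->refcount -= n;
}

/* Heuristic check: a refcount at or above the bias looks pinned. */
static bool toy_maybe_pinned(const struct toy_page *p)
{
	return p->refcount >= TOY_PIN_BIAS;
}
```

Releasing with the wrong kind leaves the count off by 1023, which is
why the helper needed to know which of the two flags the caller used.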

However I noticed there might be a race here - during migration we
replace present PTEs with migration entries. On fork these get copied
via copy_nonpresent_pte() and made read-only. However we don't check if
the page a migration entry points to is pinned or not. For an ordinary
PTE copy_present_pte() would copy the page for a COW mapping, but this
won't happen if the page happens to be undergoing migration (even though
the migration will ultimately fail due to the pin).
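
As a rough illustration of the asymmetry described above, the following
userspace sketch models only the fork-time decision for a COW mapping.
The fk_* names are hypothetical and the pinned check is reduced to a
refcount threshold; it is a model of the behaviour described in this
mail, not of the actual mm/memory.c code.

```c
#include <assert.h>

/* Userspace model only; every fk_* name here is hypothetical. */
#define FK_PIN_BIAS 1024	/* mirrors GUP_PIN_COUNTING_BIAS */

enum fk_pte_kind { FK_PTE_PRESENT, FK_PTE_MIGRATION };
enum fk_result { FK_SHARE_READONLY, FK_COPY_FOR_CHILD };

/*
 * Decide what fork does with one PTE of a COW mapping.
 * Present PTE: a page that looks pinned is copied for the child.
 * Migration entry: shared read-only, with no pinned check at all.
 */
static enum fk_result fk_fork_copy(enum fk_pte_kind kind, int refcount)
{
	assert(refcount >= 0);

	if (kind == FK_PTE_PRESENT && refcount >= FK_PIN_BIAS)
		return FK_COPY_FOR_CHILD;

	return FK_SHARE_READONLY;
}
```

In this model a pinned page behind a migration entry is shared
read-only rather than copied, which is the case in question.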

Anyway I don't think this patch currently makes that any worse, but if
we fix the above it will, because there is a brief period during which
the page we're pinning won't look like a pinned page.

So I will go with the suggestion to do another GUP walk.
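
For reference, the "migrate, drop everything, walk again" flow can be
sketched in userspace roughly as follows. This is a toy model under
stated assumptions: the toy_* names are hypothetical, migration always
succeeds, and a pin is just a counter. It only shows the shape of the
retry loop, not the real refcount or locking rules.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model only; every toy_* name here is hypothetical. */
enum toy_loc { TOY_SYSTEM, TOY_DEVICE_COHERENT };

struct toy_page {
	enum toy_loc loc;
	int pins;
};

/* One GUP walk: look up every page and take a pin on it. */
static long toy_gup(struct toy_page *mem, struct toy_page **pages, long nr)
{
	for (long i = 0; i < nr; i++) {
		pages[i] = &mem[i];
		pages[i]->pins++;
	}
	return nr;
}

/* Returns nr if all pages may stay pinned, 0 if the caller must retry. */
static long toy_check_and_migrate(struct toy_page **pages, long nr)
{
	int any_device_coherent = 0;

	for (long i = 0; i < nr; i++) {
		if (pages[i]->loc != TOY_DEVICE_COHERENT)
			continue;
		/* Drop our pin first, then "migrate" to system memory. */
		pages[i]->pins--;
		pages[i]->loc = TOY_SYSTEM;
		pages[i] = NULL;	/* stale now, needs a fresh lookup */
		any_device_coherent = 1;
	}
	if (!any_device_coherent)
		return nr;

	/* Drop the remaining pins so the next walk starts clean. */
	for (long i = 0; i < nr; i++) {
		if (!pages[i])
			continue;
		assert(pages[i]->pins > 0);
		pages[i]->pins--;
	}
	return 0;
}

/* Repeat the walk until no page had to be migrated. */
static long toy_pin_longterm(struct toy_page *mem, struct toy_page **pages,
			     long nr, int *walks)
{
	long rc;

	*walks = 0;
	do {
		rc = toy_gup(mem, pages, nr);
		(*walks)++;
		if (rc <= 0)
			break;
		rc = toy_check_and_migrate(pages, rc);
	} while (!rc);

	return rc;
}
```

The cost is one extra walk whenever a device coherent page is found,
in exchange for the pin always being taken by the normal GUP path.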

> [...]
>
>
>
> I'd go with something like the following on top (which does not touch on the
> general semantic issue with migrate_vma_* ). Note that I most probably messed
> up some refcount/lock handling and that it's broken.
> Just to give you an idea what I think could be cleaner.

Thanks! At a glance it looks roughly right but I will check and respin
it to incorporate the comments.

> diff --git a/mm/gup.c b/mm/gup.c
> index 9b6b9923d22d..17041b3e605e 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -1881,7 +1881,7 @@ static long check_and_migrate_movable_pages(unsigned long nr_pages,
>  	unsigned long isolation_error_count = 0, i;
>  	struct folio *prev_folio = NULL;
>  	LIST_HEAD(movable_page_list);
> -	bool drain_allow = true;
> +	bool drain_allow = true, any_device_coherent = false;
>  	int ret = 0;
>
>  	for (i = 0; i < nr_pages; i++) {
> @@ -1891,15 +1891,6 @@ static long check_and_migrate_movable_pages(unsigned long nr_pages,
>  			continue;
>  		prev_folio = folio;
>
> -		/*
> -		 * Device private pages will get faulted in during gup so it
> -		 * shouldn't be possible to see one here.
> -		 */
> -		if (WARN_ON_ONCE(folio_is_device_private(folio))) {
> -			ret = -EFAULT;
> -			goto unpin_pages;
> -		}
> -
>  		/*
>  		 * Device coherent pages are managed by a driver and should not
>  		 * be pinned indefinitely as it prevents the driver moving the
> @@ -1907,7 +1898,12 @@ static long check_and_migrate_movable_pages(unsigned long nr_pages,
>  		 * to migrate the page out of device memory.
>  		 */
>  		if (folio_is_device_coherent(folio)) {
> -			WARN_ON_ONCE(PageCompound(&folio->page));
> +			/*
> +			 * We always want a new GUP lookup with device coherent
> +			 * pages.
> +			 */
> +			any_device_coherent = true;
> +			pages[i] = 0;
>
>  			/*
>  			 * Migration will fail if the page is pinned, so convert
> @@ -1918,11 +1914,12 @@ static long check_and_migrate_movable_pages(unsigned long nr_pages,
>  				unpin_user_page(&folio->page);
>  			}
>
> -			pages[i] = migrate_device_page(&folio->page, gup_flags);
> -			if (!pages[i]) {
> -				ret = -EBUSY;
> +			ret = migrate_device_coherent_page(&folio->page);
> +			if (ret)
>  				goto unpin_pages;
> -			}
> +			/* The reference to our folio is stale now. */
> +			prev_folio = NULL;
> +			folio = NULL;
>  			continue;
>  		}
>
> @@ -1953,7 +1950,8 @@ static long check_and_migrate_movable_pages(unsigned long nr_pages,
>  				    folio_nr_pages(folio));
>  	}
>
> -	if (!list_empty(&movable_page_list) || isolation_error_count)
> +	if (!list_empty(&movable_page_list) || isolation_error_count ||
> +	    any_device_coherent)
>  		goto unpin_pages;
>
>  	/*
> @@ -1963,14 +1961,19 @@ static long check_and_migrate_movable_pages(unsigned long nr_pages,
>  	return nr_pages;
>
>  unpin_pages:
> -	for (i = 0; i < nr_pages; i++) {
> -		if (!pages[i])
> -			continue;
> +	/* We have to be careful if we stumbled over device coherent pages. */
> +	if (unlikely(any_device_coherent || !(gup_flags & FOLL_PIN))) {
> +		for (i = 0; i < nr_pages; i++) {
> +			if (!pages[i])
> +				continue;
>
> -		if (gup_flags & FOLL_PIN)
> -			unpin_user_page(pages[i]);
> -		else
> -			put_page(pages[i]);
> +			if (gup_flags & FOLL_PIN)
> +				unpin_user_page(pages[i]);
> +			else
> +				put_page(pages[i]);
> +		}
> +	} else {
> +		unpin_user_pages(pages, nr_pages);
>  	}
>
>  	if (!list_empty(&movable_page_list)) {
> diff --git a/mm/internal.h b/mm/internal.h
> index eeab4ee7a4a3..899dab512c5a 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -853,7 +853,7 @@ int numa_migrate_prep(struct page *page, struct vm_area_struct *vma,
>  		      unsigned long addr, int page_nid, int *flags);
>
>  void free_zone_device_page(struct page *page);
> -struct page *migrate_device_page(struct page *page, unsigned int gup_flags);
> +int migrate_device_coherent_page(struct page *page);
>
>  /*
>   * mm/gup.c
> diff --git a/mm/migrate_device.c b/mm/migrate_device.c
> index 5decd26dd551..dfb78ea3d326 100644
> --- a/mm/migrate_device.c
> +++ b/mm/migrate_device.c
> @@ -797,53 +797,40 @@ EXPORT_SYMBOL(migrate_vma_finalize);
>
>  /*
>   * Migrate a device coherent page back to normal memory.  The caller should have
> - * a reference on page which will be copied to the new page if migration is
> - * successful or dropped on failure.
> + * a reference on page, which will be dropped on return.
>   */
> -struct page *migrate_device_page(struct page *page, unsigned int gup_flags)
> +int migrate_device_coherent_page(struct page *page)
>  {
>  	unsigned long src_pfn, dst_pfn = 0;
> -	struct migrate_vma args;
> +	struct migrate_vma args = {
> +		.src = &src_pfn,
> +		.dst = &dst_pfn,
> +		.cpages = 1,
> +		.npages = 1,
> +		.vma = NULL,
> +	};
>  	struct page *dpage;
>
> +	VM_WARN_ON_ONCE(PageCompound(page));
> +
>  	lock_page(page);
>  	src_pfn = migrate_pfn(page_to_pfn(page)) | MIGRATE_PFN_MIGRATE;
> -	args.src = &src_pfn;
> -	args.dst = &dst_pfn;
> -	args.cpages = 1;
> -	args.npages = 1;
> -	args.vma = NULL;
> -	migrate_vma_setup(&args);
> -	if (!(src_pfn & MIGRATE_PFN_MIGRATE))
> -		return NULL;
> -
> -	dpage = alloc_pages(GFP_USER | __GFP_NOWARN, 0);
> -
> -	/*
> -	 * get/pin the new page now so we don't have to retry gup after
> -	 * migrating. We already have a reference so this should never fail.
> -	 */
> -	if (dpage && WARN_ON_ONCE(!try_grab_page(dpage, gup_flags))) {
> -		__free_pages(dpage, 0);
> -		dpage = NULL;
> -	}
>
> -	if (dpage) {
> -		lock_page(dpage);
> -		dst_pfn = migrate_pfn(page_to_pfn(dpage));
> +	migrate_vma_setup(&args);
> +	if (src_pfn & MIGRATE_PFN_MIGRATE) {
> +		dpage = alloc_page(GFP_USER | __GFP_NOWARN);
> +		if (dpage) {
> +			dst_pfn = migrate_pfn(page_to_pfn(dpage));
> +			lock_page(dpage);
> +		}
>  	}
>
>  	migrate_vma_pages(&args);
>  	if (src_pfn & MIGRATE_PFN_MIGRATE)
>  		copy_highpage(dpage, page);
>  	migrate_vma_finalize(&args);
> -	if (dpage && !(src_pfn & MIGRATE_PFN_MIGRATE)) {
> -		if (gup_flags & FOLL_PIN)
> -			unpin_user_page(dpage);
> -		else
> -			put_page(dpage);
> -		dpage = NULL;
> -	}
>
> -	return dpage;
> +	if (src_pfn & MIGRATE_PFN_MIGRATE)
> +		return 0;
> +	return -EBUSY;
>  }
> --
> 2.35.3
