Intel-XE Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: Alistair Popple <apopple@nvidia.com>
To: Matthew Brost <matthew.brost@intel.com>
Cc: intel-xe@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
	airlied@gmail.com, christian.koenig@amd.com,
	thomas.hellstrom@linux.intel.com, simona.vetter@ffwll.ch,
	felix.kuehling@amd.com, dakr@kernel.org
Subject: Re: [PATCH v2 03/29] mm/migrate: Trylock device page in do_swap_page
Date: Thu, 17 Oct 2024 12:51:08 +1100	[thread overview]
Message-ID: <87cyjzo5pt.fsf@nvdebian.thelocal> (raw)
In-Reply-To: <Zw9EBRHCZkLvXmZs@DUT025-TGLU.fm.intel.com>


Matthew Brost <matthew.brost@intel.com> writes:

> On Wed, Oct 16, 2024 at 03:00:08PM +1100, Alistair Popple wrote:
>> 
>> Matthew Brost <matthew.brost@intel.com> writes:
>> 
>> > Avoid multiple CPU page faults to the same device page racing by trying
>> > to lock the page in do_swap_page before taking an extra reference to the
>> > page. This prevents scenarios where multiple CPU page faults each take
>> > an extra reference to a device page, which could abort migration in
>> > folio_migrate_mapping. With the device page being locked in
>> > do_swap_page, the migrate_vma_* functions need to be updated to avoid
>> > locking the fault_page argument.
>> >
>> > Prior to this change, a livelock scenario could occur in Xe's (Intel GPU
>> > DRM driver) SVM implementation if enough threads faulted the same device
>> > page.
>> >
>> > Cc: Philip Yang <Philip.Yang@amd.com>
>> > Cc: Felix Kuehling <felix.kuehling@amd.com>
>> > Cc: Christian König <christian.koenig@amd.com>
>> > Cc: Andrew Morton <akpm@linux-foundation.org>
>> > Suggessted-by: Simona Vetter <simona.vetter@ffwll.ch>
>> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>> > ---
>> >  mm/memory.c         | 13 ++++++---
>> >  mm/migrate_device.c | 69 ++++++++++++++++++++++++++++++---------------
>> >  2 files changed, 56 insertions(+), 26 deletions(-)
>> >
>> > diff --git a/mm/memory.c b/mm/memory.c
>> > index 2366578015ad..b72bde782611 100644
>> > --- a/mm/memory.c
>> > +++ b/mm/memory.c
>> > @@ -4252,10 +4252,15 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
>> >  			 * Get a page reference while we know the page can't be
>> >  			 * freed.
>> >  			 */
>> > -			get_page(vmf->page);
>> > -			pte_unmap_unlock(vmf->pte, vmf->ptl);
>> > -			ret = vmf->page->pgmap->ops->migrate_to_ram(vmf);
>> > -			put_page(vmf->page);
>> > +			if (trylock_page(vmf->page)) {
>> > +				get_page(vmf->page);
>> > +				pte_unmap_unlock(vmf->pte, vmf->ptl);
>> > +				ret = vmf->page->pgmap->ops->migrate_to_ram(vmf);
>> > +				put_page(vmf->page);
>> > +				unlock_page(vmf->page);
>> 
>> I don't think my previous review of this change has really been
>> addressed. Why don't we just install the migration entry here? Seems
>> like it would be a much simpler way of solving this.
>> 
>
> I should have mentioned this in the cover-letter, I haven't got around
> to trying that out yet. Included this existing version for correctness
> but I also think this is not strickly required to merge this series as
> our locking in migrate_to_ram only relies on the core MM locks so
> some thread would eventually win the race and make forward progress.
>
> So I guess just ignore this patch and will send an updated version
> individually with installing a migration entry in do_swap_page. If for
> some reason that doesn't work, I'll respond here explaining why.

That would be great. I have a fairly strong preference for doing that
instead of adding more special cases for the fault page in the migration
code. And if we can't do that it would be good to understand
why. Thanks.

 - Alistair

> Matt
>
>> > +			} else {
>> > +				pte_unmap_unlock(vmf->pte, vmf->ptl);
>> > +			}
>> >  		} else if (is_hwpoison_entry(entry)) {
>> >  			ret = VM_FAULT_HWPOISON;
>> >  		} else if (is_pte_marker_entry(entry)) {
>> > diff --git a/mm/migrate_device.c b/mm/migrate_device.c
>> > index f163c2131022..2477d39f57be 100644
>> > --- a/mm/migrate_device.c
>> > +++ b/mm/migrate_device.c
>> > @@ -60,6 +60,8 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
>> >  				   struct mm_walk *walk)
>> >  {
>> >  	struct migrate_vma *migrate = walk->private;
>> > +	struct folio *fault_folio = migrate->fault_page ?
>> > +		page_folio(migrate->fault_page) : NULL;
>> >  	struct vm_area_struct *vma = walk->vma;
>> >  	struct mm_struct *mm = vma->vm_mm;
>> >  	unsigned long addr = start, unmapped = 0;
>> > @@ -88,11 +90,13 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
>> >  
>> >  			folio_get(folio);
>> >  			spin_unlock(ptl);
>> > -			if (unlikely(!folio_trylock(folio)))
>> > +			if (unlikely(fault_folio != folio &&
>> > +				     !folio_trylock(folio)))
>> >  				return migrate_vma_collect_skip(start, end,
>> >  								walk);
>> >  			ret = split_folio(folio);
>> > -			folio_unlock(folio);
>> > +			if (fault_folio != folio)
>> > +				folio_unlock(folio);
>> >  			folio_put(folio);
>> >  			if (ret)
>> >  				return migrate_vma_collect_skip(start, end,
>> > @@ -192,7 +196,7 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
>> >  		 * optimisation to avoid walking the rmap later with
>> >  		 * try_to_migrate().
>> >  		 */
>> > -		if (folio_trylock(folio)) {
>> > +		if (fault_folio == folio || folio_trylock(folio)) {
>> >  			bool anon_exclusive;
>> >  			pte_t swp_pte;
>> >  
>> > @@ -204,7 +208,8 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
>> >  
>> >  				if (folio_try_share_anon_rmap_pte(folio, page)) {
>> >  					set_pte_at(mm, addr, ptep, pte);
>> > -					folio_unlock(folio);
>> > +					if (fault_folio != folio)
>> > +						folio_unlock(folio);
>> >  					folio_put(folio);
>> >  					mpfn = 0;
>> >  					goto next;
>> > @@ -363,6 +368,8 @@ static unsigned long migrate_device_unmap(unsigned long *src_pfns,
>> >  					  unsigned long npages,
>> >  					  struct page *fault_page)
>> >  {
>> > +	struct folio *fault_folio = fault_page ?
>> > +		page_folio(fault_page) : NULL;
>> >  	unsigned long i, restore = 0;
>> >  	bool allow_drain = true;
>> >  	unsigned long unmapped = 0;
>> > @@ -427,7 +434,8 @@ static unsigned long migrate_device_unmap(unsigned long *src_pfns,
>> >  		remove_migration_ptes(folio, folio, 0);
>> >  
>> >  		src_pfns[i] = 0;
>> > -		folio_unlock(folio);
>> > +		if (fault_folio != folio)
>> > +			folio_unlock(folio);
>> >  		folio_put(folio);
>> >  		restore--;
>> >  	}
>> > @@ -536,6 +544,8 @@ int migrate_vma_setup(struct migrate_vma *args)
>> >  		return -EINVAL;
>> >  	if (args->fault_page && !is_device_private_page(args->fault_page))
>> >  		return -EINVAL;
>> > +	if (args->fault_page && !PageLocked(args->fault_page))
>> > +		return -EINVAL;
>> >  
>> >  	memset(args->src, 0, sizeof(*args->src) * nr_pages);
>> >  	args->cpages = 0;
>> > @@ -799,19 +809,13 @@ void migrate_vma_pages(struct migrate_vma *migrate)
>> >  }
>> >  EXPORT_SYMBOL(migrate_vma_pages);
>> >  
>> > -/*
>> > - * migrate_device_finalize() - complete page migration
>> > - * @src_pfns: src_pfns returned from migrate_device_range()
>> > - * @dst_pfns: array of pfns allocated by the driver to migrate memory to
>> > - * @npages: number of pages in the range
>> > - *
>> > - * Completes migration of the page by removing special migration entries.
>> > - * Drivers must ensure copying of page data is complete and visible to the CPU
>> > - * before calling this.
>> > - */
>> > -void migrate_device_finalize(unsigned long *src_pfns,
>> > -			unsigned long *dst_pfns, unsigned long npages)
>> > +static void __migrate_device_finalize(unsigned long *src_pfns,
>> > +				      unsigned long *dst_pfns,
>> > +				      unsigned long npages,
>> > +				      struct page *fault_page)
>> >  {
>> > +	struct folio *fault_folio = fault_page ?
>> > +		page_folio(fault_page) : NULL;
>> >  	unsigned long i;
>> >  
>> >  	for (i = 0; i < npages; i++) {
>> > @@ -824,7 +828,8 @@ void migrate_device_finalize(unsigned long *src_pfns,
>> >  
>> >  		if (!page) {
>> >  			if (dst) {
>> > -				folio_unlock(dst);
>> > +				if (fault_folio != dst)
>> > +					folio_unlock(dst);
>> >  				folio_put(dst);
>> >  			}
>> >  			continue;
>> > @@ -834,14 +839,16 @@ void migrate_device_finalize(unsigned long *src_pfns,
>> >  
>> >  		if (!(src_pfns[i] & MIGRATE_PFN_MIGRATE) || !dst) {
>> >  			if (dst) {
>> > -				folio_unlock(dst);
>> > +				if (fault_folio != dst)
>> > +					folio_unlock(dst);
>> >  				folio_put(dst);
>> >  			}
>> >  			dst = src;
>> >  		}
>> >  
>> >  		remove_migration_ptes(src, dst, 0);
>> > -		folio_unlock(src);
>> > +		if (fault_folio != src)
>> > +			folio_unlock(src);
>> >  
>> >  		if (folio_is_zone_device(src))
>> >  			folio_put(src);
>> > @@ -849,7 +856,8 @@ void migrate_device_finalize(unsigned long *src_pfns,
>> >  			folio_putback_lru(src);
>> >  
>> >  		if (dst != src) {
>> > -			folio_unlock(dst);
>> > +			if (fault_folio != dst)
>> > +				folio_unlock(dst);
>> >  			if (folio_is_zone_device(dst))
>> >  				folio_put(dst);
>> >  			else
>> > @@ -857,6 +865,22 @@ void migrate_device_finalize(unsigned long *src_pfns,
>> >  		}
>> >  	}
>> >  }
>> > +
>> > +/*
>> > + * migrate_device_finalize() - complete page migration
>> > + * @src_pfns: src_pfns returned from migrate_device_range()
>> > + * @dst_pfns: array of pfns allocated by the driver to migrate memory to
>> > + * @npages: number of pages in the range
>> > + *
>> > + * Completes migration of the page by removing special migration entries.
>> > + * Drivers must ensure copying of page data is complete and visible to the CPU
>> > + * before calling this.
>> > + */
>> > +void migrate_device_finalize(unsigned long *src_pfns,
>> > +			unsigned long *dst_pfns, unsigned long npages)
>> > +{
>> > +	return __migrate_device_finalize(src_pfns, dst_pfns, npages, NULL);
>> > +}
>> >  EXPORT_SYMBOL(migrate_device_finalize);
>> >  
>> >  /**
>> > @@ -872,7 +896,8 @@ EXPORT_SYMBOL(migrate_device_finalize);
>> >   */
>> >  void migrate_vma_finalize(struct migrate_vma *migrate)
>> >  {
>> > -	migrate_device_finalize(migrate->src, migrate->dst, migrate->npages);
>> > +	__migrate_device_finalize(migrate->src, migrate->dst, migrate->npages,
>> > +				  migrate->fault_page);
>> >  }
>> >  EXPORT_SYMBOL(migrate_vma_finalize);
>> 


  reply	other threads:[~2024-10-17  1:53 UTC|newest]

Thread overview: 129+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-10-16  3:24 [PATCH v2 00/29] Introduce GPU SVM and Xe SVM implementation Matthew Brost
2024-10-16  3:24 ` [PATCH v2 01/29] drm/xe: Retry BO allocation Matthew Brost
2024-10-16  3:24 ` [PATCH v2 02/29] mm/migrate: Add migrate_device_prepopulated_range Matthew Brost
2024-10-16  4:04   ` Alistair Popple
2024-10-16  4:46     ` Matthew Brost
2024-10-17  0:56       ` Matthew Brost
2024-10-17  1:49         ` Alistair Popple
2024-10-17  2:45           ` Matthew Brost
2024-10-17  3:21             ` Alistair Popple
2024-10-17  4:07               ` Matthew Brost
2024-10-17  5:49                 ` Alistair Popple
2024-10-17 15:40                   ` Matthew Brost
2024-10-17 21:58                     ` Alistair Popple
2024-10-18  0:54                       ` Matthew Brost
2024-10-18  5:59                         ` Alistair Popple
2024-10-18  6:39                           ` Mika Penttilä
2024-10-18  7:16                           ` Matthew Brost
2024-10-18  7:33                             ` Matthew Brost
2024-10-18  7:34                             ` Alistair Popple
2024-10-18  7:57                               ` Matthew Brost
2024-10-18  4:02                       ` Mika Penttilä
2024-10-18  5:55                         ` Alistair Popple
2024-10-16  3:24 ` [PATCH v2 03/29] mm/migrate: Trylock device page in do_swap_page Matthew Brost
2024-10-16  4:00   ` Alistair Popple
2024-10-16  4:41     ` Matthew Brost
2024-10-17  1:51       ` Alistair Popple [this message]
2024-10-25  0:31         ` Matthew Brost
2024-10-29  6:37           ` Alistair Popple
2024-11-01 17:19             ` Matthew Brost
2024-11-28 23:31   ` Alistair Popple
2024-12-13 22:16     ` Matthew Brost
2024-12-14  5:59       ` Matthew Brost
2024-10-16  3:24 ` [PATCH v2 04/29] drm/pagemap: Add DRM pagemap Matthew Brost
2024-10-16  3:24 ` [PATCH v2 05/29] drm/gpusvm: Add support for GPU Shared Virtual Memory Matthew Brost
2024-10-31 18:58   ` Thomas Hellström
2024-11-04 22:53     ` Matthew Brost
2024-11-04 15:25   ` Thomas Hellström
2024-11-04 17:21     ` Matthew Brost
2024-11-04 18:59       ` Thomas Hellström
2024-11-04 23:07         ` Matthew Brost
2024-11-05 10:22           ` Thomas Hellström
2024-11-05 16:12             ` Matthew Brost
2024-11-05 16:28               ` Thomas Hellström
2024-11-05 14:48   ` Thomas Hellström
2024-11-05 16:32     ` Matthew Brost
2024-11-20  3:00   ` Gwan-gyeong Mun
2024-11-29  0:00   ` Alistair Popple
2024-12-14  1:16     ` Matthew Brost
2024-10-16  3:24 ` [PATCH v2 06/29] drm/xe/uapi: Add DRM_XE_VM_BIND_FLAG_SYSTEM_ALLOCATON flag Matthew Brost
2024-11-18 13:44   ` Thomas Hellström
2024-11-19 16:01     ` Matthew Brost
2024-10-16  3:24 ` [PATCH v2 07/29] drm/xe: Add SVM init / close / fini to faulting VMs Matthew Brost
2024-11-19 12:13   ` Thomas Hellström
2024-11-19 16:22     ` Matthew Brost
2024-10-16  3:24 ` [PATCH v2 08/29] drm/xe: Add dma_addr res cursor Matthew Brost
2024-11-19 12:15   ` Thomas Hellström
2024-11-19 16:24     ` Matthew Brost
2024-10-16  3:24 ` [PATCH v2 09/29] drm/xe: Add SVM range invalidation Matthew Brost
2024-11-19 13:56   ` Thomas Hellström
2024-12-11 19:01     ` Matthew Brost
2024-12-14 23:11       ` Matthew Brost
2024-12-16 10:01       ` Thomas Hellström
2024-12-16 16:09         ` Matthew Brost
2024-12-16 17:35           ` Thomas Hellström
2024-10-16  3:24 ` [PATCH v2 10/29] drm/gpuvm: Add DRM_GPUVA_OP_USER Matthew Brost
2024-11-19 13:57   ` Thomas Hellström
2024-11-19 16:26     ` Matthew Brost
2024-10-16  3:25 ` [PATCH v2 11/29] drm/xe: Add (re)bind to SVM page fault handler Matthew Brost
2024-11-19 14:26   ` Thomas Hellström
2024-12-11 19:07     ` Matthew Brost
2024-12-16 10:03       ` Thomas Hellström
2024-10-16  3:25 ` [PATCH v2 12/29] drm/xe: Add SVM garbage collector Matthew Brost
2024-11-19 14:45   ` Thomas Hellström
2024-12-11 19:17     ` Matthew Brost
2024-12-16 10:36       ` Thomas Hellström
2024-12-16 23:46         ` Matthew Brost
2024-10-16  3:25 ` [PATCH v2 13/29] drm/xe: Add unbind to " Matthew Brost
2024-11-19 15:31   ` Thomas Hellström
2024-11-19 23:44     ` Matthew Brost
2024-10-16  3:25 ` [PATCH v2 14/29] drm/xe: Do not allow system allocator VMA unbind if the GPU has bindings Matthew Brost
2024-11-19 16:33   ` Thomas Hellström
2024-11-19 23:37     ` Matthew Brost
2024-10-16  3:25 ` [PATCH v2 15/29] drm/xe: Enable system allocator uAPI Matthew Brost
2024-11-19 16:34   ` Thomas Hellström
2024-10-16  3:25 ` [PATCH v2 16/29] drm/xe: Add migrate layer functions for SVM support Matthew Brost
2024-11-19 16:45   ` Thomas Hellström
2024-11-19 23:08     ` Matthew Brost
2024-11-20  8:04       ` Thomas Hellström
2024-12-11 19:11         ` Matthew Brost
2024-10-16  3:25 ` [PATCH v2 17/29] drm/xe: Add SVM device memory mirroring Matthew Brost
2024-11-19 16:50   ` Thomas Hellström
2024-11-20  3:05   ` Gwan-gyeong Mun
2024-12-11 19:44     ` Matthew Brost
2024-10-16  3:25 ` [PATCH v2 18/29] drm/xe: Add drm_gpusvm_devmem to xe_bo Matthew Brost
2024-11-19 16:51   ` Thomas Hellström
2024-12-15  4:38     ` Matthew Brost
2024-10-16  3:25 ` [PATCH v2 19/29] drm/xe: Add GPUSVM devic memory copy vfunc functions Matthew Brost
2024-12-02 10:13   ` Thomas Hellström
2024-12-12  3:59     ` Matthew Brost
2024-10-16  3:25 ` [PATCH v2 20/29] drm/xe: Add drm_pagemap ops to SVM Matthew Brost
2024-10-16  3:25 ` [PATCH v2 21/29] drm/xe: Add Xe SVM populate_devmem_pfn vfunc Matthew Brost
2024-12-02 10:19   ` Thomas Hellström
2024-10-16  3:25 ` [PATCH v2 22/29] drm/xe: Add Xe SVM devmem_release vfunc Matthew Brost
2024-12-02 10:21   ` Thomas Hellström
2024-10-16  3:25 ` [PATCH v2 23/29] drm/xe: Add BO flags required for SVM Matthew Brost
2024-12-02 10:44   ` Thomas Hellström
2024-12-11 21:42     ` Matthew Brost
2024-12-16 10:44       ` Thomas Hellström
2024-10-16  3:25 ` [PATCH v2 24/29] drm/xe: Add SVM VRAM migration Matthew Brost
2024-12-02 12:06   ` Thomas Hellström
2024-12-11 20:17     ` Matthew Brost
2024-10-16  3:25 ` [PATCH v2 25/29] drm/xe: Basic SVM BO eviction Matthew Brost
2024-12-02 12:27   ` Thomas Hellström
2024-12-11 19:47     ` Matthew Brost
2024-10-16  3:25 ` [PATCH v2 26/29] drm/xe: Add SVM debug Matthew Brost
2024-12-02 12:33   ` Thomas Hellström
2024-12-17  1:05     ` Matthew Brost
2024-10-16  3:25 ` [PATCH v2 27/29] drm/xe: Add modparam for SVM notifier size Matthew Brost
2024-12-02 12:37   ` Thomas Hellström
2024-12-11 19:50     ` Matthew Brost
2024-10-16  3:25 ` [PATCH v2 28/29] drm/xe: Add always_migrate_to_vram modparam Matthew Brost
2024-12-02 12:40   ` Thomas Hellström
2024-12-11 19:51     ` Matthew Brost
2024-10-16  3:25 ` [PATCH v2 29/29] drm/doc: gpusvm: Add GPU SVM documentation Matthew Brost
2024-12-02 13:00   ` Thomas Hellström
2024-12-17 23:14     ` Matthew Brost
2024-10-16  3:30 ` ✓ CI.Patch_applied: success for Introduce GPU SVM and Xe SVM implementation (rev2) Patchwork
2024-10-16  3:31 ` ✗ CI.checkpatch: warning " Patchwork
2024-10-16  3:31 ` ✗ CI.KUnit: failure " Patchwork

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87cyjzo5pt.fsf@nvdebian.thelocal \
    --to=apopple@nvidia.com \
    --cc=airlied@gmail.com \
    --cc=christian.koenig@amd.com \
    --cc=dakr@kernel.org \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=felix.kuehling@amd.com \
    --cc=intel-xe@lists.freedesktop.org \
    --cc=matthew.brost@intel.com \
    --cc=simona.vetter@ffwll.ch \
    --cc=thomas.hellstrom@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox