From: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
To: Matthew Brost <matthew.brost@intel.com>
Cc: intel-xe@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
	himal.prasad.ghimiray@intel.com, apopple@nvidia.com,
	airlied@gmail.com, "Simona Vetter" <simona.vetter@ffwll.ch>,
	felix.kuehling@amd.com,
	"Christian König" <christian.koenig@amd.com>,
	dakr@kernel.org, "Mrozek, Michal" <michal.mrozek@intel.com>,
	"Joonas Lahtinen" <joonas.lahtinen@linux.intel.com>
Subject: Re: [PATCH 15/15] drm/xe: Retry migration once
Date: Tue, 28 Oct 2025 10:11:18 +0100
Message-ID: <aa2638b6324ea7162f4f1a2e867f36df6da413f7.camel@linux.intel.com>
In-Reply-To: <aQAKsw5nd+WVNLu+@lstrano-desk.jf.intel.com>

Hi, Matt

On Mon, 2025-10-27 at 17:13 -0700, Matthew Brost wrote:
> On Sat, Oct 25, 2025 at 02:04:12PM +0200, Thomas Hellström wrote:
> > Data present in foreign device memory may cause migration to fail.
> > For now, retry once after first migrating to system.
> > 
> > Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > ---
> >  drivers/gpu/drm/xe/xe_svm.c | 19 +++++++++++++++----
> >  1 file changed, 15 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/xe/xe_svm.c b/drivers/gpu/drm/xe/xe_svm.c
> > index 9814f95cb212..41e075aa015c 100644
> > --- a/drivers/gpu/drm/xe/xe_svm.c
> > +++ b/drivers/gpu/drm/xe/xe_svm.c
> > @@ -1529,13 +1529,24 @@ struct drm_pagemap *xe_vma_resolve_pagemap(struct xe_vma *vma, struct xe_tile *t
> >  int xe_svm_alloc_vram(struct xe_svm_range *range, const struct drm_gpusvm_ctx *ctx,
> >  		      struct drm_pagemap *dpagemap)
> >  {
> > +	int err, retries = 1;
> > +
> >  	xe_assert(range_to_vm(&range->base)->xe, range->base.pages.flags.migrate_devmem);
> >  	range_debug(range, "ALLOCATE VRAM");
> >  
> > -	return drm_pagemap_populate_mm(dpagemap, xe_svm_range_start(range),
> > -				       xe_svm_range_end(range),
> > -				       range->base.gpusvm->mm,
> > -				       ctx->timeslice_ms);
> > +retry:
> > +	err = drm_pagemap_populate_mm(dpagemap, xe_svm_range_start(range),
> > +				      xe_svm_range_end(range),
> > +				      range->base.gpusvm->mm,
> > +				      ctx->timeslice_ms);
> > +	if ((err == -EBUSY || err == -EFAULT) && retries--) {
> 
> I don't think this is what we want to do here. -EFAULT indicates that
> the pages are entirely present somewhere in device memory. This could
> be
> either on the local device or on a foreign device, but we don’t have
> enough information here to determine which case it is.
> 
> If this is on our local device, we're always good. This could occur
> when playing mremap games.
> 
> If it's on a foreign device, things get trickier. If our interconnect
> supports atomics, we're still good. But if the interconnect doesn't
> support atomics (e.g., PCIe P2P) and this is an atomic fault, then we
> need to move the memory. Also, if there's no path between the device
> memories, then of course we need to move the memory.
> 
> Again, we don’t have enough information here to make the correct
> decision. We really need to call drm_gpusvm_range_get_pages to gather
> the CPU pages in order to make this kind of decision. Ideally, the
> logic
> should be built into drm_gpusvm_range_get_pages to understand atomic
> migration requirements.

For multi-device I'm currently looking at a patch that considers p2p
migration and returns 0 if data is already placed in compatible
memory, given the migration policies and interconnects. But until that
patch lands, we need a way to evict memory from foreign devices so
that we can migrate to the desired device.

I would have expected that if memory is already present in local
device memory, we'd have the xe_svm_range_in_vram() flag set and would
not attempt to migrate, at least in most cases? Currently, if data is
already fully or partly present in another p2p device's memory and we
ignore the -EFAULT, then get_pages() wouldn't detect that? Or we could
look at the dpagemap returned from get_pages() and retry the migration
at that point.

We also need to realize that with multi-gpu, the chances of migration
races increase dramatically, and whether those races return -EBUSY or
-EFAULT appears a bit arbitrary to me. We can't really assume that
cpages == 0 means all pages are already where they are supposed to be.

My current thinking on how to handle all this is the following (rough
sketch after the list):

1) xe_svm_range_in_vram(): a first check to avoid migration.
2) (NEW, not implemented yet) If we decide to migrate, first run an
hmm_range_fault() without the faulting flag to determine the current
memory placement - perhaps optional. This may reject a migration more
efficiently than collecting pages for migration and then inspecting
them, because by then we've already sent an invalidation event.
3) Call into drm_pagemap_populate_mm(). This collects all compatible-
and system pages, and determines what to migrate. If no migration is
needed, return 0. If racing, or if foreign device memory needs to move
to system first, return -EBUSY.
4) If -EBUSY, evict and retry the migration once.
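
Something like the below, as a rough C sketch using the helpers from
this series (the function name is just a stand-in, and the step 3)
semantics of drm_pagemap_populate_mm() are the proposed ones, not what
the function does today):

static int xe_svm_alloc_vram_sketch(struct xe_svm_range *range,
				    const struct drm_gpusvm_ctx *ctx,
				    struct drm_pagemap *dpagemap)
{
	int err;

	/* 1) Cheap check: don't migrate if already resident where we want it. */
	if (xe_svm_range_in_vram(range))
		return 0;

	/*
	 * 2) (Not implemented yet, perhaps optional) A non-faulting
	 * hmm_range_fault() pass could reject the migration here,
	 * before we collect migrate pages and send invalidations.
	 */

	/*
	 * 3) Populate. With the proposed semantics this returns 0 if
	 * data is already in compatible memory, and -EBUSY if racing
	 * or if foreign device memory must first go to system.
	 */
	err = drm_pagemap_populate_mm(dpagemap, xe_svm_range_start(range),
				      xe_svm_range_end(range),
				      range->base.gpusvm->mm,
				      ctx->timeslice_ms);

	/* 4) On -EBUSY, evict and retry exactly once. */
	if (err == -EBUSY) {
		drm_gpusvm_range_evict(range->base.gpusvm, &range->base);
		err = drm_pagemap_populate_mm(dpagemap,
					      xe_svm_range_start(range),
					      xe_svm_range_end(range),
					      range->base.gpusvm->mm,
					      ctx->timeslice_ms);
	}

	return err;
}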

For now, I think we should make atomic faults use local VRAM only.

> 
> Once drm_gpusvm_range_get_pages returns, we can take appropriate
> action.
> Initially, for simplicity, this might just be a bounce to system
> memory.
> Later, it could evolve into a direct device-to-device move.

I agree we need a pass with hmm_range_fault(); the question is in what
order we do this. I think a pass without the fault flag set, before
trying to migrate, would

a) avoid populating with system pages data that's going to end up in
VRAM anyway, and
b) possibly avoid collecting migrate pages, and thus also an
invalidation event, for all devices.

The drawback is we might unnecessarily run a non-faulting
hmm_range_fault() when we need to migrate anyway. My thinking is that
would be a rather quick call, though, compared to the reverse lookups
in the migrate code.
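
For illustration, the non-faulting pass could look roughly like this.
Just a sketch: locking details, the retry on notifier invalidation and
the actual placement test are omitted, and "notifier", "owner", "mm"
and "npages" are whatever the caller already has at hand:

	unsigned long *pfns = kvcalloc(npages, sizeof(*pfns), GFP_KERNEL);
	struct hmm_range hrange = {
		.notifier = notifier,	/* interval notifier covering the range */
		.start = xe_svm_range_start(range),
		.end = xe_svm_range_end(range),
		.hmm_pfns = pfns,
		.default_flags = 0,	/* snapshot only: no faulting, no migration */
		.dev_private_owner = owner,
	};
	int err;

	hrange.notifier_seq = mmu_interval_read_begin(notifier);
	mmap_read_lock(mm);
	err = hmm_range_fault(&hrange);
	mmap_read_unlock(mm);

	/*
	 * On success, inspect pfns[]: if every entry is HMM_PFN_VALID
	 * and backed by the desired or a compatible pagemap, we can
	 * skip the migration altogether.
	 */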

> 
> The logic inside drm_gpusvm_range_get_pages would likely involve
> devmem_only combined with a drm_pagemap passed in, which can detect
> connectivity and atomic support between devices, based on the
> drm_pagemap extracted from the ZDD.
> 
> Let me know if this makes sense, or if you have thought about doing
> this in a follow-up.

In any case, I think we need to set up a flow-chart / flow-list
similar to the above, consider the most important cases, and decide
what to do with them. For now, I think we can, if necessary, replace
this patch with a dpagemap check (does the desired pagemap equal
what's present?) and rerun the migration after that. We'd probably
need that anyway.
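
I.e., something along these lines after get_pages(); note that the
range_dpagemap() accessor and desired_dpagemap are made up for the
example:

	err = drm_gpusvm_range_get_pages(range->base.gpusvm, &range->base, ctx);
	if (!err && range_dpagemap(range) != desired_dpagemap) {
		/* Data landed in the wrong pagemap: evict and redo the migration. */
		drm_gpusvm_range_evict(range->base.gpusvm, &range->base);
		err = xe_svm_alloc_vram(range, ctx, desired_dpagemap);
	}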

Thomas




> 
> Matt
> 
> > +		range_debug(range, "ALLOCATE VRAM - Retry.");
> > +
> > +		drm_gpusvm_range_evict(range->base.gpusvm, &range->base);
> > +		goto retry;
> > +	}
> > +
> > +	return err;
> >  }
> >  
> >  static struct drm_pagemap_addr
> > -- 
> > 2.51.0
> > 

