From: Matthew Brost <matthew.brost@intel.com>
To: "Huang, Honglei1" <honghuan@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>,
amd-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
Alexander.Deucher@amd.com, Felix.Kuehling@amd.com,
"Honglei Huang" <honglei1.huang@amd.com>,
Oak.Zeng@amd.com, Jenny-Jing.Liu@amd.com, Philip.Yang@amd.com,
Xiaogang.Chen@amd.com, Ray.Huang@amd.com, Lingshan.Zhu@amd.com,
Junhua.Shen@amd.com,
"Thomas Hellström" <thomas.hellstrom@linux.intel.com>,
"Rodrigo Vivi" <rodrigo.vivi@intel.com>,
"Danilo Krummrich" <dakr@kernel.org>,
"Alice Ryhl" <aliceryhl@google.com>
Subject: Re: [RFC/POC PATCH 00/12] POC SVM implementation in AMDGPU based on drm_gpusvm
Date: Thu, 23 Apr 2026 13:21:31 -0700
Message-ID: <aep/S6m7vG5K7Ra9@gsse-cloud1.jf.intel.com>
In-Reply-To: <812102ac-906b-4fbf-bc8c-1881493a3eca@amd.com>
On Thu, Apr 23, 2026 at 07:03:52PM +0800, Huang, Honglei1 wrote:
>
>
> On 4/23/2026 3:18 PM, Matthew Brost wrote:
> ...
> > > > > > > > This clarifies a lot. This is what we’d call in Xe “preemption fence”
> > > > > > > > mode for a VM. Anytime memory is moved, we trigger a GPU preemption and
> > > > > > > > resume. We don’t actually support SVM in this case; instead, we use
> > > > > > > > “userptr binds,” which are built on gpusvm for page
> > > > > > > > collection. However,
> > > > > > > > we don’t support migrating memory to the device—though we could.
> > > > > > > >
> > > > > > > > I’d look at how we converted 'userptr' to be based on GPU SVM [2]. In
> > > > > > > > this case, don’t maintain a range tree, as those—as you
> > > > > > > > suggest—are more
> > > > > > > > of an on-demand fault driver concern. Instead, just embed 'struct
> > > > > > > > drm_gpusvm_pages' in the VMA struct defined by the IOCTLs.
> > > > > > > >
> > > > > > > > We could extend this to support migrating 'userptr', but we just
> > > > > > > > haven’t done that yet—this may be what you want to do in “XNACK
> > > > > > > > off.”
> > > > > > > >
> > > > > > > > [2] https://patchwork.freedesktop.org/series/146553/
> > > > > > > >
> > > > > > >
> > > > > > > Actually we need to switch the XNACK mode between on and off, so in
> > > > > > > XNACK-off mode the driver operates in "implicit prefetch mode". This
> > > > > > > is needed for compatibility with older hardware and by the UMD
> > > > > > > runtime. We will discuss the XNACK-off handling further internally.
> > > > > > >
> > > >
> > > > Hi Matt,
> > > >
> > > > I studied the xe_userptr code and the conversion series [2] you
> > > > pointed to.
> > > >
> > > > I have a question:
> > > > Would it be possible to reuse drm_gpusvm_range to handle hardware
> > > > without the GPU fault feature (XNACK-off mode)?
> > >
> > > That’s not how we’ve done it. We embedded drm_gpusvm_pages into our VMA
> > > structure and then attached a notifier. The notifier attachment is
> > > open-coded on the Xe side, and this could be normalized and opened up
> > > for common driver use cases.
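> > >
> > > Roughly, the shape is (an illustrative sketch, not the actual Xe
> > > layout; every name here except drm_gpusvm_pages is made up):
> > >
> > >   struct driver_userptr_vma {
> > >           u64 start;                             /* VA from the bind IOCTL */
> > >           u64 end;
> > >           struct drm_gpusvm_pages pages;         /* gpusvm page collection */
> > >           struct mmu_interval_notifier notifier; /* attached over [start, end) */
> > >   };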
>
> The xe_userptr approach is like the implementation in kfd_svm: embed the
> physical pages into a structure and attach a notifier of the same size.
> But kfd_svm is an implementation of SVM semantics: it supports partial
> unmap, needs no explicit delete-userptr ioctl on removal, and needs no
> explicit userptr flag at creation.
> And there is already an existing implementation of userptr semantics in
> amdgpu KFD: KFD_IOC_ALLOC_MEM_FLAGS_USERPTR.
> If the no-GPU-fault mode cannot use the drm_gpusvm framework and has to
> follow the same approach as xe_userptr, it seems like duplicating that
> work.
>
> I think the core gap is that we are trying to use drm_gpusvm to implement
> a driver with SVM semantics for no-GPU-fault hardware, not userptr
> semantics.
>
> > >
> > > The problem with reusing drm_gpusvm_range directly is that a VMA may
> > > span multiple gpusvm notifiers—i.e., it can be larger than the notifier
> > > size. Of course, we could rework this as well.
>
> So the "VMA spans multiple gpusvm notifiers" concern: I'd like to clarify
> that this is not actually a blocker for amdgpu's XNACK-off path, because
> amdgpu does not try to represent one user ioctl virtual address interval as
> a single drm_gpusvm_range.
>
> we walk the attr interval and call drm_gpusvm_range_find_or_insert()
> repeatedly, letting gpusvm pick chunk aligned ranges bounded by
> notifier_size. One ioctl interval will create N chunk sized ranges.
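>
> For reference, the loop is roughly (a sketch; the actual POC code may
> differ, error handling omitted):
>
>   u64 addr = attr_start;
>
>   while (addr < attr_end) {
>           struct drm_gpusvm_range *range;
>
>           /* gpusvm clamps each chunk to notifier_size internally */
>           range = drm_gpusvm_range_find_or_insert(gpusvm, addr,
>                                                   attr_start, attr_end,
>                                                   &ctx);
>           if (IS_ERR(range))
>                   break;
>           addr = drm_gpusvm_range_end(range);
>   }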
>
> > >
> >
> > Sorry for the double reply—I just glanced at the latest series. I don’t
> > think creating a range per page of the userptr is desirable. While it
> > would work, it is not ideal from a time-complexity point of view.
> >
> > The issue with spans across multiple notifiers is real, though.
> >
> > My rough idea would be:
> >
> > - Give drivers an interface to create larger ranges.
>
> So maybe we do not need to create larger ranges if we call
> drm_gpusvm_range_find_or_insert() repeatedly.
>
That will be functional, but consider it from a time-complexity point of
view. Multiple ranges increase the time complexity of range-tree
searches. This isn’t a huge deal, but it will show up to some extent.

Multiple ranges will also slow down DMA mapping and migration. We
switched over to the dma_iova_alloc/link/unlink/sync API here [1]. While
dma_iova_link is a relatively fast radix-tree walk, the allocation and
sync steps are where things get expensive, so it is advantageous to
perform them as few times as possible. For example, if your SVM buffer
is 512MB, instead of doing these steps 256 times, you do them once. The
same logic applies to the migrate_vma_* functions—they are quite
expensive, so doing them in a single shot is significantly faster.

The same applies to invalidations. If you can invalidate a large range
in a single shot, it will be faster. Although the logic in the notifier
should be able to zap multiple ranges in one shot (Xe does this), a
single large DMA unmap will still be faster than multiple smaller ones.

The TL;DR is: if your driver knows the size of the SVM allocation
upfront (e.g., an IOCTL tells you the size), it makes more sense to use
a single large struct (either embed drm_gpusvm_pages into a VMA, or we
figure out an interface to insert large ranges / notifiers).

[1] https://patchwork.freedesktop.org/series/160587/
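
To make the cost argument concrete, the shape of the flow is roughly
(a from-memory sketch, see [1] for the exact API; error handling
omitted):

  struct dma_iova_state state = {};
  size_t i;

  /* One IOVA allocation and one sync for the whole 512MB buffer... */
  if (dma_iova_try_alloc(dev, &state, page_to_phys(pages[0]), SZ_512M)) {
          for (i = 0; i < npages; i++)    /* fast radix-tree walks */
                  dma_iova_link(dev, &state, page_to_phys(pages[i]),
                                i << PAGE_SHIFT, PAGE_SIZE,
                                DMA_BIDIRECTIONAL, 0);
          dma_iova_sync(dev, &state, 0, SZ_512M);
  }

  /* ...vs. 256 alloc + link + sync rounds with 2MB ranges. */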
> >
> > - If the range fits inside a single notifier’s size → done.
> >
> > - If the range spans multiple notifier sizes → round up to a power of
> > two and create a larger notifier. This may overlap with existing
> > notifiers, which is likely fine given that interval trees support
> > overlaps (?). We’d need to double-check and test this. If overlapping
> > notifiers are not acceptable, we’d need some heavy-handed notifier merge
> > logic—it will be complicated, but isolated, so once we get it right
> > everyone can use it.
>
> If we call drm_gpusvm_range_find_or_insert() repeatedly, drm_gpusvm will
> create the corresponding notifiers correctly as far as I can see.
>
I agree this will be functional but not ideal. You can always start with
the approach you have here and optimize it later by adding the required
support in GPU SVM.
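
The interface could be something like this (purely hypothetical, nothing
like it exists in GPU SVM today):

  /*
   * Insert a single range covering exactly [start, end), creating (or
   * rounding up to a power-of-two size) the backing notifier as needed.
   */
  struct drm_gpusvm_range *
  drm_gpusvm_range_insert_exact(struct drm_gpusvm *gpusvm,
                                unsigned long start, unsigned long end,
                                const struct drm_gpusvm_ctx *ctx);
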
Matt
> Regards,
> Honglei
>
> >
> > - Finally, make sure that individual userptr pages can reside at any
> > location.
> >
> > Or conversely:
> >
> > - Normalize embedding of drm_gpusvm_pages in VMA structs + notifier
> > creation
> >
> > - Make sure that individual userptr pages can reside at any location.
>
> >
> > Both options actually sound really similar after typing this out.
> >
> > Matt
> >
> > > So either way, the Xe userptr + gpusvm implementation should be refined
> > > further for common driver use.
> > >
> > > >
> > > > Reusing drm_gpusvm_range for the XNACK-off case would simplify our
> > > > implementation considerably: it already provides the large-page chunk
> > > > optimization and can reuse the existing migration infrastructure.
> > > >
> > > > Building on top of a standalone drm_gpusvm_pages would instead mean
> > > > reimplementing much of what the range layer already offers. Reusing
> > > > ranges would also let us keep a single code path for both XNACK modes,
> > > > which reduces the maintenance burden and avoids behavioral differences.
> > > >
> > > > Would this direction be acceptable, or do you see concerns with reusing
> > > > the range infrastructure for the no-fault case?
> > > >
> > >
> > > If you prefer something like "insert a range exactly here" (create
> > > range + notifier), I think that's a completely reasonable direction,
> > > and Xe would likely switch over to using it.
> > >
> > > I guess my only concern is sub-userptr migration. We are trending
> > > towards allowing userptrs to be migrated either via prefetch IOCTLs or
> > > via access counters on the GPU side—with access counters we'd likely
> > > migrate a single 2M page at a time within the userptr. get_pages()
> > > supports mixed VRAM + system mappings but likely needs some more work
> > > to really make this complete.
> > >
> > > Matt
> > > > Regards,
> > > > Honglei
> ...
>
>