AMD-GFX Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: Matthew Brost <matthew.brost@intel.com>
To: "Huang, Honglei1" <honghuan@amd.com>
Cc: "Deucher, Alexander" <Alexander.Deucher@amd.com>,
	"Kuehling, Felix" <Felix.Kuehling@amd.com>,
	"Koenig, Christian" <Christian.Koenig@amd.com>,
	"Zeng, Oak" <Oak.Zeng@amd.com>,
	"Liu, Jenny (Jing)" <Jenny-Jing.Liu@amd.com>,
	 "Yang, Philip" <Philip.Yang@amd.com>,
	"Chen, Xiaogang" <Xiaogang.Chen@amd.com>,
	"Huang, Ray" <Ray.Huang@amd.com>,
	"Zhu, Lingshan" <Lingshan.Zhu@amd.com>,
	"Shen, Junhua" <Junhua.Shen@amd.com>, <yiru.ma@amd.com>,
	"sima@ffwll.ch" <sima@ffwll.ch>,
	"rodrigo.vivi@intel.com" <rodrigo.vivi@intel.com>,
	"thomas.hellstrom@linux.intel.com"
	<thomas.hellstrom@linux.intel.com>,
	"dakr@kernel.org" <dakr@kernel.org>,
	"aliceryhl@google.com" <aliceryhl@google.com>,
	"amd-gfx@lists.freedesktop.org" <amd-gfx@lists.freedesktop.org>,
	"dri-devel@lists.freedesktop.org"
	<dri-devel@lists.freedesktop.org>
Subject: Re: [RFC/POC PATCH 00/12] POC SVM implementation in AMDGPU based on drm_gpusvm
Date: Wed, 29 Apr 2026 20:12:31 -0700	[thread overview]
Message-ID: <afLInxpNTkA2yOZn@gsse-cloud1.jf.intel.com> (raw)
In-Reply-To: <02ded343-38bd-4621-83fd-78a027eb96e2@amd.com>

On Thu, Apr 30, 2026 at 10:56:47AM +0800, Huang, Honglei1 wrote:
> 
> 
> On 4/29/2026 5:56 PM, Huang, Honglei1 wrote:
> > 
> > 
> > On 4/23/2026 4:22 PM, Huang, Honglei1 wrote:
> > ...
> > > > > > > > > > > 
> > > > > > > > > > > This patch series implements basic
> > > > > > > > > > > SVM support with the following
> > > > > > > > > > > features:
> > > > > > > > > > > 
> > > > > > > > > > >       1. attributes sepatarated from physical page management:
> > > > > > > > > > > 
> > > > > > > > > > >         - Attribute layer
> > > > > > > > > > > (amdgpu_svm_attr_tree): a driver
> > > > > > > > > > > side interval
> > > > > > > > > > >           tree that stores SVM
> > > > > > > > > > > attributes. Managed through the
> > > > > > > > > > > SET_ATTR,
> > > > > > > > > > >           and mmu notifier callback.
> > > > > > > > 
> > > > > > > > Can you explain the mmu notifier callback
> > > > > > > > interaction here? See below in
> > > > > > > > Xe the attribute tree is existing VMA tree (gpuvm).
> > > > > > > > 
> > > > > > > 
> > > > > > > Let me try to explain, apologies if the description is not fully
> > > > > > > precise.
> > > > > > > 
> > > > > > > In current implementation, the MMU notifier callback
> > > > > > > interacts with the attr
> > > > > > > tree only in the munmap path remove the corresponding attribute
> > > > > > > entries from the attr tree so that stale attributes
> > > > > > > do not persist for
> > > > > > > freed address space.
> > > > > > > 
> > > > > > 
> > > > > > Ah, yes. We reset our attributes upon munmap too. We
> > > > > > actually don't this
> > > > > > 100% correct quite either and series in flight to fix [1].
> > > > > > 
> > > > > > [1] https://patchwork.freedesktop.org/series/161815/
> > > > > 
> > > > > Hi matt,
> > > > > 
> > > > > It seems like you are tring to modify the implementation
> > > > > into remove the
> > > > > attributes when munmap.
> > > > > 
> > > > > Actuall we have a discussion internally that does the driver
> > > > > need to remove
> > > > > the attributes when munmap.
> > > > > 
> > > > > So there servel ideas:
> > > > > 
> > > > > 1. attribute need keep: attributes may be needed again when a new VMA
> > > > > appears or on subsequent faults.
> > > > > 2.attribute need keep: attributes can be set independent of
> > > > > whether memory
> > > > > is currently mapped; attributes persist and are modified explicitly via
> > > > > ioctl, not implicitly by notifier callbacks.
> > > > > 3. attribute need remove: casue VMA is gone, driver can do
> > > > > nothing without
> > > > > VMA.
> > > > > 
> > > > > and I saw xe_svm set default attribute in the previous
> > > > > version, this is also
> > > > > a option.
> > > > > 
> > > > > Can you please help to give some information that why xe_svm
> > > > > is turing to
> > > > > remove the attribute when munmap? And does keeping attribute
> > > > > is a valid way?
> > > > > 
> > > > 
> > > > This is a semantic choice, and we’re trying to match the semantics of
> > > > CPU madvise. I believe any semantic an individual driver stack wants to
> > > > define is valid, but if vendors mismatch sematics this will create a
> > > > level of vendor lock in which may (cough Nvidia, CUDA) or may not (open
> > > > source) be desired.
> > > > 
> > > > AFAIK, if you do something like this in C (a CPU-only program):
> > > > 
> > > > mmap(addr_range);
> > > > madvise(addr_range, some_flags);
> > > > munmap(addr_range);
> > > > 
> > > > mmap(addr_range); /* Here the madvise attributes are reset */
> > > > 
> > > > Also, AFAIK, the CUDA GPU madvise API works this way as well.
> > > > 
> > > > That said, making this work 100% reliably is quite difficult, especially
> > > > with a rude user.
> > > > 
> > > > For example:
> > > > 
> > > > mmap(addr_range);
> > > > gpu_madvise(addr_range, some_flags);
> > > > /* GPU never actually touches memory */
> > > > munmap(addr_range);
> > > > 
> > > > So we have an opt-in VM bind flag,
> > > > DRM_XE_VM_BIND_FLAG_MADVISE_AUTORESET, which we’re working on to mostly
> > > > handle the “rude” case above. Maybe we can reach 100% correctness, but
> > > > again, this is a difficult problem is WIP.
> > > > 
> > > 
> > > I think the current amdgpu SVM draft version also has the issue for
> > > the rude user situlation. Maybe this is caused by the separation of
> > > attribute layer and physical layer. Seems like KFD_SVM doesn't have
> > > this issue.
> > > 
> > > Maybe driver can find_or_insert in madvise ioctl path, to add a MMU
> > > notifier but do not to get_page, and then clean the attribyte in mmu
> > > notifier callback instead of GC. This is just my thought.
> > > 
> > > And really thanks for the information, and waiting for other's comments.
> > > 
> > 
> > Hi Matt,
> > 
> > I'd like to share a concrete bug we hit in the amdgpu SVM
> > implementation/tests about stale attributes.
> > 
> > In short it is stale attr_range overlaps with VM_PFNMAP VMA
> > 
> > 1: User allocates memory and sets GPU attributes but never use/fault it.
> > 
> >     CPU VMA (anonymous):
> >     |<── 0x1000 ── 0x5000 ──>|
> > 
> >     attr_range:
> >     |<── 0x1000 ── 0x5000 ──>|
> > 
> > 2: User munmaps the region, attr_range is NOT cleaned up
> > 
> >     CPU VMA: (gone)
> > 
> >     attr_range:           stale
> >     |<── 0x1000 ── 0x5000  ─>|
> > 
> >     No gpusvm_range existed, No MMU notifier, No GC, No cleanup
> > 
> > 
> > 3: User mmaps new memory for device pfn remap  partially overlapping the
> > old range,
> > and new memory for GPU set attribute
> > 
> > 
> >     CPU VMAs:
> >     |<── 0x1000 ── 0x2000 ──>|<── 0x2000 ── 0x4000 ──>|
> >     |   VM_PFNMAP                  |   anonymous (new alloc)      |
> > 
> >     attr_range:              STILL STALE
> >     |<────── 0x1000 ─── 0x5000 ─────────────>|
> > 
> > 
> > 4: GPU faults at address 0x3000
> > 
> >     Fault handler finds the stale attr_range [0x1000, 0x5000):
> >       gpuva_start = 0x1000  (from stale attr_range)
> >       gpuva_end   = 0x5000  (from stale attr_range)
> > 
> >     drm_gpusvm_range_chunk_size()  find the  chunk: 0x0000-0x5000 cover
> > the VM_PFNMAP area
> > 
> >     |<── 0x1000 ── 0x2000 ──>|<── 0x2000 ──── 0x4000 ───>|
> >     |   VM_PFNMAP                  |                                    |
> >      hmm_range_fault fails
> >      or vma check:
> >      VM_PFNMAP -> -EOPNOTSUPP
> > 
> > I think it is caused by of the stale attribute.
> > Or is this considered as an invalid userspace behavior?

The above example is exactly what we concerned about and handling in Xe
is WIP (i.e., this example breaks Xe).

> 
> Hi all,
> 
> After internal discussion, I'd like to summary some conclusions from team.
> 
> It is decided that default behavior will be to keep attributes on munmap for
> behavioral consistency, advantages of explicit interfaces, safety and
> extensibility.
> And provide an opt-in flag similar to Xe's
> DRM_XE_VM_BIND_FLAG_MADVISE_AUTORESET.
> 
> There is a concern about behavioral consistency from Felix. In userspace,
> whether free() actually triggers a munmap is outside the user's control, the

Yes, free() is the real killer here as libc may or or may call munmap.
Thus we have an opt in flag...

> C library allocator may retain pages internally rather than returning them
> to the OS. As It is said: "when you call malloc and then free, that doesn't
> necessarily result in unmapping the pages it may result in unmapping pages
> if you are freeing something big that was allocated with mmap under the
> hood, but it may also just stick around. From a user's point of view who
> just uses malloc/free and sets some attributes, they have no way of knowing
> whether their attributes will stick or not. I think if the attributes always
> stick, that would give you a more consistent behavior."
> 
> And it is from Christian that explicit interfaces are better than implicit
> ones auto-removing attributes on munmap is an implicit kernel reaction that
> users may not even be aware of. As it is said: "Explicit interfaces which
> say 'hey kernel, do something' are usually better than implicit interfaces
> where the kernel is doing something on its own and we are just reacting to
> it." And it is also noted that we can always add a flag for auto-removal
> later if needed, but the reverse would be a UAPI breaking change.
> 

This might be better - I believe Xe already was explict interfaces. So
DRM_XE_VM_BIND_FLAG_MADVISE_AUTORESET might be a non-sense idea but it
is in our uAPI so what is done is done... The free() case is where
DRM_XE_VM_BIND_FLAG_MADVISE_AUTORESET may fall apart / become
unpredictable... But it will work (after some Xe changes) if
say an app is also allocating largers sizes (128k plus iirc for default
libc behavior) where libc malloc/free directly map to mmap / munmap
which is why I believe we provided this option.

Matt

> Based on the above, the next steps are:
> 
> 1. Change the logic to keep the attribute as default in current
> implementation.
> 2. Maybe need adding a new ioctl op to explicitly delete attribute?
> Currently the UAPI can only overwrite attributes to not to map/default but
> not truly remove them.
> 3. Add a new UAPI flag similar to DRM_XE_VM_BIND_FLAG_MADVISE_AUTORESET.
> 4. For the auto reset path, need to address the "never-faulted" issue.
>     Maybe we can register a lightweight MMU notifier at madvise ioctl time
> to observe munmap and clean up attr_range, even if the GPU never touches the
> range.
> 
> Please correct me if I'm wrong on any of the above assumptions. Would be
> interested to hear thoughts on this direction.
> 
> Regards,
> Honglei
> 
> > Regards,
> > Honglei
> > 
> > > Regards,
> > > Honglei
> > > 
> > > 
> > > > Matt
> > > > 
> > > > > 
> > > > > Regards,
> > > > > Honglei...
> 

      reply	other threads:[~2026-04-30  3:12 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-17 11:29 [RFC/POC PATCH 00/12] POC SVM implementation in AMDGPU based on drm_gpusvm Honglei Huang
2026-03-17 11:29 ` [RFC/POC PATCH 01/12] drm/amdgpu: add SVM UAPI definitions Honglei Huang
2026-03-17 11:29 ` [RFC/POC PATCH 02/12] drm/amdgpu: add SVM data structures and header Honglei Huang
2026-03-17 11:29 ` [RFC/POC PATCH 03/12] drm/amdgpu: add SVM attribute data structures Honglei Huang
2026-03-17 11:29 ` [RFC/POC PATCH 04/12] drm/amdgpu: implement SVM attribute tree operations Honglei Huang
2026-03-17 11:29 ` [RFC/POC PATCH 05/12] drm/amdgpu: implement SVM attribute set Honglei Huang
2026-03-17 11:29 ` [RFC/POC PATCH 06/12] drm/amdgpu: add SVM range data structures Honglei Huang
2026-03-17 11:29 ` [RFC/POC PATCH 07/12] drm/amdgpu: implement SVM range PTE flags and GPU mapping Honglei Huang
2026-03-17 11:29 ` [RFC/POC PATCH 08/12] drm/amdgpu: implement SVM range notifier and invalidation Honglei Huang
2026-03-17 11:29 ` [RFC/POC PATCH 09/12] drm/amdgpu: implement SVM range workers Honglei Huang
2026-03-17 11:29 ` [RFC/POC PATCH 10/12] drm/amdgpu: implement SVM core initialization and fini Honglei Huang
2026-03-17 11:29 ` [RFC/POC PATCH 11/12] drm/amdgpu: implement SVM ioctl and fault handler Honglei Huang
2026-03-17 11:29 ` [RFC/POC PATCH 12/12] drm/amdgpu: wire up SVM build system " Honglei Huang
2026-03-17 11:48 ` [RFC/POC PATCH 00/12] POC SVM implementation in AMDGPU based on drm_gpusvm Christian König
2026-03-18  8:59   ` Honglei Huang
2026-03-19  5:08     ` Matthew Brost
2026-03-19 14:17       ` Honglei Huang
2026-03-23  6:31         ` Matthew Brost
2026-03-24  7:24           ` Honglei Huang
2026-03-25 22:24             ` Matthew Brost
2026-03-26 12:16               ` Honglei Huang
2026-04-15 10:04                 ` Huang, Honglei1
2026-04-23  6:40                   ` Matthew Brost
2026-04-23  7:18                     ` Matthew Brost
2026-04-23 11:03                       ` Huang, Honglei1
2026-04-23 20:21                         ` Matthew Brost
2026-04-24 10:43                           ` Huang, Honglei1
2026-04-27 20:00                             ` Felix Kuehling
2026-04-28  2:23                               ` Huang, Honglei1
2026-04-30  3:04                                 ` Matthew Brost
2026-04-23  6:09           ` Huang, Honglei1
2026-04-23  6:52             ` Matthew Brost
2026-04-23  8:22               ` Huang, Honglei1
2026-04-29  9:56                 ` Huang, Honglei1
2026-04-30  2:56                   ` Huang, Honglei1
2026-04-30  3:12                     ` Matthew Brost [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=afLInxpNTkA2yOZn@gsse-cloud1.jf.intel.com \
    --to=matthew.brost@intel.com \
    --cc=Alexander.Deucher@amd.com \
    --cc=Christian.Koenig@amd.com \
    --cc=Felix.Kuehling@amd.com \
    --cc=Jenny-Jing.Liu@amd.com \
    --cc=Junhua.Shen@amd.com \
    --cc=Lingshan.Zhu@amd.com \
    --cc=Oak.Zeng@amd.com \
    --cc=Philip.Yang@amd.com \
    --cc=Ray.Huang@amd.com \
    --cc=Xiaogang.Chen@amd.com \
    --cc=aliceryhl@google.com \
    --cc=amd-gfx@lists.freedesktop.org \
    --cc=dakr@kernel.org \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=honghuan@amd.com \
    --cc=rodrigo.vivi@intel.com \
    --cc=sima@ffwll.ch \
    --cc=thomas.hellstrom@linux.intel.com \
    --cc=yiru.ma@amd.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox