All of lore.kernel.org
 help / color / mirror / Atom feed
From: Matthew Brost <matthew.brost@intel.com>
To: "Huang, Honglei1" <honghuan@amd.com>
Cc: "Deucher, Alexander" <Alexander.Deucher@amd.com>,
	"Kuehling, Felix" <Felix.Kuehling@amd.com>,
	"Koenig, Christian" <Christian.Koenig@amd.com>,
	"Zeng, Oak" <Oak.Zeng@amd.com>,
	"Liu, Jenny (Jing)" <Jenny-Jing.Liu@amd.com>,
	 "Yang, Philip" <Philip.Yang@amd.com>,
	"Chen, Xiaogang" <Xiaogang.Chen@amd.com>,
	"Huang, Ray" <Ray.Huang@amd.com>,
	"Zhu, Lingshan" <Lingshan.Zhu@amd.com>,
	"Shen, Junhua" <Junhua.Shen@amd.com>, <yiru.ma@amd.com>,
	"sima@ffwll.ch" <sima@ffwll.ch>,
	"rodrigo.vivi@intel.com" <rodrigo.vivi@intel.com>,
	"thomas.hellstrom@linux.intel.com"
	<thomas.hellstrom@linux.intel.com>,
	"dakr@kernel.org" <dakr@kernel.org>,
	"aliceryhl@google.com" <aliceryhl@google.com>,
	"amd-gfx@lists.freedesktop.org" <amd-gfx@lists.freedesktop.org>,
	"dri-devel@lists.freedesktop.org"
	<dri-devel@lists.freedesktop.org>
Subject: Re: [RFC/POC PATCH 00/12] POC SVM implementation in AMDGPU based on drm_gpusvm
Date: Wed, 29 Apr 2026 20:12:31 -0700	[thread overview]
Message-ID: <afLInxpNTkA2yOZn@gsse-cloud1.jf.intel.com> (raw)
In-Reply-To: <02ded343-38bd-4621-83fd-78a027eb96e2@amd.com>

On Thu, Apr 30, 2026 at 10:56:47AM +0800, Huang, Honglei1 wrote:
> 
> 
> On 4/29/2026 5:56 PM, Huang, Honglei1 wrote:
> > 
> > 
> > On 4/23/2026 4:22 PM, Huang, Honglei1 wrote:
> > ...
> > > > > > > > > > > 
> > > > > > > > > > > This patch series implements basic
> > > > > > > > > > > SVM support with the following
> > > > > > > > > > > features:
> > > > > > > > > > > 
> > > > > > > > > > >       1. attributes sepatarated from physical page management:
> > > > > > > > > > > 
> > > > > > > > > > >         - Attribute layer
> > > > > > > > > > > (amdgpu_svm_attr_tree): a driver
> > > > > > > > > > > side interval
> > > > > > > > > > >           tree that stores SVM
> > > > > > > > > > > attributes. Managed through the
> > > > > > > > > > > SET_ATTR,
> > > > > > > > > > >           and mmu notifier callback.
> > > > > > > > 
> > > > > > > > Can you explain the mmu notifier callback
> > > > > > > > interaction here? See below in
> > > > > > > > Xe the attribute tree is existing VMA tree (gpuvm).
> > > > > > > > 
> > > > > > > 
> > > > > > > Let me try to explain, apologies if the description is not fully
> > > > > > > precise.
> > > > > > > 
> > > > > > > In current implementation, the MMU notifier callback
> > > > > > > interacts with the attr
> > > > > > > tree only in the munmap path remove the corresponding attribute
> > > > > > > entries from the attr tree so that stale attributes
> > > > > > > do not persist for
> > > > > > > freed address space.
> > > > > > > 
> > > > > > 
> > > > > > Ah, yes. We reset our attributes upon munmap too. We
> > > > > > actually don't this
> > > > > > 100% correct quite either and series in flight to fix [1].
> > > > > > 
> > > > > > [1] https://patchwork.freedesktop.org/series/161815/
> > > > > 
> > > > > Hi matt,
> > > > > 
> > > > > It seems like you are tring to modify the implementation
> > > > > into remove the
> > > > > attributes when munmap.
> > > > > 
> > > > > Actuall we have a discussion internally that does the driver
> > > > > need to remove
> > > > > the attributes when munmap.
> > > > > 
> > > > > So there servel ideas:
> > > > > 
> > > > > 1. attribute need keep: attributes may be needed again when a new VMA
> > > > > appears or on subsequent faults.
> > > > > 2.attribute need keep: attributes can be set independent of
> > > > > whether memory
> > > > > is currently mapped; attributes persist and are modified explicitly via
> > > > > ioctl, not implicitly by notifier callbacks.
> > > > > 3. attribute need remove: casue VMA is gone, driver can do
> > > > > nothing without
> > > > > VMA.
> > > > > 
> > > > > and I saw xe_svm set default attribute in the previous
> > > > > version, this is also
> > > > > a option.
> > > > > 
> > > > > Can you please help to give some information that why xe_svm
> > > > > is turing to
> > > > > remove the attribute when munmap? And does keeping attribute
> > > > > is a valid way?
> > > > > 
> > > > 
> > > > This is a semantic choice, and we’re trying to match the semantics of
> > > > CPU madvise. I believe any semantic an individual driver stack wants to
> > > > define is valid, but if vendors mismatch sematics this will create a
> > > > level of vendor lock in which may (cough Nvidia, CUDA) or may not (open
> > > > source) be desired.
> > > > 
> > > > AFAIK, if you do something like this in C (a CPU-only program):
> > > > 
> > > > mmap(addr_range);
> > > > madvise(addr_range, some_flags);
> > > > munmap(addr_range);
> > > > 
> > > > mmap(addr_range); /* Here the madvise attributes are reset */
> > > > 
> > > > Also, AFAIK, the CUDA GPU madvise API works this way as well.
> > > > 
> > > > That said, making this work 100% reliably is quite difficult, especially
> > > > with a rude user.
> > > > 
> > > > For example:
> > > > 
> > > > mmap(addr_range);
> > > > gpu_madvise(addr_range, some_flags);
> > > > /* GPU never actually touches memory */
> > > > munmap(addr_range);
> > > > 
> > > > So we have an opt-in VM bind flag,
> > > > DRM_XE_VM_BIND_FLAG_MADVISE_AUTORESET, which we’re working on to mostly
> > > > handle the “rude” case above. Maybe we can reach 100% correctness, but
> > > > again, this is a difficult problem is WIP.
> > > > 
> > > 
> > > I think the current amdgpu SVM draft version also has the issue for
> > > the rude user situlation. Maybe this is caused by the separation of
> > > attribute layer and physical layer. Seems like KFD_SVM doesn't have
> > > this issue.
> > > 
> > > Maybe driver can find_or_insert in madvise ioctl path, to add a MMU
> > > notifier but do not to get_page, and then clean the attribyte in mmu
> > > notifier callback instead of GC. This is just my thought.
> > > 
> > > And really thanks for the information, and waiting for other's comments.
> > > 
> > 
> > Hi Matt,
> > 
> > I'd like to share a concrete bug we hit in the amdgpu SVM
> > implementation/tests about stale attributes.
> > 
> > In short it is stale attr_range overlaps with VM_PFNMAP VMA
> > 
> > 1: User allocates memory and sets GPU attributes but never use/fault it.
> > 
> >     CPU VMA (anonymous):
> >     |<── 0x1000 ── 0x5000 ──>|
> > 
> >     attr_range:
> >     |<── 0x1000 ── 0x5000 ──>|
> > 
> > 2: User munmaps the region, attr_range is NOT cleaned up
> > 
> >     CPU VMA: (gone)
> > 
> >     attr_range:           stale
> >     |<── 0x1000 ── 0x5000  ─>|
> > 
> >     No gpusvm_range existed, No MMU notifier, No GC, No cleanup
> > 
> > 
> > 3: User mmaps new memory for device pfn remap  partially overlapping the
> > old range,
> > and new memory for GPU set attribute
> > 
> > 
> >     CPU VMAs:
> >     |<── 0x1000 ── 0x2000 ──>|<── 0x2000 ── 0x4000 ──>|
> >     |   VM_PFNMAP                  |   anonymous (new alloc)      |
> > 
> >     attr_range:              STILL STALE
> >     |<────── 0x1000 ─── 0x5000 ─────────────>|
> > 
> > 
> > 4: GPU faults at address 0x3000
> > 
> >     Fault handler finds the stale attr_range [0x1000, 0x5000):
> >       gpuva_start = 0x1000  (from stale attr_range)
> >       gpuva_end   = 0x5000  (from stale attr_range)
> > 
> >     drm_gpusvm_range_chunk_size()  find the  chunk: 0x0000-0x5000 cover
> > the VM_PFNMAP area
> > 
> >     |<── 0x1000 ── 0x2000 ──>|<── 0x2000 ──── 0x4000 ───>|
> >     |   VM_PFNMAP                  |                                    |
> >      hmm_range_fault fails
> >      or vma check:
> >      VM_PFNMAP -> -EOPNOTSUPP
> > 
> > I think it is caused by of the stale attribute.
> > Or is this considered as an invalid userspace behavior?

The above example is exactly what we concerned about and handling in Xe
is WIP (i.e., this example breaks Xe).

> 
> Hi all,
> 
> After internal discussion, I'd like to summary some conclusions from team.
> 
> It is decided that default behavior will be to keep attributes on munmap for
> behavioral consistency, advantages of explicit interfaces, safety and
> extensibility.
> And provide an opt-in flag similar to Xe's
> DRM_XE_VM_BIND_FLAG_MADVISE_AUTORESET.
> 
> There is a concern about behavioral consistency from Felix. In userspace,
> whether free() actually triggers a munmap is outside the user's control, the

Yes, free() is the real killer here as libc may or or may call munmap.
Thus we have an opt in flag...

> C library allocator may retain pages internally rather than returning them
> to the OS. As It is said: "when you call malloc and then free, that doesn't
> necessarily result in unmapping the pages it may result in unmapping pages
> if you are freeing something big that was allocated with mmap under the
> hood, but it may also just stick around. From a user's point of view who
> just uses malloc/free and sets some attributes, they have no way of knowing
> whether their attributes will stick or not. I think if the attributes always
> stick, that would give you a more consistent behavior."
> 
> And it is from Christian that explicit interfaces are better than implicit
> ones auto-removing attributes on munmap is an implicit kernel reaction that
> users may not even be aware of. As it is said: "Explicit interfaces which
> say 'hey kernel, do something' are usually better than implicit interfaces
> where the kernel is doing something on its own and we are just reacting to
> it." And it is also noted that we can always add a flag for auto-removal
> later if needed, but the reverse would be a UAPI breaking change.
> 

This might be better - I believe Xe already was explict interfaces. So
DRM_XE_VM_BIND_FLAG_MADVISE_AUTORESET might be a non-sense idea but it
is in our uAPI so what is done is done... The free() case is where
DRM_XE_VM_BIND_FLAG_MADVISE_AUTORESET may fall apart / become
unpredictable... But it will work (after some Xe changes) if
say an app is also allocating largers sizes (128k plus iirc for default
libc behavior) where libc malloc/free directly map to mmap / munmap
which is why I believe we provided this option.

Matt

> Based on the above, the next steps are:
> 
> 1. Change the logic to keep the attribute as default in current
> implementation.
> 2. Maybe need adding a new ioctl op to explicitly delete attribute?
> Currently the UAPI can only overwrite attributes to not to map/default but
> not truly remove them.
> 3. Add a new UAPI flag similar to DRM_XE_VM_BIND_FLAG_MADVISE_AUTORESET.
> 4. For the auto reset path, need to address the "never-faulted" issue.
>     Maybe we can register a lightweight MMU notifier at madvise ioctl time
> to observe munmap and clean up attr_range, even if the GPU never touches the
> range.
> 
> Please correct me if I'm wrong on any of the above assumptions. Would be
> interested to hear thoughts on this direction.
> 
> Regards,
> Honglei
> 
> > Regards,
> > Honglei
> > 
> > > Regards,
> > > Honglei
> > > 
> > > 
> > > > Matt
> > > > 
> > > > > 
> > > > > Regards,
> > > > > Honglei...
> 

      reply	other threads:[~2026-04-30  3:12 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-17 11:29 [RFC/POC PATCH 00/12] POC SVM implementation in AMDGPU based on drm_gpusvm Honglei Huang
2026-03-17 11:29 ` [RFC/POC PATCH 01/12] drm/amdgpu: add SVM UAPI definitions Honglei Huang
2026-03-17 11:29 ` [RFC/POC PATCH 02/12] drm/amdgpu: add SVM data structures and header Honglei Huang
2026-03-17 11:29 ` [RFC/POC PATCH 03/12] drm/amdgpu: add SVM attribute data structures Honglei Huang
2026-03-17 11:29 ` [RFC/POC PATCH 04/12] drm/amdgpu: implement SVM attribute tree operations Honglei Huang
2026-03-17 11:29 ` [RFC/POC PATCH 05/12] drm/amdgpu: implement SVM attribute set Honglei Huang
2026-03-17 11:29 ` [RFC/POC PATCH 06/12] drm/amdgpu: add SVM range data structures Honglei Huang
2026-03-17 11:29 ` [RFC/POC PATCH 07/12] drm/amdgpu: implement SVM range PTE flags and GPU mapping Honglei Huang
2026-03-17 11:29 ` [RFC/POC PATCH 08/12] drm/amdgpu: implement SVM range notifier and invalidation Honglei Huang
2026-03-17 11:29 ` [RFC/POC PATCH 09/12] drm/amdgpu: implement SVM range workers Honglei Huang
2026-03-17 11:29 ` [RFC/POC PATCH 10/12] drm/amdgpu: implement SVM core initialization and fini Honglei Huang
2026-03-17 11:29 ` [RFC/POC PATCH 11/12] drm/amdgpu: implement SVM ioctl and fault handler Honglei Huang
2026-03-17 11:29 ` [RFC/POC PATCH 12/12] drm/amdgpu: wire up SVM build system " Honglei Huang
2026-03-17 11:48 ` [RFC/POC PATCH 00/12] POC SVM implementation in AMDGPU based on drm_gpusvm Christian König
2026-03-18  8:59   ` Honglei Huang
2026-03-19  5:08     ` Matthew Brost
2026-03-19 14:17       ` Honglei Huang
2026-03-23  6:31         ` Matthew Brost
2026-03-24  7:24           ` Honglei Huang
2026-03-25 22:24             ` Matthew Brost
2026-03-26 12:16               ` Honglei Huang
2026-04-15 10:04                 ` Huang, Honglei1
2026-04-23  6:40                   ` Matthew Brost
2026-04-23  7:18                     ` Matthew Brost
2026-04-23 11:03                       ` Huang, Honglei1
2026-04-23 20:21                         ` Matthew Brost
2026-04-24 10:43                           ` Huang, Honglei1
2026-04-27 20:00                             ` Felix Kuehling
2026-04-28  2:23                               ` Huang, Honglei1
2026-04-30  3:04                                 ` Matthew Brost
2026-04-23  6:09           ` Huang, Honglei1
2026-04-23  6:52             ` Matthew Brost
2026-04-23  8:22               ` Huang, Honglei1
2026-04-29  9:56                 ` Huang, Honglei1
2026-04-30  2:56                   ` Huang, Honglei1
2026-04-30  3:12                     ` Matthew Brost [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=afLInxpNTkA2yOZn@gsse-cloud1.jf.intel.com \
    --to=matthew.brost@intel.com \
    --cc=Alexander.Deucher@amd.com \
    --cc=Christian.Koenig@amd.com \
    --cc=Felix.Kuehling@amd.com \
    --cc=Jenny-Jing.Liu@amd.com \
    --cc=Junhua.Shen@amd.com \
    --cc=Lingshan.Zhu@amd.com \
    --cc=Oak.Zeng@amd.com \
    --cc=Philip.Yang@amd.com \
    --cc=Ray.Huang@amd.com \
    --cc=Xiaogang.Chen@amd.com \
    --cc=aliceryhl@google.com \
    --cc=amd-gfx@lists.freedesktop.org \
    --cc=dakr@kernel.org \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=honghuan@amd.com \
    --cc=rodrigo.vivi@intel.com \
    --cc=sima@ffwll.ch \
    --cc=thomas.hellstrom@linux.intel.com \
    --cc=yiru.ma@amd.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.