From: "Huang, Honglei1" <honghuan@amd.com>
To: "Deucher, Alexander" <Alexander.Deucher@amd.com>,
"Kuehling, Felix" <Felix.Kuehling@amd.com>,
"Koenig, Christian" <Christian.Koenig@amd.com>,
"Zeng, Oak" <Oak.Zeng@amd.com>,
"Liu, Jenny (Jing)" <Jenny-Jing.Liu@amd.com>,
"Yang, Philip" <Philip.Yang@amd.com>,
"Chen, Xiaogang" <Xiaogang.Chen@amd.com>,
"Huang, Ray" <Ray.Huang@amd.com>,
"Zhu, Lingshan" <Lingshan.Zhu@amd.com>,
"Shen, Junhua" <Junhua.Shen@amd.com>,
yiru.ma@amd.com, "sima@ffwll.ch" <sima@ffwll.ch>,
"matthew.brost@intel.com" <matthew.brost@intel.com>,
"rodrigo.vivi@intel.com" <rodrigo.vivi@intel.com>,
"thomas.hellstrom@linux.intel.com"
<thomas.hellstrom@linux.intel.com>,
"dakr@kernel.org" <dakr@kernel.org>,
"aliceryhl@google.com" <aliceryhl@google.com>
Cc: "amd-gfx@lists.freedesktop.org" <amd-gfx@lists.freedesktop.org>,
"dri-devel@lists.freedesktop.org"
<dri-devel@lists.freedesktop.org>
Subject: Re: [RFC/POC PATCH 00/12] POC SVM implementation in AMDGPU based on drm_gpusvm
Date: Thu, 30 Apr 2026 10:56:47 +0800 [thread overview]
Message-ID: <02ded343-38bd-4621-83fd-78a027eb96e2@amd.com> (raw)
In-Reply-To: <5e44f013-a1c1-4621-9eea-e4f2459f223e@amd.com>
On 4/29/2026 5:56 PM, Huang, Honglei1 wrote:
>
>
> On 4/23/2026 4:22 PM, Huang, Honglei1 wrote:
> ...
>>>>>>>>>>
>>>>>>>>>> This patch series implements basic SVM support with the
>>>>>>>>>> following features:
>>>>>>>>>>
>>>>>>>>>> 1. Attributes separated from physical page management:
>>>>>>>>>>
>>>>>>>>>> - Attribute layer (amdgpu_svm_attr_tree): a driver-side
>>>>>>>>>> interval tree that stores SVM attributes, managed through
>>>>>>>>>> the SET_ATTR ioctl and the MMU notifier callback.
>>>>>>>
>>>>>>> Can you explain the MMU notifier callback interaction here? See
>>>>>>> below; in Xe the attribute tree is the existing VMA tree (gpuvm).
>>>>>>>
>>>>>>
>>>>>> Let me try to explain; apologies if the description is not fully
>>>>>> precise.
>>>>>>
>>>>>> In the current implementation, the MMU notifier callback interacts
>>>>>> with the attr tree only in the munmap path: it removes the
>>>>>> corresponding attribute entries from the attr tree so that stale
>>>>>> attributes do not persist for freed address space.
>>>>>>
>>>>>
>>>>> Ah, yes. We reset our attributes upon munmap too. We actually don't
>>>>> do this 100% correctly either, and a series is in flight to fix
>>>>> that [1].
>>>>>
>>>>> [1] https://patchwork.freedesktop.org/series/161815/
>>>>
>>>> Hi Matt,
>>>>
>>>> It seems like you are trying to modify the implementation to remove
>>>> the attributes on munmap.
>>>>
>>>> Actually, we had an internal discussion about whether the driver
>>>> needs to remove the attributes on munmap.
>>>>
>>>> There are several ideas:
>>>>
>>>> 1. Keep the attributes: attributes may be needed again when a new
>>>> VMA appears or on subsequent faults.
>>>> 2. Keep the attributes: attributes can be set independent of whether
>>>> memory is currently mapped; attributes persist and are modified
>>>> explicitly via ioctl, not implicitly by notifier callbacks.
>>>> 3. Remove the attributes: since the VMA is gone, the driver can do
>>>> nothing without a VMA.
>>>>
>>>> I also saw that xe_svm set default attributes in a previous version;
>>>> this is another option.
>>>>
>>>> Could you share some background on why xe_svm is moving toward
>>>> removing the attributes on munmap? And is keeping the attributes a
>>>> valid approach?
>>>>
>>>
>>> This is a semantic choice, and we’re trying to match the semantics of
>>> CPU madvise. I believe any semantic an individual driver stack wants
>>> to define is valid, but if vendors mismatch semantics this will
>>> create a level of vendor lock-in which may (cough Nvidia, CUDA) or
>>> may not (open source) be desired.
>>>
>>> AFAIK, if you do something like this in C (a CPU-only program):
>>>
>>> mmap(addr_range);
>>> madvise(addr_range, some_flags);
>>> munmap(addr_range);
>>>
>>> mmap(addr_range); /* Here the madvise attributes are reset */
>>>
>>> Also, AFAIK, the CUDA GPU madvise API works this way as well.
>>>
>>> That said, making this work 100% reliably is quite difficult, especially
>>> with a rude user.
>>>
>>> For example:
>>>
>>> mmap(addr_range);
>>> gpu_madvise(addr_range, some_flags);
>>> /* GPU never actually touches memory */
>>> munmap(addr_range);
>>>
>>> So we have an opt-in VM bind flag,
>>> DRM_XE_VM_BIND_FLAG_MADVISE_AUTORESET, which we’re working on to mostly
>>> handle the “rude” case above. Maybe we can reach 100% correctness, but
>>> again, this is a difficult problem and still a WIP.
>>>
>>
>> I think the current amdgpu SVM draft version also has this issue in
>> the rude-user situation. Maybe this is caused by the separation of
>> the attribute layer and the physical layer. It seems KFD_SVM doesn't
>> have this issue.
>>
>> Maybe the driver can find_or_insert in the madvise ioctl path to
>> register an MMU notifier without get_page, and then clean the
>> attribute in the MMU notifier callback instead of relying on GC. This
>> is just my thought.
>>
>> And really, thanks for the information; waiting for others' comments.
>>
>
> Hi Matt,
>
> I'd like to share a concrete bug we hit in the amdgpu SVM
> implementation/tests regarding stale attributes.
>
> In short, it is a stale attr_range overlapping a VM_PFNMAP VMA:
>
> 1: User allocates memory and sets GPU attributes but never uses/faults it.
>
> CPU VMA (anonymous):
> |<── 0x1000 ── 0x5000 ──>|
>
> attr_range:
> |<── 0x1000 ── 0x5000 ──>|
>
> 2: User munmaps the region, attr_range is NOT cleaned up
>
> CPU VMA: (gone)
>
> attr_range: stale
> |<── 0x1000 ── 0x5000 ─>|
>
> No gpusvm_range ever existed, so no MMU notifier, no GC, no cleanup
>
>
> 3: User mmaps new memory: a device PFN remap partially overlapping
> the old range, plus a new anonymous allocation, and sets GPU
> attributes on the new memory
>
>
> CPU VMAs:
> |<── 0x1000 ── 0x2000 ──>|<── 0x2000 ── 0x4000 ──>|
> | VM_PFNMAP | anonymous (new alloc) |
>
> attr_range: STILL STALE
> |<────── 0x1000 ─── 0x5000 ─────────────>|
>
>
> 4: GPU faults at address 0x3000
>
> Fault handler finds the stale attr_range [0x1000, 0x5000):
> gpuva_start = 0x1000 (from stale attr_range)
> gpuva_end = 0x5000 (from stale attr_range)
>
> drm_gpusvm_range_chunk_size() finds the chunk 0x0000-0x5000, which
> covers the VM_PFNMAP area
>
> |<── 0x1000 ── 0x2000 ──>|<── 0x2000 ──── 0x4000 ───>|
> | VM_PFNMAP | |
> hmm_range_fault fails
> or vma check:
> VM_PFNMAP -> -EOPNOTSUPP
>
> I think this is caused by the stale attribute.
> Or is this considered invalid userspace behavior?
Hi all,

After internal discussion, I'd like to summarize some conclusions from
the team.

We decided that the default behavior will be to keep the attributes on
munmap, for behavioral consistency, the advantages of explicit
interfaces, safety, and extensibility, and to provide an opt-in flag
similar to Xe's DRM_XE_VM_BIND_FLAG_MADVISE_AUTORESET.
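To make the trade-off concrete, here is a small Python toy model of an
attribute interval store under the two policies, keep-on-munmap (the
chosen default) versus opt-in auto-reset. All names are hypothetical and
this only sketches the semantics, not the actual amdgpu code:

```python
# Toy model of an SVM attribute interval store; names are hypothetical
# and this is only a sketch of the semantics, not driver code.

class AttrStore:
    def __init__(self, autoreset=False):
        self.autoreset = autoreset   # opt-in, like MADVISE_AUTORESET
        self.ranges = []             # list of (start, end, attrs)

    def set_attr(self, start, end, attrs):
        self.ranges.append((start, end, attrs))

    def munmap(self, start, end):
        # With autoreset, notifier-driven cleanup drops any attribute
        # range overlapping the unmapped region; by default keep them.
        if self.autoreset:
            self.ranges = [r for r in self.ranges
                           if r[1] <= start or r[0] >= end]

    def lookup(self, addr):
        for s, e, a in self.ranges:
            if s <= addr < e:
                return a
        return None

# Keep-by-default: attributes survive munmap (possibly stale).
keep = AttrStore()
keep.set_attr(0x1000, 0x5000, "migrate-to-vram")
keep.munmap(0x1000, 0x5000)
print(keep.lookup(0x3000))   # "migrate-to-vram" (stale entry remains)

# Opt-in auto-reset: munmap clears overlapping attributes.
reset = AttrStore(autoreset=True)
reset.set_attr(0x1000, 0x5000, "migrate-to-vram")
reset.munmap(0x1000, 0x5000)
print(reset.lookup(0x3000))  # None
```

The keep-by-default branch is exactly what makes the VM_PFNMAP overlap
bug above possible, which is why the auto-reset flag also needs the
never-faulted cleanup discussed below.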
There is a concern about behavioral consistency from Felix. In
userspace, whether free() actually triggers a munmap is outside the
user's control; the C library allocator may retain pages internally
rather than returning them to the OS. As he put it: "when you call
malloc and then free, that doesn't necessarily result in unmapping the
pages. It may result in unmapping pages if you are freeing something big
that was allocated with mmap under the hood, but it may also just stick
around. From a user's point of view who just uses malloc/free and sets
some attributes, they have no way of knowing whether their attributes
will stick or not. I think if the attributes always stick, that would
give you a more consistent behavior."
And Christian's point is that explicit interfaces are better than
implicit ones: auto-removing attributes on munmap is an implicit kernel
reaction that users may not even be aware of. As he put it: "Explicit
interfaces which say 'hey kernel, do something' are usually better than
implicit interfaces where the kernel is doing something on its own and
we are just reacting to it." He also noted that we can always add a
flag for auto-removal later if needed, but the reverse would be a
UAPI-breaking change.
Based on the above, the next steps are:

1. Change the logic to keep the attributes by default in the current
implementation.
2. Maybe add a new ioctl op to explicitly delete attributes? Currently
the UAPI can only overwrite attributes to no-map/default but cannot
truly remove them.
3. Add a new UAPI flag similar to DRM_XE_VM_BIND_FLAG_MADVISE_AUTORESET.
4. For the auto-reset path, address the "never-faulted" issue. Maybe we
can register a lightweight MMU notifier at madvise ioctl time to
observe munmap and clean up the attr_range, even if the GPU never
touches the range.
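The never-faulted gap in step 4 can be illustrated with the same kind of
toy model (all names hypothetical, just a sketch of the control flow):
if a cleanup notifier is only registered on first GPU fault, a range
that is madvise'd but never faulted gets no munmap callback, while
registering a lightweight notifier at madvise time closes that gap.

```python
# Toy model of the "never-faulted" problem: where the notifier is
# registered determines whether munmap cleanup ever sees the attrs.

class MM:
    """Stand-in for a CPU address space with MMU-notifier callbacks."""
    def __init__(self):
        self.notifiers = []              # (start, end, callback)

    def register_notifier(self, start, end, cb):
        self.notifiers.append((start, end, cb))

    def munmap(self, start, end):
        for s, e, cb in self.notifiers:
            if s < end and e > start:    # overlap -> invalidate callback
                cb(start, end)

def make_svm(mm, notifier_at_madvise):
    attrs = {}                           # start -> (end, attrs)

    def on_invalidate(start, end):
        stale = [s for s, (e, _) in attrs.items() if s < end and e > start]
        for s in stale:
            del attrs[s]                 # clean the stale attr_range

    def madvise(start, end, a):
        attrs[start] = (end, a)
        if notifier_at_madvise:          # lightweight notifier, no get_page
            mm.register_notifier(start, end, on_invalidate)

    def gpu_fault(start, end):           # the normal path registers here
        mm.register_notifier(start, end, on_invalidate)

    return attrs, madvise, gpu_fault

# Notifier only on fault: a never-faulted range stays stale after munmap.
mm = MM()
attrs, madvise, _ = make_svm(mm, notifier_at_madvise=False)
madvise(0x1000, 0x5000, "vram")
mm.munmap(0x1000, 0x5000)
print(attrs)                             # stale entry for 0x1000 remains

# Notifier registered at madvise time: cleanup works without any fault.
mm2 = MM()
attrs2, madvise2, _ = make_svm(mm2, notifier_at_madvise=True)
madvise2(0x1000, 0x5000, "vram")
mm2.munmap(0x1000, 0x5000)
print(attrs2)                            # {}
```

In the real driver the cost question is whether such a madvise-time
notifier is cheap enough, since unlike the fault path it pins no pages
and only observes invalidations.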
Please correct me if I'm wrong on any of the above assumptions. Would be
interested to hear thoughts on this direction.
Regards,
Honglei
> Regards,
> Honglei
>
>> Regards,
>> Honglei
>>
>>
>>> Matt
>>>
>>>>
>>>> Regards,
>>>> Honglei...
Thread overview: 36+ messages
2026-03-17 11:29 [RFC/POC PATCH 00/12] POC SVM implementation in AMDGPU based on drm_gpusvm Honglei Huang
2026-03-17 11:29 ` [RFC/POC PATCH 01/12] drm/amdgpu: add SVM UAPI definitions Honglei Huang
2026-03-17 11:29 ` [RFC/POC PATCH 02/12] drm/amdgpu: add SVM data structures and header Honglei Huang
2026-03-17 11:29 ` [RFC/POC PATCH 03/12] drm/amdgpu: add SVM attribute data structures Honglei Huang
2026-03-17 11:29 ` [RFC/POC PATCH 04/12] drm/amdgpu: implement SVM attribute tree operations Honglei Huang
2026-03-17 11:29 ` [RFC/POC PATCH 05/12] drm/amdgpu: implement SVM attribute set Honglei Huang
2026-03-17 11:29 ` [RFC/POC PATCH 06/12] drm/amdgpu: add SVM range data structures Honglei Huang
2026-03-17 11:29 ` [RFC/POC PATCH 07/12] drm/amdgpu: implement SVM range PTE flags and GPU mapping Honglei Huang
2026-03-17 11:29 ` [RFC/POC PATCH 08/12] drm/amdgpu: implement SVM range notifier and invalidation Honglei Huang
2026-03-17 11:29 ` [RFC/POC PATCH 09/12] drm/amdgpu: implement SVM range workers Honglei Huang
2026-03-17 11:29 ` [RFC/POC PATCH 10/12] drm/amdgpu: implement SVM core initialization and fini Honglei Huang
2026-03-17 11:29 ` [RFC/POC PATCH 11/12] drm/amdgpu: implement SVM ioctl and fault handler Honglei Huang
2026-03-17 11:29 ` [RFC/POC PATCH 12/12] drm/amdgpu: wire up SVM build system " Honglei Huang
2026-03-17 11:48 ` [RFC/POC PATCH 00/12] POC SVM implementation in AMDGPU based on drm_gpusvm Christian König
2026-03-18 8:59 ` Honglei Huang
2026-03-19 5:08 ` Matthew Brost
2026-03-19 14:17 ` Honglei Huang
2026-03-23 6:31 ` Matthew Brost
2026-03-24 7:24 ` Honglei Huang
2026-03-25 22:24 ` Matthew Brost
2026-03-26 12:16 ` Honglei Huang
2026-04-15 10:04 ` Huang, Honglei1
2026-04-23 6:40 ` Matthew Brost
2026-04-23 7:18 ` Matthew Brost
2026-04-23 11:03 ` Huang, Honglei1
2026-04-23 20:21 ` Matthew Brost
2026-04-24 10:43 ` Huang, Honglei1
2026-04-27 20:00 ` Felix Kuehling
2026-04-28 2:23 ` Huang, Honglei1
2026-04-30 3:04 ` Matthew Brost
2026-04-23 6:09 ` Huang, Honglei1
2026-04-23 6:52 ` Matthew Brost
2026-04-23 8:22 ` Huang, Honglei1
2026-04-29 9:56 ` Huang, Honglei1
2026-04-30 2:56 ` Huang, Honglei1 [this message]
2026-04-30 3:12 ` Matthew Brost