From: Huang Rui <ray.huang@amd.com>
To: Honglei Huang <honglei1.huang@amd.com>
Cc: Alexander.Deucher@amd.com, Felix.Kuehling@amd.com,
	Christian.Koenig@amd.com, Oak.Zeng@amd.com,
	Jenny-Jing.Liu@amd.com, Philip.Yang@amd.com,
	Xiaogang.Chen@amd.com, Lingshan.Zhu@amd.com, Junhua.Shen@amd.com,
	matthew.brost@intel.com, rodrigo.vivi@intel.com,
	thomas.hellstrom@linux.intel.com, dakr@kernel.org,
	aliceryhl@google.com, amd-gfx@lists.freedesktop.org,
	dri-devel@lists.freedesktop.org, honghuan@amd.com
Subject: Re: [RFC V3 00/12] drm/amdgpu: SVM implementation based on drm_gpusvm
Date: Tue, 21 Apr 2026 10:31:09 +0800
Message-ID: <aebhbe9gMAqz1tpM@amd.com>
In-Reply-To: <20260420131307.1816671-1-honglei1.huang@amd.com>

On Mon, Apr 20, 2026 at 09:12:55PM +0800, Honglei Huang wrote:
> From: Honglei Huang <honghuan@amd.com>
> 
> V3 of the SVM patch series for amdgpu based on the drm_gpusvm framework. 
> This revision incorporates feedback from V1, adds XNACK on GPU fault handling,
> improves code organization, and removes the XNACK off (no GPU fault) implementation
> to focus on the fault driven model that aligns with drm_gpusvm's design. 
> The implementation borrows extensively from xe_svm.
> 
> This patch series implements SVM support with the following design:
> 
>   1. Attributes separated from physical page management:
> 
>     - Attribute layer (amdgpu_svm_attr_tree): a driver-side interval
>       tree storing per-range SVM attributes. Managed through SET_ATTR
>       ioctl and preserved across range lifecycle events.
> 
>     - Physical page layer (drm_gpusvm ranges): managed by the
>       drm_gpusvm framework, representing HMM-backed DMA mappings
>       and GPU page table entries.
> 
>     This separation ensures attributes survive when GPU ranges are
>     destroyed (partial munmap, attribute split, GC). The fault
>     handler recreates GPU ranges from the attribute tree on demand.
> 
>   2. GPU fault driven mapping (XNACK on):
> 
>     The core mapping path is driven by GPU page faults instead of ioctls.
>     amdgpu_svm_handle_fault() looks up SVM by PASID, runs GC,
>     resolves attributes, then maps via find_or_insert -> get_pages
>     -> GPU PTE update. For unregistered addresses, default
>     attributes are derived from VMA properties automatically.
> 
>   3. MMU notifier invalidation:
> 
>     Two-phase callback: event_begin() zaps GPU PTEs and flushes
>     TLB, event_end() unmaps DMA pages. UNMAP events queue ranges
>     to GC for deferred cleanup. Non-UNMAP events (eviction) rely
>     on GPU fault to remap.
> 
>   4. Garbage collector:
> 
>     GC workqueue processes unmapped ranges: removes them
>     from drm_gpusvm and clears corresponding attributes. No
>     rebuild or restore logic, GPU fault handles recreation.
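The two-phase invalidation in items 3 and 4 can be pictured with a small
userspace model. This is only a sketch under stated assumptions: the
sketch_* names, the struct fields, and the two event kinds are invented
for illustration and are not taken from the patches, and the TLB flush
and the GC workqueue itself are not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical event kinds: the series separates UNMAP from other events. */
enum sketch_event { SKETCH_EVENT_UNMAP, SKETCH_EVENT_EVICT };

/* Hypothetical per-range state, loosely following the description above. */
struct sketch_range {
	bool gpu_mapped;    /* GPU PTEs are present */
	bool dma_mapped;    /* DMA pages are mapped */
	bool queued_for_gc; /* waiting for deferred removal */
};

/* Phase 1: zap GPU PTEs (the TLB flush is not modelled). */
static void sketch_event_begin(struct sketch_range *r)
{
	assert(r);
	r->gpu_mapped = false;
}

/* Phase 2: unmap DMA pages; only UNMAP events queue the range to GC. */
static void sketch_event_end(struct sketch_range *r, enum sketch_event ev)
{
	assert(r);
	r->dma_mapped = false;
	if (ev == SKETCH_EVENT_UNMAP)
		r->queued_for_gc = true;
}

/*
 * Run both phases for one event. Returns true when a later GPU fault
 * would have to remap the range, which is the case once PTEs are zapped.
 */
static bool sketch_invalidate(struct sketch_range *r, enum sketch_event ev)
{
	sketch_event_begin(r);
	sketch_event_end(r, ev);
	return !r->gpu_mapped;
}
```

In this model an eviction leaves the range in place for the next fault to
remap, while an unmap additionally marks it for the garbage collector.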
> 
> Changes since V2:
>   - Add version title in commit message.
>   - Fix some content mistakes.
> 
> Changes since V1:
>   - Added GPU fault handler: amdgpu_svm_handle_fault with PASID-based
>     SVM lookup, following the standard flow: garbage collector ->
>     find or insert range -> check valid -> migrate (TODO) / get_pages
>     -> GPU bind/map.
> 
>   - Removed the restore worker queue entirely. V1 had separate GC
>     and restore workers: restore workers were responsible for
>     synchronous restore at queue stop/start because there was no GPU
>     fault support. With the XNACK on fault driven model, synchronous
>     restore is unnecessary: the GPU fault handler recreates ranges on
>     demand. The GC worker in V2 is simplified to only discard ranges
>     and clear their attributes, with no rebuild or restore logic.
>     AMDGPU_SVM_FLAG_GPU_ALWAYS_MAPPED support is removed because
>     there is no restore worker.
> 
>   - Reworked MMU notifier callback (amdgpu_svm_range_invalidate):
>     V1 had a monolithic dispatcher with flag combinations and
>     queue ops (CLEAR_PTE/QUEUE_INTERVAL, UNMAP/RESTORE) plus
>     begin_restore() to quiesce KFD queues. V2 uses a two-phase
>     model: event_begin() zaps GPU PTEs and flushes TLB,
>     event_end() unmaps DMA pages and queues UNMAP ranges to GC.
>     Non-UNMAP events (eviction) just zap PTEs and let GPU fault
>     remap. Removed begin_restore/end_restore callbacks,
>     has_always_mapped_range() check, and NOTIFIER flag dispatch.
>     Added checkpoint timestamp capture on UNMAP for fault dedup.
> 
>   - Added amdgpu_svm_range_invalidate_interval(): when userspace
>     sets new attributes on a sub region of an existing attribute
>     range, the attribute tree splits the old range and the new
>     sub region gets different attributes. However, existing
>     drm_gpusvm ranges may cross the new attribute boundary
>     (e.g., a 2M GPU range covers both the old and new attribute
>     regions). This function walks all gpusvm ranges in the
>     affected interval, zaps GPU PTEs and flushes TLB. Ranges
>     that cross the new boundary and old boundary are removed 
>     entirely so the GPU fault handler can recreate them with 
>     boundaries aligned to the updated attribute layout.
> 
>   - On MMU_NOTIFY_UNMAP events, discard all affected gpusvm ranges
>     entirely, without the synchronous rebuild done in V1. The unmap may destroy
>     more ranges than strictly necessary (e.g., a partial munmap
>     hits a 2M range that extends beyond the unmapped region), but
>     the attribute layer preserves the still valid attributes for
>     the remaining address space. When the GPU next accesses those
>     addresses, the fault handler automatically recreates the
>     ranges with correct boundaries from the surviving attributes.
>     This avoids the synchronous rebuild logic that V1 required 
>     (unmap -> rebuild in GC/restore worker).
> 
>   - Add attribute creation for unregistered addresses:
>     amdgpu_svm_range_get_unregistered_attrs() derives default
>     SVM attributes from VMA properties and GPU IP capabilities
>     when the faulting address has no user attributes registered.
>     This feature is needed to pass the ROCm user mode runtime tests
>     (kfd/rocr/hip). ROCm already supports access to unregistered virtual
>     addresses with default SVM attributes, so amdgpu SVM needs to support it too.
> 
>   - Explicitly returns -EOPNOTSUPP in amdgpu_svm_init when XNACK
>     is disabled. V1 attempted mixed XNACK on/off support with
>     complex KFD queue quiesce/resume callbacks and ioctl driven
>     mapping paths, which added substantial complexity. V2 drops
>     these implementations to focus on the fault driven model.
> 
>   - Removed kgd2kfd_quiesce_mm()/resume_mm() dependency that V1
>     used for XNACK off queue control. For XNACK on, the GPU fault
>     handler is the entry point for SVM range mapping, so no quiesce/resume
>     is needed for this version.
> 
>   - Added new change triggers, TRIGGER_RANGE_SPLIT and TRIGGER_PREFETCH,
>     for sub-range attribute set and prefetch trigger support.
> 
>   - Added helper functions: find_locked, get_bounds_locked,
>     set_default for GPU fault handling.
> 
>   - Design questions section removed.
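The boundary check described for amdgpu_svm_range_invalidate_interval()
can be illustrated roughly as follows. This is a minimal sketch, not the
code from the series: the sketch_* names and the half-open interval type
are assumptions made for the example, and it reads "crosses a boundary"
as "overlaps the newly attributed interval but is not contained in it".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Half-open address interval [start, end); a hypothetical helper type. */
struct sketch_interval {
	uint64_t start;
	uint64_t end;
};

/* True if the two intervals share at least one address. */
static bool sketch_overlaps(struct sketch_interval a, struct sketch_interval b)
{
	return a.start < b.end && b.start < a.end;
}

/*
 * True if range touches the newly attributed interval but also extends
 * outside it, i.e. it crosses one of the new attribute boundaries and
 * would be removed rather than only zapped.
 */
static bool sketch_crosses_boundary(struct sketch_interval range,
				    struct sketch_interval attr)
{
	assert(range.start < range.end && attr.start < attr.end);
	if (!sketch_overlaps(range, attr))
		return false;
	return range.start < attr.start || range.end > attr.end;
}
```

For example, a 2M range at [0x200000, 0x400000) crosses the boundary of a
64K attribute interval at [0x300000, 0x310000), while a 4K range fully
inside that interval does not.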
> 
> TODO:
>   - Add multi GPU support.
>   - Add XNACK off mode.
>   - Add migration or prefetch. This work is ongoing in:
>     https://lore.kernel.org/amd-gfx/20260410113146.146212-1-Junhua.Shen@amd.com/
> 
> Test results:
>   Tested on gfx943 (MI300X) and gfx906 (MI60) with XNACK on:
>   - KFD test: 95%+ passed.
>   - ROCR test: all passed.
>   - HIP catch test: gfx943 (MI300X): 96% passed.
>                     gfx906 (MI60): 99% passed.

It would be best to also include the ROCm runtime merge request in the
cover letter, and clarify that the above test results are based on V3 +
user-space ROCR.

https://github.com/ROCm/rocm-systems/pull/4364

Thanks,
Ray

> 
> Patch overview:
> 
>   01/12 UAPI: DRM_AMDGPU_GEM_SVM ioctl, SVM flags, SET_ATTR/GET_ATTR
>         operations, attribute types in amdgpu_drm.h.
> 
>   02/12 Core header: amdgpu_svm wrapping drm_gpusvm with refcount,
>         attr_tree, GC struct, locks, and VM integration hooks.
> 
>   03/12 Attribute types: amdgpu_svm_attrs, attr_range (interval tree
>         node), attr_tree, access enum, flag masks, change triggers.
> 
>   04/12 Attribute tree ops: interval tree lookup, insert, remove,
>         find_locked, get_bounds_locked, set_default, and lifecycle.
> 
>   05/12 Attribute set/get/clear: validate UAPI attributes, apply to
>         tree with head/tail splitting, change propagation, and query.
> 
>   06/12 Range types: amdgpu_svm_range extending drm_gpusvm_range
>         with gpu_mapped state, pending ops, work queue linkage,
>         and op_ctx for batch processing.
> 
>   07/12 Range GPU mapping: PTE flags computation with read_only
>         support, GPU page table update, range mapping loop.
> 
>   08/12 Notifier and GC helpers: two-phase notifier events, range
>         removal, GC enqueue/add with dedicated workqueue.
> 
>   09/12 Attribute change and invalidation: apply attribute triggers
>         to GPU ranges, invalidate_interval for boundary realignment,
>         work queue dequeue helpers, checkpoint timestamp.
> 
>   10/12 Initialization and lifecycle: kmem_cache, drm_gpusvm_init
>         with chunk sizes (2M/64K/4K), XNACK detection, GC init,
>         PASID lookup, TLB flush, and init/close/fini lifecycle.
> 
>   11/12 Ioctl, GC, and fault handler: ioctl dispatcher, GC worker,
>         and amdgpu_svm_fault.c/h with full fault path including
>         unregistered attribute derivation and retry logic.
> 
>   12/12 Build integration: Kconfig (CONFIG_DRM_AMDGPU_SVM), Makefile
>         rules, ioctl registration, and amdgpu_vm fault dispatch.
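Patch 10 mentions chunk sizes of 2M/64K/4K. One plausible way to read
this is that a fault picks the largest chunk whose naturally aligned
window around the faulting address still fits inside the allowed bounds.
The sketch below shows that idea only; sketch_pick_chunk() and its
parameters are hypothetical and this is not the framework's actual
selection logic, which takes more conditions into account.

```c
#include <assert.h>
#include <stdint.h>

/* Chunk sizes from the cover letter, largest first. */
static const uint64_t sketch_chunk_sizes[] = {
	2ull << 20, 64ull << 10, 4ull << 10,
};

/*
 * Return the largest chunk size whose naturally aligned window around
 * fault_addr lies inside [lo, hi), or 0 if even a 4K window does not fit.
 * The window start is written to *start when a size is found.
 */
static uint64_t sketch_pick_chunk(uint64_t fault_addr, uint64_t lo,
				  uint64_t hi, uint64_t *start)
{
	unsigned int n = sizeof(sketch_chunk_sizes) / sizeof(sketch_chunk_sizes[0]);

	assert(lo <= fault_addr && fault_addr < hi);
	for (unsigned int i = 0; i < n; i++) {
		uint64_t size = sketch_chunk_sizes[i];
		uint64_t s = fault_addr & ~(size - 1); /* align down */

		if (s >= lo && s + size <= hi) {
			*start = s;
			return size;
		}
	}
	return 0;
}
```

With bounds of [0x200000, 0x400000) a fault at 0x212345 gets the whole 2M
window, whereas tighter bounds such as [0x210000, 0x230000) fall back to
the 64K window starting at 0x210000.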
> 
> Honglei Huang (12):
>   drm/amdgpu: define SVM UAPI for GPU shared virtual memory
>   drm/amdgpu: introduce SVM core header and VM integration
>   drm/amdgpu: define SVM attribute subsystem types
>   drm/amdgpu: implement SVM attribute tree and helper functions
>   drm/amdgpu: implement SVM attribute set, get, and clear
>   drm/amdgpu: define SVM range types and work queue interface
>   drm/amdgpu: implement SVM range GPU mapping core
>   drm/amdgpu: implement SVM range notifier and GC helpers
>   drm/amdgpu: implement SVM attribute change and invalidation callback
>   drm/amdgpu: implement SVM initialization and lifecycle
>   drm/amdgpu: add SVM ioctl, garbage collector, and fault handler
>   drm/amdgpu: integrate SVM into build system and VM fault path
> 
>  drivers/gpu/drm/amd/amdgpu/Kconfig            |  11 +
>  drivers/gpu/drm/amd/amdgpu/Makefile           |  13 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       |   2 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_svm.c       | 467 +++++++++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_svm.h       | 162 +++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_svm_attr.c  | 952 ++++++++++++++++++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_svm_attr.h  | 144 +++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_svm_fault.c | 368 +++++++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_svm_fault.h |  39 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_svm_range.c | 863 ++++++++++++++++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_svm_range.h | 148 +++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c        |  20 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h        |   4 +
>  include/uapi/drm/amdgpu_drm.h                 |  39 +
>  14 files changed, 3231 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_svm.c
>  create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_svm.h
>  create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_svm_attr.c
>  create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_svm_attr.h
>  create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_svm_fault.c
>  create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_svm_fault.h
>  create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_svm_range.c
>  create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_svm_range.h
> 
> -- 
> 2.34.1
> 

Thread overview: 17+ messages
2026-04-20 13:12 [RFC V3 00/12] drm/amdgpu: SVM implementation based on drm_gpusvm Honglei Huang
2026-04-20 13:12 ` [RFC V3 01/12] drm/amdgpu: define SVM UAPI for GPU shared virtual memory Honglei Huang
2026-04-20 13:12 ` [RFC V3 02/12] drm/amdgpu: introduce SVM core header and VM integration Honglei Huang
2026-04-20 13:12 ` [RFC V3 03/12] drm/amdgpu: define SVM attribute subsystem types Honglei Huang
2026-04-20 13:12 ` [RFC V3 04/12] drm/amdgpu: implement SVM attribute tree and helper functions Honglei Huang
2026-04-20 13:13 ` [RFC V3 05/12] drm/amdgpu: implement SVM attribute set, get, and clear Honglei Huang
2026-04-20 13:13 ` [RFC V3 06/12] drm/amdgpu: define SVM range types and work queue interface Honglei Huang
2026-04-20 13:13 ` [RFC V3 07/12] drm/amdgpu: implement SVM range GPU mapping core Honglei Huang
2026-04-20 13:13 ` [RFC V3 08/12] drm/amdgpu: implement SVM range notifier and GC helpers Honglei Huang
2026-04-20 13:13 ` [RFC V3 09/12] drm/amdgpu: implement SVM attribute change and invalidation callback Honglei Huang
2026-04-20 13:13 ` [RFC V3 10/12] drm/amdgpu: implement SVM initialization and lifecycle Honglei Huang
2026-04-20 13:13 ` [RFC V3 11/12] drm/amdgpu: add SVM ioctl, garbage collector, and fault handler Honglei Huang
2026-04-20 16:24   ` Matthew Brost
2026-04-21 10:07     ` Huang, Honglei1
2026-04-20 13:13 ` [RFC V3 12/12] drm/amdgpu: integrate SVM into build system and VM fault path Honglei Huang
2026-04-21  2:31 ` Huang Rui [this message]
2026-04-21  9:54   ` [RFC V3 00/12] drm/amdgpu: SVM implementation based on drm_gpusvm Huang, Honglei1