public inbox for dri-devel@lists.freedesktop.org
* [RFC V3 00/12] drm/amdgpu: SVM implementation based on drm_gpusvm
@ 2026-04-20 13:12 Honglei Huang
  2026-04-20 13:12 ` [RFC V3 01/12] drm/amdgpu: define SVM UAPI for GPU shared virtual memory Honglei Huang
                   ` (12 more replies)
  0 siblings, 13 replies; 15+ messages in thread
From: Honglei Huang @ 2026-04-20 13:12 UTC (permalink / raw)
  To: Alexander.Deucher, Felix.Kuehling, Christian.Koenig, Oak.Zeng,
	Jenny-Jing.Liu, Philip.Yang, Xiaogang.Chen, Ray.Huang,
	Lingshan.Zhu, Junhua.Shen, matthew.brost, rodrigo.vivi,
	thomas.hellstrom, dakr, aliceryhl
  Cc: amd-gfx, dri-devel, honghuan

From: Honglei Huang <honghuan@amd.com>

This is V3 of the SVM patch series for amdgpu, based on the drm_gpusvm framework.
This revision incorporates feedback from V1, adds XNACK-on GPU fault handling,
improves code organization, and removes the XNACK-off (no GPU fault)
implementation to focus on the fault-driven model that aligns with
drm_gpusvm's design. The implementation draws extensively from xe_svm.

This patch series implements SVM support with the following design:

  1. Attributes separated from physical page management:

    - Attribute layer (amdgpu_svm_attr_tree): a driver-side interval
      tree storing per-range SVM attributes. Managed through SET_ATTR
      ioctl and preserved across range lifecycle events.

    - Physical page layer (drm_gpusvm ranges): managed by the
      drm_gpusvm framework, representing HMM-backed DMA mappings
      and GPU page table entries.

    This separation ensures attributes survive when GPU ranges are
    destroyed (partial munmap, attribute split, GC). The fault
    handler recreates GPU ranges from the attribute tree on demand.
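
    For illustration, the two-layer lifecycle can be modeled in
    userspace C. Every name below is a stand-in, not driver code; the
    point is only that the attribute layer outlives the page layer:

    ```c
    #include <stdbool.h>
    #include <stdint.h>
    #include <stddef.h>

    /* Userspace model of the two-layer design; stand-in names only.
     * The attribute layer persists while page-layer entries come and go. */
    struct attrs { uint32_t flags; uint32_t granularity; };

    struct attr_range { uint64_t start, last; struct attrs attrs; bool used; };
    struct gpu_range  { uint64_t start, last; bool mapped; };

    #define MAX_ATTR_RANGES 16
    static struct attr_range attr_tree[MAX_ATTR_RANGES]; /* attribute layer */

    /* SET_ATTR path: record attributes; independent of any GPU range. */
    static void attr_set(uint64_t start, uint64_t last, struct attrs a)
    {
    	for (size_t i = 0; i < MAX_ATTR_RANGES; i++) {
    		if (!attr_tree[i].used) {
    			attr_tree[i] = (struct attr_range){ start, last, a, true };
    			return;
    		}
    	}
    }

    /* Fault path: even after all GPU ranges were destroyed (munmap, GC),
     * the surviving attributes are enough to recreate one on demand. */
    static bool fault_recreate(uint64_t addr, struct gpu_range *out)
    {
    	for (size_t i = 0; i < MAX_ATTR_RANGES; i++) {
    		if (attr_tree[i].used && addr >= attr_tree[i].start &&
    		    addr <= attr_tree[i].last) {
    			*out = (struct gpu_range){ attr_tree[i].start,
    						   attr_tree[i].last, true };
    			return true;
    		}
    	}
    	return false;
    }
    ```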

  2. GPU fault driven mapping (XNACK on):

    The core mapping path is driven by GPU page faults instead of ioctls.
    amdgpu_svm_handle_fault() looks up SVM by PASID, runs GC,
    resolves attributes, then maps via find_or_insert -> get_pages
    -> GPU PTE update. For unregistered addresses, default
    attributes are derived from VMA properties automatically.

  3. MMU notifier invalidation:

    Two-phase callback: event_begin() zaps GPU PTEs and flushes
    TLB, event_end() unmaps DMA pages. UNMAP events queue ranges
    to GC for deferred cleanup. Non-UNMAP events (eviction) rely
    on GPU fault to remap.
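
    The ordering invariant of the two-phase callback can be sketched
    as follows (stand-in names, userspace model): PTE zap and TLB
    flush must complete in event_begin() before event_end() releases
    the backing DMA pages, so no stale translation can reach a freed
    page:

    ```c
    /* Minimal model of the two-phase invalidation ordering. */
    enum inval_step { ZAP_PTES, FLUSH_TLB, UNMAP_DMA };

    static enum inval_step inval_trace[3];
    static int inval_trace_len;

    static void event_begin(void)
    {
    	inval_trace[inval_trace_len++] = ZAP_PTES;  /* clear GPU PTEs */
    	inval_trace[inval_trace_len++] = FLUSH_TLB; /* drop cached xlations */
    }

    static void event_end(void)
    {
    	inval_trace[inval_trace_len++] = UNMAP_DMA; /* safe: GPU cut off */
    }
    ```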

  4. Garbage collector:

    GC workqueue processes unmapped ranges: removes them
    from drm_gpusvm and clears corresponding attributes. No
    rebuild or restore logic, GPU fault handles recreation.
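
    The deferred-cleanup contract can be modeled like this (stand-in
    names, not driver code): notifier context only queues, which is
    cheap, and the worker later destroys ranges and clears their
    attributes with no rebuild step:

    ```c
    #include <stdbool.h>

    #define GC_QUEUE_MAX 8

    static int gc_queue[GC_QUEUE_MAX];
    static int gc_head, gc_tail; /* monotonic counters */
    static int ranges_destroyed, attrs_cleared;

    /* Called from the MMU notifier: O(1), no unmapping work here. */
    static bool gc_enqueue(int range_id)
    {
    	if (gc_tail - gc_head == GC_QUEUE_MAX)
    		return false;
    	gc_queue[gc_tail++ % GC_QUEUE_MAX] = range_id;
    	return true;
    }

    /* GC worker: drain the queue; no rebuild, the next GPU fault
     * recreates whatever is still needed. */
    static void gc_work(void)
    {
    	while (gc_head < gc_tail) {
    		(void)gc_queue[gc_head++ % GC_QUEUE_MAX];
    		ranges_destroyed++;
    		attrs_cleared++;
    	}
    }
    ```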

Changes since V2:
  - Added version titles to the commit messages.
  - Fixed some content mistakes.

Changes since V1:
  - Added GPU fault handler: amdgpu_svm_handle_fault with PASID-based
    SVM lookup, following the standard flow: garbage collector ->
    find or insert range -> check valid -> migrate (TODO) / get_pages
    -> GPU bind/map.

  - Removed the restore worker queue entirely. V1 had separate GC
    and restore workers: the restore workers synchronously restored
    ranges around queue stop/start because there was no GPU fault
    support. With the XNACK-on fault-driven model, synchronous restore
    is unnecessary; the GPU fault handler recreates ranges on demand.
    The GC worker in V2 is simplified to only discard ranges and clear
    their attributes, with no rebuild or restore logic.
    AMDGPU_SVM_FLAG_GPU_ALWAYS_MAPPED support is removed along with
    the restore worker.

  - Reworked MMU notifier callback (amdgpu_svm_range_invalidate):
    V1 had a monolithic dispatcher with flag combinations and
    queue ops (CLEAR_PTE/QUEUE_INTERVAL, UNMAP/RESTORE) plus
    begin_restore() to quiesce KFD queues. V2 uses a two-phase
    model: event_begin() zaps GPU PTEs and flushes TLB,
    event_end() unmaps DMA pages and queues UNMAP ranges to GC.
    Non-UNMAP events (eviction) just zap PTEs and let GPU fault
    remap. Removed begin_restore/end_restore callbacks,
    has_always_mapped_range() check, and NOTIFIER flag dispatch.
    Added checkpoint timestamp capture on UNMAP for fault dedup.
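
    A sketch of the timestamp-based dedup idea (this is an assumption
    about the mechanism, with stand-in names): UNMAP records a
    checkpoint, and a fault raised at or before that checkpoint
    targets state the unmap already invalidated, so it can be treated
    as stale:

    ```c
    #include <stdbool.h>
    #include <stdint.h>

    static uint64_t checkpoint_ts;

    static void on_unmap(uint64_t now)
    {
    	checkpoint_ts = now; /* capture at UNMAP time */
    }

    static bool fault_is_stale(uint64_t fault_ts)
    {
    	return fault_ts <= checkpoint_ts;
    }
    ```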

  - Added amdgpu_svm_range_invalidate_interval(): when userspace
    sets new attributes on a sub-region of an existing attribute
    range, the attribute tree splits the old range and the new
    sub-region gets different attributes. However, existing
    drm_gpusvm ranges may cross the new attribute boundary
    (e.g., a 2M GPU range covers both the old and new attribute
    regions). This function walks all gpusvm ranges in the
    affected interval, zapping GPU PTEs and flushing the TLB.
    Ranges that cross the new or old boundary are removed
    entirely so the GPU fault handler can recreate them with
    boundaries aligned to the updated attribute layout.
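
    The crossing check reduces to a simple predicate (stand-in name):
    a range [start, last] straddles a new attribute boundary b when b
    falls strictly inside it, i.e. the range would need different
    attributes on each side:

    ```c
    #include <stdbool.h>
    #include <stdint.h>

    /* True when boundary b splits the range [start, last] in two. */
    static bool range_crosses_boundary(uint64_t start, uint64_t last,
    				   uint64_t b)
    {
    	return b > start && b <= last;
    }
    ```

    For example, a 2M range at [0x200000, 0x3fffff] with a new
    boundary at 0x300000 crosses it and must be removed, while a
    boundary at 0x400000 (just past the range) does not.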

  - On MMU_NOTIFY_UNMAP events, discard all affected gpusvm ranges
    entirely instead of synchronously rebuilding them as in V1. The
    unmap may destroy more ranges than strictly necessary (e.g., a
    partial munmap hits a 2M range that extends beyond the unmapped
    region), but the attribute layer preserves the still-valid
    attributes for the remaining address space. When the GPU next
    accesses those addresses, the fault handler automatically
    recreates the ranges with correct boundaries from the surviving
    attributes. This avoids the synchronous rebuild logic that V1
    required (unmap -> rebuild in GC/restore worker).

  - Added attribute creation for unregistered addresses:
    amdgpu_svm_range_get_unregistered_attrs() derives default
    SVM attributes from VMA properties and GPU IP capabilities
    when the faulting address has no user attributes registered.
    This feature is needed to pass the ROCm user mode runtime tests
    (kfd/rocr/hip): ROCm has historically allowed access to
    unregistered virtual addresses with default SVM attributes, so
    amdgpu SVM needs to support it as well.
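
    A hypothetical sketch of the derivation (VMA_* and SVM_FLAG_*
    values below are simplified stand-ins; the real code consults
    vm_flags and GPU IP capabilities):

    ```c
    #include <stdint.h>

    #define VMA_WRITE 0x2
    #define VMA_EXEC  0x4

    #define SVM_FLAG_GPU_RO   0x8
    #define SVM_FLAG_GPU_EXEC 0x10

    static uint32_t default_flags_from_vma(uint32_t vma_flags)
    {
    	uint32_t flags = 0;

    	if (!(vma_flags & VMA_WRITE))
    		flags |= SVM_FLAG_GPU_RO;   /* non-writable VMA -> RO PTEs */
    	if (vma_flags & VMA_EXEC)
    		flags |= SVM_FLAG_GPU_EXEC; /* executable VMA -> exec PTEs */
    	return flags;
    }
    ```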

  - Explicitly return -EOPNOTSUPP from amdgpu_svm_init() when XNACK
    is disabled. V1 attempted mixed XNACK on/off support with
    complex KFD queue quiesce/resume callbacks and ioctl-driven
    mapping paths, which added substantial complexity. V2 drops
    these implementations to focus on the fault-driven model.

  - Removed the kgd2kfd_quiesce_mm()/resume_mm() dependency that V1
    used for XNACK-off queue control. For XNACK on, the GPU fault
    handler is the entry point for SVM range mapping, so no
    quiesce/resume is needed in this version.

  - Added new change triggers, TRIGGER_RANGE_SPLIT and
    TRIGGER_PREFETCH, to support sub-range attribute set and prefetch.

  - Added helper functions find_locked, get_bounds_locked, and
    set_default for GPU fault handling.

  - Design questions section removed.

TODO:
  - Add multi-GPU support.
  - Add XNACK-off mode.
  - Add migration and prefetch. This work is ongoing at:
    https://lore.kernel.org/amd-gfx/20260410113146.146212-1-Junhua.Shen@amd.com/

Test results:
  Tested on gfx943 (MI300X) and gfx906 (MI60) with XNACK on:
  - KFD test: 95%+ passed.
  - ROCR test: all passed.
  - HIP catch test: gfx943 (MI300X): 96% passed.
                    gfx906 (MI60): 99% passed.

Patch overview:

  01/12 UAPI: DRM_AMDGPU_GEM_SVM ioctl, SVM flags, SET_ATTR/GET_ATTR
        operations, attribute types in amdgpu_drm.h.

  02/12 Core header: amdgpu_svm wrapping drm_gpusvm with refcount,
        attr_tree, GC struct, locks, and VM integration hooks.

  03/12 Attribute types: amdgpu_svm_attrs, attr_range (interval tree
        node), attr_tree, access enum, flag masks, change triggers.

  04/12 Attribute tree ops: interval tree lookup, insert, remove,
        find_locked, get_bounds_locked, set_default, and lifecycle.

  05/12 Attribute set/get/clear: validate UAPI attributes, apply to
        tree with head/tail splitting, change propagation, and query.

  06/12 Range types: amdgpu_svm_range extending drm_gpusvm_range
        with gpu_mapped state, pending ops, work queue linkage,
        and op_ctx for batch processing.

  07/12 Range GPU mapping: PTE flags computation with read_only
        support, GPU page table update, range mapping loop.

  08/12 Notifier and GC helpers: two-phase notifier events, range
        removal, GC enqueue/add with dedicated workqueue.

  09/12 Attribute change and invalidation: apply attribute triggers
        to GPU ranges, invalidate_interval for boundary realignment,
        work queue dequeue helpers, checkpoint timestamp.

  10/12 Initialization and lifecycle: kmem_cache, drm_gpusvm_init
        with chunk sizes (2M/64K/4K), XNACK detection, GC init,
        PASID lookup, TLB flush, and init/close/fini lifecycle.

  11/12 Ioctl, GC, and fault handler: ioctl dispatcher, GC worker,
        and amdgpu_svm_fault.c/h with full fault path including
        unregistered attribute derivation and retry logic.

  12/12 Build integration: Kconfig (CONFIG_DRM_AMDGPU_SVM), Makefile
        rules, ioctl registration, and amdgpu_vm fault dispatch.

Honglei Huang (12):
  drm/amdgpu: define SVM UAPI for GPU shared virtual memory
  drm/amdgpu: introduce SVM core header and VM integration
  drm/amdgpu: define SVM attribute subsystem types
  drm/amdgpu: implement SVM attribute tree and helper functions
  drm/amdgpu: implement SVM attribute set, get, and clear
  drm/amdgpu: define SVM range types and work queue interface
  drm/amdgpu: implement SVM range GPU mapping core
  drm/amdgpu: implement SVM range notifier and GC helpers
  drm/amdgpu: implement SVM attribute change and invalidation callback
  drm/amdgpu: implement SVM initialization and lifecycle
  drm/amdgpu: add SVM ioctl, garbage collector, and fault handler
  drm/amdgpu: integrate SVM into build system and VM fault path

 drivers/gpu/drm/amd/amdgpu/Kconfig            |  11 +
 drivers/gpu/drm/amd/amdgpu/Makefile           |  13 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       |   2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_svm.c       | 467 +++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_svm.h       | 162 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_svm_attr.c  | 952 ++++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_svm_attr.h  | 144 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_svm_fault.c | 368 +++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_svm_fault.h |  39 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_svm_range.c | 863 ++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_svm_range.h | 148 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c        |  20 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h        |   4 +
 include/uapi/drm/amdgpu_drm.h                 |  39 +
 14 files changed, 3231 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_svm.c
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_svm.h
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_svm_attr.c
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_svm_attr.h
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_svm_fault.c
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_svm_fault.h
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_svm_range.c
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_svm_range.h

-- 
2.34.1


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [RFC V3 01/12] drm/amdgpu: define SVM UAPI for GPU shared virtual memory
  2026-04-20 13:12 [RFC V3 00/12] drm/amdgpu: SVM implementation based on drm_gpusvm Honglei Huang
@ 2026-04-20 13:12 ` Honglei Huang
  2026-04-20 13:12 ` [RFC V3 02/12] drm/amdgpu: introduce SVM core header and VM integration Honglei Huang
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 15+ messages in thread
From: Honglei Huang @ 2026-04-20 13:12 UTC (permalink / raw)
  To: Alexander.Deucher, Felix.Kuehling, Christian.Koenig, Oak.Zeng,
	Jenny-Jing.Liu, Philip.Yang, Xiaogang.Chen, Ray.Huang,
	Lingshan.Zhu, Junhua.Shen, matthew.brost, rodrigo.vivi,
	thomas.hellstrom, dakr, aliceryhl
  Cc: amd-gfx, dri-devel, honghuan, Honghuan He

From: Honglei Huang <honghuan@amd.com>

Add user-space API definitions for the amdgpu SVM subsystem built
on the DRM GPUSVM framework.

Define ioctl DRM_AMDGPU_GEM_SVM (0x1a) and the wrapped
DRM_IOCTL_AMDGPU_GEM_SVM command.

Memory attribute flags (AMDGPU_SVM_FLAG_*):
  HOST_ACCESS, COHERENT, HIVE_LOCAL, GPU_RO, GPU_EXEC,
  GPU_READ_MOSTLY, GPU_ALWAYS_MAPPED, EXT_COHERENT.

Operations: SET_ATTR and GET_ATTR.

Attribute types (AMDGPU_SVM_ATTR_*): PREFERRED_LOC, PREFETCH_LOC,
ACCESS, ACCESS_IN_PLACE, NO_ACCESS, SET_FLAGS, CLR_FLAGS,
GRANULARITY.

Location constants: SYSMEM (0) and UNDEFINED (0xffffffff).

Structures: drm_amdgpu_svm_attribute for per-attribute key/value
pairs, drm_amdgpu_gem_svm for the ioctl payload containing start
address, size, operation code, attribute count and user pointer.
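
For reference, a userspace mirror of the two structs with __u32/__u64
replaced by stdint types; the helper only shows how a SET_ATTR payload
is assembled, the ioctl call itself is omitted:

```c
#include <stdint.h>

struct drm_amdgpu_svm_attribute {
	uint32_t type;
	uint32_t value;
};

struct drm_amdgpu_gem_svm {
	uint64_t start_addr;
	uint64_t size;
	uint32_t operation;
	uint32_t nattr;
	uint64_t attrs_ptr;
};

#define AMDGPU_SVM_OP_SET_ATTR      0
#define AMDGPU_SVM_ATTR_GRANULARITY 7

/* Build a SET_ATTR request covering [start, start + size). */
static struct drm_amdgpu_gem_svm
make_set_attr(uint64_t start, uint64_t size,
	      const struct drm_amdgpu_svm_attribute *attrs, uint32_t nattr)
{
	return (struct drm_amdgpu_gem_svm){
		.start_addr = start,
		.size = size,
		.operation = AMDGPU_SVM_OP_SET_ATTR,
		.nattr = nattr,
		.attrs_ptr = (uint64_t)(uintptr_t)attrs,
	};
}
```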

Signed-off-by: Honghuan He <honghuan.he@amd.com>
---
 include/uapi/drm/amdgpu_drm.h | 39 +++++++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)

diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
index 9f3090db2..a315adb7f 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/include/uapi/drm/amdgpu_drm.h
@@ -58,6 +58,7 @@ extern "C" {
 #define DRM_AMDGPU_USERQ_SIGNAL		0x17
 #define DRM_AMDGPU_USERQ_WAIT		0x18
 #define DRM_AMDGPU_GEM_LIST_HANDLES	0x19
+#define DRM_AMDGPU_GEM_SVM		0x1a
 
 #define DRM_IOCTL_AMDGPU_GEM_CREATE	DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_CREATE, union drm_amdgpu_gem_create)
 #define DRM_IOCTL_AMDGPU_GEM_MMAP	DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_MMAP, union drm_amdgpu_gem_mmap)
@@ -79,6 +80,7 @@ extern "C" {
 #define DRM_IOCTL_AMDGPU_USERQ_SIGNAL	DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_USERQ_SIGNAL, struct drm_amdgpu_userq_signal)
 #define DRM_IOCTL_AMDGPU_USERQ_WAIT	DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_USERQ_WAIT, struct drm_amdgpu_userq_wait)
 #define DRM_IOCTL_AMDGPU_GEM_LIST_HANDLES DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_LIST_HANDLES, struct drm_amdgpu_gem_list_handles)
+#define DRM_IOCTL_AMDGPU_GEM_SVM	DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_SVM, struct drm_amdgpu_gem_svm)
 
 /**
  * DOC: memory domains
@@ -1673,6 +1675,43 @@ struct drm_amdgpu_info_uq_metadata {
 #define AMDGPU_FAMILY_GC_11_5_4			154 /* GC 11.5.4 */
 #define AMDGPU_FAMILY_GC_12_0_0			152 /* GC 12.0.0 */
 
+#define AMDGPU_SVM_FLAG_HOST_ACCESS		0x00000001
+#define AMDGPU_SVM_FLAG_COHERENT		0x00000002
+#define AMDGPU_SVM_FLAG_HIVE_LOCAL		0x00000004
+#define AMDGPU_SVM_FLAG_GPU_RO			0x00000008
+#define AMDGPU_SVM_FLAG_GPU_EXEC		0x00000010
+#define AMDGPU_SVM_FLAG_GPU_READ_MOSTLY		0x00000020
+#define AMDGPU_SVM_FLAG_GPU_ALWAYS_MAPPED	0x00000040
+#define AMDGPU_SVM_FLAG_EXT_COHERENT		0x00000080
+
+#define AMDGPU_SVM_OP_SET_ATTR		0
+#define AMDGPU_SVM_OP_GET_ATTR		1
+
+#define AMDGPU_SVM_ATTR_PREFERRED_LOC		0
+#define AMDGPU_SVM_ATTR_PREFETCH_LOC		1
+#define AMDGPU_SVM_ATTR_ACCESS			2
+#define AMDGPU_SVM_ATTR_ACCESS_IN_PLACE		3
+#define AMDGPU_SVM_ATTR_NO_ACCESS		4
+#define AMDGPU_SVM_ATTR_SET_FLAGS		5
+#define AMDGPU_SVM_ATTR_CLR_FLAGS		6
+#define AMDGPU_SVM_ATTR_GRANULARITY		7
+
+#define AMDGPU_SVM_LOCATION_SYSMEM		0
+#define AMDGPU_SVM_LOCATION_UNDEFINED		0xffffffff
+
+struct drm_amdgpu_svm_attribute {
+	__u32 type;
+	__u32 value;
+};
+
+struct drm_amdgpu_gem_svm {
+	__u64 start_addr;
+	__u64 size;
+	__u32 operation;
+	__u32 nattr;
+	__u64 attrs_ptr;
+};
+
 #if defined(__cplusplus)
 }
 #endif
-- 
2.34.1



* [RFC V3 02/12] drm/amdgpu: introduce SVM core header and VM integration
  2026-04-20 13:12 [RFC V3 00/12] drm/amdgpu: SVM implementation based on drm_gpusvm Honglei Huang
  2026-04-20 13:12 ` [RFC V3 01/12] drm/amdgpu: define SVM UAPI for GPU shared virtual memory Honglei Huang
@ 2026-04-20 13:12 ` Honglei Huang
  2026-04-20 13:12 ` [RFC V3 03/12] drm/amdgpu: define SVM attribute subsystem types Honglei Huang
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 15+ messages in thread
From: Honglei Huang @ 2026-04-20 13:12 UTC (permalink / raw)
  To: Alexander.Deucher, Felix.Kuehling, Christian.Koenig, Oak.Zeng,
	Jenny-Jing.Liu, Philip.Yang, Xiaogang.Chen, Ray.Huang,
	Lingshan.Zhu, Junhua.Shen, matthew.brost, rodrigo.vivi,
	thomas.hellstrom, dakr, aliceryhl
  Cc: amd-gfx, dri-devel, honghuan, Honghuan He

From: Honglei Huang <honghuan@amd.com>

Add amdgpu_svm.h with the SVM data structures, debug macros,
and public API declarations. Update amdgpu_vm.h to hold the
SVM context pointer.

amdgpu_svm.h provides:
- AMDGPU_SVM_TRACE / AMDGPU_SVM_ERR: debug output macros
- AMDGPU_SVM_KMEM_CACHE_CREATE / DESTROY: slab cache helpers
- amdgpu_svm_assert_in_notifier(): lockdep assertion for notifier
  lock
- struct amdgpu_svm_gc: garbage collector state with workqueue,
  list head, and work_struct for deferred range cleanup
- struct amdgpu_svm: per-VM SVM context embedding drm_gpusvm,
  kref reference count, amdgpu_device back-pointer, attribute
  tree, work queue, svm_lock mutex, garbage collector, xnack
  state, atomic checkpoint timestamp, and TLB flush callback
- to_amdgpu_svm() container_of helper

Public API declarations:
  amdgpu_svm_cache_init/fini, amdgpu_svm_init/close/fini,
  amdgpu_svm_put, amdgpu_svm_lookup_by_pasid,
  amdgpu_svm_handle_fault, amdgpu_gem_svm_ioctl,
  amdgpu_svm_gc_init/fini/flush, amdgpu_svm_garbage_collector,
  amdgpu_svm_range_clean_queue, amdgpu_svm_is_enabled

Static inline stubs when CONFIG_DRM_AMDGPU_SVM is disabled.

amdgpu_vm.h: forward-declare struct amdgpu_svm and add an svm
pointer to struct amdgpu_vm.

Signed-off-by: Honghuan He <honghuan.he@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_svm.h | 162 ++++++++++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h  |   4 +
 2 files changed, 166 insertions(+)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_svm.h

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_svm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_svm.h
new file mode 100644
index 000000000..e298f415b
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_svm.h
@@ -0,0 +1,162 @@
+/* SPDX-License-Identifier: GPL-2.0 OR MIT */
+/*
+ * Copyright 2026 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#ifndef __AMDGPU_SVM_H__
+#define __AMDGPU_SVM_H__
+
+#include <drm/amdgpu_drm.h>
+#include <drm/drm_gpusvm.h>
+#include <linux/atomic.h>
+#include <linux/kref.h>
+#include <linux/list.h>
+#include <linux/printk.h>
+#include <linux/rwsem.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+#include <linux/types.h>
+#include <linux/workqueue.h>
+
+struct amdgpu_device;
+struct amdgpu_vm;
+struct amdgpu_svm_attr_tree;
+struct drm_device;
+struct drm_file;
+
+#define AMDGPU_SVM_TRACE(fmt, ...) \
+	pr_debug("%s: " fmt, __func__, ##__VA_ARGS__)
+
+#define AMDGPU_SVM_ERR(fmt, ...) \
+	pr_err("%s: " fmt, __func__, ##__VA_ARGS__)
+
+#define AMDGPU_SVM_KMEM_CACHE_CREATE(name, type) \
+	kmem_cache_create((name), sizeof(type), 0, 0, NULL)
+
+#define AMDGPU_SVM_KMEM_CACHE_DESTROY(cache) \
+	do { \
+		if ((cache) != NULL) { \
+			kmem_cache_destroy((cache)); \
+			(cache) = NULL; \
+		} \
+	} while (0)
+
+#define amdgpu_svm_assert_in_notifier(svm__) \
+	lockdep_assert_held_write(&(svm__)->gpusvm.notifier_lock)
+
+struct amdgpu_svm_gc {
+	struct workqueue_struct *wq;
+	struct list_head list;
+	struct work_struct work;
+};
+
+struct amdgpu_svm {
+	struct drm_gpusvm gpusvm;
+	struct kref refcount;
+	struct amdgpu_device *adev;
+	struct amdgpu_vm *vm;
+	struct amdgpu_svm_attr_tree *attr_tree;
+	struct rw_semaphore svm_lock;
+	spinlock_t work_lock;
+	struct amdgpu_svm_gc gc;
+	atomic_t exiting;
+	uint64_t checkpoint_ts;
+	u8 default_granularity;
+	bool xnack_enabled;
+	void (*flush_tlb)(struct amdgpu_svm *svm);
+};
+
+static inline struct amdgpu_svm *to_amdgpu_svm(struct drm_gpusvm *gpusvm)
+{
+	return container_of(gpusvm, struct amdgpu_svm, gpusvm);
+}
+
+#if IS_ENABLED(CONFIG_DRM_AMDGPU_SVM)
+int amdgpu_svm_cache_init(void);
+void amdgpu_svm_cache_fini(void);
+
+int amdgpu_svm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm);
+void amdgpu_svm_close(struct amdgpu_vm *vm);
+void amdgpu_svm_fini(struct amdgpu_vm *vm);
+
+void amdgpu_svm_put(struct amdgpu_svm *svm);
+struct amdgpu_svm *amdgpu_svm_lookup_by_pasid(struct amdgpu_device *adev,
+					       uint32_t pasid);
+int amdgpu_svm_handle_fault(struct amdgpu_device *adev, uint32_t pasid,
+			    uint64_t fault_addr, uint64_t ts,
+			    bool write_fault);
+bool amdgpu_svm_is_enabled(struct amdgpu_vm *vm);
+
+int amdgpu_gem_svm_ioctl(struct drm_device *dev, void *data,
+			 struct drm_file *filp);
+int amdgpu_svm_gc_init(struct amdgpu_svm *svm);
+void amdgpu_svm_gc_fini(struct amdgpu_svm *svm);
+void amdgpu_svm_gc_flush(struct amdgpu_svm *svm);
+int amdgpu_svm_garbage_collector(struct amdgpu_svm *svm);
+void amdgpu_svm_range_clean_queue(struct amdgpu_svm *svm,
+				  struct list_head *work_list);
+#else
+static inline int amdgpu_svm_init(struct amdgpu_device *adev,
+				  struct amdgpu_vm *vm)
+{
+	return 0;
+}
+
+static inline int amdgpu_svm_cache_init(void)
+{
+	return 0;
+}
+
+static inline void amdgpu_svm_cache_fini(void)
+{
+}
+
+static inline void amdgpu_svm_close(struct amdgpu_vm *vm)
+{
+}
+
+static inline void amdgpu_svm_fini(struct amdgpu_vm *vm)
+{
+}
+
+static inline int amdgpu_svm_handle_fault(struct amdgpu_device *adev,
+					  uint32_t pasid,
+					  uint64_t fault_addr,
+					  uint64_t ts,
+					  bool write_fault)
+{
+	return -EOPNOTSUPP;
+}
+
+static inline bool amdgpu_svm_is_enabled(struct amdgpu_vm *vm)
+{
+	return false;
+}
+
+static inline int amdgpu_gem_svm_ioctl(struct drm_device *dev, void *data,
+				       struct drm_file *filp)
+{
+	return -EOPNOTSUPP;
+}
+#endif /* CONFIG_DRM_AMDGPU_SVM */
+
+#endif /* __AMDGPU_SVM_H__ */
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
index 602deb8a7..9931cc0bb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -43,6 +43,7 @@ struct amdgpu_bo_va;
 struct amdgpu_job;
 struct amdgpu_bo_list_entry;
 struct amdgpu_bo_vm;
+struct amdgpu_svm;
 
 /*
  * GPUVM handling
@@ -449,6 +450,9 @@ struct amdgpu_vm {
 
 	/* cached fault info */
 	struct amdgpu_vm_fault_info fault_info;
+
+	/* SVM experimental implementation */
+	struct amdgpu_svm *svm;
 };
 
 struct amdgpu_vm_manager {
-- 
2.34.1



* [RFC V3 03/12] drm/amdgpu: define SVM attribute subsystem types
  2026-04-20 13:12 [RFC V3 00/12] drm/amdgpu: SVM implementation based on drm_gpusvm Honglei Huang
  2026-04-20 13:12 ` [RFC V3 01/12] drm/amdgpu: define SVM UAPI for GPU shared virtual memory Honglei Huang
  2026-04-20 13:12 ` [RFC V3 02/12] drm/amdgpu: introduce SVM core header and VM integration Honglei Huang
@ 2026-04-20 13:12 ` Honglei Huang
  2026-04-20 13:12 ` [RFC V3 04/12] drm/amdgpu: implement SVM attribute tree and helper functions Honglei Huang
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 15+ messages in thread
From: Honglei Huang @ 2026-04-20 13:12 UTC (permalink / raw)
  To: Alexander.Deucher, Felix.Kuehling, Christian.Koenig, Oak.Zeng,
	Jenny-Jing.Liu, Philip.Yang, Xiaogang.Chen, Ray.Huang,
	Lingshan.Zhu, Junhua.Shen, matthew.brost, rodrigo.vivi,
	thomas.hellstrom, dakr, aliceryhl
  Cc: amd-gfx, dri-devel, honghuan, Honghuan He

From: Honglei Huang <honghuan@amd.com>

Add amdgpu_svm_attr.h defining the attribute management types
used by the SVM subsystem to track per-range memory policies.

Types:
- enum amdgpu_svm_attr_access: NONE, ENABLE, IN_PLACE access modes
- AMDGPU_SVM_PTE_FLAG_MASK / AMDGPU_SVM_MAPPING_FLAG_MASK:
  bitmasks partitioning UAPI flags into PTE-affecting vs
  mapping-affecting groups
- struct amdgpu_svm_attrs: preferred/prefetch location, flags,
  granularity, access mode per range
- struct amdgpu_svm_attr_range: interval tree node binding a page
  range [start_page, last_page] to its amdgpu_svm_attrs
- struct amdgpu_svm_attr_tree: mutex-protected interval tree
  container with linked list for iteration

- enum amdgpu_svm_attr_change_trigger: bitmask flags for
  ACCESS_CHANGE, PTE_FLAG_CHANGE, MAPPING_FLAG_CHANGE,
  LOCATION_CHANGE, GRANULARITY_CHANGE, ATTR_ONLY, RANGE_SPLIT,
  PREFETCH

Declare the full attribute API: tree create/destroy, cache
init/fini, find/get_bounds, set/get/clear, range alloc/insert,
default setting, VMA validation, and devmem/VRAM preference
helpers.

Signed-off-by: Honghuan He <honghuan.he@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_svm_attr.h | 144 +++++++++++++++++++
 1 file changed, 144 insertions(+)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_svm_attr.h

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_svm_attr.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_svm_attr.h
new file mode 100644
index 000000000..34afafdf7
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_svm_attr.h
@@ -0,0 +1,144 @@
+/* SPDX-License-Identifier: GPL-2.0 OR MIT */
+/*
+ * Copyright 2026 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#ifndef __AMDGPU_SVM_ATTR_H__
+#define __AMDGPU_SVM_ATTR_H__
+
+#include <drm/amdgpu_drm.h>
+#include <linux/interval_tree.h>
+#include <linux/list.h>
+#include <linux/mutex.h>
+#include <linux/rbtree.h>
+#include <linux/types.h>
+
+
+/* One fd, one SVM, one GPU, so no bitmap is needed;
+ * only three states exist for this pattern.
+ */
+enum amdgpu_svm_attr_access {
+	AMDGPU_SVM_ACCESS_NONE = 0,
+	AMDGPU_SVM_ACCESS_ENABLE = 1,
+	AMDGPU_SVM_ACCESS_IN_PLACE = 2,
+};
+
+#define AMDGPU_SVM_PTE_FLAG_MASK \
+	(AMDGPU_SVM_FLAG_COHERENT | AMDGPU_SVM_FLAG_EXT_COHERENT | \
+	 AMDGPU_SVM_FLAG_GPU_RO | AMDGPU_SVM_FLAG_GPU_EXEC)
+
+#define AMDGPU_SVM_MAPPING_FLAG_MASK \
+	(AMDGPU_SVM_FLAG_HOST_ACCESS | AMDGPU_SVM_FLAG_HIVE_LOCAL | \
+	 AMDGPU_SVM_FLAG_GPU_READ_MOSTLY | AMDGPU_SVM_FLAG_GPU_ALWAYS_MAPPED)
+
+struct amdgpu_svm_attrs {
+	/* keep preferred_loc to adapt to kfd API */
+	int32_t preferred_loc;
+	int32_t prefetch_loc;
+	uint32_t flags;
+	uint32_t granularity;
+	enum amdgpu_svm_attr_access access;
+};
+
+struct amdgpu_svm_attr_range {
+	struct interval_tree_node it_node;
+	struct list_head list;
+	struct amdgpu_svm_attrs attrs;
+};
+
+static inline unsigned long
+amdgpu_svm_attr_start_page(const struct amdgpu_svm_attr_range *range)
+{
+	return range->it_node.start;
+}
+
+static inline unsigned long
+amdgpu_svm_attr_last_page(const struct amdgpu_svm_attr_range *range)
+{
+	return range->it_node.last;
+}
+
+struct amdgpu_svm;
+struct mm_struct;
+struct vm_area_struct;
+
+struct amdgpu_svm_attr_tree {
+	struct mutex lock;
+	struct rb_root_cached tree;
+	struct list_head range_list;
+	struct amdgpu_svm *svm;
+};
+
+enum amdgpu_svm_attr_change_trigger {
+	AMDGPU_SVM_ATTR_TRIGGER_ACCESS_CHANGE = (1U << 0),
+	AMDGPU_SVM_ATTR_TRIGGER_PTE_FLAG_CHANGE = (1U << 1),
+	AMDGPU_SVM_ATTR_TRIGGER_MAPPING_FLAG_CHANGE = (1U << 2),
+	AMDGPU_SVM_ATTR_TRIGGER_LOCATION_CHANGE = (1U << 3),
+	AMDGPU_SVM_ATTR_TRIGGER_GRANULARITY_CHANGE = (1U << 4),
+	AMDGPU_SVM_ATTR_TRIGGER_ATTR_ONLY = (1U << 5),
+	AMDGPU_SVM_ATTR_TRIGGER_RANGE_SPLIT = (1U << 6),
+	AMDGPU_SVM_ATTR_TRIGGER_PREFETCH = (1U << 7),
+};
+
+struct amdgpu_svm_attr_tree *
+amdgpu_svm_attr_tree_create(struct amdgpu_svm *svm);
+void amdgpu_svm_attr_tree_destroy(struct amdgpu_svm_attr_tree *attr_tree);
+int amdgpu_svm_attr_cache_init(void);
+void amdgpu_svm_attr_cache_fini(void);
+struct amdgpu_svm_attr_range *
+amdgpu_svm_attr_find_locked(struct amdgpu_svm_attr_tree *attr_tree,
+			   unsigned long page);
+struct amdgpu_svm_attr_range *
+amdgpu_svm_attr_get_bounds_locked(struct amdgpu_svm_attr_tree *attr_tree,
+				  unsigned long page,
+				  unsigned long *start_page,
+				  unsigned long *last_page);
+void amdgpu_svm_attr_set_default(struct amdgpu_svm *svm,
+				 struct amdgpu_svm_attrs *attrs);
+
+int amdgpu_svm_attr_set(struct amdgpu_svm_attr_tree *attr_tree,
+			   uint64_t start,
+			   uint64_t size,
+			   uint32_t nattr,
+			   const struct drm_amdgpu_svm_attribute *attrs);
+int amdgpu_svm_attr_get(struct amdgpu_svm_attr_tree *attr_tree,
+				       uint64_t start,
+				       uint64_t size,
+				       uint32_t nattr,
+				       struct drm_amdgpu_svm_attribute *attrs);
+int amdgpu_svm_attr_clear_pages(struct amdgpu_svm_attr_tree *attr_tree,
+				unsigned long start_page,
+				unsigned long last_page);
+struct amdgpu_svm_attr_range *
+amdgpu_svm_attr_range_alloc(unsigned long start_page,
+			   unsigned long last_page,
+			   const struct amdgpu_svm_attrs *attrs);
+void amdgpu_svm_attr_range_insert_locked(struct amdgpu_svm_attr_tree *attr_tree,
+					 struct amdgpu_svm_attr_range *range);
+bool amdgpu_svm_attr_devmem_possible(struct amdgpu_svm *svm,
+				     const struct amdgpu_svm_attrs *attrs);
+bool amdgpu_svm_attr_prefer_vram(struct amdgpu_svm *svm,
+				 const struct amdgpu_svm_attrs *attrs);
+struct vm_area_struct *amdgpu_svm_check_vma(struct mm_struct *mm,
+					unsigned long addr);
+
+#endif /* __AMDGPU_SVM_ATTR_H__ */
-- 
2.34.1



* [RFC V3 04/12] drm/amdgpu: implement SVM attribute tree and helper functions
  2026-04-20 13:12 [RFC V3 00/12] drm/amdgpu: SVM implementation based on drm_gpusvm Honglei Huang
                   ` (2 preceding siblings ...)
  2026-04-20 13:12 ` [RFC V3 03/12] drm/amdgpu: define SVM attribute subsystem types Honglei Huang
@ 2026-04-20 13:12 ` Honglei Huang
  2026-04-20 13:13 ` [RFC V3 05/12] drm/amdgpu: implement SVM attribute set, get, and clear Honglei Huang
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 15+ messages in thread
From: Honglei Huang @ 2026-04-20 13:12 UTC (permalink / raw)
  To: Alexander.Deucher, Felix.Kuehling, Christian.Koenig, Oak.Zeng,
	Jenny-Jing.Liu, Philip.Yang, Xiaogang.Chen, Ray.Huang,
	Lingshan.Zhu, Junhua.Shen, matthew.brost, rodrigo.vivi,
	thomas.hellstrom, dakr, aliceryhl
  Cc: amd-gfx, dri-devel, honghuan, Honghuan He

From: Honglei Huang <honghuan@amd.com>

Implement the foundational attribute tree operations in
amdgpu_svm_attr.c, providing slab cache management, helper
functions, change detection, and interval tree CRUD.

Slab cache:
- amdgpu_svm_attr_cache_init/fini(): create and destroy the
  kmem_cache for amdgpu_svm_attr_range allocation

Context structures:
- struct attr_set_ctx / struct attr_get_ctx: internal state
  for tree walk operations during attribute set and get

Helpers:
- amdgpu_svm_attr_devmem_possible(): check DEVICE_PRIVATE config
  and drm_gpusvm device memory support
- amdgpu_svm_attr_prefer_vram(): detect VRAM preference from
  preferred_loc != SYSMEM and != UNDEFINED
- amdgpu_svm_check_vma(): validate VMA existence via find_vma()
- amdgpu_svm_attr_set_default(): initialize default attributes
  (SYSMEM preferred, all-access enabled, coherent flags)
- attr_equal(): compare two amdgpu_svm_attrs for equality
- attr_change_ctx_trigger(): compute a bitmask of what changed
  between old and new attributes (access, PTE flags, mapping
  flags, location, granularity)
- attr_has_access(): check if attribute access mode allows
  GPU mapping

Interval tree operations:
- attr_set_interval(): set start_page/last_page on a range node
- amdgpu_svm_attr_find_locked(): find range containing a page
- amdgpu_svm_attr_get_bounds_locked(): find effective bounds
  for a page based on adjacent ranges with matching attributes
- amdgpu_svm_attr_range_alloc(): allocate from kmem_cache
- amdgpu_svm_attr_range_insert_locked(): insert into interval
  tree and linked list
- attr_remove_range_locked(): remove and free from the tree

Signed-off-by: Honghuan He <honghuan.he@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_svm_attr.c | 301 +++++++++++++++++++
 1 file changed, 301 insertions(+)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_svm_attr.c

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_svm_attr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_svm_attr.c
new file mode 100644
index 000000000..03ea2f005
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_svm_attr.c
@@ -0,0 +1,301 @@
+// SPDX-License-Identifier: GPL-2.0 OR MIT
+/*
+ * Copyright 2026 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#include "amdgpu_svm.h"
+#include "amdgpu_svm_attr.h"
+#include "amdgpu_svm_range.h"
+#include "amdgpu.h"
+
+#include <linux/err.h>
+#include <linux/errno.h>
+#include <linux/gfp.h>
+#include <linux/lockdep.h>
+#include <linux/minmax.h>
+#include <linux/mm.h>
+#include <linux/slab.h>
+
+#define AMDGPU_SVM_VALID_FLAG_MASK \
+	(AMDGPU_SVM_FLAG_HOST_ACCESS | AMDGPU_SVM_FLAG_COHERENT | \
+	 AMDGPU_SVM_FLAG_HIVE_LOCAL | AMDGPU_SVM_FLAG_GPU_RO | \
+	 AMDGPU_SVM_FLAG_GPU_EXEC | AMDGPU_SVM_FLAG_GPU_READ_MOSTLY | \
+	 AMDGPU_SVM_FLAG_GPU_ALWAYS_MAPPED | AMDGPU_SVM_FLAG_EXT_COHERENT)
+
+
+static struct kmem_cache *amdgpu_svm_attr_range_cache;
+
+struct attr_set_ctx {
+	uint32_t trigger;
+	struct amdgpu_svm_attrs prev_attrs;
+	struct amdgpu_svm_attr_range *attr_range;
+};
+
+struct attr_get_ctx {
+	int32_t preferred_loc;
+	int32_t prefetch_loc;
+	enum amdgpu_svm_attr_access access;
+	uint32_t granularity;
+	uint32_t flags_and;
+	uint32_t flags_or;
+	bool has_range;
+};
+
+bool amdgpu_svm_attr_devmem_possible(struct amdgpu_svm *svm,
+				     const struct amdgpu_svm_attrs *attrs)
+{
+	if (svm->adev->apu_prefer_gtt)
+		return false;
+
+	if (attrs->preferred_loc == AMDGPU_SVM_LOCATION_SYSMEM)
+		return false;
+
+	return true;
+}
+
+bool amdgpu_svm_attr_prefer_vram(struct amdgpu_svm *svm,
+				 const struct amdgpu_svm_attrs *attrs)
+{
+	if (!amdgpu_svm_attr_devmem_possible(svm, attrs))
+		return false;
+
+	if (attrs->preferred_loc != AMDGPU_SVM_LOCATION_UNDEFINED &&
+	    attrs->preferred_loc != AMDGPU_SVM_LOCATION_SYSMEM)
+		return true;
+
+	if (attrs->prefetch_loc != AMDGPU_SVM_LOCATION_UNDEFINED &&
+	    attrs->prefetch_loc != AMDGPU_SVM_LOCATION_SYSMEM)
+		return true;
+
+	return false;
+}
+
+struct vm_area_struct *amdgpu_svm_check_vma(struct mm_struct *mm,
+					unsigned long addr)
+{
+	struct vm_area_struct *vma = vma_lookup(mm, addr);
+
+	if (!vma)
+		return ERR_PTR(-EFAULT);
+
+	if (vma->vm_flags & (VM_IO | VM_PFNMAP | VM_MIXEDMAP))
+		return ERR_PTR(-EOPNOTSUPP);
+
+	return vma;
+}
+
+int amdgpu_svm_attr_cache_init(void)
+{
+	amdgpu_svm_attr_range_cache = AMDGPU_SVM_KMEM_CACHE_CREATE(
+				"amdgpu_svm_attr_range_cache", struct amdgpu_svm_attr_range);
+	if (!amdgpu_svm_attr_range_cache)
+		return -ENOMEM;
+
+	return 0;
+}
+
+void amdgpu_svm_attr_cache_fini(void)
+{
+	AMDGPU_SVM_KMEM_CACHE_DESTROY(amdgpu_svm_attr_range_cache);
+}
+
+static void attr_set_interval(struct amdgpu_svm_attr_range *range,
+				unsigned long start_page,
+				unsigned long last_page)
+{
+	range->it_node.start = start_page;
+	range->it_node.last = last_page;
+}
+
+void amdgpu_svm_attr_set_default(struct amdgpu_svm *svm,
+				 struct amdgpu_svm_attrs *attrs)
+{
+	attrs->preferred_loc = AMDGPU_SVM_LOCATION_UNDEFINED;
+	attrs->prefetch_loc = AMDGPU_SVM_LOCATION_UNDEFINED;
+	attrs->granularity = svm->default_granularity;
+	attrs->flags = AMDGPU_SVM_FLAG_HOST_ACCESS | AMDGPU_SVM_FLAG_COHERENT;
+	attrs->access = svm->xnack_enabled ?
+		AMDGPU_SVM_ACCESS_ENABLE : AMDGPU_SVM_ACCESS_NONE;
+}
+
+struct amdgpu_svm_attr_range *
+amdgpu_svm_attr_find_locked(struct amdgpu_svm_attr_tree *attr_tree,
+			   unsigned long page)
+{
+	struct interval_tree_node *node;
+
+	node = interval_tree_iter_first(&attr_tree->tree, page, page);
+	if (node)
+		return container_of(node, struct amdgpu_svm_attr_range, it_node);
+
+	return NULL;
+}
+
+struct amdgpu_svm_attr_range *
+amdgpu_svm_attr_get_bounds_locked(struct amdgpu_svm_attr_tree *attr_tree,
+				  unsigned long page,
+				  unsigned long *start_page,
+				  unsigned long *last_page)
+{
+	struct amdgpu_svm_attr_range *attr_range;
+	struct interval_tree_node *node;
+	struct rb_node *rb;
+
+	attr_range = amdgpu_svm_attr_find_locked(attr_tree, page);
+	if (attr_range) {
+		*start_page = amdgpu_svm_attr_start_page(attr_range);
+		*last_page = amdgpu_svm_attr_last_page(attr_range);
+		return attr_range;
+	}
+
+	*start_page = 0;
+	*last_page = ULONG_MAX;
+
+	if (page == ULONG_MAX)
+		return NULL;
+
+	node = interval_tree_iter_first(&attr_tree->tree, page + 1, ULONG_MAX);
+	if (node) {
+		if (node->start > page)
+			*last_page = node->start - 1;
+
+		rb = rb_prev(&node->rb);
+		if (rb) {
+			node = container_of(rb, struct interval_tree_node, rb);
+			if (node->last < page)
+				*start_page = node->last + 1;
+		}
+	} else {
+		rb = rb_last(&attr_tree->tree.rb_root);
+
+		if (rb) {
+			node = container_of(rb, struct interval_tree_node, rb);
+			if (node->last < page)
+				*start_page = node->last + 1;
+		}
+	}
+
+	return NULL;
+}
+
+static bool attr_equal(const struct amdgpu_svm_attrs *a,
+		       const struct amdgpu_svm_attrs *b)
+{
+	return a->flags == b->flags &&
+	       a->preferred_loc == b->preferred_loc &&
+	       a->prefetch_loc == b->prefetch_loc &&
+	       a->granularity == b->granularity &&
+	       a->access == b->access;
+}
+
+static uint32_t
+attr_change_ctx_trigger(const struct amdgpu_svm_attrs *prev_attrs,
+		      const struct amdgpu_svm_attrs *new_attrs)
+{
+	uint32_t trigger = 0;
+	uint32_t changed_flags = prev_attrs->flags ^ new_attrs->flags;
+
+	if (prev_attrs->access != new_attrs->access)
+		trigger |= AMDGPU_SVM_ATTR_TRIGGER_ACCESS_CHANGE;
+	if (changed_flags & AMDGPU_SVM_PTE_FLAG_MASK)
+		trigger |= AMDGPU_SVM_ATTR_TRIGGER_PTE_FLAG_CHANGE;
+	if (changed_flags & AMDGPU_SVM_MAPPING_FLAG_MASK)
+		trigger |= AMDGPU_SVM_ATTR_TRIGGER_MAPPING_FLAG_CHANGE;
+	if (prev_attrs->preferred_loc != new_attrs->preferred_loc)
+		trigger |= AMDGPU_SVM_ATTR_TRIGGER_LOCATION_CHANGE;
+	if (prev_attrs->granularity != new_attrs->granularity)
+		trigger |= AMDGPU_SVM_ATTR_TRIGGER_GRANULARITY_CHANGE;
+	if (new_attrs->prefetch_loc != AMDGPU_SVM_LOCATION_UNDEFINED &&
+	    new_attrs->prefetch_loc != AMDGPU_SVM_LOCATION_SYSMEM)
+		trigger |= AMDGPU_SVM_ATTR_TRIGGER_PREFETCH;
+
+	if (!trigger)
+		trigger = AMDGPU_SVM_ATTR_TRIGGER_ATTR_ONLY;
+
+	return trigger;
+}
+
+static bool attr_has_access(uint32_t nattr,
+					  const struct drm_amdgpu_svm_attribute *attrs)
+{
+	uint32_t i;
+
+	for (i = 0; i < nattr; i++) {
+		switch (attrs[i].type) {
+		case AMDGPU_SVM_ATTR_ACCESS:
+		case AMDGPU_SVM_ATTR_ACCESS_IN_PLACE:
+			return true;
+		}
+	}
+
+	return false;
+}
+
+struct amdgpu_svm_attr_range *
+amdgpu_svm_attr_range_alloc(unsigned long start_page,
+			   unsigned long last_page,
+			   const struct amdgpu_svm_attrs *attrs)
+{
+	struct amdgpu_svm_attr_range *range;
+
+	range = kmem_cache_zalloc(amdgpu_svm_attr_range_cache, GFP_KERNEL);
+	if (!range)
+		return NULL;
+
+	INIT_LIST_HEAD(&range->list);
+	attr_set_interval(range, start_page, last_page);
+	range->attrs = *attrs;
+	return range;
+}
+
+void amdgpu_svm_attr_range_insert_locked(struct amdgpu_svm_attr_tree *attr_tree,
+					 struct amdgpu_svm_attr_range *range)
+{
+	struct interval_tree_node *node;
+	struct amdgpu_svm_attr_range *next;
+
+	lockdep_assert_held(&attr_tree->lock);
+
+	node = interval_tree_iter_first(&attr_tree->tree, amdgpu_svm_attr_start_page(range),
+					ULONG_MAX);
+	if (node) {
+		next = container_of(node, struct amdgpu_svm_attr_range, it_node);
+		list_add_tail(&range->list, &next->list);
+	} else {
+		list_add_tail(&range->list, &attr_tree->range_list);
+	}
+
+	interval_tree_insert(&range->it_node, &attr_tree->tree);
+}
+
+static void attr_remove_range_locked(struct amdgpu_svm_attr_tree *attr_tree,
+					  struct amdgpu_svm_attr_range *range,
+					  bool free_range)
+{
+	lockdep_assert_held(&attr_tree->lock);
+
+	interval_tree_remove(&range->it_node, &attr_tree->tree);
+	list_del_init(&range->list);
+	if (free_range)
+		kmem_cache_free(amdgpu_svm_attr_range_cache, range);
+}
+
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [RFC V3 05/12] drm/amdgpu: implement SVM attribute set, get, and clear
  2026-04-20 13:12 [RFC V3 00/12] drm/amdgpu: SVM implementation based on drm_gpusvm Honglei Huang
                   ` (3 preceding siblings ...)
  2026-04-20 13:12 ` [RFC V3 04/12] drm/amdgpu: implement SVM attribute tree and helper functions Honglei Huang
@ 2026-04-20 13:13 ` Honglei Huang
  2026-04-20 13:13 ` [RFC V3 06/12] drm/amdgpu: define SVM range types and work queue interface Honglei Huang
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 15+ messages in thread
From: Honglei Huang @ 2026-04-20 13:13 UTC (permalink / raw)
  To: Alexander.Deucher, Felix.Kuehling, Christian.Koenig, Oak.Zeng,
	Jenny-Jing.Liu, Philip.Yang, Xiaogang.Chen, Ray.Huang,
	Lingshan.Zhu, Junhua.Shen, matthew.brost, rodrigo.vivi,
	thomas.hellstrom, dakr, aliceryhl
  Cc: amd-gfx, dri-devel, honghuan, Honghuan He

From: Honglei Huang <honghuan@amd.com>

Complete the attribute subsystem with tree modification and query
operations for setting, getting, and clearing per-range attributes.

Change propagation:
- amdgpu_svm_attr_change_ctx_set(): store change context
- amdgpu_svm_attr_apply_change(): dispatch range-level updates
  when attributes change (PTE remap, prefetch, rebuild)

Validation:
- attr_check_preferred_loc/prefetch_loc/access/flags/granularity():
  per-attribute value validation
- amdgpu_svm_attr_set_validate(): validate a UAPI attribute array
- amdgpu_svm_attr_validate_range_vma(): ensure VMAs cover the
  entire target address range

Attribute application:
- amdgpu_svm_attr_apply(): apply one UAPI attribute to an attrs
  struct, accumulating set/clear flag operations
- attr_same_attrs(): fast check if existing range already matches

Tree modification:
- amdgpu_svm_attr_set_hole(): create a new attr range in a gap
- amdgpu_svm_attr_set_existing(): modify or split an existing
  range when attributes partially overlap
- amdgpu_svm_attr_set_range(): walk the tree applying attributes
  to a page interval, handling gaps and overlaps

Lifecycle:
- amdgpu_svm_attr_tree_create(): allocate and initialize tree
- amdgpu_svm_attr_tree_destroy(): drain and free all ranges

Public API:
- amdgpu_svm_attr_set(): validate, lock, walk tree, trigger
  downstream attribute changes
- amdgpu_svm_attr_clear_pages(): remove ranges for a page interval
- amdgpu_svm_attr_get(): query attributes for an address range,
  intersecting results via attr_get_ctx_add/attr_get_ctx_to_result

Signed-off-by: Honghuan He <honghuan.he@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_svm_attr.c | 651 +++++++++++++++++++
 1 file changed, 651 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_svm_attr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_svm_attr.c
index 03ea2f005..ed4d89ecf 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_svm_attr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_svm_attr.c
@@ -299,3 +299,654 @@ static void attr_remove_range_locked(struct amdgpu_svm_attr_tree *attr_tree,
 		kmem_cache_free(amdgpu_svm_attr_range_cache, range);
 }
 
+static void amdgpu_svm_attr_change_ctx_set(
+		struct attr_set_ctx *change,
+		uint32_t trigger,
+		const struct amdgpu_svm_attrs *prev_attrs)
+{
+	change->trigger = trigger;
+	change->prev_attrs = *prev_attrs;
+}
+
+static int amdgpu_svm_attr_apply_change(
+				struct amdgpu_svm *svm,
+				const struct attr_set_ctx *change)
+{
+	int ret;
+
+	lockdep_assert_held_write(&svm->svm_lock);
+
+	if (!change->trigger ||
+	    change->trigger == AMDGPU_SVM_ATTR_TRIGGER_ATTR_ONLY)
+		return 0;
+
+	ret = amdgpu_svm_range_apply_attr_change(svm, change->trigger,
+						 &change->prev_attrs,
+						 change->attr_range);
+	if (ret)
+		AMDGPU_SVM_TRACE("mapping apply failed ret=%d trigger=0x%x\n",
+				 ret, change->trigger);
+
+	return ret;
+}
+
+static inline int attr_check_preferred_loc(uint32_t value)
+{
+	/* Because there is one SVM per GPU, any value other than SYSMEM
+	 * or UNDEFINED selects this GPU, so every location value is
+	 * accepted here.
+	 */
+	return 0;
+}
+
+static inline int attr_check_prefetch_loc(uint32_t value)
+{
+	/* Because there is one SVM per GPU, any value other than SYSMEM means
+	 * the prefetch location is this GPU; kept to stay compatible with KFD.
+	 */
+	if (value == AMDGPU_SVM_LOCATION_SYSMEM)
+		return 0;
+
+	if (value == AMDGPU_SVM_LOCATION_UNDEFINED)
+		return -EINVAL;
+
+	return 0;
+}
+
+static inline int attr_check_access(uint32_t value)
+{
+	if (!value || value == AMDGPU_SVM_LOCATION_UNDEFINED)
+		return -EINVAL;
+
+	return 0;
+}
+
+static inline int attr_check_flags(uint32_t value)
+{
+	if (value & ~AMDGPU_SVM_VALID_FLAG_MASK)
+		return -EINVAL;
+
+	if (value & AMDGPU_SVM_FLAG_GPU_ALWAYS_MAPPED) {
+		AMDGPU_SVM_TRACE("AMDGPU_SVM_FLAG_GPU_ALWAYS_MAPPED is not supported yet\n");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static inline int attr_check_granularity(uint32_t value)
+{
+	return 0;
+}
+
+static int
+amdgpu_svm_attr_validate_range_vma(struct amdgpu_svm_attr_tree *attr_tree,
+				   unsigned long start_page,
+				   unsigned long last_page)
+{
+	struct vm_area_struct *vma;
+	struct mm_struct *mm;
+	unsigned long start, end;
+	int ret = 0;
+
+	if (start_page > last_page)
+		return -EINVAL;
+
+	if (last_page == ULONG_MAX)
+		return -EINVAL;
+
+	start = start_page << PAGE_SHIFT;
+	end = (last_page + 1) << PAGE_SHIFT;
+	mm = attr_tree->svm->gpusvm.mm;
+	if (!mm)
+		return -EFAULT;
+
+	mmap_read_lock(mm);
+	while (start < end) {
+		vma = amdgpu_svm_check_vma(mm, start);
+		if (IS_ERR(vma)) {
+			ret = PTR_ERR(vma);
+			break;
+		}
+
+		start = min(end, vma->vm_end);
+	}
+	mmap_read_unlock(mm);
+
+	return ret;
+}
+
+static int amdgpu_svm_attr_set_validate(const struct drm_amdgpu_svm_attribute *attr)
+{
+	switch (attr->type) {
+	case AMDGPU_SVM_ATTR_PREFERRED_LOC:
+		return attr_check_preferred_loc(attr->value);
+	case AMDGPU_SVM_ATTR_PREFETCH_LOC:
+		return attr_check_prefetch_loc(attr->value);
+	case AMDGPU_SVM_ATTR_ACCESS:
+	case AMDGPU_SVM_ATTR_ACCESS_IN_PLACE:
+	case AMDGPU_SVM_ATTR_NO_ACCESS:
+		return attr_check_access(attr->value);
+	case AMDGPU_SVM_ATTR_SET_FLAGS:
+	case AMDGPU_SVM_ATTR_CLR_FLAGS:
+		return attr_check_flags(attr->value);
+	case AMDGPU_SVM_ATTR_GRANULARITY:
+		return attr_check_granularity(attr->value);
+	default:
+		return -EINVAL;
+	}
+}
+
+static void amdgpu_svm_attr_apply(struct amdgpu_svm_attrs *attrs,
+					uint32_t nattr,
+					const struct drm_amdgpu_svm_attribute *pattrs)
+{
+	const struct drm_amdgpu_svm_attribute *attr;
+
+	for (attr = pattrs; nattr--; attr++) {
+		switch (attr->type) {
+		case AMDGPU_SVM_ATTR_PREFERRED_LOC:
+			attrs->preferred_loc = (int32_t)attr->value;
+			break;
+		case AMDGPU_SVM_ATTR_PREFETCH_LOC:
+			attrs->prefetch_loc = (int32_t)attr->value;
+			break;
+		case AMDGPU_SVM_ATTR_ACCESS:
+			attrs->access = AMDGPU_SVM_ACCESS_ENABLE;
+			break;
+		case AMDGPU_SVM_ATTR_ACCESS_IN_PLACE:
+			attrs->access = AMDGPU_SVM_ACCESS_IN_PLACE;
+			break;
+		case AMDGPU_SVM_ATTR_NO_ACCESS:
+			attrs->access = AMDGPU_SVM_ACCESS_NONE;
+			break;
+		case AMDGPU_SVM_ATTR_SET_FLAGS:
+			attrs->flags |= attr->value;
+			break;
+		case AMDGPU_SVM_ATTR_CLR_FLAGS:
+			attrs->flags &= ~attr->value;
+			break;
+		case AMDGPU_SVM_ATTR_GRANULARITY:
+			attrs->granularity = min_t(uint32_t, attr->value, 0x3f);
+			break;
+		default:
+			break;
+		}
+	}
+}
+
+static bool attr_same_attrs(const struct amdgpu_svm_attr_range *range,
+			    uint32_t nattr,
+			    const struct drm_amdgpu_svm_attribute *attrs)
+{
+	struct amdgpu_svm_attrs target;
+
+	target = range->attrs;
+	amdgpu_svm_attr_apply(&target, nattr, attrs);
+	return attr_equal(&range->attrs, &target);
+}
+
+static int
+amdgpu_svm_attr_set_hole(struct amdgpu_svm_attr_tree *attr_tree,
+			  const struct amdgpu_svm_attrs *default_attrs,
+			  unsigned long start_page, unsigned long last_page,
+			  uint32_t nattr,
+			  const struct drm_amdgpu_svm_attribute *attrs,
+			  struct attr_set_ctx *change)
+{
+	struct amdgpu_svm_attrs new_attrs;
+	struct amdgpu_svm_attr_range *range;
+	uint32_t trigger;
+
+	lockdep_assert_held(&attr_tree->lock);
+
+	if (start_page > last_page)
+		return 0;
+
+	new_attrs = *default_attrs;
+	amdgpu_svm_attr_apply(&new_attrs, nattr, attrs);
+
+	/* Always create a range entry even when attrs equal defaults */
+	range = amdgpu_svm_attr_range_alloc(start_page, last_page, &new_attrs);
+	if (!range)
+		return -ENOMEM;
+
+	amdgpu_svm_attr_range_insert_locked(attr_tree, range);
+
+	trigger = attr_change_ctx_trigger(default_attrs, &new_attrs);
+	amdgpu_svm_attr_change_ctx_set(change, trigger, default_attrs);
+	change->attr_range = range;
+	return 0;
+}
+
+static int
+amdgpu_svm_attr_set_existing(struct amdgpu_svm_attr_tree *attr_tree,
+			     struct amdgpu_svm_attr_range *range,
+			     unsigned long start_page, unsigned long last_page,
+			     uint32_t nattr,
+			     const struct drm_amdgpu_svm_attribute *attrs,
+			     struct attr_set_ctx *change)
+{
+	unsigned long range_start = amdgpu_svm_attr_start_page(range);
+	unsigned long range_last = amdgpu_svm_attr_last_page(range);
+	struct amdgpu_svm_attr_range *left = NULL;
+	struct amdgpu_svm_attr_range *right = NULL;
+	struct amdgpu_svm_attrs old_attrs;
+	struct amdgpu_svm_attrs new_attrs;
+	uint32_t trigger;
+	bool force_trigger;
+
+	lockdep_assert_held(&attr_tree->lock);
+
+	old_attrs = range->attrs;
+
+	force_trigger = !attr_tree->svm->xnack_enabled && attr_has_access(nattr, attrs);
+
+	if (attr_same_attrs(range, nattr, attrs)) {
+		if (!force_trigger)
+			return 0;
+
+		amdgpu_svm_attr_change_ctx_set(change,
+						   AMDGPU_SVM_ATTR_TRIGGER_ACCESS_CHANGE,
+						   &old_attrs);
+		change->attr_range = range;
+		return 0;
+	}
+
+	new_attrs = old_attrs;
+	amdgpu_svm_attr_apply(&new_attrs, nattr, attrs);
+	trigger = attr_change_ctx_trigger(&old_attrs, &new_attrs);
+
+	/* only need to update attr */
+	if (start_page == range_start && last_page == range_last) {
+		range->attrs = new_attrs;
+		amdgpu_svm_attr_change_ctx_set(change, trigger, &old_attrs);
+		change->attr_range = range;
+		return 0;
+	}
+
+	/* split head */
+	if (start_page > range_start) {
+		left = amdgpu_svm_attr_range_alloc(range_start, start_page - 1, &old_attrs);
+		if (!left)
+			return -ENOMEM;
+	}
+
+	/* split tail */
+	if (last_page < range_last) {
+		right = amdgpu_svm_attr_range_alloc(last_page + 1, range_last, &old_attrs);
+		if (!right) {
+			if (left)
+				kmem_cache_free(amdgpu_svm_attr_range_cache, left);
+			return -ENOMEM;
+		}
+	}
+
+	attr_remove_range_locked(attr_tree, range, false);
+	if (left)
+		amdgpu_svm_attr_range_insert_locked(attr_tree, left);
+	attr_set_interval(range, start_page, last_page);
+	range->attrs = new_attrs;
+	amdgpu_svm_attr_range_insert_locked(attr_tree, range);
+	if (right)
+		amdgpu_svm_attr_range_insert_locked(attr_tree, right);
+
+	/* The new attrs cover only part of the old range, so a split
+	 * occurred; flag it so per-range state is rebuilt. */
+	if (left || right)
+		trigger |= AMDGPU_SVM_ATTR_TRIGGER_RANGE_SPLIT;
+
+	amdgpu_svm_attr_change_ctx_set(change, trigger, &old_attrs);
+	change->attr_range = range;
+	return 0;
+}
+
+static int
+amdgpu_svm_attr_set_range(struct amdgpu_svm_attr_tree *attr_tree,
+			  const struct amdgpu_svm_attrs *default_attrs,
+			  unsigned long start_page, unsigned long last_page,
+			  uint32_t nattr,
+			  const struct drm_amdgpu_svm_attribute *attrs)
+{
+	struct amdgpu_svm *svm = attr_tree->svm;
+	unsigned long cursor = start_page;
+	bool need_retry = false;
+
+	while (cursor <= last_page) {
+		struct interval_tree_node *node;
+		unsigned long seg_last;
+		struct attr_set_ctx change = { 0 };
+		int ret;
+
+		mutex_lock(&attr_tree->lock);
+		node = interval_tree_iter_first(&attr_tree->tree, cursor, cursor);
+		if (node) {
+			struct amdgpu_svm_attr_range *range;
+
+			range = container_of(node, struct amdgpu_svm_attr_range, it_node);
+			seg_last = min(last_page, amdgpu_svm_attr_last_page(range));
+			ret = amdgpu_svm_attr_set_existing(attr_tree, range,
+								   cursor, seg_last,
+								   nattr, attrs, &change);
+		} else {
+			struct interval_tree_node *next;
+
+			seg_last = last_page;
+			if (cursor != ULONG_MAX) {
+				next = interval_tree_iter_first(&attr_tree->tree,
+								cursor + 1,
+								ULONG_MAX);
+				if (next) {
+					struct amdgpu_svm_attr_range *next_range;
+
+					next_range = container_of(next,
+						struct amdgpu_svm_attr_range,
+						it_node);
+					seg_last = min(last_page,
+						       amdgpu_svm_attr_start_page(next_range) - 1);
+				}
+			}
+			ret = amdgpu_svm_attr_set_hole(attr_tree,
+							       default_attrs,
+							       cursor, seg_last,
+							       nattr, attrs,
+							       &change);
+		}
+		mutex_unlock(&attr_tree->lock);
+
+		if (ret)
+			return ret;
+
+		down_write(&svm->svm_lock);
+		ret = amdgpu_svm_attr_apply_change(svm, &change);
+		up_write(&svm->svm_lock);
+
+		if (ret == -EAGAIN) {
+			need_retry = true;
+			ret = 0;
+		}
+
+		if (ret)
+			return ret;
+
+		if (seg_last == ULONG_MAX || seg_last == last_page)
+			break;
+
+		cursor = seg_last + 1;
+	}
+
+	return need_retry ? -EAGAIN : 0;
+}
+
+struct amdgpu_svm_attr_tree *
+amdgpu_svm_attr_tree_create(struct amdgpu_svm *svm)
+{
+	struct amdgpu_svm_attr_tree *attr_tree;
+
+	attr_tree = kzalloc(sizeof(*attr_tree), GFP_KERNEL);
+	if (!attr_tree)
+		return NULL;
+
+	mutex_init(&attr_tree->lock);
+	attr_tree->tree = RB_ROOT_CACHED;
+	INIT_LIST_HEAD(&attr_tree->range_list);
+	attr_tree->svm = svm;
+	return attr_tree;
+}
+
+void amdgpu_svm_attr_tree_destroy(struct amdgpu_svm_attr_tree *attr_tree)
+{
+	struct amdgpu_svm_attr_range *range, *tmp;
+
+	if (!attr_tree)
+		return;
+
+	mutex_lock(&attr_tree->lock);
+	list_for_each_entry_safe(range, tmp, &attr_tree->range_list, list) {
+		interval_tree_remove(&range->it_node, &attr_tree->tree);
+		list_del_init(&range->list);
+		kmem_cache_free(amdgpu_svm_attr_range_cache, range);
+	}
+	mutex_unlock(&attr_tree->lock);
+
+	mutex_destroy(&attr_tree->lock);
+	kfree(attr_tree);
+}
+
+int amdgpu_svm_attr_set(struct amdgpu_svm_attr_tree *attr_tree,
+			uint64_t start,
+			uint64_t size,
+			uint32_t nattr,
+			const struct drm_amdgpu_svm_attribute *attrs)
+{
+	struct amdgpu_svm *svm = attr_tree->svm;
+	struct amdgpu_svm_attrs default_attrs;
+	unsigned long start_page, last_page;
+	uint32_t i;
+	int r;
+
+	start_page = start >> PAGE_SHIFT;
+	last_page = (start + size - 1) >> PAGE_SHIFT;
+
+	for (i = 0; i < nattr; i++) {
+		AMDGPU_SVM_TRACE("set attr type %u value 0x%08x for page range [%lx, %lx] xnack:%d",
+				 attrs[i].type, attrs[i].value, start_page, last_page, svm->xnack_enabled ? 1 : 0);
+		r = amdgpu_svm_attr_set_validate(&attrs[i]);
+		if (r) {
+			AMDGPU_SVM_TRACE("invalid attribute %u value 0x%08x", attrs[i].type, attrs[i].value);
+			return r;
+		}
+	}
+
+	r = amdgpu_svm_attr_validate_range_vma(attr_tree, start_page, last_page);
+	if (r)
+		return r;
+
+	amdgpu_svm_attr_set_default(attr_tree->svm, &default_attrs);
+
+retry:
+	r = amdgpu_svm_attr_set_range(attr_tree, &default_attrs,
+					       start_page, last_page,
+					       nattr, attrs);
+	if (r == -EAGAIN) {
+		AMDGPU_SVM_TRACE("attr_set retry [0x%lx-0x%lx]\n",
+				 start_page, last_page);
+		amdgpu_svm_gc_flush(svm);
+		cond_resched();
+		goto retry;
+	}
+
+	return r;
+}
+
+int amdgpu_svm_attr_clear_pages(struct amdgpu_svm_attr_tree *attr_tree,
+				unsigned long start_page,
+				unsigned long last_page)
+{
+	struct interval_tree_node *node;
+	int r = 0;
+
+	if (start_page > last_page)
+		return -EINVAL;
+
+	mutex_lock(&attr_tree->lock);
+
+	node = interval_tree_iter_first(&attr_tree->tree, start_page, last_page);
+	while (node) {
+		struct interval_tree_node *next;
+		struct amdgpu_svm_attr_range *range;
+		unsigned long range_start;
+		unsigned long range_last;
+
+		range = container_of(node, struct amdgpu_svm_attr_range, it_node);
+		next = interval_tree_iter_next(node, start_page, last_page);
+		range_start = amdgpu_svm_attr_start_page(range);
+		range_last = amdgpu_svm_attr_last_page(range);
+
+		if (range_start < start_page && range_last > last_page) {
+			struct amdgpu_svm_attr_range *tail;
+
+			tail = amdgpu_svm_attr_range_alloc(last_page + 1, range_last, &range->attrs);
+			if (!tail) {
+				r = -ENOMEM;
+				break;
+			}
+
+			attr_remove_range_locked(attr_tree, range, false);
+			attr_set_interval(range, range_start, start_page - 1);
+			amdgpu_svm_attr_range_insert_locked(attr_tree, range);
+			amdgpu_svm_attr_range_insert_locked(attr_tree, tail);
+		} else if (range_start < start_page) {
+			attr_remove_range_locked(attr_tree, range, false);
+			attr_set_interval(range, range_start, start_page - 1);
+			amdgpu_svm_attr_range_insert_locked(attr_tree, range);
+		} else if (range_last > last_page) {
+			attr_remove_range_locked(attr_tree, range, false);
+			attr_set_interval(range, last_page + 1, range_last);
+			amdgpu_svm_attr_range_insert_locked(attr_tree, range);
+		} else {
+			attr_remove_range_locked(attr_tree, range, true);
+		}
+
+		node = next;
+	}
+
+	mutex_unlock(&attr_tree->lock);
+	return r;
+}
+
+static void attr_get_ctx_add(struct attr_get_ctx *ctx,
+			       const struct amdgpu_svm_attrs *attrs)
+{
+	if (!ctx->has_range) {
+		ctx->preferred_loc = attrs->preferred_loc;
+		ctx->prefetch_loc = attrs->prefetch_loc;
+		ctx->granularity = attrs->granularity;
+		ctx->access = attrs->access;
+		ctx->flags_and = attrs->flags;
+		ctx->flags_or = attrs->flags;
+		ctx->has_range = true;
+		return;
+	}
+
+	if (ctx->preferred_loc != attrs->preferred_loc)
+		ctx->preferred_loc = AMDGPU_SVM_LOCATION_UNDEFINED;
+	if (ctx->prefetch_loc != attrs->prefetch_loc)
+		ctx->prefetch_loc = AMDGPU_SVM_LOCATION_UNDEFINED;
+	if (attrs->granularity < ctx->granularity)
+		ctx->granularity = attrs->granularity;
+	if (ctx->access != attrs->access)
+		ctx->access = AMDGPU_SVM_ACCESS_NONE;
+	ctx->flags_and &= attrs->flags;
+	ctx->flags_or |= attrs->flags;
+}
+
+static int attr_get_ctx_to_result(const struct attr_get_ctx *ctx,
+				uint32_t nattr,
+				struct drm_amdgpu_svm_attribute *attrs)
+{
+	uint32_t i;
+
+	for (i = 0; i < nattr; i++) {
+		switch (attrs[i].type) {
+		case AMDGPU_SVM_ATTR_PREFERRED_LOC:
+			attrs[i].value = ctx->preferred_loc;
+			break;
+		case AMDGPU_SVM_ATTR_PREFETCH_LOC:
+			attrs[i].value = ctx->prefetch_loc;
+			break;
+		case AMDGPU_SVM_ATTR_ACCESS:
+			if (ctx->access == AMDGPU_SVM_ACCESS_ENABLE)
+				attrs[i].type = AMDGPU_SVM_ATTR_ACCESS;
+			else if (ctx->access == AMDGPU_SVM_ACCESS_IN_PLACE)
+				attrs[i].type = AMDGPU_SVM_ATTR_ACCESS_IN_PLACE;
+			else
+				attrs[i].type = AMDGPU_SVM_ATTR_NO_ACCESS;
+			break;
+		case AMDGPU_SVM_ATTR_SET_FLAGS:
+			attrs[i].value = ctx->flags_and;
+			break;
+		case AMDGPU_SVM_ATTR_CLR_FLAGS:
+			attrs[i].value = ~ctx->flags_or;
+			break;
+		case AMDGPU_SVM_ATTR_GRANULARITY:
+			attrs[i].value = ctx->granularity;
+			break;
+		default:
+			return -EINVAL;
+		}
+	}
+
+	return 0;
+}
+
+int amdgpu_svm_attr_get(struct amdgpu_svm_attr_tree *attr_tree,
+			uint64_t start, uint64_t size,
+			uint32_t nattr,
+			struct drm_amdgpu_svm_attribute *attrs)
+{
+	struct amdgpu_svm_attrs default_attrs;
+	struct attr_get_ctx ctx = { 0 };
+	struct interval_tree_node *node;
+	unsigned long start_page, last_page, cursor;
+	int r;
+
+	start_page = start >> PAGE_SHIFT;
+	last_page = (start + size - 1) >> PAGE_SHIFT;
+
+	mutex_lock(&attr_tree->lock);
+	amdgpu_svm_attr_set_default(attr_tree->svm, &default_attrs);
+	node = interval_tree_iter_first(&attr_tree->tree, start_page, last_page);
+
+	if (!node) {
+		mutex_unlock(&attr_tree->lock);
+		return -EINVAL;
+	}
+
+	cursor = start_page;
+	while (cursor <= last_page) {
+		const struct amdgpu_svm_attrs *range_attrs;
+		unsigned long range_last = last_page;
+		struct amdgpu_svm_attr_range *range = NULL;
+		unsigned long next;
+
+		if (node) {
+			range = container_of(node, struct amdgpu_svm_attr_range,
+					     it_node);
+
+			if (amdgpu_svm_attr_last_page(range) < cursor) {
+				node = interval_tree_iter_next(node, start_page,
+							      last_page);
+				continue;
+			}
+
+			if (amdgpu_svm_attr_start_page(range) <= cursor) {
+				range_last = min(last_page, amdgpu_svm_attr_last_page(range));
+				node = interval_tree_iter_next(node, start_page,
+							      last_page);
+			} else {
+				range_last = min(last_page,
+						 amdgpu_svm_attr_start_page(range) - 1);
+				range = NULL;
+			}
+		}
+
+		range_attrs = range ? &range->attrs : &default_attrs;
+		attr_get_ctx_add(&ctx, range_attrs);
+
+		if (range_last == ULONG_MAX)
+			break;
+
+		next = range_last + 1;
+		if (next <= cursor)
+			break;
+		cursor = next;
+	}
+
+	if (!ctx.has_range)
+		attr_get_ctx_add(&ctx, &default_attrs);
+
+	r = attr_get_ctx_to_result(&ctx, nattr, attrs);
+	mutex_unlock(&attr_tree->lock);
+	return r;
+}
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [RFC V3 06/12] drm/amdgpu: define SVM range types and work queue interface
  2026-04-20 13:12 [RFC V3 00/12] drm/amdgpu: SVM implementation based on drm_gpusvm Honglei Huang
                   ` (4 preceding siblings ...)
  2026-04-20 13:13 ` [RFC V3 05/12] drm/amdgpu: implement SVM attribute set, get, and clear Honglei Huang
@ 2026-04-20 13:13 ` Honglei Huang
  2026-04-20 13:13 ` [RFC V3 07/12] drm/amdgpu: implement SVM range GPU mapping core Honglei Huang
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 15+ messages in thread
From: Honglei Huang @ 2026-04-20 13:13 UTC (permalink / raw)
  To: Alexander.Deucher, Felix.Kuehling, Christian.Koenig, Oak.Zeng,
	Jenny-Jing.Liu, Philip.Yang, Xiaogang.Chen, Ray.Huang,
	Lingshan.Zhu, Junhua.Shen, matthew.brost, rodrigo.vivi,
	thomas.hellstrom, dakr, aliceryhl
  Cc: amd-gfx, dri-devel, honghuan, Honghuan He

From: Honglei Huang <honghuan@amd.com>

Add amdgpu_svm_range.h defining the GPU-mapped range types and
deferred work interface for the SVM subsystem.

struct amdgpu_svm_range extends drm_gpusvm_range with:
- work_node: list linkage for deferred work processing
- gpu_mapped: active GPU mapping state flag
- gc_queued / in_queue: garbage collection and work queue state
- pending_ops / pending_start_page / pending_last_page: batched
  deferred operation tracking with address bounds
- pte_flags: cached GPU PTE flags for the current mapping
- attr_flags: cached attribute flags
- validate_timestamp: last successful mapping time

Add the AMDGPU_SVM_RANGE_DEBUG macro for formatted debug output,
including the PASID, GPU virtual address range, and mapping state.

enum amdgpu_svm_range_op / struct amdgpu_svm_range_op_ctx:
define deferred operation types (AMDGPU_SVM_RANGE_OP_UNMAP) and
their parameters.

Helper macros: UNMAP_WORK(), XNACK_OFF(), NEED_REBUILD().

Declare the range API: amdgpu_svm_range_attr_pte_flags(),
amdgpu_svm_range_lock_vm_pd(), amdgpu_svm_range_pages_valid(),
amdgpu_svm_range_is_valid(), amdgpu_svm_range_update_gpu_range(),
amdgpu_svm_range_update_mapping(), amdgpu_svm_range_find_or_insert(),
amdgpu_svm_range_get_pages(), amdgpu_svm_range_remove(),
amdgpu_svm_range_map_interval(),
amdgpu_svm_range_apply_attr_change(),
amdgpu_svm_range_invalidate(),
amdgpu_svm_range_dequeue_locked(),
amdgpu_svm_range_put_if_dequeued(),
amdgpu_svm_capture_checkpoint_ts().
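
As an aside for reviewers, the widen-the-window bookkeeping implied by
pending_ops / pending_start_page / pending_last_page can be sketched in
plain userspace C. Everything below (the struct, OP_UNMAP, and the
initial ULONG_MAX/0 values) is a stand-in for the kernel definitions,
not the real ones:

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

#define OP_UNMAP (1u << 0)	/* stand-in for AMDGPU_SVM_RANGE_OP_UNMAP */

struct pending {
	uint8_t ops;			/* batched operation bits */
	unsigned long start_page;	/* assumed to start at ULONG_MAX */
	unsigned long last_page;	/* assumed to start at 0 */
};

/* Merge one deferred UNMAP request into the batch, widening the
 * inclusive [start_page, last_page] window to cover every request
 * queued so far. */
static void pending_merge(struct pending *p, unsigned long start,
			  unsigned long last)
{
	p->ops |= OP_UNMAP;
	if (start < p->start_page)
		p->start_page = start;
	if (last > p->last_page)
		p->last_page = last;
}
```

Two overlapping enqueues then collapse into a single UNMAP over the
union of their page bounds, which is what lets the GC worker process
one range entry per batch.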

Signed-off-by: Honghuan He <honghuan.he@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_svm_range.h | 148 ++++++++++++++++++
 1 file changed, 148 insertions(+)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_svm_range.h

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_svm_range.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_svm_range.h
new file mode 100644
index 000000000..a32b806a7
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_svm_range.h
@@ -0,0 +1,148 @@
+/* SPDX-License-Identifier: GPL-2.0 OR MIT */
+/*
+ * Copyright 2026 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#ifndef __AMDGPU_SVM_RANGE_H__
+#define __AMDGPU_SVM_RANGE_H__
+
+#include <drm/drm_gpusvm.h>
+
+#include "amdgpu_svm.h"
+#include "amdgpu_vm.h"
+
+#include <linux/ktime.h>
+#include <linux/list.h>
+#include <linux/types.h>
+
+struct amdgpu_svm;
+struct amdgpu_svm_attr_range;
+struct amdgpu_svm_attrs;
+struct dma_fence;
+struct drm_exec;
+struct drm_gpusvm_notifier;
+struct drm_gpusvm_range;
+struct mmu_notifier_range;
+
+struct amdgpu_svm_range {
+	struct drm_gpusvm_range base;
+	struct list_head work_node;
+	bool gpu_mapped;
+	bool gc_queued;
+	bool in_queue;
+	u8 pending_ops;
+	unsigned long pending_start_page;
+	unsigned long pending_last_page;
+	uint64_t pte_flags;
+	uint32_t attr_flags;
+	ktime_t validate_timestamp;
+};
+
+static inline struct amdgpu_svm_range *
+to_amdgpu_svm_range(struct drm_gpusvm_range *range)
+{
+	return container_of(range, struct amdgpu_svm_range, base);
+}
+
+#define AMDGPU_SVM_RANGE_DEBUG(r__, op__)						\
+	AMDGPU_SVM_TRACE("%s: pasid=%u, gpusvm=%p, mapped=%d, "	\
+			 "seqno=%lu, range: [0x%lx-0x%lx)-"		\
+			 "%lu\n",					\
+			 (op__),					\
+			 to_amdgpu_svm((r__)->base.gpusvm)->vm->pasid,	\
+			 (r__)->base.gpusvm,				\
+			 READ_ONCE((r__)->gpu_mapped),			\
+			 (r__)->base.pages.notifier_seq,		\
+			 drm_gpusvm_range_start(&(r__)->base),		\
+			 drm_gpusvm_range_end(&(r__)->base),		\
+			 drm_gpusvm_range_end(&(r__)->base) -		\
+			 drm_gpusvm_range_start(&(r__)->base))
+
+enum amdgpu_svm_range_op {
+	AMDGPU_SVM_RANGE_OP_NONE    = 0,
+	AMDGPU_SVM_RANGE_OP_UNMAP   = BIT(0),
+};
+
+struct amdgpu_svm_range_op_ctx {
+	struct amdgpu_svm_range *range;
+	unsigned long start_page;
+	unsigned long last_page;
+	uint8_t pending_ops;
+};
+
+#define UNMAP_WORK(ops)		((ops) & AMDGPU_SVM_RANGE_OP_UNMAP)
+#define XNACK_OFF(svm)		((svm)->xnack_enabled == false)
+#define NEED_REBUILD(svm)	(XNACK_OFF(svm))
+
+void amdgpu_svm_capture_checkpoint_ts(struct amdgpu_svm *svm);
+
+uint64_t amdgpu_svm_range_attr_pte_flags(struct amdgpu_svm *svm,
+					 const struct amdgpu_svm_attrs *attrs,
+					 bool read_only);
+int amdgpu_svm_range_lock_vm_pd(struct amdgpu_svm *svm,
+				struct drm_exec *exec,
+				bool intr);
+bool amdgpu_svm_range_pages_valid(struct amdgpu_svm *svm,
+				  struct amdgpu_svm_range *range);
+bool amdgpu_svm_range_is_valid(struct amdgpu_svm *svm,
+			       struct amdgpu_svm_range *range,
+			       const struct amdgpu_svm_attrs *attrs,
+			       uint64_t pte_flags);
+int amdgpu_svm_range_update_gpu_range(struct amdgpu_svm *svm,
+				      struct amdgpu_svm_range *range,
+				      uint64_t pte_flags,
+				      bool flush_tlb, bool wait,
+				      struct dma_fence **fence);
+int amdgpu_svm_range_update_mapping(struct amdgpu_svm *svm,
+				    struct amdgpu_svm_range *range,
+				    uint64_t pte_flags,
+				    uint32_t attrs_flags,
+				    bool intr, bool wait,
+				    bool flush_tlb);
+bool amdgpu_svm_range_dequeue_locked(struct amdgpu_svm *svm,
+				     struct list_head *work_list,
+				     struct amdgpu_svm_range_op_ctx *op_ctx);
+void amdgpu_svm_range_put_if_dequeued(struct amdgpu_svm *svm,
+				      struct amdgpu_svm_range *range);
+void amdgpu_svm_range_remove(struct amdgpu_svm *svm,
+			     struct amdgpu_svm_range *range,
+			     struct drm_gpusvm_ctx *ctx);
+
+int amdgpu_svm_range_map_interval(struct amdgpu_svm *svm,
+				  unsigned long start_page,
+				  unsigned long last_page);
+int amdgpu_svm_range_apply_attr_change(
+	struct amdgpu_svm *svm, uint32_t trigger,
+	const struct amdgpu_svm_attrs *prev_attrs,
+	struct amdgpu_svm_attr_range *attr_range);
+void amdgpu_svm_range_invalidate(struct amdgpu_svm *svm,
+				 struct drm_gpusvm_notifier *notifier,
+				 const struct mmu_notifier_range *mmu_range);
+struct amdgpu_svm_range *
+amdgpu_svm_range_find_or_insert(struct amdgpu_svm *svm, unsigned long addr,
+				const struct amdgpu_svm_attr_range *attr_range,
+				struct drm_gpusvm_ctx *ctx);
+int amdgpu_svm_range_get_pages(struct amdgpu_svm *svm,
+			       struct drm_gpusvm_range *range,
+			       struct drm_gpusvm_ctx *ctx);
+
+#endif /* __AMDGPU_SVM_RANGE_H__ */
-- 
2.34.1



* [RFC V3 07/12] drm/amdgpu: implement SVM range GPU mapping core
  2026-04-20 13:12 [RFC V3 00/12] drm/amdgpu: SVM implementation based on drm_gpusvm Honglei Huang
                   ` (5 preceding siblings ...)
  2026-04-20 13:13 ` [RFC V3 06/12] drm/amdgpu: define SVM range types and work queue interface Honglei Huang
@ 2026-04-20 13:13 ` Honglei Huang
  2026-04-20 13:13 ` [RFC V3 08/12] drm/amdgpu: implement SVM range notifier and GC helpers Honglei Huang
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 15+ messages in thread
From: Honglei Huang @ 2026-04-20 13:13 UTC (permalink / raw)
  To: Alexander.Deucher, Felix.Kuehling, Christian.Koenig, Oak.Zeng,
	Jenny-Jing.Liu, Philip.Yang, Xiaogang.Chen, Ray.Huang,
	Lingshan.Zhu, Junhua.Shen, matthew.brost, rodrigo.vivi,
	thomas.hellstrom, dakr, aliceryhl
  Cc: amd-gfx, dri-devel, honghuan, Honghuan He

From: Honglei Huang <honghuan@amd.com>

Implement the GPU page table mapping core for SVM ranges in
amdgpu_svm_range.c.

Internal helpers:
- range_has_access(): check amdgpu_svm_attr_access mode
- range_invalidate_gpu_mapping(): clear gpu_mapped flag
- amdgpu_svm_range_pages_valid(): validate page notifier seq
- amdgpu_svm_range_is_valid(): check PTE flags, attr flags,
  and GPU mapping state match expected values

PTE management:
- amdgpu_svm_range_zap_ptes(): clear GPU PTEs via
  amdgpu_vm_clear_range() with TLB flush
- amdgpu_svm_range_attr_pte_flags(): compute GPUVM PTE flags
  from SVM attributes, selecting MTYPE coherency mode
  (UC/NC/CC/RW) based on GC IP version (9.4.x, 11.x, 12.x),
  handling SNOOP, read-only, and executable bits
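
The access-bit tail of that computation can be sketched independently of
the MTYPE selection. All bit values and names below are made-up
stand-ins; the real ones come from the amdgpu headers, and the real code
additionally routes the executable bit through mapping_flags and
amdgpu_gmc_get_vm_pte():

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the UAPI attribute flags and PTE bits. */
#define SVM_FLAG_GPU_RO   (1u << 0)
#define SVM_FLAG_GPU_EXEC (1u << 1)
#define PTE_VALID      (1ull << 0)
#define PTE_SYSTEM     (1ull << 1)
#define PTE_SNOOPED    (1ull << 2)
#define PTE_READABLE   (1ull << 5)
#define PTE_WRITEABLE  (1ull << 6)
#define PTE_EXECUTABLE (1ull << 9)

/* Mirrors the tail of amdgpu_svm_range_attr_pte_flags(): always
 * readable, writeable unless the attributes or the fault context force
 * read-only, executable only on request. */
static uint64_t access_pte_flags(uint32_t attr_flags, int read_only)
{
	uint64_t pte = PTE_VALID | PTE_SYSTEM | PTE_SNOOPED | PTE_READABLE;

	if (!(attr_flags & SVM_FLAG_GPU_RO) && !read_only)
		pte |= PTE_WRITEABLE;
	if (attr_flags & SVM_FLAG_GPU_EXEC)
		pte |= PTE_EXECUTABLE;
	return pte;
}
```

Note how read-only can come from two directions: the user-set GPU_RO
attribute, or the retry path that falls back to a read-only mapping on
-EPERM.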

GPU mapping pipeline:
- amdgpu_svm_range_lock_vm_pd(): lock VM page directory via
  drm_exec
- amdgpu_svm_range_update_gpu_range(): walk DMA address array,
  coalescing contiguous entries per segment, call
  amdgpu_vm_update_range() with optional TLB flush and fence
  on the last segment
- amdgpu_svm_range_find_or_insert(): find or create a gpusvm
  range using attribute tree bounds, retry with read_only on
  -EPERM
- amdgpu_svm_range_get_pages(): get pages via drm_gpusvm with
  eviction fallback on -EOPNOTSUPP
- amdgpu_svm_range_update_mapping(): full pipeline: lock PD,
  take notifier lock, validate pages, update GPU PTEs, update
  PDE, flush TLB, store pte_flags/attr_flags/timestamp

Mapping drivers:
- amdgpu_svm_range_map_attr_range(): map all gpusvm ranges
  within a single attribute range
- amdgpu_svm_range_map_interval(): walk the attribute tree and
  invoke map_attr_range for each overlapping attribute range
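
The per-segment coalescing in amdgpu_svm_range_update_gpu_range() can be
illustrated with a userspace sketch that only counts how many
amdgpu_vm_update_range() calls the walk would issue. struct seg_entry is
a stand-in for drm_pagemap_addr, where 'order' encodes a run of
1 << order physically contiguous pages:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct seg_entry {
	uint64_t addr;		/* DMA address of the run */
	unsigned int order;	/* run covers 1 << order pages */
};

/* Walk the DMA address array the way the patch describes: each entry
 * contributes min(1 << order, remaining) pages as one segment, and the
 * caller issues one page-table update per segment. Returns the number
 * of segments the walk produces for npages total pages. The caller is
 * assumed to pass an array that covers at least npages pages. */
static unsigned int count_update_calls(const struct seg_entry *entries,
				       unsigned long npages)
{
	unsigned long mapped = 0;
	size_t idx = 0;
	unsigned int calls = 0;

	while (mapped < npages) {
		unsigned long seg = 1UL << entries[idx++].order;

		if (seg > npages - mapped)
			seg = npages - mapped;
		mapped += seg;
		calls++;
	}
	return calls;
}
```

A 2 MiB huge-page entry (order 9) followed by two 4 KiB entries thus
becomes three updates instead of 514, which is the point of coalescing
before calling into the GPUVM layer.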

Signed-off-by: Honghuan He <honghuan.he@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_svm_range.c | 489 ++++++++++++++++++
 1 file changed, 489 insertions(+)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_svm_range.c

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_svm_range.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_svm_range.c
new file mode 100644
index 000000000..790935914
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_svm_range.c
@@ -0,0 +1,489 @@
+// SPDX-License-Identifier: GPL-2.0 OR MIT
+/*
+ * Copyright 2026 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#include "amdgpu_svm.h"
+#include "amdgpu_svm_attr.h"
+#include "amdgpu_svm_range.h"
+#include "amdgpu_svm_fault.h"
+#include "amdgpu.h"
+#include "amdgpu_vm.h"
+
+#include <drm/drm_exec.h>
+#include <drm/drm_pagemap.h>
+
+#include <linux/mmu_notifier.h>
+#include <uapi/linux/kfd_ioctl.h>
+
+static inline bool
+range_has_access(enum amdgpu_svm_attr_access access)
+{
+	return access == AMDGPU_SVM_ACCESS_ENABLE ||
+	       access == AMDGPU_SVM_ACCESS_IN_PLACE;
+}
+
+static void
+range_invalidate_gpu_mapping(struct amdgpu_svm_range *range)
+{
+	WRITE_ONCE(range->gpu_mapped, false);
+}
+
+bool
+amdgpu_svm_range_pages_valid(struct amdgpu_svm *svm,
+			     struct amdgpu_svm_range *range)
+{
+	struct drm_gpusvm_range *base = &range->base;
+
+	lockdep_assert_held(&svm->gpusvm.notifier_lock);
+
+	if (base->pages.flags.unmapped || base->pages.flags.partial_unmap)
+		return false;
+
+	return drm_gpusvm_range_pages_valid(&svm->gpusvm, base);
+}
+
+bool amdgpu_svm_range_is_valid(struct amdgpu_svm *svm,
+			       struct amdgpu_svm_range *range,
+			       const struct amdgpu_svm_attrs *attrs,
+			       uint64_t pte_flags)
+{
+	unsigned int flags;
+	bool valid;
+
+	flags = memalloc_noreclaim_save();
+	drm_gpusvm_notifier_lock(&svm->gpusvm);
+	valid = range->gpu_mapped &&
+		range->pte_flags == pte_flags &&
+		range->attr_flags == attrs->flags &&
+		amdgpu_svm_range_pages_valid(svm, range);
+	drm_gpusvm_notifier_unlock(&svm->gpusvm);
+	memalloc_noreclaim_restore(flags);
+
+	return valid;
+}
+
+
+static int
+amdgpu_svm_range_zap_ptes(struct amdgpu_svm *svm,
+			  struct amdgpu_svm_range *range,
+			  const struct mmu_notifier_range *mmu_range)
+{
+	struct drm_gpusvm_range *base = &range->base;
+	struct dma_fence *fence = NULL;
+	unsigned long start_page = max(drm_gpusvm_range_start(base),
+				       mmu_range->start) >> PAGE_SHIFT;
+	unsigned long last_page = (min(drm_gpusvm_range_end(base),
+				       mmu_range->end) >> PAGE_SHIFT) - 1;
+	unsigned int flags;
+	int ret;
+
+	if (last_page < start_page)
+		return 0;
+
+	flags = memalloc_noreclaim_save();
+	ret = amdgpu_vm_update_range(svm->adev, svm->vm, false, true, true, false,
+				     NULL, start_page, last_page, 0, 0, 0, NULL,
+				     NULL, &fence);
+	memalloc_noreclaim_restore(flags);
+
+	if (!ret && fence) {
+		ret = dma_fence_wait(fence, false);
+		if (ret < 0)
+			AMDGPU_SVM_TRACE("notifier unmap fence wait failed: ret=%d [0x%lx-0x%lx]-0x%lx\n",
+					 ret, start_page, last_page,
+					 last_page - start_page + 1);
+	}
+
+	dma_fence_put(fence);
+	return ret;
+}
+
+uint64_t
+amdgpu_svm_range_attr_pte_flags(struct amdgpu_svm *svm,
+				const struct amdgpu_svm_attrs *attrs,
+				bool read_only)
+{
+	/* Compute GPUVM PTE flags from the SVM attributes. */
+	uint32_t gc_ip_version = amdgpu_ip_version(svm->adev, GC_HWIP, 0);
+	uint32_t flags = attrs->flags;
+	uint32_t mapping_flags = 0;
+	uint64_t pte_flags;
+	bool coherent = flags & (AMDGPU_SVM_FLAG_COHERENT |
+				 AMDGPU_SVM_FLAG_EXT_COHERENT);
+	bool ext_coherent = flags & AMDGPU_SVM_FLAG_EXT_COHERENT;
+	bool snoop = true;
+	unsigned int mtype_local;
+
+	switch (gc_ip_version) {
+	case IP_VERSION(9, 4, 1):
+	case IP_VERSION(9, 4, 2):
+		mapping_flags |= coherent ?
+			AMDGPU_VM_MTYPE_UC : AMDGPU_VM_MTYPE_NC;
+		break;
+	case IP_VERSION(9, 4, 3):
+	case IP_VERSION(9, 4, 4):
+	case IP_VERSION(9, 5, 0):
+		if (ext_coherent)
+			mtype_local = AMDGPU_VM_MTYPE_CC;
+		else
+			mtype_local = amdgpu_mtype_local == 1 ? AMDGPU_VM_MTYPE_NC :
+				amdgpu_mtype_local == 2 ? AMDGPU_VM_MTYPE_CC :
+				AMDGPU_VM_MTYPE_RW;
+		if (svm->adev->flags & AMD_IS_APU) {
+			if (num_possible_nodes() <= 1)
+				mapping_flags |= mtype_local;
+			else
+				mapping_flags |= ext_coherent ?
+					AMDGPU_VM_MTYPE_UC : AMDGPU_VM_MTYPE_NC;
+		} else {
+			if (gc_ip_version < IP_VERSION(9, 5, 0) || ext_coherent)
+				mapping_flags |= AMDGPU_VM_MTYPE_UC;
+			else
+				mapping_flags |= AMDGPU_VM_MTYPE_NC;
+		}
+		break;
+	case IP_VERSION(11, 0, 0):
+	case IP_VERSION(11, 0, 1):
+	case IP_VERSION(11, 0, 2):
+	case IP_VERSION(11, 0, 3):
+	case IP_VERSION(11, 0, 4):
+	case IP_VERSION(11, 5, 0):
+	case IP_VERSION(11, 5, 1):
+	case IP_VERSION(11, 5, 2):
+	case IP_VERSION(11, 5, 3):
+		mapping_flags |= coherent ?
+			AMDGPU_VM_MTYPE_UC : AMDGPU_VM_MTYPE_NC;
+		break;
+	case IP_VERSION(12, 0, 0):
+	case IP_VERSION(12, 0, 1):
+		mapping_flags |= AMDGPU_VM_MTYPE_NC;
+		break;
+	default:
+		mapping_flags |= coherent ?
+			AMDGPU_VM_MTYPE_UC : AMDGPU_VM_MTYPE_NC;
+		break;
+	}
+
+	if (flags & AMDGPU_SVM_FLAG_GPU_EXEC)
+		mapping_flags |= AMDGPU_VM_PAGE_EXECUTABLE;
+
+	pte_flags = AMDGPU_PTE_VALID | AMDGPU_PTE_SYSTEM;
+	pte_flags |= snoop ? AMDGPU_PTE_SNOOPED : 0;
+	if (gc_ip_version >= IP_VERSION(12, 0, 0))
+		pte_flags |= AMDGPU_PTE_IS_PTE;
+
+	amdgpu_gmc_get_vm_pte(svm->adev, svm->vm, NULL, mapping_flags, &pte_flags);
+	pte_flags |= AMDGPU_PTE_READABLE;
+	if (!(flags & AMDGPU_SVM_FLAG_GPU_RO) && !read_only)
+		pte_flags |= AMDGPU_PTE_WRITEABLE;
+
+	return pte_flags;
+}
+
+
+
+int amdgpu_svm_range_lock_vm_pd(struct amdgpu_svm *svm, struct drm_exec *exec,
+				bool intr)
+{
+	unsigned int exec_flags = DRM_EXEC_IGNORE_DUPLICATES;
+	int ret;
+
+	if (intr)
+		exec_flags |= DRM_EXEC_INTERRUPTIBLE_WAIT;
+
+	drm_exec_init(exec, exec_flags, 0);
+	drm_exec_until_all_locked(exec) {
+		ret = amdgpu_vm_lock_pd(svm->vm, exec, 1);
+		drm_exec_retry_on_contention(exec);
+		if (ret) {
+			drm_exec_fini(exec);
+			return ret;
+		}
+	}
+
+	return 0;
+}
+
+int
+amdgpu_svm_range_update_gpu_range(struct amdgpu_svm *svm,
+				  struct amdgpu_svm_range *range,
+				  uint64_t pte_flags,
+				  bool flush_tlb,
+				  bool wait_fence,
+				  struct dma_fence **fence)
+{
+	struct drm_gpusvm_range *base = &range->base;
+	const unsigned long range_start_page = drm_gpusvm_range_start(base) >> PAGE_SHIFT;
+	const unsigned long range_end_page = drm_gpusvm_range_end(base) >> PAGE_SHIFT;
+	const unsigned long npages = range_end_page - range_start_page;
+	unsigned long mapped_pages = 0;
+	unsigned long dma_idx = 0;
+	int ret;
+
+	/* Caller must hold the notifier lock across the PT update. */
+	lockdep_assert_held(&svm->gpusvm.notifier_lock);
+
+	if (!base->pages.dma_addr || !npages)
+		return -EINVAL;
+
+	while (mapped_pages < npages) {
+		const struct drm_pagemap_addr *entry = &base->pages.dma_addr[dma_idx++];
+		unsigned long seg_pages = min_t(unsigned long, 1UL << entry->order,
+						npages - mapped_pages);
+		unsigned long start_page, last_page;
+		bool is_last_seg;
+
+		if (entry->proto != DRM_INTERCONNECT_SYSTEM)
+			return -EOPNOTSUPP;
+
+		start_page = range_start_page + mapped_pages;
+		last_page = start_page + seg_pages - 1;
+		mapped_pages += seg_pages;
+		is_last_seg = mapped_pages == npages;
+
+		ret = amdgpu_vm_update_range(svm->adev, svm->vm, false, false,
+					     flush_tlb && is_last_seg, true, NULL,
+					     start_page, last_page, pte_flags,
+					     0, entry->addr, NULL, NULL,
+					     wait_fence && is_last_seg ? fence : NULL);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
+struct amdgpu_svm_range *
+amdgpu_svm_range_find_or_insert(struct amdgpu_svm *svm, unsigned long addr,
+				const struct amdgpu_svm_attr_range *attr_range,
+				struct drm_gpusvm_ctx *ctx)
+{
+	unsigned long gpuva_start = amdgpu_svm_attr_start_page(attr_range) << PAGE_SHIFT;
+	unsigned long gpuva_end = (amdgpu_svm_attr_last_page(attr_range) + 1) << PAGE_SHIFT;
+	struct drm_gpusvm_range *r;
+
+retry:
+	r = drm_gpusvm_range_find_or_insert(&svm->gpusvm, addr,
+					    gpuva_start, gpuva_end, ctx);
+
+	if (PTR_ERR_OR_ZERO(r) == -EPERM && !ctx->read_only) {
+		ctx->read_only = true;
+		goto retry;
+	}
+
+	if (IS_ERR(r))
+		return ERR_CAST(r);
+
+	return to_amdgpu_svm_range(r);
+}
+
+int amdgpu_svm_range_get_pages(struct amdgpu_svm *svm,
+			       struct drm_gpusvm_range *range,
+			       struct drm_gpusvm_ctx *ctx)
+{
+	int ret;
+
+	ret = drm_gpusvm_range_get_pages(&svm->gpusvm, range, ctx);
+	if (ret == -EOPNOTSUPP) {
+		AMDGPU_SVM_ERR("range get pages failed with -EOPNOTSUPP, evicting range and retrying: gpuva=[0x%lx-0x%lx) ret=%d\n",
+				drm_gpusvm_range_start(range),
+				drm_gpusvm_range_end(range), ret);
+		drm_gpusvm_range_evict(&svm->gpusvm, range);
+	}
+
+	return ret;
+}
+
+int amdgpu_svm_range_update_mapping(struct amdgpu_svm *svm,
+				    struct amdgpu_svm_range *range,
+				    uint64_t pte_flags,
+				    uint32_t attrs_flags,
+				    bool intr, bool wait,
+				    bool flush_tlb)
+{
+	struct drm_exec exec;
+	struct dma_fence *fence = NULL;
+	unsigned int flags;
+	int ret;
+
+	ret = amdgpu_svm_range_lock_vm_pd(svm, &exec, intr);
+	if (ret)
+		return ret;
+
+	flags = memalloc_noreclaim_save();
+	drm_gpusvm_notifier_lock(&svm->gpusvm);
+
+	if (!amdgpu_svm_range_pages_valid(svm, range)) {
+		range_invalidate_gpu_mapping(range);
+		ret = -EAGAIN;
+	} else {
+		ret = amdgpu_svm_range_update_gpu_range(svm, range, pte_flags,
+							flush_tlb, wait,
+							wait ? &fence : NULL);
+	}
+
+	drm_gpusvm_notifier_unlock(&svm->gpusvm);
+	memalloc_noreclaim_restore(flags);
+
+	if (!ret && fence)
+		dma_fence_wait(fence, intr);
+	dma_fence_put(fence);
+
+	if (!ret)
+		ret = amdgpu_vm_update_pdes(svm->adev, svm->vm, false);
+
+	if (!ret) {
+		if (flush_tlb)
+			svm->flush_tlb(svm);
+		WRITE_ONCE(range->pte_flags, pte_flags);
+		WRITE_ONCE(range->attr_flags, attrs_flags);
+		WRITE_ONCE(range->gpu_mapped, true);
+		range->validate_timestamp = ktime_get_boottime();
+	}
+
+	drm_exec_fini(&exec);
+	return ret;
+}
+
+static int
+amdgpu_svm_range_map_attr_range(struct amdgpu_svm *svm,
+				const struct amdgpu_svm_attr_range *attr_range)
+{
+	const struct amdgpu_svm_attrs *attrs = &attr_range->attrs;
+	unsigned long start = amdgpu_svm_attr_start_page(attr_range) << PAGE_SHIFT;
+	unsigned long end = (amdgpu_svm_attr_last_page(attr_range) + 1) << PAGE_SHIFT;
+	unsigned long addr = start;
+	int ret;
+	/* TODO: add migration; amdgpu_svm_attr_devmem_possible() decides once enabled */
+	bool devmem_possible = false;
+	bool need_vram_migration = amdgpu_svm_attr_prefer_vram(svm, attrs);
+	struct drm_gpusvm_ctx map_ctx = {
+		.read_only = !!(attrs->flags & AMDGPU_SVM_FLAG_GPU_RO),
+		.devmem_possible = devmem_possible,
+		.devmem_only = need_vram_migration && devmem_possible,
+		.check_pages_threshold = devmem_possible ? SZ_64K : 0,
+	};
+
+	while (addr < end) {
+		struct amdgpu_svm_range *range;
+		unsigned long next_addr;
+		uint64_t range_pte_flags;
+
+		range = amdgpu_svm_range_find_or_insert(svm, addr, attr_range, &map_ctx);
+		if (IS_ERR(range)) {
+			AMDGPU_SVM_ERR("failed to find or insert range for gpuva 0x%lx [0x%lx-0x%lx), ret=%ld\n",
+					addr, start, end, PTR_ERR(range));
+			return PTR_ERR(range);
+		}
+
+		next_addr = drm_gpusvm_range_end(&range->base);
+		if (next_addr <= addr)
+			return -EINVAL;
+
+		range_pte_flags = amdgpu_svm_range_attr_pte_flags(svm, attrs,
+								  map_ctx.read_only);
+
+		if (amdgpu_svm_range_is_valid(svm, range, attrs,
+					      range_pte_flags)) {
+			addr = next_addr;
+			continue;
+		}
+
+		/* TODO: add migration */
+
+		AMDGPU_SVM_RANGE_DEBUG(range, "PREFETCH - GET PAGES");
+
+		ret = amdgpu_svm_range_get_pages(svm, &range->base,
+						 &map_ctx);
+		if (ret) {
+			AMDGPU_SVM_ERR("failed to get pages for range [0x%lx-0x%lx), ret=%d\n",
+					drm_gpusvm_range_start(&range->base),
+					drm_gpusvm_range_end(&range->base), ret);
+			return ret;
+		}
+
+		AMDGPU_SVM_RANGE_DEBUG(range, "PREFETCH - UPDATE MAPPING");
+
+		ret = amdgpu_svm_range_update_mapping(svm, range,
+						      range_pte_flags,
+						      attrs->flags,
+						      true, true,
+						      true);
+		if (ret) {
+			AMDGPU_SVM_ERR("failed to update gpu mapping for range [0x%lx-0x%lx), ret=%d\n",
+					drm_gpusvm_range_start(&range->base),
+					drm_gpusvm_range_end(&range->base), ret);
+			return ret;
+		}
+
+		addr = next_addr;
+	}
+
+	return 0;
+}
+
+int
+amdgpu_svm_range_map_interval(struct amdgpu_svm *svm,
+			      unsigned long start_page,
+			      unsigned long last_page)
+{
+	struct amdgpu_svm_attr_tree *attr_tree = svm->attr_tree;
+	unsigned long cursor = start_page;
+
+	lockdep_assert_held_write(&svm->svm_lock);
+
+	while (cursor <= last_page) {
+		struct amdgpu_svm_attrs attrs;
+		struct amdgpu_svm_attr_range *attr_range;
+		unsigned long seg_last;
+		unsigned long seg_start;
+		unsigned long next;
+		int ret;
+
+		mutex_lock(&attr_tree->lock);
+		attr_range = amdgpu_svm_attr_get_bounds_locked(attr_tree, cursor,
+							       &seg_start, &seg_last);
+		if (attr_range)
+			attrs = attr_range->attrs;
+		mutex_unlock(&attr_tree->lock);
+
+		seg_last = min(seg_last, last_page);
+		if (attr_range && range_has_access(attrs.access)) {
+			/* Mapping may fail here: no VMA, or access denied. */
+			ret = amdgpu_svm_range_map_attr_range(svm, attr_range);
+			if (ret)
+				return ret;
+		}
+
+		if (seg_last == ULONG_MAX || seg_last == last_page)
+			break;
+
+		next = seg_last + 1;
+		if (next <= cursor)
+			break;
+		cursor = next;
+	}
+
+	return 0;
+}
+
-- 
2.34.1



* [RFC V3 08/12] drm/amdgpu: implement SVM range notifier and GC helpers
  2026-04-20 13:12 [RFC V3 00/12] drm/amdgpu: SVM implementation based on drm_gpusvm Honglei Huang
                   ` (6 preceding siblings ...)
  2026-04-20 13:13 ` [RFC V3 07/12] drm/amdgpu: implement SVM range GPU mapping core Honglei Huang
@ 2026-04-20 13:13 ` Honglei Huang
  2026-04-20 13:13 ` [RFC V3 09/12] drm/amdgpu: implement SVM attribute change and invalidation callback Honglei Huang
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 15+ messages in thread
From: Honglei Huang @ 2026-04-20 13:13 UTC (permalink / raw)
  To: Alexander.Deucher, Felix.Kuehling, Christian.Koenig, Oak.Zeng,
	Jenny-Jing.Liu, Philip.Yang, Xiaogang.Chen, Ray.Huang,
	Lingshan.Zhu, Junhua.Shen, matthew.brost, rodrigo.vivi,
	thomas.hellstrom, dakr, aliceryhl
  Cc: amd-gfx, dri-devel, honghuan, Honghuan He

From: Honglei Huang <honghuan@amd.com>

Implement MMU notifier event handling, range removal, and garbage
collection helpers in amdgpu_svm_range.c.

Range lifecycle:
- amdgpu_svm_range_remove(): unmap DMA pages via drm_gpusvm,
  invalidate GPU mapping, and remove the range from drm_gpusvm

MMU notifier events:
- amdgpu_svm_range_notifier_event_begin(): zap GPU PTEs for an
  affected range under memalloc_noreclaim context, wait for
  pending fences, and flush TLB
- amdgpu_svm_range_notifier_event_end(): unmap DMA pages; for
  UNMAP events, queue the range for garbage collection

Garbage collection helpers:
- amdgpu_svm_gc_enqueue(): set pending UNMAP operation and
  address bounds on a range
- amdgpu_svm_gc_add_range(): add range to GC list under spinlock,
  mark gc_queued, schedule GC work

Invalidation:
- amdgpu_svm_range_invalidate_interval(): walk all gpusvm ranges
  in a notifier interval, dispatch begin/end notifier events,
  queue rebuild operations for non-UNMAP events when running in
  xnack-off mode
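
The page-bound clamping shared by the zap and GC paths is an interval
intersection between the gpusvm range and the mmu notifier range,
converted to inclusive page numbers. A stand-alone sketch of the same
max/min arithmetic (PAGE_SHIFT hard-coded to 12 here, which is an
assumption about the page size):

```c
#include <assert.h>
#include <stdbool.h>

#define PAGE_SHIFT 12	/* assumed 4 KiB pages */

/* Intersect a gpusvm range [r_start, r_end) with a notifier range
 * [n_start, n_end), both in bytes and page-aligned, and return the
 * result as inclusive first/last page numbers. Returns false when the
 * intersection is empty, matching the last_page < start_page early-out
 * in amdgpu_svm_range_zap_ptes(). */
static bool clamp_to_pages(unsigned long r_start, unsigned long r_end,
			   unsigned long n_start, unsigned long n_end,
			   unsigned long *first, unsigned long *last)
{
	unsigned long start = r_start > n_start ? r_start : n_start;
	unsigned long end = r_end < n_end ? r_end : n_end;

	if (end <= start)
		return false;
	*first = start >> PAGE_SHIFT;
	*last = (end >> PAGE_SHIFT) - 1;
	return true;
}
```

Keeping this clamp in one place means a partial munmap in the middle of
a range zaps and garbage-collects only the overlapped pages, not the
whole range.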

Signed-off-by: Honghuan He <honghuan.he@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_svm_range.c | 179 ++++++++++++++++++
 1 file changed, 179 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_svm_range.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_svm_range.c
index 790935914..2e53b786c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_svm_range.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_svm_range.c
@@ -487,3 +487,182 @@ amdgpu_svm_range_map_interval(struct amdgpu_svm *svm,
 	return 0;
 }
 
+void amdgpu_svm_range_remove(struct amdgpu_svm *svm,
+			     struct amdgpu_svm_range *range,
+			     struct drm_gpusvm_ctx *ctx)
+{
+	struct drm_gpusvm_range *base = &range->base;
+
+	lockdep_assert_held_write(&svm->svm_lock);
+
+	if (!base->pages.flags.unmapped && !base->pages.flags.partial_unmap)
+		drm_gpusvm_range_unmap_pages(&svm->gpusvm, base, ctx);
+
+	range_invalidate_gpu_mapping(range);
+	drm_gpusvm_range_remove(&svm->gpusvm, base);
+}
+
+static bool
+amdgpu_svm_range_notifier_event_begin(struct amdgpu_svm *svm,
+				      struct drm_gpusvm_range *range,
+				      const struct mmu_notifier_range *mmu_range)
+{
+	struct amdgpu_svm_range *svm_range = to_amdgpu_svm_range(range);
+
+	amdgpu_svm_assert_in_notifier(svm);
+
+	AMDGPU_SVM_RANGE_DEBUG(svm_range, "NOTIFIER");
+
+	if (range->pages.flags.unmapped || !svm_range->gpu_mapped)
+		return false;
+
+	AMDGPU_SVM_RANGE_DEBUG(svm_range, "NOTIFIER - EXECUTE");
+
+	amdgpu_svm_range_zap_ptes(svm, svm_range, mmu_range);
+	range_invalidate_gpu_mapping(svm_range);
+
+	return true;
+}
+
+static void
+amdgpu_svm_gc_enqueue(struct amdgpu_svm *svm,
+		      struct amdgpu_svm_range *range,
+		      unsigned long start_page, unsigned long last_page)
+{
+	if (atomic_read(&svm->exiting))
+		return;
+
+	spin_lock(&svm->work_lock);
+	if (!range->in_queue) {
+		drm_gpusvm_range_get(&range->base);
+		range->in_queue = true;
+	}
+
+	range->pending_start_page = min(range->pending_start_page, start_page);
+	range->pending_last_page = max(range->pending_last_page, last_page);
+	range->pending_ops |= AMDGPU_SVM_RANGE_OP_UNMAP;
+
+	if (!range->gc_queued) {
+		list_add_tail(&range->work_node, &svm->gc.list);
+		range->gc_queued = true;
+	}
+	spin_unlock(&svm->work_lock);
+
+	queue_work(svm->gc.wq, &svm->gc.work);
+}
+
+static void
+amdgpu_svm_gc_add_range(struct amdgpu_svm *svm,
+			struct amdgpu_svm_range *svm_range,
+			const struct mmu_notifier_range *mmu_range)
+{
+	unsigned long start_page = max(drm_gpusvm_range_start(&svm_range->base),
+				       mmu_range->start) >> PAGE_SHIFT;
+	unsigned long last_page = (min(drm_gpusvm_range_end(&svm_range->base),
+				       mmu_range->end) >> PAGE_SHIFT) - 1;
+
+	AMDGPU_SVM_RANGE_DEBUG(svm_range, "GARBAGE COLLECTOR ADD");
+
+	drm_gpusvm_range_set_unmapped(&svm_range->base, mmu_range);
+	amdgpu_svm_gc_enqueue(svm, svm_range, start_page, last_page);
+}
+
+static void
+amdgpu_svm_range_notifier_event_end(struct amdgpu_svm *svm,
+				    struct drm_gpusvm_range *range,
+				    const struct mmu_notifier_range *mmu_range)
+{
+	struct drm_gpusvm_ctx ctx = { .in_notifier = true, };
+
+	amdgpu_svm_assert_in_notifier(svm);
+
+	drm_gpusvm_range_unmap_pages(&svm->gpusvm, range, &ctx);
+	if (mmu_range->event == MMU_NOTIFY_UNMAP)
+		amdgpu_svm_gc_add_range(svm, to_amdgpu_svm_range(range),
+					mmu_range);
+}
+
+static int
+amdgpu_svm_range_invalidate_interval(struct amdgpu_svm *svm,
+				     unsigned long start_page,
+				     unsigned long last_page)
+{
+	unsigned long start = start_page << PAGE_SHIFT;
+	unsigned long end = (last_page + 1) << PAGE_SHIFT;
+	struct drm_gpusvm_notifier *notifier, *next_notifier;
+	struct drm_gpusvm_ctx ctx = { .in_notifier = false };
+	struct drm_exec exec;
+	struct dma_fence *fence = NULL;
+	bool needs_flush = false;
+	unsigned int flags;
+	int ret;
+
+	lockdep_assert_held_write(&svm->svm_lock);
+
+	ret = amdgpu_svm_range_lock_vm_pd(svm, &exec, true);
+	if (ret)
+		return ret;
+
+	drm_gpusvm_for_each_notifier_safe(notifier, next_notifier, &svm->gpusvm,
+					  start, end) {
+		struct drm_gpusvm_range *range, *next_range;
+
+		drm_gpusvm_for_each_range_safe(range, next_range, notifier,
+					       start, end) {
+			struct amdgpu_svm_range *svm_range =
+				to_amdgpu_svm_range(range);
+			unsigned long range_start =
+				drm_gpusvm_range_start(range);
+			unsigned long range_end = drm_gpusvm_range_end(range);
+			unsigned long rs = range_start >> PAGE_SHIFT;
+			unsigned long rl = (range_end >> PAGE_SHIFT) - 1;
+			bool crosses_boundary = start > range_start || end < range_end;
+
+			if (svm_range->gpu_mapped) {
+				AMDGPU_SVM_RANGE_DEBUG(svm_range,
+						       crosses_boundary ? "ATTR DESTROY" : "ATTR ZAP PTE");
+
+				flags = memalloc_noreclaim_save();
+				ret = amdgpu_vm_update_range(svm->adev, svm->vm, false, true, true, false,
+							     NULL, rs, rl, 0, 0, 0, NULL, NULL, &fence);
+				memalloc_noreclaim_restore(flags);
+
+				if (!ret && fence) {
+					dma_fence_wait(fence, false);
+					dma_fence_put(fence);
+					fence = NULL;
+				}
+
+				if (ret) {
+					AMDGPU_SVM_TRACE(
+						"attr invalidate PTE clear failed: ret=%d [0x%lx-0x%lx]\n",
+						ret, rs, rl);
+					drm_exec_fini(&exec);
+					return ret;
+				}
+				needs_flush = true;
+			}
+
+			if (crosses_boundary) {
+				/* Remove ranges that cross the new boundary so
+				 * that GPU faults recreate them bounded by the
+				 * updated attr_range boundaries.
+				 */
+				amdgpu_svm_range_remove(svm, svm_range, &ctx);
+			} else {
+				range_invalidate_gpu_mapping(svm_range);
+			}
+		}
+	}
+
+	drm_exec_fini(&exec);
+
+	if (needs_flush)
+		svm->flush_tlb(svm);
+
+	AMDGPU_SVM_TRACE("attr invalidate done [0x%lx-0x%lx]-0x%lx needs_flush=%d\n",
+			 start_page, last_page, last_page - start_page + 1,
+			 needs_flush ? 1 : 0);
+
+	return 0;
+}
+
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [RFC V3 09/12] drm/amdgpu: implement SVM attribute change and invalidation callback
  2026-04-20 13:12 [RFC V3 00/12] drm/amdgpu: SVM implementation based on drm_gpusvm Honglei Huang
                   ` (7 preceding siblings ...)
  2026-04-20 13:13 ` [RFC V3 08/12] drm/amdgpu: implement SVM range notifier and GC helpers Honglei Huang
@ 2026-04-20 13:13 ` Honglei Huang
  2026-04-20 13:13 ` [RFC V3 10/12] drm/amdgpu: implement SVM initialization and lifecycle Honglei Huang
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 15+ messages in thread
From: Honglei Huang @ 2026-04-20 13:13 UTC (permalink / raw)
  To: Alexander.Deucher, Felix.Kuehling, Christian.Koenig, Oak.Zeng,
	Jenny-Jing.Liu, Philip.Yang, Xiaogang.Chen, Ray.Huang,
	Lingshan.Zhu, Junhua.Shen, matthew.brost, rodrigo.vivi,
	thomas.hellstrom, dakr, aliceryhl
  Cc: amd-gfx, dri-devel, honghuan, Honghuan He

From: Honglei Huang <honghuan@amd.com>

Implement attribute change handling, work queue helpers, and the
top-level MMU invalidation callback in amdgpu_svm_range.c.

Attribute change handling:
- amdgpu_svm_range_apply_attr_change(): apply attribute triggers
  to existing GPU ranges. For ACCESS_CHANGE: remap accessible
  ranges or zap PTEs for disabled access. For PTE_FLAG_CHANGE:
  update mappings with new flags. For LOCATION/MAPPING_FLAG
  changes and PREFETCH: remap the full interval. For ATTR_ONLY
  or RANGE_SPLIT with xnack-off: trigger full rebuild. Walk
  gpusvm ranges within attribute range bounds.

Work queue helpers:
- amdgpu_svm_range_dequeue_locked(): dequeue ranges with pending
  operations from the work list for batch processing
- range_try_dequeue(): CAS-based attempt to move a range from
  queued to dequeued state
- amdgpu_svm_range_put_if_dequeued(): release a dequeued range
  back to idle state

Timestamp:
- amdgpu_svm_capture_checkpoint_ts(): capture current ktime for
  fault deduplication

Main invalidation callback:
- amdgpu_svm_range_invalidate(): the drm_gpusvm_ops.invalidate
  callback invoked by MMU notifiers. Iterates all overlapping
  notifier intervals and delegates to
  amdgpu_svm_range_invalidate_interval().

Signed-off-by: Honghuan He <honghuan.he@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_svm_range.c | 195 ++++++++++++++++++
 1 file changed, 195 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_svm_range.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_svm_range.c
index 2e53b786c..e039784c4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_svm_range.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_svm_range.c
@@ -666,3 +666,198 @@ amdgpu_svm_range_invalidate_interval(struct amdgpu_svm *svm,
 	return 0;
 }
 
+int amdgpu_svm_range_apply_attr_change(struct amdgpu_svm *svm,
+				       uint32_t trigger,
+				       const struct amdgpu_svm_attrs *prev_attrs,
+				       struct amdgpu_svm_attr_range *attr_range)
+{
+	const struct amdgpu_svm_attrs *new_attrs = &attr_range->attrs;
+	unsigned long start_page = amdgpu_svm_attr_start_page(attr_range);
+	unsigned long last_page = amdgpu_svm_attr_last_page(attr_range);
+	bool old_access, new_access;
+	bool update_mapping = false;
+	int ret;
+
+	lockdep_assert_held_write(&svm->svm_lock);
+
+	old_access = range_has_access(prev_attrs->access);
+	new_access = range_has_access(new_attrs->access);
+
+	AMDGPU_SVM_TRACE("attr change trigger=0x%x old_access=%d new_access=%d [0x%lx-0x%lx]-0x%lx, xnack=%d\n",
+			 trigger, old_access, new_access, start_page, last_page, last_page - start_page + 1,
+			 svm->xnack_enabled ? 1 : 0);
+
+	if (trigger & AMDGPU_SVM_ATTR_TRIGGER_ACCESS_CHANGE) {
+		if (!new_access && old_access) {
+			/*
+			 * Do nothing, matching KFD SVM behavior.
+			 * TODO: unmap ranges from GPU that lost access
+			 */
+			AMDGPU_SVM_TRACE("skip unmap ioctl operation [0x%lx-0x%lx]-0x%lx\n",
+					 start_page, last_page, last_page - start_page + 1);
+		} else if (new_access) {
+			if (XNACK_OFF(svm) ||
+			    (new_attrs->flags & AMDGPU_SVM_FLAG_GPU_ALWAYS_MAPPED))
+				update_mapping = true;
+		}
+	}
+
+	if ((trigger & (AMDGPU_SVM_ATTR_TRIGGER_PTE_FLAG_CHANGE |
+			AMDGPU_SVM_ATTR_TRIGGER_MAPPING_FLAG_CHANGE)) &&
+	    new_access && XNACK_OFF(svm))
+		/* only update the mapping when xnack is off */
+		update_mapping = true;
+
+	if (trigger & AMDGPU_SVM_ATTR_TRIGGER_PREFETCH) {
+		/* prefetch always remaps the affected interval */
+		update_mapping = true;
+	}
+
+	if (!XNACK_OFF(svm) &&
+	    (trigger & AMDGPU_SVM_ATTR_TRIGGER_RANGE_SPLIT)) {
+		AMDGPU_SVM_TRACE("attr split invalidate [0x%lx-0x%lx]-0x%lx\n",
+				 start_page, last_page,
+				 last_page - start_page + 1);
+		ret = amdgpu_svm_range_invalidate_interval(svm, start_page,
+							    last_page);
+		if (ret) {
+			AMDGPU_SVM_ERR("failed to invalidate range for attr split: [0x%lx-0x%lx], ret=%d\n",
+				start_page, last_page, ret);
+			return ret;
+		}
+	}
+
+	if (!update_mapping)
+		return 0;
+
+	return amdgpu_svm_range_map_attr_range(svm, attr_range);
+}
+
+bool
+amdgpu_svm_range_dequeue_locked(struct amdgpu_svm *svm,
+					struct list_head *work_list,
+					struct amdgpu_svm_range_op_ctx *op_ctx)
+{
+	struct amdgpu_svm_range *range;
+
+	lockdep_assert_held(&svm->work_lock);
+
+	range = list_first_entry_or_null(work_list, struct amdgpu_svm_range,
+					work_node);
+	if (!range)
+		return false;
+
+	list_del_init(&range->work_node);
+	range->gc_queued = false;
+
+	op_ctx->range = range;
+	op_ctx->start_page = range->pending_start_page;
+	op_ctx->last_page = range->pending_last_page;
+	op_ctx->pending_ops = range->pending_ops;
+
+	range->pending_start_page = ULONG_MAX;
+	range->pending_last_page = 0;
+	range->pending_ops = AMDGPU_SVM_RANGE_OP_NONE;
+
+	return true;
+}
+
+static bool
+range_try_dequeue(struct amdgpu_svm_range *range)
+{
+	if (!range->in_queue)
+		return false;
+
+	if (range->gc_queued ||
+	    range->pending_start_page <= range->pending_last_page ||
+	    range->pending_ops != AMDGPU_SVM_RANGE_OP_NONE)
+		return false;
+
+	range->in_queue = false;
+	return true;
+}
+
+void
+amdgpu_svm_range_put_if_dequeued(struct amdgpu_svm *svm,
+				     struct amdgpu_svm_range *range)
+{
+	bool dequeue;
+
+	spin_lock(&svm->work_lock);
+	dequeue = range_try_dequeue(range);
+	spin_unlock(&svm->work_lock);
+
+	if (dequeue)
+		drm_gpusvm_range_put(&range->base);
+}
+
+void amdgpu_svm_capture_checkpoint_ts(struct amdgpu_svm *svm)
+{
+	struct amdgpu_device *adev = svm->adev;
+	struct amdgpu_ih_ring *ih;
+	uint32_t checkpoint_wptr;
+
+	if (!adev->irq.retry_cam_enabled && adev->irq.ih1.ring_size) {
+		ih = &adev->irq.ih1;
+		checkpoint_wptr = amdgpu_ih_get_wptr(adev, ih);
+		if (ih->rptr != checkpoint_wptr) {
+			WRITE_ONCE(svm->checkpoint_ts,
+				   amdgpu_ih_decode_iv_ts(adev, ih,
+							  checkpoint_wptr, -1));
+			return;
+		}
+	}
+
+	ih = &adev->irq.ih_soft;
+	checkpoint_wptr = amdgpu_ih_get_wptr(adev, ih);
+	if (ih->rptr != checkpoint_wptr)
+		WRITE_ONCE(svm->checkpoint_ts,
+			   amdgpu_ih_decode_iv_ts(adev, ih,
+						  checkpoint_wptr, -1));
+}
+
+void amdgpu_svm_range_invalidate(struct amdgpu_svm *svm,
+				 struct drm_gpusvm_notifier *notifier,
+				 const struct mmu_notifier_range *mmu_range)
+{
+	struct drm_gpusvm_range *r, *first;
+	unsigned long adj_start = mmu_range->start, adj_end = mmu_range->end;
+	bool needs_flush = false;
+
+	amdgpu_svm_assert_in_notifier(svm);
+
+	AMDGPU_SVM_TRACE("INVALIDATE: pasid=%u, gpusvm=%p, seqno=%lu, [0x%016lx-0x%016lx]-0x%lx, event=%d\n",
+			 svm->vm->pasid, &svm->gpusvm,
+			 notifier->notifier.invalidate_seq,
+			 mmu_range->start, mmu_range->end,
+			 mmu_range->end - mmu_range->start, mmu_range->event);
+
+	if (mmu_range->event == MMU_NOTIFY_RELEASE)
+		return;
+	if (atomic_read(&svm->exiting))
+		return;
+
+	adj_start = max(drm_gpusvm_notifier_start(notifier), adj_start);
+	adj_end = min(drm_gpusvm_notifier_end(notifier), adj_end);
+
+	first = drm_gpusvm_range_find(notifier, adj_start, adj_end);
+	if (!first)
+		return;
+
+	if (mmu_range->event == MMU_NOTIFY_UNMAP)
+		amdgpu_svm_capture_checkpoint_ts(svm);
+
+	r = first;
+	drm_gpusvm_for_each_range(r, notifier, adj_start, adj_end)
+		needs_flush |= amdgpu_svm_range_notifier_event_begin(svm, r,
+								     mmu_range);
+	if (!needs_flush)
+		goto range_notifier_event_end;
+
+	svm->flush_tlb(svm);
+
+range_notifier_event_end:
+	r = first;
+	drm_gpusvm_for_each_range(r, notifier, adj_start, adj_end)
+		amdgpu_svm_range_notifier_event_end(svm, r, mmu_range);
+}
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [RFC V3 10/12] drm/amdgpu: implement SVM initialization and lifecycle
  2026-04-20 13:12 [RFC V3 00/12] drm/amdgpu: SVM implementation based on drm_gpusvm Honglei Huang
                   ` (8 preceding siblings ...)
  2026-04-20 13:13 ` [RFC V3 09/12] drm/amdgpu: implement SVM attribute change and invalidation callback Honglei Huang
@ 2026-04-20 13:13 ` Honglei Huang
  2026-04-20 13:13 ` [RFC V3 11/12] drm/amdgpu: add SVM ioctl, garbage collector, and fault handler Honglei Huang
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 15+ messages in thread
From: Honglei Huang @ 2026-04-20 13:13 UTC (permalink / raw)
  To: Alexander.Deucher, Felix.Kuehling, Christian.Koenig, Oak.Zeng,
	Jenny-Jing.Liu, Philip.Yang, Xiaogang.Chen, Ray.Huang,
	Lingshan.Zhu, Junhua.Shen, matthew.brost, rodrigo.vivi,
	thomas.hellstrom, dakr, aliceryhl
  Cc: amd-gfx, dri-devel, honghuan, Honghuan He

From: Honglei Huang <honghuan@amd.com>

Implement the SVM core module in amdgpu_svm.c providing
initialization, teardown, and drm_gpusvm integration.

Constants:
- AMDGPU_SVM_MAX_ATTRS (64): maximum per-ioctl attribute count
- AMDGPU_SVM_DEFAULT_SVM_NOTIFIER_SIZE (512 MB)
- AMDGPU_SVM_GC_WQ_NAME: garbage collector workqueue name

drm_gpusvm callbacks (struct drm_gpusvm_ops):
- amdgpu_svm_invalidate(): forward to range invalidation handler
- amdgpu_svm_range_alloc(): kmem_cache alloc with INIT_LIST_HEAD
- amdgpu_svm_range_free(): kmem_cache free

Reference counting:
- amdgpu_svm_release(): kref release callback, kfrees amdgpu_svm
- amdgpu_svm_put(): decrement refcount via kref_put

PASID lookup:
- amdgpu_svm_lookup_by_pasid(): find amdgpu_svm by PASID from
  the VM manager xarray, take kref reference for async safety

Module cache:
- amdgpu_svm_cache_init/fini(): create/destroy kmem_caches for
  amdgpu_svm_range

Internal helpers:
- amdgpu_svm_set_attr/get_attr(): dispatch to attribute tree
  set/get operations
- amdgpu_svm_default_xnack_enabled(): detect xnack support from the
  GC IP version (GC 9.4.2/9.4.3/9.4.4/9.5.0 enable xnack, GC 10+
  does not, other GC 9 parts follow the noretry setting)
- amdgpu_svm_flush_tlb_compute(): TLB flush for compute VMs

Initialization:
- amdgpu_svm_init_with_ops(): allocate amdgpu_svm, create
  attribute tree, init drm_gpusvm with fault chunk sizes
  (2M/64K/4K), set up garbage collector, work queue, xnack
  state, and TLB flush callback
- amdgpu_svm_init_compute/amdgpu_svm_init(): public entry points

Teardown:
- amdgpu_svm_close(): flush pending work, clean remaining ranges,
  finalize drm_gpusvm, destroy attribute tree, release kref
- amdgpu_svm_fini(): final cleanup
- amdgpu_svm_is_enabled(): check if SVM is active on a VM

Signed-off-by: Honghuan He <honghuan.he@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_svm.c | 318 ++++++++++++++++++++++++
 1 file changed, 318 insertions(+)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_svm.c

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_svm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_svm.c
new file mode 100644
index 000000000..5fbed9b9f
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_svm.c
@@ -0,0 +1,318 @@
+// SPDX-License-Identifier: GPL-2.0 OR MIT
+/*
+ * Copyright 2026 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#include <linux/sched/mm.h>
+#include <linux/uaccess.h>
+#include <linux/xarray.h>
+
+#include <drm/drm_file.h>
+
+#include "amdgpu.h"
+#include "amdgpu_svm.h"
+#include "amdgpu_svm_attr.h"
+#include "amdgpu_svm_fault.h"
+#include "amdgpu_svm_range.h"
+#include "amdgpu_vm.h"
+
+#if IS_ENABLED(CONFIG_DRM_AMDGPU_SVM)
+
+#define AMDGPU_SVM_MAX_ATTRS 64
+#define AMDGPU_SVM_DEFAULT_SVM_NOTIFIER_SIZE 512 /* MiB */
+
+static const unsigned long amdgpu_svm_chunk_sizes[] = {
+	SZ_2M,
+	SZ_64K,
+	SZ_4K,
+};
+
+#define AMDGPU_SVM_GC_WQ_NAME "amdgpu_svm_gc"
+
+static struct kmem_cache *amdgpu_svm_range_cache;
+
+static void amdgpu_svm_invalidate(struct drm_gpusvm *gpusvm,
+				  struct drm_gpusvm_notifier *notifier,
+				  const struct mmu_notifier_range *mmu_range)
+{
+	amdgpu_svm_range_invalidate(to_amdgpu_svm(gpusvm), notifier, mmu_range);
+}
+
+static struct drm_gpusvm_range *amdgpu_svm_range_alloc(struct drm_gpusvm *gpusvm)
+{
+	struct amdgpu_svm_range *range;
+
+	range = kmem_cache_zalloc(amdgpu_svm_range_cache, GFP_KERNEL);
+	if (!range)
+		return NULL;
+
+	INIT_LIST_HEAD(&range->work_node);
+	range->pending_start_page = ULONG_MAX;
+	return &range->base;
+}
+
+static void amdgpu_svm_range_free(struct drm_gpusvm_range *range)
+{
+	kmem_cache_free(amdgpu_svm_range_cache, to_amdgpu_svm_range(range));
+}
+
+static const struct drm_gpusvm_ops amdgpu_gpusvm_ops = {
+	.range_alloc = amdgpu_svm_range_alloc,
+	.range_free = amdgpu_svm_range_free,
+	.invalidate = amdgpu_svm_invalidate,
+};
+
+static void amdgpu_svm_release(struct kref *ref)
+{
+	kfree(container_of(ref, struct amdgpu_svm, refcount));
+}
+
+void amdgpu_svm_put(struct amdgpu_svm *svm)
+{
+	if (svm)
+		kref_put(&svm->refcount, amdgpu_svm_release);
+}
+
+struct amdgpu_svm *
+amdgpu_svm_lookup_by_pasid(struct amdgpu_device *adev, uint32_t pasid)
+{
+	struct amdgpu_svm *svm = NULL;
+	struct amdgpu_vm *vm;
+	unsigned long irqflags;
+
+	xa_lock_irqsave(&adev->vm_manager.pasids, irqflags);
+	vm = xa_load(&adev->vm_manager.pasids, pasid);
+	if (vm && vm->svm) {
+		svm = vm->svm;
+		kref_get(&svm->refcount);
+	}
+	xa_unlock_irqrestore(&adev->vm_manager.pasids, irqflags);
+
+	return svm;
+}
+
+int amdgpu_svm_cache_init(void)
+{
+	int ret = 0;
+
+	if (amdgpu_svm_range_cache)
+		return 0;
+
+	amdgpu_svm_range_cache = AMDGPU_SVM_KMEM_CACHE_CREATE("amdgpu_svm_range_cache",
+								 struct amdgpu_svm_range);
+	if (!amdgpu_svm_range_cache)
+		return -ENOMEM;
+
+	ret = amdgpu_svm_attr_cache_init();
+	if (ret)
+		goto free_out;
+
+	return 0;
+free_out:
+	amdgpu_svm_attr_cache_fini();
+	AMDGPU_SVM_KMEM_CACHE_DESTROY(amdgpu_svm_range_cache);
+	return ret;
+}
+
+void amdgpu_svm_cache_fini(void)
+{
+	if (!amdgpu_svm_range_cache)
+		return;
+
+	amdgpu_svm_attr_cache_fini();
+	AMDGPU_SVM_KMEM_CACHE_DESTROY(amdgpu_svm_range_cache);
+}
+
+static int amdgpu_svm_set_attr(struct amdgpu_vm *vm,
+			      uint64_t start,
+			      uint64_t size,
+			      uint32_t nattr,
+			      const struct drm_amdgpu_svm_attribute *attrs)
+{
+	struct amdgpu_svm *svm = vm->svm;
+
+	amdgpu_svm_gc_flush(svm);
+
+	return amdgpu_svm_attr_set(svm->attr_tree, start, size, nattr,
+				   attrs);
+}
+
+static int amdgpu_svm_get_attr(struct amdgpu_vm *vm,
+			      uint64_t start,
+			      uint64_t size,
+			      uint32_t nattr,
+			      struct drm_amdgpu_svm_attribute *attrs)
+{
+	return amdgpu_svm_attr_get(vm->svm->attr_tree, start, size, nattr, attrs);
+}
+
+static bool amdgpu_svm_default_xnack_enabled(struct amdgpu_device *adev)
+{
+	uint32_t gc_ver = amdgpu_ip_version(adev, GC_HWIP, 0);
+
+	if (gc_ver < IP_VERSION(9, 0, 1))
+		return false;
+	if (!amdgpu_sriov_xnack_support(adev))
+		return false;
+
+	switch (gc_ver) {
+	case IP_VERSION(9, 4, 2):
+	case IP_VERSION(9, 4, 3):
+	case IP_VERSION(9, 4, 4):
+	case IP_VERSION(9, 5, 0):
+		return true;
+	default:
+		break;
+	}
+	if (gc_ver >= IP_VERSION(10, 1, 1))
+		return false;
+	return !adev->gmc.noretry;
+}
+
+static void amdgpu_svm_flush_tlb_compute(struct amdgpu_svm *svm)
+{
+	amdgpu_vm_flush_compute_tlb(svm->adev, svm->vm, TLB_FLUSH_HEAVYWEIGHT,
+				    svm->adev->gfx.xcc_mask);
+}
+
+static int amdgpu_svm_init_with_ops(struct amdgpu_device *adev,
+				    struct amdgpu_vm *vm,
+				    void (*flush_tlb)(struct amdgpu_svm *))
+{
+	struct amdgpu_svm *svm;
+	int ret;
+
+	if (vm->svm)
+		return 0;
+
+	ret = amdgpu_svm_cache_init();
+	if (ret)
+		return ret;
+
+	svm = kzalloc(sizeof(*svm), GFP_KERNEL);
+	if (!svm)
+		return -ENOMEM;
+
+	kref_init(&svm->refcount);
+	svm->adev = adev;
+	svm->vm = vm;
+
+	svm->default_granularity = min_t(u8, amdgpu_svm_default_granularity, 0x1B);
+	svm->xnack_enabled = amdgpu_svm_default_xnack_enabled(adev);
+	svm->flush_tlb = flush_tlb;
+	atomic_set(&svm->exiting, 0);
+
+	if (!svm->xnack_enabled) {
+		/* only support xnack on currently */
+		AMDGPU_SVM_ERR("amdgpu SVM currently supports xnack on mode only\n");
+		ret = -EOPNOTSUPP;
+		goto err_free;
+	}
+
+	ret = amdgpu_svm_gc_init(svm);
+	if (ret)
+		goto err_free;
+
+	init_rwsem(&svm->svm_lock);
+	spin_lock_init(&svm->work_lock);
+
+	svm->attr_tree = amdgpu_svm_attr_tree_create(svm);
+	if (!svm->attr_tree) {
+		ret = -ENOMEM;
+		goto err_gc_fini;
+	}
+
+	ret = drm_gpusvm_init(&svm->gpusvm, "AMDGPU SVM",
+						adev_to_drm(adev), current->mm, 0,
+						adev->vm_manager.max_pfn << AMDGPU_GPU_PAGE_SHIFT,
+						AMDGPU_SVM_DEFAULT_SVM_NOTIFIER_SIZE * SZ_1M,
+						&amdgpu_gpusvm_ops,
+						amdgpu_svm_chunk_sizes,
+						ARRAY_SIZE(amdgpu_svm_chunk_sizes));
+
+	if (ret)
+		goto err_attr_tree_destroy;
+
+	AMDGPU_SVM_TRACE("AMDGPU SVM initialized with default granularity: 0x%lx bytes, xnack: %s\n",
+	       1UL << (svm->default_granularity + PAGE_SHIFT),
+	       svm->xnack_enabled ? "enabled" : "disabled");
+
+	drm_gpusvm_driver_set_lock(&svm->gpusvm, &svm->svm_lock);
+	vm->svm = svm;
+	return 0;
+
+err_attr_tree_destroy:
+	amdgpu_svm_attr_tree_destroy(svm->attr_tree);
+err_gc_fini:
+	amdgpu_svm_gc_fini(svm);
+err_free:
+	kfree(svm);
+	return ret;
+}
+
+static int amdgpu_svm_init_compute(struct amdgpu_device *adev, struct amdgpu_vm *vm)
+{
+	return amdgpu_svm_init_with_ops(adev, vm,
+					amdgpu_svm_flush_tlb_compute);
+}
+
+int amdgpu_svm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm)
+{
+	/* graphics SVM init may differ in the future */
+
+	return amdgpu_svm_init_compute(adev, vm);
+}
+
+void amdgpu_svm_close(struct amdgpu_vm *vm)
+{
+	if (!vm->svm)
+		return;
+
+	if (atomic_xchg(&vm->svm->exiting, 1))
+		return;
+
+	amdgpu_svm_gc_flush(vm->svm);
+}
+
+void amdgpu_svm_fini(struct amdgpu_vm *vm)
+{
+	struct amdgpu_svm *svm = vm->svm;
+
+	if (!svm)
+		return;
+
+	amdgpu_svm_close(vm);
+	down_write(&svm->svm_lock);
+	drm_gpusvm_fini(&svm->gpusvm);
+	up_write(&svm->svm_lock);
+
+	amdgpu_svm_gc_fini(svm);
+	amdgpu_svm_attr_tree_destroy(svm->attr_tree);
+	vm->svm = NULL;
+	amdgpu_svm_put(svm);
+}
+
+bool amdgpu_svm_is_enabled(struct amdgpu_vm *vm)
+{
+	return vm->svm != NULL;
+}
+
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [RFC V3 11/12] drm/amdgpu: add SVM ioctl, garbage collector, and fault handler
  2026-04-20 13:12 [RFC V3 00/12] drm/amdgpu: SVM implementation based on drm_gpusvm Honglei Huang
                   ` (9 preceding siblings ...)
  2026-04-20 13:13 ` [RFC V3 10/12] drm/amdgpu: implement SVM initialization and lifecycle Honglei Huang
@ 2026-04-20 13:13 ` Honglei Huang
  2026-04-20 16:24   ` Matthew Brost
  2026-04-20 13:13 ` [RFC V3 12/12] drm/amdgpu: integrate SVM into build system and VM fault path Honglei Huang
  2026-04-21  2:31 ` [RFC V3 00/12] drm/amdgpu: SVM implementation based on drm_gpusvm Huang Rui
  12 siblings, 1 reply; 15+ messages in thread
From: Honglei Huang @ 2026-04-20 13:13 UTC (permalink / raw)
  To: Alexander.Deucher, Felix.Kuehling, Christian.Koenig, Oak.Zeng,
	Jenny-Jing.Liu, Philip.Yang, Xiaogang.Chen, Ray.Huang,
	Lingshan.Zhu, Junhua.Shen, matthew.brost, rodrigo.vivi,
	thomas.hellstrom, dakr, aliceryhl
  Cc: amd-gfx, dri-devel, honghuan, Honghuan He

From: Honglei Huang <honghuan@amd.com>

Add the ioctl entry point and garbage collector to amdgpu_svm.c,
and introduce amdgpu_svm_fault.c and amdgpu_svm_fault.h as a
dedicated fault handler module.

Ioctl (amdgpu_svm.c):
- amdgpu_svm_copy_attrs(): copy and validate user attribute array
  from userspace with size and alignment checks
- amdgpu_gem_svm_ioctl(): handle DRM_AMDGPU_GEM_SVM dispatching
  to SET_ATTR or GET_ATTR with copy_to_user for GET results

Garbage collector (amdgpu_svm.c):
- amdgpu_svm_garbage_collector(): dequeue and remove GC-listed
  ranges under svm_lock, clear corresponding attributes
- amdgpu_svm_range_clean_queue(): batch cleanup for dequeued
  work items
- amdgpu_svm_garbage_collector_work_func(): GC work handler
- amdgpu_svm_gc_init/fini/flush(): lifecycle management for
  the GC workqueue

Fault handler (amdgpu_svm_fault.c):
- AMDGPU_SVM_RANGE_RETRY_FAULT_PENDING: 2ms dedup threshold
- amdgpu_svm_range_get_unregistered_attrs(): derive default
  attributes for faulting addresses without explicit registration,
  using VMA properties and GPU IP capabilities
- svm_check_fault_allowed(): validate fault access against
  attribute permissions and read-only enforcement
- amdgpu_svm_range_map_fault(): core fault mapping that finds or
  creates a gpusvm range, gets pages, maps into GPU page tables,
  retries on -EAGAIN up to 3 times
- amdgpu_svm_handle_fault(): main entry called from
  amdgpu_vm_handle_fault(). Looks up SVM by PASID, acquires
  mmap_read_lock and svm_lock, runs garbage collector, resolves
  attributes from the tree or derives defaults, uses timestamp
  deduplication to skip stale faults, dispatches to map_fault

Fault header (amdgpu_svm_fault.h):
- Forward declarations and amdgpu_svm_handle_fault() prototype

Signed-off-by: Honghuan He <honghuan.he@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_svm.c       | 149 +++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_svm_fault.c | 368 ++++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_svm_fault.h |  39 ++
 3 files changed, 556 insertions(+)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_svm_fault.c
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_svm_fault.h

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_svm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_svm.c
index 5fbed9b9f..a672deede 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_svm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_svm.c
@@ -316,3 +316,152 @@ bool amdgpu_svm_is_enabled(struct amdgpu_vm *vm)
 	return vm->svm != NULL;
 }
 
+static int amdgpu_svm_copy_attrs(const struct drm_amdgpu_gem_svm *args,
+					   struct drm_amdgpu_svm_attribute **attrs,
+					   size_t *size)
+{
+	if (!args->nattr || args->nattr > AMDGPU_SVM_MAX_ATTRS)
+		return -EINVAL;
+	if (!args->attrs_ptr)
+		return -EINVAL;
+
+	*size = args->nattr * sizeof(**attrs);
+	*attrs = memdup_user(u64_to_user_ptr(args->attrs_ptr), *size);
+
+	return PTR_ERR_OR_ZERO(*attrs);
+}
+
+int amdgpu_svm_garbage_collector(struct amdgpu_svm *svm)
+{
+	int ret;
+	struct amdgpu_svm_range_op_ctx op_ctx;
+
+	lockdep_assert_held_write(&svm->svm_lock);
+
+	spin_lock(&svm->work_lock);
+	while (amdgpu_svm_range_dequeue_locked(svm, &svm->gc.list, &op_ctx)) {
+		spin_unlock(&svm->work_lock);
+
+		if (UNMAP_WORK(op_ctx.pending_ops)) {
+			ret = amdgpu_svm_attr_clear_pages(
+				svm->attr_tree, op_ctx.start_page, op_ctx.last_page);
+			if (ret)
+				return ret;
+
+			drm_gpusvm_range_remove(&svm->gpusvm,
+						&op_ctx.range->base);
+		}
+
+		amdgpu_svm_range_put_if_dequeued(svm, op_ctx.range);
+		spin_lock(&svm->work_lock);
+	}
+	spin_unlock(&svm->work_lock);
+	return 0;
+}
+
+void
+amdgpu_svm_range_clean_queue(struct amdgpu_svm *svm,
+			     struct list_head *work_list)
+{
+	struct amdgpu_svm_range_op_ctx op_ctx;
+
+	spin_lock(&svm->work_lock);
+	while (amdgpu_svm_range_dequeue_locked(svm, work_list,
+				    &op_ctx)) {
+		spin_unlock(&svm->work_lock);
+		amdgpu_svm_range_put_if_dequeued(svm, op_ctx.range);
+		spin_lock(&svm->work_lock);
+	}
+	spin_unlock(&svm->work_lock);
+}
+
+static void amdgpu_svm_garbage_collector_work_func(struct work_struct *w)
+{
+	struct amdgpu_svm_gc *gc = container_of(w, struct amdgpu_svm_gc, work);
+	struct amdgpu_svm *svm = container_of(gc, struct amdgpu_svm, gc);
+
+	down_write(&svm->svm_lock);
+	amdgpu_svm_garbage_collector(svm);
+	up_write(&svm->svm_lock);
+}
+
+int amdgpu_svm_gc_init(struct amdgpu_svm *svm)
+{
+	svm->gc.wq = alloc_workqueue(AMDGPU_SVM_GC_WQ_NAME,
+					WQ_UNBOUND | WQ_HIGHPRI | WQ_MEM_RECLAIM, 0);
+	if (!svm->gc.wq)
+		return -ENOMEM;
+
+	INIT_LIST_HEAD(&svm->gc.list);
+	INIT_WORK(&svm->gc.work, amdgpu_svm_garbage_collector_work_func);
+
+	return 0;
+}
+
+void amdgpu_svm_gc_fini(struct amdgpu_svm *svm)
+{
+	flush_work(&svm->gc.work);
+	amdgpu_svm_range_clean_queue(svm, &svm->gc.list);
+	destroy_workqueue(svm->gc.wq);
+	svm->gc.wq = NULL;
+}
+
+void amdgpu_svm_gc_flush(struct amdgpu_svm *svm)
+{
+	flush_work(&svm->gc.work);
+}
+
+int amdgpu_gem_svm_ioctl(struct drm_device *dev, void *data,
+			 struct drm_file *filp)
+{
+	struct amdgpu_fpriv *fpriv = filp->driver_priv;
+	struct amdgpu_device *adev = drm_to_adev(dev);
+	struct drm_amdgpu_gem_svm *args = data;
+	struct drm_amdgpu_svm_attribute *attrs = NULL;
+	struct amdgpu_vm *vm;
+	size_t attrs_size = 0;
+	int ret = 0;
+
+	AMDGPU_SVM_TRACE("ioctl op=%u va:[0x%llx-0x%llx)-0x%llx nattr=%u\n",
+			 args->operation, args->start_addr, args->start_addr + args->size,
+			 args->size, args->nattr);
+
+	vm = &fpriv->vm;
+	if (!amdgpu_svm_is_enabled(vm)) {
+		ret = amdgpu_svm_init(adev, vm);
+		if (ret)
+			return ret;
+	}
+
+	if ((args->start_addr & ~PAGE_MASK) || (args->size & ~PAGE_MASK))
+		return -EINVAL;
+
+	if (!args->start_addr || !args->size)
+		return -EINVAL;
+
+	ret = amdgpu_svm_copy_attrs(args, &attrs, &attrs_size);
+	if (ret)
+		return ret;
+
+	switch (args->operation) {
+	case AMDGPU_SVM_OP_SET_ATTR:
+		ret = amdgpu_svm_set_attr(vm, args->start_addr, args->size,
+					 args->nattr, attrs);
+		break;
+	case AMDGPU_SVM_OP_GET_ATTR:
+		ret = amdgpu_svm_get_attr(vm, args->start_addr, args->size,
+					 args->nattr, attrs);
+		if (!ret && copy_to_user(u64_to_user_ptr(args->attrs_ptr),
+					 attrs, attrs_size))
+			ret = -EFAULT;
+		break;
+	default:
+		ret = -EINVAL;
+		break;
+	}
+
+	kvfree(attrs);
+	return ret;
+}
+
+#endif /* CONFIG_DRM_AMDGPU_SVM */
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_svm_fault.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_svm_fault.c
new file mode 100644
index 000000000..968fb402b
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_svm_fault.c
@@ -0,0 +1,368 @@
+// SPDX-License-Identifier: GPL-2.0 OR MIT
+/*
+ * Copyright 2026 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#include "amdgpu_svm.h"
+#include "amdgpu_svm_attr.h"
+#include "amdgpu_svm_fault.h"
+#include "amdgpu_svm_range.h"
+#include "amdgpu.h"
+#include "amdgpu_vm.h"
+#include "amdgpu_gmc.h"
+#include "amdgpu_ih.h"
+
+#include <drm/drm_exec.h>
+#include <drm/drm_gpusvm.h>
+
+#include <linux/delay.h>
+#include <linux/mm.h>
+#include <linux/sched/mm.h>
+
+#if IS_ENABLED(CONFIG_DRM_AMDGPU_SVM)
+
+#define AMDGPU_SVM_RANGE_RETRY_FAULT_PENDING	(2UL * NSEC_PER_MSEC)
+
+static int amdgpu_svm_range_get_unregistered_attrs(struct amdgpu_svm *svm,
+					    unsigned long fault_addr,
+					    unsigned long attr_start_page,
+					    unsigned long attr_last_page,
+					    struct amdgpu_svm_attr_range **out)
+{
+	struct amdgpu_svm_attr_tree *attr_tree = svm->attr_tree;
+	struct amdgpu_svm_attr_range *range;
+	struct amdgpu_svm_attrs attrs;
+	struct mm_struct *mm = svm->gpusvm.mm;
+	struct vm_area_struct *vma;
+	unsigned long fault_page = fault_addr >> PAGE_SHIFT;
+	unsigned long start_page, last_page;
+	unsigned long vma_start_page, vma_last_page;
+
+	amdgpu_svm_attr_set_default(svm, &attrs);
+
+	mmap_read_lock(mm);
+
+	vma = amdgpu_svm_check_vma(mm, fault_addr);
+	if (IS_ERR(vma)) {
+		mmap_read_unlock(mm);
+		AMDGPU_SVM_ERR("get_unregistered_attrs: invalid VMA for fault_addr=0x%lx\n",
+		       fault_addr);
+		return PTR_ERR(vma);
+	}
+	vma_start_page = vma->vm_start >> PAGE_SHIFT;
+	vma_last_page = (vma->vm_end >> PAGE_SHIFT) - 1;
+
+	if (vma_is_initial_heap(vma) || vma_is_initial_stack(vma))
+		attrs.preferred_loc = AMDGPU_SVM_LOCATION_SYSMEM;
+
+	mmap_read_unlock(mm);
+
+	start_page = max(vma_start_page,
+		    (unsigned long)ALIGN_DOWN(fault_page, 1UL << attrs.granularity));
+	last_page = min(vma_last_page,
+		   (unsigned long)ALIGN(fault_page + 1, 1UL << attrs.granularity) - 1);
+
+	start_page = max(start_page, attr_start_page);
+	last_page = min(last_page, attr_last_page);
+
+	mutex_lock(&attr_tree->lock);
+	range = amdgpu_svm_attr_range_alloc(start_page, last_page, &attrs);
+	if (!range) {
+		mutex_unlock(&attr_tree->lock);
+		return -ENOMEM;
+	}
+	amdgpu_svm_attr_range_insert_locked(attr_tree, range);
+	mutex_unlock(&attr_tree->lock);
+
+	AMDGPU_SVM_TRACE(
+		"Created unregistered range for fault_addr=0x%lx: attr range=[0x%lx-0x%lx] size: 0x%lx attrs={preferred_loc=%d, prefetch_loc=%d, flags=0x%x, granularity=%u, access=%u}\n",
+		fault_addr, amdgpu_svm_attr_start_page(range),
+		amdgpu_svm_attr_last_page(range) + 1,
+		amdgpu_svm_attr_last_page(range) -
+			amdgpu_svm_attr_start_page(range) + 1,
+		range->attrs.preferred_loc, range->attrs.prefetch_loc,
+		range->attrs.flags, range->attrs.granularity,
+		range->attrs.access);
+
+	*out = range;
+	return 0;
+}
+
+static int svm_check_fault_allowed(struct amdgpu_svm *svm,
+				   unsigned long fault_addr, bool write_fault)
+{
+	struct mm_struct *mm = svm->gpusvm.mm;
+	struct vm_area_struct *vma;
+	unsigned long requested = VM_READ;
+	int ret = 0;
+
+	if (write_fault)
+		requested |= VM_WRITE;
+
+	mmap_read_lock(mm);
+	vma = vma_lookup(mm, fault_addr);
+	if (vma && (vma->vm_flags & requested) != requested) {
+		AMDGPU_SVM_ERR("fault addr 0x%lx no %s permission\n",
+			 fault_addr, write_fault ? "write" : "read");
+		ret = -EPERM;
+	}
+	mmap_read_unlock(mm);
+
+	return ret;
+}
+
+static int amdgpu_svm_range_map_fault(struct amdgpu_svm *svm,
+			       unsigned long fault_addr,
+			       const struct amdgpu_svm_attr_range *attr_range,
+			       bool write_fault)
+{
+	const struct amdgpu_svm_attrs *attrs = &attr_range->attrs;
+	bool need_vram_migration = amdgpu_svm_attr_prefer_vram(svm, attrs);
+	/* TODO: add migration, then use amdgpu_svm_attr_devmem_possible() */
+	bool devmem_possible = false;
+	struct drm_gpusvm_ctx map_ctx = {
+		.read_only = !!(attrs->flags & AMDGPU_SVM_FLAG_GPU_RO),
+		.devmem_possible = devmem_possible,
+		.check_pages_threshold = devmem_possible ? SZ_64K : 0,
+		.devmem_only = need_vram_migration && devmem_possible,
+		.timeslice_ms = need_vram_migration && devmem_possible ? 5 : 0,
+	};
+	struct amdgpu_svm_range *range;
+	ktime_t timestamp = ktime_get_boottime();
+	uint64_t range_pte_flags;
+	int retry_count = 3;
+	int ret;
+
+	lockdep_assert_held_write(&svm->svm_lock);
+	WARN_ON(!svm->xnack_enabled);
+
+retry:
+	ret = amdgpu_svm_garbage_collector(svm);
+	if (ret) {
+		AMDGPU_SVM_ERR(
+			"fault garbage collector failed: ret=%d, fault_addr=0x%lx\n",
+			ret, fault_addr);
+		return ret;
+	}
+
+	ret = svm_check_fault_allowed(svm, fault_addr, write_fault);
+	if (ret)
+		return ret;
+
+	range = amdgpu_svm_range_find_or_insert(svm, fault_addr,
+						 attr_range, &map_ctx);
+	if (IS_ERR(range)) {
+		ret = PTR_ERR(range);
+		AMDGPU_SVM_ERR("map_fault: range_find_or_insert failed: fault=0x%lx ret=%d\n",
+				 fault_addr, ret);
+		/*
+		 * -EINVAL: fault_addr out of gpusvm range, or no chunk size
+		 *          fits within VMA/notifier/attr_range bounds.
+		 * -EFAULT: mmget_not_zero failed.
+		 * -ENOENT: No VMA at fault_addr.
+		 * -ENOMEM: Notifier or range allocation failed.
+		 */
+		if (ret == -EFAULT || ret == -ENOENT) {
+			AMDGPU_SVM_ERR("no vma or mm is dying: 0x%lx, ret=%d\n",
+					 fault_addr, ret);
+			ret = 0;
+		}
+
+		return ret;
+	}
+
+	if (ktime_before(timestamp, ktime_add_ns(range->validate_timestamp,
+					 AMDGPU_SVM_RANGE_RETRY_FAULT_PENDING))) {
+		AMDGPU_SVM_TRACE("already restored, skip: fault=0x%lx range=[0x%lx-0x%lx)\n",
+				 fault_addr, drm_gpusvm_range_start(&range->base),
+				 drm_gpusvm_range_end(&range->base));
+		goto out;
+	}
+
+	range_pte_flags = amdgpu_svm_range_attr_pte_flags(
+					svm, attrs, map_ctx.read_only);
+
+	if (!(write_fault && map_ctx.read_only) &&
+	    amdgpu_svm_range_is_valid(svm, range, attrs, range_pte_flags)) {
+		AMDGPU_SVM_TRACE("valid range, skip: fault=0x%lx range=[0x%lx-0x%lx)\n",
+				 fault_addr, drm_gpusvm_range_start(&range->base),
+				 drm_gpusvm_range_end(&range->base));
+		goto out;
+	}
+
+	AMDGPU_SVM_RANGE_DEBUG(range, "PAGE FAULT");
+	/* TODO: add migration */
+
+	AMDGPU_SVM_RANGE_DEBUG(range, "GET PAGES");
+	ret = amdgpu_svm_range_get_pages(svm, &range->base, &map_ctx);
+	if (ret == -EOPNOTSUPP || ret == -EFAULT) {
+		/*
+		* -EOPNOTSUPP  Mixed page types within range.
+		* -EFAULT      (a) mm is dying.
+		*              (b) range was unmapped.
+		*              (c) DMA mapping failed.
+		*              (d) devmem_only requested but system page encountered.
+		*              (e) hmm_range_fault: no VMA, page fault error, bad pte/pmd.
+		* -EBUSY       HMM retry loop timed out.
+		* -ENOMEM      PFN or DMA address array allocation failed.
+		* -EINVAL      hmm_range_fault: invalid VMA type.
+		*/
+		map_ctx.timeslice_ms <<= 1;
+		if (!map_ctx.devmem_only && --retry_count > 0) {
+			AMDGPU_SVM_ERR("start retry: get_pages failed with %d, retries_left=%d: fault=0x%lx range=[0x%lx-0x%lx)\n",
+					 ret, retry_count, fault_addr,
+					 drm_gpusvm_range_start(&range->base),
+					 drm_gpusvm_range_end(&range->base));
+			goto retry;
+		} else {
+			AMDGPU_SVM_ERR("map_fault: get_pages failed with %d, retries exhausted or devmem only: fault=0x%lx range=[0x%lx-0x%lx)\n",
+					 ret, fault_addr, drm_gpusvm_range_start(&range->base),
+					 drm_gpusvm_range_end(&range->base));
+		}
+	}
+
+	if (ret == -EPERM) {
+		AMDGPU_SVM_ERR("get_pages -EPERM: fault=0x%lx range=[0x%lx-0x%lx)\n",
+			       fault_addr, drm_gpusvm_range_start(&range->base),
+				       drm_gpusvm_range_end(&range->base));
+		return ret;
+	}
+
+	if (ret) {
+		AMDGPU_SVM_RANGE_DEBUG(range, "PAGE FAULT - FAIL PAGE COLLECT");
+		goto out;
+	}
+
+	AMDGPU_SVM_RANGE_DEBUG(range, "PAGE FAULT - GPU MAP");
+
+	ret = amdgpu_svm_range_update_mapping(svm, range,
+					      range_pte_flags, attrs->flags,
+					      false, false, false);
+
+	if (ret)
+		goto err_out;
+
+out:
+	return 0;
+
+err_out:
+	if (ret == -EAGAIN && --retry_count > 0) {
+		map_ctx.timeslice_ms <<= 1;
+		AMDGPU_SVM_RANGE_DEBUG(range, "PAGE FAULT - RETRY GPU MAP");
+		goto retry;
+	}
+
+	return ret;
+}
+
+int amdgpu_svm_handle_fault(struct amdgpu_device *adev, uint32_t pasid,
+			    uint64_t fault_addr, uint64_t ts,
+			    bool write_fault)
+{
+	struct amdgpu_svm *svm;
+	struct amdgpu_svm_attr_range *attr_range;
+	unsigned long attr_start_page, attr_last_page;
+	unsigned long fault_page;
+	uint64_t ckpt;
+	int ret;
+
+	fault_addr = fault_addr << PAGE_SHIFT;
+	fault_page = fault_addr >> PAGE_SHIFT;
+
+	svm = amdgpu_svm_lookup_by_pasid(adev, pasid);
+	if (!svm) {
+		AMDGPU_SVM_ERR("handle_fault: no SVM context for pasid %u\n", pasid);
+		return -EOPNOTSUPP;
+	}
+
+	if (atomic_read(&svm->exiting)) {
+		AMDGPU_SVM_ERR("handle_fault: SVM context is exiting for pasid %u\n", pasid);
+		ret = -EAGAIN;
+		goto out_put;
+	}
+
+	if (!svm->xnack_enabled) {
+		AMDGPU_SVM_ERR("handle_fault: SVM context does not have xnack enabled for pasid %u\n", pasid);
+		ret = -EOPNOTSUPP;
+		goto out_put;
+	}
+
+	ckpt = READ_ONCE(svm->checkpoint_ts);
+	if (ckpt != 0) {
+		if (amdgpu_ih_ts_after_or_equal(ts, ckpt)) {
+			AMDGPU_SVM_TRACE(
+			"handle_fault: draining stale retry fault, drop fault 0x%llx ts=%llu checkpoint=%llu\n",
+				fault_addr, ts, ckpt);
+			amdgpu_gmc_filter_faults_remove(
+				adev, fault_addr >> PAGE_SHIFT, pasid);
+			ret = 0;
+			goto out_put;
+		} else {
+			WRITE_ONCE(svm->checkpoint_ts, 0);
+		}
+	}
+
+	down_write(&svm->svm_lock);
+
+retry:
+	mutex_lock(&svm->attr_tree->lock);
+	attr_range = amdgpu_svm_attr_get_bounds_locked(svm->attr_tree,
+						       fault_page,
+						       &attr_start_page, &attr_last_page);
+	mutex_unlock(&svm->attr_tree->lock);
+	if (!attr_range) {
+		ret = amdgpu_svm_range_get_unregistered_attrs(svm, fault_addr,
+							      attr_start_page,
+							      attr_last_page,
+							      &attr_range);
+		if (ret) {
+			if (ret == -EFAULT)
+				goto out_no_vma;
+			goto out_unlock;
+		}
+	}
+	ret = amdgpu_svm_range_map_fault(svm, fault_addr, attr_range,
+					 write_fault);
+
+	if (ret == -EAGAIN) {
+		AMDGPU_SVM_ERR("handle_fault: got -EAGAIN: fault=0x%llx\n",
+			       fault_addr);
+		amdgpu_gmc_filter_faults_remove(adev, fault_addr >> PAGE_SHIFT, pasid);
+		goto retry;
+	}
+
+	goto out_unlock;
+
+out_no_vma:
+	AMDGPU_SVM_ERR("handle_fault: no VMA for fault=0x%llx (stale retry or GPU NULL deref)\n",
+		 fault_addr);
+	ret = 0;
+
+out_unlock:
+	up_write(&svm->svm_lock);
+
+out_put:
+	amdgpu_svm_put(svm);
+	return ret;
+}
+
+#endif /* CONFIG_DRM_AMDGPU_SVM */
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_svm_fault.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_svm_fault.h
new file mode 100644
index 000000000..1c8f6c15e
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_svm_fault.h
@@ -0,0 +1,39 @@
+/* SPDX-License-Identifier: GPL-2.0 OR MIT */
+/*
+ * Copyright 2026 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#ifndef __AMDGPU_SVM_FAULT_H__
+#define __AMDGPU_SVM_FAULT_H__
+
+#include <linux/types.h>
+
+struct amdgpu_device;
+struct amdgpu_svm;
+struct amdgpu_svm_attr_range;
+struct amdgpu_svm_attrs;
+
+int amdgpu_svm_handle_fault(struct amdgpu_device *adev, uint32_t pasid,
+			    uint64_t fault_addr, uint64_t ts,
+			    bool write_fault);
+
+#endif /* __AMDGPU_SVM_FAULT_H__ */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [RFC V3 12/12] drm/amdgpu: integrate SVM into build system and VM fault path
  2026-04-20 13:12 [RFC V3 00/12] drm/amdgpu: SVM implementation based on drm_gpusvm Honglei Huang
                   ` (10 preceding siblings ...)
  2026-04-20 13:13 ` [RFC V3 11/12] drm/amdgpu: add SVM ioctl, garbage collector, and fault handler Honglei Huang
@ 2026-04-20 13:13 ` Honglei Huang
  2026-04-21  2:31 ` [RFC V3 00/12] drm/amdgpu: SVM implementation based on drm_gpusvm Huang Rui
  12 siblings, 0 replies; 15+ messages in thread
From: Honglei Huang @ 2026-04-20 13:13 UTC (permalink / raw)
  To: Alexander.Deucher, Felix.Kuehling, Christian.Koenig, Oak.Zeng,
	Jenny-Jing.Liu, Philip.Yang, Xiaogang.Chen, Ray.Huang,
	Lingshan.Zhu, Junhua.Shen, matthew.brost, rodrigo.vivi,
	thomas.hellstrom, dakr, aliceryhl
  Cc: amd-gfx, dri-devel, honghuan, Honghuan He

From: Honglei Huang <honghuan@amd.com>

Wire the amdgpu SVM subsystem into the kernel build and the
runtime fault handling path.

Kconfig:
- Add CONFIG_DRM_AMDGPU_SVM (default y), depends on
  DEVICE_PRIVATE, selects DRM_GPUSVM, HMM_MIRROR, MMU_NOTIFIER

Makefile:
- Build amdgpu_svm.o, amdgpu_svm_attr.o, amdgpu_svm_fault.o,
  and amdgpu_svm_range.o when CONFIG_DRM_AMDGPU_SVM is enabled
- Add KBUILD_EXTRA_SYMBOLS for drm Module.symvers to resolve
  drm_gpusvm symbols

amdgpu_drv.c:
- Register DRM_IOCTL_AMDGPU_GEM_SVM in the ioctl table

amdgpu_vm.c:
- Initialize vm->svm = NULL in amdgpu_vm_init()
- Call amdgpu_svm_init() in amdgpu_vm_make_compute() to create
  SVM state for compute VMs
- Call amdgpu_svm_close() and amdgpu_svm_fini() in
  amdgpu_vm_fini() for cleanup
- Route GPU page faults through amdgpu_svm_handle_fault() in
  amdgpu_vm_handle_fault() when SVM is enabled, before falling
  back to the KFD svm_range_restore_pages() path

Signed-off-by: Honghuan He <honghuan.he@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/Kconfig      | 11 +++++++++++
 drivers/gpu/drm/amd/amdgpu/Makefile     | 13 +++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c |  2 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c  | 20 +++++++++++++++++++-
 4 files changed, 45 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/Kconfig b/drivers/gpu/drm/amd/amdgpu/Kconfig
index 7f515be51..337314011 100644
--- a/drivers/gpu/drm/amd/amdgpu/Kconfig
+++ b/drivers/gpu/drm/amd/amdgpu/Kconfig
@@ -82,6 +82,17 @@ config DRM_AMDGPU_USERPTR
 	  This option selects CONFIG_HMM and CONFIG_HMM_MIRROR if it
 	  isn't already selected to enabled full userptr support.
 
+config DRM_AMDGPU_SVM
+	bool "Enable AMDGPU SVM support (experimental)"
+	depends on DRM_AMDGPU
+	depends on DEVICE_PRIVATE
+	select DRM_GPUSVM
+	select HMM_MIRROR
+	select MMU_NOTIFIER
+	default y
+	help
+	  Experimental shared virtual memory (SVM) support based on DRM GPUSVM.
+
 config DRM_AMD_ISP
 	bool "Enable AMD Image Signal Processor IP support"
 	depends on DRM_AMDGPU && ACPI
diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile b/drivers/gpu/drm/amd/amdgpu/Makefile
index 6a7e9bfec..a40a42995 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -44,6 +44,10 @@ ccflags-y := -I$(FULL_AMD_PATH)/include/asic_reg \
 subdir-ccflags-y += -Wno-override-init
 subdir-ccflags-$(CONFIG_DRM_AMDGPU_WERROR) += -Werror
 
+ifneq ($(wildcard $(objtree)/drivers/gpu/drm/Module.symvers),)
+KBUILD_EXTRA_SYMBOLS += $(objtree)/drivers/gpu/drm/Module.symvers
+endif
+
 amdgpu-y := amdgpu_drv.o
 
 # add KMS driver
@@ -317,6 +321,15 @@ amdgpu-$(CONFIG_VGA_SWITCHEROO) += amdgpu_atpx_handler.o
 amdgpu-$(CONFIG_ACPI) += amdgpu_acpi.o
 amdgpu-$(CONFIG_HMM_MIRROR) += amdgpu_hmm.o
 
+# svm support
+amdgpu-$(CONFIG_DRM_AMDGPU_SVM) += amdgpu_svm.o amdgpu_svm_attr.o \
+	amdgpu_svm_fault.o amdgpu_svm_range.o
+
+.PHONY: clean-svm
+clean-svm:
+	rm -f $(obj)/amdgpu_svm.o $(obj)/amdgpu_svm_attr.o $(obj)/amdgpu_svm_fault.o $(obj)/amdgpu_svm_range.o \
+	      $(obj)/.amdgpu_svm.o.cmd $(obj)/.amdgpu_svm_attr.o.cmd $(obj)/.amdgpu_svm_fault.o.cmd $(obj)/.amdgpu_svm_range.o.cmd
+
 include $(FULL_AMD_PATH)/pm/Makefile
 
 amdgpu-y += $(AMD_POWERPLAY_FILES)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index a44baa9ee..d5ccacdf2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -51,6 +51,7 @@
 #include "amdgpu_ras.h"
 #include "amdgpu_reset.h"
 #include "amdgpu_sched.h"
+#include "amdgpu_svm.h"
 #include "amdgpu_xgmi.h"
 #include "amdgpu_userq.h"
 #include "amdgpu_userq_fence.h"
@@ -3064,6 +3065,7 @@ const struct drm_ioctl_desc amdgpu_ioctls_kms[] = {
 	DRM_IOCTL_DEF_DRV(AMDGPU_USERQ_SIGNAL, amdgpu_userq_signal_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(AMDGPU_USERQ_WAIT, amdgpu_userq_wait_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(AMDGPU_GEM_LIST_HANDLES, amdgpu_gem_list_handles_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(AMDGPU_GEM_SVM, amdgpu_gem_svm_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
 };
 
 static const struct drm_driver amdgpu_kms_driver = {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 429947f75..86603d2b3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -43,6 +43,7 @@
 #include "amdgpu_xgmi.h"
 #include "amdgpu_dma_buf.h"
 #include "amdgpu_res_cursor.h"
+#include "amdgpu_svm.h"
 #include "kfd_svm.h"
 
 /**
@@ -2606,6 +2607,7 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm,
 	int r, i;
 
 	vm->va = RB_ROOT_CACHED;
+	vm->svm = NULL;
 	for (i = 0; i < AMDGPU_MAX_VMHUBS; i++)
 		vm->reserved_vmid[i] = NULL;
 	INIT_LIST_HEAD(&vm->evicted);
@@ -2766,6 +2768,10 @@ int amdgpu_vm_make_compute(struct amdgpu_device *adev, struct amdgpu_vm *vm)
 	vm->is_compute_context = true;
 	vm->need_tlb_fence = true;
 
+	r = amdgpu_svm_init(adev, vm);
+	if (r)
+		goto unreserve_bo;
+
 unreserve_bo:
 	amdgpu_bo_unreserve(vm->root.bo);
 	return r;
@@ -2798,6 +2804,9 @@ void amdgpu_vm_fini(struct amdgpu_device *adev, struct amdgpu_vm *vm)
 	unsigned long flags;
 	int i;
 
+	amdgpu_svm_close(vm);
+	amdgpu_svm_fini(vm);
+
 	amdgpu_amdkfd_gpuvm_destroy_cb(adev, vm);
 
 	root = amdgpu_bo_ref(vm->root.bo);
@@ -2976,8 +2985,10 @@ bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, u32 pasid,
 			    bool write_fault)
 {
 	bool is_compute_context = false;
+	bool has_svm = false;
 	struct amdgpu_bo *root;
 	unsigned long irqflags;
+	uint64_t fault_addr = addr;
 	uint64_t value, flags;
 	struct amdgpu_vm *vm;
 	int r;
@@ -2987,6 +2998,7 @@ bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, u32 pasid,
 	if (vm) {
 		root = amdgpu_bo_ref(vm->root.bo);
 		is_compute_context = vm->is_compute_context;
+		has_svm = !!vm->svm;
 	} else {
 		root = NULL;
 	}
@@ -2997,7 +3009,13 @@ bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, u32 pasid,
 
 	addr /= AMDGPU_GPU_PAGE_SIZE;
 
-	if (is_compute_context && !svm_range_restore_pages(adev, pasid, vmid,
+	if (is_compute_context && has_svm && !amdgpu_svm_handle_fault(adev, pasid, addr,
+		ts, write_fault)) {
+		amdgpu_bo_unref(&root);
+		return true;
+	}
+
+	if (is_compute_context && !has_svm && !svm_range_restore_pages(adev, pasid, vmid,
 	    node_id, addr, ts, write_fault)) {
 		amdgpu_bo_unref(&root);
 		return true;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [RFC V3 11/12] drm/amdgpu: add SVM ioctl, garbage collector, and fault handler
  2026-04-20 13:13 ` [RFC V3 11/12] drm/amdgpu: add SVM ioctl, garbage collector, and fault handler Honglei Huang
@ 2026-04-20 16:24   ` Matthew Brost
  0 siblings, 0 replies; 15+ messages in thread
From: Matthew Brost @ 2026-04-20 16:24 UTC (permalink / raw)
  To: Honglei Huang
  Cc: Alexander.Deucher, Felix.Kuehling, Christian.Koenig, Oak.Zeng,
	Jenny-Jing.Liu, Philip.Yang, Xiaogang.Chen, Ray.Huang,
	Lingshan.Zhu, Junhua.Shen, rodrigo.vivi, thomas.hellstrom, dakr,
	aliceryhl, amd-gfx, dri-devel, honghuan, Honghuan He

On Mon, Apr 20, 2026 at 09:13:06PM +0800, Honglei Huang wrote:
> From: Honglei Huang <honghuan@amd.com>
> 
> Add the ioctl entry point and garbage collector to amdgpu_svm.c,
> and introduce amdgpu_svm_fault.c and amdgpu_svm_fault.h as a
> dedicated fault handler module.
> 
> Ioctl (amdgpu_svm.c):
> - amdgpu_svm_copy_attrs(): copy and validate user attribute array
>   from userspace with size and alignment checks
> - amdgpu_gem_svm_ioctl(): handle DRM_AMDGPU_GEM_SVM dispatching
>   to SET_ATTR or GET_ATTR with copy_to_user for GET results
> 
> Garbage collector (amdgpu_svm.c):
> - amdgpu_svm_garbage_collector(): dequeue and remove GC-listed
>   ranges under svm_lock, clear corresponding attributes
> - amdgpu_svm_range_clean_queue(): batch cleanup for dequeued
>   work items
> - amdgpu_svm_garbage_collector_work_func(): GC work handler
> - amdgpu_svm_gc_init/fini/flush(): lifecycle management for
>   the GC workqueue
> 
> Fault handler (amdgpu_svm_fault.c):
> - AMDGPU_SVM_RANGE_RETRY_FAULT_PENDING: 2ms dedup threshold
> - amdgpu_svm_range_get_unregistered_attrs(): derive default
>   attributes for faulting addresses without explicit registration,
>   using VMA properties and GPU IP capabilities
> - svm_check_fault_allowed(): validate fault access against
>   attribute permissions and read-only enforcement
> - amdgpu_svm_range_map_fault(): core fault mapping that finds or
>   creates a gpusvm range, gets pages, maps into GPU page tables,
>   retries on -EAGAIN up to 3 times
> - amdgpu_svm_handle_fault(): main entry called from
>   amdgpu_vm_handle_fault(). Looks up SVM by PASID, acquires
>   mmap_read_lock and svm_lock, runs garbage collector, resolves
>   attributes from the tree or derives defaults, uses timestamp
>   deduplication to skip stale faults, dispatches to map_fault
> 
> Fault header (amdgpu_svm_fault.h):
> - Forward declarations and amdgpu_svm_handle_fault() prototype
> 
> Signed-off-by: Honghuan He <honghuan.he@amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_svm.c       | 149 +++++++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_svm_fault.c | 368 ++++++++++++++++++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_svm_fault.h |  39 ++
>  3 files changed, 556 insertions(+)
>  create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_svm_fault.c
>  create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_svm_fault.h
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_svm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_svm.c
> index 5fbed9b9f..a672deede 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_svm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_svm.c
> @@ -316,3 +316,152 @@ bool amdgpu_svm_is_enabled(struct amdgpu_vm *vm)
>  	return vm->svm != NULL;
>  }
>  
> +static int amdgpu_svm_copy_attrs(const struct drm_amdgpu_gem_svm *args,
> +					   struct drm_amdgpu_svm_attribute **attrs,
> +					   size_t *size)
> +{
> +	if (!args->nattr || args->nattr > AMDGPU_SVM_MAX_ATTRS)
> +		return -EINVAL;
> +	if (!args->attrs_ptr)
> +		return -EINVAL;
> +
> +	*size = args->nattr * sizeof(**attrs);
> +	*attrs = memdup_user(u64_to_user_ptr(args->attrs_ptr), *size);
> +
> +	return PTR_ERR_OR_ZERO(*attrs);
> +}
> +
> +int amdgpu_svm_garbage_collector(struct amdgpu_svm *svm)
> +{
> +	int ret;
> +	struct amdgpu_svm_range_op_ctx op_ctx;
> +
> +	lockdep_assert_held_write(&svm->svm_lock);
> +
> +	spin_lock(&svm->work_lock);
> +	while (amdgpu_svm_range_dequeue_locked(svm, &svm->gc.list, &op_ctx)) {
> +		spin_unlock(&svm->work_lock);
> +
> +		if (UNMAP_WORK(op_ctx.pending_ops)) {
> +			ret = amdgpu_svm_attr_clear_pages(
> +				svm->attr_tree, op_ctx.start_page, op_ctx.last_page);
> +			if (ret)
> +				return ret;
> +
> +			drm_gpusvm_range_remove(&svm->gpusvm,
> +						&op_ctx.range->base);
> +		}
> +
> +		amdgpu_svm_range_put_if_dequeued(svm, op_ctx.range);
> +		spin_lock(&svm->work_lock);
> +	}
> +	spin_unlock(&svm->work_lock);
> +	return 0;
> +}
> +
> +void
> +amdgpu_svm_range_clean_queue(struct amdgpu_svm *svm,
> +			     struct list_head *work_list)
> +{
> +	struct amdgpu_svm_range_op_ctx op_ctx;
> +
> +	spin_lock(&svm->work_lock);
> +	while (amdgpu_svm_range_dequeue_locked(svm, work_list,
> +				    &op_ctx)) {
> +		spin_unlock(&svm->work_lock);
> +		amdgpu_svm_range_put_if_dequeued(svm, op_ctx.range);
> +		spin_lock(&svm->work_lock);
> +	}
> +	spin_unlock(&svm->work_lock);
> +}
> +
> +static void amdgpu_svm_garbage_collector_work_func(struct work_struct *w)
> +{
> +	struct amdgpu_svm_gc *gc = container_of(w, struct amdgpu_svm_gc, work);
> +	struct amdgpu_svm *svm = container_of(gc, struct amdgpu_svm, gc);
> +
> +	down_write(&svm->svm_lock);
> +	amdgpu_svm_garbage_collector(svm);
> +	up_write(&svm->svm_lock);
> +}
> +
> +int amdgpu_svm_gc_init(struct amdgpu_svm *svm)
> +{
> +	svm->gc.wq = alloc_workqueue(AMDGPU_SVM_GC_WQ_NAME,
> +					WQ_UNBOUND | WQ_HIGHPRI | WQ_MEM_RECLAIM, 0);
> +	if (!svm->gc.wq)
> +		return -ENOMEM;
> +
> +	INIT_LIST_HEAD(&svm->gc.list);
> +	INIT_WORK(&svm->gc.work, amdgpu_svm_garbage_collector_work_func);
> +
> +	return 0;
> +}
> +
> +void amdgpu_svm_gc_fini(struct amdgpu_svm *svm)
> +{
> +	flush_work(&svm->gc.work);
> +	amdgpu_svm_range_clean_queue(svm, &svm->gc.list);
> +	destroy_workqueue(svm->gc.wq);
> +	svm->gc.wq = NULL;
> +}
> +
> +void amdgpu_svm_gc_flush(struct amdgpu_svm *svm)
> +{
> +	flush_work(&svm->gc.work);
> +}
> +
> +int amdgpu_gem_svm_ioctl(struct drm_device *dev, void *data,
> +			 struct drm_file *filp)
> +{
> +	struct amdgpu_fpriv *fpriv = filp->driver_priv;
> +	struct amdgpu_device *adev = drm_to_adev(dev);
> +	struct drm_amdgpu_gem_svm *args = data;
> +	struct drm_amdgpu_svm_attribute *attrs = NULL;
> +	struct amdgpu_vm *vm;
> +	size_t attrs_size = 0;
> +	int ret = 0;
> +
> +	AMDGPU_SVM_TRACE("ioctl op=%u va:[0x%llx-0x%llx)-0x%llx nattr=%u\n",
> +			 args->operation, args->start_addr, args->start_addr + args->size,
> +			 args->size, args->nattr);
> +
> +	vm = &fpriv->vm;
> +	if (!amdgpu_svm_is_enabled(vm)) {
> +		ret = amdgpu_svm_init(adev, vm);
> +		if (ret)
> +			return ret;
> +	}
> +
> +	if ((args->start_addr & ~PAGE_MASK) || (args->size & ~PAGE_MASK))
> +		return -EINVAL;
> +
> +	if (!args->start_addr || !args->size)
> +		return -EINVAL;
> +
> +	ret = amdgpu_svm_copy_attrs(args, &attrs, &attrs_size);
> +	if (ret)
> +		return ret;
> +
> +	switch (args->operation) {
> +	case AMDGPU_SVM_OP_SET_ATTR:
> +		ret = amdgpu_svm_set_attr(vm, args->start_addr, args->size,
> +					 args->nattr, attrs);
> +		break;
> +	case AMDGPU_SVM_OP_GET_ATTR:
> +		ret = amdgpu_svm_get_attr(vm, args->start_addr, args->size,
> +					 args->nattr, attrs);
> +		if (!ret && copy_to_user(u64_to_user_ptr(args->attrs_ptr),
> +					 attrs, attrs_size))
> +			ret = -EFAULT;
> +		break;
> +	default:
> +		ret = -EINVAL;
> +		break;
> +	}
> +
> +	kvfree(attrs);
> +	return ret;
> +}
> +
> +#endif /* CONFIG_DRM_AMDGPU_SVM */
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_svm_fault.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_svm_fault.c
> new file mode 100644
> index 000000000..968fb402b
> --- /dev/null
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_svm_fault.c
> @@ -0,0 +1,368 @@
> +/* SPDX-License-Identifier: GPL-2.0 OR MIT */
> +/*
> + * Copyright 2026 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + *
> + */
> +
> +#include "amdgpu_svm.h"
> +#include "amdgpu_svm_attr.h"
> +#include "amdgpu_svm_fault.h"
> +#include "amdgpu_svm_range.h"
> +#include "amdgpu.h"
> +#include "amdgpu_vm.h"
> +#include "amdgpu_gmc.h"
> +#include "amdgpu_ih.h"
> +
> +#include <drm/drm_exec.h>
> +#include <drm/drm_gpusvm.h>
> +
> +#include <linux/delay.h>
> +#include <linux/mm.h>
> +#include <linux/sched/mm.h>
> +
> +#if IS_ENABLED(CONFIG_DRM_AMDGPU_SVM)
> +
> +#define AMDGPU_SVM_RANGE_RETRY_FAULT_PENDING	(2UL * NSEC_PER_MSEC)
> +
> +static int amdgpu_svm_range_get_unregistered_attrs(struct amdgpu_svm *svm,
> +					    unsigned long fault_addr,
> +					    unsigned long attr_start_page,
> +					    unsigned long attr_last_page,
> +					    struct amdgpu_svm_attr_range **out)
> +{
> +	struct amdgpu_svm_attr_tree *attr_tree = svm->attr_tree;
> +	struct amdgpu_svm_attr_range *range;
> +	struct amdgpu_svm_attrs attrs;
> +	struct mm_struct *mm = svm->gpusvm.mm;
> +	struct vm_area_struct *vma;
> +	unsigned long fault_page = fault_addr >> PAGE_SHIFT;
> +	unsigned long start_page, last_page;
> +	unsigned long vma_start_page, vma_last_page;
> +
> +	amdgpu_svm_attr_set_default(svm, &attrs);
> +
> +	mmap_read_lock(mm);
> +
> +	vma = amdgpu_svm_check_vma(mm, fault_addr);
> +	if (IS_ERR(vma)) {
> +		mmap_read_unlock(mm);
> +		AMDGPU_SVM_ERR("get_unregistered_attrs: invalid VMA for fault_addr=0x%lx\n",
> +		       fault_addr);
> +		return PTR_ERR(vma);
> +	}
> +	vma_start_page = vma->vm_start >> PAGE_SHIFT;
> +	vma_last_page = (vma->vm_end >> PAGE_SHIFT) - 1;
> +
> +	if (vma_is_initial_heap(vma) || vma_is_initial_stack(vma))
> +		attrs.preferred_loc = AMDGPU_SVM_LOCATION_SYSMEM;
> +
> +	mmap_read_unlock(mm);
> +
> +	start_page = max(vma_start_page,
> +		    (unsigned long)ALIGN_DOWN(fault_page, 1UL << attrs.granularity));
> +	last_page = min(vma_last_page,
> +		   (unsigned long)ALIGN(fault_page + 1, 1UL << attrs.granularity) - 1);
> +
> +	start_page = max(start_page, attr_start_page);
> +	last_page = min(last_page, attr_last_page);
> +
> +	mutex_lock(&attr_tree->lock);
> +	range = amdgpu_svm_attr_range_alloc(start_page, last_page, &attrs);
> +	if (!range) {
> +		mutex_unlock(&attr_tree->lock);
> +		return -ENOMEM;
> +	}
> +	amdgpu_svm_attr_range_insert_locked(attr_tree, range);
> +	mutex_unlock(&attr_tree->lock);
> +
> +	AMDGPU_SVM_TRACE(
> +		"Created unregistered range for fault_addr=0x%lx: attr range=[0x%lx-0x%lx) size: 0x%lx attrs={preferred_loc=%d, prefetch_loc=%d, flags=0x%x, granularity=%u, access=%u}\n",
> +		fault_addr, amdgpu_svm_attr_start_page(range),
> +		amdgpu_svm_attr_last_page(range) + 1,
> +		amdgpu_svm_attr_last_page(range) -
> +			amdgpu_svm_attr_start_page(range) + 1,
> +		range->attrs.preferred_loc, range->attrs.prefetch_loc,
> +		range->attrs.flags, range->attrs.granularity,
> +		range->attrs.access);
> +
> +	*out = range;
> +	return 0;
> +}
> +
> +static int svm_check_fault_allowed(struct amdgpu_svm *svm,
> +				   unsigned long fault_addr, bool write_fault)
> +{
> +	struct mm_struct *mm = svm->gpusvm.mm;
> +	struct vm_area_struct *vma;
> +	unsigned long requested = VM_READ;
> +	int ret = 0;
> +
> +	if (write_fault)
> +		requested |= VM_WRITE;
> +
> +	mmap_read_lock(mm);
> +	vma = vma_lookup(mm, fault_addr);
> +	if (vma && (vma->vm_flags & requested) != requested) {
> +		AMDGPU_SVM_ERR("fault addr 0x%lx no %s permission\n",
> +			 fault_addr, write_fault ? "write" : "read");
> +		ret = -EPERM;
> +	}
> +	mmap_read_unlock(mm);
> +
> +	return ret;
> +}
> +
> +static int amdgpu_svm_range_map_fault(struct amdgpu_svm *svm,
> +			       unsigned long fault_addr,
> +			       const struct amdgpu_svm_attr_range *attr_range,
> +			       bool write_fault)
> +{
> +	const struct amdgpu_svm_attrs *attrs = &attr_range->attrs;
> +	bool need_vram_migration = amdgpu_svm_attr_prefer_vram(svm, attrs);
> +	/* TODO: add migration, then use amdgpu_svm_attr_devmem_possible() */
> +	bool devmem_possible = false;
> +	struct drm_gpusvm_ctx map_ctx = {
> +		.read_only = !!(attrs->flags & AMDGPU_SVM_FLAG_GPU_RO),
> +		.devmem_possible = devmem_possible,
> +		.check_pages_threshold = devmem_possible ? SZ_64K : 0,
> +		.devmem_only = need_vram_migration && devmem_possible,
> +		.timeslice_ms = need_vram_migration && devmem_possible ? 5 : 0,
> +	};
> +	struct amdgpu_svm_range *range;
> +	ktime_t timestamp = ktime_get_boottime();
> +	uint64_t range_pte_flags;
> +	int retry_count = 3;
> +	int ret;
> +
> +	lockdep_assert_held_write(&svm->svm_lock);
> +	WARN_ON(!svm->xnack_enabled);
> +
> +retry:
> +	ret = amdgpu_svm_garbage_collector(svm);
> +	if (ret) {
> +		AMDGPU_SVM_ERR(
> +			"fault garbage collector failed: ret=%d, fault_addr=0x%lx\n",
> +			ret, fault_addr);
> +		return ret;
> +	}
> +
> +	ret = svm_check_fault_allowed(svm, fault_addr, write_fault);
> +	if (ret)
> +		return ret;
> +
> +	range = amdgpu_svm_range_find_or_insert(svm, fault_addr,
> +						 attr_range, &map_ctx);
> +	if (IS_ERR(range)) {
> +		ret = PTR_ERR(range);
> +		AMDGPU_SVM_ERR("map_fault: range_find_or_insert failed: fault=0x%lx ret=%d\n",
> +				 fault_addr, ret);
> +		/*
> +		 * -EINVAL: fault_addr out of gpusvm range, or no chunk size
> +		 *          fits within VMA/notifier/attr_range bounds.
> +		 * -EFAULT: mmget_not_zero failed.
> +		 * -ENOENT: No VMA at fault_addr.
> +		 * -ENOMEM: Notifier or range allocation failed.
> +		 */

Just a drive-by comment: as we’re getting to multiple users of GPU SVM,
and each driver is making decisions based on the error codes returned by
the common layer, it may be time to update the GPU SVM kernel
documentation to clearly define what each return code means for every
call.

There may also be some inconsistency in the return codes due to the
ad-hoc nature of how this evolved. If we need to clean up any return
values, this is probably something we should do now—before we end up in
a situation where we change a return value and then have to fix multiple
drivers.

Please let us know if, while you’re working in this area, you notice any
GPU SVM return values that don’t make sense or could use adjustment.

Matt

> +		if (ret == -EFAULT || ret == -ENOENT) {
> +			AMDGPU_SVM_ERR("no vma or mm is dying: 0x%lx, ret=%d\n",
> +					 fault_addr, ret);
> +			ret = 0;
> +		}
> +
> +		return ret;
> +	}
> +
> +	if (ktime_before(timestamp, ktime_add_ns(range->validate_timestamp,
> +					 AMDGPU_SVM_RANGE_RETRY_FAULT_PENDING))) {
> +		AMDGPU_SVM_TRACE("already restored, skip: fault=0x%lx range=[0x%lx-0x%lx)\n",
> +				 fault_addr, drm_gpusvm_range_start(&range->base),
> +				 drm_gpusvm_range_end(&range->base));
> +		goto out;
> +	}
> +
> +	range_pte_flags = amdgpu_svm_range_attr_pte_flags(
> +					svm, attrs, map_ctx.read_only);
> +
> +	if (!(write_fault && map_ctx.read_only) &&
> +	    amdgpu_svm_range_is_valid(svm, range, attrs, range_pte_flags)) {
> +		AMDGPU_SVM_TRACE("valid range, skip: fault=0x%lx range=[0x%lx-0x%lx)\n",
> +				 fault_addr, drm_gpusvm_range_start(&range->base),
> +				 drm_gpusvm_range_end(&range->base));
> +		goto out;
> +	}
> +
> +	AMDGPU_SVM_RANGE_DEBUG(range, "PAGE FAULT");
> +	/* TODO: add migration */
> +
> +	AMDGPU_SVM_RANGE_DEBUG(range, "GET PAGES");
> +	ret = amdgpu_svm_range_get_pages(svm, &range->base, &map_ctx);
> +	if (ret == -EOPNOTSUPP || ret == -EFAULT) {
> +		/*
> +		* -EOPNOTSUPP  Mixed page types within range.
> +		* -EFAULT      (a) mm is dying.
> +		*              (b) range was unmapped.
> +		*              (c) DMA mapping failed.
> +		*              (d) devmem_only requested but system page encountered.
> +		*              (e) hmm_range_fault: no VMA, page fault error, bad pte/pmd.
> +		* -EBUSY       HMM retry loop timed out.
> +		* -ENOMEM      PFN or DMA address array allocation failed.
> +		* -EINVAL      hmm_range_fault: invalid VMA type.
> +		*/
> +		map_ctx.timeslice_ms <<= 1;
> +		if (!map_ctx.devmem_only && --retry_count > 0) {
> +			AMDGPU_SVM_ERR("start retry: get_pages failed with %d, retries_left=%d: fault=0x%lx range=[0x%lx-0x%lx)\n",
> +					 ret, retry_count, fault_addr,
> +					 drm_gpusvm_range_start(&range->base),
> +					 drm_gpusvm_range_end(&range->base));
> +			goto retry;
> +		} else {
> +			AMDGPU_SVM_ERR("map_fault: get_pages failed with %d, devmem fallback allowed, but no devmem pages: fault=0x%lx range=[0x%lx-0x%lx)\n",
> +					 ret, fault_addr, drm_gpusvm_range_start(&range->base),
> +					 drm_gpusvm_range_end(&range->base));
> +		}
> +	}
> +
> +	if (ret == -EPERM) {
> +		AMDGPU_SVM_ERR("get_pages -EPERM: fault=0x%lx range=[0x%lx-0x%lx)\n",
> +			       fault_addr, drm_gpusvm_range_start(&range->base),
> +				       drm_gpusvm_range_end(&range->base));
> +		return ret;
> +	}
> +
> +	if (ret) {
> +		AMDGPU_SVM_RANGE_DEBUG(range, "PAGE FAULT - FAIL PAGE COLLECT");
> +		goto out;
> +	}
> +
> +	AMDGPU_SVM_RANGE_DEBUG(range, "PAGE FAULT - GPU MAP");
> +
> +	ret = amdgpu_svm_range_update_mapping(svm, range,
> +					      range_pte_flags, attrs->flags,
> +					      false, false, false);
> +
> +	if (ret)
> +		goto err_out;
> +
> +out:
> +	return 0;
> +
> +err_out:
> +	if (ret == -EAGAIN && --retry_count > 0) {
> +		map_ctx.timeslice_ms <<= 1;
> +		AMDGPU_SVM_RANGE_DEBUG(range, "PAGE FAULT - RETRY GPU MAP");
> +		goto retry;
> +	}
> +
> +	return ret;
> +}
> +
> +int amdgpu_svm_handle_fault(struct amdgpu_device *adev, uint32_t pasid,
> +			    uint64_t fault_addr, uint64_t ts,
> +			    bool write_fault)
> +{
> +	struct amdgpu_svm *svm;
> +	struct amdgpu_svm_attr_range *attr_range;
> +	unsigned long attr_start_page, attr_last_page;
> +	unsigned long fault_page;
> +	uint64_t ckpt;
> +	int ret;
> +
> +	fault_addr = fault_addr << PAGE_SHIFT;
> +	fault_page = fault_addr >> PAGE_SHIFT;
> +
> +	svm = amdgpu_svm_lookup_by_pasid(adev, pasid);
> +	if (!svm) {
> +		AMDGPU_SVM_ERR("handle_fault: no SVM context for pasid %u\n", pasid);
> +		return -EOPNOTSUPP;
> +	}
> +
> +	if (atomic_read(&svm->exiting)) {
> +		AMDGPU_SVM_ERR("handle_fault: SVM context is exiting for pasid %u\n", pasid);
> +		ret = -EAGAIN;
> +		goto out_put;
> +	}
> +
> +	if (!svm->xnack_enabled) {
> +		AMDGPU_SVM_ERR("handle_fault: SVM context does not have xnack enabled for pasid %u\n", pasid);
> +		ret = -EOPNOTSUPP;
> +		goto out_put;
> +	}
> +
> +	ckpt = READ_ONCE(svm->checkpoint_ts);
> +	if (ckpt != 0) {
> +		if (amdgpu_ih_ts_after_or_equal(ts, ckpt)) {
> +			AMDGPU_SVM_TRACE(
> +			"handle_fault: draining stale retry fault, drop fault 0x%llx ts=%llu checkpoint=%llu\n",
> +				fault_addr, ts, ckpt);
> +			amdgpu_gmc_filter_faults_remove(
> +				adev, fault_addr >> PAGE_SHIFT, pasid);
> +			ret = 0;
> +			goto out_put;
> +		} else {
> +			WRITE_ONCE(svm->checkpoint_ts, 0);
> +		}
> +	}
> +
> +	down_write(&svm->svm_lock);
> +
> +retry:
> +	mutex_lock(&svm->attr_tree->lock);
> +	attr_range = amdgpu_svm_attr_get_bounds_locked(svm->attr_tree,
> +						       fault_page,
> +						       &attr_start_page, &attr_last_page);
> +	mutex_unlock(&svm->attr_tree->lock);
> +	if (!attr_range) {
> +		ret = amdgpu_svm_range_get_unregistered_attrs(svm, fault_addr,
> +							      attr_start_page,
> +							      attr_last_page,
> +							      &attr_range);
> +		if (ret) {
> +			if (ret == -EFAULT)
> +				goto out_no_vma;
> +			goto out_unlock;
> +		}
> +	}
> +	ret = amdgpu_svm_range_map_fault(svm, fault_addr, attr_range,
> +					 write_fault);
> +
> +	if (ret == -EAGAIN) {
> +		AMDGPU_SVM_ERR("handle_fault: got -EAGAIN: fault=0x%llx\n",
> +			       fault_addr);
> +		amdgpu_gmc_filter_faults_remove(adev, fault_addr >> PAGE_SHIFT, pasid);
> +		goto retry;
> +	}
> +
> +	goto out_unlock;
> +
> +out_no_vma:
> +	AMDGPU_SVM_ERR("handle_fault: no VMA for fault=0x%llx (stale retry or GPU NULL deref)\n",
> +		 fault_addr);
> +	ret = 0;
> +
> +out_unlock:
> +	up_write(&svm->svm_lock);
> +
> +out_put:
> +	amdgpu_svm_put(svm);
> +	return ret;
> +}
> +
> +#endif /* CONFIG_DRM_AMDGPU_SVM */
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_svm_fault.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_svm_fault.h
> new file mode 100644
> index 000000000..1c8f6c15e
> --- /dev/null
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_svm_fault.h
> @@ -0,0 +1,39 @@
> +/* SPDX-License-Identifier: GPL-2.0 OR MIT */
> +/*
> + * Copyright 2026 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + *
> + */
> +
> +#ifndef __AMDGPU_SVM_FAULT_H__
> +#define __AMDGPU_SVM_FAULT_H__
> +
> +#include <linux/types.h>
> +
> +struct amdgpu_device;
> +struct amdgpu_svm;
> +struct amdgpu_svm_attr_range;
> +struct amdgpu_svm_attrs;
> +
> +int amdgpu_svm_handle_fault(struct amdgpu_device *adev, uint32_t pasid,
> +			    uint64_t fault_addr, uint64_t ts,
> +			    bool write_fault);
> +
> +#endif /* __AMDGPU_SVM_FAULT_H__ */
> -- 
> 2.34.1
> 


* Re: [RFC V3 00/12] drm/amdgpu: SVM implementation based on drm_gpusvm
  2026-04-20 13:12 [RFC V3 00/12] drm/amdgpu: SVM implementation based on drm_gpusvm Honglei Huang
                   ` (11 preceding siblings ...)
  2026-04-20 13:13 ` [RFC V3 12/12] drm/amdgpu: integrate SVM into build system and VM fault path Honglei Huang
@ 2026-04-21  2:31 ` Huang Rui
  12 siblings, 0 replies; 15+ messages in thread
From: Huang Rui @ 2026-04-21  2:31 UTC (permalink / raw)
  To: Honglei Huang
  Cc: Alexander.Deucher, Felix.Kuehling, Christian.Koenig, Oak.Zeng,
	Jenny-Jing.Liu, Philip.Yang, Xiaogang.Chen, Lingshan.Zhu,
	Junhua.Shen, matthew.brost, rodrigo.vivi, thomas.hellstrom, dakr,
	aliceryhl, amd-gfx, dri-devel, honghuan

On Mon, Apr 20, 2026 at 09:12:55PM +0800, Honglei Huang wrote:
> From: Honglei Huang <honghuan@amd.com>
> 
> V3 of the SVM patch series for amdgpu based on the drm_gpusvm framework. 
> This revision incorporates feedback from V1, adds XNACK on GPU fault handling,
> improves code organization, and removes the XNACK off (no GPU fault) implementation
> to focus on the fault driven model that aligns with drm_gpusvm's design. 
> The implementation draws extensively on xe_svm.
> 
> This patch series implements SVM support with the following design:
> 
>   1. Attributes separated from physical page management:
> 
>     - Attribute layer (amdgpu_svm_attr_tree): a driver-side interval
>       tree storing per-range SVM attributes. Managed through SET_ATTR
>       ioctl and preserved across range lifecycle events.
> 
>     - Physical page layer (drm_gpusvm ranges): managed by the
>       drm_gpusvm framework, representing HMM-backed DMA mappings
>       and GPU page table entries.
> 
>     This separation ensures attributes survive when GPU ranges are
>     destroyed (partial munmap, attribute split, GC). The fault
>     handler recreates GPU ranges from the attribute tree on demand.
> 
>   2. GPU fault driven mapping (XNACK on):
> 
>     The core mapping path is driven by GPU page faults instead of ioctls.
>     amdgpu_svm_handle_fault() looks up SVM by PASID, runs GC,
>     resolves attributes, then maps via find_or_insert -> get_pages
>     -> GPU PTE update. For unregistered addresses, default
>     attributes are derived from VMA properties automatically.
> 
>   3. MMU notifier invalidation:
> 
>     Two-phase callback: event_begin() zaps GPU PTEs and flushes
>     TLB, event_end() unmaps DMA pages. UNMAP events queue ranges
>     to GC for deferred cleanup. Non-UNMAP events (eviction) rely
>     on GPU fault to remap.
> 
>   4. Garbage collector:
> 
>     GC workqueue processes unmapped ranges: removes them
>     from drm_gpusvm and clears corresponding attributes. No
>     rebuild or restore logic, GPU fault handles recreation.
> 
> Changes since V2:
>   - Add version title to commit messages.
>   - Fix some mistaken content.
> 
> Changes since V1:
>   - Added GPU fault handler: amdgpu_svm_handle_fault with PASID-based
>     SVM lookup, following the standard flow: garbage collector ->
>     find or insert range -> check valid -> migrate (TODO) / get_pages
>     -> GPU bind/map.
> 
>   - Removed the restore worker queue entirely. V1 had separate GC
>     and restore workers: the restore workers performed synchronous
>     restore on queue stop/start because there was no GPU fault support.
>     With the XNACK on fault-driven model, synchronous restore is
>     unnecessary; the GPU fault handler recreates ranges on demand.
>     The GC worker in V2 is simplified to only discard ranges and
>     clear their attributes, with no rebuild or restore logic.
>     AMDGPU_SVM_FLAG_GPU_ALWAYS_MAPPED support is removed since there
>     is no longer a restore worker.
> 
>   - Reworked MMU notifier callback (amdgpu_svm_range_invalidate):
>     V1 had a monolithic dispatcher with flag combinations and
>     queue ops (CLEAR_PTE/QUEUE_INTERVAL, UNMAP/RESTORE) plus
>     begin_restore() to quiesce KFD queues. V2 uses a two-phase
>     model: event_begin() zaps GPU PTEs and flushes TLB,
>     event_end() unmaps DMA pages and queues UNMAP ranges to GC.
>     Non-UNMAP events (eviction) just zap PTEs and let GPU fault
>     remap. Removed begin_restore/end_restore callbacks,
>     has_always_mapped_range() check, and NOTIFIER flag dispatch.
>     Added checkpoint timestamp capture on UNMAP for fault dedup.
> 
>   - Added amdgpu_svm_range_invalidate_interval(): when userspace
>     sets new attributes on a sub region of an existing attribute
>     range, the attribute tree splits the old range and the new
>     sub region gets different attributes. However, existing
>     drm_gpusvm ranges may cross the new attribute boundary
>     (e.g., a 2M GPU range covers both the old and new attribute
>     regions). This function walks all gpusvm ranges in the
>     affected interval, zaps GPU PTEs and flushes TLB. Ranges
>     that cross the new or the old boundary are removed
>     entirely so the GPU fault handler can recreate them with 
>     boundaries aligned to the updated attribute layout.
> 
>   - On MMU_NOTIFY_UNMAP events, discard all affected gpusvm ranges
>     entirely, without V1's synchronous rebuild. The unmap may destroy
>     more ranges than strictly necessary (e.g., a partial munmap
>     hits a 2M range that extends beyond the unmapped region), but
>     the attribute layer preserves the still valid attributes for
>     the remaining address space. When the GPU next accesses those
>     addresses, the fault handler automatically recreates the
>     ranges with correct boundaries from the surviving attributes.
>     This avoids the synchronous rebuild logic that V1 required 
>     (unmap -> rebuild in GC/restore worker).
> 
>   - Add attribute creation for unregistered addresses:
>     amdgpu_svm_range_get_unregistered_attrs() derives default
>     SVM attributes from VMA properties and GPU IP capabilities
>     when the faulting address has no user attributes registered.
>     This feature is needed to pass the ROCm user mode runtime tests
>     (kfd/rocr/hip). ROCm has previously supported access to
>     unregistered virtual addresses with default SVM attributes, so
>     amdgpu SVM needs to support this as well.
> 
>   - Explicitly returns -EOPNOTSUPP in amdgpu_svm_init when XNACK
>     is disabled. V1 attempted mixed XNACK on/off support with
>     complex KFD queue quiesce/resume callbacks and ioctl driven
>     mapping paths, which added substantial complexity. V2 drops
>     these implementations to focus on the fault driven model.
> 
>   - Removed kgd2kfd_quiesce_mm()/resume_mm() dependency that V1
>     used for XNACK off queue control. For XNACK on, the GPU fault
>     handler is the entry point for SVM range mapping, so no
>     quiesce/resume is needed in this version.
> 
>   - Added new change triggers, TRIGGER_RANGE_SPLIT and
>     TRIGGER_PREFETCH, for sub-attribute set and prefetch support.
> 
>   - Added helper functions: find_locked, get_bounds_locked,
>     set_default for GPU fault handling.
> 
>   - Design questions section removed.
> 
> TODO:
>   - Add multi GPU support.
>   - Add XNACK off mode.
>   - Add migration or prefetch. This part work is ongoing in:
>     https://lore.kernel.org/amd-gfx/20260410113146.146212-1-Junhua.Shen@amd.com/
> 
> Test results:
>   Tested on gfx943 (MI300X) and gfx906 (MI60) with XNACK on:
>   - KFD test: 95%+ passed.
>   - ROCR test: all passed.
>   - HIP catch test: gfx943 (MI300X): 96% passed.
>                     gfx906 (MI60): 99% passed.

It would be best to also include the ROCm runtime merge request in the
cover letter, and clarify that the above test results are based on V3 +
user-space ROCR.

https://github.com/ROCm/rocm-systems/pull/4364

Thanks,
Ray

> 
> Patch overview:
> 
>   01/12 UAPI: DRM_AMDGPU_GEM_SVM ioctl, SVM flags, SET_ATTR/GET_ATTR
>         operations, attribute types in amdgpu_drm.h.
> 
>   02/12 Core header: amdgpu_svm wrapping drm_gpusvm with refcount,
>         attr_tree, GC struct, locks, and VM integration hooks.
> 
>   03/12 Attribute types: amdgpu_svm_attrs, attr_range (interval tree
>         node), attr_tree, access enum, flag masks, change triggers.
> 
>   04/12 Attribute tree ops: interval tree lookup, insert, remove,
>         find_locked, get_bounds_locked, set_default, and lifecycle.
> 
>   05/12 Attribute set/get/clear: validate UAPI attributes, apply to
>         tree with head/tail splitting, change propagation, and query.
> 
>   06/12 Range types: amdgpu_svm_range extending drm_gpusvm_range
>         with gpu_mapped state, pending ops, work queue linkage,
>         and op_ctx for batch processing.
> 
>   07/12 Range GPU mapping: PTE flags computation with read_only
>         support, GPU page table update, range mapping loop.
> 
>   08/12 Notifier and GC helpers: two-phase notifier events, range
>         removal, GC enqueue/add with dedicated workqueue.
> 
>   09/12 Attribute change and invalidation: apply attribute triggers
>         to GPU ranges, invalidate_interval for boundary realignment,
>         work queue dequeue helpers, checkpoint timestamp.
> 
>   10/12 Initialization and lifecycle: kmem_cache, drm_gpusvm_init
>         with chunk sizes (2M/64K/4K), XNACK detection, GC init,
>         PASID lookup, TLB flush, and init/close/fini lifecycle.
> 
>   11/12 Ioctl, GC, and fault handler: ioctl dispatcher, GC worker,
>         and amdgpu_svm_fault.c/h with full fault path including
>         unregistered attribute derivation and retry logic.
> 
>   12/12 Build integration: Kconfig (CONFIG_DRM_AMDGPU_SVM), Makefile
>         rules, ioctl registration, and amdgpu_vm fault dispatch.
> 
> Honglei Huang (12):
>   drm/amdgpu: define SVM UAPI for GPU shared virtual memory
>   drm/amdgpu: introduce SVM core header and VM integration
>   drm/amdgpu: define SVM attribute subsystem types
>   drm/amdgpu: implement SVM attribute tree and helper functions
>   drm/amdgpu: implement SVM attribute set, get, and clear
>   drm/amdgpu: define SVM range types and work queue interface
>   drm/amdgpu: implement SVM range GPU mapping core
>   drm/amdgpu: implement SVM range notifier and GC helpers
>   drm/amdgpu: implement SVM attribute change and invalidation callback
>   drm/amdgpu: implement SVM initialization and lifecycle
>   drm/amdgpu: add SVM ioctl, garbage collector, and fault handler
>   drm/amdgpu: integrate SVM into build system and VM fault path
> 
>  drivers/gpu/drm/amd/amdgpu/Kconfig            |  11 +
>  drivers/gpu/drm/amd/amdgpu/Makefile           |  13 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       |   2 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_svm.c       | 467 +++++++++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_svm.h       | 162 +++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_svm_attr.c  | 952 ++++++++++++++++++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_svm_attr.h  | 144 +++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_svm_fault.c | 368 +++++++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_svm_fault.h |  39 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_svm_range.c | 863 ++++++++++++++++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_svm_range.h | 148 +++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c        |  20 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h        |   4 +
>  include/uapi/drm/amdgpu_drm.h                 |  39 +
>  14 files changed, 3231 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_svm.c
>  create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_svm.h
>  create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_svm_attr.c
>  create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_svm_attr.h
>  create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_svm_fault.c
>  create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_svm_fault.h
>  create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_svm_range.c
>  create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_svm_range.h
> 
> -- 
> 2.34.1
> 

