* [RFC PATCH 00/19] drm, drm/xe: Multi-device GPUSVM
@ 2025-03-12 21:03 Thomas Hellström
  2025-03-12 21:03 ` [RFC PATCH 01/19] drm/xe: Introduce CONFIG_DRM_XE_GPUSVM Thomas Hellström
                   ` (19 more replies)
  0 siblings, 20 replies; 26+ messages in thread
From: Thomas Hellström @ 2025-03-12 21:03 UTC (permalink / raw)
  To: intel-xe, dri-devel
  Cc: Thomas Hellström, himal.prasad.ghimiray, apopple, airlied,
	Simona Vetter, felix.kuehling, Matthew Brost,
	Christian König, dakr, Mrozek, Michal, Joonas Lahtinen

This RFC implements, and requests comments on, a way to handle SVM across multiple devices,
typically connected with fast interconnects. It adds generic code and helpers in drm, and
device-specific code for xe.

For SVM, devices set up maps of device-private struct pages using a struct dev_pagemap.
The CPU virtual address space (mm) can then be populated with special page-table entries
pointing to such pages. The pages cannot be accessed directly by the CPU, but possibly
by other devices using a fast interconnect. This series aims to provide helpers to
identify pagemaps that take part in such a fast interconnect and to aid in migrating
between them.
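
For background, such a device-private pagemap is typically registered through the
generic dev_pagemap API, roughly as in the sketch below (illustrative only; "vr",
"res", "owner" and "xe_pagemap_ops" are placeholders rather than names from this
series):

  vr->pagemap.type = MEMORY_DEVICE_PRIVATE;
  vr->pagemap.range.start = res->start;
  vr->pagemap.range.end = res->end;
  vr->pagemap.nr_range = 1;
  vr->pagemap.ops = &xe_pagemap_ops;  /* .page_free, .migrate_to_ram */
  vr->pagemap.owner = owner;          /* identifies pages this device may access */

  addr = devm_memremap_pages(dev, &vr->pagemap);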

This is initially done by augmenting the struct dev_pagemap with a struct drm_pagemap,
and having the struct drm_pagemap implement a "populate_mm" method, where a region of
the CPU virtual address space (mm) is populated with device-private pages from the
dev_pagemap associated with the drm_pagemap, migrating data from system memory or other
devices if necessary. The drm_pagemap_populate_mm() function is then typically called
from a fault handler, using the struct drm_pagemap pointer of choice, which may reference
a local drm_pagemap or a remote one. The migration is done entirely by drm_pagemap
callbacks, typically using a copy engine local to the dev_pagemap's memory.
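
As a rough sketch of the resulting fault-handler flow (argument lists simplified;
driver_select_pagemap() is a placeholder, the real interfaces are defined in the
patches below and may differ):

  /* GPU pagefault on [start, end) of the faulting process' mm: */
  struct drm_pagemap *dpagemap = driver_select_pagemap(vm, fault_addr);

  if (dpagemap)
          /* Best effort; on failure we fall back to system memory pages. */
          err = drm_pagemap_populate_mm(dpagemap, start, end, mm);

  /* Then map the range through drm_gpusvm as usual, mixed placement allowed. */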

In addition there are helpers to build a drm_pagemap UAPI using file descriptors
representing struct drm_pagemaps, and a helper to register devices with a common
fast interconnect. The UAPI is intended to be private to the device, but if drivers
agree to identify struct drm_pagemaps by file descriptors, cross-driver multi-device
SVM could in theory be done if a use-case were found.
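
A rough illustration of how a driver UAPI built on this could look from userspace
(all identifiers below are made-up placeholders, not the actual ioctls or structs
added in patches 17-18):

  /* Obtain an fd representing the drm_pagemap of a (device, region) pair. */
  struct fake_devmem_open open = { .region = vram_region_id };     /* placeholder */
  ioctl(xe_fd, FAKE_IOCTL_DEVMEM_OPEN, &open);                     /* placeholder */

  /* Hand the fd, possibly obtained from another device, to a madvise-style
   * ioctl to prefer that pagemap for a CPU virtual address range.
   */
  struct fake_madvise madv = {                                     /* placeholder */
          .start = addr, .size = size, .preferred_devmem_fd = open.fd,
  };
  ioctl(xe_fd, FAKE_IOCTL_MADVISE_PREFER_DEVMEM, &madv);           /* placeholder */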

The implementation for the Xe driver uses dynamic pagemaps, which are created on first
use and removed 5s after the last reference is gone. Pagemaps are revoked on
device unbind, and data is then migrated to system memory.
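
The delayed removal can be thought of as a standard kref plus delayed-work pattern,
roughly like below (placeholder names, not the actual Xe implementation):

  static void xe_pagemap_release(struct kref *ref)       /* placeholder */
  {
          struct xe_pagemap *xp = container_of(ref, struct xe_pagemap, ref);

          /* Keep the (expensive to recreate) dev_pagemap around for a grace
           * period in case a new user shows up before tearing it down.
           */
          queue_delayed_work(system_wq, &xp->destroy_work, 5 * HZ);
  }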

Status:
This is a POC series. It has been tested with an IGT test (soon to be published), using a
DG1 drm_pagemap and a BattleMage SVM client. There is separate work ongoing for the
gpu_madvise functionality.

The Xe implementation of the "populate_mm()" callback is
still rudimentary and doesn't migrate from foreign devices. It should be tuned to make
smarter choices.

Any feedback appreciated.

Patch overview:
Patch 1:
- Extends the way the Xe driver can compile out SVM support and pagemaps.
Patch 2:
- Fixes an existing potential UAF in the Xe SVM code.
Patch 3:
- Introduces the drm_pagemap.c file and moves drm_pagemap functionality to it.
Patch 4:
- Adds a populate_mm op to drm_pagemap.
Patch 5:
- Implement Xe's version of the populate_mm op.
Patch 6:
- Refcount struct drm_pagemap.
Patch 7:
- Cleanup patch.
Patch 8:
- Add a bo_remove callback for Xe, used during device unbind.
Patch 9:
- Add a drm_pagemap utility to calculate a common owner structure.
Patch 10:
- Adapt GPUSVM to a (sort of) dynamic owner.
Patch 11:
- Xe calculates the dev_private owner using the drm_pagemap utility.
Patch 12:
- Update the Xe page-table code to handle per-range mixed system / device_private placement.
Patch 13:
- Modify GPUSVM to allow such placements.
Patch 14:
- Add a preferred pagemap for the Xe fault handler to use.
Patch 15:
- Add a utility that converts between drm_pagemaps and file descriptors.
Patch 16:
- Fix Xe so that devices without fault capability can also publish drm_pagemaps.
Patch 17:
- Add the devmem_open UAPI, creating a drm_pagemap file descriptor from a
  (device, region) pair.
Patch 18:
- (Only for POC) Add a GPU madvise prefer_devmem IOCTL.
Patch 19:
- (Only for POC) Implement PCIe p2p DMA as a fast interconnect, for testing.

Matthew Brost (1):
  drm/gpusvm, drm/pagemap: Move migration functionality to drm_pagemap

Thomas Hellström (18):
  drm/xe: Introduce CONFIG_DRM_XE_GPUSVM
  drm/xe/svm: Fix a potential bo UAF
  drm/pagemap: Add a populate_mm op
  drm/xe: Implement and use the drm_pagemap populate_mm op
  drm/pagemap, drm/xe: Add refcounting to struct drm_pagemap and manage
    lifetime
  drm/pagemap: Get rid of the struct
    drm_pagemap_zdd::device_private_page_owner field
  drm/xe/bo: Add a bo remove callback
  drm/pagemap_util: Add a utility to assign an owner to a set of
    interconnected gpus
  drm/gpusvm, drm/xe: Move the device private owner to the
    drm_gpusvm_ctx
  drm/xe: Use the drm_pagemap_util helper to get a svm pagemap owner
  drm/xe: Make the PT code handle placement per PTE rather than per vma
    / range
  drm/gpusvm: Allow mixed mappings
  drm/xe: Add a preferred dpagemap
  drm/pagemap/util: Add file descriptors pointing to struct drm_pagemap
  drm/xe/migrate: Allow xe_migrate_vram() also on non-pagefault capable
    devices
  drm/xe/uapi: Add the devmem_open ioctl
  drm/xe/uapi: HAX: Add the xe_madvise_prefer_devmem IOCTL
  drm/xe: HAX: Use pcie p2p dma to test fast interconnect

 Documentation/gpu/rfc/gpusvm.rst     |  12 +-
 drivers/gpu/drm/Makefile             |   7 +-
 drivers/gpu/drm/drm_gpusvm.c         | 782 +---------------------
 drivers/gpu/drm/drm_pagemap.c        | 940 +++++++++++++++++++++++++++
 drivers/gpu/drm/drm_pagemap_util.c   | 203 ++++++
 drivers/gpu/drm/xe/Kconfig           |  24 +-
 drivers/gpu/drm/xe/Makefile          |   2 +-
 drivers/gpu/drm/xe/xe_bo.c           |  65 +-
 drivers/gpu/drm/xe/xe_bo.h           |   2 +
 drivers/gpu/drm/xe/xe_bo_types.h     |   2 +-
 drivers/gpu/drm/xe/xe_device.c       |   8 +
 drivers/gpu/drm/xe/xe_device_types.h |  30 +-
 drivers/gpu/drm/xe/xe_migrate.c      |   8 +-
 drivers/gpu/drm/xe/xe_pt.c           | 112 ++--
 drivers/gpu/drm/xe/xe_query.c        |   2 +-
 drivers/gpu/drm/xe/xe_svm.c          | 716 +++++++++++++++++---
 drivers/gpu/drm/xe/xe_svm.h          | 158 ++++-
 drivers/gpu/drm/xe/xe_tile.c         |  20 +-
 drivers/gpu/drm/xe/xe_tile.h         |  33 +
 drivers/gpu/drm/xe/xe_vm.c           |   6 +-
 drivers/gpu/drm/xe/xe_vm_types.h     |   7 +
 include/drm/drm_gpusvm.h             | 102 +--
 include/drm/drm_pagemap.h            | 190 +++++-
 include/drm/drm_pagemap_util.h       |  59 ++
 include/uapi/drm/xe_drm.h            |  39 ++
 25 files changed, 2458 insertions(+), 1071 deletions(-)
 create mode 100644 drivers/gpu/drm/drm_pagemap.c
 create mode 100644 drivers/gpu/drm/drm_pagemap_util.c
 create mode 100644 include/drm/drm_pagemap_util.h

-- 
2.48.1



* [RFC PATCH 01/19] drm/xe: Introduce CONFIG_DRM_XE_GPUSVM
  2025-03-12 21:03 [RFC PATCH 00/19] drm, drm/xe: Multi-device GPUSVM Thomas Hellström
@ 2025-03-12 21:03 ` Thomas Hellström
  2025-03-12 21:03 ` [RFC PATCH 02/19] drm/xe/svm: Fix a potential bo UAF Thomas Hellström
                   ` (18 subsequent siblings)
  19 siblings, 0 replies; 26+ messages in thread
From: Thomas Hellström @ 2025-03-12 21:03 UTC (permalink / raw)
  To: intel-xe, dri-devel
  Cc: Thomas Hellström, himal.prasad.ghimiray, apopple, airlied,
	Simona Vetter, felix.kuehling, Matthew Brost,
	Christian König, dakr, Mrozek, Michal, Joonas Lahtinen

Don't rely on CONFIG_DRM_GPUSVM, because other drivers may enable it,
causing us to compile in SVM support unintentionally.

Also take the opportunity to leave more code out of compilation if
!CONFIG_DRM_XE_GPUSVM and !CONFIG_DRM_XE_DEVMEM_MIRROR.

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/xe/Kconfig           | 16 ++++++-
 drivers/gpu/drm/xe/Makefile          |  2 +-
 drivers/gpu/drm/xe/xe_device_types.h |  6 ++-
 drivers/gpu/drm/xe/xe_migrate.c      |  3 ++
 drivers/gpu/drm/xe/xe_pt.c           |  6 +++
 drivers/gpu/drm/xe/xe_query.c        |  2 +-
 drivers/gpu/drm/xe/xe_svm.c          | 15 ++++++
 drivers/gpu/drm/xe/xe_svm.h          | 72 ++++++++++++++++++++--------
 drivers/gpu/drm/xe/xe_vm.c           |  2 +-
 9 files changed, 97 insertions(+), 27 deletions(-)

diff --git a/drivers/gpu/drm/xe/Kconfig b/drivers/gpu/drm/xe/Kconfig
index 7d7995196702..aea4240664fa 100644
--- a/drivers/gpu/drm/xe/Kconfig
+++ b/drivers/gpu/drm/xe/Kconfig
@@ -39,7 +39,7 @@ config DRM_XE
 	select DRM_TTM_HELPER
 	select DRM_EXEC
 	select DRM_GPUVM
-	select DRM_GPUSVM if !UML && DEVICE_PRIVATE
+	select DRM_GPUSVM if DRM_XE_GPUSVM
 	select DRM_SCHED
 	select MMU_NOTIFIER
 	select WANT_DEV_COREDUMP
@@ -74,9 +74,21 @@ config DRM_XE_DP_TUNNEL
 
 	  If in doubt say "Y".
 
+config DRM_XE_GPUSVM
+	bool "Enable CPU to GPU address mirroring"
+	depends on DRM_XE
+	depends on !UML
+	default y
+	select DEVICE_PRIVATE
+	help
+	  Enable this option if you want support for CPU to GPU address
+	  mirroring.
+
+	  If in doubt say "Y".
+
 config DRM_XE_DEVMEM_MIRROR
 	bool "Enable device memory mirror"
-	depends on DRM_XE
+	depends on DRM_XE_GPUSVM
 	select GET_FREE_REGION
 	default y
 	help
diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
index 9699b08585f7..e4fec90bab55 100644
--- a/drivers/gpu/drm/xe/Makefile
+++ b/drivers/gpu/drm/xe/Makefile
@@ -125,7 +125,7 @@ xe-y += xe_bb.o \
 	xe_wopcm.o
 
 xe-$(CONFIG_HMM_MIRROR) += xe_hmm.o
-xe-$(CONFIG_DRM_GPUSVM) += xe_svm.o
+xe-$(CONFIG_DRM_XE_GPUSVM) += xe_svm.o
 
 # graphics hardware monitoring (HWMON) support
 xe-$(CONFIG_HWMON) += xe_hwmon.o
diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
index 72ef0b6fc425..8aa90acc2a0a 100644
--- a/drivers/gpu/drm/xe/xe_device_types.h
+++ b/drivers/gpu/drm/xe/xe_device_types.h
@@ -107,6 +107,9 @@ struct xe_vram_region {
 	resource_size_t actual_physical_size;
 	/** @mapping: pointer to VRAM mappable space */
 	void __iomem *mapping;
+	/** @ttm: VRAM TTM manager */
+	struct xe_ttm_vram_mgr ttm;
+#if IS_ENABLED(CONFIG_DRM_XE_DEVMEM_MIRROR)
 	/** @pagemap: Used to remap device memory as ZONE_DEVICE */
 	struct dev_pagemap pagemap;
 	/**
@@ -120,8 +123,7 @@ struct xe_vram_region {
 	 * This is generated when remap device memory as ZONE_DEVICE
 	 */
 	resource_size_t hpa_base;
-	/** @ttm: VRAM TTM manager */
-	struct xe_ttm_vram_mgr ttm;
+#endif
 };
 
 /**
diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
index df4282c71bf0..d364c9f458e7 100644
--- a/drivers/gpu/drm/xe/xe_migrate.c
+++ b/drivers/gpu/drm/xe/xe_migrate.c
@@ -1544,6 +1544,7 @@ void xe_migrate_wait(struct xe_migrate *m)
 		dma_fence_wait(m->fence, false);
 }
 
+#if IS_ENABLED(CONFIG_DRM_XE_DEVMEM_MIRROR)
 static u32 pte_update_cmd_size(u64 size)
 {
 	u32 num_dword;
@@ -1719,6 +1720,8 @@ struct dma_fence *xe_migrate_from_vram(struct xe_migrate *m,
 			       XE_MIGRATE_COPY_TO_SRAM);
 }
 
+#endif
+
 #if IS_ENABLED(CONFIG_DRM_XE_KUNIT_TEST)
 #include "tests/xe_migrate.c"
 #endif
diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
index ffaf0d02dc7d..9e719535a3bb 100644
--- a/drivers/gpu/drm/xe/xe_pt.c
+++ b/drivers/gpu/drm/xe/xe_pt.c
@@ -1420,6 +1420,7 @@ static int xe_pt_userptr_pre_commit(struct xe_migrate_pt_update *pt_update)
 	return err;
 }
 
+#if IS_ENABLED(CONFIG_DRM_XE_GPUSVM)
 static int xe_pt_svm_pre_commit(struct xe_migrate_pt_update *pt_update)
 {
 	struct xe_vm *vm = pt_update->vops->vm;
@@ -1453,6 +1454,7 @@ static int xe_pt_svm_pre_commit(struct xe_migrate_pt_update *pt_update)
 
 	return 0;
 }
+#endif
 
 struct invalidation_fence {
 	struct xe_gt_tlb_invalidation_fence base;
@@ -2257,11 +2259,15 @@ static const struct xe_migrate_pt_update_ops userptr_migrate_ops = {
 	.pre_commit = xe_pt_userptr_pre_commit,
 };
 
+#if IS_ENABLED(CONFIG_DRM_XE_GPUSVM)
 static const struct xe_migrate_pt_update_ops svm_migrate_ops = {
 	.populate = xe_vm_populate_pgtable,
 	.clear = xe_migrate_clear_pgtable_callback,
 	.pre_commit = xe_pt_svm_pre_commit,
 };
+#else
+static const struct xe_migrate_pt_update_ops svm_migrate_ops;
+#endif
 
 /**
  * xe_pt_update_ops_run() - Run PT update operations
diff --git a/drivers/gpu/drm/xe/xe_query.c b/drivers/gpu/drm/xe/xe_query.c
index 5e65830dad25..2dbf4066d86f 100644
--- a/drivers/gpu/drm/xe/xe_query.c
+++ b/drivers/gpu/drm/xe/xe_query.c
@@ -340,7 +340,7 @@ static int query_config(struct xe_device *xe, struct drm_xe_device_query *query)
 	if (xe_device_get_root_tile(xe)->mem.vram.usable_size)
 		config->info[DRM_XE_QUERY_CONFIG_FLAGS] |=
 			DRM_XE_QUERY_CONFIG_FLAG_HAS_VRAM;
-	if (xe->info.has_usm && IS_ENABLED(CONFIG_DRM_GPUSVM))
+	if (xe->info.has_usm && IS_ENABLED(CONFIG_DRM_XE_GPUSVM))
 		config->info[DRM_XE_QUERY_CONFIG_FLAGS] |=
 			DRM_XE_QUERY_CONFIG_FLAG_HAS_CPU_ADDR_MIRROR;
 	config->info[DRM_XE_QUERY_CONFIG_FLAGS] |=
diff --git a/drivers/gpu/drm/xe/xe_svm.c b/drivers/gpu/drm/xe/xe_svm.c
index 516898e99b26..c305d4c351d7 100644
--- a/drivers/gpu/drm/xe/xe_svm.c
+++ b/drivers/gpu/drm/xe/xe_svm.c
@@ -339,6 +339,8 @@ static void xe_svm_garbage_collector_work_func(struct work_struct *w)
 	up_write(&vm->lock);
 }
 
+#if IS_ENABLED(CONFIG_DRM_XE_DEVMEM_MIRROR)
+
 static struct xe_vram_region *page_to_vr(struct page *page)
 {
 	return container_of(page->pgmap, struct xe_vram_region, pagemap);
@@ -577,6 +579,8 @@ static const struct drm_gpusvm_devmem_ops gpusvm_devmem_ops = {
 	.copy_to_ram = xe_svm_copy_to_ram,
 };
 
+#endif
+
 static const struct drm_gpusvm_ops gpusvm_ops = {
 	.range_alloc = xe_svm_range_alloc,
 	.range_free = xe_svm_range_free,
@@ -650,6 +654,7 @@ static bool xe_svm_range_is_valid(struct xe_svm_range *range,
 	return (range->tile_present & ~range->tile_invalidated) & BIT(tile->id);
 }
 
+#if IS_ENABLED(CONFIG_DRM_XE_DEVMEM_MIRROR)
 static struct xe_vram_region *tile_to_vr(struct xe_tile *tile)
 {
 	return &tile->mem.vram;
@@ -708,6 +713,15 @@ static int xe_svm_alloc_vram(struct xe_vm *vm, struct xe_tile *tile,
 
 	return err;
 }
+#else
+static int xe_svm_alloc_vram(struct xe_vm *vm, struct xe_tile *tile,
+			     struct xe_svm_range *range,
+			     const struct drm_gpusvm_ctx *ctx)
+{
+	return -EOPNOTSUPP;
+}
+#endif
+
 
 /**
  * xe_svm_handle_pagefault() - SVM handle page fault
@@ -863,6 +877,7 @@ int xe_svm_bo_evict(struct xe_bo *bo)
 }
 
 #if IS_ENABLED(CONFIG_DRM_XE_DEVMEM_MIRROR)
+
 static struct drm_pagemap_device_addr
 xe_drm_pagemap_device_map(struct drm_pagemap *dpagemap,
 			  struct device *dev,
diff --git a/drivers/gpu/drm/xe/xe_svm.h b/drivers/gpu/drm/xe/xe_svm.h
index e059590e5076..c32b6d46ecf1 100644
--- a/drivers/gpu/drm/xe/xe_svm.h
+++ b/drivers/gpu/drm/xe/xe_svm.h
@@ -6,6 +6,8 @@
 #ifndef _XE_SVM_H_
 #define _XE_SVM_H_
 
+#if IS_ENABLED(CONFIG_DRM_XE_GPUSVM)
+
 #include <drm/drm_pagemap.h>
 #include <drm/drm_gpusvm.h>
 
@@ -43,7 +45,6 @@ struct xe_svm_range {
 	u8 skip_migrate	:1;
 };
 
-#if IS_ENABLED(CONFIG_DRM_GPUSVM)
 /**
  * xe_svm_range_pages_valid() - SVM range pages valid
  * @range: SVM range
@@ -72,7 +73,49 @@ bool xe_svm_has_mapping(struct xe_vm *vm, u64 start, u64 end);
 int xe_svm_bo_evict(struct xe_bo *bo);
 
 void xe_svm_range_debug(struct xe_svm_range *range, const char *operation);
+
+/**
+ * xe_svm_range_has_dma_mapping() - SVM range has DMA mapping
+ * @range: SVM range
+ *
+ * Return: True if SVM range has a DMA mapping, False otherwise
+ */
+static inline bool xe_svm_range_has_dma_mapping(struct xe_svm_range *range)
+{
+	lockdep_assert_held(&range->base.gpusvm->notifier_lock);
+	return range->base.flags.has_dma_mapping;
+}
+
+#define xe_svm_assert_in_notifier(vm__) \
+	lockdep_assert_held_write(&(vm__)->svm.gpusvm.notifier_lock)
+
+#define xe_svm_notifier_lock(vm__)	\
+	drm_gpusvm_notifier_lock(&(vm__)->svm.gpusvm)
+
+#define xe_svm_notifier_unlock(vm__)	\
+	drm_gpusvm_notifier_unlock(&(vm__)->svm.gpusvm)
+
 #else
+#include <linux/interval_tree.h>
+
+struct drm_pagemap_device_addr;
+struct xe_bo;
+struct xe_vm;
+struct xe_vma;
+struct xe_tile;
+struct xe_vram_region;
+
+#define XE_INTERCONNECT_VRAM 1
+
+struct xe_svm_range {
+	struct {
+		struct interval_tree_node itree;
+		const struct drm_pagemap_device_addr *dma_addr;
+	} base;
+	u32 tile_present;
+	u32 tile_invalidated;
+};
+
 static inline bool xe_svm_range_pages_valid(struct xe_svm_range *range)
 {
 	return false;
@@ -124,27 +167,16 @@ static inline
 void xe_svm_range_debug(struct xe_svm_range *range, const char *operation)
 {
 }
-#endif
 
-/**
- * xe_svm_range_has_dma_mapping() - SVM range has DMA mapping
- * @range: SVM range
- *
- * Return: True if SVM range has a DMA mapping, False otherwise
- */
-static inline bool xe_svm_range_has_dma_mapping(struct xe_svm_range *range)
+#define xe_svm_assert_in_notifier(...) do {} while (0)
+#define xe_svm_range_has_dma_mapping(...) false
+
+static inline void xe_svm_notifier_lock(struct xe_vm *vm)
 {
-	lockdep_assert_held(&range->base.gpusvm->notifier_lock);
-	return range->base.flags.has_dma_mapping;
 }
 
-#define xe_svm_assert_in_notifier(vm__) \
-	lockdep_assert_held_write(&(vm__)->svm.gpusvm.notifier_lock)
-
-#define xe_svm_notifier_lock(vm__)	\
-	drm_gpusvm_notifier_lock(&(vm__)->svm.gpusvm)
-
-#define xe_svm_notifier_unlock(vm__)	\
-	drm_gpusvm_notifier_unlock(&(vm__)->svm.gpusvm)
-
+static inline void xe_svm_notifier_unlock(struct xe_vm *vm)
+{
+}
+#endif
 #endif
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 60303998bd61..07c4992fb3d7 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -3109,7 +3109,7 @@ static int vm_bind_ioctl_check_args(struct xe_device *xe, struct xe_vm *vm,
 
 		if (XE_IOCTL_DBG(xe, is_cpu_addr_mirror &&
 				 (!xe_vm_in_fault_mode(vm) ||
-				 !IS_ENABLED(CONFIG_DRM_GPUSVM)))) {
+				 !IS_ENABLED(CONFIG_DRM_XE_GPUSVM)))) {
 			err = -EINVAL;
 			goto free_bind_ops;
 		}
-- 
2.48.1



* [RFC PATCH 02/19] drm/xe/svm: Fix a potential bo UAF
  2025-03-12 21:03 [RFC PATCH 00/19] drm, drm/xe: Multi-device GPUSVM Thomas Hellström
  2025-03-12 21:03 ` [RFC PATCH 01/19] drm/xe: Introduce CONFIG_DRM_XE_GPUSVM Thomas Hellström
@ 2025-03-12 21:03 ` Thomas Hellström
  2025-03-12 21:04 ` [RFC PATCH 03/19] drm/gpusvm, drm/pagemap: Move migration functionality to drm_pagemap Thomas Hellström
                   ` (17 subsequent siblings)
  19 siblings, 0 replies; 26+ messages in thread
From: Thomas Hellström @ 2025-03-12 21:03 UTC (permalink / raw)
  To: intel-xe, dri-devel
  Cc: Thomas Hellström, himal.prasad.ghimiray, apopple, airlied,
	Simona Vetter, felix.kuehling, Matthew Brost,
	Christian König, dakr, Mrozek, Michal, Joonas Lahtinen

If drm_gpusvm_migrate_to_devmem() succeeds and a CPU access to the range
then happens, the bo may be freed before xe_bo_unlock(), causing a UAF.

Since the reference is transferred, use xe_svm_devmem_release() to
release the reference on drm_gpusvm_migrate_to_devmem() failure,
and hold a local reference to protect against the UAF.

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/xe/xe_svm.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_svm.c b/drivers/gpu/drm/xe/xe_svm.c
index c305d4c351d7..1a8e17a0005d 100644
--- a/drivers/gpu/drm/xe/xe_svm.c
+++ b/drivers/gpu/drm/xe/xe_svm.c
@@ -701,11 +701,14 @@ static int xe_svm_alloc_vram(struct xe_vm *vm, struct xe_tile *tile,
 	list_for_each_entry(block, blocks, link)
 		block->private = vr;
 
+	xe_bo_get(bo);
 	err = drm_gpusvm_migrate_to_devmem(&vm->svm.gpusvm, &range->base,
 					   &bo->devmem_allocation, ctx);
-	xe_bo_unlock(bo);
 	if (err)
-		xe_bo_put(bo);	/* Creation ref */
+		xe_svm_devmem_release(&bo->devmem_allocation);
+
+	xe_bo_unlock(bo);
+	xe_bo_put(bo);
 
 unlock:
 	mmap_read_unlock(mm);
-- 
2.48.1



* [RFC PATCH 03/19] drm/gpusvm, drm/pagemap: Move migration functionality to drm_pagemap
  2025-03-12 21:03 [RFC PATCH 00/19] drm, drm/xe: Multi-device GPUSVM Thomas Hellström
  2025-03-12 21:03 ` [RFC PATCH 01/19] drm/xe: Introduce CONFIG_DRM_XE_GPUSVM Thomas Hellström
  2025-03-12 21:03 ` [RFC PATCH 02/19] drm/xe/svm: Fix a potential bo UAF Thomas Hellström
@ 2025-03-12 21:04 ` Thomas Hellström
  2025-03-12 21:04 ` [RFC PATCH 04/19] drm/pagemap: Add a populate_mm op Thomas Hellström
                   ` (16 subsequent siblings)
  19 siblings, 0 replies; 26+ messages in thread
From: Thomas Hellström @ 2025-03-12 21:04 UTC (permalink / raw)
  To: intel-xe, dri-devel
  Cc: Matthew Brost, Thomas Hellström, himal.prasad.ghimiray,
	apopple, airlied, Simona Vetter, felix.kuehling,
	Christian König, dakr, Mrozek, Michal, Joonas Lahtinen

From: Matthew Brost <matthew.brost@intel.com>

The migration functionality and bookkeeping of per-pagemap VRAM
mapped to the CPU mm is not per GPU VM, but rather per pagemap.
This is also reflected by the functions not needing the drm_gpusvm
structures. So move it to drm_pagemap.

With this, drm_gpusvm shouldn't really access the page zone-device-data
since its meaning is internal to drm_pagemap. Currently it's used to
reject mapping ranges backed by multiple drm_pagemap allocations.
For now, make the zone-device-data a void pointer.

Rename CONFIG_DRM_XE_DEVMEM_MIRROR to CONFIG_DRM_XE_PAGEMAP.

Matt is listed as author of this commit since he wrote most of the code,
and it makes sense to retain his git authorship.
Thomas mostly moved the code around.

Co-developed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 Documentation/gpu/rfc/gpusvm.rst     |  12 +-
 drivers/gpu/drm/Makefile             |   6 +-
 drivers/gpu/drm/drm_gpusvm.c         | 750 +------------------------
 drivers/gpu/drm/drm_pagemap.c        | 784 +++++++++++++++++++++++++++
 drivers/gpu/drm/xe/Kconfig           |  10 +-
 drivers/gpu/drm/xe/xe_bo_types.h     |   2 +-
 drivers/gpu/drm/xe/xe_device_types.h |   2 +-
 drivers/gpu/drm/xe/xe_migrate.c      |   2 +-
 drivers/gpu/drm/xe/xe_svm.c          |  42 +-
 include/drm/drm_gpusvm.h             |  95 +---
 include/drm/drm_pagemap.h            |  98 ++++
 11 files changed, 943 insertions(+), 860 deletions(-)
 create mode 100644 drivers/gpu/drm/drm_pagemap.c

diff --git a/Documentation/gpu/rfc/gpusvm.rst b/Documentation/gpu/rfc/gpusvm.rst
index bcf66a8137a6..469db1372f16 100644
--- a/Documentation/gpu/rfc/gpusvm.rst
+++ b/Documentation/gpu/rfc/gpusvm.rst
@@ -73,15 +73,21 @@ Overview of baseline design
 .. kernel-doc:: drivers/gpu/drm/drm_gpusvm.c
    :doc: Locking
 
-.. kernel-doc:: drivers/gpu/drm/drm_gpusvm.c
-   :doc: Migration
-
 .. kernel-doc:: drivers/gpu/drm/drm_gpusvm.c
    :doc: Partial Unmapping of Ranges
 
 .. kernel-doc:: drivers/gpu/drm/drm_gpusvm.c
    :doc: Examples
 
+Overview of drm_pagemap design
+==============================
+
+.. kernel-doc:: drivers/gpu/drm/drm_pagemap.c
+   :doc: Overview
+
+.. kernel-doc:: drivers/gpu/drm/drm_pagemap.c
+   :doc: Migration
+
 Possible future design features
 ===============================
 
diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
index ed54a546bbe2..6e3520bff769 100644
--- a/drivers/gpu/drm/Makefile
+++ b/drivers/gpu/drm/Makefile
@@ -104,7 +104,11 @@ obj-$(CONFIG_DRM_PANEL_BACKLIGHT_QUIRKS) += drm_panel_backlight_quirks.o
 #
 obj-$(CONFIG_DRM_EXEC) += drm_exec.o
 obj-$(CONFIG_DRM_GPUVM) += drm_gpuvm.o
-obj-$(CONFIG_DRM_GPUSVM) += drm_gpusvm.o
+
+drm_gpusvm_helper-y := \
+	drm_gpusvm.o\
+	drm_pagemap.o
+obj-$(CONFIG_DRM_GPUSVM) += drm_gpusvm_helper.o
 
 obj-$(CONFIG_DRM_BUDDY) += drm_buddy.o
 
diff --git a/drivers/gpu/drm/drm_gpusvm.c b/drivers/gpu/drm/drm_gpusvm.c
index 2451c816edd5..4fade7018507 100644
--- a/drivers/gpu/drm/drm_gpusvm.c
+++ b/drivers/gpu/drm/drm_gpusvm.c
@@ -8,10 +8,9 @@
 
 #include <linux/dma-mapping.h>
 #include <linux/hmm.h>
+#include <linux/hugetlb_inline.h>
 #include <linux/memremap.h>
-#include <linux/migrate.h>
 #include <linux/mm_types.h>
-#include <linux/pagemap.h>
 #include <linux/slab.h>
 
 #include <drm/drm_device.h>
@@ -107,21 +106,6 @@
  * to add annotations to GPU SVM.
  */
 
-/**
- * DOC: Migration
- *
- * The migration support is quite simple, allowing migration between RAM and
- * device memory at the range granularity. For example, GPU SVM currently does
- * not support mixing RAM and device memory pages within a range. This means
- * that upon GPU fault, the entire range can be migrated to device memory, and
- * upon CPU fault, the entire range is migrated to RAM. Mixed RAM and device
- * memory storage within a range could be added in the future if required.
- *
- * The reasoning for only supporting range granularity is as follows: it
- * simplifies the implementation, and range sizes are driver-defined and should
- * be relatively small.
- */
-
 /**
  * DOC: Partial Unmapping of Ranges
  *
@@ -193,10 +177,9 @@
  *		if (driver_migration_policy(range)) {
  *			mmap_read_lock(mm);
  *			devmem = driver_alloc_devmem();
- *			err = drm_gpusvm_migrate_to_devmem(gpusvm, range,
- *							   devmem_allocation,
- *							   &ctx);
- *			mmap_read_unlock(mm);
+ *			err = drm_pagemap_migrate_to_devmem(devmem, gpusvm->mm, gpuva_start,
+ *                                                          gpuva_end, driver_pgmap_owner());
+ *                      mmap_read_unlock(mm);
  *			if (err)	// CPU mappings may have changed
  *				goto retry;
  *		}
@@ -288,97 +271,6 @@ npages_in_range(unsigned long start, unsigned long end)
 	return (end - start) >> PAGE_SHIFT;
 }
 
-/**
- * struct drm_gpusvm_zdd - GPU SVM zone device data
- *
- * @refcount: Reference count for the zdd
- * @devmem_allocation: device memory allocation
- * @device_private_page_owner: Device private pages owner
- *
- * This structure serves as a generic wrapper installed in
- * page->zone_device_data. It provides infrastructure for looking up a device
- * memory allocation upon CPU page fault and asynchronously releasing device
- * memory once the CPU has no page references. Asynchronous release is useful
- * because CPU page references can be dropped in IRQ contexts, while releasing
- * device memory likely requires sleeping locks.
- */
-struct drm_gpusvm_zdd {
-	struct kref refcount;
-	struct drm_gpusvm_devmem *devmem_allocation;
-	void *device_private_page_owner;
-};
-
-/**
- * drm_gpusvm_zdd_alloc() - Allocate a zdd structure.
- * @device_private_page_owner: Device private pages owner
- *
- * This function allocates and initializes a new zdd structure. It sets up the
- * reference count and initializes the destroy work.
- *
- * Return: Pointer to the allocated zdd on success, ERR_PTR() on failure.
- */
-static struct drm_gpusvm_zdd *
-drm_gpusvm_zdd_alloc(void *device_private_page_owner)
-{
-	struct drm_gpusvm_zdd *zdd;
-
-	zdd = kmalloc(sizeof(*zdd), GFP_KERNEL);
-	if (!zdd)
-		return NULL;
-
-	kref_init(&zdd->refcount);
-	zdd->devmem_allocation = NULL;
-	zdd->device_private_page_owner = device_private_page_owner;
-
-	return zdd;
-}
-
-/**
- * drm_gpusvm_zdd_get() - Get a reference to a zdd structure.
- * @zdd: Pointer to the zdd structure.
- *
- * This function increments the reference count of the provided zdd structure.
- *
- * Return: Pointer to the zdd structure.
- */
-static struct drm_gpusvm_zdd *drm_gpusvm_zdd_get(struct drm_gpusvm_zdd *zdd)
-{
-	kref_get(&zdd->refcount);
-	return zdd;
-}
-
-/**
- * drm_gpusvm_zdd_destroy() - Destroy a zdd structure.
- * @ref: Pointer to the reference count structure.
- *
- * This function queues the destroy_work of the zdd for asynchronous destruction.
- */
-static void drm_gpusvm_zdd_destroy(struct kref *ref)
-{
-	struct drm_gpusvm_zdd *zdd =
-		container_of(ref, struct drm_gpusvm_zdd, refcount);
-	struct drm_gpusvm_devmem *devmem = zdd->devmem_allocation;
-
-	if (devmem) {
-		complete_all(&devmem->detached);
-		if (devmem->ops->devmem_release)
-			devmem->ops->devmem_release(devmem);
-	}
-	kfree(zdd);
-}
-
-/**
- * drm_gpusvm_zdd_put() - Put a zdd reference.
- * @zdd: Pointer to the zdd structure.
- *
- * This function decrements the reference count of the provided zdd structure
- * and schedules its destruction if the count drops to zero.
- */
-static void drm_gpusvm_zdd_put(struct drm_gpusvm_zdd *zdd)
-{
-	kref_put(&zdd->refcount, drm_gpusvm_zdd_destroy);
-}
-
 /**
  * drm_gpusvm_range_find() - Find GPU SVM range from GPU SVM notifier
  * @notifier: Pointer to the GPU SVM notifier structure.
@@ -945,7 +837,7 @@ drm_gpusvm_range_chunk_size(struct drm_gpusvm *gpusvm,
 		 * process-many-malloc' fails. In the failure case, each process
 		 * mallocs 16k but the CPU VMA is ~128k which results in 64k SVM
 		 * ranges. When migrating the SVM ranges, some processes fail in
-		 * drm_gpusvm_migrate_to_devmem with 'migrate.cpages != npages'
+		 * drm_pagemap_migrate_to_devmem with 'migrate.cpages != npages'
 		 * and then upon drm_gpusvm_range_get_pages device pages from
 		 * other processes are collected + faulted in which creates all
 		 * sorts of problems. Unsure exactly how this happening, also
@@ -1321,7 +1213,7 @@ int drm_gpusvm_range_get_pages(struct drm_gpusvm *gpusvm,
 		.dev_private_owner = gpusvm->device_private_page_owner,
 	};
 	struct mm_struct *mm = gpusvm->mm;
-	struct drm_gpusvm_zdd *zdd;
+	void *zdd;
 	unsigned long timeout =
 		jiffies + msecs_to_jiffies(HMM_RANGE_DEFAULT_TIMEOUT);
 	unsigned long i, j;
@@ -1423,7 +1315,7 @@ int drm_gpusvm_range_get_pages(struct drm_gpusvm *gpusvm,
 				}
 
 				pagemap = page->pgmap;
-				dpagemap = zdd->devmem_allocation->dpagemap;
+				dpagemap = drm_pagemap_page_to_dpagemap(page);
 				if (drm_WARN_ON(gpusvm->drm, !dpagemap)) {
 					/*
 					 * Raced. This is not supposed to happen
@@ -1449,7 +1341,7 @@ int drm_gpusvm_range_get_pages(struct drm_gpusvm *gpusvm,
 		} else {
 			dma_addr_t addr;
 
-			if (is_zone_device_page(page) || zdd) {
+			if (is_zone_device_page(page) || pagemap) {
 				err = -EOPNOTSUPP;
 				goto err_unmap;
 			}
@@ -1472,7 +1364,7 @@ int drm_gpusvm_range_get_pages(struct drm_gpusvm *gpusvm,
 	}
 
 	range->flags.has_dma_mapping = true;
-	if (zdd) {
+	if (pagemap) {
 		range->flags.has_devmem_pages = true;
 		range->dpagemap = dpagemap;
 	}
@@ -1497,6 +1389,7 @@ EXPORT_SYMBOL_GPL(drm_gpusvm_range_get_pages);
 
 /**
  * drm_gpusvm_range_unmap_pages() - Unmap pages associated with a GPU SVM range
+ * drm_gpusvm_range_evict() - Evict GPU SVM range
  * @gpusvm: Pointer to the GPU SVM structure
  * @range: Pointer to the GPU SVM range structure
  * @ctx: GPU SVM context
@@ -1527,553 +1420,11 @@ void drm_gpusvm_range_unmap_pages(struct drm_gpusvm *gpusvm,
 EXPORT_SYMBOL_GPL(drm_gpusvm_range_unmap_pages);
 
 /**
- * drm_gpusvm_migration_unlock_put_page() - Put a migration page
- * @page: Pointer to the page to put
- *
- * This function unlocks and puts a page.
- */
-static void drm_gpusvm_migration_unlock_put_page(struct page *page)
-{
-	unlock_page(page);
-	put_page(page);
-}
-
-/**
- * drm_gpusvm_migration_unlock_put_pages() - Put migration pages
- * @npages: Number of pages
- * @migrate_pfn: Array of migrate page frame numbers
- *
- * This function unlocks and puts an array of pages.
- */
-static void drm_gpusvm_migration_unlock_put_pages(unsigned long npages,
-						  unsigned long *migrate_pfn)
-{
-	unsigned long i;
-
-	for (i = 0; i < npages; ++i) {
-		struct page *page;
-
-		if (!migrate_pfn[i])
-			continue;
-
-		page = migrate_pfn_to_page(migrate_pfn[i]);
-		drm_gpusvm_migration_unlock_put_page(page);
-		migrate_pfn[i] = 0;
-	}
-}
-
-/**
- * drm_gpusvm_get_devmem_page() - Get a reference to a device memory page
- * @page: Pointer to the page
- * @zdd: Pointer to the GPU SVM zone device data
- *
- * This function associates the given page with the specified GPU SVM zone
- * device data and initializes it for zone device usage.
- */
-static void drm_gpusvm_get_devmem_page(struct page *page,
-				       struct drm_gpusvm_zdd *zdd)
-{
-	page->zone_device_data = drm_gpusvm_zdd_get(zdd);
-	zone_device_page_init(page);
-}
-
-/**
- * drm_gpusvm_migrate_map_pages() - Map migration pages for GPU SVM migration
- * @dev: The device for which the pages are being mapped
- * @dma_addr: Array to store DMA addresses corresponding to mapped pages
- * @migrate_pfn: Array of migrate page frame numbers to map
- * @npages: Number of pages to map
- * @dir: Direction of data transfer (e.g., DMA_BIDIRECTIONAL)
- *
- * This function maps pages of memory for migration usage in GPU SVM. It
- * iterates over each page frame number provided in @migrate_pfn, maps the
- * corresponding page, and stores the DMA address in the provided @dma_addr
- * array.
- *
- * Return: 0 on success, -EFAULT if an error occurs during mapping.
- */
-static int drm_gpusvm_migrate_map_pages(struct device *dev,
-					dma_addr_t *dma_addr,
-					unsigned long *migrate_pfn,
-					unsigned long npages,
-					enum dma_data_direction dir)
-{
-	unsigned long i;
-
-	for (i = 0; i < npages; ++i) {
-		struct page *page = migrate_pfn_to_page(migrate_pfn[i]);
-
-		if (!page)
-			continue;
-
-		if (WARN_ON_ONCE(is_zone_device_page(page)))
-			return -EFAULT;
-
-		dma_addr[i] = dma_map_page(dev, page, 0, PAGE_SIZE, dir);
-		if (dma_mapping_error(dev, dma_addr[i]))
-			return -EFAULT;
-	}
-
-	return 0;
-}
-
-/**
- * drm_gpusvm_migrate_unmap_pages() - Unmap pages previously mapped for GPU SVM migration
- * @dev: The device for which the pages were mapped
- * @dma_addr: Array of DMA addresses corresponding to mapped pages
- * @npages: Number of pages to unmap
- * @dir: Direction of data transfer (e.g., DMA_BIDIRECTIONAL)
- *
- * This function unmaps previously mapped pages of memory for GPU Shared Virtual
- * Memory (SVM). It iterates over each DMA address provided in @dma_addr, checks
- * if it's valid and not already unmapped, and unmaps the corresponding page.
- */
-static void drm_gpusvm_migrate_unmap_pages(struct device *dev,
-					   dma_addr_t *dma_addr,
-					   unsigned long npages,
-					   enum dma_data_direction dir)
-{
-	unsigned long i;
-
-	for (i = 0; i < npages; ++i) {
-		if (!dma_addr[i] || dma_mapping_error(dev, dma_addr[i]))
-			continue;
-
-		dma_unmap_page(dev, dma_addr[i], PAGE_SIZE, dir);
-	}
-}
-
-/**
- * drm_gpusvm_migrate_to_devmem() - Migrate GPU SVM range to device memory
+ * drm_gpusvm_range_evict() - Evict GPU SVM range
  * @gpusvm: Pointer to the GPU SVM structure
- * @range: Pointer to the GPU SVM range structure
- * @devmem_allocation: Pointer to the device memory allocation. The caller
- *                     should hold a reference to the device memory allocation,
- *                     which should be dropped via ops->devmem_release or upon
- *                     the failure of this function.
- * @ctx: GPU SVM context
- *
- * This function migrates the specified GPU SVM range to device memory. It
- * performs the necessary setup and invokes the driver-specific operations for
- * migration to device memory. Upon successful return, @devmem_allocation can
- * safely reference @range until ops->devmem_release is called which only upon
- * successful return. Expected to be called while holding the mmap lock in read
- * mode.
- *
- * Return: 0 on success, negative error code on failure.
- */
-int drm_gpusvm_migrate_to_devmem(struct drm_gpusvm *gpusvm,
-				 struct drm_gpusvm_range *range,
-				 struct drm_gpusvm_devmem *devmem_allocation,
-				 const struct drm_gpusvm_ctx *ctx)
-{
-	const struct drm_gpusvm_devmem_ops *ops = devmem_allocation->ops;
-	unsigned long start = drm_gpusvm_range_start(range),
-		      end = drm_gpusvm_range_end(range);
-	struct migrate_vma migrate = {
-		.start		= start,
-		.end		= end,
-		.pgmap_owner	= gpusvm->device_private_page_owner,
-		.flags		= MIGRATE_VMA_SELECT_SYSTEM,
-	};
-	struct mm_struct *mm = gpusvm->mm;
-	unsigned long i, npages = npages_in_range(start, end);
-	struct vm_area_struct *vas;
-	struct drm_gpusvm_zdd *zdd = NULL;
-	struct page **pages;
-	dma_addr_t *dma_addr;
-	void *buf;
-	int err;
-
-	mmap_assert_locked(gpusvm->mm);
-
-	if (!range->flags.migrate_devmem)
-		return -EINVAL;
-
-	if (!ops->populate_devmem_pfn || !ops->copy_to_devmem ||
-	    !ops->copy_to_ram)
-		return -EOPNOTSUPP;
-
-	vas = vma_lookup(mm, start);
-	if (!vas) {
-		err = -ENOENT;
-		goto err_out;
-	}
-
-	if (end > vas->vm_end || start < vas->vm_start) {
-		err = -EINVAL;
-		goto err_out;
-	}
-
-	if (!vma_is_anonymous(vas)) {
-		err = -EBUSY;
-		goto err_out;
-	}
-
-	buf = kvcalloc(npages, 2 * sizeof(*migrate.src) + sizeof(*dma_addr) +
-		       sizeof(*pages), GFP_KERNEL);
-	if (!buf) {
-		err = -ENOMEM;
-		goto err_out;
-	}
-	dma_addr = buf + (2 * sizeof(*migrate.src) * npages);
-	pages = buf + (2 * sizeof(*migrate.src) + sizeof(*dma_addr)) * npages;
-
-	zdd = drm_gpusvm_zdd_alloc(gpusvm->device_private_page_owner);
-	if (!zdd) {
-		err = -ENOMEM;
-		goto err_free;
-	}
-
-	migrate.vma = vas;
-	migrate.src = buf;
-	migrate.dst = migrate.src + npages;
-
-	err = migrate_vma_setup(&migrate);
-	if (err)
-		goto err_free;
-
-	if (!migrate.cpages) {
-		err = -EFAULT;
-		goto err_free;
-	}
-
-	if (migrate.cpages != npages) {
-		err = -EBUSY;
-		goto err_finalize;
-	}
-
-	err = ops->populate_devmem_pfn(devmem_allocation, npages, migrate.dst);
-	if (err)
-		goto err_finalize;
-
-	err = drm_gpusvm_migrate_map_pages(devmem_allocation->dev, dma_addr,
-					   migrate.src, npages, DMA_TO_DEVICE);
-	if (err)
-		goto err_finalize;
-
-	for (i = 0; i < npages; ++i) {
-		struct page *page = pfn_to_page(migrate.dst[i]);
-
-		pages[i] = page;
-		migrate.dst[i] = migrate_pfn(migrate.dst[i]);
-		drm_gpusvm_get_devmem_page(page, zdd);
-	}
-
-	err = ops->copy_to_devmem(pages, dma_addr, npages);
-	if (err)
-		goto err_finalize;
-
-	/* Upon success bind devmem allocation to range and zdd */
-	zdd->devmem_allocation = devmem_allocation;	/* Owns ref */
-
-err_finalize:
-	if (err)
-		drm_gpusvm_migration_unlock_put_pages(npages, migrate.dst);
-	migrate_vma_pages(&migrate);
-	migrate_vma_finalize(&migrate);
-	drm_gpusvm_migrate_unmap_pages(devmem_allocation->dev, dma_addr, npages,
-				       DMA_TO_DEVICE);
-err_free:
-	if (zdd)
-		drm_gpusvm_zdd_put(zdd);
-	kvfree(buf);
-err_out:
-	return err;
-}
-EXPORT_SYMBOL_GPL(drm_gpusvm_migrate_to_devmem);
-
-/**
- * drm_gpusvm_migrate_populate_ram_pfn() - Populate RAM PFNs for a VM area
- * @vas: Pointer to the VM area structure, can be NULL
- * @fault_page: Fault page
- * @npages: Number of pages to populate
- * @mpages: Number of pages to migrate
- * @src_mpfn: Source array of migrate PFNs
- * @mpfn: Array of migrate PFNs to populate
- * @addr: Start address for PFN allocation
- *
- * This function populates the RAM migrate page frame numbers (PFNs) for the
- * specified VM area structure. It allocates and locks pages in the VM area for
- * RAM usage. If vas is non-NULL use alloc_page_vma for allocation, if NULL use
- * alloc_page for allocation.
- *
- * Return: 0 on success, negative error code on failure.
- */
-static int drm_gpusvm_migrate_populate_ram_pfn(struct vm_area_struct *vas,
-					       struct page *fault_page,
-					       unsigned long npages,
-					       unsigned long *mpages,
-					       unsigned long *src_mpfn,
-					       unsigned long *mpfn,
-					       unsigned long addr)
-{
-	unsigned long i;
-
-	for (i = 0; i < npages; ++i, addr += PAGE_SIZE) {
-		struct page *page, *src_page;
-
-		if (!(src_mpfn[i] & MIGRATE_PFN_MIGRATE))
-			continue;
-
-		src_page = migrate_pfn_to_page(src_mpfn[i]);
-		if (!src_page)
-			continue;
-
-		if (fault_page) {
-			if (src_page->zone_device_data !=
-			    fault_page->zone_device_data)
-				continue;
-		}
-
-		if (vas)
-			page = alloc_page_vma(GFP_HIGHUSER, vas, addr);
-		else
-			page = alloc_page(GFP_HIGHUSER);
-
-		if (!page)
-			goto free_pages;
-
-		mpfn[i] = migrate_pfn(page_to_pfn(page));
-	}
-
-	for (i = 0; i < npages; ++i) {
-		struct page *page = migrate_pfn_to_page(mpfn[i]);
-
-		if (!page)
-			continue;
-
-		WARN_ON_ONCE(!trylock_page(page));
-		++*mpages;
-	}
-
-	return 0;
-
-free_pages:
-	for (i = 0; i < npages; ++i) {
-		struct page *page = migrate_pfn_to_page(mpfn[i]);
-
-		if (!page)
-			continue;
-
-		put_page(page);
-		mpfn[i] = 0;
-	}
-	return -ENOMEM;
-}
-
-/**
- * drm_gpusvm_evict_to_ram() - Evict GPU SVM range to RAM
- * @devmem_allocation: Pointer to the device memory allocation
- *
- * Similar to __drm_gpusvm_migrate_to_ram but does not require mmap lock and
- * migration done via migrate_device_* functions.
- *
- * Return: 0 on success, negative error code on failure.
- */
-int drm_gpusvm_evict_to_ram(struct drm_gpusvm_devmem *devmem_allocation)
-{
-	const struct drm_gpusvm_devmem_ops *ops = devmem_allocation->ops;
-	unsigned long npages, mpages = 0;
-	struct page **pages;
-	unsigned long *src, *dst;
-	dma_addr_t *dma_addr;
-	void *buf;
-	int i, err = 0;
-	unsigned int retry_count = 2;
-
-	npages = devmem_allocation->size >> PAGE_SHIFT;
-
-retry:
-	if (!mmget_not_zero(devmem_allocation->mm))
-		return -EFAULT;
-
-	buf = kvcalloc(npages, 2 * sizeof(*src) + sizeof(*dma_addr) +
-		       sizeof(*pages), GFP_KERNEL);
-	if (!buf) {
-		err = -ENOMEM;
-		goto err_out;
-	}
-	src = buf;
-	dst = buf + (sizeof(*src) * npages);
-	dma_addr = buf + (2 * sizeof(*src) * npages);
-	pages = buf + (2 * sizeof(*src) + sizeof(*dma_addr)) * npages;
-
-	err = ops->populate_devmem_pfn(devmem_allocation, npages, src);
-	if (err)
-		goto err_free;
-
-	err = migrate_device_pfns(src, npages);
-	if (err)
-		goto err_free;
-
-	err = drm_gpusvm_migrate_populate_ram_pfn(NULL, NULL, npages, &mpages,
-						  src, dst, 0);
-	if (err || !mpages)
-		goto err_finalize;
-
-	err = drm_gpusvm_migrate_map_pages(devmem_allocation->dev, dma_addr,
-					   dst, npages, DMA_FROM_DEVICE);
-	if (err)
-		goto err_finalize;
-
-	for (i = 0; i < npages; ++i)
-		pages[i] = migrate_pfn_to_page(src[i]);
-
-	err = ops->copy_to_ram(pages, dma_addr, npages);
-	if (err)
-		goto err_finalize;
-
-err_finalize:
-	if (err)
-		drm_gpusvm_migration_unlock_put_pages(npages, dst);
-	migrate_device_pages(src, dst, npages);
-	migrate_device_finalize(src, dst, npages);
-	drm_gpusvm_migrate_unmap_pages(devmem_allocation->dev, dma_addr, npages,
-				       DMA_FROM_DEVICE);
-err_free:
-	kvfree(buf);
-err_out:
-	mmput_async(devmem_allocation->mm);
-
-	if (completion_done(&devmem_allocation->detached))
-		return 0;
-
-	if (retry_count--) {
-		cond_resched();
-		goto retry;
-	}
-
-	return err ?: -EBUSY;
-}
-EXPORT_SYMBOL_GPL(drm_gpusvm_evict_to_ram);
-
-/**
- * __drm_gpusvm_migrate_to_ram() - Migrate GPU SVM range to RAM (internal)
- * @vas: Pointer to the VM area structure
- * @device_private_page_owner: Device private pages owner
- * @page: Pointer to the page for fault handling (can be NULL)
- * @fault_addr: Fault address
- * @size: Size of migration
- *
- * This internal function performs the migration of the specified GPU SVM range
- * to RAM. It sets up the migration, populates + dma maps RAM PFNs, and
- * invokes the driver-specific operations for migration to RAM.
- *
- * Return: 0 on success, negative error code on failure.
- */
-static int __drm_gpusvm_migrate_to_ram(struct vm_area_struct *vas,
-				       void *device_private_page_owner,
-				       struct page *page,
-				       unsigned long fault_addr,
-				       unsigned long size)
-{
-	struct migrate_vma migrate = {
-		.vma		= vas,
-		.pgmap_owner	= device_private_page_owner,
-		.flags		= MIGRATE_VMA_SELECT_DEVICE_PRIVATE |
-			MIGRATE_VMA_SELECT_DEVICE_COHERENT,
-		.fault_page	= page,
-	};
-	struct drm_gpusvm_zdd *zdd;
-	const struct drm_gpusvm_devmem_ops *ops;
-	struct device *dev = NULL;
-	unsigned long npages, mpages = 0;
-	struct page **pages;
-	dma_addr_t *dma_addr;
-	unsigned long start, end;
-	void *buf;
-	int i, err = 0;
-
-	start = ALIGN_DOWN(fault_addr, size);
-	end = ALIGN(fault_addr + 1, size);
-
-	/* Corner where VMA area struct has been partially unmapped */
-	if (start < vas->vm_start)
-		start = vas->vm_start;
-	if (end > vas->vm_end)
-		end = vas->vm_end;
-
-	migrate.start = start;
-	migrate.end = end;
-	npages = npages_in_range(start, end);
-
-	buf = kvcalloc(npages, 2 * sizeof(*migrate.src) + sizeof(*dma_addr) +
-		       sizeof(*pages), GFP_KERNEL);
-	if (!buf) {
-		err = -ENOMEM;
-		goto err_out;
-	}
-	dma_addr = buf + (2 * sizeof(*migrate.src) * npages);
-	pages = buf + (2 * sizeof(*migrate.src) + sizeof(*dma_addr)) * npages;
-
-	migrate.vma = vas;
-	migrate.src = buf;
-	migrate.dst = migrate.src + npages;
-
-	err = migrate_vma_setup(&migrate);
-	if (err)
-		goto err_free;
-
-	/* Raced with another CPU fault, nothing to do */
-	if (!migrate.cpages)
-		goto err_free;
-
-	if (!page) {
-		for (i = 0; i < npages; ++i) {
-			if (!(migrate.src[i] & MIGRATE_PFN_MIGRATE))
-				continue;
-
-			page = migrate_pfn_to_page(migrate.src[i]);
-			break;
-		}
-
-		if (!page)
-			goto err_finalize;
-	}
-	zdd = page->zone_device_data;
-	ops = zdd->devmem_allocation->ops;
-	dev = zdd->devmem_allocation->dev;
-
-	err = drm_gpusvm_migrate_populate_ram_pfn(vas, page, npages, &mpages,
-						  migrate.src, migrate.dst,
-						  start);
-	if (err)
-		goto err_finalize;
-
-	err = drm_gpusvm_migrate_map_pages(dev, dma_addr, migrate.dst, npages,
-					   DMA_FROM_DEVICE);
-	if (err)
-		goto err_finalize;
-
-	for (i = 0; i < npages; ++i)
-		pages[i] = migrate_pfn_to_page(migrate.src[i]);
-
-	err = ops->copy_to_ram(pages, dma_addr, npages);
-	if (err)
-		goto err_finalize;
-
-err_finalize:
-	if (err)
-		drm_gpusvm_migration_unlock_put_pages(npages, migrate.dst);
-	migrate_vma_pages(&migrate);
-	migrate_vma_finalize(&migrate);
-	if (dev)
-		drm_gpusvm_migrate_unmap_pages(dev, dma_addr, npages,
-					       DMA_FROM_DEVICE);
-err_free:
-	kvfree(buf);
-err_out:
-
-	return err;
-}
-
-/**
- * drm_gpusvm_range_evict - Evict GPU SVM range
  * @range: Pointer to the GPU SVM range to be removed
  *
- * This function evicts the specified GPU SVM range. This function will not
- * evict coherent pages.
+ * This function evicts the specified GPU SVM range.
  *
  * Return: 0 on success, a negative error code on failure.
  */
@@ -2125,60 +1476,6 @@ int drm_gpusvm_range_evict(struct drm_gpusvm *gpusvm,
 }
 EXPORT_SYMBOL_GPL(drm_gpusvm_range_evict);
 
-/**
- * drm_gpusvm_page_free() - Put GPU SVM zone device data associated with a page
- * @page: Pointer to the page
- *
- * This function is a callback used to put the GPU SVM zone device data
- * associated with a page when it is being released.
- */
-static void drm_gpusvm_page_free(struct page *page)
-{
-	drm_gpusvm_zdd_put(page->zone_device_data);
-}
-
-/**
- * drm_gpusvm_migrate_to_ram() - Migrate GPU SVM range to RAM (page fault handler)
- * @vmf: Pointer to the fault information structure
- *
- * This function is a page fault handler used to migrate a GPU SVM range to RAM.
- * It retrieves the GPU SVM range information from the faulting page and invokes
- * the internal migration function to migrate the range back to RAM.
- *
- * Return: VM_FAULT_SIGBUS on failure, 0 on success.
- */
-static vm_fault_t drm_gpusvm_migrate_to_ram(struct vm_fault *vmf)
-{
-	struct drm_gpusvm_zdd *zdd = vmf->page->zone_device_data;
-	int err;
-
-	err = __drm_gpusvm_migrate_to_ram(vmf->vma,
-					  zdd->device_private_page_owner,
-					  vmf->page, vmf->address,
-					  zdd->devmem_allocation->size);
-
-	return err ? VM_FAULT_SIGBUS : 0;
-}
-
-/*
- * drm_gpusvm_pagemap_ops - Device page map operations for GPU SVM
- */
-static const struct dev_pagemap_ops drm_gpusvm_pagemap_ops = {
-	.page_free = drm_gpusvm_page_free,
-	.migrate_to_ram = drm_gpusvm_migrate_to_ram,
-};
-
-/**
- * drm_gpusvm_pagemap_ops_get() - Retrieve GPU SVM device page map operations
- *
- * Return: Pointer to the GPU SVM device page map operations structure.
- */
-const struct dev_pagemap_ops *drm_gpusvm_pagemap_ops_get(void)
-{
-	return &drm_gpusvm_pagemap_ops;
-}
-EXPORT_SYMBOL_GPL(drm_gpusvm_pagemap_ops_get);
-
 /**
  * drm_gpusvm_has_mapping() - Check if GPU SVM has mapping for the given address range
  * @gpusvm: Pointer to the GPU SVM structure.
@@ -2223,28 +1520,5 @@ void drm_gpusvm_range_set_unmapped(struct drm_gpusvm_range *range,
 }
 EXPORT_SYMBOL_GPL(drm_gpusvm_range_set_unmapped);
 
-/**
- * drm_gpusvm_devmem_init() - Initialize a GPU SVM device memory allocation
- *
- * @dev: Pointer to the device structure which device memory allocation belongs to
- * @mm: Pointer to the mm_struct for the address space
- * @ops: Pointer to the operations structure for GPU SVM device memory
- * @dpagemap: The struct drm_pagemap we're allocating from.
- * @size: Size of device memory allocation
- */
-void drm_gpusvm_devmem_init(struct drm_gpusvm_devmem *devmem_allocation,
-			    struct device *dev, struct mm_struct *mm,
-			    const struct drm_gpusvm_devmem_ops *ops,
-			    struct drm_pagemap *dpagemap, size_t size)
-{
-	init_completion(&devmem_allocation->detached);
-	devmem_allocation->dev = dev;
-	devmem_allocation->mm = mm;
-	devmem_allocation->ops = ops;
-	devmem_allocation->dpagemap = dpagemap;
-	devmem_allocation->size = size;
-}
-EXPORT_SYMBOL_GPL(drm_gpusvm_devmem_init);
-
 MODULE_DESCRIPTION("DRM GPUSVM");
 MODULE_LICENSE("GPL");
diff --git a/drivers/gpu/drm/drm_pagemap.c b/drivers/gpu/drm/drm_pagemap.c
new file mode 100644
index 000000000000..c46bb4384444
--- /dev/null
+++ b/drivers/gpu/drm/drm_pagemap.c
@@ -0,0 +1,784 @@
+// SPDX-License-Identifier: GPL-2.0-only OR MIT
+/*
+ * Copyright © 2024-2025 Intel Corporation
+ */
+
+#include <linux/dma-mapping.h>
+#include <linux/migrate.h>
+#include <linux/pagemap.h>
+#include <drm/drm_pagemap.h>
+
+/**
+ * DOC: Overview
+ *
+ * The DRM pagemap layer is intended to augment the dev_pagemap functionality by
+ * providing a way to populate a struct mm_struct virtual range with device
+ * private pages and to provide helpers to abstract device memory allocations,
+ * to migrate memory back and forth between device memory and system RAM and
+ * to handle access (and in the future migration) between devices implementing
+ * a fast interconnect that is not necessarily visible to the rest of the
+ * system.
+ *
+ * Typically the DRM pagemap receives requests from one or more DRM GPU SVM
+ * instances to populate struct mm_struct virtual ranges with memory, and the
+ * migration is best effort only and may thus fail. The implementation should
+ * also handle device unbinding by blocking (return an -ENODEV) error for new
+ * population requests and after that migrate all device pages to system ram.
+ */
+
+/**
+ * DOC: Migration
+ * Migration granularity typically follows the GPU SVM range requests, but
+ * if there are clashes, due to races or due to the fact that multiple GPU
+ * SVM instances have different views of the ranges used, and because of that
+ * parts of a requested range are already present in the requested device memory,
+ * the implementation has a variety of options. It can fail and it can choose
+ * to populate only the part of the range that isn't already in device memory,
+ * and it can evict the range to system before trying to migrate. Ideally an
+ * implementation would just try to migrate the missing part of the range and
+ * allocate just enough memory to do so.
+ *
+ * When migrating to system memory as a response to a cpu fault or a device
+ * memory eviction request, currently a full device memory allocation is
+ * migrated back to system. Moving forward this might need improvement for
+ * situations where a single page needs bouncing between system memory and
+ * device memory due to, for example, atomic operations.
+ *
+ * Key DRM pagemap components:
+ *
+ * - Device Memory Allocations:
+ *      Embedded structure containing enough information for the drm_pagemap to
+ *      migrate to / from device memory.
+ *
+ * - Device Memory Operations:
+ *      Define the interface for driver-specific device memory operations
+ *      release memory, populate pfns, and copy to / from device memory.
+ */
+
+/**
+ * struct drm_pagemap_zdd - GPU SVM zone device data
+ *
+ * @refcount: Reference count for the zdd
+ * @devmem_allocation: device memory allocation
+ * @device_private_page_owner: Device private pages owner
+ *
+ * This structure serves as a generic wrapper installed in
+ * page->zone_device_data. It provides infrastructure for looking up a device
+ * memory allocation upon CPU page fault and asynchronously releasing device
+ * memory once the CPU has no page references. Asynchronous release is useful
+ * because CPU page references can be dropped in IRQ contexts, while releasing
+ * device memory likely requires sleeping locks.
+ */
+struct drm_pagemap_zdd {
+	struct kref refcount;
+	struct drm_pagemap_devmem *devmem_allocation;
+	void *device_private_page_owner;
+};
+
+/**
+ * drm_pagemap_zdd_alloc() - Allocate a zdd structure.
+ * @device_private_page_owner: Device private pages owner
+ *
+ * This function allocates and initializes a new zdd structure. It sets up the
+ * reference count and initializes the destroy work.
+ *
+ * Return: Pointer to the allocated zdd on success, ERR_PTR() on failure.
+ */
+static struct drm_pagemap_zdd *
+drm_pagemap_zdd_alloc(void *device_private_page_owner)
+{
+	struct drm_pagemap_zdd *zdd;
+
+	zdd = kmalloc(sizeof(*zdd), GFP_KERNEL);
+	if (!zdd)
+		return NULL;
+
+	kref_init(&zdd->refcount);
+	zdd->devmem_allocation = NULL;
+	zdd->device_private_page_owner = device_private_page_owner;
+
+	return zdd;
+}
+
+/**
+ * drm_pagemap_zdd_get() - Get a reference to a zdd structure.
+ * @zdd: Pointer to the zdd structure.
+ *
+ * This function increments the reference count of the provided zdd structure.
+ *
+ * Return: Pointer to the zdd structure.
+ */
+static struct drm_pagemap_zdd *drm_pagemap_zdd_get(struct drm_pagemap_zdd *zdd)
+{
+	kref_get(&zdd->refcount);
+	return zdd;
+}
+
+/**
+ * drm_pagemap_zdd_destroy() - Destroy a zdd structure.
+ * @ref: Pointer to the reference count structure.
+ *
+ * This function queues the destroy_work of the zdd for asynchronous destruction.
+ */
+static void drm_pagemap_zdd_destroy(struct kref *ref)
+{
+	struct drm_pagemap_zdd *zdd =
+		container_of(ref, struct drm_pagemap_zdd, refcount);
+	struct drm_pagemap_devmem *devmem = zdd->devmem_allocation;
+
+	if (devmem) {
+		complete_all(&devmem->detached);
+		if (devmem->ops->devmem_release)
+			devmem->ops->devmem_release(devmem);
+	}
+	kfree(zdd);
+}
+
+/**
+ * drm_pagemap_zdd_put() - Put a zdd reference.
+ * @zdd: Pointer to the zdd structure.
+ *
+ * This function decrements the reference count of the provided zdd structure
+ * and schedules its destruction if the count drops to zero.
+ */
+static void drm_pagemap_zdd_put(struct drm_pagemap_zdd *zdd)
+{
+	kref_put(&zdd->refcount, drm_pagemap_zdd_destroy);
+}
+
+/**
+ * drm_pagemap_migration_unlock_put_page() - Put a migration page
+ * @page: Pointer to the page to put
+ *
+ * This function unlocks and puts a page.
+ */
+static void drm_pagemap_migration_unlock_put_page(struct page *page)
+{
+	unlock_page(page);
+	put_page(page);
+}
+
+/**
+ * drm_pagemap_migration_unlock_put_pages() - Put migration pages
+ * @npages: Number of pages
+ * @migrate_pfn: Array of migrate page frame numbers
+ *
+ * This function unlocks and puts an array of pages.
+ */
+static void drm_pagemap_migration_unlock_put_pages(unsigned long npages,
+						   unsigned long *migrate_pfn)
+{
+	unsigned long i;
+
+	for (i = 0; i < npages; ++i) {
+		struct page *page;
+
+		if (!migrate_pfn[i])
+			continue;
+
+		page = migrate_pfn_to_page(migrate_pfn[i]);
+		drm_pagemap_migration_unlock_put_page(page);
+		migrate_pfn[i] = 0;
+	}
+}
+
+/**
+ * drm_pagemap_get_devmem_page() - Get a reference to a device memory page
+ * @page: Pointer to the page
+ * @zdd: Pointer to the GPU SVM zone device data
+ *
+ * This function associates the given page with the specified GPU SVM zone
+ * device data and initializes it for zone device usage.
+ */
+static void drm_pagemap_get_devmem_page(struct page *page,
+					struct drm_pagemap_zdd *zdd)
+{
+	page->zone_device_data = drm_pagemap_zdd_get(zdd);
+	zone_device_page_init(page);
+}
+
+/**
+ * drm_pagemap_migrate_map_pages() - Map migration pages for GPU SVM migration
+ * @dev: The device for which the pages are being mapped
+ * @dma_addr: Array to store DMA addresses corresponding to mapped pages
+ * @migrate_pfn: Array of migrate page frame numbers to map
+ * @npages: Number of pages to map
+ * @dir: Direction of data transfer (e.g., DMA_BIDIRECTIONAL)
+ *
+ * This function maps pages of memory for migration usage in GPU SVM. It
+ * iterates over each page frame number provided in @migrate_pfn, maps the
+ * corresponding page, and stores the DMA address in the provided @dma_addr
+ * array.
+ *
+ * Returns: 0 on success, -EFAULT if an error occurs during mapping.
+ */
+static int drm_pagemap_migrate_map_pages(struct device *dev,
+					 dma_addr_t *dma_addr,
+					 unsigned long *migrate_pfn,
+					 unsigned long npages,
+					 enum dma_data_direction dir)
+{
+	unsigned long i;
+
+	for (i = 0; i < npages; ++i) {
+		struct page *page = migrate_pfn_to_page(migrate_pfn[i]);
+
+		if (!page)
+			continue;
+
+		if (WARN_ON_ONCE(is_zone_device_page(page)))
+			return -EFAULT;
+
+		dma_addr[i] = dma_map_page(dev, page, 0, PAGE_SIZE, dir);
+		if (dma_mapping_error(dev, dma_addr[i]))
+			return -EFAULT;
+	}
+
+	return 0;
+}
+
+/**
+ * drm_pagemap_migrate_unmap_pages() - Unmap pages previously mapped for GPU SVM migration
+ * @dev: The device for which the pages were mapped
+ * @dma_addr: Array of DMA addresses corresponding to mapped pages
+ * @npages: Number of pages to unmap
+ * @dir: Direction of data transfer (e.g., DMA_BIDIRECTIONAL)
+ *
+ * This function unmaps previously mapped pages of memory for GPU Shared Virtual
+ * Memory (SVM). It iterates over each DMA address provided in @dma_addr, checks
+ * if it's valid and not already unmapped, and unmaps the corresponding page.
+ */
+static void drm_pagemap_migrate_unmap_pages(struct device *dev,
+					    dma_addr_t *dma_addr,
+					    unsigned long npages,
+					    enum dma_data_direction dir)
+{
+	unsigned long i;
+
+	for (i = 0; i < npages; ++i) {
+		if (!dma_addr[i] || dma_mapping_error(dev, dma_addr[i]))
+			continue;
+
+		dma_unmap_page(dev, dma_addr[i], PAGE_SIZE, dir);
+	}
+}
+
+static unsigned long
+npages_in_range(unsigned long start, unsigned long end)
+{
+	return (end - start) >> PAGE_SHIFT;
+}
+
+/**
+ * drm_pagemap_migrate_to_devmem() - Migrate a struct mm_struct range to device memory
+ * @devmem_allocation: The device memory allocation to migrate to.
+ * The caller should hold a reference to the device memory allocation,
+ * and the reference is consumed by this function unless it returns with
+ * an error.
+ * @mm: Pointer to the struct mm_struct.
+ * @start: Start of the virtual address range to migrate.
+ * @end: End of the virtual address range to migrate.
+ * @pgmap_owner: Not used currently, since only system memory is considered.
+ *
+ * This function migrates the specified virtual address range to device memory.
+ * It performs the necessary setup and invokes the driver-specific operations for
+ * migration to device memory.
+ *
+ * Return:
+ * 0 on success, negative error code on failure.
+ */
+int drm_pagemap_migrate_to_devmem(struct drm_pagemap_devmem *devmem_allocation,
+				  struct mm_struct *mm,
+				  unsigned long start, unsigned long end,
+				  void *pgmap_owner)
+{
+	const struct drm_pagemap_devmem_ops *ops = devmem_allocation->ops;
+	struct migrate_vma migrate = {
+		.start		= start,
+		.end		= end,
+		.pgmap_owner	= pgmap_owner,
+		.flags		= MIGRATE_VMA_SELECT_SYSTEM,
+	};
+	unsigned long i, npages = npages_in_range(start, end);
+	struct vm_area_struct *vas;
+	struct drm_pagemap_zdd *zdd = NULL;
+	struct page **pages;
+	dma_addr_t *dma_addr;
+	void *buf;
+	int err;
+
+	mmap_assert_locked(mm);
+
+	if (!ops->populate_devmem_pfn || !ops->copy_to_devmem ||
+	    !ops->copy_to_ram)
+		return -EOPNOTSUPP;
+
+	vas = vma_lookup(mm, start);
+	if (!vas) {
+		err = -ENOENT;
+		goto err_out;
+	}
+
+	if (end > vas->vm_end || start < vas->vm_start) {
+		err = -EINVAL;
+		goto err_out;
+	}
+
+	if (!vma_is_anonymous(vas)) {
+		err = -EBUSY;
+		goto err_out;
+	}
+
+	buf = kvcalloc(npages, 2 * sizeof(*migrate.src) + sizeof(*dma_addr) +
+		       sizeof(*pages), GFP_KERNEL);
+	if (!buf) {
+		err = -ENOMEM;
+		goto err_out;
+	}
+	dma_addr = buf + (2 * sizeof(*migrate.src) * npages);
+	pages = buf + (2 * sizeof(*migrate.src) + sizeof(*dma_addr)) * npages;
+
+	zdd = drm_pagemap_zdd_alloc(pgmap_owner);
+	if (!zdd) {
+		err = -ENOMEM;
+		goto err_free;
+	}
+
+	migrate.vma = vas;
+	migrate.src = buf;
+	migrate.dst = migrate.src + npages;
+
+	err = migrate_vma_setup(&migrate);
+	if (err)
+		goto err_free;
+
+	if (!migrate.cpages) {
+		err = -EFAULT;
+		goto err_free;
+	}
+
+	if (migrate.cpages != npages) {
+		err = -EBUSY;
+		goto err_finalize;
+	}
+
+	err = ops->populate_devmem_pfn(devmem_allocation, npages, migrate.dst);
+	if (err)
+		goto err_finalize;
+
+	err = drm_pagemap_migrate_map_pages(devmem_allocation->dev, dma_addr,
+					    migrate.src, npages, DMA_TO_DEVICE);
+	if (err)
+		goto err_finalize;
+
+	for (i = 0; i < npages; ++i) {
+		struct page *page = pfn_to_page(migrate.dst[i]);
+
+		pages[i] = page;
+		migrate.dst[i] = migrate_pfn(migrate.dst[i]);
+		drm_pagemap_get_devmem_page(page, zdd);
+	}
+
+	err = ops->copy_to_devmem(pages, dma_addr, npages);
+	if (err)
+		goto err_finalize;
+
+	/* Upon success bind devmem allocation to range and zdd */
+	zdd->devmem_allocation = devmem_allocation;	/* Owns ref */
+
+err_finalize:
+	if (err)
+		drm_pagemap_migration_unlock_put_pages(npages, migrate.dst);
+	migrate_vma_pages(&migrate);
+	migrate_vma_finalize(&migrate);
+	drm_pagemap_migrate_unmap_pages(devmem_allocation->dev, dma_addr, npages,
+					DMA_TO_DEVICE);
+err_free:
+	if (zdd)
+		drm_pagemap_zdd_put(zdd);
+	kvfree(buf);
+err_out:
+	return err;
+}
+EXPORT_SYMBOL_GPL(drm_pagemap_migrate_to_devmem);
+
+/**
+ * drm_pagemap_migrate_populate_ram_pfn() - Populate RAM PFNs for a VM area
+ * @vas: Pointer to the VM area structure, can be NULL
+ * @fault_page: Fault page
+ * @npages: Number of pages to populate
+ * @mpages: Number of pages to migrate
+ * @src_mpfn: Source array of migrate PFNs
+ * @mpfn: Array of migrate PFNs to populate
+ * @addr: Start address for PFN allocation
+ *
+ * This function populates the RAM migrate page frame numbers (PFNs) for the
+ * specified VM area structure. It allocates and locks pages in the VM area for
+ * RAM usage. If @vas is non-NULL, alloc_page_vma() is used for the
+ * allocation; otherwise alloc_page() is used.
+ *
+ * Returns:
+ * 0 on success, negative error code on failure.
+ */
+static int drm_pagemap_migrate_populate_ram_pfn(struct vm_area_struct *vas,
+						struct page *fault_page,
+						unsigned long npages,
+						unsigned long *mpages,
+						unsigned long *src_mpfn,
+						unsigned long *mpfn,
+						unsigned long addr)
+{
+	unsigned long i;
+
+	for (i = 0; i < npages; ++i, addr += PAGE_SIZE) {
+		struct page *page, *src_page;
+
+		if (!(src_mpfn[i] & MIGRATE_PFN_MIGRATE))
+			continue;
+
+		src_page = migrate_pfn_to_page(src_mpfn[i]);
+		if (!src_page)
+			continue;
+
+		if (fault_page) {
+			if (src_page->zone_device_data !=
+			    fault_page->zone_device_data)
+				continue;
+		}
+
+		if (vas)
+			page = alloc_page_vma(GFP_HIGHUSER, vas, addr);
+		else
+			page = alloc_page(GFP_HIGHUSER);
+
+		if (!page)
+			goto free_pages;
+
+		mpfn[i] = migrate_pfn(page_to_pfn(page));
+	}
+
+	for (i = 0; i < npages; ++i) {
+		struct page *page = migrate_pfn_to_page(mpfn[i]);
+
+		if (!page)
+			continue;
+
+		WARN_ON_ONCE(!trylock_page(page));
+		++*mpages;
+	}
+
+	return 0;
+
+free_pages:
+	for (i = 0; i < npages; ++i) {
+		struct page *page = migrate_pfn_to_page(mpfn[i]);
+
+		if (!page)
+			continue;
+
+		put_page(page);
+		mpfn[i] = 0;
+	}
+	return -ENOMEM;
+}
+
+/**
+ * drm_pagemap_evict_to_ram() - Evict GPU SVM range to RAM
+ * @devmem_allocation: Pointer to the device memory allocation
+ *
+ * Similar to __drm_pagemap_migrate_to_ram() but does not require the mmap
+ * lock; migration is done via the migrate_device_* functions.
+ *
+ * Returns:
+ * 0 on success, negative error code on failure.
+ */
+int drm_pagemap_evict_to_ram(struct drm_pagemap_devmem *devmem_allocation)
+{
+	const struct drm_pagemap_devmem_ops *ops = devmem_allocation->ops;
+	unsigned long npages, mpages = 0;
+	struct page **pages;
+	unsigned long *src, *dst;
+	dma_addr_t *dma_addr;
+	void *buf;
+	int i, err = 0;
+	unsigned int retry_count = 2;
+
+	npages = devmem_allocation->size >> PAGE_SHIFT;
+
+retry:
+	if (!mmget_not_zero(devmem_allocation->mm))
+		return -EFAULT;
+
+	buf = kvcalloc(npages, 2 * sizeof(*src) + sizeof(*dma_addr) +
+		       sizeof(*pages), GFP_KERNEL);
+	if (!buf) {
+		err = -ENOMEM;
+		goto err_out;
+	}
+	src = buf;
+	dst = buf + (sizeof(*src) * npages);
+	dma_addr = buf + (2 * sizeof(*src) * npages);
+	pages = buf + (2 * sizeof(*src) + sizeof(*dma_addr)) * npages;
+
+	err = ops->populate_devmem_pfn(devmem_allocation, npages, src);
+	if (err)
+		goto err_free;
+
+	err = migrate_device_pfns(src, npages);
+	if (err)
+		goto err_free;
+
+	err = drm_pagemap_migrate_populate_ram_pfn(NULL, NULL, npages, &mpages,
+						   src, dst, 0);
+	if (err || !mpages)
+		goto err_finalize;
+
+	err = drm_pagemap_migrate_map_pages(devmem_allocation->dev, dma_addr,
+					    dst, npages, DMA_FROM_DEVICE);
+	if (err)
+		goto err_finalize;
+
+	for (i = 0; i < npages; ++i)
+		pages[i] = migrate_pfn_to_page(src[i]);
+
+	err = ops->copy_to_ram(pages, dma_addr, npages);
+	if (err)
+		goto err_finalize;
+
+err_finalize:
+	if (err)
+		drm_pagemap_migration_unlock_put_pages(npages, dst);
+	migrate_device_pages(src, dst, npages);
+	migrate_device_finalize(src, dst, npages);
+	drm_pagemap_migrate_unmap_pages(devmem_allocation->dev, dma_addr, npages,
+					DMA_FROM_DEVICE);
+err_free:
+	kvfree(buf);
+err_out:
+	mmput_async(devmem_allocation->mm);
+
+	if (completion_done(&devmem_allocation->detached))
+		return 0;
+
+	if (!err || retry_count--) {
+		cond_resched();
+		goto retry;
+	}
+
+	return err;
+}
+EXPORT_SYMBOL_GPL(drm_pagemap_evict_to_ram);
+
+/**
+ * __drm_pagemap_migrate_to_ram() - Migrate a virtual range to RAM (internal)
+ * @vas: Pointer to the VM area structure
+ * @device_private_page_owner: Device private pages owner
+ * @page: Pointer to the page for fault handling (can be NULL)
+ * @fault_addr: Fault address
+ * @size: Size of migration
+ *
+ * This internal function performs the migration of the specified GPU SVM range
+ * to RAM. It sets up the migration, populates and DMA-maps the RAM PFNs, and
+ * invokes the driver-specific operations for migration to RAM.
+ *
+ * Returns:
+ * 0 on success, negative error code on failure.
+ */
+static int __drm_pagemap_migrate_to_ram(struct vm_area_struct *vas,
+					void *device_private_page_owner,
+					struct page *page,
+					unsigned long fault_addr,
+					unsigned long size)
+{
+	struct migrate_vma migrate = {
+		.vma		= vas,
+		.pgmap_owner	= device_private_page_owner,
+		.flags		= MIGRATE_VMA_SELECT_DEVICE_PRIVATE |
+		MIGRATE_VMA_SELECT_DEVICE_COHERENT,
+		.fault_page	= page,
+	};
+	struct drm_pagemap_zdd *zdd;
+	const struct drm_pagemap_devmem_ops *ops;
+	struct device *dev;
+	unsigned long npages, mpages = 0;
+	struct page **pages;
+	dma_addr_t *dma_addr;
+	unsigned long start, end;
+	void *buf;
+	int i, err = 0;
+
+	start = ALIGN_DOWN(fault_addr, size);
+	end = ALIGN(fault_addr + 1, size);
+
+	/* Corner case where the VMA has been partially unmapped */
+	if (start < vas->vm_start)
+		start = vas->vm_start;
+	if (end > vas->vm_end)
+		end = vas->vm_end;
+
+	migrate.start = start;
+	migrate.end = end;
+	npages = npages_in_range(start, end);
+
+	buf = kvcalloc(npages, 2 * sizeof(*migrate.src) + sizeof(*dma_addr) +
+		       sizeof(*pages), GFP_KERNEL);
+	if (!buf) {
+		err = -ENOMEM;
+		goto err_out;
+	}
+	dma_addr = buf + (2 * sizeof(*migrate.src) * npages);
+	pages = buf + (2 * sizeof(*migrate.src) + sizeof(*dma_addr)) * npages;
+
+	migrate.vma = vas;
+	migrate.src = buf;
+	migrate.dst = migrate.src + npages;
+
+	err = migrate_vma_setup(&migrate);
+	if (err)
+		goto err_free;
+
+	/* Raced with another CPU fault, nothing to do */
+	if (!migrate.cpages)
+		goto err_free;
+
+	if (!page) {
+		for (i = 0; i < npages; ++i) {
+			if (!(migrate.src[i] & MIGRATE_PFN_MIGRATE))
+				continue;
+
+			page = migrate_pfn_to_page(migrate.src[i]);
+			break;
+		}
+
+		if (!page)
+			goto err_finalize;
+	}
+	zdd = page->zone_device_data;
+	ops = zdd->devmem_allocation->ops;
+	dev = zdd->devmem_allocation->dev;
+
+	err = drm_pagemap_migrate_populate_ram_pfn(vas, page, npages, &mpages,
+						   migrate.src, migrate.dst,
+						   start);
+	if (err)
+		goto err_finalize;
+
+	err = drm_pagemap_migrate_map_pages(dev, dma_addr, migrate.dst, npages,
+					    DMA_FROM_DEVICE);
+	if (err)
+		goto err_finalize;
+
+	for (i = 0; i < npages; ++i)
+		pages[i] = migrate_pfn_to_page(migrate.src[i]);
+
+	err = ops->copy_to_ram(pages, dma_addr, npages);
+	if (err)
+		goto err_finalize;
+
+err_finalize:
+	if (err)
+		drm_pagemap_migration_unlock_put_pages(npages, migrate.dst);
+	migrate_vma_pages(&migrate);
+	migrate_vma_finalize(&migrate);
+	drm_pagemap_migrate_unmap_pages(dev, dma_addr, npages,
+					DMA_FROM_DEVICE);
+err_free:
+	kvfree(buf);
+err_out:
+
+	return err;
+}
+
+/**
+ * drm_pagemap_page_free() - Put GPU SVM zone device data associated with a page
+ * @page: Pointer to the page
+ *
+ * This function is a callback used to put the GPU SVM zone device data
+ * associated with a page when it is being released.
+ */
+static void drm_pagemap_page_free(struct page *page)
+{
+	drm_pagemap_zdd_put(page->zone_device_data);
+}
+
+/**
+ * drm_pagemap_migrate_to_ram() - Migrate a virtual range to RAM (page fault handler)
+ * @vmf: Pointer to the fault information structure
+ *
+ * This function is a page fault handler used to migrate a virtual range
+ * to RAM. The device memory allocation in which the device page is found is
+ * migrated in its entirety.
+ *
+ * Returns:
+ * VM_FAULT_SIGBUS on failure, 0 on success.
+ */
+static vm_fault_t drm_pagemap_migrate_to_ram(struct vm_fault *vmf)
+{
+	struct drm_pagemap_zdd *zdd = vmf->page->zone_device_data;
+	int err;
+
+	err = __drm_pagemap_migrate_to_ram(vmf->vma,
+					   zdd->device_private_page_owner,
+					   vmf->page, vmf->address,
+					   zdd->devmem_allocation->size);
+
+	return err ? VM_FAULT_SIGBUS : 0;
+}
+
+static const struct dev_pagemap_ops drm_pagemap_pagemap_ops = {
+	.page_free = drm_pagemap_page_free,
+	.migrate_to_ram = drm_pagemap_migrate_to_ram,
+};
+
+/**
+ * drm_pagemap_pagemap_ops_get() - Retrieve GPU SVM device page map operations
+ *
+ * Returns:
+ * Pointer to the GPU SVM device page map operations structure.
+ */
+const struct dev_pagemap_ops *drm_pagemap_pagemap_ops_get(void)
+{
+	return &drm_pagemap_pagemap_ops;
+}
+EXPORT_SYMBOL_GPL(drm_pagemap_pagemap_ops_get);
+
+/**
+ * drm_pagemap_devmem_init() - Initialize a drm_pagemap device memory allocation
+ *
+ * @devmem_allocation: The struct drm_pagemap_devmem to initialize.
+ * @dev: Pointer to the device structure to which the device memory allocation belongs
+ * @mm: Pointer to the mm_struct for the address space
+ * @ops: Pointer to the operations structure for GPU SVM device memory
+ * @dpagemap: The struct drm_pagemap we're allocating from.
+ * @size: Size of device memory allocation
+ */
+void drm_pagemap_devmem_init(struct drm_pagemap_devmem *devmem_allocation,
+			     struct device *dev, struct mm_struct *mm,
+			     const struct drm_pagemap_devmem_ops *ops,
+			     struct drm_pagemap *dpagemap, size_t size)
+{
+	init_completion(&devmem_allocation->detached);
+	devmem_allocation->dev = dev;
+	devmem_allocation->mm = mm;
+	devmem_allocation->ops = ops;
+	devmem_allocation->dpagemap = dpagemap;
+	devmem_allocation->size = size;
+}
+EXPORT_SYMBOL_GPL(drm_pagemap_devmem_init);
+
+/**
+ * drm_pagemap_page_to_dpagemap() - Return a pointer to the drm_pagemap of a page
+ * @page: The struct page.
+ *
+ * Return: A pointer to the struct drm_pagemap of a device private page that
+ * was populated from the struct drm_pagemap. If the page was *not* populated
+ * from a struct drm_pagemap, the result is undefined and the function call
+ * may result in dereferencing an invalid address.
+ */
+struct drm_pagemap *drm_pagemap_page_to_dpagemap(struct page *page)
+{
+	struct drm_pagemap_zdd *zdd = page->zone_device_data;
+
+	return zdd->devmem_allocation->dpagemap;
+}
+EXPORT_SYMBOL_GPL(drm_pagemap_page_to_dpagemap);
diff --git a/drivers/gpu/drm/xe/Kconfig b/drivers/gpu/drm/xe/Kconfig
index aea4240664fa..ad3c584fb741 100644
--- a/drivers/gpu/drm/xe/Kconfig
+++ b/drivers/gpu/drm/xe/Kconfig
@@ -86,14 +86,16 @@ config DRM_XE_GPUSVM
 
 	  If in doubut say "Y".
 
-config DRM_XE_DEVMEM_MIRROR
-	bool "Enable device memory mirror"
+config DRM_XE_PAGEMAP
+	bool "Enable device memory pool for SVM"
 	depends on DRM_XE_GPUSVM
 	select GET_FREE_REGION
 	default y
 	help
-	  Disable this option only if you want to compile out without device
-	  memory mirror. Will reduce KMD memory footprint when disabled.
+	  Disable this option only if you don't want to expose local device
+	  memory for SVM. Will reduce KMD memory footprint when disabled.
+
+	  If in doubt say "Y".
 
 config DRM_XE_FORCE_PROBE
 	string "Force probe xe for selected Intel hardware IDs"
diff --git a/drivers/gpu/drm/xe/xe_bo_types.h b/drivers/gpu/drm/xe/xe_bo_types.h
index 15a92e3d4898..09bfe6806925 100644
--- a/drivers/gpu/drm/xe/xe_bo_types.h
+++ b/drivers/gpu/drm/xe/xe_bo_types.h
@@ -82,7 +82,7 @@ struct xe_bo {
 	u16 cpu_caching;
 
 	/** @devmem_allocation: SVM device memory allocation */
-	struct drm_gpusvm_devmem devmem_allocation;
+	struct drm_pagemap_devmem devmem_allocation;
 
 	/** @vram_userfault_link: Link into @mem_access.vram_userfault.list */
 		struct list_head vram_userfault_link;
diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
index 8aa90acc2a0a..d288a5880508 100644
--- a/drivers/gpu/drm/xe/xe_device_types.h
+++ b/drivers/gpu/drm/xe/xe_device_types.h
@@ -109,7 +109,7 @@ struct xe_vram_region {
 	void __iomem *mapping;
 	/** @ttm: VRAM TTM manager */
 	struct xe_ttm_vram_mgr ttm;
-#if IS_ENABLED(CONFIG_DRM_XE_DEVMEM_MIRROR)
+#if IS_ENABLED(CONFIG_DRM_XE_PAGEMAP)
 	/** @pagemap: Used to remap device memory as ZONE_DEVICE */
 	struct dev_pagemap pagemap;
 	/**
diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
index d364c9f458e7..3894efe7ba60 100644
--- a/drivers/gpu/drm/xe/xe_migrate.c
+++ b/drivers/gpu/drm/xe/xe_migrate.c
@@ -1544,7 +1544,7 @@ void xe_migrate_wait(struct xe_migrate *m)
 		dma_fence_wait(m->fence, false);
 }
 
-#if IS_ENABLED(CONFIG_DRM_XE_DEVMEM_MIRROR)
+#if IS_ENABLED(CONFIG_DRM_XE_PAGEMAP)
 static u32 pte_update_cmd_size(u64 size)
 {
 	u32 num_dword;
diff --git a/drivers/gpu/drm/xe/xe_svm.c b/drivers/gpu/drm/xe/xe_svm.c
index 1a8e17a0005d..88b45ca8e277 100644
--- a/drivers/gpu/drm/xe/xe_svm.c
+++ b/drivers/gpu/drm/xe/xe_svm.c
@@ -339,7 +339,7 @@ static void xe_svm_garbage_collector_work_func(struct work_struct *w)
 	up_write(&vm->lock);
 }
 
-#if IS_ENABLED(CONFIG_DRM_XE_DEVMEM_MIRROR)
+#if IS_ENABLED(CONFIG_DRM_XE_PAGEMAP)
 
 static struct xe_vram_region *page_to_vr(struct page *page)
 {
@@ -527,12 +527,12 @@ static int xe_svm_copy_to_ram(struct page **pages, dma_addr_t *dma_addr,
 	return xe_svm_copy(pages, dma_addr, npages, XE_SVM_COPY_TO_SRAM);
 }
 
-static struct xe_bo *to_xe_bo(struct drm_gpusvm_devmem *devmem_allocation)
+static struct xe_bo *to_xe_bo(struct drm_pagemap_devmem *devmem_allocation)
 {
 	return container_of(devmem_allocation, struct xe_bo, devmem_allocation);
 }
 
-static void xe_svm_devmem_release(struct drm_gpusvm_devmem *devmem_allocation)
+static void xe_svm_devmem_release(struct drm_pagemap_devmem *devmem_allocation)
 {
 	struct xe_bo *bo = to_xe_bo(devmem_allocation);
 
@@ -549,7 +549,7 @@ static struct drm_buddy *tile_to_buddy(struct xe_tile *tile)
 	return &tile->mem.vram.ttm.mm;
 }
 
-static int xe_svm_populate_devmem_pfn(struct drm_gpusvm_devmem *devmem_allocation,
+static int xe_svm_populate_devmem_pfn(struct drm_pagemap_devmem *devmem_allocation,
 				      unsigned long npages, unsigned long *pfn)
 {
 	struct xe_bo *bo = to_xe_bo(devmem_allocation);
@@ -572,7 +572,7 @@ static int xe_svm_populate_devmem_pfn(struct drm_gpusvm_devmem *devmem_allocatio
 	return 0;
 }
 
-static const struct drm_gpusvm_devmem_ops gpusvm_devmem_ops = {
+static const struct drm_pagemap_devmem_ops dpagemap_devmem_ops = {
 	.devmem_release = xe_svm_devmem_release,
 	.populate_devmem_pfn = xe_svm_populate_devmem_pfn,
 	.copy_to_devmem = xe_svm_copy_to_devmem,
@@ -654,7 +654,7 @@ static bool xe_svm_range_is_valid(struct xe_svm_range *range,
 	return (range->tile_present & ~range->tile_invalidated) & BIT(tile->id);
 }
 
-#if IS_ENABLED(CONFIG_DRM_XE_DEVMEM_MIRROR)
+#if IS_ENABLED(CONFIG_DRM_XE_PAGEMAP)
 static struct xe_vram_region *tile_to_vr(struct xe_tile *tile)
 {
 	return &tile->mem.vram;
@@ -672,6 +672,9 @@ static int xe_svm_alloc_vram(struct xe_vm *vm, struct xe_tile *tile,
 	ktime_t end = 0;
 	int err;
 
+	if (!range->base.flags.migrate_devmem)
+		return -EINVAL;
+
 	range_debug(range, "ALLOCATE VRAM");
 
 	if (!mmget_not_zero(mm))
@@ -691,19 +694,22 @@ static int xe_svm_alloc_vram(struct xe_vm *vm, struct xe_tile *tile,
 		goto unlock;
 	}
 
-	drm_gpusvm_devmem_init(&bo->devmem_allocation,
-			       vm->xe->drm.dev, mm,
-			       &gpusvm_devmem_ops,
-			       &tile->mem.vram.dpagemap,
-			       xe_svm_range_size(range));
+	drm_pagemap_devmem_init(&bo->devmem_allocation,
+				vm->xe->drm.dev, mm,
+				&dpagemap_devmem_ops,
+				&tile->mem.vram.dpagemap,
+				xe_svm_range_size(range));
 
 	blocks = &to_xe_ttm_vram_mgr_resource(bo->ttm.resource)->blocks;
 	list_for_each_entry(block, blocks, link)
 		block->private = vr;
 
 	xe_bo_get(bo);
-	err = drm_gpusvm_migrate_to_devmem(&vm->svm.gpusvm, &range->base,
-					   &bo->devmem_allocation, ctx);
+	err = drm_pagemap_migrate_to_devmem(&bo->devmem_allocation,
+					    mm,
+					    xe_svm_range_start(range),
+					    xe_svm_range_end(range),
+					    xe_svm_devm_owner(vm->xe));
 	if (err)
 		xe_svm_devmem_release(&bo->devmem_allocation);
 
@@ -746,9 +752,9 @@ int xe_svm_handle_pagefault(struct xe_vm *vm, struct xe_vma *vma,
 	struct drm_gpusvm_ctx ctx = {
 		.read_only = xe_vma_read_only(vma),
 		.devmem_possible = IS_DGFX(vm->xe) &&
-			IS_ENABLED(CONFIG_DRM_XE_DEVMEM_MIRROR),
+			IS_ENABLED(CONFIG_DRM_XE_PAGEMAP),
 		.check_pages_threshold = IS_DGFX(vm->xe) &&
-			IS_ENABLED(CONFIG_DRM_XE_DEVMEM_MIRROR) ? SZ_64K : 0,
+			IS_ENABLED(CONFIG_DRM_XE_PAGEMAP) ? SZ_64K : 0,
 	};
 	struct xe_svm_range *range;
 	struct drm_gpusvm_range *r;
@@ -876,10 +882,10 @@ bool xe_svm_has_mapping(struct xe_vm *vm, u64 start, u64 end)
  */
 int xe_svm_bo_evict(struct xe_bo *bo)
 {
-	return drm_gpusvm_evict_to_ram(&bo->devmem_allocation);
+	return drm_pagemap_evict_to_ram(&bo->devmem_allocation);
 }
 
-#if IS_ENABLED(CONFIG_DRM_XE_DEVMEM_MIRROR)
+#if IS_ENABLED(CONFIG_DRM_XE_PAGEMAP)
 
 static struct drm_pagemap_device_addr
 xe_drm_pagemap_device_map(struct drm_pagemap *dpagemap,
@@ -936,7 +942,7 @@ int xe_devm_add(struct xe_tile *tile, struct xe_vram_region *vr)
 	vr->pagemap.range.start = res->start;
 	vr->pagemap.range.end = res->end;
 	vr->pagemap.nr_range = 1;
-	vr->pagemap.ops = drm_gpusvm_pagemap_ops_get();
+	vr->pagemap.ops = drm_pagemap_pagemap_ops_get();
 	vr->pagemap.owner = xe_svm_devm_owner(xe);
 	addr = devm_memremap_pages(dev, &vr->pagemap);
 
diff --git a/include/drm/drm_gpusvm.h b/include/drm/drm_gpusvm.h
index df120b4d1f83..97c641bf49c5 100644
--- a/include/drm/drm_gpusvm.h
+++ b/include/drm/drm_gpusvm.h
@@ -16,88 +16,11 @@ struct drm_gpusvm;
 struct drm_gpusvm_notifier;
 struct drm_gpusvm_ops;
 struct drm_gpusvm_range;
-struct drm_gpusvm_devmem;
 struct drm_pagemap;
 struct drm_pagemap_device_addr;
 
-/**
- * struct drm_gpusvm_devmem_ops - Operations structure for GPU SVM device memory
- *
- * This structure defines the operations for GPU Shared Virtual Memory (SVM)
- * device memory. These operations are provided by the GPU driver to manage device memory
- * allocations and perform operations such as migration between device memory and system
- * RAM.
- */
-struct drm_gpusvm_devmem_ops {
-	/**
-	 * @devmem_release: Release device memory allocation (optional)
-	 * @devmem_allocation: device memory allocation
-	 *
-	 * Release device memory allocation and drop a reference to device
-	 * memory allocation.
-	 */
-	void (*devmem_release)(struct drm_gpusvm_devmem *devmem_allocation);
-
-	/**
-	 * @populate_devmem_pfn: Populate device memory PFN (required for migration)
-	 * @devmem_allocation: device memory allocation
-	 * @npages: Number of pages to populate
-	 * @pfn: Array of page frame numbers to populate
-	 *
-	 * Populate device memory page frame numbers (PFN).
-	 *
-	 * Return: 0 on success, a negative error code on failure.
-	 */
-	int (*populate_devmem_pfn)(struct drm_gpusvm_devmem *devmem_allocation,
-				   unsigned long npages, unsigned long *pfn);
-
-	/**
-	 * @copy_to_devmem: Copy to device memory (required for migration)
-	 * @pages: Pointer to array of device memory pages (destination)
-	 * @dma_addr: Pointer to array of DMA addresses (source)
-	 * @npages: Number of pages to copy
-	 *
-	 * Copy pages to device memory.
-	 *
-	 * Return: 0 on success, a negative error code on failure.
-	 */
-	int (*copy_to_devmem)(struct page **pages,
-			      dma_addr_t *dma_addr,
-			      unsigned long npages);
-
-	/**
-	 * @copy_to_ram: Copy to system RAM (required for migration)
-	 * @pages: Pointer to array of device memory pages (source)
-	 * @dma_addr: Pointer to array of DMA addresses (destination)
-	 * @npages: Number of pages to copy
-	 *
-	 * Copy pages to system RAM.
-	 *
-	 * Return: 0 on success, a negative error code on failure.
-	 */
-	int (*copy_to_ram)(struct page **pages,
-			   dma_addr_t *dma_addr,
-			   unsigned long npages);
-};
-
-/**
- * struct drm_gpusvm_devmem - Structure representing a GPU SVM device memory allocation
- *
- * @dev: Pointer to the device structure which device memory allocation belongs to
- * @mm: Pointer to the mm_struct for the address space
- * @detached: device memory allocations is detached from device pages
- * @ops: Pointer to the operations structure for GPU SVM device memory
- * @dpagemap: The struct drm_pagemap of the pages this allocation belongs to.
- * @size: Size of device memory allocation
- */
-struct drm_gpusvm_devmem {
-	struct device *dev;
-	struct mm_struct *mm;
-	struct completion detached;
-	const struct drm_gpusvm_devmem_ops *ops;
-	struct drm_pagemap *dpagemap;
-	size_t size;
-};
+struct drm_pagemap_devmem_ops;
+struct drm_pagemap_dma_addr;
 
 /**
  * struct drm_gpusvm_ops - Operations structure for GPU SVM
@@ -337,15 +260,6 @@ void drm_gpusvm_range_unmap_pages(struct drm_gpusvm *gpusvm,
 				  struct drm_gpusvm_range *range,
 				  const struct drm_gpusvm_ctx *ctx);
 
-int drm_gpusvm_migrate_to_devmem(struct drm_gpusvm *gpusvm,
-				 struct drm_gpusvm_range *range,
-				 struct drm_gpusvm_devmem *devmem_allocation,
-				 const struct drm_gpusvm_ctx *ctx);
-
-int drm_gpusvm_evict_to_ram(struct drm_gpusvm_devmem *devmem_allocation);
-
-const struct dev_pagemap_ops *drm_gpusvm_pagemap_ops_get(void);
-
 bool drm_gpusvm_has_mapping(struct drm_gpusvm *gpusvm, unsigned long start,
 			    unsigned long end);
 
@@ -356,11 +270,6 @@ drm_gpusvm_range_find(struct drm_gpusvm_notifier *notifier, unsigned long start,
 void drm_gpusvm_range_set_unmapped(struct drm_gpusvm_range *range,
 				   const struct mmu_notifier_range *mmu_range);
 
-void drm_gpusvm_devmem_init(struct drm_gpusvm_devmem *devmem_allocation,
-			    struct device *dev, struct mm_struct *mm,
-			    const struct drm_gpusvm_devmem_ops *ops,
-			    struct drm_pagemap *dpagemap, size_t size);
-
 #ifdef CONFIG_LOCKDEP
 /**
  * drm_gpusvm_driver_set_lock() - Set the lock protecting accesses to GPU SVM
diff --git a/include/drm/drm_pagemap.h b/include/drm/drm_pagemap.h
index 202c157ff4d7..32f0d7f23075 100644
--- a/include/drm/drm_pagemap.h
+++ b/include/drm/drm_pagemap.h
@@ -7,6 +7,7 @@
 #include <linux/types.h>
 
 struct drm_pagemap;
+struct drm_pagemap_zdd;
 struct device;
 
 /**
@@ -104,4 +105,101 @@ struct drm_pagemap {
 	struct device *dev;
 };
 
+struct drm_pagemap_devmem;
+
+/**
+ * struct drm_pagemap_devmem_ops - Operations structure for GPU SVM device memory
+ *
+ * This structure defines the operations for GPU Shared Virtual Memory (SVM)
+ * device memory. These operations are provided by the GPU driver to manage device memory
+ * allocations and perform operations such as migration between device memory and system
+ * RAM.
+ */
+struct drm_pagemap_devmem_ops {
+	/**
+	 * @devmem_release: Release device memory allocation (optional)
+	 * @devmem_allocation: device memory allocation
+	 *
+	 * Release device memory allocation and drop a reference to device
+	 * memory allocation.
+	 */
+	void (*devmem_release)(struct drm_pagemap_devmem *devmem_allocation);
+
+	/**
+	 * @populate_devmem_pfn: Populate device memory PFN (required for migration)
+	 * @devmem_allocation: device memory allocation
+	 * @npages: Number of pages to populate
+	 * @pfn: Array of page frame numbers to populate
+	 *
+	 * Populate device memory page frame numbers (PFN).
+	 *
+	 * Return: 0 on success, a negative error code on failure.
+	 */
+	int (*populate_devmem_pfn)(struct drm_pagemap_devmem *devmem_allocation,
+				   unsigned long npages, unsigned long *pfn);
+
+	/**
+	 * @copy_to_devmem: Copy to device memory (required for migration)
+	 * @pages: Pointer to array of device memory pages (destination)
+	 * @dma_addr: Pointer to array of DMA addresses (source)
+	 * @npages: Number of pages to copy
+	 *
+	 * Copy pages to device memory.
+	 *
+	 * Return: 0 on success, a negative error code on failure.
+	 */
+	int (*copy_to_devmem)(struct page **pages,
+			      dma_addr_t *dma_addr,
+			      unsigned long npages);
+
+	/**
+	 * @copy_to_ram: Copy to system RAM (required for migration)
+	 * @pages: Pointer to array of device memory pages (source)
+	 * @dma_addr: Pointer to array of DMA addresses (destination)
+	 * @npages: Number of pages to copy
+	 *
+	 * Copy pages to system RAM.
+	 *
+	 * Return: 0 on success, a negative error code on failure.
+	 */
+	int (*copy_to_ram)(struct page **pages,
+			   dma_addr_t *dma_addr,
+			   unsigned long npages);
+};
+
+/**
+ * struct drm_pagemap_devmem - Structure representing a GPU SVM device memory allocation
+ *
+ * @dev: Pointer to the device structure to which the device memory allocation belongs
+ * @mm: Pointer to the mm_struct for the address space
+ * @detached: Completion used to signal that the device memory allocation is
+ * detached from device pages
+ * @ops: Pointer to the operations structure for GPU SVM device memory
+ * @dpagemap: The struct drm_pagemap of the pages this allocation belongs to.
+ * @size: The size of the allocation.
+ */
+struct drm_pagemap_devmem {
+	struct device *dev;
+	struct mm_struct *mm;
+	struct completion detached;
+	const struct drm_pagemap_devmem_ops *ops;
+	struct drm_pagemap *dpagemap;
+	size_t size;
+};
+
+int drm_pagemap_migrate_to_devmem(struct drm_pagemap_devmem *devmem_allocation,
+				  struct mm_struct *mm,
+				  unsigned long start, unsigned long end,
+				  void *pgmap_owner);
+
+int drm_pagemap_evict_to_ram(struct drm_pagemap_devmem *devmem_allocation);
+
+const struct dev_pagemap_ops *drm_pagemap_pagemap_ops_get(void);
+
+struct drm_pagemap *drm_pagemap_page_to_dpagemap(struct page *page);
+
+void drm_pagemap_devmem_init(struct drm_pagemap_devmem *devmem_allocation,
+			     struct device *dev, struct mm_struct *mm,
+			     const struct drm_pagemap_devmem_ops *ops,
+			     struct drm_pagemap *dpagemap, size_t size);
+
 #endif
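
As an illustration of how the relocated helpers fit together, a
driver-side allocation path could be wired up roughly as below. This is
a sketch only; the my_* identifiers are placeholders and error handling
is trimmed:

static const struct drm_pagemap_devmem_ops my_devmem_ops = {
	.devmem_release = my_devmem_release,		/* drop the allocation reference */
	.populate_devmem_pfn = my_populate_devmem_pfn,	/* fill pfn[] from the allocation */
	.copy_to_devmem = my_copy_to_devmem,		/* system RAM -> device memory */
	.copy_to_ram = my_copy_to_ram,			/* device memory -> system RAM */
};

/* Called with the mmap lock held in at least read mode. */
static int my_migrate_range(struct my_allocation *alloc,
			    struct drm_pagemap *dpagemap, struct mm_struct *mm,
			    unsigned long start, unsigned long end)
{
	/* Tie the allocation to the drm_pagemap and the address space. */
	drm_pagemap_devmem_init(&alloc->devmem, alloc->dev, mm, &my_devmem_ops,
				dpagemap, end - start);

	/* Consumes the allocation reference unless an error is returned. */
	return drm_pagemap_migrate_to_devmem(&alloc->devmem, mm, start, end,
					     my_pgmap_owner(alloc));
}
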
-- 
2.48.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [RFC PATCH 04/19] drm/pagemap: Add a populate_mm op
  2025-03-12 21:03 [RFC PATCH 00/19] drm, drm/xe: Multi-device GPUSVM Thomas Hellström
                   ` (2 preceding siblings ...)
  2025-03-12 21:04 ` [RFC PATCH 03/19] drm/gpusvm, drm/pagemap: Move migration functionality to drm_pagemap Thomas Hellström
@ 2025-03-12 21:04 ` Thomas Hellström
  2025-03-12 21:04 ` [RFC PATCH 05/19] drm/xe: Implement and use the drm_pagemap " Thomas Hellström
                   ` (15 subsequent siblings)
  19 siblings, 0 replies; 26+ messages in thread
From: Thomas Hellström @ 2025-03-12 21:04 UTC (permalink / raw)
  To: intel-xe, dri-devel
  Cc: Thomas Hellström, himal.prasad.ghimiray, apopple, airlied,
	Simona Vetter, felix.kuehling, Matthew Brost,
	Christian König, dakr, Mrozek, Michal, Joonas Lahtinen

Add an operation to populate a part of an mm_struct with device
private memory.
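
As a rough sketch of what an implementation of the op is expected to do
(the xe implementation follows in a later patch; the my_* identifiers
below are placeholders only):

static int my_populate_mm(struct drm_pagemap *dpagemap,
			  unsigned long start, unsigned long end,
			  struct mm_struct *mm)
{
	struct my_allocation *alloc;

	/* drm_pagemap_populate_mm() calls this op with an mmget()
	 * reference held and the mmap lock held in read mode.
	 */
	alloc = my_alloc_devmem(dpagemap, end - start);
	if (IS_ERR(alloc))
		return PTR_ERR(alloc);

	drm_pagemap_devmem_init(&alloc->devmem, dpagemap->dev, mm,
				&my_devmem_ops, dpagemap, end - start);

	/* Consumes the allocation reference unless an error is returned. */
	return drm_pagemap_migrate_to_devmem(&alloc->devmem, mm, start, end,
					     my_pgmap_owner(dpagemap));
}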

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/drm_gpusvm.c  |  7 ++-----
 drivers/gpu/drm/drm_pagemap.c | 32 ++++++++++++++++++++++++++++++++
 include/drm/drm_pagemap.h     | 32 ++++++++++++++++++++++++++++++++
 3 files changed, 66 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/drm_gpusvm.c b/drivers/gpu/drm/drm_gpusvm.c
index 4fade7018507..d84e27283768 100644
--- a/drivers/gpu/drm/drm_gpusvm.c
+++ b/drivers/gpu/drm/drm_gpusvm.c
@@ -175,11 +175,8 @@
  *		}
  *
  *		if (driver_migration_policy(range)) {
- *			mmap_read_lock(mm);
- *			devmem = driver_alloc_devmem();
- *			err = drm_pagemap_migrate_to_devmem(devmem, gpusvm->mm, gpuva_start,
- *                                                          gpuva_end, driver_pgmap_owner());
- *                      mmap_read_unlock(mm);
+ *			err = drm_pagemap_populate_mm(driver_choose_drm_pagemap(),
+ *                                                    gpuva_start, gpuva_end, gpusvm->mm);
  *			if (err)	// CPU mappings may have changed
  *				goto retry;
  *		}
diff --git a/drivers/gpu/drm/drm_pagemap.c b/drivers/gpu/drm/drm_pagemap.c
index c46bb4384444..27e3f90cf49a 100644
--- a/drivers/gpu/drm/drm_pagemap.c
+++ b/drivers/gpu/drm/drm_pagemap.c
@@ -6,6 +6,7 @@
 #include <linux/dma-mapping.h>
 #include <linux/migrate.h>
 #include <linux/pagemap.h>
+#include <drm/drm_drv.h>
 #include <drm/drm_pagemap.h>
 
 /**
@@ -782,3 +783,34 @@ struct drm_pagemap *drm_pagemap_page_to_dpagemap(struct page *page)
 	return zdd->devmem_allocation->dpagemap;
 }
 EXPORT_SYMBOL_GPL(drm_pagemap_page_to_dpagemap);
+
+/**
+ * drm_pagemap_populate_mm() - Populate a virtual range with device memory pages
+ * @dpagemap: Pointer to the drm_pagemap managing the device memory
+ * @start: Start of the virtual range to populate.
+ * @end: End of the virtual range to populate.
+ * @mm: Pointer to the virtual address space.
+ *
+ * Attempt to populate a virtual range with device memory pages,
+ * clearing them or migrating data from the existing pages if necessary.
+ * The function is best effort only, and implementations may vary
+ * in how hard they try to satisfy the request.
+ *
+ * Return: 0 on success, negative error code on error. If the hardware
+ * device was removed / unbound, the function will return -ENODEV.
+ */
+int drm_pagemap_populate_mm(struct drm_pagemap *dpagemap,
+			    unsigned long start, unsigned long end,
+			    struct mm_struct *mm)
+{
+	int err;
+
+	if (!mmget_not_zero(mm))
+		return -EFAULT;
+	mmap_read_lock(mm);
+	err = dpagemap->ops->populate_mm(dpagemap, start, end, mm);
+	mmap_read_unlock(mm);
+	mmput(mm);
+
+	return err;
+}
diff --git a/include/drm/drm_pagemap.h b/include/drm/drm_pagemap.h
index 32f0d7f23075..c591736e7c48 100644
--- a/include/drm/drm_pagemap.h
+++ b/include/drm/drm_pagemap.h
@@ -92,6 +92,34 @@ struct drm_pagemap_ops {
 			     struct device *dev,
 			     struct drm_pagemap_device_addr addr);
 
+	/**
+	 * @populate_mm: Populate part of the mm with @dpagemap memory,
+	 * migrating existing data.
+	 * @dpagemap: The struct drm_pagemap managing the memory.
+	 * @start: The virtual start address in @mm
+	 * @end: The virtual end address in @mm
+	 * @mm: Pointer to a live mm. The caller must have an mmget()
+	 * reference.
+	 *
+	 * The caller will hold the mm lock at least in read mode.
+	 * Note that there is no guarantee that the memory is resident
+	 * after the function returns; it's best effort only.
+	 * When the mm is no longer using the memory, the memory will be
+	 * released. The struct drm_pagemap might have a mechanism in place
+	 * to reclaim the memory, in which case the data is migrated,
+	 * typically to system memory.
+	 * The implementation should hold sufficient runtime power
+	 * references while pages are used in an address space, and should
+	 * ideally guard against hardware device unbind in such a way that
+	 * device pages are migrated back to system memory before the
+	 * device pages are removed. The implementation should return
+	 * -ENODEV after device removal.
+	 *
+	 * Return: 0 if successful. Negative error code on error.
+	 */
+	int (*populate_mm)(struct drm_pagemap *dpagemap,
+			   unsigned long start, unsigned long end,
+			   struct mm_struct *mm);
 };
 
 /**
@@ -202,4 +230,8 @@ void drm_pagemap_devmem_init(struct drm_pagemap_devmem *devmem_allocation,
 			     const struct drm_pagemap_devmem_ops *ops,
 			     struct drm_pagemap *dpagemap, size_t size);
 
+int drm_pagemap_populate_mm(struct drm_pagemap *dpagemap,
+			    unsigned long start, unsigned long end,
+			    struct mm_struct *mm);
+
 #endif
-- 
2.48.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [RFC PATCH 05/19] drm/xe: Implement and use the drm_pagemap populate_mm op
  2025-03-12 21:03 [RFC PATCH 00/19] drm, drm/xe: Multi-device GPUSVM Thomas Hellström
                   ` (3 preceding siblings ...)
  2025-03-12 21:04 ` [RFC PATCH 04/19] drm/pagemap: Add a populate_mm op Thomas Hellström
@ 2025-03-12 21:04 ` Thomas Hellström
  2025-03-12 21:04 ` [RFC PATCH 06/19] drm/pagemap, drm/xe: Add refcounting to struct drm_pagemap and manage lifetime Thomas Hellström
                   ` (14 subsequent siblings)
  19 siblings, 0 replies; 26+ messages in thread
From: Thomas Hellström @ 2025-03-12 21:04 UTC (permalink / raw)
  To: intel-xe, dri-devel
  Cc: Thomas Hellström, himal.prasad.ghimiray, apopple, airlied,
	Simona Vetter, felix.kuehling, Matthew Brost,
	Christian König, dakr, Mrozek, Michal, Joonas Lahtinen

Add runtime PM since we might call populate_mm on a foreign device.
Also create the VRAM bos as ttm_bo_type_kernel. This avoids the
initial clearing and the creation of an mmap handle.

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/drm_pagemap.c |  4 +-
 drivers/gpu/drm/xe/xe_svm.c   | 80 ++++++++++++++++++-----------------
 drivers/gpu/drm/xe/xe_tile.h  | 11 +++++
 3 files changed, 55 insertions(+), 40 deletions(-)

diff --git a/drivers/gpu/drm/drm_pagemap.c b/drivers/gpu/drm/drm_pagemap.c
index 27e3f90cf49a..99394c7d1d66 100644
--- a/drivers/gpu/drm/drm_pagemap.c
+++ b/drivers/gpu/drm/drm_pagemap.c
@@ -276,7 +276,8 @@ npages_in_range(unsigned long start, unsigned long end)
  * The caller should hold a reference to the device memory allocation,
  * and the reference is consumed by this function unless it returns with
  * an error.
- * @mm: Pointer to the struct mm_struct.
+ * @mm: Pointer to the struct mm_struct. This pointer should hold a reference to
+ * the mm, and the mm should be locked on entry.
  * @start: Start of the virtual address range to migrate.
  * @end: End of the virtual address range to migrate.
  * @pgmap_owner: Not used currently, since only system memory is considered.
@@ -814,3 +815,4 @@ int drm_pagemap_populate_mm(struct drm_pagemap *dpagemap,
 
 	return err;
 }
+EXPORT_SYMBOL(drm_pagemap_populate_mm);
diff --git a/drivers/gpu/drm/xe/xe_svm.c b/drivers/gpu/drm/xe/xe_svm.c
index 88b45ca8e277..36ae7d6a218b 100644
--- a/drivers/gpu/drm/xe/xe_svm.c
+++ b/drivers/gpu/drm/xe/xe_svm.c
@@ -3,12 +3,16 @@
  * Copyright © 2024 Intel Corporation
  */
 
+#include <drm/drm_drv.h>
+
 #include "xe_bo.h"
 #include "xe_gt_tlb_invalidation.h"
 #include "xe_migrate.h"
 #include "xe_module.h"
+#include "xe_pm.h"
 #include "xe_pt.h"
 #include "xe_svm.h"
+#include "xe_tile.h"
 #include "xe_ttm_vram_mgr.h"
 #include "xe_vm.h"
 #include "xe_vm_types.h"
@@ -535,8 +539,10 @@ static struct xe_bo *to_xe_bo(struct drm_pagemap_devmem *devmem_allocation)
 static void xe_svm_devmem_release(struct drm_pagemap_devmem *devmem_allocation)
 {
 	struct xe_bo *bo = to_xe_bo(devmem_allocation);
+	struct xe_device *xe = xe_bo_device(bo);
 
 	xe_bo_put_async(bo);
+	xe_pm_runtime_put(xe);
 }
 
 static u64 block_offset_to_pfn(struct xe_vram_region *vr, u64 offset)
@@ -660,77 +666,66 @@ static struct xe_vram_region *tile_to_vr(struct xe_tile *tile)
 	return &tile->mem.vram;
 }
 
-static int xe_svm_alloc_vram(struct xe_vm *vm, struct xe_tile *tile,
-			     struct xe_svm_range *range,
-			     const struct drm_gpusvm_ctx *ctx)
+static int xe_drm_pagemap_populate_mm(struct drm_pagemap *dpagemap,
+				      unsigned long start, unsigned long end,
+				      struct mm_struct *mm)
 {
-	struct mm_struct *mm = vm->svm.gpusvm.mm;
+	struct xe_tile *tile = container_of(dpagemap, typeof(*tile), mem.vram.dpagemap);
+	struct xe_device *xe = tile_to_xe(tile);
+	struct device *dev = xe->drm.dev;
 	struct xe_vram_region *vr = tile_to_vr(tile);
 	struct drm_buddy_block *block;
 	struct list_head *blocks;
 	struct xe_bo *bo;
-	ktime_t end = 0;
-	int err;
+	ktime_t time_end = 0;
+	int err, idx;
 
-	if (!range->base.flags.migrate_devmem)
-		return -EINVAL;
+	if (!drm_dev_enter(&xe->drm, &idx))
+		return -ENODEV;
 
-	range_debug(range, "ALLOCATE VRAM");
+	xe_pm_runtime_get(xe);
 
-	if (!mmget_not_zero(mm))
-		return -EFAULT;
-	mmap_read_lock(mm);
-
-retry:
-	bo = xe_bo_create_locked(tile_to_xe(tile), NULL, NULL,
-				 xe_svm_range_size(range),
-				 ttm_bo_type_device,
+ retry:
+	bo = xe_bo_create_locked(tile_to_xe(tile), NULL, NULL, end - start,
+				 ttm_bo_type_kernel,
 				 XE_BO_FLAG_VRAM_IF_DGFX(tile) |
 				 XE_BO_FLAG_CPU_ADDR_MIRROR);
 	if (IS_ERR(bo)) {
 		err = PTR_ERR(bo);
-		if (xe_vm_validate_should_retry(NULL, err, &end))
+		if (xe_vm_validate_should_retry(NULL, err, &time_end))
 			goto retry;
-		goto unlock;
+		goto out_pm_put;
 	}
 
-	drm_pagemap_devmem_init(&bo->devmem_allocation,
-				vm->xe->drm.dev, mm,
+	drm_pagemap_devmem_init(&bo->devmem_allocation, dev, mm,
 				&dpagemap_devmem_ops,
 				&tile->mem.vram.dpagemap,
-				xe_svm_range_size(range));
+				end - start);
 
 	blocks = &to_xe_ttm_vram_mgr_resource(bo->ttm.resource)->blocks;
 	list_for_each_entry(block, blocks, link)
 		block->private = vr;
 
 	xe_bo_get(bo);
-	err = drm_pagemap_migrate_to_devmem(&bo->devmem_allocation,
-					    mm,
-					    xe_svm_range_start(range),
-					    xe_svm_range_end(range),
-					    xe_svm_devm_owner(vm->xe));
+
+	/* Ensure the device has a pm ref while there are device pages active. */
+	xe_pm_runtime_get_noresume(xe);
+	err = drm_pagemap_migrate_to_devmem(&bo->devmem_allocation, mm,
+					    start, end, xe_svm_devm_owner(xe));
 	if (err)
 		xe_svm_devmem_release(&bo->devmem_allocation);
 
 	xe_bo_unlock(bo);
 	xe_bo_put(bo);
 
-unlock:
-	mmap_read_unlock(mm);
-	mmput(mm);
+out_pm_put:
+	xe_pm_runtime_put(xe);
+	drm_dev_exit(idx);
 
 	return err;
 }
-#else
-static int xe_svm_alloc_vram(struct xe_vm *vm, struct xe_tile *tile,
-			     struct xe_svm_range *range,
-			     const struct drm_gpusvm_ctx *ctx)
-{
-	return -EOPNOTSUPP;
-}
-#endif
 
+#endif
 
 /**
  * xe_svm_handle_pagefault() - SVM handle page fault
@@ -787,9 +782,15 @@ int xe_svm_handle_pagefault(struct xe_vm *vm, struct xe_vma *vma,
 	/* XXX: Add migration policy, for now migrate range once */
 	if (!range->skip_migrate && range->base.flags.migrate_devmem &&
 	    xe_svm_range_size(range) >= SZ_64K) {
+		struct drm_pagemap *dpagemap;
+
 		range->skip_migrate = true;
 
-		err = xe_svm_alloc_vram(vm, tile, range, &ctx);
+		range_debug(range, "ALLOCATE VRAM");
+		dpagemap = xe_tile_local_pagemap(tile);
+		err = drm_pagemap_populate_mm(dpagemap, xe_svm_range_start(range),
+					      xe_svm_range_end(range),
+					      range->base.gpusvm->mm);
 		if (err) {
 			drm_dbg(&vm->xe->drm,
 				"VRAM allocation failed, falling back to "
@@ -911,6 +912,7 @@ xe_drm_pagemap_device_map(struct drm_pagemap *dpagemap,
 
 static const struct drm_pagemap_ops xe_drm_pagemap_ops = {
 	.device_map = xe_drm_pagemap_device_map,
+	.populate_mm = xe_drm_pagemap_populate_mm,
 };
 
 /**
diff --git a/drivers/gpu/drm/xe/xe_tile.h b/drivers/gpu/drm/xe/xe_tile.h
index eb939316d55b..066a3d0cea79 100644
--- a/drivers/gpu/drm/xe/xe_tile.h
+++ b/drivers/gpu/drm/xe/xe_tile.h
@@ -16,4 +16,15 @@ int xe_tile_init(struct xe_tile *tile);
 
 void xe_tile_migrate_wait(struct xe_tile *tile);
 
+#if IS_ENABLED(CONFIG_DRM_XE_PAGEMAP)
+static inline struct drm_pagemap *xe_tile_local_pagemap(struct xe_tile *tile)
+{
+	return &tile->mem.vram.dpagemap;
+}
+#else
+static inline struct drm_pagemap *xe_tile_local_pagemap(struct xe_tile *tile)
+{
+	return NULL;
+}
+#endif
 #endif
-- 
2.48.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [RFC PATCH 06/19] drm/pagemap, drm/xe: Add refcounting to struct drm_pagemap and manage lifetime
  2025-03-12 21:03 [RFC PATCH 00/19] drm, drm/xe: Multi-device GPUSVM Thomas Hellström
                   ` (4 preceding siblings ...)
  2025-03-12 21:04 ` [RFC PATCH 05/19] drm/xe: Implement and use the drm_pagemap " Thomas Hellström
@ 2025-03-12 21:04 ` Thomas Hellström
  2025-03-12 21:04 ` [RFC PATCH 07/19] drm/pagemap: Get rid of the struct drm_pagemap_zdd::device_private_page_owner field Thomas Hellström
                   ` (13 subsequent siblings)
  19 siblings, 0 replies; 26+ messages in thread
From: Thomas Hellström @ 2025-03-12 21:04 UTC (permalink / raw)
  To: intel-xe, dri-devel
  Cc: Thomas Hellström, himal.prasad.ghimiray, apopple, airlied,
	Simona Vetter, felix.kuehling, Matthew Brost,
	Christian König, dakr, Mrozek, Michal, Joonas Lahtinen

Remove the drm_pagemap embedded in xe, and instead allocate and
reference-count it.
This is a step towards adding drm_pagemaps on demand.
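
With that, the lifetime handling reduces to the following pattern
(a sketch only; the my_* identifiers are placeholders):

static int my_vram_init(struct my_vram_region *vr, struct device *dev)
{
	/* Created at device/tile init; returns a refcounted pointer. */
	vr->dpagemap = drm_pagemap_create(dev, &vr->pagemap, &my_pagemap_ops);

	return IS_ERR(vr->dpagemap) ? PTR_ERR(vr->dpagemap) : 0;
}

static void my_vram_fini(struct my_vram_region *vr)
{
	/* The final put frees the drm_pagemap. */
	drm_pagemap_put(vr->dpagemap);
}

Any additional holder, for example a remote device keeping a foreign
pagemap around for migration, takes its own reference with
drm_pagemap_get() and drops it with drm_pagemap_put() when done.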

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/drm_pagemap.c        | 58 +++++++++++++++++++++++++++-
 drivers/gpu/drm/xe/xe_device_types.h |  2 +-
 drivers/gpu/drm/xe/xe_svm.c          | 27 +++++++++----
 drivers/gpu/drm/xe/xe_tile.h         |  2 +-
 include/drm/drm_pagemap.h            | 25 ++++++++++++
 5 files changed, 102 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/drm_pagemap.c b/drivers/gpu/drm/drm_pagemap.c
index 99394c7d1d66..8a0bdf38fc65 100644
--- a/drivers/gpu/drm/drm_pagemap.c
+++ b/drivers/gpu/drm/drm_pagemap.c
@@ -61,6 +61,7 @@
  *
  * @refcount: Reference count for the zdd
  * @devmem_allocation: device memory allocation
+ * @dpagemap: Pointer to the struct drm_pagemap.
  * @device_private_page_owner: Device private pages owner
  *
  * This structure serves as a generic wrapper installed in
@@ -73,11 +74,13 @@
 struct drm_pagemap_zdd {
 	struct kref refcount;
 	struct drm_pagemap_devmem *devmem_allocation;
+	struct drm_pagemap *dpagemap;
 	void *device_private_page_owner;
 };
 
 /**
  * drm_pagemap_zdd_alloc() - Allocate a zdd structure.
+ * @dpagemap: Pointer to the struct drm_pagemap.
  * @device_private_page_owner: Device private pages owner
  *
  * This function allocates and initializes a new zdd structure. It sets up the
@@ -86,7 +89,7 @@ struct drm_pagemap_zdd {
  * Return: Pointer to the allocated zdd on success, ERR_PTR() on failure.
  */
 static struct drm_pagemap_zdd *
-drm_pagemap_zdd_alloc(void *device_private_page_owner)
+drm_pagemap_zdd_alloc(struct drm_pagemap *dpagemap, void *device_private_page_owner)
 {
 	struct drm_pagemap_zdd *zdd;
 
@@ -97,6 +100,7 @@ drm_pagemap_zdd_alloc(void *device_private_page_owner)
 	kref_init(&zdd->refcount);
 	zdd->devmem_allocation = NULL;
 	zdd->device_private_page_owner = device_private_page_owner;
+	zdd->dpagemap = dpagemap;
 
 	return zdd;
 }
@@ -340,7 +344,7 @@ int drm_pagemap_migrate_to_devmem(struct drm_pagemap_devmem *devmem_allocation,
 	dma_addr = buf + (2 * sizeof(*migrate.src) * npages);
 	pages = buf + (2 * sizeof(*migrate.src) + sizeof(*dma_addr)) * npages;
 
-	zdd = drm_pagemap_zdd_alloc(pgmap_owner);
+	zdd = drm_pagemap_zdd_alloc(devmem_allocation->dpagemap, pgmap_owner);
 	if (!zdd) {
 		err = -ENOMEM;
 		goto err_free;
@@ -484,6 +488,56 @@ static int drm_pagemap_migrate_populate_ram_pfn(struct vm_area_struct *vas,
 	return -ENOMEM;
 }
 
+static void drm_pagemap_release(struct kref *ref)
+{
+	struct drm_pagemap *dpagemap = container_of(ref, typeof(*dpagemap), ref);
+
+	kfree(dpagemap);
+}
+
+/**
+ * drm_pagemap_create() - Create a struct drm_pagemap.
+ * @dev: Pointer to a struct device providing the device-private memory.
+ * @pagemap: Pointer to a pre-setup struct dev_pagemap providing the struct pages.
+ * @ops: Pointer to the struct drm_pagemap_ops.
+ *
+ * Allocate and initialize a struct drm_pagemap.
+ *
+ * Return: A refcounted pointer to a struct drm_pagemap on success.
+ * Error pointer on error.
+ */
+struct drm_pagemap *
+drm_pagemap_create(struct device *dev,
+		   struct dev_pagemap *pagemap,
+		   const struct drm_pagemap_ops *ops)
+{
+	struct drm_pagemap *dpagemap = kzalloc(sizeof(*dpagemap), GFP_KERNEL);
+
+	if (!dpagemap)
+		return ERR_PTR(-ENOMEM);
+
+	kref_init(&dpagemap->ref);
+	dpagemap->dev = dev;
+	dpagemap->ops = ops;
+	dpagemap->pagemap = pagemap;
+
+	return dpagemap;
+}
+EXPORT_SYMBOL(drm_pagemap_create);
+
+/**
+ * drm_pagemap_put() - Put a struct drm_pagemap reference
+ * @dpagemap: Pointer to a struct drm_pagemap object.
+ *
+ * Puts a struct drm_pagemap reference and frees the drm_pagemap object
+ * if the refcount reaches zero.
+ */
+void drm_pagemap_put(struct drm_pagemap *dpagemap)
+{
+	kref_put(&dpagemap->ref, drm_pagemap_release);
+}
+EXPORT_SYMBOL(drm_pagemap_put);
+
 /**
  * drm_pagemap_evict_to_ram() - Evict GPU SVM range to RAM
  * @devmem_allocation: Pointer to the device memory allocation
diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
index d288a5880508..40c6f88f5933 100644
--- a/drivers/gpu/drm/xe/xe_device_types.h
+++ b/drivers/gpu/drm/xe/xe_device_types.h
@@ -116,7 +116,7 @@ struct xe_vram_region {
 	 * @dpagemap: The struct drm_pagemap of the ZONE_DEVICE memory
 	 * pages of this tile.
 	 */
-	struct drm_pagemap dpagemap;
+	struct drm_pagemap *dpagemap;
 	/**
 	 * @hpa_base: base host physical address
 	 *
diff --git a/drivers/gpu/drm/xe/xe_svm.c b/drivers/gpu/drm/xe/xe_svm.c
index 36ae7d6a218b..37e1607052ed 100644
--- a/drivers/gpu/drm/xe/xe_svm.c
+++ b/drivers/gpu/drm/xe/xe_svm.c
@@ -670,7 +670,8 @@ static int xe_drm_pagemap_populate_mm(struct drm_pagemap *dpagemap,
 				      unsigned long start, unsigned long end,
 				      struct mm_struct *mm)
 {
-	struct xe_tile *tile = container_of(dpagemap, typeof(*tile), mem.vram.dpagemap);
+	struct xe_tile *tile = container_of(dpagemap->pagemap, typeof(*tile),
+					    mem.vram.pagemap);
 	struct xe_device *xe = tile_to_xe(tile);
 	struct device *dev = xe->drm.dev;
 	struct xe_vram_region *vr = tile_to_vr(tile);
@@ -699,7 +700,7 @@ static int xe_drm_pagemap_populate_mm(struct drm_pagemap *dpagemap,
 
 	drm_pagemap_devmem_init(&bo->devmem_allocation, dev, mm,
 				&dpagemap_devmem_ops,
-				&tile->mem.vram.dpagemap,
+				tile->mem.vram.dpagemap,
 				end - start);
 
 	blocks = &to_xe_ttm_vram_mgr_resource(bo->ttm.resource)->blocks;
@@ -940,6 +941,15 @@ int xe_devm_add(struct xe_tile *tile, struct xe_vram_region *vr)
 		return ret;
 	}
 
+	vr->dpagemap = drm_pagemap_create(dev, &vr->pagemap,
+					  &xe_drm_pagemap_ops);
+	if (IS_ERR(vr->dpagemap)) {
+		drm_err(&xe->drm, "Failed to create drm_pagemap tile %d memory: %pe\n",
+			tile->id, vr->dpagemap);
+		ret = PTR_ERR(vr->dpagemap);
+		goto out_no_dpagemap;
+	}
+
 	vr->pagemap.type = MEMORY_DEVICE_PRIVATE;
 	vr->pagemap.range.start = res->start;
 	vr->pagemap.range.end = res->end;
@@ -947,22 +957,23 @@ int xe_devm_add(struct xe_tile *tile, struct xe_vram_region *vr)
 	vr->pagemap.ops = drm_pagemap_pagemap_ops_get();
 	vr->pagemap.owner = xe_svm_devm_owner(xe);
 	addr = devm_memremap_pages(dev, &vr->pagemap);
-
-	vr->dpagemap.dev = dev;
-	vr->dpagemap.ops = &xe_drm_pagemap_ops;
-
 	if (IS_ERR(addr)) {
-		devm_release_mem_region(dev, res->start, resource_size(res));
 		ret = PTR_ERR(addr);
 		drm_err(&xe->drm, "Failed to remap tile %d memory, errno %pe\n",
 			tile->id, ERR_PTR(ret));
-		return ret;
+		goto out_failed_memremap;
 	}
 	vr->hpa_base = res->start;
 
 	drm_dbg(&xe->drm, "Added tile %d memory [%llx-%llx] to devm, remapped to %pr\n",
 		tile->id, vr->io_start, vr->io_start + vr->usable_size, res);
 	return 0;
+
+out_failed_memremap:
+	drm_pagemap_put(vr->dpagemap);
+out_no_dpagemap:
+	devm_release_mem_region(dev, res->start, resource_size(res));
+	return ret;
 }
 #else
 int xe_devm_add(struct xe_tile *tile, struct xe_vram_region *vr)
diff --git a/drivers/gpu/drm/xe/xe_tile.h b/drivers/gpu/drm/xe/xe_tile.h
index 066a3d0cea79..1d42b235c322 100644
--- a/drivers/gpu/drm/xe/xe_tile.h
+++ b/drivers/gpu/drm/xe/xe_tile.h
@@ -19,7 +19,7 @@ void xe_tile_migrate_wait(struct xe_tile *tile);
 #if IS_ENABLED(CONFIG_DRM_XE_PAGEMAP)
 static inline struct drm_pagemap *xe_tile_local_pagemap(struct xe_tile *tile)
 {
-	return &tile->mem.vram.dpagemap;
+	return tile->mem.vram.dpagemap;
 }
 #else
 static inline struct drm_pagemap *xe_tile_local_pagemap(struct xe_tile *tile)
diff --git a/include/drm/drm_pagemap.h b/include/drm/drm_pagemap.h
index c591736e7c48..49f2e0b6c699 100644
--- a/include/drm/drm_pagemap.h
+++ b/include/drm/drm_pagemap.h
@@ -126,11 +126,15 @@ struct drm_pagemap_ops {
  * struct drm_pagemap: Additional information for a struct dev_pagemap
  * used for device p2p handshaking.
  * @ops: The struct drm_pagemap_ops.
+ * @ref: Reference count.
  * @dev: The struct drevice owning the device-private memory.
+ * @pagemap: Pointer to the underlying dev_pagemap.
  */
 struct drm_pagemap {
 	const struct drm_pagemap_ops *ops;
+	struct kref ref;
 	struct device *dev;
+	struct dev_pagemap *pagemap;
 };
 
 struct drm_pagemap_devmem;
@@ -195,6 +199,26 @@ struct drm_pagemap_devmem_ops {
 			   unsigned long npages);
 };
 
+struct drm_pagemap *drm_pagemap_create(struct device *dev,
+				       struct dev_pagemap *pagemap,
+				       const struct drm_pagemap_ops *ops);
+
+void drm_pagemap_put(struct drm_pagemap *dpagemap);
+
+/**
+ * drm_pagemap_get() - Obtain a reference on a struct drm_pagemap
+ * @dpagemap: Pointer to the struct drm_pagemap.
+ *
+ * Return: Pointer to the struct drm_pagemap.
+ */
+static inline struct drm_pagemap *
+drm_pagemap_get(struct drm_pagemap *dpagemap)
+{
+	kref_get(&dpagemap->ref);
+
+	return dpagemap;
+}
+
 /**
  * struct drm_pagemap_devmem - Structure representing a GPU SVM device memory allocation
  *
@@ -235,3 +259,4 @@ int drm_pagemap_populate_mm(struct drm_pagemap *dpagemap,
 			    struct mm_struct *mm);
 
 #endif
+
-- 
2.48.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [RFC PATCH 07/19] drm/pagemap: Get rid of the struct drm_pagemap_zdd::device_private_page_owner field
  2025-03-12 21:03 [RFC PATCH 00/19] drm, drm/xe: Multi-device GPUSVM Thomas Hellström
                   ` (5 preceding siblings ...)
  2025-03-12 21:04 ` [RFC PATCH 06/19] drm/pagemap, drm/xe: Add refcounting to struct drm_pagemap and manage lifetime Thomas Hellström
@ 2025-03-12 21:04 ` Thomas Hellström
  2025-03-12 21:04 ` [RFC PATCH 08/19] drm/xe/bo: Add a bo remove callback Thomas Hellström
                   ` (12 subsequent siblings)
  19 siblings, 0 replies; 26+ messages in thread
From: Thomas Hellström @ 2025-03-12 21:04 UTC (permalink / raw)
  To: intel-xe, dri-devel
  Cc: Thomas Hellström, himal.prasad.ghimiray, apopple, airlied,
	Simona Vetter, felix.kuehling, Matthew Brost,
	Christian König, dakr, Mrozek, Michal, Joonas Lahtinen

Now that there is always a valid page pointer, we can deduce the owner
from the page.

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/drm_pagemap.c | 15 ++++-----------
 1 file changed, 4 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/drm_pagemap.c b/drivers/gpu/drm/drm_pagemap.c
index 8a0bdf38fc65..d1efcd78a023 100644
--- a/drivers/gpu/drm/drm_pagemap.c
+++ b/drivers/gpu/drm/drm_pagemap.c
@@ -62,7 +62,6 @@
  * @refcount: Reference count for the zdd
  * @devmem_allocation: device memory allocation
  * @dpagemap: Pointer to the struct drm_pagemap.
- * @device_private_page_owner: Device private pages owner
  *
  * This structure serves as a generic wrapper installed in
  * page->zone_device_data. It provides infrastructure for looking up a device
@@ -75,13 +74,11 @@ struct drm_pagemap_zdd {
 	struct kref refcount;
 	struct drm_pagemap_devmem *devmem_allocation;
 	struct drm_pagemap *dpagemap;
-	void *device_private_page_owner;
 };
 
 /**
  * drm_pagemap_zdd_alloc() - Allocate a zdd structure.
  * @dpagemap: Pointer to the struct drm_pagemap.
- * @device_private_page_owner: Device private pages owner
  *
  * This function allocates and initializes a new zdd structure. It sets up the
  * reference count and initializes the destroy work.
@@ -89,7 +86,7 @@ struct drm_pagemap_zdd {
  * Return: Pointer to the allocated zdd on success, ERR_PTR() on failure.
  */
 static struct drm_pagemap_zdd *
-drm_pagemap_zdd_alloc(struct drm_pagemap *dpagemap, void *device_private_page_owner)
+drm_pagemap_zdd_alloc(struct drm_pagemap *dpagemap)
 {
 	struct drm_pagemap_zdd *zdd;
 
@@ -99,7 +96,6 @@ drm_pagemap_zdd_alloc(struct drm_pagemap *dpagemap, void *device_private_page_ow
 
 	kref_init(&zdd->refcount);
 	zdd->devmem_allocation = NULL;
-	zdd->device_private_page_owner = device_private_page_owner;
 	zdd->dpagemap = dpagemap;
 
 	return zdd;
@@ -344,7 +340,7 @@ int drm_pagemap_migrate_to_devmem(struct drm_pagemap_devmem *devmem_allocation,
 	dma_addr = buf + (2 * sizeof(*migrate.src) * npages);
 	pages = buf + (2 * sizeof(*migrate.src) + sizeof(*dma_addr)) * npages;
 
-	zdd = drm_pagemap_zdd_alloc(devmem_allocation->dpagemap, pgmap_owner);
+	zdd = drm_pagemap_zdd_alloc(devmem_allocation->dpagemap);
 	if (!zdd) {
 		err = -ENOMEM;
 		goto err_free;
@@ -628,8 +624,7 @@ EXPORT_SYMBOL_GPL(drm_pagemap_evict_to_ram);
 /**
  * __drm_pagemap_migrate_to_ram() - Migrate a virtual range to RAM (internal)
  * @vas: Pointer to the VM area structure
- * @device_private_page_owner: Device private pages owner
- * @page: Pointer to the page for fault handling (can be NULL)
+ * @page: Pointer to the page for fault handling.
  * @fault_addr: Fault address
  * @size: Size of migration
  *
@@ -641,14 +636,13 @@ EXPORT_SYMBOL_GPL(drm_pagemap_evict_to_ram);
  * 0 on success, negative error code on failure.
  */
 static int __drm_pagemap_migrate_to_ram(struct vm_area_struct *vas,
-					void *device_private_page_owner,
 					struct page *page,
 					unsigned long fault_addr,
 					unsigned long size)
 {
 	struct migrate_vma migrate = {
 		.vma		= vas,
-		.pgmap_owner	= device_private_page_owner,
+		.pgmap_owner	= page->pgmap->owner,
 		.flags		= MIGRATE_VMA_SELECT_DEVICE_PRIVATE |
 		MIGRATE_VMA_SELECT_DEVICE_COHERENT,
 		.fault_page	= page,
@@ -774,7 +768,6 @@ static vm_fault_t drm_pagemap_migrate_to_ram(struct vm_fault *vmf)
 	int err;
 
 	err = __drm_pagemap_migrate_to_ram(vmf->vma,
-					   zdd->device_private_page_owner,
 					   vmf->page, vmf->address,
 					   zdd->devmem_allocation->size);
 
-- 
2.48.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [RFC PATCH 08/19] drm/xe/bo: Add a bo remove callback
  2025-03-12 21:03 [RFC PATCH 00/19] drm, drm/xe: Multi-device GPUSVM Thomas Hellström
                   ` (6 preceding siblings ...)
  2025-03-12 21:04 ` [RFC PATCH 07/19] drm/pagemap: Get rid of the struct drm_pagemap_zdd::device_private_page_owner field Thomas Hellström
@ 2025-03-12 21:04 ` Thomas Hellström
  2025-03-14 13:05   ` Thomas Hellström
  2025-03-12 21:04 ` [RFC PATCH 09/19] drm/pagemap_util: Add a utility to assign an owner to a set of interconnected gpus Thomas Hellström
                   ` (11 subsequent siblings)
  19 siblings, 1 reply; 26+ messages in thread
From: Thomas Hellström @ 2025-03-12 21:04 UTC (permalink / raw)
  To: intel-xe, dri-devel
  Cc: Thomas Hellström, himal.prasad.ghimiray, apopple, airlied,
	Simona Vetter, felix.kuehling, Matthew Brost,
	Christian König, dakr, Mrozek, Michal, Joonas Lahtinen

On device unbind, migrate exported bos, including pagemap bos to
system. This allows importers to take proper action without
disruption. In particular, SVM clients on remote devices may
continue as if nothing happened, and can choose a different
placement.

The evict_flags() placement is chosen in such a way that bos that
aren't exported are purged.
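
For reference, the intended remove ordering is then roughly as follows
(simplified sketch; see the xe_device_remove() hunk below for the
actual call sites):

	drm_dev_unplug(&xe->drm);   /* new drm_pagemap_populate_mm() calls fail with -ENODEV */
	xe_bo_remove(xe);           /* evict bos; non-exported bos are purged */
	xe_pagemaps_remove(xe);     /* migrate remaining SVM PTEs, unmap the dev_pagemaps */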

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/drm_pagemap.c        | 113 ++++++--
 drivers/gpu/drm/xe/xe_bo.c           |  53 +++-
 drivers/gpu/drm/xe/xe_bo.h           |   2 +
 drivers/gpu/drm/xe/xe_device.c       |   5 +
 drivers/gpu/drm/xe/xe_device_types.h |  28 +-
 drivers/gpu/drm/xe/xe_svm.c          | 412 ++++++++++++++++++++++-----
 drivers/gpu/drm/xe/xe_svm.h          |  49 ++++
 drivers/gpu/drm/xe/xe_tile.c         |  20 +-
 drivers/gpu/drm/xe/xe_tile.h         |  28 +-
 drivers/gpu/drm/xe/xe_vm_types.h     |   1 +
 include/drm/drm_pagemap.h            |  53 +++-
 11 files changed, 645 insertions(+), 119 deletions(-)

diff --git a/drivers/gpu/drm/drm_pagemap.c b/drivers/gpu/drm/drm_pagemap.c
index d1efcd78a023..dcb26328f94b 100644
--- a/drivers/gpu/drm/drm_pagemap.c
+++ b/drivers/gpu/drm/drm_pagemap.c
@@ -97,6 +97,7 @@ drm_pagemap_zdd_alloc(struct drm_pagemap *dpagemap)
 	kref_init(&zdd->refcount);
 	zdd->devmem_allocation = NULL;
 	zdd->dpagemap = dpagemap;
+	kref_get(&dpagemap->ref);
 
 	return zdd;
 }
@@ -126,6 +127,7 @@ static void drm_pagemap_zdd_destroy(struct kref *ref)
 	struct drm_pagemap_zdd *zdd =
 		container_of(ref, struct drm_pagemap_zdd, refcount);
 	struct drm_pagemap_devmem *devmem = zdd->devmem_allocation;
+	struct drm_pagemap *dpagemap = zdd->dpagemap;
 
 	if (devmem) {
 		complete_all(&devmem->detached);
@@ -133,6 +135,7 @@ static void drm_pagemap_zdd_destroy(struct kref *ref)
 			devmem->ops->devmem_release(devmem);
 	}
 	kfree(zdd);
+	drm_pagemap_put(dpagemap);
 }
 
 /**
@@ -484,42 +487,113 @@ static int drm_pagemap_migrate_populate_ram_pfn(struct vm_area_struct *vas,
 	return -ENOMEM;
 }
 
+/**
+ * struct drm_pagemap_dev_hold - Struct to aid in drm_device release.
+ * @work: work struct for async release.
+ * @drm: drm device to put.
+ *
+ * When a struct drm_pagemap is released, we also need to release the
+ * reference it holds on the drm device. However, that typically needs
+ * to be done asynchronously, from a workqueue that is not flushed in
+ * the drm device destructor, since putting the last device reference
+ * from such a workqueue would deadlock. Each time a struct drm_pagemap
+ * is initialized (or re-initialized if cached) we therefore allocate a
+ * separate work item using this struct, from which we put the drm
+ * device and associated module.
+ */
+struct drm_pagemap_dev_hold {
+	struct work_struct work;
+	struct drm_device *drm;
+};
+
 static void drm_pagemap_release(struct kref *ref)
 {
 	struct drm_pagemap *dpagemap = container_of(ref, typeof(*dpagemap), ref);
+	struct drm_pagemap_dev_hold *dev_hold = dpagemap->dev_hold;
 
-	kfree(dpagemap);
+	dpagemap->ops->destroy(dpagemap);
+	schedule_work(&dev_hold->work);
+}
+
+static void drm_pagemap_dev_unhold_work(struct work_struct *work)
+{
+	struct drm_pagemap_dev_hold *dev_hold =
+		container_of(work, typeof(*dev_hold), work);
+	struct drm_device *drm = dev_hold->drm;
+	struct module *module = drm->driver->fops->owner;
+
+	drm_dev_put(drm);
+	module_put(module);
+	kfree(dev_hold);
+}
+
+static struct drm_pagemap_dev_hold *
+drm_pagemap_dev_hold(struct drm_pagemap *dpagemap)
+{
+	struct drm_pagemap_dev_hold *dev_hold;
+	struct drm_device *drm = dpagemap->drm;
+
+	dev_hold = kzalloc(sizeof(*dev_hold), GFP_KERNEL);
+	if (!dev_hold)
+		return ERR_PTR(-ENOMEM);
+
+	INIT_WORK(&dev_hold->work, drm_pagemap_dev_unhold_work);
+	dev_hold->drm = drm;
+	(void)try_module_get(drm->driver->fops->owner);
+	drm_dev_get(drm);
+
+	return dev_hold;
 }
 
 /**
- * drm_pagemap_create() - Create a struct drm_pagemap.
- * @dev: Pointer to a struct device providing the device-private memory.
- * @pagemap: Pointer to a pre-setup struct dev_pagemap providing the struct pages.
- * @ops: Pointer to the struct drm_pagemap_ops.
+ * drm_pagemap_reinit() - Reinitialize a drm_pagemap
+ * @dpagemap: The drm_pagemap to reinitialize
  *
- * Allocate and initialize a struct drm_pagemap.
+ * Reinitialize a drm_pagemap, for which drm_pagemap_release
+ * has already been called. This interface is intended for the
+ * situation where the driver caches a destroyed drm_pagemap.
  *
- * Return: A refcounted pointer to a struct drm_pagemap on success.
- * Error pointer on error.
+ * Return: 0 on success, negative error code on failure.
  */
-struct drm_pagemap *
-drm_pagemap_create(struct device *dev,
-		   struct dev_pagemap *pagemap,
-		   const struct drm_pagemap_ops *ops)
+int drm_pagemap_reinit(struct drm_pagemap *dpagemap)
 {
-	struct drm_pagemap *dpagemap = kzalloc(sizeof(*dpagemap), GFP_KERNEL);
+	dpagemap->dev_hold = drm_pagemap_dev_hold(dpagemap);
+	if (IS_ERR(dpagemap->dev_hold))
+		return PTR_ERR(dpagemap->dev_hold);
 
-	if (!dpagemap)
-		return ERR_PTR(-ENOMEM);
+	kref_init(&dpagemap->ref);
+	return 0;
+}
+EXPORT_SYMBOL(drm_pagemap_reinit);
 
+/**
+ * drm_pagemap_init() - Initialize a pre-allocated drm_pagemap
+ * @dpagemap: The drm_pagemap to initialize.
+ * @pagemap: The associated dev_pagemap providing the device
+ * private pages.
+ * @drm: The drm device. The drm_pagemap holds a reference on the
+ * drm_device and the module owning the drm_device until
+ * drm_pagemap_release(). This facilitates drm_pagemap exporting.
+ * @ops: The drm_pagemap ops.
+ *
+ * Initialize and take an initial reference on a drm_pagemap.
+ * After successful return, use drm_pagemap_put() to destroy.
+ *
+ * Return: 0 on success, negative error code on error.
+ */
+int drm_pagemap_init(struct drm_pagemap *dpagemap,
+		     struct dev_pagemap *pagemap,
+		     struct drm_device *drm,
+		     const struct drm_pagemap_ops *ops)
+{
 	kref_init(&dpagemap->ref);
-	dpagemap->dev = dev;
 	dpagemap->ops = ops;
 	dpagemap->pagemap = pagemap;
+	dpagemap->drm = drm;
 
-	return dpagemap;
+	return drm_pagemap_reinit(dpagemap);
 }
-EXPORT_SYMBOL(drm_pagemap_create);
+EXPORT_SYMBOL(drm_pagemap_init);
 
 /**
  * drm_pagemap_put() - Put a struct drm_pagemap reference
@@ -530,7 +604,8 @@ EXPORT_SYMBOL(drm_pagemap_create);
  */
 void drm_pagemap_put(struct drm_pagemap *dpagemap)
 {
-	kref_put(&dpagemap->ref, drm_pagemap_release);
+	if (dpagemap)
+		kref_put(&dpagemap->ref, drm_pagemap_release);
 }
 EXPORT_SYMBOL(drm_pagemap_put);
 
diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
index 64f9c936eea0..390f90fbd366 100644
--- a/drivers/gpu/drm/xe/xe_bo.c
+++ b/drivers/gpu/drm/xe/xe_bo.c
@@ -55,6 +55,8 @@ static struct ttm_placement sys_placement = {
 	.placement = &sys_placement_flags,
 };
 
+static struct ttm_placement purge_placement;
+
 static const struct ttm_place tt_placement_flags[] = {
 	{
 		.fpfn = 0,
@@ -281,6 +283,8 @@ int xe_bo_placement_for_flags(struct xe_device *xe, struct xe_bo *bo,
 static void xe_evict_flags(struct ttm_buffer_object *tbo,
 			   struct ttm_placement *placement)
 {
+	struct xe_device *xe = container_of(tbo->bdev, typeof(*xe), ttm);
+	bool device_unplugged = drm_dev_is_unplugged(&xe->drm);
 	struct xe_bo *bo;
 
 	if (!xe_bo_is_xe_bo(tbo)) {
@@ -290,7 +294,7 @@ static void xe_evict_flags(struct ttm_buffer_object *tbo,
 			return;
 		}
 
-		*placement = sys_placement;
+		*placement = device_unplugged ? purge_placement : sys_placement;
 		return;
 	}
 
@@ -300,6 +304,11 @@ static void xe_evict_flags(struct ttm_buffer_object *tbo,
 		return;
 	}
 
+	if (device_unplugged && !tbo->base.dma_buf) {
+		*placement = purge_placement;
+		return;
+	}
+
 	/*
 	 * For xe, sg bos that are evicted to system just triggers a
 	 * rebind of the sg list upon subsequent validation to XE_PL_TT.
@@ -657,11 +666,20 @@ static int xe_bo_move_dmabuf(struct ttm_buffer_object *ttm_bo,
 	struct xe_ttm_tt *xe_tt = container_of(ttm_bo->ttm, struct xe_ttm_tt,
 					       ttm);
 	struct xe_device *xe = ttm_to_xe_device(ttm_bo->bdev);
+	bool device_unplugged = drm_dev_is_unplugged(&xe->drm);
 	struct sg_table *sg;
 
 	xe_assert(xe, attach);
 	xe_assert(xe, ttm_bo->ttm);
 
+	if (device_unplugged && new_res->mem_type == XE_PL_SYSTEM &&
+	    ttm_bo->sg) {
+		dma_resv_wait_timeout(ttm_bo->base.resv, DMA_RESV_USAGE_BOOKKEEP,
+				      false, MAX_SCHEDULE_TIMEOUT);
+		dma_buf_unmap_attachment(attach, ttm_bo->sg, DMA_BIDIRECTIONAL);
+		ttm_bo->sg = NULL;
+	}
+
 	if (new_res->mem_type == XE_PL_SYSTEM)
 		goto out;
 
@@ -2945,6 +2963,39 @@ void xe_bo_runtime_pm_release_mmap_offset(struct xe_bo *bo)
 	list_del_init(&bo->vram_userfault_link);
 }
 
+/**
+ * xe_bo_remove() - Handle bos when the pci_device is about to be removed
+ * @xe: The xe device.
+ *
+ * On pci_device removal we need to drop all dma mappings and move
+ * the data of exported bos out to system. This includes SVM bos and
+ * exported dma-buf bos. This is done by evicting all bos, but
+ * the evict placement in xe_evict_flags() is chosen such that all
+ * bos except those mentioned are purged, and thus their memory
+ * is released.
+ *
+ * Pinned bos are not handled, though. Ideally they should be released
+ * using devm_ actions.
+ */
+void xe_bo_remove(struct xe_device *xe)
+{
+	unsigned int mem_type;
+	int ret;
+
+	/*
+	 * Move pagemap bos and exported dma-buf to system.
+	 */
+	for (mem_type = XE_PL_VRAM1; mem_type >= XE_PL_TT; --mem_type) {
+		struct ttm_resource_manager *man =
+			ttm_manager_type(&xe->ttm, mem_type);
+
+		if (man) {
+			ret = ttm_resource_manager_evict_all(&xe->ttm, man);
+			drm_WARN_ON(&xe->drm, ret);
+		}
+	}
+}
+
 #if IS_ENABLED(CONFIG_DRM_XE_KUNIT_TEST)
 #include "tests/xe_bo.c"
 #endif
diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
index bda3fdd408da..22b1c63f9311 100644
--- a/drivers/gpu/drm/xe/xe_bo.h
+++ b/drivers/gpu/drm/xe/xe_bo.h
@@ -405,6 +405,8 @@ long xe_bo_shrink(struct ttm_operation_ctx *ctx, struct ttm_buffer_object *bo,
 		  const struct xe_bo_shrink_flags flags,
 		  unsigned long *scanned);
 
+void xe_bo_remove(struct xe_device *xe);
+
 #if IS_ENABLED(CONFIG_DRM_XE_KUNIT_TEST)
 /**
  * xe_bo_is_mem_type - Whether the bo currently resides in the given
diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
index b2f656b2a563..68de09db9ad5 100644
--- a/drivers/gpu/drm/xe/xe_device.c
+++ b/drivers/gpu/drm/xe/xe_device.c
@@ -54,6 +54,7 @@
 #include "xe_query.h"
 #include "xe_shrinker.h"
 #include "xe_sriov.h"
+#include "xe_svm.h"
 #include "xe_tile.h"
 #include "xe_ttm_stolen_mgr.h"
 #include "xe_ttm_sys_mgr.h"
@@ -925,6 +926,10 @@ void xe_device_remove(struct xe_device *xe)
 	xe_display_unregister(xe);
 
 	drm_dev_unplug(&xe->drm);
+
+	xe_bo_remove(xe);
+
+	xe_pagemaps_remove(xe);
 }
 
 void xe_device_shutdown(struct xe_device *xe)
diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
index 40c6f88f5933..41ba05ae4cd5 100644
--- a/drivers/gpu/drm/xe/xe_device_types.h
+++ b/drivers/gpu/drm/xe/xe_device_types.h
@@ -110,19 +110,21 @@ struct xe_vram_region {
 	/** @ttm: VRAM TTM manager */
 	struct xe_ttm_vram_mgr ttm;
 #if IS_ENABLED(CONFIG_DRM_XE_PAGEMAP)
-	/** @pagemap: Used to remap device memory as ZONE_DEVICE */
-	struct dev_pagemap pagemap;
-	/**
-	 * @dpagemap: The struct drm_pagemap of the ZONE_DEVICE memory
-	 * pages of this tile.
-	 */
-	struct drm_pagemap *dpagemap;
-	/**
-	 * @hpa_base: base host physical address
-	 *
-	 * This is generated when remap device memory as ZONE_DEVICE
-	 */
-	resource_size_t hpa_base;
+	/** @pagemap_cache: Cache holding the struct xe_pagemap for this memory region. */
+	struct xe_pagemap_cache {
+		/** @pagemap_cache.pagemap_mutex: Protects @pagemap_cache.xpagemap. */
+		struct mutex mutex;
+		/** @pagemap_cache.xpagemap: Pointer to a struct xe_pagemap */
+		struct xe_pagemap *xpagemap;
+		/**
+		 * @pagemap_cache.queued: Completed when @pagemap_cache.xpagemap is queued
+		 * for destruction.
+		 * There's a short interval between @pagemap_cache.xpagemap's refcount
+		 * dropping to zero and it being queued for destruction; only once
+		 * queued can the destruction work be canceled.
+		 */
+		struct completion queued;
+	} pagemap_cache;
 #endif
 };
 
diff --git a/drivers/gpu/drm/xe/xe_svm.c b/drivers/gpu/drm/xe/xe_svm.c
index 37e1607052ed..c49bcfea5644 100644
--- a/drivers/gpu/drm/xe/xe_svm.c
+++ b/drivers/gpu/drm/xe/xe_svm.c
@@ -4,6 +4,8 @@
  */
 
 #include <drm/drm_drv.h>
+#include <drm/drm_managed.h>
+#include <drm/drm_pagemap.h>
 
 #include "xe_bo.h"
 #include "xe_gt_tlb_invalidation.h"
@@ -17,6 +19,8 @@
 #include "xe_vm.h"
 #include "xe_vm_types.h"
 
+static int xe_svm_get_pagemaps(struct xe_vm *vm);
+
 static bool xe_svm_range_in_vram(struct xe_svm_range *range)
 {
 	/* Not reliable without notifier lock */
@@ -345,28 +349,35 @@ static void xe_svm_garbage_collector_work_func(struct work_struct *w)
 
 #if IS_ENABLED(CONFIG_DRM_XE_PAGEMAP)
 
-static struct xe_vram_region *page_to_vr(struct page *page)
+static struct xe_vram_region *xe_pagemap_to_vr(struct xe_pagemap *xpagemap)
 {
-	return container_of(page->pgmap, struct xe_vram_region, pagemap);
+	return xpagemap->vr;
 }
 
-static struct xe_tile *vr_to_tile(struct xe_vram_region *vr)
+static struct xe_pagemap *xe_page_to_pagemap(struct page *page)
 {
-	return container_of(vr, struct xe_tile, mem.vram);
+	return container_of(page->pgmap, struct xe_pagemap, pagemap);
 }
 
-static u64 xe_vram_region_page_to_dpa(struct xe_vram_region *vr,
-				      struct page *page)
+static struct xe_vram_region *xe_page_to_vr(struct page *page)
 {
-	u64 dpa;
-	struct xe_tile *tile = vr_to_tile(vr);
+	return xe_pagemap_to_vr(xe_page_to_pagemap(page));
+}
+
+static u64 xe_page_to_dpa(struct page *page)
+{
+	struct xe_pagemap *xpagemap = xe_page_to_pagemap(page);
+	struct xe_vram_region *vr = xe_pagemap_to_vr(xpagemap);
+	struct xe_tile *tile = xe_vr_to_tile(vr);
+	u64 hpa_base = xpagemap->hpa_base;
 	u64 pfn = page_to_pfn(page);
 	u64 offset;
+	u64 dpa;
 
 	xe_tile_assert(tile, is_device_private_page(page));
-	xe_tile_assert(tile, (pfn << PAGE_SHIFT) >= vr->hpa_base);
+	xe_tile_assert(tile, (pfn << PAGE_SHIFT) >= hpa_base);
 
-	offset = (pfn << PAGE_SHIFT) - vr->hpa_base;
+	offset = (pfn << PAGE_SHIFT) - hpa_base;
 	dpa = vr->dpa_base + offset;
 
 	return dpa;
@@ -413,10 +424,10 @@ static int xe_svm_copy(struct page **pages, dma_addr_t *dma_addr,
 			continue;
 
 		if (!vr && spage) {
-			vr = page_to_vr(spage);
-			tile = vr_to_tile(vr);
+			vr = xe_page_to_vr(spage);
+			tile = xe_vr_to_tile(vr);
 		}
-		XE_WARN_ON(spage && page_to_vr(spage) != vr);
+		XE_WARN_ON(spage && xe_page_to_vr(spage) != vr);
 
 		/*
 		 * CPU page and device page valid, capture physical address on
@@ -424,7 +435,7 @@ static int xe_svm_copy(struct page **pages, dma_addr_t *dma_addr,
 		 * device pages.
 		 */
 		if (dma_addr[i] && spage) {
-			__vram_addr = xe_vram_region_page_to_dpa(vr, spage);
+			__vram_addr = xe_page_to_dpa(spage);
 			if (vram_addr == XE_VRAM_ADDR_INVALID) {
 				vram_addr = __vram_addr;
 				pos = i;
@@ -547,12 +558,12 @@ static void xe_svm_devmem_release(struct drm_pagemap_devmem *devmem_allocation)
 
 static u64 block_offset_to_pfn(struct xe_vram_region *vr, u64 offset)
 {
-	return PHYS_PFN(offset + vr->hpa_base);
+	return PHYS_PFN(offset + vr->pagemap_cache.xpagemap->hpa_base);
 }
 
 static struct drm_buddy *tile_to_buddy(struct xe_tile *tile)
 {
-	return &tile->mem.vram.ttm.mm;
+	return &xe_tile_to_vr(tile)->ttm.mm;
 }
 
 static int xe_svm_populate_devmem_pfn(struct drm_pagemap_devmem *devmem_allocation,
@@ -566,7 +577,7 @@ static int xe_svm_populate_devmem_pfn(struct drm_pagemap_devmem *devmem_allocati
 
 	list_for_each_entry(block, blocks, link) {
 		struct xe_vram_region *vr = block->private;
-		struct xe_tile *tile = vr_to_tile(vr);
+		struct xe_tile *tile = xe_vr_to_tile(vr);
 		struct drm_buddy *buddy = tile_to_buddy(tile);
 		u64 block_pfn = block_offset_to_pfn(vr, drm_buddy_block_offset(block));
 		int i;
@@ -585,6 +596,11 @@ static const struct drm_pagemap_devmem_ops dpagemap_devmem_ops = {
 	.copy_to_ram = xe_svm_copy_to_ram,
 };
 
+#else
+static int xe_svm_get_pagemaps(struct xe_vm *vm)
+{
+	return 0;
+}
 #endif
 
 static const struct drm_gpusvm_ops gpusvm_ops = {
@@ -599,6 +615,26 @@ static const unsigned long fault_chunk_sizes[] = {
 	SZ_4K,
 };
 
+static void xe_pagemap_put(struct xe_pagemap *xpagemap)
+{
+	drm_pagemap_put(&xpagemap->dpagemap);
+}
+
+static void xe_svm_put_pagemaps(struct xe_vm *vm)
+{
+	struct xe_device *xe = vm->xe;
+	struct xe_tile *tile;
+	int id;
+
+	for_each_tile(tile, xe, id) {
+		struct xe_pagemap *xpagemap = vm->svm.pagemaps[id];
+
+		if (xpagemap)
+			xe_pagemap_put(xpagemap);
+		vm->svm.pagemaps[id] = NULL;
+	}
+}
+
 /**
  * xe_svm_init() - SVM initialize
  * @vm: The VM.
@@ -616,13 +652,19 @@ int xe_svm_init(struct xe_vm *vm)
 	INIT_WORK(&vm->svm.garbage_collector.work,
 		  xe_svm_garbage_collector_work_func);
 
+	err = xe_svm_get_pagemaps(vm);
+	if (err)
+		return err;
+
 	err = drm_gpusvm_init(&vm->svm.gpusvm, "Xe SVM", &vm->xe->drm,
 			      current->mm, xe_svm_devm_owner(vm->xe), 0,
 			      vm->size, xe_modparam.svm_notifier_size * SZ_1M,
 			      &gpusvm_ops, fault_chunk_sizes,
 			      ARRAY_SIZE(fault_chunk_sizes));
-	if (err)
+	if (err) {
+		xe_svm_put_pagemaps(vm);
 		return err;
+	}
 
 	drm_gpusvm_driver_set_lock(&vm->svm.gpusvm, &vm->lock);
 
@@ -639,6 +681,7 @@ void xe_svm_close(struct xe_vm *vm)
 {
 	xe_assert(vm->xe, xe_vm_is_closed(vm));
 	flush_work(&vm->svm.garbage_collector.work);
+	xe_svm_put_pagemaps(vm);
 }
 
 /**
@@ -661,20 +704,16 @@ static bool xe_svm_range_is_valid(struct xe_svm_range *range,
 }
 
 #if IS_ENABLED(CONFIG_DRM_XE_PAGEMAP)
-static struct xe_vram_region *tile_to_vr(struct xe_tile *tile)
-{
-	return &tile->mem.vram;
-}
 
 static int xe_drm_pagemap_populate_mm(struct drm_pagemap *dpagemap,
 				      unsigned long start, unsigned long end,
 				      struct mm_struct *mm)
 {
-	struct xe_tile *tile = container_of(dpagemap->pagemap, typeof(*tile),
-					    mem.vram.pagemap);
+	struct xe_pagemap *xpagemap = container_of(dpagemap, typeof(*xpagemap), dpagemap);
+	struct xe_vram_region *vr = xe_pagemap_to_vr(xpagemap);
+	struct xe_tile *tile = xe_vr_to_tile(vr);
 	struct xe_device *xe = tile_to_xe(tile);
 	struct device *dev = xe->drm.dev;
-	struct xe_vram_region *vr = tile_to_vr(tile);
 	struct drm_buddy_block *block;
 	struct list_head *blocks;
 	struct xe_bo *bo;
@@ -700,7 +739,7 @@ static int xe_drm_pagemap_populate_mm(struct drm_pagemap *dpagemap,
 
 	drm_pagemap_devmem_init(&bo->devmem_allocation, dev, mm,
 				&dpagemap_devmem_ops,
-				tile->mem.vram.dpagemap,
+				dpagemap,
 				end - start);
 
 	blocks = &to_xe_ttm_vram_mgr_resource(bo->ttm.resource)->blocks;
@@ -896,12 +935,12 @@ xe_drm_pagemap_device_map(struct drm_pagemap *dpagemap,
 			  unsigned int order,
 			  enum dma_data_direction dir)
 {
-	struct device *pgmap_dev = dpagemap->dev;
+	struct device *pgmap_dev = dpagemap->drm->dev;
 	enum drm_interconnect_protocol prot;
 	dma_addr_t addr;
 
 	if (pgmap_dev == dev) {
-		addr = xe_vram_region_page_to_dpa(page_to_vr(page), page);
+		addr = xe_page_to_dpa(page);
 		prot = XE_INTERCONNECT_VRAM;
 	} else {
 		addr = DMA_MAPPING_ERROR;
@@ -911,73 +950,306 @@ xe_drm_pagemap_device_map(struct drm_pagemap *dpagemap,
 	return drm_pagemap_device_addr_encode(addr, prot, order, dir);
 }
 
+static void xe_pagemap_fini(struct xe_pagemap *xpagemap)
+{
+	struct dev_pagemap *pagemap = &xpagemap->pagemap;
+	struct device *dev = xpagemap->dpagemap.drm->dev;
+
+	WRITE_ONCE(xpagemap->unplugged, true);
+	if (xpagemap->hpa_base) {
+		devm_memunmap_pages(dev, pagemap);
+		xpagemap->hpa_base = 0;
+	}
+
+	if (pagemap->range.start) {
+		devm_release_mem_region(dev, pagemap->range.start,
+					pagemap->range.end - pagemap->range.start + 1);
+		pagemap->range.start = 0;
+	}
+}
+
+static void xe_pagemap_destroy_work(struct work_struct *work)
+{
+	struct xe_pagemap *xpagemap = container_of(work, typeof(*xpagemap), destroy_work.work);
+	struct xe_pagemap_cache *cache = xpagemap->cache;
+
+	mutex_lock(&cache->mutex);
+	if (cache->xpagemap == xpagemap)
+		cache->xpagemap = NULL;
+	mutex_unlock(&cache->mutex);
+
+	xe_pagemap_fini(xpagemap);
+	kfree(xpagemap);
+}
+
+static void xe_pagemap_destroy(struct drm_pagemap *dpagemap)
+{
+	struct xe_pagemap *xpagemap = container_of(dpagemap, typeof(*xpagemap), dpagemap);
+	struct xe_device *xe = to_xe_device(dpagemap->drm);
+
+	/* Keep the pagemap cached for 5s, unless the device is unplugged. */
+	queue_delayed_work(xe->unordered_wq, &xpagemap->destroy_work,
+			   READ_ONCE(xpagemap->unplugged) ? 0 : secs_to_jiffies(5));
+
+	complete_all(&xpagemap->cache->queued);
+}
+
 static const struct drm_pagemap_ops xe_drm_pagemap_ops = {
 	.device_map = xe_drm_pagemap_device_map,
 	.populate_mm = xe_drm_pagemap_populate_mm,
+	.destroy = xe_pagemap_destroy,
 };
 
 /**
- * xe_devm_add: Remap and provide memmap backing for device memory
- * @tile: tile that the memory region belongs to
- * @vr: vram memory region to remap
+ * xe_pagemap_create() - Create a struct xe_pagemap object
+ * @xe: The xe device.
+ * @cache: Back-pointer to the struct xe_pagemap_cache.
+ * @vr: Back-pointer to the struct xe_vram_region.
  *
- * This remap device memory to host physical address space and create
- * struct page to back device memory
+ * Allocate and initialize a struct xe_pagemap. On successful
+ * return, drm_pagemap_put() on the embedded struct drm_pagemap
+ * should be used to unreference.
  *
- * Return: 0 on success standard error code otherwise
+ * Return: Pointer to a struct xe_pagemap if successful. Error pointer
+ * on failure.
  */
-int xe_devm_add(struct xe_tile *tile, struct xe_vram_region *vr)
+struct xe_pagemap *xe_pagemap_create(struct xe_device *xe, struct xe_pagemap_cache *cache,
+				     struct xe_vram_region *vr)
 {
-	struct xe_device *xe = tile_to_xe(tile);
-	struct device *dev = &to_pci_dev(xe->drm.dev)->dev;
+	struct device *dev = xe->drm.dev;
+	struct xe_pagemap *xpagemap;
+	struct dev_pagemap *pagemap;
+	struct drm_pagemap *dpagemap;
 	struct resource *res;
 	void *addr;
-	int ret;
+	int err;
+
+	xpagemap = kzalloc(sizeof(*xpagemap), GFP_KERNEL);
+	if (!xpagemap)
+		return ERR_PTR(-ENOMEM);
+
+	pagemap = &xpagemap->pagemap;
+	dpagemap = &xpagemap->dpagemap;
+	INIT_DELAYED_WORK(&xpagemap->destroy_work, xe_pagemap_destroy_work);
+	xpagemap->cache = cache;
+	xpagemap->vr = vr;
+
+	err = drm_pagemap_init(dpagemap, pagemap, &xe->drm, &xe_drm_pagemap_ops);
+	if (err)
+		goto out_no_dpagemap;
 
 	res = devm_request_free_mem_region(dev, &iomem_resource,
 					   vr->usable_size);
 	if (IS_ERR(res)) {
-		ret = PTR_ERR(res);
-		return ret;
+		err = PTR_ERR(res);
+		goto out_err;
 	}
 
-	vr->dpagemap = drm_pagemap_create(dev, &vr->pagemap,
-					  &xe_drm_pagemap_ops);
-	if (IS_ERR(vr->dpagemap)) {
-		drm_err(&xe->drm, "Failed to create drm_pagemap tile %d memory: %pe\n",
-			tile->id, vr->dpagemap);
-		ret = PTR_ERR(vr->dpagemap);
-		goto out_no_dpagemap;
-	}
-
-	vr->pagemap.type = MEMORY_DEVICE_PRIVATE;
-	vr->pagemap.range.start = res->start;
-	vr->pagemap.range.end = res->end;
-	vr->pagemap.nr_range = 1;
-	vr->pagemap.ops = drm_pagemap_pagemap_ops_get();
-	vr->pagemap.owner = xe_svm_devm_owner(xe);
-	addr = devm_memremap_pages(dev, &vr->pagemap);
+	pagemap->type = MEMORY_DEVICE_PRIVATE;
+	pagemap->range.start = res->start;
+	pagemap->range.end = res->end;
+	pagemap->nr_range = 1;
+	pagemap->owner = xe_svm_devm_owner(xe);
+	pagemap->ops = drm_pagemap_pagemap_ops_get();
+	addr = devm_memremap_pages(dev, pagemap);
 	if (IS_ERR(addr)) {
-		ret = PTR_ERR(addr);
-		drm_err(&xe->drm, "Failed to remap tile %d memory, errno %pe\n",
-			tile->id, ERR_PTR(ret));
-		goto out_failed_memremap;
+		err = PTR_ERR(addr);
+		goto out_err;
 	}
-	vr->hpa_base = res->start;
+	xpagemap->hpa_base = res->start;
+	return xpagemap;
 
-	drm_dbg(&xe->drm, "Added tile %d memory [%llx-%llx] to devm, remapped to %pr\n",
-		tile->id, vr->io_start, vr->io_start + vr->usable_size, res);
-	return 0;
+out_err:
+	drm_pagemap_put(dpagemap);
+	return ERR_PTR(err);
 
-out_failed_memremap:
-	drm_pagemap_put(vr->dpagemap);
 out_no_dpagemap:
-	devm_release_mem_region(dev, res->start, resource_size(res));
-	return ret;
+	kfree(xpagemap);
+	return ERR_PTR(err);
 }
-#else
-int xe_devm_add(struct xe_tile *tile, struct xe_vram_region *vr)
+
+static struct xe_pagemap *
+xe_pagemap_find_or_create(struct xe_device *xe, struct xe_pagemap_cache *cache,
+			  struct xe_vram_region *vr);
+
+static int xe_svm_get_pagemaps(struct xe_vm *vm)
 {
+	struct xe_device *xe = vm->xe;
+	struct xe_pagemap *xpagemap;
+	struct xe_tile *tile;
+	int id;
+
+	for_each_tile(tile, xe, id) {
+		struct xe_vram_region *vr;
+
+		if (!((BIT(id) << 1) & xe->info.mem_region_mask))
+			continue;
+
+		vr = xe_tile_to_vr(tile);
+		xpagemap = xe_pagemap_find_or_create(xe, &vr->pagemap_cache, vr);
+		if (IS_ERR(xpagemap))
+			break;
+		vm->svm.pagemaps[id] = xpagemap;
+	}
+
+	if (IS_ERR(xpagemap)) {
+		xe_svm_put_pagemaps(vm);
+		return PTR_ERR(xpagemap);
+	}
+
 	return 0;
 }
+
+/**
+ * xe_pagemaps_remove() - Device remove work for the xe pagemaps
+ * @xe: The xe device
+ *
+ * This function needs to be run as part of the device remove (unplug)
+ * sequence to ensure that device-private pages allocated using the
+ * xe pagemaps are not used anymore and that the dev_pagemaps are
+ * unregistered.
+ *
+ * The function needs to be called *after* the call to drm_dev_unplug()
+ * to ensure any calls to drm_pagemap_populate_mm() will return -ENODEV.
+ *
+ * Note that the pagemaps' references to the drm device and hence the
+ * xe device will remain until the pagemaps are destroyed.
+ */
+void xe_pagemaps_remove(struct xe_device *xe)
+{
+	unsigned int id, mem_type;
+	struct xe_tile *tile;
+	int ret;
+
+	/* Migrate all PTEs of this pagemap to system */
+	for (mem_type = XE_PL_VRAM1; mem_type >= XE_PL_TT; --mem_type) {
+		struct ttm_resource_manager *man =
+			ttm_manager_type(&xe->ttm, mem_type);
+
+		if (man) {
+			ret = ttm_resource_manager_evict_all(&xe->ttm, man);
+			drm_WARN_ON(&xe->drm, ret);
+		}
+	}
+
+	/* Remove the device pages themselves */
+	for_each_tile(tile, xe, id) {
+		struct xe_pagemap_cache *cache;
+
+		if (!((BIT(id) << 1) & xe->info.mem_region_mask))
+			continue;
+
+		cache = &tile->mem.vram.pagemap_cache;
+		mutex_lock(&cache->mutex);
+		if (cache->xpagemap)
+			xe_pagemap_fini(cache->xpagemap);
+		/* Nobody can resurrect, since the device is unplugged. */
+		mutex_unlock(&cache->mutex);
+	}
+}
+
+static void xe_pagemap_cache_fini(struct drm_device *drm, void *arg)
+{
+	struct xe_pagemap_cache *cache = arg;
+	struct xe_pagemap *xpagemap;
+
+	wait_for_completion(&cache->queued);
+	mutex_lock(&cache->mutex);
+	xpagemap = cache->xpagemap;
+	if (xpagemap && cancel_delayed_work(&xpagemap->destroy_work)) {
+		mutex_unlock(&cache->mutex);
+		xe_pagemap_destroy_work(&xpagemap->destroy_work.work);
+		return;
+	}
+	mutex_unlock(&cache->mutex);
+	flush_workqueue(to_xe_device(drm)->unordered_wq);
+	mutex_destroy(&cache->mutex);
+}
+
+/**
+ * xe_pagemap_cache_init() - Initialize a struct xe_pagemap_cache
+ * @drm: Pointer to the struct drm_device
+ * @cache: Pointer to a struct xe_pagemap_cache
+ *
+ * Initialize a struct xe_pagemap_cache and if successful, register a cleanup
+ * function to be run at xe/drm device destruction.
+ *
+ * Return: 0 on success, negative error code on error.
+ */
+int xe_pagemap_cache_init(struct drm_device *drm, struct xe_pagemap_cache *cache)
+{
+	mutex_init(&cache->mutex);
+	init_completion(&cache->queued);
+	complete_all(&cache->queued);
+	return drmm_add_action_or_reset(drm, xe_pagemap_cache_fini, cache);
+}
+
+static struct xe_pagemap *xe_pagemap_get_unless_zero(struct xe_pagemap *xpagemap)
+{
+	return (xpagemap && drm_pagemap_get_unless_zero(&xpagemap->dpagemap)) ? xpagemap : NULL;
+}
+
+/**
+ * xe_pagemap_find_or_create() - Find or create a struct xe_pagemap
+ * @xe: The xe device.
+ * @cache: The struct xe_pagemap_cache.
+ * @vr: The VRAM region.
+ *
+ * Check if there is an already used xe_pagemap for this tile, and in that case,
+ * return it.
+ * If not, check if there is a cached xe_pagemap for this tile, and in that case,
+ * cancel its destruction, re-initialize it and return it.
+ * Finally if there is no cached or already used pagemap, create one and
+ * register it in the tile's pagemap cache.
+ *
+ * Note that this function is typically called from within an IOCTL, and waits are
+ * therefore carried out interruptibly where possible.
+ *
+ * Return: A pointer to a struct xe_pagemap if successful, or an error pointer on failure.
+ */
+static struct xe_pagemap *
+xe_pagemap_find_or_create(struct xe_device *xe, struct xe_pagemap_cache *cache,
+			  struct xe_vram_region *vr)
+{
+	struct xe_pagemap *xpagemap;
+	int err;
+
+	err = mutex_lock_interruptible(&cache->mutex);
+	if (err)
+		return ERR_PTR(err);
+
+	xpagemap = cache->xpagemap;
+	if (xpagemap && !xe_pagemap_get_unless_zero(xpagemap)) {
+		/* Wait for the destroy work to get queued before canceling it! */
+		err = wait_for_completion_interruptible(&cache->queued);
+		if (err) {
+			mutex_unlock(&cache->mutex);
+			return ERR_PTR(err);
+		}
+
+		if (cancel_delayed_work(&xpagemap->destroy_work)) {
+			err = drm_pagemap_reinit(&xpagemap->dpagemap);
+			if (!err) {
+				reinit_completion(&cache->queued);
+				goto out_unlock;
+			}
+
+			queue_delayed_work(xe->unordered_wq, &xpagemap->destroy_work, 0);
+		}
+
+		cache->xpagemap = NULL;
+		xpagemap = NULL;
+	}
+	if (!xpagemap) {
+		xpagemap = xe_pagemap_create(xe, cache, vr);
+		if (IS_ERR(xpagemap))
+			goto out_unlock;
+
+		cache->xpagemap = xpagemap;
+		reinit_completion(&cache->queued);
+	}
+out_unlock:
+	mutex_unlock(&cache->mutex);
+	return xpagemap;
+}
 #endif
diff --git a/drivers/gpu/drm/xe/xe_svm.h b/drivers/gpu/drm/xe/xe_svm.h
index c32b6d46ecf1..19469fd91666 100644
--- a/drivers/gpu/drm/xe/xe_svm.h
+++ b/drivers/gpu/drm/xe/xe_svm.h
@@ -13,7 +13,11 @@
 
 #define XE_INTERCONNECT_VRAM DRM_INTERCONNECT_DRIVER
 
+struct drm_device;
+struct drm_file;
+
 struct xe_bo;
+struct xe_device;
 struct xe_vram_region;
 struct xe_tile;
 struct xe_vm;
@@ -45,6 +49,28 @@ struct xe_svm_range {
 	u8 skip_migrate	:1;
 };
 
+/**
+ * struct xe_pagemap - Manages xe device_private memory for SVM.
+ * @pagemap: The struct dev_pagemap providing the struct pages.
+ * @dpagemap: The drm_pagemap managing allocation and migration.
+ * @destroy_work: Handles asynchronous destruction and caching.
+ * @hpa_base: The host physical address base for the managed memory.
+ * @cache: Backpointer to the struct xe_pagemap_cache for the memory region.
+ * @vr: Backpointer to the struct xe_vram_region.
+ * @unplugged: Advisory-only indication of whether the device owning this
+ * pagemap has been unplugged. Typically used to determine the caching
+ * time.
+ */
+struct xe_pagemap {
+	struct dev_pagemap pagemap;
+	struct drm_pagemap dpagemap;
+	struct delayed_work destroy_work;
+	resource_size_t hpa_base;
+	struct xe_pagemap_cache *cache;
+	struct xe_vram_region *vr;
+	bool unplugged;
+};
+
 /**
  * xe_svm_range_pages_valid() - SVM range pages valid
  * @range: SVM range
@@ -95,11 +121,16 @@ static inline bool xe_svm_range_has_dma_mapping(struct xe_svm_range *range)
 #define xe_svm_notifier_unlock(vm__)	\
 	drm_gpusvm_notifier_unlock(&(vm__)->svm.gpusvm)
 
+struct xe_pagemap *
+xe_pagemap_create(struct xe_device *xe, struct xe_pagemap_cache *cache,
+		  struct xe_vram_region *vr);
+
 #else
 #include <linux/interval_tree.h>
 
 struct drm_pagemap_device_addr;
 struct xe_bo;
+struct xe_device;
 struct xe_vm;
 struct xe_vma;
 struct xe_tile;
@@ -178,5 +209,23 @@ static inline void xe_svm_notifier_lock(struct xe_vm *vm)
 static inline void xe_svm_notifier_unlock(struct xe_vm *vm)
 {
 }
+
 #endif
+
+#if IS_ENABLED(CONFIG_DRM_XE_PAGEMAP)
+
+int xe_pagemap_cache_init(struct drm_device *drm, struct xe_pagemap_cache *cache);
+
+void xe_pagemaps_remove(struct xe_device *xe);
+
+#else
+
+#define xe_pagemap_cache_init(...) 0
+
+static inline void xe_pagemaps_remove(struct xe_device *xe)
+{
+}
+
+#endif
+
 #endif
diff --git a/drivers/gpu/drm/xe/xe_tile.c b/drivers/gpu/drm/xe/xe_tile.c
index 0771acbbf367..f5d9d56418ee 100644
--- a/drivers/gpu/drm/xe/xe_tile.c
+++ b/drivers/gpu/drm/xe/xe_tile.c
@@ -161,7 +161,6 @@ static int tile_ttm_mgr_init(struct xe_tile *tile)
  */
 int xe_tile_init_noalloc(struct xe_tile *tile)
 {
-	struct xe_device *xe = tile_to_xe(tile);
 	int err;
 
 	err = tile_ttm_mgr_init(tile);
@@ -170,8 +169,9 @@ int xe_tile_init_noalloc(struct xe_tile *tile)
 
 	xe_wa_apply_tile_workarounds(tile);
 
-	if (xe->info.has_usm && IS_DGFX(xe))
-		xe_devm_add(tile, &tile->mem.vram);
+	err = xe_pagemap_cache_init(&tile_to_xe(tile)->drm, &tile->mem.vram.pagemap_cache);
+	if (err)
+		return err;
 
 	return xe_tile_sysfs_init(tile);
 }
@@ -188,3 +188,17 @@ void xe_tile_migrate_wait(struct xe_tile *tile)
 {
 	xe_migrate_wait(tile->migrate);
 }
+
+#if IS_ENABLED(CONFIG_DRM_XE_PAGEMAP)
+/**
+ * xe_tile_local_pagemap() - Return a pointer to the tile's local drm_pagemap if any
+ * @tile: The tile.
+ *
+ * Return: A pointer to the tile's local drm_pagemap, or NULL if local pagemap
+ * support has been compiled out.
+ */
+struct drm_pagemap *xe_tile_local_pagemap(struct xe_tile *tile)
+{
+	return &xe_tile_to_vr(tile)->pagemap_cache.xpagemap->dpagemap;
+}
+#endif
diff --git a/drivers/gpu/drm/xe/xe_tile.h b/drivers/gpu/drm/xe/xe_tile.h
index 1d42b235c322..375b8323cda6 100644
--- a/drivers/gpu/drm/xe/xe_tile.h
+++ b/drivers/gpu/drm/xe/xe_tile.h
@@ -8,6 +8,7 @@
 
 #include "xe_device_types.h"
 
+struct xe_pagemap;
 struct xe_tile;
 
 int xe_tile_init_early(struct xe_tile *tile, struct xe_device *xe, u8 id);
@@ -16,11 +17,32 @@ int xe_tile_init(struct xe_tile *tile);
 
 void xe_tile_migrate_wait(struct xe_tile *tile);
 
-#if IS_ENABLED(CONFIG_DRM_XE_PAGEMAP)
-static inline struct drm_pagemap *xe_tile_local_pagemap(struct xe_tile *tile)
+/**
+ * xe_vr_to_tile() - Return the struct xe_tile pointer from a
+ * struct xe_vram_region pointer.
+ * @vr: The xe_vram_region.
+ *
+ * Return: Pointer to the struct xe_tile embedding *@vr.
+ */
+static inline struct xe_tile *xe_vr_to_tile(struct xe_vram_region *vr)
 {
-	return tile->mem.vram.dpagemap;
+	return container_of(vr, struct xe_tile, mem.vram);
 }
+
+/**
+ * xe_tile_to_vr() - Return the struct xe_vram_region pointer from a
+ * struct xe_tile pointer
+ * @tile: Pointer to the struct xe_tile.
+ *
+ * Return: Pointer to the struct xe_vram_region embedded in *@tile.
+ */
+static inline struct xe_vram_region *xe_tile_to_vr(struct xe_tile *tile)
+{
+	return &tile->mem.vram;
+}
+
+#if IS_ENABLED(CONFIG_DRM_XE_PAGEMAP)
+struct drm_pagemap *xe_tile_local_pagemap(struct xe_tile *tile);
 #else
 static inline struct drm_pagemap *xe_tile_local_pagemap(struct xe_tile *tile)
 {
diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
index 84fa41b9fa20..08baea03df00 100644
--- a/drivers/gpu/drm/xe/xe_vm_types.h
+++ b/drivers/gpu/drm/xe/xe_vm_types.h
@@ -168,6 +168,7 @@ struct xe_vm {
 			 */
 			struct work_struct work;
 		} garbage_collector;
+		struct xe_pagemap *pagemaps[XE_MAX_TILES_PER_DEVICE];
 	} svm;
 
 	struct xe_device *xe;
diff --git a/include/drm/drm_pagemap.h b/include/drm/drm_pagemap.h
index 49f2e0b6c699..9f758a46988a 100644
--- a/include/drm/drm_pagemap.h
+++ b/include/drm/drm_pagemap.h
@@ -120,6 +120,21 @@ struct drm_pagemap_ops {
 	int (*populate_mm)(struct drm_pagemap *dpagemap,
 			   unsigned long start, unsigned long end,
 			   struct mm_struct *mm);
+
+	/**
+	 * @destroy: Uninitialize a struct drm_pagemap.
+	 * @dpagemap: The struct drm_pagemap to uninitialize.
+	 *
+	 * Uninitialize the drm_pagemap, potentially retaining it in
+	 * a cache for re-initialization. This callback may be called
+	 * with page locks held and typically needs to queue any
+	 * destruction or caching work on a workqueue to avoid locking
+	 * order inversions. Since the drm_pagemap code may also put
+	 * the owning device immediately after this function is called,
+	 * the drm_pagemap destruction needs to be waited for in
+	 * the device destruction code.
+	 */
+	void (*destroy)(struct drm_pagemap *dpagemap);
 };
 
 /**
@@ -127,14 +142,16 @@ struct drm_pagemap_ops {
  * used for device p2p handshaking.
  * @ops: The struct drm_pagemap_ops.
  * @ref: Reference count.
- * @dev: The struct drevice owning the device-private memory.
+ * @drm: The struct drm_device owning the device-private memory.
  * @pagemap: Pointer to the underlying dev_pagemap.
+ * @dev_hold: Pointer to a struct drm_pagemap_dev_hold for device referencing.
  */
 struct drm_pagemap {
 	const struct drm_pagemap_ops *ops;
 	struct kref ref;
-	struct device *dev;
+	struct drm_device *drm;
 	struct dev_pagemap *pagemap;
+	struct drm_pagemap_dev_hold *dev_hold;
 };
 
 struct drm_pagemap_devmem;
@@ -199,26 +216,44 @@ struct drm_pagemap_devmem_ops {
 			   unsigned long npages);
 };
 
-struct drm_pagemap *drm_pagemap_create(struct device *dev,
-				       struct dev_pagemap *pagemap,
-				       const struct drm_pagemap_ops *ops);
+int drm_pagemap_reinit(struct drm_pagemap *dpagemap);
+
+int drm_pagemap_init(struct drm_pagemap *dpagemap,
+		     struct dev_pagemap *pagemap,
+		     struct drm_device *drm,
+		     const struct drm_pagemap_ops *ops);
 
 void drm_pagemap_put(struct drm_pagemap *dpagemap);
 
 /**
  * drm_pagemap_get() - Obtain a reference on a struct drm_pagemap
- * @dpagemap: Pointer to the struct drm_pagemap.
+ * @dpagemap: Pointer to the struct drm_pagemap, or NULL.
  *
- * Return: Pointer to the struct drm_pagemap.
+ * Return: Pointer to the struct drm_pagemap, or NULL.
  */
 static inline struct drm_pagemap *
 drm_pagemap_get(struct drm_pagemap *dpagemap)
 {
-	kref_get(&dpagemap->ref);
+	if (likely(dpagemap))
+		kref_get(&dpagemap->ref);
 
 	return dpagemap;
 }
 
+/**
+ * drm_pagemap_get_unless_zero() - Obtain a reference on a struct drm_pagemap
+ * unless the current reference count is zero.
+ * @dpagemap: Pointer to the drm_pagemap or NULL.
+ *
+ * Return: A pointer to @dpagemap if the reference count was successfully
+ * incremented. NULL if @dpagemap was NULL, or its refcount was 0.
+ */
+static inline struct drm_pagemap *
+drm_pagemap_get_unless_zero(struct drm_pagemap *dpagemap)
+{
+	return (dpagemap && kref_get_unless_zero(&dpagemap->ref)) ? dpagemap : NULL;
+}
+
 /**
  * struct drm_pagemap_devmem - Structure representing a GPU SVM device memory allocation
  *
@@ -257,6 +292,4 @@ void drm_pagemap_devmem_init(struct drm_pagemap_devmem *devmem_allocation,
 int drm_pagemap_populate_mm(struct drm_pagemap *dpagemap,
 			    unsigned long start, unsigned long end,
 			    struct mm_struct *mm);
-
 #endif
-
-- 
2.48.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [RFC PATCH 09/19] drm/pagemap_util: Add a utility to assign an owner to a set of interconnected gpus
  2025-03-12 21:03 [RFC PATCH 00/19] drm, drm/xe: Multi-device GPUSVM Thomas Hellström
                   ` (7 preceding siblings ...)
  2025-03-12 21:04 ` [RFC PATCH 08/19] drm/xe/bo: Add a bo remove callback Thomas Hellström
@ 2025-03-12 21:04 ` Thomas Hellström
  2025-03-12 21:04 ` [RFC PATCH 10/19] drm/gpusvm, drm/xe: Move the device private owner to the drm_gpusvm_ctx Thomas Hellström
                   ` (10 subsequent siblings)
  19 siblings, 0 replies; 26+ messages in thread
From: Thomas Hellström @ 2025-03-12 21:04 UTC (permalink / raw)
  To: intel-xe, dri-devel
  Cc: Thomas Hellström, himal.prasad.ghimiray, apopple, airlied,
	Simona Vetter, felix.kuehling, Matthew Brost,
	Christian König, dakr, Mrozek, Michal, Joonas Lahtinen

The hmm_range_fault() and the migration helpers currently need a common
"owner" to identify pagemaps and clients with fast interconnect.
Add a drm_pagemap utility to set up such owners by registering
drm_pagemaps in a registry and, for each new drm_pagemap,
querying which existing drm_pagemaps have fast interconnects with
the new drm_pagemap.

The "owner" scheme is limited in that it is static at drm_pagemap creation.
Ideally one would want the owner to be adjusted at run-time, but that
requires changes to hmm. If the proposed scheme becomes too limited,
we need to revisit.
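
For reference, driver usage is expected to look roughly like the sketch
below. The embedding of the struct drm_pagemap_peer in a driver-side
pagemap structure and the xe_* names are illustrative only:

	static DRM_PAGEMAP_OWNER_LIST_DEFINE(xe_owner_list);

	/* At pagemap creation: */
	err = drm_pagemap_acquire_owner(&xpagemap->peer, &xe_owner_list,
					xe_has_interconnect);
	if (err)
		return err;

	/* The resulting owner is what hmm_range_fault() and the migration helpers see: */
	xpagemap->pagemap.owner = xpagemap->peer.owner;

	/* At pagemap destruction: */
	drm_pagemap_release_owner(&xpagemap->peer);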

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/Makefile           |   3 +-
 drivers/gpu/drm/drm_pagemap_util.c | 125 +++++++++++++++++++++++++++++
 include/drm/drm_pagemap_util.h     |  55 +++++++++++++
 3 files changed, 182 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/drm_pagemap_util.c
 create mode 100644 include/drm/drm_pagemap_util.h

diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
index 6e3520bff769..bd7bdf973897 100644
--- a/drivers/gpu/drm/Makefile
+++ b/drivers/gpu/drm/Makefile
@@ -107,7 +107,8 @@ obj-$(CONFIG_DRM_GPUVM) += drm_gpuvm.o
 
 drm_gpusvm_helper-y := \
 	drm_gpusvm.o\
-	drm_pagemap.o
+	drm_pagemap.o\
+	drm_pagemap_util.o
 obj-$(CONFIG_DRM_GPUSVM) += drm_gpusvm_helper.o
 
 obj-$(CONFIG_DRM_BUDDY) += drm_buddy.o
diff --git a/drivers/gpu/drm/drm_pagemap_util.c b/drivers/gpu/drm/drm_pagemap_util.c
new file mode 100644
index 000000000000..ae8f78cde4a7
--- /dev/null
+++ b/drivers/gpu/drm/drm_pagemap_util.c
@@ -0,0 +1,125 @@
+// SPDX-License-Identifier: GPL-2.0-only OR MIT
+/*
+ * Copyright © 2025 Intel Corporation
+ */
+
+#include <linux/slab.h>
+
+#include <drm/drm_pagemap.h>
+#include <drm/drm_pagemap_util.h>
+
+/**
+ * struct drm_pagemap_owner - Device interconnect group
+ * @kref: Reference count.
+ *
+ * A struct drm_pagemap_owner identifies a device interconnect group.
+ */
+struct drm_pagemap_owner {
+	struct kref kref;
+};
+
+static void drm_pagemap_owner_release(struct kref *kref)
+{
+	kfree(container_of(kref, struct drm_pagemap_owner, kref));
+}
+
+/**
+ * drm_pagemap_release_owner() - Stop participating in an interconnect group
+ * @peer: Pointer to the struct drm_pagemap_peer used when joining the group
+ *
+ * Stop participating in an interconnect group. This function is typically
+ * called when a pagemap is removed to indicate that it doesn't need to
+ * be taken into account.
+ */
+void drm_pagemap_release_owner(struct drm_pagemap_peer *peer)
+{
+	struct drm_pagemap_owner_list *owner_list = peer->list;
+
+	if (!owner_list)
+		return;
+
+	mutex_lock(&owner_list->lock);
+	list_del(&peer->link);
+	kref_put(&peer->owner->kref, drm_pagemap_owner_release);
+	peer->owner = NULL;
+	mutex_unlock(&owner_list->lock);
+}
+EXPORT_SYMBOL(drm_pagemap_release_owner);
+
+/**
+ * typedef interconnect_fn - Callback function to identify fast interconnects
+ * @peer1: First endpoint.
+ * @peer2: Second endpoint.
+ *
+ * The function returns %true iff @peer1 and @peer2 have a fast interconnect.
+ * Note that this is symmetrical. The function has no notion of client and provider,
+ * which may not be sufficient in some cases. However, since the callback is intended
+ * to guide in providing common pagemap owners, the notion of a common owner to
+ * indicate fast interconnects would then have to change as well.
+ *
+ * Return: %true iff @peer1 and @peer2 have a fast interconnect, %false otherwise.
+ */
+typedef bool (*interconnect_fn)(struct drm_pagemap_peer *peer1, struct drm_pagemap_peer *peer2);
+
+/**
+ * drm_pagemap_acquire_owner() - Join an interconnect group
+ * @peer: A struct drm_pagemap_peer keeping track of the device interconnect
+ * @owner_list: Pointer to the owner_list, keeping track of all interconnects
+ * @has_interconnect: Callback function to determine whether two peers have a
+ * fast local interconnect.
+ *
+ * Repeatedly calls @has_interconnect for @peer and other peers on @owner_list to
+ * determine a set of peers for which @peer has a fast interconnect. That set will
+ * have common &struct drm_pagemap_owner, and upon successful return, @peer::owner
+ * will point to that struct, holding a reference, and @peer will be registered in
+ * @owner_list. If @peer doesn't have any fast interconnects to other @peers, a
+ * new unique &struct drm_pagemap_owner will be allocated for it, and that
+ * may be shared with other peers that, at a later point, are determined to have
+ * a fast interconnect with @peer.
+ *
+ * When @peer no longer participates in an interconnect group,
+ * drm_pagemap_release_owner() should be called to drop the reference on the
+ * struct drm_pagemap_owner.
+ *
+ * Return: %0 on success, negative error code on failure.
+ */
+int drm_pagemap_acquire_owner(struct drm_pagemap_peer *peer,
+			      struct drm_pagemap_owner_list *owner_list,
+			      interconnect_fn has_interconnect)
+{
+	struct drm_pagemap_peer *cur_peer;
+	struct drm_pagemap_owner *owner = NULL;
+	bool interconnect = false;
+
+	mutex_lock(&owner_list->lock);
+	might_alloc(GFP_KERNEL);
+	list_for_each_entry(cur_peer, &owner_list->peers, link) {
+		if (cur_peer->owner != owner) {
+			if (owner && interconnect)
+				break;
+			owner = cur_peer->owner;
+			interconnect = true;
+		}
+		if (interconnect && !has_interconnect(peer, cur_peer))
+			interconnect = false;
+	}
+
+	if (!interconnect) {
+		owner = kmalloc(sizeof(*owner), GFP_KERNEL);
+		if (!owner) {
+			mutex_unlock(&owner_list->lock);
+			return -ENOMEM;
+		}
+		kref_init(&owner->kref);
+		list_add_tail(&peer->link, &owner_list->peers);
+	} else {
+		kref_get(&owner->kref);
+		list_add_tail(&peer->link, &cur_peer->link);
+	}
+	peer->owner = owner;
+	peer->list = owner_list;
+	mutex_unlock(&owner_list->lock);
+
+	return 0;
+}
+EXPORT_SYMBOL(drm_pagemap_acquire_owner);
diff --git a/include/drm/drm_pagemap_util.h b/include/drm/drm_pagemap_util.h
new file mode 100644
index 000000000000..03731c79493f
--- /dev/null
+++ b/include/drm/drm_pagemap_util.h
@@ -0,0 +1,55 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2025 Intel Corporation
+ */
+#ifndef _DRM_PAGEMAP_UTIL_H_
+#define _DRM_PAGEMAP_UTIL_H_
+
+#include <linux/list.h>
+#include <linux/mutex.h>
+
+struct drm_pagemap;
+struct drm_pagemap_owner;
+
+/**
+ * struct drm_pagemap_peer - Structure representing a fast interconnect peer
+ * @list: Pointer to a &struct drm_pagemap_owner_list used to keep track of peers
+ * @link: List link for @list's list of peers.
+ * @owner: Pointer to a &struct drm_pagemap_owner, common for a set of peers having
+ * fast interconnects.
+ */
+struct drm_pagemap_peer {
+	struct drm_pagemap_owner_list *list;
+	struct list_head link;
+	struct drm_pagemap_owner *owner;
+};
+
+/**
+ * struct drm_pagemap_owner_list - Keeping track of peers and owners
+ * @lock: Mutex protecting the @peers list.
+ * @peers: List of peers.
+ *
+ * The owner list defines the scope where we identify peers having fast interconnects
+ * and a common owner. Typically a driver has a single global owner list to
+ * keep track of common owners for the driver's pagemaps.
+ */
+struct drm_pagemap_owner_list {
+	struct mutex lock;
+	struct list_head peers;
+};
+
+/*
+ * Convenience macro to define an owner list.
+ */
+#define DRM_PAGEMAP_OWNER_LIST_DEFINE(_name)	\
+	struct drm_pagemap_owner_list _name = {	\
+	.lock = __MUTEX_INITIALIZER(_name.lock), \
+	.peers = LIST_HEAD_INIT(_name.peers) }
+
+void drm_pagemap_release_owner(struct drm_pagemap_peer *peer);
+
+int drm_pagemap_acquire_owner(struct drm_pagemap_peer *peer,
+			      struct drm_pagemap_owner_list *owner_list,
+			      bool (*has_interconnect)(struct drm_pagemap_peer *peer1,
+						       struct drm_pagemap_peer *peer2));
+#endif
-- 
2.48.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [RFC PATCH 10/19] drm/gpusvm, drm/xe: Move the device private owner to the drm_gpusvm_ctx
  2025-03-12 21:03 [RFC PATCH 00/19] drm, drm/xe: Multi-device GPUSVM Thomas Hellström
                   ` (8 preceding siblings ...)
  2025-03-12 21:04 ` [RFC PATCH 09/19] drm/pagemap_util: Add a utility to assign an owner to a set of interconnected gpus Thomas Hellström
@ 2025-03-12 21:04 ` Thomas Hellström
  2025-03-12 21:04 ` [RFC PATCH 11/19] drm/xe: Use the drm_pagemap_util helper to get a svm pagemap owner Thomas Hellström
                   ` (9 subsequent siblings)
  19 siblings, 0 replies; 26+ messages in thread
From: Thomas Hellström @ 2025-03-12 21:04 UTC (permalink / raw)
  To: intel-xe, dri-devel
  Cc: Thomas Hellström, himal.prasad.ghimiray, apopple, airlied,
	Simona Vetter, felix.kuehling, Matthew Brost,
	Christian König, dakr, Mrozek, Michal, Joonas Lahtinen

The device-private page owner may not be known at drm_gpusvm_init()
time, so pass it per operation in the struct drm_gpusvm_ctx instead.
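
For reference, callers now pass the owner with each operation, as in
the xe_svm_handle_pagefault() hunk below:

	struct drm_gpusvm_ctx ctx = { /* ... */ };

	ctx.device_private_page_owner = xe_svm_devm_owner(vm->xe);
	err = drm_gpusvm_range_get_pages(&vm->svm.gpusvm, r, &ctx);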

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/drm_gpusvm.c | 21 ++++++++++++---------
 drivers/gpu/drm/xe/xe_svm.c  |  5 +++--
 include/drm/drm_gpusvm.h     |  7 ++++---
 3 files changed, 19 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/drm_gpusvm.c b/drivers/gpu/drm/drm_gpusvm.c
index d84e27283768..8d836248f5fe 100644
--- a/drivers/gpu/drm/drm_gpusvm.c
+++ b/drivers/gpu/drm/drm_gpusvm.c
@@ -416,7 +416,6 @@ static const struct mmu_interval_notifier_ops drm_gpusvm_notifier_ops = {
  * @name: Name of the GPU SVM.
  * @drm: Pointer to the DRM device structure.
  * @mm: Pointer to the mm_struct for the address space.
- * @device_private_page_owner: Device private pages owner.
  * @mm_start: Start address of GPU SVM.
  * @mm_range: Range of the GPU SVM.
  * @notifier_size: Size of individual notifiers.
@@ -432,7 +431,7 @@ static const struct mmu_interval_notifier_ops drm_gpusvm_notifier_ops = {
  */
 int drm_gpusvm_init(struct drm_gpusvm *gpusvm,
 		    const char *name, struct drm_device *drm,
-		    struct mm_struct *mm, void *device_private_page_owner,
+		    struct mm_struct *mm,
 		    unsigned long mm_start, unsigned long mm_range,
 		    unsigned long notifier_size,
 		    const struct drm_gpusvm_ops *ops,
@@ -444,7 +443,6 @@ int drm_gpusvm_init(struct drm_gpusvm *gpusvm,
 	gpusvm->name = name;
 	gpusvm->drm = drm;
 	gpusvm->mm = mm;
-	gpusvm->device_private_page_owner = device_private_page_owner;
 	gpusvm->mm_start = mm_start;
 	gpusvm->mm_range = mm_range;
 	gpusvm->notifier_size = notifier_size;
@@ -712,6 +710,7 @@ drm_gpusvm_range_alloc(struct drm_gpusvm *gpusvm,
  * @notifier: Pointer to the GPU SVM notifier structure
  * @start: Start address
  * @end: End address
+ * @dev_private_owner: The device private page owner
  *
  * Check if pages between start and end have been faulted in on the CPU. Use to
  * prevent migration of pages without CPU backing store.
@@ -720,14 +719,15 @@ drm_gpusvm_range_alloc(struct drm_gpusvm *gpusvm,
  */
 static bool drm_gpusvm_check_pages(struct drm_gpusvm *gpusvm,
 				   struct drm_gpusvm_notifier *notifier,
-				   unsigned long start, unsigned long end)
+				   unsigned long start, unsigned long end,
+				   void *dev_private_owner)
 {
 	struct hmm_range hmm_range = {
 		.default_flags = 0,
 		.notifier = &notifier->notifier,
 		.start = start,
 		.end = end,
-		.dev_private_owner = gpusvm->device_private_page_owner,
+		.dev_private_owner = dev_private_owner,
 	};
 	unsigned long timeout =
 		jiffies + msecs_to_jiffies(HMM_RANGE_DEFAULT_TIMEOUT);
@@ -781,6 +781,7 @@ static bool drm_gpusvm_check_pages(struct drm_gpusvm *gpusvm,
  * @gpuva_start: Start address of GPUVA which mirrors CPU
  * @gpuva_end: End address of GPUVA which mirrors CPU
  * @check_pages_threshold: Check CPU pages for present threshold
+ * @dev_private_owner: The device private page owner
  *
  * This function determines the chunk size for the GPU SVM range based on the
  * fault address, GPU SVM chunk sizes, existing GPU SVM ranges, and the virtual
@@ -795,7 +796,8 @@ drm_gpusvm_range_chunk_size(struct drm_gpusvm *gpusvm,
 			    unsigned long fault_addr,
 			    unsigned long gpuva_start,
 			    unsigned long gpuva_end,
-			    unsigned long check_pages_threshold)
+			    unsigned long check_pages_threshold,
+			    void *dev_private_owner)
 {
 	unsigned long start, end;
 	int i = 0;
@@ -842,7 +844,7 @@ drm_gpusvm_range_chunk_size(struct drm_gpusvm *gpusvm,
 		 * process-many-malloc' mallocs at least 64k at a time.
 		 */
 		if (end - start <= check_pages_threshold &&
-		    !drm_gpusvm_check_pages(gpusvm, notifier, start, end)) {
+		    !drm_gpusvm_check_pages(gpusvm, notifier, start, end, dev_private_owner)) {
 			++i;
 			goto retry;
 		}
@@ -951,7 +953,8 @@ drm_gpusvm_range_find_or_insert(struct drm_gpusvm *gpusvm,
 	chunk_size = drm_gpusvm_range_chunk_size(gpusvm, notifier, vas,
 						 fault_addr, gpuva_start,
 						 gpuva_end,
-						 ctx->check_pages_threshold);
+						 ctx->check_pages_threshold,
+						 ctx->device_private_page_owner);
 	if (chunk_size == LONG_MAX) {
 		err = -EINVAL;
 		goto err_notifier_remove;
@@ -1207,7 +1210,7 @@ int drm_gpusvm_range_get_pages(struct drm_gpusvm *gpusvm,
 		.notifier = notifier,
 		.start = drm_gpusvm_range_start(range),
 		.end = drm_gpusvm_range_end(range),
-		.dev_private_owner = gpusvm->device_private_page_owner,
+		.dev_private_owner = ctx->device_private_page_owner,
 	};
 	struct mm_struct *mm = gpusvm->mm;
 	void *zdd;
diff --git a/drivers/gpu/drm/xe/xe_svm.c b/drivers/gpu/drm/xe/xe_svm.c
index c49bcfea5644..20441da0aff7 100644
--- a/drivers/gpu/drm/xe/xe_svm.c
+++ b/drivers/gpu/drm/xe/xe_svm.c
@@ -657,8 +657,8 @@ int xe_svm_init(struct xe_vm *vm)
 		return err;
 
 	err = drm_gpusvm_init(&vm->svm.gpusvm, "Xe SVM", &vm->xe->drm,
-			      current->mm, xe_svm_devm_owner(vm->xe), 0,
-			      vm->size, xe_modparam.svm_notifier_size * SZ_1M,
+			      current->mm, 0, vm->size,
+			      xe_modparam.svm_notifier_size * SZ_1M,
 			      &gpusvm_ops, fault_chunk_sizes,
 			      ARRAY_SIZE(fault_chunk_sizes));
 	if (err) {
@@ -841,6 +841,7 @@ int xe_svm_handle_pagefault(struct xe_vm *vm, struct xe_vma *vma,
 	}
 
 	range_debug(range, "GET PAGES");
+	ctx.device_private_page_owner = xe_svm_devm_owner(vm->xe);
 	err = drm_gpusvm_range_get_pages(&vm->svm.gpusvm, r, &ctx);
 	/* Corner where CPU mappings have changed */
 	if (err == -EOPNOTSUPP || err == -EFAULT || err == -EPERM) {
diff --git a/include/drm/drm_gpusvm.h b/include/drm/drm_gpusvm.h
index 97c641bf49c5..729e5251b582 100644
--- a/include/drm/drm_gpusvm.h
+++ b/include/drm/drm_gpusvm.h
@@ -156,7 +156,6 @@ struct drm_gpusvm_range {
  * @name: Name of the GPU SVM
  * @drm: Pointer to the DRM device structure
  * @mm: Pointer to the mm_struct for the address space
- * @device_private_page_owner: Device private pages owner
  * @mm_start: Start address of GPU SVM
  * @mm_range: Range of the GPU SVM
  * @notifier_size: Size of individual notifiers
@@ -181,7 +180,6 @@ struct drm_gpusvm {
 	const char *name;
 	struct drm_device *drm;
 	struct mm_struct *mm;
-	void *device_private_page_owner;
 	unsigned long mm_start;
 	unsigned long mm_range;
 	unsigned long notifier_size;
@@ -203,6 +201,8 @@ struct drm_gpusvm {
 /**
  * struct drm_gpusvm_ctx - DRM GPU SVM context
  *
+ * @device_private_page_owner: The device-private page owner to use for
+ * this operation
  * @check_pages_threshold: Check CPU pages for present if chunk is less than or
  *                         equal to threshold. If not present, reduce chunk
  *                         size.
@@ -213,6 +213,7 @@ struct drm_gpusvm {
  * Context that is DRM GPUSVM is operating in (i.e. user arguments).
  */
 struct drm_gpusvm_ctx {
+	void *device_private_page_owner;
 	unsigned long check_pages_threshold;
 	unsigned int in_notifier :1;
 	unsigned int read_only :1;
@@ -221,7 +222,7 @@ struct drm_gpusvm_ctx {
 
 int drm_gpusvm_init(struct drm_gpusvm *gpusvm,
 		    const char *name, struct drm_device *drm,
-		    struct mm_struct *mm, void *device_private_page_owner,
+		    struct mm_struct *mm,
 		    unsigned long mm_start, unsigned long mm_range,
 		    unsigned long notifier_size,
 		    const struct drm_gpusvm_ops *ops,
-- 
2.48.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [RFC PATCH 11/19] drm/xe: Use the drm_pagemap_util helper to get a svm pagemap owner
  2025-03-12 21:03 [RFC PATCH 00/19] drm, drm/xe: Multi-device GPUSVM Thomas Hellström
                   ` (9 preceding siblings ...)
  2025-03-12 21:04 ` [RFC PATCH 10/19] drm/gpusvm, drm/xe: Move the device private owner to the drm_gpusvm_ctx Thomas Hellström
@ 2025-03-12 21:04 ` Thomas Hellström
  2025-03-12 21:04 ` [RFC PATCH 12/19] drm/xe: Make the PT code handle placement per PTE rather than per vma / range Thomas Hellström
                   ` (8 subsequent siblings)
  19 siblings, 0 replies; 26+ messages in thread
From: Thomas Hellström @ 2025-03-12 21:04 UTC (permalink / raw)
  To: intel-xe, dri-devel
  Cc: Thomas Hellström, himal.prasad.ghimiray, apopple, airlied,
	Simona Vetter, felix.kuehling, Matthew Brost,
	Christian König, dakr, Mrozek, Michal, Joonas Lahtinen

Register a driver-wide owner list, provide a callback to identify
fast interconnects, and use the drm_pagemap_util helper to allocate
or reuse a suitable owner struct. For now, pagemaps on different
tiles of the same device are considered to have a fast interconnect.

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/xe/xe_svm.c | 40 +++++++++++++++++++++++++++----------
 drivers/gpu/drm/xe/xe_svm.h |  3 +++
 2 files changed, 32 insertions(+), 11 deletions(-)
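
Not part of the patch: a rough sketch of what sharing an owner buys us.
With the callback below, pagemaps created on two tiles of the same
device end up with the same dev_pagemap owner, so hmm_range_fault()
run with that owner returns device-private PFNs resident on either
tile instead of forcing a migration to system memory. Error handling
is omitted and the helpers are the ones used elsewhere in this series:

	struct xe_vram_region *vr0 = &xe->tiles[0].mem.vram;
	struct xe_vram_region *vr1 = &xe->tiles[1].mem.vram;
	struct xe_pagemap *pm0, *pm1;

	pm0 = xe_pagemap_create(xe, &vr0->pagemap_cache, vr0);
	pm1 = xe_pagemap_create(xe, &vr1->pagemap_cache, vr1);

	/* xe_has_interconnect() matches on the same struct device. */
	WARN_ON(pm0->pagemap.owner != pm1->pagemap.owner);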

diff --git a/drivers/gpu/drm/xe/xe_svm.c b/drivers/gpu/drm/xe/xe_svm.c
index 20441da0aff7..25d49d0d7484 100644
--- a/drivers/gpu/drm/xe/xe_svm.c
+++ b/drivers/gpu/drm/xe/xe_svm.c
@@ -75,11 +75,6 @@ void xe_svm_range_debug(struct xe_svm_range *range, const char *operation)
 	range_debug(range, operation);
 }
 
-static void *xe_svm_devm_owner(struct xe_device *xe)
-{
-	return xe;
-}
-
 static struct drm_gpusvm_range *
 xe_svm_range_alloc(struct drm_gpusvm *gpusvm)
 {
@@ -751,7 +746,7 @@ static int xe_drm_pagemap_populate_mm(struct drm_pagemap *dpagemap,
 	/* Ensure the device has a pm ref while there are device pages active. */
 	xe_pm_runtime_get_noresume(xe);
 	err = drm_pagemap_migrate_to_devmem(&bo->devmem_allocation, mm,
-					    start, end, xe_svm_devm_owner(xe));
+					    start, end, xpagemap->pagemap.owner);
 	if (err)
 		xe_svm_devmem_release(&bo->devmem_allocation);
 
@@ -791,6 +786,7 @@ int xe_svm_handle_pagefault(struct xe_vm *vm, struct xe_vma *vma,
 		.check_pages_threshold = IS_DGFX(vm->xe) &&
 			IS_ENABLED(CONFIG_DRM_XE_PAGEMAP) ? SZ_64K : 0,
 	};
+	struct drm_pagemap *dpagemap;
 	struct xe_svm_range *range;
 	struct drm_gpusvm_range *r;
 	struct drm_exec exec;
@@ -818,16 +814,14 @@ int xe_svm_handle_pagefault(struct xe_vm *vm, struct xe_vma *vma,
 		return 0;
 
 	range_debug(range, "PAGE FAULT");
+	dpagemap = xe_tile_local_pagemap(tile);
 
 	/* XXX: Add migration policy, for now migrate range once */
 	if (!range->skip_migrate && range->base.flags.migrate_devmem &&
 	    xe_svm_range_size(range) >= SZ_64K) {
-		struct drm_pagemap *dpagemap;
-
 		range->skip_migrate = true;
 
 		range_debug(range, "ALLOCATE VRAM");
-		dpagemap = xe_tile_local_pagemap(tile);
 		err = drm_pagemap_populate_mm(dpagemap, xe_svm_range_start(range),
 					      xe_svm_range_end(range),
 					      range->base.gpusvm->mm);
@@ -841,7 +835,8 @@ int xe_svm_handle_pagefault(struct xe_vm *vm, struct xe_vma *vma,
 	}
 
 	range_debug(range, "GET PAGES");
-	ctx.device_private_page_owner = xe_svm_devm_owner(vm->xe);
+	ctx.device_private_page_owner = dpagemap ?
+		container_of(dpagemap, struct xe_pagemap, dpagemap)->pagemap.owner : NULL;
 	err = drm_gpusvm_range_get_pages(&vm->svm.gpusvm, r, &ctx);
 	/* Corner where CPU mappings have changed */
 	if (err == -EOPNOTSUPP || err == -EFAULT || err == -EPERM) {
@@ -962,6 +957,11 @@ static void xe_pagemap_fini(struct xe_pagemap *xpagemap)
 		xpagemap->hpa_base = 0;
 	}
 
+	if (pagemap->owner) {
+		drm_pagemap_release_owner(&xpagemap->peer);
+		pagemap->owner = NULL;
+	}
+
 	if (pagemap->range.start) {
 		devm_release_mem_region(dev, pagemap->range.start,
 					pagemap->range.end - pagemap->range.start + 1);
@@ -995,6 +995,19 @@ static void xe_pagemap_destroy(struct drm_pagemap *dpagemap)
 	complete_all(&xpagemap->cache->queued);
 }
 
+static bool xe_has_interconnect(struct drm_pagemap_peer *peer1,
+				struct drm_pagemap_peer *peer2)
+{
+	struct xe_pagemap *xpagemap1 = container_of(peer1, typeof(*xpagemap1), peer);
+	struct xe_pagemap *xpagemap2 = container_of(peer2, typeof(*xpagemap1), peer);
+	struct device *dev1 = xpagemap1->dpagemap.drm->dev;
+	struct device *dev2 = xpagemap2->dpagemap.drm->dev;
+
+	return dev1 == dev2;
+}
+
+static DRM_PAGEMAP_OWNER_LIST_DEFINE(xe_owner_list);
+
 static const struct drm_pagemap_ops xe_drm_pagemap_ops = {
 	.device_map = xe_drm_pagemap_device_map,
 	.populate_mm = xe_drm_pagemap_populate_mm,
@@ -1046,11 +1059,16 @@ struct xe_pagemap *xe_pagemap_create(struct xe_device *xe, struct xe_pagemap_cac
 		goto out_err;
 	}
 
+	err = drm_pagemap_acquire_owner(&xpagemap->peer, &xe_owner_list,
+					xe_has_interconnect);
+	if (err)
+		goto out_err;
+
 	pagemap->type = MEMORY_DEVICE_PRIVATE;
 	pagemap->range.start = res->start;
 	pagemap->range.end = res->end;
 	pagemap->nr_range = 1;
-	pagemap->owner = xe_svm_devm_owner(xe);
+	pagemap->owner = xpagemap->peer.owner;
 	pagemap->ops = drm_pagemap_pagemap_ops_get();
 	addr = devm_memremap_pages(dev, pagemap);
 	if (IS_ERR(addr)) {
diff --git a/drivers/gpu/drm/xe/xe_svm.h b/drivers/gpu/drm/xe/xe_svm.h
index 19469fd91666..3fd8fc125cba 100644
--- a/drivers/gpu/drm/xe/xe_svm.h
+++ b/drivers/gpu/drm/xe/xe_svm.h
@@ -10,6 +10,7 @@
 
 #include <drm/drm_pagemap.h>
 #include <drm/drm_gpusvm.h>
+#include <drm/drm_pagemap_util.h>
 
 #define XE_INTERCONNECT_VRAM DRM_INTERCONNECT_DRIVER
 
@@ -54,6 +55,7 @@ struct xe_svm_range {
  * @pagemap: The struct dev_pagemap providing the struct pages.
  * @dpagemap: The drm_pagemap managing allocation and migration.
  * @destroy_work: Handles asnynchronous destruction and caching.
+ * @peer: Used for pagemap owner computation.
  * @hpa_base: The host physical address base for the managemd memory.
  * @cache: Backpointer to the struct xe_pagemap_cache for the memory region.
  * @vr: Backpointer to the xe_vram region.
@@ -65,6 +67,7 @@ struct xe_pagemap {
 	struct dev_pagemap pagemap;
 	struct drm_pagemap dpagemap;
 	struct delayed_work destroy_work;
+	struct drm_pagemap_peer peer;
 	resource_size_t hpa_base;
 	struct xe_pagemap_cache *cache;
 	struct xe_vram_region *vr;
-- 
2.48.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [RFC PATCH 12/19] drm/xe: Make the PT code handle placement per PTE rather than per vma / range
  2025-03-12 21:03 [RFC PATCH 00/19] drm, drm/xe: Multi-device GPUSVM Thomas Hellström
                   ` (10 preceding siblings ...)
  2025-03-12 21:04 ` [RFC PATCH 11/19] drm/xe: Use the drm_pagemap_util helper to get a svm pagemap owner Thomas Hellström
@ 2025-03-12 21:04 ` Thomas Hellström
  2025-03-12 21:04 ` [RFC PATCH 13/19] drm/gpusvm: Allow mixed mappings Thomas Hellström
                   ` (7 subsequent siblings)
  19 siblings, 0 replies; 26+ messages in thread
From: Thomas Hellström @ 2025-03-12 21:04 UTC (permalink / raw)
  To: intel-xe, dri-devel
  Cc: Thomas Hellström, himal.prasad.ghimiray, apopple, airlied,
	Simona Vetter, felix.kuehling, Matthew Brost,
	Christian König, dakr, Mrozek, Michal, Joonas Lahtinen

With SVM, ranges forwarded to the PT code for binding can, mostly
due to races when migrating, point to both VRAM and system / foreign
device memory. Make the PT code able to handle that by checking,
for each PTE set up, whether it points to local VRAM or to system
memory.

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/xe/xe_bo.c |  12 +++--
 drivers/gpu/drm/xe/xe_pt.c | 106 ++++++++++++++++---------------------
 2 files changed, 56 insertions(+), 62 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
index 390f90fbd366..bec788bce95c 100644
--- a/drivers/gpu/drm/xe/xe_bo.c
+++ b/drivers/gpu/drm/xe/xe_bo.c
@@ -2091,10 +2091,16 @@ uint64_t vram_region_gpu_offset(struct ttm_resource *res)
 {
 	struct xe_device *xe = ttm_to_xe_device(res->bo->bdev);
 
-	if (res->mem_type == XE_PL_STOLEN)
+	switch (res->mem_type) {
+	case XE_PL_STOLEN:
 		return xe_ttm_stolen_gpu_offset(xe);
-
-	return res_to_mem_region(res)->dpa_base;
+	case XE_PL_TT:
+	case XE_PL_SYSTEM:
+		return 0;
+	default:
+		return res_to_mem_region(res)->dpa_base;
+	}
+	return 0;
 }
 
 /**
diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
index 9e719535a3bb..d14b1a28474a 100644
--- a/drivers/gpu/drm/xe/xe_pt.c
+++ b/drivers/gpu/drm/xe/xe_pt.c
@@ -278,13 +278,15 @@ struct xe_pt_stage_bind_walk {
 	struct xe_vm *vm;
 	/** @tile: The tile we're building for. */
 	struct xe_tile *tile;
-	/** @default_pte: PTE flag only template. No address is associated */
-	u64 default_pte;
+	/** @default_vram_pte: PTE flag only template for VRAM. No address is associated */
+	u64 default_vram_pte;
+	/** @default_system_pte: PTE flag only template for system memory. No address is associated */
+	u64 default_system_pte;
 	/** @dma_offset: DMA offset to add to the PTE. */
 	u64 dma_offset;
 	/**
 	 * @needs_64k: This address range enforces 64K alignment and
-	 * granularity.
+	 * granularity on VRAM.
 	 */
 	bool needs_64K;
 	/**
@@ -515,13 +517,16 @@ xe_pt_stage_bind_entry(struct xe_ptw *parent, pgoff_t offset,
 	if (level == 0 || xe_pt_hugepte_possible(addr, next, level, xe_walk)) {
 		struct xe_res_cursor *curs = xe_walk->curs;
 		bool is_null = xe_vma_is_null(xe_walk->vma);
+		bool is_vram = xe_res_is_vram(curs);
 
 		XE_WARN_ON(xe_walk->va_curs_start != addr);
 
 		pte = vm->pt_ops->pte_encode_vma(is_null ? 0 :
 						 xe_res_dma(curs) + xe_walk->dma_offset,
 						 xe_walk->vma, pat_index, level);
-		pte |= xe_walk->default_pte;
+		if (!is_null)
+			pte |= is_vram ? xe_walk->default_vram_pte :
+				xe_walk->default_system_pte;
 
 		/*
 		 * Set the XE_PTE_PS64 hint if possible, otherwise if
@@ -531,7 +536,7 @@ xe_pt_stage_bind_entry(struct xe_ptw *parent, pgoff_t offset,
 			if (xe_pt_is_pte_ps64K(addr, next, xe_walk)) {
 				xe_walk->vma->gpuva.flags |= XE_VMA_PTE_64K;
 				pte |= XE_PTE_PS64;
-			} else if (XE_WARN_ON(xe_walk->needs_64K)) {
+			} else if (XE_WARN_ON(xe_walk->needs_64K && is_vram)) {
 				return -EINVAL;
 			}
 		}
@@ -603,6 +608,31 @@ static const struct xe_pt_walk_ops xe_pt_stage_bind_ops = {
 	.pt_entry = xe_pt_stage_bind_entry,
 };
 
+/* The GPU can always do atomics in VRAM */
+static bool xe_atomic_for_vram(struct xe_vm *vm)
+{
+	return true;
+}
+
+/*
+ * iGFX always expects to be able to do atomics in system memory.
+ *
+ * For DGFX, 3D clients want to do atomics in system memory that is
+ * not coherent with CPU atomics. Compute clients want
+ * atomics that look coherent with CPU atomics. We
+ * distinguish the two by checking for LR mode, and for
+ * compute we then disallow atomics in system memory.
+ * A compute attempt to perform atomics in system memory would
+ * then cause an unrecoverable page-fault in preempt-fence
+ * mode, but in fault mode the data would be migrated to VRAM
+ * for GPU atomics and to system memory for CPU atomics.
+ */
+static bool xe_atomic_for_system(struct xe_vm *vm)
+{
+	return (!IS_DGFX(vm->xe) || !xe_vm_in_lr_mode(vm)) &&
+		!vm->xe->info.has_device_atomics_on_smem;
+}
+
 /**
  * xe_pt_stage_bind() - Build a disconnected page-table tree for a given address
  * range.
@@ -629,9 +659,8 @@ xe_pt_stage_bind(struct xe_tile *tile, struct xe_vma *vma,
 {
 	struct xe_device *xe = tile_to_xe(tile);
 	struct xe_bo *bo = xe_vma_bo(vma);
-	bool is_devmem = !xe_vma_is_userptr(vma) && bo &&
-		(xe_bo_is_vram(bo) || xe_bo_is_stolen_devmem(bo));
 	struct xe_res_cursor curs;
+	struct xe_vm *vm = xe_vma_vm(vma);
 	struct xe_pt_stage_bind_walk xe_walk = {
 		.base = {
 			.ops = &xe_pt_stage_bind_ops,
@@ -639,7 +668,7 @@ xe_pt_stage_bind(struct xe_tile *tile, struct xe_vma *vma,
 			.max_level = XE_PT_HIGHEST_LEVEL,
 			.staging = true,
 		},
-		.vm = xe_vma_vm(vma),
+		.vm = vm,
 		.tile = tile,
 		.curs = &curs,
 		.va_curs_start = range ? range->base.itree.start :
@@ -647,26 +676,22 @@ xe_pt_stage_bind(struct xe_tile *tile, struct xe_vma *vma,
 		.vma = vma,
 		.wupd.entries = entries,
 	};
-	struct xe_pt *pt = xe_vma_vm(vma)->pt_root[tile->id];
+	struct xe_pt *pt = vm->pt_root[tile->id];
 	int ret;
 
 	if (range) {
 		/* Move this entire thing to xe_svm.c? */
-		xe_svm_notifier_lock(xe_vma_vm(vma));
+		xe_svm_notifier_lock(vm);
 		if (!xe_svm_range_pages_valid(range)) {
 			xe_svm_range_debug(range, "BIND PREPARE - RETRY");
-			xe_svm_notifier_unlock(xe_vma_vm(vma));
+			xe_svm_notifier_unlock(vm);
 			return -EAGAIN;
 		}
 		if (xe_svm_range_has_dma_mapping(range)) {
 			xe_res_first_dma(range->base.dma_addr, 0,
 					 range->base.itree.last + 1 - range->base.itree.start,
 					 &curs);
-			is_devmem = xe_res_is_vram(&curs);
-			if (is_devmem)
-				xe_svm_range_debug(range, "BIND PREPARE - DMA VRAM");
-			else
-				xe_svm_range_debug(range, "BIND PREPARE - DMA");
+			xe_svm_range_debug(range, "BIND PREPARE - MIXED");
 		} else {
 			xe_assert(xe, false);
 		}
@@ -674,54 +699,17 @@ xe_pt_stage_bind(struct xe_tile *tile, struct xe_vma *vma,
 		 * Note, when unlocking the resource cursor dma addresses may become
 		 * stale, but the bind will be aborted anyway at commit time.
 		 */
-		xe_svm_notifier_unlock(xe_vma_vm(vma));
+		xe_svm_notifier_unlock(vm);
 	}
 
-	xe_walk.needs_64K = (xe_vma_vm(vma)->flags & XE_VM_FLAG_64K) && is_devmem;
-
-	/**
-	 * Default atomic expectations for different allocation scenarios are as follows:
-	 *
-	 * 1. Traditional API: When the VM is not in LR mode:
-	 *    - Device atomics are expected to function with all allocations.
-	 *
-	 * 2. Compute/SVM API: When the VM is in LR mode:
-	 *    - Device atomics are the default behavior when the bo is placed in a single region.
-	 *    - In all other cases device atomics will be disabled with AE=0 until an application
-	 *      request differently using a ioctl like madvise.
-	 */
+	xe_walk.needs_64K = (vm->flags & XE_VM_FLAG_64K);
 	if (vma->gpuva.flags & XE_VMA_ATOMIC_PTE_BIT) {
-		if (xe_vm_in_lr_mode(xe_vma_vm(vma))) {
-			if (bo && xe_bo_has_single_placement(bo))
-				xe_walk.default_pte |= XE_USM_PPGTT_PTE_AE;
-			/**
-			 * If a SMEM+LMEM allocation is backed by SMEM, a device
-			 * atomics will cause a gpu page fault and which then
-			 * gets migrated to LMEM, bind such allocations with
-			 * device atomics enabled.
-			 */
-			else if (is_devmem)
-				xe_walk.default_pte |= XE_USM_PPGTT_PTE_AE;
-		} else {
-			xe_walk.default_pte |= XE_USM_PPGTT_PTE_AE;
-		}
-
-		/**
-		 * Unset AE if the platform(PVC) doesn't support it on an
-		 * allocation
-		 */
-		if (!xe->info.has_device_atomics_on_smem && !is_devmem)
-			xe_walk.default_pte &= ~XE_USM_PPGTT_PTE_AE;
+		xe_walk.default_vram_pte = xe_atomic_for_vram(vm) ? XE_USM_PPGTT_PTE_AE : 0;
+		xe_walk.default_system_pte = xe_atomic_for_system(vm) ? XE_USM_PPGTT_PTE_AE : 0;
 	}
 
-	if (is_devmem) {
-		xe_walk.default_pte |= XE_PPGTT_PTE_DM;
-		xe_walk.dma_offset = bo ? vram_region_gpu_offset(bo->ttm.resource) : 0;
-	}
-
-	if (!xe_vma_has_no_bo(vma) && xe_bo_is_stolen(bo))
-		xe_walk.dma_offset = xe_ttm_stolen_gpu_offset(xe_bo_device(bo));
-
+	xe_walk.default_vram_pte |= XE_PPGTT_PTE_DM;
+	xe_walk.dma_offset = bo ? vram_region_gpu_offset(bo->ttm.resource) : 0;
 	if (!range)
 		xe_bo_assert_held(bo);
 
-- 
2.48.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [RFC PATCH 13/19] drm/gpusvm: Allow mixed mappings
  2025-03-12 21:03 [RFC PATCH 00/19] drm, drm/xe: Multi-device GPUSVM Thomas Hellström
                   ` (11 preceding siblings ...)
  2025-03-12 21:04 ` [RFC PATCH 12/19] drm/xe: Make the PT code handle placement per PTE rather than per vma / range Thomas Hellström
@ 2025-03-12 21:04 ` Thomas Hellström
  2025-03-12 21:04 ` [RFC PATCH 14/19] drm/xe: Add a preferred dpagemap Thomas Hellström
                   ` (6 subsequent siblings)
  19 siblings, 0 replies; 26+ messages in thread
From: Thomas Hellström @ 2025-03-12 21:04 UTC (permalink / raw)
  To: intel-xe, dri-devel
  Cc: Thomas Hellström, himal.prasad.ghimiray, apopple, airlied,
	Simona Vetter, felix.kuehling, Matthew Brost,
	Christian König, dakr, Mrozek, Michal, Joonas Lahtinen

Racing while migrating can cause part of an SVM range to reside in
system and part of the range to reside in local VRAM.

Currently we disallow that and repeatedly try to force everything
out to system memory.

Instead, allow drm_gpusvm_range_get_pages() to be a bit more permissive,
and allow mapping of mixed ranges.

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/drm_gpusvm.c | 14 +-------------
 1 file changed, 1 insertion(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/drm_gpusvm.c b/drivers/gpu/drm/drm_gpusvm.c
index 8d836248f5fe..5d502ca091ee 100644
--- a/drivers/gpu/drm/drm_gpusvm.c
+++ b/drivers/gpu/drm/drm_gpusvm.c
@@ -1213,7 +1213,6 @@ int drm_gpusvm_range_get_pages(struct drm_gpusvm *gpusvm,
 		.dev_private_owner = ctx->device_private_page_owner,
 	};
 	struct mm_struct *mm = gpusvm->mm;
-	void *zdd;
 	unsigned long timeout =
 		jiffies + msecs_to_jiffies(HMM_RANGE_DEFAULT_TIMEOUT);
 	unsigned long i, j;
@@ -1295,7 +1294,6 @@ int drm_gpusvm_range_get_pages(struct drm_gpusvm *gpusvm,
 		goto map_pages;
 	}
 
-	zdd = NULL;
 	num_dma_mapped = 0;
 	for (i = 0, j = 0; i < npages; ++j) {
 		struct page *page = hmm_pfn_to_page(pfns[i]);
@@ -1303,17 +1301,7 @@ int drm_gpusvm_range_get_pages(struct drm_gpusvm *gpusvm,
 		order = hmm_pfn_to_map_order(pfns[i]);
 		if (is_device_private_page(page) ||
 		    is_device_coherent_page(page)) {
-			if (zdd != page->zone_device_data && i > 0) {
-				err = -EOPNOTSUPP;
-				goto err_unmap;
-			}
-			zdd = page->zone_device_data;
 			if (pagemap != page->pgmap) {
-				if (i > 0) {
-					err = -EOPNOTSUPP;
-					goto err_unmap;
-				}
-
 				pagemap = page->pgmap;
 				dpagemap = drm_pagemap_page_to_dpagemap(page);
 				if (drm_WARN_ON(gpusvm->drm, !dpagemap)) {
@@ -1341,7 +1329,7 @@ int drm_gpusvm_range_get_pages(struct drm_gpusvm *gpusvm,
 		} else {
 			dma_addr_t addr;
 
-			if (is_zone_device_page(page) || pagemap) {
+			if (is_zone_device_page(page)) {
 				err = -EOPNOTSUPP;
 				goto err_unmap;
 			}
-- 
2.48.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [RFC PATCH 14/19] drm/xe: Add a preferred dpagemap
  2025-03-12 21:03 [RFC PATCH 00/19] drm, drm/xe: Multi-device GPUSVM Thomas Hellström
                   ` (12 preceding siblings ...)
  2025-03-12 21:04 ` [RFC PATCH 13/19] drm/gpusvm: Allow mixed mappings Thomas Hellström
@ 2025-03-12 21:04 ` Thomas Hellström
  2025-03-12 21:04 ` [RFC PATCH 15/19] drm/pagemap/util: Add file descriptors pointing to struct drm_pagemap Thomas Hellström
                   ` (5 subsequent siblings)
  19 siblings, 0 replies; 26+ messages in thread
From: Thomas Hellström @ 2025-03-12 21:04 UTC (permalink / raw)
  To: intel-xe, dri-devel
  Cc: Thomas Hellström, himal.prasad.ghimiray, apopple, airlied,
	Simona Vetter, felix.kuehling, Matthew Brost,
	Christian König, dakr, Mrozek, Michal, Joonas Lahtinen

Introduce a preferred dpagemap that can override the default.
The default is still the local tile's VRAM dpagemap.
The preferred dpagemap is intended to be set from user-space.

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/xe/xe_svm.c      | 18 +++++++++++++++++-
 drivers/gpu/drm/xe/xe_svm.h      | 22 ++++++++++++++++++++++
 drivers/gpu/drm/xe/xe_vm.c       |  4 ++++
 drivers/gpu/drm/xe/xe_vm_types.h |  6 ++++++
 4 files changed, 49 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/xe/xe_svm.c b/drivers/gpu/drm/xe/xe_svm.c
index 25d49d0d7484..660fae255a09 100644
--- a/drivers/gpu/drm/xe/xe_svm.c
+++ b/drivers/gpu/drm/xe/xe_svm.c
@@ -814,7 +814,7 @@ int xe_svm_handle_pagefault(struct xe_vm *vm, struct xe_vma *vma,
 		return 0;
 
 	range_debug(range, "PAGE FAULT");
-	dpagemap = xe_tile_local_pagemap(tile);
+	dpagemap = vma->svm.pref_dpagemap ? : xe_tile_local_pagemap(tile);
 
 	/* XXX: Add migration policy, for now migrate range once */
 	if (!range->skip_migrate && range->base.flags.migrate_devmem &&
@@ -1271,4 +1271,20 @@ xe_pagemap_find_or_create(struct xe_device *xe, struct xe_pagemap_cache *cache,
 	mutex_unlock(&cache->mutex);
 	return xpagemap;
 }
+
 #endif
+
+/**
+ * xe_svm_vma_fini() - Finalize the svm part of a vma
+ * @svma: The struct xe_svm_vma to finalize
+ *
+ * Release the resources associated with the svm
+ * metadata of a gpu vma.
+ */
+void xe_svm_vma_fini(struct xe_svm_vma *svma)
+{
+	if (svma->pref_dpagemap) {
+		drm_pagemap_put(svma->pref_dpagemap);
+		svma->pref_dpagemap = NULL;
+	}
+}
diff --git a/drivers/gpu/drm/xe/xe_svm.h b/drivers/gpu/drm/xe/xe_svm.h
index 3fd8fc125cba..c5d542567cfc 100644
--- a/drivers/gpu/drm/xe/xe_svm.h
+++ b/drivers/gpu/drm/xe/xe_svm.h
@@ -24,6 +24,16 @@ struct xe_tile;
 struct xe_vm;
 struct xe_vma;
 
+/**
+ * struct xe_svm_vma - VMA svm metadata
+ * @pref_dpagemap: Reference-counted pointer to the drm_pagemap preferred
+ * for migration on an SVM page-fault. The pointer is protected by the
+ * vm lock.
+ */
+struct xe_svm_vma {
+	struct drm_pagemap *pref_dpagemap;
+};
+
 /** struct xe_svm_range - SVM range */
 struct xe_svm_range {
 	/** @base: base drm_gpusvm_range */
@@ -124,10 +134,18 @@ static inline bool xe_svm_range_has_dma_mapping(struct xe_svm_range *range)
 #define xe_svm_notifier_unlock(vm__)	\
 	drm_gpusvm_notifier_unlock(&(vm__)->svm.gpusvm)
 
+void xe_svm_vma_fini(struct xe_svm_vma *svma);
+
 struct xe_pagemap *
 xe_pagemap_create(struct xe_device *xe, struct xe_pagemap_cache *cache,
 		  struct xe_vram_region *vr);
 
+static inline void xe_svm_vma_assign_dpagemap(struct xe_svm_vma *svma,
+					      struct drm_pagemap *dpagemap)
+{
+	svma->pref_dpagemap = drm_pagemap_get(dpagemap);
+}
+
 #else
 #include <linux/interval_tree.h>
 
@@ -213,6 +231,10 @@ static inline void xe_svm_notifier_unlock(struct xe_vm *vm)
 {
 }
 
+#define xe_svm_vma_fini(...) do {} while (0)
+
+#define xe_svm_vma_assign_dpagemap(...) do {} while (0)
+
 #endif
 
 #if IS_ENABLED(CONFIG_DRM_XE_PAGEMAP)
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 07c4992fb3d7..4eea35957549 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -1246,6 +1246,8 @@ static void xe_vma_destroy_late(struct xe_vma *vma)
 {
 	struct xe_vm *vm = xe_vma_vm(vma);
 
+	xe_svm_vma_fini(&vma->svm);
+
 	if (vma->ufence) {
 		xe_sync_ufence_put(vma->ufence);
 		vma->ufence = NULL;
@@ -2516,6 +2518,7 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct drm_gpuva_ops *ops,
 				if (IS_ERR(vma))
 					return PTR_ERR(vma);
 
+				xe_svm_vma_assign_dpagemap(&vma->svm, old->svm.pref_dpagemap);
 				op->remap.prev = vma;
 
 				/*
@@ -2547,6 +2550,7 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct drm_gpuva_ops *ops,
 					return PTR_ERR(vma);
 
 				op->remap.next = vma;
+				xe_svm_vma_assign_dpagemap(&vma->svm, old->svm.pref_dpagemap);
 
 				/*
 				 * Userptr creates a new SG mapping so
diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
index 08baea03df00..0e6b6e0251d1 100644
--- a/drivers/gpu/drm/xe/xe_vm_types.h
+++ b/drivers/gpu/drm/xe/xe_vm_types.h
@@ -17,6 +17,7 @@
 #include "xe_device_types.h"
 #include "xe_pt_types.h"
 #include "xe_range_fence.h"
+#include "xe_svm.h"
 
 struct xe_bo;
 struct xe_svm_range;
@@ -128,6 +129,11 @@ struct xe_vma {
 	 * Needs to be signalled before UNMAP can be processed.
 	 */
 	struct xe_user_fence *ufence;
+
+#if IS_ENABLED(CONFIG_DRM_XE_GPUSVM)
+	/** @svm: SVM metadata attached to the vma. */
+	struct xe_svm_vma svm;
+#endif
 };
 
 /**
-- 
2.48.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [RFC PATCH 15/19] drm/pagemap/util: Add file descriptors pointing to struct drm_pagemap
  2025-03-12 21:03 [RFC PATCH 00/19] drm, drm/xe: Multi-device GPUSVM Thomas Hellström
                   ` (13 preceding siblings ...)
  2025-03-12 21:04 ` [RFC PATCH 14/19] drm/xe: Add a preferred dpagemap Thomas Hellström
@ 2025-03-12 21:04 ` Thomas Hellström
  2025-03-12 21:04 ` [RFC PATCH 16/19] drm/xe/migrate: Allow xe_migrate_vram() also on non-pagefault capable devices Thomas Hellström
                   ` (4 subsequent siblings)
  19 siblings, 0 replies; 26+ messages in thread
From: Thomas Hellström @ 2025-03-12 21:04 UTC (permalink / raw)
  To: intel-xe, dri-devel
  Cc: Thomas Hellström, himal.prasad.ghimiray, apopple, airlied,
	Simona Vetter, felix.kuehling, Matthew Brost,
	Christian König, dakr, Mrozek, Michal, Joonas Lahtinen

Facilitate implementing UAPI that represents a struct drm_pagemap
as a file descriptor.

A drm_pagemap file descriptor holds, while open, a reference to
the struct drm_pagemap and to the drm_pagemap_helper module.

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/drm_pagemap_util.c | 78 ++++++++++++++++++++++++++++++
 include/drm/drm_pagemap_util.h     |  4 ++
 2 files changed, 82 insertions(+)
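
Not part of the patch: a minimal sketch of the consumer side, where
my_set_preferred() and struct my_state are hypothetical driver code.
An ioctl handler that receives a pagemap fd from user-space would
resolve and reference it like this, dropping the reference when done
(drm_pagemap_put() accepts NULL, as relied upon elsewhere in this
series):

	static int my_set_preferred(struct my_state *state, unsigned int fd)
	{
		struct drm_pagemap *dpagemap = drm_pagemap_from_fd(fd);

		if (IS_ERR(dpagemap))
			return PTR_ERR(dpagemap);

		drm_pagemap_put(state->preferred);	/* Drop any old reference. */
		state->preferred = dpagemap;		/* Hand over the new one. */
		return 0;
	}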

diff --git a/drivers/gpu/drm/drm_pagemap_util.c b/drivers/gpu/drm/drm_pagemap_util.c
index ae8f78cde4a7..4bcd7b8927ee 100644
--- a/drivers/gpu/drm/drm_pagemap_util.c
+++ b/drivers/gpu/drm/drm_pagemap_util.c
@@ -3,6 +3,8 @@
  * Copyright © 2025 Intel Corporation
  */
 
+#include <linux/anon_inodes.h>
+#include <linux/file.h>
 #include <linux/slab.h>
 
 #include <drm/drm_pagemap.h>
@@ -123,3 +125,79 @@ int drm_pagemap_acquire_owner(struct drm_pagemap_peer *peer,
 	return 0;
 }
 EXPORT_SYMBOL(drm_pagemap_acquire_owner);
+
+static int drm_pagemap_file_release(struct inode *inode, struct file *file)
+{
+	drm_pagemap_put(file->private_data);
+
+	return 0;
+}
+
+static const struct file_operations drm_pagemap_fops = {
+	.owner = THIS_MODULE,
+	.release = drm_pagemap_file_release,
+};
+
+/**
+ * drm_pagemap_fd() - Obtain an fd that can be used to reference a drm_pagemap.
+ * @dpagemap: The drm_pagemap for which to obtain an fd.
+ *
+ * Obtain an fd that can be used to reference a drm_pagemap using the function
+ * drm_pagemap_from_fd(). The fd holds a reference on the drm_pagemap and
+ * on this module. When the fd is closed and the underlying struct file is
+ * released, the references are dropped.
+ *
+ * Return: A non-negative file descriptor on success, negative error code on failure.
+ */
+int drm_pagemap_fd(struct drm_pagemap *dpagemap)
+{
+	struct file *file;
+	int fd;
+
+	fd = get_unused_fd_flags(O_CLOEXEC);
+	if (fd < 0)
+		return fd;
+
+	file = anon_inode_getfile("drm_pagemap_file",
+				  &drm_pagemap_fops,
+				  dpagemap, 0);
+	if (IS_ERR(file)) {
+		put_unused_fd(fd);
+		return PTR_ERR(file);
+	}
+
+	drm_pagemap_get(dpagemap);
+	fd_install(fd, file);
+
+	return fd;
+}
+EXPORT_SYMBOL(drm_pagemap_fd);
+
+/**
+ * drm_pagemap_from_fd() - Get a drm_pagemap from a file descriptor
+ * @fd: The file descriptor
+ *
+ * Return a reference-counted pointer to a drm_pagemap from
+ * a file-descriptor, typically obtained from drm_pagemap_fd().
+ * The pagemap pointer should be put using drm_pagemap_put() when
+ * no longer in use.
+ *
+ * Return: A valid drm_pagemap pointer on success. Error pointer on failure.
+ */
+struct drm_pagemap *drm_pagemap_from_fd(unsigned int fd)
+{
+	struct file *file = fget(fd);
+	struct drm_pagemap *dpagemap;
+
+	if (!file)
+		return ERR_PTR(-ENOENT);
+
+	if (file->f_op != &drm_pagemap_fops) {
+		fput(file);
+		return ERR_PTR(-ENOENT);
+	}
+
+	dpagemap = drm_pagemap_get(file->private_data);
+	fput(file);
+
+	return dpagemap;
+}
+EXPORT_SYMBOL(drm_pagemap_from_fd);
diff --git a/include/drm/drm_pagemap_util.h b/include/drm/drm_pagemap_util.h
index 03731c79493f..8f9676a469fb 100644
--- a/include/drm/drm_pagemap_util.h
+++ b/include/drm/drm_pagemap_util.h
@@ -52,4 +52,8 @@ int drm_pagemap_acquire_owner(struct drm_pagemap_peer *peer,
 			      struct drm_pagemap_owner_list *owner_list,
 			      bool (*has_interconnect)(struct drm_pagemap_peer *peer1,
 						       struct drm_pagemap_peer *peer2));
+
+int drm_pagemap_fd(struct drm_pagemap *dpagemap);
+
+struct drm_pagemap *drm_pagemap_from_fd(unsigned int fd);
 #endif
-- 
2.48.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [RFC PATCH 16/19] drm/xe/migrate: Allow xe_migrate_vram() also on non-pagefault capable devices
  2025-03-12 21:03 [RFC PATCH 00/19] drm, drm/xe: Multi-device GPUSVM Thomas Hellström
                   ` (14 preceding siblings ...)
  2025-03-12 21:04 ` [RFC PATCH 15/19] drm/pagemap/util: Add file descriptors pointing to struct drm_pagemap Thomas Hellström
@ 2025-03-12 21:04 ` Thomas Hellström
  2025-03-12 21:04 ` [RFC PATCH 17/19] drm/xe/uapi: Add the devmem_open ioctl Thomas Hellström
                   ` (3 subsequent siblings)
  19 siblings, 0 replies; 26+ messages in thread
From: Thomas Hellström @ 2025-03-12 21:04 UTC (permalink / raw)
  To: intel-xe, dri-devel
  Cc: Thomas Hellström, himal.prasad.ghimiray, apopple, airlied,
	Simona Vetter, felix.kuehling, Matthew Brost,
	Christian König, dakr, Mrozek, Michal, Joonas Lahtinen

The drm_pagemap functionality does not depend on the device having
recoverable pagefaults available, so allow xe_migrate_vram() also on
devices without them. Even if this will have little use in practice,
it's beneficial for testing multi-device SVM, since a memory provider
could be a non-pagefault-capable GPU.

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/xe/xe_migrate.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
index 3894efe7ba60..23c258b775a0 100644
--- a/drivers/gpu/drm/xe/xe_migrate.c
+++ b/drivers/gpu/drm/xe/xe_migrate.c
@@ -1609,6 +1609,7 @@ static struct dma_fence *xe_migrate_vram(struct xe_migrate *m,
 {
 	struct xe_gt *gt = m->tile->primary_gt;
 	struct xe_device *xe = gt_to_xe(gt);
+	bool use_usm_batch = xe->info.has_usm;
 	struct dma_fence *fence = NULL;
 	u32 batch_size = 2;
 	u64 src_L0_ofs, dst_L0_ofs;
@@ -1625,7 +1626,7 @@ static struct dma_fence *xe_migrate_vram(struct xe_migrate *m,
 	batch_size += pte_update_cmd_size(round_update_size);
 	batch_size += EMIT_COPY_DW;
 
-	bb = xe_bb_new(gt, batch_size, true);
+	bb = xe_bb_new(gt, batch_size, use_usm_batch);
 	if (IS_ERR(bb)) {
 		err = PTR_ERR(bb);
 		return ERR_PTR(err);
@@ -1650,7 +1651,7 @@ static struct dma_fence *xe_migrate_vram(struct xe_migrate *m,
 		  XE_PAGE_SIZE);
 
 	job = xe_bb_create_migration_job(m->q, bb,
-					 xe_migrate_batch_base(m, true),
+					 xe_migrate_batch_base(m, use_usm_batch),
 					 update_idx);
 	if (IS_ERR(job)) {
 		err = PTR_ERR(job);
-- 
2.48.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [RFC PATCH 17/19] drm/xe/uapi: Add the devmem_open ioctl
  2025-03-12 21:03 [RFC PATCH 00/19] drm, drm/xe: Multi-device GPUSVM Thomas Hellström
                   ` (15 preceding siblings ...)
  2025-03-12 21:04 ` [RFC PATCH 16/19] drm/xe/migrate: Allow xe_migrate_vram() also on non-pagefault capable devices Thomas Hellström
@ 2025-03-12 21:04 ` Thomas Hellström
  2025-03-12 21:04 ` [RFC PATCH 18/19] drm/xe/uapi: HAX: Add the xe_madvise_prefer_devmem IOCTL Thomas Hellström
                   ` (2 subsequent siblings)
  19 siblings, 0 replies; 26+ messages in thread
From: Thomas Hellström @ 2025-03-12 21:04 UTC (permalink / raw)
  To: intel-xe, dri-devel
  Cc: Thomas Hellström, himal.prasad.ghimiray, apopple, airlied,
	Simona Vetter, felix.kuehling, Matthew Brost,
	Christian König, dakr, Mrozek, Michal, Joonas Lahtinen

Add an IOCTL to get a file descriptor referencing a memory region
for SVM.

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/xe/xe_device.c |  1 +
 drivers/gpu/drm/xe/xe_svm.c    | 50 ++++++++++++++++++++++++++++++++++
 drivers/gpu/drm/xe/xe_svm.h    |  8 ++++++
 include/uapi/drm/xe_drm.h      | 29 ++++++++++++++++++++
 4 files changed, 88 insertions(+)
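
Not part of the patch: a rough sketch of the intended user-space usage,
assuming the UAPI header below is reachable as <drm/xe_drm.h>, xe_fd is
an open xe render-node fd, and region_instance 1 corresponds to tile 0
(the handler below computes tile_id = region_instance - 1):

	#include <sys/ioctl.h>
	#include <err.h>
	#include <drm/xe_drm.h>

	static int open_tile0_devmem(int xe_fd)
	{
		struct drm_xe_devmem_open args = {
			.region_instance = 1,	/* tile 0 VRAM */
		};

		if (ioctl(xe_fd, DRM_IOCTL_XE_DEVMEM_OPEN, &args))
			err(1, "DRM_XE_DEVMEM_OPEN");

		return args.pagemap_fd;	/* close() when no longer needed */
	}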

diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
index 68de09db9ad5..160b3c189de0 100644
--- a/drivers/gpu/drm/xe/xe_device.c
+++ b/drivers/gpu/drm/xe/xe_device.c
@@ -195,6 +195,7 @@ static const struct drm_ioctl_desc xe_ioctls[] = {
 	DRM_IOCTL_DEF_DRV(XE_WAIT_USER_FENCE, xe_wait_user_fence_ioctl,
 			  DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(XE_OBSERVATION, xe_observation_ioctl, DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(XE_DEVMEM_OPEN, xe_devmem_open_ioctl, DRM_RENDER_ALLOW),
 };
 
 static long xe_drm_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
diff --git a/drivers/gpu/drm/xe/xe_svm.c b/drivers/gpu/drm/xe/xe_svm.c
index 660fae255a09..ebdd27b02be7 100644
--- a/drivers/gpu/drm/xe/xe_svm.c
+++ b/drivers/gpu/drm/xe/xe_svm.c
@@ -6,6 +6,7 @@
 #include <drm/drm_drv.h>
 #include <drm/drm_managed.h>
 #include <drm/drm_pagemap.h>
+#include <drm/drm_pagemap_util.h>
 
 #include "xe_bo.h"
 #include "xe_gt_tlb_invalidation.h"
@@ -1272,6 +1273,55 @@ xe_pagemap_find_or_create(struct xe_device *xe, struct xe_pagemap_cache *cache,
 	return xpagemap;
 }
 
+/**
+ * xe_devmem_open_ioctl() - IOCTL callback implementing the devmem_open functionality
+ * @dev: The struct drm_device.
+ * @data: The ioctl argument.
+ * @file: The drm file.
+ *
+ * For the given xe device and memory region, open a pagemap and return a
+ * file descriptor that can be used to reference the pagemap. First,
+ * attempt to look up an already used or cached pagemap. If that fails,
+ * create a new one.
+ *
+ * Return: %0 on success. Negative error code on failure.
+ */
+int xe_devmem_open_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
+{
+	struct xe_device *xe = to_xe_device(dev);
+	struct drm_xe_devmem_open *args = data;
+	struct drm_pagemap *dpagemap;
+	struct xe_pagemap *xpagemap;
+	struct xe_vram_region *vr;
+	u32 tile_id;
+	int fd;
+
+	if (XE_IOCTL_DBG(xe, args->extensions) ||
+	    XE_IOCTL_DBG(xe, args->reserved[0] || args->reserved[1]))
+		return -EINVAL;
+
+	tile_id = (u32)args->region_instance - 1;
+	if (XE_IOCTL_DBG(xe, tile_id >= xe->info.tile_count))
+		return -ENOENT;
+
+	if (XE_IOCTL_DBG(xe, !((BIT(tile_id) << 1) & xe->info.mem_region_mask)))
+		return -ENOENT;
+
+	vr = &xe->tiles[tile_id].mem.vram;
+	xpagemap = xe_pagemap_find_or_create(xe, &vr->pagemap_cache, vr);
+	if (XE_IOCTL_DBG(xe, IS_ERR(xpagemap)))
+		return -ENOENT;
+
+	dpagemap = &xpagemap->dpagemap;
+	fd = drm_pagemap_fd(dpagemap);
+	xe_pagemap_put(xpagemap);
+	if (XE_IOCTL_DBG(xe, fd < 0))
+		return fd;
+
+	args->pagemap_fd = fd;
+
+	return 0;
+}
 #endif
 
 /**
diff --git a/drivers/gpu/drm/xe/xe_svm.h b/drivers/gpu/drm/xe/xe_svm.h
index c5d542567cfc..4f1a9e410dad 100644
--- a/drivers/gpu/drm/xe/xe_svm.h
+++ b/drivers/gpu/drm/xe/xe_svm.h
@@ -149,6 +149,8 @@ static inline void xe_svm_vma_assign_dpagemap(struct xe_svm_vma *svma,
 #else
 #include <linux/interval_tree.h>
 
+struct drm_device;
+struct drm_file;
 struct drm_pagemap_device_addr;
 struct xe_bo;
 struct xe_device;
@@ -243,6 +245,8 @@ int xe_pagemap_cache_init(struct drm_device *drm, struct xe_pagemap_cache *cache
 
 void xe_pagemaps_remove(struct xe_device *xe);
 
+int xe_devmem_open_ioctl(struct drm_device *dev, void *data, struct drm_file *file);
+
 #else
 
 #define xe_pagemap_cache_init(...) 0
@@ -251,6 +255,10 @@ static inline void xe_pagemaps_remove(struct xe_device *xe)
 {
 }
 
+static inline int xe_devmem_open_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
+{
+	return -EOPNOTSUPP;
+}
 #endif
 
 #endif
diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
index 616916985e3f..bb22413713f0 100644
--- a/include/uapi/drm/xe_drm.h
+++ b/include/uapi/drm/xe_drm.h
@@ -102,6 +102,7 @@ extern "C" {
 #define DRM_XE_EXEC			0x09
 #define DRM_XE_WAIT_USER_FENCE		0x0a
 #define DRM_XE_OBSERVATION		0x0b
+#define DRM_XE_DEVMEM_OPEN		0x0c
 
 /* Must be kept compact -- no holes */
 
@@ -117,6 +118,7 @@ extern "C" {
 #define DRM_IOCTL_XE_EXEC			DRM_IOW(DRM_COMMAND_BASE + DRM_XE_EXEC, struct drm_xe_exec)
 #define DRM_IOCTL_XE_WAIT_USER_FENCE		DRM_IOWR(DRM_COMMAND_BASE + DRM_XE_WAIT_USER_FENCE, struct drm_xe_wait_user_fence)
 #define DRM_IOCTL_XE_OBSERVATION		DRM_IOW(DRM_COMMAND_BASE + DRM_XE_OBSERVATION, struct drm_xe_observation_param)
+#define DRM_IOCTL_XE_DEVMEM_OPEN                DRM_IOWR(DRM_COMMAND_BASE + DRM_XE_DEVMEM_OPEN, struct drm_xe_devmem_open)
 
 /**
  * DOC: Xe IOCTL Extensions
@@ -1961,6 +1963,33 @@ struct drm_xe_query_eu_stall {
 	__u64 sampling_rates[];
 };
 
+/**
+ * struct drm_xe_devmem_open - Get a file-descriptor representing
+ * device memory on a specific tile on a specific device.
+ */
+struct drm_xe_devmem_open {
+	/** @extensions: Pointer to the first extension struct, if any */
+	__u64 extensions;
+
+	/** @region_instance: The memory region describing the device memory to open. */
+	__u16 region_instance;
+
+	/** @pad: MBZ */
+	__u16 pad;
+
+	/**
+	 * @pagemap_fd: On successful return, a file descriptor
+	 * representing the device memory to open.
+	 * Should be close()d when no longer in use. The file
+	 * descriptor can be used to represent the device memory in
+	 * the GPU madvise IOCTL and the devmem_allow IOCTL.
+	 */
+	__u32 pagemap_fd;
+
+	/** @reserved: Reserved */
+	__u64 reserved[2];
+};
+
 #if defined(__cplusplus)
 }
 #endif
-- 
2.48.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [RFC PATCH 18/19] drm/xe/uapi: HAX: Add the xe_madvise_prefer_devmem IOCTL
  2025-03-12 21:03 [RFC PATCH 00/19] drm, drm/xe: Multi-device GPUSVM Thomas Hellström
                   ` (16 preceding siblings ...)
  2025-03-12 21:04 ` [RFC PATCH 17/19] drm/xe/uapi: Add the devmem_open ioctl Thomas Hellström
@ 2025-03-12 21:04 ` Thomas Hellström
  2025-03-12 21:04 ` [RFC PATCH 19/19] drm/xe: HAX: Use pcie p2p dma to test fast interconnect Thomas Hellström
  2025-03-13 10:19 ` [RFC PATCH 00/19] drm, drm/xe: Multi-device GPUSVM Christian König
  19 siblings, 0 replies; 26+ messages in thread
From: Thomas Hellström @ 2025-03-12 21:04 UTC (permalink / raw)
  To: intel-xe, dri-devel
  Cc: Thomas Hellström, himal.prasad.ghimiray, apopple, airlied,
	Simona Vetter, felix.kuehling, Matthew Brost,
	Christian König, dakr, Mrozek, Michal, Joonas Lahtinen

As a POC, add an xe_madvise_prefer_devmem IOCTL so that the user
can set the preferred pagemap to migrate to for a given memory
region (in this POC, the memory region is the whole GPU VM).

This is intended to be replaced by a proper madvise IOCTL, probably
with improved functionality.

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/xe/xe_device.c |  2 +
 drivers/gpu/drm/xe/xe_svm.c    | 72 ++++++++++++++++++++++++++++++++++
 drivers/gpu/drm/xe/xe_svm.h    |  9 +++++
 include/uapi/drm/xe_drm.h      | 10 +++++
 4 files changed, 93 insertions(+)
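
Not part of the patch: continuing the user-space sketch from the
devmem_open patch (same includes), this makes all CPU-address-mirror
VMAs of the VM identified by vm_id prefer the device memory behind
pagemap_fd:

	static void prefer_devmem(int xe_fd, __u32 vm_id, int pagemap_fd)
	{
		struct drm_xe_madvise_prefer_devmem args = {
			.vm_id = vm_id,
			.devmem_fd = pagemap_fd,
		};

		if (ioctl(xe_fd, DRM_IOCTL_XE_MADVISE_PREFER_DEVMEM, &args))
			err(1, "DRM_XE_MADVISE_PREFER_DEVMEM");
	}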

diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
index 160b3c189de0..a6ac699e9d12 100644
--- a/drivers/gpu/drm/xe/xe_device.c
+++ b/drivers/gpu/drm/xe/xe_device.c
@@ -196,6 +196,8 @@ static const struct drm_ioctl_desc xe_ioctls[] = {
 			  DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(XE_OBSERVATION, xe_observation_ioctl, DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(XE_DEVMEM_OPEN, xe_devmem_open_ioctl, DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(XE_MADVISE_PREFER_DEVMEM, xe_madvise_prefer_devmem_ioctl,
+			  DRM_RENDER_ALLOW),
 };
 
 static long xe_drm_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
diff --git a/drivers/gpu/drm/xe/xe_svm.c b/drivers/gpu/drm/xe/xe_svm.c
index ebdd27b02be7..56c2c731be27 100644
--- a/drivers/gpu/drm/xe/xe_svm.c
+++ b/drivers/gpu/drm/xe/xe_svm.c
@@ -1338,3 +1338,75 @@ void xe_svm_vma_fini(struct xe_svm_vma *svma)
 		svma->pref_dpagemap = NULL;
 	}
 }
+
+/**
+ * xe_madvise_prefer_devmem_ioctl() - POC IOCTL callback implementing a rudimentary
+ * version of madvise prefer_devmem() functionality.
+ * @dev: The struct drm_device.
+ * @data: The ioctl argument.
+ * @file: The drm file.
+ *
+ * For the given GPU VM, look up all SVM GPU VMAs and set their preferred
+ * drm_pagemap for migration to the one associated with the file descriptor
+ * passed to this function. If a negative (invalid) file descriptor is given,
+ * the function instead clears the preferred drm pagemap, meaning that at
+ * fault time, the drm pagemap associated with the same tile as the client
+ * is used.
+ *
+ * Return: %0 on success. Negative error code on failure.
+ */
+int xe_madvise_prefer_devmem_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
+{
+	struct xe_device *xe = to_xe_device(dev);
+	struct xe_file *xef = to_xe_file(file);
+	struct drm_xe_madvise_prefer_devmem *args = data;
+	struct xe_vm *vm;
+	struct drm_pagemap *dpagemap;
+	struct drm_gpuva *gpuva;
+	struct xe_vma *gvma;
+	int err = 0;
+
+	if (XE_IOCTL_DBG(xe, args->extensions) ||
+	    XE_IOCTL_DBG(xe, args->reserved[0] || args->reserved[1]))
+		return -EINVAL;
+
+	vm = xe_vm_lookup(xef, args->vm_id);
+	if (XE_IOCTL_DBG(xe, !vm))
+		return -EINVAL;
+
+	if (args->devmem_fd < 0) {
+		dpagemap = NULL;
+	} else {
+		dpagemap = drm_pagemap_from_fd(args->devmem_fd);
+		if (XE_IOCTL_DBG(xe, IS_ERR(dpagemap))) {
+			err = PTR_ERR(dpagemap);
+			goto out_no_dpagemap;
+		}
+
+		if (XE_IOCTL_DBG(xe, drm_dev_is_unplugged(dpagemap->drm))) {
+			err = -ENODEV;
+			goto out_no_lock;
+		}
+	}
+
+	err = down_write_killable(&vm->lock);
+	if (err)
+		goto out_no_lock;
+
+	drm_gpuvm_for_each_va(gpuva, &vm->gpuvm) {
+		gvma = gpuva_to_vma(gpuva);
+		if (!xe_vma_is_cpu_addr_mirror(gvma))
+			continue;
+
+		if (dpagemap != gvma->svm.pref_dpagemap) {
+			drm_pagemap_put(gvma->svm.pref_dpagemap);
+			gvma->svm.pref_dpagemap = drm_pagemap_get(dpagemap);
+		}
+	}
+	up_write(&vm->lock);
+out_no_lock:
+	drm_pagemap_put(dpagemap);
+out_no_dpagemap:
+	xe_vm_put(vm);
+	return err;
+}
diff --git a/drivers/gpu/drm/xe/xe_svm.h b/drivers/gpu/drm/xe/xe_svm.h
index 4f1a9e410dad..7c076c36c1c5 100644
--- a/drivers/gpu/drm/xe/xe_svm.h
+++ b/drivers/gpu/drm/xe/xe_svm.h
@@ -146,6 +146,8 @@ static inline void xe_svm_vma_assign_dpagemap(struct xe_svm_vma *svma,
 	svma->pref_dpagemap = drm_pagemap_get(dpagemap);
 }
 
+int xe_madvise_prefer_devmem_ioctl(struct drm_device *dev, void *data, struct drm_file *file);
+
 #else
 #include <linux/interval_tree.h>
 
@@ -237,6 +239,12 @@ static inline void xe_svm_notifier_unlock(struct xe_vm *vm)
 
 #define xe_svm_vma_assign_dpagemap(...) do {} while (0)
 
+static inline int
+xe_madvise_prefer_devmem_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
+{
+	return -EOPNOTSUPP;
+}
+
 #endif
 
 #if IS_ENABLED(CONFIG_DRM_XE_PAGEMAP)
@@ -259,6 +267,7 @@ static inline int xe_devmem_open_ioctl(struct drm_device *dev, void *data, struc
 {
 	return -EOPNOTSUPP;
 }
+
 #endif
 
 #endif
diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
index bb22413713f0..d9572cfb5a10 100644
--- a/include/uapi/drm/xe_drm.h
+++ b/include/uapi/drm/xe_drm.h
@@ -103,6 +103,7 @@ extern "C" {
 #define DRM_XE_WAIT_USER_FENCE		0x0a
 #define DRM_XE_OBSERVATION		0x0b
 #define DRM_XE_DEVMEM_OPEN		0x0c
+#define DRM_XE_MADVISE_PREFER_DEVMEM    0x0d
 
 /* Must be kept compact -- no holes */
 
@@ -119,6 +120,7 @@ extern "C" {
 #define DRM_IOCTL_XE_WAIT_USER_FENCE		DRM_IOWR(DRM_COMMAND_BASE + DRM_XE_WAIT_USER_FENCE, struct drm_xe_wait_user_fence)
 #define DRM_IOCTL_XE_OBSERVATION		DRM_IOW(DRM_COMMAND_BASE + DRM_XE_OBSERVATION, struct drm_xe_observation_param)
 #define DRM_IOCTL_XE_DEVMEM_OPEN                DRM_IOWR(DRM_COMMAND_BASE + DRM_XE_DEVMEM_OPEN, struct drm_xe_devmem_open)
+#define DRM_IOCTL_XE_MADVISE_PREFER_DEVMEM      DRM_IOW(DRM_COMMAND_BASE + DRM_XE_MADVISE_PREFER_DEVMEM, struct drm_xe_madvise_prefer_devmem)
 
 /**
  * DOC: Xe IOCTL Extensions
@@ -1990,6 +1992,14 @@ struct drm_xe_devmem_open {
 	__u64 reserved[2];
 };
 
+/**
+ * struct drm_xe_madvise_prefer_devmem - Set the preferred device memory for SVM migration
+ */
+struct drm_xe_madvise_prefer_devmem {
+	/** @extensions: Pointer to the first extension struct, if any */
+	__u64 extensions;
+	/** @vm_id: The VM whose CPU-address-mirror VMAs receive the advice */
+	__u32 vm_id;
+	/** @devmem_fd: Pagemap file descriptor obtained from the devmem_open IOCTL */
+	__u32 devmem_fd;
+	/** @reserved: Reserved, MBZ */
+	__u64 reserved[2];
+};
+
 #if defined(__cplusplus)
 }
 #endif
-- 
2.48.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [RFC PATCH 19/19] drm/xe: HAX: Use pcie p2p dma to test fast interconnect
  2025-03-12 21:03 [RFC PATCH 00/19] drm, drm/xe: Multi-device GPUSVM Thomas Hellström
                   ` (17 preceding siblings ...)
  2025-03-12 21:04 ` [RFC PATCH 18/19] drm/xe/uapi: HAX: Add the xe_madvise_prefer_devmem IOCTL Thomas Hellström
@ 2025-03-12 21:04 ` Thomas Hellström
  2025-03-13 10:19 ` [RFC PATCH 00/19] drm, drm/xe: Multi-device GPUSVM Christian König
  19 siblings, 0 replies; 26+ messages in thread
From: Thomas Hellström @ 2025-03-12 21:04 UTC (permalink / raw)
  To: intel-xe, dri-devel
  Cc: Thomas Hellström, himal.prasad.ghimiray, apopple, airlied,
	Simona Vetter, felix.kuehling, Matthew Brost,
	Christian König, dakr, Mrozek, Michal, Joonas Lahtinen

While this is known not to be the correct way to support PCIe P2P over
HMM and the DMA API, pretend that PCIe P2P is a driver-private fast
interconnect to demonstrate how multi-device SVM can be done.

This has been used to test SVM on a BMG client with a pagemap
created on a DG1 GPU over pcie p2p.

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/xe/xe_svm.c | 50 ++++++++++++++++++++++++++++++++++---
 drivers/gpu/drm/xe/xe_svm.h |  1 +
 2 files changed, 48 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_svm.c b/drivers/gpu/drm/xe/xe_svm.c
index 56c2c731be27..0b562b411fa4 100644
--- a/drivers/gpu/drm/xe/xe_svm.c
+++ b/drivers/gpu/drm/xe/xe_svm.c
@@ -3,6 +3,8 @@
  * Copyright © 2024 Intel Corporation
  */
 
+#include <linux/pci-p2pdma.h>
+
 #include <drm/drm_drv.h>
 #include <drm/drm_managed.h>
 #include <drm/drm_pagemap.h>
@@ -379,6 +381,25 @@ static u64 xe_page_to_dpa(struct page *page)
 	return dpa;
 }
 
+static u64 xe_page_to_pcie(struct page *page)
+{
+	struct xe_pagemap *xpagemap = xe_page_to_pagemap(page);
+	struct xe_vram_region *vr = xe_pagemap_to_vr(xpagemap);
+	u64 hpa_base = xpagemap->hpa_base;
+	u64 ioaddr;
+	struct xe_tile *tile = xe_vr_to_tile(vr);
+	u64 pfn = page_to_pfn(page);
+	u64 offset;
+
+	xe_tile_assert(tile, is_device_private_page(page));
+	xe_tile_assert(tile, (pfn << PAGE_SHIFT) >= hpa_base);
+
+	offset = (pfn << PAGE_SHIFT) - hpa_base;
+	ioaddr = vr->io_start + offset;
+
+	return ioaddr;
+}
+
 enum xe_svm_copy_dir {
 	XE_SVM_COPY_TO_VRAM,
 	XE_SVM_COPY_TO_SRAM,
@@ -940,13 +961,27 @@ xe_drm_pagemap_device_map(struct drm_pagemap *dpagemap,
 		addr = xe_page_to_dpa(page);
 		prot = XE_INTERCONNECT_VRAM;
 	} else {
-		addr = DMA_MAPPING_ERROR;
-		prot = 0;
+		addr = dma_map_resource(dev,
+					xe_page_to_pcie(page),
+					PAGE_SIZE << order, dir,
+					DMA_ATTR_SKIP_CPU_SYNC);
+		prot = XE_INTERCONNECT_P2P;
 	}
 
 	return drm_pagemap_device_addr_encode(addr, prot, order, dir);
 }
 
+static void xe_drm_pagemap_device_unmap(struct drm_pagemap *dpagemap,
+					struct device *dev,
+					struct drm_pagemap_device_addr addr)
+{
+	if (addr.proto != XE_INTERCONNECT_P2P)
+		return;
+
+	dma_unmap_resource(dev, addr.addr, PAGE_SIZE << addr.order,
+			   addr.dir, DMA_ATTR_SKIP_CPU_SYNC);
+}
+
 static void xe_pagemap_fini(struct xe_pagemap *xpagemap)
 {
 	struct dev_pagemap *pagemap = &xpagemap->pagemap;
@@ -1004,13 +1039,22 @@ static bool xe_has_interconnect(struct drm_pagemap_peer *peer1,
 	struct device *dev1 = xpagemap1->dpagemap.drm->dev;
 	struct device *dev2 = xpagemap2->dpagemap.drm->dev;
 
-	return dev1 == dev2;
+	if (dev1 == dev2)
+		return true;
+
+	/* Define this if your system can correctly identify pci_p2p capability */
+#ifdef XE_P2P_CAPABLE
+	return pci_p2pdma_distance(to_pci_dev(dev1), dev2, true) >= 0;
+#else
+	return true;
+#endif
 }
 
 static DRM_PAGEMAP_OWNER_LIST_DEFINE(xe_owner_list);
 
 static const struct drm_pagemap_ops xe_drm_pagemap_ops = {
 	.device_map = xe_drm_pagemap_device_map,
+	.device_unmap = xe_drm_pagemap_device_unmap,
 	.populate_mm = xe_drm_pagemap_populate_mm,
 	.destroy = xe_pagemap_destroy,
 };
diff --git a/drivers/gpu/drm/xe/xe_svm.h b/drivers/gpu/drm/xe/xe_svm.h
index 7c076c36c1c5..59b7a46f2bd9 100644
--- a/drivers/gpu/drm/xe/xe_svm.h
+++ b/drivers/gpu/drm/xe/xe_svm.h
@@ -13,6 +13,7 @@
 #include <drm/drm_pagemap_util.h>
 
 #define XE_INTERCONNECT_VRAM DRM_INTERCONNECT_DRIVER
+#define XE_INTERCONNECT_P2P (XE_INTERCONNECT_VRAM + 1)
 
 struct drm_device;
 struct drm_file;
-- 
2.48.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 00/19] drm, drm/xe: Multi-device GPUSVM
  2025-03-12 21:03 [RFC PATCH 00/19] drm, drm/xe: Multi-device GPUSVM Thomas Hellström
                   ` (18 preceding siblings ...)
  2025-03-12 21:04 ` [RFC PATCH 19/19] drm/xe: HAX: Use pcie p2p dma to test fast interconnect Thomas Hellström
@ 2025-03-13 10:19 ` Christian König
  2025-03-13 12:50   ` Thomas Hellström
  19 siblings, 1 reply; 26+ messages in thread
From: Christian König @ 2025-03-13 10:19 UTC (permalink / raw)
  To: Thomas Hellström, intel-xe, dri-devel
  Cc: himal.prasad.ghimiray, apopple, airlied, Simona Vetter,
	felix.kuehling, Matthew Brost, dakr, Mrozek, Michal,
	Joonas Lahtinen

Am 12.03.25 um 22:03 schrieb Thomas Hellström:
> This RFC implements and requests comments for a way to handle SVM with multi-device,
> typically with fast interconnects. It adds generic code and helpers in drm, and
> device-specific code for xe.
>
> For SVM, devices set up maps of device-private struct pages, using a struct dev_pagemap,
> The CPU virtual address space (mm), can then be set up using special page-table entries
> to point to such pages, but they can't be accessed directly by the CPU, but possibly
> by other devices using a fast interconnect. This series aims to provide helpers to
> identify pagemaps that take part in such a fast interconnect and to aid in migrating
> between them.
>
> This is initially done by augmenting the struct dev_pagemap with a struct drm_pagemap,
> and having the struct drm_pagemap implement a "populate_mm" method, where a region of
> the CPU virtual address space (mm) is populated with device_private pages from the
> dev_pagemap associated with the drm_pagemap, migrating data from system memory or other
> devices if necessary. The drm_pagemap_populate_mm() function is then typically called
> from a fault handler, using the struct drm_pagemap pointer of choice. It could be
> referencing a local drm_pagemap or a remote one. The migration is now completely done
> by drm_pagemap callbacks, (typically using a copy-engine local to the dev_pagemap local
> memory).

Up till here that makes sense. Maybe not necessary to be put into the DRM layer, but that is an implementation detail.

> In addition there are helpers to build a drm_pagemap UAPI using file-descripors
> representing struct drm_pagemaps, and a helper to register devices with a common
> fast interconnect. The UAPI is intended to be private to the device, but if drivers
> agree to identify struct drm_pagemaps by file descriptors one could in theory
> do cross-driver multi-device SVM if a use-case were found.

But this completely eludes me.

Why would you want a UAPI for representing pagemaps as file descriptors? Isn't it the kernel which enumerates the interconnects of the devices?

I mean we somehow need to expose those interconnects between devices to userspace, e.g. like amdgpu does with its XGMI connectors. But that is static for the hardware (unless HW is hot removed/added) and so I would assume exposed through sysfs.

Thanks,
Christian.

> The implementation for the Xe driver uses dynamic pagemaps which are created on first
> use and removed 5s after the last reference is gone. Pagemaps are revoked on
> device unbind, and data is then migrated to system.
>
> Status:
> This is a POC series. It has been tested with an IGT test soon to be published, with a
> DG1 drm_pagemap and a BattleMage SVM client. There is separate work ongoing for the
> gpu_madvise functionality.
>
> The Xe implementation of the "populate_mm()" callback is
> still rudimentary and doesn't migrate from foreign devices. It should be tuned to do
> smarter choices.
>
> Any feedback appreciated.
>
> Patch overview:
> Patch 1:
> - Extends the way the Xe driver can compile out SVM support and pagemaps.
> Patch 2:
> - Fixes an existing potential UAF in the Xe SVM code.
> Patch 3:
> - Introduces the drm_pagemap.c file and moves drm_pagemap functionality to it.
> Patch 4:
> - Adds a populate_mm op to drm_pagemap.
> Patch 5:
> - Implement Xe's version of the populate_mm op.
> Patch 6:
> - Refcount struct drm_pagemap.
> Patch 7:
> - Cleanup patch.
> Patch 8:
> - Add a bo_remove callback for Xe, Used during device unbind.
> Patch 9:
> - Add a drm_pagemap utility to calculate a common owner structure
> Patch 10:
> - Adopt GPUSVM to a (sort of) dynamic owner.
> Patch 11:
> - Xe calculates the dev_private owner using the drm_pagemap utility.
> Patch 12:
> - Update the Xe page-table code to handle per range mixed system / device_private placement.
> Patch 13:
> - Modify GPUSVM to allow such placements.
> Patch 14:
> - Add a preferred pagemap to use by the Xe fault handler.
> Patch 15:
> - Add a utility that converts between drm_pagemaps and file-descriptors and back.
> Patch 16:
> - Fix Xe so that also devices without fault capability can publish drm_pagemaps.
> Patch 17:
> - Add the devmem_open UAPI, creating a drm_pagemap file descriptor from a
>   (device, region) pair.
> Patch 18:
> - (Only for POC) Add an GPU madvise prefer_devmem IOCTL.
> Patch 19:
> - (Only for POC) Implement pcie p2p DMA as a fast interconnect and test.
>
> Matthew Brost (1):
>   drm/gpusvm, drm/pagemap: Move migration functionality to drm_pagemap
>
> Thomas Hellström (18):
>   drm/xe: Introduce CONFIG_DRM_XE_GPUSVM
>   drm/xe/svm: Fix a potential bo UAF
>   drm/pagemap: Add a populate_mm op
>   drm/xe: Implement and use the drm_pagemap populate_mm op
>   drm/pagemap, drm/xe: Add refcounting to struct drm_pagemap and manage
>     lifetime
>   drm/pagemap: Get rid of the struct
>     drm_pagemap_zdd::device_private_page_owner field
>   drm/xe/bo: Add a bo remove callback
>   drm/pagemap_util: Add a utility to assign an owner to a set of
>     interconnected gpus
>   drm/gpusvm, drm/xe: Move the device private owner to the
>     drm_gpusvm_ctx
>   drm/xe: Use the drm_pagemap_util helper to get a svm pagemap owner
>   drm/xe: Make the PT code handle placement per PTE rather than per vma
>     / range
>   drm/gpusvm: Allow mixed mappings
>   drm/xe: Add a preferred dpagemap
>   drm/pagemap/util: Add file descriptors pointing to struct drm_pagemap
>   drm/xe/migrate: Allow xe_migrate_vram() also on non-pagefault capable
>     devices
>   drm/xe/uapi: Add the devmem_open ioctl
>   drm/xe/uapi: HAX: Add the xe_madvise_prefer_devmem IOCTL
>   drm/xe: HAX: Use pcie p2p dma to test fast interconnect
>
>  Documentation/gpu/rfc/gpusvm.rst     |  12 +-
>  drivers/gpu/drm/Makefile             |   7 +-
>  drivers/gpu/drm/drm_gpusvm.c         | 782 +---------------------
>  drivers/gpu/drm/drm_pagemap.c        | 940 +++++++++++++++++++++++++++
>  drivers/gpu/drm/drm_pagemap_util.c   | 203 ++++++
>  drivers/gpu/drm/xe/Kconfig           |  24 +-
>  drivers/gpu/drm/xe/Makefile          |   2 +-
>  drivers/gpu/drm/xe/xe_bo.c           |  65 +-
>  drivers/gpu/drm/xe/xe_bo.h           |   2 +
>  drivers/gpu/drm/xe/xe_bo_types.h     |   2 +-
>  drivers/gpu/drm/xe/xe_device.c       |   8 +
>  drivers/gpu/drm/xe/xe_device_types.h |  30 +-
>  drivers/gpu/drm/xe/xe_migrate.c      |   8 +-
>  drivers/gpu/drm/xe/xe_pt.c           | 112 ++--
>  drivers/gpu/drm/xe/xe_query.c        |   2 +-
>  drivers/gpu/drm/xe/xe_svm.c          | 716 +++++++++++++++++---
>  drivers/gpu/drm/xe/xe_svm.h          | 158 ++++-
>  drivers/gpu/drm/xe/xe_tile.c         |  20 +-
>  drivers/gpu/drm/xe/xe_tile.h         |  33 +
>  drivers/gpu/drm/xe/xe_vm.c           |   6 +-
>  drivers/gpu/drm/xe/xe_vm_types.h     |   7 +
>  include/drm/drm_gpusvm.h             | 102 +--
>  include/drm/drm_pagemap.h            | 190 +++++-
>  include/drm/drm_pagemap_util.h       |  59 ++
>  include/uapi/drm/xe_drm.h            |  39 ++
>  25 files changed, 2458 insertions(+), 1071 deletions(-)
>  create mode 100644 drivers/gpu/drm/drm_pagemap.c
>  create mode 100644 drivers/gpu/drm/drm_pagemap_util.c
>  create mode 100644 include/drm/drm_pagemap_util.h
>


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 00/19] drm, drm/xe: Multi-device GPUSVM
  2025-03-13 10:19 ` [RFC PATCH 00/19] drm, drm/xe: Multi-device GPUSVM Christian König
@ 2025-03-13 12:50   ` Thomas Hellström
  2025-03-13 12:57     ` Christian König
  0 siblings, 1 reply; 26+ messages in thread
From: Thomas Hellström @ 2025-03-13 12:50 UTC (permalink / raw)
  To: Christian König, intel-xe, dri-devel
  Cc: himal.prasad.ghimiray, apopple, airlied, Simona Vetter,
	felix.kuehling, Matthew Brost, dakr, Mrozek, Michal,
	Joonas Lahtinen

Hi, Christian

On Thu, 2025-03-13 at 11:19 +0100, Christian König wrote:
> Am 12.03.25 um 22:03 schrieb Thomas Hellström:
> > This RFC implements and requests comments for a way to handle SVM
> > with multi-device,
> > typically with fast interconnects. It adds generic code and helpers
> > in drm, and
> > device-specific code for xe.
> > 
> > For SVM, devices set up maps of device-private struct pages, using
> > a struct dev_pagemap,
> > The CPU virtual address space (mm), can then be set up using
> > special page-table entries
> > to point to such pages, but they can't be accessed directly by the
> > CPU, but possibly
> > by other devices using a fast interconnect. This series aims to
> > provide helpers to
> > identify pagemaps that take part in such a fast interconnect and to
> > aid in migrating
> > between them.
> > 
> > This is initially done by augmenting the struct dev_pagemap with a
> > struct drm_pagemap,
> > and having the struct drm_pagemap implement a "populate_mm" method,
> > where a region of
> > the CPU virtual address space (mm) is populated with device_private
> > pages from the
> > dev_pagemap associated with the drm_pagemap, migrating data from
> > system memory or other
> > devices if necessary. The drm_pagemap_populate_mm() function is
> > then typically called
> > from a fault handler, using the struct drm_pagemap pointer of
> > choice. It could be
> > referencing a local drm_pagemap or a remote one. The migration is
> > now completely done
> > by drm_pagemap callbacks, (typically using a copy-engine local to
> > the dev_pagemap local
> > memory).
> 
> Up till here that makes sense. Maybe not necessary to be put into the
> DRM layer, but that is an implementation detail.
> 
> > In addition there are helpers to build a drm_pagemap UAPI using
> > file-descripors
> > representing struct drm_pagemaps, and a helper to register devices
> > with a common
> > fast interconnect. The UAPI is intended to be private to the
> > device, but if drivers
> > agree to identify struct drm_pagemaps by file descriptors one could
> > in theory
> > do cross-driver multi-device SVM if a use-case were found.
> 
> But this completely eludes me.
> 
> Why would you want an UAPI for representing pagemaps as file
> descriptors? Isn't it the kernel which enumerates the interconnects
> of the devices?
> 
> I mean we somehow need to expose those interconnects between devices
> to userspace, e.g. like amdgpu does with it's XGMI connectors. But
> that is static for the hardware (unless HW is hot removed/added) and
> so I would assume exposed through sysfs.

Thanks for the feedback.

The idea here is not to expose the interconnects but rather to give
user-space a way to identify a drm_pagemap, together with some level of
access and lifetime control.

For Xe, if an application wants to use a particular drm_pagemap it
calls an ioctl:

pagemap_fd = drm_xe_ioctl_pagemap_open(exporting_device_fd, memory_region);

and then, when it's no longer used:

close(pagemap_fd);

To use it for a memory range, the intended idea is to call a gpu
madvise ioctl:

err = drm_xe_ioctl_gpu_madvise(local_device_fd, range, pagemap_fd);

Now, if there is no fast interconnect between the two, the madvise call
could just return an error. All this ofc assumes that user-space is
somehow aware of the fast interconnect topology but how that is exposed
is beyond the scope of this first series. (Suggestions welcome).
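
To make the intended flow a bit more concrete, here is a rough
user-space sketch. The two wrappers are just the proposed calls named
above and don't exist yet; their prototypes, the region index and the
error handling below are made-up placeholders, so treat this purely as
pseudo-code for the proposed UAPI:

#include <stddef.h>
#include <unistd.h>

/* Proposed wrappers from this series; prototypes guessed for illustration. */
extern int drm_xe_ioctl_pagemap_open(int device_fd, int memory_region);
extern int drm_xe_ioctl_gpu_madvise(int device_fd, void *start, size_t length,
				    int pagemap_fd);

int prefer_remote_vram(int local_fd, int export_fd, void *start, size_t length)
{
	/* Ask the exporting device for an fd representing one of its VRAM regions. */
	int pagemap_fd = drm_xe_ioctl_pagemap_open(export_fd, 0 /* memory_region */);

	if (pagemap_fd < 0)
		return pagemap_fd;	/* e.g. no access to the exporting device */

	/* Ask the local device to prefer that pagemap for the CPU VA range. */
	if (drm_xe_ioctl_gpu_madvise(local_fd, start, length, pagemap_fd)) {
		/* e.g. no fast interconnect between the two devices */
		close(pagemap_fd);
		return -1;
	}

	/*
	 * Keep the fd as long as the placement is used; once all fds
	 * pointing at the pagemap are closed it may be torn down.
	 */
	return pagemap_fd;
}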

The advantages of the above approach are:
1) We get some level of access control. If the user doesn't have access
to the exporting device, he/she can't obtain a pagemap file descriptor.

2) Lifetime control. The pagemaps are memory-hungry and also take
considerable time to set up and tear down, so it makes sense to keep
them around only while they are actually referenced.

3) It's a driver-independent approach.

One could ofc use a different approach by feeding the gpu_madvise()
ioctl with a remote device file descriptor and whatever information is
needed for the remote device to identify the drm_pagemap. That would
not be driver independent, though. Not sure how important that is.

/Thomas


> 
> Thanks,
> Christian.
> 
> > The implementation for the Xe driver uses dynamic pagemaps which
> > are created on first
> > use and removed 5s after the last reference is gone. Pagemaps are
> > revoked on
> > device unbind, and data is then migrated to system.
> > 
> > Status:
> > This is a POC series. It has been tested with an IGT test soon to
> > be published, with a
> > DG1 drm_pagemap and a BattleMage SVM client. There is separate work
> > ongoing for the
> > gpu_madvise functionality.
> > 
> > The Xe implementation of the "populate_mm()" callback is
> > still rudimentary and doesn't migrate from foreign devices. It
> > should be tuned to do
> > smarter choices.
> > 
> > Any feedback appreciated.
> > 
> > Patch overview:
> > Patch 1:
> > - Extends the way the Xe driver can compile out SVM support and
> > pagemaps.
> > Patch 2:
> > - Fixes an existing potential UAF in the Xe SVM code.
> > Patch 3:
> > - Introduces the drm_pagemap.c file and moves drm_pagemap
> > functionality to it.
> > Patch 4:
> > - Adds a populate_mm op to drm_pagemap.
> > Patch 5:
> > - Implement Xe's version of the populate_mm op.
> > Patch 6:
> > - Refcount struct drm_pagemap.
> > Patch 7:
> > - Cleanup patch.
> > Patch 8:
> > - Add a bo_remove callback for Xe, Used during device unbind.
> > Patch 9:
> > - Add a drm_pagemap utility to calculate a common owner structure
> > Patch 10:
> > - Adopt GPUSVM to a (sort of) dynamic owner.
> > Patch 11:
> > - Xe calculates the dev_private owner using the drm_pagemap
> > utility.
> > Patch 12:
> > - Update the Xe page-table code to handle per range mixed system /
> > device_private placement.
> > Patch 13:
> > - Modify GPUSVM to allow such placements.
> > Patch 14:
> > - Add a preferred pagemap to use by the Xe fault handler.
> > Patch 15:
> > - Add a utility that converts between drm_pagemaps and file-
> > descriptors and back.
> > Patch 16:
> > - Fix Xe so that also devices without fault capability can publish
> > drm_pagemaps.
> > Patch 17:
> > - Add the devmem_open UAPI, creating a drm_pagemap file descriptor
> > from a
> >   (device, region) pair.
> > Patch 18:
> > - (Only for POC) Add an GPU madvise prefer_devmem IOCTL.
> > Patch 19:
> > - (Only for POC) Implement pcie p2p DMA as a fast interconnect and
> > test.
> > 
> > Matthew Brost (1):
> >   drm/gpusvm, drm/pagemap: Move migration functionality to
> > drm_pagemap
> > 
> > Thomas Hellström (18):
> >   drm/xe: Introduce CONFIG_DRM_XE_GPUSVM
> >   drm/xe/svm: Fix a potential bo UAF
> >   drm/pagemap: Add a populate_mm op
> >   drm/xe: Implement and use the drm_pagemap populate_mm op
> >   drm/pagemap, drm/xe: Add refcounting to struct drm_pagemap and
> > manage
> >     lifetime
> >   drm/pagemap: Get rid of the struct
> >     drm_pagemap_zdd::device_private_page_owner field
> >   drm/xe/bo: Add a bo remove callback
> >   drm/pagemap_util: Add a utility to assign an owner to a set of
> >     interconnected gpus
> >   drm/gpusvm, drm/xe: Move the device private owner to the
> >     drm_gpusvm_ctx
> >   drm/xe: Use the drm_pagemap_util helper to get a svm pagemap
> > owner
> >   drm/xe: Make the PT code handle placement per PTE rather than per
> > vma
> >     / range
> >   drm/gpusvm: Allow mixed mappings
> >   drm/xe: Add a preferred dpagemap
> >   drm/pagemap/util: Add file descriptors pointing to struct
> > drm_pagemap
> >   drm/xe/migrate: Allow xe_migrate_vram() also on non-pagefault
> > capable
> >     devices
> >   drm/xe/uapi: Add the devmem_open ioctl
> >   drm/xe/uapi: HAX: Add the xe_madvise_prefer_devmem IOCTL
> >   drm/xe: HAX: Use pcie p2p dma to test fast interconnect
> > 
> >  Documentation/gpu/rfc/gpusvm.rst     |  12 +-
> >  drivers/gpu/drm/Makefile             |   7 +-
> >  drivers/gpu/drm/drm_gpusvm.c         | 782 +---------------------
> >  drivers/gpu/drm/drm_pagemap.c        | 940
> > +++++++++++++++++++++++++++
> >  drivers/gpu/drm/drm_pagemap_util.c   | 203 ++++++
> >  drivers/gpu/drm/xe/Kconfig           |  24 +-
> >  drivers/gpu/drm/xe/Makefile          |   2 +-
> >  drivers/gpu/drm/xe/xe_bo.c           |  65 +-
> >  drivers/gpu/drm/xe/xe_bo.h           |   2 +
> >  drivers/gpu/drm/xe/xe_bo_types.h     |   2 +-
> >  drivers/gpu/drm/xe/xe_device.c       |   8 +
> >  drivers/gpu/drm/xe/xe_device_types.h |  30 +-
> >  drivers/gpu/drm/xe/xe_migrate.c      |   8 +-
> >  drivers/gpu/drm/xe/xe_pt.c           | 112 ++--
> >  drivers/gpu/drm/xe/xe_query.c        |   2 +-
> >  drivers/gpu/drm/xe/xe_svm.c          | 716 +++++++++++++++++---
> >  drivers/gpu/drm/xe/xe_svm.h          | 158 ++++-
> >  drivers/gpu/drm/xe/xe_tile.c         |  20 +-
> >  drivers/gpu/drm/xe/xe_tile.h         |  33 +
> >  drivers/gpu/drm/xe/xe_vm.c           |   6 +-
> >  drivers/gpu/drm/xe/xe_vm_types.h     |   7 +
> >  include/drm/drm_gpusvm.h             | 102 +--
> >  include/drm/drm_pagemap.h            | 190 +++++-
> >  include/drm/drm_pagemap_util.h       |  59 ++
> >  include/uapi/drm/xe_drm.h            |  39 ++
> >  25 files changed, 2458 insertions(+), 1071 deletions(-)
> >  create mode 100644 drivers/gpu/drm/drm_pagemap.c
> >  create mode 100644 drivers/gpu/drm/drm_pagemap_util.c
> >  create mode 100644 include/drm/drm_pagemap_util.h
> > 
> 


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 00/19] drm, drm/xe: Multi-device GPUSVM
  2025-03-13 12:50   ` Thomas Hellström
@ 2025-03-13 12:57     ` Christian König
  2025-03-13 15:55       ` Thomas Hellström
  2025-03-17  9:20       ` Thomas Hellström
  0 siblings, 2 replies; 26+ messages in thread
From: Christian König @ 2025-03-13 12:57 UTC (permalink / raw)
  To: Thomas Hellström, intel-xe, dri-devel
  Cc: himal.prasad.ghimiray, apopple, airlied, Simona Vetter,
	felix.kuehling, Matthew Brost, dakr, Mrozek, Michal,
	Joonas Lahtinen

Am 13.03.25 um 13:50 schrieb Thomas Hellström:
> Hi, Christian
>
> On Thu, 2025-03-13 at 11:19 +0100, Christian König wrote:
>> Am 12.03.25 um 22:03 schrieb Thomas Hellström:
>>> This RFC implements and requests comments for a way to handle SVM
>>> with multi-device,
>>> typically with fast interconnects. It adds generic code and helpers
>>> in drm, and
>>> device-specific code for xe.
>>>
>>> For SVM, devices set up maps of device-private struct pages, using
>>> a struct dev_pagemap,
>>> The CPU virtual address space (mm), can then be set up using
>>> special page-table entries
>>> to point to such pages, but they can't be accessed directly by the
>>> CPU, but possibly
>>> by other devices using a fast interconnect. This series aims to
>>> provide helpers to
>>> identify pagemaps that take part in such a fast interconnect and to
>>> aid in migrating
>>> between them.
>>>
>>> This is initially done by augmenting the struct dev_pagemap with a
>>> struct drm_pagemap,
>>> and having the struct drm_pagemap implement a "populate_mm" method,
>>> where a region of
>>> the CPU virtual address space (mm) is populated with device_private
>>> pages from the
>>> dev_pagemap associated with the drm_pagemap, migrating data from
>>> system memory or other
>>> devices if necessary. The drm_pagemap_populate_mm() function is
>>> then typically called
>>> from a fault handler, using the struct drm_pagemap pointer of
>>> choice. It could be
>>> referencing a local drm_pagemap or a remote one. The migration is
>>> now completely done
>>> by drm_pagemap callbacks, (typically using a copy-engine local to
>>> the dev_pagemap local
>>> memory).
>> Up till here that makes sense. Maybe not necessary to be put into the
>> DRM layer, but that is an implementation detail.
>>
>>> In addition there are helpers to build a drm_pagemap UAPI using
>>> file-descripors
>>> representing struct drm_pagemaps, and a helper to register devices
>>> with a common
>>> fast interconnect. The UAPI is intended to be private to the
>>> device, but if drivers
>>> agree to identify struct drm_pagemaps by file descriptors one could
>>> in theory
>>> do cross-driver multi-device SVM if a use-case were found.
>> But this completely eludes me.
>>
>> Why would you want an UAPI for representing pagemaps as file
>> descriptors? Isn't it the kernel which enumerates the interconnects
>> of the devices?
>>
>> I mean we somehow need to expose those interconnects between devices
>> to userspace, e.g. like amdgpu does with it's XGMI connectors. But
>> that is static for the hardware (unless HW is hot removed/added) and
>> so I would assume exposed through sysfs.
> Thanks for the feedback.
>
> The idea here is not to expose the interconnects but rather have a way
> for user-space to identify a drm_pagemap and some level of access- and
> lifetime control.

Well, that's what I get, but I just don't get why.

I mean, when you want to have the pagemap as an optional feature you can turn on and off, I would say make that a sysfs file.

It's a global feature anyway and not bound in any way to the file descriptor, isn't it?

> For Xe, If an application wants to use a particular drm_pagemap it
> calls an ioctl:
>
> pagemap_fd = drm_xe_ioctl_pagemap_open(exporting_device_fd,
> memory_region);

Well, should userspace deal with physical addresses here, or what exactly is memory_region?

Regards,
Christian.

>
> And then when it's no longer used
> close(pagemap_fd)
>
> To use it for a memory range, the intended idea is call gpu madvise
> ioctl:
>  
> err = drm_xe_ioctl_gpu_madvise(local_device_fd, range, pagemap_fd);
>
> Now, if there is no fast interconnect between the two, the madvise call
> could just return an error. All this ofc assumes that user-space is
> somehow aware of the fast interconnect topology but how that is exposed
> is beyond the scope of this first series. (Suggestions welcome).
>
> The advantage of the above approach is
> 1) We get some level of access control. If the user doesn't have access
> to the exporting device, he/she can't obtain a pagemap file descriptor.
>
> 2) Lifetime control. The pagemaps are memory hungry, but also take
> considerable time to set up and tear down.
>
> 3) It's a driver-independent approach.
>
> One could ofc use a different approach by feeding the gpu_madvise()
> ioctl with a remote device file descriptor and whatever information is
> needed for the remote device to identify the drm_pagemap. That would
> not be driver independent, though. Not sure how important that is.
>
> /Thomas
>
>
>> Thanks,
>> Christian.
>>
>>> The implementation for the Xe driver uses dynamic pagemaps which
>>> are created on first
>>> use and removed 5s after the last reference is gone. Pagemaps are
>>> revoked on
>>> device unbind, and data is then migrated to system.
>>>
>>> Status:
>>> This is a POC series. It has been tested with an IGT test soon to
>>> be published, with a
>>> DG1 drm_pagemap and a BattleMage SVM client. There is separate work
>>> ongoing for the
>>> gpu_madvise functionality.
>>>
>>> The Xe implementation of the "populate_mm()" callback is
>>> still rudimentary and doesn't migrate from foreign devices. It
>>> should be tuned to do
>>> smarter choices.
>>>
>>> Any feedback appreciated.
>>>
>>> Patch overview:
>>> Patch 1:
>>> - Extends the way the Xe driver can compile out SVM support and
>>> pagemaps.
>>> Patch 2:
>>> - Fixes an existing potential UAF in the Xe SVM code.
>>> Patch 3:
>>> - Introduces the drm_pagemap.c file and moves drm_pagemap
>>> functionality to it.
>>> Patch 4:
>>> - Adds a populate_mm op to drm_pagemap.
>>> Patch 5:
>>> - Implement Xe's version of the populate_mm op.
>>> Patch 6:
>>> - Refcount struct drm_pagemap.
>>> Patch 7:
>>> - Cleanup patch.
>>> Patch 8:
>>> - Add a bo_remove callback for Xe, Used during device unbind.
>>> Patch 9:
>>> - Add a drm_pagemap utility to calculate a common owner structure
>>> Patch 10:
>>> - Adopt GPUSVM to a (sort of) dynamic owner.
>>> Patch 11:
>>> - Xe calculates the dev_private owner using the drm_pagemap
>>> utility.
>>> Patch 12:
>>> - Update the Xe page-table code to handle per range mixed system /
>>> device_private placement.
>>> Patch 13:
>>> - Modify GPUSVM to allow such placements.
>>> Patch 14:
>>> - Add a preferred pagemap to use by the Xe fault handler.
>>> Patch 15:
>>> - Add a utility that converts between drm_pagemaps and file-
>>> descriptors and back.
>>> Patch 16:
>>> - Fix Xe so that also devices without fault capability can publish
>>> drm_pagemaps.
>>> Patch 17:
>>> - Add the devmem_open UAPI, creating a drm_pagemap file descriptor
>>> from a
>>>   (device, region) pair.
>>> Patch 18:
>>> - (Only for POC) Add an GPU madvise prefer_devmem IOCTL.
>>> Patch 19:
>>> - (Only for POC) Implement pcie p2p DMA as a fast interconnect and
>>> test.
>>>
>>> Matthew Brost (1):
>>>   drm/gpusvm, drm/pagemap: Move migration functionality to
>>> drm_pagemap
>>>
>>> Thomas Hellström (18):
>>>   drm/xe: Introduce CONFIG_DRM_XE_GPUSVM
>>>   drm/xe/svm: Fix a potential bo UAF
>>>   drm/pagemap: Add a populate_mm op
>>>   drm/xe: Implement and use the drm_pagemap populate_mm op
>>>   drm/pagemap, drm/xe: Add refcounting to struct drm_pagemap and
>>> manage
>>>     lifetime
>>>   drm/pagemap: Get rid of the struct
>>>     drm_pagemap_zdd::device_private_page_owner field
>>>   drm/xe/bo: Add a bo remove callback
>>>   drm/pagemap_util: Add a utility to assign an owner to a set of
>>>     interconnected gpus
>>>   drm/gpusvm, drm/xe: Move the device private owner to the
>>>     drm_gpusvm_ctx
>>>   drm/xe: Use the drm_pagemap_util helper to get a svm pagemap
>>> owner
>>>   drm/xe: Make the PT code handle placement per PTE rather than per
>>> vma
>>>     / range
>>>   drm/gpusvm: Allow mixed mappings
>>>   drm/xe: Add a preferred dpagemap
>>>   drm/pagemap/util: Add file descriptors pointing to struct
>>> drm_pagemap
>>>   drm/xe/migrate: Allow xe_migrate_vram() also on non-pagefault
>>> capable
>>>     devices
>>>   drm/xe/uapi: Add the devmem_open ioctl
>>>   drm/xe/uapi: HAX: Add the xe_madvise_prefer_devmem IOCTL
>>>   drm/xe: HAX: Use pcie p2p dma to test fast interconnect
>>>
>>>  Documentation/gpu/rfc/gpusvm.rst     |  12 +-
>>>  drivers/gpu/drm/Makefile             |   7 +-
>>>  drivers/gpu/drm/drm_gpusvm.c         | 782 +---------------------
>>>  drivers/gpu/drm/drm_pagemap.c        | 940
>>> +++++++++++++++++++++++++++
>>>  drivers/gpu/drm/drm_pagemap_util.c   | 203 ++++++
>>>  drivers/gpu/drm/xe/Kconfig           |  24 +-
>>>  drivers/gpu/drm/xe/Makefile          |   2 +-
>>>  drivers/gpu/drm/xe/xe_bo.c           |  65 +-
>>>  drivers/gpu/drm/xe/xe_bo.h           |   2 +
>>>  drivers/gpu/drm/xe/xe_bo_types.h     |   2 +-
>>>  drivers/gpu/drm/xe/xe_device.c       |   8 +
>>>  drivers/gpu/drm/xe/xe_device_types.h |  30 +-
>>>  drivers/gpu/drm/xe/xe_migrate.c      |   8 +-
>>>  drivers/gpu/drm/xe/xe_pt.c           | 112 ++--
>>>  drivers/gpu/drm/xe/xe_query.c        |   2 +-
>>>  drivers/gpu/drm/xe/xe_svm.c          | 716 +++++++++++++++++---
>>>  drivers/gpu/drm/xe/xe_svm.h          | 158 ++++-
>>>  drivers/gpu/drm/xe/xe_tile.c         |  20 +-
>>>  drivers/gpu/drm/xe/xe_tile.h         |  33 +
>>>  drivers/gpu/drm/xe/xe_vm.c           |   6 +-
>>>  drivers/gpu/drm/xe/xe_vm_types.h     |   7 +
>>>  include/drm/drm_gpusvm.h             | 102 +--
>>>  include/drm/drm_pagemap.h            | 190 +++++-
>>>  include/drm/drm_pagemap_util.h       |  59 ++
>>>  include/uapi/drm/xe_drm.h            |  39 ++
>>>  25 files changed, 2458 insertions(+), 1071 deletions(-)
>>>  create mode 100644 drivers/gpu/drm/drm_pagemap.c
>>>  create mode 100644 drivers/gpu/drm/drm_pagemap_util.c
>>>  create mode 100644 include/drm/drm_pagemap_util.h
>>>


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 00/19] drm, drm/xe: Multi-device GPUSVM
  2025-03-13 12:57     ` Christian König
@ 2025-03-13 15:55       ` Thomas Hellström
  2025-03-17  9:20       ` Thomas Hellström
  1 sibling, 0 replies; 26+ messages in thread
From: Thomas Hellström @ 2025-03-13 15:55 UTC (permalink / raw)
  To: Christian König, intel-xe, dri-devel
  Cc: himal.prasad.ghimiray, apopple, airlied, Simona Vetter,
	felix.kuehling, Matthew Brost, dakr, Mrozek, Michal,
	Joonas Lahtinen

On Thu, 2025-03-13 at 13:57 +0100, Christian König wrote:
> Am 13.03.25 um 13:50 schrieb Thomas Hellström:
> > Hi, Christian
> > 
> > On Thu, 2025-03-13 at 11:19 +0100, Christian König wrote:
> > > Am 12.03.25 um 22:03 schrieb Thomas Hellström:
> > > > This RFC implements and requests comments for a way to handle
> > > > SVM
> > > > with multi-device,
> > > > typically with fast interconnects. It adds generic code and
> > > > helpers
> > > > in drm, and
> > > > device-specific code for xe.
> > > > 
> > > > For SVM, devices set up maps of device-private struct pages,
> > > > using
> > > > a struct dev_pagemap,
> > > > The CPU virtual address space (mm), can then be set up using
> > > > special page-table entries
> > > > to point to such pages, but they can't be accessed directly by
> > > > the
> > > > CPU, but possibly
> > > > by other devices using a fast interconnect. This series aims to
> > > > provide helpers to
> > > > identify pagemaps that take part in such a fast interconnect
> > > > and to
> > > > aid in migrating
> > > > between them.
> > > > 
> > > > This is initially done by augmenting the struct dev_pagemap
> > > > with a
> > > > struct drm_pagemap,
> > > > and having the struct drm_pagemap implement a "populate_mm"
> > > > method,
> > > > where a region of
> > > > the CPU virtual address space (mm) is populated with
> > > > device_private
> > > > pages from the
> > > > dev_pagemap associated with the drm_pagemap, migrating data
> > > > from
> > > > system memory or other
> > > > devices if necessary. The drm_pagemap_populate_mm() function is
> > > > then typically called
> > > > from a fault handler, using the struct drm_pagemap pointer of
> > > > choice. It could be
> > > > referencing a local drm_pagemap or a remote one. The migration
> > > > is
> > > > now completely done
> > > > by drm_pagemap callbacks, (typically using a copy-engine local
> > > > to
> > > > the dev_pagemap local
> > > > memory).
> > > Up till here that makes sense. Maybe not necessary to be put into
> > > the
> > > DRM layer, but that is an implementation detail.
> > > 
> > > > In addition there are helpers to build a drm_pagemap UAPI using
> > > > file-descripors
> > > > representing struct drm_pagemaps, and a helper to register
> > > > devices
> > > > with a common
> > > > fast interconnect. The UAPI is intended to be private to the
> > > > device, but if drivers
> > > > agree to identify struct drm_pagemaps by file descriptors one
> > > > could
> > > > in theory
> > > > do cross-driver multi-device SVM if a use-case were found.
> > > But this completely eludes me.
> > > 
> > > Why would you want an UAPI for representing pagemaps as file
> > > descriptors? Isn't it the kernel which enumerates the
> > > interconnects
> > > of the devices?
> > > 
> > > I mean we somehow need to expose those interconnects between
> > > devices
> > > to userspace, e.g. like amdgpu does with it's XGMI connectors.
> > > But
> > > that is static for the hardware (unless HW is hot removed/added)
> > > and
> > > so I would assume exposed through sysfs.
> > Thanks for the feedback.
> > 
> > The idea here is not to expose the interconnects but rather have a
> > way
> > for user-space to identify a drm_pagemap and some level of access-
> > and
> > lifetime control.
> 
> Well that's what I get I just don't get why?
> 
> I mean when you want to have the pagemap as optional feature you can
> turn on and off I would say make that a sysfs file.
> 
> It's a global feature anyway and not bound in any way to the file
> descriptor, isn't it?

As it is currently coded in Xe, the drm_pagemap is revoked when the
struct files backing the fds are released (after a delay, that is, to
avoid repeatedly destroying and re-creating the pagemap).
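
As a minimal sketch of that delayed-teardown pattern (the struct layout
and helper names below are trimmed-down placeholders; in the series the
teardown is actually driven through the drm_pagemap refcount and its
destroy callback):

struct xe_pagemap {
	struct kref refcount;             /* dropped when the last fd goes away */
	struct delayed_work destroy_work; /* deferred, cancelable teardown */
	/* ... dev_pagemap, backing VRAM region, ... */
};

static void xe_pagemap_destroy_work(struct work_struct *work)
{
	struct xe_pagemap *xpagemap =
		container_of(work, struct xe_pagemap, destroy_work.work);

	xe_pagemap_fini(xpagemap);	/* unmap the device-private pages etc. */
	kfree(xpagemap);
}

/* Called via kref_put(&xpagemap->refcount, xe_pagemap_last_put); */
static void xe_pagemap_last_put(struct kref *ref)
{
	struct xe_pagemap *xpagemap =
		container_of(ref, struct xe_pagemap, refcount);

	/* Keep the pagemap cached for a few seconds in case it is re-opened. */
	queue_delayed_work(system_wq, &xpagemap->destroy_work,
			   secs_to_jiffies(5));
}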

I'll take the sysfs switch suggestion back to the team and discuss
what the preferred option is.

> 
> > For Xe, If an application wants to use a particular drm_pagemap it
> > calls an ioctl:
> > 
> > pagemap_fd = drm_xe_ioctl_pagemap_open(exporting_device_fd,
> > memory_region);
> 
> Well should userspace deal with physical addresses here, or what
> exactly is memory_region here?

On some hardware we have multiple VRAM banks per device, one per
"tile". The memory_region selects which one to use.

/Thomas


> 
> Regards,
> Christian.
> 
> > 
> > And then when it's no longer used
> > close(pagemap_fd)
> > 
> > To use it for a memory range, the intended idea is call gpu madvise
> > ioctl:
> >  
> > err = drm_xe_ioctl_gpu_madvise(local_device_fd, range, pagemap_fd);
> > 
> > Now, if there is no fast interconnect between the two, the madvise
> > call
> > could just return an error. All this ofc assumes that user-space is
> > somehow aware of the fast interconnect topology but how that is
> > exposed
> > is beyond the scope of this first series. (Suggestions welcome).
> > 
> > The advantage of the above approach is
> > 1) We get some level of access control. If the user doesn't have
> > access
> > to the exporting device, he/she can't obtain a pagemap file
> > descriptor.
> > 
> > 2) Lifetime control. The pagemaps are memory hungry, but also take
> > considerable time to set up and tear down.
> > 
> > 3) It's a driver-independent approach.
> > 
> > One could ofc use a different approach by feeding the gpu_madvise()
> > ioctl with a remote device file descriptor and whatever information
> > is
> > needed for the remote device to identify the drm_pagemap. That
> > would
> > not be driver independent, though. Not sure how important that is.
> > 
> > /Thomas
> > 
> > 
> > > Thanks,
> > > Christian.
> > > 
> > > > The implementation for the Xe driver uses dynamic pagemaps
> > > > which
> > > > are created on first
> > > > use and removed 5s after the last reference is gone. Pagemaps
> > > > are
> > > > revoked on
> > > > device unbind, and data is then migrated to system.
> > > > 
> > > > Status:
> > > > This is a POC series. It has been tested with an IGT test soon
> > > > to
> > > > be published, with a
> > > > DG1 drm_pagemap and a BattleMage SVM client. There is separate
> > > > work
> > > > ongoing for the
> > > > gpu_madvise functionality.
> > > > 
> > > > The Xe implementation of the "populate_mm()" callback is
> > > > still rudimentary and doesn't migrate from foreign devices. It
> > > > should be tuned to do
> > > > smarter choices.
> > > > 
> > > > Any feedback appreciated.
> > > > 
> > > > Patch overview:
> > > > Patch 1:
> > > > - Extends the way the Xe driver can compile out SVM support and
> > > > pagemaps.
> > > > Patch 2:
> > > > - Fixes an existing potential UAF in the Xe SVM code.
> > > > Patch 3:
> > > > - Introduces the drm_pagemap.c file and moves drm_pagemap
> > > > functionality to it.
> > > > Patch 4:
> > > > - Adds a populate_mm op to drm_pagemap.
> > > > Patch 5:
> > > > - Implement Xe's version of the populate_mm op.
> > > > Patch 6:
> > > > - Refcount struct drm_pagemap.
> > > > Patch 7:
> > > > - Cleanup patch.
> > > > Patch 8:
> > > > - Add a bo_remove callback for Xe, Used during device unbind.
> > > > Patch 9:
> > > > - Add a drm_pagemap utility to calculate a common owner
> > > > structure
> > > > Patch 10:
> > > > - Adopt GPUSVM to a (sort of) dynamic owner.
> > > > Patch 11:
> > > > - Xe calculates the dev_private owner using the drm_pagemap
> > > > utility.
> > > > Patch 12:
> > > > - Update the Xe page-table code to handle per range mixed
> > > > system /
> > > > device_private placement.
> > > > Patch 13:
> > > > - Modify GPUSVM to allow such placements.
> > > > Patch 14:
> > > > - Add a preferred pagemap to use by the Xe fault handler.
> > > > Patch 15:
> > > > - Add a utility that converts between drm_pagemaps and file-
> > > > descriptors and back.
> > > > Patch 16:
> > > > - Fix Xe so that also devices without fault capability can
> > > > publish
> > > > drm_pagemaps.
> > > > Patch 17:
> > > > - Add the devmem_open UAPI, creating a drm_pagemap file
> > > > descriptor
> > > > from a
> > > >   (device, region) pair.
> > > > Patch 18:
> > > > - (Only for POC) Add an GPU madvise prefer_devmem IOCTL.
> > > > Patch 19:
> > > > - (Only for POC) Implement pcie p2p DMA as a fast interconnect
> > > > and
> > > > test.
> > > > 
> > > > Matthew Brost (1):
> > > >   drm/gpusvm, drm/pagemap: Move migration functionality to
> > > > drm_pagemap
> > > > 
> > > > Thomas Hellström (18):
> > > >   drm/xe: Introduce CONFIG_DRM_XE_GPUSVM
> > > >   drm/xe/svm: Fix a potential bo UAF
> > > >   drm/pagemap: Add a populate_mm op
> > > >   drm/xe: Implement and use the drm_pagemap populate_mm op
> > > >   drm/pagemap, drm/xe: Add refcounting to struct drm_pagemap
> > > > and
> > > > manage
> > > >     lifetime
> > > >   drm/pagemap: Get rid of the struct
> > > >     drm_pagemap_zdd::device_private_page_owner field
> > > >   drm/xe/bo: Add a bo remove callback
> > > >   drm/pagemap_util: Add a utility to assign an owner to a set
> > > > of
> > > >     interconnected gpus
> > > >   drm/gpusvm, drm/xe: Move the device private owner to the
> > > >     drm_gpusvm_ctx
> > > >   drm/xe: Use the drm_pagemap_util helper to get a svm pagemap
> > > > owner
> > > >   drm/xe: Make the PT code handle placement per PTE rather than
> > > > per
> > > > vma
> > > >     / range
> > > >   drm/gpusvm: Allow mixed mappings
> > > >   drm/xe: Add a preferred dpagemap
> > > >   drm/pagemap/util: Add file descriptors pointing to struct
> > > > drm_pagemap
> > > >   drm/xe/migrate: Allow xe_migrate_vram() also on non-pagefault
> > > > capable
> > > >     devices
> > > >   drm/xe/uapi: Add the devmem_open ioctl
> > > >   drm/xe/uapi: HAX: Add the xe_madvise_prefer_devmem IOCTL
> > > >   drm/xe: HAX: Use pcie p2p dma to test fast interconnect
> > > > 
> > > >  Documentation/gpu/rfc/gpusvm.rst     |  12 +-
> > > >  drivers/gpu/drm/Makefile             |   7 +-
> > > >  drivers/gpu/drm/drm_gpusvm.c         | 782 +------------------
> > > > ---
> > > >  drivers/gpu/drm/drm_pagemap.c        | 940
> > > > +++++++++++++++++++++++++++
> > > >  drivers/gpu/drm/drm_pagemap_util.c   | 203 ++++++
> > > >  drivers/gpu/drm/xe/Kconfig           |  24 +-
> > > >  drivers/gpu/drm/xe/Makefile          |   2 +-
> > > >  drivers/gpu/drm/xe/xe_bo.c           |  65 +-
> > > >  drivers/gpu/drm/xe/xe_bo.h           |   2 +
> > > >  drivers/gpu/drm/xe/xe_bo_types.h     |   2 +-
> > > >  drivers/gpu/drm/xe/xe_device.c       |   8 +
> > > >  drivers/gpu/drm/xe/xe_device_types.h |  30 +-
> > > >  drivers/gpu/drm/xe/xe_migrate.c      |   8 +-
> > > >  drivers/gpu/drm/xe/xe_pt.c           | 112 ++--
> > > >  drivers/gpu/drm/xe/xe_query.c        |   2 +-
> > > >  drivers/gpu/drm/xe/xe_svm.c          | 716 +++++++++++++++++--
> > > > -
> > > >  drivers/gpu/drm/xe/xe_svm.h          | 158 ++++-
> > > >  drivers/gpu/drm/xe/xe_tile.c         |  20 +-
> > > >  drivers/gpu/drm/xe/xe_tile.h         |  33 +
> > > >  drivers/gpu/drm/xe/xe_vm.c           |   6 +-
> > > >  drivers/gpu/drm/xe/xe_vm_types.h     |   7 +
> > > >  include/drm/drm_gpusvm.h             | 102 +--
> > > >  include/drm/drm_pagemap.h            | 190 +++++-
> > > >  include/drm/drm_pagemap_util.h       |  59 ++
> > > >  include/uapi/drm/xe_drm.h            |  39 ++
> > > >  25 files changed, 2458 insertions(+), 1071 deletions(-)
> > > >  create mode 100644 drivers/gpu/drm/drm_pagemap.c
> > > >  create mode 100644 drivers/gpu/drm/drm_pagemap_util.c
> > > >  create mode 100644 include/drm/drm_pagemap_util.h
> > > > 
> 


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 08/19] drm/xe/bo: Add a bo remove callback
  2025-03-12 21:04 ` [RFC PATCH 08/19] drm/xe/bo: Add a bo remove callback Thomas Hellström
@ 2025-03-14 13:05   ` Thomas Hellström
  0 siblings, 0 replies; 26+ messages in thread
From: Thomas Hellström @ 2025-03-14 13:05 UTC (permalink / raw)
  To: intel-xe, dri-devel
  Cc: himal.prasad.ghimiray, apopple, airlied, Simona Vetter,
	felix.kuehling, Matthew Brost, Christian König, dakr,
	Mrozek, Michal, Joonas Lahtinen

On Wed, 2025-03-12 at 22:04 +0100, Thomas Hellström wrote:
> On device unbind, migrate exported bos, including pagemap bos to
> system. This allows importers to take proper action without
> disruption. In particular, SVM clients on remote devices may
> continue as if nothing happened, and can chose a different
> placement.
> 
> The evict_flags() placement is chosen in such a way that bos that
> aren't exported are purged.
> 
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>

Hmm. It seems like this patch accidentally got merged with another.
I'll separate them and resend in v2.

Thanks,
Thomas



> ---
>  drivers/gpu/drm/drm_pagemap.c        | 113 ++++++--
>  drivers/gpu/drm/xe/xe_bo.c           |  53 +++-
>  drivers/gpu/drm/xe/xe_bo.h           |   2 +
>  drivers/gpu/drm/xe/xe_device.c       |   5 +
>  drivers/gpu/drm/xe/xe_device_types.h |  28 +-
>  drivers/gpu/drm/xe/xe_svm.c          | 412 ++++++++++++++++++++++---
> --
>  drivers/gpu/drm/xe/xe_svm.h          |  49 ++++
>  drivers/gpu/drm/xe/xe_tile.c         |  20 +-
>  drivers/gpu/drm/xe/xe_tile.h         |  28 +-
>  drivers/gpu/drm/xe/xe_vm_types.h     |   1 +
>  include/drm/drm_pagemap.h            |  53 +++-
>  11 files changed, 645 insertions(+), 119 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_pagemap.c
> b/drivers/gpu/drm/drm_pagemap.c
> index d1efcd78a023..dcb26328f94b 100644
> --- a/drivers/gpu/drm/drm_pagemap.c
> +++ b/drivers/gpu/drm/drm_pagemap.c
> @@ -97,6 +97,7 @@ drm_pagemap_zdd_alloc(struct drm_pagemap *dpagemap)
>  	kref_init(&zdd->refcount);
>  	zdd->devmem_allocation = NULL;
>  	zdd->dpagemap = dpagemap;
> +	kref_get(&dpagemap->ref);
>  
>  	return zdd;
>  }
> @@ -126,6 +127,7 @@ static void drm_pagemap_zdd_destroy(struct kref
> *ref)
>  	struct drm_pagemap_zdd *zdd =
>  		container_of(ref, struct drm_pagemap_zdd, refcount);
>  	struct drm_pagemap_devmem *devmem = zdd->devmem_allocation;
> +	struct drm_pagemap *dpagemap = zdd->dpagemap;
>  
>  	if (devmem) {
>  		complete_all(&devmem->detached);
> @@ -133,6 +135,7 @@ static void drm_pagemap_zdd_destroy(struct kref
> *ref)
>  			devmem->ops->devmem_release(devmem);
>  	}
>  	kfree(zdd);
> +	drm_pagemap_put(dpagemap);
>  }
>  
>  /**
> @@ -484,42 +487,113 @@ static int
> drm_pagemap_migrate_populate_ram_pfn(struct vm_area_struct *vas,
>  	return -ENOMEM;
>  }
>  
> +/**
> + * struct drm_pagemap_dev_hold - Struct to aid in drm_device
> release.
> + * @work: work struct for async release.
> + * @drm: drm device to put.
> + *
> + * When a struct drm_pagemap is released, we also need to release
> the
> + * reference it holds on the drm device. However, typically that
> needs
> + * to be done separately from a workqueue that is not removed in the
> + * drm device destructor since that would cause a deadlock flushing
> + * that workqueue. Each time a struct drm_pagemap is initialized
> + * (or re-initialized if cached) therefore allocate a separate work
> + * item using this struct, from which we put the drm device and
> + * associated module.
> + */
> +struct drm_pagemap_dev_hold {
> +	struct work_struct work;
> +	struct drm_device *drm;
> +};
> +
>  static void drm_pagemap_release(struct kref *ref)
>  {
>  	struct drm_pagemap *dpagemap = container_of(ref,
> typeof(*dpagemap), ref);
> +	struct drm_pagemap_dev_hold *dev_hold = dpagemap->dev_hold;
>  
> -	kfree(dpagemap);
> +	dpagemap->ops->destroy(dpagemap);
> +	schedule_work(&dev_hold->work);
> +}
> +
> +static void drm_pagemap_dev_unhold_work(struct work_struct *work)
> +{
> +	struct drm_pagemap_dev_hold *dev_hold =
> +		container_of(work, typeof(*dev_hold), work);
> +	struct drm_device *drm = dev_hold->drm;
> +	struct module *module = drm->driver->fops->owner;
> +
> +	drm_dev_put(drm);
> +	module_put(module);
> +	kfree(dev_hold);
> +}
> +
> +static struct drm_pagemap_dev_hold *
> +drm_pagemap_dev_hold(struct drm_pagemap *dpagemap)
> +{
> +	struct drm_pagemap_dev_hold *dev_hold;
> +	struct drm_device *drm = dpagemap->drm;
> +
> +	dev_hold = kzalloc(sizeof(*dev_hold), GFP_KERNEL);
> +	if (!dev_hold)
> +		return ERR_PTR(-ENOMEM);
> +
> +	INIT_WORK(&dev_hold->work, drm_pagemap_dev_unhold_work);
> +	dev_hold->drm = drm;
> +	(void)try_module_get(drm->driver->fops->owner);
> +	drm_dev_get(drm);
> +
> +	return dev_hold;
>  }
>  
>  /**
> - * drm_pagemap_create() - Create a struct drm_pagemap.
> - * @dev: Pointer to a struct device providing the device-private
> memory.
> - * @pagemap: Pointer to a pre-setup struct dev_pagemap providing the
> struct pages.
> - * @ops: Pointer to the struct drm_pagemap_ops.
> + * drm_pagemap_reinit() - Reinitialize a drm_pagemap
> + * @dpagemap: The drm_pagemap to reinitialize
>   *
> - * Allocate and initialize a struct drm_pagemap.
> + * Reinitialize a drm_pagemap, for which drm_pagemap_release
> + * has already been called. This interface is intended for the
> + * situation where the driver caches a destroyed drm_pagemap.
>   *
> - * Return: A refcounted pointer to a struct drm_pagemap on success.
> - * Error pointer on error.
> + * Return: 0 on success, negative error code on failure.
>   */
> -struct drm_pagemap *
> -drm_pagemap_create(struct device *dev,
> -		   struct dev_pagemap *pagemap,
> -		   const struct drm_pagemap_ops *ops)
> +int drm_pagemap_reinit(struct drm_pagemap *dpagemap)
>  {
> -	struct drm_pagemap *dpagemap = kzalloc(sizeof(*dpagemap),
> GFP_KERNEL);
> +	dpagemap->dev_hold = drm_pagemap_dev_hold(dpagemap);
> +	if (IS_ERR(dpagemap->dev_hold))
> +		return PTR_ERR(dpagemap->dev_hold);
>  
> -	if (!dpagemap)
> -		return ERR_PTR(-ENOMEM);
> +	kref_init(&dpagemap->ref);
> +	return 0;
> +}
> +EXPORT_SYMBOL(drm_pagemap_reinit);
>  
> +/**
> + * drm_pagemap_init() - Initialize a pre-allocated drm_pagemap
> + * @dpagemap: The drm_pagemap to initialize.
> + * @pagemap: The associated dev_pagemap providing the device
> + * private pages.
> + * @drm: The drm device. The drm_pagemap holds a reference on the
> + * drm_device and the module owning the drm_device until
> + * drm_pagemap_release(). This facilitates drm_pagemap exporting.
> + * @ops: The drm_pagemap ops.
> + *
> + * Initialize and take an initial reference on a drm_pagemap.
> + * After successful return, use drm_pagemap_put() to destroy.
> + *
> + ** Return: 0 on success, negative error code on error.
> + */
> +int drm_pagemap_init(struct drm_pagemap *dpagemap,
> +		     struct dev_pagemap *pagemap,
> +		     struct drm_device *drm,
> +		     const struct drm_pagemap_ops *ops)
> +{
>  	kref_init(&dpagemap->ref);
> -	dpagemap->dev = dev;
>  	dpagemap->ops = ops;
>  	dpagemap->pagemap = pagemap;
> +	dpagemap->drm = drm;
>  
> -	return dpagemap;
> +	return drm_pagemap_reinit(dpagemap);
>  }
> -EXPORT_SYMBOL(drm_pagemap_create);
> +EXPORT_SYMBOL(drm_pagemap_init);
>  
>  /**
>   * drm_pagemap_put() - Put a struct drm_pagemap reference
> @@ -530,7 +604,8 @@ EXPORT_SYMBOL(drm_pagemap_create);
>   */
>  void drm_pagemap_put(struct drm_pagemap *dpagemap)
>  {
> -	kref_put(&dpagemap->ref, drm_pagemap_release);
> +	if (dpagemap)
> +		kref_put(&dpagemap->ref, drm_pagemap_release);
>  }
>  EXPORT_SYMBOL(drm_pagemap_put);
>  
> diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
> index 64f9c936eea0..390f90fbd366 100644
> --- a/drivers/gpu/drm/xe/xe_bo.c
> +++ b/drivers/gpu/drm/xe/xe_bo.c
> @@ -55,6 +55,8 @@ static struct ttm_placement sys_placement = {
>  	.placement = &sys_placement_flags,
>  };
>  
> +static struct ttm_placement purge_placement;
> +
>  static const struct ttm_place tt_placement_flags[] = {
>  	{
>  		.fpfn = 0,
> @@ -281,6 +283,8 @@ int xe_bo_placement_for_flags(struct xe_device
> *xe, struct xe_bo *bo,
>  static void xe_evict_flags(struct ttm_buffer_object *tbo,
>  			   struct ttm_placement *placement)
>  {
> +	struct xe_device *xe = container_of(tbo->bdev, typeof(*xe),
> ttm);
> +	bool device_unplugged = drm_dev_is_unplugged(&xe->drm);
>  	struct xe_bo *bo;
>  
>  	if (!xe_bo_is_xe_bo(tbo)) {
> @@ -290,7 +294,7 @@ static void xe_evict_flags(struct
> ttm_buffer_object *tbo,
>  			return;
>  		}
>  
> -		*placement = sys_placement;
> +		*placement = device_unplugged ? purge_placement :
> sys_placement;
>  		return;
>  	}
>  
> @@ -300,6 +304,11 @@ static void xe_evict_flags(struct
> ttm_buffer_object *tbo,
>  		return;
>  	}
>  
> +	if (device_unplugged && !tbo->base.dma_buf) {
> +		*placement = purge_placement;
> +		return;
> +	}
> +
>  	/*
>  	 * For xe, sg bos that are evicted to system just triggers a
>  	 * rebind of the sg list upon subsequent validation to
> XE_PL_TT.
> @@ -657,11 +666,20 @@ static int xe_bo_move_dmabuf(struct
> ttm_buffer_object *ttm_bo,
>  	struct xe_ttm_tt *xe_tt = container_of(ttm_bo->ttm, struct
> xe_ttm_tt,
>  					       ttm);
>  	struct xe_device *xe = ttm_to_xe_device(ttm_bo->bdev);
> +	bool device_unplugged = drm_dev_is_unplugged(&xe->drm);
>  	struct sg_table *sg;
>  
>  	xe_assert(xe, attach);
>  	xe_assert(xe, ttm_bo->ttm);
>  
> +	if (device_unplugged && new_res->mem_type == XE_PL_SYSTEM &&
> +	    ttm_bo->sg) {
> +		dma_resv_wait_timeout(ttm_bo->base.resv,
> DMA_RESV_USAGE_BOOKKEEP,
> +				      false, MAX_SCHEDULE_TIMEOUT);
> +		dma_buf_unmap_attachment(attach, ttm_bo->sg,
> DMA_BIDIRECTIONAL);
> +		ttm_bo->sg = NULL;
> +	}
> +
>  	if (new_res->mem_type == XE_PL_SYSTEM)
>  		goto out;
>  
> @@ -2945,6 +2963,39 @@ void
> xe_bo_runtime_pm_release_mmap_offset(struct xe_bo *bo)
>  	list_del_init(&bo->vram_userfault_link);
>  }
>  
> +/**
> + * xe_bo_remove() - Handle bos when the pci_device is about to be
> removed
> + * @xe: The xe device.
> + *
> + * On pci_device removal we need to drop all dma mappings and move
> + * the data of exported bos out to system. This includes SVM bos and
> + * exported dma-buf bos. This is done by evicting all bos, but
> + * the evict placement in xe_evict_flags() is chosen such that all
> + * bos except those mentioned are purged, and thus their memory
> + * is released.
> + *
> + * Pinned bos are not handled, though. Ideally they should be
> released
> + * using devm_ actions.
> + */
> +void xe_bo_remove(struct xe_device *xe)
> +{
> +	unsigned int mem_type;
> +	int ret;
> +
> +	/*
> +	 * Move pagemap bos and exported dma-buf to system.
> +	 */
> +	for (mem_type = XE_PL_VRAM1; mem_type >= XE_PL_TT; --
> mem_type) {
> +		struct ttm_resource_manager *man =
> +			ttm_manager_type(&xe->ttm, mem_type);
> +
> +		if (man) {
> +			ret = ttm_resource_manager_evict_all(&xe-
> >ttm, man);
> +			drm_WARN_ON(&xe->drm, ret);
> +		}
> +	}
> +}
> +
>  #if IS_ENABLED(CONFIG_DRM_XE_KUNIT_TEST)
>  #include "tests/xe_bo.c"
>  #endif
> diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
> index bda3fdd408da..22b1c63f9311 100644
> --- a/drivers/gpu/drm/xe/xe_bo.h
> +++ b/drivers/gpu/drm/xe/xe_bo.h
> @@ -405,6 +405,8 @@ long xe_bo_shrink(struct ttm_operation_ctx *ctx,
> struct ttm_buffer_object *bo,
>  		  const struct xe_bo_shrink_flags flags,
>  		  unsigned long *scanned);
>  
> +void xe_bo_remove(struct xe_device *xe);
> +
>  #if IS_ENABLED(CONFIG_DRM_XE_KUNIT_TEST)
>  /**
>   * xe_bo_is_mem_type - Whether the bo currently resides in the given
> diff --git a/drivers/gpu/drm/xe/xe_device.c
> b/drivers/gpu/drm/xe/xe_device.c
> index b2f656b2a563..68de09db9ad5 100644
> --- a/drivers/gpu/drm/xe/xe_device.c
> +++ b/drivers/gpu/drm/xe/xe_device.c
> @@ -54,6 +54,7 @@
>  #include "xe_query.h"
>  #include "xe_shrinker.h"
>  #include "xe_sriov.h"
> +#include "xe_svm.h"
>  #include "xe_tile.h"
>  #include "xe_ttm_stolen_mgr.h"
>  #include "xe_ttm_sys_mgr.h"
> @@ -925,6 +926,10 @@ void xe_device_remove(struct xe_device *xe)
>  	xe_display_unregister(xe);
>  
>  	drm_dev_unplug(&xe->drm);
> +
> +	xe_bo_remove(xe);
> +
> +	xe_pagemaps_remove(xe);
>  }
>  
>  void xe_device_shutdown(struct xe_device *xe)
> diff --git a/drivers/gpu/drm/xe/xe_device_types.h
> b/drivers/gpu/drm/xe/xe_device_types.h
> index 40c6f88f5933..41ba05ae4cd5 100644
> --- a/drivers/gpu/drm/xe/xe_device_types.h
> +++ b/drivers/gpu/drm/xe/xe_device_types.h
> @@ -110,19 +110,21 @@ struct xe_vram_region {
>  	/** @ttm: VRAM TTM manager */
>  	struct xe_ttm_vram_mgr ttm;
>  #if IS_ENABLED(CONFIG_DRM_XE_PAGEMAP)
> -	/** @pagemap: Used to remap device memory as ZONE_DEVICE */
> -	struct dev_pagemap pagemap;
> -	/**
> -	 * @dpagemap: The struct drm_pagemap of the ZONE_DEVICE
> memory
> -	 * pages of this tile.
> -	 */
> -	struct drm_pagemap *dpagemap;
> -	/**
> -	 * @hpa_base: base host physical address
> -	 *
> -	 * This is generated when remap device memory as ZONE_DEVICE
> -	 */
> -	resource_size_t hpa_base;
> +	/** @pagemap_cache: Cached struct xe_pagemap for this memory
> region's memory. */
> +	struct xe_pagemap_cache {
> +		/** @pagemap_cache.pagemap_mutex: Protects
> @pagemap_cache.xpagemap. */
> +		struct mutex mutex;
> +		/** @pagemap_cache.xpagemap: Pointer to a struct
> xe_pagemap */
> +		struct xe_pagemap *xpagemap;
> +		/**
> +		 * @pagemap_cache.queued: Completed when 
> @pagemap_cache.xpagemap is queued
> +		 * for destruction.
> +		 * There's a short interval in between
> @pagemap_cache.xpagemap's refcount
> +		 * dropping to zero and when it's queued for
> destruction and
> +		 * the destruction job can be canceled.
> +		 */
> +		struct completion queued;
> +	} pagemap_cache;
>  #endif
>  };
>  
> diff --git a/drivers/gpu/drm/xe/xe_svm.c
> b/drivers/gpu/drm/xe/xe_svm.c
> index 37e1607052ed..c49bcfea5644 100644
> --- a/drivers/gpu/drm/xe/xe_svm.c
> +++ b/drivers/gpu/drm/xe/xe_svm.c
> @@ -4,6 +4,8 @@
>   */
>  
>  #include <drm/drm_drv.h>
> +#include <drm/drm_managed.h>
> +#include <drm/drm_pagemap.h>
>  
>  #include "xe_bo.h"
>  #include "xe_gt_tlb_invalidation.h"
> @@ -17,6 +19,8 @@
>  #include "xe_vm.h"
>  #include "xe_vm_types.h"
>  
> +static int xe_svm_get_pagemaps(struct xe_vm *vm);
> +
>  static bool xe_svm_range_in_vram(struct xe_svm_range *range)
>  {
>  	/* Not reliable without notifier lock */
> @@ -345,28 +349,35 @@ static void
> xe_svm_garbage_collector_work_func(struct work_struct *w)
>  
>  #if IS_ENABLED(CONFIG_DRM_XE_PAGEMAP)
>  
> -static struct xe_vram_region *page_to_vr(struct page *page)
> +static struct xe_vram_region *xe_pagemap_to_vr(struct xe_pagemap
> *xpagemap)
>  {
> -	return container_of(page->pgmap, struct xe_vram_region,
> pagemap);
> +	return xpagemap->vr;
>  }
>  
> -static struct xe_tile *vr_to_tile(struct xe_vram_region *vr)
> +static struct xe_pagemap *xe_page_to_pagemap(struct page *page)
>  {
> -	return container_of(vr, struct xe_tile, mem.vram);
> +	return container_of(page->pgmap, struct xe_pagemap,
> pagemap);
>  }
>  
> -static u64 xe_vram_region_page_to_dpa(struct xe_vram_region *vr,
> -				      struct page *page)
> +static struct xe_vram_region *xe_page_to_vr(struct page *page)
>  {
> -	u64 dpa;
> -	struct xe_tile *tile = vr_to_tile(vr);
> +	return xe_pagemap_to_vr(xe_page_to_pagemap(page));
> +}
> +
> +static u64 xe_page_to_dpa(struct page *page)
> +{
> +	struct xe_pagemap *xpagemap = xe_page_to_pagemap(page);
> +	struct xe_vram_region *vr = xe_pagemap_to_vr(xpagemap);
> +	struct xe_tile *tile = xe_vr_to_tile(vr);
> +	u64 hpa_base = xpagemap->hpa_base;
>  	u64 pfn = page_to_pfn(page);
>  	u64 offset;
> +	u64 dpa;
>  
>  	xe_tile_assert(tile, is_device_private_page(page));
> -	xe_tile_assert(tile, (pfn << PAGE_SHIFT) >= vr->hpa_base);
> +	xe_tile_assert(tile, (pfn << PAGE_SHIFT) >= hpa_base);
>  
> -	offset = (pfn << PAGE_SHIFT) - vr->hpa_base;
> +	offset = (pfn << PAGE_SHIFT) - hpa_base;
>  	dpa = vr->dpa_base + offset;
>  
>  	return dpa;
> @@ -413,10 +424,10 @@ static int xe_svm_copy(struct page **pages,
> dma_addr_t *dma_addr,
>  			continue;
>  
>  		if (!vr && spage) {
> -			vr = page_to_vr(spage);
> -			tile = vr_to_tile(vr);
> +			vr = xe_page_to_vr(spage);
> +			tile = xe_vr_to_tile(vr);
>  		}
> -		XE_WARN_ON(spage && page_to_vr(spage) != vr);
> +		XE_WARN_ON(spage && xe_page_to_vr(spage) != vr);
>  
>  		/*
>  		 * CPU page and device page valid, capture physical
> address on
> @@ -424,7 +435,7 @@ static int xe_svm_copy(struct page **pages,
> dma_addr_t *dma_addr,
>  		 * device pages.
>  		 */
>  		if (dma_addr[i] && spage) {
> -			__vram_addr = xe_vram_region_page_to_dpa(vr,
> spage);
> +			__vram_addr = xe_page_to_dpa(spage);
>  			if (vram_addr == XE_VRAM_ADDR_INVALID) {
>  				vram_addr = __vram_addr;
>  				pos = i;
> @@ -547,12 +558,12 @@ static void xe_svm_devmem_release(struct
> drm_pagemap_devmem *devmem_allocation)
>  
>  static u64 block_offset_to_pfn(struct xe_vram_region *vr, u64
> offset)
>  {
> -	return PHYS_PFN(offset + vr->hpa_base);
> +	return PHYS_PFN(offset + vr->pagemap_cache.xpagemap-
> >hpa_base);
>  }
>  
>  static struct drm_buddy *tile_to_buddy(struct xe_tile *tile)
>  {
> -	return &tile->mem.vram.ttm.mm;
> +	return &xe_tile_to_vr(tile)->ttm.mm;
>  }
>  
>  static int xe_svm_populate_devmem_pfn(struct drm_pagemap_devmem
> *devmem_allocation,
> @@ -566,7 +577,7 @@ static int xe_svm_populate_devmem_pfn(struct
> drm_pagemap_devmem *devmem_allocati
>  
>  	list_for_each_entry(block, blocks, link) {
>  		struct xe_vram_region *vr = block->private;
> -		struct xe_tile *tile = vr_to_tile(vr);
> +		struct xe_tile *tile = xe_vr_to_tile(vr);
>  		struct drm_buddy *buddy = tile_to_buddy(tile);
>  		u64 block_pfn = block_offset_to_pfn(vr,
> drm_buddy_block_offset(block));
>  		int i;
> @@ -585,6 +596,11 @@ static const struct drm_pagemap_devmem_ops
> dpagemap_devmem_ops = {
>  	.copy_to_ram = xe_svm_copy_to_ram,
>  };
>  
> +#else
> +static int xe_svm_get_pagemaps(struct xe_vm *vm)
> +{
> +	return 0;
> +}
>  #endif
>  
>  static const struct drm_gpusvm_ops gpusvm_ops = {
> @@ -599,6 +615,26 @@ static const unsigned long fault_chunk_sizes[] =
> {
>  	SZ_4K,
>  };
>  
> +static void xe_pagemap_put(struct xe_pagemap *xpagemap)
> +{
> +	drm_pagemap_put(&xpagemap->dpagemap);
> +}
> +
> +static void xe_svm_put_pagemaps(struct xe_vm *vm)
> +{
> +	struct xe_device *xe = vm->xe;
> +	struct xe_tile *tile;
> +	int id;
> +
> +	for_each_tile(tile, xe, id) {
> +		struct xe_pagemap *xpagemap = vm->svm.pagemaps[id];
> +
> +		if (xpagemap)
> +			xe_pagemap_put(xpagemap);
> +		vm->svm.pagemaps[id] = NULL;
> +	}
> +}
> +
>  /**
>   * xe_svm_init() - SVM initialize
>   * @vm: The VM.
> @@ -616,13 +652,19 @@ int xe_svm_init(struct xe_vm *vm)
>  	INIT_WORK(&vm->svm.garbage_collector.work,
>  		  xe_svm_garbage_collector_work_func);
>  
> +	err = xe_svm_get_pagemaps(vm);
> +	if (err)
> +		return err;
> +
>  	err = drm_gpusvm_init(&vm->svm.gpusvm, "Xe SVM", &vm->xe-
> >drm,
>  			      current->mm, xe_svm_devm_owner(vm-
> >xe), 0,
>  			      vm->size,
> xe_modparam.svm_notifier_size * SZ_1M,
>  			      &gpusvm_ops, fault_chunk_sizes,
>  			      ARRAY_SIZE(fault_chunk_sizes));
> -	if (err)
> +	if (err) {
> +		xe_svm_put_pagemaps(vm);
>  		return err;
> +	}
>  
>  	drm_gpusvm_driver_set_lock(&vm->svm.gpusvm, &vm->lock);
>  
> @@ -639,6 +681,7 @@ void xe_svm_close(struct xe_vm *vm)
>  {
>  	xe_assert(vm->xe, xe_vm_is_closed(vm));
>  	flush_work(&vm->svm.garbage_collector.work);
> +	xe_svm_put_pagemaps(vm);
>  }
>  
>  /**
> @@ -661,20 +704,16 @@ static bool xe_svm_range_is_valid(struct
> xe_svm_range *range,
>  }
>  
>  #if IS_ENABLED(CONFIG_DRM_XE_PAGEMAP)
> -static struct xe_vram_region *tile_to_vr(struct xe_tile *tile)
> -{
> -	return &tile->mem.vram;
> -}
>  
>  static int xe_drm_pagemap_populate_mm(struct drm_pagemap *dpagemap,
>  				      unsigned long start, unsigned
> long end,
>  				      struct mm_struct *mm)
>  {
> -	struct xe_tile *tile = container_of(dpagemap->pagemap,
> typeof(*tile),
> -					    mem.vram.pagemap);
> +	struct xe_pagemap *xpagemap = container_of(dpagemap,
> typeof(*xpagemap), dpagemap);
> +	struct xe_vram_region *vr = xe_pagemap_to_vr(xpagemap);
> +	struct xe_tile *tile = xe_vr_to_tile(vr);
>  	struct xe_device *xe = tile_to_xe(tile);
>  	struct device *dev = xe->drm.dev;
> -	struct xe_vram_region *vr = tile_to_vr(tile);
>  	struct drm_buddy_block *block;
>  	struct list_head *blocks;
>  	struct xe_bo *bo;
> @@ -700,7 +739,7 @@ static int xe_drm_pagemap_populate_mm(struct
> drm_pagemap *dpagemap,
>  
>  	drm_pagemap_devmem_init(&bo->devmem_allocation, dev, mm,
>  				&dpagemap_devmem_ops,
> -				tile->mem.vram.dpagemap,
> +				dpagemap,
>  				end - start);
>  
>  	blocks = &to_xe_ttm_vram_mgr_resource(bo->ttm.resource)-
> >blocks;
> @@ -896,12 +935,12 @@ xe_drm_pagemap_device_map(struct drm_pagemap
> *dpagemap,
>  			  unsigned int order,
>  			  enum dma_data_direction dir)
>  {
> -	struct device *pgmap_dev = dpagemap->dev;
> +	struct device *pgmap_dev = dpagemap->drm->dev;
>  	enum drm_interconnect_protocol prot;
>  	dma_addr_t addr;
>  
>  	if (pgmap_dev == dev) {
> -		addr = xe_vram_region_page_to_dpa(page_to_vr(page),
> page);
> +		addr = xe_page_to_dpa(page);
>  		prot = XE_INTERCONNECT_VRAM;
>  	} else {
>  		addr = DMA_MAPPING_ERROR;
> @@ -911,73 +950,306 @@ xe_drm_pagemap_device_map(struct drm_pagemap
> *dpagemap,
>  	return drm_pagemap_device_addr_encode(addr, prot, order,
> dir);
>  }
>  
> +static void xe_pagemap_fini(struct xe_pagemap *xpagemap)
> +{
> +	struct dev_pagemap *pagemap = &xpagemap->pagemap;
> +	struct device *dev = xpagemap->dpagemap.drm->dev;
> +
> +	WRITE_ONCE(xpagemap->unplugged, true);
> +	if (xpagemap->hpa_base) {
> +		devm_memunmap_pages(dev, pagemap);
> +		xpagemap->hpa_base = 0;
> +	}
> +
> +	if (pagemap->range.start) {
> +		devm_release_mem_region(dev, pagemap->range.start,
> +					pagemap->range.end -
> pagemap->range.start + 1);
> +		pagemap->range.start = 0;
> +	}
> +}
> +
> +static void xe_pagemap_destroy_work(struct work_struct *work)
> +{
> +	struct xe_pagemap *xpagemap = container_of(work,
> typeof(*xpagemap), destroy_work.work);
> +	struct xe_pagemap_cache *cache = xpagemap->cache;
> +
> +	mutex_lock(&cache->mutex);
> +	if (cache->xpagemap == xpagemap)
> +		cache->xpagemap = NULL;
> +	mutex_unlock(&cache->mutex);
> +
> +	xe_pagemap_fini(xpagemap);
> +	kfree(xpagemap);
> +}
> +
> +static void xe_pagemap_destroy(struct drm_pagemap *dpagemap)
> +{
> +	struct xe_pagemap *xpagemap = container_of(dpagemap,
> typeof(*xpagemap), dpagemap);
> +	struct xe_device *xe = to_xe_device(dpagemap->drm);
> +
> +	/* Keep the pagemap cached for 5s, unless the device is
> unplugged. */
> +	queue_delayed_work(xe->unordered_wq, &xpagemap-
> >destroy_work,
> +			   READ_ONCE(xpagemap->unplugged) ? 0 :
> secs_to_jiffies(5));
> +
> +	complete_all(&xpagemap->cache->queued);
> +}
> +
>  static const struct drm_pagemap_ops xe_drm_pagemap_ops = {
>  	.device_map = xe_drm_pagemap_device_map,
>  	.populate_mm = xe_drm_pagemap_populate_mm,
> +	.destroy = xe_pagemap_destroy,
>  };
>  
>  /**
> - * xe_devm_add: Remap and provide memmap backing for device memory
> - * @tile: tile that the memory region belongs to
> - * @vr: vram memory region to remap
> + * xe_pagemap_create() - Create a struct xe_pagemap object
> + * @xe: The xe device.
> + * @cache: Back-pointer to the struct xe_pagemap_cache.
> + * @vr: Back-pointer to the struct xe_vram_region.
>   *
> - * This remap device memory to host physical address space and create
> - * struct page to back device memory
> + * Allocate and initialize a struct xe_pagemap. On successful
> + * return, drm_pagemap_put() on the embedded struct drm_pagemap
> + * should be used to unreference.
>   *
> - * Return: 0 on success standard error code otherwise
> + * Return: Pointer to a struct xe_pagemap if successful. Error pointer
> + * on failure.
>   */
> -int xe_devm_add(struct xe_tile *tile, struct xe_vram_region *vr)
> +struct xe_pagemap *xe_pagemap_create(struct xe_device *xe, struct xe_pagemap_cache *cache,
> +				     struct xe_vram_region *vr)
>  {
> -	struct xe_device *xe = tile_to_xe(tile);
> -	struct device *dev = &to_pci_dev(xe->drm.dev)->dev;
> +	struct device *dev = xe->drm.dev;
> +	struct xe_pagemap *xpagemap;
> +	struct dev_pagemap *pagemap;
> +	struct drm_pagemap *dpagemap;
>  	struct resource *res;
>  	void *addr;
> -	int ret;
> +	int err;
> +
> +	xpagemap = kzalloc(sizeof(*xpagemap), GFP_KERNEL);
> +	if (!xpagemap)
> +		return ERR_PTR(-ENOMEM);
> +
> +	pagemap = &xpagemap->pagemap;
> +	dpagemap = &xpagemap->dpagemap;
> +	INIT_DELAYED_WORK(&xpagemap->destroy_work, xe_pagemap_destroy_work);
> +	xpagemap->cache = cache;
> +	xpagemap->vr = vr;
> +
> +	err = drm_pagemap_init(dpagemap, pagemap, &xe->drm, &xe_drm_pagemap_ops);
> +	if (err)
> +		goto out_no_dpagemap;
>  
>  	res = devm_request_free_mem_region(dev, &iomem_resource,
>  					   vr->usable_size);
>  	if (IS_ERR(res)) {
> -		ret = PTR_ERR(res);
> -		return ret;
> +		err = PTR_ERR(res);
> +		goto out_err;
>  	}
>  
> -	vr->dpagemap = drm_pagemap_create(dev, &vr->pagemap,
> -					  &xe_drm_pagemap_ops);
> -	if (IS_ERR(vr->dpagemap)) {
> -		drm_err(&xe->drm, "Failed to create drm_pagemap tile %d memory: %pe\n",
> -			tile->id, vr->dpagemap);
> -		ret = PTR_ERR(vr->dpagemap);
> -		goto out_no_dpagemap;
> -	}
> -
> -	vr->pagemap.type = MEMORY_DEVICE_PRIVATE;
> -	vr->pagemap.range.start = res->start;
> -	vr->pagemap.range.end = res->end;
> -	vr->pagemap.nr_range = 1;
> -	vr->pagemap.ops = drm_pagemap_pagemap_ops_get();
> -	vr->pagemap.owner = xe_svm_devm_owner(xe);
> -	addr = devm_memremap_pages(dev, &vr->pagemap);
> +	pagemap->type = MEMORY_DEVICE_PRIVATE;
> +	pagemap->range.start = res->start;
> +	pagemap->range.end = res->end;
> +	pagemap->nr_range = 1;
> +	pagemap->owner = xe_svm_devm_owner(xe);
> +	pagemap->ops = drm_pagemap_pagemap_ops_get();
> +	addr = devm_memremap_pages(dev, pagemap);
>  	if (IS_ERR(addr)) {
> -		ret = PTR_ERR(addr);
> -		drm_err(&xe->drm, "Failed to remap tile %d memory, errno %pe\n",
> -			tile->id, ERR_PTR(ret));
> -		goto out_failed_memremap;
> +		err = PTR_ERR(addr);
> +		goto out_err;
>  	}
> -	vr->hpa_base = res->start;
> +	xpagemap->hpa_base = res->start;
> +	return xpagemap;
>  
> -	drm_dbg(&xe->drm, "Added tile %d memory [%llx-%llx] to devm, remapped to %pr\n",
> -		tile->id, vr->io_start, vr->io_start + vr->usable_size, res);
> -	return 0;
> +out_err:
> +	drm_pagemap_put(dpagemap);
> +	return ERR_PTR(err);
>  
> -out_failed_memremap:
> -	drm_pagemap_put(vr->dpagemap);
>  out_no_dpagemap:
> -	devm_release_mem_region(dev, res->start, resource_size(res));
> -	return ret;
> +	kfree(xpagemap);
> +	return ERR_PTR(err);
>  }
> -#else
> -int xe_devm_add(struct xe_tile *tile, struct xe_vram_region *vr)
> +
> +static struct xe_pagemap *
> +xe_pagemap_find_or_create(struct xe_device *xe, struct xe_pagemap_cache *cache,
> +			  struct xe_vram_region *vr);
> +
> +static int xe_svm_get_pagemaps(struct xe_vm *vm)
>  {
> +	struct xe_device *xe = vm->xe;
> +	struct xe_pagemap *xpagemap;
> +	struct xe_tile *tile;
> +	int id;
> +
> +	for_each_tile(tile, xe, id) {
> +		struct xe_vram_region *vr;
> +
> +		if (!((BIT(id) << 1) & xe->info.mem_region_mask))
> +			continue;
> +
> +		vr = xe_tile_to_vr(tile);
> +		xpagemap = xe_pagemap_find_or_create(xe, &vr->pagemap_cache, vr);
> +		if (IS_ERR(xpagemap))
> +			break;
> +		vm->svm.pagemaps[id] = xpagemap;
> +	}
> +
> +	if (IS_ERR(xpagemap)) {
> +		xe_svm_put_pagemaps(vm);
> +		return PTR_ERR(xpagemap);
> +	}
> +
>  	return 0;
>  }
> +
> +/**
> + * xe_pagemaps_remove() - Device remove work for the xe pagemaps
> + * @xe: The xe device
> + *
> + * This function needs to be run as part of the device remove (unplug)
> + * sequence to ensure that device-private pages allocated using the
> + * xe pagemaps are not used anymore and that the dev_pagemaps are
> + * unregistered.
> + *
> + * The function needs to be called *after* the call to drm_dev_unplug()
> + * to ensure any calls to drm_pagemap_populate_mm() will return -ENODEV.
> + *
> + * Note that the pagemaps' references to the drm device and hence the
> + * xe device will remain until the pagemaps are destroyed.
> + */
> +void xe_pagemaps_remove(struct xe_device *xe)
> +{
> +	unsigned int id, mem_type;
> +	struct xe_tile *tile;
> +	int ret;
> +
> +	/* Migrate all PTEs of this pagemap to system */
> +	for (mem_type = XE_PL_VRAM1; mem_type >= XE_PL_TT; --mem_type) {
> +		struct ttm_resource_manager *man =
> +			ttm_manager_type(&xe->ttm, mem_type);
> +
> +		if (man) {
> +			ret = ttm_resource_manager_evict_all(&xe->ttm, man);
> +			drm_WARN_ON(&xe->drm, ret);
> +		}
> +	}
> +
> +	/* Remove the device pages themselves */
> +	for_each_tile(tile, xe, id) {
> +		struct xe_pagemap_cache *cache;
> +
> +		if (!((BIT(id) << 1) & xe->info.mem_region_mask))
> +			continue;
> +
> +		cache = &tile->mem.vram.pagemap_cache;
> +		mutex_lock(&cache->mutex);
> +		if (cache->xpagemap)
> +			xe_pagemap_fini(cache->xpagemap);
> +		/* Nobody can resurrect, since the device is unplugged. */
> +		mutex_unlock(&cache->mutex);
> +	}
> +}
> +
> +static void xe_pagemap_cache_fini(struct drm_device *drm, void *arg)
> +{
> +	struct xe_pagemap_cache *cache = arg;
> +	struct xe_pagemap *xpagemap;
> +
> +	wait_for_completion(&cache->queued);
> +	mutex_lock(&cache->mutex);
> +	xpagemap = cache->xpagemap;
> +	if (xpagemap && cancel_delayed_work(&xpagemap->destroy_work)) {
> +		mutex_unlock(&cache->mutex);
> +		xe_pagemap_destroy_work(&xpagemap->destroy_work.work);
> +		return;
> +	}
> +	mutex_unlock(&cache->mutex);
> +	flush_workqueue(to_xe_device(drm)->unordered_wq);
> +	mutex_destroy(&cache->mutex);
> +}
> +
> +/**
> + * xe_pagemap_cache_init() - Initialize a struct xe_pagemap_cache
> + * @drm: Pointer to the struct drm_device
> + * @cache: Pointer to a struct xe_pagemap_cache
> + *
> + * Initialize a struct xe_pagemap_cache and if successful, register a cleanup
> + * function to be run at xe/drm device destruction.
> + *
> + * Return: 0 on success, negative error code on error.
> + */
> +int xe_pagemap_cache_init(struct drm_device *drm, struct xe_pagemap_cache *cache)
> +{
> +	mutex_init(&cache->mutex);
> +	init_completion(&cache->queued);
> +	complete_all(&cache->queued);
> +	return drmm_add_action_or_reset(drm, xe_pagemap_cache_fini, cache);
> +}
> +
> +static struct xe_pagemap *xe_pagemap_get_unless_zero(struct xe_pagemap *xpagemap)
> +{
> +	return (xpagemap && drm_pagemap_get_unless_zero(&xpagemap->dpagemap)) ? xpagemap : NULL;
> +}
> +
> +/**
> + * xe_pagemap_find_or_create() - Find or create a struct xe_pagemap
> + * @xe: The xe device.
> + * @cache: The struct xe_pagemap_cache.
> + * @vr: The VRAM region.
> + *
> + * Check if there is an already used xe_pagemap for this tile, and in that
> + * case, return it.
> + * If not, check if there is a cached xe_pagemap for this tile, and in that
> + * case, cancel its destruction, re-initialize it and return it.
> + * Finally, if there is no cached or already used pagemap, create one and
> + * register it in the tile's pagemap cache.
> + *
> + * Note that this function is typically called from within an IOCTL, and
> + * waits are therefore carried out interruptibly if possible.
> + *
> + * Return: A pointer to a struct xe_pagemap if successful, or an error
> + * pointer on failure.
> + */
> +static struct xe_pagemap *
> +xe_pagemap_find_or_create(struct xe_device *xe, struct xe_pagemap_cache *cache,
> +			  struct xe_vram_region *vr)
> +{
> +	struct xe_pagemap *xpagemap;
> +	int err;
> +
> +	err = mutex_lock_interruptible(&cache->mutex);
> +	if (err)
> +		return ERR_PTR(err);
> +
> +	xpagemap = cache->xpagemap;
> +	if (xpagemap && !xe_pagemap_get_unless_zero(xpagemap)) {
> +		/* Wait for the destroy work to get queued before canceling it! */
> +		err = wait_for_completion_interruptible(&cache->queued);
> +		if (err) {
> +			mutex_unlock(&cache->mutex);
> +			return ERR_PTR(err);
> +		}
> +
> +		if (cancel_delayed_work(&xpagemap->destroy_work)) {
> +			err = drm_pagemap_reinit(&xpagemap->dpagemap);
> +			if (!err) {
> +				reinit_completion(&cache->queued);
> +				goto out_unlock;
> +			}
> +
> +			queue_delayed_work(xe->unordered_wq, &xpagemap->destroy_work, 0);
> +		}
> +
> +		cache->xpagemap = NULL;
> +		xpagemap = NULL;
> +	}
> +	if (!xpagemap) {
> +		xpagemap = xe_pagemap_create(xe, cache, vr);
> +		if (IS_ERR(xpagemap))
> +			goto out_unlock;
> +
> +		cache->xpagemap = xpagemap;
> +		reinit_completion(&cache->queued);
> +	}
> +out_unlock:
> +	mutex_unlock(&cache->mutex);
> +	return xpagemap;
> +}
>  #endif
> diff --git a/drivers/gpu/drm/xe/xe_svm.h b/drivers/gpu/drm/xe/xe_svm.h
> index c32b6d46ecf1..19469fd91666 100644
> --- a/drivers/gpu/drm/xe/xe_svm.h
> +++ b/drivers/gpu/drm/xe/xe_svm.h
> @@ -13,7 +13,11 @@
>  
>  #define XE_INTERCONNECT_VRAM DRM_INTERCONNECT_DRIVER
>  
> +struct drm_device;
> +struct drm_file;
> +
>  struct xe_bo;
> +struct xe_device;
>  struct xe_vram_region;
>  struct xe_tile;
>  struct xe_vm;
> @@ -45,6 +49,28 @@ struct xe_svm_range {
>  	u8 skip_migrate	:1;
>  };
>  
> +/**
> + * struct xe_pagemap - Manages xe device_private memory for SVM.
> + * @pagemap: The struct dev_pagemap providing the struct pages.
> + * @dpagemap: The drm_pagemap managing allocation and migration.
> + * @destroy_work: Handles asynchronous destruction and caching.
> + * @hpa_base: The host physical address base for the managed memory.
> + * @cache: Backpointer to the struct xe_pagemap_cache for the memory region.
> + * @vr: Backpointer to the struct xe_vram_region.
> + * @unplugged: Advisory-only information on whether the device owning this
> + * pagemap has been unplugged. This field is typically used for caching
> + * time determination.
> + */
> +struct xe_pagemap {
> +	struct dev_pagemap pagemap;
> +	struct drm_pagemap dpagemap;
> +	struct delayed_work destroy_work;
> +	resource_size_t hpa_base;
> +	struct xe_pagemap_cache *cache;
> +	struct xe_vram_region *vr;
> +	bool unplugged;
> +};
> +
>  /**
>   * xe_svm_range_pages_valid() - SVM range pages valid
>   * @range: SVM range
> @@ -95,11 +121,16 @@ static inline bool xe_svm_range_has_dma_mapping(struct xe_svm_range *range)
>  #define xe_svm_notifier_unlock(vm__)	\
>  	drm_gpusvm_notifier_unlock(&(vm__)->svm.gpusvm)
>  
> +struct xe_pagemap *
> +xe_pagemap_create(struct xe_device *xe, struct xe_pagemap_cache *cache,
> +		  struct xe_vram_region *vr);
> +
>  #else
>  #include <linux/interval_tree.h>
>  
>  struct drm_pagemap_device_addr;
>  struct xe_bo;
> +struct xe_device;
>  struct xe_vm;
>  struct xe_vma;
>  struct xe_tile;
> @@ -178,5 +209,23 @@ static inline void xe_svm_notifier_lock(struct xe_vm *vm)
>  static inline void xe_svm_notifier_unlock(struct xe_vm *vm)
>  {
>  }
> +
>  #endif
> +
> +#if IS_ENABLED(CONFIG_DRM_XE_PAGEMAP)
> +
> +int xe_pagemap_cache_init(struct drm_device *drm, struct xe_pagemap_cache *cache);
> +
> +void xe_pagemaps_remove(struct xe_device *xe);
> +
> +#else
> +
> +#define xe_pagemap_cache_init(...) 0
> +
> +static inline void xe_pagemaps_remove(struct xe_device *xe)
> +{
> +}
> +
> +#endif
> +
>  #endif
> diff --git a/drivers/gpu/drm/xe/xe_tile.c b/drivers/gpu/drm/xe/xe_tile.c
> index 0771acbbf367..f5d9d56418ee 100644
> --- a/drivers/gpu/drm/xe/xe_tile.c
> +++ b/drivers/gpu/drm/xe/xe_tile.c
> @@ -161,7 +161,6 @@ static int tile_ttm_mgr_init(struct xe_tile *tile)
>   */
>  int xe_tile_init_noalloc(struct xe_tile *tile)
>  {
> -	struct xe_device *xe = tile_to_xe(tile);
>  	int err;
>  
>  	err = tile_ttm_mgr_init(tile);
> @@ -170,8 +169,9 @@ int xe_tile_init_noalloc(struct xe_tile *tile)
>  
>  	xe_wa_apply_tile_workarounds(tile);
>  
> -	if (xe->info.has_usm && IS_DGFX(xe))
> -		xe_devm_add(tile, &tile->mem.vram);
> +	err = xe_pagemap_cache_init(&tile_to_xe(tile)->drm, &tile->mem.vram.pagemap_cache);
> +	if (err)
> +		return err;
>  
>  	return xe_tile_sysfs_init(tile);
>  }
> @@ -188,3 +188,17 @@ void xe_tile_migrate_wait(struct xe_tile *tile)
>  {
>  	xe_migrate_wait(tile->migrate);
>  }
> +
> +#if IS_ENABLED(CONFIG_DRM_XE_PAGEMAP)
> +/**
> + * xe_tile_local_pagemap() - Return a pointer to the tile's local drm_pagemap if any
> + * @tile: The tile.
> + *
> + * Return: A pointer to the tile's local drm_pagemap, or NULL if local pagemap
> + * support has been compiled out.
> + */
> +struct drm_pagemap *xe_tile_local_pagemap(struct xe_tile *tile)
> +{
> +	return &xe_tile_to_vr(tile)->pagemap_cache.xpagemap->dpagemap;
> +}
> +#endif
> diff --git a/drivers/gpu/drm/xe/xe_tile.h b/drivers/gpu/drm/xe/xe_tile.h
> index 1d42b235c322..375b8323cda6 100644
> --- a/drivers/gpu/drm/xe/xe_tile.h
> +++ b/drivers/gpu/drm/xe/xe_tile.h
> @@ -8,6 +8,7 @@
>  
>  #include "xe_device_types.h"
>  
> +struct xe_pagemap;
>  struct xe_tile;
>  
>  int xe_tile_init_early(struct xe_tile *tile, struct xe_device *xe, u8 id);
> @@ -16,11 +17,32 @@ int xe_tile_init(struct xe_tile *tile);
>  
>  void xe_tile_migrate_wait(struct xe_tile *tile);
>  
> -#if IS_ENABLED(CONFIG_DRM_XE_PAGEMAP)
> -static inline struct drm_pagemap *xe_tile_local_pagemap(struct xe_tile *tile)
> +/**
> + * xe_vr_to_tile() - Return the struct xe_tile pointer from a
> + * struct xe_vram_region pointer.
> + * @vr: The xe_vram_region.
> + *
> + * Return: Pointer to the struct xe_tile embedding *@vr.
> + */
> +static inline struct xe_tile *xe_vr_to_tile(struct xe_vram_region *vr)
>  {
> -	return tile->mem.vram.dpagemap;
> +	return container_of(vr, struct xe_tile, mem.vram);
>  }
> +
> +/**
> + * xe_tile_to_vr() - Return the struct xe_vram_region pointer from a
> + * struct xe_tile pointer
> + * @tile: Pointer to the struct xe_tile.
> + *
> + * Return: Pointer to the struct xe_vram_region embedded in *@tile.
> + */
> +static inline struct xe_vram_region *xe_tile_to_vr(struct xe_tile *tile)
> +{
> +	return &tile->mem.vram;
> +}
> +
> +#if IS_ENABLED(CONFIG_DRM_XE_PAGEMAP)
> +struct drm_pagemap *xe_tile_local_pagemap(struct xe_tile *tile);
>  #else
>  static inline struct drm_pagemap *xe_tile_local_pagemap(struct xe_tile *tile)
>  {
> diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
> index 84fa41b9fa20..08baea03df00 100644
> --- a/drivers/gpu/drm/xe/xe_vm_types.h
> +++ b/drivers/gpu/drm/xe/xe_vm_types.h
> @@ -168,6 +168,7 @@ struct xe_vm {
>  			 */
>  			struct work_struct work;
>  		} garbage_collector;
> +		struct xe_pagemap *pagemaps[XE_MAX_TILES_PER_DEVICE];
>  	} svm;
>  
>  	struct xe_device *xe;
> diff --git a/include/drm/drm_pagemap.h b/include/drm/drm_pagemap.h
> index 49f2e0b6c699..9f758a46988a 100644
> --- a/include/drm/drm_pagemap.h
> +++ b/include/drm/drm_pagemap.h
> @@ -120,6 +120,21 @@ struct drm_pagemap_ops {
>  	int (*populate_mm)(struct drm_pagemap *dpagemap,
>  			   unsigned long start, unsigned long end,
>  			   struct mm_struct *mm);
> +
> +	/**
> +	 * @destroy: Uninitialize a struct drm_pagemap.
> +	 * @dpagemap: The struct drm_pagemap to uninitialize.
> +	 *
> +	 * Uninitialize the drm_pagemap, potentially retaining it in
> +	 * a cache for re-initialization. This callback may be called
> +	 * with page locks held and typically needs to queue any
> +	 * destruction or caching work on a workqueue to avoid locking
> +	 * order inversions. Since the drm_pagemap code also may put
> +	 * the owning device immediately after this function is called,
> +	 * the drm_pagemap destruction needs to be waited for in
> +	 * the device destruction code.
> +	 */
> +	void (*destroy)(struct drm_pagemap *dpagemap);
>  };
>  
>  /**
> @@ -127,14 +142,16 @@ struct drm_pagemap_ops {
>   * used for device p2p handshaking.
>   * @ops: The struct drm_pagemap_ops.
>   * @ref: Reference count.
> - * @dev: The struct drevice owning the device-private memory.
> + * @drm: The struct drm device owning the device-private memory.
>   * @pagemap: Pointer to the underlying dev_pagemap.
> + * @dev_hold: Pointer to a struct drm_pagemap_dev_hold for device referencing.
>   */
>  struct drm_pagemap {
>  	const struct drm_pagemap_ops *ops;
>  	struct kref ref;
> -	struct device *dev;
> +	struct drm_device *drm;
>  	struct dev_pagemap *pagemap;
> +	struct drm_pagemap_dev_hold *dev_hold;
>  };
>  
>  struct drm_pagemap_devmem;
> @@ -199,26 +216,44 @@ struct drm_pagemap_devmem_ops {
>  			   unsigned long npages);
>  };
>  
> -struct drm_pagemap *drm_pagemap_create(struct device *dev,
> -				       struct dev_pagemap *pagemap,
> -				       const struct drm_pagemap_ops *ops);
> +int drm_pagemap_reinit(struct drm_pagemap *dpagemap);
> +
> +int drm_pagemap_init(struct drm_pagemap *dpagemap,
> +		     struct dev_pagemap *pagemap,
> +		     struct drm_device *drm,
> +		     const struct drm_pagemap_ops *ops);
>  
>  void drm_pagemap_put(struct drm_pagemap *dpagemap);
>  
>  /**
>   * drm_pagemap_get() - Obtain a reference on a struct drm_pagemap
> - * @dpagemap: Pointer to the struct drm_pagemap.
> + * @dpagemap: Pointer to the struct drm_pagemap, or NULL.
>   *
> - * Return: Pointer to the struct drm_pagemap.
> + * Return: Pointer to the struct drm_pagemap, or NULL.
>   */
>  static inline struct drm_pagemap *
>  drm_pagemap_get(struct drm_pagemap *dpagemap)
>  {
> -	kref_get(&dpagemap->ref);
> +	if (likely(dpagemap))
> +		kref_get(&dpagemap->ref);
>  
>  	return dpagemap;
>  }
>  
> +/**
> + * drm_pagemap_get_unless_zero() - Obtain a reference on a struct drm_pagemap
> + * unless the current reference count is zero.
> + * @dpagemap: Pointer to the drm_pagemap or NULL.
> + *
> + * Return: A pointer to @dpagemap if the reference count was successfully
> + * incremented. NULL if @dpagemap was NULL, or its refcount was 0.
> + */
> +static inline struct drm_pagemap *
> +drm_pagemap_get_unless_zero(struct drm_pagemap *dpagemap)
> +{
> +	return (dpagemap && kref_get_unless_zero(&dpagemap->ref)) ? dpagemap : NULL;
> +}
> +
>  /**
>   * struct drm_pagemap_devmem - Structure representing a GPU SVM device memory allocation
>   *
> @@ -257,6 +292,4 @@ void drm_pagemap_devmem_init(struct drm_pagemap_devmem *devmem_allocation,
>  int drm_pagemap_populate_mm(struct drm_pagemap *dpagemap,
>  			    unsigned long start, unsigned long end,
>  			    struct mm_struct *mm);
> -
>  #endif
> -
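
For illustration, here is a minimal sketch of how a GPU fault handler might
consume the populate_mm interface declared above. Only
drm_pagemap_populate_mm() comes from the patch; the surrounding types, names
and fallback policy are hypothetical placeholders, not part of the series:

#include <linux/mm_types.h>
#include <drm/drm_pagemap.h>

struct hypothetical_svm_vm {
	struct mm_struct *mm;
	/* Placement preference, e.g. set via a madvise-style ioctl. */
	struct drm_pagemap *preferred_dpagemap;
};

static int hypothetical_gpu_fault(struct hypothetical_svm_vm *vm,
				  unsigned long start, unsigned long end)
{
	struct drm_pagemap *dpagemap = vm->preferred_dpagemap;
	int err;

	if (!dpagemap)
		return 0;	/* Keep the range in system memory. */

	/* Migrate the faulting range into the chosen pagemap's device memory. */
	err = drm_pagemap_populate_mm(dpagemap, start, end, vm->mm);
	if (err == -ENODEV)
		return 0;	/* Device unplugged; fall back to system pages. */

	return err;
}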


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH 00/19] drm, drm/xe: Multi-device GPUSVM
  2025-03-13 12:57     ` Christian König
  2025-03-13 15:55       ` Thomas Hellström
@ 2025-03-17  9:20       ` Thomas Hellström
  1 sibling, 0 replies; 26+ messages in thread
From: Thomas Hellström @ 2025-03-17  9:20 UTC (permalink / raw)
  To: Christian König, intel-xe, dri-devel
  Cc: himal.prasad.ghimiray, apopple, airlied, Simona Vetter,
	felix.kuehling, Matthew Brost, dakr, Mrozek, Michal,
	Joonas Lahtinen

On Thu, 2025-03-13 at 13:57 +0100, Christian König wrote:
> Am 13.03.25 um 13:50 schrieb Thomas Hellström:
> > Hi, Christian
> > 
> > On Thu, 2025-03-13 at 11:19 +0100, Christian König wrote:
> > > Am 12.03.25 um 22:03 schrieb Thomas Hellström:
> > > > This RFC implements and requests comments for a way to handle
> > > > SVM
> > > > with multi-device,
> > > > typically with fast interconnects. It adds generic code and
> > > > helpers
> > > > in drm, and
> > > > device-specific code for xe.
> > > > 
> > > > For SVM, devices set up maps of device-private struct pages,
> > > > using
> > > > a struct dev_pagemap,
> > > > The CPU virtual address space (mm), can then be set up using
> > > > special page-table entries
> > > > to point to such pages, but they can't be accessed directly by
> > > > the
> > > > CPU, but possibly
> > > > by other devices using a fast interconnect. This series aims to
> > > > provide helpers to
> > > > identify pagemaps that take part in such a fast interconnect
> > > > and to
> > > > aid in migrating
> > > > between them.
> > > > 
> > > > This is initially done by augmenting the struct dev_pagemap
> > > > with a
> > > > struct drm_pagemap,
> > > > and having the struct drm_pagemap implement a "populate_mm"
> > > > method,
> > > > where a region of
> > > > the CPU virtual address space (mm) is populated with
> > > > device_private
> > > > pages from the
> > > > dev_pagemap associated with the drm_pagemap, migrating data
> > > > from
> > > > system memory or other
> > > > devices if necessary. The drm_pagemap_populate_mm() function is
> > > > then typically called
> > > > from a fault handler, using the struct drm_pagemap pointer of
> > > > choice. It could be
> > > > referencing a local drm_pagemap or a remote one. The migration
> > > > is
> > > > now completely done
> > > > by drm_pagemap callbacks, (typically using a copy-engine local
> > > > to
> > > > the dev_pagemap local
> > > > memory).
> > > Up till here that makes sense. Maybe not necessary to be put into
> > > the
> > > DRM layer, but that is an implementation detail.
> > > 
> > > > In addition there are helpers to build a drm_pagemap UAPI using
> > > > file-descripors
> > > > representing struct drm_pagemaps, and a helper to register
> > > > devices
> > > > with a common
> > > > fast interconnect. The UAPI is intended to be private to the
> > > > device, but if drivers
> > > > agree to identify struct drm_pagemaps by file descriptors one
> > > > could
> > > > in theory
> > > > do cross-driver multi-device SVM if a use-case were found.
> > > But this completely eludes me.
> > > 
> > > Why would you want an UAPI for representing pagemaps as file
> > > descriptors? Isn't it the kernel which enumerates the
> > > interconnects
> > > of the devices?
> > > 
> > > I mean we somehow need to expose those interconnects between
> > > devices
> > > to userspace, e.g. like amdgpu does with it's XGMI connectors.
> > > But
> > > that is static for the hardware (unless HW is hot removed/added)
> > > and
> > > so I would assume exposed through sysfs.
> > Thanks for the feedback.
> > 
> > The idea here is not to expose the interconnects but rather have a
> > way
> > for user-space to identify a drm_pagemap and some level of access-
> > and
> > lifetime control.
> 
> Well that's what I get I just don't get why?
> 
> I mean when you want to have the pagemap as optional feature you can
> turn on and off I would say make that a sysfs file.
> 
> It's a global feature anyway and not bound in any way to the file
> descriptor, isn't it?

Getting back to this: we had some internal discussions, and the desired
behavior is to give the device-private pages a firstopen-to-lastclose
lifetime (or rather firstopen to lastclose + shrinker), for memory-usage
reasons. So I believe a file descriptor is a good fit for the UAPI
representation.
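
To make the idea concrete, a userspace flow could look roughly like the
sketch below. The ioctl name, request number and struct layout are
hypothetical stand-ins for the devmem_open ioctl (patch 17) and the
madvise-style placement ioctl (patch 18); the only point here is the
fd-controlled lifetime:

#include <linux/ioctl.h>
#include <sys/ioctl.h>
#include <unistd.h>

struct hypothetical_xe_devmem_open {
	unsigned int tile_id;	/* in: which VRAM region to open */
	int devmem_fd;		/* out: fd representing the drm_pagemap */
};

#define HYPOTHETICAL_IOCTL_XE_DEVMEM_OPEN \
	_IOWR('d', 0x40, struct hypothetical_xe_devmem_open)

static int prefer_devmem(int drm_fd)
{
	struct hypothetical_xe_devmem_open args = { .tile_id = 0 };

	/* Opening the fd creates / pins the device-private pagemap (firstopen). */
	if (ioctl(drm_fd, HYPOTHETICAL_IOCTL_XE_DEVMEM_OPEN, &args))
		return -1;

	/*
	 * The fd can now be handed to a madvise-style ioctl to express a
	 * placement preference for a VA range.
	 */

	/*
	 * Closing the last fd (lastclose) lets the kernel tear the pagemap
	 * down again, subject to the caching / shrinker behavior discussed
	 * above.
	 */
	close(args.devmem_fd);
	return 0;
}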

Thanks,
Thomas



^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2025-03-17  9:21 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-03-12 21:03 [RFC PATCH 00/19] drm, drm/xe: Multi-device GPUSVM Thomas Hellström
2025-03-12 21:03 ` [RFC PATCH 01/19] drm/xe: Introduce CONFIG_DRM_XE_GPUSVM Thomas Hellström
2025-03-12 21:03 ` [RFC PATCH 02/19] drm/xe/svm: Fix a potential bo UAF Thomas Hellström
2025-03-12 21:04 ` [RFC PATCH 03/19] drm/gpusvm, drm/pagemap: Move migration functionality to drm_pagemap Thomas Hellström
2025-03-12 21:04 ` [RFC PATCH 04/19] drm/pagemap: Add a populate_mm op Thomas Hellström
2025-03-12 21:04 ` [RFC PATCH 05/19] drm/xe: Implement and use the drm_pagemap " Thomas Hellström
2025-03-12 21:04 ` [RFC PATCH 06/19] drm/pagemap, drm/xe: Add refcounting to struct drm_pagemap and manage lifetime Thomas Hellström
2025-03-12 21:04 ` [RFC PATCH 07/19] drm/pagemap: Get rid of the struct drm_pagemap_zdd::device_private_page_owner field Thomas Hellström
2025-03-12 21:04 ` [RFC PATCH 08/19] drm/xe/bo: Add a bo remove callback Thomas Hellström
2025-03-14 13:05   ` Thomas Hellström
2025-03-12 21:04 ` [RFC PATCH 09/19] drm/pagemap_util: Add a utility to assign an owner to a set of interconnected gpus Thomas Hellström
2025-03-12 21:04 ` [RFC PATCH 10/19] drm/gpusvm, drm/xe: Move the device private owner to the drm_gpusvm_ctx Thomas Hellström
2025-03-12 21:04 ` [RFC PATCH 11/19] drm/xe: Use the drm_pagemap_util helper to get a svm pagemap owner Thomas Hellström
2025-03-12 21:04 ` [RFC PATCH 12/19] drm/xe: Make the PT code handle placement per PTE rather than per vma / range Thomas Hellström
2025-03-12 21:04 ` [RFC PATCH 13/19] drm/gpusvm: Allow mixed mappings Thomas Hellström
2025-03-12 21:04 ` [RFC PATCH 14/19] drm/xe: Add a preferred dpagemap Thomas Hellström
2025-03-12 21:04 ` [RFC PATCH 15/19] drm/pagemap/util: Add file descriptors pointing to struct drm_pagemap Thomas Hellström
2025-03-12 21:04 ` [RFC PATCH 16/19] drm/xe/migrate: Allow xe_migrate_vram() also on non-pagefault capable devices Thomas Hellström
2025-03-12 21:04 ` [RFC PATCH 17/19] drm/xe/uapi: Add the devmem_open ioctl Thomas Hellström
2025-03-12 21:04 ` [RFC PATCH 18/19] drm/xe/uapi: HAX: Add the xe_madvise_prefer_devmem IOCTL Thomas Hellström
2025-03-12 21:04 ` [RFC PATCH 19/19] drm/xe: HAX: Use pcie p2p dma to test fast interconnect Thomas Hellström
2025-03-13 10:19 ` [RFC PATCH 00/19] drm, drm/xe: Multi-device GPUSVM Christian König
2025-03-13 12:50   ` Thomas Hellström
2025-03-13 12:57     ` Christian König
2025-03-13 15:55       ` Thomas Hellström
2025-03-17  9:20       ` Thomas Hellström

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).