* [PATCH v2 0/3] drm/gpusvm, drm/pagemap, drm/xe: Restructure migration in preparation for multi-device
@ 2025-06-04 9:35 Thomas Hellström
2025-06-04 9:35 ` [PATCH v2 1/3] drm/gpusvm, drm/pagemap: Move migration functionality to drm_pagemap Thomas Hellström
` (3 more replies)
0 siblings, 4 replies; 16+ messages in thread
From: Thomas Hellström @ 2025-06-04 9:35 UTC (permalink / raw)
To: intel-xe
Cc: Thomas Hellström, dri-devel, himal.prasad.ghimiray, apopple,
airlied, Simona Vetter, felix.kuehling, Matthew Brost,
Christian König, dakr, Mrozek, Michal, Joonas Lahtinen
This patchset moves the migration part of drm_gpusvm into drm_pagemap and
adds a populate_mm() op to drm_pagemap.
The idea is that the device that receives a pagefault determines whether it
wants to migrate content, and to where. It then calls the populate_mm() method
of the relevant drm_pagemap.
This functionality was mostly already in place, but hard-coded for xe only,
without going through a pagemap op. Since we might be dealing with separate
devices moving forward, it now also becomes the responsibility of the
populate_mm() op to grab any necessary local device runtime pm references and
keep them held while its pages are present in an mm (struct mm_struct).
One thing to decide here is whether the populate_mm() callback should sit on a
struct drm_pagemap for now, while we sort out multi-device usability, or
whether we should add it (or something equivalent) to struct dev_pagemap.
v2:
- Rebase.
Matthew Brost (1):
drm/gpusvm, drm/pagemap: Move migration functionality to drm_pagemap
Thomas Hellström (2):
drm/pagemap: Add a populate_mm op
drm/xe: Implement and use the drm_pagemap populate_mm op
Documentation/gpu/rfc/gpusvm.rst | 12 +-
drivers/gpu/drm/Makefile | 6 +-
drivers/gpu/drm/drm_gpusvm.c | 760 +-----------------------
drivers/gpu/drm/drm_pagemap.c | 846 +++++++++++++++++++++++++++
drivers/gpu/drm/xe/Kconfig | 10 +-
drivers/gpu/drm/xe/xe_bo_types.h | 2 +-
drivers/gpu/drm/xe/xe_device_types.h | 2 +-
drivers/gpu/drm/xe/xe_svm.c | 129 ++--
drivers/gpu/drm/xe/xe_svm.h | 10 +-
drivers/gpu/drm/xe/xe_tile.h | 11 +
drivers/gpu/drm/xe/xe_vm.c | 2 +-
include/drm/drm_gpusvm.h | 96 ---
include/drm/drm_pagemap.h | 135 +++++
13 files changed, 1107 insertions(+), 914 deletions(-)
create mode 100644 drivers/gpu/drm/drm_pagemap.c
--
2.49.0
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH v2 1/3] drm/gpusvm, drm/pagemap: Move migration functionality to drm_pagemap
2025-06-04 9:35 [PATCH v2 0/3] drm/gpusvm, drm/pagemap, drm/xe: Restructure migration in preparation for multi-device Thomas Hellström
@ 2025-06-04 9:35 ` Thomas Hellström
2025-06-04 15:45 ` kernel test robot
2025-06-05 22:44 ` Matthew Brost
2025-06-04 9:35 ` [PATCH v2 2/3] drm/pagemap: Add a populate_mm op Thomas Hellström
` (2 subsequent siblings)
3 siblings, 2 replies; 16+ messages in thread
From: Thomas Hellström @ 2025-06-04 9:35 UTC (permalink / raw)
To: intel-xe
Cc: Matthew Brost, Thomas Hellström, dri-devel,
himal.prasad.ghimiray, apopple, airlied, Simona Vetter,
felix.kuehling, Christian König, dakr, Mrozek, Michal,
Joonas Lahtinen
From: Matthew Brost <matthew.brost@intel.com>
The migration functionality and bookkeeping of per-pagemap VRAM mapped to
the CPU mm are not per GPU VM, but rather per pagemap.
This is also reflected by these functions not needing the drm_gpusvm
structures. So move them to drm_pagemap.
With this, drm_gpusvm shouldn't really access the page zone-device-data
since its meaning is internal to drm_pagemap. Currently it's used to
reject mapping ranges backed by multiple drm_pagemap allocations.
For now, make the zone-device-data a void pointer.
Rename CONFIG_DRM_XE_DEVMEM_MIRROR to CONFIG_DRM_XE_PAGEMAP.
Matt is listed as author of this commit since he wrote most of the code,
and it makes sense to retain his git authorship.
Thomas mostly moved the code around.
Co-developed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
Documentation/gpu/rfc/gpusvm.rst | 12 +-
drivers/gpu/drm/Makefile | 6 +-
drivers/gpu/drm/drm_gpusvm.c | 759 +------------------------
drivers/gpu/drm/drm_pagemap.c | 811 +++++++++++++++++++++++++++
drivers/gpu/drm/xe/Kconfig | 10 +-
drivers/gpu/drm/xe/xe_bo_types.h | 2 +-
drivers/gpu/drm/xe/xe_device_types.h | 2 +-
drivers/gpu/drm/xe/xe_svm.c | 49 +-
include/drm/drm_gpusvm.h | 96 ----
include/drm/drm_pagemap.h | 101 ++++
10 files changed, 974 insertions(+), 874 deletions(-)
create mode 100644 drivers/gpu/drm/drm_pagemap.c
diff --git a/Documentation/gpu/rfc/gpusvm.rst b/Documentation/gpu/rfc/gpusvm.rst
index bcf66a8137a6..469db1372f16 100644
--- a/Documentation/gpu/rfc/gpusvm.rst
+++ b/Documentation/gpu/rfc/gpusvm.rst
@@ -73,15 +73,21 @@ Overview of baseline design
.. kernel-doc:: drivers/gpu/drm/drm_gpusvm.c
:doc: Locking
-.. kernel-doc:: drivers/gpu/drm/drm_gpusvm.c
- :doc: Migration
-
.. kernel-doc:: drivers/gpu/drm/drm_gpusvm.c
:doc: Partial Unmapping of Ranges
.. kernel-doc:: drivers/gpu/drm/drm_gpusvm.c
:doc: Examples
+Overview of drm_pagemap design
+==============================
+
+.. kernel-doc:: drivers/gpu/drm/drm_pagemap.c
+ :doc: Overview
+
+.. kernel-doc:: drivers/gpu/drm/drm_pagemap.c
+ :doc: Migration
+
Possible future design features
===============================
diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
index 4199715670b1..f9cde5717f85 100644
--- a/drivers/gpu/drm/Makefile
+++ b/drivers/gpu/drm/Makefile
@@ -104,7 +104,11 @@ obj-$(CONFIG_DRM_PANEL_BACKLIGHT_QUIRKS) += drm_panel_backlight_quirks.o
#
obj-$(CONFIG_DRM_EXEC) += drm_exec.o
obj-$(CONFIG_DRM_GPUVM) += drm_gpuvm.o
-obj-$(CONFIG_DRM_GPUSVM) += drm_gpusvm.o
+
+drm_gpusvm_helper-y := \
+ drm_gpusvm.o\
+ drm_pagemap.o
+obj-$(CONFIG_DRM_GPUSVM) += drm_gpusvm_helper.o
obj-$(CONFIG_DRM_BUDDY) += drm_buddy.o
diff --git a/drivers/gpu/drm/drm_gpusvm.c b/drivers/gpu/drm/drm_gpusvm.c
index 7ff81aa0a1ca..ef81381609de 100644
--- a/drivers/gpu/drm/drm_gpusvm.c
+++ b/drivers/gpu/drm/drm_gpusvm.c
@@ -8,10 +8,9 @@
#include <linux/dma-mapping.h>
#include <linux/hmm.h>
+#include <linux/hugetlb_inline.h>
#include <linux/memremap.h>
-#include <linux/migrate.h>
#include <linux/mm_types.h>
-#include <linux/pagemap.h>
#include <linux/slab.h>
#include <drm/drm_device.h>
@@ -107,21 +106,6 @@
* to add annotations to GPU SVM.
*/
-/**
- * DOC: Migration
- *
- * The migration support is quite simple, allowing migration between RAM and
- * device memory at the range granularity. For example, GPU SVM currently does
- * not support mixing RAM and device memory pages within a range. This means
- * that upon GPU fault, the entire range can be migrated to device memory, and
- * upon CPU fault, the entire range is migrated to RAM. Mixed RAM and device
- * memory storage within a range could be added in the future if required.
- *
- * The reasoning for only supporting range granularity is as follows: it
- * simplifies the implementation, and range sizes are driver-defined and should
- * be relatively small.
- */
-
/**
* DOC: Partial Unmapping of Ranges
*
@@ -193,10 +177,9 @@
* if (driver_migration_policy(range)) {
* mmap_read_lock(mm);
* devmem = driver_alloc_devmem();
- * err = drm_gpusvm_migrate_to_devmem(gpusvm, range,
- * devmem_allocation,
- * &ctx);
- * mmap_read_unlock(mm);
+ * err = drm_pagemap_migrate_to_devmem(devmem, gpusvm->mm, gpuva_start,
+ * gpuva_end, driver_pgmap_owner());
+ * mmap_read_unlock(mm);
* if (err) // CPU mappings may have changed
* goto retry;
* }
@@ -288,97 +271,6 @@ npages_in_range(unsigned long start, unsigned long end)
return (end - start) >> PAGE_SHIFT;
}
-/**
- * struct drm_gpusvm_zdd - GPU SVM zone device data
- *
- * @refcount: Reference count for the zdd
- * @devmem_allocation: device memory allocation
- * @device_private_page_owner: Device private pages owner
- *
- * This structure serves as a generic wrapper installed in
- * page->zone_device_data. It provides infrastructure for looking up a device
- * memory allocation upon CPU page fault and asynchronously releasing device
- * memory once the CPU has no page references. Asynchronous release is useful
- * because CPU page references can be dropped in IRQ contexts, while releasing
- * device memory likely requires sleeping locks.
- */
-struct drm_gpusvm_zdd {
- struct kref refcount;
- struct drm_gpusvm_devmem *devmem_allocation;
- void *device_private_page_owner;
-};
-
-/**
- * drm_gpusvm_zdd_alloc() - Allocate a zdd structure.
- * @device_private_page_owner: Device private pages owner
- *
- * This function allocates and initializes a new zdd structure. It sets up the
- * reference count and initializes the destroy work.
- *
- * Return: Pointer to the allocated zdd on success, ERR_PTR() on failure.
- */
-static struct drm_gpusvm_zdd *
-drm_gpusvm_zdd_alloc(void *device_private_page_owner)
-{
- struct drm_gpusvm_zdd *zdd;
-
- zdd = kmalloc(sizeof(*zdd), GFP_KERNEL);
- if (!zdd)
- return NULL;
-
- kref_init(&zdd->refcount);
- zdd->devmem_allocation = NULL;
- zdd->device_private_page_owner = device_private_page_owner;
-
- return zdd;
-}
-
-/**
- * drm_gpusvm_zdd_get() - Get a reference to a zdd structure.
- * @zdd: Pointer to the zdd structure.
- *
- * This function increments the reference count of the provided zdd structure.
- *
- * Return: Pointer to the zdd structure.
- */
-static struct drm_gpusvm_zdd *drm_gpusvm_zdd_get(struct drm_gpusvm_zdd *zdd)
-{
- kref_get(&zdd->refcount);
- return zdd;
-}
-
-/**
- * drm_gpusvm_zdd_destroy() - Destroy a zdd structure.
- * @ref: Pointer to the reference count structure.
- *
- * This function queues the destroy_work of the zdd for asynchronous destruction.
- */
-static void drm_gpusvm_zdd_destroy(struct kref *ref)
-{
- struct drm_gpusvm_zdd *zdd =
- container_of(ref, struct drm_gpusvm_zdd, refcount);
- struct drm_gpusvm_devmem *devmem = zdd->devmem_allocation;
-
- if (devmem) {
- complete_all(&devmem->detached);
- if (devmem->ops->devmem_release)
- devmem->ops->devmem_release(devmem);
- }
- kfree(zdd);
-}
-
-/**
- * drm_gpusvm_zdd_put() - Put a zdd reference.
- * @zdd: Pointer to the zdd structure.
- *
- * This function decrements the reference count of the provided zdd structure
- * and schedules its destruction if the count drops to zero.
- */
-static void drm_gpusvm_zdd_put(struct drm_gpusvm_zdd *zdd)
-{
- kref_put(&zdd->refcount, drm_gpusvm_zdd_destroy);
-}
-
/**
* drm_gpusvm_range_find() - Find GPU SVM range from GPU SVM notifier
* @notifier: Pointer to the GPU SVM notifier structure.
@@ -945,7 +837,7 @@ drm_gpusvm_range_chunk_size(struct drm_gpusvm *gpusvm,
* process-many-malloc' fails. In the failure case, each process
* mallocs 16k but the CPU VMA is ~128k which results in 64k SVM
* ranges. When migrating the SVM ranges, some processes fail in
- * drm_gpusvm_migrate_to_devmem with 'migrate.cpages != npages'
+ * drm_pagemap_migrate_to_devmem with 'migrate.cpages != npages'
* and then upon drm_gpusvm_range_get_pages device pages from
* other processes are collected + faulted in which creates all
* sorts of problems. Unsure exactly how this happening, also
@@ -1363,7 +1255,7 @@ int drm_gpusvm_range_get_pages(struct drm_gpusvm *gpusvm,
.dev_private_owner = gpusvm->device_private_page_owner,
};
struct mm_struct *mm = gpusvm->mm;
- struct drm_gpusvm_zdd *zdd;
+ void *zdd;
unsigned long timeout =
jiffies + msecs_to_jiffies(HMM_RANGE_DEFAULT_TIMEOUT);
unsigned long i, j;
@@ -1465,7 +1357,7 @@ int drm_gpusvm_range_get_pages(struct drm_gpusvm *gpusvm,
}
pagemap = page_pgmap(page);
- dpagemap = zdd->devmem_allocation->dpagemap;
+ dpagemap = drm_pagemap_page_to_dpagemap(page);
if (drm_WARN_ON(gpusvm->drm, !dpagemap)) {
/*
* Raced. This is not supposed to happen
@@ -1489,7 +1381,7 @@ int drm_gpusvm_range_get_pages(struct drm_gpusvm *gpusvm,
} else {
dma_addr_t addr;
- if (is_zone_device_page(page) || zdd) {
+ if (is_zone_device_page(page) || pagemap) {
err = -EOPNOTSUPP;
goto err_unmap;
}
@@ -1517,7 +1409,7 @@ int drm_gpusvm_range_get_pages(struct drm_gpusvm *gpusvm,
flags.has_dma_mapping = true;
}
- if (zdd) {
+ if (pagemap) {
flags.has_devmem_pages = true;
range->dpagemap = dpagemap;
}
@@ -1545,6 +1437,7 @@ EXPORT_SYMBOL_GPL(drm_gpusvm_range_get_pages);
/**
* drm_gpusvm_range_unmap_pages() - Unmap pages associated with a GPU SVM range
+ * drm_gpusvm_range_evict() - Evict GPU SVM range
* @gpusvm: Pointer to the GPU SVM structure
* @range: Pointer to the GPU SVM range structure
* @ctx: GPU SVM context
@@ -1575,562 +1468,11 @@ void drm_gpusvm_range_unmap_pages(struct drm_gpusvm *gpusvm,
EXPORT_SYMBOL_GPL(drm_gpusvm_range_unmap_pages);
/**
- * drm_gpusvm_migration_unlock_put_page() - Put a migration page
- * @page: Pointer to the page to put
- *
- * This function unlocks and puts a page.
- */
-static void drm_gpusvm_migration_unlock_put_page(struct page *page)
-{
- unlock_page(page);
- put_page(page);
-}
-
-/**
- * drm_gpusvm_migration_unlock_put_pages() - Put migration pages
- * @npages: Number of pages
- * @migrate_pfn: Array of migrate page frame numbers
- *
- * This function unlocks and puts an array of pages.
- */
-static void drm_gpusvm_migration_unlock_put_pages(unsigned long npages,
- unsigned long *migrate_pfn)
-{
- unsigned long i;
-
- for (i = 0; i < npages; ++i) {
- struct page *page;
-
- if (!migrate_pfn[i])
- continue;
-
- page = migrate_pfn_to_page(migrate_pfn[i]);
- drm_gpusvm_migration_unlock_put_page(page);
- migrate_pfn[i] = 0;
- }
-}
-
-/**
- * drm_gpusvm_get_devmem_page() - Get a reference to a device memory page
- * @page: Pointer to the page
- * @zdd: Pointer to the GPU SVM zone device data
- *
- * This function associates the given page with the specified GPU SVM zone
- * device data and initializes it for zone device usage.
- */
-static void drm_gpusvm_get_devmem_page(struct page *page,
- struct drm_gpusvm_zdd *zdd)
-{
- page->zone_device_data = drm_gpusvm_zdd_get(zdd);
- zone_device_page_init(page);
-}
-
-/**
- * drm_gpusvm_migrate_map_pages() - Map migration pages for GPU SVM migration
- * @dev: The device for which the pages are being mapped
- * @dma_addr: Array to store DMA addresses corresponding to mapped pages
- * @migrate_pfn: Array of migrate page frame numbers to map
- * @npages: Number of pages to map
- * @dir: Direction of data transfer (e.g., DMA_BIDIRECTIONAL)
- *
- * This function maps pages of memory for migration usage in GPU SVM. It
- * iterates over each page frame number provided in @migrate_pfn, maps the
- * corresponding page, and stores the DMA address in the provided @dma_addr
- * array.
- *
- * Return: 0 on success, -EFAULT if an error occurs during mapping.
- */
-static int drm_gpusvm_migrate_map_pages(struct device *dev,
- dma_addr_t *dma_addr,
- unsigned long *migrate_pfn,
- unsigned long npages,
- enum dma_data_direction dir)
-{
- unsigned long i;
-
- for (i = 0; i < npages; ++i) {
- struct page *page = migrate_pfn_to_page(migrate_pfn[i]);
-
- if (!page)
- continue;
-
- if (WARN_ON_ONCE(is_zone_device_page(page)))
- return -EFAULT;
-
- dma_addr[i] = dma_map_page(dev, page, 0, PAGE_SIZE, dir);
- if (dma_mapping_error(dev, dma_addr[i]))
- return -EFAULT;
- }
-
- return 0;
-}
-
-/**
- * drm_gpusvm_migrate_unmap_pages() - Unmap pages previously mapped for GPU SVM migration
- * @dev: The device for which the pages were mapped
- * @dma_addr: Array of DMA addresses corresponding to mapped pages
- * @npages: Number of pages to unmap
- * @dir: Direction of data transfer (e.g., DMA_BIDIRECTIONAL)
- *
- * This function unmaps previously mapped pages of memory for GPU Shared Virtual
- * Memory (SVM). It iterates over each DMA address provided in @dma_addr, checks
- * if it's valid and not already unmapped, and unmaps the corresponding page.
- */
-static void drm_gpusvm_migrate_unmap_pages(struct device *dev,
- dma_addr_t *dma_addr,
- unsigned long npages,
- enum dma_data_direction dir)
-{
- unsigned long i;
-
- for (i = 0; i < npages; ++i) {
- if (!dma_addr[i] || dma_mapping_error(dev, dma_addr[i]))
- continue;
-
- dma_unmap_page(dev, dma_addr[i], PAGE_SIZE, dir);
- }
-}
-
-/**
- * drm_gpusvm_migrate_to_devmem() - Migrate GPU SVM range to device memory
+ * drm_gpusvm_range_evict() - Evict GPU SVM range
* @gpusvm: Pointer to the GPU SVM structure
- * @range: Pointer to the GPU SVM range structure
- * @devmem_allocation: Pointer to the device memory allocation. The caller
- * should hold a reference to the device memory allocation,
- * which should be dropped via ops->devmem_release or upon
- * the failure of this function.
- * @ctx: GPU SVM context
- *
- * This function migrates the specified GPU SVM range to device memory. It
- * performs the necessary setup and invokes the driver-specific operations for
- * migration to device memory. Upon successful return, @devmem_allocation can
- * safely reference @range until ops->devmem_release is called which only upon
- * successful return. Expected to be called while holding the mmap lock in read
- * mode.
- *
- * Return: 0 on success, negative error code on failure.
- */
-int drm_gpusvm_migrate_to_devmem(struct drm_gpusvm *gpusvm,
- struct drm_gpusvm_range *range,
- struct drm_gpusvm_devmem *devmem_allocation,
- const struct drm_gpusvm_ctx *ctx)
-{
- const struct drm_gpusvm_devmem_ops *ops = devmem_allocation->ops;
- unsigned long start = drm_gpusvm_range_start(range),
- end = drm_gpusvm_range_end(range);
- struct migrate_vma migrate = {
- .start = start,
- .end = end,
- .pgmap_owner = gpusvm->device_private_page_owner,
- .flags = MIGRATE_VMA_SELECT_SYSTEM,
- };
- struct mm_struct *mm = gpusvm->mm;
- unsigned long i, npages = npages_in_range(start, end);
- struct vm_area_struct *vas;
- struct drm_gpusvm_zdd *zdd = NULL;
- struct page **pages;
- dma_addr_t *dma_addr;
- void *buf;
- int err;
-
- mmap_assert_locked(gpusvm->mm);
-
- if (!range->flags.migrate_devmem)
- return -EINVAL;
-
- if (!ops->populate_devmem_pfn || !ops->copy_to_devmem ||
- !ops->copy_to_ram)
- return -EOPNOTSUPP;
-
- vas = vma_lookup(mm, start);
- if (!vas) {
- err = -ENOENT;
- goto err_out;
- }
-
- if (end > vas->vm_end || start < vas->vm_start) {
- err = -EINVAL;
- goto err_out;
- }
-
- if (!vma_is_anonymous(vas)) {
- err = -EBUSY;
- goto err_out;
- }
-
- buf = kvcalloc(npages, 2 * sizeof(*migrate.src) + sizeof(*dma_addr) +
- sizeof(*pages), GFP_KERNEL);
- if (!buf) {
- err = -ENOMEM;
- goto err_out;
- }
- dma_addr = buf + (2 * sizeof(*migrate.src) * npages);
- pages = buf + (2 * sizeof(*migrate.src) + sizeof(*dma_addr)) * npages;
-
- zdd = drm_gpusvm_zdd_alloc(gpusvm->device_private_page_owner);
- if (!zdd) {
- err = -ENOMEM;
- goto err_free;
- }
-
- migrate.vma = vas;
- migrate.src = buf;
- migrate.dst = migrate.src + npages;
-
- err = migrate_vma_setup(&migrate);
- if (err)
- goto err_free;
-
- if (!migrate.cpages) {
- err = -EFAULT;
- goto err_free;
- }
-
- if (migrate.cpages != npages) {
- err = -EBUSY;
- goto err_finalize;
- }
-
- err = ops->populate_devmem_pfn(devmem_allocation, npages, migrate.dst);
- if (err)
- goto err_finalize;
-
- err = drm_gpusvm_migrate_map_pages(devmem_allocation->dev, dma_addr,
- migrate.src, npages, DMA_TO_DEVICE);
- if (err)
- goto err_finalize;
-
- for (i = 0; i < npages; ++i) {
- struct page *page = pfn_to_page(migrate.dst[i]);
-
- pages[i] = page;
- migrate.dst[i] = migrate_pfn(migrate.dst[i]);
- drm_gpusvm_get_devmem_page(page, zdd);
- }
-
- err = ops->copy_to_devmem(pages, dma_addr, npages);
- if (err)
- goto err_finalize;
-
- /* Upon success bind devmem allocation to range and zdd */
- devmem_allocation->timeslice_expiration = get_jiffies_64() +
- msecs_to_jiffies(ctx->timeslice_ms);
- zdd->devmem_allocation = devmem_allocation; /* Owns ref */
-
-err_finalize:
- if (err)
- drm_gpusvm_migration_unlock_put_pages(npages, migrate.dst);
- migrate_vma_pages(&migrate);
- migrate_vma_finalize(&migrate);
- drm_gpusvm_migrate_unmap_pages(devmem_allocation->dev, dma_addr, npages,
- DMA_TO_DEVICE);
-err_free:
- if (zdd)
- drm_gpusvm_zdd_put(zdd);
- kvfree(buf);
-err_out:
- return err;
-}
-EXPORT_SYMBOL_GPL(drm_gpusvm_migrate_to_devmem);
-
-/**
- * drm_gpusvm_migrate_populate_ram_pfn() - Populate RAM PFNs for a VM area
- * @vas: Pointer to the VM area structure, can be NULL
- * @fault_page: Fault page
- * @npages: Number of pages to populate
- * @mpages: Number of pages to migrate
- * @src_mpfn: Source array of migrate PFNs
- * @mpfn: Array of migrate PFNs to populate
- * @addr: Start address for PFN allocation
- *
- * This function populates the RAM migrate page frame numbers (PFNs) for the
- * specified VM area structure. It allocates and locks pages in the VM area for
- * RAM usage. If vas is non-NULL use alloc_page_vma for allocation, if NULL use
- * alloc_page for allocation.
- *
- * Return: 0 on success, negative error code on failure.
- */
-static int drm_gpusvm_migrate_populate_ram_pfn(struct vm_area_struct *vas,
- struct page *fault_page,
- unsigned long npages,
- unsigned long *mpages,
- unsigned long *src_mpfn,
- unsigned long *mpfn,
- unsigned long addr)
-{
- unsigned long i;
-
- for (i = 0; i < npages; ++i, addr += PAGE_SIZE) {
- struct page *page, *src_page;
-
- if (!(src_mpfn[i] & MIGRATE_PFN_MIGRATE))
- continue;
-
- src_page = migrate_pfn_to_page(src_mpfn[i]);
- if (!src_page)
- continue;
-
- if (fault_page) {
- if (src_page->zone_device_data !=
- fault_page->zone_device_data)
- continue;
- }
-
- if (vas)
- page = alloc_page_vma(GFP_HIGHUSER, vas, addr);
- else
- page = alloc_page(GFP_HIGHUSER);
-
- if (!page)
- goto free_pages;
-
- mpfn[i] = migrate_pfn(page_to_pfn(page));
- }
-
- for (i = 0; i < npages; ++i) {
- struct page *page = migrate_pfn_to_page(mpfn[i]);
-
- if (!page)
- continue;
-
- WARN_ON_ONCE(!trylock_page(page));
- ++*mpages;
- }
-
- return 0;
-
-free_pages:
- for (i = 0; i < npages; ++i) {
- struct page *page = migrate_pfn_to_page(mpfn[i]);
-
- if (!page)
- continue;
-
- put_page(page);
- mpfn[i] = 0;
- }
- return -ENOMEM;
-}
-
-/**
- * drm_gpusvm_evict_to_ram() - Evict GPU SVM range to RAM
- * @devmem_allocation: Pointer to the device memory allocation
- *
- * Similar to __drm_gpusvm_migrate_to_ram but does not require mmap lock and
- * migration done via migrate_device_* functions.
- *
- * Return: 0 on success, negative error code on failure.
- */
-int drm_gpusvm_evict_to_ram(struct drm_gpusvm_devmem *devmem_allocation)
-{
- const struct drm_gpusvm_devmem_ops *ops = devmem_allocation->ops;
- unsigned long npages, mpages = 0;
- struct page **pages;
- unsigned long *src, *dst;
- dma_addr_t *dma_addr;
- void *buf;
- int i, err = 0;
- unsigned int retry_count = 2;
-
- npages = devmem_allocation->size >> PAGE_SHIFT;
-
-retry:
- if (!mmget_not_zero(devmem_allocation->mm))
- return -EFAULT;
-
- buf = kvcalloc(npages, 2 * sizeof(*src) + sizeof(*dma_addr) +
- sizeof(*pages), GFP_KERNEL);
- if (!buf) {
- err = -ENOMEM;
- goto err_out;
- }
- src = buf;
- dst = buf + (sizeof(*src) * npages);
- dma_addr = buf + (2 * sizeof(*src) * npages);
- pages = buf + (2 * sizeof(*src) + sizeof(*dma_addr)) * npages;
-
- err = ops->populate_devmem_pfn(devmem_allocation, npages, src);
- if (err)
- goto err_free;
-
- err = migrate_device_pfns(src, npages);
- if (err)
- goto err_free;
-
- err = drm_gpusvm_migrate_populate_ram_pfn(NULL, NULL, npages, &mpages,
- src, dst, 0);
- if (err || !mpages)
- goto err_finalize;
-
- err = drm_gpusvm_migrate_map_pages(devmem_allocation->dev, dma_addr,
- dst, npages, DMA_FROM_DEVICE);
- if (err)
- goto err_finalize;
-
- for (i = 0; i < npages; ++i)
- pages[i] = migrate_pfn_to_page(src[i]);
-
- err = ops->copy_to_ram(pages, dma_addr, npages);
- if (err)
- goto err_finalize;
-
-err_finalize:
- if (err)
- drm_gpusvm_migration_unlock_put_pages(npages, dst);
- migrate_device_pages(src, dst, npages);
- migrate_device_finalize(src, dst, npages);
- drm_gpusvm_migrate_unmap_pages(devmem_allocation->dev, dma_addr, npages,
- DMA_FROM_DEVICE);
-err_free:
- kvfree(buf);
-err_out:
- mmput_async(devmem_allocation->mm);
-
- if (completion_done(&devmem_allocation->detached))
- return 0;
-
- if (retry_count--) {
- cond_resched();
- goto retry;
- }
-
- return err ?: -EBUSY;
-}
-EXPORT_SYMBOL_GPL(drm_gpusvm_evict_to_ram);
-
-/**
- * __drm_gpusvm_migrate_to_ram() - Migrate GPU SVM range to RAM (internal)
- * @vas: Pointer to the VM area structure
- * @device_private_page_owner: Device private pages owner
- * @page: Pointer to the page for fault handling (can be NULL)
- * @fault_addr: Fault address
- * @size: Size of migration
- *
- * This internal function performs the migration of the specified GPU SVM range
- * to RAM. It sets up the migration, populates + dma maps RAM PFNs, and
- * invokes the driver-specific operations for migration to RAM.
- *
- * Return: 0 on success, negative error code on failure.
- */
-static int __drm_gpusvm_migrate_to_ram(struct vm_area_struct *vas,
- void *device_private_page_owner,
- struct page *page,
- unsigned long fault_addr,
- unsigned long size)
-{
- struct migrate_vma migrate = {
- .vma = vas,
- .pgmap_owner = device_private_page_owner,
- .flags = MIGRATE_VMA_SELECT_DEVICE_PRIVATE |
- MIGRATE_VMA_SELECT_DEVICE_COHERENT,
- .fault_page = page,
- };
- struct drm_gpusvm_zdd *zdd;
- const struct drm_gpusvm_devmem_ops *ops;
- struct device *dev = NULL;
- unsigned long npages, mpages = 0;
- struct page **pages;
- dma_addr_t *dma_addr;
- unsigned long start, end;
- void *buf;
- int i, err = 0;
-
- if (page) {
- zdd = page->zone_device_data;
- if (time_before64(get_jiffies_64(),
- zdd->devmem_allocation->timeslice_expiration))
- return 0;
- }
-
- start = ALIGN_DOWN(fault_addr, size);
- end = ALIGN(fault_addr + 1, size);
-
- /* Corner where VMA area struct has been partially unmapped */
- if (start < vas->vm_start)
- start = vas->vm_start;
- if (end > vas->vm_end)
- end = vas->vm_end;
-
- migrate.start = start;
- migrate.end = end;
- npages = npages_in_range(start, end);
-
- buf = kvcalloc(npages, 2 * sizeof(*migrate.src) + sizeof(*dma_addr) +
- sizeof(*pages), GFP_KERNEL);
- if (!buf) {
- err = -ENOMEM;
- goto err_out;
- }
- dma_addr = buf + (2 * sizeof(*migrate.src) * npages);
- pages = buf + (2 * sizeof(*migrate.src) + sizeof(*dma_addr)) * npages;
-
- migrate.vma = vas;
- migrate.src = buf;
- migrate.dst = migrate.src + npages;
-
- err = migrate_vma_setup(&migrate);
- if (err)
- goto err_free;
-
- /* Raced with another CPU fault, nothing to do */
- if (!migrate.cpages)
- goto err_free;
-
- if (!page) {
- for (i = 0; i < npages; ++i) {
- if (!(migrate.src[i] & MIGRATE_PFN_MIGRATE))
- continue;
-
- page = migrate_pfn_to_page(migrate.src[i]);
- break;
- }
-
- if (!page)
- goto err_finalize;
- }
- zdd = page->zone_device_data;
- ops = zdd->devmem_allocation->ops;
- dev = zdd->devmem_allocation->dev;
-
- err = drm_gpusvm_migrate_populate_ram_pfn(vas, page, npages, &mpages,
- migrate.src, migrate.dst,
- start);
- if (err)
- goto err_finalize;
-
- err = drm_gpusvm_migrate_map_pages(dev, dma_addr, migrate.dst, npages,
- DMA_FROM_DEVICE);
- if (err)
- goto err_finalize;
-
- for (i = 0; i < npages; ++i)
- pages[i] = migrate_pfn_to_page(migrate.src[i]);
-
- err = ops->copy_to_ram(pages, dma_addr, npages);
- if (err)
- goto err_finalize;
-
-err_finalize:
- if (err)
- drm_gpusvm_migration_unlock_put_pages(npages, migrate.dst);
- migrate_vma_pages(&migrate);
- migrate_vma_finalize(&migrate);
- if (dev)
- drm_gpusvm_migrate_unmap_pages(dev, dma_addr, npages,
- DMA_FROM_DEVICE);
-err_free:
- kvfree(buf);
-err_out:
-
- return err;
-}
-
-/**
- * drm_gpusvm_range_evict - Evict GPU SVM range
* @range: Pointer to the GPU SVM range to be removed
*
- * This function evicts the specified GPU SVM range. This function will not
- * evict coherent pages.
+ * This function evicts the specified GPU SVM range.
*
* Return: 0 on success, a negative error code on failure.
*/
@@ -2182,60 +1524,6 @@ int drm_gpusvm_range_evict(struct drm_gpusvm *gpusvm,
}
EXPORT_SYMBOL_GPL(drm_gpusvm_range_evict);
-/**
- * drm_gpusvm_page_free() - Put GPU SVM zone device data associated with a page
- * @page: Pointer to the page
- *
- * This function is a callback used to put the GPU SVM zone device data
- * associated with a page when it is being released.
- */
-static void drm_gpusvm_page_free(struct page *page)
-{
- drm_gpusvm_zdd_put(page->zone_device_data);
-}
-
-/**
- * drm_gpusvm_migrate_to_ram() - Migrate GPU SVM range to RAM (page fault handler)
- * @vmf: Pointer to the fault information structure
- *
- * This function is a page fault handler used to migrate a GPU SVM range to RAM.
- * It retrieves the GPU SVM range information from the faulting page and invokes
- * the internal migration function to migrate the range back to RAM.
- *
- * Return: VM_FAULT_SIGBUS on failure, 0 on success.
- */
-static vm_fault_t drm_gpusvm_migrate_to_ram(struct vm_fault *vmf)
-{
- struct drm_gpusvm_zdd *zdd = vmf->page->zone_device_data;
- int err;
-
- err = __drm_gpusvm_migrate_to_ram(vmf->vma,
- zdd->device_private_page_owner,
- vmf->page, vmf->address,
- zdd->devmem_allocation->size);
-
- return err ? VM_FAULT_SIGBUS : 0;
-}
-
-/*
- * drm_gpusvm_pagemap_ops - Device page map operations for GPU SVM
- */
-static const struct dev_pagemap_ops drm_gpusvm_pagemap_ops = {
- .page_free = drm_gpusvm_page_free,
- .migrate_to_ram = drm_gpusvm_migrate_to_ram,
-};
-
-/**
- * drm_gpusvm_pagemap_ops_get() - Retrieve GPU SVM device page map operations
- *
- * Return: Pointer to the GPU SVM device page map operations structure.
- */
-const struct dev_pagemap_ops *drm_gpusvm_pagemap_ops_get(void)
-{
- return &drm_gpusvm_pagemap_ops;
-}
-EXPORT_SYMBOL_GPL(drm_gpusvm_pagemap_ops_get);
-
/**
* drm_gpusvm_has_mapping() - Check if GPU SVM has mapping for the given address range
* @gpusvm: Pointer to the GPU SVM structure.
@@ -2280,28 +1568,5 @@ void drm_gpusvm_range_set_unmapped(struct drm_gpusvm_range *range,
}
EXPORT_SYMBOL_GPL(drm_gpusvm_range_set_unmapped);
-/**
- * drm_gpusvm_devmem_init() - Initialize a GPU SVM device memory allocation
- *
- * @dev: Pointer to the device structure which device memory allocation belongs to
- * @mm: Pointer to the mm_struct for the address space
- * @ops: Pointer to the operations structure for GPU SVM device memory
- * @dpagemap: The struct drm_pagemap we're allocating from.
- * @size: Size of device memory allocation
- */
-void drm_gpusvm_devmem_init(struct drm_gpusvm_devmem *devmem_allocation,
- struct device *dev, struct mm_struct *mm,
- const struct drm_gpusvm_devmem_ops *ops,
- struct drm_pagemap *dpagemap, size_t size)
-{
- init_completion(&devmem_allocation->detached);
- devmem_allocation->dev = dev;
- devmem_allocation->mm = mm;
- devmem_allocation->ops = ops;
- devmem_allocation->dpagemap = dpagemap;
- devmem_allocation->size = size;
-}
-EXPORT_SYMBOL_GPL(drm_gpusvm_devmem_init);
-
MODULE_DESCRIPTION("DRM GPUSVM");
MODULE_LICENSE("GPL");
diff --git a/drivers/gpu/drm/drm_pagemap.c b/drivers/gpu/drm/drm_pagemap.c
new file mode 100644
index 000000000000..3551a50d7381
--- /dev/null
+++ b/drivers/gpu/drm/drm_pagemap.c
@@ -0,0 +1,811 @@
+// SPDX-License-Identifier: GPL-2.0-only OR MIT
+/*
+ * Copyright © 2024-2025 Intel Corporation
+ */
+
+#include <linux/dma-mapping.h>
+#include <linux/migrate.h>
+#include <linux/pagemap.h>
+#include <drm/drm_pagemap.h>
+
+/**
+ * DOC: Overview
+ *
+ * The DRM pagemap layer is intended to augment the dev_pagemap functionality by
+ * providing a way to populate a struct mm_struct virtual range with device
+ * private pages and to provide helpers to abstract device memory allocations,
+ * to migrate memory back and forth between device memory and system RAM and
+ * to handle access (and in the future migration) between devices implementing
+ * a fast interconnect that is not necessarily visible to the rest of the
+ * system.
+ *
+ * Typically the DRM pagemap receives requests from one or more DRM GPU SVM
+ * instances to populate struct mm_struct virtual ranges with memory. The
+ * migration is best-effort only and may thus fail. The implementation should
+ * also handle device unbinding by blocking (returning an -ENODEV error) new
+ * population requests, and after that migrating all device pages to system ram.
+ */
+
+/**
+ * DOC: Migration
+ * Migration granularity typically follows the GPU SVM range requests, but
+ * if there are clashes, due to races or due to the fact that multiple GPU
+ * SVM instances have different views of the ranges used, parts of a requested
+ * range may already be present in the requested device memory. In that case
+ * the implementation has a variety of options: It can fail, it can choose
+ * to populate only the part of the range that isn't already in device memory,
+ * or it can evict the range to system memory before trying to migrate. Ideally
+ * an implementation would just try to migrate the missing part of the range
+ * and allocate just enough memory to do so.
+ *
+ * When migrating to system memory as a response to a cpu fault or a device
+ * memory eviction request, currently a full device memory allocation is
+ * migrated back to system. Moving forward this might need improvement for
+ * situations where a single page needs bouncing between system memory and
+ * device memory due to, for example, atomic operations.
+ *
+ * Key DRM pagemap components:
+ *
+ * - Device Memory Allocations:
+ * Embedded structure containing enough information for the drm_pagemap to
+ * migrate to / from device memory.
+ *
+ * - Device Memory Operations:
+ * Define the interface for driver-specific device memory operations, such as
+ * releasing memory, populating pfns, and copying to / from device memory.
+ */
+
+/**
+ * struct drm_pagemap_zdd - GPU SVM zone device data
+ *
+ * @refcount: Reference count for the zdd
+ * @devmem_allocation: device memory allocation
+ * @device_private_page_owner: Device private pages owner
+ *
+ * This structure serves as a generic wrapper installed in
+ * page->zone_device_data. It provides infrastructure for looking up a device
+ * memory allocation upon CPU page fault and for releasing device memory once
+ * the CPU has no page references. Note that CPU page references can be
+ * dropped in IRQ contexts, while releasing device memory likely requires
+ * sleeping locks, so drivers may need to defer the actual release.
+ */
+struct drm_pagemap_zdd {
+ struct kref refcount;
+ struct drm_pagemap_devmem *devmem_allocation;
+ void *device_private_page_owner;
+};
+
+/**
+ * drm_pagemap_zdd_alloc() - Allocate a zdd structure.
+ * @device_private_page_owner: Device private pages owner
+ *
+ * This function allocates and initializes a new zdd structure. It sets up the
+ * reference count and initializes its fields.
+ *
+ * Return: Pointer to the allocated zdd on success, NULL on failure.
+ */
+static struct drm_pagemap_zdd *
+drm_pagemap_zdd_alloc(void *device_private_page_owner)
+{
+ struct drm_pagemap_zdd *zdd;
+
+ zdd = kmalloc(sizeof(*zdd), GFP_KERNEL);
+ if (!zdd)
+ return NULL;
+
+ kref_init(&zdd->refcount);
+ zdd->devmem_allocation = NULL;
+ zdd->device_private_page_owner = device_private_page_owner;
+
+ return zdd;
+}
+
+/**
+ * drm_pagemap_zdd_get() - Get a reference to a zdd structure.
+ * @zdd: Pointer to the zdd structure.
+ *
+ * This function increments the reference count of the provided zdd structure.
+ *
+ * Return: Pointer to the zdd structure.
+ */
+static struct drm_pagemap_zdd *drm_pagemap_zdd_get(struct drm_pagemap_zdd *zdd)
+{
+ kref_get(&zdd->refcount);
+ return zdd;
+}
+
+/**
+ * drm_pagemap_zdd_destroy() - Destroy a zdd structure.
+ * @ref: Pointer to the reference count structure.
+ *
+ * This function completes the allocation's detached completion, invokes the
+ * driver's devmem_release operation, if any, and frees the zdd.
+ */
+static void drm_pagemap_zdd_destroy(struct kref *ref)
+{
+ struct drm_pagemap_zdd *zdd =
+ container_of(ref, struct drm_pagemap_zdd, refcount);
+ struct drm_pagemap_devmem *devmem = zdd->devmem_allocation;
+
+ if (devmem) {
+ complete_all(&devmem->detached);
+ if (devmem->ops->devmem_release)
+ devmem->ops->devmem_release(devmem);
+ }
+ kfree(zdd);
+}
+
+/**
+ * drm_pagemap_zdd_put() - Put a zdd reference.
+ * @zdd: Pointer to the zdd structure.
+ *
+ * This function decrements the reference count of the provided zdd structure
+ * and schedules its destruction if the count drops to zero.
+ */
+static void drm_pagemap_zdd_put(struct drm_pagemap_zdd *zdd)
+{
+ kref_put(&zdd->refcount, drm_pagemap_zdd_destroy);
+}
+
+/**
+ * drm_pagemap_migration_unlock_put_page() - Put a migration page
+ * @page: Pointer to the page to put
+ *
+ * This function unlocks and puts a page.
+ */
+static void drm_pagemap_migration_unlock_put_page(struct page *page)
+{
+ unlock_page(page);
+ put_page(page);
+}
+
+/**
+ * drm_pagemap_migration_unlock_put_pages() - Put migration pages
+ * @npages: Number of pages
+ * @migrate_pfn: Array of migrate page frame numbers
+ *
+ * This function unlocks and puts an array of pages.
+ */
+static void drm_pagemap_migration_unlock_put_pages(unsigned long npages,
+ unsigned long *migrate_pfn)
+{
+ unsigned long i;
+
+ for (i = 0; i < npages; ++i) {
+ struct page *page;
+
+ if (!migrate_pfn[i])
+ continue;
+
+ page = migrate_pfn_to_page(migrate_pfn[i]);
+ drm_pagemap_migration_unlock_put_page(page);
+ migrate_pfn[i] = 0;
+ }
+}
+
+/**
+ * drm_pagemap_get_devmem_page() - Get a reference to a device memory page
+ * @page: Pointer to the page
+ * @zdd: Pointer to the GPU SVM zone device data
+ *
+ * This function associates the given page with the specified GPU SVM zone
+ * device data and initializes it for zone device usage.
+ */
+static void drm_pagemap_get_devmem_page(struct page *page,
+ struct drm_pagemap_zdd *zdd)
+{
+ page->zone_device_data = drm_pagemap_zdd_get(zdd);
+ zone_device_page_init(page);
+}
+
+/**
+ * drm_pagemap_migrate_map_pages() - Map migration pages for GPU SVM migration
+ * @dev: The device for which the pages are being mapped
+ * @dma_addr: Array to store DMA addresses corresponding to mapped pages
+ * @migrate_pfn: Array of migrate page frame numbers to map
+ * @npages: Number of pages to map
+ * @dir: Direction of data transfer (e.g., DMA_BIDIRECTIONAL)
+ *
+ * This function maps pages of memory for migration usage in GPU SVM. It
+ * iterates over each page frame number provided in @migrate_pfn, maps the
+ * corresponding page, and stores the DMA address in the provided @dma_addr
+ * array.
+ *
+ * Return: 0 on success, -EFAULT if an error occurs during mapping.
+ */
+static int drm_pagemap_migrate_map_pages(struct device *dev,
+ dma_addr_t *dma_addr,
+ unsigned long *migrate_pfn,
+ unsigned long npages,
+ enum dma_data_direction dir)
+{
+ unsigned long i;
+
+ for (i = 0; i < npages; ++i) {
+ struct page *page = migrate_pfn_to_page(migrate_pfn[i]);
+
+ if (!page)
+ continue;
+
+ if (WARN_ON_ONCE(is_zone_device_page(page)))
+ return -EFAULT;
+
+ dma_addr[i] = dma_map_page(dev, page, 0, PAGE_SIZE, dir);
+ if (dma_mapping_error(dev, dma_addr[i]))
+ return -EFAULT;
+ }
+
+ return 0;
+}
+
+/**
+ * drm_pagemap_migrate_unmap_pages() - Unmap pages previously mapped for GPU SVM migration
+ * @dev: The device for which the pages were mapped
+ * @dma_addr: Array of DMA addresses corresponding to mapped pages
+ * @npages: Number of pages to unmap
+ * @dir: Direction of data transfer (e.g., DMA_BIDIRECTIONAL)
+ *
+ * This function unmaps previously mapped pages of memory for GPU Shared Virtual
+ * Memory (SVM). It iterates over each DMA address provided in @dma_addr, checks
+ * if it's valid and not already unmapped, and unmaps the corresponding page.
+ */
+static void drm_pagemap_migrate_unmap_pages(struct device *dev,
+ dma_addr_t *dma_addr,
+ unsigned long npages,
+ enum dma_data_direction dir)
+{
+ unsigned long i;
+
+ for (i = 0; i < npages; ++i) {
+ if (!dma_addr[i] || dma_mapping_error(dev, dma_addr[i]))
+ continue;
+
+ dma_unmap_page(dev, dma_addr[i], PAGE_SIZE, dir);
+ }
+}
+
+static unsigned long
+npages_in_range(unsigned long start, unsigned long end)
+{
+ return (end - start) >> PAGE_SHIFT;
+}
+
+/**
+ * drm_pagemap_migrate_to_devmem() - Migrate a struct mm_struct range to device memory
+ * @devmem_allocation: The device memory allocation to migrate to.
+ * The caller should hold a reference to the device memory allocation,
+ * and the reference is consumed by this function unless it returns with
+ * an error.
+ * @mm: Pointer to the struct mm_struct.
+ * @start: Start of the virtual address range to migrate.
+ * @end: End of the virtual address range to migrate.
+ * @timeslice_ms: The time requested for the migrated pages to be present
+ * in device memory before migration back to system memory is allowed.
+ * @pgmap_owner: Not used currently, since only system memory is considered.
+ *
+ * This function migrates the specified virtual address range to device memory.
+ * It performs the necessary setup and invokes the driver-specific operations for
+ * migration to device memory. Expected to be called while holding the mmap lock in
+ * at least read mode.
+ *
+ * Return: %0 on success, negative error code on failure.
+ */
+int drm_pagemap_migrate_to_devmem(struct drm_pagemap_devmem *devmem_allocation,
+ struct mm_struct *mm,
+ unsigned long start, unsigned long end,
+ unsigned long timeslice_ms,
+ void *pgmap_owner)
+{
+ const struct drm_pagemap_devmem_ops *ops = devmem_allocation->ops;
+ struct migrate_vma migrate = {
+ .start = start,
+ .end = end,
+ .pgmap_owner = pgmap_owner,
+ .flags = MIGRATE_VMA_SELECT_SYSTEM,
+ };
+ unsigned long i, npages = npages_in_range(start, end);
+ struct vm_area_struct *vas;
+ struct drm_pagemap_zdd *zdd = NULL;
+ struct page **pages;
+ dma_addr_t *dma_addr;
+ void *buf;
+ int err;
+
+ mmap_assert_locked(mm);
+
+ if (!ops->populate_devmem_pfn || !ops->copy_to_devmem ||
+ !ops->copy_to_ram)
+ return -EOPNOTSUPP;
+
+ vas = vma_lookup(mm, start);
+ if (!vas) {
+ err = -ENOENT;
+ goto err_out;
+ }
+
+ if (end > vas->vm_end || start < vas->vm_start) {
+ err = -EINVAL;
+ goto err_out;
+ }
+
+ if (!vma_is_anonymous(vas)) {
+ err = -EBUSY;
+ goto err_out;
+ }
+
+ buf = kvcalloc(npages, 2 * sizeof(*migrate.src) + sizeof(*dma_addr) +
+ sizeof(*pages), GFP_KERNEL);
+ if (!buf) {
+ err = -ENOMEM;
+ goto err_out;
+ }
+ dma_addr = buf + (2 * sizeof(*migrate.src) * npages);
+ pages = buf + (2 * sizeof(*migrate.src) + sizeof(*dma_addr)) * npages;
+
+ zdd = drm_pagemap_zdd_alloc(pgmap_owner);
+ if (!zdd) {
+ err = -ENOMEM;
+ goto err_free;
+ }
+
+ migrate.vma = vas;
+ migrate.src = buf;
+ migrate.dst = migrate.src + npages;
+
+ err = migrate_vma_setup(&migrate);
+ if (err)
+ goto err_free;
+
+ if (!migrate.cpages) {
+ err = -EFAULT;
+ goto err_free;
+ }
+
+ if (migrate.cpages != npages) {
+ err = -EBUSY;
+ goto err_finalize;
+ }
+
+ err = ops->populate_devmem_pfn(devmem_allocation, npages, migrate.dst);
+ if (err)
+ goto err_finalize;
+
+ err = drm_pagemap_migrate_map_pages(devmem_allocation->dev, dma_addr,
+ migrate.src, npages, DMA_TO_DEVICE);
+ if (err)
+ goto err_finalize;
+
+ for (i = 0; i < npages; ++i) {
+ struct page *page = pfn_to_page(migrate.dst[i]);
+
+ pages[i] = page;
+ migrate.dst[i] = migrate_pfn(migrate.dst[i]);
+ drm_pagemap_get_devmem_page(page, zdd);
+ }
+
+ err = ops->copy_to_devmem(pages, dma_addr, npages);
+ if (err)
+ goto err_finalize;
+
+ /* Upon success bind devmem allocation to range and zdd */
+ devmem_allocation->timeslice_expiration = get_jiffies_64() +
+ msecs_to_jiffies(timeslice_ms);
+ zdd->devmem_allocation = devmem_allocation; /* Owns ref */
+
+err_finalize:
+ if (err)
+ drm_pagemap_migration_unlock_put_pages(npages, migrate.dst);
+ migrate_vma_pages(&migrate);
+ migrate_vma_finalize(&migrate);
+ drm_pagemap_migrate_unmap_pages(devmem_allocation->dev, dma_addr, npages,
+ DMA_TO_DEVICE);
+err_free:
+ if (zdd)
+ drm_pagemap_zdd_put(zdd);
+ kvfree(buf);
+err_out:
+ return err;
+}
+EXPORT_SYMBOL_GPL(drm_pagemap_migrate_to_devmem);
+
+/**
+ * drm_pagemap_migrate_populate_ram_pfn() - Populate RAM PFNs for a VM area
+ * @vas: Pointer to the VM area structure, can be NULL
+ * @fault_page: Fault page
+ * @npages: Number of pages to populate
+ * @mpages: Number of pages to migrate
+ * @src_mpfn: Source array of migrate PFNs
+ * @mpfn: Array of migrate PFNs to populate
+ * @addr: Start address for PFN allocation
+ *
+ * This function populates the RAM migrate page frame numbers (PFNs) for the
+ * specified VM area structure. It allocates and locks pages in the VM area for
+ * RAM usage. If @vas is non-NULL, alloc_page_vma() is used for allocation;
+ * otherwise alloc_page() is used.
+ *
+ * Return: 0 on success, negative error code on failure.
+ */
+static int drm_pagemap_migrate_populate_ram_pfn(struct vm_area_struct *vas,
+ struct page *fault_page,
+ unsigned long npages,
+ unsigned long *mpages,
+ unsigned long *src_mpfn,
+ unsigned long *mpfn,
+ unsigned long addr)
+{
+ unsigned long i;
+
+ for (i = 0; i < npages; ++i, addr += PAGE_SIZE) {
+ struct page *page, *src_page;
+
+ if (!(src_mpfn[i] & MIGRATE_PFN_MIGRATE))
+ continue;
+
+ src_page = migrate_pfn_to_page(src_mpfn[i]);
+ if (!src_page)
+ continue;
+
+ if (fault_page) {
+ if (src_page->zone_device_data !=
+ fault_page->zone_device_data)
+ continue;
+ }
+
+ if (vas)
+ page = alloc_page_vma(GFP_HIGHUSER, vas, addr);
+ else
+ page = alloc_page(GFP_HIGHUSER);
+
+ if (!page)
+ goto free_pages;
+
+ mpfn[i] = migrate_pfn(page_to_pfn(page));
+ }
+
+ for (i = 0; i < npages; ++i) {
+ struct page *page = migrate_pfn_to_page(mpfn[i]);
+
+ if (!page)
+ continue;
+
+ WARN_ON_ONCE(!trylock_page(page));
+ ++*mpages;
+ }
+
+ return 0;
+
+free_pages:
+ for (i = 0; i < npages; ++i) {
+ struct page *page = migrate_pfn_to_page(mpfn[i]);
+
+ if (!page)
+ continue;
+
+ put_page(page);
+ mpfn[i] = 0;
+ }
+ return -ENOMEM;
+}
+
+/**
+ * drm_pagemap_evict_to_ram() - Evict a device memory allocation to RAM
+ * @devmem_allocation: Pointer to the device memory allocation
+ *
+ * Similar to __drm_pagemap_migrate_to_ram() but does not require the mmap
+ * lock; migration is done via the migrate_device_* functions.
+ *
+ * Return: 0 on success, negative error code on failure.
+ */
+int drm_pagemap_evict_to_ram(struct drm_pagemap_devmem *devmem_allocation)
+{
+ const struct drm_pagemap_devmem_ops *ops = devmem_allocation->ops;
+ unsigned long npages, mpages = 0;
+ struct page **pages;
+ unsigned long *src, *dst;
+ dma_addr_t *dma_addr;
+ void *buf;
+ int i, err = 0;
+ unsigned int retry_count = 2;
+
+ npages = devmem_allocation->size >> PAGE_SHIFT;
+
+retry:
+ if (!mmget_not_zero(devmem_allocation->mm))
+ return -EFAULT;
+
+ buf = kvcalloc(npages, 2 * sizeof(*src) + sizeof(*dma_addr) +
+ sizeof(*pages), GFP_KERNEL);
+ if (!buf) {
+ err = -ENOMEM;
+ goto err_out;
+ }
+ src = buf;
+ dst = buf + (sizeof(*src) * npages);
+ dma_addr = buf + (2 * sizeof(*src) * npages);
+ pages = buf + (2 * sizeof(*src) + sizeof(*dma_addr)) * npages;
+
+ err = ops->populate_devmem_pfn(devmem_allocation, npages, src);
+ if (err)
+ goto err_free;
+
+ err = migrate_device_pfns(src, npages);
+ if (err)
+ goto err_free;
+
+ err = drm_pagemap_migrate_populate_ram_pfn(NULL, NULL, npages, &mpages,
+ src, dst, 0);
+ if (err || !mpages)
+ goto err_finalize;
+
+ err = drm_pagemap_migrate_map_pages(devmem_allocation->dev, dma_addr,
+ dst, npages, DMA_FROM_DEVICE);
+ if (err)
+ goto err_finalize;
+
+ for (i = 0; i < npages; ++i)
+ pages[i] = migrate_pfn_to_page(src[i]);
+
+ err = ops->copy_to_ram(pages, dma_addr, npages);
+ if (err)
+ goto err_finalize;
+
+err_finalize:
+ if (err)
+ drm_pagemap_migration_unlock_put_pages(npages, dst);
+ migrate_device_pages(src, dst, npages);
+ migrate_device_finalize(src, dst, npages);
+ drm_pagemap_migrate_unmap_pages(devmem_allocation->dev, dma_addr, npages,
+ DMA_FROM_DEVICE);
+err_free:
+ kvfree(buf);
+err_out:
+ mmput_async(devmem_allocation->mm);
+
+ if (completion_done(&devmem_allocation->detached))
+ return 0;
+
+ if (retry_count--) {
+ cond_resched();
+ goto retry;
+ }
+
+ return err ?: -EBUSY;
+}
+EXPORT_SYMBOL_GPL(drm_pagemap_evict_to_ram);
+
+/**
+ * __drm_pagemap_migrate_to_ram() - Migrate a virtual range to RAM (internal)
+ * @vas: Pointer to the VM area structure
+ * @device_private_page_owner: Device private pages owner
+ * @page: Pointer to the page for fault handling (can be NULL)
+ * @fault_addr: Fault address
+ * @size: Size of migration
+ *
+ * This internal function performs the migration of the specified virtual
+ * address range to RAM. It sets up the migration, populates and dma-maps RAM
+ * PFNs, and invokes the driver-specific operations for migration to RAM.
+ *
+ * Return: 0 on success, negative error code on failure.
+ */
+static int __drm_pagemap_migrate_to_ram(struct vm_area_struct *vas,
+ void *device_private_page_owner,
+ struct page *page,
+ unsigned long fault_addr,
+ unsigned long size)
+{
+ struct migrate_vma migrate = {
+ .vma = vas,
+ .pgmap_owner = device_private_page_owner,
+ .flags = MIGRATE_VMA_SELECT_DEVICE_PRIVATE |
+ MIGRATE_VMA_SELECT_DEVICE_COHERENT,
+ .fault_page = page,
+ };
+ struct drm_pagemap_zdd *zdd;
+ const struct drm_pagemap_devmem_ops *ops;
+ struct device *dev = NULL;
+ unsigned long npages, mpages = 0;
+ struct page **pages;
+ dma_addr_t *dma_addr;
+ unsigned long start, end;
+ void *buf;
+ int i, err = 0;
+
+ if (page) {
+ zdd = page->zone_device_data;
+ if (time_before64(get_jiffies_64(),
+ zdd->devmem_allocation->timeslice_expiration))
+ return 0;
+ }
+
+ start = ALIGN_DOWN(fault_addr, size);
+ end = ALIGN(fault_addr + 1, size);
+
+ /* Corner case where the VM area has been partially unmapped */
+ if (start < vas->vm_start)
+ start = vas->vm_start;
+ if (end > vas->vm_end)
+ end = vas->vm_end;
+
+ migrate.start = start;
+ migrate.end = end;
+ npages = npages_in_range(start, end);
+
+ buf = kvcalloc(npages, 2 * sizeof(*migrate.src) + sizeof(*dma_addr) +
+ sizeof(*pages), GFP_KERNEL);
+ if (!buf) {
+ err = -ENOMEM;
+ goto err_out;
+ }
+ dma_addr = buf + (2 * sizeof(*migrate.src) * npages);
+ pages = buf + (2 * sizeof(*migrate.src) + sizeof(*dma_addr)) * npages;
+
+ migrate.vma = vas;
+ migrate.src = buf;
+ migrate.dst = migrate.src + npages;
+
+ err = migrate_vma_setup(&migrate);
+ if (err)
+ goto err_free;
+
+ /* Raced with another CPU fault, nothing to do */
+ if (!migrate.cpages)
+ goto err_free;
+
+ if (!page) {
+ for (i = 0; i < npages; ++i) {
+ if (!(migrate.src[i] & MIGRATE_PFN_MIGRATE))
+ continue;
+
+ page = migrate_pfn_to_page(migrate.src[i]);
+ break;
+ }
+
+ if (!page)
+ goto err_finalize;
+ }
+ zdd = page->zone_device_data;
+ ops = zdd->devmem_allocation->ops;
+ dev = zdd->devmem_allocation->dev;
+
+ err = drm_pagemap_migrate_populate_ram_pfn(vas, page, npages, &mpages,
+ migrate.src, migrate.dst,
+ start);
+ if (err)
+ goto err_finalize;
+
+ err = drm_pagemap_migrate_map_pages(dev, dma_addr, migrate.dst, npages,
+ DMA_FROM_DEVICE);
+ if (err)
+ goto err_finalize;
+
+ for (i = 0; i < npages; ++i)
+ pages[i] = migrate_pfn_to_page(migrate.src[i]);
+
+ err = ops->copy_to_ram(pages, dma_addr, npages);
+ if (err)
+ goto err_finalize;
+
+err_finalize:
+ if (err)
+ drm_pagemap_migration_unlock_put_pages(npages, migrate.dst);
+ migrate_vma_pages(&migrate);
+ migrate_vma_finalize(&migrate);
+ if (dev)
+ drm_pagemap_migrate_unmap_pages(dev, dma_addr, npages,
+ DMA_FROM_DEVICE);
+err_free:
+ kvfree(buf);
+err_out:
+ return err;
+}
+
+/**
+ * drm_pagemap_page_free() - Put GPU SVM zone device data associated with a page
+ * @page: Pointer to the page
+ *
+ * This function is a callback used to put the GPU SVM zone device data
+ * associated with a page when it is being released.
+ */
+static void drm_pagemap_page_free(struct page *page)
+{
+ drm_pagemap_zdd_put(page->zone_device_data);
+}
+
+/**
+ * drm_pagemap_migrate_to_ram() - Migrate a virtual range to RAM (page fault handler)
+ * @vmf: Pointer to the fault information structure
+ *
+ * This function is a page fault handler used to migrate a virtual range
+ * to ram. The device memory allocation in which the device page is found is
+ * migrated in its entirety.
+ *
+ * Return: VM_FAULT_SIGBUS on failure, 0 on success.
+ */
+static vm_fault_t drm_pagemap_migrate_to_ram(struct vm_fault *vmf)
+{
+ struct drm_pagemap_zdd *zdd = vmf->page->zone_device_data;
+ int err;
+
+ err = __drm_pagemap_migrate_to_ram(vmf->vma,
+ zdd->device_private_page_owner,
+ vmf->page, vmf->address,
+ zdd->devmem_allocation->size);
+
+ return err ? VM_FAULT_SIGBUS : 0;
+}
+
+static const struct dev_pagemap_ops drm_pagemap_pagemap_ops = {
+ .page_free = drm_pagemap_page_free,
+ .migrate_to_ram = drm_pagemap_migrate_to_ram,
+};
+
+/**
+ * drm_pagemap_pagemap_ops_get() - Retrieve the drm_pagemap dev_pagemap operations
+ *
+ * Return: Pointer to the drm_pagemap dev_pagemap operations structure.
+ */
+const struct dev_pagemap_ops *drm_pagemap_pagemap_ops_get(void)
+{
+ return &drm_pagemap_pagemap_ops;
+}
+EXPORT_SYMBOL_GPL(drm_pagemap_pagemap_ops_get);
+
+/**
+ * drm_pagemap_devmem_init() - Initialize a drm_pagemap device memory allocation
+ *
+ * @devmem_allocation: The struct drm_pagemap_devmem to initialize.
+ * @dev: Pointer to the device structure which device memory allocation belongs to
+ * @mm: Pointer to the mm_struct for the address space
+ * @ops: Pointer to the operations structure for GPU SVM device memory
+ * @dpagemap: The struct drm_pagemap we're allocating from.
+ * @size: Size of device memory allocation
+ */
+void drm_pagemap_devmem_init(struct drm_pagemap_devmem *devmem_allocation,
+ struct device *dev, struct mm_struct *mm,
+ const struct drm_pagemap_devmem_ops *ops,
+ struct drm_pagemap *dpagemap, size_t size)
+{
+ init_completion(&devmem_allocation->detached);
+ devmem_allocation->dev = dev;
+ devmem_allocation->mm = mm;
+ devmem_allocation->ops = ops;
+ devmem_allocation->dpagemap = dpagemap;
+ devmem_allocation->size = size;
+}
+EXPORT_SYMBOL_GPL(drm_pagemap_devmem_init);
+
+/**
+ * drm_pagemap_page_to_dpagemap() - Return a pointer to the drm_pagemap of a page
+ * @page: The struct page.
+ *
+ * Return: A pointer to the struct drm_pagemap of a device private page that
+ * was populated from the struct drm_pagemap. If the page was *not* populated
+ * from a struct drm_pagemap, the result is undefined and the function call
+ * may result in dereferencing an invalid address.
+ */
+struct drm_pagemap *drm_pagemap_page_to_dpagemap(struct page *page)
+{
+ struct drm_pagemap_zdd *zdd = page->zone_device_data;
+
+ return zdd->devmem_allocation->dpagemap;
+}
+EXPORT_SYMBOL_GPL(drm_pagemap_page_to_dpagemap);
diff --git a/drivers/gpu/drm/xe/Kconfig b/drivers/gpu/drm/xe/Kconfig
index 98b46c534278..c7c71734460b 100644
--- a/drivers/gpu/drm/xe/Kconfig
+++ b/drivers/gpu/drm/xe/Kconfig
@@ -87,14 +87,16 @@ config DRM_XE_GPUSVM
If in doubut say "Y".
-config DRM_XE_DEVMEM_MIRROR
- bool "Enable device memory mirror"
+config DRM_XE_PAGEMAP
+ bool "Enable device memory pool for SVM"
depends on DRM_XE_GPUSVM
select GET_FREE_REGION
default y
help
- Disable this option only if you want to compile out without device
- memory mirror. Will reduce KMD memory footprint when disabled.
+ Disable this option only if you don't want to expose local device
+ memory for SVM. Will reduce KMD memory footprint when disabled.
+
+ If in doubt say "Y".
config DRM_XE_FORCE_PROBE
string "Force probe xe for selected Intel hardware IDs"
diff --git a/drivers/gpu/drm/xe/xe_bo_types.h b/drivers/gpu/drm/xe/xe_bo_types.h
index eb5e83c5f233..e0efaf23d051 100644
--- a/drivers/gpu/drm/xe/xe_bo_types.h
+++ b/drivers/gpu/drm/xe/xe_bo_types.h
@@ -86,7 +86,7 @@ struct xe_bo {
u16 cpu_caching;
/** @devmem_allocation: SVM device memory allocation */
- struct drm_gpusvm_devmem devmem_allocation;
+ struct drm_pagemap_devmem devmem_allocation;
/** @vram_userfault_link: Link into @mem_access.vram_userfault.list */
struct list_head vram_userfault_link;
diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
index b93c04466637..67b7f733dd69 100644
--- a/drivers/gpu/drm/xe/xe_device_types.h
+++ b/drivers/gpu/drm/xe/xe_device_types.h
@@ -104,7 +104,7 @@ struct xe_vram_region {
void __iomem *mapping;
/** @ttm: VRAM TTM manager */
struct xe_ttm_vram_mgr ttm;
-#if IS_ENABLED(CONFIG_DRM_XE_DEVMEM_MIRROR)
+#if IS_ENABLED(CONFIG_DRM_XE_PAGEMAP)
/** @pagemap: Used to remap device memory as ZONE_DEVICE */
struct dev_pagemap pagemap;
/**
diff --git a/drivers/gpu/drm/xe/xe_svm.c b/drivers/gpu/drm/xe/xe_svm.c
index f27fb9b588de..e161ce3e67a1 100644
--- a/drivers/gpu/drm/xe/xe_svm.c
+++ b/drivers/gpu/drm/xe/xe_svm.c
@@ -329,7 +329,7 @@ static void xe_svm_garbage_collector_work_func(struct work_struct *w)
up_write(&vm->lock);
}
-#if IS_ENABLED(CONFIG_DRM_XE_DEVMEM_MIRROR)
+#if IS_ENABLED(CONFIG_DRM_XE_PAGEMAP)
static struct xe_vram_region *page_to_vr(struct page *page)
{
@@ -517,12 +517,12 @@ static int xe_svm_copy_to_ram(struct page **pages, dma_addr_t *dma_addr,
return xe_svm_copy(pages, dma_addr, npages, XE_SVM_COPY_TO_SRAM);
}
-static struct xe_bo *to_xe_bo(struct drm_gpusvm_devmem *devmem_allocation)
+static struct xe_bo *to_xe_bo(struct drm_pagemap_devmem *devmem_allocation)
{
return container_of(devmem_allocation, struct xe_bo, devmem_allocation);
}
-static void xe_svm_devmem_release(struct drm_gpusvm_devmem *devmem_allocation)
+static void xe_svm_devmem_release(struct drm_pagemap_devmem *devmem_allocation)
{
struct xe_bo *bo = to_xe_bo(devmem_allocation);
@@ -539,7 +539,7 @@ static struct drm_buddy *tile_to_buddy(struct xe_tile *tile)
return &tile->mem.vram.ttm.mm;
}
-static int xe_svm_populate_devmem_pfn(struct drm_gpusvm_devmem *devmem_allocation,
+static int xe_svm_populate_devmem_pfn(struct drm_pagemap_devmem *devmem_allocation,
unsigned long npages, unsigned long *pfn)
{
struct xe_bo *bo = to_xe_bo(devmem_allocation);
@@ -562,7 +562,7 @@ static int xe_svm_populate_devmem_pfn(struct drm_gpusvm_devmem *devmem_allocatio
return 0;
}
-static const struct drm_gpusvm_devmem_ops gpusvm_devmem_ops = {
+static const struct drm_pagemap_devmem_ops dpagemap_devmem_ops = {
.devmem_release = xe_svm_devmem_release,
.populate_devmem_pfn = xe_svm_populate_devmem_pfn,
.copy_to_devmem = xe_svm_copy_to_devmem,
@@ -714,7 +714,7 @@ u64 xe_svm_find_vma_start(struct xe_vm *vm, u64 start, u64 end, struct xe_vma *v
min(end, xe_vma_end(vma)));
}
-#if IS_ENABLED(CONFIG_DRM_XE_DEVMEM_MIRROR)
+#if IS_ENABLED(CONFIG_DRM_XE_PAGEMAP)
static struct xe_vram_region *tile_to_vr(struct xe_tile *tile)
{
return &tile->mem.vram;
@@ -742,6 +742,9 @@ int xe_svm_alloc_vram(struct xe_vm *vm, struct xe_tile *tile,
ktime_t end = 0;
int err;
+ if (!range->base.flags.migrate_devmem)
+ return -EINVAL;
+
range_debug(range, "ALLOCATE VRAM");
if (!mmget_not_zero(mm))
@@ -761,19 +764,23 @@ int xe_svm_alloc_vram(struct xe_vm *vm, struct xe_tile *tile,
goto unlock;
}
- drm_gpusvm_devmem_init(&bo->devmem_allocation,
- vm->xe->drm.dev, mm,
- &gpusvm_devmem_ops,
- &tile->mem.vram.dpagemap,
- xe_svm_range_size(range));
+ drm_pagemap_devmem_init(&bo->devmem_allocation,
+ vm->xe->drm.dev, mm,
+ &dpagemap_devmem_ops,
+ &tile->mem.vram.dpagemap,
+ xe_svm_range_size(range));
blocks = &to_xe_ttm_vram_mgr_resource(bo->ttm.resource)->blocks;
list_for_each_entry(block, blocks, link)
block->private = vr;
xe_bo_get(bo);
- err = drm_gpusvm_migrate_to_devmem(&vm->svm.gpusvm, &range->base,
- &bo->devmem_allocation, ctx);
+ err = drm_pagemap_migrate_to_devmem(&bo->devmem_allocation,
+ mm,
+ xe_svm_range_start(range),
+ xe_svm_range_end(range),
+ ctx->timeslice_ms,
+ xe_svm_devm_owner(vm->xe));
if (err)
xe_svm_devmem_release(&bo->devmem_allocation);
@@ -848,13 +855,13 @@ int xe_svm_handle_pagefault(struct xe_vm *vm, struct xe_vma *vma,
struct drm_gpusvm_ctx ctx = {
.read_only = xe_vma_read_only(vma),
.devmem_possible = IS_DGFX(vm->xe) &&
- IS_ENABLED(CONFIG_DRM_XE_DEVMEM_MIRROR),
- .check_pages_threshold = IS_DGFX(vm->xe) &&
- IS_ENABLED(CONFIG_DRM_XE_DEVMEM_MIRROR) ? SZ_64K : 0,
+ IS_ENABLED(CONFIG_DRM_XE_PAGEMAP),
+ .check_pages_threshold = IS_DGFX(vm->xe) &&
+ IS_ENABLED(CONFIG_DRM_XE_PAGEMAP) ? SZ_64K : 0,
.devmem_only = atomic && IS_DGFX(vm->xe) &&
- IS_ENABLED(CONFIG_DRM_XE_DEVMEM_MIRROR),
+ IS_ENABLED(CONFIG_DRM_XE_PAGEMAP),
.timeslice_ms = atomic && IS_DGFX(vm->xe) &&
- IS_ENABLED(CONFIG_DRM_XE_DEVMEM_MIRROR) ?
+ IS_ENABLED(CONFIG_DRM_XE_PAGEMAP) ?
vm->xe->atomic_svm_timeslice_ms : 0,
};
struct xe_svm_range *range;
@@ -992,7 +999,7 @@ bool xe_svm_has_mapping(struct xe_vm *vm, u64 start, u64 end)
*/
int xe_svm_bo_evict(struct xe_bo *bo)
{
- return drm_gpusvm_evict_to_ram(&bo->devmem_allocation);
+ return drm_pagemap_evict_to_ram(&bo->devmem_allocation);
}
/**
@@ -1045,7 +1052,7 @@ int xe_svm_range_get_pages(struct xe_vm *vm, struct xe_svm_range *range,
return err;
}
-#if IS_ENABLED(CONFIG_DRM_XE_DEVMEM_MIRROR)
+#if IS_ENABLED(CONFIG_DRM_XE_PAGEMAP)
static struct drm_pagemap_device_addr
xe_drm_pagemap_device_map(struct drm_pagemap *dpagemap,
@@ -1102,7 +1109,7 @@ int xe_devm_add(struct xe_tile *tile, struct xe_vram_region *vr)
vr->pagemap.range.start = res->start;
vr->pagemap.range.end = res->end;
vr->pagemap.nr_range = 1;
- vr->pagemap.ops = drm_gpusvm_pagemap_ops_get();
+ vr->pagemap.ops = drm_pagemap_pagemap_ops_get();
vr->pagemap.owner = xe_svm_devm_owner(xe);
addr = devm_memremap_pages(dev, &vr->pagemap);
diff --git a/include/drm/drm_gpusvm.h b/include/drm/drm_gpusvm.h
index 6a5156476bf4..4aedc5423aff 100644
--- a/include/drm/drm_gpusvm.h
+++ b/include/drm/drm_gpusvm.h
@@ -16,91 +16,9 @@ struct drm_gpusvm;
struct drm_gpusvm_notifier;
struct drm_gpusvm_ops;
struct drm_gpusvm_range;
-struct drm_gpusvm_devmem;
struct drm_pagemap;
struct drm_pagemap_device_addr;
-/**
- * struct drm_gpusvm_devmem_ops - Operations structure for GPU SVM device memory
- *
- * This structure defines the operations for GPU Shared Virtual Memory (SVM)
- * device memory. These operations are provided by the GPU driver to manage device memory
- * allocations and perform operations such as migration between device memory and system
- * RAM.
- */
-struct drm_gpusvm_devmem_ops {
- /**
- * @devmem_release: Release device memory allocation (optional)
- * @devmem_allocation: device memory allocation
- *
- * Release device memory allocation and drop a reference to device
- * memory allocation.
- */
- void (*devmem_release)(struct drm_gpusvm_devmem *devmem_allocation);
-
- /**
- * @populate_devmem_pfn: Populate device memory PFN (required for migration)
- * @devmem_allocation: device memory allocation
- * @npages: Number of pages to populate
- * @pfn: Array of page frame numbers to populate
- *
- * Populate device memory page frame numbers (PFN).
- *
- * Return: 0 on success, a negative error code on failure.
- */
- int (*populate_devmem_pfn)(struct drm_gpusvm_devmem *devmem_allocation,
- unsigned long npages, unsigned long *pfn);
-
- /**
- * @copy_to_devmem: Copy to device memory (required for migration)
- * @pages: Pointer to array of device memory pages (destination)
- * @dma_addr: Pointer to array of DMA addresses (source)
- * @npages: Number of pages to copy
- *
- * Copy pages to device memory.
- *
- * Return: 0 on success, a negative error code on failure.
- */
- int (*copy_to_devmem)(struct page **pages,
- dma_addr_t *dma_addr,
- unsigned long npages);
-
- /**
- * @copy_to_ram: Copy to system RAM (required for migration)
- * @pages: Pointer to array of device memory pages (source)
- * @dma_addr: Pointer to array of DMA addresses (destination)
- * @npages: Number of pages to copy
- *
- * Copy pages to system RAM.
- *
- * Return: 0 on success, a negative error code on failure.
- */
- int (*copy_to_ram)(struct page **pages,
- dma_addr_t *dma_addr,
- unsigned long npages);
-};
-
-/**
- * struct drm_gpusvm_devmem - Structure representing a GPU SVM device memory allocation
- *
- * @dev: Pointer to the device structure which device memory allocation belongs to
- * @mm: Pointer to the mm_struct for the address space
- * @detached: device memory allocations is detached from device pages
- * @ops: Pointer to the operations structure for GPU SVM device memory
- * @dpagemap: The struct drm_pagemap of the pages this allocation belongs to.
- * @size: Size of device memory allocation
- * @timeslice_expiration: Timeslice expiration in jiffies
- */
-struct drm_gpusvm_devmem {
- struct device *dev;
- struct mm_struct *mm;
- struct completion detached;
- const struct drm_gpusvm_devmem_ops *ops;
- struct drm_pagemap *dpagemap;
- size_t size;
- u64 timeslice_expiration;
-};
-
/**
* struct drm_gpusvm_ops - Operations structure for GPU SVM
*
@@ -361,15 +279,6 @@ void drm_gpusvm_range_unmap_pages(struct drm_gpusvm *gpusvm,
struct drm_gpusvm_range *range,
const struct drm_gpusvm_ctx *ctx);
-int drm_gpusvm_migrate_to_devmem(struct drm_gpusvm *gpusvm,
- struct drm_gpusvm_range *range,
- struct drm_gpusvm_devmem *devmem_allocation,
- const struct drm_gpusvm_ctx *ctx);
-
-int drm_gpusvm_evict_to_ram(struct drm_gpusvm_devmem *devmem_allocation);
-
-const struct dev_pagemap_ops *drm_gpusvm_pagemap_ops_get(void);
-
bool drm_gpusvm_has_mapping(struct drm_gpusvm *gpusvm, unsigned long start,
unsigned long end);
@@ -380,11 +289,6 @@ drm_gpusvm_range_find(struct drm_gpusvm_notifier *notifier, unsigned long start,
void drm_gpusvm_range_set_unmapped(struct drm_gpusvm_range *range,
const struct mmu_notifier_range *mmu_range);
-void drm_gpusvm_devmem_init(struct drm_gpusvm_devmem *devmem_allocation,
- struct device *dev, struct mm_struct *mm,
- const struct drm_gpusvm_devmem_ops *ops,
- struct drm_pagemap *dpagemap, size_t size);
-
#ifdef CONFIG_LOCKDEP
/**
* drm_gpusvm_driver_set_lock() - Set the lock protecting accesses to GPU SVM
diff --git a/include/drm/drm_pagemap.h b/include/drm/drm_pagemap.h
index 202c157ff4d7..dabc9c365df4 100644
--- a/include/drm/drm_pagemap.h
+++ b/include/drm/drm_pagemap.h
@@ -7,6 +7,7 @@
#include <linux/types.h>
struct drm_pagemap;
+struct drm_pagemap_zdd;
struct device;
/**
@@ -104,4 +105,104 @@ struct drm_pagemap {
struct device *dev;
};
+struct drm_pagemap_devmem;
+
+/**
+ * struct drm_pagemap_devmem_ops - Operations structure for GPU SVM device memory
+ *
+ * This structure defines the operations for GPU Shared Virtual Memory (SVM)
+ * device memory. These operations are provided by the GPU driver to manage device memory
+ * allocations and perform operations such as migration between device memory and system
+ * RAM.
+ */
+struct drm_pagemap_devmem_ops {
+ /**
+ * @devmem_release: Release device memory allocation (optional)
+ * @devmem_allocation: device memory allocation
+ *
+ * Release device memory allocation and drop a reference to device
+ * memory allocation.
+ */
+ void (*devmem_release)(struct drm_pagemap_devmem *devmem_allocation);
+
+ /**
+ * @populate_devmem_pfn: Populate device memory PFN (required for migration)
+ * @devmem_allocation: device memory allocation
+ * @npages: Number of pages to populate
+ * @pfn: Array of page frame numbers to populate
+ *
+ * Populate device memory page frame numbers (PFN).
+ *
+ * Return: 0 on success, a negative error code on failure.
+ */
+ int (*populate_devmem_pfn)(struct drm_pagemap_devmem *devmem_allocation,
+ unsigned long npages, unsigned long *pfn);
+
+ /**
+ * @copy_to_devmem: Copy to device memory (required for migration)
+ * @pages: Pointer to array of device memory pages (destination)
+ * @dma_addr: Pointer to array of DMA addresses (source)
+ * @npages: Number of pages to copy
+ *
+ * Copy pages to device memory.
+ *
+ * Return: 0 on success, a negative error code on failure.
+ */
+ int (*copy_to_devmem)(struct page **pages,
+ dma_addr_t *dma_addr,
+ unsigned long npages);
+
+ /**
+ * @copy_to_ram: Copy to system RAM (required for migration)
+ * @pages: Pointer to array of device memory pages (source)
+ * @dma_addr: Pointer to array of DMA addresses (destination)
+ * @npages: Number of pages to copy
+ *
+ * Copy pages to system RAM.
+ *
+ * Return: 0 on success, a negative error code on failure.
+ */
+ int (*copy_to_ram)(struct page **pages,
+ dma_addr_t *dma_addr,
+ unsigned long npages);
+};
+
+/**
+ * struct drm_pagemap_devmem - Structure representing a GPU SVM device memory allocation
+ *
+ * @dev: Pointer to the device structure to which the device memory allocation belongs
+ * @mm: Pointer to the mm_struct for the address space
+ * @detached: Completion signaled when the device memory allocation is detached from device pages
+ * @ops: Pointer to the operations structure for GPU SVM device memory
+ * @dpagemap: The struct drm_pagemap of the pages this allocation belongs to.
+ * @size: Size of device memory allocation
+ * @timeslice_expiration: Timeslice expiration in jiffies
+ */
+struct drm_pagemap_devmem {
+ struct device *dev;
+ struct mm_struct *mm;
+ struct completion detached;
+ const struct drm_pagemap_devmem_ops *ops;
+ struct drm_pagemap *dpagemap;
+ size_t size;
+ u64 timeslice_expiration;
+};
+
+int drm_pagemap_migrate_to_devmem(struct drm_pagemap_devmem *devmem_allocation,
+ struct mm_struct *mm,
+ unsigned long start, unsigned long end,
+ unsigned long timeslice_ms,
+ void *pgmap_owner);
+
+int drm_pagemap_evict_to_ram(struct drm_pagemap_devmem *devmem_allocation);
+
+const struct dev_pagemap_ops *drm_pagemap_pagemap_ops_get(void);
+
+struct drm_pagemap *drm_pagemap_page_to_dpagemap(struct page *page);
+
+void drm_pagemap_devmem_init(struct drm_pagemap_devmem *devmem_allocation,
+ struct device *dev, struct mm_struct *mm,
+ const struct drm_pagemap_devmem_ops *ops,
+ struct drm_pagemap *dpagemap, size_t size);
+
#endif
--
2.49.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v2 2/3] drm/pagemap: Add a populate_mm op
2025-06-04 9:35 [PATCH v2 0/3] drm/gpusvm, drm/pagemap, drm/xe: Restructure migration in preparation for multi-device Thomas Hellström
2025-06-04 9:35 ` [PATCH v2 1/3] drm/gpusvm, drm/pagemap: Move migration functionality to drm_pagemap Thomas Hellström
@ 2025-06-04 9:35 ` Thomas Hellström
2025-06-04 21:06 ` kernel test robot
2025-06-04 22:05 ` Matthew Brost
2025-06-04 9:35 ` [PATCH v2 3/3] drm/xe: Implement and use the drm_pagemap " Thomas Hellström
2025-06-04 10:01 ` [PATCH v2 0/3] drm/gpusvm, drm/pagemap, drm/xe: Restructure migration in preparation for multi-device Christian König
3 siblings, 2 replies; 16+ messages in thread
From: Thomas Hellström @ 2025-06-04 9:35 UTC (permalink / raw)
To: intel-xe
Cc: Thomas Hellström, dri-devel, himal.prasad.ghimiray, apopple,
airlied, Simona Vetter, felix.kuehling, Matthew Brost,
Christian König, dakr, Mrozek, Michal, Joonas Lahtinen
Add an operation to populate a part of an mm_struct with device
private memory.
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
drivers/gpu/drm/drm_gpusvm.c | 7 ++-----
drivers/gpu/drm/drm_pagemap.c | 34 ++++++++++++++++++++++++++++++++++
include/drm/drm_pagemap.h | 34 ++++++++++++++++++++++++++++++++++
3 files changed, 70 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/drm_gpusvm.c b/drivers/gpu/drm/drm_gpusvm.c
index ef81381609de..51afc8a9704d 100644
--- a/drivers/gpu/drm/drm_gpusvm.c
+++ b/drivers/gpu/drm/drm_gpusvm.c
@@ -175,11 +175,8 @@
* }
*
* if (driver_migration_policy(range)) {
- * mmap_read_lock(mm);
- * devmem = driver_alloc_devmem();
- * err = drm_pagemap_migrate_to_devmem(devmem, gpusvm->mm, gpuva_start,
- * gpuva_end, driver_pgmap_owner());
- * mmap_read_unlock(mm);
+ * err = drm_pagemap_populate_mm(driver_choose_drm_pagemap(),
+ * gpuva_start, gpuva_end, gpusvm->mm);
* if (err) // CPU mappings may have changed
* goto retry;
* }
diff --git a/drivers/gpu/drm/drm_pagemap.c b/drivers/gpu/drm/drm_pagemap.c
index 3551a50d7381..25395685a9b8 100644
--- a/drivers/gpu/drm/drm_pagemap.c
+++ b/drivers/gpu/drm/drm_pagemap.c
@@ -6,6 +6,7 @@
#include <linux/dma-mapping.h>
#include <linux/migrate.h>
#include <linux/pagemap.h>
+#include <drm/drm_drv.h>
#include <drm/drm_pagemap.h>
/**
@@ -809,3 +810,36 @@ struct drm_pagemap *drm_pagemap_page_to_dpagemap(struct page *page)
return zdd->devmem_allocation->dpagemap;
}
EXPORT_SYMBOL_GPL(drm_pagemap_page_to_dpagemap);
+
+/**
+ * drm_pagemap_populate_mm() - Populate a virtual range with device memory pages
+ * @dpagemap: Pointer to the drm_pagemap managing the device memory
+ * @start: Start of the virtual range to populate.
+ * @end: End of the virtual range to populate.
+ * @mm: Pointer to the virtual address space.
+ * @timeslice_ms: The time requested for the migrated pages to be present
+ * in @mm before being allowed to be migrated back.
+ *
+ * Attempt to populate a virtual range with device memory pages,
+ * clearing them or migrating data from the existing pages if necessary.
+ * The function is best effort only, and implementations may vary
+ * in how hard they try to satisfy the request.
+ *
+ * Return: 0 on success, a negative error code on failure. If the hardware
+ * device was removed / unbound the function will return -ENODEV.
+ */
+int drm_pagemap_populate_mm(struct drm_pagemap *dpagemap,
+ unsigned long start, unsigned long end,
+ struct mm_struct *mm,
+ unsigned long timeslice_ms)
+{
+ int err;
+
+ if (!mmget_not_zero(mm))
+ return -EFAULT;
+ mmap_read_lock(mm);
+ err = dpagemap->ops->populate_mm(dpagemap, start, end, mm,
+ timeslice_ms);
+ mmap_read_unlock(mm);
+ mmput(mm);
+
+ return err;
+}
diff --git a/include/drm/drm_pagemap.h b/include/drm/drm_pagemap.h
index dabc9c365df4..e5f20a1235be 100644
--- a/include/drm/drm_pagemap.h
+++ b/include/drm/drm_pagemap.h
@@ -92,6 +92,35 @@ struct drm_pagemap_ops {
struct device *dev,
struct drm_pagemap_device_addr addr);
+ /**
+ * @populate_mm: Populate part of the mm with @dpagemap memory,
+ * migrating existing data.
+ * @dpagemap: The struct drm_pagemap managing the memory.
+ * @start: The virtual start address in @mm
+ * @end: The virtual end address in @mm
+ * @mm: Pointer to a live mm. The caller must have an mmget()
+ * reference.
+ * @timeslice_ms: The time requested for the migrated pages to be present
+ * in @mm before being allowed to be migrated back.
+ *
+ * The caller will have the mm lock at least in read mode.
+ * Note that there is no guarantee that the memory is resident
+ * after the function returns, it's best effort only.
+ * When the mm is no longer using the memory,
+ * the memory will be released. The struct drm_pagemap might have a
+ * mechanism in place to reclaim it, in which case the data will
+ * be migrated, typically to system memory.
+ * The implementation should hold sufficient runtime power
+ * references while pages are used in an address space, and
+ * should ideally guard against hardware device unbind in
+ * such a way that device pages are migrated back to system
+ * memory, followed by device page removal. The implementation should
+ * return -ENODEV after device removal.
+ *
+ * Return: 0 if successful. Negative error code on error.
+ */
+ int (*populate_mm)(struct drm_pagemap *dpagemap,
+ unsigned long start, unsigned long end,
+ struct mm_struct *mm,
+ unsigned long timeslice_ms);
};
/**
@@ -205,4 +234,9 @@ void drm_pagemap_devmem_init(struct drm_pagemap_devmem *devmem_allocation,
const struct drm_pagemap_devmem_ops *ops,
struct drm_pagemap *dpagemap, size_t size);
+int drm_pagemap_populate_mm(struct drm_pagemap *dpagemap,
+ unsigned long start, unsigned long end,
+ struct mm_struct *mm,
+ unsigned long timeslice_ms);
+
#endif
--
2.49.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v2 3/3] drm/xe: Implement and use the drm_pagemap populate_mm op
2025-06-04 9:35 [PATCH v2 0/3] drm/gpusvm, drm/pagemap, drm/xe: Restructure migration in preparation for multi-device Thomas Hellström
2025-06-04 9:35 ` [PATCH v2 1/3] drm/gpusvm, drm/pagemap: Move migration functionality to drm_pagemap Thomas Hellström
2025-06-04 9:35 ` [PATCH v2 2/3] drm/pagemap: Add a populate_mm op Thomas Hellström
@ 2025-06-04 9:35 ` Thomas Hellström
2025-06-04 15:04 ` Matthew Brost
2025-06-05 22:16 ` Matthew Brost
2025-06-04 10:01 ` [PATCH v2 0/3] drm/gpusvm, drm/pagemap, drm/xe: Restructure migration in preparation for multi-device Christian König
3 siblings, 2 replies; 16+ messages in thread
From: Thomas Hellström @ 2025-06-04 9:35 UTC (permalink / raw)
To: intel-xe
Cc: Thomas Hellström, dri-devel, himal.prasad.ghimiray, apopple,
airlied, Simona Vetter, felix.kuehling, Matthew Brost,
Christian König, dakr, Mrozek, Michal, Joonas Lahtinen
Add runtime PM references since we might call populate_mm() on a foreign device.
Also create the VRAM BOs as ttm_bo_type_kernel. This avoids the
initial clearing and the creation of an mmap handle.
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
drivers/gpu/drm/drm_pagemap.c | 1 +
drivers/gpu/drm/xe/xe_svm.c | 104 ++++++++++++++++++++--------------
drivers/gpu/drm/xe/xe_svm.h | 10 ++--
drivers/gpu/drm/xe/xe_tile.h | 11 ++++
drivers/gpu/drm/xe/xe_vm.c | 2 +-
5 files changed, 78 insertions(+), 50 deletions(-)
diff --git a/drivers/gpu/drm/drm_pagemap.c b/drivers/gpu/drm/drm_pagemap.c
index 25395685a9b8..94619be00d2a 100644
--- a/drivers/gpu/drm/drm_pagemap.c
+++ b/drivers/gpu/drm/drm_pagemap.c
@@ -843,3 +843,4 @@ int drm_pagemap_populate_mm(struct drm_pagemap *dpagemap,
return err;
}
+EXPORT_SYMBOL(drm_pagemap_populate_mm);
diff --git a/drivers/gpu/drm/xe/xe_svm.c b/drivers/gpu/drm/xe/xe_svm.c
index e161ce3e67a1..a10aab3768d8 100644
--- a/drivers/gpu/drm/xe/xe_svm.c
+++ b/drivers/gpu/drm/xe/xe_svm.c
@@ -3,13 +3,17 @@
* Copyright © 2024 Intel Corporation
*/
+#include <drm/drm_drv.h>
+
#include "xe_bo.h"
#include "xe_gt_stats.h"
#include "xe_gt_tlb_invalidation.h"
#include "xe_migrate.h"
#include "xe_module.h"
+#include "xe_pm.h"
#include "xe_pt.h"
#include "xe_svm.h"
+#include "xe_tile.h"
#include "xe_ttm_vram_mgr.h"
#include "xe_vm.h"
#include "xe_vm_types.h"
@@ -525,8 +529,10 @@ static struct xe_bo *to_xe_bo(struct drm_pagemap_devmem *devmem_allocation)
static void xe_svm_devmem_release(struct drm_pagemap_devmem *devmem_allocation)
{
struct xe_bo *bo = to_xe_bo(devmem_allocation);
+ struct xe_device *xe = xe_bo_device(bo);
xe_bo_put_async(bo);
+ xe_pm_runtime_put(xe);
}
static u64 block_offset_to_pfn(struct xe_vram_region *vr, u64 offset)
@@ -720,76 +726,63 @@ static struct xe_vram_region *tile_to_vr(struct xe_tile *tile)
return &tile->mem.vram;
}
-/**
- * xe_svm_alloc_vram()- Allocate device memory pages for range,
- * migrating existing data.
- * @vm: The VM.
- * @tile: tile to allocate vram from
- * @range: SVM range
- * @ctx: DRM GPU SVM context
- *
- * Return: 0 on success, error code on failure.
- */
-int xe_svm_alloc_vram(struct xe_vm *vm, struct xe_tile *tile,
- struct xe_svm_range *range,
- const struct drm_gpusvm_ctx *ctx)
+static int xe_drm_pagemap_populate_mm(struct drm_pagemap *dpagemap,
+ unsigned long start, unsigned long end,
+ struct mm_struct *mm,
+ unsigned long timeslice_ms)
{
- struct mm_struct *mm = vm->svm.gpusvm.mm;
+ struct xe_tile *tile = container_of(dpagemap, typeof(*tile), mem.vram.dpagemap);
+ struct xe_device *xe = tile_to_xe(tile);
+ struct device *dev = xe->drm.dev;
struct xe_vram_region *vr = tile_to_vr(tile);
struct drm_buddy_block *block;
struct list_head *blocks;
struct xe_bo *bo;
- ktime_t end = 0;
- int err;
-
- if (!range->base.flags.migrate_devmem)
- return -EINVAL;
+ ktime_t time_end = 0;
+ int err, idx;
- range_debug(range, "ALLOCATE VRAM");
+ if (!drm_dev_enter(&xe->drm, &idx))
+ return -ENODEV;
- if (!mmget_not_zero(mm))
- return -EFAULT;
- mmap_read_lock(mm);
+ xe_pm_runtime_get(xe);
-retry:
- bo = xe_bo_create_locked(tile_to_xe(tile), NULL, NULL,
- xe_svm_range_size(range),
- ttm_bo_type_device,
+ retry:
+ bo = xe_bo_create_locked(tile_to_xe(tile), NULL, NULL, end - start,
+ ttm_bo_type_kernel,
XE_BO_FLAG_VRAM_IF_DGFX(tile) |
XE_BO_FLAG_CPU_ADDR_MIRROR);
if (IS_ERR(bo)) {
err = PTR_ERR(bo);
- if (xe_vm_validate_should_retry(NULL, err, &end))
+ if (xe_vm_validate_should_retry(NULL, err, &time_end))
goto retry;
- goto unlock;
+ goto out_pm_put;
}
- drm_pagemap_devmem_init(&bo->devmem_allocation,
- vm->xe->drm.dev, mm,
+ drm_pagemap_devmem_init(&bo->devmem_allocation, dev, mm,
&dpagemap_devmem_ops,
&tile->mem.vram.dpagemap,
- xe_svm_range_size(range));
+ end - start);
blocks = &to_xe_ttm_vram_mgr_resource(bo->ttm.resource)->blocks;
list_for_each_entry(block, blocks, link)
block->private = vr;
xe_bo_get(bo);
- err = drm_pagemap_migrate_to_devmem(&bo->devmem_allocation,
- mm,
- xe_svm_range_start(range),
- xe_svm_range_end(range),
- ctx->timeslice_ms,
- xe_svm_devm_owner(vm->xe));
+
+ /* Ensure the device has a pm ref while there are device pages active. */
+ xe_pm_runtime_get_noresume(xe);
+ err = drm_pagemap_migrate_to_devmem(&bo->devmem_allocation, mm,
+ start, end, timeslice_ms,
+ xe_svm_devm_owner(xe));
if (err)
xe_svm_devmem_release(&bo->devmem_allocation);
xe_bo_unlock(bo);
xe_bo_put(bo);
-unlock:
- mmap_read_unlock(mm);
- mmput(mm);
+out_pm_put:
+ xe_pm_runtime_put(xe);
+ drm_dev_exit(idx);
return err;
}
@@ -898,7 +891,7 @@ int xe_svm_handle_pagefault(struct xe_vm *vm, struct xe_vma *vma,
if (--migrate_try_count >= 0 &&
xe_svm_range_needs_migrate_to_vram(range, vma, IS_DGFX(vm->xe))) {
- err = xe_svm_alloc_vram(vm, tile, range, &ctx);
+ err = xe_svm_alloc_vram(tile, range, &ctx);
ctx.timeslice_ms <<= 1; /* Double timeslice if we have to retry */
if (err) {
if (migrate_try_count || !ctx.devmem_only) {
@@ -1054,6 +1047,30 @@ int xe_svm_range_get_pages(struct xe_vm *vm, struct xe_svm_range *range,
#if IS_ENABLED(CONFIG_DRM_XE_PAGEMAP)
+/**
+ * xe_svm_alloc_vram() - Allocate device memory pages for range,
+ * migrating existing data.
+ * @tile: tile to allocate vram from
+ * @range: SVM range
+ * @ctx: DRM GPU SVM context
+ *
+ * Return: 0 on success, error code on failure.
+ */
+int xe_svm_alloc_vram(struct xe_tile *tile, struct xe_svm_range *range,
+ const struct drm_gpusvm_ctx *ctx)
+{
+ struct drm_pagemap *dpagemap;
+
+ range_debug(range, "ALLOCATE VRAM");
+
+ dpagemap = xe_tile_local_pagemap(tile);
+ return drm_pagemap_populate_mm(dpagemap, xe_svm_range_start(range),
+ xe_svm_range_end(range),
+ range->base.gpusvm->mm,
+ ctx->timeslice_ms);
+}
+
static struct drm_pagemap_device_addr
xe_drm_pagemap_device_map(struct drm_pagemap *dpagemap,
struct device *dev,
@@ -1078,6 +1095,7 @@ xe_drm_pagemap_device_map(struct drm_pagemap *dpagemap,
static const struct drm_pagemap_ops xe_drm_pagemap_ops = {
.device_map = xe_drm_pagemap_device_map,
+ .populate_mm = xe_drm_pagemap_populate_mm,
};
/**
@@ -1130,7 +1148,7 @@ int xe_devm_add(struct xe_tile *tile, struct xe_vram_region *vr)
return 0;
}
#else
-int xe_svm_alloc_vram(struct xe_vm *vm, struct xe_tile *tile,
+int xe_svm_alloc_vram(struct xe_tile *tile,
struct xe_svm_range *range,
const struct drm_gpusvm_ctx *ctx)
{
diff --git a/drivers/gpu/drm/xe/xe_svm.h b/drivers/gpu/drm/xe/xe_svm.h
index 19ce4f2754a7..da9a69ea0bb1 100644
--- a/drivers/gpu/drm/xe/xe_svm.h
+++ b/drivers/gpu/drm/xe/xe_svm.h
@@ -70,8 +70,7 @@ int xe_svm_bo_evict(struct xe_bo *bo);
void xe_svm_range_debug(struct xe_svm_range *range, const char *operation);
-int xe_svm_alloc_vram(struct xe_vm *vm, struct xe_tile *tile,
- struct xe_svm_range *range,
+int xe_svm_alloc_vram(struct xe_tile *tile, struct xe_svm_range *range,
const struct drm_gpusvm_ctx *ctx);
struct xe_svm_range *xe_svm_range_find_or_insert(struct xe_vm *vm, u64 addr,
@@ -237,10 +236,9 @@ void xe_svm_range_debug(struct xe_svm_range *range, const char *operation)
{
}
-static inline
-int xe_svm_alloc_vram(struct xe_vm *vm, struct xe_tile *tile,
- struct xe_svm_range *range,
- const struct drm_gpusvm_ctx *ctx)
+static inline int
+xe_svm_alloc_vram(struct xe_tile *tile, struct xe_svm_range *range,
+ const struct drm_gpusvm_ctx *ctx)
{
return -EOPNOTSUPP;
}
diff --git a/drivers/gpu/drm/xe/xe_tile.h b/drivers/gpu/drm/xe/xe_tile.h
index eb939316d55b..066a3d0cea79 100644
--- a/drivers/gpu/drm/xe/xe_tile.h
+++ b/drivers/gpu/drm/xe/xe_tile.h
@@ -16,4 +16,15 @@ int xe_tile_init(struct xe_tile *tile);
void xe_tile_migrate_wait(struct xe_tile *tile);
+#if IS_ENABLED(CONFIG_DRM_XE_PAGEMAP)
+static inline struct drm_pagemap *xe_tile_local_pagemap(struct xe_tile *tile)
+{
+ return &tile->mem.vram.dpagemap;
+}
+#else
+static inline struct drm_pagemap *xe_tile_local_pagemap(struct xe_tile *tile)
+{
+ return NULL;
+}
+#endif
#endif
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 7140d8856bad..def493acb4d7 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -2911,7 +2911,7 @@ static int prefetch_ranges(struct xe_vm *vm, struct xe_vma_op *op)
if (xe_svm_range_needs_migrate_to_vram(svm_range, vma, region)) {
tile = &vm->xe->tiles[region_to_mem_type[region] - XE_PL_VRAM0];
- err = xe_svm_alloc_vram(vm, tile, svm_range, &ctx);
+ err = xe_svm_alloc_vram(tile, svm_range, &ctx);
if (err) {
drm_dbg(&vm->xe->drm, "VRAM allocation failed, retry from userspace, asid=%u, gpusvm=%p, errno=%pe\n",
vm->usm.asid, &vm->svm.gpusvm, ERR_PTR(err));
--
2.49.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH v2 0/3] drm/gpusvm, drm/pagemap, drm/xe: Restructure migration in preparation for multi-device
2025-06-04 9:35 [PATCH v2 0/3] drm/gpusvm, drm/pagemap, drm/xe: Restructure migration in preparation for multi-device Thomas Hellström
` (2 preceding siblings ...)
2025-06-04 9:35 ` [PATCH v2 3/3] drm/xe: Implement and use the drm_pagemap " Thomas Hellström
@ 2025-06-04 10:01 ` Christian König
2025-06-04 12:01 ` Thomas Hellström
3 siblings, 1 reply; 16+ messages in thread
From: Christian König @ 2025-06-04 10:01 UTC (permalink / raw)
To: Thomas Hellström, intel-xe
Cc: dri-devel, himal.prasad.ghimiray, apopple, airlied, Simona Vetter,
felix.kuehling, Matthew Brost, dakr, Mrozek, Michal,
Joonas Lahtinen, Kuehling, Felix, Yang, Philip
Hi Thomas,
please make sure to loop in Kuehling, Felix <Felix.Kuehling@amd.com> and Yang, Philip <Philip.Yang@amd.com> for that kind of stuff.
I'm absolutely not deep enough in the pagemap handling to judge any of that here.
Thanks,
Christian.
On 6/4/25 11:35, Thomas Hellström wrote:
> This patchset modifies the migration part of drm_gpusvm to drm_pagemap and
> adds a populate_mm() op to drm_pagemap.
>
> The idea is that the device that receives a pagefault determines if it wants to
> migrate content and to where. It then calls the populate_mm() method of the relevant
> drm_pagemap.
>
> This functionality was mostly already in place, but hard-coded for xe only without
> going through a pagemap op. Since we might be dealing with separate devices moving
> forward, it also now becomes the responsibility of the populate_mm() op to
> grab any necessary local device runtime pm references and keep them held while
> its pages are present in an mm (struct mm_struct).
>
> One thing to decide here is whether the populate_mm() callback should sit on a
> struct drm_pagemap for now while we sort multi-device usability out or whether
> we should add it (or something equivalent) to struct dev_pagemap.
>
> v2:
> - Rebase.
>
> Matthew Brost (1):
> drm/gpusvm, drm/pagemap: Move migration functionality to drm_pagemap
>
> Thomas Hellström (2):
> drm/pagemap: Add a populate_mm op
> drm/xe: Implement and use the drm_pagemap populate_mm op
>
> Documentation/gpu/rfc/gpusvm.rst | 12 +-
> drivers/gpu/drm/Makefile | 6 +-
> drivers/gpu/drm/drm_gpusvm.c | 760 +-----------------------
> drivers/gpu/drm/drm_pagemap.c | 846 +++++++++++++++++++++++++++
> drivers/gpu/drm/xe/Kconfig | 10 +-
> drivers/gpu/drm/xe/xe_bo_types.h | 2 +-
> drivers/gpu/drm/xe/xe_device_types.h | 2 +-
> drivers/gpu/drm/xe/xe_svm.c | 129 ++--
> drivers/gpu/drm/xe/xe_svm.h | 10 +-
> drivers/gpu/drm/xe/xe_tile.h | 11 +
> drivers/gpu/drm/xe/xe_vm.c | 2 +-
> include/drm/drm_gpusvm.h | 96 ---
> include/drm/drm_pagemap.h | 135 +++++
> 13 files changed, 1107 insertions(+), 914 deletions(-)
> create mode 100644 drivers/gpu/drm/drm_pagemap.c
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2 0/3] drm/gpusvm, drm/pagemap, drm/xe: Restructure migration in preparation for multi-device
2025-06-04 10:01 ` [PATCH v2 0/3] drm/gpusvm, drm/pagemap, drm/xe: Restructure migration in preparation for multi-device Christian König
@ 2025-06-04 12:01 ` Thomas Hellström
0 siblings, 0 replies; 16+ messages in thread
From: Thomas Hellström @ 2025-06-04 12:01 UTC (permalink / raw)
To: Christian König, intel-xe
Cc: dri-devel, himal.prasad.ghimiray, apopple, airlied, Simona Vetter,
felix.kuehling, Matthew Brost, dakr, Mrozek, Michal,
Joonas Lahtinen, Yang, Philip
Hi,
On Wed, 2025-06-04 at 12:01 +0200, Christian König wrote:
> Hi Thomas,
>
> please make sure to loop in Kuehling, Felix <Felix.Kuehling@amd.com>
> and Yang, Philip <Philip.Yang@amd.com> for that kind of stuff.
>
> I'm absolutely not deep enough in the pagemap handling to judge any
> of that here.
>
> Thanks,
> Christian.
Sure, Felix was CC'd and I'll make sure Philip will be CC'd on the next
revision.
Thanks,
Thomas
>
> On 6/4/25 11:35, Thomas Hellström wrote:
> > This patchset modifies the migration part of drm_gpusvm to
> > drm_pagemap and
> > adds a populate_mm() op to drm_pagemap.
> >
> > The idea is that the device that receives a pagefault determines if
> > it wants to
> > migrate content and to where. It then calls the populate_mm()
> > method of relevant
> > drm_pagemap.
> >
> > This functionality was mostly already in place, but hard-coded for
> > xe only without
> > going through a pagemap op. Since we might be dealing with separate
> > devices moving
> > forward, it also now becomes the responsibility of the populate_mm()
> > op to
> > grab any necessary local device runtime pm references and keep them
> > held while
> > its pages are present in an mm (struct mm_struct).
> >
> > One thing to decide here is whether the populate_mm() callback
> > should sit on a
> > struct drm_pagemap for now while we sort multi-device usability out
> > or whether
> > we should add it (or something equivalent) to struct dev_pagemap.
> >
> > v2:
> > - Rebase.
> >
> > Matthew Brost (1):
> > drm/gpusvm, drm/pagemap: Move migration functionality to
> > drm_pagemap
> >
> > Thomas Hellström (2):
> > drm/pagemap: Add a populate_mm op
> > drm/xe: Implement and use the drm_pagemap populate_mm op
> >
> > Documentation/gpu/rfc/gpusvm.rst | 12 +-
> > drivers/gpu/drm/Makefile | 6 +-
> > drivers/gpu/drm/drm_gpusvm.c | 760 +----------------------
> > -
> > drivers/gpu/drm/drm_pagemap.c | 846
> > +++++++++++++++++++++++++++
> > drivers/gpu/drm/xe/Kconfig | 10 +-
> > drivers/gpu/drm/xe/xe_bo_types.h | 2 +-
> > drivers/gpu/drm/xe/xe_device_types.h | 2 +-
> > drivers/gpu/drm/xe/xe_svm.c | 129 ++--
> > drivers/gpu/drm/xe/xe_svm.h | 10 +-
> > drivers/gpu/drm/xe/xe_tile.h | 11 +
> > drivers/gpu/drm/xe/xe_vm.c | 2 +-
> > include/drm/drm_gpusvm.h | 96 ---
> > include/drm/drm_pagemap.h | 135 +++++
> > 13 files changed, 1107 insertions(+), 914 deletions(-)
> > create mode 100644 drivers/gpu/drm/drm_pagemap.c
> >
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2 3/3] drm/xe: Implement and use the drm_pagemap populate_mm op
2025-06-04 9:35 ` [PATCH v2 3/3] drm/xe: Implement and use the drm_pagemap " Thomas Hellström
@ 2025-06-04 15:04 ` Matthew Brost
2025-06-05 7:37 ` Thomas Hellström
2025-06-05 22:16 ` Matthew Brost
1 sibling, 1 reply; 16+ messages in thread
From: Matthew Brost @ 2025-06-04 15:04 UTC (permalink / raw)
To: Thomas Hellström
Cc: intel-xe, dri-devel, himal.prasad.ghimiray, apopple, airlied,
Simona Vetter, felix.kuehling, Christian König, dakr,
Mrozek, Michal, Joonas Lahtinen
On Wed, Jun 04, 2025 at 11:35:36AM +0200, Thomas Hellström wrote:
> Add runtime PM since we might call populate_mm on a foreign device.
I think taking a runtime PM reference will fix a hard-to-hit splat [1] too.
[1] https://patchwork.freedesktop.org/patch/648954/?series=147849&rev=1
> Also create the VRAM bos as ttm_bo_type_kernel. This avoids the
> initial clearing and the creation of an mmap handle.
>
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> ---
> drivers/gpu/drm/drm_pagemap.c | 1 +
> drivers/gpu/drm/xe/xe_svm.c | 104 ++++++++++++++++++++--------------
> drivers/gpu/drm/xe/xe_svm.h | 10 ++--
> drivers/gpu/drm/xe/xe_tile.h | 11 ++++
> drivers/gpu/drm/xe/xe_vm.c | 2 +-
> 5 files changed, 78 insertions(+), 50 deletions(-)
>
> diff --git a/drivers/gpu/drm/drm_pagemap.c b/drivers/gpu/drm/drm_pagemap.c
> index 25395685a9b8..94619be00d2a 100644
> --- a/drivers/gpu/drm/drm_pagemap.c
> +++ b/drivers/gpu/drm/drm_pagemap.c
> @@ -843,3 +843,4 @@ int drm_pagemap_populate_mm(struct drm_pagemap *dpagemap,
>
> return err;
> }
> +EXPORT_SYMBOL(drm_pagemap_populate_mm);
> diff --git a/drivers/gpu/drm/xe/xe_svm.c b/drivers/gpu/drm/xe/xe_svm.c
> index e161ce3e67a1..a10aab3768d8 100644
> --- a/drivers/gpu/drm/xe/xe_svm.c
> +++ b/drivers/gpu/drm/xe/xe_svm.c
> @@ -3,13 +3,17 @@
> * Copyright © 2024 Intel Corporation
> */
>
> +#include <drm/drm_drv.h>
> +
> #include "xe_bo.h"
> #include "xe_gt_stats.h"
> #include "xe_gt_tlb_invalidation.h"
> #include "xe_migrate.h"
> #include "xe_module.h"
> +#include "xe_pm.h"
> #include "xe_pt.h"
> #include "xe_svm.h"
> +#include "xe_tile.h"
> #include "xe_ttm_vram_mgr.h"
> #include "xe_vm.h"
> #include "xe_vm_types.h"
> @@ -525,8 +529,10 @@ static struct xe_bo *to_xe_bo(struct drm_pagemap_devmem *devmem_allocation)
> static void xe_svm_devmem_release(struct drm_pagemap_devmem *devmem_allocation)
> {
> struct xe_bo *bo = to_xe_bo(devmem_allocation);
> + struct xe_device *xe = xe_bo_device(bo);
>
> xe_bo_put_async(bo);
> + xe_pm_runtime_put(xe);
> }
>
> static u64 block_offset_to_pfn(struct xe_vram_region *vr, u64 offset)
> @@ -720,76 +726,63 @@ static struct xe_vram_region *tile_to_vr(struct xe_tile *tile)
> return &tile->mem.vram;
> }
>
> -/**
> - * xe_svm_alloc_vram()- Allocate device memory pages for range,
> - * migrating existing data.
> - * @vm: The VM.
> - * @tile: tile to allocate vram from
> - * @range: SVM range
> - * @ctx: DRM GPU SVM context
> - *
> - * Return: 0 on success, error code on failure.
> - */
> -int xe_svm_alloc_vram(struct xe_vm *vm, struct xe_tile *tile,
> - struct xe_svm_range *range,
> - const struct drm_gpusvm_ctx *ctx)
> +static int xe_drm_pagemap_populate_mm(struct drm_pagemap *dpagemap,
> + unsigned long start, unsigned long end,
> + struct mm_struct *mm,
> + unsigned long timeslice_ms)
> {
> - struct mm_struct *mm = vm->svm.gpusvm.mm;
> + struct xe_tile *tile = container_of(dpagemap, typeof(*tile), mem.vram.dpagemap);
I think this is going to change here [2], making mem.vram a pointer.
Maybe add a helper to go from dpagemap -> tile to future-proof a little. I
think once [2] lands, we will need to pick the root tile here.
[2] https://patchwork.freedesktop.org/series/149503/
> + struct xe_device *xe = tile_to_xe(tile);
> + struct device *dev = xe->drm.dev;
> struct xe_vram_region *vr = tile_to_vr(tile);
> struct drm_buddy_block *block;
> struct list_head *blocks;
> struct xe_bo *bo;
> - ktime_t end = 0;
> - int err;
> -
> - if (!range->base.flags.migrate_devmem)
> - return -EINVAL;
> + ktime_t time_end = 0;
> + int err, idx;
>
> - range_debug(range, "ALLOCATE VRAM");
> + if (!drm_dev_enter(&xe->drm, &idx))
> + return -ENODEV;
>
> - if (!mmget_not_zero(mm))
> - return -EFAULT;
> - mmap_read_lock(mm);
> + xe_pm_runtime_get(xe);
A foreign device might not be awake, so is that why you are using
xe_pm_runtime_get rather than xe_pm_runtime_get_noresume? We only have
the mmap lock here, so I assume that is safe with our runtime PM flow.
>
> -retry:
> - bo = xe_bo_create_locked(tile_to_xe(tile), NULL, NULL,
> - xe_svm_range_size(range),
> - ttm_bo_type_device,
> + retry:
> + bo = xe_bo_create_locked(tile_to_xe(tile), NULL, NULL, end - start,
> + ttm_bo_type_kernel,
> XE_BO_FLAG_VRAM_IF_DGFX(tile) |
> XE_BO_FLAG_CPU_ADDR_MIRROR);
> if (IS_ERR(bo)) {
> err = PTR_ERR(bo);
> - if (xe_vm_validate_should_retry(NULL, err, &end))
> + if (xe_vm_validate_should_retry(NULL, err, &time_end))
> goto retry;
> - goto unlock;
> + goto out_pm_put;
> }
>
> - drm_pagemap_devmem_init(&bo->devmem_allocation,
> - vm->xe->drm.dev, mm,
> + drm_pagemap_devmem_init(&bo->devmem_allocation, dev, mm,
> &dpagemap_devmem_ops,
> &tile->mem.vram.dpagemap,
> - xe_svm_range_size(range));
> + end - start);
>
> blocks = &to_xe_ttm_vram_mgr_resource(bo->ttm.resource)->blocks;
> list_for_each_entry(block, blocks, link)
> block->private = vr;
>
> xe_bo_get(bo);
> - err = drm_pagemap_migrate_to_devmem(&bo->devmem_allocation,
> - mm,
> - xe_svm_range_start(range),
> - xe_svm_range_end(range),
> - ctx->timeslice_ms,
> - xe_svm_devm_owner(vm->xe));
> +
> + /* Ensure the device has a pm ref while there are device pages active. */
> + xe_pm_runtime_get_noresume(xe);
> + err = drm_pagemap_migrate_to_devmem(&bo->devmem_allocation, mm,
> + start, end, timeslice_ms,
> + xe_svm_devm_owner(xe));
> if (err)
> xe_svm_devmem_release(&bo->devmem_allocation);
>
> xe_bo_unlock(bo);
> xe_bo_put(bo);
>
> -unlock:
> - mmap_read_unlock(mm);
> - mmput(mm);
> +out_pm_put:
> + xe_pm_runtime_put(xe);
> + drm_dev_exit(idx);
>
> return err;
> }
> @@ -898,7 +891,7 @@ int xe_svm_handle_pagefault(struct xe_vm *vm, struct xe_vma *vma,
>
> if (--migrate_try_count >= 0 &&
> xe_svm_range_needs_migrate_to_vram(range, vma, IS_DGFX(vm->xe))) {
> - err = xe_svm_alloc_vram(vm, tile, range, &ctx);
> + err = xe_svm_alloc_vram(tile, range, &ctx);
> ctx.timeslice_ms <<= 1; /* Double timeslice if we have to retry */
> if (err) {
> if (migrate_try_count || !ctx.devmem_only) {
> @@ -1054,6 +1047,30 @@ int xe_svm_range_get_pages(struct xe_vm *vm, struct xe_svm_range *range,
>
> #if IS_ENABLED(CONFIG_DRM_XE_PAGEMAP)
>
> +/**
> + * xe_svm_alloc_vram()- Allocate device memory pages for range,
> + * migrating existing data.
> + * @vm: The VM.
> + * @tile: tile to allocate vram from
> + * @range: SVM range
> + * @ctx: DRM GPU SVM context
> + *
> + * Return: 0 on success, error code on failure.
> + */
> +int xe_svm_alloc_vram(struct xe_tile *tile, struct xe_svm_range *range,
> + const struct drm_gpusvm_ctx *ctx)
> +{
> + struct drm_pagemap *dpagemap;
> +
> + range_debug(range, "ALLOCATE VRAM");
> +
if (!range->base.flags.migrate_devmem)
return -EINVAL;
Or I guess an assert would work too, as the caller should have checked
this field.
Matt
> + dpagemap = xe_tile_local_pagemap(tile);
> + return drm_pagemap_populate_mm(dpagemap, xe_svm_range_start(range),
> + xe_svm_range_end(range),
> + range->base.gpusvm->mm,
> + ctx->timeslice_ms);
> +}
> +
> static struct drm_pagemap_device_addr
> xe_drm_pagemap_device_map(struct drm_pagemap *dpagemap,
> struct device *dev,
> @@ -1078,6 +1095,7 @@ xe_drm_pagemap_device_map(struct drm_pagemap *dpagemap,
>
> static const struct drm_pagemap_ops xe_drm_pagemap_ops = {
> .device_map = xe_drm_pagemap_device_map,
> + .populate_mm = xe_drm_pagemap_populate_mm,
> };
>
> /**
> @@ -1130,7 +1148,7 @@ int xe_devm_add(struct xe_tile *tile, struct xe_vram_region *vr)
> return 0;
> }
> #else
> -int xe_svm_alloc_vram(struct xe_vm *vm, struct xe_tile *tile,
> +int xe_svm_alloc_vram(struct xe_tile *tile,
> struct xe_svm_range *range,
> const struct drm_gpusvm_ctx *ctx)
> {
> diff --git a/drivers/gpu/drm/xe/xe_svm.h b/drivers/gpu/drm/xe/xe_svm.h
> index 19ce4f2754a7..da9a69ea0bb1 100644
> --- a/drivers/gpu/drm/xe/xe_svm.h
> +++ b/drivers/gpu/drm/xe/xe_svm.h
> @@ -70,8 +70,7 @@ int xe_svm_bo_evict(struct xe_bo *bo);
>
> void xe_svm_range_debug(struct xe_svm_range *range, const char *operation);
>
> -int xe_svm_alloc_vram(struct xe_vm *vm, struct xe_tile *tile,
> - struct xe_svm_range *range,
> +int xe_svm_alloc_vram(struct xe_tile *tile, struct xe_svm_range *range,
> const struct drm_gpusvm_ctx *ctx);
>
> struct xe_svm_range *xe_svm_range_find_or_insert(struct xe_vm *vm, u64 addr,
> @@ -237,10 +236,9 @@ void xe_svm_range_debug(struct xe_svm_range *range, const char *operation)
> {
> }
>
> -static inline
> -int xe_svm_alloc_vram(struct xe_vm *vm, struct xe_tile *tile,
> - struct xe_svm_range *range,
> - const struct drm_gpusvm_ctx *ctx)
> +static inline int
> +xe_svm_alloc_vram(struct xe_tile *tile, struct xe_svm_range *range,
> + const struct drm_gpusvm_ctx *ctx)
> {
> return -EOPNOTSUPP;
> }
> diff --git a/drivers/gpu/drm/xe/xe_tile.h b/drivers/gpu/drm/xe/xe_tile.h
> index eb939316d55b..066a3d0cea79 100644
> --- a/drivers/gpu/drm/xe/xe_tile.h
> +++ b/drivers/gpu/drm/xe/xe_tile.h
> @@ -16,4 +16,15 @@ int xe_tile_init(struct xe_tile *tile);
>
> void xe_tile_migrate_wait(struct xe_tile *tile);
>
> +#if IS_ENABLED(CONFIG_DRM_XE_PAGEMAP)
> +static inline struct drm_pagemap *xe_tile_local_pagemap(struct xe_tile *tile)
> +{
> + return &tile->mem.vram.dpagemap;
> +}
> +#else
> +static inline struct drm_pagemap *xe_tile_local_pagemap(struct xe_tile *tile)
> +{
> + return NULL;
> +}
> +#endif
> #endif
> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> index 7140d8856bad..def493acb4d7 100644
> --- a/drivers/gpu/drm/xe/xe_vm.c
> +++ b/drivers/gpu/drm/xe/xe_vm.c
> @@ -2911,7 +2911,7 @@ static int prefetch_ranges(struct xe_vm *vm, struct xe_vma_op *op)
>
> if (xe_svm_range_needs_migrate_to_vram(svm_range, vma, region)) {
> tile = &vm->xe->tiles[region_to_mem_type[region] - XE_PL_VRAM0];
> - err = xe_svm_alloc_vram(vm, tile, svm_range, &ctx);
> + err = xe_svm_alloc_vram(tile, svm_range, &ctx);
> if (err) {
> drm_dbg(&vm->xe->drm, "VRAM allocation failed, retry from userspace, asid=%u, gpusvm=%p, errno=%pe\n",
> vm->usm.asid, &vm->svm.gpusvm, ERR_PTR(err));
> --
> 2.49.0
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2 1/3] drm/gpusvm, drm/pagemap: Move migration functionality to drm_pagemap
2025-06-04 9:35 ` [PATCH v2 1/3] drm/gpusvm, drm/pagemap: Move migration functionality to drm_pagemap Thomas Hellström
@ 2025-06-04 15:45 ` kernel test robot
2025-06-05 22:44 ` Matthew Brost
1 sibling, 0 replies; 16+ messages in thread
From: kernel test robot @ 2025-06-04 15:45 UTC (permalink / raw)
To: Thomas Hellström, intel-xe
Cc: oe-kbuild-all, Matthew Brost, Thomas Hellström, dri-devel,
himal.prasad.ghimiray, apopple, airlied, Simona Vetter,
felix.kuehling, Christian König, dakr, Mrozek, Michal,
Joonas Lahtinen
Hi Thomas,
kernel test robot noticed the following build errors:
[auto build test ERROR on drm-xe/drm-xe-next]
[also build test ERROR on next-20250604]
[cannot apply to linus/master v6.15]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Thomas-Hellstr-m/drm-gpusvm-drm-pagemap-Move-migration-functionality-to-drm_pagemap/20250604-173757
base: https://gitlab.freedesktop.org/drm/xe/kernel.git drm-xe-next
patch link: https://lore.kernel.org/r/20250604093536.95982-2-thomas.hellstrom%40linux.intel.com
patch subject: [PATCH v2 1/3] drm/gpusvm, drm/pagemap: Move migration functionality to drm_pagemap
config: loongarch-allyesconfig (https://download.01.org/0day-ci/archive/20250604/202506042352.xDT1ySBT-lkp@intel.com/config)
compiler: loongarch64-linux-gcc (GCC) 15.1.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250604/202506042352.xDT1ySBT-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202506042352.xDT1ySBT-lkp@intel.com/
All error/warnings (new ones prefixed by >>):
loongarch64-linux-ld: arch/loongarch/kernel/head.o: relocation R_LARCH_B26 overflow 0xffffffffef55fa70
arch/loongarch/kernel/head.o: in function `smpboot_entry':
>> (.ref.text+0x160): relocation truncated to fit: R_LARCH_B26 against symbol `start_secondary' defined in .text section in arch/loongarch/kernel/smp.o
loongarch64-linux-ld: final link failed: bad value
--
>> drivers/gpu/drm/drm_pagemap.c:314: warning: Function parameter or struct member 'timeslice_ms' not described in 'drm_pagemap_migrate_to_devmem'
vim +314 drivers/gpu/drm/drm_pagemap.c
271
272
273 /**
274 * drm_pagemap_migrate_to_devmem() - Migrate a struct mm_struct range to device memory
275 * @devmem_allocation: The device memory allocation to migrate to.
276 * The caller should hold a reference to the device memory allocation,
277 * and the reference is consumed by this function unless it returns with
278 * an error.
279 * @mm: Pointer to the struct mm_struct.
280 * @start: Start of the virtual address range to migrate.
281 * @end: End of the virtual address range to migrate.
282 * @pgmap_owner: Not used currently, since only system memory is considered.
283 *
284 * This function migrates the specified virtual address range to device memory.
285 * It performs the necessary setup and invokes the driver-specific operations for
286 * migration to device memory. Expected to be called while holding the mmap lock in
287 * at least read mode.
288 *
289 * Return: %0 on success, negative error code on failure.
290 */
291
292 /*
293 * @range: Pointer to the GPU SVM range structure
294 * @devmem_allocation: Pointer to the device memory allocation. The caller
295 * should hold a reference to the device memory allocation,
296 * which should be dropped via ops->devmem_release or upon
297 * the failure of this function.
298 * @ctx: GPU SVM context
299 *
300 * This function migrates the specified GPU SVM range to device memory. It
301 * performs the necessary setup and invokes the driver-specific operations for
302 * migration to device memory. Upon successful return, @devmem_allocation can
303 * safely reference @range until ops->devmem_release is called which only upon
304 * successful return. Expected to be called while holding the mmap lock in read
305 * mode.
306 *
307 * Return: 0 on success, negative error code on failure.
308 */
309 int drm_pagemap_migrate_to_devmem(struct drm_pagemap_devmem *devmem_allocation,
310 struct mm_struct *mm,
311 unsigned long start, unsigned long end,
312 unsigned long timeslice_ms,
313 void *pgmap_owner)
> 314 {
315 const struct drm_pagemap_devmem_ops *ops = devmem_allocation->ops;
316 struct migrate_vma migrate = {
317 .start = start,
318 .end = end,
319 .pgmap_owner = pgmap_owner,
320 .flags = MIGRATE_VMA_SELECT_SYSTEM,
321 };
322 unsigned long i, npages = npages_in_range(start, end);
323 struct vm_area_struct *vas;
324 struct drm_pagemap_zdd *zdd = NULL;
325 struct page **pages;
326 dma_addr_t *dma_addr;
327 void *buf;
328 int err;
329
330 mmap_assert_locked(mm);
331
332 if (!ops->populate_devmem_pfn || !ops->copy_to_devmem ||
333 !ops->copy_to_ram)
334 return -EOPNOTSUPP;
335
336 vas = vma_lookup(mm, start);
337 if (!vas) {
338 err = -ENOENT;
339 goto err_out;
340 }
341
342 if (end > vas->vm_end || start < vas->vm_start) {
343 err = -EINVAL;
344 goto err_out;
345 }
346
347 if (!vma_is_anonymous(vas)) {
348 err = -EBUSY;
349 goto err_out;
350 }
351
352 buf = kvcalloc(npages, 2 * sizeof(*migrate.src) + sizeof(*dma_addr) +
353 sizeof(*pages), GFP_KERNEL);
354 if (!buf) {
355 err = -ENOMEM;
356 goto err_out;
357 }
358 dma_addr = buf + (2 * sizeof(*migrate.src) * npages);
359 pages = buf + (2 * sizeof(*migrate.src) + sizeof(*dma_addr)) * npages;
360
361 zdd = drm_pagemap_zdd_alloc(pgmap_owner);
362 if (!zdd) {
363 err = -ENOMEM;
364 goto err_free;
365 }
366
367 migrate.vma = vas;
368 migrate.src = buf;
369 migrate.dst = migrate.src + npages;
370
371 err = migrate_vma_setup(&migrate);
372 if (err)
373 goto err_free;
374
375 if (!migrate.cpages) {
376 err = -EFAULT;
377 goto err_free;
378 }
379
380 if (migrate.cpages != npages) {
381 err = -EBUSY;
382 goto err_finalize;
383 }
384
385 err = ops->populate_devmem_pfn(devmem_allocation, npages, migrate.dst);
386 if (err)
387 goto err_finalize;
388
389 err = drm_pagemap_migrate_map_pages(devmem_allocation->dev, dma_addr,
390 migrate.src, npages, DMA_TO_DEVICE);
391 if (err)
392 goto err_finalize;
393
394 for (i = 0; i < npages; ++i) {
395 struct page *page = pfn_to_page(migrate.dst[i]);
396
397 pages[i] = page;
398 migrate.dst[i] = migrate_pfn(migrate.dst[i]);
399 drm_pagemap_get_devmem_page(page, zdd);
400 }
401
402 err = ops->copy_to_devmem(pages, dma_addr, npages);
403 if (err)
404 goto err_finalize;
405
406 /* Upon success bind devmem allocation to range and zdd */
407 devmem_allocation->timeslice_expiration = get_jiffies_64() +
408 msecs_to_jiffies(timeslice_ms);
409 zdd->devmem_allocation = devmem_allocation; /* Owns ref */
410
411 err_finalize:
412 if (err)
413 drm_pagemap_migration_unlock_put_pages(npages, migrate.dst);
414 migrate_vma_pages(&migrate);
415 migrate_vma_finalize(&migrate);
416 drm_pagemap_migrate_unmap_pages(devmem_allocation->dev, dma_addr, npages,
417 DMA_TO_DEVICE);
418 err_free:
419 if (zdd)
420 drm_pagemap_zdd_put(zdd);
421 kvfree(buf);
422 err_out:
423 return err;
424 }
425 EXPORT_SYMBOL_GPL(drm_pagemap_migrate_to_devmem);
426
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [PATCH v2 2/3] drm/pagemap: Add a populate_mm op
2025-06-04 9:35 ` [PATCH v2 2/3] drm/pagemap: Add a populate_mm op Thomas Hellström
@ 2025-06-04 21:06 ` kernel test robot
2025-06-04 22:05 ` Matthew Brost
1 sibling, 0 replies; 16+ messages in thread
From: kernel test robot @ 2025-06-04 21:06 UTC (permalink / raw)
To: Thomas Hellström, intel-xe
Cc: oe-kbuild-all, Thomas Hellström, dri-devel,
himal.prasad.ghimiray, apopple, airlied, Simona Vetter,
felix.kuehling, Matthew Brost, Christian König, dakr,
Mrozek, Michal, Joonas Lahtinen
Hi Thomas,
kernel test robot noticed the following build errors:
[auto build test ERROR on drm-xe/drm-xe-next]
[also build test ERROR on next-20250604]
[cannot apply to linus/master v6.15]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Thomas-Hellstr-m/drm-gpusvm-drm-pagemap-Move-migration-functionality-to-drm_pagemap/20250604-173757
base: https://gitlab.freedesktop.org/drm/xe/kernel.git drm-xe-next
patch link: https://lore.kernel.org/r/20250604093536.95982-3-thomas.hellstrom%40linux.intel.com
patch subject: [PATCH v2 2/3] drm/pagemap: Add a populate_mm op
config: loongarch-allyesconfig (https://download.01.org/0day-ci/archive/20250605/202506050405.9sHdzAlO-lkp@intel.com/config)
compiler: loongarch64-linux-gcc (GCC) 15.1.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250605/202506050405.9sHdzAlO-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202506050405.9sHdzAlO-lkp@intel.com/
All error/warnings (new ones prefixed by >>):
drivers/gpu/drm/drm_pagemap.c:315: warning: Function parameter or struct member 'timeslice_ms' not described in 'drm_pagemap_migrate_to_devmem'
>> drivers/gpu/drm/drm_pagemap.c:833: warning: Function parameter or struct member 'timeslice_ms' not described in 'drm_pagemap_populate_mm'
--
loongarch64-linux-ld: arch/loongarch/kernel/head.o: relocation R_LARCH_B26 overflow 0xffffffffef55fa70
arch/loongarch/kernel/head.o: in function `smpboot_entry':
>> (.ref.text+0x160): relocation truncated to fit: R_LARCH_B26 against symbol `start_secondary' defined in .text section in arch/loongarch/kernel/smp.o
loongarch64-linux-ld: final link failed: bad value
vim +833 drivers/gpu/drm/drm_pagemap.c
813
814 /**
815 * drm_pagemap_populate_mm() - Populate a virtual range with device memory pages
816 * @dpagemap: Pointer to the drm_pagemap managing the device memory
817 * @start: Start of the virtual range to populate.
818 * @end: End of the virtual range to populate.
819 * @mm: Pointer to the virtual address space.
820 *
821 * Attempt to populate a virtual range with device memory pages,
822 * clearing them or migrating data from the existing pages if necessary.
823 * The function is best effort only, and implementations may vary
824 * in how hard they try to satisfy the request.
825 *
826 * Return: 0 on success, negative error code on error. If the hardware
827 * device was removed / unbound the function will return -ENODEV;
828 */
829 int drm_pagemap_populate_mm(struct drm_pagemap *dpagemap,
830 unsigned long start, unsigned long end,
831 struct mm_struct *mm,
832 unsigned long timeslice_ms)
> 833 {
--
* Re: [PATCH v2 2/3] drm/pagemap: Add a populate_mm op
2025-06-04 9:35 ` [PATCH v2 2/3] drm/pagemap: Add a populate_mm op Thomas Hellström
2025-06-04 21:06 ` kernel test robot
@ 2025-06-04 22:05 ` Matthew Brost
2025-06-05 7:40 ` Thomas Hellström
1 sibling, 1 reply; 16+ messages in thread
From: Matthew Brost @ 2025-06-04 22:05 UTC (permalink / raw)
To: Thomas Hellström
Cc: intel-xe, dri-devel, himal.prasad.ghimiray, apopple, airlied,
Simona Vetter, felix.kuehling, Christian König, dakr,
Mrozek, Michal, Joonas Lahtinen
On Wed, Jun 04, 2025 at 11:35:35AM +0200, Thomas Hellström wrote:
> Add an operation to populate a part of a drm_mm with device
> private memory.
>
With the kernel doc fixed:
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
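(For reference, the robot warnings above are about the @timeslice_ms parameter missing from the kernel-doc of drm_pagemap_migrate_to_devmem() and drm_pagemap_populate_mm(). A sketch of the missing line, with wording guessed from how the patch uses the parameter, not taken from a posted fix:

```
 * @mm: Pointer to the virtual address space.
 * @timeslice_ms: The time requested for the migrated pages to remain
 *	resident before they may be migrated back.
```
)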
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> ---
> drivers/gpu/drm/drm_gpusvm.c | 7 ++-----
> drivers/gpu/drm/drm_pagemap.c | 34 ++++++++++++++++++++++++++++++++++
> include/drm/drm_pagemap.h | 34 ++++++++++++++++++++++++++++++++++
> 3 files changed, 70 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/drm_gpusvm.c b/drivers/gpu/drm/drm_gpusvm.c
> index ef81381609de..51afc8a9704d 100644
> --- a/drivers/gpu/drm/drm_gpusvm.c
> +++ b/drivers/gpu/drm/drm_gpusvm.c
> @@ -175,11 +175,8 @@
> * }
> *
> * if (driver_migration_policy(range)) {
> - * mmap_read_lock(mm);
> - * devmem = driver_alloc_devmem();
> - * err = drm_pagemap_migrate_to_devmem(devmem, gpusvm->mm, gpuva_start,
> - * gpuva_end, driver_pgmap_owner());
> - * mmap_read_unlock(mm);
> + * err = drm_pagemap_populate_mm(driver_choose_drm_pagemap(),
> + * gpuva_start, gpuva_end, gpusvm->mm);
> * if (err) // CPU mappings may have changed
> * goto retry;
> * }
> diff --git a/drivers/gpu/drm/drm_pagemap.c b/drivers/gpu/drm/drm_pagemap.c
> index 3551a50d7381..25395685a9b8 100644
> --- a/drivers/gpu/drm/drm_pagemap.c
> +++ b/drivers/gpu/drm/drm_pagemap.c
> @@ -6,6 +6,7 @@
> #include <linux/dma-mapping.h>
> #include <linux/migrate.h>
> #include <linux/pagemap.h>
> +#include <drm/drm_drv.h>
> #include <drm/drm_pagemap.h>
>
> /**
> @@ -809,3 +810,36 @@ struct drm_pagemap *drm_pagemap_page_to_dpagemap(struct page *page)
> return zdd->devmem_allocation->dpagemap;
> }
> EXPORT_SYMBOL_GPL(drm_pagemap_page_to_dpagemap);
> +
> +/**
> + * drm_pagemap_populate_mm() - Populate a virtual range with device memory pages
> + * @dpagemap: Pointer to the drm_pagemap managing the device memory
> + * @start: Start of the virtual range to populate.
> + * @end: End of the virtual range to populate.
> + * @mm: Pointer to the virtual address space.
> + *
> + * Attempt to populate a virtual range with device memory pages,
> + * clearing them or migrating data from the existing pages if necessary.
> + * The function is best effort only, and implementations may vary
> + * in how hard they try to satisfy the request.
> + *
> + * Return: 0 on success, negative error code on error. If the hardware
> + * device was removed / unbound the function will return -ENODEV;
> + */
> +int drm_pagemap_populate_mm(struct drm_pagemap *dpagemap,
> + unsigned long start, unsigned long end,
> + struct mm_struct *mm,
> + unsigned long timeslice_ms)
> +{
> + int err;
> +
> + if (!mmget_not_zero(mm))
> + return -EFAULT;
> + mmap_read_lock(mm);
> + err = dpagemap->ops->populate_mm(dpagemap, start, end, mm,
> + timeslice_ms);
> + mmap_read_unlock(mm);
> + mmput(mm);
> +
> + return err;
> +}
> diff --git a/include/drm/drm_pagemap.h b/include/drm/drm_pagemap.h
> index dabc9c365df4..e5f20a1235be 100644
> --- a/include/drm/drm_pagemap.h
> +++ b/include/drm/drm_pagemap.h
> @@ -92,6 +92,35 @@ struct drm_pagemap_ops {
> struct device *dev,
> struct drm_pagemap_device_addr addr);
>
> + /**
> + * @populate_mm: Populate part of the mm with @dpagemap memory,
> + * migrating existing data.
> + * @dpagemap: The struct drm_pagemap managing the memory.
> + * @start: The virtual start address in @mm
> + * @end: The virtual end address in @mm
> + * @mm: Pointer to a live mm. The caller must have an mmget()
> + * reference.
> + *
> + * The caller will have the mm lock at least in read mode.
> + * Note that there is no guarantee that the memory is resident
> + * after the function returns, it's best effort only.
> + * When the mm is not using the memory anymore,
> + * it will be released. The struct drm_pagemap might have a
> + * mechanism in place to reclaim the memory and the data will
> + * then be migrated. Typically to system memory.
> + * The implementation should hold sufficient runtime power-
> + * references while pages are used in an address space and
> + * should ideally guard against hardware device unbind in
> + * a way such that device pages are migrated back to system
> + * followed by device page removal. The implementation should
> + * return -ENODEV after device removal.
> + *
> + * Return: 0 if successful. Negative error code on error.
> + */
> + int (*populate_mm)(struct drm_pagemap *dpagemap,
> + unsigned long start, unsigned long end,
> + struct mm_struct *mm,
> + unsigned long timeslice_ms);
> };
>
> /**
> @@ -205,4 +234,9 @@ void drm_pagemap_devmem_init(struct drm_pagemap_devmem *devmem_allocation,
> const struct drm_pagemap_devmem_ops *ops,
> struct drm_pagemap *dpagemap, size_t size);
>
> +int drm_pagemap_populate_mm(struct drm_pagemap *dpagemap,
> + unsigned long start, unsigned long end,
> + struct mm_struct *mm,
> + unsigned long timeslice_ms);
> +
> #endif
> --
> 2.49.0
>
* Re: [PATCH v2 3/3] drm/xe: Implement and use the drm_pagemap populate_mm op
2025-06-04 15:04 ` Matthew Brost
@ 2025-06-05 7:37 ` Thomas Hellström
0 siblings, 0 replies; 16+ messages in thread
From: Thomas Hellström @ 2025-06-05 7:37 UTC (permalink / raw)
To: Matthew Brost
Cc: intel-xe, dri-devel, himal.prasad.ghimiray, apopple, airlied,
Simona Vetter, felix.kuehling, Christian König, dakr,
Mrozek, Michal, Joonas Lahtinen
On Wed, 2025-06-04 at 08:04 -0700, Matthew Brost wrote:
> On Wed, Jun 04, 2025 at 11:35:36AM +0200, Thomas Hellström wrote:
> > Add runtime PM since we might call populate_mm on a foreign device.
>
> I think taking a runtime PM ref will fix the hard-to-hit splat [1] too.
>
> [1]
> https://patchwork.freedesktop.org/patch/648954/?series=147849&rev=1
>
> > Also create the VRAM bos as ttm_bo_type_kernel. This avoids the
> > initial clearing and the creation of an mmap handle.
> >
> > Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > ---
> > drivers/gpu/drm/drm_pagemap.c | 1 +
> > drivers/gpu/drm/xe/xe_svm.c | 104 ++++++++++++++++++++----------
> > ----
> > drivers/gpu/drm/xe/xe_svm.h | 10 ++--
> > drivers/gpu/drm/xe/xe_tile.h | 11 ++++
> > drivers/gpu/drm/xe/xe_vm.c | 2 +-
> > 5 files changed, 78 insertions(+), 50 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/drm_pagemap.c
> > b/drivers/gpu/drm/drm_pagemap.c
> > index 25395685a9b8..94619be00d2a 100644
> > --- a/drivers/gpu/drm/drm_pagemap.c
> > +++ b/drivers/gpu/drm/drm_pagemap.c
> > @@ -843,3 +843,4 @@ int drm_pagemap_populate_mm(struct drm_pagemap
> > *dpagemap,
> >
> > return err;
> > }
> > +EXPORT_SYMBOL(drm_pagemap_populate_mm);
> > diff --git a/drivers/gpu/drm/xe/xe_svm.c
> > b/drivers/gpu/drm/xe/xe_svm.c
> > index e161ce3e67a1..a10aab3768d8 100644
> > --- a/drivers/gpu/drm/xe/xe_svm.c
> > +++ b/drivers/gpu/drm/xe/xe_svm.c
> > @@ -3,13 +3,17 @@
> > * Copyright © 2024 Intel Corporation
> > */
> >
> > +#include <drm/drm_drv.h>
> > +
> > #include "xe_bo.h"
> > #include "xe_gt_stats.h"
> > #include "xe_gt_tlb_invalidation.h"
> > #include "xe_migrate.h"
> > #include "xe_module.h"
> > +#include "xe_pm.h"
> > #include "xe_pt.h"
> > #include "xe_svm.h"
> > +#include "xe_tile.h"
> > #include "xe_ttm_vram_mgr.h"
> > #include "xe_vm.h"
> > #include "xe_vm_types.h"
> > @@ -525,8 +529,10 @@ static struct xe_bo *to_xe_bo(struct
> > drm_pagemap_devmem *devmem_allocation)
> > static void xe_svm_devmem_release(struct drm_pagemap_devmem
> > *devmem_allocation)
> > {
> > struct xe_bo *bo = to_xe_bo(devmem_allocation);
> > + struct xe_device *xe = xe_bo_device(bo);
> >
> > xe_bo_put_async(bo);
> > + xe_pm_runtime_put(xe);
> > }
> >
> > static u64 block_offset_to_pfn(struct xe_vram_region *vr, u64
> > offset)
> > @@ -720,76 +726,63 @@ static struct xe_vram_region
> > *tile_to_vr(struct xe_tile *tile)
> > return &tile->mem.vram;
> > }
> >
> > -/**
> > - * xe_svm_alloc_vram()- Allocate device memory pages for range,
> > - * migrating existing data.
> > - * @vm: The VM.
> > - * @tile: tile to allocate vram from
> > - * @range: SVM range
> > - * @ctx: DRM GPU SVM context
> > - *
> > - * Return: 0 on success, error code on failure.
> > - */
> > -int xe_svm_alloc_vram(struct xe_vm *vm, struct xe_tile *tile,
> > - struct xe_svm_range *range,
> > - const struct drm_gpusvm_ctx *ctx)
> > +static int xe_drm_pagemap_populate_mm(struct drm_pagemap
> > *dpagemap,
> > + unsigned long start,
> > unsigned long end,
> > + struct mm_struct *mm,
> > + unsigned long timeslice_ms)
> > {
> > - struct mm_struct *mm = vm->svm.gpusvm.mm;
> > + struct xe_tile *tile = container_of(dpagemap,
> > typeof(*tile), mem.vram.dpagemap);
>
> I think this is going to change here [2], making mem.vram a pointer.
> Maybe add a helper to go from dpagemap -> tile to future-proof this a
> little. I think once [2] lands, we will need to pick the root tile here.
>
> [2] https://patchwork.freedesktop.org/series/149503/
OK. Yes, I think such a helper is part of later patches in the
multi-device series, not yet posted pending madvise.
In any case, we probably need to associate a dpagemap with a vram region
and then figure out what to use for migration.
>
> > + struct xe_device *xe = tile_to_xe(tile);
> > + struct device *dev = xe->drm.dev;
> > struct xe_vram_region *vr = tile_to_vr(tile);
> > struct drm_buddy_block *block;
> > struct list_head *blocks;
> > struct xe_bo *bo;
> > - ktime_t end = 0;
> > - int err;
> > -
> > - if (!range->base.flags.migrate_devmem)
> > - return -EINVAL;
> > + ktime_t time_end = 0;
> > + int err, idx;
> >
> > - range_debug(range, "ALLOCATE VRAM");
> > + if (!drm_dev_enter(&xe->drm, &idx))
> > + return -ENODEV;
> >
> > - if (!mmget_not_zero(mm))
> > - return -EFAULT;
> > - mmap_read_lock(mm);
> > + xe_pm_runtime_get(xe);
>
> A foreign device might not be awake, so is that why you are using
> xe_pm_runtime_get rather than xe_pm_runtime_get_noresume?
Yes exactly.
> We only have
> the MMAP lock here so assuming that is safe with our runtime PM flow.
It should be. We use the same approach in the CPU fault handler.
>
> >
> > -retry:
> > - bo = xe_bo_create_locked(tile_to_xe(tile), NULL, NULL,
> > - xe_svm_range_size(range),
> > - ttm_bo_type_device,
> > + retry:
> > + bo = xe_bo_create_locked(tile_to_xe(tile), NULL, NULL, end
> > - start,
> > + ttm_bo_type_kernel,
> > XE_BO_FLAG_VRAM_IF_DGFX(tile) |
> > XE_BO_FLAG_CPU_ADDR_MIRROR);
> > if (IS_ERR(bo)) {
> > err = PTR_ERR(bo);
> > - if (xe_vm_validate_should_retry(NULL, err, &end))
> > + if (xe_vm_validate_should_retry(NULL, err,
> > &time_end))
> > goto retry;
> > - goto unlock;
> > + goto out_pm_put;
> > }
> >
> > - drm_pagemap_devmem_init(&bo->devmem_allocation,
> > - vm->xe->drm.dev, mm,
> > + drm_pagemap_devmem_init(&bo->devmem_allocation, dev, mm,
> > &dpagemap_devmem_ops,
> > &tile->mem.vram.dpagemap,
> > - xe_svm_range_size(range));
> > + end - start);
> >
> > blocks = &to_xe_ttm_vram_mgr_resource(bo->ttm.resource)-
> > >blocks;
> > list_for_each_entry(block, blocks, link)
> > block->private = vr;
> >
> > xe_bo_get(bo);
> > - err = drm_pagemap_migrate_to_devmem(&bo-
> > >devmem_allocation,
> > - mm,
> > -
> > xe_svm_range_start(range),
> > -
> > xe_svm_range_end(range),
> > - ctx->timeslice_ms,
> > - xe_svm_devm_owner(vm-
> > >xe));
> > +
> > + /* Ensure the device has a pm ref while there are device
> > pages active. */
> > + xe_pm_runtime_get_noresume(xe);
> > + err = drm_pagemap_migrate_to_devmem(&bo-
> > >devmem_allocation, mm,
> > + start, end,
> > timeslice_ms,
> > +
> > xe_svm_devm_owner(xe));
> > if (err)
> > xe_svm_devmem_release(&bo->devmem_allocation);
> >
> > xe_bo_unlock(bo);
> > xe_bo_put(bo);
> >
> > -unlock:
> > - mmap_read_unlock(mm);
> > - mmput(mm);
> > +out_pm_put:
> > + xe_pm_runtime_put(xe);
> > + drm_dev_exit(idx);
> >
> > return err;
> > }
> > @@ -898,7 +891,7 @@ int xe_svm_handle_pagefault(struct xe_vm *vm,
> > struct xe_vma *vma,
> >
> > if (--migrate_try_count >= 0 &&
> > xe_svm_range_needs_migrate_to_vram(range, vma,
> > IS_DGFX(vm->xe))) {
> > - err = xe_svm_alloc_vram(vm, tile, range, &ctx);
> > + err = xe_svm_alloc_vram(tile, range, &ctx);
> > ctx.timeslice_ms <<= 1; /* Double
> > timeslice if we have to retry */
> > if (err) {
> > if (migrate_try_count || !ctx.devmem_only)
> > {
> > @@ -1054,6 +1047,30 @@ int xe_svm_range_get_pages(struct xe_vm *vm,
> > struct xe_svm_range *range,
> >
> > #if IS_ENABLED(CONFIG_DRM_XE_PAGEMAP)
> >
> > +/**
> > + * xe_svm_alloc_vram()- Allocate device memory pages for range,
> > + * migrating existing data.
> > + * @vm: The VM.
> > + * @tile: tile to allocate vram from
> > + * @range: SVM range
> > + * @ctx: DRM GPU SVM context
> > + *
> > + * Return: 0 on success, error code on failure.
> > + */
> > +int xe_svm_alloc_vram(struct xe_tile *tile, struct xe_svm_range
> > *range,
> > + const struct drm_gpusvm_ctx *ctx)
> > +{
> > + struct drm_pagemap *dpagemap;
> > +
> > + range_debug(range, "ALLOCATE VRAM");
> > +
>
> if (!range->base.flags.migrate_devmem)
> return -EINVAL;
>
> Or I guess an assert would work too, as the caller should have checked
> this field.
Sure. Will fix.
/Thomas
>
> Matt
>
> > + dpagemap = xe_tile_local_pagemap(tile);
> > + return drm_pagemap_populate_mm(dpagemap,
> > xe_svm_range_start(range),
> > + xe_svm_range_end(range),
> > + range->base.gpusvm->mm,
> > + ctx->timeslice_ms);
> > +}
> > +
> > static struct drm_pagemap_device_addr
> > xe_drm_pagemap_device_map(struct drm_pagemap *dpagemap,
> > struct device *dev,
> > @@ -1078,6 +1095,7 @@ xe_drm_pagemap_device_map(struct drm_pagemap
> > *dpagemap,
> >
> > static const struct drm_pagemap_ops xe_drm_pagemap_ops = {
> > .device_map = xe_drm_pagemap_device_map,
> > + .populate_mm = xe_drm_pagemap_populate_mm,
> > };
> >
> > /**
> > @@ -1130,7 +1148,7 @@ int xe_devm_add(struct xe_tile *tile, struct
> > xe_vram_region *vr)
> > return 0;
> > }
> > #else
> > -int xe_svm_alloc_vram(struct xe_vm *vm, struct xe_tile *tile,
> > +int xe_svm_alloc_vram(struct xe_tile *tile,
> > struct xe_svm_range *range,
> > const struct drm_gpusvm_ctx *ctx)
> > {
> > diff --git a/drivers/gpu/drm/xe/xe_svm.h
> > b/drivers/gpu/drm/xe/xe_svm.h
> > index 19ce4f2754a7..da9a69ea0bb1 100644
> > --- a/drivers/gpu/drm/xe/xe_svm.h
> > +++ b/drivers/gpu/drm/xe/xe_svm.h
> > @@ -70,8 +70,7 @@ int xe_svm_bo_evict(struct xe_bo *bo);
> >
> > void xe_svm_range_debug(struct xe_svm_range *range, const char
> > *operation);
> >
> > -int xe_svm_alloc_vram(struct xe_vm *vm, struct xe_tile *tile,
> > - struct xe_svm_range *range,
> > +int xe_svm_alloc_vram(struct xe_tile *tile, struct xe_svm_range
> > *range,
> > const struct drm_gpusvm_ctx *ctx);
> >
> > struct xe_svm_range *xe_svm_range_find_or_insert(struct xe_vm *vm,
> > u64 addr,
> > @@ -237,10 +236,9 @@ void xe_svm_range_debug(struct xe_svm_range
> > *range, const char *operation)
> > {
> > }
> >
> > -static inline
> > -int xe_svm_alloc_vram(struct xe_vm *vm, struct xe_tile *tile,
> > - struct xe_svm_range *range,
> > - const struct drm_gpusvm_ctx *ctx)
> > +static inline int
> > +xe_svm_alloc_vram(struct xe_tile *tile, struct xe_svm_range
> > *range,
> > + const struct drm_gpusvm_ctx *ctx)
> > {
> > return -EOPNOTSUPP;
> > }
> > diff --git a/drivers/gpu/drm/xe/xe_tile.h
> > b/drivers/gpu/drm/xe/xe_tile.h
> > index eb939316d55b..066a3d0cea79 100644
> > --- a/drivers/gpu/drm/xe/xe_tile.h
> > +++ b/drivers/gpu/drm/xe/xe_tile.h
> > @@ -16,4 +16,15 @@ int xe_tile_init(struct xe_tile *tile);
> >
> > void xe_tile_migrate_wait(struct xe_tile *tile);
> >
> > +#if IS_ENABLED(CONFIG_DRM_XE_PAGEMAP)
> > +static inline struct drm_pagemap *xe_tile_local_pagemap(struct
> > xe_tile *tile)
> > +{
> > + return &tile->mem.vram.dpagemap;
> > +}
> > +#else
> > +static inline struct drm_pagemap *xe_tile_local_pagemap(struct
> > xe_tile *tile)
> > +{
> > + return NULL;
> > +}
> > +#endif
> > #endif
> > diff --git a/drivers/gpu/drm/xe/xe_vm.c
> > b/drivers/gpu/drm/xe/xe_vm.c
> > index 7140d8856bad..def493acb4d7 100644
> > --- a/drivers/gpu/drm/xe/xe_vm.c
> > +++ b/drivers/gpu/drm/xe/xe_vm.c
> > @@ -2911,7 +2911,7 @@ static int prefetch_ranges(struct xe_vm *vm,
> > struct xe_vma_op *op)
> >
> > if (xe_svm_range_needs_migrate_to_vram(svm_range,
> > vma, region)) {
> > tile = &vm->xe-
> > >tiles[region_to_mem_type[region] - XE_PL_VRAM0];
> > - err = xe_svm_alloc_vram(vm, tile,
> > svm_range, &ctx);
> > + err = xe_svm_alloc_vram(tile, svm_range,
> > &ctx);
> > if (err) {
> > drm_dbg(&vm->xe->drm, "VRAM
> > allocation failed, retry from userspace, asid=%u, gpusvm=%p,
> > errno=%pe\n",
> > vm->usm.asid, &vm-
> > >svm.gpusvm, ERR_PTR(err));
> > --
> > 2.49.0
> >
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2 2/3] drm/pagemap: Add a populate_mm op
2025-06-04 22:05 ` Matthew Brost
@ 2025-06-05 7:40 ` Thomas Hellström
0 siblings, 0 replies; 16+ messages in thread
From: Thomas Hellström @ 2025-06-05 7:40 UTC (permalink / raw)
To: Matthew Brost
Cc: intel-xe, dri-devel, himal.prasad.ghimiray, apopple, airlied,
Simona Vetter, felix.kuehling, Christian König, dakr,
Mrozek, Michal, Joonas Lahtinen
On Wed, 2025-06-04 at 15:05 -0700, Matthew Brost wrote:
> On Wed, Jun 04, 2025 at 11:35:35AM +0200, Thomas Hellström wrote:
> > Add an operation to populate a part of an mm_struct with device
> > private memory.
> >
>
> With the kernel doc fixed:
> Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Thanks for reviewing,
Thomas
>
> > Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > ---
> > drivers/gpu/drm/drm_gpusvm.c | 7 ++-----
> > drivers/gpu/drm/drm_pagemap.c | 34
> > ++++++++++++++++++++++++++++++++++
> > include/drm/drm_pagemap.h | 34
> > ++++++++++++++++++++++++++++++++++
> > 3 files changed, 70 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/drm_gpusvm.c
> > b/drivers/gpu/drm/drm_gpusvm.c
> > index ef81381609de..51afc8a9704d 100644
> > --- a/drivers/gpu/drm/drm_gpusvm.c
> > +++ b/drivers/gpu/drm/drm_gpusvm.c
> > @@ -175,11 +175,8 @@
> > * }
> > *
> > * if (driver_migration_policy(range)) {
> > - * mmap_read_lock(mm);
> > - * devmem = driver_alloc_devmem();
> > - * err =
> > drm_pagemap_migrate_to_devmem(devmem, gpusvm->mm, gpuva_start,
> > - *
> > gpuva_end, driver_pgmap_owner());
> > - * mmap_read_unlock(mm);
> > + * err =
> > drm_pagemap_populate_mm(driver_choose_drm_pagemap(),
> > + * gpuva_start,
> > gpuva_end, gpusvm->mm);
> > * if (err) // CPU mappings may have
> > changed
> > * goto retry;
> > * }
> > diff --git a/drivers/gpu/drm/drm_pagemap.c
> > b/drivers/gpu/drm/drm_pagemap.c
> > index 3551a50d7381..25395685a9b8 100644
> > --- a/drivers/gpu/drm/drm_pagemap.c
> > +++ b/drivers/gpu/drm/drm_pagemap.c
> > @@ -6,6 +6,7 @@
> > #include <linux/dma-mapping.h>
> > #include <linux/migrate.h>
> > #include <linux/pagemap.h>
> > +#include <drm/drm_drv.h>
> > #include <drm/drm_pagemap.h>
> >
> > /**
> > @@ -809,3 +810,36 @@ struct drm_pagemap
> > *drm_pagemap_page_to_dpagemap(struct page *page)
> > return zdd->devmem_allocation->dpagemap;
> > }
> > EXPORT_SYMBOL_GPL(drm_pagemap_page_to_dpagemap);
> > +
> > +/**
> > + * drm_pagemap_populate_mm() - Populate a virtual range with
> > device memory pages
> > + * @dpagemap: Pointer to the drm_pagemap managing the device
> > memory
> > + * @start: Start of the virtual range to populate.
> > + * @end: End of the virtual range to populate.
> > + * @mm: Pointer to the virtual address space.
> > + *
> > + * Attempt to populate a virtual range with device memory pages,
> > + * clearing them or migrating data from the existing pages if
> > necessary.
> > + * The function is best effort only, and implementations may vary
> > + * in how hard they try to satisfy the request.
> > + *
> > + * Return: 0 on success, negative error code on error. If the
> > hardware
> > + * device was removed / unbound the function will return -ENODEV;
> > + */
> > +int drm_pagemap_populate_mm(struct drm_pagemap *dpagemap,
> > + unsigned long start, unsigned long
> > end,
> > + struct mm_struct *mm,
> > + unsigned long timeslice_ms)
> > +{
> > + int err;
> > +
> > + if (!mmget_not_zero(mm))
> > + return -EFAULT;
> > + mmap_read_lock(mm);
> > + err = dpagemap->ops->populate_mm(dpagemap, start, end, mm,
> > + timeslice_ms);
> > + mmap_read_unlock(mm);
> > + mmput(mm);
> > +
> > + return err;
> > +}
> > diff --git a/include/drm/drm_pagemap.h b/include/drm/drm_pagemap.h
> > index dabc9c365df4..e5f20a1235be 100644
> > --- a/include/drm/drm_pagemap.h
> > +++ b/include/drm/drm_pagemap.h
> > @@ -92,6 +92,35 @@ struct drm_pagemap_ops {
> > struct device *dev,
> > struct drm_pagemap_device_addr addr);
> >
> > + /**
> > + * @populate_mm: Populate part of the mm with @dpagemap
> > memory,
> > + * migrating existing data.
> > + * @dpagemap: The struct drm_pagemap managing the memory.
> > + * @start: The virtual start address in @mm
> > + * @end: The virtual end address in @mm
> > + * @mm: Pointer to a live mm. The caller must have an
> > mmget()
> > + * reference.
> > + *
> > + * The caller will have the mm lock at least in read mode.
> > + * Note that there is no guarantee that the memory is
> > resident
> > + * after the function returns, it's best effort only.
> > + * When the mm is not using the memory anymore,
> > + * it will be released. The struct drm_pagemap might have
> > a
> > + * mechanism in place to reclaim the memory and the data
> > will
> > + * then be migrated. Typically to system memory.
> > + * The implementation should hold sufficient runtime
> > power-
> > + * references while pages are used in an address space and
> > + * should ideally guard against hardware device unbind in
> > + * a way such that device pages are migrated back to
> > system
> > + * followed by device page removal. The implementation
> > should
> > + * return -ENODEV after device removal.
> > + *
> > + * Return: 0 if successful. Negative error code on error.
> > + */
> > + int (*populate_mm)(struct drm_pagemap *dpagemap,
> > + unsigned long start, unsigned long end,
> > + struct mm_struct *mm,
> > + unsigned long timeslice_ms);
> > };
> >
> > /**
> > @@ -205,4 +234,9 @@ void drm_pagemap_devmem_init(struct
> > drm_pagemap_devmem *devmem_allocation,
> > const struct drm_pagemap_devmem_ops
> > *ops,
> > struct drm_pagemap *dpagemap, size_t
> > size);
> >
> > +int drm_pagemap_populate_mm(struct drm_pagemap *dpagemap,
> > + unsigned long start, unsigned long
> > end,
> > + struct mm_struct *mm,
> > + unsigned long timeslice_ms);
> > +
> > #endif
> > --
> > 2.49.0
> >
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2 3/3] drm/xe: Implement and use the drm_pagemap populate_mm op
2025-06-04 9:35 ` [PATCH v2 3/3] drm/xe: Implement and use the drm_pagemap " Thomas Hellström
2025-06-04 15:04 ` Matthew Brost
@ 2025-06-05 22:16 ` Matthew Brost
2025-06-13 10:16 ` Thomas Hellström
1 sibling, 1 reply; 16+ messages in thread
From: Matthew Brost @ 2025-06-05 22:16 UTC (permalink / raw)
To: Thomas Hellström
Cc: intel-xe, dri-devel, himal.prasad.ghimiray, apopple, airlied,
Simona Vetter, felix.kuehling, Christian König, dakr,
Mrozek, Michal, Joonas Lahtinen
On Wed, Jun 04, 2025 at 11:35:36AM +0200, Thomas Hellström wrote:
> Add runtime PM since we might call populate_mm on a foreign device.
> Also create the VRAM bos as ttm_bo_type_kernel. This avoids the
> initial clearing and the creation of an mmap handle.
>
I didn't read this part - skipping the initial clears. Discussed this on
a private chat but to recap: we need initial clears, as copies for
non-faulted-in CPU pages are skipped, which could result in another
process's data being exposed in VRAM. We could issue a clear only when a
non-faulted-in page is found in xe_svm_copy, or IIRC there was some work
flying around to clear VRAM upon free - not sure if that ever landed. I
believe AMD does clear on free; their driver + buddy allocator has the
concept of dirty blocks.
Matt
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> ---
> drivers/gpu/drm/drm_pagemap.c | 1 +
> drivers/gpu/drm/xe/xe_svm.c | 104 ++++++++++++++++++++--------------
> drivers/gpu/drm/xe/xe_svm.h | 10 ++--
> drivers/gpu/drm/xe/xe_tile.h | 11 ++++
> drivers/gpu/drm/xe/xe_vm.c | 2 +-
> 5 files changed, 78 insertions(+), 50 deletions(-)
>
> diff --git a/drivers/gpu/drm/drm_pagemap.c b/drivers/gpu/drm/drm_pagemap.c
> index 25395685a9b8..94619be00d2a 100644
> --- a/drivers/gpu/drm/drm_pagemap.c
> +++ b/drivers/gpu/drm/drm_pagemap.c
> @@ -843,3 +843,4 @@ int drm_pagemap_populate_mm(struct drm_pagemap *dpagemap,
>
> return err;
> }
> +EXPORT_SYMBOL(drm_pagemap_populate_mm);
> diff --git a/drivers/gpu/drm/xe/xe_svm.c b/drivers/gpu/drm/xe/xe_svm.c
> index e161ce3e67a1..a10aab3768d8 100644
> --- a/drivers/gpu/drm/xe/xe_svm.c
> +++ b/drivers/gpu/drm/xe/xe_svm.c
> @@ -3,13 +3,17 @@
> * Copyright © 2024 Intel Corporation
> */
>
> +#include <drm/drm_drv.h>
> +
> #include "xe_bo.h"
> #include "xe_gt_stats.h"
> #include "xe_gt_tlb_invalidation.h"
> #include "xe_migrate.h"
> #include "xe_module.h"
> +#include "xe_pm.h"
> #include "xe_pt.h"
> #include "xe_svm.h"
> +#include "xe_tile.h"
> #include "xe_ttm_vram_mgr.h"
> #include "xe_vm.h"
> #include "xe_vm_types.h"
> @@ -525,8 +529,10 @@ static struct xe_bo *to_xe_bo(struct drm_pagemap_devmem *devmem_allocation)
> static void xe_svm_devmem_release(struct drm_pagemap_devmem *devmem_allocation)
> {
> struct xe_bo *bo = to_xe_bo(devmem_allocation);
> + struct xe_device *xe = xe_bo_device(bo);
>
> xe_bo_put_async(bo);
> + xe_pm_runtime_put(xe);
> }
>
> static u64 block_offset_to_pfn(struct xe_vram_region *vr, u64 offset)
> @@ -720,76 +726,63 @@ static struct xe_vram_region *tile_to_vr(struct xe_tile *tile)
> return &tile->mem.vram;
> }
>
> -/**
> - * xe_svm_alloc_vram()- Allocate device memory pages for range,
> - * migrating existing data.
> - * @vm: The VM.
> - * @tile: tile to allocate vram from
> - * @range: SVM range
> - * @ctx: DRM GPU SVM context
> - *
> - * Return: 0 on success, error code on failure.
> - */
> -int xe_svm_alloc_vram(struct xe_vm *vm, struct xe_tile *tile,
> - struct xe_svm_range *range,
> - const struct drm_gpusvm_ctx *ctx)
> +static int xe_drm_pagemap_populate_mm(struct drm_pagemap *dpagemap,
> + unsigned long start, unsigned long end,
> + struct mm_struct *mm,
> + unsigned long timeslice_ms)
> {
> - struct mm_struct *mm = vm->svm.gpusvm.mm;
> + struct xe_tile *tile = container_of(dpagemap, typeof(*tile), mem.vram.dpagemap);
> + struct xe_device *xe = tile_to_xe(tile);
> + struct device *dev = xe->drm.dev;
> struct xe_vram_region *vr = tile_to_vr(tile);
> struct drm_buddy_block *block;
> struct list_head *blocks;
> struct xe_bo *bo;
> - ktime_t end = 0;
> - int err;
> -
> - if (!range->base.flags.migrate_devmem)
> - return -EINVAL;
> + ktime_t time_end = 0;
> + int err, idx;
>
> - range_debug(range, "ALLOCATE VRAM");
> + if (!drm_dev_enter(&xe->drm, &idx))
> + return -ENODEV;
>
> - if (!mmget_not_zero(mm))
> - return -EFAULT;
> - mmap_read_lock(mm);
> + xe_pm_runtime_get(xe);
>
> -retry:
> - bo = xe_bo_create_locked(tile_to_xe(tile), NULL, NULL,
> - xe_svm_range_size(range),
> - ttm_bo_type_device,
> + retry:
> + bo = xe_bo_create_locked(tile_to_xe(tile), NULL, NULL, end - start,
> + ttm_bo_type_kernel,
> XE_BO_FLAG_VRAM_IF_DGFX(tile) |
> XE_BO_FLAG_CPU_ADDR_MIRROR);
> if (IS_ERR(bo)) {
> err = PTR_ERR(bo);
> - if (xe_vm_validate_should_retry(NULL, err, &end))
> + if (xe_vm_validate_should_retry(NULL, err, &time_end))
> goto retry;
> - goto unlock;
> + goto out_pm_put;
> }
>
> - drm_pagemap_devmem_init(&bo->devmem_allocation,
> - vm->xe->drm.dev, mm,
> + drm_pagemap_devmem_init(&bo->devmem_allocation, dev, mm,
> &dpagemap_devmem_ops,
> &tile->mem.vram.dpagemap,
> - xe_svm_range_size(range));
> + end - start);
>
> blocks = &to_xe_ttm_vram_mgr_resource(bo->ttm.resource)->blocks;
> list_for_each_entry(block, blocks, link)
> block->private = vr;
>
> xe_bo_get(bo);
> - err = drm_pagemap_migrate_to_devmem(&bo->devmem_allocation,
> - mm,
> - xe_svm_range_start(range),
> - xe_svm_range_end(range),
> - ctx->timeslice_ms,
> - xe_svm_devm_owner(vm->xe));
> +
> + /* Ensure the device has a pm ref while there are device pages active. */
> + xe_pm_runtime_get_noresume(xe);
> + err = drm_pagemap_migrate_to_devmem(&bo->devmem_allocation, mm,
> + start, end, timeslice_ms,
> + xe_svm_devm_owner(xe));
> if (err)
> xe_svm_devmem_release(&bo->devmem_allocation);
>
> xe_bo_unlock(bo);
> xe_bo_put(bo);
>
> -unlock:
> - mmap_read_unlock(mm);
> - mmput(mm);
> +out_pm_put:
> + xe_pm_runtime_put(xe);
> + drm_dev_exit(idx);
>
> return err;
> }
> @@ -898,7 +891,7 @@ int xe_svm_handle_pagefault(struct xe_vm *vm, struct xe_vma *vma,
>
> if (--migrate_try_count >= 0 &&
> xe_svm_range_needs_migrate_to_vram(range, vma, IS_DGFX(vm->xe))) {
> - err = xe_svm_alloc_vram(vm, tile, range, &ctx);
> + err = xe_svm_alloc_vram(tile, range, &ctx);
> ctx.timeslice_ms <<= 1; /* Double timeslice if we have to retry */
> if (err) {
> if (migrate_try_count || !ctx.devmem_only) {
> @@ -1054,6 +1047,30 @@ int xe_svm_range_get_pages(struct xe_vm *vm, struct xe_svm_range *range,
>
> #if IS_ENABLED(CONFIG_DRM_XE_PAGEMAP)
>
> +/**
> + * xe_svm_alloc_vram()- Allocate device memory pages for range,
> + * migrating existing data.
> + * @vm: The VM.
> + * @tile: tile to allocate vram from
> + * @range: SVM range
> + * @ctx: DRM GPU SVM context
> + *
> + * Return: 0 on success, error code on failure.
> + */
> +int xe_svm_alloc_vram(struct xe_tile *tile, struct xe_svm_range *range,
> + const struct drm_gpusvm_ctx *ctx)
> +{
> + struct drm_pagemap *dpagemap;
> +
> + range_debug(range, "ALLOCATE VRAM");
> +
> + dpagemap = xe_tile_local_pagemap(tile);
> + return drm_pagemap_populate_mm(dpagemap, xe_svm_range_start(range),
> + xe_svm_range_end(range),
> + range->base.gpusvm->mm,
> + ctx->timeslice_ms);
> +}
> +
> static struct drm_pagemap_device_addr
> xe_drm_pagemap_device_map(struct drm_pagemap *dpagemap,
> struct device *dev,
> @@ -1078,6 +1095,7 @@ xe_drm_pagemap_device_map(struct drm_pagemap *dpagemap,
>
> static const struct drm_pagemap_ops xe_drm_pagemap_ops = {
> .device_map = xe_drm_pagemap_device_map,
> + .populate_mm = xe_drm_pagemap_populate_mm,
> };
>
> /**
> @@ -1130,7 +1148,7 @@ int xe_devm_add(struct xe_tile *tile, struct xe_vram_region *vr)
> return 0;
> }
> #else
> -int xe_svm_alloc_vram(struct xe_vm *vm, struct xe_tile *tile,
> +int xe_svm_alloc_vram(struct xe_tile *tile,
> struct xe_svm_range *range,
> const struct drm_gpusvm_ctx *ctx)
> {
> diff --git a/drivers/gpu/drm/xe/xe_svm.h b/drivers/gpu/drm/xe/xe_svm.h
> index 19ce4f2754a7..da9a69ea0bb1 100644
> --- a/drivers/gpu/drm/xe/xe_svm.h
> +++ b/drivers/gpu/drm/xe/xe_svm.h
> @@ -70,8 +70,7 @@ int xe_svm_bo_evict(struct xe_bo *bo);
>
> void xe_svm_range_debug(struct xe_svm_range *range, const char *operation);
>
> -int xe_svm_alloc_vram(struct xe_vm *vm, struct xe_tile *tile,
> - struct xe_svm_range *range,
> +int xe_svm_alloc_vram(struct xe_tile *tile, struct xe_svm_range *range,
> const struct drm_gpusvm_ctx *ctx);
>
> struct xe_svm_range *xe_svm_range_find_or_insert(struct xe_vm *vm, u64 addr,
> @@ -237,10 +236,9 @@ void xe_svm_range_debug(struct xe_svm_range *range, const char *operation)
> {
> }
>
> -static inline
> -int xe_svm_alloc_vram(struct xe_vm *vm, struct xe_tile *tile,
> - struct xe_svm_range *range,
> - const struct drm_gpusvm_ctx *ctx)
> +static inline int
> +xe_svm_alloc_vram(struct xe_tile *tile, struct xe_svm_range *range,
> + const struct drm_gpusvm_ctx *ctx)
> {
> return -EOPNOTSUPP;
> }
> diff --git a/drivers/gpu/drm/xe/xe_tile.h b/drivers/gpu/drm/xe/xe_tile.h
> index eb939316d55b..066a3d0cea79 100644
> --- a/drivers/gpu/drm/xe/xe_tile.h
> +++ b/drivers/gpu/drm/xe/xe_tile.h
> @@ -16,4 +16,15 @@ int xe_tile_init(struct xe_tile *tile);
>
> void xe_tile_migrate_wait(struct xe_tile *tile);
>
> +#if IS_ENABLED(CONFIG_DRM_XE_PAGEMAP)
> +static inline struct drm_pagemap *xe_tile_local_pagemap(struct xe_tile *tile)
> +{
> + return &tile->mem.vram.dpagemap;
> +}
> +#else
> +static inline struct drm_pagemap *xe_tile_local_pagemap(struct xe_tile *tile)
> +{
> + return NULL;
> +}
> +#endif
> #endif
> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> index 7140d8856bad..def493acb4d7 100644
> --- a/drivers/gpu/drm/xe/xe_vm.c
> +++ b/drivers/gpu/drm/xe/xe_vm.c
> @@ -2911,7 +2911,7 @@ static int prefetch_ranges(struct xe_vm *vm, struct xe_vma_op *op)
>
> if (xe_svm_range_needs_migrate_to_vram(svm_range, vma, region)) {
> tile = &vm->xe->tiles[region_to_mem_type[region] - XE_PL_VRAM0];
> - err = xe_svm_alloc_vram(vm, tile, svm_range, &ctx);
> + err = xe_svm_alloc_vram(tile, svm_range, &ctx);
> if (err) {
> drm_dbg(&vm->xe->drm, "VRAM allocation failed, retry from userspace, asid=%u, gpusvm=%p, errno=%pe\n",
> vm->usm.asid, &vm->svm.gpusvm, ERR_PTR(err));
> --
> 2.49.0
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2 1/3] drm/gpusvm, drm/pagemap: Move migration functionality to drm_pagemap
2025-06-04 9:35 ` [PATCH v2 1/3] drm/gpusvm, drm/pagemap: Move migration functionality to drm_pagemap Thomas Hellström
2025-06-04 15:45 ` kernel test robot
@ 2025-06-05 22:44 ` Matthew Brost
2025-06-13 10:01 ` Thomas Hellström
1 sibling, 1 reply; 16+ messages in thread
From: Matthew Brost @ 2025-06-05 22:44 UTC (permalink / raw)
To: Thomas Hellström
Cc: intel-xe, dri-devel, himal.prasad.ghimiray, apopple, airlied,
Simona Vetter, felix.kuehling, Christian König, dakr,
Mrozek, Michal, Joonas Lahtinen
On Wed, Jun 04, 2025 at 11:35:34AM +0200, Thomas Hellström wrote:
> From: Matthew Brost <matthew.brost@intel.com>
>
> The migration functionality and bookkeeping of per-pagemap VRAM
> mapped to the CPU mm is not per GPU vm, but rather per pagemap.
> This is also reflected by the functions not needing the drm_gpusvm
> structures. So move to drm_pagemap.
>
> With this, drm_gpusvm shouldn't really access the page zone-device-data
> since its meaning is internal to drm_pagemap. Currently it's used to
> reject mapping ranges backed by multiple drm_pagemap allocations.
> For now, make the zone-device-data a void pointer.
>
> Rename CONFIG_DRM_XE_DEVMEM_MIRROR to CONFIG_DRM_XE_PAGEMAP.
>
> Matt is listed as author of this commit since he wrote most of the code,
> and it makes sense to retain his git authorship.
> Thomas mostly moved the code around.
>
The kernel test robot has flagged some kernel-doc fixes. A couple of
questions / comments on the new doc below.
> Co-developed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> ---
> Documentation/gpu/rfc/gpusvm.rst | 12 +-
> drivers/gpu/drm/Makefile | 6 +-
> drivers/gpu/drm/drm_gpusvm.c | 759 +------------------------
> drivers/gpu/drm/drm_pagemap.c | 811 +++++++++++++++++++++++++++
> drivers/gpu/drm/xe/Kconfig | 10 +-
> drivers/gpu/drm/xe/xe_bo_types.h | 2 +-
> drivers/gpu/drm/xe/xe_device_types.h | 2 +-
> drivers/gpu/drm/xe/xe_svm.c | 49 +-
> include/drm/drm_gpusvm.h | 96 ----
> include/drm/drm_pagemap.h | 101 ++++
> 10 files changed, 974 insertions(+), 874 deletions(-)
> create mode 100644 drivers/gpu/drm/drm_pagemap.c
>
> diff --git a/Documentation/gpu/rfc/gpusvm.rst b/Documentation/gpu/rfc/gpusvm.rst
> index bcf66a8137a6..469db1372f16 100644
> --- a/Documentation/gpu/rfc/gpusvm.rst
> +++ b/Documentation/gpu/rfc/gpusvm.rst
> @@ -73,15 +73,21 @@ Overview of baseline design
> .. kernel-doc:: drivers/gpu/drm/drm_gpusvm.c
> :doc: Locking
>
> -.. kernel-doc:: drivers/gpu/drm/drm_gpusvm.c
> - :doc: Migration
> -
> .. kernel-doc:: drivers/gpu/drm/drm_gpusvm.c
> :doc: Partial Unmapping of Ranges
>
> .. kernel-doc:: drivers/gpu/drm/drm_gpusvm.c
> :doc: Examples
>
> +Overview of drm_pagemap design
> +==============================
> +
> +.. kernel-doc:: drivers/gpu/drm/drm_pagemap.c
> + :doc: Overview
> +
> +.. kernel-doc:: drivers/gpu/drm/drm_pagemap.c
> + :doc: Migration
> +
> Possible future design features
> ===============================
>
> diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
> index 4199715670b1..f9cde5717f85 100644
> --- a/drivers/gpu/drm/Makefile
> +++ b/drivers/gpu/drm/Makefile
> @@ -104,7 +104,11 @@ obj-$(CONFIG_DRM_PANEL_BACKLIGHT_QUIRKS) += drm_panel_backlight_quirks.o
> #
> obj-$(CONFIG_DRM_EXEC) += drm_exec.o
> obj-$(CONFIG_DRM_GPUVM) += drm_gpuvm.o
> -obj-$(CONFIG_DRM_GPUSVM) += drm_gpusvm.o
> +
> +drm_gpusvm_helper-y := \
> + drm_gpusvm.o\
> + drm_pagemap.o
> +obj-$(CONFIG_DRM_GPUSVM) += drm_gpusvm_helper.o
>
> obj-$(CONFIG_DRM_BUDDY) += drm_buddy.o
>
> diff --git a/drivers/gpu/drm/drm_gpusvm.c b/drivers/gpu/drm/drm_gpusvm.c
> index 7ff81aa0a1ca..ef81381609de 100644
> --- a/drivers/gpu/drm/drm_gpusvm.c
> +++ b/drivers/gpu/drm/drm_gpusvm.c
> @@ -8,10 +8,9 @@
>
> #include <linux/dma-mapping.h>
> #include <linux/hmm.h>
> +#include <linux/hugetlb_inline.h>
> #include <linux/memremap.h>
> -#include <linux/migrate.h>
> #include <linux/mm_types.h>
> -#include <linux/pagemap.h>
> #include <linux/slab.h>
>
> #include <drm/drm_device.h>
> @@ -107,21 +106,6 @@
> * to add annotations to GPU SVM.
> */
>
> -/**
> - * DOC: Migration
> - *
> - * The migration support is quite simple, allowing migration between RAM and
> - * device memory at the range granularity. For example, GPU SVM currently does
> - * not support mixing RAM and device memory pages within a range. This means
> - * that upon GPU fault, the entire range can be migrated to device memory, and
> - * upon CPU fault, the entire range is migrated to RAM. Mixed RAM and device
> - * memory storage within a range could be added in the future if required.
> - *
> - * The reasoning for only supporting range granularity is as follows: it
> - * simplifies the implementation, and range sizes are driver-defined and should
> - * be relatively small.
> - */
> -
> /**
> * DOC: Partial Unmapping of Ranges
> *
> @@ -193,10 +177,9 @@
> * if (driver_migration_policy(range)) {
> * mmap_read_lock(mm);
> * devmem = driver_alloc_devmem();
> - * err = drm_gpusvm_migrate_to_devmem(gpusvm, range,
> - * devmem_allocation,
> - * &ctx);
> - * mmap_read_unlock(mm);
> + * err = drm_pagemap_migrate_to_devmem(devmem, gpusvm->mm, gpuva_start,
> + * gpuva_end, driver_pgmap_owner());
> + * mmap_read_unlock(mm);
> * if (err) // CPU mappings may have changed
> * goto retry;
> * }
> @@ -288,97 +271,6 @@ npages_in_range(unsigned long start, unsigned long end)
> return (end - start) >> PAGE_SHIFT;
> }
>
> -/**
> - * struct drm_gpusvm_zdd - GPU SVM zone device data
> - *
> - * @refcount: Reference count for the zdd
> - * @devmem_allocation: device memory allocation
> - * @device_private_page_owner: Device private pages owner
> - *
> - * This structure serves as a generic wrapper installed in
> - * page->zone_device_data. It provides infrastructure for looking up a device
> - * memory allocation upon CPU page fault and asynchronously releasing device
> - * memory once the CPU has no page references. Asynchronous release is useful
> - * because CPU page references can be dropped in IRQ contexts, while releasing
> - * device memory likely requires sleeping locks.
> - */
> -struct drm_gpusvm_zdd {
> - struct kref refcount;
> - struct drm_gpusvm_devmem *devmem_allocation;
> - void *device_private_page_owner;
> -};
> -
> -/**
> - * drm_gpusvm_zdd_alloc() - Allocate a zdd structure.
> - * @device_private_page_owner: Device private pages owner
> - *
> - * This function allocates and initializes a new zdd structure. It sets up the
> - * reference count and initializes the destroy work.
> - *
> - * Return: Pointer to the allocated zdd on success, ERR_PTR() on failure.
> - */
> -static struct drm_gpusvm_zdd *
> -drm_gpusvm_zdd_alloc(void *device_private_page_owner)
> -{
> - struct drm_gpusvm_zdd *zdd;
> -
> - zdd = kmalloc(sizeof(*zdd), GFP_KERNEL);
> - if (!zdd)
> - return NULL;
> -
> - kref_init(&zdd->refcount);
> - zdd->devmem_allocation = NULL;
> - zdd->device_private_page_owner = device_private_page_owner;
> -
> - return zdd;
> -}
> -
> -/**
> - * drm_gpusvm_zdd_get() - Get a reference to a zdd structure.
> - * @zdd: Pointer to the zdd structure.
> - *
> - * This function increments the reference count of the provided zdd structure.
> - *
> - * Return: Pointer to the zdd structure.
> - */
> -static struct drm_gpusvm_zdd *drm_gpusvm_zdd_get(struct drm_gpusvm_zdd *zdd)
> -{
> - kref_get(&zdd->refcount);
> - return zdd;
> -}
> -
> -/**
> - * drm_gpusvm_zdd_destroy() - Destroy a zdd structure.
> - * @ref: Pointer to the reference count structure.
> - *
> - * This function queues the destroy_work of the zdd for asynchronous destruction.
> - */
> -static void drm_gpusvm_zdd_destroy(struct kref *ref)
> -{
> - struct drm_gpusvm_zdd *zdd =
> - container_of(ref, struct drm_gpusvm_zdd, refcount);
> - struct drm_gpusvm_devmem *devmem = zdd->devmem_allocation;
> -
> - if (devmem) {
> - complete_all(&devmem->detached);
> - if (devmem->ops->devmem_release)
> - devmem->ops->devmem_release(devmem);
> - }
> - kfree(zdd);
> -}
> -
> -/**
> - * drm_gpusvm_zdd_put() - Put a zdd reference.
> - * @zdd: Pointer to the zdd structure.
> - *
> - * This function decrements the reference count of the provided zdd structure
> - * and schedules its destruction if the count drops to zero.
> - */
> -static void drm_gpusvm_zdd_put(struct drm_gpusvm_zdd *zdd)
> -{
> - kref_put(&zdd->refcount, drm_gpusvm_zdd_destroy);
> -}
> -
> /**
> * drm_gpusvm_range_find() - Find GPU SVM range from GPU SVM notifier
> * @notifier: Pointer to the GPU SVM notifier structure.
> @@ -945,7 +837,7 @@ drm_gpusvm_range_chunk_size(struct drm_gpusvm *gpusvm,
> * process-many-malloc' fails. In the failure case, each process
> * mallocs 16k but the CPU VMA is ~128k which results in 64k SVM
> * ranges. When migrating the SVM ranges, some processes fail in
> - * drm_gpusvm_migrate_to_devmem with 'migrate.cpages != npages'
> + * drm_pagemap_migrate_to_devmem with 'migrate.cpages != npages'
> * and then upon drm_gpusvm_range_get_pages device pages from
> * other processes are collected + faulted in which creates all
> * sorts of problems. Unsure exactly how this happening, also
> @@ -1363,7 +1255,7 @@ int drm_gpusvm_range_get_pages(struct drm_gpusvm *gpusvm,
> .dev_private_owner = gpusvm->device_private_page_owner,
> };
> struct mm_struct *mm = gpusvm->mm;
> - struct drm_gpusvm_zdd *zdd;
> + void *zdd;
> unsigned long timeout =
> jiffies + msecs_to_jiffies(HMM_RANGE_DEFAULT_TIMEOUT);
> unsigned long i, j;
> @@ -1465,7 +1357,7 @@ int drm_gpusvm_range_get_pages(struct drm_gpusvm *gpusvm,
> }
>
> pagemap = page_pgmap(page);
> - dpagemap = zdd->devmem_allocation->dpagemap;
> + dpagemap = drm_pagemap_page_to_dpagemap(page);
> if (drm_WARN_ON(gpusvm->drm, !dpagemap)) {
> /*
> * Raced. This is not supposed to happen
> @@ -1489,7 +1381,7 @@ int drm_gpusvm_range_get_pages(struct drm_gpusvm *gpusvm,
> } else {
> dma_addr_t addr;
>
> - if (is_zone_device_page(page) || zdd) {
> + if (is_zone_device_page(page) || pagemap) {
> err = -EOPNOTSUPP;
> goto err_unmap;
> }
> @@ -1517,7 +1409,7 @@ int drm_gpusvm_range_get_pages(struct drm_gpusvm *gpusvm,
> flags.has_dma_mapping = true;
> }
>
> - if (zdd) {
> + if (pagemap) {
> flags.has_devmem_pages = true;
> range->dpagemap = dpagemap;
> }
> @@ -1545,6 +1437,7 @@ EXPORT_SYMBOL_GPL(drm_gpusvm_range_get_pages);
>
> /**
> * drm_gpusvm_range_unmap_pages() - Unmap pages associated with a GPU SVM range
> + * drm_gpusvm_range_evict() - Evict GPU SVM range
> * @gpusvm: Pointer to the GPU SVM structure
> * @range: Pointer to the GPU SVM range structure
> * @ctx: GPU SVM context
> @@ -1575,562 +1468,11 @@ void drm_gpusvm_range_unmap_pages(struct drm_gpusvm *gpusvm,
> EXPORT_SYMBOL_GPL(drm_gpusvm_range_unmap_pages);
>
> /**
> - * drm_gpusvm_migration_unlock_put_page() - Put a migration page
> - * @page: Pointer to the page to put
> - *
> - * This function unlocks and puts a page.
> - */
> -static void drm_gpusvm_migration_unlock_put_page(struct page *page)
> -{
> - unlock_page(page);
> - put_page(page);
> -}
> -
> -/**
> - * drm_gpusvm_migration_unlock_put_pages() - Put migration pages
> - * @npages: Number of pages
> - * @migrate_pfn: Array of migrate page frame numbers
> - *
> - * This function unlocks and puts an array of pages.
> - */
> -static void drm_gpusvm_migration_unlock_put_pages(unsigned long npages,
> - unsigned long *migrate_pfn)
> -{
> - unsigned long i;
> -
> - for (i = 0; i < npages; ++i) {
> - struct page *page;
> -
> - if (!migrate_pfn[i])
> - continue;
> -
> - page = migrate_pfn_to_page(migrate_pfn[i]);
> - drm_gpusvm_migration_unlock_put_page(page);
> - migrate_pfn[i] = 0;
> - }
> -}
> -
> -/**
> - * drm_gpusvm_get_devmem_page() - Get a reference to a device memory page
> - * @page: Pointer to the page
> - * @zdd: Pointer to the GPU SVM zone device data
> - *
> - * This function associates the given page with the specified GPU SVM zone
> - * device data and initializes it for zone device usage.
> - */
> -static void drm_gpusvm_get_devmem_page(struct page *page,
> - struct drm_gpusvm_zdd *zdd)
> -{
> - page->zone_device_data = drm_gpusvm_zdd_get(zdd);
> - zone_device_page_init(page);
> -}
> -
> -/**
> - * drm_gpusvm_migrate_map_pages() - Map migration pages for GPU SVM migration
> - * @dev: The device for which the pages are being mapped
> - * @dma_addr: Array to store DMA addresses corresponding to mapped pages
> - * @migrate_pfn: Array of migrate page frame numbers to map
> - * @npages: Number of pages to map
> - * @dir: Direction of data transfer (e.g., DMA_BIDIRECTIONAL)
> - *
> - * This function maps pages of memory for migration usage in GPU SVM. It
> - * iterates over each page frame number provided in @migrate_pfn, maps the
> - * corresponding page, and stores the DMA address in the provided @dma_addr
> - * array.
> - *
> - * Return: 0 on success, -EFAULT if an error occurs during mapping.
> - */
> -static int drm_gpusvm_migrate_map_pages(struct device *dev,
> - dma_addr_t *dma_addr,
> - unsigned long *migrate_pfn,
> - unsigned long npages,
> - enum dma_data_direction dir)
> -{
> - unsigned long i;
> -
> - for (i = 0; i < npages; ++i) {
> - struct page *page = migrate_pfn_to_page(migrate_pfn[i]);
> -
> - if (!page)
> - continue;
> -
> - if (WARN_ON_ONCE(is_zone_device_page(page)))
> - return -EFAULT;
> -
> - dma_addr[i] = dma_map_page(dev, page, 0, PAGE_SIZE, dir);
> - if (dma_mapping_error(dev, dma_addr[i]))
> - return -EFAULT;
> - }
> -
> - return 0;
> -}
> -
> -/**
> - * drm_gpusvm_migrate_unmap_pages() - Unmap pages previously mapped for GPU SVM migration
> - * @dev: The device for which the pages were mapped
> - * @dma_addr: Array of DMA addresses corresponding to mapped pages
> - * @npages: Number of pages to unmap
> - * @dir: Direction of data transfer (e.g., DMA_BIDIRECTIONAL)
> - *
> - * This function unmaps previously mapped pages of memory for GPU Shared Virtual
> - * Memory (SVM). It iterates over each DMA address provided in @dma_addr, checks
> - * if it's valid and not already unmapped, and unmaps the corresponding page.
> - */
> -static void drm_gpusvm_migrate_unmap_pages(struct device *dev,
> - dma_addr_t *dma_addr,
> - unsigned long npages,
> - enum dma_data_direction dir)
> -{
> - unsigned long i;
> -
> - for (i = 0; i < npages; ++i) {
> - if (!dma_addr[i] || dma_mapping_error(dev, dma_addr[i]))
> - continue;
> -
> - dma_unmap_page(dev, dma_addr[i], PAGE_SIZE, dir);
> - }
> -}
> -
> -/**
> - * drm_gpusvm_migrate_to_devmem() - Migrate GPU SVM range to device memory
> + * drm_gpusvm_range_evict() - Evict GPU SVM range
> * @gpusvm: Pointer to the GPU SVM structure
> - * @range: Pointer to the GPU SVM range structure
> - * @devmem_allocation: Pointer to the device memory allocation. The caller
> - * should hold a reference to the device memory allocation,
> - * which should be dropped via ops->devmem_release or upon
> - * the failure of this function.
> - * @ctx: GPU SVM context
> - *
> - * This function migrates the specified GPU SVM range to device memory. It
> - * performs the necessary setup and invokes the driver-specific operations for
> - * migration to device memory. Upon successful return, @devmem_allocation can
> - * safely reference @range until ops->devmem_release is called which only upon
> - * successful return. Expected to be called while holding the mmap lock in read
> - * mode.
> - *
> - * Return: 0 on success, negative error code on failure.
> - */
> -int drm_gpusvm_migrate_to_devmem(struct drm_gpusvm *gpusvm,
> - struct drm_gpusvm_range *range,
> - struct drm_gpusvm_devmem *devmem_allocation,
> - const struct drm_gpusvm_ctx *ctx)
> -{
> - const struct drm_gpusvm_devmem_ops *ops = devmem_allocation->ops;
> - unsigned long start = drm_gpusvm_range_start(range),
> - end = drm_gpusvm_range_end(range);
> - struct migrate_vma migrate = {
> - .start = start,
> - .end = end,
> - .pgmap_owner = gpusvm->device_private_page_owner,
> - .flags = MIGRATE_VMA_SELECT_SYSTEM,
> - };
> - struct mm_struct *mm = gpusvm->mm;
> - unsigned long i, npages = npages_in_range(start, end);
> - struct vm_area_struct *vas;
> - struct drm_gpusvm_zdd *zdd = NULL;
> - struct page **pages;
> - dma_addr_t *dma_addr;
> - void *buf;
> - int err;
> -
> - mmap_assert_locked(gpusvm->mm);
> -
> - if (!range->flags.migrate_devmem)
> - return -EINVAL;
> -
> - if (!ops->populate_devmem_pfn || !ops->copy_to_devmem ||
> - !ops->copy_to_ram)
> - return -EOPNOTSUPP;
> -
> - vas = vma_lookup(mm, start);
> - if (!vas) {
> - err = -ENOENT;
> - goto err_out;
> - }
> -
> - if (end > vas->vm_end || start < vas->vm_start) {
> - err = -EINVAL;
> - goto err_out;
> - }
> -
> - if (!vma_is_anonymous(vas)) {
> - err = -EBUSY;
> - goto err_out;
> - }
> -
> - buf = kvcalloc(npages, 2 * sizeof(*migrate.src) + sizeof(*dma_addr) +
> - sizeof(*pages), GFP_KERNEL);
> - if (!buf) {
> - err = -ENOMEM;
> - goto err_out;
> - }
> - dma_addr = buf + (2 * sizeof(*migrate.src) * npages);
> - pages = buf + (2 * sizeof(*migrate.src) + sizeof(*dma_addr)) * npages;
> -
> - zdd = drm_gpusvm_zdd_alloc(gpusvm->device_private_page_owner);
> - if (!zdd) {
> - err = -ENOMEM;
> - goto err_free;
> - }
> -
> - migrate.vma = vas;
> - migrate.src = buf;
> - migrate.dst = migrate.src + npages;
> -
> - err = migrate_vma_setup(&migrate);
> - if (err)
> - goto err_free;
> -
> - if (!migrate.cpages) {
> - err = -EFAULT;
> - goto err_free;
> - }
> -
> - if (migrate.cpages != npages) {
> - err = -EBUSY;
> - goto err_finalize;
> - }
> -
> - err = ops->populate_devmem_pfn(devmem_allocation, npages, migrate.dst);
> - if (err)
> - goto err_finalize;
> -
> - err = drm_gpusvm_migrate_map_pages(devmem_allocation->dev, dma_addr,
> - migrate.src, npages, DMA_TO_DEVICE);
> - if (err)
> - goto err_finalize;
> -
> - for (i = 0; i < npages; ++i) {
> - struct page *page = pfn_to_page(migrate.dst[i]);
> -
> - pages[i] = page;
> - migrate.dst[i] = migrate_pfn(migrate.dst[i]);
> - drm_gpusvm_get_devmem_page(page, zdd);
> - }
> -
> - err = ops->copy_to_devmem(pages, dma_addr, npages);
> - if (err)
> - goto err_finalize;
> -
> - /* Upon success bind devmem allocation to range and zdd */
> - devmem_allocation->timeslice_expiration = get_jiffies_64() +
> - msecs_to_jiffies(ctx->timeslice_ms);
> - zdd->devmem_allocation = devmem_allocation; /* Owns ref */
> -
> -err_finalize:
> - if (err)
> - drm_gpusvm_migration_unlock_put_pages(npages, migrate.dst);
> - migrate_vma_pages(&migrate);
> - migrate_vma_finalize(&migrate);
> - drm_gpusvm_migrate_unmap_pages(devmem_allocation->dev, dma_addr, npages,
> - DMA_TO_DEVICE);
> -err_free:
> - if (zdd)
> - drm_gpusvm_zdd_put(zdd);
> - kvfree(buf);
> -err_out:
> - return err;
> -}
> -EXPORT_SYMBOL_GPL(drm_gpusvm_migrate_to_devmem);
> -
> -/**
> - * drm_gpusvm_migrate_populate_ram_pfn() - Populate RAM PFNs for a VM area
> - * @vas: Pointer to the VM area structure, can be NULL
> - * @fault_page: Fault page
> - * @npages: Number of pages to populate
> - * @mpages: Number of pages to migrate
> - * @src_mpfn: Source array of migrate PFNs
> - * @mpfn: Array of migrate PFNs to populate
> - * @addr: Start address for PFN allocation
> - *
> - * This function populates the RAM migrate page frame numbers (PFNs) for the
> - * specified VM area structure. It allocates and locks pages in the VM area for
> - * RAM usage. If vas is non-NULL use alloc_page_vma for allocation, if NULL use
> - * alloc_page for allocation.
> - *
> - * Return: 0 on success, negative error code on failure.
> - */
> -static int drm_gpusvm_migrate_populate_ram_pfn(struct vm_area_struct *vas,
> - struct page *fault_page,
> - unsigned long npages,
> - unsigned long *mpages,
> - unsigned long *src_mpfn,
> - unsigned long *mpfn,
> - unsigned long addr)
> -{
> - unsigned long i;
> -
> - for (i = 0; i < npages; ++i, addr += PAGE_SIZE) {
> - struct page *page, *src_page;
> -
> - if (!(src_mpfn[i] & MIGRATE_PFN_MIGRATE))
> - continue;
> -
> - src_page = migrate_pfn_to_page(src_mpfn[i]);
> - if (!src_page)
> - continue;
> -
> - if (fault_page) {
> - if (src_page->zone_device_data !=
> - fault_page->zone_device_data)
> - continue;
> - }
> -
> - if (vas)
> - page = alloc_page_vma(GFP_HIGHUSER, vas, addr);
> - else
> - page = alloc_page(GFP_HIGHUSER);
> -
> - if (!page)
> - goto free_pages;
> -
> - mpfn[i] = migrate_pfn(page_to_pfn(page));
> - }
> -
> - for (i = 0; i < npages; ++i) {
> - struct page *page = migrate_pfn_to_page(mpfn[i]);
> -
> - if (!page)
> - continue;
> -
> - WARN_ON_ONCE(!trylock_page(page));
> - ++*mpages;
> - }
> -
> - return 0;
> -
> -free_pages:
> - for (i = 0; i < npages; ++i) {
> - struct page *page = migrate_pfn_to_page(mpfn[i]);
> -
> - if (!page)
> - continue;
> -
> - put_page(page);
> - mpfn[i] = 0;
> - }
> - return -ENOMEM;
> -}
> -
> -/**
> - * drm_gpusvm_evict_to_ram() - Evict GPU SVM range to RAM
> - * @devmem_allocation: Pointer to the device memory allocation
> - *
> - * Similar to __drm_gpusvm_migrate_to_ram but does not require mmap lock and
> - * migration done via migrate_device_* functions.
> - *
> - * Return: 0 on success, negative error code on failure.
> - */
> -int drm_gpusvm_evict_to_ram(struct drm_gpusvm_devmem *devmem_allocation)
> -{
> - const struct drm_gpusvm_devmem_ops *ops = devmem_allocation->ops;
> - unsigned long npages, mpages = 0;
> - struct page **pages;
> - unsigned long *src, *dst;
> - dma_addr_t *dma_addr;
> - void *buf;
> - int i, err = 0;
> - unsigned int retry_count = 2;
> -
> - npages = devmem_allocation->size >> PAGE_SHIFT;
> -
> -retry:
> - if (!mmget_not_zero(devmem_allocation->mm))
> - return -EFAULT;
> -
> - buf = kvcalloc(npages, 2 * sizeof(*src) + sizeof(*dma_addr) +
> - sizeof(*pages), GFP_KERNEL);
> - if (!buf) {
> - err = -ENOMEM;
> - goto err_out;
> - }
> - src = buf;
> - dst = buf + (sizeof(*src) * npages);
> - dma_addr = buf + (2 * sizeof(*src) * npages);
> - pages = buf + (2 * sizeof(*src) + sizeof(*dma_addr)) * npages;
> -
> - err = ops->populate_devmem_pfn(devmem_allocation, npages, src);
> - if (err)
> - goto err_free;
> -
> - err = migrate_device_pfns(src, npages);
> - if (err)
> - goto err_free;
> -
> - err = drm_gpusvm_migrate_populate_ram_pfn(NULL, NULL, npages, &mpages,
> - src, dst, 0);
> - if (err || !mpages)
> - goto err_finalize;
> -
> - err = drm_gpusvm_migrate_map_pages(devmem_allocation->dev, dma_addr,
> - dst, npages, DMA_FROM_DEVICE);
> - if (err)
> - goto err_finalize;
> -
> - for (i = 0; i < npages; ++i)
> - pages[i] = migrate_pfn_to_page(src[i]);
> -
> - err = ops->copy_to_ram(pages, dma_addr, npages);
> - if (err)
> - goto err_finalize;
> -
> -err_finalize:
> - if (err)
> - drm_gpusvm_migration_unlock_put_pages(npages, dst);
> - migrate_device_pages(src, dst, npages);
> - migrate_device_finalize(src, dst, npages);
> - drm_gpusvm_migrate_unmap_pages(devmem_allocation->dev, dma_addr, npages,
> - DMA_FROM_DEVICE);
> -err_free:
> - kvfree(buf);
> -err_out:
> - mmput_async(devmem_allocation->mm);
> -
> - if (completion_done(&devmem_allocation->detached))
> - return 0;
> -
> - if (retry_count--) {
> - cond_resched();
> - goto retry;
> - }
> -
> - return err ?: -EBUSY;
> -}
> -EXPORT_SYMBOL_GPL(drm_gpusvm_evict_to_ram);
> -
> -/**
> - * __drm_gpusvm_migrate_to_ram() - Migrate GPU SVM range to RAM (internal)
> - * @vas: Pointer to the VM area structure
> - * @device_private_page_owner: Device private pages owner
> - * @page: Pointer to the page for fault handling (can be NULL)
> - * @fault_addr: Fault address
> - * @size: Size of migration
> - *
> - * This internal function performs the migration of the specified GPU SVM range
> - * to RAM. It sets up the migration, populates + dma maps RAM PFNs, and
> - * invokes the driver-specific operations for migration to RAM.
> - *
> - * Return: 0 on success, negative error code on failure.
> - */
> -static int __drm_gpusvm_migrate_to_ram(struct vm_area_struct *vas,
> - void *device_private_page_owner,
> - struct page *page,
> - unsigned long fault_addr,
> - unsigned long size)
> -{
> - struct migrate_vma migrate = {
> - .vma = vas,
> - .pgmap_owner = device_private_page_owner,
> - .flags = MIGRATE_VMA_SELECT_DEVICE_PRIVATE |
> - MIGRATE_VMA_SELECT_DEVICE_COHERENT,
> - .fault_page = page,
> - };
> - struct drm_gpusvm_zdd *zdd;
> - const struct drm_gpusvm_devmem_ops *ops;
> - struct device *dev = NULL;
> - unsigned long npages, mpages = 0;
> - struct page **pages;
> - dma_addr_t *dma_addr;
> - unsigned long start, end;
> - void *buf;
> - int i, err = 0;
> -
> - if (page) {
> - zdd = page->zone_device_data;
> - if (time_before64(get_jiffies_64(),
> - zdd->devmem_allocation->timeslice_expiration))
> - return 0;
> - }
> -
> - start = ALIGN_DOWN(fault_addr, size);
> - end = ALIGN(fault_addr + 1, size);
> -
> - /* Corner where VMA area struct has been partially unmapped */
> - if (start < vas->vm_start)
> - start = vas->vm_start;
> - if (end > vas->vm_end)
> - end = vas->vm_end;
> -
> - migrate.start = start;
> - migrate.end = end;
> - npages = npages_in_range(start, end);
> -
> - buf = kvcalloc(npages, 2 * sizeof(*migrate.src) + sizeof(*dma_addr) +
> - sizeof(*pages), GFP_KERNEL);
> - if (!buf) {
> - err = -ENOMEM;
> - goto err_out;
> - }
> - dma_addr = buf + (2 * sizeof(*migrate.src) * npages);
> - pages = buf + (2 * sizeof(*migrate.src) + sizeof(*dma_addr)) * npages;
> -
> - migrate.vma = vas;
> - migrate.src = buf;
> - migrate.dst = migrate.src + npages;
> -
> - err = migrate_vma_setup(&migrate);
> - if (err)
> - goto err_free;
> -
> - /* Raced with another CPU fault, nothing to do */
> - if (!migrate.cpages)
> - goto err_free;
> -
> - if (!page) {
> - for (i = 0; i < npages; ++i) {
> - if (!(migrate.src[i] & MIGRATE_PFN_MIGRATE))
> - continue;
> -
> - page = migrate_pfn_to_page(migrate.src[i]);
> - break;
> - }
> -
> - if (!page)
> - goto err_finalize;
> - }
> - zdd = page->zone_device_data;
> - ops = zdd->devmem_allocation->ops;
> - dev = zdd->devmem_allocation->dev;
> -
> - err = drm_gpusvm_migrate_populate_ram_pfn(vas, page, npages, &mpages,
> - migrate.src, migrate.dst,
> - start);
> - if (err)
> - goto err_finalize;
> -
> - err = drm_gpusvm_migrate_map_pages(dev, dma_addr, migrate.dst, npages,
> - DMA_FROM_DEVICE);
> - if (err)
> - goto err_finalize;
> -
> - for (i = 0; i < npages; ++i)
> - pages[i] = migrate_pfn_to_page(migrate.src[i]);
> -
> - err = ops->copy_to_ram(pages, dma_addr, npages);
> - if (err)
> - goto err_finalize;
> -
> -err_finalize:
> - if (err)
> - drm_gpusvm_migration_unlock_put_pages(npages, migrate.dst);
> - migrate_vma_pages(&migrate);
> - migrate_vma_finalize(&migrate);
> - if (dev)
> - drm_gpusvm_migrate_unmap_pages(dev, dma_addr, npages,
> - DMA_FROM_DEVICE);
> -err_free:
> - kvfree(buf);
> -err_out:
> -
> - return err;
> -}
> -
> -/**
> - * drm_gpusvm_range_evict - Evict GPU SVM range
> * @range: Pointer to the GPU SVM range to be removed
> *
> - * This function evicts the specified GPU SVM range. This function will not
> - * evict coherent pages.
> + * This function evicts the specified GPU SVM range.
> *
> * Return: 0 on success, a negative error code on failure.
> */
> @@ -2182,60 +1524,6 @@ int drm_gpusvm_range_evict(struct drm_gpusvm *gpusvm,
> }
> EXPORT_SYMBOL_GPL(drm_gpusvm_range_evict);
>
> -/**
> - * drm_gpusvm_page_free() - Put GPU SVM zone device data associated with a page
> - * @page: Pointer to the page
> - *
> - * This function is a callback used to put the GPU SVM zone device data
> - * associated with a page when it is being released.
> - */
> -static void drm_gpusvm_page_free(struct page *page)
> -{
> - drm_gpusvm_zdd_put(page->zone_device_data);
> -}
> -
> -/**
> - * drm_gpusvm_migrate_to_ram() - Migrate GPU SVM range to RAM (page fault handler)
> - * @vmf: Pointer to the fault information structure
> - *
> - * This function is a page fault handler used to migrate a GPU SVM range to RAM.
> - * It retrieves the GPU SVM range information from the faulting page and invokes
> - * the internal migration function to migrate the range back to RAM.
> - *
> - * Return: VM_FAULT_SIGBUS on failure, 0 on success.
> - */
> -static vm_fault_t drm_gpusvm_migrate_to_ram(struct vm_fault *vmf)
> -{
> - struct drm_gpusvm_zdd *zdd = vmf->page->zone_device_data;
> - int err;
> -
> - err = __drm_gpusvm_migrate_to_ram(vmf->vma,
> - zdd->device_private_page_owner,
> - vmf->page, vmf->address,
> - zdd->devmem_allocation->size);
> -
> - return err ? VM_FAULT_SIGBUS : 0;
> -}
> -
> -/*
> - * drm_gpusvm_pagemap_ops - Device page map operations for GPU SVM
> - */
> -static const struct dev_pagemap_ops drm_gpusvm_pagemap_ops = {
> - .page_free = drm_gpusvm_page_free,
> - .migrate_to_ram = drm_gpusvm_migrate_to_ram,
> -};
> -
> -/**
> - * drm_gpusvm_pagemap_ops_get() - Retrieve GPU SVM device page map operations
> - *
> - * Return: Pointer to the GPU SVM device page map operations structure.
> - */
> -const struct dev_pagemap_ops *drm_gpusvm_pagemap_ops_get(void)
> -{
> - return &drm_gpusvm_pagemap_ops;
> -}
> -EXPORT_SYMBOL_GPL(drm_gpusvm_pagemap_ops_get);
> -
> /**
> * drm_gpusvm_has_mapping() - Check if GPU SVM has mapping for the given address range
> * @gpusvm: Pointer to the GPU SVM structure.
> @@ -2280,28 +1568,5 @@ void drm_gpusvm_range_set_unmapped(struct drm_gpusvm_range *range,
> }
> EXPORT_SYMBOL_GPL(drm_gpusvm_range_set_unmapped);
>
> -/**
> - * drm_gpusvm_devmem_init() - Initialize a GPU SVM device memory allocation
> - *
> - * @dev: Pointer to the device structure which device memory allocation belongs to
> - * @mm: Pointer to the mm_struct for the address space
> - * @ops: Pointer to the operations structure for GPU SVM device memory
> - * @dpagemap: The struct drm_pagemap we're allocating from.
> - * @size: Size of device memory allocation
> - */
> -void drm_gpusvm_devmem_init(struct drm_gpusvm_devmem *devmem_allocation,
> - struct device *dev, struct mm_struct *mm,
> - const struct drm_gpusvm_devmem_ops *ops,
> - struct drm_pagemap *dpagemap, size_t size)
> -{
> - init_completion(&devmem_allocation->detached);
> - devmem_allocation->dev = dev;
> - devmem_allocation->mm = mm;
> - devmem_allocation->ops = ops;
> - devmem_allocation->dpagemap = dpagemap;
> - devmem_allocation->size = size;
> -}
> -EXPORT_SYMBOL_GPL(drm_gpusvm_devmem_init);
> -
> MODULE_DESCRIPTION("DRM GPUSVM");
> MODULE_LICENSE("GPL");
> diff --git a/drivers/gpu/drm/drm_pagemap.c b/drivers/gpu/drm/drm_pagemap.c
> new file mode 100644
> index 000000000000..3551a50d7381
> --- /dev/null
> +++ b/drivers/gpu/drm/drm_pagemap.c
> @@ -0,0 +1,811 @@
> +// SPDX-License-Identifier: GPL-2.0-only OR MIT
> +/*
> + * Copyright © 2024-2025 Intel Corporation
> + */
> +
> +#include <linux/dma-mapping.h>
> +#include <linux/migrate.h>
> +#include <linux/pagemap.h>
> +#include <drm/drm_pagemap.h>
> +
> +/**
> + * DOC: Overview
> + *
> + * The DRM pagemap layer is intended to augment the dev_pagemap functionality by
> + * providing a way to populate a struct mm_struct virtual range with device
> + * private pages and to provide helpers to abstract device memory allocations,
> + * to migrate memory back and forth between device memory and system RAM and
> + * to handle access (and in the future migration) between devices implementing
> + * a fast interconnect that is not necessarily visible to the rest of the
> + * system.
The latter part (fast interconnect support) is NIY, right? Also it's not
only fast interconnects but PCIe P2P too, right?
> + *
> + * Typically the DRM pagemap receives requests from one or more DRM GPU SVM
> + * instances to populate struct mm_struct virtual ranges with memory, and the
> + * migration is best effort only and may thus fail. The implementation should
> + * also handle device unbinding by blocking (returning an -ENODEV error) new
> + * population requests and after that migrate all device pages to system ram.
So this means populate_devmem_pfn returning -ENODEV. Should we
specifically document this return code in populate_devmem_pfn and
drm_pagemap_migrate_to_devmem?
> + */
> +
> +/**
> + * DOC: Migration
> + * Migration granularity typically follows the GPU SVM range requests, but
> + * if there are clashes, due to races or due to the fact that multiple GPU
> + * SVM instances have different views of the ranges used, and because of that
'multiple GPU SVM instances have different views of the ranges used'
This seems scary and hard to handle; perhaps you have thought this one
through a bit more than me.
> + * parts of a requested range are already present in the requested device memory,
> + * the implementation has a variety of options. It can fail and it can choose
> + * to populate only the part of the range that isn't already in device memory,
> + * and it can evict the range to system before trying to migrate. Ideally an
> + * implementation would just try to migrate the missing part of the range and
> + * allocate just enough memory to do so.
> + *
I think we need a bit more plumbing to implement the ideal case, but
again maybe you have thought this one through a bit more than me.
In general this doc seems forward-looking to things not yet implemented,
which I'm not sure is a good idea for a patch that just moves code
around.
Matt
> + * When migrating to system memory as a response to a cpu fault or a device
> + * memory eviction request, currently a full device memory allocation is
> + * migrated back to system. Moving forward this might need improvement for
> + * situations where a single page needs bouncing between system memory and
> + * device memory due to, for example, atomic operations.
> + *
> + * Key DRM pagemap components:
> + *
> + * - Device Memory Allocations:
> + * Embedded structure containing enough information for the drm_pagemap to
> + * migrate to / from device memory.
> + *
> + * - Device Memory Operations:
> + * Define the interface for driver-specific device memory operations:
> + * release memory, populate pfns, and copy to / from device memory.
> + */
> +
> +/**
> + * struct drm_pagemap_zdd - GPU SVM zone device data
> + *
> + * @refcount: Reference count for the zdd
> + * @devmem_allocation: device memory allocation
> + * @device_private_page_owner: Device private pages owner
> + *
> + * This structure serves as a generic wrapper installed in
> + * page->zone_device_data. It provides infrastructure for looking up a device
> + * memory allocation upon CPU page fault and asynchronously releasing device
> + * memory once the CPU has no page references. Asynchronous release is useful
> + * because CPU page references can be dropped in IRQ contexts, while releasing
> + * device memory likely requires sleeping locks.
> + */
> +struct drm_pagemap_zdd {
> + struct kref refcount;
> + struct drm_pagemap_devmem *devmem_allocation;
> + void *device_private_page_owner;
> +};
> +
> +/**
> + * drm_pagemap_zdd_alloc() - Allocate a zdd structure.
> + * @device_private_page_owner: Device private pages owner
> + *
> + * This function allocates and initializes a new zdd structure. It sets up the
> + * reference count and initializes the destroy work.
> + *
> + * Return: Pointer to the allocated zdd on success, ERR_PTR() on failure.
> + */
> +static struct drm_pagemap_zdd *
> +drm_pagemap_zdd_alloc(void *device_private_page_owner)
> +{
> + struct drm_pagemap_zdd *zdd;
> +
> + zdd = kmalloc(sizeof(*zdd), GFP_KERNEL);
> + if (!zdd)
> + return NULL;
> +
> + kref_init(&zdd->refcount);
> + zdd->devmem_allocation = NULL;
> + zdd->device_private_page_owner = device_private_page_owner;
> +
> + return zdd;
> +}
> +
> +/**
> + * drm_pagemap_zdd_get() - Get a reference to a zdd structure.
> + * @zdd: Pointer to the zdd structure.
> + *
> + * This function increments the reference count of the provided zdd structure.
> + *
> + * Return: Pointer to the zdd structure.
> + */
> +static struct drm_pagemap_zdd *drm_pagemap_zdd_get(struct drm_pagemap_zdd *zdd)
> +{
> + kref_get(&zdd->refcount);
> + return zdd;
> +}
> +
> +/**
> + * drm_pagemap_zdd_destroy() - Destroy a zdd structure.
> + * @ref: Pointer to the reference count structure.
> + *
> + * This function queues the destroy_work of the zdd for asynchronous destruction.
> + */
> +static void drm_pagemap_zdd_destroy(struct kref *ref)
> +{
> + struct drm_pagemap_zdd *zdd =
> + container_of(ref, struct drm_pagemap_zdd, refcount);
> + struct drm_pagemap_devmem *devmem = zdd->devmem_allocation;
> +
> + if (devmem) {
> + complete_all(&devmem->detached);
> + if (devmem->ops->devmem_release)
> + devmem->ops->devmem_release(devmem);
> + }
> + kfree(zdd);
> +}
> +
> +/**
> + * drm_pagemap_zdd_put() - Put a zdd reference.
> + * @zdd: Pointer to the zdd structure.
> + *
> + * This function decrements the reference count of the provided zdd structure
> + * and schedules its destruction if the count drops to zero.
> + */
> +static void drm_pagemap_zdd_put(struct drm_pagemap_zdd *zdd)
> +{
> + kref_put(&zdd->refcount, drm_pagemap_zdd_destroy);
> +}
> +
> +/**
> + * drm_pagemap_migration_unlock_put_page() - Put a migration page
> + * @page: Pointer to the page to put
> + *
> + * This function unlocks and puts a page.
> + */
> +static void drm_pagemap_migration_unlock_put_page(struct page *page)
> +{
> + unlock_page(page);
> + put_page(page);
> +}
> +
> +/**
> + * drm_pagemap_migration_unlock_put_pages() - Put migration pages
> + * @npages: Number of pages
> + * @migrate_pfn: Array of migrate page frame numbers
> + *
> + * This function unlocks and puts an array of pages.
> + */
> +static void drm_pagemap_migration_unlock_put_pages(unsigned long npages,
> + unsigned long *migrate_pfn)
> +{
> + unsigned long i;
> +
> + for (i = 0; i < npages; ++i) {
> + struct page *page;
> +
> + if (!migrate_pfn[i])
> + continue;
> +
> + page = migrate_pfn_to_page(migrate_pfn[i]);
> + drm_pagemap_migration_unlock_put_page(page);
> + migrate_pfn[i] = 0;
> + }
> +}
> +
> +/**
> + * drm_pagemap_get_devmem_page() - Get a reference to a device memory page
> + * @page: Pointer to the page
> + * @zdd: Pointer to the GPU SVM zone device data
> + *
> + * This function associates the given page with the specified GPU SVM zone
> + * device data and initializes it for zone device usage.
> + */
> +static void drm_pagemap_get_devmem_page(struct page *page,
> + struct drm_pagemap_zdd *zdd)
> +{
> + page->zone_device_data = drm_pagemap_zdd_get(zdd);
> + zone_device_page_init(page);
> +}
> +
> +/**
> + * drm_pagemap_migrate_map_pages() - Map migration pages for GPU SVM migration
> + * @dev: The device for which the pages are being mapped
> + * @dma_addr: Array to store DMA addresses corresponding to mapped pages
> + * @migrate_pfn: Array of migrate page frame numbers to map
> + * @npages: Number of pages to map
> + * @dir: Direction of data transfer (e.g., DMA_BIDIRECTIONAL)
> + *
> + * This function maps pages of memory for migration usage in GPU SVM. It
> + * iterates over each page frame number provided in @migrate_pfn, maps the
> + * corresponding page, and stores the DMA address in the provided @dma_addr
> + * array.
> + *
> + * Return: 0 on success, -EFAULT if an error occurs during mapping.
> + */
> +static int drm_pagemap_migrate_map_pages(struct device *dev,
> + dma_addr_t *dma_addr,
> + unsigned long *migrate_pfn,
> + unsigned long npages,
> + enum dma_data_direction dir)
> +{
> + unsigned long i;
> +
> + for (i = 0; i < npages; ++i) {
> + struct page *page = migrate_pfn_to_page(migrate_pfn[i]);
> +
> + if (!page)
> + continue;
> +
> + if (WARN_ON_ONCE(is_zone_device_page(page)))
> + return -EFAULT;
> +
> + dma_addr[i] = dma_map_page(dev, page, 0, PAGE_SIZE, dir);
> + if (dma_mapping_error(dev, dma_addr[i]))
> + return -EFAULT;
> + }
> +
> + return 0;
> +}
> +
> +/**
> + * drm_pagemap_migrate_unmap_pages() - Unmap pages previously mapped for GPU SVM migration
> + * @dev: The device for which the pages were mapped
> + * @dma_addr: Array of DMA addresses corresponding to mapped pages
> + * @npages: Number of pages to unmap
> + * @dir: Direction of data transfer (e.g., DMA_BIDIRECTIONAL)
> + *
> + * This function unmaps previously mapped pages of memory for GPU Shared Virtual
> + * Memory (SVM). It iterates over each DMA address provided in @dma_addr, checks
> + * if it's valid and not already unmapped, and unmaps the corresponding page.
> + */
> +static void drm_pagemap_migrate_unmap_pages(struct device *dev,
> + dma_addr_t *dma_addr,
> + unsigned long npages,
> + enum dma_data_direction dir)
> +{
> + unsigned long i;
> +
> + for (i = 0; i < npages; ++i) {
> + if (!dma_addr[i] || dma_mapping_error(dev, dma_addr[i]))
> + continue;
> +
> + dma_unmap_page(dev, dma_addr[i], PAGE_SIZE, dir);
> + }
> +}
> +
> +static unsigned long
> +npages_in_range(unsigned long start, unsigned long end)
> +{
> + return (end - start) >> PAGE_SHIFT;
> +}
> +
> +
> +/**
> + * drm_pagemap_migrate_to_devmem() - Migrate a struct mm_struct range to device memory
> + * @devmem_allocation: The device memory allocation to migrate to.
> + * The caller should hold a reference to the device memory allocation,
> + * and the reference is consumed by this function unless it returns with
> + * an error.
> + * @mm: Pointer to the struct mm_struct.
> + * @start: Start of the virtual address range to migrate.
> + * @end: End of the virtual address range to migrate.
> + * @timeslice_ms: The time requested for the migrated pages to remain in
> + * device memory before migration back to system memory is allowed.
> + * @pgmap_owner: Not used currently, since only system memory is considered.
> + *
> + * This function migrates the specified virtual address range to device memory.
> + * It performs the necessary setup and invokes the driver-specific operations for
> + * migration to device memory. Expected to be called while holding the mmap lock in
> + * at least read mode.
> + *
> + * Return: %0 on success, negative error code on failure.
> + */
> +int drm_pagemap_migrate_to_devmem(struct drm_pagemap_devmem *devmem_allocation,
> + struct mm_struct *mm,
> + unsigned long start, unsigned long end,
> + unsigned long timeslice_ms,
> + void *pgmap_owner)
> +{
> + const struct drm_pagemap_devmem_ops *ops = devmem_allocation->ops;
> + struct migrate_vma migrate = {
> + .start = start,
> + .end = end,
> + .pgmap_owner = pgmap_owner,
> + .flags = MIGRATE_VMA_SELECT_SYSTEM,
> + };
> + unsigned long i, npages = npages_in_range(start, end);
> + struct vm_area_struct *vas;
> + struct drm_pagemap_zdd *zdd = NULL;
> + struct page **pages;
> + dma_addr_t *dma_addr;
> + void *buf;
> + int err;
> +
> + mmap_assert_locked(mm);
> +
> + if (!ops->populate_devmem_pfn || !ops->copy_to_devmem ||
> + !ops->copy_to_ram)
> + return -EOPNOTSUPP;
> +
> + vas = vma_lookup(mm, start);
> + if (!vas) {
> + err = -ENOENT;
> + goto err_out;
> + }
> +
> + if (end > vas->vm_end || start < vas->vm_start) {
> + err = -EINVAL;
> + goto err_out;
> + }
> +
> + if (!vma_is_anonymous(vas)) {
> + err = -EBUSY;
> + goto err_out;
> + }
> +
> + buf = kvcalloc(npages, 2 * sizeof(*migrate.src) + sizeof(*dma_addr) +
> + sizeof(*pages), GFP_KERNEL);
> + if (!buf) {
> + err = -ENOMEM;
> + goto err_out;
> + }
> + dma_addr = buf + (2 * sizeof(*migrate.src) * npages);
> + pages = buf + (2 * sizeof(*migrate.src) + sizeof(*dma_addr)) * npages;
> +
> + zdd = drm_pagemap_zdd_alloc(pgmap_owner);
> + if (!zdd) {
> + err = -ENOMEM;
> + goto err_free;
> + }
> +
> + migrate.vma = vas;
> + migrate.src = buf;
> + migrate.dst = migrate.src + npages;
> +
> + err = migrate_vma_setup(&migrate);
> + if (err)
> + goto err_free;
> +
> + if (!migrate.cpages) {
> + err = -EFAULT;
> + goto err_free;
> + }
> +
> + if (migrate.cpages != npages) {
> + err = -EBUSY;
> + goto err_finalize;
> + }
> +
> + err = ops->populate_devmem_pfn(devmem_allocation, npages, migrate.dst);
> + if (err)
> + goto err_finalize;
> +
> + err = drm_pagemap_migrate_map_pages(devmem_allocation->dev, dma_addr,
> + migrate.src, npages, DMA_TO_DEVICE);
> + if (err)
> + goto err_finalize;
> +
> + for (i = 0; i < npages; ++i) {
> + struct page *page = pfn_to_page(migrate.dst[i]);
> +
> + pages[i] = page;
> + migrate.dst[i] = migrate_pfn(migrate.dst[i]);
> + drm_pagemap_get_devmem_page(page, zdd);
> + }
> +
> + err = ops->copy_to_devmem(pages, dma_addr, npages);
> + if (err)
> + goto err_finalize;
> +
> + /* Upon success bind devmem allocation to range and zdd */
> + devmem_allocation->timeslice_expiration = get_jiffies_64() +
> + msecs_to_jiffies(timeslice_ms);
> + zdd->devmem_allocation = devmem_allocation; /* Owns ref */
> +
> +err_finalize:
> + if (err)
> + drm_pagemap_migration_unlock_put_pages(npages, migrate.dst);
> + migrate_vma_pages(&migrate);
> + migrate_vma_finalize(&migrate);
> + drm_pagemap_migrate_unmap_pages(devmem_allocation->dev, dma_addr, npages,
> + DMA_TO_DEVICE);
> +err_free:
> + if (zdd)
> + drm_pagemap_zdd_put(zdd);
> + kvfree(buf);
> +err_out:
> + return err;
> +}
> +EXPORT_SYMBOL_GPL(drm_pagemap_migrate_to_devmem);
> +
> +/**
> + * drm_pagemap_migrate_populate_ram_pfn() - Populate RAM PFNs for a VM area
> + * @vas: Pointer to the VM area structure, can be NULL
> + * @fault_page: Fault page
> + * @npages: Number of pages to populate
> + * @mpages: Number of pages to migrate
> + * @src_mpfn: Source array of migrate PFNs
> + * @mpfn: Array of migrate PFNs to populate
> + * @addr: Start address for PFN allocation
> + *
> + * This function populates the RAM migrate page frame numbers (PFNs) for the
> + * specified VM area structure. It allocates and locks pages in the VM area for
> + * RAM usage. If @vas is non-NULL, alloc_page_vma() is used for allocation;
> + * otherwise alloc_page() is used.
> + *
> + * Return: 0 on success, negative error code on failure.
> + */
> +static int drm_pagemap_migrate_populate_ram_pfn(struct vm_area_struct *vas,
> + struct page *fault_page,
> + unsigned long npages,
> + unsigned long *mpages,
> + unsigned long *src_mpfn,
> + unsigned long *mpfn,
> + unsigned long addr)
> +{
> + unsigned long i;
> +
> + for (i = 0; i < npages; ++i, addr += PAGE_SIZE) {
> + struct page *page, *src_page;
> +
> + if (!(src_mpfn[i] & MIGRATE_PFN_MIGRATE))
> + continue;
> +
> + src_page = migrate_pfn_to_page(src_mpfn[i]);
> + if (!src_page)
> + continue;
> +
> + if (fault_page) {
> + if (src_page->zone_device_data !=
> + fault_page->zone_device_data)
> + continue;
> + }
> +
> + if (vas)
> + page = alloc_page_vma(GFP_HIGHUSER, vas, addr);
> + else
> + page = alloc_page(GFP_HIGHUSER);
> +
> + if (!page)
> + goto free_pages;
> +
> + mpfn[i] = migrate_pfn(page_to_pfn(page));
> + }
> +
> + for (i = 0; i < npages; ++i) {
> + struct page *page = migrate_pfn_to_page(mpfn[i]);
> +
> + if (!page)
> + continue;
> +
> + WARN_ON_ONCE(!trylock_page(page));
> + ++*mpages;
> + }
> +
> + return 0;
> +
> +free_pages:
> + for (i = 0; i < npages; ++i) {
> + struct page *page = migrate_pfn_to_page(mpfn[i]);
> +
> + if (!page)
> + continue;
> +
> + put_page(page);
> + mpfn[i] = 0;
> + }
> + return -ENOMEM;
> +}
> +
> +/**
> + * drm_pagemap_evict_to_ram() - Evict GPU SVM range to RAM
> + * @devmem_allocation: Pointer to the device memory allocation
> + *
> + * Similar to __drm_pagemap_migrate_to_ram() but does not require the mmap
> + * lock; migration is done via the migrate_device_* functions.
> + *
> + * Return: 0 on success, negative error code on failure.
> + */
> +int drm_pagemap_evict_to_ram(struct drm_pagemap_devmem *devmem_allocation)
> +{
> + const struct drm_pagemap_devmem_ops *ops = devmem_allocation->ops;
> + unsigned long npages, mpages = 0;
> + struct page **pages;
> + unsigned long *src, *dst;
> + dma_addr_t *dma_addr;
> + void *buf;
> + int i, err = 0;
> + unsigned int retry_count = 2;
> +
> + npages = devmem_allocation->size >> PAGE_SHIFT;
> +
> +retry:
> + if (!mmget_not_zero(devmem_allocation->mm))
> + return -EFAULT;
> +
> + buf = kvcalloc(npages, 2 * sizeof(*src) + sizeof(*dma_addr) +
> + sizeof(*pages), GFP_KERNEL);
> + if (!buf) {
> + err = -ENOMEM;
> + goto err_out;
> + }
> + src = buf;
> + dst = buf + (sizeof(*src) * npages);
> + dma_addr = buf + (2 * sizeof(*src) * npages);
> + pages = buf + (2 * sizeof(*src) + sizeof(*dma_addr)) * npages;
> +
> + err = ops->populate_devmem_pfn(devmem_allocation, npages, src);
> + if (err)
> + goto err_free;
> +
> + err = migrate_device_pfns(src, npages);
> + if (err)
> + goto err_free;
> +
> + err = drm_pagemap_migrate_populate_ram_pfn(NULL, NULL, npages, &mpages,
> + src, dst, 0);
> + if (err || !mpages)
> + goto err_finalize;
> +
> + err = drm_pagemap_migrate_map_pages(devmem_allocation->dev, dma_addr,
> + dst, npages, DMA_FROM_DEVICE);
> + if (err)
> + goto err_finalize;
> +
> + for (i = 0; i < npages; ++i)
> + pages[i] = migrate_pfn_to_page(src[i]);
> +
> + err = ops->copy_to_ram(pages, dma_addr, npages);
> + if (err)
> + goto err_finalize;
> +
> +err_finalize:
> + if (err)
> + drm_pagemap_migration_unlock_put_pages(npages, dst);
> + migrate_device_pages(src, dst, npages);
> + migrate_device_finalize(src, dst, npages);
> + drm_pagemap_migrate_unmap_pages(devmem_allocation->dev, dma_addr, npages,
> + DMA_FROM_DEVICE);
> +err_free:
> + kvfree(buf);
> +err_out:
> + mmput_async(devmem_allocation->mm);
> +
> + if (completion_done(&devmem_allocation->detached))
> + return 0;
> +
> + if (retry_count--) {
> + cond_resched();
> + goto retry;
> + }
> +
> + return err ?: -EBUSY;
> +}
> +EXPORT_SYMBOL_GPL(drm_pagemap_evict_to_ram);
> +
> +/**
> + * __drm_pagemap_migrate_to_ram() - Migrate GPU SVM range to RAM (internal)
> + * @vas: Pointer to the VM area structure
> + * @device_private_page_owner: Device private pages owner
> + * @page: Pointer to the page for fault handling (can be NULL)
> + * @fault_addr: Fault address
> + * @size: Size of migration
> + *
> + * This internal function performs the migration of the specified GPU SVM range
> + * to RAM. It sets up the migration, populates + dma maps RAM PFNs, and
> + * invokes the driver-specific operations for migration to RAM.
> + *
> + * Return: 0 on success, negative error code on failure.
> + */
> +static int __drm_pagemap_migrate_to_ram(struct vm_area_struct *vas,
> + void *device_private_page_owner,
> + struct page *page,
> + unsigned long fault_addr,
> + unsigned long size)
> +{
> + struct migrate_vma migrate = {
> + .vma = vas,
> + .pgmap_owner = device_private_page_owner,
> + .flags = MIGRATE_VMA_SELECT_DEVICE_PRIVATE |
> + MIGRATE_VMA_SELECT_DEVICE_COHERENT,
> + .fault_page = page,
> + };
> + struct drm_pagemap_zdd *zdd;
> + const struct drm_pagemap_devmem_ops *ops;
> + struct device *dev = NULL;
> + unsigned long npages, mpages = 0;
> + struct page **pages;
> + dma_addr_t *dma_addr;
> + unsigned long start, end;
> + void *buf;
> + int i, err = 0;
> +
> + if (page) {
> + zdd = page->zone_device_data;
> + if (time_before64(get_jiffies_64(),
> + zdd->devmem_allocation->timeslice_expiration))
> + return 0;
> + }
> +
> + start = ALIGN_DOWN(fault_addr, size);
> + end = ALIGN(fault_addr + 1, size);
> +
> +	/* Corner case where the VMA has been partially unmapped */
> + if (start < vas->vm_start)
> + start = vas->vm_start;
> + if (end > vas->vm_end)
> + end = vas->vm_end;
> +
> + migrate.start = start;
> + migrate.end = end;
> + npages = npages_in_range(start, end);
> +
> + buf = kvcalloc(npages, 2 * sizeof(*migrate.src) + sizeof(*dma_addr) +
> + sizeof(*pages), GFP_KERNEL);
> + if (!buf) {
> + err = -ENOMEM;
> + goto err_out;
> + }
> + dma_addr = buf + (2 * sizeof(*migrate.src) * npages);
> + pages = buf + (2 * sizeof(*migrate.src) + sizeof(*dma_addr)) * npages;
> +
> + migrate.vma = vas;
> + migrate.src = buf;
> + migrate.dst = migrate.src + npages;
> +
> + err = migrate_vma_setup(&migrate);
> + if (err)
> + goto err_free;
> +
> + /* Raced with another CPU fault, nothing to do */
> + if (!migrate.cpages)
> + goto err_free;
> +
> + if (!page) {
> + for (i = 0; i < npages; ++i) {
> + if (!(migrate.src[i] & MIGRATE_PFN_MIGRATE))
> + continue;
> +
> + page = migrate_pfn_to_page(migrate.src[i]);
> + break;
> + }
> +
> + if (!page)
> + goto err_finalize;
> + }
> + zdd = page->zone_device_data;
> + ops = zdd->devmem_allocation->ops;
> + dev = zdd->devmem_allocation->dev;
> +
> + err = drm_pagemap_migrate_populate_ram_pfn(vas, page, npages, &mpages,
> + migrate.src, migrate.dst,
> + start);
> + if (err)
> + goto err_finalize;
> +
> + err = drm_pagemap_migrate_map_pages(dev, dma_addr, migrate.dst, npages,
> + DMA_FROM_DEVICE);
> + if (err)
> + goto err_finalize;
> +
> + for (i = 0; i < npages; ++i)
> + pages[i] = migrate_pfn_to_page(migrate.src[i]);
> +
> + err = ops->copy_to_ram(pages, dma_addr, npages);
> + if (err)
> + goto err_finalize;
> +
> +err_finalize:
> + if (err)
> + drm_pagemap_migration_unlock_put_pages(npages, migrate.dst);
> + migrate_vma_pages(&migrate);
> + migrate_vma_finalize(&migrate);
> + if (dev)
> + drm_pagemap_migrate_unmap_pages(dev, dma_addr, npages,
> + DMA_FROM_DEVICE);
> +err_free:
> + kvfree(buf);
> +err_out:
> +
> + return err;
> +}
> +
> +/**
> + * drm_pagemap_page_free() - Put GPU SVM zone device data associated with a page
> + * @page: Pointer to the page
> + *
> + * This function is a callback used to put the GPU SVM zone device data
> + * associated with a page when it is being released.
> + */
> +static void drm_pagemap_page_free(struct page *page)
> +{
> + drm_pagemap_zdd_put(page->zone_device_data);
> +}
> +
> +/**
> + * drm_pagemap_migrate_to_ram() - Migrate a virtual range to RAM (page fault handler)
> + * @vmf: Pointer to the fault information structure
> + *
> + * This function is a page fault handler used to migrate a virtual range
> + * to RAM. The device memory allocation in which the device page is found is
> + * migrated in its entirety.
> + *
> + * Returns:
> + * VM_FAULT_SIGBUS on failure, 0 on success.
> + */
> +static vm_fault_t drm_pagemap_migrate_to_ram(struct vm_fault *vmf)
> +{
> + struct drm_pagemap_zdd *zdd = vmf->page->zone_device_data;
> + int err;
> +
> + err = __drm_pagemap_migrate_to_ram(vmf->vma,
> + zdd->device_private_page_owner,
> + vmf->page, vmf->address,
> + zdd->devmem_allocation->size);
> +
> + return err ? VM_FAULT_SIGBUS : 0;
> +}
> +
> +static const struct dev_pagemap_ops drm_pagemap_pagemap_ops = {
> + .page_free = drm_pagemap_page_free,
> + .migrate_to_ram = drm_pagemap_migrate_to_ram,
> +};
> +
> +/**
> + * drm_pagemap_pagemap_ops_get() - Retrieve GPU SVM device page map operations
> + *
> + * Returns:
> + * Pointer to the GPU SVM device page map operations structure.
> + */
> +const struct dev_pagemap_ops *drm_pagemap_pagemap_ops_get(void)
> +{
> + return &drm_pagemap_pagemap_ops;
> +}
> +EXPORT_SYMBOL_GPL(drm_pagemap_pagemap_ops_get);
> +
> +/**
> + * drm_pagemap_devmem_init() - Initialize a drm_pagemap device memory allocation
> + *
> + * @devmem_allocation: The struct drm_pagemap_devmem to initialize.
> + * @dev: Pointer to the device structure to which the device memory allocation belongs
> + * @mm: Pointer to the mm_struct for the address space
> + * @ops: Pointer to the operations structure for GPU SVM device memory
> + * @dpagemap: The struct drm_pagemap we're allocating from.
> + * @size: Size of device memory allocation
> + */
> +void drm_pagemap_devmem_init(struct drm_pagemap_devmem *devmem_allocation,
> + struct device *dev, struct mm_struct *mm,
> + const struct drm_pagemap_devmem_ops *ops,
> + struct drm_pagemap *dpagemap, size_t size)
> +{
> + init_completion(&devmem_allocation->detached);
> + devmem_allocation->dev = dev;
> + devmem_allocation->mm = mm;
> + devmem_allocation->ops = ops;
> + devmem_allocation->dpagemap = dpagemap;
> + devmem_allocation->size = size;
> +}
> +EXPORT_SYMBOL_GPL(drm_pagemap_devmem_init);
> +
> +/**
> + * drm_pagemap_page_to_dpagemap() - Return a pointer to the drm_pagemap of a page
> + * @page: The struct page.
> + *
> + * Return: A pointer to the struct drm_pagemap of a device private page that
> + * was populated from the struct drm_pagemap. If the page was *not* populated
> + * from a struct drm_pagemap, the result is undefined and the function call
> + * may result in dereferencing an invalid address.
> + */
> +struct drm_pagemap *drm_pagemap_page_to_dpagemap(struct page *page)
> +{
> + struct drm_pagemap_zdd *zdd = page->zone_device_data;
> +
> + return zdd->devmem_allocation->dpagemap;
> +}
> +EXPORT_SYMBOL_GPL(drm_pagemap_page_to_dpagemap);
> diff --git a/drivers/gpu/drm/xe/Kconfig b/drivers/gpu/drm/xe/Kconfig
> index 98b46c534278..c7c71734460b 100644
> --- a/drivers/gpu/drm/xe/Kconfig
> +++ b/drivers/gpu/drm/xe/Kconfig
> @@ -87,14 +87,16 @@ config DRM_XE_GPUSVM
>
> If in doubt say "Y".
>
> -config DRM_XE_DEVMEM_MIRROR
> - bool "Enable device memory mirror"
> +config DRM_XE_PAGEMAP
> + bool "Enable device memory pool for SVM"
> depends on DRM_XE_GPUSVM
> select GET_FREE_REGION
> default y
> help
> - Disable this option only if you want to compile out without device
> - memory mirror. Will reduce KMD memory footprint when disabled.
> + Disable this option only if you don't want to expose local device
> + memory for SVM. Will reduce KMD memory footprint when disabled.
> +
> +	  If in doubt say "Y".
>
> config DRM_XE_FORCE_PROBE
> string "Force probe xe for selected Intel hardware IDs"
> diff --git a/drivers/gpu/drm/xe/xe_bo_types.h b/drivers/gpu/drm/xe/xe_bo_types.h
> index eb5e83c5f233..e0efaf23d051 100644
> --- a/drivers/gpu/drm/xe/xe_bo_types.h
> +++ b/drivers/gpu/drm/xe/xe_bo_types.h
> @@ -86,7 +86,7 @@ struct xe_bo {
> u16 cpu_caching;
>
> /** @devmem_allocation: SVM device memory allocation */
> - struct drm_gpusvm_devmem devmem_allocation;
> + struct drm_pagemap_devmem devmem_allocation;
>
> /** @vram_userfault_link: Link into @mem_access.vram_userfault.list */
> struct list_head vram_userfault_link;
> diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
> index b93c04466637..67b7f733dd69 100644
> --- a/drivers/gpu/drm/xe/xe_device_types.h
> +++ b/drivers/gpu/drm/xe/xe_device_types.h
> @@ -104,7 +104,7 @@ struct xe_vram_region {
> void __iomem *mapping;
> /** @ttm: VRAM TTM manager */
> struct xe_ttm_vram_mgr ttm;
> -#if IS_ENABLED(CONFIG_DRM_XE_DEVMEM_MIRROR)
> +#if IS_ENABLED(CONFIG_DRM_XE_PAGEMAP)
> /** @pagemap: Used to remap device memory as ZONE_DEVICE */
> struct dev_pagemap pagemap;
> /**
> diff --git a/drivers/gpu/drm/xe/xe_svm.c b/drivers/gpu/drm/xe/xe_svm.c
> index f27fb9b588de..e161ce3e67a1 100644
> --- a/drivers/gpu/drm/xe/xe_svm.c
> +++ b/drivers/gpu/drm/xe/xe_svm.c
> @@ -329,7 +329,7 @@ static void xe_svm_garbage_collector_work_func(struct work_struct *w)
> up_write(&vm->lock);
> }
>
> -#if IS_ENABLED(CONFIG_DRM_XE_DEVMEM_MIRROR)
> +#if IS_ENABLED(CONFIG_DRM_XE_PAGEMAP)
>
> static struct xe_vram_region *page_to_vr(struct page *page)
> {
> @@ -517,12 +517,12 @@ static int xe_svm_copy_to_ram(struct page **pages, dma_addr_t *dma_addr,
> return xe_svm_copy(pages, dma_addr, npages, XE_SVM_COPY_TO_SRAM);
> }
>
> -static struct xe_bo *to_xe_bo(struct drm_gpusvm_devmem *devmem_allocation)
> +static struct xe_bo *to_xe_bo(struct drm_pagemap_devmem *devmem_allocation)
> {
> return container_of(devmem_allocation, struct xe_bo, devmem_allocation);
> }
>
> -static void xe_svm_devmem_release(struct drm_gpusvm_devmem *devmem_allocation)
> +static void xe_svm_devmem_release(struct drm_pagemap_devmem *devmem_allocation)
> {
> struct xe_bo *bo = to_xe_bo(devmem_allocation);
>
> @@ -539,7 +539,7 @@ static struct drm_buddy *tile_to_buddy(struct xe_tile *tile)
> return &tile->mem.vram.ttm.mm;
> }
>
> -static int xe_svm_populate_devmem_pfn(struct drm_gpusvm_devmem *devmem_allocation,
> +static int xe_svm_populate_devmem_pfn(struct drm_pagemap_devmem *devmem_allocation,
> unsigned long npages, unsigned long *pfn)
> {
> struct xe_bo *bo = to_xe_bo(devmem_allocation);
> @@ -562,7 +562,7 @@ static int xe_svm_populate_devmem_pfn(struct drm_gpusvm_devmem *devmem_allocatio
> return 0;
> }
>
> -static const struct drm_gpusvm_devmem_ops gpusvm_devmem_ops = {
> +static const struct drm_pagemap_devmem_ops dpagemap_devmem_ops = {
> .devmem_release = xe_svm_devmem_release,
> .populate_devmem_pfn = xe_svm_populate_devmem_pfn,
> .copy_to_devmem = xe_svm_copy_to_devmem,
> @@ -714,7 +714,7 @@ u64 xe_svm_find_vma_start(struct xe_vm *vm, u64 start, u64 end, struct xe_vma *v
> min(end, xe_vma_end(vma)));
> }
>
> -#if IS_ENABLED(CONFIG_DRM_XE_DEVMEM_MIRROR)
> +#if IS_ENABLED(CONFIG_DRM_XE_PAGEMAP)
> static struct xe_vram_region *tile_to_vr(struct xe_tile *tile)
> {
> return &tile->mem.vram;
> @@ -742,6 +742,9 @@ int xe_svm_alloc_vram(struct xe_vm *vm, struct xe_tile *tile,
> ktime_t end = 0;
> int err;
>
> + if (!range->base.flags.migrate_devmem)
> + return -EINVAL;
> +
> range_debug(range, "ALLOCATE VRAM");
>
> if (!mmget_not_zero(mm))
> @@ -761,19 +764,23 @@ int xe_svm_alloc_vram(struct xe_vm *vm, struct xe_tile *tile,
> goto unlock;
> }
>
> - drm_gpusvm_devmem_init(&bo->devmem_allocation,
> - vm->xe->drm.dev, mm,
> - &gpusvm_devmem_ops,
> - &tile->mem.vram.dpagemap,
> - xe_svm_range_size(range));
> + drm_pagemap_devmem_init(&bo->devmem_allocation,
> + vm->xe->drm.dev, mm,
> + &dpagemap_devmem_ops,
> + &tile->mem.vram.dpagemap,
> + xe_svm_range_size(range));
>
> blocks = &to_xe_ttm_vram_mgr_resource(bo->ttm.resource)->blocks;
> list_for_each_entry(block, blocks, link)
> block->private = vr;
>
> xe_bo_get(bo);
> - err = drm_gpusvm_migrate_to_devmem(&vm->svm.gpusvm, &range->base,
> - &bo->devmem_allocation, ctx);
> + err = drm_pagemap_migrate_to_devmem(&bo->devmem_allocation,
> + mm,
> + xe_svm_range_start(range),
> + xe_svm_range_end(range),
> + ctx->timeslice_ms,
> + xe_svm_devm_owner(vm->xe));
> if (err)
> xe_svm_devmem_release(&bo->devmem_allocation);
>
> @@ -848,13 +855,13 @@ int xe_svm_handle_pagefault(struct xe_vm *vm, struct xe_vma *vma,
> struct drm_gpusvm_ctx ctx = {
> .read_only = xe_vma_read_only(vma),
> .devmem_possible = IS_DGFX(vm->xe) &&
> - IS_ENABLED(CONFIG_DRM_XE_DEVMEM_MIRROR),
> - .check_pages_threshold = IS_DGFX(vm->xe) &&
> - IS_ENABLED(CONFIG_DRM_XE_DEVMEM_MIRROR) ? SZ_64K : 0,
> + IS_ENABLED(CONFIG_DRM_XE_PAGEMAP),
> + .check_pages_threshold = IS_DGFX(vm->xe) &&
> + IS_ENABLED(CONFIG_DRM_XE_PAGEMAP) ? SZ_64K : 0,
> .devmem_only = atomic && IS_DGFX(vm->xe) &&
> - IS_ENABLED(CONFIG_DRM_XE_DEVMEM_MIRROR),
> + IS_ENABLED(CONFIG_DRM_XE_PAGEMAP),
> .timeslice_ms = atomic && IS_DGFX(vm->xe) &&
> - IS_ENABLED(CONFIG_DRM_XE_DEVMEM_MIRROR) ?
> + IS_ENABLED(CONFIG_DRM_XE_PAGEMAP) ?
> vm->xe->atomic_svm_timeslice_ms : 0,
> };
> struct xe_svm_range *range;
> @@ -992,7 +999,7 @@ bool xe_svm_has_mapping(struct xe_vm *vm, u64 start, u64 end)
> */
> int xe_svm_bo_evict(struct xe_bo *bo)
> {
> - return drm_gpusvm_evict_to_ram(&bo->devmem_allocation);
> + return drm_pagemap_evict_to_ram(&bo->devmem_allocation);
> }
>
> /**
> @@ -1045,7 +1052,7 @@ int xe_svm_range_get_pages(struct xe_vm *vm, struct xe_svm_range *range,
> return err;
> }
>
> -#if IS_ENABLED(CONFIG_DRM_XE_DEVMEM_MIRROR)
> +#if IS_ENABLED(CONFIG_DRM_XE_PAGEMAP)
>
> static struct drm_pagemap_device_addr
> xe_drm_pagemap_device_map(struct drm_pagemap *dpagemap,
> @@ -1102,7 +1109,7 @@ int xe_devm_add(struct xe_tile *tile, struct xe_vram_region *vr)
> vr->pagemap.range.start = res->start;
> vr->pagemap.range.end = res->end;
> vr->pagemap.nr_range = 1;
> - vr->pagemap.ops = drm_gpusvm_pagemap_ops_get();
> + vr->pagemap.ops = drm_pagemap_pagemap_ops_get();
> vr->pagemap.owner = xe_svm_devm_owner(xe);
> addr = devm_memremap_pages(dev, &vr->pagemap);
>
> diff --git a/include/drm/drm_gpusvm.h b/include/drm/drm_gpusvm.h
> index 6a5156476bf4..4aedc5423aff 100644
> --- a/include/drm/drm_gpusvm.h
> +++ b/include/drm/drm_gpusvm.h
> @@ -16,91 +16,9 @@ struct drm_gpusvm;
> struct drm_gpusvm_notifier;
> struct drm_gpusvm_ops;
> struct drm_gpusvm_range;
> -struct drm_gpusvm_devmem;
> struct drm_pagemap;
> struct drm_pagemap_device_addr;
>
> -/**
> - * struct drm_gpusvm_devmem_ops - Operations structure for GPU SVM device memory
> - *
> - * This structure defines the operations for GPU Shared Virtual Memory (SVM)
> - * device memory. These operations are provided by the GPU driver to manage device memory
> - * allocations and perform operations such as migration between device memory and system
> - * RAM.
> - */
> -struct drm_gpusvm_devmem_ops {
> - /**
> - * @devmem_release: Release device memory allocation (optional)
> - * @devmem_allocation: device memory allocation
> - *
> - * Release device memory allocation and drop a reference to device
> - * memory allocation.
> - */
> - void (*devmem_release)(struct drm_gpusvm_devmem *devmem_allocation);
> -
> - /**
> - * @populate_devmem_pfn: Populate device memory PFN (required for migration)
> - * @devmem_allocation: device memory allocation
> - * @npages: Number of pages to populate
> - * @pfn: Array of page frame numbers to populate
> - *
> - * Populate device memory page frame numbers (PFN).
> - *
> - * Return: 0 on success, a negative error code on failure.
> - */
> - int (*populate_devmem_pfn)(struct drm_gpusvm_devmem *devmem_allocation,
> - unsigned long npages, unsigned long *pfn);
> -
> - /**
> - * @copy_to_devmem: Copy to device memory (required for migration)
> - * @pages: Pointer to array of device memory pages (destination)
> - * @dma_addr: Pointer to array of DMA addresses (source)
> - * @npages: Number of pages to copy
> - *
> - * Copy pages to device memory.
> - *
> - * Return: 0 on success, a negative error code on failure.
> - */
> - int (*copy_to_devmem)(struct page **pages,
> - dma_addr_t *dma_addr,
> - unsigned long npages);
> -
> - /**
> - * @copy_to_ram: Copy to system RAM (required for migration)
> - * @pages: Pointer to array of device memory pages (source)
> - * @dma_addr: Pointer to array of DMA addresses (destination)
> - * @npages: Number of pages to copy
> - *
> - * Copy pages to system RAM.
> - *
> - * Return: 0 on success, a negative error code on failure.
> - */
> - int (*copy_to_ram)(struct page **pages,
> - dma_addr_t *dma_addr,
> - unsigned long npages);
> -};
> -
> -/**
> - * struct drm_gpusvm_devmem - Structure representing a GPU SVM device memory allocation
> - *
> - * @dev: Pointer to the device structure which device memory allocation belongs to
> - * @mm: Pointer to the mm_struct for the address space
> - * @detached: device memory allocations is detached from device pages
> - * @ops: Pointer to the operations structure for GPU SVM device memory
> - * @dpagemap: The struct drm_pagemap of the pages this allocation belongs to.
> - * @size: Size of device memory allocation
> - * @timeslice_expiration: Timeslice expiration in jiffies
> - */
> -struct drm_gpusvm_devmem {
> - struct device *dev;
> - struct mm_struct *mm;
> - struct completion detached;
> - const struct drm_gpusvm_devmem_ops *ops;
> - struct drm_pagemap *dpagemap;
> - size_t size;
> - u64 timeslice_expiration;
> -};
> -
> /**
> * struct drm_gpusvm_ops - Operations structure for GPU SVM
> *
> @@ -361,15 +279,6 @@ void drm_gpusvm_range_unmap_pages(struct drm_gpusvm *gpusvm,
> struct drm_gpusvm_range *range,
> const struct drm_gpusvm_ctx *ctx);
>
> -int drm_gpusvm_migrate_to_devmem(struct drm_gpusvm *gpusvm,
> - struct drm_gpusvm_range *range,
> - struct drm_gpusvm_devmem *devmem_allocation,
> - const struct drm_gpusvm_ctx *ctx);
> -
> -int drm_gpusvm_evict_to_ram(struct drm_gpusvm_devmem *devmem_allocation);
> -
> -const struct dev_pagemap_ops *drm_gpusvm_pagemap_ops_get(void);
> -
> bool drm_gpusvm_has_mapping(struct drm_gpusvm *gpusvm, unsigned long start,
> unsigned long end);
>
> @@ -380,11 +289,6 @@ drm_gpusvm_range_find(struct drm_gpusvm_notifier *notifier, unsigned long start,
> void drm_gpusvm_range_set_unmapped(struct drm_gpusvm_range *range,
> const struct mmu_notifier_range *mmu_range);
>
> -void drm_gpusvm_devmem_init(struct drm_gpusvm_devmem *devmem_allocation,
> - struct device *dev, struct mm_struct *mm,
> - const struct drm_gpusvm_devmem_ops *ops,
> - struct drm_pagemap *dpagemap, size_t size);
> -
> #ifdef CONFIG_LOCKDEP
> /**
> * drm_gpusvm_driver_set_lock() - Set the lock protecting accesses to GPU SVM
> diff --git a/include/drm/drm_pagemap.h b/include/drm/drm_pagemap.h
> index 202c157ff4d7..dabc9c365df4 100644
> --- a/include/drm/drm_pagemap.h
> +++ b/include/drm/drm_pagemap.h
> @@ -7,6 +7,7 @@
> #include <linux/types.h>
>
> struct drm_pagemap;
> +struct drm_pagemap_zdd;
> struct device;
>
> /**
> @@ -104,4 +105,104 @@ struct drm_pagemap {
> struct device *dev;
> };
>
> +struct drm_pagemap_devmem;
> +
> +/**
> + * struct drm_pagemap_devmem_ops - Operations structure for GPU SVM device memory
> + *
> + * This structure defines the operations for GPU Shared Virtual Memory (SVM)
> + * device memory. These operations are provided by the GPU driver to manage device memory
> + * allocations and perform operations such as migration between device memory and system
> + * RAM.
> + */
> +struct drm_pagemap_devmem_ops {
> + /**
> + * @devmem_release: Release device memory allocation (optional)
> + * @devmem_allocation: device memory allocation
> + *
> + * Release device memory allocation and drop a reference to device
> + * memory allocation.
> + */
> + void (*devmem_release)(struct drm_pagemap_devmem *devmem_allocation);
> +
> + /**
> + * @populate_devmem_pfn: Populate device memory PFN (required for migration)
> + * @devmem_allocation: device memory allocation
> + * @npages: Number of pages to populate
> + * @pfn: Array of page frame numbers to populate
> + *
> + * Populate device memory page frame numbers (PFN).
> + *
> + * Return: 0 on success, a negative error code on failure.
> + */
> + int (*populate_devmem_pfn)(struct drm_pagemap_devmem *devmem_allocation,
> + unsigned long npages, unsigned long *pfn);
> +
> + /**
> + * @copy_to_devmem: Copy to device memory (required for migration)
> + * @pages: Pointer to array of device memory pages (destination)
> + * @dma_addr: Pointer to array of DMA addresses (source)
> + * @npages: Number of pages to copy
> + *
> + * Copy pages to device memory.
> + *
> + * Return: 0 on success, a negative error code on failure.
> + */
> + int (*copy_to_devmem)(struct page **pages,
> + dma_addr_t *dma_addr,
> + unsigned long npages);
> +
> + /**
> + * @copy_to_ram: Copy to system RAM (required for migration)
> + * @pages: Pointer to array of device memory pages (source)
> + * @dma_addr: Pointer to array of DMA addresses (destination)
> + * @npages: Number of pages to copy
> + *
> + * Copy pages to system RAM.
> + *
> + * Return: 0 on success, a negative error code on failure.
> + */
> + int (*copy_to_ram)(struct page **pages,
> + dma_addr_t *dma_addr,
> + unsigned long npages);
> +};
> +
> +/**
> + * struct drm_pagemap_devmem - Structure representing a GPU SVM device memory allocation
> + *
> + * @dev: Pointer to the device structure to which the device memory allocation belongs
> + * @mm: Pointer to the mm_struct for the address space
> + * @detached: Signaled when the allocation is detached from device pages
> + * @ops: Pointer to the operations structure for GPU SVM device memory
> + * @dpagemap: The struct drm_pagemap of the pages this allocation belongs to.
> + * @size: Size of device memory allocation
> + * @timeslice_expiration: Timeslice expiration in jiffies
> + */
> +struct drm_pagemap_devmem {
> + struct device *dev;
> + struct mm_struct *mm;
> + struct completion detached;
> + const struct drm_pagemap_devmem_ops *ops;
> + struct drm_pagemap *dpagemap;
> + size_t size;
> + u64 timeslice_expiration;
> +};
> +
> +int drm_pagemap_migrate_to_devmem(struct drm_pagemap_devmem *devmem_allocation,
> + struct mm_struct *mm,
> + unsigned long start, unsigned long end,
> + unsigned long timeslice_ms,
> + void *pgmap_owner);
> +
> +int drm_pagemap_evict_to_ram(struct drm_pagemap_devmem *devmem_allocation);
> +
> +const struct dev_pagemap_ops *drm_pagemap_pagemap_ops_get(void);
> +
> +struct drm_pagemap *drm_pagemap_page_to_dpagemap(struct page *page);
> +
> +void drm_pagemap_devmem_init(struct drm_pagemap_devmem *devmem_allocation,
> + struct device *dev, struct mm_struct *mm,
> + const struct drm_pagemap_devmem_ops *ops,
> + struct drm_pagemap *dpagemap, size_t size);
> +
> #endif
> --
> 2.49.0
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2 1/3] drm/gpusvm, drm/pagemap: Move migration functionality to drm_pagemap
2025-06-05 22:44 ` Matthew Brost
@ 2025-06-13 10:01 ` Thomas Hellström
0 siblings, 0 replies; 16+ messages in thread
From: Thomas Hellström @ 2025-06-13 10:01 UTC (permalink / raw)
To: Matthew Brost
Cc: intel-xe, dri-devel, himal.prasad.ghimiray, apopple, airlied,
Simona Vetter, felix.kuehling, Christian König, dakr,
Mrozek, Michal, Joonas Lahtinen
On Thu, 2025-06-05 at 15:44 -0700, Matthew Brost wrote:
> On Wed, Jun 04, 2025 at 11:35:34AM +0200, Thomas Hellström wrote:
> > From: Matthew Brost <matthew.brost@intel.com>
> >
> > The migration functionality and track-keeping of per-pagemap VRAM
> > mapped to the CPU mm is not per GPU_vm, but rather per pagemap.
> > This is also reflected by the functions not needing the drm_gpusvm
> > structures. So move to drm_pagemap.
> >
> > With this, drm_gpusvm shouldn't really access the page zone-device-
> > data
> > since its meaning is internal to drm_pagemap. Currently it's used
> > to
> > reject mapping ranges backed by multiple drm_pagemap allocations.
> > For now, make the zone-device-data a void pointer.
> >
> > Rename CONFIG_DRM_XE_DEVMEM_MIRROR to CONFIG_DRM_XE_PAGEMAP.
> >
> > Matt is listed as author of this commit since he wrote most of the
> > code,
> > and it makes sense to retain his git authorship.
> > Thomas mostly moved the code around.
> >
>
> Kernel test robot has kernel doc fixes. A couple questions / comments
> on
> the new doc below.
8<---------------------------------------------------------------------
>
> > +/**
> > + * DOC: Overview
> > + *
> > + * The DRM pagemap layer is intended to augment the dev_pagemap
> > functionality by
> > + * providing a way to populate a struct mm_struct virtual range
> > with device
> > + * private pages and to provide helpers to abstract device memory
> > allocations,
> > + * to migrate memory back and forth between device memory and
> > system RAM and
> > + * to handle access (and in the future migration) between devices
> > implementing
> > + * a fast interconnect that is not necessarily visible to the rest
> > of the
> > + * system.
>
> The latter part (fast interconnect support) is NIY, right. Also not
> only
> fast interconnects, PCIe P2P, right?
Yes. PCIe P2P is intended to be included in "fast interconnects".
>
> > + *
> > + * Typically the DRM pagemap receives requests from one or more
> > DRM GPU SVM
> > + * instances to populate struct mm_struct virtual ranges with
> > memory, and the
> > + * migration is best effort only and may thus fail. The
> > implementation should
> > + * also handle device unbinding by blocking (return an -ENODEV)
> > error for new
> > + * population requests and after that migrate all device pages to
> > system ram.
>
> So this means populate_devmem_pfn returning -ENODEV. Should we
> specifically document this return code in populate_devmem_pfn,
> drm_pagemap_migrate_to_devmem?
Yes, agreed. This is actually more suitable for patch 2, so I'll move
it there.
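For the archives, a rough userspace sketch of the behavior I have in mind
(all names here are hypothetical illustrations, not the actual xe code): once
unbind starts, the populate callback rejects new population requests with
-ENODEV, and the driver then migrates the remaining device pages back to
system RAM.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Userspace model of the unbind handling discussed above. */
struct toy_devmem {
	bool unbinding;		/* set once device unbind has started */
	unsigned long base_pfn;	/* start of the backing devmem region */
};

/* Models a ->populate_devmem_pfn() callback: new population requests
 * are rejected with -ENODEV once unbind has started. */
int toy_populate_devmem_pfn(struct toy_devmem *dm,
			    unsigned long npages, unsigned long *pfn)
{
	unsigned long i;

	if (dm->unbinding)
		return -ENODEV;

	for (i = 0; i < npages; ++i)
		pfn[i] = dm->base_pfn + i;
	return 0;
}
```

So a caller like drm_pagemap_migrate_to_devmem() would simply see -ENODEV
bubble up and fall back to system RAM.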
>
> > + */
> > +
> > +/**
> > + * DOC: Migration
> > + * Migration granularity typically follows the GPU SVM range
> > requests, but
> > + * if there are clashes, due to races or due to the fact that
> > multiple GPU
> > + * SVM instances have different views of the ranges used, and
> > because of that
>
> 'multiple GPU SVM instances have different views of the ranges used'
>
> This seems scary and hard to handle; perhaps you have thought this one
> through a bit more than me.
I'd say, given multiple devices and rogue / bad / racy user-space, this
is a situation we need to account for. I don't think we necessarily
need to handle it very gracefully, but we need to take proper action.
>
> > + * parts of a requested range is already present in the requested
> > device memory,
> > + * the implementation has a variety of options. It can fail and it
> > can choose
> > + * to populate only the part of the range that isn't already in
> > device memory,
> > + * and it can evict the range to system before trying to migrate.
> > Ideally an
> > + * implementation would just try to migrate the missing part of
> > the range and
> > + * allocate just enough memory to do so.
> > + *
>
> I think we need a bit more plumbing to implement the ideal case, but
> again, maybe you thought this one through a bit more than me.
Yes, I have some ideas about how to implement that ideal case for xe if
needed.
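To make the "ideal case" concrete, it boils down to something like this
userspace sketch (purely illustrative, names made up, not xe code): given a
map of which pages of the requested range are already resident in device
memory, find the run that still needs allocation and migration and touch only
that.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy model of "migrate only the missing part of the range": return the
 * first contiguous run of pages not yet resident in device memory. */
int toy_find_missing_run(const bool *in_devmem, unsigned long npages,
			 unsigned long *first, unsigned long *count)
{
	unsigned long i = 0;

	while (i < npages && in_devmem[i])
		++i;
	if (i == npages)
		return -EALREADY;	/* whole range already resident */

	*first = i;
	while (i < npages && !in_devmem[i])
		++i;
	*count = i - *first;
	return 0;
}
```

The allocation size would then follow *count rather than the full range.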
>
> In general, this doc seems forward-looking to things not implemented
> yet, which I'm not sure is a good idea for a patch that just moves
> code around.
Agreed. I'll move parts of the documentation changes to later patches.
Thanks for review,
Thomas
>
> Matt
>
> > + * When migrating to system memory as a response to a cpu fault or
> > + * a device memory eviction request, currently a full device memory
> > + * allocation is migrated back to system. Moving forward this might
> > + * need improvement for situations where a single page needs bouncing
> > + * between system memory and device memory due to, for example,
> > + * atomic operations.
> > + *
> > + * Key DRM pagemap components:
> > + *
> > + * - Device Memory Allocations:
> > + *   Embedded structure containing enough information for the
> > + *   drm_pagemap to migrate to / from device memory.
> > + *
> > + * - Device Memory Operations:
> > + *   Define the interface for driver-specific device memory
> > + *   operations: release memory, populate pfns, and copy to / from
> > + *   device memory.
> > + */
> > +
> > +/**
> > + * struct drm_pagemap_zdd - GPU SVM zone device data
> > + *
> > + * @refcount: Reference count for the zdd
> > + * @devmem_allocation: device memory allocation
> > + * @device_private_page_owner: Device private pages owner
> > + *
> > + * This structure serves as a generic wrapper installed in
> > + * page->zone_device_data. It provides infrastructure for looking up
> > + * a device memory allocation upon CPU page fault and asynchronously
> > + * releasing device memory once the CPU has no page references.
> > + * Asynchronous release is useful because CPU page references can be
> > + * dropped in IRQ contexts, while releasing device memory likely
> > + * requires sleeping locks.
> > + */
> > +struct drm_pagemap_zdd {
> > + struct kref refcount;
> > + struct drm_pagemap_devmem *devmem_allocation;
> > + void *device_private_page_owner;
> > +};
> > +
> > +/**
> > + * drm_pagemap_zdd_alloc() - Allocate a zdd structure.
> > + * @device_private_page_owner: Device private pages owner
> > + *
> > + * This function allocates and initializes a new zdd structure. It
> > + * sets up the reference count.
> > + *
> > + * Return: Pointer to the allocated zdd on success, NULL on failure.
> > + */
> > +static struct drm_pagemap_zdd *
> > +drm_pagemap_zdd_alloc(void *device_private_page_owner)
> > +{
> > + struct drm_pagemap_zdd *zdd;
> > +
> > + zdd = kmalloc(sizeof(*zdd), GFP_KERNEL);
> > + if (!zdd)
> > + return NULL;
> > +
> > + kref_init(&zdd->refcount);
> > + zdd->devmem_allocation = NULL;
> > +        zdd->device_private_page_owner = device_private_page_owner;
> > +
> > + return zdd;
> > +}
> > +
> > +/**
> > + * drm_pagemap_zdd_get() - Get a reference to a zdd structure.
> > + * @zdd: Pointer to the zdd structure.
> > + *
> > + * This function increments the reference count of the provided zdd
> > + * structure.
> > + *
> > + * Return: Pointer to the zdd structure.
> > + */
> > +static struct drm_pagemap_zdd *drm_pagemap_zdd_get(struct drm_pagemap_zdd *zdd)
> > +{
> > + kref_get(&zdd->refcount);
> > + return zdd;
> > +}
> > +
> > +/**
> > + * drm_pagemap_zdd_destroy() - Destroy a zdd structure.
> > + * @ref: Pointer to the reference count structure.
> > + *
> > + * This function destroys the zdd, signalling detach completion and
> > + * releasing any bound device memory allocation.
> > + */
> > +static void drm_pagemap_zdd_destroy(struct kref *ref)
> > +{
> > +        struct drm_pagemap_zdd *zdd =
> > +                container_of(ref, struct drm_pagemap_zdd, refcount);
> > +        struct drm_pagemap_devmem *devmem = zdd->devmem_allocation;
> > +
> > +        if (devmem) {
> > +                complete_all(&devmem->detached);
> > +                if (devmem->ops->devmem_release)
> > +                        devmem->ops->devmem_release(devmem);
> > +        }
> > +        kfree(zdd);
> > +}
> > +
> > +/**
> > + * drm_pagemap_zdd_put() - Put a zdd reference.
> > + * @zdd: Pointer to the zdd structure.
> > + *
> > + * This function decrements the reference count of the provided zdd
> > + * structure and destroys it if the count drops to zero.
> > + */
> > +static void drm_pagemap_zdd_put(struct drm_pagemap_zdd *zdd)
> > +{
> > + kref_put(&zdd->refcount, drm_pagemap_zdd_destroy);
> > +}
> > +
> > +/**
> > + * drm_pagemap_migration_unlock_put_page() - Put a migration page
> > + * @page: Pointer to the page to put
> > + *
> > + * This function unlocks and puts a page.
> > + */
> > +static void drm_pagemap_migration_unlock_put_page(struct page *page)
> > +{
> > + unlock_page(page);
> > + put_page(page);
> > +}
> > +
> > +/**
> > + * drm_pagemap_migration_unlock_put_pages() - Put migration pages
> > + * @npages: Number of pages
> > + * @migrate_pfn: Array of migrate page frame numbers
> > + *
> > + * This function unlocks and puts an array of pages.
> > + */
> > +static void drm_pagemap_migration_unlock_put_pages(unsigned long npages,
> > +                                                   unsigned long *migrate_pfn)
> > +{
> > + unsigned long i;
> > +
> > + for (i = 0; i < npages; ++i) {
> > + struct page *page;
> > +
> > + if (!migrate_pfn[i])
> > + continue;
> > +
> > + page = migrate_pfn_to_page(migrate_pfn[i]);
> > + drm_pagemap_migration_unlock_put_page(page);
> > + migrate_pfn[i] = 0;
> > + }
> > +}
> > +
> > +/**
> > + * drm_pagemap_get_devmem_page() - Get a reference to a device memory page
> > + * @page: Pointer to the page
> > + * @zdd: Pointer to the GPU SVM zone device data
> > + *
> > + * This function associates the given page with the specified GPU SVM
> > + * zone device data and initializes it for zone device usage.
> > + */
> > +static void drm_pagemap_get_devmem_page(struct page *page,
> > +                                        struct drm_pagemap_zdd *zdd)
> > +{
> > +        page->zone_device_data = drm_pagemap_zdd_get(zdd);
> > +        zone_device_page_init(page);
> > +}
> > +
> > +/**
> > + * drm_pagemap_migrate_map_pages() - Map migration pages for GPU SVM migration
> > + * @dev: The device for which the pages are being mapped
> > + * @dma_addr: Array to store DMA addresses corresponding to mapped pages
> > + * @migrate_pfn: Array of migrate page frame numbers to map
> > + * @npages: Number of pages to map
> > + * @dir: Direction of data transfer (e.g., DMA_BIDIRECTIONAL)
> > + *
> > + * This function maps pages of memory for migration usage in GPU SVM.
> > + * It iterates over each page frame number provided in @migrate_pfn,
> > + * maps the corresponding page, and stores the DMA address in the
> > + * provided @dma_addr array.
> > + *
> > + * Returns: 0 on success, -EFAULT if an error occurs during mapping.
> > + */
> > +static int drm_pagemap_migrate_map_pages(struct device *dev,
> > +                                         dma_addr_t *dma_addr,
> > +                                         unsigned long *migrate_pfn,
> > +                                         unsigned long npages,
> > +                                         enum dma_data_direction dir)
> > +{
> > +        unsigned long i;
> > +
> > +        for (i = 0; i < npages; ++i) {
> > +                struct page *page = migrate_pfn_to_page(migrate_pfn[i]);
> > +
> > +                if (!page)
> > +                        continue;
> > +
> > +                if (WARN_ON_ONCE(is_zone_device_page(page)))
> > +                        return -EFAULT;
> > +
> > +                dma_addr[i] = dma_map_page(dev, page, 0, PAGE_SIZE, dir);
> > +                if (dma_mapping_error(dev, dma_addr[i]))
> > +                        return -EFAULT;
> > +        }
> > +
> > +        return 0;
> > +}
> > +
> > +/**
> > + * drm_pagemap_migrate_unmap_pages() - Unmap pages previously mapped for GPU SVM migration
> > + * @dev: The device for which the pages were mapped
> > + * @dma_addr: Array of DMA addresses corresponding to mapped pages
> > + * @npages: Number of pages to unmap
> > + * @dir: Direction of data transfer (e.g., DMA_BIDIRECTIONAL)
> > + *
> > + * This function unmaps previously mapped pages of memory for GPU
> > + * Shared Virtual Memory (SVM). It iterates over each DMA address
> > + * provided in @dma_addr, checks if it's valid and not already
> > + * unmapped, and unmaps the corresponding page.
> > + */
> > +static void drm_pagemap_migrate_unmap_pages(struct device *dev,
> > +                                            dma_addr_t *dma_addr,
> > +                                            unsigned long npages,
> > +                                            enum dma_data_direction dir)
> > +{
> > +        unsigned long i;
> > +
> > +        for (i = 0; i < npages; ++i) {
> > +                if (!dma_addr[i] || dma_mapping_error(dev, dma_addr[i]))
> > +                        continue;
> > +
> > +                dma_unmap_page(dev, dma_addr[i], PAGE_SIZE, dir);
> > +        }
> > +}
> > +
> > +static unsigned long
> > +npages_in_range(unsigned long start, unsigned long end)
> > +{
> > + return (end - start) >> PAGE_SHIFT;
> > +}
> > +
> > +
> > +/**
> > + * drm_pagemap_migrate_to_devmem() - Migrate a struct mm_struct range to device memory
> > + * @devmem_allocation: The device memory allocation to migrate to.
> > + * The caller should hold a reference to the device memory allocation,
> > + * and the reference is consumed by this function unless it returns
> > + * with an error.
> > + * @mm: Pointer to the struct mm_struct.
> > + * @start: Start of the virtual address range to migrate.
> > + * @end: End of the virtual address range to migrate.
> > + * @timeslice_ms: The time requested for the migrated pages to be
> > + * present in @mm before being allowed to be migrated back.
> > + * @pgmap_owner: Not used currently, since only system memory is considered.
> > + *
> > + * This function migrates the specified virtual address range to
> > + * device memory. It performs the necessary setup and invokes the
> > + * driver-specific operations for migration to device memory. Expected
> > + * to be called while holding the mmap lock in at least read mode.
> > + *
> > + * Return: %0 on success, negative error code on failure.
> > + */
> > +int drm_pagemap_migrate_to_devmem(struct drm_pagemap_devmem *devmem_allocation,
> > +                                  struct mm_struct *mm,
> > +                                  unsigned long start, unsigned long end,
> > +                                  unsigned long timeslice_ms,
> > +                                  void *pgmap_owner)
> > +{
> > +        const struct drm_pagemap_devmem_ops *ops = devmem_allocation->ops;
> > +        struct migrate_vma migrate = {
> > +                .start = start,
> > +                .end = end,
> > +                .pgmap_owner = pgmap_owner,
> > +                .flags = MIGRATE_VMA_SELECT_SYSTEM,
> > +        };
> > +        unsigned long i, npages = npages_in_range(start, end);
> > +        struct vm_area_struct *vas;
> > +        struct drm_pagemap_zdd *zdd = NULL;
> > +        struct page **pages;
> > +        dma_addr_t *dma_addr;
> > +        void *buf;
> > +        int err;
> > +
> > +        mmap_assert_locked(mm);
> > +
> > +        if (!ops->populate_devmem_pfn || !ops->copy_to_devmem ||
> > +            !ops->copy_to_ram)
> > +                return -EOPNOTSUPP;
> > +
> > +        vas = vma_lookup(mm, start);
> > +        if (!vas) {
> > +                err = -ENOENT;
> > +                goto err_out;
> > +        }
> > +
> > +        if (end > vas->vm_end || start < vas->vm_start) {
> > +                err = -EINVAL;
> > +                goto err_out;
> > +        }
> > +
> > +        if (!vma_is_anonymous(vas)) {
> > +                err = -EBUSY;
> > +                goto err_out;
> > +        }
> > +
> > +        buf = kvcalloc(npages, 2 * sizeof(*migrate.src) + sizeof(*dma_addr) +
> > +                       sizeof(*pages), GFP_KERNEL);
> > +        if (!buf) {
> > +                err = -ENOMEM;
> > +                goto err_out;
> > +        }
> > +        dma_addr = buf + (2 * sizeof(*migrate.src) * npages);
> > +        pages = buf + (2 * sizeof(*migrate.src) + sizeof(*dma_addr)) * npages;
> > +
> > +        zdd = drm_pagemap_zdd_alloc(pgmap_owner);
> > +        if (!zdd) {
> > +                err = -ENOMEM;
> > +                goto err_free;
> > +        }
> > +
> > +        migrate.vma = vas;
> > +        migrate.src = buf;
> > +        migrate.dst = migrate.src + npages;
> > +
> > +        err = migrate_vma_setup(&migrate);
> > +        if (err)
> > +                goto err_free;
> > +
> > +        if (!migrate.cpages) {
> > +                err = -EFAULT;
> > +                goto err_free;
> > +        }
> > +
> > +        if (migrate.cpages != npages) {
> > +                err = -EBUSY;
> > +                goto err_finalize;
> > +        }
> > +
> > +        err = ops->populate_devmem_pfn(devmem_allocation, npages, migrate.dst);
> > +        if (err)
> > +                goto err_finalize;
> > +
> > +        err = drm_pagemap_migrate_map_pages(devmem_allocation->dev, dma_addr,
> > +                                            migrate.src, npages, DMA_TO_DEVICE);
> > +        if (err)
> > +                goto err_finalize;
> > +
> > +        for (i = 0; i < npages; ++i) {
> > +                struct page *page = pfn_to_page(migrate.dst[i]);
> > +
> > +                pages[i] = page;
> > +                migrate.dst[i] = migrate_pfn(migrate.dst[i]);
> > +                drm_pagemap_get_devmem_page(page, zdd);
> > +        }
> > +
> > +        err = ops->copy_to_devmem(pages, dma_addr, npages);
> > +        if (err)
> > +                goto err_finalize;
> > +
> > +        /* Upon success bind devmem allocation to range and zdd */
> > +        devmem_allocation->timeslice_expiration = get_jiffies_64() +
> > +                msecs_to_jiffies(timeslice_ms);
> > +        zdd->devmem_allocation = devmem_allocation;     /* Owns ref */
> > +
> > +err_finalize:
> > +        if (err)
> > +                drm_pagemap_migration_unlock_put_pages(npages, migrate.dst);
> > +        migrate_vma_pages(&migrate);
> > +        migrate_vma_finalize(&migrate);
> > +        drm_pagemap_migrate_unmap_pages(devmem_allocation->dev, dma_addr, npages,
> > +                                        DMA_TO_DEVICE);
> > +err_free:
> > +        if (zdd)
> > +                drm_pagemap_zdd_put(zdd);
> > +        kvfree(buf);
> > +err_out:
> > +        return err;
> > +}
> > +EXPORT_SYMBOL_GPL(drm_pagemap_migrate_to_devmem);
> > +
> > +/**
> > + * drm_pagemap_migrate_populate_ram_pfn() - Populate RAM PFNs for a VM area
> > + * @vas: Pointer to the VM area structure, can be NULL
> > + * @fault_page: Fault page
> > + * @npages: Number of pages to populate
> > + * @mpages: Number of pages to migrate
> > + * @src_mpfn: Source array of migrate PFNs
> > + * @mpfn: Array of migrate PFNs to populate
> > + * @addr: Start address for PFN allocation
> > + *
> > + * This function populates the RAM migrate page frame numbers (PFNs)
> > + * for the specified VM area structure. It allocates and locks pages
> > + * in the VM area for RAM usage. If @vas is non-NULL, alloc_page_vma()
> > + * is used for allocation; if NULL, alloc_page() is used.
> > + *
> > + * Return: 0 on success, negative error code on failure.
> > + */
> > +static int drm_pagemap_migrate_populate_ram_pfn(struct vm_area_struct *vas,
> > +                                                struct page *fault_page,
> > +                                                unsigned long npages,
> > +                                                unsigned long *mpages,
> > +                                                unsigned long *src_mpfn,
> > +                                                unsigned long *mpfn,
> > +                                                unsigned long addr)
> > +{
> > + unsigned long i;
> > +
> > + for (i = 0; i < npages; ++i, addr += PAGE_SIZE) {
> > + struct page *page, *src_page;
> > +
> > + if (!(src_mpfn[i] & MIGRATE_PFN_MIGRATE))
> > + continue;
> > +
> > + src_page = migrate_pfn_to_page(src_mpfn[i]);
> > + if (!src_page)
> > + continue;
> > +
> > +                if (fault_page) {
> > +                        if (src_page->zone_device_data !=
> > +                            fault_page->zone_device_data)
> > +                                continue;
> > +                }
> > +
> > +                if (vas)
> > +                        page = alloc_page_vma(GFP_HIGHUSER, vas, addr);
> > +                else
> > +                        page = alloc_page(GFP_HIGHUSER);
> > +
> > + if (!page)
> > + goto free_pages;
> > +
> > + mpfn[i] = migrate_pfn(page_to_pfn(page));
> > + }
> > +
> > + for (i = 0; i < npages; ++i) {
> > + struct page *page = migrate_pfn_to_page(mpfn[i]);
> > +
> > + if (!page)
> > + continue;
> > +
> > + WARN_ON_ONCE(!trylock_page(page));
> > + ++*mpages;
> > + }
> > +
> > + return 0;
> > +
> > +free_pages:
> > + for (i = 0; i < npages; ++i) {
> > + struct page *page = migrate_pfn_to_page(mpfn[i]);
> > +
> > + if (!page)
> > + continue;
> > +
> > + put_page(page);
> > + mpfn[i] = 0;
> > + }
> > + return -ENOMEM;
> > +}
> > +
> > +/**
> > + * drm_pagemap_evict_to_ram() - Evict GPU SVM range to RAM
> > + * @devmem_allocation: Pointer to the device memory allocation
> > + *
> > + * Similar to __drm_pagemap_migrate_to_ram() but does not require the
> > + * mmap lock; migration is done via the migrate_device_* functions.
> > + *
> > + * Return: 0 on success, negative error code on failure.
> > + */
> > +int drm_pagemap_evict_to_ram(struct drm_pagemap_devmem *devmem_allocation)
> > +{
> > +        const struct drm_pagemap_devmem_ops *ops = devmem_allocation->ops;
> > + unsigned long npages, mpages = 0;
> > + struct page **pages;
> > + unsigned long *src, *dst;
> > + dma_addr_t *dma_addr;
> > + void *buf;
> > + int i, err = 0;
> > + unsigned int retry_count = 2;
> > +
> > + npages = devmem_allocation->size >> PAGE_SHIFT;
> > +
> > +retry:
> > + if (!mmget_not_zero(devmem_allocation->mm))
> > + return -EFAULT;
> > +
> > +        buf = kvcalloc(npages, 2 * sizeof(*src) + sizeof(*dma_addr) +
> > +                       sizeof(*pages), GFP_KERNEL);
> > +        if (!buf) {
> > +                err = -ENOMEM;
> > +                goto err_out;
> > +        }
> > +        src = buf;
> > +        dst = buf + (sizeof(*src) * npages);
> > +        dma_addr = buf + (2 * sizeof(*src) * npages);
> > +        pages = buf + (2 * sizeof(*src) + sizeof(*dma_addr)) * npages;
> > +
> > +        err = ops->populate_devmem_pfn(devmem_allocation, npages, src);
> > + if (err)
> > + goto err_free;
> > +
> > + err = migrate_device_pfns(src, npages);
> > + if (err)
> > + goto err_free;
> > +
> > +        err = drm_pagemap_migrate_populate_ram_pfn(NULL, NULL, npages, &mpages,
> > +                                                   src, dst, 0);
> > +        if (err || !mpages)
> > +                goto err_finalize;
> > +
> > +        err = drm_pagemap_migrate_map_pages(devmem_allocation->dev, dma_addr,
> > +                                            dst, npages, DMA_FROM_DEVICE);
> > +        if (err)
> > +                goto err_finalize;
> > +
> > + for (i = 0; i < npages; ++i)
> > + pages[i] = migrate_pfn_to_page(src[i]);
> > +
> > + err = ops->copy_to_ram(pages, dma_addr, npages);
> > + if (err)
> > + goto err_finalize;
> > +
> > +err_finalize:
> > +        if (err)
> > +                drm_pagemap_migration_unlock_put_pages(npages, dst);
> > +        migrate_device_pages(src, dst, npages);
> > +        migrate_device_finalize(src, dst, npages);
> > +        drm_pagemap_migrate_unmap_pages(devmem_allocation->dev, dma_addr, npages,
> > +                                        DMA_FROM_DEVICE);
> > +err_free:
> > + kvfree(buf);
> > +err_out:
> > + mmput_async(devmem_allocation->mm);
> > +
> > + if (completion_done(&devmem_allocation->detached))
> > + return 0;
> > +
> > + if (retry_count--) {
> > + cond_resched();
> > + goto retry;
> > + }
> > +
> > + return err ?: -EBUSY;
> > +}
> > +EXPORT_SYMBOL_GPL(drm_pagemap_evict_to_ram);
> > +
> > +/**
> > + * __drm_pagemap_migrate_to_ram() - Migrate GPU SVM range to RAM (internal)
> > + * @vas: Pointer to the VM area structure
> > + * @device_private_page_owner: Device private pages owner
> > + * @page: Pointer to the page for fault handling (can be NULL)
> > + * @fault_addr: Fault address
> > + * @size: Size of migration
> > + *
> > + * This internal function performs the migration of the specified GPU
> > + * SVM range to RAM. It sets up the migration, populates + dma maps
> > + * RAM PFNs, and invokes the driver-specific operations for migration
> > + * to RAM.
> > + *
> > + * Return: 0 on success, negative error code on failure.
> > + */
> > +static int __drm_pagemap_migrate_to_ram(struct vm_area_struct *vas,
> > +                                        void *device_private_page_owner,
> > +                                        struct page *page,
> > +                                        unsigned long fault_addr,
> > +                                        unsigned long size)
> > +{
> > +        struct migrate_vma migrate = {
> > +                .vma = vas,
> > +                .pgmap_owner = device_private_page_owner,
> > +                .flags = MIGRATE_VMA_SELECT_DEVICE_PRIVATE |
> > +                         MIGRATE_VMA_SELECT_DEVICE_COHERENT,
> > +                .fault_page = page,
> > +        };
> > +        struct drm_pagemap_zdd *zdd;
> > +        const struct drm_pagemap_devmem_ops *ops;
> > +        struct device *dev = NULL;
> > +        unsigned long npages, mpages = 0;
> > +        struct page **pages;
> > +        dma_addr_t *dma_addr;
> > +        unsigned long start, end;
> > +        void *buf;
> > +        int i, err = 0;
> > +
> > +        if (page) {
> > +                zdd = page->zone_device_data;
> > +                if (time_before64(get_jiffies_64(),
> > +                                  zdd->devmem_allocation->timeslice_expiration))
> > +                        return 0;
> > +        }
> > +
> > + start = ALIGN_DOWN(fault_addr, size);
> > + end = ALIGN(fault_addr + 1, size);
> > +
> > +        /* Corner case where the VMA has been partially unmapped */
> > + if (start < vas->vm_start)
> > + start = vas->vm_start;
> > + if (end > vas->vm_end)
> > + end = vas->vm_end;
> > +
> > + migrate.start = start;
> > + migrate.end = end;
> > + npages = npages_in_range(start, end);
> > +
> > +        buf = kvcalloc(npages, 2 * sizeof(*migrate.src) + sizeof(*dma_addr) +
> > +                       sizeof(*pages), GFP_KERNEL);
> > +        if (!buf) {
> > +                err = -ENOMEM;
> > +                goto err_out;
> > +        }
> > +        dma_addr = buf + (2 * sizeof(*migrate.src) * npages);
> > +        pages = buf + (2 * sizeof(*migrate.src) + sizeof(*dma_addr)) * npages;
> > +
> > + migrate.vma = vas;
> > + migrate.src = buf;
> > + migrate.dst = migrate.src + npages;
> > +
> > + err = migrate_vma_setup(&migrate);
> > + if (err)
> > + goto err_free;
> > +
> > + /* Raced with another CPU fault, nothing to do */
> > + if (!migrate.cpages)
> > + goto err_free;
> > +
> > +        if (!page) {
> > +                for (i = 0; i < npages; ++i) {
> > +                        if (!(migrate.src[i] & MIGRATE_PFN_MIGRATE))
> > +                                continue;
> > +
> > +                        page = migrate_pfn_to_page(migrate.src[i]);
> > +                        break;
> > +                }
> > +
> > +                if (!page)
> > +                        goto err_finalize;
> > +        }
> > + zdd = page->zone_device_data;
> > + ops = zdd->devmem_allocation->ops;
> > + dev = zdd->devmem_allocation->dev;
> > +
> > +        err = drm_pagemap_migrate_populate_ram_pfn(vas, page, npages, &mpages,
> > +                                                   migrate.src, migrate.dst,
> > +                                                   start);
> > +        if (err)
> > +                goto err_finalize;
> > +
> > +        err = drm_pagemap_migrate_map_pages(dev, dma_addr, migrate.dst, npages,
> > +                                            DMA_FROM_DEVICE);
> > +        if (err)
> > +                goto err_finalize;
> > +
> > + for (i = 0; i < npages; ++i)
> > + pages[i] = migrate_pfn_to_page(migrate.src[i]);
> > +
> > + err = ops->copy_to_ram(pages, dma_addr, npages);
> > + if (err)
> > + goto err_finalize;
> > +
> > +err_finalize:
> > +        if (err)
> > +                drm_pagemap_migration_unlock_put_pages(npages, migrate.dst);
> > +        migrate_vma_pages(&migrate);
> > +        migrate_vma_finalize(&migrate);
> > +        if (dev)
> > +                drm_pagemap_migrate_unmap_pages(dev, dma_addr, npages,
> > +                                                DMA_FROM_DEVICE);
> > +err_free:
> > + kvfree(buf);
> > +err_out:
> > +
> > + return err;
> > +}
> > +
> > +/**
> > + * drm_pagemap_page_free() - Put GPU SVM zone device data associated with a page
> > + * @page: Pointer to the page
> > + *
> > + * This function is a callback used to put the GPU SVM zone device
> > + * data associated with a page when it is being released.
> > + */
> > +static void drm_pagemap_page_free(struct page *page)
> > +{
> > + drm_pagemap_zdd_put(page->zone_device_data);
> > +}
> > +
> > +/**
> > + * drm_pagemap_migrate_to_ram() - Migrate a virtual range to RAM (page fault handler)
> > + * @vmf: Pointer to the fault information structure
> > + *
> > + * This function is a page fault handler used to migrate a virtual
> > + * range to RAM. The device memory allocation in which the device page
> > + * is found is migrated in its entirety.
> > + *
> > + * Returns:
> > + * VM_FAULT_SIGBUS on failure, 0 on success.
> > + */
> > +static vm_fault_t drm_pagemap_migrate_to_ram(struct vm_fault *vmf)
> > +{
> > +        struct drm_pagemap_zdd *zdd = vmf->page->zone_device_data;
> > +        int err;
> > +
> > +        err = __drm_pagemap_migrate_to_ram(vmf->vma,
> > +                                           zdd->device_private_page_owner,
> > +                                           vmf->page, vmf->address,
> > +                                           zdd->devmem_allocation->size);
> > +
> > +        return err ? VM_FAULT_SIGBUS : 0;
> > +}
> > +
> > +static const struct dev_pagemap_ops drm_pagemap_pagemap_ops = {
> > + .page_free = drm_pagemap_page_free,
> > + .migrate_to_ram = drm_pagemap_migrate_to_ram,
> > +};
> > +
> > +/**
> > + * drm_pagemap_pagemap_ops_get() - Retrieve GPU SVM device page map operations
> > + *
> > + * Returns:
> > + * Pointer to the GPU SVM device page map operations structure.
> > + */
> > +const struct dev_pagemap_ops *drm_pagemap_pagemap_ops_get(void)
> > +{
> > + return &drm_pagemap_pagemap_ops;
> > +}
> > +EXPORT_SYMBOL_GPL(drm_pagemap_pagemap_ops_get);
> > +
> > +/**
> > + * drm_pagemap_devmem_init() - Initialize a drm_pagemap device memory allocation
> > + *
> > + * @devmem_allocation: The struct drm_pagemap_devmem to initialize.
> > + * @dev: Pointer to the device structure which device memory allocation belongs to
> > + * @mm: Pointer to the mm_struct for the address space
> > + * @ops: Pointer to the operations structure for GPU SVM device memory
> > + * @dpagemap: The struct drm_pagemap we're allocating from.
> > + * @size: Size of device memory allocation
> > + */
> > +void drm_pagemap_devmem_init(struct drm_pagemap_devmem *devmem_allocation,
> > +                             struct device *dev, struct mm_struct *mm,
> > +                             const struct drm_pagemap_devmem_ops *ops,
> > +                             struct drm_pagemap *dpagemap, size_t size)
> > +{
> > + init_completion(&devmem_allocation->detached);
> > + devmem_allocation->dev = dev;
> > + devmem_allocation->mm = mm;
> > + devmem_allocation->ops = ops;
> > + devmem_allocation->dpagemap = dpagemap;
> > + devmem_allocation->size = size;
> > +}
> > +EXPORT_SYMBOL_GPL(drm_pagemap_devmem_init);
> > +
> > +/**
> > + * drm_pagemap_page_to_dpagemap() - Return a pointer to the drm_pagemap of a page
> > + * @page: The struct page.
> > + *
> > + * Return: A pointer to the struct drm_pagemap of a device private
> > + * page that was populated from the struct drm_pagemap. If the page
> > + * was *not* populated from a struct drm_pagemap, the result is
> > + * undefined and the function call may result in dereferencing an
> > + * invalid address.
> > + */
> > +struct drm_pagemap *drm_pagemap_page_to_dpagemap(struct page *page)
> > +{
> > + struct drm_pagemap_zdd *zdd = page->zone_device_data;
> > +
> > + return zdd->devmem_allocation->dpagemap;
> > +}
> > +EXPORT_SYMBOL_GPL(drm_pagemap_page_to_dpagemap);
> > diff --git a/drivers/gpu/drm/xe/Kconfig b/drivers/gpu/drm/xe/Kconfig
> > index 98b46c534278..c7c71734460b 100644
> > --- a/drivers/gpu/drm/xe/Kconfig
> > +++ b/drivers/gpu/drm/xe/Kconfig
> > @@ -87,14 +87,16 @@ config DRM_XE_GPUSVM
> >
> > If in doubut say "Y".
> >
> > -config DRM_XE_DEVMEM_MIRROR
> > - bool "Enable device memory mirror"
> > +config DRM_XE_PAGEMAP
> > + bool "Enable device memory pool for SVM"
> > depends on DRM_XE_GPUSVM
> > select GET_FREE_REGION
> > default y
> > help
> > -	  Disable this option only if you want to compile out without device
> > -	  memory mirror. Will reduce KMD memory footprint when disabled.
> > +	  Disable this option only if you don't want to expose local device
> > +	  memory for SVM. Will reduce KMD memory footprint when disabled.
> > +
> > +	  If in doubt say "Y".
> >
> > config DRM_XE_FORCE_PROBE
> > string "Force probe xe for selected Intel hardware IDs"
> > diff --git a/drivers/gpu/drm/xe/xe_bo_types.h b/drivers/gpu/drm/xe/xe_bo_types.h
> > index eb5e83c5f233..e0efaf23d051 100644
> > --- a/drivers/gpu/drm/xe/xe_bo_types.h
> > +++ b/drivers/gpu/drm/xe/xe_bo_types.h
> > @@ -86,7 +86,7 @@ struct xe_bo {
> > u16 cpu_caching;
> >
> > /** @devmem_allocation: SVM device memory allocation */
> > - struct drm_gpusvm_devmem devmem_allocation;
> > + struct drm_pagemap_devmem devmem_allocation;
> >
> >  	/** @vram_userfault_link: Link into @mem_access.vram_userfault.list */
> > struct list_head vram_userfault_link;
> > diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
> > index b93c04466637..67b7f733dd69 100644
> > --- a/drivers/gpu/drm/xe/xe_device_types.h
> > +++ b/drivers/gpu/drm/xe/xe_device_types.h
> > @@ -104,7 +104,7 @@ struct xe_vram_region {
> > void __iomem *mapping;
> > /** @ttm: VRAM TTM manager */
> > struct xe_ttm_vram_mgr ttm;
> > -#if IS_ENABLED(CONFIG_DRM_XE_DEVMEM_MIRROR)
> > +#if IS_ENABLED(CONFIG_DRM_XE_PAGEMAP)
> >  	/** @pagemap: Used to remap device memory as ZONE_DEVICE */
> > struct dev_pagemap pagemap;
> > /**
> > diff --git a/drivers/gpu/drm/xe/xe_svm.c b/drivers/gpu/drm/xe/xe_svm.c
> > index f27fb9b588de..e161ce3e67a1 100644
> > --- a/drivers/gpu/drm/xe/xe_svm.c
> > +++ b/drivers/gpu/drm/xe/xe_svm.c
> > @@ -329,7 +329,7 @@ static void xe_svm_garbage_collector_work_func(struct work_struct *w)
> > up_write(&vm->lock);
> > }
> >
> > -#if IS_ENABLED(CONFIG_DRM_XE_DEVMEM_MIRROR)
> > +#if IS_ENABLED(CONFIG_DRM_XE_PAGEMAP)
> >
> > static struct xe_vram_region *page_to_vr(struct page *page)
> > {
> > @@ -517,12 +517,12 @@ static int xe_svm_copy_to_ram(struct page **pages, dma_addr_t *dma_addr,
> >  	return xe_svm_copy(pages, dma_addr, npages, XE_SVM_COPY_TO_SRAM);
> > }
> >
> > -static struct xe_bo *to_xe_bo(struct drm_gpusvm_devmem *devmem_allocation)
> > +static struct xe_bo *to_xe_bo(struct drm_pagemap_devmem *devmem_allocation)
> >  {
> >  	return container_of(devmem_allocation, struct xe_bo, devmem_allocation);
> >  }
> >
> > -static void xe_svm_devmem_release(struct drm_gpusvm_devmem *devmem_allocation)
> > +static void xe_svm_devmem_release(struct drm_pagemap_devmem *devmem_allocation)
> > {
> > struct xe_bo *bo = to_xe_bo(devmem_allocation);
> >
> > @@ -539,7 +539,7 @@ static struct drm_buddy *tile_to_buddy(struct xe_tile *tile)
> > return &tile->mem.vram.ttm.mm;
> > }
> >
> > -static int xe_svm_populate_devmem_pfn(struct drm_gpusvm_devmem *devmem_allocation,
> > +static int xe_svm_populate_devmem_pfn(struct drm_pagemap_devmem *devmem_allocation,
> >  				      unsigned long npages, unsigned long *pfn)
> >  {
> > struct xe_bo *bo = to_xe_bo(devmem_allocation);
> > @@ -562,7 +562,7 @@ static int xe_svm_populate_devmem_pfn(struct drm_gpusvm_devmem *devmem_allocatio
> > return 0;
> > }
> >
> > -static const struct drm_gpusvm_devmem_ops gpusvm_devmem_ops = {
> > +static const struct drm_pagemap_devmem_ops dpagemap_devmem_ops = {
> > .devmem_release = xe_svm_devmem_release,
> > .populate_devmem_pfn = xe_svm_populate_devmem_pfn,
> > .copy_to_devmem = xe_svm_copy_to_devmem,
> > @@ -714,7 +714,7 @@ u64 xe_svm_find_vma_start(struct xe_vm *vm, u64 start, u64 end, struct xe_vma *v
> >  					 min(end, xe_vma_end(vma)));
> > }
> >
> > -#if IS_ENABLED(CONFIG_DRM_XE_DEVMEM_MIRROR)
> > +#if IS_ENABLED(CONFIG_DRM_XE_PAGEMAP)
> > static struct xe_vram_region *tile_to_vr(struct xe_tile *tile)
> > {
> > return &tile->mem.vram;
> > @@ -742,6 +742,9 @@ int xe_svm_alloc_vram(struct xe_vm *vm, struct xe_tile *tile,
> > ktime_t end = 0;
> > int err;
> >
> > + if (!range->base.flags.migrate_devmem)
> > + return -EINVAL;
> > +
> > range_debug(range, "ALLOCATE VRAM");
> >
> > if (!mmget_not_zero(mm))
> > @@ -761,19 +764,23 @@ int xe_svm_alloc_vram(struct xe_vm *vm, struct xe_tile *tile,
> > goto unlock;
> > }
> >
> > -	drm_gpusvm_devmem_init(&bo->devmem_allocation,
> > -			       vm->xe->drm.dev, mm,
> > -			       &gpusvm_devmem_ops,
> > -			       &tile->mem.vram.dpagemap,
> > -			       xe_svm_range_size(range));
> > +	drm_pagemap_devmem_init(&bo->devmem_allocation,
> > +				vm->xe->drm.dev, mm,
> > +				&dpagemap_devmem_ops,
> > +				&tile->mem.vram.dpagemap,
> > +				xe_svm_range_size(range));
> >
> >  	blocks = &to_xe_ttm_vram_mgr_resource(bo->ttm.resource)->blocks;
> > list_for_each_entry(block, blocks, link)
> > block->private = vr;
> >
> > xe_bo_get(bo);
> > -	err = drm_gpusvm_migrate_to_devmem(&vm->svm.gpusvm, &range->base,
> > -					   &bo->devmem_allocation, ctx);
> > +	err = drm_pagemap_migrate_to_devmem(&bo->devmem_allocation, mm,
> > +					    xe_svm_range_start(range),
> > +					    xe_svm_range_end(range),
> > +					    ctx->timeslice_ms,
> > +					    xe_svm_devm_owner(vm->xe));
> > if (err)
> > xe_svm_devmem_release(&bo->devmem_allocation);
> >
> > @@ -848,13 +855,13 @@ int xe_svm_handle_pagefault(struct xe_vm *vm, struct xe_vma *vma,
> >  	struct drm_gpusvm_ctx ctx = {
> >  		.read_only = xe_vma_read_only(vma),
> >  		.devmem_possible = IS_DGFX(vm->xe) &&
> > -			IS_ENABLED(CONFIG_DRM_XE_DEVMEM_MIRROR),
> > -		.check_pages_threshold = IS_DGFX(vm->xe) &&
> > -			IS_ENABLED(CONFIG_DRM_XE_DEVMEM_MIRROR) ? SZ_64K : 0,
> > +			IS_ENABLED(CONFIG_DRM_XE_PAGEMAP),
> > +		.check_pages_threshold = IS_DGFX(vm->xe) &&
> > +			IS_ENABLED(CONFIG_DRM_XE_PAGEMAP) ? SZ_64K : 0,
> >  		.devmem_only = atomic && IS_DGFX(vm->xe) &&
> > -			IS_ENABLED(CONFIG_DRM_XE_DEVMEM_MIRROR),
> > +			IS_ENABLED(CONFIG_DRM_XE_PAGEMAP),
> >  		.timeslice_ms = atomic && IS_DGFX(vm->xe) &&
> > -			IS_ENABLED(CONFIG_DRM_XE_DEVMEM_MIRROR) ?
> > +			IS_ENABLED(CONFIG_DRM_XE_PAGEMAP) ?
> >  			vm->xe->atomic_svm_timeslice_ms : 0,
> >  	};
> > struct xe_svm_range *range;
> > @@ -992,7 +999,7 @@ bool xe_svm_has_mapping(struct xe_vm *vm, u64 start, u64 end)
> > */
> > int xe_svm_bo_evict(struct xe_bo *bo)
> > {
> > -	return drm_gpusvm_evict_to_ram(&bo->devmem_allocation);
> > +	return drm_pagemap_evict_to_ram(&bo->devmem_allocation);
> > }
> >
> > /**
> > @@ -1045,7 +1052,7 @@ int xe_svm_range_get_pages(struct xe_vm *vm, struct xe_svm_range *range,
> > return err;
> > }
> >
> > -#if IS_ENABLED(CONFIG_DRM_XE_DEVMEM_MIRROR)
> > +#if IS_ENABLED(CONFIG_DRM_XE_PAGEMAP)
> >
> > static struct drm_pagemap_device_addr
> > xe_drm_pagemap_device_map(struct drm_pagemap *dpagemap,
> > @@ -1102,7 +1109,7 @@ int xe_devm_add(struct xe_tile *tile, struct xe_vram_region *vr)
> > vr->pagemap.range.start = res->start;
> > vr->pagemap.range.end = res->end;
> > vr->pagemap.nr_range = 1;
> > - vr->pagemap.ops = drm_gpusvm_pagemap_ops_get();
> > + vr->pagemap.ops = drm_pagemap_pagemap_ops_get();
> > vr->pagemap.owner = xe_svm_devm_owner(xe);
> > addr = devm_memremap_pages(dev, &vr->pagemap);
> >
> > diff --git a/include/drm/drm_gpusvm.h b/include/drm/drm_gpusvm.h
> > index 6a5156476bf4..4aedc5423aff 100644
> > --- a/include/drm/drm_gpusvm.h
> > +++ b/include/drm/drm_gpusvm.h
> > @@ -16,91 +16,9 @@ struct drm_gpusvm;
> > struct drm_gpusvm_notifier;
> > struct drm_gpusvm_ops;
> > struct drm_gpusvm_range;
> > -struct drm_gpusvm_devmem;
> > struct drm_pagemap;
> > struct drm_pagemap_device_addr;
> >
> > -/**
> > - * struct drm_gpusvm_devmem_ops - Operations structure for GPU SVM device memory
> > - *
> > - * This structure defines the operations for GPU Shared Virtual Memory (SVM)
> > - * device memory. These operations are provided by the GPU driver to manage device memory
> > - * allocations and perform operations such as migration between device memory and system
> > - * RAM.
> > -struct drm_gpusvm_devmem_ops {
> > -	/**
> > -	 * @devmem_release: Release device memory allocation (optional)
> > -	 * @devmem_allocation: device memory allocation
> > -	 *
> > -	 * Release device memory allocation and drop a reference to device
> > -	 * memory allocation.
> > -	 */
> > -	void (*devmem_release)(struct drm_gpusvm_devmem *devmem_allocation);
> > -
> > -	/**
> > -	 * @populate_devmem_pfn: Populate device memory PFN (required for migration)
> > -	 * @devmem_allocation: device memory allocation
> > -	 * @npages: Number of pages to populate
> > -	 * @pfn: Array of page frame numbers to populate
> > -	 *
> > -	 * Populate device memory page frame numbers (PFN).
> > -	 *
> > -	 * Return: 0 on success, a negative error code on failure.
> > -	 */
> > -	int (*populate_devmem_pfn)(struct drm_gpusvm_devmem *devmem_allocation,
> > -				   unsigned long npages, unsigned long *pfn);
> > -
> > -	/**
> > -	 * @copy_to_devmem: Copy to device memory (required for migration)
> > -	 * @pages: Pointer to array of device memory pages (destination)
> > -	 * @dma_addr: Pointer to array of DMA addresses (source)
> > -	 * @npages: Number of pages to copy
> > -	 *
> > -	 * Copy pages to device memory.
> > -	 *
> > -	 * Return: 0 on success, a negative error code on failure.
> > -	 */
> > -	int (*copy_to_devmem)(struct page **pages,
> > -			      dma_addr_t *dma_addr,
> > -			      unsigned long npages);
> > -
> > -	/**
> > -	 * @copy_to_ram: Copy to system RAM (required for migration)
> > -	 * @pages: Pointer to array of device memory pages (source)
> > -	 * @dma_addr: Pointer to array of DMA addresses (destination)
> > -	 * @npages: Number of pages to copy
> > -	 *
> > -	 * Copy pages to system RAM.
> > -	 *
> > -	 * Return: 0 on success, a negative error code on failure.
> > -	 */
> > -	int (*copy_to_ram)(struct page **pages,
> > -			   dma_addr_t *dma_addr,
> > -			   unsigned long npages);
> > -};
> > -
> > -/**
> > - * struct drm_gpusvm_devmem - Structure representing a GPU SVM device memory allocation
> > - *
> > - * @dev: Pointer to the device structure which device memory allocation belongs to
> > - * @mm: Pointer to the mm_struct for the address space
> > - * @detached: device memory allocations is detached from device pages
> > - * @ops: Pointer to the operations structure for GPU SVM device memory
> > - * @dpagemap: The struct drm_pagemap of the pages this allocation belongs to.
> > - * @size: Size of device memory allocation
> > - * @timeslice_expiration: Timeslice expiration in jiffies
> > - */
> > -struct drm_gpusvm_devmem {
> > - struct device *dev;
> > - struct mm_struct *mm;
> > - struct completion detached;
> > - const struct drm_gpusvm_devmem_ops *ops;
> > - struct drm_pagemap *dpagemap;
> > - size_t size;
> > - u64 timeslice_expiration;
> > -};
> > -
> > /**
> > * struct drm_gpusvm_ops - Operations structure for GPU SVM
> > *
> > @@ -361,15 +279,6 @@ void drm_gpusvm_range_unmap_pages(struct drm_gpusvm *gpusvm,
> >  				  struct drm_gpusvm_range *range,
> >  				  const struct drm_gpusvm_ctx *ctx);
> >
> > -int drm_gpusvm_migrate_to_devmem(struct drm_gpusvm *gpusvm,
> > -				 struct drm_gpusvm_range *range,
> > -				 struct drm_gpusvm_devmem *devmem_allocation,
> > -				 const struct drm_gpusvm_ctx *ctx);
> > -
> > -int drm_gpusvm_evict_to_ram(struct drm_gpusvm_devmem *devmem_allocation);
> > -
> > -const struct dev_pagemap_ops *drm_gpusvm_pagemap_ops_get(void);
> > -
> >  bool drm_gpusvm_has_mapping(struct drm_gpusvm *gpusvm, unsigned long start,
> >  			    unsigned long end);
> > unsigned long end);
> >
> > @@ -380,11 +289,6 @@ drm_gpusvm_range_find(struct drm_gpusvm_notifier *notifier, unsigned long start,
> > void drm_gpusvm_range_set_unmapped(struct drm_gpusvm_range *range,
> >  				   const struct mmu_notifier_range *mmu_range);
> >
> > -void drm_gpusvm_devmem_init(struct drm_gpusvm_devmem *devmem_allocation,
> > -			    struct device *dev, struct mm_struct *mm,
> > -			    const struct drm_gpusvm_devmem_ops *ops,
> > -			    struct drm_pagemap *dpagemap, size_t size);
> > -
> > #ifdef CONFIG_LOCKDEP
> > /**
> >  * drm_gpusvm_driver_set_lock() - Set the lock protecting accesses to GPU SVM
> > diff --git a/include/drm/drm_pagemap.h b/include/drm/drm_pagemap.h
> > index 202c157ff4d7..dabc9c365df4 100644
> > --- a/include/drm/drm_pagemap.h
> > +++ b/include/drm/drm_pagemap.h
> > @@ -7,6 +7,7 @@
> > #include <linux/types.h>
> >
> > struct drm_pagemap;
> > +struct drm_pagemap_zdd;
> > struct device;
> >
> > /**
> > @@ -104,4 +105,104 @@ struct drm_pagemap {
> > struct device *dev;
> > };
> >
> > +struct drm_pagemap_devmem;
> > +
> > +/**
> > + * struct drm_pagemap_devmem_ops - Operations structure for GPU SVM device memory
> > + *
> > + * This structure defines the operations for GPU Shared Virtual Memory (SVM)
> > + * device memory. These operations are provided by the GPU driver to manage device memory
> > + * allocations and perform operations such as migration between device memory and system
> > + * RAM.
> > + */
> > +struct drm_pagemap_devmem_ops {
> > +	/**
> > +	 * @devmem_release: Release device memory allocation (optional)
> > +	 * @devmem_allocation: device memory allocation
> > +	 *
> > +	 * Release device memory allocation and drop a reference to device
> > +	 * memory allocation.
> > +	 */
> > +	void (*devmem_release)(struct drm_pagemap_devmem *devmem_allocation);
> > +
> > +	/**
> > +	 * @populate_devmem_pfn: Populate device memory PFN (required for migration)
> > +	 * @devmem_allocation: device memory allocation
> > +	 * @npages: Number of pages to populate
> > +	 * @pfn: Array of page frame numbers to populate
> > +	 *
> > +	 * Populate device memory page frame numbers (PFN).
> > +	 *
> > +	 * Return: 0 on success, a negative error code on failure.
> > +	 */
> > +	int (*populate_devmem_pfn)(struct drm_pagemap_devmem *devmem_allocation,
> > +				   unsigned long npages, unsigned long *pfn);
> > +
> > +	/**
> > +	 * @copy_to_devmem: Copy to device memory (required for migration)
> > +	 * @pages: Pointer to array of device memory pages (destination)
> > +	 * @dma_addr: Pointer to array of DMA addresses (source)
> > +	 * @npages: Number of pages to copy
> > +	 *
> > +	 * Copy pages to device memory.
> > +	 *
> > +	 * Return: 0 on success, a negative error code on failure.
> > +	 */
> > +	int (*copy_to_devmem)(struct page **pages,
> > +			      dma_addr_t *dma_addr,
> > +			      unsigned long npages);
> > +
> > +	/**
> > +	 * @copy_to_ram: Copy to system RAM (required for migration)
> > +	 * @pages: Pointer to array of device memory pages (source)
> > +	 * @dma_addr: Pointer to array of DMA addresses (destination)
> > +	 * @npages: Number of pages to copy
> > +	 *
> > +	 * Copy pages to system RAM.
> > +	 *
> > +	 * Return: 0 on success, a negative error code on failure.
> > +	 */
> > +	int (*copy_to_ram)(struct page **pages,
> > +			   dma_addr_t *dma_addr,
> > +			   unsigned long npages);
> > +};
> > +
> > +/**
> > + * struct drm_pagemap_devmem - Structure representing a GPU SVM device memory allocation
> > + *
> > + * @dev: Pointer to the device structure which device memory allocation belongs to
> > + * @mm: Pointer to the mm_struct for the address space
> > + * @detached: device memory allocations is detached from device pages
> > + * @ops: Pointer to the operations structure for GPU SVM device memory
> > + * @dpagemap: The struct drm_pagemap of the pages this allocation belongs to.
> > + * @size: Size of device memory allocation
> > + * @timeslice_expiration: Timeslice expiration in jiffies
> > + */
> > +struct drm_pagemap_devmem {
> > + struct device *dev;
> > + struct mm_struct *mm;
> > + struct completion detached;
> > + const struct drm_pagemap_devmem_ops *ops;
> > + struct drm_pagemap *dpagemap;
> > + size_t size;
> > + u64 timeslice_expiration;
> > +};
> > +
> > +int drm_pagemap_migrate_to_devmem(struct drm_pagemap_devmem *devmem_allocation,
> > +				  struct mm_struct *mm,
> > +				  unsigned long start, unsigned long end,
> > +				  unsigned long timeslice_ms,
> > +				  void *pgmap_owner);
> > +
> > +int drm_pagemap_evict_to_ram(struct drm_pagemap_devmem *devmem_allocation);
> > +
> > +const struct dev_pagemap_ops *drm_pagemap_pagemap_ops_get(void);
> > +
> > +struct drm_pagemap *drm_pagemap_page_to_dpagemap(struct page *page);
> > +
> > +void drm_pagemap_devmem_init(struct drm_pagemap_devmem *devmem_allocation,
> > +			     struct device *dev, struct mm_struct *mm,
> > +			     const struct drm_pagemap_devmem_ops *ops,
> > +			     struct drm_pagemap *dpagemap, size_t size);
> > +
> > #endif
> > --
> > 2.49.0
> >
* Re: [PATCH v2 3/3] drm/xe: Implement and use the drm_pagemap populate_mm op
2025-06-05 22:16 ` Matthew Brost
@ 2025-06-13 10:16 ` Thomas Hellström
0 siblings, 0 replies; 16+ messages in thread
From: Thomas Hellström @ 2025-06-13 10:16 UTC (permalink / raw)
To: Matthew Brost
Cc: intel-xe, dri-devel, himal.prasad.ghimiray, apopple, airlied,
Simona Vetter, felix.kuehling, Christian König, dakr,
Mrozek, Michal, Joonas Lahtinen
On Thu, 2025-06-05 at 15:16 -0700, Matthew Brost wrote:
> On Wed, Jun 04, 2025 at 11:35:36AM +0200, Thomas Hellström wrote:
> > Add runtime PM since we might call populate_mm on a foreign device.
> > Also create the VRAM bos as ttm_bo_type_kernel. This avoids the
> > initial clearing and the creation of an mmap handle.
> >
>
> I didn't read this part - skipping the initial clears. We discussed
> this in a private chat, but to recap: we need the initial clears,
> because copies for non-faulted-in CPU pages are skipped, which could
> result in another process's data being exposed in VRAM. We could
> issue a clear only when a non-faulted-in page is found in
> xe_svm_copy, or IIRC there was some work flying around to clear VRAM
> upon free - not sure if that ever landed. I believe AMD does clear on
> free; their driver + buddy allocator has the concept of dirty blocks.
Thanks for reviewing!
I'll change back to ttm_bo_type_device for now. I think we should
switch to ttm_bo_type_kernel again later, also to avoid the mmap
offset allocation.

From my understanding of the migrate docs, we're intended to either
clear (no system pages allocated) or copy (system pages allocated).
So for overall best efficiency, I think that when we implement clear
on free, we should also have a hint allowing allocation of blocks
that aren't necessarily cleared yet.
Thanks,
Thomas
>
> Matt
>
> > Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > ---
> > drivers/gpu/drm/drm_pagemap.c | 1 +
> > drivers/gpu/drm/xe/xe_svm.c | 104 ++++++++++++++++++++----------
> > ----
> > drivers/gpu/drm/xe/xe_svm.h | 10 ++--
> > drivers/gpu/drm/xe/xe_tile.h | 11 ++++
> > drivers/gpu/drm/xe/xe_vm.c | 2 +-
> > 5 files changed, 78 insertions(+), 50 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/drm_pagemap.c b/drivers/gpu/drm/drm_pagemap.c
> > index 25395685a9b8..94619be00d2a 100644
> > --- a/drivers/gpu/drm/drm_pagemap.c
> > +++ b/drivers/gpu/drm/drm_pagemap.c
> > @@ -843,3 +843,4 @@ int drm_pagemap_populate_mm(struct drm_pagemap *dpagemap,
> >
> > return err;
> > }
> > +EXPORT_SYMBOL(drm_pagemap_populate_mm);
> > diff --git a/drivers/gpu/drm/xe/xe_svm.c b/drivers/gpu/drm/xe/xe_svm.c
> > index e161ce3e67a1..a10aab3768d8 100644
> > --- a/drivers/gpu/drm/xe/xe_svm.c
> > +++ b/drivers/gpu/drm/xe/xe_svm.c
> > @@ -3,13 +3,17 @@
> > * Copyright © 2024 Intel Corporation
> > */
> >
> > +#include <drm/drm_drv.h>
> > +
> > #include "xe_bo.h"
> > #include "xe_gt_stats.h"
> > #include "xe_gt_tlb_invalidation.h"
> > #include "xe_migrate.h"
> > #include "xe_module.h"
> > +#include "xe_pm.h"
> > #include "xe_pt.h"
> > #include "xe_svm.h"
> > +#include "xe_tile.h"
> > #include "xe_ttm_vram_mgr.h"
> > #include "xe_vm.h"
> > #include "xe_vm_types.h"
> > @@ -525,8 +529,10 @@ static struct xe_bo *to_xe_bo(struct drm_pagemap_devmem *devmem_allocation)
> >  static void xe_svm_devmem_release(struct drm_pagemap_devmem *devmem_allocation)
> > {
> > struct xe_bo *bo = to_xe_bo(devmem_allocation);
> > + struct xe_device *xe = xe_bo_device(bo);
> >
> > xe_bo_put_async(bo);
> > + xe_pm_runtime_put(xe);
> > }
> >
> >  static u64 block_offset_to_pfn(struct xe_vram_region *vr, u64 offset)
> > @@ -720,76 +726,63 @@ static struct xe_vram_region *tile_to_vr(struct xe_tile *tile)
> > return &tile->mem.vram;
> > }
> >
> > -/**
> > - * xe_svm_alloc_vram()- Allocate device memory pages for range,
> > - * migrating existing data.
> > - * @vm: The VM.
> > - * @tile: tile to allocate vram from
> > - * @range: SVM range
> > - * @ctx: DRM GPU SVM context
> > - *
> > - * Return: 0 on success, error code on failure.
> > - */
> > -int xe_svm_alloc_vram(struct xe_vm *vm, struct xe_tile *tile,
> > - struct xe_svm_range *range,
> > - const struct drm_gpusvm_ctx *ctx)
> > +static int xe_drm_pagemap_populate_mm(struct drm_pagemap *dpagemap,
> > +				      unsigned long start, unsigned long end,
> > +				      struct mm_struct *mm,
> > +				      unsigned long timeslice_ms)
> > {
> > - struct mm_struct *mm = vm->svm.gpusvm.mm;
> > +	struct xe_tile *tile = container_of(dpagemap, typeof(*tile), mem.vram.dpagemap);
> > + struct xe_device *xe = tile_to_xe(tile);
> > + struct device *dev = xe->drm.dev;
> > struct xe_vram_region *vr = tile_to_vr(tile);
> > struct drm_buddy_block *block;
> > struct list_head *blocks;
> > struct xe_bo *bo;
> > - ktime_t end = 0;
> > - int err;
> > -
> > - if (!range->base.flags.migrate_devmem)
> > - return -EINVAL;
> > + ktime_t time_end = 0;
> > + int err, idx;
> >
> > - range_debug(range, "ALLOCATE VRAM");
> > + if (!drm_dev_enter(&xe->drm, &idx))
> > + return -ENODEV;
> >
> > - if (!mmget_not_zero(mm))
> > - return -EFAULT;
> > - mmap_read_lock(mm);
> > + xe_pm_runtime_get(xe);
> >
> > -retry:
> > - bo = xe_bo_create_locked(tile_to_xe(tile), NULL, NULL,
> > - xe_svm_range_size(range),
> > - ttm_bo_type_device,
> > + retry:
> > +	bo = xe_bo_create_locked(tile_to_xe(tile), NULL, NULL, end - start,
> > + ttm_bo_type_kernel,
> > XE_BO_FLAG_VRAM_IF_DGFX(tile) |
> > XE_BO_FLAG_CPU_ADDR_MIRROR);
> > if (IS_ERR(bo)) {
> > err = PTR_ERR(bo);
> > - if (xe_vm_validate_should_retry(NULL, err, &end))
> > +		if (xe_vm_validate_should_retry(NULL, err, &time_end))
> > goto retry;
> > - goto unlock;
> > + goto out_pm_put;
> > }
> >
> > - drm_pagemap_devmem_init(&bo->devmem_allocation,
> > - vm->xe->drm.dev, mm,
> > + drm_pagemap_devmem_init(&bo->devmem_allocation, dev, mm,
> > &dpagemap_devmem_ops,
> > &tile->mem.vram.dpagemap,
> > - xe_svm_range_size(range));
> > + end - start);
> >
> > 	blocks = &to_xe_ttm_vram_mgr_resource(bo->ttm.resource)->blocks;
> > list_for_each_entry(block, blocks, link)
> > block->private = vr;
> >
> > xe_bo_get(bo);
> > -	err = drm_pagemap_migrate_to_devmem(&bo->devmem_allocation,
> > -					    mm,
> > -					    xe_svm_range_start(range),
> > -					    xe_svm_range_end(range),
> > -					    ctx->timeslice_ms,
> > -					    xe_svm_devm_owner(vm->xe));
> > +
> > +	/* Ensure the device has a pm ref while there are device pages active. */
> > +	xe_pm_runtime_get_noresume(xe);
> > +	err = drm_pagemap_migrate_to_devmem(&bo->devmem_allocation, mm,
> > +					    start, end, timeslice_ms,
> > +					    xe_svm_devm_owner(xe));
> > if (err)
> > xe_svm_devmem_release(&bo->devmem_allocation);
> >
> > xe_bo_unlock(bo);
> > xe_bo_put(bo);
> >
> > -unlock:
> > - mmap_read_unlock(mm);
> > - mmput(mm);
> > +out_pm_put:
> > + xe_pm_runtime_put(xe);
> > + drm_dev_exit(idx);
> >
> > return err;
> > }
> > @@ -898,7 +891,7 @@ int xe_svm_handle_pagefault(struct xe_vm *vm, struct xe_vma *vma,
> >
> >  	if (--migrate_try_count >= 0 &&
> >  	    xe_svm_range_needs_migrate_to_vram(range, vma, IS_DGFX(vm->xe))) {
> > -		err = xe_svm_alloc_vram(vm, tile, range, &ctx);
> > +		err = xe_svm_alloc_vram(tile, range, &ctx);
> >  		ctx.timeslice_ms <<= 1;	/* Double timeslice if we have to retry */
> >  		if (err) {
> >  			if (migrate_try_count || !ctx.devmem_only) {
> > @@ -1054,6 +1047,30 @@ int xe_svm_range_get_pages(struct xe_vm *vm, struct xe_svm_range *range,
> >
> > #if IS_ENABLED(CONFIG_DRM_XE_PAGEMAP)
> >
> > +/**
> > + * xe_svm_alloc_vram()- Allocate device memory pages for range,
> > + * migrating existing data.
> > + * @vm: The VM.
> > + * @tile: tile to allocate vram from
> > + * @range: SVM range
> > + * @ctx: DRM GPU SVM context
> > + *
> > + * Return: 0 on success, error code on failure.
> > + */
> > +int xe_svm_alloc_vram(struct xe_tile *tile, struct xe_svm_range *range,
> > +		      const struct drm_gpusvm_ctx *ctx)
> > +{
> > + struct drm_pagemap *dpagemap;
> > +
> > + range_debug(range, "ALLOCATE VRAM");
> > +
> > + dpagemap = xe_tile_local_pagemap(tile);
> > +	return drm_pagemap_populate_mm(dpagemap, xe_svm_range_start(range),
> > +				       xe_svm_range_end(range),
> > +				       range->base.gpusvm->mm,
> > +				       ctx->timeslice_ms);
> > +}
> > +
> > static struct drm_pagemap_device_addr
> > xe_drm_pagemap_device_map(struct drm_pagemap *dpagemap,
> > struct device *dev,
> > @@ -1078,6 +1095,7 @@ xe_drm_pagemap_device_map(struct drm_pagemap *dpagemap,
> >
> > static const struct drm_pagemap_ops xe_drm_pagemap_ops = {
> > .device_map = xe_drm_pagemap_device_map,
> > + .populate_mm = xe_drm_pagemap_populate_mm,
> > };
> >
> > /**
> > @@ -1130,7 +1148,7 @@ int xe_devm_add(struct xe_tile *tile, struct xe_vram_region *vr)
> > return 0;
> > }
> > #else
> > -int xe_svm_alloc_vram(struct xe_vm *vm, struct xe_tile *tile,
> > +int xe_svm_alloc_vram(struct xe_tile *tile,
> > struct xe_svm_range *range,
> > const struct drm_gpusvm_ctx *ctx)
> > {
> > diff --git a/drivers/gpu/drm/xe/xe_svm.h b/drivers/gpu/drm/xe/xe_svm.h
> > index 19ce4f2754a7..da9a69ea0bb1 100644
> > --- a/drivers/gpu/drm/xe/xe_svm.h
> > +++ b/drivers/gpu/drm/xe/xe_svm.h
> > @@ -70,8 +70,7 @@ int xe_svm_bo_evict(struct xe_bo *bo);
> >
> >  void xe_svm_range_debug(struct xe_svm_range *range, const char *operation);
> >
> > -int xe_svm_alloc_vram(struct xe_vm *vm, struct xe_tile *tile,
> > - struct xe_svm_range *range,
> > +int xe_svm_alloc_vram(struct xe_tile *tile, struct xe_svm_range *range,
> > const struct drm_gpusvm_ctx *ctx);
> >
> >  struct xe_svm_range *xe_svm_range_find_or_insert(struct xe_vm *vm, u64 addr,
> > @@ -237,10 +236,9 @@ void xe_svm_range_debug(struct xe_svm_range *range, const char *operation)
> > {
> > }
> >
> > -static inline
> > -int xe_svm_alloc_vram(struct xe_vm *vm, struct xe_tile *tile,
> > - struct xe_svm_range *range,
> > - const struct drm_gpusvm_ctx *ctx)
> > +static inline int
> > +xe_svm_alloc_vram(struct xe_tile *tile, struct xe_svm_range *range,
> > +		  const struct drm_gpusvm_ctx *ctx)
> > {
> > return -EOPNOTSUPP;
> > }
> > diff --git a/drivers/gpu/drm/xe/xe_tile.h b/drivers/gpu/drm/xe/xe_tile.h
> > index eb939316d55b..066a3d0cea79 100644
> > --- a/drivers/gpu/drm/xe/xe_tile.h
> > +++ b/drivers/gpu/drm/xe/xe_tile.h
> > @@ -16,4 +16,15 @@ int xe_tile_init(struct xe_tile *tile);
> >
> > void xe_tile_migrate_wait(struct xe_tile *tile);
> >
> > +#if IS_ENABLED(CONFIG_DRM_XE_PAGEMAP)
> > +static inline struct drm_pagemap *xe_tile_local_pagemap(struct xe_tile *tile)
> > +{
> > + return &tile->mem.vram.dpagemap;
> > +}
> > +#else
> > +static inline struct drm_pagemap *xe_tile_local_pagemap(struct xe_tile *tile)
> > +{
> > + return NULL;
> > +}
> > +#endif
> > #endif
> > diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> > index 7140d8856bad..def493acb4d7 100644
> > --- a/drivers/gpu/drm/xe/xe_vm.c
> > +++ b/drivers/gpu/drm/xe/xe_vm.c
> > @@ -2911,7 +2911,7 @@ static int prefetch_ranges(struct xe_vm *vm, struct xe_vma_op *op)
> >
> >  		if (xe_svm_range_needs_migrate_to_vram(svm_range, vma, region)) {
> >  			tile = &vm->xe->tiles[region_to_mem_type[region] - XE_PL_VRAM0];
> > -			err = xe_svm_alloc_vram(vm, tile, svm_range, &ctx);
> > +			err = xe_svm_alloc_vram(tile, svm_range, &ctx);
> >  			if (err) {
> >  				drm_dbg(&vm->xe->drm, "VRAM allocation failed, retry from userspace, asid=%u, gpusvm=%p, errno=%pe\n",
> >  					vm->usm.asid, &vm->svm.gpusvm, ERR_PTR(err));
> > --
> > 2.49.0
> >
end of thread, other threads:[~2025-06-13 10:17 UTC | newest]
Thread overview: 16+ messages
2025-06-04 9:35 [PATCH v2 0/3] drm/gpusvm, drm/pagemap, drm/xe: Restructure migration in preparation for multi-device Thomas Hellström
2025-06-04 9:35 ` [PATCH v2 1/3] drm/gpusvm, drm/pagemap: Move migration functionality to drm_pagemap Thomas Hellström
2025-06-04 15:45 ` kernel test robot
2025-06-05 22:44 ` Matthew Brost
2025-06-13 10:01 ` Thomas Hellström
2025-06-04 9:35 ` [PATCH v2 2/3] drm/pagemap: Add a populate_mm op Thomas Hellström
2025-06-04 21:06 ` kernel test robot
2025-06-04 22:05 ` Matthew Brost
2025-06-05 7:40 ` Thomas Hellström
2025-06-04 9:35 ` [PATCH v2 3/3] drm/xe: Implement and use the drm_pagemap " Thomas Hellström
2025-06-04 15:04 ` Matthew Brost
2025-06-05 7:37 ` Thomas Hellström
2025-06-05 22:16 ` Matthew Brost
2025-06-13 10:16 ` Thomas Hellström
2025-06-04 10:01 ` [PATCH v2 0/3] drm/gpusvm, drm/pagemap, drm/xe: Restructure migration in preparation for multi-device Christian König
2025-06-04 12:01 ` Thomas Hellström