Intel-XE Archive on lore.kernel.org
From: "Christian König" <christian.koenig@amd.com>
To: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>,
	intel-xe@lists.freedesktop.org, dri-devel@lists.freedesktop.org
Cc: himal.prasad.ghimiray@intel.com, apopple@nvidia.com,
	airlied@gmail.com, Simona Vetter <simona.vetter@ffwll.ch>,
	felix.kuehling@amd.com, Matthew Brost <matthew.brost@intel.com>,
	dakr@kernel.org, "Mrozek, Michal" <michal.mrozek@intel.com>,
	Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Subject: Re: [RFC PATCH 00/19] drm, drm/xe: Multi-device GPUSVM
Date: Thu, 13 Mar 2025 13:57:53 +0100	[thread overview]
Message-ID: <50a6d33b-1b8f-4a3e-8698-1e4508eb3be0@amd.com> (raw)
In-Reply-To: <466cdc46b81b4e1a51fa1accee8f56487cce1268.camel@linux.intel.com>

On 13.03.25 at 13:50, Thomas Hellström wrote:
> Hi, Christian
>
> On Thu, 2025-03-13 at 11:19 +0100, Christian König wrote:
>> On 12.03.25 at 22:03, Thomas Hellström wrote:
>>> This RFC implements, and requests comments on, a way to handle
>>> multi-device SVM, typically with fast interconnects. It adds
>>> generic code and helpers in drm, and device-specific code for xe.
>>>
>>> For SVM, devices set up maps of device-private struct pages using
>>> a struct dev_pagemap. The CPU virtual address space (mm) can then
>>> be set up with special page-table entries pointing to such pages.
>>> These pages can't be accessed directly by the CPU, but possibly by
>>> other devices using a fast interconnect. This series aims to
>>> provide helpers to identify pagemaps that take part in such a fast
>>> interconnect and to aid in migrating between them.
>>>
>>> This is initially done by augmenting the struct dev_pagemap with
>>> a struct drm_pagemap, and having the struct drm_pagemap implement
>>> a "populate_mm" method, where a region of the CPU virtual address
>>> space (mm) is populated with device_private pages from the
>>> dev_pagemap associated with the drm_pagemap, migrating data from
>>> system memory or other devices if necessary. The
>>> drm_pagemap_populate_mm() function is then typically called from a
>>> fault handler, using the struct drm_pagemap pointer of choice,
>>> which can reference a local drm_pagemap or a remote one. The
>>> migration is done completely by drm_pagemap callbacks (typically
>>> using a copy engine local to the dev_pagemap's memory).
>> Up till here that makes sense. Maybe not necessary to be put into the
>> DRM layer, but that is an implementation detail.
>>
>>> In addition there are helpers to build a drm_pagemap UAPI using
>>> file-descriptors representing struct drm_pagemaps, and a helper to
>>> register devices with a common fast interconnect. The UAPI is
>>> intended to be private to the device, but if drivers agree to
>>> identify struct drm_pagemaps by file descriptors, one could in
>>> theory do cross-driver multi-device SVM if a use-case were found.
>> But this completely eludes me.
>>
>> Why would you want a UAPI for representing pagemaps as file
>> descriptors? Isn't it the kernel which enumerates the interconnects
>> of the devices?
>>
>> I mean we somehow need to expose those interconnects between devices
>> to userspace, e.g. like amdgpu does with its XGMI connectors. But
>> that is static for the hardware (unless HW is hot removed/added) and
>> so I would assume exposed through sysfs.
> Thanks for the feedback.
>
> The idea here is not to expose the interconnects but rather have a way
> for user-space to identify a drm_pagemap and some level of access- and
> lifetime control.

Well, that's what I get, but I just don't get why.

I mean, when you want to have the pagemap as an optional feature you can turn on and off, I would say make that a sysfs file.

It's a global feature anyway and not bound in any way to the file descriptor, is it?

> For Xe, if an application wants to use a particular drm_pagemap, it
> calls an ioctl:
>
> pagemap_fd = drm_xe_ioctl_pagemap_open(exporting_device_fd,
> memory_region);

Well, should userspace deal with physical addresses here, or what exactly is memory_region here?

Regards,
Christian.

>
> And then when it's no longer used
> close(pagemap_fd)
>
> To use it for a memory range, the intended idea is to call the gpu
> madvise ioctl:
>  
> err = drm_xe_ioctl_gpu_madvise(local_device_fd, range, pagemap_fd);
>
> Now, if there is no fast interconnect between the two, the madvise call
> could just return an error. All this ofc assumes that user-space is
> somehow aware of the fast interconnect topology but how that is exposed
> is beyond the scope of this first series. (Suggestions welcome).
>
> The advantage of the above approach is
> 1) We get some level of access control. If the user doesn't have access
> to the exporting device, he/she can't obtain a pagemap file descriptor.
>
> 2) Lifetime control. The pagemaps are memory hungry, but also take
> considerable time to set up and tear down.
>
> 3) It's a driver-independent approach.
>
> One could ofc use a different approach by feeding the gpu_madvise()
> ioctl with a remote device file descriptor and whatever information is
> needed for the remote device to identify the drm_pagemap. That would
> not be driver independent, though. Not sure how important that is.
>
> /Thomas
>
>
>> Thanks,
>> Christian.
>>
>>> The implementation for the Xe driver uses dynamic pagemaps which
>>> are created on first use and removed 5s after the last reference
>>> is gone. Pagemaps are revoked on device unbind, and data is then
>>> migrated back to system memory.
>>>
>>> Status:
>>> This is a POC series. It has been tested with an IGT test soon to
>>> be published, with a
>>> DG1 drm_pagemap and a BattleMage SVM client. There is separate work
>>> ongoing for the
>>> gpu_madvise functionality.
>>>
>>> The Xe implementation of the "populate_mm()" callback is still
>>> rudimentary and doesn't migrate from foreign devices. It should be
>>> tuned to make smarter choices.
>>>
>>> Any feedback appreciated.
>>>
>>> Patch overview:
>>> Patch 1:
>>> - Extends the way the Xe driver can compile out SVM support and
>>>   pagemaps.
>>> Patch 2:
>>> - Fixes an existing potential UAF in the Xe SVM code.
>>> Patch 3:
>>> - Introduces the drm_pagemap.c file and moves drm_pagemap
>>>   functionality to it.
>>> Patch 4:
>>> - Adds a populate_mm op to drm_pagemap.
>>> Patch 5:
>>> - Implements Xe's version of the populate_mm op.
>>> Patch 6:
>>> - Refcounts struct drm_pagemap.
>>> Patch 7:
>>> - Cleanup patch.
>>> Patch 8:
>>> - Adds a bo_remove callback for Xe, used during device unbind.
>>> Patch 9:
>>> - Adds a drm_pagemap utility to calculate a common owner structure.
>>> Patch 10:
>>> - Adapts GPUSVM to a (sort of) dynamic owner.
>>> Patch 11:
>>> - Makes Xe calculate the dev_private owner using the drm_pagemap
>>>   utility.
>>> Patch 12:
>>> - Updates the Xe page-table code to handle per-range mixed system /
>>>   device_private placement.
>>> Patch 13:
>>> - Modifies GPUSVM to allow such placements.
>>> Patch 14:
>>> - Adds a preferred pagemap to be used by the Xe fault handler.
>>> Patch 15:
>>> - Adds a utility that converts between drm_pagemaps and
>>>   file descriptors.
>>> Patch 16:
>>> - Fixes Xe so that devices without fault capability can also
>>>   publish drm_pagemaps.
>>> Patch 17:
>>> - Adds the devmem_open UAPI, creating a drm_pagemap file descriptor
>>>   from a (device, region) pair.
>>> Patch 18:
>>> - (Only for POC) Adds a GPU madvise prefer_devmem IOCTL.
>>> Patch 19:
>>> - (Only for POC) Implements PCIe p2p DMA as a fast interconnect,
>>>   with a test.
>>>
>>> Matthew Brost (1):
>>>   drm/gpusvm, drm/pagemap: Move migration functionality to
>>> drm_pagemap
>>>
>>> Thomas Hellström (18):
>>>   drm/xe: Introduce CONFIG_DRM_XE_GPUSVM
>>>   drm/xe/svm: Fix a potential bo UAF
>>>   drm/pagemap: Add a populate_mm op
>>>   drm/xe: Implement and use the drm_pagemap populate_mm op
>>>   drm/pagemap, drm/xe: Add refcounting to struct drm_pagemap and
>>> manage
>>>     lifetime
>>>   drm/pagemap: Get rid of the struct
>>>     drm_pagemap_zdd::device_private_page_owner field
>>>   drm/xe/bo: Add a bo remove callback
>>>   drm/pagemap_util: Add a utility to assign an owner to a set of
>>>     interconnected gpus
>>>   drm/gpusvm, drm/xe: Move the device private owner to the
>>>     drm_gpusvm_ctx
>>>   drm/xe: Use the drm_pagemap_util helper to get a svm pagemap
>>> owner
>>>   drm/xe: Make the PT code handle placement per PTE rather than per
>>> vma
>>>     / range
>>>   drm/gpusvm: Allow mixed mappings
>>>   drm/xe: Add a preferred dpagemap
>>>   drm/pagemap/util: Add file descriptors pointing to struct
>>> drm_pagemap
>>>   drm/xe/migrate: Allow xe_migrate_vram() also on non-pagefault
>>> capable
>>>     devices
>>>   drm/xe/uapi: Add the devmem_open ioctl
>>>   drm/xe/uapi: HAX: Add the xe_madvise_prefer_devmem IOCTL
>>>   drm/xe: HAX: Use pcie p2p dma to test fast interconnect
>>>
>>>  Documentation/gpu/rfc/gpusvm.rst     |  12 +-
>>>  drivers/gpu/drm/Makefile             |   7 +-
>>>  drivers/gpu/drm/drm_gpusvm.c         | 782 +---------------------
>>>  drivers/gpu/drm/drm_pagemap.c        | 940
>>> +++++++++++++++++++++++++++
>>>  drivers/gpu/drm/drm_pagemap_util.c   | 203 ++++++
>>>  drivers/gpu/drm/xe/Kconfig           |  24 +-
>>>  drivers/gpu/drm/xe/Makefile          |   2 +-
>>>  drivers/gpu/drm/xe/xe_bo.c           |  65 +-
>>>  drivers/gpu/drm/xe/xe_bo.h           |   2 +
>>>  drivers/gpu/drm/xe/xe_bo_types.h     |   2 +-
>>>  drivers/gpu/drm/xe/xe_device.c       |   8 +
>>>  drivers/gpu/drm/xe/xe_device_types.h |  30 +-
>>>  drivers/gpu/drm/xe/xe_migrate.c      |   8 +-
>>>  drivers/gpu/drm/xe/xe_pt.c           | 112 ++--
>>>  drivers/gpu/drm/xe/xe_query.c        |   2 +-
>>>  drivers/gpu/drm/xe/xe_svm.c          | 716 +++++++++++++++++---
>>>  drivers/gpu/drm/xe/xe_svm.h          | 158 ++++-
>>>  drivers/gpu/drm/xe/xe_tile.c         |  20 +-
>>>  drivers/gpu/drm/xe/xe_tile.h         |  33 +
>>>  drivers/gpu/drm/xe/xe_vm.c           |   6 +-
>>>  drivers/gpu/drm/xe/xe_vm_types.h     |   7 +
>>>  include/drm/drm_gpusvm.h             | 102 +--
>>>  include/drm/drm_pagemap.h            | 190 +++++-
>>>  include/drm/drm_pagemap_util.h       |  59 ++
>>>  include/uapi/drm/xe_drm.h            |  39 ++
>>>  25 files changed, 2458 insertions(+), 1071 deletions(-)
>>>  create mode 100644 drivers/gpu/drm/drm_pagemap.c
>>>  create mode 100644 drivers/gpu/drm/drm_pagemap_util.c
>>>  create mode 100644 include/drm/drm_pagemap_util.h
>>>



Thread overview: 34+ messages
2025-03-12 21:03 [RFC PATCH 00/19] drm, drm/xe: Multi-device GPUSVM Thomas Hellström
2025-03-12 21:03 ` [RFC PATCH 01/19] drm/xe: Introduce CONFIG_DRM_XE_GPUSVM Thomas Hellström
2025-03-12 21:03 ` [RFC PATCH 02/19] drm/xe/svm: Fix a potential bo UAF Thomas Hellström
2025-03-12 21:04 ` [RFC PATCH 03/19] drm/gpusvm, drm/pagemap: Move migration functionality to drm_pagemap Thomas Hellström
2025-03-12 21:04 ` [RFC PATCH 04/19] drm/pagemap: Add a populate_mm op Thomas Hellström
2025-03-12 21:04 ` [RFC PATCH 05/19] drm/xe: Implement and use the drm_pagemap " Thomas Hellström
2025-03-12 21:04 ` [RFC PATCH 06/19] drm/pagemap, drm/xe: Add refcounting to struct drm_pagemap and manage lifetime Thomas Hellström
2025-03-12 21:04 ` [RFC PATCH 07/19] drm/pagemap: Get rid of the struct drm_pagemap_zdd::device_private_page_owner field Thomas Hellström
2025-03-12 21:04 ` [RFC PATCH 08/19] drm/xe/bo: Add a bo remove callback Thomas Hellström
2025-03-14 13:05   ` Thomas Hellström
2025-03-12 21:04 ` [RFC PATCH 09/19] drm/pagemap_util: Add a utility to assign an owner to a set of interconnected gpus Thomas Hellström
2025-03-12 21:04 ` [RFC PATCH 10/19] drm/gpusvm, drm/xe: Move the device private owner to the drm_gpusvm_ctx Thomas Hellström
2025-03-12 21:04 ` [RFC PATCH 11/19] drm/xe: Use the drm_pagemap_util helper to get a svm pagemap owner Thomas Hellström
2025-03-12 21:04 ` [RFC PATCH 12/19] drm/xe: Make the PT code handle placement per PTE rather than per vma / range Thomas Hellström
2025-03-12 21:04 ` [RFC PATCH 13/19] drm/gpusvm: Allow mixed mappings Thomas Hellström
2025-03-12 21:04 ` [RFC PATCH 14/19] drm/xe: Add a preferred dpagemap Thomas Hellström
2025-03-12 21:04 ` [RFC PATCH 15/19] drm/pagemap/util: Add file descriptors pointing to struct drm_pagemap Thomas Hellström
2025-03-12 21:04 ` [RFC PATCH 16/19] drm/xe/migrate: Allow xe_migrate_vram() also on non-pagefault capable devices Thomas Hellström
2025-03-12 21:04 ` [RFC PATCH 17/19] drm/xe/uapi: Add the devmem_open ioctl Thomas Hellström
2025-03-12 21:04 ` [RFC PATCH 18/19] drm/xe/uapi: HAX: Add the xe_madvise_prefer_devmem IOCTL Thomas Hellström
2025-03-12 21:04 ` [RFC PATCH 19/19] drm/xe: HAX: Use pcie p2p dma to test fast interconnect Thomas Hellström
2025-03-12 21:10 ` ✓ CI.Patch_applied: success for drm, drm/xe: Multi-device GPUSVM Patchwork
2025-03-12 21:11 ` ✗ CI.checkpatch: warning " Patchwork
2025-03-12 21:12 ` ✓ CI.KUnit: success " Patchwork
2025-03-12 21:29 ` ✓ CI.Build: " Patchwork
2025-03-12 21:31 ` ✗ CI.Hooks: failure " Patchwork
2025-03-12 21:33 ` ✓ CI.checksparse: success " Patchwork
2025-03-12 22:06 ` ✗ Xe.CI.BAT: failure " Patchwork
2025-03-13 10:19 ` [RFC PATCH 00/19] " Christian König
2025-03-13 12:50   ` Thomas Hellström
2025-03-13 12:57     ` Christian König [this message]
2025-03-13 15:55       ` Thomas Hellström
2025-03-17  9:20       ` Thomas Hellström
2025-03-13 13:24 ` ✗ Xe.CI.Full: failure for " Patchwork
