Intel-XE Archive on lore.kernel.org
From: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
To: "Christian König" <christian.koenig@amd.com>,
	intel-xe@lists.freedesktop.org, dri-devel@lists.freedesktop.org
Cc: himal.prasad.ghimiray@intel.com, apopple@nvidia.com,
	airlied@gmail.com, Simona Vetter <simona.vetter@ffwll.ch>,
	felix.kuehling@amd.com, Matthew Brost <matthew.brost@intel.com>,
	dakr@kernel.org, "Mrozek, Michal" <michal.mrozek@intel.com>,
	Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Subject: Re: [RFC PATCH 00/19] drm, drm/xe: Multi-device GPUSVM
Date: Mon, 17 Mar 2025 10:20:48 +0100	[thread overview]
Message-ID: <0cec6c8ad9ed854300d4533eb8659fb017e76ae6.camel@linux.intel.com> (raw)
In-Reply-To: <50a6d33b-1b8f-4a3e-8698-1e4508eb3be0@amd.com>

On Thu, 2025-03-13 at 13:57 +0100, Christian König wrote:
> Am 13.03.25 um 13:50 schrieb Thomas Hellström:
> > Hi, Christian
> > 
> > On Thu, 2025-03-13 at 11:19 +0100, Christian König wrote:
> > > Am 12.03.25 um 22:03 schrieb Thomas Hellström:
> > > > This RFC implements, and requests comments on, a way to handle
> > > > SVM with multi-device, typically with fast interconnects. It adds
> > > > generic code and helpers in drm, and device-specific code for xe.
> > > > 
> > > > For SVM, devices set up maps of device-private struct pages using
> > > > a struct dev_pagemap. The CPU virtual address space (mm) can then
> > > > be set up with special page-table entries pointing to such pages.
> > > > These pages can't be accessed directly by the CPU, but possibly
> > > > by other devices using a fast interconnect. This series aims to
> > > > provide helpers to identify pagemaps that take part in such a
> > > > fast interconnect and to aid in migrating between them.
> > > > 
> > > > This is initially done by augmenting the struct dev_pagemap with
> > > > a struct drm_pagemap, and having the struct drm_pagemap implement
> > > > a "populate_mm" method, where a region of the CPU virtual address
> > > > space (mm) is populated with device-private pages from the
> > > > dev_pagemap associated with the drm_pagemap, migrating data from
> > > > system memory or other devices if necessary. The
> > > > drm_pagemap_populate_mm() function is then typically called from
> > > > a fault handler, using the struct drm_pagemap pointer of choice;
> > > > it could reference a local drm_pagemap or a remote one. The
> > > > migration is then done entirely by drm_pagemap callbacks
> > > > (typically using a copy-engine local to the dev_pagemap's local
> > > > memory).
> > > Up till here that makes sense. Maybe not necessary to put into
> > > the DRM layer, but that is an implementation detail.
> > > 
> > > > In addition there are helpers to build a drm_pagemap UAPI using
> > > > file descriptors representing struct drm_pagemaps, and a helper
> > > > to register devices with a common fast interconnect. The UAPI is
> > > > intended to be private to the device, but if drivers agree to
> > > > identify struct drm_pagemaps by file descriptors, one could in
> > > > theory do cross-driver multi-device SVM if a use-case were found.
> > > But this completely eludes me.
> > > 
> > > Why would you want a UAPI for representing pagemaps as file
> > > descriptors? Isn't it the kernel which enumerates the
> > > interconnects of the devices?
> > > 
> > > I mean, we somehow need to expose those interconnects between
> > > devices to userspace, e.g. like amdgpu does with its XGMI
> > > connectors. But that is static for the hardware (unless HW is
> > > hot-removed/added) and so I would assume exposed through sysfs.
> > Thanks for the feedback.
> > 
> > The idea here is not to expose the interconnects but rather to have
> > a way for user-space to identify a drm_pagemap and some level of
> > access- and lifetime control.
> 
> Well, that's what I get, I just don't get why.
> 
> I mean, when you want to have the pagemap as an optional feature you
> can turn on and off, I would say make that a sysfs file.
> 
> It's a global feature anyway and not bound in any way to the file
> descriptor, isn't it?

Getting back to this: we had some internal discussions, and the desired
behavior is to have the device-private pages on a firstopen-lastclose
(or rather firstopen-(lastclose + shrinker)) lifetime, for memory usage
concerns. So I believe a file descriptor is a good fit for the UAPI
representation.

Thanks,
Thomas



Thread overview: 34+ messages
2025-03-12 21:03 [RFC PATCH 00/19] drm, drm/xe: Multi-device GPUSVM Thomas Hellström
2025-03-12 21:03 ` [RFC PATCH 01/19] drm/xe: Introduce CONFIG_DRM_XE_GPUSVM Thomas Hellström
2025-03-12 21:03 ` [RFC PATCH 02/19] drm/xe/svm: Fix a potential bo UAF Thomas Hellström
2025-03-12 21:04 ` [RFC PATCH 03/19] drm/gpusvm, drm/pagemap: Move migration functionality to drm_pagemap Thomas Hellström
2025-03-12 21:04 ` [RFC PATCH 04/19] drm/pagemap: Add a populate_mm op Thomas Hellström
2025-03-12 21:04 ` [RFC PATCH 05/19] drm/xe: Implement and use the drm_pagemap " Thomas Hellström
2025-03-12 21:04 ` [RFC PATCH 06/19] drm/pagemap, drm/xe: Add refcounting to struct drm_pagemap and manage lifetime Thomas Hellström
2025-03-12 21:04 ` [RFC PATCH 07/19] drm/pagemap: Get rid of the struct drm_pagemap_zdd::device_private_page_owner field Thomas Hellström
2025-03-12 21:04 ` [RFC PATCH 08/19] drm/xe/bo: Add a bo remove callback Thomas Hellström
2025-03-14 13:05   ` Thomas Hellström
2025-03-12 21:04 ` [RFC PATCH 09/19] drm/pagemap_util: Add a utility to assign an owner to a set of interconnected gpus Thomas Hellström
2025-03-12 21:04 ` [RFC PATCH 10/19] drm/gpusvm, drm/xe: Move the device private owner to the drm_gpusvm_ctx Thomas Hellström
2025-03-12 21:04 ` [RFC PATCH 11/19] drm/xe: Use the drm_pagemap_util helper to get a svm pagemap owner Thomas Hellström
2025-03-12 21:04 ` [RFC PATCH 12/19] drm/xe: Make the PT code handle placement per PTE rather than per vma / range Thomas Hellström
2025-03-12 21:04 ` [RFC PATCH 13/19] drm/gpusvm: Allow mixed mappings Thomas Hellström
2025-03-12 21:04 ` [RFC PATCH 14/19] drm/xe: Add a preferred dpagemap Thomas Hellström
2025-03-12 21:04 ` [RFC PATCH 15/19] drm/pagemap/util: Add file descriptors pointing to struct drm_pagemap Thomas Hellström
2025-03-12 21:04 ` [RFC PATCH 16/19] drm/xe/migrate: Allow xe_migrate_vram() also on non-pagefault capable devices Thomas Hellström
2025-03-12 21:04 ` [RFC PATCH 17/19] drm/xe/uapi: Add the devmem_open ioctl Thomas Hellström
2025-03-12 21:04 ` [RFC PATCH 18/19] drm/xe/uapi: HAX: Add the xe_madvise_prefer_devmem IOCTL Thomas Hellström
2025-03-12 21:04 ` [RFC PATCH 19/19] drm/xe: HAX: Use pcie p2p dma to test fast interconnect Thomas Hellström
2025-03-12 21:10 ` ✓ CI.Patch_applied: success for drm, drm/xe: Multi-device GPUSVM Patchwork
2025-03-12 21:11 ` ✗ CI.checkpatch: warning " Patchwork
2025-03-12 21:12 ` ✓ CI.KUnit: success " Patchwork
2025-03-12 21:29 ` ✓ CI.Build: " Patchwork
2025-03-12 21:31 ` ✗ CI.Hooks: failure " Patchwork
2025-03-12 21:33 ` ✓ CI.checksparse: success " Patchwork
2025-03-12 22:06 ` ✗ Xe.CI.BAT: failure " Patchwork
2025-03-13 10:19 ` [RFC PATCH 00/19] " Christian König
2025-03-13 12:50   ` Thomas Hellström
2025-03-13 12:57     ` Christian König
2025-03-13 15:55       ` Thomas Hellström
2025-03-17  9:20       ` Thomas Hellström [this message]
2025-03-13 13:24 ` ✗ Xe.CI.Full: failure for " Patchwork
