From: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
To: Matthew Brost <matthew.brost@intel.com>
Cc: intel-xe@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
himal.prasad.ghimiray@intel.com, apopple@nvidia.com,
airlied@gmail.com, "Simona Vetter" <simona.vetter@ffwll.ch>,
felix.kuehling@amd.com,
"Christian König" <christian.koenig@amd.com>,
dakr@kernel.org, "Mrozek, Michal" <michal.mrozek@intel.com>,
"Joonas Lahtinen" <joonas.lahtinen@linux.intel.com>
Subject: Re: [PATCH 03/15] drm/pagemap, drm/xe: Manage drm_pagemap provider lifetimes
Date: Wed, 29 Oct 2025 15:49:47 +0100 [thread overview]
Message-ID: <4e6b06069f77aee5cf7153e0ac0cf1a30c1327a0.camel@linux.intel.com> (raw)
In-Reply-To: <aQFjzTpZwYaytZqB@lstrano-desk.jf.intel.com>
On Tue, 2025-10-28 at 17:46 -0700, Matthew Brost wrote:
> On Sat, Oct 25, 2025 at 02:04:00PM +0200, Thomas Hellström wrote:
> > Even if the drm_pagemap provider has released its reference on
> > the drm_pagemap, references may be held by still active pages.
> > Ensure that we hold a reference on the provider drm device and
> > modules for as long as we might need to use the drm_pagemap ops.
> >
>
> Just to make sure I’m understanding this correctly — this is intended
> to
> guard against the devm action [1] running while a device is still
> holding references to another device’s pages, right?
>
> [1]
> https://elixir.bootlin.com/linux/v6.17.5/source/kernel/resource.c#L1993
Actually, removing the dev_pagemap and its region is allowed while
another device holds a reference on the *drm_pagemap*. For example,
say you have two devices, and device 0 executes out of the memory of
device 1. Suddenly you feel like offlining / unbinding device 1. When
you execute the unbind, the driver evicts all SVM bos and thereby
frees all device-private pages. But device 0 still holds a reference
to the drm_pagemap, even though it is now unusable: any VRAM migration
trying to use the drm_pagemap will fail with -ENODEV, so depending on
how the driver handles that, it will continue executing out of another
memory region. At this point, without this patch, it would have been
possible to rmmod the drm_pagemap provider's device module; its drm
device would then have been freed, and when drm_pagemap_put()
eventually is called, things go boom. So the commit message is a bit
misleading.
In the case where we only have pages left, the last page should be
freed from the device remove callback where bos are evicted. At that
point, the provider drm device is still alive, since the devm
callbacks haven't executed yet. Also, an rmmod would typically cause
the devm callbacks to execute, so that should also be safe without
this patch, at least as long as the page freeing doesn't trigger any
async callbacks that aren't waited on before removal.
So yeah, I need to update the commit message a bit. We should also
craft an IGT that unbinds device 1 while device 0 is executing out of
its memory and verifies that execution completes with correct results
anyway.
/Thomas
>
> > Note that in theory, the drm_gpusvm_helper module may be unloaded
> > as soon as the final module_put() of the provider driver module is
> > executed, so we need to add a module_exit() function that waits
> > for the work item executing the module_put() to complete.
> >
> > Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > ---
> > drivers/gpu/drm/drm_pagemap.c | 101
> > ++++++++++++++++++++++++++++++++--
> > drivers/gpu/drm/xe/xe_svm.c | 15 ++++-
> > include/drm/drm_pagemap.h | 10 +++-
> > 3 files changed, 117 insertions(+), 9 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/drm_pagemap.c
> > b/drivers/gpu/drm/drm_pagemap.c
> > index 173b3ecb07d5..fb18a80d6a1c 100644
> > --- a/drivers/gpu/drm/drm_pagemap.c
> > +++ b/drivers/gpu/drm/drm_pagemap.c
> > @@ -8,6 +8,7 @@
> > #include <linux/pagemap.h>
> > #include <drm/drm_drv.h>
> > #include <drm/drm_pagemap.h>
> > +#include <drm/drm_print.h>
> >
> > /**
> > * DOC: Overview
> > @@ -544,16 +545,92 @@ static int
> > drm_pagemap_migrate_populate_ram_pfn(struct vm_area_struct *vas,
> > return -ENOMEM;
> > }
> >
> > +static void drm_pagemap_dev_unhold_work(struct work_struct *work);
> > +static LLIST_HEAD(drm_pagemap_unhold_list);
> > +static DECLARE_WORK(drm_pagemap_work,
> > drm_pagemap_dev_unhold_work);
> > +
> > +/**
> > + * struct drm_pagemap_dev_hold - Struct to aid in drm_device
> > release.
> > + * @link: Link into drm_pagemap_unhold_list for deferred reference
> > releases.
> > + * @drm: drm device to put.
> > + *
> > + * When a struct drm_pagemap is released, we also need to release
> > the
> > + * reference it holds on the drm device. However, typically that
> > needs
> > + * to be done separately from a system-wide workqueue.
> > + * Each time a struct drm_pagemap is initialized
> > + * (or re-initialized if cached) therefore allocate a separate
> > + * drm_pagemap_dev_hold item, from which we put the drm device and
> > + * associated module.
> > + */
> > +struct drm_pagemap_dev_hold {
> > + struct llist_node link;
> > + struct drm_device *drm;
> > +};
> > +
> > static void drm_pagemap_release(struct kref *ref)
> > {
> > struct drm_pagemap *dpagemap = container_of(ref,
> > typeof(*dpagemap), ref);
> > -
> > + struct drm_pagemap_dev_hold *dev_hold = dpagemap-
> > >dev_hold;
> > +
> > + /*
> > + * We know the pagemap provider is alive at this point,
> > since
> > + * the struct drm_pagemap_dev_hold holds a reference to
> > the
> > + * pagemap provider drm_device and its module.
> > + */
> > + dpagemap->dev_hold = NULL;
> > kfree(dpagemap);
> > + llist_add(&dev_hold->link, &drm_pagemap_unhold_list);
> > + schedule_work(&drm_pagemap_work);
> > + /*
> > + * Here, either the provider device is still alive, since
> > if called from
> > + * page_free(), the caller is holding a reference on the
> > dev_pagemap,
> > + * or if called from drm_pagemap_put(), the direct caller
> > is still alive.
> > + * This ensures we can't race with THIS module unload.
> > + */
> > +}
> > +
> > +static void drm_pagemap_dev_unhold_work(struct work_struct *work)
> > +{
> > + struct llist_node *node =
> > llist_del_all(&drm_pagemap_unhold_list);
> > + struct drm_pagemap_dev_hold *dev_hold, *next;
> > +
> > + /*
> > + * Deferred release of drm_pagemap provider device and
> > module.
> > + * THIS module is kept alive during the release by the
> > + * flush_work() in the drm_pagemap_exit() function.
> > + */
> > + llist_for_each_entry_safe(dev_hold, next, node, link) {
> > + struct drm_device *drm = dev_hold->drm;
> > + struct module *module = drm->driver->fops->owner;
> > +
> > + drm_dbg(drm, "Releasing reference on provider
> > device and module.\n");
> > + drm_dev_put(drm);
> > + module_put(module);
> > + kfree(dev_hold);
> > + }
> > +}
> > +
> > +static struct drm_pagemap_dev_hold *
> > +drm_pagemap_dev_hold(struct drm_pagemap *dpagemap)
> > +{
> > + struct drm_pagemap_dev_hold *dev_hold;
> > + struct drm_device *drm = dpagemap->drm;
> > +
> > + dev_hold = kzalloc(sizeof(*dev_hold), GFP_KERNEL);
> > + if (!dev_hold)
> > + return ERR_PTR(-ENOMEM);
> > +
> > + init_llist_node(&dev_hold->link);
> > + dev_hold->drm = drm;
> > + (void)try_module_get(drm->driver->fops->owner);
> > + drm_dev_get(drm);
> > +
> > + return dev_hold;
> > }
> >
> > /**
> > * drm_pagemap_create() - Create a struct drm_pagemap.
> > - * @dev: Pointer to a struct device providing the device-private
> > memory.
> > + * @drm: Pointer to a struct drm_device providing the device-
> > private memory.
> > * @pagemap: Pointer to a pre-setup struct dev_pagemap providing
> > the struct pages.
> > * @ops: Pointer to the struct drm_pagemap_ops.
> > *
> > @@ -563,20 +640,28 @@ static void drm_pagemap_release(struct kref
> > *ref)
> > * Error pointer on error.
> > */
> > struct drm_pagemap *
> > -drm_pagemap_create(struct device *dev,
> > +drm_pagemap_create(struct drm_device *drm,
> > struct dev_pagemap *pagemap,
> > const struct drm_pagemap_ops *ops)
> > {
> > struct drm_pagemap *dpagemap = kzalloc(sizeof(*dpagemap),
> > GFP_KERNEL);
> > + struct drm_pagemap_dev_hold *dev_hold;
> >
> > if (!dpagemap)
> > return ERR_PTR(-ENOMEM);
> >
> > kref_init(&dpagemap->ref);
> > - dpagemap->dev = dev;
> > + dpagemap->drm = drm;
> > dpagemap->ops = ops;
> > dpagemap->pagemap = pagemap;
> >
> > + dev_hold = drm_pagemap_dev_hold(dpagemap);
> > + if (IS_ERR(dev_hold)) {
> > + kfree(dpagemap);
> > + return ERR_CAST(dev_hold);
> > + }
> > + dpagemap->dev_hold = dev_hold;
> > +
> > return dpagemap;
> > }
> > EXPORT_SYMBOL(drm_pagemap_create);
> > @@ -937,3 +1022,11 @@ int drm_pagemap_populate_mm(struct
> > drm_pagemap *dpagemap,
> > return err;
> > }
> > EXPORT_SYMBOL(drm_pagemap_populate_mm);
> > +
> > +static void drm_pagemap_exit(void)
> > +{
> > + flush_work(&drm_pagemap_work);
> > + if (WARN_ON(!llist_empty(&drm_pagemap_unhold_list)))
> > + disable_work_sync(&drm_pagemap_work);
> > +}
> > +module_exit(drm_pagemap_exit);
> > diff --git a/drivers/gpu/drm/xe/xe_svm.c
> > b/drivers/gpu/drm/xe/xe_svm.c
> > index 6d2c6c144315..f6ee22da2e95 100644
> > --- a/drivers/gpu/drm/xe/xe_svm.c
> > +++ b/drivers/gpu/drm/xe/xe_svm.c
> > @@ -1437,7 +1437,7 @@ xe_drm_pagemap_device_map(struct drm_pagemap
> > *dpagemap,
> > unsigned int order,
> > enum dma_data_direction dir)
> > {
> > - struct device *pgmap_dev = dpagemap->dev;
> > + struct device *pgmap_dev = dpagemap->drm->dev;
> > enum drm_interconnect_protocol prot;
> > dma_addr_t addr;
> >
> > @@ -1457,6 +1457,14 @@ static const struct drm_pagemap_ops
> > xe_drm_pagemap_ops = {
> > .populate_mm = xe_drm_pagemap_populate_mm,
> > };
> >
> > +static void xe_devm_release(void *data)
> > +{
> > + struct xe_vram_region *vr = data;
> > +
> > + drm_pagemap_put(vr->dpagemap);
> > + vr->dpagemap = NULL;
> > +}
> > +
> > /**
> > * xe_devm_add: Remap and provide memmap backing for device memory
> > * @tile: tile that the memory region belongs to
> > @@ -1482,7 +1490,7 @@ int xe_devm_add(struct xe_tile *tile, struct
> > xe_vram_region *vr)
> > return ret;
> > }
> >
> > - vr->dpagemap = drm_pagemap_create(dev, &vr->pagemap,
> > + vr->dpagemap = drm_pagemap_create(&xe->drm, &vr->pagemap,
> > &xe_drm_pagemap_ops);
> > if (IS_ERR(vr->dpagemap)) {
> > drm_err(&xe->drm, "Failed to create drm_pagemap
> > tile %d memory: %pe\n",
> > @@ -1490,6 +1498,9 @@ int xe_devm_add(struct xe_tile *tile, struct
> > xe_vram_region *vr)
> > ret = PTR_ERR(vr->dpagemap);
> > goto out_no_dpagemap;
> > }
> > + ret = devm_add_action_or_reset(dev, xe_devm_release, vr);
> > + if (ret)
> > + goto out_no_dpagemap;
>
> I mentioned this in first patch that this was missing, maybe move
> this
> part to the first patch even though this will get removed a bit
> later.
>
> Matt
>
> >
> > vr->pagemap.type = MEMORY_DEVICE_PRIVATE;
> > vr->pagemap.range.start = res->start;
> > diff --git a/include/drm/drm_pagemap.h b/include/drm/drm_pagemap.h
> > index 2c7de928865b..5cfe54331ba7 100644
> > --- a/include/drm/drm_pagemap.h
> > +++ b/include/drm/drm_pagemap.h
> > @@ -9,6 +9,7 @@
> > #define NR_PAGES(order) (1U << (order))
> >
> > struct drm_pagemap;
> > +struct drm_pagemap_dev_hold;
> > struct drm_pagemap_zdd;
> > struct device;
> >
> > @@ -130,14 +131,17 @@ struct drm_pagemap_ops {
> > * used for device p2p handshaking.
> > * @ops: The struct drm_pagemap_ops.
> > * @ref: Reference count.
> > - * @dev: The struct drevice owning the device-private memory.
> > + * @drm: The struct drm device owning the device-private memory.
> > * @pagemap: Pointer to the underlying dev_pagemap.
> > + * @dev_hold: Pointer to a struct drm_pagemap_dev_hold for
> > + * device referencing.
> > */
> > struct drm_pagemap {
> > const struct drm_pagemap_ops *ops;
> > struct kref ref;
> > - struct device *dev;
> > + struct drm_device *drm;
> > struct dev_pagemap *pagemap;
> > + struct drm_pagemap_dev_hold *dev_hold;
> > };
> >
> > struct drm_pagemap_devmem;
> > @@ -206,7 +210,7 @@ struct drm_pagemap_devmem_ops {
> > unsigned long npages);
> > };
> >
> > -struct drm_pagemap *drm_pagemap_create(struct device *dev,
> > +struct drm_pagemap *drm_pagemap_create(struct drm_device *drm,
> > struct dev_pagemap
> > *pagemap,
> > const struct
> > drm_pagemap_ops *ops);
> >
> > --
> > 2.51.0
> >