From: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
To: Danilo Krummrich <dakr@kernel.org>
Cc: Alice Ryhl <aliceryhl@google.com>,
Matthew Brost <matthew.brost@intel.com>,
Maarten Lankhorst <maarten.lankhorst@linux.intel.com>,
Maxime Ripard <mripard@kernel.org>,
Thomas Zimmermann <tzimmermann@suse.de>,
dri-devel@lists.freedesktop.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] drm/gpuvm: take refcount on DRM device
Date: Mon, 20 Apr 2026 11:28:48 +0200 [thread overview]
Message-ID: <215f305ff04ddf8a426871e895aaf520b02e89bf.camel@linux.intel.com> (raw)
In-Reply-To: <DHVOJ6ILGSLK.3OWKA4RV2HB5C@kernel.org>
On Fri, 2026-04-17 at 21:33 +0200, Danilo Krummrich wrote:
> On Fri Apr 17, 2026 at 4:41 PM CEST, Thomas Hellström wrote:
> > This is problematic since typically you also need a module
> > reference
> > when taking a drm device reference.
> >
> > The reason for this is that the devres reference on the drm device
> > expects to be the last one, since it might be called from the
> > module
> > exit function of the driver.
>
> No, this is not how it works; if this would be true then drmm_* would
> be pretty
> pointless in the first place, as one could just use devm_* for
> everything.
>
> Citing the commit introducing drmm_* APIs:
>
> "The biggest wrong pattern is that developers use devm_,
> which ties the
> release action to the underlying struct device, whereas all
> the
> userspace visible stuff attached to a drm_device can long
> outlive that
> one (e.g. after a hotunplug while userspace has open files
> and mmap'ed
> buffers)."
Yeah, I was a bit unclear and partly incorrect. This only happens *if
there are no other holders of the driver module reference. (But see
below WRT potential other holders)-
driver_module_unload()->...->pci_dev_remove -> devm_release->
drm_dev_put-><module is unloaded>.
So if, at this point there are additional drm device references,
they'd point to dangling devices.
>
> > Now if there is an additional reference held at that point the
> > driver module
> > can be unloaded with a dangling reference to the drm device.
> >
> > On the other hand, if you in addition take a module reference then
> > that
> > blocks the driver module from being unloaded while held, just like
> > a
> > drm file reference. This leads to complicated module release
> > schemes
> > like the one in drm_pagemap where the module refcount is released
> > from
> > a work item that is waited on in the drm_pagemap exit function.
> >
> > I'm working to lift the module refcount requirement, but meanwhile
> > I'd
> > recommend that in the file close callback, we'd make sure all
> > drm_gpuvms have called their drm_gpuvm_free() function, because
> > then we
> > are sure that the drm_device is still alive and the module still
> > pinned.
>
> If GPUVM has a pointer to the DRM device, it implies shared ownership
> and hence
> GPUVM should account for this shared ownership and take a reference
> count.
>
> The fact that GPUVM must not outlive module unload when it has driver
> callbacks
> attached is an orthogonal requirement.
>
> The module lifetime / callback issue is a separate problem that
> exists
> regardless of whether you hold a device refcount. Not taking the
> refcount
> doesn't make the module problem go away, it just adds a second,
> independent bug.
>
> If struct drm_device itself, e.g. due to drm_dev_release() requires a
> module
> refcount, then this is on struct drm_device to ensure this constraint
> (or remove
> the requirement).
>
> IOW, if I get to choose between a DRM component that has a pointer to
> a DRM
> device stalls module unload and a DRM component that has a pointer to
> a DRM
> device oopses the kernel when used wrongly, I prefer the former.
I agree with your reasoning here, but current fact is that most (if not
all) holders of a drm device reference (files, pagemaps, dma-bufs)
currently also hold a module reference to protect against this, and
drm_gpuvm would be an outlier.
To fix this properly (lifting that requirement) one could introduce a
drm device count in the module and have the module exit function wait
for it to become zero, *and* that the code that did the last decrement
finished executing.
https://patchwork.freedesktop.org/patch/712146/?series=163298&rev=1
Or one could also have the drm device hold a reference count on the
driver module, but that would block unloading without previous unbind
which is not typical driver behaviour and would likely be seen as a
regression.
/Thomas
>
> - Danilo
next prev parent reply other threads:[~2026-04-20 9:29 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-04-16 13:10 [PATCH] drm/gpuvm: take refcount on DRM device Alice Ryhl
2026-04-16 15:26 ` Danilo Krummrich
2026-04-17 14:41 ` Thomas Hellström
2026-04-17 19:33 ` Danilo Krummrich
2026-04-20 9:28 ` Thomas Hellström [this message]
2026-04-20 15:08 ` Danilo Krummrich
2026-04-20 16:19 ` Thomas Hellström
2026-04-27 7:34 ` Alice Ryhl
2026-04-27 11:44 ` Thomas Hellström
2026-05-05 11:06 ` Danilo Krummrich
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=215f305ff04ddf8a426871e895aaf520b02e89bf.camel@linux.intel.com \
--to=thomas.hellstrom@linux.intel.com \
--cc=aliceryhl@google.com \
--cc=dakr@kernel.org \
--cc=dri-devel@lists.freedesktop.org \
--cc=linux-kernel@vger.kernel.org \
--cc=maarten.lankhorst@linux.intel.com \
--cc=matthew.brost@intel.com \
--cc=mripard@kernel.org \
--cc=tzimmermann@suse.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.