public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
To: Danilo Krummrich <dakr@kernel.org>
Cc: Alice Ryhl <aliceryhl@google.com>,
	Matthew Brost <matthew.brost@intel.com>,
	 Maarten Lankhorst <maarten.lankhorst@linux.intel.com>,
	Maxime Ripard <mripard@kernel.org>,
	Thomas Zimmermann	 <tzimmermann@suse.de>,
	dri-devel@lists.freedesktop.org,  linux-kernel@vger.kernel.org
Subject: Re: [PATCH] drm/gpuvm: take refcount on DRM device
Date: Mon, 20 Apr 2026 11:28:48 +0200	[thread overview]
Message-ID: <215f305ff04ddf8a426871e895aaf520b02e89bf.camel@linux.intel.com> (raw)
In-Reply-To: <DHVOJ6ILGSLK.3OWKA4RV2HB5C@kernel.org>

On Fri, 2026-04-17 at 21:33 +0200, Danilo Krummrich wrote:
> On Fri Apr 17, 2026 at 4:41 PM CEST, Thomas Hellström wrote:
> > This is problematic since typically you also need a module
> > reference
> > when taking a drm device reference.
> > 
> > The reason for this is that the devres reference on the drm device
> > expects to be the last one, since it might be called from the
> > module
> > exit function of the driver.
> 
> No, this is not how it works; if this would be true then drmm_* would
> be pretty
> pointless in the first place, as one could just use devm_* for
> everything.
> 
> Citing the commit introducing drmm_* APIs:
> 
> 	"The biggest wrong pattern is that developers use devm_,
> which ties the
> 	release action to the underlying struct device, whereas all
> the
> 	userspace visible stuff attached to a drm_device can long
> outlive that
> 	one (e.g. after a hotunplug while userspace has open files
> and mmap'ed
> 	buffers)."

Yeah, I was a bit unclear and partly incorrect. This only happens *if
there are no other holders of the driver module reference. (But see
below WRT potential other holders)-

driver_module_unload()->...->pci_dev_remove -> devm_release->
drm_dev_put-><module is unloaded>.

So if, at this point there are additional drm device references,
they'd point to dangling devices.

> 
> > Now if there is an additional reference held at that point the
> > driver module
> > can be unloaded with a dangling reference to the drm device.
> > 
> > On the other hand, if you in addition take a module reference then
> > that
> > blocks the driver module from being unloaded while held, just like
> > a
> > drm file reference. This leads to complicated module release
> > schemes
> > like the one in drm_pagemap where the module refcount is released
> > from
> > a work item that is waited on in the drm_pagemap exit function.
> > 
> > I'm working to lift the module refcount requirement, but meanwhile
> > I'd
> > recommend that in the file close callback, we'd make sure all
> > drm_gpuvms have called their drm_gpuvm_free() function, because
> > then we
> > are sure that the drm_device is still alive and the module still
> > pinned.
> 
> If GPUVM has a pointer to the DRM device, it implies shared ownership
> and hence
> GPUVM should account for this shared ownership and take a reference
> count.
> 
> The fact that GPUVM must not outlive module unload when it has driver
> callbacks
> attached is an orthogonal requirement.
> 
> The module lifetime / callback issue is a separate problem that
> exists
> regardless of whether you hold a device refcount. Not taking the
> refcount
> doesn't make the module problem go away, it just adds a second,
> independent bug.
> 
> If struct drm_device itself, e.g. due to drm_dev_release() requires a
> module
> refcount, then this is on struct drm_device to ensure this constraint
> (or remove
> the requirement).
> 
> IOW, if I get to choose between a DRM component that has a pointer to
> a DRM
> device stalls module unload and a DRM component that has a pointer to
> a DRM
> device oopses the kernel when used wrongly, I prefer the former.

I agree with your reasoning here, but current fact is that most (if not
all) holders of a drm device reference (files, pagemaps, dma-bufs)
currently also hold a module reference to protect against this, and
drm_gpuvm would be an outlier.

To fix this properly (lifting that requirement) one could introduce a
drm device count in the module and have the module exit function wait
for it to become zero, *and* that the code that did the last decrement
finished executing.

https://patchwork.freedesktop.org/patch/712146/?series=163298&rev=1

Or one could also have the drm device hold a reference count on the
driver module, but that would block unloading without previous unbind
which is not typical driver behaviour and would likely be seen as a
regression.

/Thomas







> 
> - Danilo

  reply	other threads:[~2026-04-20  9:29 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-16 13:10 [PATCH] drm/gpuvm: take refcount on DRM device Alice Ryhl
2026-04-16 15:26 ` Danilo Krummrich
2026-04-17 14:41 ` Thomas Hellström
2026-04-17 19:33   ` Danilo Krummrich
2026-04-20  9:28     ` Thomas Hellström [this message]
2026-04-20 15:08       ` Danilo Krummrich
2026-04-20 16:19         ` Thomas Hellström

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=215f305ff04ddf8a426871e895aaf520b02e89bf.camel@linux.intel.com \
    --to=thomas.hellstrom@linux.intel.com \
    --cc=aliceryhl@google.com \
    --cc=dakr@kernel.org \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=maarten.lankhorst@linux.intel.com \
    --cc=matthew.brost@intel.com \
    --cc=mripard@kernel.org \
    --cc=tzimmermann@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox