public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] drm/gpuvm: take refcount on DRM device
@ 2026-04-16 13:10 Alice Ryhl
  2026-04-16 15:26 ` Danilo Krummrich
  2026-04-17 14:41 ` Thomas Hellström
  0 siblings, 2 replies; 4+ messages in thread
From: Alice Ryhl @ 2026-04-16 13:10 UTC (permalink / raw)
  To: Danilo Krummrich, Matthew Brost, Thomas Hellström
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, dri-devel,
	linux-kernel, Alice Ryhl

Currently GPUVM relies on the owner implicitly holding a refcount to the
drm device, and it does not implicitly take a refcount on the drm
device. This design is error-prone, so take a refcount on the device.

Suggested-by: Danilo Krummrich <dakr@kernel.org>
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
---
 drivers/gpu/drm/drm_gpuvm.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/drm_gpuvm.c b/drivers/gpu/drm/drm_gpuvm.c
index 44acfe4120d2..000e7910a899 100644
--- a/drivers/gpu/drm/drm_gpuvm.c
+++ b/drivers/gpu/drm/drm_gpuvm.c
@@ -25,6 +25,7 @@
  *
  */
 
+#include <drm/drm_drv.h>
 #include <drm/drm_gpuvm.h>
 #include <drm/drm_print.h>
 
@@ -1117,6 +1118,7 @@ drm_gpuvm_init(struct drm_gpuvm *gpuvm, const char *name,
 	gpuvm->drm = drm;
 	gpuvm->r_obj = r_obj;
 
+	drm_dev_get(drm);
 	drm_gem_object_get(r_obj);
 
 	drm_gpuvm_warn_check_overflow(gpuvm, start_offset, range);
@@ -1160,13 +1162,15 @@ static void
 drm_gpuvm_free(struct kref *kref)
 {
 	struct drm_gpuvm *gpuvm = container_of(kref, struct drm_gpuvm, kref);
+	struct drm_device *drm = gpuvm->drm;
 
 	drm_gpuvm_fini(gpuvm);
 
-	if (drm_WARN_ON(gpuvm->drm, !gpuvm->ops->vm_free))
+	if (drm_WARN_ON(drm, !gpuvm->ops->vm_free))
 		return;
 
 	gpuvm->ops->vm_free(gpuvm);
+	drm_dev_put(drm);
 }
 
 /**

---
base-commit: 126c50bc2fb6ddfe5b7718de67bbd7592a1062bb
change-id: 20260416-gpuvm-drm-dev-get-5ded89c39bb3

Best regards,
-- 
Alice Ryhl <aliceryhl@google.com>


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] drm/gpuvm: take refcount on DRM device
  2026-04-16 13:10 [PATCH] drm/gpuvm: take refcount on DRM device Alice Ryhl
@ 2026-04-16 15:26 ` Danilo Krummrich
  2026-04-17 14:41 ` Thomas Hellström
  1 sibling, 0 replies; 4+ messages in thread
From: Danilo Krummrich @ 2026-04-16 15:26 UTC (permalink / raw)
  To: Alice Ryhl
  Cc: Matthew Brost, Thomas Hellström, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, dri-devel, linux-kernel

On Thu Apr 16, 2026 at 3:10 PM CEST, Alice Ryhl wrote:
> Currently GPUVM relies on the owner implicitly holding a refcount to the
> drm device, and it does not implicitly take a refcount on the drm
> device. This design is error-prone, so take a refcount on the device.
>
> Suggested-by: Danilo Krummrich <dakr@kernel.org>
> Signed-off-by: Alice Ryhl <aliceryhl@google.com>

Fixes: 546ca4d35dcc ("drm/gpuvm: convert WARN() to drm_WARN() variants")

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] drm/gpuvm: take refcount on DRM device
  2026-04-16 13:10 [PATCH] drm/gpuvm: take refcount on DRM device Alice Ryhl
  2026-04-16 15:26 ` Danilo Krummrich
@ 2026-04-17 14:41 ` Thomas Hellström
  2026-04-17 19:33   ` Danilo Krummrich
  1 sibling, 1 reply; 4+ messages in thread
From: Thomas Hellström @ 2026-04-17 14:41 UTC (permalink / raw)
  To: Alice Ryhl, Danilo Krummrich, Matthew Brost
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, dri-devel,
	linux-kernel

Hi,

On Thu, 2026-04-16 at 13:10 +0000, Alice Ryhl wrote:
> Currently GPUVM relies on the owner implicitly holding a refcount to
> the
> drm device, and it does not implicitly take a refcount on the drm
> device. This design is error-prone, so take a refcount on the device.
> 
> Suggested-by: Danilo Krummrich <dakr@kernel.org>
> Signed-off-by: Alice Ryhl <aliceryhl@google.com>

This is problematic since typically you also need a module reference
when taking a drm device reference.

The reason for this is that the devres reference on the drm device
expects to be the last one, since it might be called from the module
exit function of the driver. Now if there is an additional reference
held at that point the driver module can be unloaded with a dangling
reference to the drm device.

On the other hand, if you in addition take a module reference then that
blocks the driver module from being unloaded while held, just like a
drm file reference. This leads to complicated module release schemes
like the one in drm_pagemap where the module refcount is released from
a work item that is waited on in the drm_pagemap exit function.

I'm working to lift the module refcount requirement, but meanwhile I'd
recommend that in the file close callback, we'd make sure all
drm_gpuvms have called their drm_gpuvm_free() function, because then we
are sure that the drm_device is still alive and the module still
pinned.

Thanks,
Thomas


> ---
>  drivers/gpu/drm/drm_gpuvm.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/drm_gpuvm.c
> b/drivers/gpu/drm/drm_gpuvm.c
> index 44acfe4120d2..000e7910a899 100644
> --- a/drivers/gpu/drm/drm_gpuvm.c
> +++ b/drivers/gpu/drm/drm_gpuvm.c
> @@ -25,6 +25,7 @@
>   *
>   */
>  
> +#include <drm/drm_drv.h>
>  #include <drm/drm_gpuvm.h>
>  #include <drm/drm_print.h>
>  
> @@ -1117,6 +1118,7 @@ drm_gpuvm_init(struct drm_gpuvm *gpuvm, const
> char *name,
>  	gpuvm->drm = drm;
>  	gpuvm->r_obj = r_obj;
>  
> +	drm_dev_get(drm);
>  	drm_gem_object_get(r_obj);
>  
>  	drm_gpuvm_warn_check_overflow(gpuvm, start_offset, range);
> @@ -1160,13 +1162,15 @@ static void
>  drm_gpuvm_free(struct kref *kref)
>  {
>  	struct drm_gpuvm *gpuvm = container_of(kref, struct
> drm_gpuvm, kref);
> +	struct drm_device *drm = gpuvm->drm;
>  
>  	drm_gpuvm_fini(gpuvm);
>  
> -	if (drm_WARN_ON(gpuvm->drm, !gpuvm->ops->vm_free))
> +	if (drm_WARN_ON(drm, !gpuvm->ops->vm_free))
>  		return;
>  
>  	gpuvm->ops->vm_free(gpuvm);
> +	drm_dev_put(drm);
>  }
>  
>  /**
> 
> ---
> base-commit: 126c50bc2fb6ddfe5b7718de67bbd7592a1062bb
> change-id: 20260416-gpuvm-drm-dev-get-5ded89c39bb3
> 
> Best regards,

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] drm/gpuvm: take refcount on DRM device
  2026-04-17 14:41 ` Thomas Hellström
@ 2026-04-17 19:33   ` Danilo Krummrich
  0 siblings, 0 replies; 4+ messages in thread
From: Danilo Krummrich @ 2026-04-17 19:33 UTC (permalink / raw)
  To: Thomas Hellström
  Cc: Alice Ryhl, Matthew Brost, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, dri-devel, linux-kernel

On Fri Apr 17, 2026 at 4:41 PM CEST, Thomas Hellström wrote:
> This is problematic since typically you also need a module reference
> when taking a drm device reference.
>
> The reason for this is that the devres reference on the drm device
> expects to be the last one, since it might be called from the module
> exit function of the driver.

No, this is not how it works; if this would be true then drmm_* would be pretty
pointless in the first place, as one could just use devm_* for everything.

Citing the commit introducing drmm_* APIs:

	"The biggest wrong pattern is that developers use devm_, which ties the
	release action to the underlying struct device, whereas all the
	userspace visible stuff attached to a drm_device can long outlive that
	one (e.g. after a hotunplug while userspace has open files and mmap'ed
	buffers)."

> Now if there is an additional reference held at that point the driver module
> can be unloaded with a dangling reference to the drm device.
>
> On the other hand, if you in addition take a module reference then that
> blocks the driver module from being unloaded while held, just like a
> drm file reference. This leads to complicated module release schemes
> like the one in drm_pagemap where the module refcount is released from
> a work item that is waited on in the drm_pagemap exit function.
>
> I'm working to lift the module refcount requirement, but meanwhile I'd
> recommend that in the file close callback, we'd make sure all
> drm_gpuvms have called their drm_gpuvm_free() function, because then we
> are sure that the drm_device is still alive and the module still
> pinned.

If GPUVM has a pointer to the DRM device, it implies shared ownership and hence
GPUVM should account for this shared ownership and take a reference count.

The fact that GPUVM must not outlive module unload when it has driver callbacks
attached is an orthogonal requirement.

The module lifetime / callback issue is a separate problem that exists
regardless of whether you hold a device refcount. Not taking the refcount
doesn't make the module problem go away, it just adds a second, independent bug.

If struct drm_device itself, e.g. due to drm_dev_release() requires a module
refcount, then this is on struct drm_device to ensure this constraint (or remove
the requirement).

IOW, if I get to choose between a DRM component that has a pointer to a DRM
device stalls module unload and a DRM component that has a pointer to a DRM
device oopses the kernel when used wrongly, I prefer the former.

- Danilo

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2026-04-17 19:33 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-04-16 13:10 [PATCH] drm/gpuvm: take refcount on DRM device Alice Ryhl
2026-04-16 15:26 ` Danilo Krummrich
2026-04-17 14:41 ` Thomas Hellström
2026-04-17 19:33   ` Danilo Krummrich

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox