All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] drm/drm_crtc: fix race with dma_fence_signal() in ::get_driver_name()
@ 2026-06-18 14:03 André Draszik
  2026-06-18 14:16 ` sashiko-bot
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: André Draszik @ 2026-06-18 14:03 UTC (permalink / raw)
  To: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, Sumit Semwal, Christian König, Tvrtko Ursulin,
	Boris Brezillon, Philipp Stanner
  Cc: dri-devel, linux-kernel, linux-media, linaro-mm-sig,
	Peter Griffin, Tudor Ambarus, Juan Yescas, kernel-team,
	André Draszik

Since commit 541c8f2468b9 ("dma-buf: detach fence ops on signal v3"),
I'm seeing the BUG_ON() triggering in drm_crtc's fence_to_crtc() via
drm_crtc_fence_get_driver_name() regularly:

    Call trace:
     panic+0x58/0x5c
     die+0x160/0x178
     bug_brk_handler+0x70/0xa4
     call_el1_break_hook+0x3c/0x1a0
     do_el1_brk64+0x24/0x74
     el1_brk64+0x34/0x54
     el1h_64_sync_handler+0x80/0xfc
     el1h_64_sync+0x84/0x88
     drm_crtc_fence_get_driver_name+0x60/0x68 (P)
     sync_file_get_name+0x184/0x45c
     sync_file_ioctl+0x404/0xf70
     __arm64_sys_ioctl+0x124/0x1dc

This looks to be caused by a code flow similar to the following:

+++ snip +++
thread A                             thread B

                                     ioctl(SYNC_IOC_FILE_INFO)
                                     sync_file_ioctl()
                                     sync_file_get_name()
dma_fence_signal_timestamp_locked()  dma_fence_driver_name()
                                       ops = rcu_dereference(fence->ops)
                                       if (!dma_fence_test_signaled_flag())
                                         ops->get_driver_name(fence) i.e.
                                         drm_crtc_fence_get_driver_name()
test_and_set_bit(SIGNALED)
RCU_INIT_POINTER(fence->ops, NULL)
                                     drm_crtc_fence_get_driver_name()
                                       BUG_ON(rcu_access_pointer(fence->ops)
                                              != &drm_crtc_fence_ops)
+++ snap +++

I see two ways to resolve this:
a) simply drop the BUG_ON(). It can not work anymore since above
   commit, as it is racy now.
b) pass the original 'ops' pointer obtained in dma_fence_driver_name()
   to all callees.

This patch implements option a), as because:
* I don't see much benefit in passing the extra pointer just for this
  BUG_ON() to work.
* Requiring the dma_fence_ops in those callbacks is an implementation
  detail of the drm_crtc driver, and therefore upper layers shouldn't
  have to care about that.
* The existence of the BUG_ON() doesn't appear to be consistent with
  implementations of ::get_driver_name() or ::get_timeline_name() in
  the majority of other DRM drivers in the first place. Those that do
  have a similar BUG_ON() (i915, xe) probably also need an update
  similar to this patch here but I'm not in a position to test those.

Note that the adjacent drm_crtc_fence_get_timeline_name() has the same
problem and is fixed by this patch as well.

Fixes: 541c8f2468b9 ("dma-buf: detach fence ops on signal v3")
Signed-off-by: André Draszik <andre.draszik@linaro.org>
---
 drivers/gpu/drm/drm_crtc.c | 11 +++--------
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/drm_crtc.c b/drivers/gpu/drm/drm_crtc.c
index 63ead8ba6756..31c8636e7467 100644
--- a/drivers/gpu/drm/drm_crtc.c
+++ b/drivers/gpu/drm/drm_crtc.c
@@ -73,6 +73,9 @@
  * &drm_mode_config_funcs.atomic_check.
  */
 
+#define fence_to_crtc(f) container_of((f)->extern_lock, \
+				      struct drm_crtc, fence_lock)
+
 /**
  * drm_crtc_from_index - find the registered CRTC at an index
  * @dev: DRM device
@@ -154,14 +157,6 @@ static void drm_crtc_crc_fini(struct drm_crtc *crtc)
 #endif
 }
 
-static const struct dma_fence_ops drm_crtc_fence_ops;
-
-static struct drm_crtc *fence_to_crtc(struct dma_fence *fence)
-{
-	BUG_ON(rcu_access_pointer(fence->ops) != &drm_crtc_fence_ops);
-	return container_of(fence->extern_lock, struct drm_crtc, fence_lock);
-}
-
 static const char *drm_crtc_fence_get_driver_name(struct dma_fence *fence)
 {
 	struct drm_crtc *crtc = fence_to_crtc(fence);

---
base-commit: e2cae00c05d196491c318196792297f2dfbaa02c
change-id: 20260618-linux-drm_crtc_fix2-23a7c354a412

Best regards,
-- 
André Draszik <andre.draszik@linaro.org>


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] drm/drm_crtc: fix race with dma_fence_signal() in ::get_driver_name()
  2026-06-18 14:03 [PATCH] drm/drm_crtc: fix race with dma_fence_signal() in ::get_driver_name() André Draszik
@ 2026-06-18 14:16 ` sashiko-bot
  2026-06-18 14:53 ` Jani Nikula
  2026-06-18 15:56 ` Philipp Stanner
  2 siblings, 0 replies; 4+ messages in thread
From: sashiko-bot @ 2026-06-18 14:16 UTC (permalink / raw)
  To: André Draszik; +Cc: dri-devel

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:

Pre-existing issues:
- [Critical] Use-After-Free in get_driver_name() and get_timeline_name() due to dma_fence RCU contract violation during CRTC teardown.
--

commit 4e03eb31730ff05ffb65d7d539f01df4c6b89099
Author: André Draszik <andre.draszik@linaro.org>

drm/drm_crtc: fix race with dma_fence_signal() in ::get_driver_name()

This patch removes a BUG_ON() in drm_crtc_fence_get_driver_name() that
was triggering because the dma_fence_ops get detached upon signaling.

> diff --git a/drivers/gpu/drm/drm_crtc.c b/drivers/gpu/drm/drm_crtc.c
> index 63ead8ba67564..31c8636e74677 100644
> --- a/drivers/gpu/drm/drm_crtc.c
> +++ b/drivers/gpu/drm/drm_crtc.c

[ ... ]

> @@ -154,14 +157,6 @@ static void drm_crtc_crc_fini(struct drm_crtc *crtc)
>  #endif
>  }
>  
> -static const struct dma_fence_ops drm_crtc_fence_ops;
> -
> -static struct drm_crtc *fence_to_crtc(struct dma_fence *fence)
> -{
> -	BUG_ON(rcu_access_pointer(fence->ops) != &drm_crtc_fence_ops);
> -	return container_of(fence->extern_lock, struct drm_crtc, fence_lock);
> -}
> -
>  static const char *drm_crtc_fence_get_driver_name(struct dma_fence *fence)
>  {
>  	struct drm_crtc *crtc = fence_to_crtc(fence);

[Severity: Critical]
This is a pre-existing issue, but does removing this BUG_ON() unmask a
use-after-free during device teardown?

Looking at how these fences are managed, drm_crtc_create_fence() creates
a dma_fence without taking a reference to the drm_device or drm_crtc.
Because the sync_file framework exposes this fence to userspace, the fence
can outlive the CRTC.

The dma_fence contract requires that data accessed by dma_fence_ops (like
get_driver_name) must remain valid for an RCU grace period after the fence
is signaled. However, drm_crtc_cleanup() and the subsequent freeing of the
device do not wait for an RCU grace period via synchronize_rcu().

If userspace calls ioctl(SYNC_IOC_FILE_INFO) concurrently with a device
hot-unplug:

CPU1 (Userspace)
sync_file_get_name()
  ops = rcu_dereference(fence->ops);
  if (!dma_fence_test_signaled_flag())
    // Preempted or delayed here

CPU2 (Driver Teardown)
Signals the fence (setting fence->ops = NULL)
Destroys and frees the CRTC without waiting for an RCU grace period

CPU1 (Resumes)
  ops->get_driver_name(fence) -> drm_crtc_fence_get_driver_name()
  crtc = fence_to_crtc(fence); // Casts to the freed CRTC
  return crtc->dev->driver->name; // Use-after-free

While the BUG_ON() was causing spurious panics during normal operation,
it seems it was also inadvertently preventing this use-after-free from
proceeding silently.

Does the CRTC or DRM device need to be kept alive for the RCU grace period,
or should the fence hold a proper reference to prevent the use-after-free
when get_driver_name() and get_timeline_name() access the freed CRTC
structure?

> 
> 	return crtc->dev->driver->name;
>  }

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260618-linux-drm_crtc_fix2-v1-1-c03e77b36f34@linaro.org?part=1

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] drm/drm_crtc: fix race with dma_fence_signal() in ::get_driver_name()
  2026-06-18 14:03 [PATCH] drm/drm_crtc: fix race with dma_fence_signal() in ::get_driver_name() André Draszik
  2026-06-18 14:16 ` sashiko-bot
@ 2026-06-18 14:53 ` Jani Nikula
  2026-06-18 15:56 ` Philipp Stanner
  2 siblings, 0 replies; 4+ messages in thread
From: Jani Nikula @ 2026-06-18 14:53 UTC (permalink / raw)
  To: André Draszik, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, Sumit Semwal,
	Christian König, Tvrtko Ursulin, Boris Brezillon,
	Philipp Stanner
  Cc: dri-devel, linux-kernel, linux-media, linaro-mm-sig,
	Peter Griffin, Tudor Ambarus, Juan Yescas, kernel-team,
	André Draszik

On Thu, 18 Jun 2026, André Draszik <andre.draszik@linaro.org> wrote:
> Since commit 541c8f2468b9 ("dma-buf: detach fence ops on signal v3"),
> I'm seeing the BUG_ON() triggering in drm_crtc's fence_to_crtc() via
> drm_crtc_fence_get_driver_name() regularly:
>
>     Call trace:
>      panic+0x58/0x5c
>      die+0x160/0x178
>      bug_brk_handler+0x70/0xa4
>      call_el1_break_hook+0x3c/0x1a0
>      do_el1_brk64+0x24/0x74
>      el1_brk64+0x34/0x54
>      el1h_64_sync_handler+0x80/0xfc
>      el1h_64_sync+0x84/0x88
>      drm_crtc_fence_get_driver_name+0x60/0x68 (P)
>      sync_file_get_name+0x184/0x45c
>      sync_file_ioctl+0x404/0xf70
>      __arm64_sys_ioctl+0x124/0x1dc
>
> This looks to be caused by a code flow similar to the following:
>
> +++ snip +++
> thread A                             thread B
>
>                                      ioctl(SYNC_IOC_FILE_INFO)
>                                      sync_file_ioctl()
>                                      sync_file_get_name()
> dma_fence_signal_timestamp_locked()  dma_fence_driver_name()
>                                        ops = rcu_dereference(fence->ops)
>                                        if (!dma_fence_test_signaled_flag())
>                                          ops->get_driver_name(fence) i.e.
>                                          drm_crtc_fence_get_driver_name()
> test_and_set_bit(SIGNALED)
> RCU_INIT_POINTER(fence->ops, NULL)
>                                      drm_crtc_fence_get_driver_name()
>                                        BUG_ON(rcu_access_pointer(fence->ops)
>                                               != &drm_crtc_fence_ops)
> +++ snap +++
>
> I see two ways to resolve this:
> a) simply drop the BUG_ON(). It can not work anymore since above
>    commit, as it is racy now.
> b) pass the original 'ops' pointer obtained in dma_fence_driver_name()
>    to all callees.
>
> This patch implements option a), as because:
> * I don't see much benefit in passing the extra pointer just for this
>   BUG_ON() to work.
> * Requiring the dma_fence_ops in those callbacks is an implementation
>   detail of the drm_crtc driver, and therefore upper layers shouldn't
>   have to care about that.
> * The existence of the BUG_ON() doesn't appear to be consistent with
>   implementations of ::get_driver_name() or ::get_timeline_name() in
>   the majority of other DRM drivers in the first place. Those that do
>   have a similar BUG_ON() (i915, xe) probably also need an update
>   similar to this patch here but I'm not in a position to test those.
>
> Note that the adjacent drm_crtc_fence_get_timeline_name() has the same
> problem and is fixed by this patch as well.
>
> Fixes: 541c8f2468b9 ("dma-buf: detach fence ops on signal v3")
> Signed-off-by: André Draszik <andre.draszik@linaro.org>
> ---
>  drivers/gpu/drm/drm_crtc.c | 11 +++--------
>  1 file changed, 3 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/drm_crtc.c b/drivers/gpu/drm/drm_crtc.c
> index 63ead8ba6756..31c8636e7467 100644
> --- a/drivers/gpu/drm/drm_crtc.c
> +++ b/drivers/gpu/drm/drm_crtc.c
> @@ -73,6 +73,9 @@
>   * &drm_mode_config_funcs.atomic_check.
>   */
>  
> +#define fence_to_crtc(f) container_of((f)->extern_lock, \
> +				      struct drm_crtc, fence_lock)
> +
>  /**
>   * drm_crtc_from_index - find the registered CRTC at an index
>   * @dev: DRM device
> @@ -154,14 +157,6 @@ static void drm_crtc_crc_fini(struct drm_crtc *crtc)
>  #endif
>  }
>  
> -static const struct dma_fence_ops drm_crtc_fence_ops;
> -
> -static struct drm_crtc *fence_to_crtc(struct dma_fence *fence)
> -{
> -	BUG_ON(rcu_access_pointer(fence->ops) != &drm_crtc_fence_ops);

Whether removing the BUG_ON() turns out to be the right choice or not, I
couldn't say, but please don't turn this function into a macro, at least
not without rationale. (I can't think of any.)

BR,
Jani.

> -	return container_of(fence->extern_lock, struct drm_crtc, fence_lock);
> -}
> -
>  static const char *drm_crtc_fence_get_driver_name(struct dma_fence *fence)
>  {
>  	struct drm_crtc *crtc = fence_to_crtc(fence);
>
> ---
> base-commit: e2cae00c05d196491c318196792297f2dfbaa02c
> change-id: 20260618-linux-drm_crtc_fix2-23a7c354a412
>
> Best regards,

-- 
Jani Nikula, Intel

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] drm/drm_crtc: fix race with dma_fence_signal() in ::get_driver_name()
  2026-06-18 14:03 [PATCH] drm/drm_crtc: fix race with dma_fence_signal() in ::get_driver_name() André Draszik
  2026-06-18 14:16 ` sashiko-bot
  2026-06-18 14:53 ` Jani Nikula
@ 2026-06-18 15:56 ` Philipp Stanner
  2 siblings, 0 replies; 4+ messages in thread
From: Philipp Stanner @ 2026-06-18 15:56 UTC (permalink / raw)
  To: André Draszik, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, Sumit Semwal,
	Christian König, Tvrtko Ursulin, Boris Brezillon,
	Philipp Stanner, Danilo Krummrich
  Cc: dri-devel, linux-kernel, linux-media, linaro-mm-sig,
	Peter Griffin, Tudor Ambarus, Juan Yescas, kernel-team

+Cc Danilo

On Thu, 2026-06-18 at 15:03 +0100, André Draszik wrote:
> Since commit 541c8f2468b9 ("dma-buf: detach fence ops on signal v3"),
> I'm seeing the BUG_ON() triggering in drm_crtc's fence_to_crtc() via
> drm_crtc_fence_get_driver_name() regularly:
> 
>     Call trace:
>      panic+0x58/0x5c
>      die+0x160/0x178
>      bug_brk_handler+0x70/0xa4
>      call_el1_break_hook+0x3c/0x1a0
>      do_el1_brk64+0x24/0x74
>      el1_brk64+0x34/0x54
>      el1h_64_sync_handler+0x80/0xfc
>      el1h_64_sync+0x84/0x88
>      drm_crtc_fence_get_driver_name+0x60/0x68 (P)
>      sync_file_get_name+0x184/0x45c
>      sync_file_ioctl+0x404/0xf70
>      __arm64_sys_ioctl+0x124/0x1dc
> 
> This looks to be caused by a code flow similar to the following:
> 
> +++ snip +++
> thread A                             thread B
> 
>                                      ioctl(SYNC_IOC_FILE_INFO)
>                                      sync_file_ioctl()
>                                      sync_file_get_name()
> dma_fence_signal_timestamp_locked()  dma_fence_driver_name()
>                                        ops = rcu_dereference(fence->ops)
>                                        if (!dma_fence_test_signaled_flag())
>                                          ops->get_driver_name(fence) i.e.
>                                          drm_crtc_fence_get_driver_name()
> test_and_set_bit(SIGNALED)
> RCU_INIT_POINTER(fence->ops, NULL)
>                                      drm_crtc_fence_get_driver_name()
>                                        BUG_ON(rcu_access_pointer(fence->ops)
>                                               != &drm_crtc_fence_ops)

Now this looks like a very similar problem that I have recently been
concerned with:

https://lore.kernel.org/dri-devel/20260612104251.2264707-2-phasta@kernel.org/

https://lore.kernel.org/dri-devel/fa0dc9757bf8343516c4b156a2b70ec91b64ef8f.camel@mailbox.org/


I continue to believe because of bugs like this and the ones I have
quoted in the threads above the robustness of the kernel could be
greatly improved if we could get dma_fence fully synchronized with its
lock.


That said:

> +++ snap +++
> 
> I see two ways to resolve this:
> a) simply drop the BUG_ON(). It can not work anymore since above
>    commit, as it is racy now.
> b) pass the original 'ops' pointer obtained in dma_fence_driver_name()
>    to all callees.
> 
> This patch implements option a), as because:
> * I don't see much benefit in passing the extra pointer just for this
>   BUG_ON() to work.
> * Requiring the dma_fence_ops in those callbacks is an implementation
>   detail of the drm_crtc driver, and therefore upper layers shouldn't
>   have to care about that.
> * The existence of the BUG_ON() doesn't appear to be consistent with
>   implementations of ::get_driver_name() or ::get_timeline_name() in
>   the majority of other DRM drivers in the first place. Those that do
>   have a similar BUG_ON() (i915, xe) probably also need an update
>   similar to this patch here but I'm not in a position to test those.
> 
> Note that the adjacent drm_crtc_fence_get_timeline_name() has the same
> problem and is fixed by this patch as well.
> 
> Fixes: 541c8f2468b9 ("dma-buf: detach fence ops on signal v3")
> Signed-off-by: André Draszik <andre.draszik@linaro.org>
> ---
>  drivers/gpu/drm/drm_crtc.c | 11 +++--------
>  1 file changed, 3 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_crtc.c b/drivers/gpu/drm/drm_crtc.c
> index 63ead8ba6756..31c8636e7467 100644
> --- a/drivers/gpu/drm/drm_crtc.c
> +++ b/drivers/gpu/drm/drm_crtc.c
> @@ -73,6 +73,9 @@
>   * &drm_mode_config_funcs.atomic_check.
>   */
>  
> +#define fence_to_crtc(f) container_of((f)->extern_lock, \
> +				      struct drm_crtc, fence_lock)

I agree that macros should be avoided if possible.

> +
>  /**
>   * drm_crtc_from_index - find the registered CRTC at an index
>   * @dev: DRM device
> @@ -154,14 +157,6 @@ static void drm_crtc_crc_fini(struct drm_crtc *crtc)
>  #endif
>  }
>  
> -static const struct dma_fence_ops drm_crtc_fence_ops;
> -
> -static struct drm_crtc *fence_to_crtc(struct dma_fence *fence)
> -{
> -	BUG_ON(rcu_access_pointer(fence->ops) != &drm_crtc_fence_ops);

+1

BUG_ON is more or less deprecated and should not be used anymore. There
needs to be bombastic justification for shooting down the entire
kernel.


P.

> -	return container_of(fence->extern_lock, struct drm_crtc, fence_lock);
> -}
> -
>  static const char *drm_crtc_fence_get_driver_name(struct dma_fence *fence)
>  {
>  	struct drm_crtc *crtc = fence_to_crtc(fence);
> 
> ---
> base-commit: e2cae00c05d196491c318196792297f2dfbaa02c
> change-id: 20260618-linux-drm_crtc_fix2-23a7c354a412
> 
> Best regards,

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2026-06-18 15:56 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-18 14:03 [PATCH] drm/drm_crtc: fix race with dma_fence_signal() in ::get_driver_name() André Draszik
2026-06-18 14:16 ` sashiko-bot
2026-06-18 14:53 ` Jani Nikula
2026-06-18 15:56 ` Philipp Stanner

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.