Intel-XE Archive on lore.kernel.org
 help / color / mirror / Atom feed
* [RFC v2 00/13] Some (drm_sched_|dma_)fence lifetime issues
@ 2025-05-09 15:33 Tvrtko Ursulin
  2025-05-09 15:33 ` [RFC v2 01/13] drm/i915: Use provided dma_fence_is_chain Tvrtko Ursulin
                   ` (12 more replies)
  0 siblings, 13 replies; 37+ messages in thread
From: Tvrtko Ursulin @ 2025-05-09 15:33 UTC (permalink / raw)
  To: dri-devel
  Cc: Sumit Semwal, Gustavo Padovan, Christian König,
	Matthew Brost, Lucas De Marchi, Rodrigo Vivi, amd-gfx, intel-xe,
	intel-gfx, linux-media, linaro-mm-sig, kernel-dev, Tvrtko Ursulin

Hi all,

tl;dr;
Xe and probably some other drivers can tear down the internal state referenced
by an exported sync_file fence which then causes a null pointer derefences on
accessing said fence.

IGT that exploits the problem:
https://patchwork.freedesktop.org/patch/642709/?series=146211&rev=2

It seems there is a consensus this is a known problem with the dma-fence design,
where internal state shouldn't really be accessed after the fence has been
signaled. However currently the code is mostly unaware of that hence the use-
after-free potential.

To fix it, between the option of adding more reference counting and trying to
"revoke" the fence, suggestion is to focus on the later.

Reference to the recent discussion:
https://lore.kernel.org/dri-devel/20250418164246.72426-1-tvrtko.ursulin@igalia.com/

This series therefore attempts to implement a solution along those lines.

Most of the description what and how can be found in:
 "dma-fence: Add safe access helpers and document the rules"

Other than that, the series starts with some cleanups, with the general goal of
hiding more of the dma-fence implementation details behind explicit API. This
means adding helpers for access to driver and timeline name, and also moving as
much as it is easily possible of driver allocated state into the fence object
itself. Because dma-fence is already reference counted, any state we can embed
automatically becomes safe.

The moved state refers to the patches that move the 64bit seqno, dma-fence-array
and dma-fence-chain disambiguation into the fence. Again, it is better for as
much of the state to be directly embedded. And since we have plenty of free
flags, the parts I moved are all for free. No increase in struct dma_fence size.

Having said that, the series only addreses the runtime use-after-free scenarios,
such as the above explained situation with the xe driver. For now the module
unload problem is deliberately left for later. (Although again, some of the
early patches do make it safer, and will make future improvements easier due
fewer accesses to fence->ops.)

Final patch in the series is the one which makes xe compliant with the rules
and API proposed earlier in the series. It does so by ensuring there is at least
one RCU grace period between fences being signaled and driver allocated memory
accessible from xe fences getting freed. Which couples with the earlier (from
the series) refactors which added dma_fence_access_begin/end() protection to
the relevant call sites.

If this approach is acceptable the next steps will be to see if any other
drivers will need similar changes for which there are asserts in the new
dma-fence API which will help catch them.

And also to discuss whether we want to go a step futher and later move to SRCU,
so code would be protected against module unload as well.

v2:
 * Dropped module unload handling.
 * Proposing real API instead of hacks.

Tvrtko Ursulin (13):
  drm/i915: Use provided dma_fence_is_chain
  dma-fence: Change signature of __dma_fence_is_later
  dma-fence: Use a flag for 64-bit seqnos
  dma-fence: Move array and chain checks to flags
  dma-fence: Add helpers for accessing driver and timeline name
  dma-fence: Use driver and timeline name helpers internally
  sync_file: Use dma-fence driver and timeline name helpers
  drm/amdgpu: Use dma-fence driver and timeline name helpers
  drm/i915: Use dma-fence driver and timeline name helpers
  dma-fence: Add safe access helpers and document the rules
  sync_file: Protect access to driver and timeline name
  drm/i915: Protect access to driver and timeline name
  drm/xe: Make dma-fences compliant with the safe access rules

 drivers/dma-buf/dma-fence-array.c             |  2 +-
 drivers/dma-buf/dma-fence-chain.c             |  9 +-
 drivers/dma-buf/dma-fence.c                   | 87 ++++++++++++++++++-
 drivers/dma-buf/sw_sync.c                     |  2 +-
 drivers/dma-buf/sync_file.c                   | 14 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h     |  2 +-
 .../gpu/drm/amd/amdgpu/amdgpu_vm_tlb_fence.c  |  5 +-
 drivers/gpu/drm/i915/gem/i915_gem_wait.c      |  7 +-
 drivers/gpu/drm/i915/gt/intel_gt_requests.c   |  6 +-
 drivers/gpu/drm/i915/i915_request.c           |  5 +-
 drivers/gpu/drm/i915/i915_sw_fence.c          |  6 +-
 drivers/gpu/drm/xe/xe_guc_exec_queue_types.h  |  2 +
 drivers/gpu/drm/xe/xe_guc_submit.c            |  7 +-
 drivers/gpu/drm/xe/xe_hw_fence.c              |  5 +-
 drivers/gpu/drm/xe/xe_sched_job.c             | 14 +--
 include/linux/dma-fence.h                     | 48 +++++++---
 include/trace/events/dma_fence.h              |  4 +-
 17 files changed, 174 insertions(+), 51 deletions(-)

-- 
2.48.0


^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2025-05-14 15:38 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-05-09 15:33 [RFC v2 00/13] Some (drm_sched_|dma_)fence lifetime issues Tvrtko Ursulin
2025-05-09 15:33 ` [RFC v2 01/13] drm/i915: Use provided dma_fence_is_chain Tvrtko Ursulin
2025-05-09 15:47   ` Matthew Brost
2025-05-12  8:05     ` Christian König
2025-05-12  8:11       ` Tvrtko Ursulin
2025-05-09 15:33 ` [RFC v2 02/13] dma-fence: Change signature of __dma_fence_is_later Tvrtko Ursulin
2025-05-12  8:13   ` Christian König
2025-05-09 15:33 ` [RFC v2 03/13] dma-fence: Use a flag for 64-bit seqnos Tvrtko Ursulin
2025-05-12  8:12   ` Tvrtko Ursulin
2025-05-12  8:17   ` Christian König
2025-05-12  9:20     ` Tvrtko Ursulin
2025-05-09 15:33 ` [RFC v2 04/13] dma-fence: Move array and chain checks to flags Tvrtko Ursulin
2025-05-12  8:19   ` Christian König
2025-05-12  9:14     ` Tvrtko Ursulin
2025-05-12 17:57       ` Christian König
2025-05-09 15:33 ` [RFC v2 05/13] dma-fence: Add helpers for accessing driver and timeline name Tvrtko Ursulin
2025-05-12  8:20   ` Christian König
2025-05-09 15:33 ` [RFC v2 06/13] dma-fence: Use driver and timeline name helpers internally Tvrtko Ursulin
2025-05-12  8:22   ` Christian König
2025-05-12  9:05     ` Tvrtko Ursulin
2025-05-09 15:33 ` [RFC v2 07/13] sync_file: Use dma-fence driver and timeline name helpers Tvrtko Ursulin
2025-05-12  8:25   ` Christian König
2025-05-09 15:33 ` [RFC v2 08/13] drm/amdgpu: " Tvrtko Ursulin
2025-05-12  8:27   ` Christian König
2025-05-12  9:07     ` Tvrtko Ursulin
2025-05-09 15:33 ` [RFC v2 09/13] drm/i915: " Tvrtko Ursulin
2025-05-12  8:28   ` Christian König
2025-05-09 15:33 ` [RFC v2 10/13] dma-fence: Add safe access helpers and document the rules Tvrtko Ursulin
2025-05-13 14:16   ` Rob Clark
2025-05-14 10:01     ` Tvrtko Ursulin
2025-05-14 13:57       ` Rob Clark
2025-05-14 14:58         ` Tvrtko Ursulin
2025-05-14 15:38           ` Rob Clark
2025-05-09 15:33 ` [RFC v2 11/13] sync_file: Protect access to driver and timeline name Tvrtko Ursulin
2025-05-09 15:33 ` [RFC v2 12/13] drm/i915: " Tvrtko Ursulin
2025-05-09 15:33 ` [RFC v2 13/13] drm/xe: Make dma-fences compliant with the safe access rules Tvrtko Ursulin
2025-05-12 14:11   ` Matthew Brost

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox