Intel-GFX Archive on lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] drm/i915/gt: Fix potential UAF by revoke of fence registers
@ 2024-06-03 19:54 Janusz Krzysztofik
  2024-06-03 20:42 ` ✗ Fi.CI.CHECKPATCH: warning for " Patchwork
                   ` (15 more replies)
  0 siblings, 16 replies; 20+ messages in thread
From: Janusz Krzysztofik @ 2024-06-03 19:54 UTC (permalink / raw)
  To: intel-gfx
  Cc: dri-devel, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi,
	Tvrtko Ursulin, Chris Wilson, Matthew Auld, Andi Shyti,
	Nirmoy Das, Jonathan Cavitt, Janusz Krzysztofik

CI has been sporadically reporting the following issue triggered by
igt@i915_selftest@live@hangcheck on ADL-P and similar machines:

<6> [414.049203] i915: Running intel_hangcheck_live_selftests/igt_reset_evict_fence
...
<6> [414.068804] i915 0000:00:02.0: [drm] GT0: GUC: submission enabled
<6> [414.068812] i915 0000:00:02.0: [drm] GT0: GUC: SLPC enabled
<3> [414.070354] Unable to pin Y-tiled fence; err:-4
<3> [414.071282] i915_vma_revoke_fence:301 GEM_BUG_ON(!i915_active_is_idle(&fence->active))
...
<4>[  609.603992] ------------[ cut here ]------------
<2>[  609.603995] kernel BUG at drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c:301!
<4>[  609.604003] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
<4>[  609.604006] CPU: 0 PID: 268 Comm: kworker/u64:3 Tainted: G     U  W          6.9.0-CI_DRM_14785-g1ba62f8cea9c+ #1
<4>[  609.604008] Hardware name: Intel Corporation Alder Lake Client Platform/AlderLake-P DDR4 RVP, BIOS RPLPFWI1.R00.4035.A00.2301200723 01/20/2023
<4>[  609.604010] Workqueue: i915 __i915_gem_free_work [i915]
<4>[  609.604149] RIP: 0010:i915_vma_revoke_fence+0x187/0x1f0 [i915]
...
<4>[  609.604271] Call Trace:
<4>[  609.604273]  <TASK>
...
<4>[  609.604716]  __i915_vma_evict+0x2e9/0x550 [i915]
<4>[  609.604852]  __i915_vma_unbind+0x7c/0x160 [i915]
<4>[  609.604977]  force_unbind+0x24/0xa0 [i915]
<4>[  609.605098]  i915_vma_destroy+0x2f/0xa0 [i915]
<4>[  609.605210]  __i915_gem_object_pages_fini+0x51/0x2f0 [i915]
<4>[  609.605330]  __i915_gem_free_objects.isra.0+0x6a/0xc0 [i915]
<4>[  609.605440]  process_scheduled_works+0x351/0x690
...

In the past, there were similar failures reported by CI from other IGT
tests, observed on other platforms.

Before commit 63baf4f3d587 ("drm/i915/gt: Only wait for GPU activity
before unbinding a GGTT fence"), i915_vma_revoke_fence() was waiting for
idleness of vma->active via fence_update().   That commit introduced
vma->fence->active in order for the fence_update() to be able to wait
selectively on that one instead of vma->active since only idleness of
fence registers was needed.  But then, another commit 0d86ee35097a
("drm/i915/gt: Make fence revocation unequivocal") replaced the call to
fence_update() in i915_vma_revoke_fence() with only fence_write(), and
also added that GEM_BUG_ON(!i915_active_is_idle(&fence->active)) in front.
No justification was provided on why we might then expect idleness of
vma->fence->active without first waiting on it.

The issue can be potentially caused by a race among revocation of fence
registers on one side and sequential execution of signal callbacks invoked
on completion of a request that was using them on the other, still
processed in parallel to revocation of those fence registers.  Fix it by
waiting for idleness of vma->fence->active in i915_vma_revoke_fence().

Fixes: 0d86ee35097a ("drm/i915/gt: Make fence revocation unequivocal")
Closes: https://gitlab.freedesktop.org/drm/intel/issues/10021
Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Cc: stable@vger.kernel.org # v5.8+
---
 drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c b/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c
index 40371b8a9bbbd..93bc1cc1ee7e6 100644
--- a/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c
+++ b/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c
@@ -298,6 +298,7 @@ void i915_vma_revoke_fence(struct i915_vma *vma)
 		return;
 
 	GEM_BUG_ON(fence->vma != vma);
+	i915_active_wait(&fence->active);
 	GEM_BUG_ON(!i915_active_is_idle(&fence->active));
 	GEM_BUG_ON(atomic_read(&fence->pin_count));
 
-- 
2.45.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2024-06-19 10:55 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-06-03 19:54 [PATCH] drm/i915/gt: Fix potential UAF by revoke of fence registers Janusz Krzysztofik
2024-06-03 20:42 ` ✗ Fi.CI.CHECKPATCH: warning for " Patchwork
2024-06-03 20:50 ` ✓ Fi.CI.BAT: success " Patchwork
2024-06-04  0:48 ` [PATCH] " Andi Shyti
2024-06-04 15:27   ` Janusz Krzysztofik
2024-06-06  2:33     ` Andi Shyti
2024-06-04 10:01 ` ✗ Fi.CI.IGT: failure for " Patchwork
2024-06-04 14:44   ` Janusz Krzysztofik
2024-06-06  5:07 ` ✗ Fi.CI.CHECKPATCH: warning for drm/i915/gt: Fix potential UAF by revoke of fence registers (rev2) Patchwork
2024-06-06  5:14 ` ✓ Fi.CI.BAT: success " Patchwork
2024-06-06  9:16 ` ✗ Fi.CI.IGT: failure " Patchwork
2024-06-06 16:05 ` ✗ Fi.CI.CHECKPATCH: warning for drm/i915/gt: Fix potential UAF by revoke of fence registers (rev3) Patchwork
2024-06-06 16:15 ` ✓ Fi.CI.BAT: success " Patchwork
2024-06-07  4:44 ` ✗ Fi.CI.IGT: failure " Patchwork
2024-06-17 15:22 ` ✗ Fi.CI.CHECKPATCH: warning for drm/i915/gt: Fix potential UAF by revoke of fence registers (rev4) Patchwork
2024-06-17 15:30 ` ✗ Fi.CI.BAT: failure " Patchwork
2024-06-17 19:01 ` ✗ Fi.CI.CHECKPATCH: warning for drm/i915/gt: Fix potential UAF by revoke of fence registers (rev5) Patchwork
2024-06-17 19:10 ` ✓ Fi.CI.BAT: success " Patchwork
2024-06-18  8:34 ` ✗ Fi.CI.IGT: failure " Patchwork
2024-06-19 10:54 ` [PATCH] drm/i915/gt: Fix potential UAF by revoke of fence registers Andi Shyti

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox