Intel-XE Archive on lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH AUTOSEL 6.10 01/34] drm/xe/preempt_fence: enlarge the fence critical section
@ 2024-07-28 15:40 Sasha Levin
  2024-07-28 15:40 ` [PATCH AUTOSEL 6.10 07/34] drm/xe/xe_guc_submit: Fix exec queue stop race condition Sasha Levin
  0 siblings, 1 reply; 2+ messages in thread
From: Sasha Levin @ 2024-07-28 15:40 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Matthew Auld, Matthew Brost, Sasha Levin, lucas.demarchi,
	thomas.hellstrom, rodrigo.vivi, maarten.lankhorst, mripard,
	tzimmermann, airlied, daniel, intel-xe, dri-devel

From: Matthew Auld <matthew.auld@intel.com>

[ Upstream commit 3cd1585e57908b6efcd967465ef7685f40b2a294 ]

It is really easy to introduce subtle deadlocks in
preempt_fence_work_func() since we operate on single global ordered-wq
for signalling our preempt fences behind the scenes, so even though we
signal a particular fence, everything in the callback should be in the
fence critical section, since blocking in the callback will prevent
other published fences from signalling. If we enlarge the fence critical
section to cover the entire callback, then lockdep should be able to
understand this better, and complain if we grab a sensitive lock like
vm->lock, which is also held when waiting on preempt fences.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240418144630.299531-2-matthew.auld@intel.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/gpu/drm/xe/xe_preempt_fence.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_preempt_fence.c b/drivers/gpu/drm/xe/xe_preempt_fence.c
index 7d50c6e89d8e7..5b243b7feb59d 100644
--- a/drivers/gpu/drm/xe/xe_preempt_fence.c
+++ b/drivers/gpu/drm/xe/xe_preempt_fence.c
@@ -23,11 +23,19 @@ static void preempt_fence_work_func(struct work_struct *w)
 		q->ops->suspend_wait(q);
 
 	dma_fence_signal(&pfence->base);
-	dma_fence_end_signalling(cookie);
-
+	/*
+	 * Opt for keep everything in the fence critical section. This looks really strange since we
+	 * have just signalled the fence, however the preempt fences are all signalled via single
+	 * global ordered-wq, therefore anything that happens in this callback can easily block
+	 * progress on the entire wq, which itself may prevent other published preempt fences from
+	 * ever signalling.  Therefore try to keep everything here in the callback in the fence
+	 * critical section. For example if something below grabs a scary lock like vm->lock,
+	 * lockdep should complain since we also hold that lock whilst waiting on preempt fences to
+	 * complete.
+	 */
 	xe_vm_queue_rebind_worker(q->vm);
-
 	xe_exec_queue_put(q);
+	dma_fence_end_signalling(cookie);
 }
 
 static const char *
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 2+ messages in thread

* [PATCH AUTOSEL 6.10 07/34] drm/xe/xe_guc_submit: Fix exec queue stop race condition
  2024-07-28 15:40 [PATCH AUTOSEL 6.10 01/34] drm/xe/preempt_fence: enlarge the fence critical section Sasha Levin
@ 2024-07-28 15:40 ` Sasha Levin
  0 siblings, 0 replies; 2+ messages in thread
From: Sasha Levin @ 2024-07-28 15:40 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Jonathan Cavitt, Matthew Brost, Stuart Summers, Sasha Levin,
	lucas.demarchi, thomas.hellstrom, rodrigo.vivi, maarten.lankhorst,
	mripard, tzimmermann, airlied, daniel, intel-xe, dri-devel

From: Jonathan Cavitt <jonathan.cavitt@intel.com>

[ Upstream commit 1564d411e17f51e2f64655b4e4da015be1ba7eaa ]

Reorder the xe_sched_tdr_queue_imm and set_exec_queue_banned calls in
guc_exec_queue_stop.  This prevents a possible race condition between
the two events in which it's possible for xe_sched_tdr_queue_imm to
wake the ufence waiter before the exec queue is banned, causing the
ufence waiter to miss the banned state.

Suggested-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240510194540.3246991-1-jonathan.cavitt@intel.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/gpu/drm/xe/xe_guc_submit.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
index e4e3658e6a138..0f42971ff0a83 100644
--- a/drivers/gpu/drm/xe/xe_guc_submit.c
+++ b/drivers/gpu/drm/xe/xe_guc_submit.c
@@ -1429,8 +1429,8 @@ static void guc_exec_queue_stop(struct xe_guc *guc, struct xe_exec_queue *q)
 			    !xe_sched_job_completed(job)) ||
 			    xe_sched_invalidate_job(job, 2)) {
 				trace_xe_sched_job_ban(job);
-				xe_sched_tdr_queue_imm(&q->guc->sched);
 				set_exec_queue_banned(q);
+				xe_sched_tdr_queue_imm(&q->guc->sched);
 			}
 		}
 	}
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2024-07-28 15:43 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-07-28 15:40 [PATCH AUTOSEL 6.10 01/34] drm/xe/preempt_fence: enlarge the fence critical section Sasha Levin
2024-07-28 15:40 ` [PATCH AUTOSEL 6.10 07/34] drm/xe/xe_guc_submit: Fix exec queue stop race condition Sasha Levin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox