Intel-XE Archive on lore.kernel.org
* [Intel-xe] [PATCH v2 1/2] drm/xe/guc_submit: prevent repeated unregister
@ 2023-08-03 17:38 Matthew Auld
  2023-08-03 17:38 ` [Intel-xe] [PATCH v2 2/2] drm/xe/guc_submit: fixup deregister in job timeout Matthew Auld
                   ` (7 more replies)
  0 siblings, 8 replies; 14+ messages in thread
From: Matthew Auld @ 2023-08-03 17:38 UTC (permalink / raw)
  To: intel-xe

It seems that various things can trigger the lr cleanup worker,
including a CAT error, an engine reset and destroying the actual engine,
so it seems plausible to end up triggering the worker more than once in
some cases. If that does happen we can race with an ongoing engine
deregister before it has completed, triggering it again and also
flipping the state back to pending_disable. Checking whether the engine
has already been marked as destroyed should prevent this.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/xe/xe_guc_submit.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
index 193362518a62..b88bfe7d8470 100644
--- a/drivers/gpu/drm/xe/xe_guc_submit.c
+++ b/drivers/gpu/drm/xe/xe_guc_submit.c
@@ -802,8 +802,18 @@ static void xe_guc_exec_queue_lr_cleanup(struct work_struct *w)
 	/* Kill the run_job / process_msg entry points */
 	drm_sched_run_wq_stop(sched);
 
-	/* Engine state now stable, disable scheduling / deregister if needed */
-	if (exec_queue_registered(q)) {
+	/*
+	 * Engine state now mostly stable, disable scheduling / deregister if
+	 * needed. This cleanup routine might be called multiple times, where
+	 * the actual async engine deregister drops the final engine ref.
+	 * Calling disable_scheduling_deregister will mark the engine as
+	 * destroyed and fire off the CT requests to disable scheduling /
+	 * deregister, which we only want to do once. We also don't want to mark
+	 * the engine as pending_disable again as this may race with the
+	 * xe_guc_deregister_done_handler() which treats it as an unexpected
+	 * state.
+	 */
+	if (exec_queue_registered(q) && !exec_queue_destroyed(q)) {
 		struct xe_guc *guc = exec_queue_to_guc(q);
 		int ret;
 
-- 
2.41.0



end of thread, other threads:[~2023-08-07 22:19 UTC | newest]

Thread overview: 14+ messages
2023-08-03 17:38 [Intel-xe] [PATCH v2 1/2] drm/xe/guc_submit: prevent repeated unregister Matthew Auld
2023-08-03 17:38 ` [Intel-xe] [PATCH v2 2/2] drm/xe/guc_submit: fixup deregister in job timeout Matthew Auld
2023-08-03 18:32   ` Matthew Brost
2023-08-04  8:48     ` Matthew Auld
2023-08-04 13:37       ` Matthew Brost
2023-08-04 15:03         ` Matthew Auld
2023-08-07 22:18           ` Matthew Brost
2023-08-03 17:41 ` [Intel-xe] ✓ CI.Patch_applied: success for series starting with [v2,1/2] drm/xe/guc_submit: prevent repeated unregister Patchwork
2023-08-03 17:41 ` [Intel-xe] ✗ CI.checkpatch: warning " Patchwork
2023-08-03 17:42 ` [Intel-xe] ✓ CI.KUnit: success " Patchwork
2023-08-03 17:46 ` [Intel-xe] ✓ CI.Build: " Patchwork
2023-08-03 17:46 ` [Intel-xe] ✓ CI.Hooks: " Patchwork
2023-08-03 17:47 ` [Intel-xe] ✗ CI.checksparse: warning " Patchwork
2023-08-04  8:31 ` [Intel-xe] ○ CI.BAT: info " Patchwork
