On 10/22/2024 1:02 AM, Matthew Brost wrote:
On Mon, Oct 21, 2024 at 11:11:46PM +0200, Nirmoy Das wrote:
In case of parallel submissions, multiple GuC IDs point to the same
exec queue, so on GT reset such exec queues get restarted multiple
times, which is not desirable.

Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2295
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Tejas Upadhyay <tejas.upadhyay@intel.com>
Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
---
 drivers/gpu/drm/xe/xe_guc_submit.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
index 0b81972ff651..6aeb007eaf06 100644
--- a/drivers/gpu/drm/xe/xe_guc_submit.c
+++ b/drivers/gpu/drm/xe/xe_guc_submit.c
@@ -1784,8 +1784,13 @@ int xe_guc_submit_start(struct xe_guc *guc)
 
 	mutex_lock(&guc->submission_state.lock);
 	atomic_dec(&guc->submission_state.stopped);
-	xa_for_each(&guc->submission_state.exec_queue_lookup, index, q)
+	xa_for_each(&guc->submission_state.exec_queue_lookup, index, q) {
+		/* Skip restarting parallel queues */
+		if (exec_queue_enabled(q) && xe_exec_queue_is_parallel(q))
+			continue;
This doesn't look right as exec_queue_enabled can race here...

Ah right, I just realized this happens asynchronously with run_job.

I think this should be...

if (q->guc->id != index)
	continue;


This looks much better. I will try it out and resend.

This way we only call guc_exec_queue_start once per parallel exec
queue. Also, I think we need to add the same check to xe_guc_submit_stop.
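
For illustration only (this is not driver code): a minimal user-space sketch of why comparing the queue's own id against the iteration index restarts each queue once. It assumes a parallel queue is stored in the lookup under `width` consecutive ids starting at its own id. The `toy_*` names, the fixed-size array standing in for the xarray, and the `starts` counter are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_IDS 8

/* Hypothetical stand-in for an exec queue; not the driver's struct. */
struct toy_queue {
	unsigned int first_id;	/* plays the role of q->guc->id */
	unsigned int width;	/* number of consecutive ids the queue occupies (assumed) */
	unsigned int starts;	/* how many times the queue was restarted */
};

/* Hypothetical stand-in for the exec_queue_lookup xarray: id -> queue. */
static struct toy_queue *toy_lookup[TOY_MAX_IDS];

/* Store the same queue pointer under every id it occupies. */
static void toy_register(struct toy_queue *q)
{
	assert(q->first_id + q->width <= TOY_MAX_IDS);
	for (unsigned int i = 0; i < q->width; i++)
		toy_lookup[q->first_id + i] = q;
}

/*
 * Walk every populated index, as xa_for_each() would. With skip_aliases
 * set, only the entry whose index equals the queue's own id triggers a
 * restart, so each queue is restarted exactly once.
 */
static void toy_submit_start(bool skip_aliases)
{
	for (unsigned int index = 0; index < TOY_MAX_IDS; index++) {
		struct toy_queue *q = toy_lookup[index];

		if (!q)
			continue;
		if (skip_aliases && q->first_id != index)
			continue;
		q->starts++;	/* stands in for guc_exec_queue_start(q) */
	}
}
```

Registering a queue of width 3 and calling `toy_submit_start(false)` bumps its counter once per id, whereas `toy_submit_start(true)` bumps it once. Unlike a check on queue state, the comparison depends only on the id and the index, which is the property the suggestion relies on.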


I will resend with an updated xe_guc_submit_stop().

thanks,

Nirmoy

Matt

+
 		guc_exec_queue_start(q);
+	}
 	mutex_unlock(&guc->submission_state.lock);
 
 	wake_up_all(&guc->ct.wq);
-- 
2.46.0