* [PATCH 00/16] drm/xe: Multi Queue feature support
From: Niranjana Vishwanathapura @ 2025-10-31 18:29 UTC
To: intel-xe
Multi Queue is a new mode of execution supported by the compute and
blitter copy command streamers (CCS and BCS, respectively). It is an
enhancement of the existing hardware architecture and leverages the
same submission model. It enables support for efficient, parallel
execution of multiple queues within a single context.
Add support for the multi queue feature and enable it on xe3p_xpc.
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Niranjana Vishwanathapura (16):
drm/xe/multi_queue: Add multi_queue_enable_mask to gt information
drm/xe/multi_queue: Add user interface for multi queue support
drm/xe/multi_queue: Add GuC interface for multi queue support
drm/xe/multi_queue: Add multi queue priority property
drm/xe/multi_queue: Handle invalid exec queue property setting
drm/xe/multi_queue: Add exec_queue set_property ioctl support
drm/xe/multi_queue: Add support for multi queue dynamic priority
change
drm/xe/multi_queue: Add multi queue information to guc_info dump
drm/xe/multi_queue: Handle tearing down of a multi queue
drm/xe/multi_queue: Set QUEUE_DRAIN_MODE for Multi Queue batches
drm/xe/multi_queue: Handle CGP context error
drm/xe/multi_queue: Tracepoint support
drm/xe/multi_queue: Support active group after primary is destroyed
drm/xe/doc: Add documentation for Multi Queue Group
drm/xe/doc: Add documentation for Multi Queue Group GuC interface
drm/xe/multi_queue: Enable multi_queue on xe3p_xpc
Documentation/gpu/xe/xe_exec_queue.rst | 14 +
drivers/gpu/drm/xe/abi/guc_actions_abi.h | 4 +
.../gpu/drm/xe/instructions/xe_gpu_commands.h | 1 +
drivers/gpu/drm/xe/xe_debugfs.c | 2 +
drivers/gpu/drm/xe/xe_device.c | 9 +-
drivers/gpu/drm/xe/xe_exec_queue.c | 414 +++++++++++-
drivers/gpu/drm/xe/xe_exec_queue.h | 51 ++
drivers/gpu/drm/xe/xe_exec_queue_types.h | 51 ++
drivers/gpu/drm/xe/xe_gt_types.h | 5 +
drivers/gpu/drm/xe/xe_guc_ct.c | 8 +
drivers/gpu/drm/xe/xe_guc_fwif.h | 3 +
drivers/gpu/drm/xe/xe_guc_submit.c | 612 ++++++++++++++++--
drivers/gpu/drm/xe/xe_guc_submit.h | 3 +
drivers/gpu/drm/xe/xe_guc_submit_types.h | 13 +
drivers/gpu/drm/xe/xe_lrc.c | 32 +
drivers/gpu/drm/xe/xe_lrc.h | 5 +
drivers/gpu/drm/xe/xe_pci.c | 2 +
drivers/gpu/drm/xe/xe_pci_types.h | 1 +
drivers/gpu/drm/xe/xe_ring_ops.c | 68 +-
drivers/gpu/drm/xe/xe_trace.h | 46 ++
include/uapi/drm/xe_drm.h | 40 ++
21 files changed, 1276 insertions(+), 108 deletions(-)
--
2.43.0
* [PATCH 01/16] drm/xe/multi_queue: Add multi_queue_enable_mask to gt information
From: Niranjana Vishwanathapura @ 2025-10-31 18:29 UTC
To: intel-xe
Add a multi_queue_enable_mask field to the gt information structure.
It is a bitmask of all engine classes on which multi queue support is
enabled.
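For example, a platform descriptor could enable the feature on the
compute and copy engine classes as below (an illustrative sketch, not
part of this patch; the actual enabling for xe3p_xpc happens in patch
16 of this series):

	.multi_queue_enable_mask = BIT(XE_ENGINE_CLASS_COMPUTE) |
				   BIT(XE_ENGINE_CLASS_COPY),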
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
---
drivers/gpu/drm/xe/xe_debugfs.c | 2 ++
drivers/gpu/drm/xe/xe_gt_types.h | 5 +++++
drivers/gpu/drm/xe/xe_pci.c | 1 +
drivers/gpu/drm/xe/xe_pci_types.h | 1 +
4 files changed, 9 insertions(+)
diff --git a/drivers/gpu/drm/xe/xe_debugfs.c b/drivers/gpu/drm/xe/xe_debugfs.c
index e91da9589c5f..34460d7ef71c 100644
--- a/drivers/gpu/drm/xe/xe_debugfs.c
+++ b/drivers/gpu/drm/xe/xe_debugfs.c
@@ -93,6 +93,8 @@ static int info(struct seq_file *m, void *data)
xe_force_wake_ref(gt_to_fw(gt), XE_FW_GT));
drm_printf(&p, "gt%d engine_mask 0x%llx\n", id,
gt->info.engine_mask);
+ drm_printf(&p, "gt%d multi_queue_enable_mask 0x%x\n", id,
+ gt->info.multi_queue_enable_mask);
}
xe_pm_runtime_put(xe);
diff --git a/drivers/gpu/drm/xe/xe_gt_types.h b/drivers/gpu/drm/xe/xe_gt_types.h
index 0b525643a048..4a18bf772b22 100644
--- a/drivers/gpu/drm/xe/xe_gt_types.h
+++ b/drivers/gpu/drm/xe/xe_gt_types.h
@@ -140,6 +140,11 @@ struct xe_gt {
u64 engine_mask;
/** @info.gmdid: raw GMD_ID value from hardware */
u32 gmdid;
+ /**
+ * @info.multi_queue_enable_mask: Bitmask of engine classes with
+ * multi queue support enabled.
+ */
+ u16 multi_queue_enable_mask;
/** @info.id: Unique ID of this GT within the PCI Device */
u8 id;
/** @info.has_indirect_ring_state: GT has indirect ring state support */
diff --git a/drivers/gpu/drm/xe/xe_pci.c b/drivers/gpu/drm/xe/xe_pci.c
index 6e59642e7820..b5eaf0fc105c 100644
--- a/drivers/gpu/drm/xe/xe_pci.c
+++ b/drivers/gpu/drm/xe/xe_pci.c
@@ -754,6 +754,7 @@ static struct xe_gt *alloc_primary_gt(struct xe_tile *tile,
gt->info.type = XE_GT_TYPE_MAIN;
gt->info.id = tile->id * xe->info.max_gt_per_tile;
gt->info.has_indirect_ring_state = graphics_desc->has_indirect_ring_state;
+ gt->info.multi_queue_enable_mask = graphics_desc->multi_queue_enable_mask;
gt->info.engine_mask = graphics_desc->hw_engine_mask;
/*
diff --git a/drivers/gpu/drm/xe/xe_pci_types.h b/drivers/gpu/drm/xe/xe_pci_types.h
index 9892c063a9c5..77e09a53da64 100644
--- a/drivers/gpu/drm/xe/xe_pci_types.h
+++ b/drivers/gpu/drm/xe/xe_pci_types.h
@@ -58,6 +58,7 @@ struct xe_device_desc {
struct xe_graphics_desc {
u64 hw_engine_mask; /* hardware engines provided by graphics IP */
+ u16 multi_queue_enable_mask; /* bitmask of engine classes which support multi queue */
u8 has_asid:1;
u8 has_atomic_enable_pte_bit:1;
--
2.43.0
* [PATCH 02/16] drm/xe/multi_queue: Add user interface for multi queue support
From: Niranjana Vishwanathapura @ 2025-10-31 18:29 UTC
To: intel-xe
Multi Queue is a new mode of execution supported by the compute and
blitter copy command streamers (CCS and BCS, respectively). It is an
enhancement of the existing hardware architecture and leverages the
same submission model. It enables support for efficient, parallel
execution of multiple queues within a single context. All the queues
of a group must use the same address space (VM).
The new DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_GROUP exec queue
property supports creating a multi queue group and adding queues to
a queue group. All queues of a multi queue group share the same
context.
An exec queue create ioctl call with the above property set to
DRM_XE_MULTI_GROUP_CREATE creates a new multi queue group, with the
queue being created becoming the primary queue (aka q0) of the group.
To add secondary queues to the group, create them with the above
property set to the id of the primary queue. The properties of the
primary queue (like priority and timeslice) apply to the whole group,
so these properties can't be set for secondary queues of a group.
Once destroyed, a secondary queue of a multi queue group can't be
replaced. However, secondary queues can be dynamically added to the
group, up to a total of 64 queues per group, until the primary queue
is destroyed; after that, no more secondary queues can be added.
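A minimal userspace sketch of the flow described above (illustrative
only; error handling is omitted, and setup of fd, vm_id and the
engine instance is assumed):

	struct drm_xe_ext_set_property ext = {
		.base.name = DRM_XE_EXEC_QUEUE_EXTENSION_SET_PROPERTY,
		.property = DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_GROUP,
		.value = DRM_XE_MULTI_GROUP_CREATE,
	};
	struct drm_xe_exec_queue_create create = {
		.extensions = (uintptr_t)&ext,
		.width = 1,
		.num_placements = 1,
		.vm_id = vm_id,
		.instances = (uintptr_t)&instance,
	};

	/* Create the primary queue (q0); this creates the group */
	ioctl(fd, DRM_IOCTL_XE_EXEC_QUEUE_CREATE, &create);

	/* Add a secondary queue: pass the primary's id as the value */
	ext.value = create.exec_queue_id;
	ioctl(fd, DRM_IOCTL_XE_EXEC_QUEUE_CREATE, &create);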
Signed-off-by: Stuart Summers <stuart.summers@intel.com>
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
---
drivers/gpu/drm/xe/xe_exec_queue.c | 191 ++++++++++++++++++++++-
drivers/gpu/drm/xe/xe_exec_queue.h | 47 ++++++
drivers/gpu/drm/xe/xe_exec_queue_types.h | 30 ++++
include/uapi/drm/xe_drm.h | 8 +
4 files changed, 274 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
index 1b57d7c2cc94..86404a7c9fe4 100644
--- a/drivers/gpu/drm/xe/xe_exec_queue.c
+++ b/drivers/gpu/drm/xe/xe_exec_queue.c
@@ -12,6 +12,7 @@
#include <drm/drm_file.h>
#include <uapi/drm/xe_drm.h>
+#include "xe_bo.h"
#include "xe_dep_scheduler.h"
#include "xe_device.h"
#include "xe_gt.h"
@@ -62,6 +63,32 @@ enum xe_exec_queue_sched_prop {
static int exec_queue_user_extensions(struct xe_device *xe, struct xe_exec_queue *q,
u64 extensions, int ext_number);
+static void xe_exec_queue_group_cleanup(struct xe_exec_queue *q)
+{
+ struct xe_exec_queue_group *group = q->multi_queue.group;
+ struct xe_lrc *lrc;
+ unsigned long idx;
+
+ if (xe_exec_queue_is_multi_queue_secondary(q)) {
+ xe_exec_queue_put(xe_exec_queue_multi_queue_primary(q));
+ return;
+ }
+
+ if (!group)
+ return;
+
+ /* Primary queue cleanup */
+ mutex_lock(&group->lock);
+ xa_for_each(&group->xa, idx, lrc)
+ xe_lrc_put(lrc);
+ mutex_unlock(&group->lock);
+
+ xa_destroy(&group->xa);
+ mutex_destroy(&group->lock);
+ xe_bo_unpin_map_no_vm(group->cgp_bo);
+ kfree(group);
+}
+
static void __xe_exec_queue_free(struct xe_exec_queue *q)
{
int i;
@@ -72,6 +99,10 @@ static void __xe_exec_queue_free(struct xe_exec_queue *q)
if (xe_exec_queue_uses_pxp(q))
xe_pxp_exec_queue_remove(gt_to_xe(q->gt)->pxp, q);
+
+ if (xe_exec_queue_is_multi_queue(q))
+ xe_exec_queue_group_cleanup(q);
+
if (q->vm)
xe_vm_put(q->vm);
@@ -549,6 +580,148 @@ exec_queue_set_pxp_type(struct xe_device *xe, struct xe_exec_queue *q, u64 value
return xe_pxp_exec_queue_set_type(xe->pxp, q, DRM_XE_PXP_TYPE_HWDRM);
}
+static int xe_exec_queue_group_init(struct xe_device *xe, struct xe_exec_queue *q)
+{
+ struct xe_tile *tile = gt_to_tile(q->gt);
+ struct xe_exec_queue_group *group;
+ struct xe_bo *bo;
+
+ group = kzalloc(sizeof(*group), GFP_KERNEL);
+ if (!group)
+ return -ENOMEM;
+
+ bo = xe_bo_create_pin_map_novm(xe, tile, SZ_4K, ttm_bo_type_kernel,
+ XE_BO_FLAG_VRAM_IF_DGFX(tile) |
+ XE_BO_FLAG_GGTT, false);
+ if (IS_ERR(bo)) {
+ drm_err(&xe->drm, "CGP bo allocation for queue group failed: %ld\n",
+ PTR_ERR(bo));
+ kfree(group);
+ return PTR_ERR(bo);
+ }
+
+ xe_map_memset(xe, &bo->vmap, 0, 0, SZ_4K);
+
+ group->primary = q;
+ group->cgp_bo = bo;
+ xa_init_flags(&group->xa, XA_FLAGS_ALLOC1);
+ mutex_init(&group->lock);
+ mutex_init(&group->list_lock);
+ q->multi_queue.group = group;
+
+ return 0;
+}
+
+static inline bool xe_exec_queue_supports_multi_queue(struct xe_exec_queue *q)
+{
+ return q->gt->info.multi_queue_enable_mask & BIT(q->class);
+}
+
+static int xe_exec_queue_group_validate(struct xe_device *xe, struct xe_exec_queue *q,
+ u32 primary_id)
+{
+ struct xe_exec_queue_group *group;
+ struct xe_exec_queue *primary;
+ int ret;
+
+ primary = xe_exec_queue_lookup(q->vm->xef, primary_id);
+ if (XE_IOCTL_DBG(xe, !primary))
+ return -ENOENT;
+
+ if (XE_IOCTL_DBG(xe, !xe_exec_queue_is_multi_queue_primary(primary)) ||
+ XE_IOCTL_DBG(xe, q->vm != primary->vm) ||
+ XE_IOCTL_DBG(xe, q->logical_mask != primary->logical_mask)) {
+ ret = -EINVAL;
+ goto put_primary;
+ }
+
+ group = primary->multi_queue.group;
+ q->multi_queue.valid = true;
+ q->multi_queue.group = group;
+
+ return 0;
+put_primary:
+ xe_exec_queue_put(primary);
+ return ret;
+}
+
+#define XE_MAX_GROUP_SIZE 64
+static int xe_exec_queue_group_add(struct xe_device *xe, struct xe_exec_queue *q)
+{
+ struct xe_exec_queue_group *group = q->multi_queue.group;
+ u32 pos;
+ int err;
+
+ if (!xe_exec_queue_is_multi_queue_secondary(q))
+ return 0;
+
+ mutex_lock(&group->lock);
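+ /* Slot 0 is reserved for the primary (q0); secondaries get slots 1..63 */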
+ err = xa_alloc(&group->xa, &pos, xe_lrc_get(q->lrc[0]),
+ XA_LIMIT(1, XE_MAX_GROUP_SIZE - 1), GFP_KERNEL);
+ if (XE_IOCTL_DBG(xe, err)) {
+ xe_lrc_put(q->lrc[0]);
+ mutex_unlock(&group->lock);
+
+ /* It is invalid if queue group limit is exceeded */
+ if (err == -EBUSY)
+ err = -EINVAL;
+
+ return err;
+ }
+
+ q->multi_queue.pos = pos;
+ mutex_unlock(&group->lock);
+
+ return 0;
+}
+
+static void xe_exec_queue_group_delete(struct xe_exec_queue *q)
+{
+ struct xe_exec_queue_group *group = q->multi_queue.group;
+ struct xe_lrc *lrc;
+
+ if (!xe_exec_queue_is_multi_queue_secondary(q))
+ return;
+
+ mutex_lock(&group->lock);
+ lrc = xa_erase(&group->xa, q->multi_queue.pos);
+ if (lrc)
+ xe_lrc_put(lrc);
+ mutex_unlock(&group->lock);
+}
+
+static int exec_queue_set_multi_group(struct xe_device *xe, struct xe_exec_queue *q,
+ u64 value)
+{
+ if (XE_IOCTL_DBG(xe, !xe_exec_queue_supports_multi_queue(q)))
+ return -ENODEV;
+
+ if (XE_IOCTL_DBG(xe, !xe_device_uc_enabled(xe)))
+ return -EOPNOTSUPP;
+
+ if (XE_IOCTL_DBG(xe, xe_exec_queue_is_parallel(q)))
+ return -EINVAL;
+
+ if (XE_IOCTL_DBG(xe, xe_exec_queue_is_multi_queue(q)))
+ return -EINVAL;
+
+ if (value & DRM_XE_MULTI_GROUP_CREATE) {
+ if (XE_IOCTL_DBG(xe, value & ~DRM_XE_MULTI_GROUP_CREATE))
+ return -EINVAL;
+
+ q->multi_queue.valid = true;
+ q->multi_queue.is_primary = true;
+ q->multi_queue.pos = 0;
+ return 0;
+ }
+
+ /* While adding secondary queues, the upper 32 bits must be 0 */
+ if (XE_IOCTL_DBG(xe, value & (~0ull << 32)))
+ return -EINVAL;
+
+ return xe_exec_queue_group_validate(xe, q, value);
+}
+
typedef int (*xe_exec_queue_set_property_fn)(struct xe_device *xe,
struct xe_exec_queue *q,
u64 value);
@@ -557,6 +730,7 @@ static const xe_exec_queue_set_property_fn exec_queue_set_property_funcs[] = {
[DRM_XE_EXEC_QUEUE_SET_PROPERTY_PRIORITY] = exec_queue_set_priority,
[DRM_XE_EXEC_QUEUE_SET_PROPERTY_TIMESLICE] = exec_queue_set_timeslice,
[DRM_XE_EXEC_QUEUE_SET_PROPERTY_PXP_TYPE] = exec_queue_set_pxp_type,
+ [DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_GROUP] = exec_queue_set_multi_group,
};
static int exec_queue_user_ext_set_property(struct xe_device *xe,
@@ -577,7 +751,8 @@ static int exec_queue_user_ext_set_property(struct xe_device *xe,
XE_IOCTL_DBG(xe, ext.pad) ||
XE_IOCTL_DBG(xe, ext.property != DRM_XE_EXEC_QUEUE_SET_PROPERTY_PRIORITY &&
ext.property != DRM_XE_EXEC_QUEUE_SET_PROPERTY_TIMESLICE &&
- ext.property != DRM_XE_EXEC_QUEUE_SET_PROPERTY_PXP_TYPE))
+ ext.property != DRM_XE_EXEC_QUEUE_SET_PROPERTY_PXP_TYPE &&
+ ext.property != DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_GROUP))
return -EINVAL;
idx = array_index_nospec(ext.property, ARRAY_SIZE(exec_queue_set_property_funcs));
@@ -626,6 +801,12 @@ static int exec_queue_user_extensions(struct xe_device *xe, struct xe_exec_queue
return exec_queue_user_extensions(xe, q, ext.next_extension,
++ext_number);
+ if (xe_exec_queue_is_multi_queue_primary(q)) {
+ err = xe_exec_queue_group_init(xe, q);
+ if (XE_IOCTL_DBG(xe, err))
+ return err;
+ }
+
return 0;
}
@@ -780,12 +961,16 @@ int xe_exec_queue_create_ioctl(struct drm_device *dev, void *data,
if (IS_ERR(q))
return PTR_ERR(q);
+ err = xe_exec_queue_group_add(xe, q);
+ if (XE_IOCTL_DBG(xe, err))
+ goto put_exec_queue;
+
if (xe_vm_in_preempt_fence_mode(vm)) {
q->lr.context = dma_fence_context_alloc(1);
err = xe_vm_add_compute_exec_queue(vm, q);
if (XE_IOCTL_DBG(xe, err))
- goto put_exec_queue;
+ goto delete_queue_group;
}
if (q->vm && q->hwe->hw_engine_group) {
@@ -808,6 +993,8 @@ int xe_exec_queue_create_ioctl(struct drm_device *dev, void *data,
kill_exec_queue:
xe_exec_queue_kill(q);
+delete_queue_group:
+ xe_exec_queue_group_delete(q);
put_exec_queue:
xe_exec_queue_put(q);
return err;
diff --git a/drivers/gpu/drm/xe/xe_exec_queue.h b/drivers/gpu/drm/xe/xe_exec_queue.h
index a4dfbe858bda..8cd6487018fa 100644
--- a/drivers/gpu/drm/xe/xe_exec_queue.h
+++ b/drivers/gpu/drm/xe/xe_exec_queue.h
@@ -62,6 +62,53 @@ static inline bool xe_exec_queue_uses_pxp(struct xe_exec_queue *q)
return q->pxp.type;
}
+/**
+ * xe_exec_queue_is_multi_queue() - Whether an exec_queue is part of a queue group.
+ * @q: The exec_queue
+ *
+ * Return: True if the exec_queue is part of a queue group, false otherwise.
+ */
+static inline bool xe_exec_queue_is_multi_queue(struct xe_exec_queue *q)
+{
+ return q->multi_queue.valid;
+}
+
+/**
+ * xe_exec_queue_is_multi_queue_primary() - Whether an exec_queue is primary queue
+ * of a multi queue group.
+ * @q: The exec_queue
+ *
+ * Return: True if @q is primary queue of a queue group, false otherwise.
+ */
+static inline bool xe_exec_queue_is_multi_queue_primary(struct xe_exec_queue *q)
+{
+ return q->multi_queue.is_primary;
+}
+
+/**
+ * xe_exec_queue_is_multi_queue_secondary() - Whether an exec_queue is secondary queue
+ * of a multi queue group.
+ * @q: The exec_queue
+ *
+ * Return: True if @q is secondary queue of a queue group, false otherwise.
+ */
+static inline bool xe_exec_queue_is_multi_queue_secondary(struct xe_exec_queue *q)
+{
+ return xe_exec_queue_is_multi_queue(q) && !q->multi_queue.is_primary;
+}
+
+/**
+ * xe_exec_queue_multi_queue_primary() - Get multi queue group's primary queue
+ * @q: The exec_queue
+ *
+ * If @q belongs to a multi queue group, then the primary queue of the group will
+ * be returned. Otherwise, @q will be returned.
+ */
+static inline struct xe_exec_queue *xe_exec_queue_multi_queue_primary(struct xe_exec_queue *q)
+{
+ return xe_exec_queue_is_multi_queue(q) ? q->multi_queue.group->primary : q;
+}
+
bool xe_exec_queue_is_lr(struct xe_exec_queue *q);
bool xe_exec_queue_is_idle(struct xe_exec_queue *q);
diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h b/drivers/gpu/drm/xe/xe_exec_queue_types.h
index c8807268ec6c..3856776df5c4 100644
--- a/drivers/gpu/drm/xe/xe_exec_queue_types.h
+++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h
@@ -31,6 +31,24 @@ enum xe_exec_queue_priority {
XE_EXEC_QUEUE_PRIORITY_COUNT
};
+/**
+ * struct xe_exec_queue_group - Execution multi queue group
+ *
+ * Contains multi queue group information.
+ */
+struct xe_exec_queue_group {
+ /** @primary: Primary queue of this group */
+ struct xe_exec_queue *primary;
+ /** @lock: Queue group update lock */
+ struct mutex lock;
+ /** @cgp_bo: BO for the Context Group Page */
+ struct xe_bo *cgp_bo;
+ /** @xa: xarray to store LRCs */
+ struct xarray xa;
+ /** @list_lock: Secondary queue list lock */
+ struct mutex list_lock;
+};
+
/**
* struct xe_exec_queue - Execution queue
*
@@ -110,6 +128,18 @@ struct xe_exec_queue {
struct xe_guc_exec_queue *guc;
};
+ /** @multi_queue: Multi queue information */
+ struct {
+ /** @multi_queue.group: Queue group information */
+ struct xe_exec_queue_group *group;
+ /** @multi_queue.pos: Position of queue within the multi-queue group */
+ u8 pos;
+ /** @multi_queue.valid: Queue belongs to a multi queue group */
+ u8 valid:1;
+ /** @multi_queue.is_primary: Is primary queue (Q0) of the group */
+ u8 is_primary:1;
+ } multi_queue;
+
/** @sched_props: scheduling properties */
struct {
/** @sched_props.timeslice_us: timeslice period in micro-seconds */
diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
index 47853659a705..d903b3a55ec1 100644
--- a/include/uapi/drm/xe_drm.h
+++ b/include/uapi/drm/xe_drm.h
@@ -1252,6 +1252,12 @@ struct drm_xe_vm_bind {
* Given that going into a power-saving state kills PXP HWDRM sessions,
* runtime PM will be blocked while queues of this type are alive.
* All PXP queues will be killed if a PXP invalidation event occurs.
+ * - %DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_GROUP - Create a multi-queue group
+ * or add secondary queues to a multi-queue group.
+ * If the extension's 'value' field has %DRM_XE_MULTI_GROUP_CREATE flag set,
+ * then a new multi-queue group is created with this queue as the primary queue
+ * (Q0). Otherwise, the queue gets added to the multi-queue group whose primary
+ * queue id is specified in the 'value' field.
*
* The example below shows how to use @drm_xe_exec_queue_create to create
* a simple exec_queue (no parallel submission) of class
@@ -1292,6 +1298,8 @@ struct drm_xe_exec_queue_create {
#define DRM_XE_EXEC_QUEUE_SET_PROPERTY_PRIORITY 0
#define DRM_XE_EXEC_QUEUE_SET_PROPERTY_TIMESLICE 1
#define DRM_XE_EXEC_QUEUE_SET_PROPERTY_PXP_TYPE 2
+#define DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_GROUP 3
+#define DRM_XE_MULTI_GROUP_CREATE (1ull << 63)
/** @extensions: Pointer to the first extension struct, if any */
__u64 extensions;
--
2.43.0
* [PATCH 03/16] drm/xe/multi_queue: Add GuC interface for multi queue support
From: Niranjana Vishwanathapura @ 2025-10-31 18:29 UTC
To: intel-xe
Implement the GuC commands and responses, along with the Context
Group Page (CGP) interface, for multi queue support.
Ensure that only the primary queue (q0) of a multi queue group
communicates with the GuC. The secondary queues of the group only
need to maintain their LRCAs and interface with the drm scheduler.
Use the primary queue's submit_wq for all secondary queues of a multi
queue group. This serialization avoids any locking around CGP
synchronization with the GuC.
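For reference, the CGP version 1.0 layout implied by the update code
below (dword offsets; this is a summary of the code, not an
authoritative GuC specification):

	dw[0]         CGP version (major version in bits 15:8)
	dw[16]        update mask for queue slots 0..31
	dw[17]        update mask for queue slots 32..63
	dw[32 + 2*n]  LRC descriptor (lower 32 bits) of queue slot n
	dw[33 + 2*n]  GuC context id of the group's primary queue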
Signed-off-by: Stuart Summers <stuart.summers@intel.com>
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
---
drivers/gpu/drm/xe/abi/guc_actions_abi.h | 3 +
drivers/gpu/drm/xe/xe_exec_queue_types.h | 2 +
drivers/gpu/drm/xe/xe_guc_ct.c | 4 +
drivers/gpu/drm/xe/xe_guc_fwif.h | 3 +
drivers/gpu/drm/xe/xe_guc_submit.c | 302 +++++++++++++++++++----
drivers/gpu/drm/xe/xe_guc_submit.h | 1 +
6 files changed, 270 insertions(+), 45 deletions(-)
diff --git a/drivers/gpu/drm/xe/abi/guc_actions_abi.h b/drivers/gpu/drm/xe/abi/guc_actions_abi.h
index 47756e4674a1..3e9fbed9cda6 100644
--- a/drivers/gpu/drm/xe/abi/guc_actions_abi.h
+++ b/drivers/gpu/drm/xe/abi/guc_actions_abi.h
@@ -139,6 +139,9 @@ enum xe_guc_action {
XE_GUC_ACTION_DEREGISTER_G2G = 0x4508,
XE_GUC_ACTION_DEREGISTER_CONTEXT_DONE = 0x4600,
XE_GUC_ACTION_REGISTER_CONTEXT_MULTI_LRC = 0x4601,
+ XE_GUC_ACTION_REGISTER_CONTEXT_MULTI_QUEUE = 0x4602,
+ XE_GUC_ACTION_MULTI_QUEUE_CONTEXT_CGP_SYNC = 0x4603,
+ XE_GUC_ACTION_NOTIFY_MULTI_QUEUE_CONTEXT_CGP_SYNC_DONE = 0x4604,
XE_GUC_ACTION_CLIENT_SOFT_RESET = 0x5507,
XE_GUC_ACTION_SET_ENG_UTIL_BUFF = 0x550A,
XE_GUC_ACTION_SET_DEVICE_ENGINE_ACTIVITY_BUFFER = 0x550C,
diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h b/drivers/gpu/drm/xe/xe_exec_queue_types.h
index 3856776df5c4..38e47b003259 100644
--- a/drivers/gpu/drm/xe/xe_exec_queue_types.h
+++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h
@@ -47,6 +47,8 @@ struct xe_exec_queue_group {
struct xarray xa;
/** @list_lock: Secondary queue list lock */
struct mutex list_lock;
+ /** @sync_pending: CGP_SYNC_DONE g2h response pending */
+ bool sync_pending;
};
/**
diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c
index e68953ef3a00..48b5006eb080 100644
--- a/drivers/gpu/drm/xe/xe_guc_ct.c
+++ b/drivers/gpu/drm/xe/xe_guc_ct.c
@@ -1304,6 +1304,7 @@ static int parse_g2h_event(struct xe_guc_ct *ct, u32 *msg, u32 len)
lockdep_assert_held(&ct->lock);
switch (action) {
+ case XE_GUC_ACTION_NOTIFY_MULTI_QUEUE_CONTEXT_CGP_SYNC_DONE:
case XE_GUC_ACTION_SCHED_CONTEXT_MODE_DONE:
case XE_GUC_ACTION_DEREGISTER_CONTEXT_DONE:
case XE_GUC_ACTION_SCHED_ENGINE_MODE_DONE:
@@ -1570,6 +1571,9 @@ static int process_g2h_msg(struct xe_guc_ct *ct, u32 *msg, u32 len)
ret = xe_guc_g2g_test_notification(guc, payload, adj_len);
break;
#endif
+ case XE_GUC_ACTION_NOTIFY_MULTI_QUEUE_CONTEXT_CGP_SYNC_DONE:
+ ret = xe_guc_exec_queue_cgp_sync_done_handler(guc, payload, adj_len);
+ break;
default:
xe_gt_err(gt, "unexpected G2H action 0x%04x\n", action);
}
diff --git a/drivers/gpu/drm/xe/xe_guc_fwif.h b/drivers/gpu/drm/xe/xe_guc_fwif.h
index c90dd266e9cf..610dfb2f1cb5 100644
--- a/drivers/gpu/drm/xe/xe_guc_fwif.h
+++ b/drivers/gpu/drm/xe/xe_guc_fwif.h
@@ -16,6 +16,7 @@
#define G2H_LEN_DW_DEREGISTER_CONTEXT 3
#define G2H_LEN_DW_TLB_INVALIDATE 3
#define G2H_LEN_DW_G2G_NOTIFY_MIN 3
+#define G2H_LEN_DW_MULTI_QUEUE_CONTEXT 4
#define GUC_ID_MAX 65535
#define GUC_ID_UNKNOWN 0xffffffff
@@ -62,6 +63,8 @@ struct guc_ctxt_registration_info {
u32 wq_base_lo;
u32 wq_base_hi;
u32 wq_size;
+ u32 cgp_lo;
+ u32 cgp_hi;
u32 hwlrca_lo;
u32 hwlrca_hi;
};
diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
index d4ffdb71ef3d..d2aa9a2524e7 100644
--- a/drivers/gpu/drm/xe/xe_guc_submit.c
+++ b/drivers/gpu/drm/xe/xe_guc_submit.c
@@ -46,6 +46,7 @@
#include "xe_trace.h"
#include "xe_uc_fw.h"
#include "xe_vm.h"
+#include "xe_bo.h"
static struct xe_guc *
exec_queue_to_guc(struct xe_exec_queue *q)
@@ -541,7 +542,8 @@ static void init_policies(struct xe_guc *guc, struct xe_exec_queue *q)
u32 slpc_exec_queue_freq_req = 0;
u32 preempt_timeout_us = q->sched_props.preempt_timeout_us;
- xe_gt_assert(guc_to_gt(guc), exec_queue_registered(q));
+ xe_gt_assert(guc_to_gt(guc), exec_queue_registered(q) &&
+ !xe_exec_queue_is_multi_queue_secondary(q));
if (q->flags & EXEC_QUEUE_FLAG_LOW_LATENCY)
slpc_exec_queue_freq_req |= SLPC_CTX_FREQ_REQ_IS_COMPUTE;
@@ -561,6 +563,8 @@ static void set_min_preemption_timeout(struct xe_guc *guc, struct xe_exec_queue
{
struct exec_queue_policy policy;
+ xe_assert(guc_to_xe(guc), !xe_exec_queue_is_multi_queue_secondary(q));
+
__guc_exec_queue_policy_start_klv(&policy, q->guc->id);
__guc_exec_queue_policy_add_preemption_timeout(&policy, 1);
@@ -575,6 +579,130 @@ static void set_min_preemption_timeout(struct xe_guc *guc, struct xe_exec_queue
xe_map_wr_field(xe_, &map_, 0, struct guc_submit_parallel_scratch, \
field_, val_)
+#define CGP_VERSION_MAJOR_SHIFT 8
+
+static void xe_guc_exec_queue_group_cgp_update(struct xe_device *xe,
+ struct xe_exec_queue *q)
+{
+ struct xe_exec_queue_group *group = q->multi_queue.group;
+ u32 guc_id = group->primary->guc->id;
+
+ /* Currently implementing CGP version 1.0 */
+ xe_map_wr(xe, &group->cgp_bo->vmap, 0, u32,
+ 1 << CGP_VERSION_MAJOR_SHIFT);
+
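+ /* dw[32 + 2*pos]/dw[33 + 2*pos]: this queue's LRCA and the primary's GuC id */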
+ xe_map_wr(xe, &group->cgp_bo->vmap,
+ (32 + q->multi_queue.pos * 2) * sizeof(u32),
+ u32, lower_32_bits(xe_lrc_descriptor(q->lrc[0])));
+
+ xe_map_wr(xe, &group->cgp_bo->vmap,
+ (33 + q->multi_queue.pos * 2) * sizeof(u32),
+ u32, guc_id);
+
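+ /* dw[16]/dw[17] form a 64-bit mask marking which queue slot was updated */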
+ if (q->multi_queue.pos / 32) {
+ xe_map_wr(xe, &group->cgp_bo->vmap, 17 * sizeof(u32),
+ u32, BIT(q->multi_queue.pos % 32));
+ xe_map_wr(xe, &group->cgp_bo->vmap, 16 * sizeof(u32), u32, 0);
+ } else {
+ xe_map_wr(xe, &group->cgp_bo->vmap, 16 * sizeof(u32),
+ u32, BIT(q->multi_queue.pos));
+ xe_map_wr(xe, &group->cgp_bo->vmap, 17 * sizeof(u32), u32, 0);
+ }
+}
+
+static void xe_guc_exec_queue_group_cgp_sync(struct xe_guc *guc,
+ struct xe_exec_queue *q,
+ const u32 *action, u32 len)
+{
+ struct xe_exec_queue_group *group = q->multi_queue.group;
+ struct xe_device *xe = guc_to_xe(guc);
+ long ret;
+
+ /*
+ * As all queues of a multi queue group use a single drm scheduler
+ * submit workqueue, CGP synchronization with the GuC is serialized.
+ * Hence, no locking is required here.
+ * Wait for any pending CGP_SYNC_DONE response before updating the
+ * CGP page and sending the CGP_SYNC message.
+ */
+ ret = wait_event_timeout(guc->ct.wq,
+ !READ_ONCE(group->sync_pending) ||
+ xe_guc_read_stopped(guc), HZ);
+ if (!ret || xe_guc_read_stopped(guc)) {
+ drm_err(&xe->drm, "Wait for CGP_SYNC_DONE response failed!\n");
+ /* Something wrong with the CTB or GuC, no need to proceed */
+ return;
+ }
+
+ xe_guc_exec_queue_group_cgp_update(xe, q);
+
+ WRITE_ONCE(group->sync_pending, true);
+ xe_guc_ct_send(&guc->ct, action, len, G2H_LEN_DW_MULTI_QUEUE_CONTEXT, 1);
+}
+
+static void __register_exec_queue(struct xe_guc *guc,
+ struct guc_ctxt_registration_info *info)
+{
+ u32 action[] = {
+ XE_GUC_ACTION_REGISTER_CONTEXT,
+ info->flags,
+ info->context_idx,
+ info->engine_class,
+ info->engine_submit_mask,
+ info->wq_desc_lo,
+ info->wq_desc_hi,
+ info->wq_base_lo,
+ info->wq_base_hi,
+ info->wq_size,
+ info->hwlrca_lo,
+ info->hwlrca_hi,
+ };
+
+ /* explicitly checks some fields that we might fixup later */
+ xe_gt_assert(guc_to_gt(guc), info->wq_desc_lo ==
+ action[XE_GUC_REGISTER_CONTEXT_DATA_5_WQ_DESC_ADDR_LOWER]);
+ xe_gt_assert(guc_to_gt(guc), info->wq_base_lo ==
+ action[XE_GUC_REGISTER_CONTEXT_DATA_7_WQ_BUF_BASE_LOWER]);
+ xe_gt_assert(guc_to_gt(guc), info->hwlrca_lo ==
+ action[XE_GUC_REGISTER_CONTEXT_DATA_10_HW_LRC_ADDR]);
+
+ xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action), 0, 0);
+}
+
+static void __register_exec_queue_group(struct xe_guc *guc,
+ struct xe_exec_queue *q,
+ struct guc_ctxt_registration_info *info)
+{
+#define MAX_MULTI_QUEUE_REG_SIZE (8)
+ struct xe_device *xe = guc_to_xe(guc);
+ u32 action[MAX_MULTI_QUEUE_REG_SIZE];
+ int len = 0;
+
+ if (xe_exec_queue_is_multi_queue_primary(q)) {
+ action[len++] = XE_GUC_ACTION_REGISTER_CONTEXT_MULTI_QUEUE;
+ action[len++] = info->flags;
+ action[len++] = info->context_idx;
+ action[len++] = info->engine_class;
+ action[len++] = info->engine_submit_mask;
+ action[len++] = 0; /* Reserved */
+ action[len++] = info->cgp_lo;
+ action[len++] = info->cgp_hi;
+ } else {
+ /*
+ * No need to wait before CGP sync since CT descriptors
+ * should be ordered.
+ */
+
+ action[len++] = XE_GUC_ACTION_MULTI_QUEUE_CONTEXT_CGP_SYNC;
+ action[len++] = q->multi_queue.group->primary->guc->id;
+ }
+
+ xe_assert(xe, len <= MAX_MULTI_QUEUE_REG_SIZE);
+#undef MAX_MULTI_QUEUE_REG_SIZE
+
+ xe_guc_exec_queue_group_cgp_sync(guc, q, action, len);
+}
+
static void __register_mlrc_exec_queue(struct xe_guc *guc,
struct xe_exec_queue *q,
struct guc_ctxt_registration_info *info)
@@ -622,35 +750,6 @@ static void __register_mlrc_exec_queue(struct xe_guc *guc,
xe_guc_ct_send(&guc->ct, action, len, 0, 0);
}
-static void __register_exec_queue(struct xe_guc *guc,
- struct guc_ctxt_registration_info *info)
-{
- u32 action[] = {
- XE_GUC_ACTION_REGISTER_CONTEXT,
- info->flags,
- info->context_idx,
- info->engine_class,
- info->engine_submit_mask,
- info->wq_desc_lo,
- info->wq_desc_hi,
- info->wq_base_lo,
- info->wq_base_hi,
- info->wq_size,
- info->hwlrca_lo,
- info->hwlrca_hi,
- };
-
- /* explicitly checks some fields that we might fixup later */
- xe_gt_assert(guc_to_gt(guc), info->wq_desc_lo ==
- action[XE_GUC_REGISTER_CONTEXT_DATA_5_WQ_DESC_ADDR_LOWER]);
- xe_gt_assert(guc_to_gt(guc), info->wq_base_lo ==
- action[XE_GUC_REGISTER_CONTEXT_DATA_7_WQ_BUF_BASE_LOWER]);
- xe_gt_assert(guc_to_gt(guc), info->hwlrca_lo ==
- action[XE_GUC_REGISTER_CONTEXT_DATA_10_HW_LRC_ADDR]);
-
- xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action), 0, 0);
-}
-
static void register_exec_queue(struct xe_exec_queue *q, int ctx_type)
{
struct xe_guc *guc = exec_queue_to_guc(q);
@@ -670,6 +769,13 @@ static void register_exec_queue(struct xe_exec_queue *q, int ctx_type)
info.flags = CONTEXT_REGISTRATION_FLAG_KMD |
FIELD_PREP(CONTEXT_REGISTRATION_FLAG_TYPE, ctx_type);
+ if (xe_exec_queue_is_multi_queue(q)) {
+ struct xe_exec_queue_group *group = q->multi_queue.group;
+
+ info.cgp_lo = xe_bo_ggtt_addr(group->cgp_bo);
+ info.cgp_hi = 0;
+ }
+
if (xe_exec_queue_is_parallel(q)) {
u64 ggtt_addr = xe_lrc_parallel_ggtt_addr(lrc);
struct iosys_map map = xe_lrc_parallel_map(lrc);
@@ -700,11 +806,15 @@ static void register_exec_queue(struct xe_exec_queue *q, int ctx_type)
set_exec_queue_registered(q);
trace_xe_exec_queue_register(q);
- if (xe_exec_queue_is_parallel(q))
+ if (xe_exec_queue_is_multi_queue(q))
+ __register_exec_queue_group(guc, q, &info);
+ else if (xe_exec_queue_is_parallel(q))
__register_mlrc_exec_queue(guc, q, &info);
else
__register_exec_queue(guc, &info);
- init_policies(guc, q);
+
+ if (!xe_exec_queue_is_multi_queue_secondary(q))
+ init_policies(guc, q);
}
static u32 wq_space_until_wrap(struct xe_exec_queue *q)
@@ -833,6 +943,12 @@ static void submit_exec_queue(struct xe_exec_queue *q, struct xe_sched_job *job)
if (exec_queue_suspended(q) && !xe_exec_queue_is_parallel(q))
return;
+ /*
+ * All queues in a multi-queue group will use the primary queue
+ * of the group to interface with GuC.
+ */
+ q = xe_exec_queue_multi_queue_primary(q);
+
if (!exec_queue_enabled(q) && !exec_queue_suspended(q)) {
action[len++] = XE_GUC_ACTION_SCHED_CONTEXT_MODE_SET;
action[len++] = q->guc->id;
@@ -879,6 +995,18 @@ guc_exec_queue_run_job(struct drm_sched_job *drm_job)
trace_xe_sched_job_run(job);
if (!killed_or_banned_or_wedged && !xe_sched_job_is_error(job)) {
+ if (xe_exec_queue_is_multi_queue_secondary(q)) {
+ struct xe_exec_queue *primary = xe_exec_queue_multi_queue_primary(q);
+
+ if (exec_queue_killed_or_banned_or_wedged(primary)) {
+ killed_or_banned_or_wedged = true;
+ goto run_job_out;
+ }
+
+ if (!exec_queue_registered(primary))
+ register_exec_queue(primary, GUC_CONTEXT_NORMAL);
+ }
+
if (!exec_queue_registered(q))
register_exec_queue(q, GUC_CONTEXT_NORMAL);
if (!job->skip_emit)
@@ -887,6 +1015,7 @@ guc_exec_queue_run_job(struct drm_sched_job *drm_job)
job->skip_emit = false;
}
+run_job_out:
/*
* We don't care about job-fence ordering in LR VMs because these fences
* are never exported; they are used solely to keep jobs on the pending
@@ -912,6 +1041,11 @@ int xe_guc_read_stopped(struct xe_guc *guc)
return atomic_read(&guc->submission_state.stopped);
}
+static void handle_multi_queue_secondary_sched_done(struct xe_guc *guc,
+ struct xe_exec_queue *q,
+ u32 runnable_state);
+static void handle_deregister_done(struct xe_guc *guc, struct xe_exec_queue *q);
+
#define MAKE_SCHED_CONTEXT_ACTION(q, enable_disable) \
u32 action[] = { \
XE_GUC_ACTION_SCHED_CONTEXT_MODE_SET, \
@@ -925,7 +1059,9 @@ static void disable_scheduling_deregister(struct xe_guc *guc,
MAKE_SCHED_CONTEXT_ACTION(q, DISABLE);
int ret;
- set_min_preemption_timeout(guc, q);
+ if (!xe_exec_queue_is_multi_queue_secondary(q))
+ set_min_preemption_timeout(guc, q);
+
smp_rmb();
ret = wait_event_timeout(guc->ct.wq,
(!exec_queue_pending_enable(q) &&
@@ -953,9 +1089,12 @@ static void disable_scheduling_deregister(struct xe_guc *guc,
* Reserve space for both G2H here as the 2nd G2H is sent from a G2H
* handler and we are not allowed to reserved G2H space in handlers.
*/
- xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action),
- G2H_LEN_DW_SCHED_CONTEXT_MODE_SET +
- G2H_LEN_DW_DEREGISTER_CONTEXT, 2);
+ if (xe_exec_queue_is_multi_queue_secondary(q))
+ handle_multi_queue_secondary_sched_done(guc, q, 0);
+ else
+ xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action),
+ G2H_LEN_DW_SCHED_CONTEXT_MODE_SET +
+ G2H_LEN_DW_DEREGISTER_CONTEXT, 2);
}
static void xe_guc_exec_queue_trigger_cleanup(struct xe_exec_queue *q)
@@ -1161,8 +1300,11 @@ static void enable_scheduling(struct xe_exec_queue *q)
set_exec_queue_enabled(q);
trace_xe_exec_queue_scheduling_enable(q);
- xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action),
- G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, 1);
+ if (xe_exec_queue_is_multi_queue_secondary(q))
+ handle_multi_queue_secondary_sched_done(guc, q, 1);
+ else
+ xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action),
+ G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, 1);
ret = wait_event_timeout(guc->ct.wq,
!exec_queue_pending_enable(q) ||
@@ -1186,14 +1328,17 @@ static void disable_scheduling(struct xe_exec_queue *q, bool immediate)
xe_gt_assert(guc_to_gt(guc), exec_queue_registered(q));
xe_gt_assert(guc_to_gt(guc), !exec_queue_pending_disable(q));
- if (immediate)
+ if (immediate && !xe_exec_queue_is_multi_queue_secondary(q))
set_min_preemption_timeout(guc, q);
clear_exec_queue_enabled(q);
set_exec_queue_pending_disable(q);
trace_xe_exec_queue_scheduling_disable(q);
- xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action),
- G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, 1);
+ if (xe_exec_queue_is_multi_queue_secondary(q))
+ handle_multi_queue_secondary_sched_done(guc, q, 0);
+ else
+ xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action),
+ G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, 1);
}
static void __deregister_exec_queue(struct xe_guc *guc, struct xe_exec_queue *q)
@@ -1211,8 +1356,11 @@ static void __deregister_exec_queue(struct xe_guc *guc, struct xe_exec_queue *q)
set_exec_queue_destroyed(q);
trace_xe_exec_queue_deregister(q);
- xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action),
- G2H_LEN_DW_DEREGISTER_CONTEXT, 1);
+ if (xe_exec_queue_is_multi_queue_secondary(q))
+ handle_deregister_done(guc, q);
+ else
+ xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action),
+ G2H_LEN_DW_DEREGISTER_CONTEXT, 1);
}
static enum drm_gpu_sched_stat
@@ -1660,6 +1808,7 @@ static int guc_exec_queue_init(struct xe_exec_queue *q)
{
struct xe_gpu_scheduler *sched;
struct xe_guc *guc = exec_queue_to_guc(q);
+ struct workqueue_struct *submit_wq = NULL;
struct xe_guc_exec_queue *ge;
long timeout;
int err, i;
@@ -1680,8 +1829,20 @@ static int guc_exec_queue_init(struct xe_exec_queue *q)
timeout = (q->vm && xe_vm_in_lr_mode(q->vm)) ? MAX_SCHEDULE_TIMEOUT :
msecs_to_jiffies(q->sched_props.job_timeout_ms);
+
+ /*
+ * Use primary queue's submit_wq for all secondary queues of a
+ * multi queue group. This serialization avoids any locking around
+ * CGP synchronization with GuC.
+ */
+ if (xe_exec_queue_is_multi_queue_secondary(q)) {
+ struct xe_exec_queue *primary = xe_exec_queue_multi_queue_primary(q);
+
+ submit_wq = primary->guc->sched.base.submit_wq;
+ }
+
err = xe_sched_init(&ge->sched, &drm_sched_ops, &xe_sched_ops,
- NULL, xe_lrc_ring_size() / MAX_JOB_SIZE_BYTES, 64,
+ submit_wq, xe_lrc_ring_size() / MAX_JOB_SIZE_BYTES, 64,
timeout, guc_to_gt(guc)->ordered_wq, NULL,
q->name, gt_to_xe(q->gt)->drm.dev);
if (err)
@@ -2418,7 +2579,11 @@ static void deregister_exec_queue(struct xe_guc *guc, struct xe_exec_queue *q)
trace_xe_exec_queue_deregister(q);
- xe_guc_ct_send_g2h_handler(&guc->ct, action, ARRAY_SIZE(action));
+ if (xe_exec_queue_is_multi_queue_secondary(q))
+ handle_deregister_done(guc, q);
+ else
+ xe_guc_ct_send_g2h_handler(&guc->ct, action,
+ ARRAY_SIZE(action));
}
static void handle_sched_done(struct xe_guc *guc, struct xe_exec_queue *q,
@@ -2468,6 +2633,15 @@ static void handle_sched_done(struct xe_guc *guc, struct xe_exec_queue *q,
}
}
+static void handle_multi_queue_secondary_sched_done(struct xe_guc *guc,
+ struct xe_exec_queue *q,
+ u32 runnable_state)
+{
+ mutex_lock(&guc->ct.lock);
+ handle_sched_done(guc, q, runnable_state);
+ mutex_unlock(&guc->ct.lock);
+}
+
int xe_guc_sched_done_handler(struct xe_guc *guc, u32 *msg, u32 len)
{
struct xe_exec_queue *q;
@@ -2672,6 +2846,44 @@ int xe_guc_exec_queue_reset_failure_handler(struct xe_guc *guc, u32 *msg, u32 le
return 0;
}
+/**
+ * xe_guc_exec_queue_cgp_sync_done_handler - CGP synchronization done handler
+ * @guc: guc
+ * @msg: message indicating CGP sync done
+ * @len: length of message
+ *
+ * Set the multi queue group's sync_pending flag to false and wake up anyone
+ * waiting for CGP synchronization to complete.
+ *
+ * Return: 0 on success, -EPROTO for malformed messages.
+ */
+int xe_guc_exec_queue_cgp_sync_done_handler(struct xe_guc *guc, u32 *msg, u32 len)
+{
+ struct xe_device *xe = guc_to_xe(guc);
+ struct xe_exec_queue *q;
+ u32 guc_id;
+
+ if (unlikely(len < 1)) {
+ drm_err(&xe->drm, "Invalid CGP_SYNC_DONE length %u", len);
+ return -EPROTO;
+ }
+
+ guc_id = msg[0];
+
+ q = g2h_exec_queue_lookup(guc, guc_id);
+ if (unlikely(!q))
+ return -EPROTO;
+
+ if (!xe_exec_queue_is_multi_queue_primary(q)) {
+ drm_err(&xe->drm, "Unexpected CGP_SYNC_DONE response");
+ return -EPROTO;
+ }
+
+ /* Wakeup the serialized cgp update wait */
+ WRITE_ONCE(q->multi_queue.group->sync_pending, false);
+ wake_up_all(&guc->ct.wq);
+
+ return 0;
+}
+
static void
guc_exec_queue_wq_snapshot_capture(struct xe_exec_queue *q,
struct xe_guc_submit_exec_queue_snapshot *snapshot)
diff --git a/drivers/gpu/drm/xe/xe_guc_submit.h b/drivers/gpu/drm/xe/xe_guc_submit.h
index b49a2748ec46..abfa94bce391 100644
--- a/drivers/gpu/drm/xe/xe_guc_submit.h
+++ b/drivers/gpu/drm/xe/xe_guc_submit.h
@@ -34,6 +34,7 @@ int xe_guc_exec_queue_memory_cat_error_handler(struct xe_guc *guc, u32 *msg,
u32 len);
int xe_guc_exec_queue_reset_failure_handler(struct xe_guc *guc, u32 *msg, u32 len);
int xe_guc_error_capture_handler(struct xe_guc *guc, u32 *msg, u32 len);
+int xe_guc_exec_queue_cgp_sync_done_handler(struct xe_guc *guc, u32 *msg, u32 len);
struct xe_guc_submit_exec_queue_snapshot *
xe_guc_exec_queue_snapshot_capture(struct xe_exec_queue *q);
--
2.43.0
* [PATCH 04/16] drm/xe/multi_queue: Add multi queue priority property
From: Niranjana Vishwanathapura @ 2025-10-31 18:29 UTC
To: intel-xe
Add support for queues of a multi queue group to set their priority
within the queue group via the new
DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_QUEUE_PRIORITY property.
Other than DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_GROUP, this is the
only property supported by secondary queues of a multi queue group.
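A sketch of setting the per-queue priority at queue create time by
chaining a second extension after the MULTI_GROUP one ('ext' as in
the patch 02 example; the value follows the kernel's low/normal/high
enumeration, 0..2):

	struct drm_xe_ext_set_property prio = {
		.base.name = DRM_XE_EXEC_QUEUE_EXTENSION_SET_PROPERTY,
		.property = DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_QUEUE_PRIORITY,
		.value = 2, /* high priority within the group */
	};

	ext.base.next_extension = (uintptr_t)&prio;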
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
---
drivers/gpu/drm/xe/xe_exec_queue.c | 17 ++++++++++++-
drivers/gpu/drm/xe/xe_exec_queue_types.h | 8 ++++++
drivers/gpu/drm/xe/xe_guc_submit.c | 1 +
drivers/gpu/drm/xe/xe_lrc.c | 32 ++++++++++++++++++++++++
drivers/gpu/drm/xe/xe_lrc.h | 5 ++++
include/uapi/drm/xe_drm.h | 3 +++
6 files changed, 65 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
index 86404a7c9fe4..0da256428916 100644
--- a/drivers/gpu/drm/xe/xe_exec_queue.c
+++ b/drivers/gpu/drm/xe/xe_exec_queue.c
@@ -177,6 +177,7 @@ static struct xe_exec_queue *__xe_exec_queue_alloc(struct xe_device *xe,
INIT_LIST_HEAD(&q->multi_gt_link);
INIT_LIST_HEAD(&q->hw_engine_group_link);
INIT_LIST_HEAD(&q->pxp.link);
+ q->multi_queue.priority = XE_MULTI_QUEUE_PRIORITY_NORMAL;
q->sched_props.timeslice_us = hwe->eclass->sched_props.timeslice_us;
q->sched_props.preempt_timeout_us =
@@ -722,6 +723,17 @@ static int exec_queue_set_multi_group(struct xe_device *xe, struct xe_exec_queue
return xe_exec_queue_group_validate(xe, q, value);
}
+static int exec_queue_set_multi_queue_priority(struct xe_device *xe, struct xe_exec_queue *q,
+ u64 value)
+{
+ if (XE_IOCTL_DBG(xe, value > XE_MULTI_QUEUE_PRIORITY_HIGH))
+ return -EINVAL;
+
+ q->multi_queue.priority = value;
+
+ return 0;
+}
+
typedef int (*xe_exec_queue_set_property_fn)(struct xe_device *xe,
struct xe_exec_queue *q,
u64 value);
@@ -731,6 +743,8 @@ static const xe_exec_queue_set_property_fn exec_queue_set_property_funcs[] = {
[DRM_XE_EXEC_QUEUE_SET_PROPERTY_TIMESLICE] = exec_queue_set_timeslice,
[DRM_XE_EXEC_QUEUE_SET_PROPERTY_PXP_TYPE] = exec_queue_set_pxp_type,
[DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_GROUP] = exec_queue_set_multi_group,
+ [DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_QUEUE_PRIORITY] =
+ exec_queue_set_multi_queue_priority,
};
static int exec_queue_user_ext_set_property(struct xe_device *xe,
@@ -752,7 +766,8 @@ static int exec_queue_user_ext_set_property(struct xe_device *xe,
XE_IOCTL_DBG(xe, ext.property != DRM_XE_EXEC_QUEUE_SET_PROPERTY_PRIORITY &&
ext.property != DRM_XE_EXEC_QUEUE_SET_PROPERTY_TIMESLICE &&
ext.property != DRM_XE_EXEC_QUEUE_SET_PROPERTY_PXP_TYPE &&
- ext.property != DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_GROUP))
+ ext.property != DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_GROUP &&
+ ext.property != DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_QUEUE_PRIORITY))
return -EINVAL;
idx = array_index_nospec(ext.property, ARRAY_SIZE(exec_queue_set_property_funcs));
diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h b/drivers/gpu/drm/xe/xe_exec_queue_types.h
index 38e47b003259..964a0e6654c7 100644
--- a/drivers/gpu/drm/xe/xe_exec_queue_types.h
+++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h
@@ -31,6 +31,12 @@ enum xe_exec_queue_priority {
XE_EXEC_QUEUE_PRIORITY_COUNT
};
+enum xe_multi_queue_priority {
+ XE_MULTI_QUEUE_PRIORITY_LOW = 0,
+ XE_MULTI_QUEUE_PRIORITY_NORMAL,
+ XE_MULTI_QUEUE_PRIORITY_HIGH,
+};
+
/**
* struct xe_exec_queue_group - Execution multi queue group
*
@@ -134,6 +140,8 @@ struct xe_exec_queue {
struct {
/** @multi_queue.group: Queue group information */
struct xe_exec_queue_group *group;
+ /** @multi_queue.priority: Queue priority within the multi-queue group */
+ enum xe_multi_queue_priority priority;
/** @multi_queue.pos: Position of queue within the multi-queue group */
u8 pos;
/** @multi_queue.valid: Queue belongs to a multi queue group */
diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
index d2aa9a2524e7..5ec144c1c2dc 100644
--- a/drivers/gpu/drm/xe/xe_guc_submit.c
+++ b/drivers/gpu/drm/xe/xe_guc_submit.c
@@ -634,6 +634,7 @@ static void xe_guc_exec_queue_group_cgp_sync(struct xe_guc *guc,
return;
}
+ xe_lrc_set_multi_queue_priority(q->lrc[0], q->multi_queue.priority);
xe_guc_exec_queue_group_cgp_update(xe, q);
WRITE_ONCE(group->sync_pending, true);
diff --git a/drivers/gpu/drm/xe/xe_lrc.c b/drivers/gpu/drm/xe/xe_lrc.c
index b5083c99dd50..45fc5bc5de5c 100644
--- a/drivers/gpu/drm/xe/xe_lrc.c
+++ b/drivers/gpu/drm/xe/xe_lrc.c
@@ -44,6 +44,11 @@
#define LRC_INDIRECT_CTX_BO_SIZE SZ_4K
#define LRC_INDIRECT_RING_STATE_SIZE SZ_4K
+#define LRC_PRIORITY GENMASK_ULL(10, 9)
+#define LRC_PRIORITY_LOW 0
+#define LRC_PRIORITY_NORMAL 1
+#define LRC_PRIORITY_HIGH 2
+
/*
* Layout of the LRC and associated data allocated as
* lrc->bo:
@@ -1386,6 +1391,33 @@ setup_indirect_ctx(struct xe_lrc *lrc, struct xe_hw_engine *hwe)
return 0;
}
+static u8 xe_multi_queue_prio_to_lrc(struct xe_lrc *lrc, enum xe_multi_queue_priority priority)
+{
+ struct xe_device *xe = gt_to_xe(lrc->gt);
+
+ /* xe_multi_queue_priority is directly mapped to LRC priority values */
+ if (priority >= XE_MULTI_QUEUE_PRIORITY_LOW &&
+ priority <= XE_MULTI_QUEUE_PRIORITY_HIGH)
+ return priority;
+
+ /* Fallback to NORMAL if out of range */
+ drm_warn(&xe->drm, "Unknown multi queue priority: %d, defaulting to NORMAL\n", priority);
+ return LRC_PRIORITY_NORMAL;
+}
+
+/**
+ * xe_lrc_set_multi_queue_priority() - Set multi queue priority in LRC
+ * @lrc: Logical Ring Context
+ * @priority: Multi queue priority of the exec queue
+ *
+ * Convert @priority to LRC multi queue priority and update the @lrc descriptor
+ */
+void xe_lrc_set_multi_queue_priority(struct xe_lrc *lrc, enum xe_multi_queue_priority priority)
+{
+ lrc->desc &= ~LRC_PRIORITY;
+ lrc->desc |= FIELD_PREP(LRC_PRIORITY, xe_multi_queue_prio_to_lrc(lrc, priority));
+}
+
static int xe_lrc_init(struct xe_lrc *lrc, struct xe_hw_engine *hwe,
struct xe_vm *vm, u32 ring_size, u16 msix_vec,
u32 init_flags)
diff --git a/drivers/gpu/drm/xe/xe_lrc.h b/drivers/gpu/drm/xe/xe_lrc.h
index 2fb628da5c43..3e6b356e0d1c 100644
--- a/drivers/gpu/drm/xe/xe_lrc.h
+++ b/drivers/gpu/drm/xe/xe_lrc.h
@@ -8,11 +8,14 @@
#include <linux/types.h>
#include "xe_lrc_types.h"
+#include "xe_exec_queue_types.h"
struct drm_printer;
struct xe_bb;
struct xe_device;
struct xe_exec_queue;
+enum xe_exec_queue_priority;
+enum xe_multi_queue_priority;
enum xe_engine_class;
struct xe_gt;
struct xe_hw_engine;
@@ -133,6 +136,8 @@ void xe_lrc_dump_default(struct drm_printer *p,
u32 *xe_lrc_emit_hwe_state_instructions(struct xe_exec_queue *q, u32 *cs);
+void xe_lrc_set_multi_queue_priority(struct xe_lrc *lrc, enum xe_multi_queue_priority priority);
+
struct xe_lrc_snapshot *xe_lrc_snapshot_capture(struct xe_lrc *lrc);
void xe_lrc_snapshot_capture_delayed(struct xe_lrc_snapshot *snapshot);
void xe_lrc_snapshot_print(struct xe_lrc_snapshot *snapshot, struct drm_printer *p);
diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
index d903b3a55ec1..8ab44413646a 100644
--- a/include/uapi/drm/xe_drm.h
+++ b/include/uapi/drm/xe_drm.h
@@ -1258,6 +1258,8 @@ struct drm_xe_vm_bind {
* then a new multi-queue group is created with this queue as the primary queue
* (Q0). Otherwise, the queue gets added to the multi-queue group whose primary
* queue id is specified in the 'value' field.
+ * - %DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_QUEUE_PRIORITY - Set the queue
+ * priority within the multi-queue group.
*
* The example below shows how to use @drm_xe_exec_queue_create to create
* a simple exec_queue (no parallel submission) of class
@@ -1300,6 +1302,7 @@ struct drm_xe_exec_queue_create {
#define DRM_XE_EXEC_QUEUE_SET_PROPERTY_PXP_TYPE 2
#define DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_GROUP 3
#define DRM_XE_MULTI_GROUP_CREATE (1ull << 63)
+#define DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_QUEUE_PRIORITY 4
/** @extensions: Pointer to the first extension struct, if any */
__u64 extensions;
--
2.43.0
* [PATCH 05/16] drm/xe/multi_queue: Handle invalid exec queue property setting
From: Niranjana Vishwanathapura @ 2025-10-31 18:29 UTC
To: intel-xe
Other than the MULTI_GROUP property itself, only the
MULTI_QUEUE_PRIORITY property is valid for secondary queues of a
multi queue group, and MULTI_QUEUE_PRIORITY applies only to queues of
a multi queue group. Detect invalid user property settings and return
an error.
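For illustration, the intended outcomes of the checks added below:

	secondary queue create: MULTI_GROUP + MULTI_QUEUE_PRIORITY  -> OK
	secondary queue create: MULTI_GROUP + TIMESLICE             -> -EINVAL
	regular queue create:   MULTI_QUEUE_PRIORITY without group  -> -EINVAL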
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
---
drivers/gpu/drm/xe/xe_exec_queue.c | 66 ++++++++++++++++++++++++++----
1 file changed, 57 insertions(+), 9 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
index 0da256428916..78b3a0e2ddd3 100644
--- a/drivers/gpu/drm/xe/xe_exec_queue.c
+++ b/drivers/gpu/drm/xe/xe_exec_queue.c
@@ -61,7 +61,7 @@ enum xe_exec_queue_sched_prop {
};
static int exec_queue_user_extensions(struct xe_device *xe, struct xe_exec_queue *q,
- u64 extensions, int ext_number);
+ u64 extensions);
static void xe_exec_queue_group_cleanup(struct xe_exec_queue *q)
{
@@ -206,7 +206,7 @@ static struct xe_exec_queue *__xe_exec_queue_alloc(struct xe_device *xe,
* may set q->usm, must come before xe_lrc_create(),
* may overwrite q->sched_props, must come before q->ops->init()
*/
- err = exec_queue_user_extensions(xe, q, extensions, 0);
+ err = exec_queue_user_extensions(xe, q, extensions);
if (err) {
__xe_exec_queue_free(q);
return ERR_PTR(err);
@@ -747,9 +747,35 @@ static const xe_exec_queue_set_property_fn exec_queue_set_property_funcs[] = {
exec_queue_set_multi_queue_priority,
};
+static int exec_queue_user_ext_check(struct xe_exec_queue *q, u64 properties)
+{
+ u64 secondary_queue_valid_props = BIT_ULL(DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_GROUP) |
+ BIT_ULL(DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_QUEUE_PRIORITY);
+
+ /*
+ * Only MULTI_QUEUE_PRIORITY property is valid for secondary queues of a
+ * multi-queue group.
+ */
+ if (xe_exec_queue_is_multi_queue_secondary(q) &&
+ properties & ~secondary_queue_valid_props)
+ return -EINVAL;
+
+ return 0;
+}
+
+static int exec_queue_user_ext_check_final(struct xe_exec_queue *q, u64 properties)
+{
+ /* MULTI_QUEUE_PRIORITY only applies to multi-queue group queues */
+ if ((properties & BIT_ULL(DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_QUEUE_PRIORITY)) &&
+ !(properties & BIT_ULL(DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_GROUP)))
+ return -EINVAL;
+
+ return 0;
+}
+
static int exec_queue_user_ext_set_property(struct xe_device *xe,
struct xe_exec_queue *q,
- u64 extension)
+ u64 extension, u64 *properties)
{
u64 __user *address = u64_to_user_ptr(extension);
struct drm_xe_ext_set_property ext;
@@ -774,20 +800,25 @@ static int exec_queue_user_ext_set_property(struct xe_device *xe,
if (!exec_queue_set_property_funcs[idx])
return -EINVAL;
+ *properties |= BIT_ULL(idx);
+ err = exec_queue_user_ext_check(q, *properties);
+ if (XE_IOCTL_DBG(xe, err))
+ return err;
+
return exec_queue_set_property_funcs[idx](xe, q, ext.value);
}
typedef int (*xe_exec_queue_user_extension_fn)(struct xe_device *xe,
struct xe_exec_queue *q,
- u64 extension);
+ u64 extension, u64 *properties);
static const xe_exec_queue_user_extension_fn exec_queue_user_extension_funcs[] = {
[DRM_XE_EXEC_QUEUE_EXTENSION_SET_PROPERTY] = exec_queue_user_ext_set_property,
};
#define MAX_USER_EXTENSIONS 16
-static int exec_queue_user_extensions(struct xe_device *xe, struct xe_exec_queue *q,
- u64 extensions, int ext_number)
+static int __exec_queue_user_extensions(struct xe_device *xe, struct xe_exec_queue *q,
+ u64 extensions, int ext_number, u64 *properties)
{
u64 __user *address = u64_to_user_ptr(extensions);
struct drm_xe_user_extension ext;
@@ -808,13 +839,30 @@ static int exec_queue_user_extensions(struct xe_device *xe, struct xe_exec_queue
idx = array_index_nospec(ext.name,
ARRAY_SIZE(exec_queue_user_extension_funcs));
- err = exec_queue_user_extension_funcs[idx](xe, q, extensions);
+ err = exec_queue_user_extension_funcs[idx](xe, q, extensions, properties);
if (XE_IOCTL_DBG(xe, err))
return err;
if (ext.next_extension)
- return exec_queue_user_extensions(xe, q, ext.next_extension,
- ++ext_number);
+ return __exec_queue_user_extensions(xe, q, ext.next_extension,
+ ++ext_number, properties);
+
+ return 0;
+}
+
+static int exec_queue_user_extensions(struct xe_device *xe, struct xe_exec_queue *q,
+ u64 extensions)
+{
+ u64 properties = 0;
+ int err;
+
+ err = __exec_queue_user_extensions(xe, q, extensions, 0, &properties);
+ if (XE_IOCTL_DBG(xe, err))
+ return err;
+
+ err = exec_queue_user_ext_check_final(q, properties);
+ if (XE_IOCTL_DBG(xe, err))
+ return err;
if (xe_exec_queue_is_multi_queue_primary(q)) {
err = xe_exec_queue_group_init(xe, q);
--
2.43.0
* [PATCH 06/16] drm/xe/multi_queue: Add exec_queue set_property ioctl support
From: Niranjana Vishwanathapura @ 2025-10-31 18:29 UTC
To: intel-xe
Add support for the exec_queue set_property ioctl. It is derived from
the original work in
https://patchwork.freedesktop.org/series/112188/
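A minimal usage sketch of the new ioctl (illustrative; fd and
queue_id are assumed to be set up):

	struct drm_xe_exec_queue_set_property sp = {
		.exec_queue_id = queue_id,
		.property = DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_QUEUE_PRIORITY,
		.value = 2, /* high priority within the group */
	};

	ioctl(fd, DRM_IOCTL_XE_EXEC_QUEUE_SET_PROPERTY, &sp);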
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Pallavi Mishra <pallavi.mishra@intel.com>
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
---
drivers/gpu/drm/xe/xe_device.c | 2 ++
drivers/gpu/drm/xe/xe_exec_queue.c | 31 ++++++++++++++++++++++++++++++
drivers/gpu/drm/xe/xe_exec_queue.h | 2 ++
include/uapi/drm/xe_drm.h | 24 +++++++++++++++++++++++
4 files changed, 59 insertions(+)
diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
index 47f5391ad8e9..0b496676527a 100644
--- a/drivers/gpu/drm/xe/xe_device.c
+++ b/drivers/gpu/drm/xe/xe_device.c
@@ -208,6 +208,8 @@ static const struct drm_ioctl_desc xe_ioctls[] = {
DRM_IOCTL_DEF_DRV(XE_MADVISE, xe_vm_madvise_ioctl, DRM_RENDER_ALLOW),
DRM_IOCTL_DEF_DRV(XE_VM_QUERY_MEM_RANGE_ATTRS, xe_vm_query_vmas_attrs_ioctl,
DRM_RENDER_ALLOW),
+ DRM_IOCTL_DEF_DRV(XE_EXEC_QUEUE_SET_PROPERTY, xe_exec_queue_set_property_ioctl,
+ DRM_RENDER_ALLOW),
};
static long xe_drm_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
index 78b3a0e2ddd3..0264cab00fd4 100644
--- a/drivers/gpu/drm/xe/xe_exec_queue.c
+++ b/drivers/gpu/drm/xe/xe_exec_queue.c
@@ -747,6 +747,37 @@ static const xe_exec_queue_set_property_fn exec_queue_set_property_funcs[] = {
exec_queue_set_multi_queue_priority,
};
+int xe_exec_queue_set_property_ioctl(struct drm_device *dev, void *data,
+ struct drm_file *file)
+{
+ struct xe_device *xe = to_xe_device(dev);
+ struct xe_file *xef = to_xe_file(file);
+ struct drm_xe_exec_queue_set_property *args = data;
+ struct xe_exec_queue *q;
+ int ret;
+ u32 idx;
+
+ if (XE_IOCTL_DBG(xe, args->reserved[0] || args->reserved[1]))
+ return -EINVAL;
+
+ q = xe_exec_queue_lookup(xef, args->exec_queue_id);
+ if (XE_IOCTL_DBG(xe, !q))
+ return -ENOENT;
+
+ idx = array_index_nospec(args->property,
+ ARRAY_SIZE(exec_queue_set_property_funcs));
+ ret = exec_queue_set_property_funcs[idx](xe, q, args->value);
+ if (XE_IOCTL_DBG(xe, ret))
+ goto err_post_lookup;
+
+ xe_exec_queue_put(q);
+ return 0;
+
+ err_post_lookup:
+ xe_exec_queue_put(q);
+ return ret;
+}
+
static int exec_queue_user_ext_check(struct xe_exec_queue *q, u64 properties)
{
u64 secondary_queue_valid_props = BIT_ULL(DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_GROUP) |
diff --git a/drivers/gpu/drm/xe/xe_exec_queue.h b/drivers/gpu/drm/xe/xe_exec_queue.h
index 8cd6487018fa..61478b2e883b 100644
--- a/drivers/gpu/drm/xe/xe_exec_queue.h
+++ b/drivers/gpu/drm/xe/xe_exec_queue.h
@@ -121,6 +121,8 @@ int xe_exec_queue_destroy_ioctl(struct drm_device *dev, void *data,
struct drm_file *file);
int xe_exec_queue_get_property_ioctl(struct drm_device *dev, void *data,
struct drm_file *file);
+int xe_exec_queue_set_property_ioctl(struct drm_device *dev, void *data,
+ struct drm_file *file);
enum xe_exec_queue_priority xe_exec_queue_device_get_max_priority(struct xe_device *xe);
void xe_exec_queue_last_fence_put(struct xe_exec_queue *e, struct xe_vm *vm);
diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
index 8ab44413646a..d72151163e77 100644
--- a/include/uapi/drm/xe_drm.h
+++ b/include/uapi/drm/xe_drm.h
@@ -106,6 +106,7 @@ extern "C" {
#define DRM_XE_OBSERVATION 0x0b
#define DRM_XE_MADVISE 0x0c
#define DRM_XE_VM_QUERY_MEM_RANGE_ATTRS 0x0d
+#define DRM_XE_EXEC_QUEUE_SET_PROPERTY 0x0e
/* Must be kept compact -- no holes */
@@ -123,6 +124,7 @@ extern "C" {
#define DRM_IOCTL_XE_OBSERVATION DRM_IOW(DRM_COMMAND_BASE + DRM_XE_OBSERVATION, struct drm_xe_observation_param)
#define DRM_IOCTL_XE_MADVISE DRM_IOW(DRM_COMMAND_BASE + DRM_XE_MADVISE, struct drm_xe_madvise)
#define DRM_IOCTL_XE_VM_QUERY_MEM_RANGE_ATTRS DRM_IOWR(DRM_COMMAND_BASE + DRM_XE_VM_QUERY_MEM_RANGE_ATTRS, struct drm_xe_vm_query_mem_range_attr)
+#define DRM_IOCTL_XE_EXEC_QUEUE_SET_PROPERTY DRM_IOW(DRM_COMMAND_BASE + DRM_XE_EXEC_QUEUE_SET_PROPERTY, struct drm_xe_exec_queue_set_property)
/**
* DOC: Xe IOCTL Extensions
@@ -2284,6 +2286,28 @@ struct drm_xe_vm_query_mem_range_attr {
};
+/**
+ * struct drm_xe_exec_queue_set_property - exec queue set property
+ *
+ * Sets execution queue properties dynamically.
+ */
+struct drm_xe_exec_queue_set_property {
+ /** @extensions: Pointer to the first extension struct, if any */
+ __u64 extensions;
+
+ /** @exec_queue_id: Exec queue ID */
+ __u32 exec_queue_id;
+
+ /** @property: property to set */
+ __u32 property;
+
+ /** @value: property value */
+ __u64 value;
+
+ /** @reserved: Reserved */
+ __u64 reserved[2];
+};
+
#if defined(__cplusplus)
}
#endif
--
2.43.0
^ permalink raw reply related [flat|nested] 61+ messages in thread
* [PATCH 07/16] drm/xe/multi_queue: Add support for multi queue dynamic priority change
2025-10-31 18:29 [PATCH 00/16] drm/xe: Multi Queue feature support Niranjana Vishwanathapura
` (5 preceding siblings ...)
2025-10-31 18:29 ` [PATCH 06/16] drm/xe/multi_queue: Add exec_queue set_property ioctl support Niranjana Vishwanathapura
@ 2025-10-31 18:29 ` Niranjana Vishwanathapura
2025-11-01 23:23 ` Matthew Brost
2025-11-01 23:41 ` Matthew Brost
2025-10-31 18:29 ` [PATCH 08/16] drm/xe/multi_queue: Add multi queue information to guc_info dump Niranjana Vishwanathapura
` (13 subsequent siblings)
20 siblings, 2 replies; 61+ messages in thread
From: Niranjana Vishwanathapura @ 2025-10-31 18:29 UTC (permalink / raw)
To: intel-xe
Support dynamic priority change for multi queue group queues via the
exec_queue set_property ioctl. Issue a CGP_SYNC command to the GuC
through the DRM scheduler message interface for the priority to take
effect.
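
With this, userspace can change a queue's priority within its group at
runtime, e.g. (a sketch; the DRM_XE_MULTI_QUEUE_PRIORITY_HIGH value
name is assumed from the uapi added earlier in this series):

  struct drm_xe_exec_queue_set_property args = {
          .exec_queue_id = queue_id,
          .property = DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_QUEUE_PRIORITY,
          .value = DRM_XE_MULTI_QUEUE_PRIORITY_HIGH, /* assumed uapi name */
  };

  ioctl(fd, DRM_IOCTL_XE_EXEC_QUEUE_SET_PROPERTY, &args);
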
Signed-off-by: Pallavi Mishra <pallavi.mishra@intel.com>
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
---
drivers/gpu/drm/xe/xe_exec_queue.c | 12 ++++-
drivers/gpu/drm/xe/xe_exec_queue_types.h | 3 ++
drivers/gpu/drm/xe/xe_guc_submit.c | 56 ++++++++++++++++++++++--
3 files changed, 65 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
index 0264cab00fd4..98f8f1c7f13b 100644
--- a/drivers/gpu/drm/xe/xe_exec_queue.c
+++ b/drivers/gpu/drm/xe/xe_exec_queue.c
@@ -729,9 +729,13 @@ static int exec_queue_set_multi_queue_priority(struct xe_device *xe, struct xe_e
if (XE_IOCTL_DBG(xe, value > XE_MULTI_QUEUE_PRIORITY_HIGH))
return -EINVAL;
- q->multi_queue.priority = value;
+ /* At queue creation time (!q->xef), just store the priority value */
+ if (!q->xef) {
+ q->multi_queue.priority = value;
+ return 0;
+ }
- return 0;
+ return q->ops->set_multi_queue_priority(q, value);
}
typedef int (*xe_exec_queue_set_property_fn)(struct xe_device *xe,
@@ -760,6 +764,10 @@ int xe_exec_queue_set_property_ioctl(struct drm_device *dev, void *data,
if (XE_IOCTL_DBG(xe, args->reserved[0] || args->reserved[1]))
return -EINVAL;
+ if (XE_IOCTL_DBG(xe, args->property !=
+ DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_QUEUE_PRIORITY))
+ return -EINVAL;
+
q = xe_exec_queue_lookup(xef, args->exec_queue_id);
if (XE_IOCTL_DBG(xe, !q))
return -ENOENT;
diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h b/drivers/gpu/drm/xe/xe_exec_queue_types.h
index 964a0e6654c7..dcb55b069ed8 100644
--- a/drivers/gpu/drm/xe/xe_exec_queue_types.h
+++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h
@@ -241,6 +241,9 @@ struct xe_exec_queue_ops {
int (*set_timeslice)(struct xe_exec_queue *q, u32 timeslice_us);
/** @set_preempt_timeout: Set preemption timeout for exec queue */
int (*set_preempt_timeout)(struct xe_exec_queue *q, u32 preempt_timeout_us);
+ /** @set_multi_queue_priority: Set multi queue priority */
+ int (*set_multi_queue_priority)(struct xe_exec_queue *q,
+ enum xe_multi_queue_priority priority);
/**
* @suspend: Suspend exec queue from executing, allowed to be called
* multiple times in a row before resume with the caveat that
diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
index 5ec144c1c2dc..426b64ef8d99 100644
--- a/drivers/gpu/drm/xe/xe_guc_submit.c
+++ b/drivers/gpu/drm/xe/xe_guc_submit.c
@@ -1761,10 +1761,32 @@ static void __guc_exec_queue_process_msg_resume(struct xe_sched_msg *msg)
}
}
-#define CLEANUP 1 /* Non-zero values to catch uninitialized msg */
-#define SET_SCHED_PROPS 2
-#define SUSPEND 3
-#define RESUME 4
+static void __guc_exec_queue_process_msg_set_multi_queue_priority(struct xe_sched_msg *msg)
+{
+ struct xe_exec_queue *q = msg->private_data;
+
+ if (guc_exec_queue_allowed_to_change_state(q)) {
+#define MAX_MULTI_QUEUE_REG_SIZE (2)
+ struct xe_guc *guc = exec_queue_to_guc(q);
+ struct xe_exec_queue_group *group = q->multi_queue.group;
+ u32 action[MAX_MULTI_QUEUE_REG_SIZE];
+ int len = 0;
+
+ action[len++] = XE_GUC_ACTION_MULTI_QUEUE_CONTEXT_CGP_SYNC;
+ action[len++] = group->primary->guc->id;
+#undef MAX_MULTI_QUEUE_REG_SIZE
+
+ xe_guc_exec_queue_group_cgp_sync(guc, q, action, len);
+ }
+
+ kfree(msg);
+}
+
+#define CLEANUP 1 /* Non-zero values to catch uninitialized msg */
+#define SET_SCHED_PROPS 2
+#define SUSPEND 3
+#define RESUME 4
+#define SET_MULTI_QUEUE_PRIORITY 5
#define OPCODE_MASK 0xf
#define MSG_LOCKED BIT(8)
#define MSG_HEAD BIT(9)
@@ -1788,6 +1810,9 @@ static void guc_exec_queue_process_msg(struct xe_sched_msg *msg)
case RESUME:
__guc_exec_queue_process_msg_resume(msg);
break;
+ case SET_MULTI_QUEUE_PRIORITY:
+ __guc_exec_queue_process_msg_set_multi_queue_priority(msg);
+ break;
default:
XE_WARN_ON("Unknown message type");
}
@@ -2004,6 +2029,28 @@ static int guc_exec_queue_set_preempt_timeout(struct xe_exec_queue *q,
return 0;
}
+static int guc_exec_queue_set_multi_queue_priority(struct xe_exec_queue *q,
+ enum xe_multi_queue_priority priority)
+{
+ struct xe_sched_msg *msg;
+
+ if (!xe_exec_queue_is_multi_queue(q))
+ return -EINVAL;
+
+ if (q->multi_queue.priority == priority ||
+ exec_queue_killed_or_banned_or_wedged(q))
+ return 0;
+
+ msg = kmalloc(sizeof(*msg), GFP_KERNEL);
+ if (!msg)
+ return -ENOMEM;
+
+ q->multi_queue.priority = priority;
+ guc_exec_queue_add_msg(q, msg, SET_MULTI_QUEUE_PRIORITY);
+
+ return 0;
+}
+
static int guc_exec_queue_suspend(struct xe_exec_queue *q)
{
struct xe_gpu_scheduler *sched = &q->guc->sched;
@@ -2095,6 +2142,7 @@ static const struct xe_exec_queue_ops guc_exec_queue_ops = {
.set_priority = guc_exec_queue_set_priority,
.set_timeslice = guc_exec_queue_set_timeslice,
.set_preempt_timeout = guc_exec_queue_set_preempt_timeout,
+ .set_multi_queue_priority = guc_exec_queue_set_multi_queue_priority,
.suspend = guc_exec_queue_suspend,
.suspend_wait = guc_exec_queue_suspend_wait,
.resume = guc_exec_queue_resume,
--
2.43.0
^ permalink raw reply related [flat|nested] 61+ messages in thread
* [PATCH 08/16] drm/xe/multi_queue: Add multi queue information to guc_info dump
2025-10-31 18:29 [PATCH 00/16] drm/xe: Multi Queue feature support Niranjana Vishwanathapura
` (6 preceding siblings ...)
2025-10-31 18:29 ` [PATCH 07/16] drm/xe/multi_queue: Add support for multi queue dynamic priority change Niranjana Vishwanathapura
@ 2025-10-31 18:29 ` Niranjana Vishwanathapura
2025-11-01 18:31 ` Matthew Brost
2025-10-31 18:29 ` [PATCH 09/16] drm/xe/multi_queue: Handle tearing down of a multi queue Niranjana Vishwanathapura
` (12 subsequent siblings)
20 siblings, 1 reply; 61+ messages in thread
From: Niranjana Vishwanathapura @ 2025-10-31 18:29 UTC (permalink / raw)
To: intel-xe
Dump multi-queue-specific information in the GuC exec queue
dump.
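
For a secondary queue, the guc_info dump gains two lines of the form
(illustrative values):

  Multi queue primary GuC ID: 12
  Multi queue position: 3
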
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
---
drivers/gpu/drm/xe/xe_guc_submit.c | 10 ++++++++++
drivers/gpu/drm/xe/xe_guc_submit_types.h | 13 +++++++++++++
2 files changed, 23 insertions(+)
diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
index 426b64ef8d99..b84a0be2eefe 100644
--- a/drivers/gpu/drm/xe/xe_guc_submit.c
+++ b/drivers/gpu/drm/xe/xe_guc_submit.c
@@ -3032,6 +3032,11 @@ xe_guc_exec_queue_snapshot_capture(struct xe_exec_queue *q)
if (snapshot->parallel_execution)
guc_exec_queue_wq_snapshot_capture(q, snapshot);
+ snapshot->is_multi_queue = xe_exec_queue_is_multi_queue(q);
+ if (snapshot->is_multi_queue) {
+ snapshot->multi_queue.primary = xe_exec_queue_multi_queue_primary(q)->guc->id;
+ snapshot->multi_queue.pos = q->multi_queue.pos;
+ }
spin_lock(&sched->base.job_list_lock);
snapshot->pending_list_size = list_count_nodes(&sched->base.pending_list);
snapshot->pending_list = kmalloc_array(snapshot->pending_list_size,
@@ -3114,6 +3119,11 @@ xe_guc_exec_queue_snapshot_print(struct xe_guc_submit_exec_queue_snapshot *snaps
if (snapshot->parallel_execution)
guc_exec_queue_wq_snapshot_print(snapshot, p);
+ if (snapshot->is_multi_queue) {
+ drm_printf(p, "\tMulti queue primary GuC ID: %d\n", snapshot->multi_queue.primary);
+ drm_printf(p, "\tMulti queue position: %d\n", snapshot->multi_queue.pos);
+ }
+
for (i = 0; snapshot->pending_list && i < snapshot->pending_list_size;
i++)
drm_printf(p, "\tJob: seqno=%d, fence=%d, finished=%d\n",
diff --git a/drivers/gpu/drm/xe/xe_guc_submit_types.h b/drivers/gpu/drm/xe/xe_guc_submit_types.h
index dc7456c34583..20dddf50d802 100644
--- a/drivers/gpu/drm/xe/xe_guc_submit_types.h
+++ b/drivers/gpu/drm/xe/xe_guc_submit_types.h
@@ -135,6 +135,19 @@ struct xe_guc_submit_exec_queue_snapshot {
u32 wq[WQ_SIZE / sizeof(u32)];
} parallel;
+ /** @is_multi_queue: The exec queue is part of a multi queue group */
+ bool is_multi_queue;
+ /** @multi_queue: snapshot of the multi queue information */
+ struct {
+ /**
+ * @multi_queue.primary: GuC id of the primary exec queue
+ * of the multi queue group.
+ */
+ u32 primary;
+ /** @multi_queue.pos: Position of the exec queue within the multi queue group */
+ u8 pos;
+ } multi_queue;
+
/** @pending_list_size: Size of the pending list snapshot array */
int pending_list_size;
/** @pending_list: snapshot of the pending list info */
--
2.43.0
^ permalink raw reply related [flat|nested] 61+ messages in thread
* [PATCH 09/16] drm/xe/multi_queue: Handle tearing down of a multi queue
2025-10-31 18:29 [PATCH 00/16] drm/xe: Multi Queue feature support Niranjana Vishwanathapura
` (7 preceding siblings ...)
2025-10-31 18:29 ` [PATCH 08/16] drm/xe/multi_queue: Add multi queue information to guc_info dump Niranjana Vishwanathapura
@ 2025-10-31 18:29 ` Niranjana Vishwanathapura
2025-11-02 0:39 ` Matthew Brost
2025-10-31 18:29 ` [PATCH 10/16] drm/xe/multi_queue: Set QUEUE_DRAIN_MODE for Multi Queue batches Niranjana Vishwanathapura
` (11 subsequent siblings)
20 siblings, 1 reply; 61+ messages in thread
From: Niranjana Vishwanathapura @ 2025-10-31 18:29 UTC (permalink / raw)
To: intel-xe
All queues of a multi queue group use the primary queue of the group
to interface with the GuC, so there is a dependency between the queues
of the group. Hence, when the primary queue of a multi queue group is
cleaned up, also trigger a cleanup of the secondary queues. During
cleanup, stop and restart submission for all queues of the multi queue
group to avoid any submission happening in parallel while a queue is
being cleaned up.
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
---
drivers/gpu/drm/xe/xe_exec_queue.c | 2 +
drivers/gpu/drm/xe/xe_exec_queue_types.h | 4 +
drivers/gpu/drm/xe/xe_guc_submit.c | 150 +++++++++++++++++++----
3 files changed, 134 insertions(+), 22 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
index 98f8f1c7f13b..3c1bb4f10fd5 100644
--- a/drivers/gpu/drm/xe/xe_exec_queue.c
+++ b/drivers/gpu/drm/xe/xe_exec_queue.c
@@ -85,6 +85,7 @@ static void xe_exec_queue_group_cleanup(struct xe_exec_queue *q)
xa_destroy(&group->xa);
mutex_destroy(&group->lock);
+ mutex_destroy(&group->list_lock);
xe_bo_unpin_map_no_vm(group->cgp_bo);
kfree(group);
}
@@ -605,6 +606,7 @@ static int xe_exec_queue_group_init(struct xe_device *xe, struct xe_exec_queue *
group->primary = q;
group->cgp_bo = bo;
+ INIT_LIST_HEAD(&group->list);
xa_init_flags(&group->xa, XA_FLAGS_ALLOC1);
mutex_init(&group->lock);
mutex_init(&group->list_lock);
diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h b/drivers/gpu/drm/xe/xe_exec_queue_types.h
index dcb55b069ed8..e64b6588923e 100644
--- a/drivers/gpu/drm/xe/xe_exec_queue_types.h
+++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h
@@ -51,6 +51,8 @@ struct xe_exec_queue_group {
struct xe_bo *cgp_bo;
/** @xa: xarray to store LRCs */
struct xarray xa;
+ /** @list: List of all secondary queues in the group */
+ struct list_head list;
/** @list_lock: Secondary queue list lock */
struct mutex list_lock;
/** @sync_pending: CGP_SYNC_DONE g2h response pending */
@@ -140,6 +142,8 @@ struct xe_exec_queue {
struct {
/** @multi_queue.group: Queue group information */
struct xe_exec_queue_group *group;
+ /** @multi_queue.link: Link into group's secondary queues list */
+ struct list_head link;
/** @multi_queue.priority: Queue priority within the multi-queue group */
enum xe_multi_queue_priority priority;
/** @multi_queue.pos: Position of queue within the multi-queue group */
diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
index b84a0be2eefe..87c13feb2cef 100644
--- a/drivers/gpu/drm/xe/xe_guc_submit.c
+++ b/drivers/gpu/drm/xe/xe_guc_submit.c
@@ -920,6 +920,81 @@ static void wq_item_append(struct xe_exec_queue *q)
parallel_write(xe, map, wq_desc.tail, q->guc->wqi_tail);
}
+static void xe_guc_exec_queue_submission_start(struct xe_exec_queue *q)
+{
+ /*
+ * If the exec queue is part of a multi queue group, then start submission
+ * on all queues of the multi queue group.
+ */
+ if (xe_exec_queue_is_multi_queue(q)) {
+ struct xe_exec_queue *primary = xe_exec_queue_multi_queue_primary(q);
+ struct xe_exec_queue_group *group = q->multi_queue.group;
+ struct xe_exec_queue *eq;
+
+ xe_sched_submission_start(&primary->guc->sched);
+
+ mutex_lock(&group->list_lock);
+ list_for_each_entry(eq, &group->list, multi_queue.link)
+ xe_sched_submission_start(&eq->guc->sched);
+ mutex_unlock(&group->list_lock);
+ } else {
+ xe_sched_submission_start(&q->guc->sched);
+ }
+}
+
+static void xe_guc_exec_queue_submission_stop(struct xe_exec_queue *q)
+{
+ /*
+ * If the exec queue is part of a multi queue group, then stop submission
+ * on all queues of the multi queue group.
+ */
+ if (xe_exec_queue_is_multi_queue(q)) {
+ struct xe_exec_queue *primary = xe_exec_queue_multi_queue_primary(q);
+ struct xe_exec_queue_group *group = q->multi_queue.group;
+ struct xe_exec_queue *eq;
+
+ xe_sched_submission_stop(&primary->guc->sched);
+
+ mutex_lock(&group->list_lock);
+ list_for_each_entry(eq, &group->list, multi_queue.link)
+ xe_sched_submission_stop(&eq->guc->sched);
+ mutex_unlock(&group->list_lock);
+ } else {
+ xe_sched_submission_stop(&q->guc->sched);
+ }
+}
+
+static void xe_guc_exec_queue_trigger_cleanup(struct xe_exec_queue *q)
+{
+ struct xe_guc *guc = exec_queue_to_guc(q);
+ struct xe_device *xe = guc_to_xe(guc);
+
+ /** to wakeup xe_wait_user_fence ioctl if exec queue is reset */
+ wake_up_all(&xe->ufence_wq);
+
+ if (xe_exec_queue_is_lr(q))
+ queue_work(guc_to_gt(guc)->ordered_wq, &q->guc->lr_tdr);
+ else
+ xe_sched_tdr_queue_imm(&q->guc->sched);
+}
+
+static void xe_guc_exec_queue_trigger_secondary_cleanup(struct xe_exec_queue *q)
+{
+ struct xe_exec_queue *primary = xe_exec_queue_multi_queue_primary(q);
+ struct xe_exec_queue_group *group = q->multi_queue.group;
+ struct xe_exec_queue *eq;
+
+ mutex_lock(&group->list_lock);
+ list_for_each_entry(eq, &group->list, multi_queue.link) {
+ if (exec_queue_reset(primary))
+ set_exec_queue_reset(eq);
+
+ if (!exec_queue_banned(eq))
+ xe_guc_exec_queue_trigger_cleanup(eq);
+ }
+ mutex_unlock(&group->list_lock);
+}
+
#define RESUME_PENDING ~0x0ull
static void submit_exec_queue(struct xe_exec_queue *q, struct xe_sched_job *job)
{
@@ -1098,20 +1173,6 @@ static void disable_scheduling_deregister(struct xe_guc *guc,
G2H_LEN_DW_DEREGISTER_CONTEXT, 2);
}
-static void xe_guc_exec_queue_trigger_cleanup(struct xe_exec_queue *q)
-{
- struct xe_guc *guc = exec_queue_to_guc(q);
- struct xe_device *xe = guc_to_xe(guc);
-
- /** to wakeup xe_wait_user_fence ioctl if exec queue is reset */
- wake_up_all(&xe->ufence_wq);
-
- if (xe_exec_queue_is_lr(q))
- queue_work(guc_to_gt(guc)->ordered_wq, &q->guc->lr_tdr);
- else
- xe_sched_tdr_queue_imm(&q->guc->sched);
-}
-
/**
* xe_guc_submit_wedge() - Wedge GuC submission
* @guc: the GuC object
@@ -1185,8 +1246,12 @@ static void xe_guc_exec_queue_lr_cleanup(struct work_struct *w)
if (!exec_queue_killed(q))
wedged = guc_submit_hint_wedged(exec_queue_to_guc(q));
- /* Kill the run_job / process_msg entry points */
- xe_sched_submission_stop(sched);
+ /*
+ * Kill the run_job / process_msg entry points.
+ * As this function is serialized across exec queues, it is safe to
+ * stop and restart submission on all queues of a multi queue group.
+ */
+ xe_guc_exec_queue_submission_stop(q);
/*
* Engine state now mostly stable, disable scheduling / deregister if
@@ -1222,7 +1287,7 @@ static void xe_guc_exec_queue_lr_cleanup(struct work_struct *w)
q->guc->id);
xe_devcoredump(q, NULL, "Schedule disable failed to respond, guc_id=%d\n",
q->guc->id);
- xe_sched_submission_start(sched);
+ xe_guc_exec_queue_submission_start(q);
xe_gt_reset_async(q->gt);
return;
}
@@ -1233,7 +1298,11 @@ static void xe_guc_exec_queue_lr_cleanup(struct work_struct *w)
xe_hw_fence_irq_stop(q->fence_irq);
- xe_sched_submission_start(sched);
+ xe_guc_exec_queue_submission_start(q);
+
+ /* Trigger cleanup of secondary queues of multi queue group */
+ if (xe_exec_queue_is_multi_queue_primary(q))
+ xe_guc_exec_queue_trigger_secondary_cleanup(q);
spin_lock(&sched->base.job_list_lock);
list_for_each_entry(job, &sched->base.pending_list, drm.list)
@@ -1392,8 +1461,12 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
vf_recovery(guc))
return DRM_GPU_SCHED_STAT_NO_HANG;
- /* Kill the run_job entry point */
- xe_sched_submission_stop(sched);
+ /*
+ * Kill the run_job entry point.
+ * As this function is serialized across exec queues, it is safe to
+ * stop and restart submission on all queues of a multi queue group.
+ */
+ xe_guc_exec_queue_submission_stop(q);
/* Must check all state after stopping scheduler */
skip_timeout_check = exec_queue_reset(q) ||
@@ -1552,7 +1625,7 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
* fences that are complete
*/
xe_sched_add_pending_job(sched, job);
- xe_sched_submission_start(sched);
+ xe_guc_exec_queue_submission_start(q);
xe_guc_exec_queue_trigger_cleanup(q);
@@ -1565,6 +1638,10 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
/* Start fence signaling */
xe_hw_fence_irq_start(q->fence_irq);
+ /* Trigger cleanup of secondary queues of multi queue group */
+ if (xe_exec_queue_is_multi_queue_primary(q))
+ xe_guc_exec_queue_trigger_secondary_cleanup(q);
+
return DRM_GPU_SCHED_STAT_RESET;
sched_enable:
@@ -1576,7 +1653,11 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
* but there is not currently an easy way to do in DRM scheduler. With
* some thought, do this in a follow up.
*/
- xe_sched_submission_start(sched);
+ xe_guc_exec_queue_submission_start(q);
+
+ /* Trigger cleanup of secondary queues of multi queue group */
+ if (xe_exec_queue_is_multi_queue_primary(q))
+ xe_guc_exec_queue_trigger_secondary_cleanup(q);
handle_vf_resume:
return DRM_GPU_SCHED_STAT_NO_HANG;
}
@@ -1607,6 +1688,14 @@ static void __guc_exec_queue_destroy_async(struct work_struct *w)
xe_pm_runtime_get(guc_to_xe(guc));
trace_xe_exec_queue_destroy(q);
+ if (xe_exec_queue_is_multi_queue_secondary(q)) {
+ struct xe_exec_queue_group *group = q->multi_queue.group;
+
+ mutex_lock(&group->list_lock);
+ list_del(&q->multi_queue.link);
+ mutex_unlock(&group->list_lock);
+ }
+
if (xe_exec_queue_is_lr(q))
cancel_work_sync(&ge->lr_tdr);
/* Confirm no work left behind accessing device structures */
@@ -1897,6 +1986,19 @@ static int guc_exec_queue_init(struct xe_exec_queue *q)
xe_exec_queue_assign_name(q, q->guc->id);
+ /*
+ * Maintain secondary queues of the multi queue group in a list
+ * for handling dependencies across the queues in the group.
+ */
+ if (xe_exec_queue_is_multi_queue_secondary(q)) {
+ struct xe_exec_queue_group *group = q->multi_queue.group;
+
+ INIT_LIST_HEAD(&q->multi_queue.link);
+ mutex_lock(&group->list_lock);
+ list_add_tail(&q->multi_queue.link, &group->list);
+ mutex_unlock(&group->list_lock);
+ }
+
trace_xe_exec_queue_create(q);
return 0;
@@ -2125,6 +2227,10 @@ static void guc_exec_queue_resume(struct xe_exec_queue *q)
static bool guc_exec_queue_reset_status(struct xe_exec_queue *q)
{
+ if (xe_exec_queue_is_multi_queue_secondary(q) &&
+ guc_exec_queue_reset_status(xe_exec_queue_multi_queue_primary(q)))
+ return true;
+
return exec_queue_reset(q) || exec_queue_killed_or_banned_or_wedged(q);
}
--
2.43.0
^ permalink raw reply related [flat|nested] 61+ messages in thread
* [PATCH 10/16] drm/xe/multi_queue: Set QUEUE_DRAIN_MODE for Multi Queue batches
2025-10-31 18:29 [PATCH 00/16] drm/xe: Multi Queue feature support Niranjana Vishwanathapura
` (8 preceding siblings ...)
2025-10-31 18:29 ` [PATCH 09/16] drm/xe/multi_queue: Handle tearing down of a multi queue Niranjana Vishwanathapura
@ 2025-10-31 18:29 ` Niranjana Vishwanathapura
2025-11-02 18:22 ` Matthew Brost
2025-10-31 18:29 ` [PATCH 11/16] drm/xe/multi_queue: Handle CGP context error Niranjana Vishwanathapura
` (10 subsequent siblings)
20 siblings, 1 reply; 61+ messages in thread
From: Niranjana Vishwanathapura @ 2025-10-31 18:29 UTC (permalink / raw)
To: intel-xe
To properly support soft light restore between batches
being arbitrated at the CFEG, PIPE_CONTROL instructions
have a new bit in the first DW, QUEUE_DRAIN_MODE. When
set, this bit indicates to the CFEG that it should only
drain the current queue.
Additionally, we no longer want to set the CS_STALL bit
for queues in a multi queue group, as it causes the entire
pipeline to stall waiting for completion of the prior
batch, preventing this soft light restore from occurring
between queues in a queue group.
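
Condensed, the change at every PIPE_CONTROL emission site in this
patch is the same flag selection:

  /* Multi queue: only drain this queue; otherwise stall the pipeline. */
  if (xe_exec_queue_is_multi_queue(q))
          flags0 |= PIPE_CONTROL0_QUEUE_DRAIN_MODE;       /* DW0 */
  else
          flags1 |= PIPE_CONTROL_CS_STALL;                /* DW1 */
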
Signed-off-by: Stuart Summers <stuart.summers@intel.com>
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
---
.../gpu/drm/xe/instructions/xe_gpu_commands.h | 1 +
drivers/gpu/drm/xe/xe_ring_ops.c | 68 ++++++++++++-------
2 files changed, 45 insertions(+), 24 deletions(-)
diff --git a/drivers/gpu/drm/xe/instructions/xe_gpu_commands.h b/drivers/gpu/drm/xe/instructions/xe_gpu_commands.h
index 5d41ca297447..885fcf211e6d 100644
--- a/drivers/gpu/drm/xe/instructions/xe_gpu_commands.h
+++ b/drivers/gpu/drm/xe/instructions/xe_gpu_commands.h
@@ -47,6 +47,7 @@
#define GFX_OP_PIPE_CONTROL(len) ((0x3<<29)|(0x3<<27)|(0x2<<24)|((len)-2))
+#define PIPE_CONTROL0_QUEUE_DRAIN_MODE BIT(12)
#define PIPE_CONTROL0_L3_READ_ONLY_CACHE_INVALIDATE BIT(10) /* gen12 */
#define PIPE_CONTROL0_HDC_PIPELINE_FLUSH BIT(9) /* gen12 */
diff --git a/drivers/gpu/drm/xe/xe_ring_ops.c b/drivers/gpu/drm/xe/xe_ring_ops.c
index ac0c6dcffe15..71f0e19fe8ba 100644
--- a/drivers/gpu/drm/xe/xe_ring_ops.c
+++ b/drivers/gpu/drm/xe/xe_ring_ops.c
@@ -12,7 +12,7 @@
#include "regs/xe_engine_regs.h"
#include "regs/xe_gt_regs.h"
#include "regs/xe_lrc_layout.h"
-#include "xe_exec_queue_types.h"
+#include "xe_exec_queue.h"
#include "xe_gt.h"
#include "xe_lrc.h"
#include "xe_macros.h"
@@ -135,12 +135,11 @@ emit_pipe_control(u32 *dw, int i, u32 bit_group_0, u32 bit_group_1, u32 offset,
return i;
}
-static int emit_pipe_invalidate(u32 mask_flags, bool invalidate_tlb, u32 *dw,
- int i)
+static int emit_pipe_invalidate(struct xe_exec_queue *q, u32 mask_flags,
+ bool invalidate_tlb, u32 *dw, int i)
{
u32 flags0 = 0;
- u32 flags1 = PIPE_CONTROL_CS_STALL |
- PIPE_CONTROL_COMMAND_CACHE_INVALIDATE |
+ u32 flags1 = PIPE_CONTROL_COMMAND_CACHE_INVALIDATE |
PIPE_CONTROL_INSTRUCTION_CACHE_INVALIDATE |
PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE |
PIPE_CONTROL_VF_CACHE_INVALIDATE |
@@ -152,6 +151,11 @@ static int emit_pipe_invalidate(u32 mask_flags, bool invalidate_tlb, u32 *dw,
if (invalidate_tlb)
flags1 |= PIPE_CONTROL_TLB_INVALIDATE;
+ if (xe_exec_queue_is_multi_queue(q))
+ flags0 |= PIPE_CONTROL0_QUEUE_DRAIN_MODE;
+ else
+ flags1 |= PIPE_CONTROL_CS_STALL;
+
flags1 &= ~mask_flags;
if (flags1 & PIPE_CONTROL_VF_CACHE_INVALIDATE)
@@ -175,54 +179,70 @@ static int emit_store_imm_ppgtt_posted(u64 addr, u64 value,
static int emit_render_cache_flush(struct xe_sched_job *job, u32 *dw, int i)
{
- struct xe_gt *gt = job->q->gt;
+ struct xe_exec_queue *q = job->q;
+ struct xe_gt *gt = q->gt;
bool lacks_render = !(gt->info.engine_mask & XE_HW_ENGINE_RCS_MASK);
- u32 flags;
+ u32 flags0, flags1;
if (XE_GT_WA(gt, 14016712196))
i = emit_pipe_control(dw, i, 0, PIPE_CONTROL_DEPTH_CACHE_FLUSH,
LRC_PPHWSP_FLUSH_INVAL_SCRATCH_ADDR, 0);
- flags = (PIPE_CONTROL_CS_STALL |
- PIPE_CONTROL_TILE_CACHE_FLUSH |
+ flags0 = PIPE_CONTROL0_HDC_PIPELINE_FLUSH;
+ flags1 = (PIPE_CONTROL_TILE_CACHE_FLUSH |
PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH |
PIPE_CONTROL_DEPTH_CACHE_FLUSH |
PIPE_CONTROL_DC_FLUSH_ENABLE |
PIPE_CONTROL_FLUSH_ENABLE);
if (XE_GT_WA(gt, 1409600907))
- flags |= PIPE_CONTROL_DEPTH_STALL;
+ flags1 |= PIPE_CONTROL_DEPTH_STALL;
if (lacks_render)
- flags &= ~PIPE_CONTROL_3D_ARCH_FLAGS;
+ flags1 &= ~PIPE_CONTROL_3D_ARCH_FLAGS;
else if (job->q->class == XE_ENGINE_CLASS_COMPUTE)
- flags &= ~PIPE_CONTROL_3D_ENGINE_FLAGS;
+ flags1 &= ~PIPE_CONTROL_3D_ENGINE_FLAGS;
+
+ if (xe_exec_queue_is_multi_queue(q))
+ flags0 |= PIPE_CONTROL0_QUEUE_DRAIN_MODE;
+ else
+ flags1 |= PIPE_CONTROL_CS_STALL;
- return emit_pipe_control(dw, i, PIPE_CONTROL0_HDC_PIPELINE_FLUSH, flags, 0, 0);
+ return emit_pipe_control(dw, i, flags0, flags1, 0, 0);
}
-static int emit_pipe_control_to_ring_end(struct xe_hw_engine *hwe, u32 *dw, int i)
+static int emit_pipe_control_to_ring_end(struct xe_exec_queue *q, u32 *dw, int i)
{
+ u32 flags0 = 0, flags1 = PIPE_CONTROL_LRI_POST_SYNC;
+ struct xe_hw_engine *hwe = q->hwe;
+
if (hwe->class != XE_ENGINE_CLASS_RENDER)
return i;
+ if (xe_exec_queue_is_multi_queue(q))
+ flags0 |= PIPE_CONTROL0_QUEUE_DRAIN_MODE;
+
if (XE_GT_WA(hwe->gt, 16020292621))
- i = emit_pipe_control(dw, i, 0, PIPE_CONTROL_LRI_POST_SYNC,
+ i = emit_pipe_control(dw, i, flags0, flags1,
RING_NOPID(hwe->mmio_base).addr, 0);
return i;
}
-static int emit_pipe_imm_ggtt(u32 addr, u32 value, bool stall_only, u32 *dw,
- int i)
+static int emit_pipe_imm_ggtt(struct xe_exec_queue *q, u32 addr, u32 value,
+ bool stall_only, u32 *dw, int i)
{
- u32 flags = PIPE_CONTROL_CS_STALL | PIPE_CONTROL_GLOBAL_GTT_IVB |
- PIPE_CONTROL_QW_WRITE;
+ u32 flags0 = 0, flags1 = PIPE_CONTROL_GLOBAL_GTT_IVB | PIPE_CONTROL_QW_WRITE;
if (!stall_only)
- flags |= PIPE_CONTROL_FLUSH_ENABLE;
+ flags1 |= PIPE_CONTROL_FLUSH_ENABLE;
+
+ if (xe_exec_queue_is_multi_queue(q))
+ flags0 |= PIPE_CONTROL0_QUEUE_DRAIN_MODE;
+ else
+ flags1 |= PIPE_CONTROL_CS_STALL;
- return emit_pipe_control(dw, i, 0, flags, addr, value);
+ return emit_pipe_control(dw, i, flags0, flags1, addr, value);
}
static u32 get_ppgtt_flag(struct xe_sched_job *job)
@@ -371,7 +391,7 @@ static void __emit_job_gen12_render_compute(struct xe_sched_job *job,
mask_flags = PIPE_CONTROL_3D_ENGINE_FLAGS;
/* See __xe_pt_bind_vma() for a discussion on TLB invalidations. */
- i = emit_pipe_invalidate(mask_flags, job->ring_ops_flush_tlb, dw, i);
+ i = emit_pipe_invalidate(job->q, mask_flags, job->ring_ops_flush_tlb, dw, i);
/* hsdes: 1809175790 */
if (has_aux_ccs(xe))
@@ -391,11 +411,11 @@ static void __emit_job_gen12_render_compute(struct xe_sched_job *job,
job->user_fence.value,
dw, i);
- i = emit_pipe_imm_ggtt(xe_lrc_seqno_ggtt_addr(lrc), seqno, lacks_render, dw, i);
+ i = emit_pipe_imm_ggtt(job->q, xe_lrc_seqno_ggtt_addr(lrc), seqno, lacks_render, dw, i);
i = emit_user_interrupt(dw, i);
- i = emit_pipe_control_to_ring_end(job->q->hwe, dw, i);
+ i = emit_pipe_control_to_ring_end(job->q, dw, i);
xe_gt_assert(gt, i <= MAX_JOB_SIZE_DW);
--
2.43.0
^ permalink raw reply related [flat|nested] 61+ messages in thread
* [PATCH 11/16] drm/xe/multi_queue: Handle CGP context error
2025-10-31 18:29 [PATCH 00/16] drm/xe: Multi Queue feature support Niranjana Vishwanathapura
` (9 preceding siblings ...)
2025-10-31 18:29 ` [PATCH 10/16] drm/xe/multi_queue: Set QUEUE_DRAIN_MODE for Multi Queue batches Niranjana Vishwanathapura
@ 2025-10-31 18:29 ` Niranjana Vishwanathapura
2025-11-02 18:29 ` Matthew Brost
2025-10-31 18:29 ` [PATCH 12/16] drm/xe/multi_queue: Tracepoint support Niranjana Vishwanathapura
` (9 subsequent siblings)
20 siblings, 1 reply; 61+ messages in thread
From: Niranjana Vishwanathapura @ 2025-10-31 18:29 UTC (permalink / raw)
To: intel-xe
Trigger multi-queue context cleanup upon CGP context error
notification from GuC.
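
For reference, the G2H payload consumed by the new handler is six
dwords (field meanings as decoded by the handler; the LRCA is printed
as msg[4]:msg[3]):

  /*
   * XE_GUC_ACTION_NOTIFY_MULTI_QUEUE_CGP_CONTEXT_ERROR payload:
   *   msg[0] bit 0: error region (1 = uc, 0 = kmd)
   *   msg[1]: error code
   *   msg[2]: context (GuC id used to look up the exec queue)
   *   msg[3], msg[4]: LRCA dwords
   *   msg[5]: SgId
   */
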
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
---
drivers/gpu/drm/xe/abi/guc_actions_abi.h | 1 +
drivers/gpu/drm/xe/xe_guc_ct.c | 4 +++
drivers/gpu/drm/xe/xe_guc_submit.c | 33 ++++++++++++++++++++++++
drivers/gpu/drm/xe/xe_guc_submit.h | 2 ++
drivers/gpu/drm/xe/xe_trace.h | 5 ++++
5 files changed, 45 insertions(+)
diff --git a/drivers/gpu/drm/xe/abi/guc_actions_abi.h b/drivers/gpu/drm/xe/abi/guc_actions_abi.h
index 3e9fbed9cda6..8af3691626bf 100644
--- a/drivers/gpu/drm/xe/abi/guc_actions_abi.h
+++ b/drivers/gpu/drm/xe/abi/guc_actions_abi.h
@@ -142,6 +142,7 @@ enum xe_guc_action {
XE_GUC_ACTION_REGISTER_CONTEXT_MULTI_QUEUE = 0x4602,
XE_GUC_ACTION_MULTI_QUEUE_CONTEXT_CGP_SYNC = 0x4603,
XE_GUC_ACTION_NOTIFY_MULTI_QUEUE_CONTEXT_CGP_SYNC_DONE = 0x4604,
+ XE_GUC_ACTION_NOTIFY_MULTI_QUEUE_CGP_CONTEXT_ERROR = 0x4605,
XE_GUC_ACTION_CLIENT_SOFT_RESET = 0x5507,
XE_GUC_ACTION_SET_ENG_UTIL_BUFF = 0x550A,
XE_GUC_ACTION_SET_DEVICE_ENGINE_ACTIVITY_BUFFER = 0x550C,
diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c
index 48b5006eb080..d0e19af0b4d2 100644
--- a/drivers/gpu/drm/xe/xe_guc_ct.c
+++ b/drivers/gpu/drm/xe/xe_guc_ct.c
@@ -1574,6 +1574,10 @@ static int process_g2h_msg(struct xe_guc_ct *ct, u32 *msg, u32 len)
case XE_GUC_ACTION_NOTIFY_MULTI_QUEUE_CONTEXT_CGP_SYNC_DONE:
ret = xe_guc_exec_queue_cgp_sync_done_handler(guc, payload, adj_len);
break;
+ case XE_GUC_ACTION_NOTIFY_MULTI_QUEUE_CGP_CONTEXT_ERROR:
+ ret = xe_guc_exec_queue_cgp_context_error_handler(guc, payload,
+ adj_len);
+ break;
default:
xe_gt_err(gt, "unexpected G2H action 0x%04x\n", action);
}
diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
index 87c13feb2cef..605352145d76 100644
--- a/drivers/gpu/drm/xe/xe_guc_submit.c
+++ b/drivers/gpu/drm/xe/xe_guc_submit.c
@@ -48,6 +48,8 @@
#include "xe_vm.h"
#include "xe_bo.h"
+#define XE_GUC_EXEC_QUEUE_CGP_CONTEXT_ERROR_LEN 6
+
static struct xe_guc *
exec_queue_to_guc(struct xe_exec_queue *q)
{
@@ -3001,6 +3003,37 @@ int xe_guc_exec_queue_reset_failure_handler(struct xe_guc *guc, u32 *msg, u32 le
return 0;
}
+int xe_guc_exec_queue_cgp_context_error_handler(struct xe_guc *guc, u32 *msg,
+ u32 len)
+{
+ struct xe_gt *gt = guc_to_gt(guc);
+ struct xe_device *xe = guc_to_xe(guc);
+ struct xe_exec_queue *q;
+ u32 guc_id = msg[2];
+
+ if (unlikely(len != XE_GUC_EXEC_QUEUE_CGP_CONTEXT_ERROR_LEN)) {
+ drm_err(&xe->drm, "Invalid length %u", len);
+ return -EPROTO;
+ }
+
+ q = g2h_exec_queue_lookup(guc, guc_id);
+ if (unlikely(!q))
+ return -EPROTO;
+
+ xe_gt_dbg(gt,
+ "CGP context error: region=%s err=0x%x, context=0x%x LRCA=0x%x:0x%x SgId=0x%x",
+ msg[0] & 1 ? "uc" : "kmd", msg[1], msg[2], msg[4], msg[3], msg[5]);
+
+ trace_xe_exec_queue_cgp_context_error(q);
+
+ /* Treat the same as engine reset */
+ set_exec_queue_reset(q);
+ if (!exec_queue_banned(q) && !exec_queue_check_timeout(q))
+ xe_guc_exec_queue_trigger_cleanup(q);
+
+ return 0;
+}
+
/**
* xe_guc_exec_queue_cgp_sync_done_handler - CGP synchronization done handler
* @guc: guc
diff --git a/drivers/gpu/drm/xe/xe_guc_submit.h b/drivers/gpu/drm/xe/xe_guc_submit.h
index abfa94bce391..01b013a90b1b 100644
--- a/drivers/gpu/drm/xe/xe_guc_submit.h
+++ b/drivers/gpu/drm/xe/xe_guc_submit.h
@@ -35,6 +35,8 @@ int xe_guc_exec_queue_memory_cat_error_handler(struct xe_guc *guc, u32 *msg,
int xe_guc_exec_queue_reset_failure_handler(struct xe_guc *guc, u32 *msg, u32 len);
int xe_guc_error_capture_handler(struct xe_guc *guc, u32 *msg, u32 len);
int xe_guc_exec_queue_cgp_sync_done_handler(struct xe_guc *guc, u32 *msg, u32 len);
+int xe_guc_exec_queue_cgp_context_error_handler(struct xe_guc *guc, u32 *msg,
+ u32 len);
struct xe_guc_submit_exec_queue_snapshot *
xe_guc_exec_queue_snapshot_capture(struct xe_exec_queue *q);
diff --git a/drivers/gpu/drm/xe/xe_trace.h b/drivers/gpu/drm/xe/xe_trace.h
index 79a97b086cb2..c9d0748dae9d 100644
--- a/drivers/gpu/drm/xe/xe_trace.h
+++ b/drivers/gpu/drm/xe/xe_trace.h
@@ -172,6 +172,11 @@ DEFINE_EVENT(xe_exec_queue, xe_exec_queue_memory_cat_error,
TP_ARGS(q)
);
+DEFINE_EVENT(xe_exec_queue, xe_exec_queue_cgp_context_error,
+ TP_PROTO(struct xe_exec_queue *q),
+ TP_ARGS(q)
+);
+
DEFINE_EVENT(xe_exec_queue, xe_exec_queue_stop,
TP_PROTO(struct xe_exec_queue *q),
TP_ARGS(q)
--
2.43.0
^ permalink raw reply related [flat|nested] 61+ messages in thread
* [PATCH 12/16] drm/xe/multi_queue: Tracepoint support
2025-10-31 18:29 [PATCH 00/16] drm/xe: Multi Queue feature support Niranjana Vishwanathapura
` (10 preceding siblings ...)
2025-10-31 18:29 ` [PATCH 11/16] drm/xe/multi_queue: Handle CGP context error Niranjana Vishwanathapura
@ 2025-10-31 18:29 ` Niranjana Vishwanathapura
2025-11-01 18:32 ` Matthew Brost
2025-10-31 18:29 ` [PATCH 13/16] drm/xe/multi_queue: Support active group after primary is destroyed Niranjana Vishwanathapura
` (8 subsequent siblings)
20 siblings, 1 reply; 61+ messages in thread
From: Niranjana Vishwanathapura @ 2025-10-31 18:29 UTC (permalink / raw)
To: intel-xe
Add xe_exec_queue_create_multi_queue event with
multi-queue information.
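
The new event adds the primary queue's GuC id to the usual exec queue
fields, e.g. (illustrative values):

  xe_exec_queue_create_multi_queue: dev=0000:03:00.0, 4:0x1, gt=0, width=1 guc_id=13, guc_state=0x0, flags=0x0, primary=12
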
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
---
drivers/gpu/drm/xe/xe_guc_submit.c | 5 +++-
drivers/gpu/drm/xe/xe_trace.h | 41 ++++++++++++++++++++++++++++++
2 files changed, 45 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
index 605352145d76..bc7296edf1be 100644
--- a/drivers/gpu/drm/xe/xe_guc_submit.c
+++ b/drivers/gpu/drm/xe/xe_guc_submit.c
@@ -2001,7 +2001,10 @@ static int guc_exec_queue_init(struct xe_exec_queue *q)
mutex_unlock(&group->list_lock);
}
- trace_xe_exec_queue_create(q);
+ if (xe_exec_queue_is_multi_queue(q))
+ trace_xe_exec_queue_create_multi_queue(q);
+ else
+ trace_xe_exec_queue_create(q);
return 0;
diff --git a/drivers/gpu/drm/xe/xe_trace.h b/drivers/gpu/drm/xe/xe_trace.h
index c9d0748dae9d..6d12fcc13f43 100644
--- a/drivers/gpu/drm/xe/xe_trace.h
+++ b/drivers/gpu/drm/xe/xe_trace.h
@@ -13,6 +13,7 @@
#include <linux/types.h>
#include "xe_exec_queue_types.h"
+#include "xe_exec_queue.h"
#include "xe_gpu_scheduler_types.h"
#include "xe_gt_types.h"
#include "xe_guc_exec_queue_types.h"
@@ -97,11 +98,51 @@ DECLARE_EVENT_CLASS(xe_exec_queue,
__entry->guc_state, __entry->flags)
);
+DECLARE_EVENT_CLASS(xe_exec_queue_multi_queue,
+ TP_PROTO(struct xe_exec_queue *q),
+ TP_ARGS(q),
+
+ TP_STRUCT__entry(
+ __string(dev, __dev_name_eq(q))
+ __field(enum xe_engine_class, class)
+ __field(u32, logical_mask)
+ __field(u8, gt_id)
+ __field(u16, width)
+ __field(u32, guc_id)
+ __field(u32, guc_state)
+ __field(u32, flags)
+ __field(u32, primary)
+ ),
+
+ TP_fast_assign(
+ __assign_str(dev);
+ __entry->class = q->class;
+ __entry->logical_mask = q->logical_mask;
+ __entry->gt_id = q->gt->info.id;
+ __entry->width = q->width;
+ __entry->guc_id = q->guc->id;
+ __entry->guc_state = atomic_read(&q->guc->state);
+ __entry->flags = q->flags;
+ __entry->primary = xe_exec_queue_multi_queue_primary(q)->guc->id;
+ ),
+
+ TP_printk("dev=%s, %d:0x%x, gt=%d, width=%d guc_id=%d, guc_state=0x%x, flags=0x%x, primary=%d",
+ __get_str(dev), __entry->class, __entry->logical_mask,
+ __entry->gt_id, __entry->width, __entry->guc_id,
+ __entry->guc_state, __entry->flags,
+ __entry->primary)
+);
+
DEFINE_EVENT(xe_exec_queue, xe_exec_queue_create,
TP_PROTO(struct xe_exec_queue *q),
TP_ARGS(q)
);
+DEFINE_EVENT(xe_exec_queue_multi_queue, xe_exec_queue_create_multi_queue,
+ TP_PROTO(struct xe_exec_queue *q),
+ TP_ARGS(q)
+);
+
DEFINE_EVENT(xe_exec_queue, xe_exec_queue_supress_resume,
TP_PROTO(struct xe_exec_queue *q),
TP_ARGS(q)
--
2.43.0
^ permalink raw reply related [flat|nested] 61+ messages in thread
* [PATCH 13/16] drm/xe/multi_queue: Support active group after primary is destroyed
2025-10-31 18:29 [PATCH 00/16] drm/xe: Multi Queue feature support Niranjana Vishwanathapura
` (11 preceding siblings ...)
2025-10-31 18:29 ` [PATCH 12/16] drm/xe/multi_queue: Tracepoint support Niranjana Vishwanathapura
@ 2025-10-31 18:29 ` Niranjana Vishwanathapura
2025-11-03 22:05 ` Matthew Brost
2025-10-31 18:29 ` [PATCH 14/16] drm/xe/doc: Add documentation for Multi Queue Group Niranjana Vishwanathapura
` (7 subsequent siblings)
20 siblings, 1 reply; 61+ messages in thread
From: Niranjana Vishwanathapura @ 2025-10-31 18:29 UTC (permalink / raw)
To: intel-xe
Add support to keep the group active after the primary queue is
destroyed. Instead of killing the primary queue during the exec_queue
destroy ioctl, kill it only when all the secondary queues of the group
have been killed.
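
A sketch of creating such a group from userspace (placement setup
omitted; see the uapi kernel-doc for the full extension flow):

  struct drm_xe_ext_set_property ext = {
          .base.name = DRM_XE_EXEC_QUEUE_EXTENSION_SET_PROPERTY,
          .property = DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_GROUP,
          .value = DRM_XE_MULTI_GROUP_CREATE | DRM_XE_MULTI_GROUP_KEEP_ACTIVE,
  };
  struct drm_xe_exec_queue_create create = {
          .extensions = (__u64)(uintptr_t)&ext,
          /* .width, .num_placements, .vm_id, .instances as usual */
  };

  ioctl(fd, DRM_IOCTL_XE_EXEC_QUEUE_CREATE, &create);
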
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
---
drivers/gpu/drm/xe/xe_device.c | 7 ++-
drivers/gpu/drm/xe/xe_exec_queue.c | 55 +++++++++++++++++++++++-
drivers/gpu/drm/xe/xe_exec_queue.h | 2 +
drivers/gpu/drm/xe/xe_exec_queue_types.h | 4 ++
include/uapi/drm/xe_drm.h | 5 +++
5 files changed, 70 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
index 0b496676527a..708a17c357e6 100644
--- a/drivers/gpu/drm/xe/xe_device.c
+++ b/drivers/gpu/drm/xe/xe_device.c
@@ -176,7 +176,12 @@ static void xe_file_close(struct drm_device *dev, struct drm_file *file)
xa_for_each(&xef->exec_queue.xa, idx, q) {
if (q->vm && q->hwe->hw_engine_group)
xe_hw_engine_group_del_exec_queue(q->hwe->hw_engine_group, q);
- xe_exec_queue_kill(q);
+
+ if (xe_exec_queue_is_multi_queue_primary(q))
+ xe_exec_queue_group_kill_put(q->multi_queue.group);
+ else
+ xe_exec_queue_kill(q);
+
xe_exec_queue_put(q);
}
xa_for_each(&xef->vm.xa, idx, vm)
diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
index 3c1bb4f10fd5..d7b0173691c1 100644
--- a/drivers/gpu/drm/xe/xe_exec_queue.c
+++ b/drivers/gpu/drm/xe/xe_exec_queue.c
@@ -405,6 +405,26 @@ struct xe_exec_queue *xe_exec_queue_create_bind(struct xe_device *xe,
}
ALLOW_ERROR_INJECTION(xe_exec_queue_create_bind, ERRNO);
+static void xe_exec_queue_group_kill(struct kref *ref)
+{
+ struct xe_exec_queue_group *group = container_of(ref, struct xe_exec_queue_group,
+ kill_refcount);
+ xe_exec_queue_kill(group->primary);
+}
+
+static inline void xe_exec_queue_group_kill_get(struct xe_exec_queue_group *group)
+{
+ kref_get(&group->kill_refcount);
+}
+
+void xe_exec_queue_group_kill_put(struct xe_exec_queue_group *group)
+{
+ if (!group)
+ return;
+
+ kref_put(&group->kill_refcount, xe_exec_queue_group_kill);
+}
+
void xe_exec_queue_destroy(struct kref *ref)
{
struct xe_exec_queue *q = container_of(ref, struct xe_exec_queue, refcount);
@@ -607,6 +627,7 @@ static int xe_exec_queue_group_init(struct xe_device *xe, struct xe_exec_queue *
group->primary = q;
group->cgp_bo = bo;
INIT_LIST_HEAD(&group->list);
+ kref_init(&group->kill_refcount);
xa_init_flags(&group->xa, XA_FLAGS_ALLOC1);
mutex_init(&group->lock);
mutex_init(&group->list_lock);
@@ -675,6 +696,11 @@ static int xe_exec_queue_group_add(struct xe_device *xe, struct xe_exec_queue *q
q->multi_queue.pos = pos;
mutex_unlock(&group->lock);
+ if (group->primary->multi_queue.keep_active) {
+ xe_exec_queue_group_kill_get(group);
+ q->multi_queue.keep_active = true;
+ }
+
return 0;
}
@@ -691,6 +717,11 @@ static void xe_exec_queue_group_delete(struct xe_exec_queue *q)
if (lrc)
xe_lrc_put(lrc);
mutex_unlock(&group->lock);
+
+ if (q->multi_queue.keep_active) {
+ xe_exec_queue_group_kill_put(group);
+ q->multi_queue.keep_active = false;
+ }
}
static int exec_queue_set_multi_group(struct xe_device *xe, struct xe_exec_queue *q,
@@ -709,12 +740,24 @@ static int exec_queue_set_multi_group(struct xe_device *xe, struct xe_exec_queue
return -EINVAL;
if (value & DRM_XE_MULTI_GROUP_CREATE) {
- if (XE_IOCTL_DBG(xe, value & ~DRM_XE_MULTI_GROUP_CREATE))
+ if (XE_IOCTL_DBG(xe, value & ~(DRM_XE_MULTI_GROUP_CREATE |
+ DRM_XE_MULTI_GROUP_KEEP_ACTIVE)))
+ return -EINVAL;
+
+ /*
+ * KEEP_ACTIVE is not supported in preempt fence mode as in that mode,
+ * VM_DESTROY ioctl expects all exec queues of that VM are already killed.
+ */
+ if (XE_IOCTL_DBG(xe, (value & DRM_XE_MULTI_GROUP_KEEP_ACTIVE) &&
+ xe_vm_in_preempt_fence_mode(q->vm)))
return -EINVAL;
q->multi_queue.valid = true;
q->multi_queue.is_primary = true;
q->multi_queue.pos = 0;
+ if (value & DRM_XE_MULTI_GROUP_KEEP_ACTIVE)
+ q->multi_queue.keep_active = true;
+
return 0;
}
@@ -1254,6 +1297,11 @@ void xe_exec_queue_kill(struct xe_exec_queue *q)
q->ops->kill(q);
xe_vm_remove_compute_exec_queue(q->vm, q);
+
+ if (!xe_exec_queue_is_multi_queue_primary(q) && q->multi_queue.keep_active) {
+ xe_exec_queue_group_kill_put(q->multi_queue.group);
+ q->multi_queue.keep_active = false;
+ }
}
int xe_exec_queue_destroy_ioctl(struct drm_device *dev, void *data,
@@ -1280,7 +1328,10 @@ int xe_exec_queue_destroy_ioctl(struct drm_device *dev, void *data,
if (q->vm && q->hwe->hw_engine_group)
xe_hw_engine_group_del_exec_queue(q->hwe->hw_engine_group, q);
- xe_exec_queue_kill(q);
+ if (xe_exec_queue_is_multi_queue_primary(q))
+ xe_exec_queue_group_kill_put(q->multi_queue.group);
+ else
+ xe_exec_queue_kill(q);
trace_xe_exec_queue_close(q);
xe_exec_queue_put(q);
diff --git a/drivers/gpu/drm/xe/xe_exec_queue.h b/drivers/gpu/drm/xe/xe_exec_queue.h
index 61478b2e883b..b642341f1ede 100644
--- a/drivers/gpu/drm/xe/xe_exec_queue.h
+++ b/drivers/gpu/drm/xe/xe_exec_queue.h
@@ -109,6 +109,8 @@ static inline struct xe_exec_queue *xe_exec_queue_multi_queue_primary(struct xe_
return xe_exec_queue_is_multi_queue(q) ? q->multi_queue.group->primary : q;
}
+void xe_exec_queue_group_kill_put(struct xe_exec_queue_group *group);
+
bool xe_exec_queue_is_lr(struct xe_exec_queue *q);
bool xe_exec_queue_is_idle(struct xe_exec_queue *q);
diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h b/drivers/gpu/drm/xe/xe_exec_queue_types.h
index e64b6588923e..cdca3afe838c 100644
--- a/drivers/gpu/drm/xe/xe_exec_queue_types.h
+++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h
@@ -55,6 +55,8 @@ struct xe_exec_queue_group {
struct list_head list;
/** @list_lock: Secondary queue list lock */
struct mutex list_lock;
+ /** @kill_refcount: ref count to kill primary queue */
+ struct kref kill_refcount;
/** @sync_pending: CGP_SYNC_DONE g2h response pending */
bool sync_pending;
};
@@ -152,6 +154,8 @@ struct xe_exec_queue {
u8 valid:1;
/** @multi_queue.is_primary: Is primary queue (Q0) of the group */
u8 is_primary:1;
+ /** @multi_queue.keep_active: Keep the group active after primary is destroyed */
+ u8 keep_active:1;
} multi_queue;
/** @sched_props: scheduling properties */
diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
index d72151163e77..333fb38b3404 100644
--- a/include/uapi/drm/xe_drm.h
+++ b/include/uapi/drm/xe_drm.h
@@ -1260,6 +1260,10 @@ struct drm_xe_vm_bind {
* then a new multi-queue group is created with this queue as the primary queue
* (Q0). Otherwise, the queue gets added to the multi-queue group whose primary
* queue id is specified in the 'value' field.
+ * If the extension's 'value' field has the %DRM_XE_MULTI_GROUP_KEEP_ACTIVE flag
+ * set, then the multi-queue group is kept active after the primary queue is
+ * destroyed.
+ *
* - %DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_QUEUE_PRIORITY - Set the queue
* priority within the multi-queue group.
*
@@ -1304,6 +1308,7 @@ struct drm_xe_exec_queue_create {
#define DRM_XE_EXEC_QUEUE_SET_PROPERTY_PXP_TYPE 2
#define DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_GROUP 3
#define DRM_XE_MULTI_GROUP_CREATE (1ull << 63)
+#define DRM_XE_MULTI_GROUP_KEEP_ACTIVE (1ull << 62)
#define DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_QUEUE_PRIORITY 4
/** @extensions: Pointer to the first extension struct, if any */
__u64 extensions;
--
2.43.0
^ permalink raw reply related [flat|nested] 61+ messages in thread
* [PATCH 14/16] drm/xe/doc: Add documentation for Multi Queue Group
2025-10-31 18:29 [PATCH 00/16] drm/xe: Multi Queue feature support Niranjana Vishwanathapura
` (12 preceding siblings ...)
2025-10-31 18:29 ` [PATCH 13/16] drm/xe/multi_queue: Support active group after primary is destroyed Niranjana Vishwanathapura
@ 2025-10-31 18:29 ` Niranjana Vishwanathapura
2025-10-31 18:29 ` [PATCH 15/16] drm/xe/doc: Add documentation for Multi Queue Group GuC interface Niranjana Vishwanathapura
` (6 subsequent siblings)
20 siblings, 0 replies; 61+ messages in thread
From: Niranjana Vishwanathapura @ 2025-10-31 18:29 UTC (permalink / raw)
To: intel-xe
Add kernel documentation for Multi Queue group and update
the corresponding rst.
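
For reference, the flow the new documentation describes looks like
this in uapi terms (a sketch; placement setup omitted):

  /* Create the primary queue (Q0) and with it the group. */
  struct drm_xe_ext_set_property ext = {
          .base.name = DRM_XE_EXEC_QUEUE_EXTENSION_SET_PROPERTY,
          .property = DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_GROUP,
          .value = DRM_XE_MULTI_GROUP_CREATE,
  };
  /* ... DRM_IOCTL_XE_EXEC_QUEUE_CREATE with .extensions pointing at ext ... */

  /* Add a secondary queue: pass Q0's exec queue id as the value. */
  ext.value = q0_exec_queue_id;
  /* ... DRM_IOCTL_XE_EXEC_QUEUE_CREATE again, on the same VM ... */
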
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
---
Documentation/gpu/xe/xe_exec_queue.rst | 6 ++++
drivers/gpu/drm/xe/xe_exec_queue.c | 45 ++++++++++++++++++++++++++
2 files changed, 51 insertions(+)
diff --git a/Documentation/gpu/xe/xe_exec_queue.rst b/Documentation/gpu/xe/xe_exec_queue.rst
index 6076569e311c..732af4741df4 100644
--- a/Documentation/gpu/xe/xe_exec_queue.rst
+++ b/Documentation/gpu/xe/xe_exec_queue.rst
@@ -7,6 +7,12 @@ Execution Queue
.. kernel-doc:: drivers/gpu/drm/xe/xe_exec_queue.c
:doc: Execution Queue
+Multi Queue Group
+=================
+
+.. kernel-doc:: drivers/gpu/drm/xe/xe_exec_queue.c
+ :doc: Multi Queue Group
+
Internal API
============
diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
index d7b0173691c1..bf959b06a6e5 100644
--- a/drivers/gpu/drm/xe/xe_exec_queue.c
+++ b/drivers/gpu/drm/xe/xe_exec_queue.c
@@ -53,6 +53,51 @@
* the ring operations the different engine classes support.
*/
+/**
+ * DOC: Multi Queue Group
+ *
+ * Multi Queue Group is another mode of execution supported by the compute
+ * and blitter copy command streamers (CCS and BCS, respectively). It is
+ * an enhancement of the existing hardware architecture and leverages the
+ * same submission model. It enables support for efficient, parallel
+ * execution of multiple queues within a single shared context. The multi
+ * queue group functionality is only supported with GuC submission backend.
+ * All the queues of a group must use the same address space (VM).
+ *
+ * The DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_GROUP execution queue property
+ * supports creating a multi queue group and adding queues to a queue group.
+ *
+ * A DRM_XE_EXEC_QUEUE_CREATE ioctl call with the above property and the value
+ * field set to DRM_XE_MULTI_GROUP_CREATE creates a new multi queue group with
+ * the queue being created as the primary queue (aka Q0) of the group. To add
+ * secondary queues to the group, create them with the above property and the
+ * id of the primary queue as the value. The properties of the primary queue
+ * (like priority and time slice) apply to the whole group, so these
+ * properties can't be set on the secondary queues of a group.
+ *
+ * The hardware does not support removing a queue from a multi-queue group.
+ * However, queues can be dynamically added to the group. A group can have
+ * up to 64 queues. To support this, XeKMD holds references to LRCs of the
+ * queues even after the queues are destroyed by the user until the whole
+ * group is destroyed. The secondary queues hold a reference to the primary
+ * queue, thus preventing the group from being destroyed when the user
+ * destroys the primary queue. Once the primary queue is destroyed, secondary
+ * queues can't be added to the queue group, but they can continue to submit
+ * jobs if the DRM_XE_MULTI_GROUP_KEEP_ACTIVE flag was set during multi queue
+ * group creation.
+ *
+ * The queues of a multi queue group can set their priority within the group
+ * through the DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_QUEUE_PRIORITY property.
+ * This multi queue priority can also be changed dynamically through the
+ * DRM_XE_EXEC_QUEUE_SET_PROPERTY ioctl. This is the only other property
+ * supported by the secondary queues of a multi queue group, besides
+ * DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_GROUP.
+ *
+ * When GuC reports an error on any of the queues of a multi queue group,
+ * the queue cleanup mechanism is invoked for all the queues of the group
+ * as hardware cannot make progress on the multi queue context.
+ */
+
enum xe_exec_queue_sched_prop {
XE_EXEC_QUEUE_JOB_TIMEOUT = 0,
XE_EXEC_QUEUE_TIMESLICE = 1,
--
2.43.0
^ permalink raw reply related [flat|nested] 61+ messages in thread
* [PATCH 15/16] drm/xe/doc: Add documentation for Multi Queue Group GuC interface
2025-10-31 18:29 [PATCH 00/16] drm/xe: Multi Queue feature support Niranjana Vishwanathapura
` (13 preceding siblings ...)
2025-10-31 18:29 ` [PATCH 14/16] drm/xe/doc: Add documentation for Multi Queue Group Niranjana Vishwanathapura
@ 2025-10-31 18:29 ` Niranjana Vishwanathapura
2025-10-31 18:29 ` [PATCH 16/16] drm/xe/multi_queue: Enable multi_queue on xe3p_xpc Niranjana Vishwanathapura
` (5 subsequent siblings)
20 siblings, 0 replies; 61+ messages in thread
From: Niranjana Vishwanathapura @ 2025-10-31 18:29 UTC (permalink / raw)
To: intel-xe
Add kernel documentation for Multi Queue group GuC interface.
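
The CGP layout documented here could be modelled roughly as the struct
below (an illustrative sketch only; the driver actually writes the
page through xe_map accessors rather than a struct):

  /* Context Group Page (CGP): one KMD-managed 4KB page in global GTT. */
  struct cgp_page {
          u32 version;                  /* [15:8] major, [7:0] minor */
          u32 rsvd0[15];                /* MBZ */
          u32 kmd_queue_update_mask[2]; /* queues 31..0, 63..32 */
          u32 rsvd1[14];                /* MBZ */
          struct {
                  u32 cd_dw0;           /* LRC descriptor lower dword */
                  u32 context_index;    /* GuC context id */
          } q[64];
          u32 rsvd2[864];               /* MBZ, pads to 4KB */
  };
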
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
---
Documentation/gpu/xe/xe_exec_queue.rst | 8 ++++
drivers/gpu/drm/xe/xe_exec_queue.c | 3 ++
drivers/gpu/drm/xe/xe_guc_submit.c | 57 ++++++++++++++++++++++++++
3 files changed, 68 insertions(+)
diff --git a/Documentation/gpu/xe/xe_exec_queue.rst b/Documentation/gpu/xe/xe_exec_queue.rst
index 732af4741df4..8707806211c9 100644
--- a/Documentation/gpu/xe/xe_exec_queue.rst
+++ b/Documentation/gpu/xe/xe_exec_queue.rst
@@ -13,6 +13,14 @@ Multi Queue Group
.. kernel-doc:: drivers/gpu/drm/xe/xe_exec_queue.c
:doc: Multi Queue Group
+.. _multi-queue-group-guc-interface:
+
+Multi Queue Group GuC interface
+===============================
+
+.. kernel-doc:: drivers/gpu/drm/xe/xe_guc_submit.c
+ :doc: Multi Queue Group GuC interface
+
Internal API
============
diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
index bf959b06a6e5..1f13b7e0b01b 100644
--- a/drivers/gpu/drm/xe/xe_exec_queue.c
+++ b/drivers/gpu/drm/xe/xe_exec_queue.c
@@ -96,6 +96,9 @@
* When GuC reports an error on any of the queues of a multi queue group,
* the queue cleanup mechanism is invoked for all the queues of the group
* as hardware cannot make progress on the multi queue context.
+ *
+ * Refer :ref:`multi-queue-group-guc-interface` for multi queue group GuC
+ * interface.
*/
enum xe_exec_queue_sched_prop {
diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
index bc7296edf1be..79e2c62e879e 100644
--- a/drivers/gpu/drm/xe/xe_guc_submit.c
+++ b/drivers/gpu/drm/xe/xe_guc_submit.c
@@ -581,6 +581,63 @@ static void set_min_preemption_timeout(struct xe_guc *guc, struct xe_exec_queue
xe_map_wr_field(xe_, &map_, 0, struct guc_submit_parallel_scratch, \
field_, val_)
+/**
+ * DOC: Multi Queue Group GuC interface
+ *
+ * The multi queue group coordination between KMD and GuC is through a software
+ * construct called Context Group Page (CGP). The CGP is a KMD managed 4KB page
+ * allocated in the global GTT.
+ *
+ * CGP format:
+ *
+ * +-----------+---------------------------+---------------------------------------------+
+ * | DWORD | Name | Description |
+ * +-----------+---------------------------+---------------------------------------------+
+ * | 0 | Version | Bits [15:8]=Major ver, [7:0]=Minor ver |
+ * +-----------+---------------------------+---------------------------------------------+
+ * | 1..15 | RESERVED | MBZ |
+ * +-----------+---------------------------+---------------------------------------------+
+ * | 16 | KMD_QUEUE_UPDATE_MASK_DW0 | KMD queue mask for queues 31..0 |
+ * +-----------+---------------------------+---------------------------------------------+
+ * | 17 | KMD_QUEUE_UPDATE_MASK_DW1 | KMD queue mask for queues 63..32 |
+ * +-----------+---------------------------+---------------------------------------------+
+ * | 18..31 | RESERVED | MBZ |
+ * +-----------+---------------------------+---------------------------------------------+
+ * | 32 | Q0CD_DW0 | Queue 0 context LRC descriptor lower DWORD |
+ * +-----------+---------------------------+---------------------------------------------+
+ * | 33 | Q0ContextIndex | Context ID for Queue 0 |
+ * +-----------+---------------------------+---------------------------------------------+
+ * | 34 | Q1CD_DW0 | Queue 1 context LRC descriptor lower DWORD |
+ * +-----------+---------------------------+---------------------------------------------+
+ * | 35 | Q1ContextIndex | Context ID for Queue 1 |
+ * +-----------+---------------------------+---------------------------------------------+
+ * | ... |... | ... |
+ * +-----------+---------------------------+---------------------------------------------+
+ * | 158 | Q63CD_DW0 | Queue 63 context LRC descriptor lower DWORD |
+ * +-----------+---------------------------+---------------------------------------------+
+ * | 159 | Q63ContextIndex | Context ID for Queue 63 |
+ * +-----------+---------------------------+---------------------------------------------+
+ * | 160..1023 | RESERVED | MBZ |
+ * +-----------+---------------------------+---------------------------------------------+
+ *
+ * While registering Q0 with GuC, the CGP is updated with the Q0 entry and GuC
+ * is notified through the XE_GUC_ACTION_REGISTER_CONTEXT_MULTI_QUEUE H2G
+ * message, which specifies the CGP address. When secondary queues are added to
+ * the group, the CGP is updated with an entry for each queue and GuC is
+ * notified through the XE_GUC_ACTION_MULTI_QUEUE_CONTEXT_CGP_SYNC H2G message.
+ * GuC responds to these H2G messages with a
+ * XE_GUC_ACTION_NOTIFY_MULTI_QUEUE_CONTEXT_CGP_SYNC_DONE G2H message. GuC also
+ * sends a XE_GUC_ACTION_NOTIFY_MULTI_QUEUE_CGP_CONTEXT_ERROR notification for
+ * any
+ * error in the CGP. Only one of these CGP update messages can be outstanding
+ * (waiting for GuC response) at any time. The bits in KMD_QUEUE_UPDATE_MASK_DW*
+ * fields indicate which queue entry is being updated in the CGP.
+ *
+ * The primary queue (Q0) represents the multi queue group context in GuC and
+ * submission on any queue of the group must be through Q0 GuC interface only.
+ *
+ * As it is not required to register secondary queues with GuC, the secondary queue
+ * context ids in the CGP are populated with Q0 context id.
+ */
+
#define CGP_VERSION_MAJOR_SHIFT 8
static void xe_guc_exec_queue_group_cgp_update(struct xe_device *xe,
--
2.43.0
^ permalink raw reply related [flat|nested] 61+ messages in thread
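A minimal sketch of the dword offsets implied by the CGP table in the patch
above. The helper names here are hypothetical (they are not part of the
patch); only the offsets follow from the table:

#define CGP_VERSION_DW			0
#define CGP_KMD_QUEUE_UPDATE_MASK_DW(n)	(16 + ((n) >> 5))
#define CGP_QUEUE_CD_DW(n)		(32 + 2 * (n))	/* QnCD_DW0 */
#define CGP_QUEUE_CTX_DW(n)		(33 + 2 * (n))	/* QnContextIndex */

/*
 * Mark queue 'n' as being updated in the CGP before sending the
 * REGISTER_CONTEXT_MULTI_QUEUE or CGP_SYNC H2G message.
 */
static void cgp_mark_queue_update(u32 *cgp, unsigned int n)
{
	cgp[CGP_KMD_QUEUE_UPDATE_MASK_DW(n)] |= BIT(n & 31);
}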
* [PATCH 16/16] drm/xe/multi_queue: Enable multi_queue on xe3p_xpc
2025-10-31 18:29 [PATCH 00/16] drm/xe: Multi Queue feature support Niranjana Vishwanathapura
` (14 preceding siblings ...)
2025-10-31 18:29 ` [PATCH 15/16] drm/xe/doc: Add documentation for Multi Queue Group GuC interface Niranjana Vishwanathapura
@ 2025-10-31 18:29 ` Niranjana Vishwanathapura
2025-11-02 0:05 ` Matthew Brost
2025-10-31 18:47 ` [PATCH 00/16] drm/xe: Multi Queue feature support Niranjana Vishwanathapura
` (4 subsequent siblings)
20 siblings, 1 reply; 61+ messages in thread
From: Niranjana Vishwanathapura @ 2025-10-31 18:29 UTC (permalink / raw)
To: intel-xe
xe3p_xpc supports multi_queue, so enable it.
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
---
drivers/gpu/drm/xe/xe_pci.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/gpu/drm/xe/xe_pci.c b/drivers/gpu/drm/xe/xe_pci.c
index b5eaf0fc105c..43f1cc47b9b3 100644
--- a/drivers/gpu/drm/xe/xe_pci.c
+++ b/drivers/gpu/drm/xe/xe_pci.c
@@ -111,6 +111,7 @@ static const struct xe_graphics_desc graphics_xe3p_xpc = {
.hw_engine_mask =
GENMASK(XE_HW_ENGINE_BCS8, XE_HW_ENGINE_BCS1) |
GENMASK(XE_HW_ENGINE_CCS3, XE_HW_ENGINE_CCS0),
+ .multi_queue_enable_mask = BIT(XE_ENGINE_CLASS_COPY) | BIT(XE_ENGINE_CLASS_COMPUTE),
};
static const struct xe_media_desc media_xem = {
--
2.43.0
^ permalink raw reply related [flat|nested] 61+ messages in thread
* Re: [PATCH 00/16] drm/xe: Multi Queue feature support
2025-10-31 18:29 [PATCH 00/16] drm/xe: Multi Queue feature support Niranjana Vishwanathapura
` (15 preceding siblings ...)
2025-10-31 18:29 ` [PATCH 16/16] drm/xe/multi_queue: Enable multi_queue on xe3p_xpc Niranjana Vishwanathapura
@ 2025-10-31 18:47 ` Niranjana Vishwanathapura
2025-10-31 21:15 ` ✗ CI.checkpatch: warning for " Patchwork
` (3 subsequent siblings)
20 siblings, 0 replies; 61+ messages in thread
From: Niranjana Vishwanathapura @ 2025-10-31 18:47 UTC (permalink / raw)
To: intel-xe
On Fri, Oct 31, 2025 at 11:29:20AM -0700, Niranjana Vishwanathapura wrote:
>Multi Queue is a new mode of execution supported by the compute and
>blitter copy command streamers (CCS and BCS, respectively). It is an
>enhancement of the existing hardware architecture and leverages the
>same submission model. It enables support for efficient, parallel
>execution of multiple queues within a single context.
>
>Add support for multi-queue feature and enable it on xe3p_xpc.
>
The IGT validation patch series for this feature is posted here,
https://patchwork.freedesktop.org/series/156866/
The compute UMD multi-queue use case is in the process of being
posted upstream. I will provide the link to it as soon as it is
available.
Regards,
Niranjana
>Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
>
>Niranjana Vishwanathapura (16):
> drm/xe/multi_queue: Add multi_queue_enable_mask to gt information
> drm/xe/multi_queue: Add user interface for multi queue support
> drm/xe/multi_queue: Add GuC interface for multi queue support
> drm/xe/multi_queue: Add multi queue priority property
> drm/xe/multi_queue: Handle invalid exec queue property setting
> drm/xe/multi_queue: Add exec_queue set_property ioctl support
> drm/xe/multi_queue: Add support for multi queue dynamic priority
> change
> drm/xe/multi_queue: Add multi queue information to guc_info dump
> drm/xe/multi_queue: Handle tearing down of a multi queue
> drm/xe/multi_queue: Set QUEUE_DRAIN_MODE for Multi Queue batches
> drm/xe/multi_queue: Handle CGP context error
> drm/xe/multi_queue: Tracepoint support
> drm/xe/multi_queue: Support active group after primary is destroyed
> drm/xe/doc: Add documentation for Multi Queue Group
> drm/xe/doc: Add documentation for Multi Queue Group GuC interface
> drm/xe/multi_queue: Enable multi_queue on xe3p_xpc
>
> Documentation/gpu/xe/xe_exec_queue.rst | 14 +
> drivers/gpu/drm/xe/abi/guc_actions_abi.h | 4 +
> .../gpu/drm/xe/instructions/xe_gpu_commands.h | 1 +
> drivers/gpu/drm/xe/xe_debugfs.c | 2 +
> drivers/gpu/drm/xe/xe_device.c | 9 +-
> drivers/gpu/drm/xe/xe_exec_queue.c | 414 +++++++++++-
> drivers/gpu/drm/xe/xe_exec_queue.h | 51 ++
> drivers/gpu/drm/xe/xe_exec_queue_types.h | 51 ++
> drivers/gpu/drm/xe/xe_gt_types.h | 5 +
> drivers/gpu/drm/xe/xe_guc_ct.c | 8 +
> drivers/gpu/drm/xe/xe_guc_fwif.h | 3 +
> drivers/gpu/drm/xe/xe_guc_submit.c | 612 ++++++++++++++++--
> drivers/gpu/drm/xe/xe_guc_submit.h | 3 +
> drivers/gpu/drm/xe/xe_guc_submit_types.h | 13 +
> drivers/gpu/drm/xe/xe_lrc.c | 32 +
> drivers/gpu/drm/xe/xe_lrc.h | 5 +
> drivers/gpu/drm/xe/xe_pci.c | 2 +
> drivers/gpu/drm/xe/xe_pci_types.h | 1 +
> drivers/gpu/drm/xe/xe_ring_ops.c | 68 +-
> drivers/gpu/drm/xe/xe_trace.h | 46 ++
> include/uapi/drm/xe_drm.h | 40 ++
> 21 files changed, 1276 insertions(+), 108 deletions(-)
>
>--
>2.43.0
>
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH 02/16] drm/xe/multi_queue: Add user interface for multi queue support
2025-10-31 18:29 ` [PATCH 02/16] drm/xe/multi_queue: Add user interface for multi queue support Niranjana Vishwanathapura
@ 2025-10-31 19:31 ` Matthew Brost
2025-11-03 22:58 ` Niranjana Vishwanathapura
2025-11-02 0:23 ` Matthew Brost
2025-11-02 17:37 ` Matthew Brost
2 siblings, 1 reply; 61+ messages in thread
From: Matthew Brost @ 2025-10-31 19:31 UTC (permalink / raw)
To: Niranjana Vishwanathapura; +Cc: intel-xe
On Fri, Oct 31, 2025 at 11:29:22AM -0700, Niranjana Vishwanathapura wrote:
> Multi Queue is a new mode of execution supported by the compute and
> blitter copy command streamers (CCS and BCS, respectively). It is an
> enhancement of the existing hardware architecture and leverages the
> same submission model. It enables support for efficient, parallel
> execution of multiple queues within a single context. All the queues
> of a group must use the same address space (VM).
>
> The new DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_GROUP execution queue
> property supports creating a multi queue group and adding queues to
> a queue group. All queues of a multi queue group share the same
> context.
>
> An exec queue create ioctl call with the above property specified with
> the value DRM_XE_MULTI_GROUP_CREATE will create a new multi queue group
> with the queue being created as the primary queue (aka Q0) of the group.
> To add secondary queues to the group, they need to be created with the
> above property, with the id of the primary queue as the value (see the
> usage sketch after this message). The properties of the primary queue
> (like priority and timeslice) apply to the whole group, so these
> properties can't be set on the secondary queues of a group.
>
> Once destroyed, the secondary queues of a multi queue group can't be
> replaced. However, new secondary queues can be dynamically added to the
> group, up to a total of 64 queues per group. Once the primary queue is
> destroyed, secondary queues can't be added to the queue group.
>
> Signed-off-by: Stuart Summers <stuart.summers@intel.com>
> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
> ---
> drivers/gpu/drm/xe/xe_exec_queue.c | 191 ++++++++++++++++++++++-
> drivers/gpu/drm/xe/xe_exec_queue.h | 47 ++++++
> drivers/gpu/drm/xe/xe_exec_queue_types.h | 30 ++++
> include/uapi/drm/xe_drm.h | 8 +
> 4 files changed, 274 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
> index 1b57d7c2cc94..86404a7c9fe4 100644
> --- a/drivers/gpu/drm/xe/xe_exec_queue.c
> +++ b/drivers/gpu/drm/xe/xe_exec_queue.c
> @@ -12,6 +12,7 @@
> #include <drm/drm_file.h>
> #include <uapi/drm/xe_drm.h>
>
> +#include "xe_bo.h"
> #include "xe_dep_scheduler.h"
> #include "xe_device.h"
> #include "xe_gt.h"
> @@ -62,6 +63,32 @@ enum xe_exec_queue_sched_prop {
> static int exec_queue_user_extensions(struct xe_device *xe, struct xe_exec_queue *q,
> u64 extensions, int ext_number);
>
> +static void xe_exec_queue_group_cleanup(struct xe_exec_queue *q)
> +{
> + struct xe_exec_queue_group *group = q->multi_queue.group;
> + struct xe_lrc *lrc;
> + unsigned long idx;
> +
> + if (xe_exec_queue_is_multi_queue_secondary(q)) {
> + xe_exec_queue_put(xe_exec_queue_multi_queue_primary(q));
> + return;
> + }
> +
> + if (!group)
> + return;
> +
> + /* Primary queue cleanup */
> + mutex_lock(&group->lock);
I don't think you need the group->lock here. Xarrays have their own
internal locking.
We do use mutexes around xarrays in Xe, but that's to protect the object
reference—not the xarray itself.
For example, we follow this pattern:
lock();
obj = xa_find();
if (obj)
xe_obj_get(obj);
unlock();
Similarly, we apply a lock on the removal side. This prevents the object
from being removed and a reference being dropped in parallel with a
lookup (i.e., it avoids a use-after-free).
We don’t always use this pattern correctly—some of that is legacy code
we haven’t cleaned up yet—but we should.
In your case, you're not protecting any object references (i.e., there's
no lookup function involved), as far as I can tell. So there's no need
for a lock here.
Matt
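A slightly fuller sketch of the get-under-lock pattern described above,
using generic names rather than the Xe API (assumes <linux/xarray.h>,
<linux/kref.h> and <linux/mutex.h>):

struct foo {
	struct kref ref;
	/* object payload */
};

/*
 * Take the reference under the lock so a concurrent removal and final
 * put cannot race with the lookup (avoiding a use-after-free).
 */
static struct foo *foo_lookup(struct xarray *xa, struct mutex *lock,
			      unsigned long id)
{
	struct foo *f;

	mutex_lock(lock);
	f = xa_load(xa, id);
	if (f)
		kref_get(&f->ref);
	mutex_unlock(lock);

	return f;
}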
> + xa_for_each(&group->xa, idx, lrc)
> + xe_lrc_put(lrc);
> + mutex_unlock(&group->lock);
> +
> + xa_destroy(&group->xa);
> + mutex_destroy(&group->lock);
> + xe_bo_unpin_map_no_vm(group->cgp_bo);
> + kfree(group);
> +}
> +
> static void __xe_exec_queue_free(struct xe_exec_queue *q)
> {
> int i;
> @@ -72,6 +99,10 @@ static void __xe_exec_queue_free(struct xe_exec_queue *q)
>
> if (xe_exec_queue_uses_pxp(q))
> xe_pxp_exec_queue_remove(gt_to_xe(q->gt)->pxp, q);
> +
> + if (xe_exec_queue_is_multi_queue(q))
> + xe_exec_queue_group_cleanup(q);
> +
> if (q->vm)
> xe_vm_put(q->vm);
>
> @@ -549,6 +580,148 @@ exec_queue_set_pxp_type(struct xe_device *xe, struct xe_exec_queue *q, u64 value
> return xe_pxp_exec_queue_set_type(xe->pxp, q, DRM_XE_PXP_TYPE_HWDRM);
> }
>
> +static int xe_exec_queue_group_init(struct xe_device *xe, struct xe_exec_queue *q)
> +{
> + struct xe_tile *tile = gt_to_tile(q->gt);
> + struct xe_exec_queue_group *group;
> + struct xe_bo *bo;
> +
> + group = kzalloc(sizeof(*group), GFP_KERNEL);
> + if (!group)
> + return -ENOMEM;
> +
> + bo = xe_bo_create_pin_map_novm(xe, tile, SZ_4K, ttm_bo_type_kernel,
> + XE_BO_FLAG_VRAM_IF_DGFX(tile) |
> + XE_BO_FLAG_GGTT, false);
> + if (IS_ERR(bo)) {
> + drm_err(&xe->drm, "CGP bo allocation for queue group failed: %ld\n",
> + PTR_ERR(bo));
> + kfree(group);
> + return PTR_ERR(bo);
> + }
> +
> + xe_map_memset(xe, &bo->vmap, 0, 0, SZ_4K);
> +
> + group->primary = q;
> + group->cgp_bo = bo;
> + xa_init_flags(&group->xa, XA_FLAGS_ALLOC1);
> + mutex_init(&group->lock);
> + mutex_init(&group->list_lock);
> + q->multi_queue.group = group;
> +
> + return 0;
> +}
> +
> +static inline bool xe_exec_queue_supports_multi_queue(struct xe_exec_queue *q)
> +{
> + return q->gt->info.multi_queue_enable_mask & BIT(q->class);
> +}
> +
> +static int xe_exec_queue_group_validate(struct xe_device *xe, struct xe_exec_queue *q,
> + u32 primary_id)
> +{
> + struct xe_exec_queue_group *group;
> + struct xe_exec_queue *primary;
> + int ret;
> +
> + primary = xe_exec_queue_lookup(q->vm->xef, primary_id);
> + if (XE_IOCTL_DBG(xe, !primary))
> + return -ENOENT;
> +
> + if (XE_IOCTL_DBG(xe, !xe_exec_queue_is_multi_queue_primary(primary)) ||
> + XE_IOCTL_DBG(xe, q->vm != primary->vm) ||
> + XE_IOCTL_DBG(xe, q->logical_mask != primary->logical_mask)) {
> + ret = -EINVAL;
> + goto put_primary;
> + }
> +
> + group = primary->multi_queue.group;
> + q->multi_queue.valid = true;
> + q->multi_queue.group = group;
> +
> + return 0;
> +put_primary:
> + xe_exec_queue_put(primary);
> + return ret;
> +}
> +
> +#define XE_MAX_GROUP_SIZE 64
> +static int xe_exec_queue_group_add(struct xe_device *xe, struct xe_exec_queue *q)
> +{
> + struct xe_exec_queue_group *group = q->multi_queue.group;
> + u32 pos;
> + int err;
> +
> + if (!xe_exec_queue_is_multi_queue_secondary(q))
> + return 0;
> +
> + mutex_lock(&group->lock);
> + err = xa_alloc(&group->xa, &pos, xe_lrc_get(q->lrc[0]),
> + XA_LIMIT(1, XE_MAX_GROUP_SIZE - 1), GFP_KERNEL);
> + if (XE_IOCTL_DBG(xe, err)) {
> + xe_lrc_put(q->lrc[0]);
> + mutex_unlock(&group->lock);
> +
> + /* It is invalid if queue group limit is exceeded */
> + if (err == -EBUSY)
> + err = -EINVAL;
> +
> + return err;
> + }
> +
> + q->multi_queue.pos = pos;
> + mutex_unlock(&group->lock);
> +
> + return 0;
> +}
> +
> +static void xe_exec_queue_group_delete(struct xe_exec_queue *q)
> +{
> + struct xe_exec_queue_group *group = q->multi_queue.group;
> + struct xe_lrc *lrc;
> +
> + if (!xe_exec_queue_is_multi_queue_secondary(q))
> + return;
> +
> + mutex_lock(&group->lock);
> + lrc = xa_erase(&group->xa, q->multi_queue.pos);
> + if (lrc)
> + xe_lrc_put(lrc);
> + mutex_unlock(&group->lock);
> +}
> +
> +static int exec_queue_set_multi_group(struct xe_device *xe, struct xe_exec_queue *q,
> + u64 value)
> +{
> + if (XE_IOCTL_DBG(xe, !xe_exec_queue_supports_multi_queue(q)))
> + return -ENODEV;
> +
> + if (XE_IOCTL_DBG(xe, !xe_device_uc_enabled(xe)))
> + return -EOPNOTSUPP;
> +
> + if (XE_IOCTL_DBG(xe, xe_exec_queue_is_parallel(q)))
> + return -EINVAL;
> +
> + if (XE_IOCTL_DBG(xe, xe_exec_queue_is_multi_queue(q)))
> + return -EINVAL;
> +
> + if (value & DRM_XE_MULTI_GROUP_CREATE) {
> + if (XE_IOCTL_DBG(xe, value & ~DRM_XE_MULTI_GROUP_CREATE))
> + return -EINVAL;
> +
> + q->multi_queue.valid = true;
> + q->multi_queue.is_primary = true;
> + q->multi_queue.pos = 0;
> + return 0;
> + }
> +
> + /* While adding secondary queues, the upper 32 bits must be 0 */
> + if (XE_IOCTL_DBG(xe, value & (~0ull << 32)))
> + return -EINVAL;
> +
> + return xe_exec_queue_group_validate(xe, q, value);
> +}
> +
> typedef int (*xe_exec_queue_set_property_fn)(struct xe_device *xe,
> struct xe_exec_queue *q,
> u64 value);
> @@ -557,6 +730,7 @@ static const xe_exec_queue_set_property_fn exec_queue_set_property_funcs[] = {
> [DRM_XE_EXEC_QUEUE_SET_PROPERTY_PRIORITY] = exec_queue_set_priority,
> [DRM_XE_EXEC_QUEUE_SET_PROPERTY_TIMESLICE] = exec_queue_set_timeslice,
> [DRM_XE_EXEC_QUEUE_SET_PROPERTY_PXP_TYPE] = exec_queue_set_pxp_type,
> + [DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_GROUP] = exec_queue_set_multi_group,
> };
>
> static int exec_queue_user_ext_set_property(struct xe_device *xe,
> @@ -577,7 +751,8 @@ static int exec_queue_user_ext_set_property(struct xe_device *xe,
> XE_IOCTL_DBG(xe, ext.pad) ||
> XE_IOCTL_DBG(xe, ext.property != DRM_XE_EXEC_QUEUE_SET_PROPERTY_PRIORITY &&
> ext.property != DRM_XE_EXEC_QUEUE_SET_PROPERTY_TIMESLICE &&
> - ext.property != DRM_XE_EXEC_QUEUE_SET_PROPERTY_PXP_TYPE))
> + ext.property != DRM_XE_EXEC_QUEUE_SET_PROPERTY_PXP_TYPE &&
> + ext.property != DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_GROUP))
> return -EINVAL;
>
> idx = array_index_nospec(ext.property, ARRAY_SIZE(exec_queue_set_property_funcs));
> @@ -626,6 +801,12 @@ static int exec_queue_user_extensions(struct xe_device *xe, struct xe_exec_queue
> return exec_queue_user_extensions(xe, q, ext.next_extension,
> ++ext_number);
>
> + if (xe_exec_queue_is_multi_queue_primary(q)) {
> + err = xe_exec_queue_group_init(xe, q);
> + if (XE_IOCTL_DBG(xe, err))
> + return err;
> + }
> +
> return 0;
> }
>
> @@ -780,12 +961,16 @@ int xe_exec_queue_create_ioctl(struct drm_device *dev, void *data,
> if (IS_ERR(q))
> return PTR_ERR(q);
>
> + err = xe_exec_queue_group_add(xe, q);
> + if (XE_IOCTL_DBG(xe, err))
> + goto put_exec_queue;
> +
> if (xe_vm_in_preempt_fence_mode(vm)) {
> q->lr.context = dma_fence_context_alloc(1);
>
> err = xe_vm_add_compute_exec_queue(vm, q);
> if (XE_IOCTL_DBG(xe, err))
> - goto put_exec_queue;
> + goto delete_queue_group;
> }
>
> if (q->vm && q->hwe->hw_engine_group) {
> @@ -808,6 +993,8 @@ int xe_exec_queue_create_ioctl(struct drm_device *dev, void *data,
>
> kill_exec_queue:
> xe_exec_queue_kill(q);
> +delete_queue_group:
> + xe_exec_queue_group_delete(q);
> put_exec_queue:
> xe_exec_queue_put(q);
> return err;
> diff --git a/drivers/gpu/drm/xe/xe_exec_queue.h b/drivers/gpu/drm/xe/xe_exec_queue.h
> index a4dfbe858bda..8cd6487018fa 100644
> --- a/drivers/gpu/drm/xe/xe_exec_queue.h
> +++ b/drivers/gpu/drm/xe/xe_exec_queue.h
> @@ -62,6 +62,53 @@ static inline bool xe_exec_queue_uses_pxp(struct xe_exec_queue *q)
> return q->pxp.type;
> }
>
> +/**
> + * xe_exec_queue_is_multi_queue() - Whether an exec_queue is part of a queue group.
> + * @q: The exec_queue
> + *
> + * Return: True if the exec_queue is part of a queue group, false otherwise.
> + */
> +static inline bool xe_exec_queue_is_multi_queue(struct xe_exec_queue *q)
> +{
> + return q->multi_queue.valid;
> +}
> +
> +/**
> + * xe_exec_queue_is_multi_queue_primary() - Whether an exec_queue is primary queue
> + * of a multi queue group.
> + * @q: The exec_queue
> + *
> + * Return: True if @q is primary queue of a queue group, false otherwise.
> + */
> +static inline bool xe_exec_queue_is_multi_queue_primary(struct xe_exec_queue *q)
> +{
> + return q->multi_queue.is_primary;
> +}
> +
> +/**
> + * xe_exec_queue_is_multi_queue_secondary() - Whether an exec_queue is secondary queue
> + * of a multi queue group.
> + * @q: The exec_queue
> + *
> + * Return: True if @q is secondary queue of a queue group, false otherwise.
> + */
> +static inline bool xe_exec_queue_is_multi_queue_secondary(struct xe_exec_queue *q)
> +{
> + return xe_exec_queue_is_multi_queue(q) && !q->multi_queue.is_primary;
> +}
> +
> +/**
> + * xe_exec_queue_multi_queue_primary() - Get multi queue group's primary queue
> + * @q: The exec_queue
> + *
> + * If @q belongs to a multi queue group, then the primary queue of the group will
> + * be returned. Otherwise, @q will be returned.
> + */
> +static inline struct xe_exec_queue *xe_exec_queue_multi_queue_primary(struct xe_exec_queue *q)
> +{
> + return xe_exec_queue_is_multi_queue(q) ? q->multi_queue.group->primary : q;
> +}
> +
> bool xe_exec_queue_is_lr(struct xe_exec_queue *q);
>
> bool xe_exec_queue_is_idle(struct xe_exec_queue *q);
> diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h b/drivers/gpu/drm/xe/xe_exec_queue_types.h
> index c8807268ec6c..3856776df5c4 100644
> --- a/drivers/gpu/drm/xe/xe_exec_queue_types.h
> +++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h
> @@ -31,6 +31,24 @@ enum xe_exec_queue_priority {
> XE_EXEC_QUEUE_PRIORITY_COUNT
> };
>
> +/**
> + * struct xe_exec_queue_group - Execution multi queue group
> + *
> + * Contains multi queue group information.
> + */
> +struct xe_exec_queue_group {
> + /** @primary: Primary queue of this group */
> + struct xe_exec_queue *primary;
> + /** @lock: Queue group update lock */
> + struct mutex lock;
> + /** @cgp_bo: BO for the Context Group Page */
> + struct xe_bo *cgp_bo;
> + /** @xa: xarray to store LRCs */
> + struct xarray xa;
> + /** @list_lock: Secondary queue list lock */
> + struct mutex list_lock;
> +};
> +
> /**
> * struct xe_exec_queue - Execution queue
> *
> @@ -110,6 +128,18 @@ struct xe_exec_queue {
> struct xe_guc_exec_queue *guc;
> };
>
> + /** @multi_queue: Multi queue information */
> + struct {
> + /** @multi_queue.group: Queue group information */
> + struct xe_exec_queue_group *group;
> + /** @multi_queue.pos: Position of queue within the multi-queue group */
> + u8 pos;
> + /** @multi_queue.valid: Queue belongs to a multi queue group */
> + u8 valid:1;
> + /** @multi_queue.is_primary: Is primary queue (Q0) of the group */
> + u8 is_primary:1;
> + } multi_queue;
> +
> /** @sched_props: scheduling properties */
> struct {
> /** @sched_props.timeslice_us: timeslice period in micro-seconds */
> diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
> index 47853659a705..d903b3a55ec1 100644
> --- a/include/uapi/drm/xe_drm.h
> +++ b/include/uapi/drm/xe_drm.h
> @@ -1252,6 +1252,12 @@ struct drm_xe_vm_bind {
> * Given that going into a power-saving state kills PXP HWDRM sessions,
> * runtime PM will be blocked while queues of this type are alive.
> * All PXP queues will be killed if a PXP invalidation event occurs.
> + * - %DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_GROUP - Create a multi-queue group
> + * or add secondary queues to a multi-queue group.
> + * If the extension's 'value' field has %DRM_XE_MULTI_GROUP_CREATE flag set,
> + * then a new multi-queue group is created with this queue as the primary queue
> + * (Q0). Otherwise, the queue gets added to the multi-queue group whose primary
> + * queue id is specified in the 'value' field.
> *
> * The example below shows how to use @drm_xe_exec_queue_create to create
> * a simple exec_queue (no parallel submission) of class
> @@ -1292,6 +1298,8 @@ struct drm_xe_exec_queue_create {
> #define DRM_XE_EXEC_QUEUE_SET_PROPERTY_PRIORITY 0
> #define DRM_XE_EXEC_QUEUE_SET_PROPERTY_TIMESLICE 1
> #define DRM_XE_EXEC_QUEUE_SET_PROPERTY_PXP_TYPE 2
> +#define DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_GROUP 3
> +#define DRM_XE_MULTI_GROUP_CREATE (1ull << 63)
> /** @extensions: Pointer to the first extension struct, if any */
> __u64 extensions;
>
> --
> 2.43.0
>
^ permalink raw reply [flat|nested] 61+ messages in thread
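For reference, a minimal userspace sketch of the uAPI proposed in the patch
above. It assumes fd, vm_id and instance have already been set up, uses the
extension struct layout from include/uapi/drm/xe_drm.h, and omits error
handling; it is illustrative only, not a tested snippet:

struct drm_xe_ext_set_property ext = {
	.base.name = DRM_XE_EXEC_QUEUE_EXTENSION_SET_PROPERTY,
	.property = DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_GROUP,
	.value = DRM_XE_MULTI_GROUP_CREATE,
};
struct drm_xe_exec_queue_create create = {
	.extensions = (uintptr_t)&ext,
	.width = 1,
	.num_placements = 1,
	.vm_id = vm_id,
	.instances = (uintptr_t)&instance,
};

/* Create the primary queue (Q0) of a new multi queue group */
ioctl(fd, DRM_IOCTL_XE_EXEC_QUEUE_CREATE, &create);

/*
 * Add a secondary queue to the group: the extension value carries the
 * primary queue id (the upper 32 bits must be 0).
 */
ext.value = create.exec_queue_id;
ioctl(fd, DRM_IOCTL_XE_EXEC_QUEUE_CREATE, &create);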
* ✗ CI.checkpatch: warning for drm/xe: Multi Queue feature support
2025-10-31 18:29 [PATCH 00/16] drm/xe: Multi Queue feature support Niranjana Vishwanathapura
` (16 preceding siblings ...)
2025-10-31 18:47 ` [PATCH 00/16] drm/xe: Multi Queue feature support Niranjana Vishwanathapura
@ 2025-10-31 21:15 ` Patchwork
2025-10-31 21:16 ` ✓ CI.KUnit: success " Patchwork
` (2 subsequent siblings)
20 siblings, 0 replies; 61+ messages in thread
From: Patchwork @ 2025-10-31 21:15 UTC (permalink / raw)
To: Niranjana Vishwanathapura; +Cc: intel-xe
== Series Details ==
Series: drm/xe: Multi Queue feature support
URL : https://patchwork.freedesktop.org/series/156865/
State : warning
== Summary ==
+ KERNEL=/kernel
+ git clone https://gitlab.freedesktop.org/drm/maintainer-tools mt
Cloning into 'mt'...
warning: redirecting to https://gitlab.freedesktop.org/drm/maintainer-tools.git/
+ git -C mt rev-list -n1 origin/master
f867e605613af1770f90c4b0afd4a8f06424d1f0
+ cd /kernel
+ git config --global --add safe.directory /kernel
+ git log -n1
commit 7f2c4e4148978379ac7e7be3e30795e3b675272d
Author: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Date: Fri Oct 31 11:29:36 2025 -0700
drm/xe/multi_queue: Enable multi_queue on xe3p_xpc
xe3p_xpc supports multi_queue, enable it.
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
+ /mt/dim checkpatch d122adb64f56e02d1d9aa432f1a609864e3c33cb drm-intel
b7db5f92a064 drm/xe/multi_queue: Add multi_queue_enable_mask to gt information
380cdbcabd2c drm/xe/multi_queue: Add user interface for multi queue support
54808a2f7616 drm/xe/multi_queue: Add GuC interface for multi queue support
69a64d11a631 drm/xe/multi_queue: Add multi queue priority property
ad9a31d47fb6 drm/xe/multi_queue: Handle invalid exec queue property setting
90af65670a6e drm/xe/multi_queue: Add exec_queue set_property ioctl support
-:98: WARNING:LONG_LINE: line length of 145 exceeds 100 columns
#98: FILE: include/uapi/drm/xe_drm.h:127:
+#define DRM_IOCTL_XE_EXEC_QUEUE_SET_PROPERTY DRM_IOW(DRM_COMMAND_BASE + DRM_XE_EXEC_QUEUE_SET_PROPERTY, struct drm_xe_exec_queue_set_property)
total: 0 errors, 1 warnings, 0 checks, 95 lines checked
c07ef2959dcc drm/xe/multi_queue: Add support for multi queue dynamic priority change
0210f9e5e3e5 drm/xe/multi_queue: Add multi queue information to guc_info dump
f72ebca322ec drm/xe/multi_queue: Handle tearing down of a multi queue
752c4f175994 drm/xe/multi_queue: Set QUEUE_DRAIN_MODE for Multi Queue batches
b1bf0dbb43e6 drm/xe/multi_queue: Handle CGP context error
35c5f77f5a51 drm/xe/multi_queue: Tracepoint support
-:47: CHECK:OPEN_ENDED_LINE: Lines should not end with a '('
#47: FILE: drivers/gpu/drm/xe/xe_trace.h:105:
+ TP_STRUCT__entry(
-:59: CHECK:OPEN_ENDED_LINE: Lines should not end with a '('
#59: FILE: drivers/gpu/drm/xe/xe_trace.h:117:
+ TP_fast_assign(
total: 0 errors, 0 warnings, 2 checks, 69 lines checked
569444bcfa7c drm/xe/multi_queue: Support active group after primary is destroyed
a63f41633cd2 drm/xe/doc: Add documentation for Multi Queue Group
aafec7acf6ba drm/xe/doc: Add documentation for Multi Queue Group GuC interface
7f2c4e414897 drm/xe/multi_queue: Enable multi_queue on xe3p_xpc
^ permalink raw reply [flat|nested] 61+ messages in thread
* ✓ CI.KUnit: success for drm/xe: Multi Queue feature support
2025-10-31 18:29 [PATCH 00/16] drm/xe: Multi Queue feature support Niranjana Vishwanathapura
` (17 preceding siblings ...)
2025-10-31 21:15 ` ✗ CI.checkpatch: warning for " Patchwork
@ 2025-10-31 21:16 ` Patchwork
2025-10-31 22:19 ` ✗ Xe.CI.BAT: failure " Patchwork
2025-11-01 11:25 ` ✗ Xe.CI.Full: " Patchwork
20 siblings, 0 replies; 61+ messages in thread
From: Patchwork @ 2025-10-31 21:16 UTC (permalink / raw)
To: Niranjana Vishwanathapura; +Cc: intel-xe
== Series Details ==
Series: drm/xe: Multi Queue feature support
URL : https://patchwork.freedesktop.org/series/156865/
State : success
== Summary ==
+ trap cleanup EXIT
+ /kernel/tools/testing/kunit/kunit.py run --kunitconfig /kernel/drivers/gpu/drm/xe/.kunitconfig
[21:15:36] Configuring KUnit Kernel ...
Generating .config ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
[21:15:40] Building KUnit Kernel ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
Building with:
$ make all compile_commands.json scripts_gdb ARCH=um O=.kunit --jobs=48
[21:16:11] Starting KUnit Kernel (1/1)...
[21:16:11] ============================================================
Running tests with:
$ .kunit/linux kunit.enable=1 mem=1G console=tty kunit_shutdown=halt
[21:16:11] ================== guc_buf (11 subtests) ===================
[21:16:11] [PASSED] test_smallest
[21:16:11] [PASSED] test_largest
[21:16:11] [PASSED] test_granular
[21:16:11] [PASSED] test_unique
[21:16:11] [PASSED] test_overlap
[21:16:11] [PASSED] test_reusable
[21:16:11] [PASSED] test_too_big
[21:16:11] [PASSED] test_flush
[21:16:11] [PASSED] test_lookup
[21:16:11] [PASSED] test_data
[21:16:11] [PASSED] test_class
[21:16:11] ===================== [PASSED] guc_buf =====================
[21:16:11] =================== guc_dbm (7 subtests) ===================
[21:16:11] [PASSED] test_empty
[21:16:11] [PASSED] test_default
[21:16:11] ======================== test_size ========================
[21:16:11] [PASSED] 4
[21:16:11] [PASSED] 8
[21:16:11] [PASSED] 32
[21:16:11] [PASSED] 256
[21:16:11] ==================== [PASSED] test_size ====================
[21:16:11] ======================= test_reuse ========================
[21:16:11] [PASSED] 4
[21:16:11] [PASSED] 8
[21:16:11] [PASSED] 32
[21:16:11] [PASSED] 256
[21:16:11] =================== [PASSED] test_reuse ====================
[21:16:11] =================== test_range_overlap ====================
[21:16:11] [PASSED] 4
[21:16:11] [PASSED] 8
[21:16:11] [PASSED] 32
[21:16:11] [PASSED] 256
[21:16:11] =============== [PASSED] test_range_overlap ================
[21:16:11] =================== test_range_compact ====================
[21:16:11] [PASSED] 4
[21:16:11] [PASSED] 8
[21:16:11] [PASSED] 32
[21:16:11] [PASSED] 256
[21:16:11] =============== [PASSED] test_range_compact ================
[21:16:11] ==================== test_range_spare =====================
[21:16:11] [PASSED] 4
[21:16:11] [PASSED] 8
[21:16:11] [PASSED] 32
[21:16:11] [PASSED] 256
[21:16:11] ================ [PASSED] test_range_spare =================
[21:16:11] ===================== [PASSED] guc_dbm =====================
[21:16:11] =================== guc_idm (6 subtests) ===================
[21:16:11] [PASSED] bad_init
[21:16:11] [PASSED] no_init
[21:16:11] [PASSED] init_fini
[21:16:11] [PASSED] check_used
[21:16:11] [PASSED] check_quota
[21:16:11] [PASSED] check_all
[21:16:11] ===================== [PASSED] guc_idm =====================
[21:16:11] ================== no_relay (3 subtests) ===================
[21:16:11] [PASSED] xe_drops_guc2pf_if_not_ready
[21:16:11] [PASSED] xe_drops_guc2vf_if_not_ready
[21:16:11] [PASSED] xe_rejects_send_if_not_ready
[21:16:11] ==================== [PASSED] no_relay =====================
[21:16:11] ================== pf_relay (14 subtests) ==================
[21:16:11] [PASSED] pf_rejects_guc2pf_too_short
[21:16:11] [PASSED] pf_rejects_guc2pf_too_long
[21:16:11] [PASSED] pf_rejects_guc2pf_no_payload
[21:16:11] [PASSED] pf_fails_no_payload
[21:16:11] [PASSED] pf_fails_bad_origin
[21:16:11] [PASSED] pf_fails_bad_type
[21:16:11] [PASSED] pf_txn_reports_error
[21:16:11] [PASSED] pf_txn_sends_pf2guc
[21:16:11] [PASSED] pf_sends_pf2guc
[21:16:11] [SKIPPED] pf_loopback_nop
[21:16:11] [SKIPPED] pf_loopback_echo
[21:16:11] [SKIPPED] pf_loopback_fail
[21:16:11] [SKIPPED] pf_loopback_busy
[21:16:11] [SKIPPED] pf_loopback_retry
[21:16:11] ==================== [PASSED] pf_relay =====================
[21:16:11] ================== vf_relay (3 subtests) ===================
[21:16:11] [PASSED] vf_rejects_guc2vf_too_short
[21:16:11] [PASSED] vf_rejects_guc2vf_too_long
[21:16:11] [PASSED] vf_rejects_guc2vf_no_payload
[21:16:11] ==================== [PASSED] vf_relay =====================
[21:16:11] ===================== lmtt (1 subtest) =====================
[21:16:11] ======================== test_ops =========================
[21:16:11] [PASSED] 2-level
[21:16:11] [PASSED] multi-level
[21:16:11] ==================== [PASSED] test_ops =====================
[21:16:11] ====================== [PASSED] lmtt =======================
[21:16:11] ================= pf_service (11 subtests) =================
[21:16:11] [PASSED] pf_negotiate_any
[21:16:11] [PASSED] pf_negotiate_base_match
[21:16:11] [PASSED] pf_negotiate_base_newer
[21:16:11] [PASSED] pf_negotiate_base_next
[21:16:11] [SKIPPED] pf_negotiate_base_older
[21:16:11] [PASSED] pf_negotiate_base_prev
[21:16:11] [PASSED] pf_negotiate_latest_match
[21:16:11] [PASSED] pf_negotiate_latest_newer
[21:16:11] [PASSED] pf_negotiate_latest_next
[21:16:11] [SKIPPED] pf_negotiate_latest_older
[21:16:11] [SKIPPED] pf_negotiate_latest_prev
[21:16:11] =================== [PASSED] pf_service ====================
[21:16:11] ================= xe_guc_g2g (2 subtests) ==================
[21:16:11] ============== xe_live_guc_g2g_kunit_default ==============
[21:16:11] ========= [SKIPPED] xe_live_guc_g2g_kunit_default ==========
[21:16:11] ============== xe_live_guc_g2g_kunit_allmem ===============
[21:16:11] ========== [SKIPPED] xe_live_guc_g2g_kunit_allmem ==========
[21:16:11] =================== [SKIPPED] xe_guc_g2g ===================
[21:16:11] =================== xe_mocs (2 subtests) ===================
[21:16:11] ================ xe_live_mocs_kernel_kunit ================
[21:16:11] =========== [SKIPPED] xe_live_mocs_kernel_kunit ============
[21:16:11] ================ xe_live_mocs_reset_kunit =================
[21:16:11] ============ [SKIPPED] xe_live_mocs_reset_kunit ============
[21:16:11] ==================== [SKIPPED] xe_mocs =====================
[21:16:11] ================= xe_migrate (2 subtests) ==================
[21:16:11] ================= xe_migrate_sanity_kunit =================
[21:16:11] ============ [SKIPPED] xe_migrate_sanity_kunit =============
[21:16:11] ================== xe_validate_ccs_kunit ==================
[21:16:11] ============= [SKIPPED] xe_validate_ccs_kunit ==============
[21:16:11] =================== [SKIPPED] xe_migrate ===================
[21:16:11] ================== xe_dma_buf (1 subtest) ==================
[21:16:11] ==================== xe_dma_buf_kunit =====================
[21:16:11] ================ [SKIPPED] xe_dma_buf_kunit ================
[21:16:11] =================== [SKIPPED] xe_dma_buf ===================
[21:16:11] ================= xe_bo_shrink (1 subtest) =================
[21:16:11] =================== xe_bo_shrink_kunit ====================
[21:16:11] =============== [SKIPPED] xe_bo_shrink_kunit ===============
[21:16:11] ================== [SKIPPED] xe_bo_shrink ==================
[21:16:11] ==================== xe_bo (2 subtests) ====================
[21:16:11] ================== xe_ccs_migrate_kunit ===================
[21:16:11] ============== [SKIPPED] xe_ccs_migrate_kunit ==============
[21:16:11] ==================== xe_bo_evict_kunit ====================
[21:16:11] =============== [SKIPPED] xe_bo_evict_kunit ================
[21:16:11] ===================== [SKIPPED] xe_bo ======================
[21:16:11] ==================== args (11 subtests) ====================
[21:16:11] [PASSED] count_args_test
[21:16:11] [PASSED] call_args_example
[21:16:11] [PASSED] call_args_test
[21:16:11] [PASSED] drop_first_arg_example
[21:16:11] [PASSED] drop_first_arg_test
[21:16:11] [PASSED] first_arg_example
[21:16:11] [PASSED] first_arg_test
[21:16:11] [PASSED] last_arg_example
[21:16:11] [PASSED] last_arg_test
[21:16:11] [PASSED] pick_arg_example
[21:16:11] [PASSED] sep_comma_example
[21:16:11] ====================== [PASSED] args =======================
[21:16:11] =================== xe_pci (3 subtests) ====================
[21:16:11] ==================== check_graphics_ip ====================
[21:16:11] [PASSED] 12.00 Xe_LP
[21:16:11] [PASSED] 12.10 Xe_LP+
[21:16:11] [PASSED] 12.55 Xe_HPG
[21:16:11] [PASSED] 12.60 Xe_HPC
[21:16:11] [PASSED] 12.70 Xe_LPG
[21:16:11] [PASSED] 12.71 Xe_LPG
[21:16:11] [PASSED] 12.74 Xe_LPG+
[21:16:11] [PASSED] 20.01 Xe2_HPG
[21:16:11] [PASSED] 20.02 Xe2_HPG
[21:16:11] [PASSED] 20.04 Xe2_LPG
[21:16:11] [PASSED] 30.00 Xe3_LPG
[21:16:11] [PASSED] 30.01 Xe3_LPG
[21:16:11] [PASSED] 30.03 Xe3_LPG
[21:16:11] [PASSED] 30.04 Xe3_LPG
[21:16:11] [PASSED] 30.05 Xe3_LPG
[21:16:11] [PASSED] 35.11 Xe3p_XPC
[21:16:11] ================ [PASSED] check_graphics_ip ================
[21:16:11] ===================== check_media_ip ======================
[21:16:11] [PASSED] 12.00 Xe_M
[21:16:11] [PASSED] 12.55 Xe_HPM
[21:16:11] [PASSED] 13.00 Xe_LPM+
[21:16:11] [PASSED] 13.01 Xe2_HPM
[21:16:11] [PASSED] 20.00 Xe2_LPM
[21:16:11] [PASSED] 30.00 Xe3_LPM
[21:16:11] [PASSED] 30.02 Xe3_LPM
[21:16:11] [PASSED] 35.00 Xe3p_LPM
[21:16:11] [PASSED] 35.03 Xe3p_HPM
[21:16:11] ================= [PASSED] check_media_ip ==================
[21:16:11] =================== check_platform_desc ===================
[21:16:11] [PASSED] 0x9A60 (TIGERLAKE)
[21:16:11] [PASSED] 0x9A68 (TIGERLAKE)
[21:16:11] [PASSED] 0x9A70 (TIGERLAKE)
[21:16:11] [PASSED] 0x9A40 (TIGERLAKE)
[21:16:11] [PASSED] 0x9A49 (TIGERLAKE)
[21:16:11] [PASSED] 0x9A59 (TIGERLAKE)
[21:16:11] [PASSED] 0x9A78 (TIGERLAKE)
[21:16:11] [PASSED] 0x9AC0 (TIGERLAKE)
[21:16:11] [PASSED] 0x9AC9 (TIGERLAKE)
[21:16:11] [PASSED] 0x9AD9 (TIGERLAKE)
[21:16:11] [PASSED] 0x9AF8 (TIGERLAKE)
[21:16:11] [PASSED] 0x4C80 (ROCKETLAKE)
[21:16:11] [PASSED] 0x4C8A (ROCKETLAKE)
[21:16:11] [PASSED] 0x4C8B (ROCKETLAKE)
[21:16:11] [PASSED] 0x4C8C (ROCKETLAKE)
[21:16:11] [PASSED] 0x4C90 (ROCKETLAKE)
[21:16:11] [PASSED] 0x4C9A (ROCKETLAKE)
[21:16:11] [PASSED] 0x4680 (ALDERLAKE_S)
[21:16:11] [PASSED] 0x4682 (ALDERLAKE_S)
[21:16:11] [PASSED] 0x4688 (ALDERLAKE_S)
[21:16:11] [PASSED] 0x468A (ALDERLAKE_S)
[21:16:11] [PASSED] 0x468B (ALDERLAKE_S)
[21:16:11] [PASSED] 0x4690 (ALDERLAKE_S)
[21:16:11] [PASSED] 0x4692 (ALDERLAKE_S)
[21:16:11] [PASSED] 0x4693 (ALDERLAKE_S)
[21:16:11] [PASSED] 0x46A0 (ALDERLAKE_P)
[21:16:11] [PASSED] 0x46A1 (ALDERLAKE_P)
[21:16:11] [PASSED] 0x46A2 (ALDERLAKE_P)
[21:16:11] [PASSED] 0x46A3 (ALDERLAKE_P)
[21:16:11] [PASSED] 0x46A6 (ALDERLAKE_P)
[21:16:11] [PASSED] 0x46A8 (ALDERLAKE_P)
[21:16:11] [PASSED] 0x46AA (ALDERLAKE_P)
[21:16:11] [PASSED] 0x462A (ALDERLAKE_P)
[21:16:11] [PASSED] 0x4626 (ALDERLAKE_P)
[21:16:11] [PASSED] 0x4628 (ALDERLAKE_P)
[21:16:11] [PASSED] 0x46B0 (ALDERLAKE_P)
[21:16:11] [PASSED] 0x46B1 (ALDERLAKE_P)
[21:16:11] [PASSED] 0x46B2 (ALDERLAKE_P)
[21:16:11] [PASSED] 0x46B3 (ALDERLAKE_P)
[21:16:11] [PASSED] 0x46C0 (ALDERLAKE_P)
[21:16:11] [PASSED] 0x46C1 (ALDERLAKE_P)
[21:16:11] [PASSED] 0x46C2 (ALDERLAKE_P)
[21:16:11] [PASSED] 0x46C3 (ALDERLAKE_P)
[21:16:11] [PASSED] 0x46D0 (ALDERLAKE_N)
[21:16:11] [PASSED] 0x46D1 (ALDERLAKE_N)
[21:16:11] [PASSED] 0x46D2 (ALDERLAKE_N)
[21:16:11] [PASSED] 0x46D3 (ALDERLAKE_N)
[21:16:11] [PASSED] 0x46D4 (ALDERLAKE_N)
[21:16:11] [PASSED] 0xA721 (ALDERLAKE_P)
[21:16:11] [PASSED] 0xA7A1 (ALDERLAKE_P)
[21:16:11] [PASSED] 0xA7A9 (ALDERLAKE_P)
[21:16:11] [PASSED] 0xA7AC (ALDERLAKE_P)
[21:16:11] [PASSED] 0xA7AD (ALDERLAKE_P)
[21:16:11] [PASSED] 0xA720 (ALDERLAKE_P)
[21:16:11] [PASSED] 0xA7A0 (ALDERLAKE_P)
[21:16:11] [PASSED] 0xA7A8 (ALDERLAKE_P)
[21:16:11] [PASSED] 0xA7AA (ALDERLAKE_P)
[21:16:11] [PASSED] 0xA7AB (ALDERLAKE_P)
[21:16:11] [PASSED] 0xA780 (ALDERLAKE_S)
[21:16:11] [PASSED] 0xA781 (ALDERLAKE_S)
[21:16:11] [PASSED] 0xA782 (ALDERLAKE_S)
[21:16:11] [PASSED] 0xA783 (ALDERLAKE_S)
[21:16:11] [PASSED] 0xA788 (ALDERLAKE_S)
[21:16:11] [PASSED] 0xA789 (ALDERLAKE_S)
[21:16:11] [PASSED] 0xA78A (ALDERLAKE_S)
[21:16:11] [PASSED] 0xA78B (ALDERLAKE_S)
[21:16:11] [PASSED] 0x4905 (DG1)
[21:16:11] [PASSED] 0x4906 (DG1)
[21:16:11] [PASSED] 0x4907 (DG1)
[21:16:11] [PASSED] 0x4908 (DG1)
[21:16:11] [PASSED] 0x4909 (DG1)
[21:16:11] [PASSED] 0x56C0 (DG2)
[21:16:11] [PASSED] 0x56C2 (DG2)
[21:16:11] [PASSED] 0x56C1 (DG2)
[21:16:11] [PASSED] 0x7D51 (METEORLAKE)
[21:16:11] [PASSED] 0x7DD1 (METEORLAKE)
[21:16:11] [PASSED] 0x7D41 (METEORLAKE)
[21:16:11] [PASSED] 0x7D67 (METEORLAKE)
[21:16:11] [PASSED] 0xB640 (METEORLAKE)
[21:16:11] [PASSED] 0x56A0 (DG2)
[21:16:11] [PASSED] 0x56A1 (DG2)
[21:16:11] [PASSED] 0x56A2 (DG2)
[21:16:11] [PASSED] 0x56BE (DG2)
[21:16:11] [PASSED] 0x56BF (DG2)
[21:16:11] [PASSED] 0x5690 (DG2)
[21:16:11] [PASSED] 0x5691 (DG2)
[21:16:11] [PASSED] 0x5692 (DG2)
[21:16:11] [PASSED] 0x56A5 (DG2)
[21:16:11] [PASSED] 0x56A6 (DG2)
[21:16:11] [PASSED] 0x56B0 (DG2)
[21:16:11] [PASSED] 0x56B1 (DG2)
[21:16:11] [PASSED] 0x56BA (DG2)
[21:16:11] [PASSED] 0x56BB (DG2)
[21:16:11] [PASSED] 0x56BC (DG2)
[21:16:11] [PASSED] 0x56BD (DG2)
[21:16:11] [PASSED] 0x5693 (DG2)
[21:16:11] [PASSED] 0x5694 (DG2)
[21:16:11] [PASSED] 0x5695 (DG2)
[21:16:11] [PASSED] 0x56A3 (DG2)
[21:16:11] [PASSED] 0x56A4 (DG2)
[21:16:11] [PASSED] 0x56B2 (DG2)
[21:16:11] [PASSED] 0x56B3 (DG2)
[21:16:11] [PASSED] 0x5696 (DG2)
[21:16:11] [PASSED] 0x5697 (DG2)
[21:16:11] [PASSED] 0xB69 (PVC)
[21:16:11] [PASSED] 0xB6E (PVC)
[21:16:11] [PASSED] 0xBD4 (PVC)
[21:16:11] [PASSED] 0xBD5 (PVC)
[21:16:11] [PASSED] 0xBD6 (PVC)
[21:16:11] [PASSED] 0xBD7 (PVC)
[21:16:11] [PASSED] 0xBD8 (PVC)
[21:16:11] [PASSED] 0xBD9 (PVC)
[21:16:11] [PASSED] 0xBDA (PVC)
[21:16:11] [PASSED] 0xBDB (PVC)
[21:16:11] [PASSED] 0xBE0 (PVC)
[21:16:11] [PASSED] 0xBE1 (PVC)
[21:16:11] [PASSED] 0xBE5 (PVC)
[21:16:11] [PASSED] 0x7D40 (METEORLAKE)
[21:16:11] [PASSED] 0x7D45 (METEORLAKE)
[21:16:11] [PASSED] 0x7D55 (METEORLAKE)
[21:16:11] [PASSED] 0x7D60 (METEORLAKE)
[21:16:11] [PASSED] 0x7DD5 (METEORLAKE)
[21:16:11] [PASSED] 0x6420 (LUNARLAKE)
[21:16:11] [PASSED] 0x64A0 (LUNARLAKE)
[21:16:11] [PASSED] 0x64B0 (LUNARLAKE)
[21:16:11] [PASSED] 0xE202 (BATTLEMAGE)
[21:16:11] [PASSED] 0xE209 (BATTLEMAGE)
[21:16:11] [PASSED] 0xE20B (BATTLEMAGE)
[21:16:11] [PASSED] 0xE20C (BATTLEMAGE)
[21:16:11] [PASSED] 0xE20D (BATTLEMAGE)
[21:16:11] [PASSED] 0xE210 (BATTLEMAGE)
[21:16:11] [PASSED] 0xE211 (BATTLEMAGE)
[21:16:11] [PASSED] 0xE212 (BATTLEMAGE)
[21:16:11] [PASSED] 0xE216 (BATTLEMAGE)
[21:16:11] [PASSED] 0xE220 (BATTLEMAGE)
[21:16:11] [PASSED] 0xE221 (BATTLEMAGE)
[21:16:11] [PASSED] 0xE222 (BATTLEMAGE)
[21:16:11] [PASSED] 0xE223 (BATTLEMAGE)
[21:16:11] [PASSED] 0xB080 (PANTHERLAKE)
[21:16:11] [PASSED] 0xB081 (PANTHERLAKE)
[21:16:11] [PASSED] 0xB082 (PANTHERLAKE)
[21:16:11] [PASSED] 0xB083 (PANTHERLAKE)
[21:16:11] [PASSED] 0xB084 (PANTHERLAKE)
[21:16:11] [PASSED] 0xB085 (PANTHERLAKE)
[21:16:11] [PASSED] 0xB086 (PANTHERLAKE)
[21:16:11] [PASSED] 0xB087 (PANTHERLAKE)
[21:16:11] [PASSED] 0xB08F (PANTHERLAKE)
[21:16:11] [PASSED] 0xB090 (PANTHERLAKE)
[21:16:11] [PASSED] 0xB0A0 (PANTHERLAKE)
[21:16:11] [PASSED] 0xB0B0 (PANTHERLAKE)
[21:16:11] [PASSED] 0xD740 (NOVALAKE_S)
[21:16:11] [PASSED] 0xD741 (NOVALAKE_S)
[21:16:11] [PASSED] 0xD742 (NOVALAKE_S)
[21:16:11] [PASSED] 0xD743 (NOVALAKE_S)
[21:16:11] [PASSED] 0xD744 (NOVALAKE_S)
[21:16:11] [PASSED] 0xD745 (NOVALAKE_S)
[21:16:11] [PASSED] 0x674C (CRESCENTISLAND)
[21:16:11] [PASSED] 0xFD80 (PANTHERLAKE)
[21:16:11] [PASSED] 0xFD81 (PANTHERLAKE)
[21:16:11] =============== [PASSED] check_platform_desc ===============
[21:16:11] ===================== [PASSED] xe_pci ======================
[21:16:11] =================== xe_rtp (2 subtests) ====================
[21:16:11] =============== xe_rtp_process_to_sr_tests ================
[21:16:11] [PASSED] coalesce-same-reg
[21:16:11] [PASSED] no-match-no-add
[21:16:11] [PASSED] match-or
[21:16:11] [PASSED] match-or-xfail
[21:16:11] [PASSED] no-match-no-add-multiple-rules
[21:16:11] [PASSED] two-regs-two-entries
[21:16:11] [PASSED] clr-one-set-other
[21:16:11] [PASSED] set-field
[21:16:11] [PASSED] conflict-duplicate
[21:16:11] [PASSED] conflict-not-disjoint
[21:16:11] [PASSED] conflict-reg-type
[21:16:11] =========== [PASSED] xe_rtp_process_to_sr_tests ============
[21:16:11] ================== xe_rtp_process_tests ===================
[21:16:11] [PASSED] active1
[21:16:11] [PASSED] active2
[21:16:11] [PASSED] active-inactive
[21:16:11] [PASSED] inactive-active
[21:16:11] [PASSED] inactive-1st_or_active-inactive
[21:16:11] [PASSED] inactive-2nd_or_active-inactive
[21:16:11] [PASSED] inactive-last_or_active-inactive
stty: 'standard input': Inappropriate ioctl for device
[21:16:11] [PASSED] inactive-no_or_active-inactive
[21:16:11] ============== [PASSED] xe_rtp_process_tests ===============
[21:16:11] ===================== [PASSED] xe_rtp ======================
[21:16:11] ==================== xe_wa (1 subtest) =====================
[21:16:11] ======================== xe_wa_gt =========================
[21:16:11] [PASSED] TIGERLAKE B0
[21:16:11] [PASSED] DG1 A0
[21:16:11] [PASSED] DG1 B0
[21:16:11] [PASSED] ALDERLAKE_S A0
[21:16:11] [PASSED] ALDERLAKE_S B0
[21:16:11] [PASSED] ALDERLAKE_S C0
[21:16:11] [PASSED] ALDERLAKE_S D0
[21:16:11] [PASSED] ALDERLAKE_P A0
[21:16:11] [PASSED] ALDERLAKE_P B0
[21:16:11] [PASSED] ALDERLAKE_P C0
[21:16:11] [PASSED] ALDERLAKE_S RPLS D0
[21:16:11] [PASSED] ALDERLAKE_P RPLU E0
[21:16:11] [PASSED] DG2 G10 C0
[21:16:11] [PASSED] DG2 G11 B1
[21:16:11] [PASSED] DG2 G12 A1
[21:16:11] [PASSED] METEORLAKE 12.70(Xe_LPG) A0 13.00(Xe_LPM+) A0
[21:16:11] [PASSED] METEORLAKE 12.71(Xe_LPG) A0 13.00(Xe_LPM+) A0
[21:16:11] [PASSED] METEORLAKE 12.74(Xe_LPG+) A0 13.00(Xe_LPM+) A0
[21:16:11] [PASSED] LUNARLAKE 20.04(Xe2_LPG) A0 20.00(Xe2_LPM) A0
[21:16:11] [PASSED] LUNARLAKE 20.04(Xe2_LPG) B0 20.00(Xe2_LPM) A0
[21:16:11] [PASSED] BATTLEMAGE 20.01(Xe2_HPG) A0 13.01(Xe2_HPM) A1
[21:16:11] [PASSED] PANTHERLAKE 30.00(Xe3_LPG) A0 30.00(Xe3_LPM) A0
[21:16:11] ==================== [PASSED] xe_wa_gt =====================
[21:16:11] ====================== [PASSED] xe_wa ======================
[21:16:11] ============================================================
[21:16:11] Testing complete. Ran 318 tests: passed: 300, skipped: 18
[21:16:11] Elapsed time: 34.878s total, 4.239s configuring, 30.272s building, 0.335s running
+ /kernel/tools/testing/kunit/kunit.py run --kunitconfig /kernel/drivers/gpu/drm/tests/.kunitconfig
[21:16:11] Configuring KUnit Kernel ...
Regenerating .config ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
[21:16:13] Building KUnit Kernel ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
Building with:
$ make all compile_commands.json scripts_gdb ARCH=um O=.kunit --jobs=48
[21:16:37] Starting KUnit Kernel (1/1)...
[21:16:37] ============================================================
Running tests with:
$ .kunit/linux kunit.enable=1 mem=1G console=tty kunit_shutdown=halt
[21:16:37] ============ drm_test_pick_cmdline (2 subtests) ============
[21:16:37] [PASSED] drm_test_pick_cmdline_res_1920_1080_60
[21:16:37] =============== drm_test_pick_cmdline_named ===============
[21:16:37] [PASSED] NTSC
[21:16:37] [PASSED] NTSC-J
[21:16:37] [PASSED] PAL
[21:16:37] [PASSED] PAL-M
[21:16:37] =========== [PASSED] drm_test_pick_cmdline_named ===========
[21:16:37] ============== [PASSED] drm_test_pick_cmdline ==============
[21:16:37] == drm_test_atomic_get_connector_for_encoder (1 subtest) ===
[21:16:37] [PASSED] drm_test_drm_atomic_get_connector_for_encoder
[21:16:37] ==== [PASSED] drm_test_atomic_get_connector_for_encoder ====
[21:16:37] =========== drm_validate_clone_mode (2 subtests) ===========
[21:16:37] ============== drm_test_check_in_clone_mode ===============
[21:16:37] [PASSED] in_clone_mode
[21:16:37] [PASSED] not_in_clone_mode
[21:16:37] ========== [PASSED] drm_test_check_in_clone_mode ===========
[21:16:37] =============== drm_test_check_valid_clones ===============
[21:16:37] [PASSED] not_in_clone_mode
[21:16:37] [PASSED] valid_clone
[21:16:37] [PASSED] invalid_clone
[21:16:37] =========== [PASSED] drm_test_check_valid_clones ===========
[21:16:37] ============= [PASSED] drm_validate_clone_mode =============
[21:16:37] ============= drm_validate_modeset (1 subtest) =============
[21:16:37] [PASSED] drm_test_check_connector_changed_modeset
[21:16:37] ============== [PASSED] drm_validate_modeset ===============
[21:16:37] ====== drm_test_bridge_get_current_state (2 subtests) ======
[21:16:37] [PASSED] drm_test_drm_bridge_get_current_state_atomic
[21:16:37] [PASSED] drm_test_drm_bridge_get_current_state_legacy
[21:16:37] ======== [PASSED] drm_test_bridge_get_current_state ========
[21:16:37] ====== drm_test_bridge_helper_reset_crtc (3 subtests) ======
[21:16:37] [PASSED] drm_test_drm_bridge_helper_reset_crtc_atomic
[21:16:37] [PASSED] drm_test_drm_bridge_helper_reset_crtc_atomic_disabled
[21:16:37] [PASSED] drm_test_drm_bridge_helper_reset_crtc_legacy
[21:16:38] ======== [PASSED] drm_test_bridge_helper_reset_crtc ========
[21:16:38] ============== drm_bridge_alloc (2 subtests) ===============
[21:16:38] [PASSED] drm_test_drm_bridge_alloc_basic
[21:16:38] [PASSED] drm_test_drm_bridge_alloc_get_put
[21:16:38] ================ [PASSED] drm_bridge_alloc =================
[21:16:38] ================== drm_buddy (8 subtests) ==================
[21:16:38] [PASSED] drm_test_buddy_alloc_limit
[21:16:38] [PASSED] drm_test_buddy_alloc_optimistic
[21:16:38] [PASSED] drm_test_buddy_alloc_pessimistic
[21:16:38] [PASSED] drm_test_buddy_alloc_pathological
[21:16:38] [PASSED] drm_test_buddy_alloc_contiguous
[21:16:38] [PASSED] drm_test_buddy_alloc_clear
[21:16:38] [PASSED] drm_test_buddy_alloc_range_bias
[21:16:38] [PASSED] drm_test_buddy_fragmentation_performance
[21:16:38] ==================== [PASSED] drm_buddy ====================
[21:16:38] ============= drm_cmdline_parser (40 subtests) =============
[21:16:38] [PASSED] drm_test_cmdline_force_d_only
[21:16:38] [PASSED] drm_test_cmdline_force_D_only_dvi
[21:16:38] [PASSED] drm_test_cmdline_force_D_only_hdmi
[21:16:38] [PASSED] drm_test_cmdline_force_D_only_not_digital
[21:16:38] [PASSED] drm_test_cmdline_force_e_only
[21:16:38] [PASSED] drm_test_cmdline_res
[21:16:38] [PASSED] drm_test_cmdline_res_vesa
[21:16:38] [PASSED] drm_test_cmdline_res_vesa_rblank
[21:16:38] [PASSED] drm_test_cmdline_res_rblank
[21:16:38] [PASSED] drm_test_cmdline_res_bpp
[21:16:38] [PASSED] drm_test_cmdline_res_refresh
[21:16:38] [PASSED] drm_test_cmdline_res_bpp_refresh
[21:16:38] [PASSED] drm_test_cmdline_res_bpp_refresh_interlaced
[21:16:38] [PASSED] drm_test_cmdline_res_bpp_refresh_margins
[21:16:38] [PASSED] drm_test_cmdline_res_bpp_refresh_force_off
[21:16:38] [PASSED] drm_test_cmdline_res_bpp_refresh_force_on
[21:16:38] [PASSED] drm_test_cmdline_res_bpp_refresh_force_on_analog
[21:16:38] [PASSED] drm_test_cmdline_res_bpp_refresh_force_on_digital
[21:16:38] [PASSED] drm_test_cmdline_res_bpp_refresh_interlaced_margins_force_on
[21:16:38] [PASSED] drm_test_cmdline_res_margins_force_on
[21:16:38] [PASSED] drm_test_cmdline_res_vesa_margins
[21:16:38] [PASSED] drm_test_cmdline_name
[21:16:38] [PASSED] drm_test_cmdline_name_bpp
[21:16:38] [PASSED] drm_test_cmdline_name_option
[21:16:38] [PASSED] drm_test_cmdline_name_bpp_option
[21:16:38] [PASSED] drm_test_cmdline_rotate_0
[21:16:38] [PASSED] drm_test_cmdline_rotate_90
[21:16:38] [PASSED] drm_test_cmdline_rotate_180
[21:16:38] [PASSED] drm_test_cmdline_rotate_270
[21:16:38] [PASSED] drm_test_cmdline_hmirror
[21:16:38] [PASSED] drm_test_cmdline_vmirror
[21:16:38] [PASSED] drm_test_cmdline_margin_options
[21:16:38] [PASSED] drm_test_cmdline_multiple_options
[21:16:38] [PASSED] drm_test_cmdline_bpp_extra_and_option
[21:16:38] [PASSED] drm_test_cmdline_extra_and_option
[21:16:38] [PASSED] drm_test_cmdline_freestanding_options
[21:16:38] [PASSED] drm_test_cmdline_freestanding_force_e_and_options
[21:16:38] [PASSED] drm_test_cmdline_panel_orientation
[21:16:38] ================ drm_test_cmdline_invalid =================
[21:16:38] [PASSED] margin_only
[21:16:38] [PASSED] interlace_only
[21:16:38] [PASSED] res_missing_x
[21:16:38] [PASSED] res_missing_y
[21:16:38] [PASSED] res_bad_y
[21:16:38] [PASSED] res_missing_y_bpp
[21:16:38] [PASSED] res_bad_bpp
[21:16:38] [PASSED] res_bad_refresh
[21:16:38] [PASSED] res_bpp_refresh_force_on_off
[21:16:38] [PASSED] res_invalid_mode
[21:16:38] [PASSED] res_bpp_wrong_place_mode
[21:16:38] [PASSED] name_bpp_refresh
[21:16:38] [PASSED] name_refresh
[21:16:38] [PASSED] name_refresh_wrong_mode
[21:16:38] [PASSED] name_refresh_invalid_mode
[21:16:38] [PASSED] rotate_multiple
[21:16:38] [PASSED] rotate_invalid_val
[21:16:38] [PASSED] rotate_truncated
[21:16:38] [PASSED] invalid_option
[21:16:38] [PASSED] invalid_tv_option
[21:16:38] [PASSED] truncated_tv_option
[21:16:38] ============ [PASSED] drm_test_cmdline_invalid =============
[21:16:38] =============== drm_test_cmdline_tv_options ===============
[21:16:38] [PASSED] NTSC
[21:16:38] [PASSED] NTSC_443
[21:16:38] [PASSED] NTSC_J
[21:16:38] [PASSED] PAL
[21:16:38] [PASSED] PAL_M
[21:16:38] [PASSED] PAL_N
[21:16:38] [PASSED] SECAM
[21:16:38] [PASSED] MONO_525
[21:16:38] [PASSED] MONO_625
[21:16:38] =========== [PASSED] drm_test_cmdline_tv_options ===========
[21:16:38] =============== [PASSED] drm_cmdline_parser ================
[21:16:38] ========== drmm_connector_hdmi_init (20 subtests) ==========
[21:16:38] [PASSED] drm_test_connector_hdmi_init_valid
[21:16:38] [PASSED] drm_test_connector_hdmi_init_bpc_8
[21:16:38] [PASSED] drm_test_connector_hdmi_init_bpc_10
[21:16:38] [PASSED] drm_test_connector_hdmi_init_bpc_12
[21:16:38] [PASSED] drm_test_connector_hdmi_init_bpc_invalid
[21:16:38] [PASSED] drm_test_connector_hdmi_init_bpc_null
[21:16:38] [PASSED] drm_test_connector_hdmi_init_formats_empty
[21:16:38] [PASSED] drm_test_connector_hdmi_init_formats_no_rgb
[21:16:38] === drm_test_connector_hdmi_init_formats_yuv420_allowed ===
[21:16:38] [PASSED] supported_formats=0x9 yuv420_allowed=1
[21:16:38] [PASSED] supported_formats=0x9 yuv420_allowed=0
[21:16:38] [PASSED] supported_formats=0x3 yuv420_allowed=1
[21:16:38] [PASSED] supported_formats=0x3 yuv420_allowed=0
[21:16:38] === [PASSED] drm_test_connector_hdmi_init_formats_yuv420_allowed ===
[21:16:38] [PASSED] drm_test_connector_hdmi_init_null_ddc
[21:16:38] [PASSED] drm_test_connector_hdmi_init_null_product
[21:16:38] [PASSED] drm_test_connector_hdmi_init_null_vendor
[21:16:38] [PASSED] drm_test_connector_hdmi_init_product_length_exact
[21:16:38] [PASSED] drm_test_connector_hdmi_init_product_length_too_long
[21:16:38] [PASSED] drm_test_connector_hdmi_init_product_valid
[21:16:38] [PASSED] drm_test_connector_hdmi_init_vendor_length_exact
[21:16:38] [PASSED] drm_test_connector_hdmi_init_vendor_length_too_long
[21:16:38] [PASSED] drm_test_connector_hdmi_init_vendor_valid
[21:16:38] ========= drm_test_connector_hdmi_init_type_valid =========
[21:16:38] [PASSED] HDMI-A
[21:16:38] [PASSED] HDMI-B
[21:16:38] ===== [PASSED] drm_test_connector_hdmi_init_type_valid =====
[21:16:38] ======== drm_test_connector_hdmi_init_type_invalid ========
[21:16:38] [PASSED] Unknown
[21:16:38] [PASSED] VGA
[21:16:38] [PASSED] DVI-I
[21:16:38] [PASSED] DVI-D
[21:16:38] [PASSED] DVI-A
[21:16:38] [PASSED] Composite
[21:16:38] [PASSED] SVIDEO
[21:16:38] [PASSED] LVDS
[21:16:38] [PASSED] Component
[21:16:38] [PASSED] DIN
[21:16:38] [PASSED] DP
[21:16:38] [PASSED] TV
[21:16:38] [PASSED] eDP
[21:16:38] [PASSED] Virtual
[21:16:38] [PASSED] DSI
[21:16:38] [PASSED] DPI
[21:16:38] [PASSED] Writeback
[21:16:38] [PASSED] SPI
[21:16:38] [PASSED] USB
[21:16:38] ==== [PASSED] drm_test_connector_hdmi_init_type_invalid ====
[21:16:38] ============ [PASSED] drmm_connector_hdmi_init =============
[21:16:38] ============= drmm_connector_init (3 subtests) =============
[21:16:38] [PASSED] drm_test_drmm_connector_init
[21:16:38] [PASSED] drm_test_drmm_connector_init_null_ddc
[21:16:38] ========= drm_test_drmm_connector_init_type_valid =========
[21:16:38] [PASSED] Unknown
[21:16:38] [PASSED] VGA
[21:16:38] [PASSED] DVI-I
[21:16:38] [PASSED] DVI-D
[21:16:38] [PASSED] DVI-A
[21:16:38] [PASSED] Composite
[21:16:38] [PASSED] SVIDEO
[21:16:38] [PASSED] LVDS
[21:16:38] [PASSED] Component
[21:16:38] [PASSED] DIN
[21:16:38] [PASSED] DP
[21:16:38] [PASSED] HDMI-A
[21:16:38] [PASSED] HDMI-B
[21:16:38] [PASSED] TV
[21:16:38] [PASSED] eDP
[21:16:38] [PASSED] Virtual
[21:16:38] [PASSED] DSI
[21:16:38] [PASSED] DPI
[21:16:38] [PASSED] Writeback
[21:16:38] [PASSED] SPI
[21:16:38] [PASSED] USB
[21:16:38] ===== [PASSED] drm_test_drmm_connector_init_type_valid =====
[21:16:38] =============== [PASSED] drmm_connector_init ===============
[21:16:38] ========= drm_connector_dynamic_init (6 subtests) ==========
[21:16:38] [PASSED] drm_test_drm_connector_dynamic_init
[21:16:38] [PASSED] drm_test_drm_connector_dynamic_init_null_ddc
[21:16:38] [PASSED] drm_test_drm_connector_dynamic_init_not_added
[21:16:38] [PASSED] drm_test_drm_connector_dynamic_init_properties
[21:16:38] ===== drm_test_drm_connector_dynamic_init_type_valid ======
[21:16:38] [PASSED] Unknown
[21:16:38] [PASSED] VGA
[21:16:38] [PASSED] DVI-I
[21:16:38] [PASSED] DVI-D
[21:16:38] [PASSED] DVI-A
[21:16:38] [PASSED] Composite
[21:16:38] [PASSED] SVIDEO
[21:16:38] [PASSED] LVDS
[21:16:38] [PASSED] Component
[21:16:38] [PASSED] DIN
[21:16:38] [PASSED] DP
[21:16:38] [PASSED] HDMI-A
[21:16:38] [PASSED] HDMI-B
[21:16:38] [PASSED] TV
[21:16:38] [PASSED] eDP
[21:16:38] [PASSED] Virtual
[21:16:38] [PASSED] DSI
[21:16:38] [PASSED] DPI
[21:16:38] [PASSED] Writeback
[21:16:38] [PASSED] SPI
[21:16:38] [PASSED] USB
[21:16:38] = [PASSED] drm_test_drm_connector_dynamic_init_type_valid ==
[21:16:38] ======== drm_test_drm_connector_dynamic_init_name =========
[21:16:38] [PASSED] Unknown
[21:16:38] [PASSED] VGA
[21:16:38] [PASSED] DVI-I
[21:16:38] [PASSED] DVI-D
[21:16:38] [PASSED] DVI-A
[21:16:38] [PASSED] Composite
[21:16:38] [PASSED] SVIDEO
[21:16:38] [PASSED] LVDS
[21:16:38] [PASSED] Component
[21:16:38] [PASSED] DIN
[21:16:38] [PASSED] DP
[21:16:38] [PASSED] HDMI-A
[21:16:38] [PASSED] HDMI-B
[21:16:38] [PASSED] TV
[21:16:38] [PASSED] eDP
[21:16:38] [PASSED] Virtual
[21:16:38] [PASSED] DSI
[21:16:38] [PASSED] DPI
[21:16:38] [PASSED] Writeback
[21:16:38] [PASSED] SPI
[21:16:38] [PASSED] USB
[21:16:38] ==== [PASSED] drm_test_drm_connector_dynamic_init_name =====
[21:16:38] =========== [PASSED] drm_connector_dynamic_init ============
[21:16:38] ==== drm_connector_dynamic_register_early (4 subtests) =====
[21:16:38] [PASSED] drm_test_drm_connector_dynamic_register_early_on_list
[21:16:38] [PASSED] drm_test_drm_connector_dynamic_register_early_defer
[21:16:38] [PASSED] drm_test_drm_connector_dynamic_register_early_no_init
[21:16:38] [PASSED] drm_test_drm_connector_dynamic_register_early_no_mode_object
[21:16:38] ====== [PASSED] drm_connector_dynamic_register_early =======
[21:16:38] ======= drm_connector_dynamic_register (7 subtests) ========
[21:16:38] [PASSED] drm_test_drm_connector_dynamic_register_on_list
[21:16:38] [PASSED] drm_test_drm_connector_dynamic_register_no_defer
[21:16:38] [PASSED] drm_test_drm_connector_dynamic_register_no_init
[21:16:38] [PASSED] drm_test_drm_connector_dynamic_register_mode_object
[21:16:38] [PASSED] drm_test_drm_connector_dynamic_register_sysfs
[21:16:38] [PASSED] drm_test_drm_connector_dynamic_register_sysfs_name
[21:16:38] [PASSED] drm_test_drm_connector_dynamic_register_debugfs
[21:16:38] ========= [PASSED] drm_connector_dynamic_register ==========
[21:16:38] = drm_connector_attach_broadcast_rgb_property (2 subtests) =
[21:16:38] [PASSED] drm_test_drm_connector_attach_broadcast_rgb_property
[21:16:38] [PASSED] drm_test_drm_connector_attach_broadcast_rgb_property_hdmi_connector
[21:16:38] === [PASSED] drm_connector_attach_broadcast_rgb_property ===
[21:16:38] ========== drm_get_tv_mode_from_name (2 subtests) ==========
[21:16:38] ========== drm_test_get_tv_mode_from_name_valid ===========
[21:16:38] [PASSED] NTSC
[21:16:38] [PASSED] NTSC-443
[21:16:38] [PASSED] NTSC-J
[21:16:38] [PASSED] PAL
[21:16:38] [PASSED] PAL-M
[21:16:38] [PASSED] PAL-N
[21:16:38] [PASSED] SECAM
[21:16:38] [PASSED] Mono
[21:16:38] ====== [PASSED] drm_test_get_tv_mode_from_name_valid =======
[21:16:38] [PASSED] drm_test_get_tv_mode_from_name_truncated
[21:16:38] ============ [PASSED] drm_get_tv_mode_from_name ============
[21:16:38] = drm_test_connector_hdmi_compute_mode_clock (12 subtests) =
[21:16:38] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb
[21:16:38] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb_10bpc
[21:16:38] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb_10bpc_vic_1
[21:16:38] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb_12bpc
[21:16:38] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb_12bpc_vic_1
[21:16:38] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb_double
[21:16:38] = drm_test_connector_hdmi_compute_mode_clock_yuv420_valid =
[21:16:38] [PASSED] VIC 96
[21:16:38] [PASSED] VIC 97
[21:16:38] [PASSED] VIC 101
[21:16:38] [PASSED] VIC 102
[21:16:38] [PASSED] VIC 106
[21:16:38] [PASSED] VIC 107
[21:16:38] === [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv420_valid ===
[21:16:38] [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv420_10_bpc
[21:16:38] [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv420_12_bpc
[21:16:38] [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv422_8_bpc
[21:16:38] [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv422_10_bpc
[21:16:38] [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv422_12_bpc
[21:16:38] === [PASSED] drm_test_connector_hdmi_compute_mode_clock ====
[21:16:38] == drm_hdmi_connector_get_broadcast_rgb_name (2 subtests) ==
[21:16:38] === drm_test_drm_hdmi_connector_get_broadcast_rgb_name ====
[21:16:38] [PASSED] Automatic
[21:16:38] [PASSED] Full
[21:16:38] [PASSED] Limited 16:235
[21:16:38] === [PASSED] drm_test_drm_hdmi_connector_get_broadcast_rgb_name ===
[21:16:38] [PASSED] drm_test_drm_hdmi_connector_get_broadcast_rgb_name_invalid
[21:16:38] ==== [PASSED] drm_hdmi_connector_get_broadcast_rgb_name ====
[21:16:38] == drm_hdmi_connector_get_output_format_name (2 subtests) ==
[21:16:38] === drm_test_drm_hdmi_connector_get_output_format_name ====
[21:16:38] [PASSED] RGB
[21:16:38] [PASSED] YUV 4:2:0
[21:16:38] [PASSED] YUV 4:2:2
[21:16:38] [PASSED] YUV 4:4:4
[21:16:38] === [PASSED] drm_test_drm_hdmi_connector_get_output_format_name ===
[21:16:38] [PASSED] drm_test_drm_hdmi_connector_get_output_format_name_invalid
[21:16:38] ==== [PASSED] drm_hdmi_connector_get_output_format_name ====
[21:16:38] ============= drm_damage_helper (21 subtests) ==============
[21:16:38] [PASSED] drm_test_damage_iter_no_damage
[21:16:38] [PASSED] drm_test_damage_iter_no_damage_fractional_src
[21:16:38] [PASSED] drm_test_damage_iter_no_damage_src_moved
[21:16:38] [PASSED] drm_test_damage_iter_no_damage_fractional_src_moved
[21:16:38] [PASSED] drm_test_damage_iter_no_damage_not_visible
[21:16:38] [PASSED] drm_test_damage_iter_no_damage_no_crtc
[21:16:38] [PASSED] drm_test_damage_iter_no_damage_no_fb
[21:16:38] [PASSED] drm_test_damage_iter_simple_damage
[21:16:38] [PASSED] drm_test_damage_iter_single_damage
[21:16:38] [PASSED] drm_test_damage_iter_single_damage_intersect_src
[21:16:38] [PASSED] drm_test_damage_iter_single_damage_outside_src
[21:16:38] [PASSED] drm_test_damage_iter_single_damage_fractional_src
[21:16:38] [PASSED] drm_test_damage_iter_single_damage_intersect_fractional_src
[21:16:38] [PASSED] drm_test_damage_iter_single_damage_outside_fractional_src
[21:16:38] [PASSED] drm_test_damage_iter_single_damage_src_moved
[21:16:38] [PASSED] drm_test_damage_iter_single_damage_fractional_src_moved
[21:16:38] [PASSED] drm_test_damage_iter_damage
[21:16:38] [PASSED] drm_test_damage_iter_damage_one_intersect
[21:16:38] [PASSED] drm_test_damage_iter_damage_one_outside
[21:16:38] [PASSED] drm_test_damage_iter_damage_src_moved
[21:16:38] [PASSED] drm_test_damage_iter_damage_not_visible
[21:16:38] ================ [PASSED] drm_damage_helper ================
[21:16:38] ============== drm_dp_mst_helper (3 subtests) ==============
[21:16:38] ============== drm_test_dp_mst_calc_pbn_mode ==============
[21:16:38] [PASSED] Clock 154000 BPP 30 DSC disabled
[21:16:38] [PASSED] Clock 234000 BPP 30 DSC disabled
[21:16:38] [PASSED] Clock 297000 BPP 24 DSC disabled
[21:16:38] [PASSED] Clock 332880 BPP 24 DSC enabled
[21:16:38] [PASSED] Clock 324540 BPP 24 DSC enabled
[21:16:38] ========== [PASSED] drm_test_dp_mst_calc_pbn_mode ==========
[21:16:38] ============== drm_test_dp_mst_calc_pbn_div ===============
[21:16:38] [PASSED] Link rate 2000000 lane count 4
[21:16:38] [PASSED] Link rate 2000000 lane count 2
[21:16:38] [PASSED] Link rate 2000000 lane count 1
[21:16:38] [PASSED] Link rate 1350000 lane count 4
[21:16:38] [PASSED] Link rate 1350000 lane count 2
[21:16:38] [PASSED] Link rate 1350000 lane count 1
[21:16:38] [PASSED] Link rate 1000000 lane count 4
[21:16:38] [PASSED] Link rate 1000000 lane count 2
[21:16:38] [PASSED] Link rate 1000000 lane count 1
[21:16:38] [PASSED] Link rate 810000 lane count 4
[21:16:38] [PASSED] Link rate 810000 lane count 2
[21:16:38] [PASSED] Link rate 810000 lane count 1
[21:16:38] [PASSED] Link rate 540000 lane count 4
[21:16:38] [PASSED] Link rate 540000 lane count 2
[21:16:38] [PASSED] Link rate 540000 lane count 1
[21:16:38] [PASSED] Link rate 270000 lane count 4
[21:16:38] [PASSED] Link rate 270000 lane count 2
[21:16:38] [PASSED] Link rate 270000 lane count 1
[21:16:38] [PASSED] Link rate 162000 lane count 4
[21:16:38] [PASSED] Link rate 162000 lane count 2
[21:16:38] [PASSED] Link rate 162000 lane count 1
[21:16:38] ========== [PASSED] drm_test_dp_mst_calc_pbn_div ===========
[21:16:38] ========= drm_test_dp_mst_sideband_msg_req_decode =========
[21:16:38] [PASSED] DP_ENUM_PATH_RESOURCES with port number
[21:16:38] [PASSED] DP_POWER_UP_PHY with port number
[21:16:38] [PASSED] DP_POWER_DOWN_PHY with port number
[21:16:38] [PASSED] DP_ALLOCATE_PAYLOAD with SDP stream sinks
[21:16:38] [PASSED] DP_ALLOCATE_PAYLOAD with port number
[21:16:38] [PASSED] DP_ALLOCATE_PAYLOAD with VCPI
[21:16:38] [PASSED] DP_ALLOCATE_PAYLOAD with PBN
[21:16:38] [PASSED] DP_QUERY_PAYLOAD with port number
[21:16:38] [PASSED] DP_QUERY_PAYLOAD with VCPI
[21:16:38] [PASSED] DP_REMOTE_DPCD_READ with port number
[21:16:38] [PASSED] DP_REMOTE_DPCD_READ with DPCD address
[21:16:38] [PASSED] DP_REMOTE_DPCD_READ with max number of bytes
[21:16:38] [PASSED] DP_REMOTE_DPCD_WRITE with port number
[21:16:38] [PASSED] DP_REMOTE_DPCD_WRITE with DPCD address
[21:16:38] [PASSED] DP_REMOTE_DPCD_WRITE with data array
[21:16:38] [PASSED] DP_REMOTE_I2C_READ with port number
[21:16:38] [PASSED] DP_REMOTE_I2C_READ with I2C device ID
[21:16:38] [PASSED] DP_REMOTE_I2C_READ with transactions array
[21:16:38] [PASSED] DP_REMOTE_I2C_WRITE with port number
[21:16:38] [PASSED] DP_REMOTE_I2C_WRITE with I2C device ID
[21:16:38] [PASSED] DP_REMOTE_I2C_WRITE with data array
[21:16:38] [PASSED] DP_QUERY_STREAM_ENC_STATUS with stream ID
[21:16:38] [PASSED] DP_QUERY_STREAM_ENC_STATUS with client ID
[21:16:38] [PASSED] DP_QUERY_STREAM_ENC_STATUS with stream event
[21:16:38] [PASSED] DP_QUERY_STREAM_ENC_STATUS with valid stream event
[21:16:38] [PASSED] DP_QUERY_STREAM_ENC_STATUS with stream behavior
[21:16:38] [PASSED] DP_QUERY_STREAM_ENC_STATUS with a valid stream behavior
[21:16:38] ===== [PASSED] drm_test_dp_mst_sideband_msg_req_decode =====
[21:16:38] ================ [PASSED] drm_dp_mst_helper ================
[21:16:38] ================== drm_exec (7 subtests) ===================
[21:16:38] [PASSED] sanitycheck
[21:16:38] [PASSED] test_lock
[21:16:38] [PASSED] test_lock_unlock
[21:16:38] [PASSED] test_duplicates
[21:16:38] [PASSED] test_prepare
[21:16:38] [PASSED] test_prepare_array
[21:16:38] [PASSED] test_multiple_loops
[21:16:38] ==================== [PASSED] drm_exec =====================
[21:16:38] =========== drm_format_helper_test (17 subtests) ===========
[21:16:38] ============== drm_test_fb_xrgb8888_to_gray8 ==============
[21:16:38] [PASSED] single_pixel_source_buffer
[21:16:38] [PASSED] single_pixel_clip_rectangle
[21:16:38] [PASSED] well_known_colors
[21:16:38] [PASSED] destination_pitch
[21:16:38] ========== [PASSED] drm_test_fb_xrgb8888_to_gray8 ==========
[21:16:38] ============= drm_test_fb_xrgb8888_to_rgb332 ==============
[21:16:38] [PASSED] single_pixel_source_buffer
[21:16:38] [PASSED] single_pixel_clip_rectangle
[21:16:38] [PASSED] well_known_colors
[21:16:38] [PASSED] destination_pitch
[21:16:38] ========= [PASSED] drm_test_fb_xrgb8888_to_rgb332 ==========
[21:16:38] ============= drm_test_fb_xrgb8888_to_rgb565 ==============
[21:16:38] [PASSED] single_pixel_source_buffer
[21:16:38] [PASSED] single_pixel_clip_rectangle
[21:16:38] [PASSED] well_known_colors
[21:16:38] [PASSED] destination_pitch
[21:16:38] ========= [PASSED] drm_test_fb_xrgb8888_to_rgb565 ==========
[21:16:38] ============ drm_test_fb_xrgb8888_to_xrgb1555 =============
[21:16:38] [PASSED] single_pixel_source_buffer
[21:16:38] [PASSED] single_pixel_clip_rectangle
[21:16:38] [PASSED] well_known_colors
[21:16:38] [PASSED] destination_pitch
[21:16:38] ======== [PASSED] drm_test_fb_xrgb8888_to_xrgb1555 =========
[21:16:38] ============ drm_test_fb_xrgb8888_to_argb1555 =============
[21:16:38] [PASSED] single_pixel_source_buffer
[21:16:38] [PASSED] single_pixel_clip_rectangle
[21:16:38] [PASSED] well_known_colors
[21:16:38] [PASSED] destination_pitch
[21:16:38] ======== [PASSED] drm_test_fb_xrgb8888_to_argb1555 =========
[21:16:38] ============ drm_test_fb_xrgb8888_to_rgba5551 =============
[21:16:38] [PASSED] single_pixel_source_buffer
[21:16:38] [PASSED] single_pixel_clip_rectangle
[21:16:38] [PASSED] well_known_colors
[21:16:38] [PASSED] destination_pitch
[21:16:38] ======== [PASSED] drm_test_fb_xrgb8888_to_rgba5551 =========
[21:16:38] ============= drm_test_fb_xrgb8888_to_rgb888 ==============
[21:16:38] [PASSED] single_pixel_source_buffer
[21:16:38] [PASSED] single_pixel_clip_rectangle
[21:16:38] [PASSED] well_known_colors
[21:16:38] [PASSED] destination_pitch
[21:16:38] ========= [PASSED] drm_test_fb_xrgb8888_to_rgb888 ==========
[21:16:38] ============= drm_test_fb_xrgb8888_to_bgr888 ==============
[21:16:38] [PASSED] single_pixel_source_buffer
[21:16:38] [PASSED] single_pixel_clip_rectangle
[21:16:38] [PASSED] well_known_colors
[21:16:38] [PASSED] destination_pitch
[21:16:38] ========= [PASSED] drm_test_fb_xrgb8888_to_bgr888 ==========
[21:16:38] ============ drm_test_fb_xrgb8888_to_argb8888 =============
[21:16:38] [PASSED] single_pixel_source_buffer
[21:16:38] [PASSED] single_pixel_clip_rectangle
[21:16:38] [PASSED] well_known_colors
[21:16:38] [PASSED] destination_pitch
[21:16:38] ======== [PASSED] drm_test_fb_xrgb8888_to_argb8888 =========
[21:16:38] =========== drm_test_fb_xrgb8888_to_xrgb2101010 ===========
[21:16:38] [PASSED] single_pixel_source_buffer
[21:16:38] [PASSED] single_pixel_clip_rectangle
[21:16:38] [PASSED] well_known_colors
[21:16:38] [PASSED] destination_pitch
[21:16:38] ======= [PASSED] drm_test_fb_xrgb8888_to_xrgb2101010 =======
[21:16:38] =========== drm_test_fb_xrgb8888_to_argb2101010 ===========
[21:16:38] [PASSED] single_pixel_source_buffer
[21:16:38] [PASSED] single_pixel_clip_rectangle
[21:16:38] [PASSED] well_known_colors
[21:16:38] [PASSED] destination_pitch
[21:16:38] ======= [PASSED] drm_test_fb_xrgb8888_to_argb2101010 =======
[21:16:38] ============== drm_test_fb_xrgb8888_to_mono ===============
[21:16:38] [PASSED] single_pixel_source_buffer
[21:16:38] [PASSED] single_pixel_clip_rectangle
[21:16:38] [PASSED] well_known_colors
[21:16:38] [PASSED] destination_pitch
[21:16:38] ========== [PASSED] drm_test_fb_xrgb8888_to_mono ===========
[21:16:38] ==================== drm_test_fb_swab =====================
[21:16:38] [PASSED] single_pixel_source_buffer
[21:16:38] [PASSED] single_pixel_clip_rectangle
[21:16:38] [PASSED] well_known_colors
[21:16:38] [PASSED] destination_pitch
[21:16:38] ================ [PASSED] drm_test_fb_swab =================
[21:16:38] ============ drm_test_fb_xrgb8888_to_xbgr8888 =============
[21:16:38] [PASSED] single_pixel_source_buffer
[21:16:38] [PASSED] single_pixel_clip_rectangle
[21:16:38] [PASSED] well_known_colors
[21:16:38] [PASSED] destination_pitch
[21:16:38] ======== [PASSED] drm_test_fb_xrgb8888_to_xbgr8888 =========
[21:16:38] ============ drm_test_fb_xrgb8888_to_abgr8888 =============
[21:16:38] [PASSED] single_pixel_source_buffer
[21:16:38] [PASSED] single_pixel_clip_rectangle
[21:16:38] [PASSED] well_known_colors
[21:16:38] [PASSED] destination_pitch
[21:16:38] ======== [PASSED] drm_test_fb_xrgb8888_to_abgr8888 =========
[21:16:38] ================= drm_test_fb_clip_offset =================
[21:16:38] [PASSED] pass through
[21:16:38] [PASSED] horizontal offset
[21:16:38] [PASSED] vertical offset
[21:16:38] [PASSED] horizontal and vertical offset
[21:16:38] [PASSED] horizontal offset (custom pitch)
[21:16:38] [PASSED] vertical offset (custom pitch)
[21:16:38] [PASSED] horizontal and vertical offset (custom pitch)
[21:16:38] ============= [PASSED] drm_test_fb_clip_offset =============
[21:16:38] =================== drm_test_fb_memcpy ====================
[21:16:38] [PASSED] single_pixel_source_buffer: XR24 little-endian (0x34325258)
[21:16:38] [PASSED] single_pixel_source_buffer: XRA8 little-endian (0x38415258)
[21:16:38] [PASSED] single_pixel_source_buffer: YU24 little-endian (0x34325559)
[21:16:38] [PASSED] single_pixel_clip_rectangle: XB24 little-endian (0x34324258)
[21:16:38] [PASSED] single_pixel_clip_rectangle: XRA8 little-endian (0x38415258)
[21:16:38] [PASSED] single_pixel_clip_rectangle: YU24 little-endian (0x34325559)
[21:16:38] [PASSED] well_known_colors: XB24 little-endian (0x34324258)
[21:16:38] [PASSED] well_known_colors: XRA8 little-endian (0x38415258)
[21:16:38] [PASSED] well_known_colors: YU24 little-endian (0x34325559)
[21:16:38] [PASSED] destination_pitch: XB24 little-endian (0x34324258)
[21:16:38] [PASSED] destination_pitch: XRA8 little-endian (0x38415258)
[21:16:38] [PASSED] destination_pitch: YU24 little-endian (0x34325559)
[21:16:38] =============== [PASSED] drm_test_fb_memcpy ================
[21:16:38] ============= [PASSED] drm_format_helper_test ==============
[21:16:38] ================= drm_format (18 subtests) =================
[21:16:38] [PASSED] drm_test_format_block_width_invalid
[21:16:38] [PASSED] drm_test_format_block_width_one_plane
[21:16:38] [PASSED] drm_test_format_block_width_two_plane
[21:16:38] [PASSED] drm_test_format_block_width_three_plane
[21:16:38] [PASSED] drm_test_format_block_width_tiled
[21:16:38] [PASSED] drm_test_format_block_height_invalid
[21:16:38] [PASSED] drm_test_format_block_height_one_plane
[21:16:38] [PASSED] drm_test_format_block_height_two_plane
[21:16:38] [PASSED] drm_test_format_block_height_three_plane
[21:16:38] [PASSED] drm_test_format_block_height_tiled
[21:16:38] [PASSED] drm_test_format_min_pitch_invalid
[21:16:38] [PASSED] drm_test_format_min_pitch_one_plane_8bpp
[21:16:38] [PASSED] drm_test_format_min_pitch_one_plane_16bpp
[21:16:38] [PASSED] drm_test_format_min_pitch_one_plane_24bpp
[21:16:38] [PASSED] drm_test_format_min_pitch_one_plane_32bpp
[21:16:38] [PASSED] drm_test_format_min_pitch_two_plane
[21:16:38] [PASSED] drm_test_format_min_pitch_three_plane_8bpp
[21:16:38] [PASSED] drm_test_format_min_pitch_tiled
[21:16:38] =================== [PASSED] drm_format ====================
[21:16:38] ============== drm_framebuffer (10 subtests) ===============
[21:16:38] ========== drm_test_framebuffer_check_src_coords ==========
[21:16:38] [PASSED] Success: source fits into fb
[21:16:38] [PASSED] Fail: overflowing fb with x-axis coordinate
[21:16:38] [PASSED] Fail: overflowing fb with y-axis coordinate
[21:16:38] [PASSED] Fail: overflowing fb with source width
[21:16:38] [PASSED] Fail: overflowing fb with source height
[21:16:38] ====== [PASSED] drm_test_framebuffer_check_src_coords ======
[21:16:38] [PASSED] drm_test_framebuffer_cleanup
[21:16:38] =============== drm_test_framebuffer_create ===============
[21:16:38] [PASSED] ABGR8888 normal sizes
[21:16:38] [PASSED] ABGR8888 max sizes
[21:16:38] [PASSED] ABGR8888 pitch greater than min required
[21:16:38] [PASSED] ABGR8888 pitch less than min required
[21:16:38] [PASSED] ABGR8888 Invalid width
[21:16:38] [PASSED] ABGR8888 Invalid buffer handle
[21:16:38] [PASSED] No pixel format
[21:16:38] [PASSED] ABGR8888 Width 0
[21:16:38] [PASSED] ABGR8888 Height 0
[21:16:38] [PASSED] ABGR8888 Out of bound height * pitch combination
[21:16:38] [PASSED] ABGR8888 Large buffer offset
[21:16:38] [PASSED] ABGR8888 Buffer offset for inexistent plane
[21:16:38] [PASSED] ABGR8888 Invalid flag
[21:16:38] [PASSED] ABGR8888 Set DRM_MODE_FB_MODIFIERS without modifiers
[21:16:38] [PASSED] ABGR8888 Valid buffer modifier
[21:16:38] [PASSED] ABGR8888 Invalid buffer modifier(DRM_FORMAT_MOD_SAMSUNG_64_32_TILE)
[21:16:38] [PASSED] ABGR8888 Extra pitches without DRM_MODE_FB_MODIFIERS
[21:16:38] [PASSED] ABGR8888 Extra pitches with DRM_MODE_FB_MODIFIERS
[21:16:38] [PASSED] NV12 Normal sizes
[21:16:38] [PASSED] NV12 Max sizes
[21:16:38] [PASSED] NV12 Invalid pitch
[21:16:38] [PASSED] NV12 Invalid modifier/missing DRM_MODE_FB_MODIFIERS flag
[21:16:38] [PASSED] NV12 different modifier per-plane
[21:16:38] [PASSED] NV12 with DRM_FORMAT_MOD_SAMSUNG_64_32_TILE
[21:16:38] [PASSED] NV12 Valid modifiers without DRM_MODE_FB_MODIFIERS
[21:16:38] [PASSED] NV12 Modifier for inexistent plane
[21:16:38] [PASSED] NV12 Handle for inexistent plane
[21:16:38] [PASSED] NV12 Handle for inexistent plane without DRM_MODE_FB_MODIFIERS
[21:16:38] [PASSED] YVU420 DRM_MODE_FB_MODIFIERS set without modifier
[21:16:38] [PASSED] YVU420 Normal sizes
[21:16:38] [PASSED] YVU420 Max sizes
[21:16:38] [PASSED] YVU420 Invalid pitch
[21:16:38] [PASSED] YVU420 Different pitches
[21:16:38] [PASSED] YVU420 Different buffer offsets/pitches
[21:16:38] [PASSED] YVU420 Modifier set just for plane 0, without DRM_MODE_FB_MODIFIERS
[21:16:38] [PASSED] YVU420 Modifier set just for planes 0, 1, without DRM_MODE_FB_MODIFIERS
[21:16:38] [PASSED] YVU420 Modifier set just for plane 0, 1, with DRM_MODE_FB_MODIFIERS
[21:16:38] [PASSED] YVU420 Valid modifier
[21:16:38] [PASSED] YVU420 Different modifiers per plane
[21:16:38] [PASSED] YVU420 Modifier for inexistent plane
[21:16:38] [PASSED] YUV420_10BIT Invalid modifier(DRM_FORMAT_MOD_LINEAR)
[21:16:38] [PASSED] X0L2 Normal sizes
[21:16:38] [PASSED] X0L2 Max sizes
[21:16:38] [PASSED] X0L2 Invalid pitch
[21:16:38] [PASSED] X0L2 Pitch greater than minimum required
[21:16:38] [PASSED] X0L2 Handle for inexistent plane
[21:16:38] [PASSED] X0L2 Offset for inexistent plane, without DRM_MODE_FB_MODIFIERS set
[21:16:38] [PASSED] X0L2 Modifier without DRM_MODE_FB_MODIFIERS set
[21:16:38] [PASSED] X0L2 Valid modifier
[21:16:38] [PASSED] X0L2 Modifier for inexistent plane
[21:16:38] =========== [PASSED] drm_test_framebuffer_create ===========
[21:16:38] [PASSED] drm_test_framebuffer_free
[21:16:38] [PASSED] drm_test_framebuffer_init
[21:16:38] [PASSED] drm_test_framebuffer_init_bad_format
[21:16:38] [PASSED] drm_test_framebuffer_init_dev_mismatch
[21:16:38] [PASSED] drm_test_framebuffer_lookup
[21:16:38] [PASSED] drm_test_framebuffer_lookup_inexistent
[21:16:38] [PASSED] drm_test_framebuffer_modifiers_not_supported
[21:16:38] ================= [PASSED] drm_framebuffer =================
[21:16:38] ================ drm_gem_shmem (8 subtests) ================
[21:16:38] [PASSED] drm_gem_shmem_test_obj_create
[21:16:38] [PASSED] drm_gem_shmem_test_obj_create_private
[21:16:38] [PASSED] drm_gem_shmem_test_pin_pages
[21:16:38] [PASSED] drm_gem_shmem_test_vmap
[21:16:38] [PASSED] drm_gem_shmem_test_get_pages_sgt
[21:16:38] [PASSED] drm_gem_shmem_test_get_sg_table
[21:16:38] [PASSED] drm_gem_shmem_test_madvise
[21:16:38] [PASSED] drm_gem_shmem_test_purge
[21:16:38] ================== [PASSED] drm_gem_shmem ==================
[21:16:38] === drm_atomic_helper_connector_hdmi_check (27 subtests) ===
[21:16:38] [PASSED] drm_test_check_broadcast_rgb_auto_cea_mode
[21:16:38] [PASSED] drm_test_check_broadcast_rgb_auto_cea_mode_vic_1
[21:16:38] [PASSED] drm_test_check_broadcast_rgb_full_cea_mode
[21:16:38] [PASSED] drm_test_check_broadcast_rgb_full_cea_mode_vic_1
[21:16:38] [PASSED] drm_test_check_broadcast_rgb_limited_cea_mode
[21:16:38] [PASSED] drm_test_check_broadcast_rgb_limited_cea_mode_vic_1
[21:16:38] ====== drm_test_check_broadcast_rgb_cea_mode_yuv420 =======
[21:16:38] [PASSED] Automatic
[21:16:38] [PASSED] Full
[21:16:38] [PASSED] Limited 16:235
[21:16:38] == [PASSED] drm_test_check_broadcast_rgb_cea_mode_yuv420 ===
[21:16:38] [PASSED] drm_test_check_broadcast_rgb_crtc_mode_changed
[21:16:38] [PASSED] drm_test_check_broadcast_rgb_crtc_mode_not_changed
[21:16:38] [PASSED] drm_test_check_disable_connector
[21:16:38] [PASSED] drm_test_check_hdmi_funcs_reject_rate
[21:16:38] [PASSED] drm_test_check_max_tmds_rate_bpc_fallback_rgb
[21:16:38] [PASSED] drm_test_check_max_tmds_rate_bpc_fallback_yuv420
[21:16:38] [PASSED] drm_test_check_max_tmds_rate_bpc_fallback_ignore_yuv422
[21:16:38] [PASSED] drm_test_check_max_tmds_rate_bpc_fallback_ignore_yuv420
[21:16:38] [PASSED] drm_test_check_driver_unsupported_fallback_yuv420
[21:16:38] [PASSED] drm_test_check_output_bpc_crtc_mode_changed
[21:16:38] [PASSED] drm_test_check_output_bpc_crtc_mode_not_changed
[21:16:38] [PASSED] drm_test_check_output_bpc_dvi
[21:16:38] [PASSED] drm_test_check_output_bpc_format_vic_1
[21:16:38] [PASSED] drm_test_check_output_bpc_format_display_8bpc_only
[21:16:38] [PASSED] drm_test_check_output_bpc_format_display_rgb_only
[21:16:38] [PASSED] drm_test_check_output_bpc_format_driver_8bpc_only
[21:16:38] [PASSED] drm_test_check_output_bpc_format_driver_rgb_only
[21:16:38] [PASSED] drm_test_check_tmds_char_rate_rgb_8bpc
[21:16:38] [PASSED] drm_test_check_tmds_char_rate_rgb_10bpc
[21:16:38] [PASSED] drm_test_check_tmds_char_rate_rgb_12bpc
[21:16:38] ===== [PASSED] drm_atomic_helper_connector_hdmi_check ======
[21:16:38] === drm_atomic_helper_connector_hdmi_reset (6 subtests) ====
[21:16:38] [PASSED] drm_test_check_broadcast_rgb_value
[21:16:38] [PASSED] drm_test_check_bpc_8_value
[21:16:38] [PASSED] drm_test_check_bpc_10_value
[21:16:38] [PASSED] drm_test_check_bpc_12_value
[21:16:38] [PASSED] drm_test_check_format_value
[21:16:38] [PASSED] drm_test_check_tmds_char_value
[21:16:38] ===== [PASSED] drm_atomic_helper_connector_hdmi_reset ======
[21:16:38] = drm_atomic_helper_connector_hdmi_mode_valid (4 subtests) =
[21:16:38] [PASSED] drm_test_check_mode_valid
[21:16:38] [PASSED] drm_test_check_mode_valid_reject
[21:16:38] [PASSED] drm_test_check_mode_valid_reject_rate
[21:16:38] [PASSED] drm_test_check_mode_valid_reject_max_clock
[21:16:38] === [PASSED] drm_atomic_helper_connector_hdmi_mode_valid ===
[21:16:38] ================= drm_managed (2 subtests) =================
[21:16:38] [PASSED] drm_test_managed_release_action
[21:16:38] [PASSED] drm_test_managed_run_action
[21:16:38] =================== [PASSED] drm_managed ===================
[21:16:38] =================== drm_mm (6 subtests) ====================
[21:16:38] [PASSED] drm_test_mm_init
[21:16:38] [PASSED] drm_test_mm_debug
[21:16:38] [PASSED] drm_test_mm_align32
[21:16:38] [PASSED] drm_test_mm_align64
[21:16:38] [PASSED] drm_test_mm_lowest
[21:16:38] [PASSED] drm_test_mm_highest
[21:16:38] ===================== [PASSED] drm_mm ======================
[21:16:38] ============= drm_modes_analog_tv (5 subtests) =============
[21:16:38] [PASSED] drm_test_modes_analog_tv_mono_576i
[21:16:38] [PASSED] drm_test_modes_analog_tv_ntsc_480i
[21:16:38] [PASSED] drm_test_modes_analog_tv_ntsc_480i_inlined
[21:16:38] [PASSED] drm_test_modes_analog_tv_pal_576i
[21:16:38] [PASSED] drm_test_modes_analog_tv_pal_576i_inlined
[21:16:38] =============== [PASSED] drm_modes_analog_tv ===============
[21:16:38] ============== drm_plane_helper (2 subtests) ===============
[21:16:38] =============== drm_test_check_plane_state ================
[21:16:38] [PASSED] clipping_simple
[21:16:38] [PASSED] clipping_rotate_reflect
[21:16:38] [PASSED] positioning_simple
[21:16:38] [PASSED] upscaling
[21:16:38] [PASSED] downscaling
[21:16:38] [PASSED] rounding1
[21:16:38] [PASSED] rounding2
[21:16:38] [PASSED] rounding3
[21:16:38] [PASSED] rounding4
[21:16:38] =========== [PASSED] drm_test_check_plane_state ============
[21:16:38] =========== drm_test_check_invalid_plane_state ============
[21:16:38] [PASSED] positioning_invalid
[21:16:38] [PASSED] upscaling_invalid
[21:16:38] [PASSED] downscaling_invalid
[21:16:38] ======= [PASSED] drm_test_check_invalid_plane_state ========
[21:16:38] ================ [PASSED] drm_plane_helper =================
[21:16:38] ====== drm_connector_helper_tv_get_modes (1 subtest) =======
[21:16:38] ====== drm_test_connector_helper_tv_get_modes_check =======
[21:16:38] [PASSED] None
[21:16:38] [PASSED] PAL
[21:16:38] [PASSED] NTSC
[21:16:38] [PASSED] Both, NTSC Default
[21:16:38] [PASSED] Both, PAL Default
[21:16:38] [PASSED] Both, NTSC Default, with PAL on command-line
[21:16:38] [PASSED] Both, PAL Default, with NTSC on command-line
[21:16:38] == [PASSED] drm_test_connector_helper_tv_get_modes_check ===
[21:16:38] ======== [PASSED] drm_connector_helper_tv_get_modes ========
[21:16:38] ================== drm_rect (9 subtests) ===================
[21:16:38] [PASSED] drm_test_rect_clip_scaled_div_by_zero
[21:16:38] [PASSED] drm_test_rect_clip_scaled_not_clipped
[21:16:38] [PASSED] drm_test_rect_clip_scaled_clipped
[21:16:38] [PASSED] drm_test_rect_clip_scaled_signed_vs_unsigned
[21:16:38] ================= drm_test_rect_intersect =================
[21:16:38] [PASSED] top-left x bottom-right: 2x2+1+1 x 2x2+0+0
[21:16:38] [PASSED] top-right x bottom-left: 2x2+0+0 x 2x2+1-1
[21:16:38] [PASSED] bottom-left x top-right: 2x2+1-1 x 2x2+0+0
[21:16:38] [PASSED] bottom-right x top-left: 2x2+0+0 x 2x2+1+1
[21:16:38] [PASSED] right x left: 2x1+0+0 x 3x1+1+0
[21:16:38] [PASSED] left x right: 3x1+1+0 x 2x1+0+0
[21:16:38] [PASSED] up x bottom: 1x2+0+0 x 1x3+0-1
[21:16:38] [PASSED] bottom x up: 1x3+0-1 x 1x2+0+0
[21:16:38] [PASSED] touching corner: 1x1+0+0 x 2x2+1+1
[21:16:38] [PASSED] touching side: 1x1+0+0 x 1x1+1+0
[21:16:38] [PASSED] equal rects: 2x2+0+0 x 2x2+0+0
[21:16:38] [PASSED] inside another: 2x2+0+0 x 1x1+1+1
[21:16:38] [PASSED] far away: 1x1+0+0 x 1x1+3+6
[21:16:38] [PASSED] points intersecting: 0x0+5+10 x 0x0+5+10
[21:16:38] [PASSED] points not intersecting: 0x0+0+0 x 0x0+5+10
[21:16:38] ============= [PASSED] drm_test_rect_intersect =============
[21:16:38] ================ drm_test_rect_calc_hscale ================
[21:16:38] [PASSED] normal use
[21:16:38] [PASSED] out of max range
[21:16:38] [PASSED] out of min range
[21:16:38] [PASSED] zero dst
[21:16:38] [PASSED] negative src
[21:16:38] [PASSED] negative dst
[21:16:38] ============ [PASSED] drm_test_rect_calc_hscale ============
[21:16:38] ================ drm_test_rect_calc_vscale ================
[21:16:38] [PASSED] normal use
[21:16:38] [PASSED] out of max range
[21:16:38] [PASSED] out of min range
[21:16:38] [PASSED] zero dst
[21:16:38] [PASSED] negative src
[21:16:38] [PASSED] negative dst
[21:16:38] ============ [PASSED] drm_test_rect_calc_vscale ============
[21:16:38] ================== drm_test_rect_rotate ===================
[21:16:38] [PASSED] reflect-x
[21:16:38] [PASSED] reflect-y
[21:16:38] [PASSED] rotate-0
[21:16:38] [PASSED] rotate-90
[21:16:38] [PASSED] rotate-180
[21:16:38] [PASSED] rotate-270
[21:16:38] ============== [PASSED] drm_test_rect_rotate ===============
[21:16:38] ================ drm_test_rect_rotate_inv =================
[21:16:38] [PASSED] reflect-x
[21:16:38] [PASSED] reflect-y
[21:16:38] [PASSED] rotate-0
[21:16:38] [PASSED] rotate-90
[21:16:38] [PASSED] rotate-180
[21:16:38] [PASSED] rotate-270
[21:16:38] ============ [PASSED] drm_test_rect_rotate_inv =============
[21:16:38] ==================== [PASSED] drm_rect =====================
[21:16:38] ============ drm_sysfb_modeset_test (1 subtest) ============
[21:16:38] ============ drm_test_sysfb_build_fourcc_list =============
[21:16:38] [PASSED] no native formats
[21:16:38] [PASSED] XRGB8888 as native format
[21:16:38] [PASSED] remove duplicates
[21:16:38] [PASSED] convert alpha formats
[21:16:38] [PASSED] random formats
[21:16:38] ======== [PASSED] drm_test_sysfb_build_fourcc_list =========
[21:16:38] ============= [PASSED] drm_sysfb_modeset_test ==============
[21:16:38] ============================================================
[21:16:38] Testing complete. Ran 622 tests: passed: 622
[21:16:38] Elapsed time: 26.831s total, 1.674s configuring, 24.736s building, 0.383s running
+ /kernel/tools/testing/kunit/kunit.py run --kunitconfig /kernel/drivers/gpu/drm/ttm/tests/.kunitconfig
[21:16:38] Configuring KUnit Kernel ...
Regenerating .config ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
[21:16:40] Building KUnit Kernel ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
Building with:
$ make all compile_commands.json scripts_gdb ARCH=um O=.kunit --jobs=48
[21:16:49] Starting KUnit Kernel (1/1)...
[21:16:49] ============================================================
Running tests with:
$ .kunit/linux kunit.enable=1 mem=1G console=tty kunit_shutdown=halt
[21:16:49] ================= ttm_device (5 subtests) ==================
[21:16:49] [PASSED] ttm_device_init_basic
[21:16:49] [PASSED] ttm_device_init_multiple
[21:16:49] [PASSED] ttm_device_fini_basic
[21:16:49] [PASSED] ttm_device_init_no_vma_man
[21:16:49] ================== ttm_device_init_pools ==================
[21:16:49] [PASSED] No DMA allocations, no DMA32 required
[21:16:49] [PASSED] DMA allocations, DMA32 required
[21:16:49] [PASSED] No DMA allocations, DMA32 required
[21:16:49] [PASSED] DMA allocations, no DMA32 required
[21:16:49] ============== [PASSED] ttm_device_init_pools ==============
[21:16:49] =================== [PASSED] ttm_device ====================
[21:16:49] ================== ttm_pool (8 subtests) ===================
[21:16:49] ================== ttm_pool_alloc_basic ===================
[21:16:49] [PASSED] One page
[21:16:49] [PASSED] More than one page
[21:16:49] [PASSED] Above the allocation limit
[21:16:49] [PASSED] One page, with coherent DMA mappings enabled
[21:16:49] [PASSED] Above the allocation limit, with coherent DMA mappings enabled
[21:16:49] ============== [PASSED] ttm_pool_alloc_basic ===============
[21:16:49] ============== ttm_pool_alloc_basic_dma_addr ==============
[21:16:49] [PASSED] One page
[21:16:49] [PASSED] More than one page
[21:16:49] [PASSED] Above the allocation limit
[21:16:49] [PASSED] One page, with coherent DMA mappings enabled
[21:16:49] [PASSED] Above the allocation limit, with coherent DMA mappings enabled
[21:16:49] ========== [PASSED] ttm_pool_alloc_basic_dma_addr ==========
[21:16:49] [PASSED] ttm_pool_alloc_order_caching_match
[21:16:49] [PASSED] ttm_pool_alloc_caching_mismatch
[21:16:49] [PASSED] ttm_pool_alloc_order_mismatch
[21:16:49] [PASSED] ttm_pool_free_dma_alloc
[21:16:49] [PASSED] ttm_pool_free_no_dma_alloc
[21:16:49] [PASSED] ttm_pool_fini_basic
[21:16:49] ==================== [PASSED] ttm_pool =====================
[21:16:49] ================ ttm_resource (8 subtests) =================
[21:16:49] ================= ttm_resource_init_basic =================
[21:16:49] [PASSED] Init resource in TTM_PL_SYSTEM
[21:16:49] [PASSED] Init resource in TTM_PL_VRAM
[21:16:49] [PASSED] Init resource in a private placement
[21:16:49] [PASSED] Init resource in TTM_PL_SYSTEM, set placement flags
[21:16:49] ============= [PASSED] ttm_resource_init_basic =============
[21:16:49] [PASSED] ttm_resource_init_pinned
[21:16:49] [PASSED] ttm_resource_fini_basic
[21:16:49] [PASSED] ttm_resource_manager_init_basic
[21:16:49] [PASSED] ttm_resource_manager_usage_basic
[21:16:49] [PASSED] ttm_resource_manager_set_used_basic
[21:16:49] [PASSED] ttm_sys_man_alloc_basic
[21:16:49] [PASSED] ttm_sys_man_free_basic
[21:16:49] ================== [PASSED] ttm_resource ===================
[21:16:49] =================== ttm_tt (15 subtests) ===================
[21:16:49] ==================== ttm_tt_init_basic ====================
[21:16:49] [PASSED] Page-aligned size
[21:16:49] [PASSED] Extra pages requested
[21:16:49] ================ [PASSED] ttm_tt_init_basic ================
[21:16:49] [PASSED] ttm_tt_init_misaligned
[21:16:49] [PASSED] ttm_tt_fini_basic
[21:16:49] [PASSED] ttm_tt_fini_sg
[21:16:49] [PASSED] ttm_tt_fini_shmem
[21:16:49] [PASSED] ttm_tt_create_basic
[21:16:49] [PASSED] ttm_tt_create_invalid_bo_type
[21:16:49] [PASSED] ttm_tt_create_ttm_exists
[21:16:49] [PASSED] ttm_tt_create_failed
[21:16:49] [PASSED] ttm_tt_destroy_basic
[21:16:49] [PASSED] ttm_tt_populate_null_ttm
[21:16:49] [PASSED] ttm_tt_populate_populated_ttm
[21:16:49] [PASSED] ttm_tt_unpopulate_basic
[21:16:49] [PASSED] ttm_tt_unpopulate_empty_ttm
[21:16:49] [PASSED] ttm_tt_swapin_basic
[21:16:49] ===================== [PASSED] ttm_tt ======================
[21:16:49] =================== ttm_bo (14 subtests) ===================
[21:16:49] =========== ttm_bo_reserve_optimistic_no_ticket ===========
[21:16:49] [PASSED] Cannot be interrupted and sleeps
[21:16:49] [PASSED] Cannot be interrupted, locks straight away
[21:16:49] [PASSED] Can be interrupted, sleeps
[21:16:49] ======= [PASSED] ttm_bo_reserve_optimistic_no_ticket =======
[21:16:49] [PASSED] ttm_bo_reserve_locked_no_sleep
[21:16:49] [PASSED] ttm_bo_reserve_no_wait_ticket
[21:16:49] [PASSED] ttm_bo_reserve_double_resv
[21:16:49] [PASSED] ttm_bo_reserve_interrupted
[21:16:49] [PASSED] ttm_bo_reserve_deadlock
[21:16:49] [PASSED] ttm_bo_unreserve_basic
[21:16:49] [PASSED] ttm_bo_unreserve_pinned
[21:16:49] [PASSED] ttm_bo_unreserve_bulk
[21:16:49] [PASSED] ttm_bo_fini_basic
[21:16:49] [PASSED] ttm_bo_fini_shared_resv
[21:16:49] [PASSED] ttm_bo_pin_basic
[21:16:49] [PASSED] ttm_bo_pin_unpin_resource
[21:16:49] [PASSED] ttm_bo_multiple_pin_one_unpin
[21:16:49] ===================== [PASSED] ttm_bo ======================
[21:16:49] ============== ttm_bo_validate (21 subtests) ===============
[21:16:49] ============== ttm_bo_init_reserved_sys_man ===============
[21:16:49] [PASSED] Buffer object for userspace
[21:16:49] [PASSED] Kernel buffer object
[21:16:49] [PASSED] Shared buffer object
[21:16:49] ========== [PASSED] ttm_bo_init_reserved_sys_man ===========
[21:16:49] ============== ttm_bo_init_reserved_mock_man ==============
[21:16:49] [PASSED] Buffer object for userspace
[21:16:49] [PASSED] Kernel buffer object
[21:16:49] [PASSED] Shared buffer object
[21:16:49] ========== [PASSED] ttm_bo_init_reserved_mock_man ==========
[21:16:49] [PASSED] ttm_bo_init_reserved_resv
[21:16:49] ================== ttm_bo_validate_basic ==================
[21:16:49] [PASSED] Buffer object for userspace
[21:16:49] [PASSED] Kernel buffer object
[21:16:49] [PASSED] Shared buffer object
[21:16:49] ============== [PASSED] ttm_bo_validate_basic ==============
[21:16:49] [PASSED] ttm_bo_validate_invalid_placement
[21:16:49] ============= ttm_bo_validate_same_placement ==============
[21:16:49] [PASSED] System manager
[21:16:49] [PASSED] VRAM manager
[21:16:49] ========= [PASSED] ttm_bo_validate_same_placement ==========
[21:16:49] [PASSED] ttm_bo_validate_failed_alloc
[21:16:49] [PASSED] ttm_bo_validate_pinned
[21:16:49] [PASSED] ttm_bo_validate_busy_placement
[21:16:49] ================ ttm_bo_validate_multihop =================
[21:16:49] [PASSED] Buffer object for userspace
[21:16:49] [PASSED] Kernel buffer object
[21:16:49] [PASSED] Shared buffer object
[21:16:49] ============ [PASSED] ttm_bo_validate_multihop =============
[21:16:49] ========== ttm_bo_validate_no_placement_signaled ==========
[21:16:49] [PASSED] Buffer object in system domain, no page vector
[21:16:49] [PASSED] Buffer object in system domain with an existing page vector
[21:16:49] ====== [PASSED] ttm_bo_validate_no_placement_signaled ======
[21:16:49] ======== ttm_bo_validate_no_placement_not_signaled ========
[21:16:49] [PASSED] Buffer object for userspace
[21:16:49] [PASSED] Kernel buffer object
[21:16:49] [PASSED] Shared buffer object
[21:16:49] ==== [PASSED] ttm_bo_validate_no_placement_not_signaled ====
[21:16:49] [PASSED] ttm_bo_validate_move_fence_signaled
[21:16:49] ========= ttm_bo_validate_move_fence_not_signaled =========
[21:16:49] [PASSED] Waits for GPU
[21:16:49] [PASSED] Tries to lock straight away
[21:16:49] ===== [PASSED] ttm_bo_validate_move_fence_not_signaled =====
[21:16:49] [PASSED] ttm_bo_validate_happy_evict
[21:16:49] [PASSED] ttm_bo_validate_all_pinned_evict
[21:16:49] [PASSED] ttm_bo_validate_allowed_only_evict
[21:16:49] [PASSED] ttm_bo_validate_deleted_evict
[21:16:49] [PASSED] ttm_bo_validate_busy_domain_evict
[21:16:49] [PASSED] ttm_bo_validate_evict_gutting
[21:16:49] [PASSED] ttm_bo_validate_recrusive_evict
[21:16:49] ================= [PASSED] ttm_bo_validate =================
[21:16:49] ============================================================
[21:16:49] Testing complete. Ran 101 tests: passed: 101
[21:16:49] Elapsed time: 11.191s total, 1.637s configuring, 9.288s building, 0.227s running
+ cleanup
++ stat -c %u:%g /kernel
+ chown -R 1003:1003 /kernel
^ permalink raw reply [flat|nested] 61+ messages in thread
* ✗ Xe.CI.BAT: failure for drm/xe: Multi Queue feature support
2025-10-31 18:29 [PATCH 00/16] drm/xe: Multi Queue feature support Niranjana Vishwanathapura
` (18 preceding siblings ...)
2025-10-31 21:16 ` ✓ CI.KUnit: success " Patchwork
@ 2025-10-31 22:19 ` Patchwork
2025-11-01 11:25 ` ✗ Xe.CI.Full: " Patchwork
20 siblings, 0 replies; 61+ messages in thread
From: Patchwork @ 2025-10-31 22:19 UTC (permalink / raw)
To: Niranjana Vishwanathapura; +Cc: intel-xe
[-- Attachment #1: Type: text/plain, Size: 5724 bytes --]
== Series Details ==
Series: drm/xe: Multi Queue feature support
URL : https://patchwork.freedesktop.org/series/156865/
State : failure
== Summary ==
CI Bug Log - changes from xe-4025-efcd62f1d00607db85d10ec29387fe6516daee59_BAT -> xe-pw-156865v1_BAT
====================================================
Summary
-------
**FAILURE**
Serious unknown changes coming with xe-pw-156865v1_BAT absolutely need to be
verified manually.
If you think the reported changes have nothing to do with the changes
introduced in xe-pw-156865v1_BAT, please notify your bug team (I915-ci-infra@lists.freedesktop.org) to allow them
to document this new failure mode, which will reduce false positives in CI.
Participating hosts (12 -> 12)
------------------------------
No changes in participating hosts
Possible new issues
-------------------
Here are the unknown changes that may have been introduced in xe-pw-156865v1_BAT:
### IGT changes ###
#### Possible regressions ####
* igt@xe_exec_queue_property@invalid-property:
- bat-ptl-2: [PASS][1] -> [FAIL][2]
[1]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4025-efcd62f1d00607db85d10ec29387fe6516daee59/bat-ptl-2/igt@xe_exec_queue_property@invalid-property.html
[2]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156865v1/bat-ptl-2/igt@xe_exec_queue_property@invalid-property.html
- bat-dg2-oem2: [PASS][3] -> [FAIL][4]
[3]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4025-efcd62f1d00607db85d10ec29387fe6516daee59/bat-dg2-oem2/igt@xe_exec_queue_property@invalid-property.html
[4]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156865v1/bat-dg2-oem2/igt@xe_exec_queue_property@invalid-property.html
- bat-atsm-2: [PASS][5] -> [FAIL][6]
[5]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4025-efcd62f1d00607db85d10ec29387fe6516daee59/bat-atsm-2/igt@xe_exec_queue_property@invalid-property.html
[6]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156865v1/bat-atsm-2/igt@xe_exec_queue_property@invalid-property.html
- bat-adlp-vm: [PASS][7] -> [FAIL][8]
[7]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4025-efcd62f1d00607db85d10ec29387fe6516daee59/bat-adlp-vm/igt@xe_exec_queue_property@invalid-property.html
[8]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156865v1/bat-adlp-vm/igt@xe_exec_queue_property@invalid-property.html
- bat-ptl-1: [PASS][9] -> [FAIL][10]
[9]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4025-efcd62f1d00607db85d10ec29387fe6516daee59/bat-ptl-1/igt@xe_exec_queue_property@invalid-property.html
[10]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156865v1/bat-ptl-1/igt@xe_exec_queue_property@invalid-property.html
- bat-lnl-1: [PASS][11] -> [FAIL][12]
[11]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4025-efcd62f1d00607db85d10ec29387fe6516daee59/bat-lnl-1/igt@xe_exec_queue_property@invalid-property.html
[12]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156865v1/bat-lnl-1/igt@xe_exec_queue_property@invalid-property.html
- bat-ptl-vm: [PASS][13] -> [FAIL][14]
[13]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4025-efcd62f1d00607db85d10ec29387fe6516daee59/bat-ptl-vm/igt@xe_exec_queue_property@invalid-property.html
[14]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156865v1/bat-ptl-vm/igt@xe_exec_queue_property@invalid-property.html
- bat-bmg-2: [PASS][15] -> [FAIL][16]
[15]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4025-efcd62f1d00607db85d10ec29387fe6516daee59/bat-bmg-2/igt@xe_exec_queue_property@invalid-property.html
[16]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156865v1/bat-bmg-2/igt@xe_exec_queue_property@invalid-property.html
- bat-adlp-7: [PASS][17] -> [FAIL][18]
[17]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4025-efcd62f1d00607db85d10ec29387fe6516daee59/bat-adlp-7/igt@xe_exec_queue_property@invalid-property.html
[18]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156865v1/bat-adlp-7/igt@xe_exec_queue_property@invalid-property.html
- bat-bmg-1: [PASS][19] -> [FAIL][20]
[19]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4025-efcd62f1d00607db85d10ec29387fe6516daee59/bat-bmg-1/igt@xe_exec_queue_property@invalid-property.html
[20]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156865v1/bat-bmg-1/igt@xe_exec_queue_property@invalid-property.html
- bat-lnl-2: [PASS][21] -> [FAIL][22]
[21]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4025-efcd62f1d00607db85d10ec29387fe6516daee59/bat-lnl-2/igt@xe_exec_queue_property@invalid-property.html
[22]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156865v1/bat-lnl-2/igt@xe_exec_queue_property@invalid-property.html
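The uniform invalid-property failures above line up with patch 05/16 of this
series ("Handle invalid exec queue property setting"), which changes how the
driver rejects bad exec queue properties, suggesting the IGT expectation may
need updating alongside the series rather than the kernel being broken. For
reference, here is a minimal sketch of what
igt@xe_exec_queue_property@invalid-property roughly exercises: creating an
exec queue whose set_property extension carries an unknown property id and
expecting EINVAL. This is an illustrative reconstruction, not the IGT source;
the engine placement and the bogus property id (0xdead) are assumptions.
/* Hypothetical reconstruction (not the IGT source): the kernel should
 * refuse to create an exec queue when the set_property extension names
 * an unknown property. Placement and the bogus id are assumptions. */
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <drm/xe_drm.h>
static void check_invalid_property(int fd, uint32_t vm_id)
{
	struct drm_xe_engine_class_instance instance = {
		.engine_class = DRM_XE_ENGINE_CLASS_COPY, /* assumed placement */
		.engine_instance = 0,
		.gt_id = 0,
	};
	struct drm_xe_ext_set_property ext = {
		.base.name = DRM_XE_EXEC_QUEUE_EXTENSION_SET_PROPERTY,
		.property = 0xdead, /* deliberately invalid property id */
		.value = 1,
	};
	struct drm_xe_exec_queue_create create = {
		.extensions = (uint64_t)(uintptr_t)&ext,
		.width = 1,
		.num_placements = 1,
		.vm_id = vm_id,
		.instances = (uint64_t)(uintptr_t)&instance,
	};
	/* Expect the ioctl to fail with EINVAL, not create the queue. */
	assert(ioctl(fd, DRM_IOCTL_XE_EXEC_QUEUE_CREATE, &create) == -1 &&
	       errno == EINVAL);
}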
Known issues
------------
Here are the changes found in xe-pw-156865v1_BAT that come from known issues:
### IGT changes ###
#### Possible fixes ####
* igt@kms_flip@basic-plain-flip@b-edp1:
- bat-adlp-7: [DMESG-WARN][23] ([Intel XE#4543]) -> [PASS][24] +1 other test pass
[23]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4025-efcd62f1d00607db85d10ec29387fe6516daee59/bat-adlp-7/igt@kms_flip@basic-plain-flip@b-edp1.html
[24]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156865v1/bat-adlp-7/igt@kms_flip@basic-plain-flip@b-edp1.html
[Intel XE#4543]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4543
Build changes
-------------
* Linux: xe-4025-efcd62f1d00607db85d10ec29387fe6516daee59 -> xe-pw-156865v1
IGT_8604: 8604
xe-4025-efcd62f1d00607db85d10ec29387fe6516daee59: efcd62f1d00607db85d10ec29387fe6516daee59
xe-pw-156865v1: 156865v1
== Logs ==
For more details see: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156865v1/index.html
[-- Attachment #2: Type: text/html, Size: 6380 bytes --]
^ permalink raw reply [flat|nested] 61+ messages in thread
* ✗ Xe.CI.Full: failure for drm/xe: Multi Queue feature support
2025-10-31 18:29 [PATCH 00/16] drm/xe: Multi Queue feature support Niranjana Vishwanathapura
` (19 preceding siblings ...)
2025-10-31 22:19 ` ✗ Xe.CI.BAT: failure " Patchwork
@ 2025-11-01 11:25 ` Patchwork
20 siblings, 0 replies; 61+ messages in thread
From: Patchwork @ 2025-11-01 11:25 UTC (permalink / raw)
To: Niranjana Vishwanathapura; +Cc: intel-xe
[-- Attachment #1: Type: text/plain, Size: 28515 bytes --]
== Series Details ==
Series: drm/xe: Multi Queue feature support
URL : https://patchwork.freedesktop.org/series/156865/
State : failure
== Summary ==
CI Bug Log - changes from xe-4025-efcd62f1d00607db85d10ec29387fe6516daee59_FULL -> xe-pw-156865v1_FULL
====================================================
Summary
-------
**FAILURE**
Serious unknown changes coming with xe-pw-156865v1_FULL absolutely need to be
verified manually.
If you think the reported changes have nothing to do with the changes
introduced in xe-pw-156865v1_FULL, please notify your bug team (I915-ci-infra@lists.freedesktop.org) to allow them
to document this new failure mode, which will reduce false positives in CI.
Participating hosts (4 -> 4)
----------------------------
No changes in participating hosts
Possible new issues
-------------------
Here are the unknown changes that may have been introduced in xe-pw-156865v1_FULL:
### IGT changes ###
#### Possible regressions ####
* igt@kms_cursor_crc@cursor-suspend:
- shard-bmg: [PASS][1] -> [INCOMPLETE][2] +2 other tests incomplete
[1]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4025-efcd62f1d00607db85d10ec29387fe6516daee59/shard-bmg-6/igt@kms_cursor_crc@cursor-suspend.html
[2]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156865v1/shard-bmg-6/igt@kms_cursor_crc@cursor-suspend.html
* igt@xe_exec_queue_property@invalid-property:
- shard-dg2-set2: [PASS][3] -> [FAIL][4]
[3]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4025-efcd62f1d00607db85d10ec29387fe6516daee59/shard-dg2-436/igt@xe_exec_queue_property@invalid-property.html
[4]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156865v1/shard-dg2-464/igt@xe_exec_queue_property@invalid-property.html
- shard-lnl: [PASS][5] -> [FAIL][6]
[5]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4025-efcd62f1d00607db85d10ec29387fe6516daee59/shard-lnl-5/igt@xe_exec_queue_property@invalid-property.html
[6]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156865v1/shard-lnl-8/igt@xe_exec_queue_property@invalid-property.html
- shard-adlp: [PASS][7] -> [FAIL][8]
[7]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4025-efcd62f1d00607db85d10ec29387fe6516daee59/shard-adlp-2/igt@xe_exec_queue_property@invalid-property.html
[8]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156865v1/shard-adlp-2/igt@xe_exec_queue_property@invalid-property.html
- shard-bmg: [PASS][9] -> [FAIL][10]
[9]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4025-efcd62f1d00607db85d10ec29387fe6516daee59/shard-bmg-1/igt@xe_exec_queue_property@invalid-property.html
[10]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156865v1/shard-bmg-2/igt@xe_exec_queue_property@invalid-property.html
Known issues
------------
Here are the changes found in xe-pw-156865v1_FULL that come from known issues:
### IGT changes ###
#### Issues hit ####
* igt@kms_atomic_transition@modeset-transition:
- shard-adlp: [PASS][11] -> [DMESG-WARN][12] ([Intel XE#2953] / [Intel XE#4173]) +4 other tests dmesg-warn
[11]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4025-efcd62f1d00607db85d10ec29387fe6516daee59/shard-adlp-1/igt@kms_atomic_transition@modeset-transition.html
[12]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156865v1/shard-adlp-4/igt@kms_atomic_transition@modeset-transition.html
* igt@kms_big_fb@yf-tiled-addfb-size-overflow:
- shard-bmg: NOTRUN -> [SKIP][13] ([Intel XE#610])
[13]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156865v1/shard-bmg-4/igt@kms_big_fb@yf-tiled-addfb-size-overflow.html
* igt@kms_bw@linear-tiling-1-displays-2160x1440p:
- shard-dg2-set2: NOTRUN -> [SKIP][14] ([Intel XE#367])
[14]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156865v1/shard-dg2-436/igt@kms_bw@linear-tiling-1-displays-2160x1440p.html
* igt@kms_ccs@crc-primary-basic-yf-tiled-ccs:
- shard-bmg: NOTRUN -> [SKIP][15] ([Intel XE#2887]) +1 other test skip
[15]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156865v1/shard-bmg-4/igt@kms_ccs@crc-primary-basic-yf-tiled-ccs.html
* igt@kms_ccs@crc-sprite-planes-basic-4-tiled-lnl-ccs@pipe-b-dp-2:
- shard-bmg: NOTRUN -> [SKIP][16] ([Intel XE#2652] / [Intel XE#787]) +3 other tests skip
[16]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156865v1/shard-bmg-7/igt@kms_ccs@crc-sprite-planes-basic-4-tiled-lnl-ccs@pipe-b-dp-2.html
* igt@kms_ccs@random-ccs-data-4-tiled-dg2-rc-ccs:
- shard-dg2-set2: [PASS][17] -> [INCOMPLETE][18] ([Intel XE#1727] / [Intel XE#2705] / [Intel XE#3113] / [Intel XE#4212] / [Intel XE#4345] / [Intel XE#4522])
[17]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4025-efcd62f1d00607db85d10ec29387fe6516daee59/shard-dg2-435/igt@kms_ccs@random-ccs-data-4-tiled-dg2-rc-ccs.html
[18]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156865v1/shard-dg2-463/igt@kms_ccs@random-ccs-data-4-tiled-dg2-rc-ccs.html
* igt@kms_ccs@random-ccs-data-4-tiled-dg2-rc-ccs@pipe-a-dp-4:
- shard-dg2-set2: [PASS][19] -> [INCOMPLETE][20] ([Intel XE#1727] / [Intel XE#2705] / [Intel XE#3113] / [Intel XE#4212] / [Intel XE#4522])
[19]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4025-efcd62f1d00607db85d10ec29387fe6516daee59/shard-dg2-435/igt@kms_ccs@random-ccs-data-4-tiled-dg2-rc-ccs@pipe-a-dp-4.html
[20]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156865v1/shard-dg2-463/igt@kms_ccs@random-ccs-data-4-tiled-dg2-rc-ccs@pipe-a-dp-4.html
* igt@kms_cursor_crc@cursor-onscreen-512x170:
- shard-bmg: NOTRUN -> [SKIP][21] ([Intel XE#2321])
[21]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156865v1/shard-bmg-4/igt@kms_cursor_crc@cursor-onscreen-512x170.html
* igt@kms_cursor_legacy@2x-long-cursor-vs-flip-legacy:
- shard-bmg: [PASS][22] -> [SKIP][23] ([Intel XE#2291]) +1 other test skip
[22]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4025-efcd62f1d00607db85d10ec29387fe6516daee59/shard-bmg-1/igt@kms_cursor_legacy@2x-long-cursor-vs-flip-legacy.html
[23]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156865v1/shard-bmg-6/igt@kms_cursor_legacy@2x-long-cursor-vs-flip-legacy.html
* igt@kms_flip@2x-blocking-absolute-wf_vblank:
- shard-bmg: [PASS][24] -> [SKIP][25] ([Intel XE#2316])
[24]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4025-efcd62f1d00607db85d10ec29387fe6516daee59/shard-bmg-1/igt@kms_flip@2x-blocking-absolute-wf_vblank.html
[25]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156865v1/shard-bmg-6/igt@kms_flip@2x-blocking-absolute-wf_vblank.html
* igt@kms_flip@basic-flip-vs-modeset@c-hdmi-a1:
- shard-adlp: [PASS][26] -> [DMESG-WARN][27] ([Intel XE#4543])
[26]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4025-efcd62f1d00607db85d10ec29387fe6516daee59/shard-adlp-1/igt@kms_flip@basic-flip-vs-modeset@c-hdmi-a1.html
[27]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156865v1/shard-adlp-4/igt@kms_flip@basic-flip-vs-modeset@c-hdmi-a1.html
* igt@kms_flip@flip-vs-suspend:
- shard-bmg: [PASS][28] -> [INCOMPLETE][29] ([Intel XE#2049] / [Intel XE#2597]) +1 other test incomplete
[28]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4025-efcd62f1d00607db85d10ec29387fe6516daee59/shard-bmg-2/igt@kms_flip@flip-vs-suspend.html
[29]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156865v1/shard-bmg-8/igt@kms_flip@flip-vs-suspend.html
* igt@kms_frontbuffer_tracking@drrs-2p-scndscrn-spr-indfb-move:
- shard-dg2-set2: NOTRUN -> [SKIP][30] ([Intel XE#651])
[30]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156865v1/shard-dg2-436/igt@kms_frontbuffer_tracking@drrs-2p-scndscrn-spr-indfb-move.html
* igt@kms_frontbuffer_tracking@fbc-2p-primscrn-cur-indfb-draw-blt:
- shard-bmg: NOTRUN -> [SKIP][31] ([Intel XE#5390]) +1 other test skip
[31]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156865v1/shard-bmg-4/igt@kms_frontbuffer_tracking@fbc-2p-primscrn-cur-indfb-draw-blt.html
* igt@kms_frontbuffer_tracking@fbcdrrs-2p-primscrn-cur-indfb-move:
- shard-bmg: NOTRUN -> [SKIP][32] ([Intel XE#2311]) +1 other test skip
[32]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156865v1/shard-bmg-4/igt@kms_frontbuffer_tracking@fbcdrrs-2p-primscrn-cur-indfb-move.html
* igt@kms_frontbuffer_tracking@fbcpsr-2p-scndscrn-indfb-msflip-blt:
- shard-dg2-set2: NOTRUN -> [SKIP][33] ([Intel XE#653])
[33]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156865v1/shard-dg2-436/igt@kms_frontbuffer_tracking@fbcpsr-2p-scndscrn-indfb-msflip-blt.html
* igt@kms_pm_dc@dc5-retention-flops:
- shard-bmg: NOTRUN -> [SKIP][34] ([Intel XE#3309])
[34]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156865v1/shard-bmg-4/igt@kms_pm_dc@dc5-retention-flops.html
* igt@kms_psr2_sf@fbc-psr2-cursor-plane-move-continuous-exceed-fully-sf:
- shard-bmg: NOTRUN -> [SKIP][35] ([Intel XE#1406] / [Intel XE#1489])
[35]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156865v1/shard-bmg-4/igt@kms_psr2_sf@fbc-psr2-cursor-plane-move-continuous-exceed-fully-sf.html
* igt@kms_psr@psr2-primary-page-flip:
- shard-bmg: NOTRUN -> [SKIP][36] ([Intel XE#1406] / [Intel XE#2234] / [Intel XE#2850]) +1 other test skip
[36]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156865v1/shard-bmg-4/igt@kms_psr@psr2-primary-page-flip.html
* igt@kms_rotation_crc@primary-x-tiled-reflect-x-0:
- shard-bmg: [PASS][37] -> [ABORT][38] ([Intel XE#3970])
[37]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4025-efcd62f1d00607db85d10ec29387fe6516daee59/shard-bmg-5/igt@kms_rotation_crc@primary-x-tiled-reflect-x-0.html
[38]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156865v1/shard-bmg-4/igt@kms_rotation_crc@primary-x-tiled-reflect-x-0.html
* igt@xe_eudebug_online@interrupt-all-set-breakpoint:
- shard-bmg: NOTRUN -> [SKIP][39] ([Intel XE#4837])
[39]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156865v1/shard-bmg-4/igt@xe_eudebug_online@interrupt-all-set-breakpoint.html
* igt@xe_exec_basic@multigpu-many-execqueues-many-vm-userptr-invalidate-race:
- shard-bmg: NOTRUN -> [SKIP][40] ([Intel XE#2322])
[40]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156865v1/shard-bmg-4/igt@xe_exec_basic@multigpu-many-execqueues-many-vm-userptr-invalidate-race.html
* igt@xe_exec_fault_mode@many-execqueues-userptr-invalidate-prefetch:
- shard-dg2-set2: NOTRUN -> [SKIP][41] ([Intel XE#288])
[41]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156865v1/shard-dg2-436/igt@xe_exec_fault_mode@many-execqueues-userptr-invalidate-prefetch.html
* igt@xe_exec_system_allocator@process-many-large-execqueues-mmap-mlock-nomemset:
- shard-dg2-set2: NOTRUN -> [SKIP][42] ([Intel XE#4915]) +5 other tests skip
[42]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156865v1/shard-dg2-436/igt@xe_exec_system_allocator@process-many-large-execqueues-mmap-mlock-nomemset.html
* igt@xe_exec_system_allocator@threads-many-mmap-huge:
- shard-bmg: NOTRUN -> [SKIP][43] ([Intel XE#4943]) +1 other test skip
[43]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156865v1/shard-bmg-4/igt@xe_exec_system_allocator@threads-many-mmap-huge.html
* igt@xe_oa@non-privileged-access-vaddr:
- shard-dg2-set2: NOTRUN -> [SKIP][44] ([Intel XE#3573])
[44]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156865v1/shard-dg2-436/igt@xe_oa@non-privileged-access-vaddr.html
* igt@xe_pmu@gt-c6-idle:
- shard-dg2-set2: NOTRUN -> [FAIL][45] ([Intel XE#6366])
[45]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156865v1/shard-dg2-436/igt@xe_pmu@gt-c6-idle.html
* igt@xe_pmu@gt-frequency:
- shard-dg2-set2: [PASS][46] -> [FAIL][47] ([Intel XE#4819]) +1 other test fail
[46]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4025-efcd62f1d00607db85d10ec29387fe6516daee59/shard-dg2-463/igt@xe_pmu@gt-frequency.html
[47]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156865v1/shard-dg2-432/igt@xe_pmu@gt-frequency.html
* igt@xe_query@multigpu-query-config:
- shard-bmg: NOTRUN -> [SKIP][48] ([Intel XE#944])
[48]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156865v1/shard-bmg-4/igt@xe_query@multigpu-query-config.html
#### Possible fixes ####
* igt@kms_cursor_legacy@cursorb-vs-flipa-atomic-transitions-varying-size:
- shard-bmg: [SKIP][49] ([Intel XE#2291]) -> [PASS][50] +5 other tests pass
[49]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4025-efcd62f1d00607db85d10ec29387fe6516daee59/shard-bmg-6/igt@kms_cursor_legacy@cursorb-vs-flipa-atomic-transitions-varying-size.html
[50]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156865v1/shard-bmg-8/igt@kms_cursor_legacy@cursorb-vs-flipa-atomic-transitions-varying-size.html
* igt@kms_cursor_legacy@flip-vs-cursor-legacy:
- shard-bmg: [FAIL][51] ([Intel XE#5299]) -> [PASS][52]
[51]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4025-efcd62f1d00607db85d10ec29387fe6516daee59/shard-bmg-7/igt@kms_cursor_legacy@flip-vs-cursor-legacy.html
[52]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156865v1/shard-bmg-5/igt@kms_cursor_legacy@flip-vs-cursor-legacy.html
* igt@kms_feature_discovery@display-2x:
- shard-bmg: [SKIP][53] ([Intel XE#2373]) -> [PASS][54]
[53]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4025-efcd62f1d00607db85d10ec29387fe6516daee59/shard-bmg-6/igt@kms_feature_discovery@display-2x.html
[54]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156865v1/shard-bmg-8/igt@kms_feature_discovery@display-2x.html
* igt@kms_flip@2x-plain-flip-ts-check-interruptible:
- shard-bmg: [SKIP][55] ([Intel XE#2316]) -> [PASS][56] +3 other tests pass
[55]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4025-efcd62f1d00607db85d10ec29387fe6516daee59/shard-bmg-6/igt@kms_flip@2x-plain-flip-ts-check-interruptible.html
[56]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156865v1/shard-bmg-8/igt@kms_flip@2x-plain-flip-ts-check-interruptible.html
* igt@kms_flip@basic-flip-vs-dpms:
- shard-adlp: [DMESG-WARN][57] ([Intel XE#4543]) -> [PASS][58] +4 other tests pass
[57]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4025-efcd62f1d00607db85d10ec29387fe6516daee59/shard-adlp-8/igt@kms_flip@basic-flip-vs-dpms.html
[58]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156865v1/shard-adlp-1/igt@kms_flip@basic-flip-vs-dpms.html
* igt@kms_flip@flip-vs-expired-vblank@a-edp1:
- shard-lnl: [FAIL][59] ([Intel XE#301]) -> [PASS][60] +1 other test pass
[59]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4025-efcd62f1d00607db85d10ec29387fe6516daee59/shard-lnl-5/igt@kms_flip@flip-vs-expired-vblank@a-edp1.html
[60]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156865v1/shard-lnl-8/igt@kms_flip@flip-vs-expired-vblank@a-edp1.html
* igt@kms_flip@flip-vs-expired-vblank@c-edp1:
- shard-lnl: [FAIL][61] ([Intel XE#301] / [Intel XE#3149]) -> [PASS][62] +1 other test pass
[61]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4025-efcd62f1d00607db85d10ec29387fe6516daee59/shard-lnl-5/igt@kms_flip@flip-vs-expired-vblank@c-edp1.html
[62]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156865v1/shard-lnl-8/igt@kms_flip@flip-vs-expired-vblank@c-edp1.html
* igt@kms_flip@flip-vs-suspend-interruptible:
- shard-dg2-set2: [INCOMPLETE][63] ([Intel XE#2049] / [Intel XE#2597]) -> [PASS][64] +1 other test pass
[63]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4025-efcd62f1d00607db85d10ec29387fe6516daee59/shard-dg2-433/igt@kms_flip@flip-vs-suspend-interruptible.html
[64]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156865v1/shard-dg2-436/igt@kms_flip@flip-vs-suspend-interruptible.html
* igt@kms_flip_scaled_crc@flip-64bpp-xtile-to-16bpp-xtile-upscaling:
- shard-adlp: [DMESG-WARN][65] ([Intel XE#2953] / [Intel XE#4173]) -> [PASS][66] +9 other tests pass
[65]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4025-efcd62f1d00607db85d10ec29387fe6516daee59/shard-adlp-2/igt@kms_flip_scaled_crc@flip-64bpp-xtile-to-16bpp-xtile-upscaling.html
[66]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156865v1/shard-adlp-2/igt@kms_flip_scaled_crc@flip-64bpp-xtile-to-16bpp-xtile-upscaling.html
* igt@kms_hdr@invalid-hdr:
- shard-dg2-set2: [SKIP][67] ([Intel XE#455]) -> [PASS][68]
[67]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4025-efcd62f1d00607db85d10ec29387fe6516daee59/shard-dg2-435/igt@kms_hdr@invalid-hdr.html
[68]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156865v1/shard-dg2-463/igt@kms_hdr@invalid-hdr.html
* igt@kms_hdr@static-toggle-suspend:
- shard-bmg: [DMESG-WARN][69] -> [PASS][70] +1 other test pass
[69]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4025-efcd62f1d00607db85d10ec29387fe6516daee59/shard-bmg-5/igt@kms_hdr@static-toggle-suspend.html
[70]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156865v1/shard-bmg-8/igt@kms_hdr@static-toggle-suspend.html
* igt@xe_evict@evict-mixed-many-threads-small:
- shard-bmg: [INCOMPLETE][71] ([Intel XE#6321]) -> [PASS][72]
[71]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4025-efcd62f1d00607db85d10ec29387fe6516daee59/shard-bmg-8/igt@xe_evict@evict-mixed-many-threads-small.html
[72]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156865v1/shard-bmg-1/igt@xe_evict@evict-mixed-many-threads-small.html
* igt@xe_exec_system_allocator@pat-index-madvise-pat-idx-uc-multi-vma:
- shard-lnl: [FAIL][73] ([Intel XE#6353]) -> [PASS][74]
[73]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4025-efcd62f1d00607db85d10ec29387fe6516daee59/shard-lnl-2/igt@xe_exec_system_allocator@pat-index-madvise-pat-idx-uc-multi-vma.html
[74]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156865v1/shard-lnl-2/igt@xe_exec_system_allocator@pat-index-madvise-pat-idx-uc-multi-vma.html
#### Warnings ####
* igt@kms_big_fb@linear-8bpp-rotate-90:
- shard-lnl: [SKIP][75] ([Intel XE#1407]) -> [ABORT][76] ([Intel XE#4760])
[75]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4025-efcd62f1d00607db85d10ec29387fe6516daee59/shard-lnl-1/igt@kms_big_fb@linear-8bpp-rotate-90.html
[76]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156865v1/shard-lnl-1/igt@kms_big_fb@linear-8bpp-rotate-90.html
* igt@kms_frontbuffer_tracking@drrs-2p-primscrn-spr-indfb-draw-mmap-wc:
- shard-bmg: [SKIP][77] ([Intel XE#2311]) -> [SKIP][78] ([Intel XE#2312]) +3 other tests skip
[77]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4025-efcd62f1d00607db85d10ec29387fe6516daee59/shard-bmg-1/igt@kms_frontbuffer_tracking@drrs-2p-primscrn-spr-indfb-draw-mmap-wc.html
[78]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156865v1/shard-bmg-6/igt@kms_frontbuffer_tracking@drrs-2p-primscrn-spr-indfb-draw-mmap-wc.html
* igt@kms_frontbuffer_tracking@fbc-2p-primscrn-shrfb-msflip-blt:
- shard-bmg: [SKIP][79] ([Intel XE#5390]) -> [SKIP][80] ([Intel XE#2312])
[79]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4025-efcd62f1d00607db85d10ec29387fe6516daee59/shard-bmg-1/igt@kms_frontbuffer_tracking@fbc-2p-primscrn-shrfb-msflip-blt.html
[80]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156865v1/shard-bmg-6/igt@kms_frontbuffer_tracking@fbc-2p-primscrn-shrfb-msflip-blt.html
* igt@kms_frontbuffer_tracking@fbc-2p-primscrn-shrfb-pgflip-blt:
- shard-bmg: [SKIP][81] ([Intel XE#2312]) -> [SKIP][82] ([Intel XE#5390]) +4 other tests skip
[81]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4025-efcd62f1d00607db85d10ec29387fe6516daee59/shard-bmg-6/igt@kms_frontbuffer_tracking@fbc-2p-primscrn-shrfb-pgflip-blt.html
[82]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156865v1/shard-bmg-7/igt@kms_frontbuffer_tracking@fbc-2p-primscrn-shrfb-pgflip-blt.html
* igt@kms_frontbuffer_tracking@fbcdrrs-2p-scndscrn-cur-indfb-draw-mmap-wc:
- shard-bmg: [SKIP][83] ([Intel XE#2312]) -> [SKIP][84] ([Intel XE#2311]) +7 other tests skip
[83]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4025-efcd62f1d00607db85d10ec29387fe6516daee59/shard-bmg-6/igt@kms_frontbuffer_tracking@fbcdrrs-2p-scndscrn-cur-indfb-draw-mmap-wc.html
[84]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156865v1/shard-bmg-8/igt@kms_frontbuffer_tracking@fbcdrrs-2p-scndscrn-cur-indfb-draw-mmap-wc.html
* igt@kms_frontbuffer_tracking@fbcpsr-2p-primscrn-spr-indfb-onoff:
- shard-bmg: [SKIP][85] ([Intel XE#2313]) -> [SKIP][86] ([Intel XE#2312]) +4 other tests skip
[85]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4025-efcd62f1d00607db85d10ec29387fe6516daee59/shard-bmg-1/igt@kms_frontbuffer_tracking@fbcpsr-2p-primscrn-spr-indfb-onoff.html
[86]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156865v1/shard-bmg-6/igt@kms_frontbuffer_tracking@fbcpsr-2p-primscrn-spr-indfb-onoff.html
* igt@kms_frontbuffer_tracking@fbcpsr-2p-scndscrn-spr-indfb-onoff:
- shard-bmg: [SKIP][87] ([Intel XE#2312]) -> [SKIP][88] ([Intel XE#2313]) +8 other tests skip
[87]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4025-efcd62f1d00607db85d10ec29387fe6516daee59/shard-bmg-6/igt@kms_frontbuffer_tracking@fbcpsr-2p-scndscrn-spr-indfb-onoff.html
[88]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156865v1/shard-bmg-7/igt@kms_frontbuffer_tracking@fbcpsr-2p-scndscrn-spr-indfb-onoff.html
* igt@kms_hdr@brightness-with-hdr:
- shard-bmg: [SKIP][89] ([Intel XE#3544]) -> [SKIP][90] ([Intel XE#3374] / [Intel XE#3544])
[89]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4025-efcd62f1d00607db85d10ec29387fe6516daee59/shard-bmg-1/igt@kms_hdr@brightness-with-hdr.html
[90]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156865v1/shard-bmg-2/igt@kms_hdr@brightness-with-hdr.html
* igt@kms_tiled_display@basic-test-pattern:
- shard-bmg: [SKIP][91] ([Intel XE#2426]) -> [FAIL][92] ([Intel XE#1729])
[91]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4025-efcd62f1d00607db85d10ec29387fe6516daee59/shard-bmg-7/igt@kms_tiled_display@basic-test-pattern.html
[92]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156865v1/shard-bmg-5/igt@kms_tiled_display@basic-test-pattern.html
* igt@kms_tiled_display@basic-test-pattern-with-chamelium:
- shard-bmg: [SKIP][93] ([Intel XE#2426]) -> [SKIP][94] ([Intel XE#2509])
[93]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4025-efcd62f1d00607db85d10ec29387fe6516daee59/shard-bmg-4/igt@kms_tiled_display@basic-test-pattern-with-chamelium.html
[94]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156865v1/shard-bmg-5/igt@kms_tiled_display@basic-test-pattern-with-chamelium.html
* igt@xe_exec_basic@multigpu-many-execqueues-many-vm-rebind:
- shard-dg2-set2: [SKIP][95] ([Intel XE#1392]) -> [INCOMPLETE][96] ([Intel XE#4842])
[95]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4025-efcd62f1d00607db85d10ec29387fe6516daee59/shard-dg2-464/igt@xe_exec_basic@multigpu-many-execqueues-many-vm-rebind.html
[96]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156865v1/shard-dg2-466/igt@xe_exec_basic@multigpu-many-execqueues-many-vm-rebind.html
* igt@xe_sriov_scheduling@equal-throughput:
- shard-adlp: [DMESG-FAIL][97] ([Intel XE#3868] / [Intel XE#5213]) -> [DMESG-FAIL][98] ([Intel XE#3868] / [Intel XE#5213] / [Intel XE#5545]) +1 other test dmesg-fail
[97]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-4025-efcd62f1d00607db85d10ec29387fe6516daee59/shard-adlp-8/igt@xe_sriov_scheduling@equal-throughput.html
[98]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156865v1/shard-adlp-1/igt@xe_sriov_scheduling@equal-throughput.html
[Intel XE#1392]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1392
[Intel XE#1406]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1406
[Intel XE#1407]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1407
[Intel XE#1489]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1489
[Intel XE#1727]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1727
[Intel XE#1729]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1729
[Intel XE#2049]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2049
[Intel XE#2234]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2234
[Intel XE#2291]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2291
[Intel XE#2311]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2311
[Intel XE#2312]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2312
[Intel XE#2313]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2313
[Intel XE#2316]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2316
[Intel XE#2321]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2321
[Intel XE#2322]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2322
[Intel XE#2373]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2373
[Intel XE#2426]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2426
[Intel XE#2509]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2509
[Intel XE#2597]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2597
[Intel XE#2652]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2652
[Intel XE#2705]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2705
[Intel XE#2850]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2850
[Intel XE#288]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/288
[Intel XE#2887]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2887
[Intel XE#2953]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2953
[Intel XE#301]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/301
[Intel XE#3113]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3113
[Intel XE#3149]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3149
[Intel XE#3309]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3309
[Intel XE#3374]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3374
[Intel XE#3544]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3544
[Intel XE#3573]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3573
[Intel XE#367]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/367
[Intel XE#3868]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3868
[Intel XE#3970]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3970
[Intel XE#4173]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4173
[Intel XE#4212]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4212
[Intel XE#4345]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4345
[Intel XE#4522]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4522
[Intel XE#4543]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4543
[Intel XE#455]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/455
[Intel XE#4760]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4760
[Intel XE#4819]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4819
[Intel XE#4837]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4837
[Intel XE#4842]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4842
[Intel XE#4915]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4915
[Intel XE#4943]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4943
[Intel XE#5213]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5213
[Intel XE#5299]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5299
[Intel XE#5390]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5390
[Intel XE#5545]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5545
[Intel XE#610]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/610
[Intel XE#6321]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6321
[Intel XE#6353]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6353
[Intel XE#6366]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6366
[Intel XE#651]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/651
[Intel XE#653]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/653
[Intel XE#787]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/787
[Intel XE#944]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/944
Build changes
-------------
* Linux: xe-4025-efcd62f1d00607db85d10ec29387fe6516daee59 -> xe-pw-156865v1
IGT_8604: 8604
xe-4025-efcd62f1d00607db85d10ec29387fe6516daee59: efcd62f1d00607db85d10ec29387fe6516daee59
xe-pw-156865v1: 156865v1
== Logs ==
For more details see: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156865v1/index.html
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH 03/16] drm/xe/multi_queue: Add GuC interface for multi queue support
2025-10-31 18:29 ` [PATCH 03/16] drm/xe/multi_queue: Add GuC " Niranjana Vishwanathapura
@ 2025-11-01 18:07 ` Matthew Brost
2025-11-04 4:56 ` Niranjana Vishwanathapura
2025-11-02 18:02 ` Matthew Brost
1 sibling, 1 reply; 61+ messages in thread
From: Matthew Brost @ 2025-11-01 18:07 UTC (permalink / raw)
To: Niranjana Vishwanathapura; +Cc: intel-xe
On Fri, Oct 31, 2025 at 11:29:23AM -0700, Niranjana Vishwanathapura wrote:
> Implement GuC commands and response along with the Context
> Group Page (CGP) interface for multi queue support.
>
> Ensure that only the primary queue (q0) of a multi queue group
> communicates with GuC. The secondary queues of the group only
> need to maintain the LRCA and interface with the drm scheduler.
>
> Use primary queue's submit_wq for all secondary queues of a multi
> queue group. This serialization avoids any locking around CGP
> synchronization with GuC.
>
Not a complete review, but a few comments.
> Signed-off-by: Stuart Summers <stuart.summers@intel.com>
> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
> ---
> drivers/gpu/drm/xe/abi/guc_actions_abi.h | 3 +
> drivers/gpu/drm/xe/xe_exec_queue_types.h | 2 +
> drivers/gpu/drm/xe/xe_guc_ct.c | 4 +
> drivers/gpu/drm/xe/xe_guc_fwif.h | 3 +
> drivers/gpu/drm/xe/xe_guc_submit.c | 302 +++++++++++++++++++----
> drivers/gpu/drm/xe/xe_guc_submit.h | 1 +
> 6 files changed, 270 insertions(+), 45 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/abi/guc_actions_abi.h b/drivers/gpu/drm/xe/abi/guc_actions_abi.h
> index 47756e4674a1..3e9fbed9cda6 100644
> --- a/drivers/gpu/drm/xe/abi/guc_actions_abi.h
> +++ b/drivers/gpu/drm/xe/abi/guc_actions_abi.h
> @@ -139,6 +139,9 @@ enum xe_guc_action {
> XE_GUC_ACTION_DEREGISTER_G2G = 0x4508,
> XE_GUC_ACTION_DEREGISTER_CONTEXT_DONE = 0x4600,
> XE_GUC_ACTION_REGISTER_CONTEXT_MULTI_LRC = 0x4601,
> + XE_GUC_ACTION_REGISTER_CONTEXT_MULTI_QUEUE = 0x4602,
> + XE_GUC_ACTION_MULTI_QUEUE_CONTEXT_CGP_SYNC = 0x4603,
> + XE_GUC_ACTION_NOTIFY_MULTI_QUEUE_CONTEXT_CGP_SYNC_DONE = 0x4604,
> XE_GUC_ACTION_CLIENT_SOFT_RESET = 0x5507,
> XE_GUC_ACTION_SET_ENG_UTIL_BUFF = 0x550A,
> XE_GUC_ACTION_SET_DEVICE_ENGINE_ACTIVITY_BUFFER = 0x550C,
> diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h b/drivers/gpu/drm/xe/xe_exec_queue_types.h
> index 3856776df5c4..38e47b003259 100644
> --- a/drivers/gpu/drm/xe/xe_exec_queue_types.h
> +++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h
> @@ -47,6 +47,8 @@ struct xe_exec_queue_group {
> struct xarray xa;
> /** @list_lock: Secondary queue list lock */
> struct mutex list_lock;
> + /** @sync_pending: CGP_SYNC_DONE g2h response pending */
> + bool sync_pending;
> };
>
> /**
> diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c
> index e68953ef3a00..48b5006eb080 100644
> --- a/drivers/gpu/drm/xe/xe_guc_ct.c
> +++ b/drivers/gpu/drm/xe/xe_guc_ct.c
> @@ -1304,6 +1304,7 @@ static int parse_g2h_event(struct xe_guc_ct *ct, u32 *msg, u32 len)
> lockdep_assert_held(&ct->lock);
>
> switch (action) {
> + case XE_GUC_ACTION_NOTIFY_MULTI_QUEUE_CONTEXT_CGP_SYNC_DONE:
> case XE_GUC_ACTION_SCHED_CONTEXT_MODE_DONE:
> case XE_GUC_ACTION_DEREGISTER_CONTEXT_DONE:
> case XE_GUC_ACTION_SCHED_ENGINE_MODE_DONE:
> @@ -1570,6 +1571,9 @@ static int process_g2h_msg(struct xe_guc_ct *ct, u32 *msg, u32 len)
> ret = xe_guc_g2g_test_notification(guc, payload, adj_len);
> break;
> #endif
> + case XE_GUC_ACTION_NOTIFY_MULTI_QUEUE_CONTEXT_CGP_SYNC_DONE:
> + ret = xe_guc_exec_queue_cgp_sync_done_handler(guc, payload, adj_len);
> + break;
> default:
> xe_gt_err(gt, "unexpected G2H action 0x%04x\n", action);
> }
> diff --git a/drivers/gpu/drm/xe/xe_guc_fwif.h b/drivers/gpu/drm/xe/xe_guc_fwif.h
> index c90dd266e9cf..610dfb2f1cb5 100644
> --- a/drivers/gpu/drm/xe/xe_guc_fwif.h
> +++ b/drivers/gpu/drm/xe/xe_guc_fwif.h
> @@ -16,6 +16,7 @@
> #define G2H_LEN_DW_DEREGISTER_CONTEXT 3
> #define G2H_LEN_DW_TLB_INVALIDATE 3
> #define G2H_LEN_DW_G2G_NOTIFY_MIN 3
> +#define G2H_LEN_DW_MULTI_QUEUE_CONTEXT 4
This value doesn't look right. I'm not sure where 4 is coming from.
The length of XE_GUC_ACTION_NOTIFY_MULTI_QUEUE_CONTEXT_CGP_SYNC_DONE
appears to be 2. So with a value of 4, I believe the G2H credits will
leak.
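If I'm reading the CT layer right, xe_guc_ct_send() reserves the
g2h_len you pass in up front, while the event path releases space based
on the length of the G2H message actually received, so any mismatch
leaks reserved space on every response. Assuming CGP_SYNC_DONE is just
the action header plus the guc_id, I'd expect (untested):

#define G2H_LEN_DW_MULTI_QUEUE_CONTEXT 2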
You can run a multi-q test, then check the following debugfs:
cat /sys/kernel/debug/dri/0/gt0/uc/guc_info
In particular, these are the interesting fields:
G2H CTB (all sizes in DW):
...
resv_space: 16384
...
g2h outstanding: 0
^^^ This is what an idle G2H should look like. I suspect both G2H
outstanding values will be non-zero, and resv_space will continuously
decrease when running a multi-queue test.
>
> #define GUC_ID_MAX 65535
> #define GUC_ID_UNKNOWN 0xffffffff
> @@ -62,6 +63,8 @@ struct guc_ctxt_registration_info {
> u32 wq_base_lo;
> u32 wq_base_hi;
> u32 wq_size;
> + u32 cgp_lo;
> + u32 cgp_hi;
> u32 hwlrca_lo;
> u32 hwlrca_hi;
> };
> diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
> index d4ffdb71ef3d..d2aa9a2524e7 100644
> --- a/drivers/gpu/drm/xe/xe_guc_submit.c
> +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
> @@ -46,6 +46,7 @@
> #include "xe_trace.h"
> #include "xe_uc_fw.h"
> #include "xe_vm.h"
> +#include "xe_bo.h"
>
> static struct xe_guc *
> exec_queue_to_guc(struct xe_exec_queue *q)
> @@ -541,7 +542,8 @@ static void init_policies(struct xe_guc *guc, struct xe_exec_queue *q)
> u32 slpc_exec_queue_freq_req = 0;
> u32 preempt_timeout_us = q->sched_props.preempt_timeout_us;
>
> - xe_gt_assert(guc_to_gt(guc), exec_queue_registered(q));
> + xe_gt_assert(guc_to_gt(guc), exec_queue_registered(q) &&
> + !xe_exec_queue_is_multi_queue_secondary(q));
>
> if (q->flags & EXEC_QUEUE_FLAG_LOW_LATENCY)
> slpc_exec_queue_freq_req |= SLPC_CTX_FREQ_REQ_IS_COMPUTE;
> @@ -561,6 +563,8 @@ static void set_min_preemption_timeout(struct xe_guc *guc, struct xe_exec_queue
> {
> struct exec_queue_policy policy;
>
> + xe_assert(guc_to_xe(guc), !xe_exec_queue_is_multi_queue_secondary(q));
> +
> __guc_exec_queue_policy_start_klv(&policy, q->guc->id);
> __guc_exec_queue_policy_add_preemption_timeout(&policy, 1);
>
> @@ -575,6 +579,130 @@ static void set_min_preemption_timeout(struct xe_guc *guc, struct xe_exec_queue
> xe_map_wr_field(xe_, &map_, 0, struct guc_submit_parallel_scratch, \
> field_, val_)
>
> +#define CGP_VERSION_MAJOR_SHIFT 8
> +
> +static void xe_guc_exec_queue_group_cgp_update(struct xe_device *xe,
> + struct xe_exec_queue *q)
> +{
> + struct xe_exec_queue_group *group = q->multi_queue.group;
> + u32 guc_id = group->primary->guc->id;
> +
> + /* Currently implementing CGP version 1.0 */
> + xe_map_wr(xe, &group->cgp_bo->vmap, 0, u32,
> + 1 << CGP_VERSION_MAJOR_SHIFT);
> +
> + xe_map_wr(xe, &group->cgp_bo->vmap,
> + (32 + q->multi_queue.pos * 2) * sizeof(u32),
> + u32, lower_32_bits(xe_lrc_descriptor(q->lrc[0])));
> +
> + xe_map_wr(xe, &group->cgp_bo->vmap,
> + (33 + q->multi_queue.pos * 2) * sizeof(u32),
> + u32, guc_id);
> +
> + if (q->multi_queue.pos / 32) {
> + xe_map_wr(xe, &group->cgp_bo->vmap, 17 * sizeof(u32),
> + u32, BIT(q->multi_queue.pos % 32));
> + xe_map_wr(xe, &group->cgp_bo->vmap, 16 * sizeof(u32), u32, 0);
> + } else {
> + xe_map_wr(xe, &group->cgp_bo->vmap, 16 * sizeof(u32),
> + u32, BIT(q->multi_queue.pos));
> + xe_map_wr(xe, &group->cgp_bo->vmap, 17 * sizeof(u32), u32, 0);
> + }
> +}
> +
> +static void xe_guc_exec_queue_group_cgp_sync(struct xe_guc *guc,
> + struct xe_exec_queue *q,
> + const u32 *action, u32 len)
> +{
> + struct xe_exec_queue_group *group = q->multi_queue.group;
> + struct xe_device *xe = guc_to_xe(guc);
> + long ret;
> +
> + /*
> + * As all queues of a multi queue group use single drm scheduler
> + * submit workqueue, CGP synchronization with GuC are serialized.
> + * Hence, no locking is required here.
> + * Wait for any pending CGP_SYNC_DONE response before updating the
> + * CGP page and sending CGP_SYNC message.
> + */
> + ret = wait_event_timeout(guc->ct.wq,
> + !READ_ONCE(group->sync_pending) ||
> + xe_guc_read_stopped(guc), HZ);
> + if (!ret || xe_guc_read_stopped(guc)) {
> + drm_err(&xe->drm, "Wait for CGP_SYNC_DONE response failed!\n");
If this occurs you need a GT reset, which should detect
group->sync_pending in guc_exec_queue_stop and clean it up.
This is also where VF migration needs to be considered. The
wait_event_timeout should pop out on vf_recovery being set, but not
trigger a GT reset. In that case we likely need some per-secondary-queue
tracking state to figure out which secondary queues lost their CGP
syncs so that flow can recover. We can figure that part out a bit later
though.
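One possible shape for that tracking state, just a sketch with a
made-up field name:

struct xe_exec_queue_group {
	...
	/** @sync_pending: CGP_SYNC_DONE g2h response pending */
	bool sync_pending;
	/**
	 * @lost_sync_mask: group positions whose CGP sync was dropped
	 * by VF migration and needs replaying on recovery
	 */
	u64 lost_sync_mask;
};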
> + /* Something wrong with the CTB or GuC, no need to proceed */
> + return;
> + }
> +
> + xe_guc_exec_queue_group_cgp_update(xe, q);
> +
> + WRITE_ONCE(group->sync_pending, true);
> + xe_guc_ct_send(&guc->ct, action, len, G2H_LEN_DW_MULTI_QUEUE_CONTEXT, 1);
The problem here appears to be twofold:
- The value of G2H_LEN_DW_MULTI_QUEUE_CONTEXT looks incorrect
- On multi-q registration both the G2H credits and count are set, but the
multi-q register doesn't produce a G2H response. See my comment above
about things getting leaked; that can't happen, as PM will be off and
eventually the G2H credits will run out and deadlock the CT channel,
leading to a GT reset.
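Roughly what I'd expect at the send site, assuming only CGP_SYNC
produces a CGP_SYNC_DONE response (untested):

	if (action[0] == XE_GUC_ACTION_REGISTER_CONTEXT_MULTI_QUEUE)
		/* register produces no G2H response, no credits needed */
		xe_guc_ct_send(&guc->ct, action, len, 0, 0);
	else
		xe_guc_ct_send(&guc->ct, action, len,
			       G2H_LEN_DW_MULTI_QUEUE_CONTEXT, 1);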
> +}
> +
> +static void __register_exec_queue(struct xe_guc *guc,
> + struct guc_ctxt_registration_info *info)
> +{
> + u32 action[] = {
> + XE_GUC_ACTION_REGISTER_CONTEXT,
> + info->flags,
> + info->context_idx,
> + info->engine_class,
> + info->engine_submit_mask,
> + info->wq_desc_lo,
> + info->wq_desc_hi,
> + info->wq_base_lo,
> + info->wq_base_hi,
> + info->wq_size,
> + info->hwlrca_lo,
> + info->hwlrca_hi,
> + };
> +
> + /* explicitly checks some fields that we might fixup later */
> + xe_gt_assert(guc_to_gt(guc), info->wq_desc_lo ==
> + action[XE_GUC_REGISTER_CONTEXT_DATA_5_WQ_DESC_ADDR_LOWER]);
> + xe_gt_assert(guc_to_gt(guc), info->wq_base_lo ==
> + action[XE_GUC_REGISTER_CONTEXT_DATA_7_WQ_BUF_BASE_LOWER]);
> + xe_gt_assert(guc_to_gt(guc), info->hwlrca_lo ==
> + action[XE_GUC_REGISTER_CONTEXT_DATA_10_HW_LRC_ADDR]);
> +
> + xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action), 0, 0);
> +}
> +
> +static void __register_exec_queue_group(struct xe_guc *guc,
> + struct xe_exec_queue *q,
> + struct guc_ctxt_registration_info *info)
> +{
> +#define MAX_MULTI_QUEUE_REG_SIZE (8)
> + struct xe_device *xe = guc_to_xe(guc);
> + u32 action[MAX_MULTI_QUEUE_REG_SIZE];
> + int len = 0;
> +
> + if (xe_exec_queue_is_multi_queue_primary(q)) {
> + action[len++] = XE_GUC_ACTION_REGISTER_CONTEXT_MULTI_QUEUE;
Again, as mentioned above, this command doesn't require G2H credits
unless it produces an XE_GUC_ACTION_NOTIFY_MULTI_QUEUE_CONTEXT_CGP_SYNC_DONE
response.
> + action[len++] = info->flags;
> + action[len++] = info->context_idx;
> + action[len++] = info->engine_class;
> + action[len++] = info->engine_submit_mask;
> + action[len++] = 0; /* Reserved */
> + action[len++] = info->cgp_lo;
> + action[len++] = info->cgp_hi;
> + } else {
> + /*
> + * No need to wait before CGP sync since CT descriptors
> + * should be ordered.
> + */
> +
> + action[len++] = XE_GUC_ACTION_MULTI_QUEUE_CONTEXT_CGP_SYNC;
> + action[len++] = q->multi_queue.group->primary->guc->id;
> + }
> +
> + xe_assert(xe, len <= MAX_MULTI_QUEUE_REG_SIZE);
> +#undef MAX_MULTI_QUEUE_REG_SIZE
> +
> + xe_guc_exec_queue_group_cgp_sync(guc, q, action, len);
> +}
> +
> static void __register_mlrc_exec_queue(struct xe_guc *guc,
> struct xe_exec_queue *q,
> struct guc_ctxt_registration_info *info)
> @@ -622,35 +750,6 @@ static void __register_mlrc_exec_queue(struct xe_guc *guc,
> xe_guc_ct_send(&guc->ct, action, len, 0, 0);
> }
>
> -static void __register_exec_queue(struct xe_guc *guc,
> - struct guc_ctxt_registration_info *info)
> -{
> - u32 action[] = {
> - XE_GUC_ACTION_REGISTER_CONTEXT,
> - info->flags,
> - info->context_idx,
> - info->engine_class,
> - info->engine_submit_mask,
> - info->wq_desc_lo,
> - info->wq_desc_hi,
> - info->wq_base_lo,
> - info->wq_base_hi,
> - info->wq_size,
> - info->hwlrca_lo,
> - info->hwlrca_hi,
> - };
> -
> - /* explicitly checks some fields that we might fixup later */
> - xe_gt_assert(guc_to_gt(guc), info->wq_desc_lo ==
> - action[XE_GUC_REGISTER_CONTEXT_DATA_5_WQ_DESC_ADDR_LOWER]);
> - xe_gt_assert(guc_to_gt(guc), info->wq_base_lo ==
> - action[XE_GUC_REGISTER_CONTEXT_DATA_7_WQ_BUF_BASE_LOWER]);
> - xe_gt_assert(guc_to_gt(guc), info->hwlrca_lo ==
> - action[XE_GUC_REGISTER_CONTEXT_DATA_10_HW_LRC_ADDR]);
> -
> - xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action), 0, 0);
> -}
> -
> static void register_exec_queue(struct xe_exec_queue *q, int ctx_type)
> {
> struct xe_guc *guc = exec_queue_to_guc(q);
> @@ -670,6 +769,13 @@ static void register_exec_queue(struct xe_exec_queue *q, int ctx_type)
> info.flags = CONTEXT_REGISTRATION_FLAG_KMD |
> FIELD_PREP(CONTEXT_REGISTRATION_FLAG_TYPE, ctx_type);
>
> + if (xe_exec_queue_is_multi_queue(q)) {
> + struct xe_exec_queue_group *group = q->multi_queue.group;
> +
> + info.cgp_lo = xe_bo_ggtt_addr(group->cgp_bo);
> + info.cgp_hi = 0;
> + }
> +
> if (xe_exec_queue_is_parallel(q)) {
> u64 ggtt_addr = xe_lrc_parallel_ggtt_addr(lrc);
> struct iosys_map map = xe_lrc_parallel_map(lrc);
> @@ -700,11 +806,15 @@ static void register_exec_queue(struct xe_exec_queue *q, int ctx_type)
>
> set_exec_queue_registered(q);
> trace_xe_exec_queue_register(q);
> - if (xe_exec_queue_is_parallel(q))
> + if (xe_exec_queue_is_multi_queue(q))
> + __register_exec_queue_group(guc, q, &info);
> + else if (xe_exec_queue_is_parallel(q))
> __register_mlrc_exec_queue(guc, q, &info);
> else
> __register_exec_queue(guc, &info);
> - init_policies(guc, q);
> +
> + if (!xe_exec_queue_is_multi_queue_secondary(q))
> + init_policies(guc, q);
> }
>
> static u32 wq_space_until_wrap(struct xe_exec_queue *q)
> @@ -833,6 +943,12 @@ static void submit_exec_queue(struct xe_exec_queue *q, struct xe_sched_job *job)
> if (exec_queue_suspended(q) && !xe_exec_queue_is_parallel(q))
> return;
>
> + /*
> + * All queues in a multi-queue group will use the primary queue
> + * of the group to interface with GuC.
> + */
> + q = xe_exec_queue_multi_queue_primary(q);
> +
> if (!exec_queue_enabled(q) && !exec_queue_suspended(q)) {
> action[len++] = XE_GUC_ACTION_SCHED_CONTEXT_MODE_SET;
> action[len++] = q->guc->id;
> @@ -879,6 +995,18 @@ guc_exec_queue_run_job(struct drm_sched_job *drm_job)
> trace_xe_sched_job_run(job);
>
> if (!killed_or_banned_or_wedged && !xe_sched_job_is_error(job)) {
> + if (xe_exec_queue_is_multi_queue_secondary(q)) {
> + struct xe_exec_queue *primary = xe_exec_queue_multi_queue_primary(q);
> +
> + if (exec_queue_killed_or_banned_or_wedged(primary)) {
> + killed_or_banned_or_wedged = true;
> + goto run_job_out;
> + }
> +
> + if (!exec_queue_registered(primary))
> + register_exec_queue(primary, GUC_CONTEXT_NORMAL);
> + }
> +
> if (!exec_queue_registered(q))
> register_exec_queue(q, GUC_CONTEXT_NORMAL);
> if (!job->skip_emit)
> @@ -887,6 +1015,7 @@ guc_exec_queue_run_job(struct drm_sched_job *drm_job)
> job->skip_emit = false;
> }
>
> +run_job_out:
> /*
> * We don't care about job-fence ordering in LR VMs because these fences
> * are never exported; they are used solely to keep jobs on the pending
> @@ -912,6 +1041,11 @@ int xe_guc_read_stopped(struct xe_guc *guc)
> return atomic_read(&guc->submission_state.stopped);
> }
>
> +static void handle_multi_queue_secondary_sched_done(struct xe_guc *guc,
> + struct xe_exec_queue *q,
> + u32 runnable_state);
> +static void handle_deregister_done(struct xe_guc *guc, struct xe_exec_queue *q);
> +
> #define MAKE_SCHED_CONTEXT_ACTION(q, enable_disable) \
> u32 action[] = { \
> XE_GUC_ACTION_SCHED_CONTEXT_MODE_SET, \
> @@ -925,7 +1059,9 @@ static void disable_scheduling_deregister(struct xe_guc *guc,
> MAKE_SCHED_CONTEXT_ACTION(q, DISABLE);
> int ret;
>
> - set_min_preemption_timeout(guc, q);
> + if (!xe_exec_queue_is_multi_queue_secondary(q))
> + set_min_preemption_timeout(guc, q);
> +
> smp_rmb();
> ret = wait_event_timeout(guc->ct.wq,
> (!exec_queue_pending_enable(q) &&
> @@ -953,9 +1089,12 @@ static void disable_scheduling_deregister(struct xe_guc *guc,
> * Reserve space for both G2H here as the 2nd G2H is sent from a G2H
> * handler and we are not allowed to reserved G2H space in handlers.
> */
> - xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action),
> - G2H_LEN_DW_SCHED_CONTEXT_MODE_SET +
> - G2H_LEN_DW_DEREGISTER_CONTEXT, 2);
> + if (xe_exec_queue_is_multi_queue_secondary(q))
> + handle_multi_queue_secondary_sched_done(guc, q, 0);
> + else
> + xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action),
> + G2H_LEN_DW_SCHED_CONTEXT_MODE_SET +
> + G2H_LEN_DW_DEREGISTER_CONTEXT, 2);
> }
>
> static void xe_guc_exec_queue_trigger_cleanup(struct xe_exec_queue *q)
> @@ -1161,8 +1300,11 @@ static void enable_scheduling(struct xe_exec_queue *q)
> set_exec_queue_enabled(q);
> trace_xe_exec_queue_scheduling_enable(q);
>
> - xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action),
> - G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, 1);
> + if (xe_exec_queue_is_multi_queue_secondary(q))
> + handle_multi_queue_secondary_sched_done(guc, q, 1);
> + else
> + xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action),
> + G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, 1);
>
> ret = wait_event_timeout(guc->ct.wq,
> !exec_queue_pending_enable(q) ||
> @@ -1186,14 +1328,17 @@ static void disable_scheduling(struct xe_exec_queue *q, bool immediate)
> xe_gt_assert(guc_to_gt(guc), exec_queue_registered(q));
> xe_gt_assert(guc_to_gt(guc), !exec_queue_pending_disable(q));
>
> - if (immediate)
> + if (immediate && !xe_exec_queue_is_multi_queue_secondary(q))
> set_min_preemption_timeout(guc, q);
> clear_exec_queue_enabled(q);
> set_exec_queue_pending_disable(q);
> trace_xe_exec_queue_scheduling_disable(q);
>
> - xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action),
> - G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, 1);
> + if (xe_exec_queue_is_multi_queue_secondary(q))
> + handle_multi_queue_secondary_sched_done(guc, q, 0);
> + else
> + xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action),
> + G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, 1);
> }
>
> static void __deregister_exec_queue(struct xe_guc *guc, struct xe_exec_queue *q)
> @@ -1211,8 +1356,11 @@ static void __deregister_exec_queue(struct xe_guc *guc, struct xe_exec_queue *q)
> set_exec_queue_destroyed(q);
> trace_xe_exec_queue_deregister(q);
>
> - xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action),
> - G2H_LEN_DW_DEREGISTER_CONTEXT, 1);
> + if (xe_exec_queue_is_multi_queue_secondary(q))
> + handle_deregister_done(guc, q);
> + else
> + xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action),
> + G2H_LEN_DW_DEREGISTER_CONTEXT, 1);
> }
>
> static enum drm_gpu_sched_stat
> @@ -1660,6 +1808,7 @@ static int guc_exec_queue_init(struct xe_exec_queue *q)
> {
> struct xe_gpu_scheduler *sched;
> struct xe_guc *guc = exec_queue_to_guc(q);
> + struct workqueue_struct *submit_wq = NULL;
> struct xe_guc_exec_queue *ge;
> long timeout;
> int err, i;
> @@ -1680,8 +1829,20 @@ static int guc_exec_queue_init(struct xe_exec_queue *q)
>
> timeout = (q->vm && xe_vm_in_lr_mode(q->vm)) ? MAX_SCHEDULE_TIMEOUT :
> msecs_to_jiffies(q->sched_props.job_timeout_ms);
> +
> + /*
> + * Use primary queue's submit_wq for all secondary queues of a
> + * multi queue group. This serialization avoids any locking around
> + * CGP synchronization with GuC.
> + */
> + if (xe_exec_queue_is_multi_queue_secondary(q)) {
> + struct xe_exec_queue *primary = xe_exec_queue_multi_queue_primary(q);
> +
> + submit_wq = primary->guc->sched.base.submit_wq;
> + }
> +
> err = xe_sched_init(&ge->sched, &drm_sched_ops, &xe_sched_ops,
> - NULL, xe_lrc_ring_size() / MAX_JOB_SIZE_BYTES, 64,
> + submit_wq, xe_lrc_ring_size() / MAX_JOB_SIZE_BYTES, 64,
> timeout, guc_to_gt(guc)->ordered_wq, NULL,
> q->name, gt_to_xe(q->gt)->drm.dev);
> if (err)
> @@ -2418,7 +2579,11 @@ static void deregister_exec_queue(struct xe_guc *guc, struct xe_exec_queue *q)
>
> trace_xe_exec_queue_deregister(q);
>
> - xe_guc_ct_send_g2h_handler(&guc->ct, action, ARRAY_SIZE(action));
> + if (xe_exec_queue_is_multi_queue_secondary(q))
> + handle_deregister_done(guc, q);
> + else
> + xe_guc_ct_send_g2h_handler(&guc->ct, action,
> + ARRAY_SIZE(action));
> }
>
> static void handle_sched_done(struct xe_guc *guc, struct xe_exec_queue *q,
> @@ -2468,6 +2633,15 @@ static void handle_sched_done(struct xe_guc *guc, struct xe_exec_queue *q,
> }
> }
>
> +static void handle_multi_queue_secondary_sched_done(struct xe_guc *guc,
> + struct xe_exec_queue *q,
> + u32 runnable_state)
> +{
> + mutex_lock(&guc->ct.lock);
I don't think you need the CT lock here. This is per-queue state, which
should be safe to modify without any lock. The CT lock never protects
queue state; we just happen to hold it in G2H responses because of how
the CT layer works.
> + handle_sched_done(guc, q, runnable_state);
> + mutex_unlock(&guc->ct.lock);
> +}
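i.e., I'd expect this to reduce to just the following, assuming
handle_sched_done() doesn't depend on the CT lock being held:

static void handle_multi_queue_secondary_sched_done(struct xe_guc *guc,
						    struct xe_exec_queue *q,
						    u32 runnable_state)
{
	handle_sched_done(guc, q, runnable_state);
}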
> +
> int xe_guc_sched_done_handler(struct xe_guc *guc, u32 *msg, u32 len)
> {
> struct xe_exec_queue *q;
> @@ -2672,6 +2846,44 @@ int xe_guc_exec_queue_reset_failure_handler(struct xe_guc *guc, u32 *msg, u32 le
> return 0;
> }
>
> +/**
> + * xe_guc_exec_queue_cgp_sync_done_handler - CGP synchronization done handler
> + * @guc: guc
> + * @msg: message indicating CGP sync done
> + * @len: length of message
> + *
> + * Set multi queue group's sync_pending flag to false and wakeup anyone waiting
> + * for CGP synchronization to complete.
> + *
> + * Return: 0 on success, -EPROTO for malformed messages.
> + */
> +int xe_guc_exec_queue_cgp_sync_done_handler(struct xe_guc *guc, u32 *msg, u32 len)
> +{
> + struct xe_device *xe = guc_to_xe(guc);
> + struct xe_exec_queue *q;
> + u32 guc_id = msg[0];
> +
> + if (unlikely(len < 1)) {
> + drm_err(&xe->drm, "Invalid CGP_SYNC_DONE length %u", len);
> + return -EPROTO;
> + }
> +
> + q = g2h_exec_queue_lookup(guc, guc_id);
> + if (unlikely(!q))
> + return -EPROTO;
> +
> + if (!xe_exec_queue_is_multi_queue_primary(q)) {
> + drm_err(&xe->drm, "Unexpected CGP_SYNC_DONE response");
> + return -EPROTO;
> + }
> +
> + /* Wakeup the serialized cgp update wait */
> + WRITE_ONCE(q->multi_queue.group->sync_pending, false);
So here - I suspect we need to associate the CGP_SYNC_DONE with
per-secondary-queue state tracking in order to get VF migration to
work. Again, we can figure this part out a bit later, but it should be
considered.
Matt
> + wake_up_all(&guc->ct.wq);
> +
> + return 0;
> +}
> +
> static void
> guc_exec_queue_wq_snapshot_capture(struct xe_exec_queue *q,
> struct xe_guc_submit_exec_queue_snapshot *snapshot)
> diff --git a/drivers/gpu/drm/xe/xe_guc_submit.h b/drivers/gpu/drm/xe/xe_guc_submit.h
> index b49a2748ec46..abfa94bce391 100644
> --- a/drivers/gpu/drm/xe/xe_guc_submit.h
> +++ b/drivers/gpu/drm/xe/xe_guc_submit.h
> @@ -34,6 +34,7 @@ int xe_guc_exec_queue_memory_cat_error_handler(struct xe_guc *guc, u32 *msg,
> u32 len);
> int xe_guc_exec_queue_reset_failure_handler(struct xe_guc *guc, u32 *msg, u32 len);
> int xe_guc_error_capture_handler(struct xe_guc *guc, u32 *msg, u32 len);
> +int xe_guc_exec_queue_cgp_sync_done_handler(struct xe_guc *guc, u32 *msg, u32 len);
>
> struct xe_guc_submit_exec_queue_snapshot *
> xe_guc_exec_queue_snapshot_capture(struct xe_exec_queue *q);
> --
> 2.43.0
>
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH 08/16] drm/xe/multi_queue: Add multi queue information to guc_info dump
2025-10-31 18:29 ` [PATCH 08/16] drm/xe/multi_queue: Add multi queue information to guc_info dump Niranjana Vishwanathapura
@ 2025-11-01 18:31 ` Matthew Brost
2025-11-03 1:15 ` Niranjana Vishwanathapura
0 siblings, 1 reply; 61+ messages in thread
From: Matthew Brost @ 2025-11-01 18:31 UTC (permalink / raw)
To: Niranjana Vishwanathapura; +Cc: intel-xe
On Fri, Oct 31, 2025 at 11:29:28AM -0700, Niranjana Vishwanathapura wrote:
> Dump multi queue specific information in the guc exec queue
> dump.
>
> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
> ---
> drivers/gpu/drm/xe/xe_guc_submit.c | 10 ++++++++++
> drivers/gpu/drm/xe/xe_guc_submit_types.h | 13 +++++++++++++
> 2 files changed, 23 insertions(+)
>
> diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
> index 426b64ef8d99..b84a0be2eefe 100644
> --- a/drivers/gpu/drm/xe/xe_guc_submit.c
> +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
> @@ -3032,6 +3032,11 @@ xe_guc_exec_queue_snapshot_capture(struct xe_exec_queue *q)
> if (snapshot->parallel_execution)
> guc_exec_queue_wq_snapshot_capture(q, snapshot);
>
> + snapshot->is_multi_queue = xe_exec_queue_is_multi_queue(q);
> + if (snapshot->is_multi_queue) {
> + snapshot->multi_queue.primary = xe_exec_queue_multi_queue_primary(q)->guc->id;
> + snapshot->multi_queue.pos = q->multi_queue.pos;
> + }
> spin_lock(&sched->base.job_list_lock);
> snapshot->pending_list_size = list_count_nodes(&sched->base.pending_list);
> snapshot->pending_list = kmalloc_array(snapshot->pending_list_size,
> @@ -3114,6 +3119,11 @@ xe_guc_exec_queue_snapshot_print(struct xe_guc_submit_exec_queue_snapshot *snaps
> if (snapshot->parallel_execution)
> guc_exec_queue_wq_snapshot_print(snapshot, p);
>
> + if (snapshot->is_multi_queue) {
> + drm_printf(p, "\tMulti queue primary GuC ID: %d\n", snapshot->multi_queue.primary);
> + drm_printf(p, "\tMulti queue position: %d\n", snapshot->multi_queue.pos);
> + }
> +
> for (i = 0; snapshot->pending_list && i < snapshot->pending_list_size;
> i++)
> drm_printf(p, "\tJob: seqno=%d, fence=%d, finished=%d\n",
> diff --git a/drivers/gpu/drm/xe/xe_guc_submit_types.h b/drivers/gpu/drm/xe/xe_guc_submit_types.h
> index dc7456c34583..20dddf50d802 100644
> --- a/drivers/gpu/drm/xe/xe_guc_submit_types.h
> +++ b/drivers/gpu/drm/xe/xe_guc_submit_types.h
> @@ -135,6 +135,19 @@ struct xe_guc_submit_exec_queue_snapshot {
> u32 wq[WQ_SIZE / sizeof(u32)];
> } parallel;
>
> + /** @is_multi_queue: The exec queue is part of a multi queue group */
> + bool is_multi_queue;
I'd stick this in the sub-structure.
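e.g., something like this, with 'valid' being just a made-up name:

	/** @multi_queue: snapshot of the multi queue information */
	struct {
		/** @multi_queue.valid: exec queue is part of a multi queue group */
		bool valid;
		/** @multi_queue.primary: GuC id of the group's primary exec queue */
		u32 primary;
		/** @multi_queue.pos: position of the exec queue within the group */
		u8 pos;
	} multi_queue;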
Otherwise LGTM.
With this nit fixed:
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
> + /** @multi_queue: snapshot of the multi queue information */
> + struct {
> + /**
> + * @multi_queue.primary: GuC id of the primary exec queue
> + * of the multi queue group.
> + */
> + u32 primary;
> + /** @multi_queue.pos: Position of the exec queue within the multi queue group */
> + u8 pos;
> + } multi_queue;
> +
> /** @pending_list_size: Size of the pending list snapshot array */
> int pending_list_size;
> /** @pending_list: snapshot of the pending list info */
> --
> 2.43.0
>
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH 12/16] drm/xe/multi_queue: Tracepoint support
2025-10-31 18:29 ` [PATCH 12/16] drm/xe/multi_queue: Tracepoint support Niranjana Vishwanathapura
@ 2025-11-01 18:32 ` Matthew Brost
0 siblings, 0 replies; 61+ messages in thread
From: Matthew Brost @ 2025-11-01 18:32 UTC (permalink / raw)
To: Niranjana Vishwanathapura; +Cc: intel-xe
On Fri, Oct 31, 2025 at 11:29:32AM -0700, Niranjana Vishwanathapura wrote:
> Add xe_exec_queue_create_multi_queue event with
> multi-queue information.
>
> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
> ---
> drivers/gpu/drm/xe/xe_guc_submit.c | 5 +++-
> drivers/gpu/drm/xe/xe_trace.h | 41 ++++++++++++++++++++++++++++++
> 2 files changed, 45 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
> index 605352145d76..bc7296edf1be 100644
> --- a/drivers/gpu/drm/xe/xe_guc_submit.c
> +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
> @@ -2001,7 +2001,10 @@ static int guc_exec_queue_init(struct xe_exec_queue *q)
> mutex_unlock(&group->list_lock);
> }
>
> - trace_xe_exec_queue_create(q);
> + if (xe_exec_queue_is_multi_queue(q))
> + trace_xe_exec_queue_create_multi_queue(q);
> + else
> + trace_xe_exec_queue_create(q);
>
> return 0;
>
> diff --git a/drivers/gpu/drm/xe/xe_trace.h b/drivers/gpu/drm/xe/xe_trace.h
> index c9d0748dae9d..6d12fcc13f43 100644
> --- a/drivers/gpu/drm/xe/xe_trace.h
> +++ b/drivers/gpu/drm/xe/xe_trace.h
> @@ -13,6 +13,7 @@
> #include <linux/types.h>
>
> #include "xe_exec_queue_types.h"
> +#include "xe_exec_queue.h"
> #include "xe_gpu_scheduler_types.h"
> #include "xe_gt_types.h"
> #include "xe_guc_exec_queue_types.h"
> @@ -97,11 +98,51 @@ DECLARE_EVENT_CLASS(xe_exec_queue,
> __entry->guc_state, __entry->flags)
> );
>
> +DECLARE_EVENT_CLASS(xe_exec_queue_multi_queue,
> + TP_PROTO(struct xe_exec_queue *q),
> + TP_ARGS(q),
> +
> + TP_STRUCT__entry(
> + __string(dev, __dev_name_eq(q))
> + __field(enum xe_engine_class, class)
> + __field(u32, logical_mask)
> + __field(u8, gt_id)
> + __field(u16, width)
> + __field(u32, guc_id)
> + __field(u32, guc_state)
> + __field(u32, flags)
> + __field(u32, primary)
> + ),
> +
> + TP_fast_assign(
> + __assign_str(dev);
> + __entry->class = q->class;
> + __entry->logical_mask = q->logical_mask;
> + __entry->gt_id = q->gt->info.id;
> + __entry->width = q->width;
> + __entry->guc_id = q->guc->id;
> + __entry->guc_state = atomic_read(&q->guc->state);
> + __entry->flags = q->flags;
> + __entry->primary = xe_exec_queue_multi_queue_primary(q)->guc->id;
> + ),
> +
> + TP_printk("dev=%s, %d:0x%x, gt=%d, width=%d guc_id=%d, guc_state=0x%x, flags=0x%x, primary=%d",
> + __get_str(dev), __entry->class, __entry->logical_mask,
> + __entry->gt_id, __entry->width, __entry->guc_id,
> + __entry->guc_state, __entry->flags,
> + __entry->primary)
> +);
> +
> DEFINE_EVENT(xe_exec_queue, xe_exec_queue_create,
> TP_PROTO(struct xe_exec_queue *q),
> TP_ARGS(q)
> );
>
> +DEFINE_EVENT(xe_exec_queue_multi_queue, xe_exec_queue_create_multi_queue,
> + TP_PROTO(struct xe_exec_queue *q),
> + TP_ARGS(q)
> +);
> +
> DEFINE_EVENT(xe_exec_queue, xe_exec_queue_supress_resume,
> TP_PROTO(struct xe_exec_queue *q),
> TP_ARGS(q)
> --
> 2.43.0
>
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH 07/16] drm/xe/multi_queue: Add support for multi queue dynamic priority change
2025-10-31 18:29 ` [PATCH 07/16] drm/xe/multi_queue: Add support for multi queue dynamic priority change Niranjana Vishwanathapura
@ 2025-11-01 23:23 ` Matthew Brost
2025-11-03 18:06 ` Niranjana Vishwanathapura
2025-11-01 23:41 ` Matthew Brost
1 sibling, 1 reply; 61+ messages in thread
From: Matthew Brost @ 2025-11-01 23:23 UTC (permalink / raw)
To: Niranjana Vishwanathapura; +Cc: intel-xe
On Fri, Oct 31, 2025 at 11:29:27AM -0700, Niranjana Vishwanathapura wrote:
> Support dynamic priority change for multi queue group queues via
> exec queue set_property ioctl. Issue CGP_SYNC command to GuC through
> the drm scheduler message interface for priority to take effect.
>
> Signed-off-by: Pallavi Mishra <pallavi.mishra@intel.com>
> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
> ---
> drivers/gpu/drm/xe/xe_exec_queue.c | 12 ++++-
> drivers/gpu/drm/xe/xe_exec_queue_types.h | 3 ++
> drivers/gpu/drm/xe/xe_guc_submit.c | 56 ++++++++++++++++++++++--
> 3 files changed, 65 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
> index 0264cab00fd4..98f8f1c7f13b 100644
> --- a/drivers/gpu/drm/xe/xe_exec_queue.c
> +++ b/drivers/gpu/drm/xe/xe_exec_queue.c
> @@ -729,9 +729,13 @@ static int exec_queue_set_multi_queue_priority(struct xe_device *xe, struct xe_e
> if (XE_IOCTL_DBG(xe, value > XE_MULTI_QUEUE_PRIORITY_HIGH))
> return -EINVAL;
>
> - q->multi_queue.priority = value;
> + /* For queue creation time (!q->xef) setting, just store the priority value */
> + if (!q->xef) {
> + q->multi_queue.priority = value;
> + return 0;
> + }
>
> - return 0;
> + return q->ops->set_multi_queue_priority(q, value);
> }
>
> typedef int (*xe_exec_queue_set_property_fn)(struct xe_device *xe,
> @@ -760,6 +764,10 @@ int xe_exec_queue_set_property_ioctl(struct drm_device *dev, void *data,
> if (XE_IOCTL_DBG(xe, args->reserved[0] || args->reserved[1]))
> return -EINVAL;
>
> + if (XE_IOCTL_DBG(xe, args->property !=
> + DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_QUEUE_PRIORITY))
> + return -EINVAL;
This check looks to be in the wrong place: as written it rejects other
valid properties, e.g., DRM_XE_EXEC_QUEUE_SET_PROPERTY_TIMESLICE.
I think CI is complaining about this too.
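If the intent is that secondary queues only support the multi queue
priority property (per the patch 4 commit message), maybe something
like this after the exec queue lookup instead, though I may be
misreading the intent (untested):

	q = xe_exec_queue_lookup(xef, args->exec_queue_id);
	if (XE_IOCTL_DBG(xe, !q))
		return -ENOENT;

	/* secondary queues only support the multi queue priority property */
	if (XE_IOCTL_DBG(xe, xe_exec_queue_is_multi_queue_secondary(q) &&
			 args->property !=
			 DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_QUEUE_PRIORITY)) {
		xe_exec_queue_put(q);
		return -EINVAL;
	}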
> +
> q = xe_exec_queue_lookup(xef, args->exec_queue_id);
> if (XE_IOCTL_DBG(xe, !q))
> return -ENOENT;
> diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h b/drivers/gpu/drm/xe/xe_exec_queue_types.h
> index 964a0e6654c7..dcb55b069ed8 100644
> --- a/drivers/gpu/drm/xe/xe_exec_queue_types.h
> +++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h
> @@ -241,6 +241,9 @@ struct xe_exec_queue_ops {
> int (*set_timeslice)(struct xe_exec_queue *q, u32 timeslice_us);
> /** @set_preempt_timeout: Set preemption timeout for exec queue */
> int (*set_preempt_timeout)(struct xe_exec_queue *q, u32 preempt_timeout_us);
> + /** @set_multi_queue_priority: Set multi queue priority */
> + int (*set_multi_queue_priority)(struct xe_exec_queue *q,
> + enum xe_multi_queue_priority priority);
> /**
> * @suspend: Suspend exec queue from executing, allowed to be called
> * multiple times in a row before resume with the caveat that
> diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
> index 5ec144c1c2dc..426b64ef8d99 100644
> --- a/drivers/gpu/drm/xe/xe_guc_submit.c
> +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
> @@ -1761,10 +1761,32 @@ static void __guc_exec_queue_process_msg_resume(struct xe_sched_msg *msg)
> }
> }
>
> -#define CLEANUP 1 /* Non-zero values to catch uninitialized msg */
> -#define SET_SCHED_PROPS 2
> -#define SUSPEND 3
> -#define RESUME 4
> +static void __guc_exec_queue_process_msg_set_multi_queue_priority(struct xe_sched_msg *msg)
> +{
> + struct xe_exec_queue *q = msg->private_data;
> +
> + if (guc_exec_queue_allowed_to_change_state(q)) {
> +#define MAX_MULTI_QUEUE_REG_SIZE (2)
> + struct xe_guc *guc = exec_queue_to_guc(q);
> + struct xe_exec_queue_group *group = q->multi_queue.group;
> + u32 action[MAX_MULTI_QUEUE_REG_SIZE];
> + int len = 0;
> +
> + action[len++] = XE_GUC_ACTION_MULTI_QUEUE_CONTEXT_CGP_SYNC;
> + action[len++] = group->primary->guc->id;
> +#undef MAX_MULTI_QUEUE_REG_SIZE
> +
> + xe_guc_exec_queue_group_cgp_sync(guc, q, action, len);
Just to confirm - even though multi-queue priority isn't part of the
CGP but rather the LRC, a CGP_SYNC is still needed?
> + }
> +
> + kfree(msg);
> +}
> +
> +#define CLEANUP 1 /* Non-zero values to catch uninitialized msg */
> +#define SET_SCHED_PROPS 2
> +#define SUSPEND 3
> +#define RESUME 4
> +#define SET_MULTI_QUEUE_PRIORITY 5
> #define OPCODE_MASK 0xf
> #define MSG_LOCKED BIT(8)
> #define MSG_HEAD BIT(9)
> @@ -1788,6 +1810,9 @@ static void guc_exec_queue_process_msg(struct xe_sched_msg *msg)
> case RESUME:
> __guc_exec_queue_process_msg_resume(msg);
> break;
> + case SET_MULTI_QUEUE_PRIORITY:
> + __guc_exec_queue_process_msg_set_multi_queue_priority(msg);
> + break;
> default:
> XE_WARN_ON("Unknown message type");
> }
> @@ -2004,6 +2029,28 @@ static int guc_exec_queue_set_preempt_timeout(struct xe_exec_queue *q,
> return 0;
> }
>
> +static int guc_exec_queue_set_multi_queue_priority(struct xe_exec_queue *q,
> + enum xe_multi_queue_priority priority)
> +{
> + struct xe_sched_msg *msg;
> +
> + if (!xe_exec_queue_is_multi_queue(q))
> + return -EINVAL;
Minor nit: I'd move this check to the caller, as
xe_exec_queue_is_multi_queue is available there, and then assert
xe_exec_queue_is_multi_queue(q) here.
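i.e., sketch:

	/* in exec_queue_set_multi_queue_priority() */
	if (XE_IOCTL_DBG(xe, !xe_exec_queue_is_multi_queue(q)))
		return -EINVAL;

	/* in guc_exec_queue_set_multi_queue_priority() */
	xe_assert(guc_to_xe(exec_queue_to_guc(q)),
		  xe_exec_queue_is_multi_queue(q));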
Matt
> +
> + if (q->multi_queue.priority == priority ||
> + exec_queue_killed_or_banned_or_wedged(q))
> + return 0;
> +
> + msg = kmalloc(sizeof(*msg), GFP_KERNEL);
> + if (!msg)
> + return -ENOMEM;
> +
> + q->multi_queue.priority = priority;
> + guc_exec_queue_add_msg(q, msg, SET_MULTI_QUEUE_PRIORITY);
> +
> + return 0;
> +}
> +
> static int guc_exec_queue_suspend(struct xe_exec_queue *q)
> {
> struct xe_gpu_scheduler *sched = &q->guc->sched;
> @@ -2095,6 +2142,7 @@ static const struct xe_exec_queue_ops guc_exec_queue_ops = {
> .set_priority = guc_exec_queue_set_priority,
> .set_timeslice = guc_exec_queue_set_timeslice,
> .set_preempt_timeout = guc_exec_queue_set_preempt_timeout,
> + .set_multi_queue_priority = guc_exec_queue_set_multi_queue_priority,
> .suspend = guc_exec_queue_suspend,
> .suspend_wait = guc_exec_queue_suspend_wait,
> .resume = guc_exec_queue_resume,
> --
> 2.43.0
>
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH 07/16] drm/xe/multi_queue: Add support for multi queue dynamic priority change
2025-10-31 18:29 ` [PATCH 07/16] drm/xe/multi_queue: Add support for multi queue dynamic priority change Niranjana Vishwanathapura
2025-11-01 23:23 ` Matthew Brost
@ 2025-11-01 23:41 ` Matthew Brost
2025-11-03 18:14 ` Niranjana Vishwanathapura
1 sibling, 1 reply; 61+ messages in thread
From: Matthew Brost @ 2025-11-01 23:41 UTC (permalink / raw)
To: Niranjana Vishwanathapura; +Cc: intel-xe
On Fri, Oct 31, 2025 at 11:29:27AM -0700, Niranjana Vishwanathapura wrote:
> Support dynamic priority change for multi queue group queues via
> exec queue set_property ioctl. Issue CGP_SYNC command to GuC through
> the drm scheduler message interface for priority to take effect.
>
> Signed-off-by: Pallavi Mishra <pallavi.mishra@intel.com>
> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
> ---
> drivers/gpu/drm/xe/xe_exec_queue.c | 12 ++++-
> drivers/gpu/drm/xe/xe_exec_queue_types.h | 3 ++
> drivers/gpu/drm/xe/xe_guc_submit.c | 56 ++++++++++++++++++++++--
> 3 files changed, 65 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
> index 0264cab00fd4..98f8f1c7f13b 100644
> --- a/drivers/gpu/drm/xe/xe_exec_queue.c
> +++ b/drivers/gpu/drm/xe/xe_exec_queue.c
> @@ -729,9 +729,13 @@ static int exec_queue_set_multi_queue_priority(struct xe_device *xe, struct xe_e
> if (XE_IOCTL_DBG(xe, value > XE_MULTI_QUEUE_PRIORITY_HIGH))
> return -EINVAL;
>
> - q->multi_queue.priority = value;
> + /* For queue creation time (!q->xef) setting, just store the priority value */
> + if (!q->xef) {
> + q->multi_queue.priority = value;
> + return 0;
> + }
I also don't love this check here, as this code breaks if the exec
queue creation order changes. I'm pretty sure you can just delete it
and send the message on to the backend, given the
guc_exec_queue_allowed_to_change_state check will turn the backend op
into a NOP.
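i.e., untested, and assuming the backend is fine being called before
the queue is registered:

static int exec_queue_set_multi_queue_priority(struct xe_device *xe,
					       struct xe_exec_queue *q,
					       u64 value)
{
	if (XE_IOCTL_DBG(xe, value > XE_MULTI_QUEUE_PRIORITY_HIGH))
		return -EINVAL;

	return q->ops->set_multi_queue_priority(q, value);
}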
Matt
>
> - return 0;
> + return q->ops->set_multi_queue_priority(q, value);
> }
>
> typedef int (*xe_exec_queue_set_property_fn)(struct xe_device *xe,
> @@ -760,6 +764,10 @@ int xe_exec_queue_set_property_ioctl(struct drm_device *dev, void *data,
> if (XE_IOCTL_DBG(xe, args->reserved[0] || args->reserved[1]))
> return -EINVAL;
>
> + if (XE_IOCTL_DBG(xe, args->property !=
> + DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_QUEUE_PRIORITY))
> + return -EINVAL;
> +
> q = xe_exec_queue_lookup(xef, args->exec_queue_id);
> if (XE_IOCTL_DBG(xe, !q))
> return -ENOENT;
> diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h b/drivers/gpu/drm/xe/xe_exec_queue_types.h
> index 964a0e6654c7..dcb55b069ed8 100644
> --- a/drivers/gpu/drm/xe/xe_exec_queue_types.h
> +++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h
> @@ -241,6 +241,9 @@ struct xe_exec_queue_ops {
> int (*set_timeslice)(struct xe_exec_queue *q, u32 timeslice_us);
> /** @set_preempt_timeout: Set preemption timeout for exec queue */
> int (*set_preempt_timeout)(struct xe_exec_queue *q, u32 preempt_timeout_us);
> + /** @set_multi_queue_priority: Set multi queue priority */
> + int (*set_multi_queue_priority)(struct xe_exec_queue *q,
> + enum xe_multi_queue_priority priority);
> /**
> * @suspend: Suspend exec queue from executing, allowed to be called
> * multiple times in a row before resume with the caveat that
> diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
> index 5ec144c1c2dc..426b64ef8d99 100644
> --- a/drivers/gpu/drm/xe/xe_guc_submit.c
> +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
> @@ -1761,10 +1761,32 @@ static void __guc_exec_queue_process_msg_resume(struct xe_sched_msg *msg)
> }
> }
>
> -#define CLEANUP 1 /* Non-zero values to catch uninitialized msg */
> -#define SET_SCHED_PROPS 2
> -#define SUSPEND 3
> -#define RESUME 4
> +static void __guc_exec_queue_process_msg_set_multi_queue_priority(struct xe_sched_msg *msg)
> +{
> + struct xe_exec_queue *q = msg->private_data;
> +
> + if (guc_exec_queue_allowed_to_change_state(q)) {
> +#define MAX_MULTI_QUEUE_REG_SIZE (2)
> + struct xe_guc *guc = exec_queue_to_guc(q);
> + struct xe_exec_queue_group *group = q->multi_queue.group;
> + u32 action[MAX_MULTI_QUEUE_REG_SIZE];
> + int len = 0;
> +
> + action[len++] = XE_GUC_ACTION_MULTI_QUEUE_CONTEXT_CGP_SYNC;
> + action[len++] = group->primary->guc->id;
> +#undef MAX_MULTI_QUEUE_REG_SIZE
> +
> + xe_guc_exec_queue_group_cgp_sync(guc, q, action, len);
> + }
> +
> + kfree(msg);
> +}
> +
> +#define CLEANUP 1 /* Non-zero values to catch uninitialized msg */
> +#define SET_SCHED_PROPS 2
> +#define SUSPEND 3
> +#define RESUME 4
> +#define SET_MULTI_QUEUE_PRIORITY 5
> #define OPCODE_MASK 0xf
> #define MSG_LOCKED BIT(8)
> #define MSG_HEAD BIT(9)
> @@ -1788,6 +1810,9 @@ static void guc_exec_queue_process_msg(struct xe_sched_msg *msg)
> case RESUME:
> __guc_exec_queue_process_msg_resume(msg);
> break;
> + case SET_MULTI_QUEUE_PRIORITY:
> + __guc_exec_queue_process_msg_set_multi_queue_priority(msg);
> + break;
> default:
> XE_WARN_ON("Unknown message type");
> }
> @@ -2004,6 +2029,28 @@ static int guc_exec_queue_set_preempt_timeout(struct xe_exec_queue *q,
> return 0;
> }
>
> +static int guc_exec_queue_set_multi_queue_priority(struct xe_exec_queue *q,
> + enum xe_multi_queue_priority priority)
> +{
> + struct xe_sched_msg *msg;
> +
> + if (!xe_exec_queue_is_multi_queue(q))
> + return -EINVAL;
> +
> + if (q->multi_queue.priority == priority ||
> + exec_queue_killed_or_banned_or_wedged(q))
> + return 0;
> +
> + msg = kmalloc(sizeof(*msg), GFP_KERNEL);
> + if (!msg)
> + return -ENOMEM;
> +
> + q->multi_queue.priority = priority;
> + guc_exec_queue_add_msg(q, msg, SET_MULTI_QUEUE_PRIORITY);
> +
> + return 0;
> +}
> +
> static int guc_exec_queue_suspend(struct xe_exec_queue *q)
> {
> struct xe_gpu_scheduler *sched = &q->guc->sched;
> @@ -2095,6 +2142,7 @@ static const struct xe_exec_queue_ops guc_exec_queue_ops = {
> .set_priority = guc_exec_queue_set_priority,
> .set_timeslice = guc_exec_queue_set_timeslice,
> .set_preempt_timeout = guc_exec_queue_set_preempt_timeout,
> + .set_multi_queue_priority = guc_exec_queue_set_multi_queue_priority,
> .suspend = guc_exec_queue_suspend,
> .suspend_wait = guc_exec_queue_suspend_wait,
> .resume = guc_exec_queue_resume,
> --
> 2.43.0
>
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH 04/16] drm/xe/multi_queue: Add multi queue priority property
2025-10-31 18:29 ` [PATCH 04/16] drm/xe/multi_queue: Add multi queue priority property Niranjana Vishwanathapura
@ 2025-11-01 23:59 ` Matthew Brost
2025-11-03 4:45 ` Niranjana Vishwanathapura
0 siblings, 1 reply; 61+ messages in thread
From: Matthew Brost @ 2025-11-01 23:59 UTC (permalink / raw)
To: Niranjana Vishwanathapura; +Cc: intel-xe
On Fri, Oct 31, 2025 at 11:29:24AM -0700, Niranjana Vishwanathapura wrote:
> Add support for queues of a multi queue group to set
> their priority within the queue group by adding property
> DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_QUEUE_PRIORITY.
> This is the only other property supported by secondary
> queues of a multi queue group, other than
> DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_GROUP.
>
> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
> ---
> drivers/gpu/drm/xe/xe_exec_queue.c | 17 ++++++++++++-
> drivers/gpu/drm/xe/xe_exec_queue_types.h | 8 ++++++
> drivers/gpu/drm/xe/xe_guc_submit.c | 1 +
> drivers/gpu/drm/xe/xe_lrc.c | 32 ++++++++++++++++++++++++
> drivers/gpu/drm/xe/xe_lrc.h | 5 ++++
> include/uapi/drm/xe_drm.h | 3 +++
> 6 files changed, 65 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
> index 86404a7c9fe4..0da256428916 100644
> --- a/drivers/gpu/drm/xe/xe_exec_queue.c
> +++ b/drivers/gpu/drm/xe/xe_exec_queue.c
> @@ -177,6 +177,7 @@ static struct xe_exec_queue *__xe_exec_queue_alloc(struct xe_device *xe,
> INIT_LIST_HEAD(&q->multi_gt_link);
> INIT_LIST_HEAD(&q->hw_engine_group_link);
> INIT_LIST_HEAD(&q->pxp.link);
> + q->multi_queue.priority = XE_MULTI_QUEUE_PRIORITY_NORMAL;
>
> q->sched_props.timeslice_us = hwe->eclass->sched_props.timeslice_us;
> q->sched_props.preempt_timeout_us =
> @@ -722,6 +723,17 @@ static int exec_queue_set_multi_group(struct xe_device *xe, struct xe_exec_queue
> return xe_exec_queue_group_validate(xe, q, value);
> }
>
> +static int exec_queue_set_multi_queue_priority(struct xe_device *xe, struct xe_exec_queue *q,
> + u64 value)
> +{
> + if (XE_IOCTL_DBG(xe, value > XE_MULTI_QUEUE_PRIORITY_HIGH))
> + return -EINVAL;
> +
> + q->multi_queue.priority = value;
> +
> + return 0;
> +}
> +
> typedef int (*xe_exec_queue_set_property_fn)(struct xe_device *xe,
> struct xe_exec_queue *q,
> u64 value);
> @@ -731,6 +743,8 @@ static const xe_exec_queue_set_property_fn exec_queue_set_property_funcs[] = {
> [DRM_XE_EXEC_QUEUE_SET_PROPERTY_TIMESLICE] = exec_queue_set_timeslice,
> [DRM_XE_EXEC_QUEUE_SET_PROPERTY_PXP_TYPE] = exec_queue_set_pxp_type,
> [DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_GROUP] = exec_queue_set_multi_group,
> + [DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_QUEUE_PRIORITY] =
> + exec_queue_set_multi_queue_priority,
> };
>
> static int exec_queue_user_ext_set_property(struct xe_device *xe,
> @@ -752,7 +766,8 @@ static int exec_queue_user_ext_set_property(struct xe_device *xe,
> XE_IOCTL_DBG(xe, ext.property != DRM_XE_EXEC_QUEUE_SET_PROPERTY_PRIORITY &&
> ext.property != DRM_XE_EXEC_QUEUE_SET_PROPERTY_TIMESLICE &&
> ext.property != DRM_XE_EXEC_QUEUE_SET_PROPERTY_PXP_TYPE &&
> - ext.property != DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_GROUP))
> + ext.property != DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_GROUP &&
> + ext.property != DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_QUEUE_PRIORITY))
> return -EINVAL;
>
> idx = array_index_nospec(ext.property, ARRAY_SIZE(exec_queue_set_property_funcs));
> diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h b/drivers/gpu/drm/xe/xe_exec_queue_types.h
> index 38e47b003259..964a0e6654c7 100644
> --- a/drivers/gpu/drm/xe/xe_exec_queue_types.h
> +++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h
> @@ -31,6 +31,12 @@ enum xe_exec_queue_priority {
> XE_EXEC_QUEUE_PRIORITY_COUNT
> };
>
> +enum xe_multi_queue_priority {
> + XE_MULTI_QUEUE_PRIORITY_LOW = 0,
> + XE_MULTI_QUEUE_PRIORITY_NORMAL,
> + XE_MULTI_QUEUE_PRIORITY_HIGH,
> +};
Kernel doc.
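e.g. (sketch):

	/**
	 * enum xe_multi_queue_priority - Priority of a queue within a
	 * multi queue group
	 * @XE_MULTI_QUEUE_PRIORITY_LOW: Low priority
	 * @XE_MULTI_QUEUE_PRIORITY_NORMAL: Normal priority (default)
	 * @XE_MULTI_QUEUE_PRIORITY_HIGH: High priority
	 */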
> +
> /**
> * struct xe_exec_queue_group - Execution multi queue group
> *
> @@ -134,6 +140,8 @@ struct xe_exec_queue {
> struct {
> /** @multi_queue.group: Queue group information */
> struct xe_exec_queue_group *group;
> + /** @multi_queue.priority: Queue priority within the multi-queue group */
> + enum xe_multi_queue_priority priority;
> /** @multi_queue.pos: Position of queue within the multi-queue group */
> u8 pos;
> /** @multi_queue.valid: Queue belongs to a multi queue group */
> diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
> index d2aa9a2524e7..5ec144c1c2dc 100644
> --- a/drivers/gpu/drm/xe/xe_guc_submit.c
> +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
> @@ -634,6 +634,7 @@ static void xe_guc_exec_queue_group_cgp_sync(struct xe_guc *guc,
> return;
> }
>
> + xe_lrc_set_multi_queue_priority(q->lrc[0], q->multi_queue.priority);
> xe_guc_exec_queue_group_cgp_update(xe, q);
>
> WRITE_ONCE(group->sync_pending, true);
> diff --git a/drivers/gpu/drm/xe/xe_lrc.c b/drivers/gpu/drm/xe/xe_lrc.c
> index b5083c99dd50..45fc5bc5de5c 100644
> --- a/drivers/gpu/drm/xe/xe_lrc.c
> +++ b/drivers/gpu/drm/xe/xe_lrc.c
> @@ -44,6 +44,11 @@
> #define LRC_INDIRECT_CTX_BO_SIZE SZ_4K
> #define LRC_INDIRECT_RING_STATE_SIZE SZ_4K
>
> +#define LRC_PRIORITY GENMASK_ULL(10, 9)
> +#define LRC_PRIORITY_LOW 0
> +#define LRC_PRIORITY_NORMAL 1
> +#define LRC_PRIORITY_HIGH 2
> +
> /*
> * Layout of the LRC and associated data allocated as
> * lrc->bo:
> @@ -1386,6 +1391,33 @@ setup_indirect_ctx(struct xe_lrc *lrc, struct xe_hw_engine *hwe)
> return 0;
> }
>
> +static u8 xe_multi_queue_prio_to_lrc(struct xe_lrc *lrc, enum xe_multi_queue_priority priority)
> +{
> + struct xe_device *xe = gt_to_xe(lrc->gt);
> +
> + /* xe_multi_queue_priority is directly mapped to LRC priority values */
> + if (priority >= XE_MULTI_QUEUE_PRIORITY_LOW &&
> + priority <= XE_MULTI_QUEUE_PRIORITY_HIGH)
> + return priority;
You sanitize at the IOCTL layer, so an assert here would be preferred.
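e.g., something like this (untested sketch):

	xe_gt_assert(lrc->gt, priority <= XE_MULTI_QUEUE_PRIORITY_HIGH);
	return priority;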
> +
> + /* Fallback to NORMAL if out of range */
> + drm_warn(&xe->drm, "Unknown multi queue priority: %d, defaulting to NORMAL\n", priority);
> + return LRC_PRIORITY_NORMAL;
> +}
> +
> +/**
> + * xe_lrc_set_multi_queue_priority() - Set multi queue priority in LRC
> + * @lrc: Logical Ring Context
> + * @priority: Multi queue priority of the exec queue
> + *
> + * Convert @priority to LRC multi queue priority and update the @lrc descriptor
> + */
> +void xe_lrc_set_multi_queue_priority(struct xe_lrc *lrc, enum xe_multi_queue_priority priority)
> +{
> + lrc->desc &= ~LRC_PRIORITY;
> + lrc->desc |= FIELD_PREP(LRC_PRIORITY, xe_multi_queue_prio_to_lrc(lrc, priority));
> +}
> +
> static int xe_lrc_init(struct xe_lrc *lrc, struct xe_hw_engine *hwe,
> struct xe_vm *vm, u32 ring_size, u16 msix_vec,
> u32 init_flags)
> diff --git a/drivers/gpu/drm/xe/xe_lrc.h b/drivers/gpu/drm/xe/xe_lrc.h
> index 2fb628da5c43..3e6b356e0d1c 100644
> --- a/drivers/gpu/drm/xe/xe_lrc.h
> +++ b/drivers/gpu/drm/xe/xe_lrc.h
> @@ -8,11 +8,14 @@
> #include <linux/types.h>
>
> #include "xe_lrc_types.h"
> +#include "xe_exec_queue_types.h"
>
> struct drm_printer;
> struct xe_bb;
> struct xe_device;
> struct xe_exec_queue;
> +enum xe_exec_queue_priority;
Never needed in this file.
> +enum xe_multi_queue_priority;
No need to forward declare if xe_exec_queue_types.h is included. If this
compiles without "xe_exec_queue_types.h", please drop that include. If
it doesn't compile, drop this forward declaration.
> enum xe_engine_class;
> struct xe_gt;
> struct xe_hw_engine;
> @@ -133,6 +136,8 @@ void xe_lrc_dump_default(struct drm_printer *p,
>
> u32 *xe_lrc_emit_hwe_state_instructions(struct xe_exec_queue *q, u32 *cs);
>
> +void xe_lrc_set_multi_queue_priority(struct xe_lrc *lrc, enum xe_multi_queue_priority priority);
> +
> struct xe_lrc_snapshot *xe_lrc_snapshot_capture(struct xe_lrc *lrc);
> void xe_lrc_snapshot_capture_delayed(struct xe_lrc_snapshot *snapshot);
> void xe_lrc_snapshot_print(struct xe_lrc_snapshot *snapshot, struct drm_printer *p);
> diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
> index d903b3a55ec1..8ab44413646a 100644
> --- a/include/uapi/drm/xe_drm.h
> +++ b/include/uapi/drm/xe_drm.h
> @@ -1258,6 +1258,8 @@ struct drm_xe_vm_bind {
> * then a new multi-queue group is created with this queue as the primary queue
> * (Q0). Otherwise, the queue gets added to the multi-queue group whose primary
> * queue id is specified in the 'value' field.
> + * - %DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_QUEUE_PRIORITY - Set the queue
> + * priority within the multi-queue group.
Should the valid values be in the uAPI as defines? At a minimum, the
valid values should be mentioned in the kernel doc.
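e.g., something along these lines in xe_drm.h (sketch, names hypothetical):

	/* Valid values for DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_QUEUE_PRIORITY */
	#define DRM_XE_MULTI_QUEUE_PRIORITY_LOW		0
	#define DRM_XE_MULTI_QUEUE_PRIORITY_NORMAL	1
	#define DRM_XE_MULTI_QUEUE_PRIORITY_HIGH	2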
Matt
> *
> * The example below shows how to use @drm_xe_exec_queue_create to create
> * a simple exec_queue (no parallel submission) of class
> @@ -1300,6 +1302,7 @@ struct drm_xe_exec_queue_create {
> #define DRM_XE_EXEC_QUEUE_SET_PROPERTY_PXP_TYPE 2
> #define DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_GROUP 3
> #define DRM_XE_MULTI_GROUP_CREATE (1ull << 63)
> +#define DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_QUEUE_PRIORITY 4
> /** @extensions: Pointer to the first extension struct, if any */
> __u64 extensions;
>
> --
> 2.43.0
>
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH 01/16] drm/xe/multi_queue: Add multi_queue_enable_mask to gt information
2025-10-31 18:29 ` [PATCH 01/16] drm/xe/multi_queue: Add multi_queue_enable_mask to gt information Niranjana Vishwanathapura
@ 2025-11-02 0:01 ` Matthew Brost
2025-11-03 1:25 ` Niranjana Vishwanathapura
0 siblings, 1 reply; 61+ messages in thread
From: Matthew Brost @ 2025-11-02 0:01 UTC (permalink / raw)
To: Niranjana Vishwanathapura; +Cc: intel-xe
On Fri, Oct 31, 2025 at 11:29:21AM -0700, Niranjana Vishwanathapura wrote:
> Add the multi_queue_enable_mask field to the gt information structure,
> which is a bitmask of all engine classes with multi queue support
> enabled.
>
> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
> ---
> drivers/gpu/drm/xe/xe_debugfs.c | 2 ++
> drivers/gpu/drm/xe/xe_gt_types.h | 5 +++++
> drivers/gpu/drm/xe/xe_pci.c | 1 +
> drivers/gpu/drm/xe/xe_pci_types.h | 1 +
> 4 files changed, 9 insertions(+)
>
> diff --git a/drivers/gpu/drm/xe/xe_debugfs.c b/drivers/gpu/drm/xe/xe_debugfs.c
> index e91da9589c5f..34460d7ef71c 100644
> --- a/drivers/gpu/drm/xe/xe_debugfs.c
> +++ b/drivers/gpu/drm/xe/xe_debugfs.c
> @@ -93,6 +93,8 @@ static int info(struct seq_file *m, void *data)
> xe_force_wake_ref(gt_to_fw(gt), XE_FW_GT));
> drm_printf(&p, "gt%d engine_mask 0x%llx\n", id,
> gt->info.engine_mask);
> + drm_printf(&p, "gt%d multi_queue_enable_mask 0x%x\n", id,
> + gt->info.multi_queue_enable_mask);
> }
>
> xe_pm_runtime_put(xe);
> diff --git a/drivers/gpu/drm/xe/xe_gt_types.h b/drivers/gpu/drm/xe/xe_gt_types.h
> index 0b525643a048..4a18bf772b22 100644
> --- a/drivers/gpu/drm/xe/xe_gt_types.h
> +++ b/drivers/gpu/drm/xe/xe_gt_types.h
> @@ -140,6 +140,11 @@ struct xe_gt {
> u64 engine_mask;
> /** @info.gmdid: raw GMD_ID value from hardware */
> u32 gmdid;
> + /**
> + * @multi_queue_enable_mask: Bitmask of engine classes with
> + * multi queue support enabled.
> + */
> + u16 multi_queue_enable_mask;
s/multi_queue_enable_mask/multi_queue_class_enable_mask ?
Matt
> /** @info.id: Unique ID of this GT within the PCI Device */
> u8 id;
> /** @info.has_indirect_ring_state: GT has indirect ring state support */
> diff --git a/drivers/gpu/drm/xe/xe_pci.c b/drivers/gpu/drm/xe/xe_pci.c
> index 6e59642e7820..b5eaf0fc105c 100644
> --- a/drivers/gpu/drm/xe/xe_pci.c
> +++ b/drivers/gpu/drm/xe/xe_pci.c
> @@ -754,6 +754,7 @@ static struct xe_gt *alloc_primary_gt(struct xe_tile *tile,
> gt->info.type = XE_GT_TYPE_MAIN;
> gt->info.id = tile->id * xe->info.max_gt_per_tile;
> gt->info.has_indirect_ring_state = graphics_desc->has_indirect_ring_state;
> + gt->info.multi_queue_enable_mask = graphics_desc->multi_queue_enable_mask;
> gt->info.engine_mask = graphics_desc->hw_engine_mask;
>
> /*
> diff --git a/drivers/gpu/drm/xe/xe_pci_types.h b/drivers/gpu/drm/xe/xe_pci_types.h
> index 9892c063a9c5..77e09a53da64 100644
> --- a/drivers/gpu/drm/xe/xe_pci_types.h
> +++ b/drivers/gpu/drm/xe/xe_pci_types.h
> @@ -58,6 +58,7 @@ struct xe_device_desc {
>
> struct xe_graphics_desc {
> u64 hw_engine_mask; /* hardware engines provided by graphics IP */
> + u16 multi_queue_enable_mask; /* bitmask of engine classes which support multi queue */
>
> u8 has_asid:1;
> u8 has_atomic_enable_pte_bit:1;
> --
> 2.43.0
>
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH 16/16] drm/xe/multi_queue: Enable multi_queue on xe3p_xpc
2025-10-31 18:29 ` [PATCH 16/16] drm/xe/multi_queue: Enable multi_queue on xe3p_xpc Niranjana Vishwanathapura
@ 2025-11-02 0:05 ` Matthew Brost
0 siblings, 0 replies; 61+ messages in thread
From: Matthew Brost @ 2025-11-02 0:05 UTC (permalink / raw)
To: Niranjana Vishwanathapura; +Cc: intel-xe
On Fri, Oct 31, 2025 at 11:29:36AM -0700, Niranjana Vishwanathapura wrote:
> xe3p_xpc supports multi_queue, enable it.
>
> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
> ---
> drivers/gpu/drm/xe/xe_pci.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/gpu/drm/xe/xe_pci.c b/drivers/gpu/drm/xe/xe_pci.c
> index b5eaf0fc105c..43f1cc47b9b3 100644
> --- a/drivers/gpu/drm/xe/xe_pci.c
> +++ b/drivers/gpu/drm/xe/xe_pci.c
> @@ -111,6 +111,7 @@ static const struct xe_graphics_desc graphics_xe3p_xpc = {
> .hw_engine_mask =
> GENMASK(XE_HW_ENGINE_BCS8, XE_HW_ENGINE_BCS1) |
> GENMASK(XE_HW_ENGINE_CCS3, XE_HW_ENGINE_CCS0),
> + .multi_queue_enable_mask = BIT(XE_ENGINE_CLASS_COPY) | BIT(XE_ENGINE_CLASS_COMPUTE),
> };
>
> static const struct xe_media_desc media_xem = {
> --
> 2.43.0
>
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH 02/16] drm/xe/multi_queue: Add user interface for multi queue support
2025-10-31 18:29 ` [PATCH 02/16] drm/xe/multi_queue: Add user interface for multi queue support Niranjana Vishwanathapura
2025-10-31 19:31 ` Matthew Brost
@ 2025-11-02 0:23 ` Matthew Brost
2025-11-03 22:59 ` Niranjana Vishwanathapura
2025-11-02 17:37 ` Matthew Brost
2 siblings, 1 reply; 61+ messages in thread
From: Matthew Brost @ 2025-11-02 0:23 UTC (permalink / raw)
To: Niranjana Vishwanathapura; +Cc: intel-xe
On Fri, Oct 31, 2025 at 11:29:22AM -0700, Niranjana Vishwanathapura wrote:
> Multi Queue is a new mode of execution supported by the compute and
> blitter copy command streamers (CCS and BCS, respectively). It is an
> enhancement of the existing hardware architecture and leverages the
> same submission model. It enables support for efficient, parallel
> execution of multiple queues within a single context. All the queues
> of a group must use the same address space (VM).
>
> The new DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_GROUP execution queue
> property supports creating a multi queue group and adding queues to
> a queue group. All queues of a multi queue group share the same
> context.
>
> An exec queue create ioctl call with the above property specified with
> value DRM_XE_MULTI_GROUP_CREATE will create a new multi queue group
> with the queue being created as the primary queue (aka q0) of the
> group. To add secondary queues to the group, they need to be created
> with the above property with the id of the primary queue as the value.
> The properties of the primary queue (like priority, timeslice) apply
> to the whole group, so these properties can't be set for secondary
> queues of a group.
>
> Once destroyed, the secondary queues of a multi queue group can't be
> replaced. However, they can be dynamically added to the group up to a
> total of 64 queues per group. Once the primary queue is destroyed,
> secondary queues can't be added to the queue group.
>
> Signed-off-by: Stuart Summers <stuart.summers@intel.com>
> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
> ---
> drivers/gpu/drm/xe/xe_exec_queue.c | 191 ++++++++++++++++++++++-
> drivers/gpu/drm/xe/xe_exec_queue.h | 47 ++++++
> drivers/gpu/drm/xe/xe_exec_queue_types.h | 30 ++++
> include/uapi/drm/xe_drm.h | 8 +
> 4 files changed, 274 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
> index 1b57d7c2cc94..86404a7c9fe4 100644
> --- a/drivers/gpu/drm/xe/xe_exec_queue.c
> +++ b/drivers/gpu/drm/xe/xe_exec_queue.c
> @@ -12,6 +12,7 @@
> #include <drm/drm_file.h>
> #include <uapi/drm/xe_drm.h>
>
> +#include "xe_bo.h"
> #include "xe_dep_scheduler.h"
> #include "xe_device.h"
> #include "xe_gt.h"
> @@ -62,6 +63,32 @@ enum xe_exec_queue_sched_prop {
> static int exec_queue_user_extensions(struct xe_device *xe, struct xe_exec_queue *q,
> u64 extensions, int ext_number);
>
> +static void xe_exec_queue_group_cleanup(struct xe_exec_queue *q)
> +{
> + struct xe_exec_queue_group *group = q->multi_queue.group;
> + struct xe_lrc *lrc;
> + unsigned long idx;
> +
> + if (xe_exec_queue_is_multi_queue_secondary(q)) {
> + xe_exec_queue_put(xe_exec_queue_multi_queue_primary(q));
> + return;
> + }
> +
> + if (!group)
> + return;
> +
> + /* Primary queue cleanup */
> + mutex_lock(&group->lock);
> + xa_for_each(&group->xa, idx, lrc)
> + xe_lrc_put(lrc);
> + mutex_unlock(&group->lock);
> +
> + xa_destroy(&group->xa);
> + mutex_destroy(&group->lock);
> + xe_bo_unpin_map_no_vm(group->cgp_bo);
> + kfree(group);
> +}
> +
> static void __xe_exec_queue_free(struct xe_exec_queue *q)
> {
> int i;
> @@ -72,6 +99,10 @@ static void __xe_exec_queue_free(struct xe_exec_queue *q)
>
> if (xe_exec_queue_uses_pxp(q))
> xe_pxp_exec_queue_remove(gt_to_xe(q->gt)->pxp, q);
> +
> + if (xe_exec_queue_is_multi_queue(q))
> + xe_exec_queue_group_cleanup(q);
> +
> if (q->vm)
> xe_vm_put(q->vm);
>
> @@ -549,6 +580,148 @@ exec_queue_set_pxp_type(struct xe_device *xe, struct xe_exec_queue *q, u64 value
> return xe_pxp_exec_queue_set_type(xe->pxp, q, DRM_XE_PXP_TYPE_HWDRM);
> }
>
> +static int xe_exec_queue_group_init(struct xe_device *xe, struct xe_exec_queue *q)
> +{
> + struct xe_tile *tile = gt_to_tile(q->gt);
> + struct xe_exec_queue_group *group;
> + struct xe_bo *bo;
> +
> + group = kzalloc(sizeof(*group), GFP_KERNEL);
> + if (!group)
> + return -ENOMEM;
> +
> + bo = xe_bo_create_pin_map_novm(xe, tile, SZ_4K, ttm_bo_type_kernel,
> + XE_BO_FLAG_VRAM_IF_DGFX(tile) |
> + XE_BO_FLAG_GGTT, false);
> + if (IS_ERR(bo)) {
> + drm_err(&xe->drm, "CGP bo allocation for queue group failed: %ld\n",
> + PTR_ERR(bo));
> + kfree(group);
> + return PTR_ERR(bo);
> + }
> +
> + xe_map_memset(xe, &bo->vmap, 0, 0, SZ_4K);
> +
> + group->primary = q;
> + group->cgp_bo = bo;
> + xa_init_flags(&group->xa, XA_FLAGS_ALLOC1);
> + mutex_init(&group->lock);
> + mutex_init(&group->list_lock);
> + q->multi_queue.group = group;
> +
> + return 0;
> +}
> +
> +static inline bool xe_exec_queue_supports_multi_queue(struct xe_exec_queue *q)
> +{
> + return q->gt->info.multi_queue_enable_mask & BIT(q->class);
> +}
> +
> +static int xe_exec_queue_group_validate(struct xe_device *xe, struct xe_exec_queue *q,
> + u32 primary_id)
> +{
> + struct xe_exec_queue_group *group;
> + struct xe_exec_queue *primary;
> + int ret;
> +
> + primary = xe_exec_queue_lookup(q->vm->xef, primary_id);
> + if (XE_IOCTL_DBG(xe, !primary))
> + return -ENOENT;
> +
> + if (XE_IOCTL_DBG(xe, !xe_exec_queue_is_multi_queue_primary(primary)) ||
> + XE_IOCTL_DBG(xe, q->vm != primary->vm) ||
> + XE_IOCTL_DBG(xe, q->logical_mask != primary->logical_mask)) {
> + ret = -EINVAL;
> + goto put_primary;
> + }
> +
> + group = primary->multi_queue.group;
> + q->multi_queue.valid = true;
> + q->multi_queue.group = group;
> +
> + return 0;
> +put_primary:
> + xe_exec_queue_put(primary);
> + return ret;
> +}
> +
> +#define XE_MAX_GROUP_SIZE 64
> +static int xe_exec_queue_group_add(struct xe_device *xe, struct xe_exec_queue *q)
> +{
> + struct xe_exec_queue_group *group = q->multi_queue.group;
> + u32 pos;
> + int err;
> +
> + if (!xe_exec_queue_is_multi_queue_secondary(q))
> + return 0;
> +
> + mutex_lock(&group->lock);
> + err = xa_alloc(&group->xa, &pos, xe_lrc_get(q->lrc[0]),
> + XA_LIMIT(1, XE_MAX_GROUP_SIZE - 1), GFP_KERNEL);
Not a complete review, just another few quick comments.
I see the documentation patches mention the ref counting scheme but it
is easy to overlook that. Can we have a quick inline comment indicating
the primary holds a reference to the secondary queues' LRCs?
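e.g., something like (sketch):

	/*
	 * The group holds a reference to each secondary queue's LRC (taken
	 * here via xe_lrc_get()), dropped in xe_exec_queue_group_delete()
	 * or when the group is cleaned up.
	 */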
Matt
> + if (XE_IOCTL_DBG(xe, err)) {
> + xe_lrc_put(q->lrc[0]);
> + mutex_unlock(&group->lock);
> +
> + /* It is invalid if queue group limit is exceeded */
> + if (err == -EBUSY)
> + err = -EINVAL;
> +
> + return err;
> + }
> +
> + q->multi_queue.pos = pos;
> + mutex_unlock(&group->lock);
> +
> + return 0;
> +}
> +
> +static void xe_exec_queue_group_delete(struct xe_exec_queue *q)
> +{
> + struct xe_exec_queue_group *group = q->multi_queue.group;
> + struct xe_lrc *lrc;
> +
> + if (!xe_exec_queue_is_multi_queue_secondary(q))
> + return;
> +
> + mutex_lock(&group->lock);
> + lrc = xa_erase(&group->xa, q->multi_queue.pos);
> + if (lrc)
> + xe_lrc_put(lrc);
> + mutex_unlock(&group->lock);
> +}
> +
> +static int exec_queue_set_multi_group(struct xe_device *xe, struct xe_exec_queue *q,
> + u64 value)
> +{
> + if (XE_IOCTL_DBG(xe, !xe_exec_queue_supports_multi_queue(q)))
> + return -ENODEV;
> +
> + if (XE_IOCTL_DBG(xe, !xe_device_uc_enabled(xe)))
> + return -EOPNOTSUPP;
> +
> + if (XE_IOCTL_DBG(xe, xe_exec_queue_is_parallel(q)))
> + return -EINVAL;
> +
> + if (XE_IOCTL_DBG(xe, xe_exec_queue_is_multi_queue(q)))
> + return -EINVAL;
> +
> + if (value & DRM_XE_MULTI_GROUP_CREATE) {
> + if (XE_IOCTL_DBG(xe, value & ~DRM_XE_MULTI_GROUP_CREATE))
> + return -EINVAL;
> +
> + q->multi_queue.valid = true;
> + q->multi_queue.is_primary = true;
> + q->multi_queue.pos = 0;
> + return 0;
> + }
> +
> + /* While adding secondary queues, the upper 32 bits must be 0 */
> + if (XE_IOCTL_DBG(xe, value & (~0ull << 32)))
> + return -EINVAL;
> +
> + return xe_exec_queue_group_validate(xe, q, value);
> +}
> +
> typedef int (*xe_exec_queue_set_property_fn)(struct xe_device *xe,
> struct xe_exec_queue *q,
> u64 value);
> @@ -557,6 +730,7 @@ static const xe_exec_queue_set_property_fn exec_queue_set_property_funcs[] = {
> [DRM_XE_EXEC_QUEUE_SET_PROPERTY_PRIORITY] = exec_queue_set_priority,
> [DRM_XE_EXEC_QUEUE_SET_PROPERTY_TIMESLICE] = exec_queue_set_timeslice,
> [DRM_XE_EXEC_QUEUE_SET_PROPERTY_PXP_TYPE] = exec_queue_set_pxp_type,
> + [DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_GROUP] = exec_queue_set_multi_group,
> };
>
> static int exec_queue_user_ext_set_property(struct xe_device *xe,
> @@ -577,7 +751,8 @@ static int exec_queue_user_ext_set_property(struct xe_device *xe,
> XE_IOCTL_DBG(xe, ext.pad) ||
> XE_IOCTL_DBG(xe, ext.property != DRM_XE_EXEC_QUEUE_SET_PROPERTY_PRIORITY &&
> ext.property != DRM_XE_EXEC_QUEUE_SET_PROPERTY_TIMESLICE &&
> - ext.property != DRM_XE_EXEC_QUEUE_SET_PROPERTY_PXP_TYPE))
> + ext.property != DRM_XE_EXEC_QUEUE_SET_PROPERTY_PXP_TYPE &&
> + ext.property != DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_GROUP))
> return -EINVAL;
>
> idx = array_index_nospec(ext.property, ARRAY_SIZE(exec_queue_set_property_funcs));
> @@ -626,6 +801,12 @@ static int exec_queue_user_extensions(struct xe_device *xe, struct xe_exec_queue
> return exec_queue_user_extensions(xe, q, ext.next_extension,
> ++ext_number);
>
> + if (xe_exec_queue_is_multi_queue_primary(q)) {
> + err = xe_exec_queue_group_init(xe, q);
> + if (XE_IOCTL_DBG(xe, err))
> + return err;
> + }
> +
> return 0;
> }
>
> @@ -780,12 +961,16 @@ int xe_exec_queue_create_ioctl(struct drm_device *dev, void *data,
> if (IS_ERR(q))
> return PTR_ERR(q);
>
> + err = xe_exec_queue_group_add(xe, q);
> + if (XE_IOCTL_DBG(xe, err))
> + goto put_exec_queue;
> +
> if (xe_vm_in_preempt_fence_mode(vm)) {
> q->lr.context = dma_fence_context_alloc(1);
>
> err = xe_vm_add_compute_exec_queue(vm, q);
> if (XE_IOCTL_DBG(xe, err))
> - goto put_exec_queue;
> + goto delete_queue_group;
> }
>
> if (q->vm && q->hwe->hw_engine_group) {
> @@ -808,6 +993,8 @@ int xe_exec_queue_create_ioctl(struct drm_device *dev, void *data,
>
> kill_exec_queue:
> xe_exec_queue_kill(q);
> +delete_queue_group:
> + xe_exec_queue_group_delete(q);
> put_exec_queue:
> xe_exec_queue_put(q);
> return err;
> diff --git a/drivers/gpu/drm/xe/xe_exec_queue.h b/drivers/gpu/drm/xe/xe_exec_queue.h
> index a4dfbe858bda..8cd6487018fa 100644
> --- a/drivers/gpu/drm/xe/xe_exec_queue.h
> +++ b/drivers/gpu/drm/xe/xe_exec_queue.h
> @@ -62,6 +62,53 @@ static inline bool xe_exec_queue_uses_pxp(struct xe_exec_queue *q)
> return q->pxp.type;
> }
>
> +/**
> + * xe_exec_queue_is_multi_queue() - Whether an exec_queue is part of a queue group.
> + * @q: The exec_queue
> + *
> + * Return: True if the exec_queue is part of a queue group, false otherwise.
> + */
> +static inline bool xe_exec_queue_is_multi_queue(struct xe_exec_queue *q)
> +{
> + return q->multi_queue.valid;
> +}
> +
> +/**
> + * xe_exec_queue_is_multi_queue_primary() - Whether an exec_queue is primary queue
> + * of a multi queue group.
> + * @q: The exec_queue
> + *
> + * Return: True if @q is primary queue of a queue group, false otherwise.
> + */
> +static inline bool xe_exec_queue_is_multi_queue_primary(struct xe_exec_queue *q)
> +{
> + return q->multi_queue.is_primary;
> +}
> +
> +/**
> + * xe_exec_queue_is_multi_queue_secondary() - Whether an exec_queue is secondary queue
> + * of a multi queue group.
> + * @q: The exec_queue
> + *
> + * Return: True if @q is secondary queue of a queue group, false otherwise.
> + */
> +static inline bool xe_exec_queue_is_multi_queue_secondary(struct xe_exec_queue *q)
> +{
> + return xe_exec_queue_is_multi_queue(q) && !q->multi_queue.is_primary;
> +}
> +
> +/**
> + * xe_exec_queue_multi_queue_primary() - Get multi queue group's primary queue
> + * @q: The exec_queue
> + *
> + * If @q belongs to a multi queue group, then the primary queue of the group will
> + * be returned. Otherwise, @q will be returned.
> + */
> +static inline struct xe_exec_queue *xe_exec_queue_multi_queue_primary(struct xe_exec_queue *q)
> +{
> + return xe_exec_queue_is_multi_queue(q) ? q->multi_queue.group->primary : q;
> +}
> +
> bool xe_exec_queue_is_lr(struct xe_exec_queue *q);
>
> bool xe_exec_queue_is_idle(struct xe_exec_queue *q);
> diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h b/drivers/gpu/drm/xe/xe_exec_queue_types.h
> index c8807268ec6c..3856776df5c4 100644
> --- a/drivers/gpu/drm/xe/xe_exec_queue_types.h
> +++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h
> @@ -31,6 +31,24 @@ enum xe_exec_queue_priority {
> XE_EXEC_QUEUE_PRIORITY_COUNT
> };
>
> +/**
> + * struct xe_exec_queue_group - Execution multi queue group
> + *
> + * Contains multi queue group information.
> + */
> +struct xe_exec_queue_group {
> + /** @primary: Primary queue of this group */
> + struct xe_exec_queue *primary;
> + /** @lock: Queue group update lock */
> + struct mutex lock;
> + /** @cgp_bo: BO for the Context Group Page */
> + struct xe_bo *cgp_bo;
> + /** @xa: xarray to store LRCs */
> + struct xarray xa;
> + /** @list_lock: Secondary queue list lock */
> + struct mutex list_lock;
> +};
> +
> /**
> * struct xe_exec_queue - Execution queue
> *
> @@ -110,6 +128,18 @@ struct xe_exec_queue {
> struct xe_guc_exec_queue *guc;
> };
>
> + /** @multi_queue: Multi queue information */
> + struct {
> + /** @multi_queue.group: Queue group information */
> + struct xe_exec_queue_group *group;
> + /** @multi_queue.pos: Position of queue within the multi-queue group */
> + u8 pos;
> + /** @multi_queue.valid: Queue belongs to a multi queue group */
> + u8 valid:1;
> + /** @multi_queue.is_primary: Is primary queue (Q0) of the group */
> + u8 is_primary:1;
> + } multi_queue;
> +
> /** @sched_props: scheduling properties */
> struct {
> /** @sched_props.timeslice_us: timeslice period in micro-seconds */
> diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
> index 47853659a705..d903b3a55ec1 100644
> --- a/include/uapi/drm/xe_drm.h
> +++ b/include/uapi/drm/xe_drm.h
> @@ -1252,6 +1252,12 @@ struct drm_xe_vm_bind {
> * Given that going into a power-saving state kills PXP HWDRM sessions,
> * runtime PM will be blocked while queues of this type are alive.
> * All PXP queues will be killed if a PXP invalidation event occurs.
> + * - %DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_GROUP - Create a multi-queue group
> + * or add secondary queues to a multi-queue group.
> + * If the extension's 'value' field has %DRM_XE_MULTI_GROUP_CREATE flag set,
> + * then a new multi-queue group is created with this queue as the primary queue
> + * (Q0). Otherwise, the queue gets added to the multi-queue group whose primary
> + * queue id is specified in the 'value' field.
> *
> * The example below shows how to use @drm_xe_exec_queue_create to create
> * a simple exec_queue (no parallel submission) of class
> @@ -1292,6 +1298,8 @@ struct drm_xe_exec_queue_create {
> #define DRM_XE_EXEC_QUEUE_SET_PROPERTY_PRIORITY 0
> #define DRM_XE_EXEC_QUEUE_SET_PROPERTY_TIMESLICE 1
> #define DRM_XE_EXEC_QUEUE_SET_PROPERTY_PXP_TYPE 2
> +#define DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_GROUP 3
> +#define DRM_XE_MULTI_GROUP_CREATE (1ull << 63)
> /** @extensions: Pointer to the first extension struct, if any */
> __u64 extensions;
>
> --
> 2.43.0
>
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH 09/16] drm/xe/multi_queue: Handle tearing down of a multi queue
2025-10-31 18:29 ` [PATCH 09/16] drm/xe/multi_queue: Handle tearing down of a multi queue Niranjana Vishwanathapura
@ 2025-11-02 0:39 ` Matthew Brost
2025-11-04 3:35 ` Niranjana Vishwanathapura
0 siblings, 1 reply; 61+ messages in thread
From: Matthew Brost @ 2025-11-02 0:39 UTC (permalink / raw)
To: Niranjana Vishwanathapura; +Cc: intel-xe
On Fri, Oct 31, 2025 at 11:29:29AM -0700, Niranjana Vishwanathapura wrote:
> All queues of a multi queue group use the primary queue of the group
> to interface with the GuC, hence there is a dependency between the
> queues of the group. So, when the primary queue of a multi queue group
> is cleaned up, also trigger a cleanup of the secondary queues. During
> cleanup, stop and re-start submission for all queues of a multi queue
> group to avoid any submission happening in parallel while a queue is
> being cleaned up.
>
> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
> ---
> drivers/gpu/drm/xe/xe_exec_queue.c | 2 +
> drivers/gpu/drm/xe/xe_exec_queue_types.h | 4 +
> drivers/gpu/drm/xe/xe_guc_submit.c | 150 +++++++++++++++++++----
> 3 files changed, 134 insertions(+), 22 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
> index 98f8f1c7f13b..3c1bb4f10fd5 100644
> --- a/drivers/gpu/drm/xe/xe_exec_queue.c
> +++ b/drivers/gpu/drm/xe/xe_exec_queue.c
> @@ -85,6 +85,7 @@ static void xe_exec_queue_group_cleanup(struct xe_exec_queue *q)
>
> xa_destroy(&group->xa);
> mutex_destroy(&group->lock);
> + mutex_destroy(&group->list_lock);
You init this lock in an earlier patch but destroy it in this one. Can
we get the init/destroy/instantiation in a single patch?
> xe_bo_unpin_map_no_vm(group->cgp_bo);
> kfree(group);
> }
> @@ -605,6 +606,7 @@ static int xe_exec_queue_group_init(struct xe_device *xe, struct xe_exec_queue *
>
> group->primary = q;
> group->cgp_bo = bo;
> + INIT_LIST_HEAD(&group->list);
> xa_init_flags(&group->xa, XA_FLAGS_ALLOC1);
> mutex_init(&group->lock);
> mutex_init(&group->list_lock);
group->list_lock is taken in the submission backend, which is entirely
in the path of reclaim. Can we teach lockdep that this lock is in the
path of reclaim?
e.g.,
fs_reclaim_acquire(GFP_KERNEL);
might_lock(&group->list_lock);
fs_reclaim_release(GFP_KERNEL);
> diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h b/drivers/gpu/drm/xe/xe_exec_queue_types.h
> index dcb55b069ed8..e64b6588923e 100644
> --- a/drivers/gpu/drm/xe/xe_exec_queue_types.h
> +++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h
> @@ -51,6 +51,8 @@ struct xe_exec_queue_group {
> struct xe_bo *cgp_bo;
> /** @xa: xarray to store LRCs */
> struct xarray xa;
> + /** @list: List of all secondary queues in the group */
> + struct list_head list;
> /** @list_lock: Secondary queue list lock */
> struct mutex list_lock;
> /** @sync_pending: CGP_SYNC_DONE g2h response pending */
> @@ -140,6 +142,8 @@ struct xe_exec_queue {
> struct {
> /** @multi_queue.group: Queue group information */
> struct xe_exec_queue_group *group;
> + /** @multi_queue.link: Link into group's secondary queues list */
> + struct list_head link;
> /** @multi_queue.priority: Queue priority within the multi-queue group */
> enum xe_multi_queue_priority priority;
> /** @multi_queue.pos: Position of queue within the multi-queue group */
> diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
> index b84a0be2eefe..87c13feb2cef 100644
> --- a/drivers/gpu/drm/xe/xe_guc_submit.c
> +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
> @@ -920,6 +920,81 @@ static void wq_item_append(struct xe_exec_queue *q)
> parallel_write(xe, map, wq_desc.tail, q->guc->wqi_tail);
> }
>
> +static void xe_guc_exec_queue_submission_start(struct xe_exec_queue *q)
> +{
> + /*
> + * If the exec queue is part of a multi queue group, then start submission
> + * on all queues of the multi queue group.
> + */
> + if (xe_exec_queue_is_multi_queue(q)) {
> + struct xe_exec_queue *primary = xe_exec_queue_multi_queue_primary(q);
> + struct xe_exec_queue_group *group = q->multi_queue.group;
> + struct xe_exec_queue *eq;
> +
> + xe_sched_submission_start(&primary->guc->sched);
> +
> + mutex_lock(&group->list_lock);
> + list_for_each_entry(eq, &group->list, multi_queue.link)
> + xe_sched_submission_start(&eq->guc->sched);
> + mutex_unlock(&group->list_lock);
> + } else {
> + xe_sched_submission_start(&q->guc->sched);
> + }
> +}
> +
> +static void xe_guc_exec_queue_submission_stop(struct xe_exec_queue *q)
> +{
> + /*
> + * If the exec queue is part of a multi queue group, then stop submission
> + * on all queues of the multi queue group.
> + */
> + if (xe_exec_queue_is_multi_queue(q)) {
> + struct xe_exec_queue *primary = xe_exec_queue_multi_queue_primary(q);
> + struct xe_exec_queue_group *group = q->multi_queue.group;
> + struct xe_exec_queue *eq;
> +
> + xe_sched_submission_stop(&primary->guc->sched);
> +
> + mutex_lock(&group->list_lock);
> + list_for_each_entry(eq, &group->list, multi_queue.link)
> + xe_sched_submission_stop(&eq->guc->sched);
> + mutex_unlock(&group->list_lock);
> + } else {
> + xe_sched_submission_stop(&q->guc->sched);
> + }
> +}
> +
> +static void xe_guc_exec_queue_trigger_cleanup(struct xe_exec_queue *q)
> +{
> + struct xe_guc *guc = exec_queue_to_guc(q);
> + struct xe_device *xe = guc_to_xe(guc);
> +
> + /** to wakeup xe_wait_user_fence ioctl if exec queue is reset */
> + wake_up_all(&xe->ufence_wq);
> +
> + if (xe_exec_queue_is_lr(q))
> + queue_work(guc_to_gt(guc)->ordered_wq, &q->guc->lr_tdr);
> + else
> + xe_sched_tdr_queue_imm(&q->guc->sched);
> +}
> +
> +static void xe_guc_exec_queue_trigger_secondary_cleanup(struct xe_exec_queue *q)
> +{
> + struct xe_exec_queue *primary = xe_exec_queue_multi_queue_primary(q);
> + struct xe_exec_queue_group *group = q->multi_queue.group;
> + struct xe_exec_queue *eq;
> +
> + mutex_lock(&group->list_lock);
> + list_for_each_entry(eq, &group->list, multi_queue.link) {
> + if (exec_queue_reset(primary))
Do we need to propagate banned or killed too?
Also, what happens if a secondary queue is reset or a job times out?
Does that affect any of the other LRCs in the group?
> + set_exec_queue_reset(eq);
> +
> + if (!exec_queue_banned(eq))
> + xe_guc_exec_queue_trigger_cleanup(eq);
> + }
> + mutex_unlock(&group->list_lock);
> +}
> +
> #define RESUME_PENDING ~0x0ull
> static void submit_exec_queue(struct xe_exec_queue *q, struct xe_sched_job *job)
> {
> @@ -1098,20 +1173,6 @@ static void disable_scheduling_deregister(struct xe_guc *guc,
> G2H_LEN_DW_DEREGISTER_CONTEXT, 2);
> }
>
> -static void xe_guc_exec_queue_trigger_cleanup(struct xe_exec_queue *q)
> -{
> - struct xe_guc *guc = exec_queue_to_guc(q);
> - struct xe_device *xe = guc_to_xe(guc);
> -
> - /** to wakeup xe_wait_user_fence ioctl if exec queue is reset */
> - wake_up_all(&xe->ufence_wq);
> -
> - if (xe_exec_queue_is_lr(q))
> - queue_work(guc_to_gt(guc)->ordered_wq, &q->guc->lr_tdr);
> - else
> - xe_sched_tdr_queue_imm(&q->guc->sched);
> -}
> -
> /**
> * xe_guc_submit_wedge() - Wedge GuC submission
> * @guc: the GuC object
> @@ -1185,8 +1246,12 @@ static void xe_guc_exec_queue_lr_cleanup(struct work_struct *w)
> if (!exec_queue_killed(q))
> wedged = guc_submit_hint_wedged(exec_queue_to_guc(q));
>
> - /* Kill the run_job / process_msg entry points */
> - xe_sched_submission_stop(sched);
> + /*
> + * Kill the run_job / process_msg entry points.
> + * As this function is serialized across exec queues, it is safe to
> + * stop and restart submission on all queues of a multi queue group.
> + */
> + xe_guc_exec_queue_submission_stop(q);
>
> /*
> * Engine state now mostly stable, disable scheduling / deregister if
> @@ -1222,7 +1287,7 @@ static void xe_guc_exec_queue_lr_cleanup(struct work_struct *w)
> q->guc->id);
> xe_devcoredump(q, NULL, "Schedule disable failed to respond, guc_id=%d\n",
> q->guc->id);
> - xe_sched_submission_start(sched);
> + xe_guc_exec_queue_submission_start(q);
> xe_gt_reset_async(q->gt);
> return;
> }
> @@ -1233,7 +1298,11 @@ static void xe_guc_exec_queue_lr_cleanup(struct work_struct *w)
>
> xe_hw_fence_irq_stop(q->fence_irq);
>
> - xe_sched_submission_start(sched);
> + xe_guc_exec_queue_submission_start(q);
> +
> + /* Trigger cleanup of secondary queues of multi queue group */
> + if (xe_exec_queue_is_multi_queue_primary(q))
> + xe_guc_exec_queue_trigger_secondary_cleanup(q);
>
> spin_lock(&sched->base.job_list_lock);
> list_for_each_entry(job, &sched->base.pending_list, drm.list)
> @@ -1392,8 +1461,12 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
> vf_recovery(guc))
> return DRM_GPU_SCHED_STAT_NO_HANG;
>
> - /* Kill the run_job entry point */
> - xe_sched_submission_stop(sched);
> + /*
> + * Kill the run_job entry point.
> + * As this function is serialized across exec queues, it is safe to
> + * stop and restart submission on all queues of a multi queue group.
> + */
> + xe_guc_exec_queue_submission_stop(q);
>
I don't know where to stick this comment, but disable_scheduling() looks
like a pure software thing for secondary queues. We currently need the
LRC to not be running to accurately sample the timestamp - I think we
could fix that part, Umesh would likely know for sure. But until then
I'm pretty sure we'd need to disable scheduling on the primary for an
accurate sample of the secondary queues' LRC timestamps.
> /* Must check all state after stopping scheduler */
> skip_timeout_check = exec_queue_reset(q) ||
> @@ -1552,7 +1625,7 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
> * fences that are complete
> */
> xe_sched_add_pending_job(sched, job);
> - xe_sched_submission_start(sched);
> + xe_guc_exec_queue_submission_start(q);
>
> xe_guc_exec_queue_trigger_cleanup(q);
>
> @@ -1565,6 +1638,10 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
> /* Start fence signaling */
> xe_hw_fence_irq_start(q->fence_irq);
>
> + /* Trigger cleanup of secondary queues of multi queue group */
> + if (xe_exec_queue_is_multi_queue_primary(q))
> + xe_guc_exec_queue_trigger_secondary_cleanup(q);
> +
I'd stick this part by xe_guc_exec_queue_trigger_cleanup above.
> return DRM_GPU_SCHED_STAT_RESET;
>
> sched_enable:
> @@ -1576,7 +1653,11 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
> * but there is not currently an easy way to do in DRM scheduler. With
> * some thought, do this in a follow up.
> */
> - xe_sched_submission_start(sched);
> + xe_guc_exec_queue_submission_start(q);
> +
> + /* Trigger cleanup of secondary queues of multi queue group */
> + if (xe_exec_queue_is_multi_queue_primary(q))
> + xe_guc_exec_queue_trigger_secondary_cleanup(q);
I don't think you need to trigger a cleanup here - this is a no-hang
situation, rather a false timeout.
Matt
> handle_vf_resume:
> return DRM_GPU_SCHED_STAT_NO_HANG;
> }
> @@ -1607,6 +1688,14 @@ static void __guc_exec_queue_destroy_async(struct work_struct *w)
> xe_pm_runtime_get(guc_to_xe(guc));
> trace_xe_exec_queue_destroy(q);
>
> + if (xe_exec_queue_is_multi_queue_secondary(q)) {
> + struct xe_exec_queue_group *group = q->multi_queue.group;
> +
> + mutex_lock(&group->list_lock);
> + list_del(&q->multi_queue.link);
> + mutex_unlock(&group->list_lock);
> + }
> +
> if (xe_exec_queue_is_lr(q))
> cancel_work_sync(&ge->lr_tdr);
> /* Confirm no work left behind accessing device structures */
> @@ -1897,6 +1986,19 @@ static int guc_exec_queue_init(struct xe_exec_queue *q)
>
> xe_exec_queue_assign_name(q, q->guc->id);
>
> + /*
> + * Maintain secondary queues of the multi queue group in a list
> + * for handling dependencies across the queues in the group.
> + */
> + if (xe_exec_queue_is_multi_queue_secondary(q)) {
> + struct xe_exec_queue_group *group = q->multi_queue.group;
> +
> + INIT_LIST_HEAD(&q->multi_queue.link);
> + mutex_lock(&group->list_lock);
> + list_add_tail(&q->multi_queue.link, &group->list);
> + mutex_unlock(&group->list_lock);
> + }
> +
> trace_xe_exec_queue_create(q);
>
> return 0;
> @@ -2125,6 +2227,10 @@ static void guc_exec_queue_resume(struct xe_exec_queue *q)
>
> static bool guc_exec_queue_reset_status(struct xe_exec_queue *q)
> {
> + if (xe_exec_queue_is_multi_queue_secondary(q) &&
> + guc_exec_queue_reset_status(xe_exec_queue_multi_queue_primary(q)))
> + return true;
> +
> return exec_queue_reset(q) || exec_queue_killed_or_banned_or_wedged(q);
> }
>
> --
> 2.43.0
>
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH 06/16] drm/xe/multi_queue: Add exec_queue set_property ioctl support
2025-10-31 18:29 ` [PATCH 06/16] drm/xe/multi_queue: Add exec_queue set_property ioctl support Niranjana Vishwanathapura
@ 2025-11-02 16:53 ` Matthew Brost
2025-11-03 1:49 ` Niranjana Vishwanathapura
0 siblings, 1 reply; 61+ messages in thread
From: Matthew Brost @ 2025-11-02 16:53 UTC (permalink / raw)
To: Niranjana Vishwanathapura; +Cc: intel-xe
On Fri, Oct 31, 2025 at 11:29:26AM -0700, Niranjana Vishwanathapura wrote:
> Add support for the exec_queue set_property ioctl.
> It is derived from the original work which is part of
> https://patchwork.freedesktop.org/series/112188/
>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> Signed-off-by: Pallavi Mishra <pallavi.mishra@intel.com>
> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
> ---
> drivers/gpu/drm/xe/xe_device.c | 2 ++
> drivers/gpu/drm/xe/xe_exec_queue.c | 31 ++++++++++++++++++++++++++++++
> drivers/gpu/drm/xe/xe_exec_queue.h | 2 ++
> include/uapi/drm/xe_drm.h | 24 +++++++++++++++++++++++
> 4 files changed, 59 insertions(+)
>
> diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
> index 47f5391ad8e9..0b496676527a 100644
> --- a/drivers/gpu/drm/xe/xe_device.c
> +++ b/drivers/gpu/drm/xe/xe_device.c
> @@ -208,6 +208,8 @@ static const struct drm_ioctl_desc xe_ioctls[] = {
> DRM_IOCTL_DEF_DRV(XE_MADVISE, xe_vm_madvise_ioctl, DRM_RENDER_ALLOW),
> DRM_IOCTL_DEF_DRV(XE_VM_QUERY_MEM_RANGE_ATTRS, xe_vm_query_vmas_attrs_ioctl,
> DRM_RENDER_ALLOW),
> + DRM_IOCTL_DEF_DRV(XE_EXEC_QUEUE_SET_PROPERTY, xe_exec_queue_set_property_ioctl,
> + DRM_RENDER_ALLOW),
> };
>
> static long xe_drm_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
> diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
> index 78b3a0e2ddd3..0264cab00fd4 100644
> --- a/drivers/gpu/drm/xe/xe_exec_queue.c
> +++ b/drivers/gpu/drm/xe/xe_exec_queue.c
> @@ -747,6 +747,37 @@ static const xe_exec_queue_set_property_fn exec_queue_set_property_funcs[] = {
> exec_queue_set_multi_queue_priority,
> };
>
> +int xe_exec_queue_set_property_ioctl(struct drm_device *dev, void *data,
> + struct drm_file *file)
> +{
> + struct xe_device *xe = to_xe_device(dev);
> + struct xe_file *xef = to_xe_file(file);
> + struct drm_xe_exec_queue_set_property *args = data;
> + struct xe_exec_queue *q;
> + int ret;
> + u32 idx;
> +
> + if (XE_IOCTL_DBG(xe, args->reserved[0] || args->reserved[1]))
> + return -EINVAL;
> +
> + q = xe_exec_queue_lookup(xef, args->exec_queue_id);
> + if (XE_IOCTL_DBG(xe, !q))
> + return -ENOENT;
> +
I didn't realize this was new code, so my comment here [1] about
dropping the check around
DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_QUEUE_PRIORITY is wrong; rather,
pull that check into this patch.
[1] https://patchwork.freedesktop.org/patch/684856/?series=156865&rev=1#comment_1257589
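i.e., pull this hunk (currently added in the earlier priority patch)
into this one:

	if (XE_IOCTL_DBG(xe, args->property !=
			 DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_QUEUE_PRIORITY))
		return -EINVAL;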
> + idx = array_index_nospec(args->property,
> + ARRAY_SIZE(exec_queue_set_property_funcs));
> + ret = exec_queue_set_property_funcs[idx](xe, q, args->value);
> + if (XE_IOCTL_DBG(xe, ret))
> + goto err_post_lookup;
> +
> + xe_exec_queue_put(q);
> + return 0;
> +
> + err_post_lookup:
> + xe_exec_queue_put(q);
> + return ret;
> +}
> +
> static int exec_queue_user_ext_check(struct xe_exec_queue *q, u64 properties)
> {
> u64 secondary_queue_valid_props = BIT_ULL(DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_GROUP) |
> diff --git a/drivers/gpu/drm/xe/xe_exec_queue.h b/drivers/gpu/drm/xe/xe_exec_queue.h
> index 8cd6487018fa..61478b2e883b 100644
> --- a/drivers/gpu/drm/xe/xe_exec_queue.h
> +++ b/drivers/gpu/drm/xe/xe_exec_queue.h
> @@ -121,6 +121,8 @@ int xe_exec_queue_destroy_ioctl(struct drm_device *dev, void *data,
> struct drm_file *file);
> int xe_exec_queue_get_property_ioctl(struct drm_device *dev, void *data,
> struct drm_file *file);
> +int xe_exec_queue_set_property_ioctl(struct drm_device *dev, void *data,
> + struct drm_file *file);
> enum xe_exec_queue_priority xe_exec_queue_device_get_max_priority(struct xe_device *xe);
>
> void xe_exec_queue_last_fence_put(struct xe_exec_queue *e, struct xe_vm *vm);
> diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
> index 8ab44413646a..d72151163e77 100644
> --- a/include/uapi/drm/xe_drm.h
> +++ b/include/uapi/drm/xe_drm.h
> @@ -106,6 +106,7 @@ extern "C" {
> #define DRM_XE_OBSERVATION 0x0b
> #define DRM_XE_MADVISE 0x0c
> #define DRM_XE_VM_QUERY_MEM_RANGE_ATTRS 0x0d
> +#define DRM_XE_EXEC_QUEUE_SET_PROPERTY 0x0e
>
> /* Must be kept compact -- no holes */
>
> @@ -123,6 +124,7 @@ extern "C" {
> #define DRM_IOCTL_XE_OBSERVATION DRM_IOW(DRM_COMMAND_BASE + DRM_XE_OBSERVATION, struct drm_xe_observation_param)
> #define DRM_IOCTL_XE_MADVISE DRM_IOW(DRM_COMMAND_BASE + DRM_XE_MADVISE, struct drm_xe_madvise)
> #define DRM_IOCTL_XE_VM_QUERY_MEM_RANGE_ATTRS DRM_IOWR(DRM_COMMAND_BASE + DRM_XE_VM_QUERY_MEM_RANGE_ATTRS, struct drm_xe_vm_query_mem_range_attr)
> +#define DRM_IOCTL_XE_EXEC_QUEUE_SET_PROPERTY DRM_IOW(DRM_COMMAND_BASE + DRM_XE_EXEC_QUEUE_SET_PROPERTY, struct drm_xe_exec_queue_set_property)
>
> /**
> * DOC: Xe IOCTL Extensions
> @@ -2284,6 +2286,28 @@ struct drm_xe_vm_query_mem_range_attr {
>
> };
>
> +/**
> + * struct drm_xe_exec_queue_set_property - exec queue set property
> + *
> + * Sets execution queue properties dynamically.
Mention which properties this IOCTL is valid for.
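e.g. (sketch):

	 * Sets execution queue properties dynamically. Currently only
	 * %DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_QUEUE_PRIORITY can be set
	 * with this ioctl.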
Matt
> + */
> +struct drm_xe_exec_queue_set_property {
> + /** @extensions: Pointer to the first extension struct, if any */
> + __u64 extensions;
> +
> + /** @exec_queue_id: Exec queue ID */
> + __u32 exec_queue_id;
> +
> + /** @property: property to set */
> + __u32 property;
> +
> + /** @value: property value */
> + __u64 value;
> +
> + /** @reserved: Reserved */
> + __u64 reserved[2];
> +};
> +
> #if defined(__cplusplus)
> }
> #endif
> --
> 2.43.0
>
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH 02/16] drm/xe/multi_queue: Add user interface for multi queue support
2025-10-31 18:29 ` [PATCH 02/16] drm/xe/multi_queue: Add user interface for multi queue support Niranjana Vishwanathapura
2025-10-31 19:31 ` Matthew Brost
2025-11-02 0:23 ` Matthew Brost
@ 2025-11-02 17:37 ` Matthew Brost
2025-11-03 23:06 ` Niranjana Vishwanathapura
2 siblings, 1 reply; 61+ messages in thread
From: Matthew Brost @ 2025-11-02 17:37 UTC (permalink / raw)
To: Niranjana Vishwanathapura; +Cc: intel-xe
On Fri, Oct 31, 2025 at 11:29:22AM -0700, Niranjana Vishwanathapura wrote:
> Multi Queue is a new mode of execution supported by the compute and
> blitter copy command streamers (CCS and BCS, respectively). It is an
> enhancement of the existing hardware architecture and leverages the
> same submission model. It enables support for efficient, parallel
> execution of multiple queues within a single context. All the queues
> of a group must use the same address space (VM).
>
> The new DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_GROUP execution queue
> property supports creating a multi queue group and adding queues to
> a queue group. All queues of a multi queue group share the same
> context.
>
> An exec queue create ioctl call with the above property specified with
> value DRM_XE_MULTI_GROUP_CREATE will create a new multi queue group
> with the queue being created as the primary queue (aka q0) of the
> group. To add secondary queues to the group, they need to be created
> with the above property with the id of the primary queue as the value.
> The properties of the primary queue (like priority, timeslice) apply
> to the whole group, so these properties can't be set for secondary
> queues of a group.
>
> Once destroyed, the secondary queues of a multi queue group can't be
> replaced. However, they can be dynamically added to the group up to a
> total of 64 queues per group. Once the primary queue is destroyed,
> secondary queues can't be added to the queue group.
>
> Signed-off-by: Stuart Summers <stuart.summers@intel.com>
> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
> ---
> drivers/gpu/drm/xe/xe_exec_queue.c | 191 ++++++++++++++++++++++-
> drivers/gpu/drm/xe/xe_exec_queue.h | 47 ++++++
> drivers/gpu/drm/xe/xe_exec_queue_types.h | 30 ++++
> include/uapi/drm/xe_drm.h | 8 +
> 4 files changed, 274 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
> index 1b57d7c2cc94..86404a7c9fe4 100644
> --- a/drivers/gpu/drm/xe/xe_exec_queue.c
> +++ b/drivers/gpu/drm/xe/xe_exec_queue.c
> @@ -12,6 +12,7 @@
> #include <drm/drm_file.h>
> #include <uapi/drm/xe_drm.h>
>
> +#include "xe_bo.h"
> #include "xe_dep_scheduler.h"
> #include "xe_device.h"
> #include "xe_gt.h"
> @@ -62,6 +63,32 @@ enum xe_exec_queue_sched_prop {
> static int exec_queue_user_extensions(struct xe_device *xe, struct xe_exec_queue *q,
> u64 extensions, int ext_number);
>
> +static void xe_exec_queue_group_cleanup(struct xe_exec_queue *q)
> +{
A little incongruent with xe_exec_queue_group_add/delete: those
functions are called unconditionally and check internally whether any
work is needed, whereas this function relies on the caller checking for
multi-queue. I don't have a huge preference, but I'd at least make the
call semantics consistent.
> + struct xe_exec_queue_group *group = q->multi_queue.group;
> + struct xe_lrc *lrc;
> + unsigned long idx;
> +
> + if (xe_exec_queue_is_multi_queue_secondary(q)) {
> + xe_exec_queue_put(xe_exec_queue_multi_queue_primary(q));
It took me a minute to figure out where the associated get on the
primary came from - it is from xe_exec_queue_lookup in
xe_exec_queue_group_validate. Can you add comments along the lines of:
/* Put pairs with get from ... */
/* Get pairs with put in ... */
> + return;
> + }
> +
> + if (!group)
> + return;
> +
> + /* Primary queue cleanup */
> + mutex_lock(&group->lock);
As discussed [1], group->lock is not needed.
[1] https://patchwork.freedesktop.org/patch/684847/?series=156865&rev=1#comment_1257408
> + xa_for_each(&group->xa, idx, lrc)
> + xe_lrc_put(lrc);
> + mutex_unlock(&group->lock);
> +
> + xa_destroy(&group->xa);
> + mutex_destroy(&group->lock);
> + xe_bo_unpin_map_no_vm(group->cgp_bo);
> + kfree(group);
> +}
> +
> static void __xe_exec_queue_free(struct xe_exec_queue *q)
> {
> int i;
> @@ -72,6 +99,10 @@ static void __xe_exec_queue_free(struct xe_exec_queue *q)
>
> if (xe_exec_queue_uses_pxp(q))
> xe_pxp_exec_queue_remove(gt_to_xe(q->gt)->pxp, q);
> +
> + if (xe_exec_queue_is_multi_queue(q))
> + xe_exec_queue_group_cleanup(q);
> +
> if (q->vm)
> xe_vm_put(q->vm);
>
> @@ -549,6 +580,148 @@ exec_queue_set_pxp_type(struct xe_device *xe, struct xe_exec_queue *q, u64 value
> return xe_pxp_exec_queue_set_type(xe->pxp, q, DRM_XE_PXP_TYPE_HWDRM);
> }
>
> +static int xe_exec_queue_group_init(struct xe_device *xe, struct xe_exec_queue *q)
> +{
> + struct xe_tile *tile = gt_to_tile(q->gt);
> + struct xe_exec_queue_group *group;
> + struct xe_bo *bo;
> +
> + group = kzalloc(sizeof(*group), GFP_KERNEL);
> + if (!group)
> + return -ENOMEM;
> +
> + bo = xe_bo_create_pin_map_novm(xe, tile, SZ_4K, ttm_bo_type_kernel,
> + XE_BO_FLAG_VRAM_IF_DGFX(tile) |
> + XE_BO_FLAG_GGTT, false);
XE_BO_FLAG_GGTT_INVALIDATE | XE_BO_FLAG_PINNED_LATE_RESTORE are needed.
I believe XE_BO_FLAG_FORCE_USER_VRAM is needed too; that's new so I'm
not 100% sure, but I'd check the git blame on that to figure out if it
is needed.
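i.e., something like (sketch; XE_BO_FLAG_FORCE_USER_VRAM omitted pending
the git-blame check):

	bo = xe_bo_create_pin_map_novm(xe, tile, SZ_4K, ttm_bo_type_kernel,
				       XE_BO_FLAG_VRAM_IF_DGFX(tile) |
				       XE_BO_FLAG_GGTT |
				       XE_BO_FLAG_GGTT_INVALIDATE |
				       XE_BO_FLAG_PINNED_LATE_RESTORE, false);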
> + if (IS_ERR(bo)) {
> + drm_err(&xe->drm, "CGP bo allocation for queue group failed: %ld\n",
> + PTR_ERR(bo));
> + kfree(group);
> + return PTR_ERR(bo);
> + }
> +
> + xe_map_memset(xe, &bo->vmap, 0, 0, SZ_4K);
> +
> + group->primary = q;
> + group->cgp_bo = bo;
> + xa_init_flags(&group->xa, XA_FLAGS_ALLOC1);
> + mutex_init(&group->lock);
> + mutex_init(&group->list_lock);
See my comments here [1] about the list lock being initialized here but
used/destroyed in [1].
[1] https://patchwork.freedesktop.org/patch/684850/?series=156865&rev=1#comment_1257596
> + q->multi_queue.group = group;
> +
> + return 0;
> +}
> +
> +static inline bool xe_exec_queue_supports_multi_queue(struct xe_exec_queue *q)
> +{
> + return q->gt->info.multi_queue_enable_mask & BIT(q->class);
> +}
> +
> +static int xe_exec_queue_group_validate(struct xe_device *xe, struct xe_exec_queue *q,
> + u32 primary_id)
> +{
> + struct xe_exec_queue_group *group;
> + struct xe_exec_queue *primary;
> + int ret;
> +
> + primary = xe_exec_queue_lookup(q->vm->xef, primary_id);
> + if (XE_IOCTL_DBG(xe, !primary))
> + return -ENOENT;
> +
> + if (XE_IOCTL_DBG(xe, !xe_exec_queue_is_multi_queue_primary(primary)) ||
> + XE_IOCTL_DBG(xe, q->vm != primary->vm) ||
> + XE_IOCTL_DBG(xe, q->logical_mask != primary->logical_mask)) {
> + ret = -EINVAL;
> + goto put_primary;
> + }
> +
> + group = primary->multi_queue.group;
> + q->multi_queue.valid = true;
> + q->multi_queue.group = group;
> +
> + return 0;
> +put_primary:
> + xe_exec_queue_put(primary);
> + return ret;
> +}
> +
> +#define XE_MAX_GROUP_SIZE 64
> +static int xe_exec_queue_group_add(struct xe_device *xe, struct xe_exec_queue *q)
> +{
> + struct xe_exec_queue_group *group = q->multi_queue.group;
> + u32 pos;
> + int err;
> +
> + if (!xe_exec_queue_is_multi_queue_secondary(q))
> + return 0;
> +
> + mutex_lock(&group->lock);
> + err = xa_alloc(&group->xa, &pos, xe_lrc_get(q->lrc[0]),
> + XA_LIMIT(1, XE_MAX_GROUP_SIZE - 1), GFP_KERNEL);
To consolidate threads [2], add quick inline comments here around ref
counting.
[2] https://patchwork.freedesktop.org/patch/684847/?series=156865&rev=1#comment_1257594
> + if (XE_IOCTL_DBG(xe, err)) {
> + xe_lrc_put(q->lrc[0]);
> + mutex_unlock(&group->lock);
> +
> + /* It is invalid if queue group limit is exceeded */
> + if (err == -EBUSY)
> + err = -EINVAL;
> +
> + return err;
> + }
> +
> + q->multi_queue.pos = pos;
> + mutex_unlock(&group->lock);
> +
> + return 0;
> +}
> +
> +static void xe_exec_queue_group_delete(struct xe_exec_queue *q)
> +{
> + struct xe_exec_queue_group *group = q->multi_queue.group;
> + struct xe_lrc *lrc;
> +
> + if (!xe_exec_queue_is_multi_queue_secondary(q))
> + return;
> +
> + mutex_lock(&group->lock);
> + lrc = xa_erase(&group->xa, q->multi_queue.pos);
> + if (lrc)
>
I think this should be an assert if lrc is NULL? I don't think it can be
NULL unless there is a bug somewhere, right? If so, let's do an assert to
ensure software correctness.
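Roughly (a sketch; assuming gt_to_xe(q->gt) is the right device handle here):

	lrc = xa_erase(&group->xa, q->multi_queue.pos);
	xe_assert(gt_to_xe(q->gt), lrc);
	xe_lrc_put(lrc);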
> + xe_lrc_put(lrc);
> + mutex_unlock(&group->lock);
> +}
> +
> +static int exec_queue_set_multi_group(struct xe_device *xe, struct xe_exec_queue *q,
> + u64 value)
> +{
> + if (XE_IOCTL_DBG(xe, !xe_exec_queue_supports_multi_queue(q)))
> + return -ENODEV;
> +
> + if (XE_IOCTL_DBG(xe, !xe_device_uc_enabled(xe)))
> + return -EOPNOTSUPP;
> +
> + if (XE_IOCTL_DBG(xe, xe_exec_queue_is_parallel(q)))
> + return -EINVAL;
> +
> + if (XE_IOCTL_DBG(xe, xe_exec_queue_is_multi_queue(q)))
> + return -EINVAL;
> +
> + if (value & DRM_XE_MULTI_GROUP_CREATE) {
> + if (XE_IOCTL_DBG(xe, value & ~DRM_XE_MULTI_GROUP_CREATE))
> + return -EINVAL;
> +
> + q->multi_queue.valid = true;
> + q->multi_queue.is_primary = true;
> + q->multi_queue.pos = 0;
> + return 0;
> + }
> +
> + /* While adding secondary queues, the upper 32 bits must be 0 */
State this in uAPI doc too.
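e.g. add to the DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_GROUP kernel-doc
something like:

 * When adding secondary queues to a group, the upper 32 bits of 'value'
 * must be 0.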
> + if (XE_IOCTL_DBG(xe, value & (~0ull << 32)))
> + return -EINVAL;
> +
> + return xe_exec_queue_group_validate(xe, q, value);
> +}
> +
> typedef int (*xe_exec_queue_set_property_fn)(struct xe_device *xe,
> struct xe_exec_queue *q,
> u64 value);
> @@ -557,6 +730,7 @@ static const xe_exec_queue_set_property_fn exec_queue_set_property_funcs[] = {
> [DRM_XE_EXEC_QUEUE_SET_PROPERTY_PRIORITY] = exec_queue_set_priority,
> [DRM_XE_EXEC_QUEUE_SET_PROPERTY_TIMESLICE] = exec_queue_set_timeslice,
> [DRM_XE_EXEC_QUEUE_SET_PROPERTY_PXP_TYPE] = exec_queue_set_pxp_type,
> + [DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_GROUP] = exec_queue_set_multi_group,
> };
>
> static int exec_queue_user_ext_set_property(struct xe_device *xe,
> @@ -577,7 +751,8 @@ static int exec_queue_user_ext_set_property(struct xe_device *xe,
> XE_IOCTL_DBG(xe, ext.pad) ||
> XE_IOCTL_DBG(xe, ext.property != DRM_XE_EXEC_QUEUE_SET_PROPERTY_PRIORITY &&
> ext.property != DRM_XE_EXEC_QUEUE_SET_PROPERTY_TIMESLICE &&
> - ext.property != DRM_XE_EXEC_QUEUE_SET_PROPERTY_PXP_TYPE))
> + ext.property != DRM_XE_EXEC_QUEUE_SET_PROPERTY_PXP_TYPE &&
> + ext.property != DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_GROUP))
> return -EINVAL;
>
> idx = array_index_nospec(ext.property, ARRAY_SIZE(exec_queue_set_property_funcs));
> @@ -626,6 +801,12 @@ static int exec_queue_user_extensions(struct xe_device *xe, struct xe_exec_queue
> return exec_queue_user_extensions(xe, q, ext.next_extension,
> ++ext_number);
>
> + if (xe_exec_queue_is_multi_queue_primary(q)) {
> + err = xe_exec_queue_group_init(xe, q);
> + if (XE_IOCTL_DBG(xe, err))
> + return err;
> + }
Any particular reason this isn't in exec_queue_set_multi_group? Or
perhaps in xe_exec_queue_create_ioctl? It is a bit goofy to have it in a
very generic function here.
> +
> return 0;
> }
>
> @@ -780,12 +961,16 @@ int xe_exec_queue_create_ioctl(struct drm_device *dev, void *data,
> if (IS_ERR(q))
> return PTR_ERR(q);
>
> + err = xe_exec_queue_group_add(xe, q);
> + if (XE_IOCTL_DBG(xe, err))
> + goto put_exec_queue;
> +
> if (xe_vm_in_preempt_fence_mode(vm)) {
> q->lr.context = dma_fence_context_alloc(1);
>
> err = xe_vm_add_compute_exec_queue(vm, q);
> if (XE_IOCTL_DBG(xe, err))
> - goto put_exec_queue;
> + goto delete_queue_group;
> }
>
> if (q->vm && q->hwe->hw_engine_group) {
> @@ -808,6 +993,8 @@ int xe_exec_queue_create_ioctl(struct drm_device *dev, void *data,
>
> kill_exec_queue:
> xe_exec_queue_kill(q);
> +delete_queue_group:
> + xe_exec_queue_group_delete(q);
> put_exec_queue:
> xe_exec_queue_put(q);
> return err;
> diff --git a/drivers/gpu/drm/xe/xe_exec_queue.h b/drivers/gpu/drm/xe/xe_exec_queue.h
> index a4dfbe858bda..8cd6487018fa 100644
> --- a/drivers/gpu/drm/xe/xe_exec_queue.h
> +++ b/drivers/gpu/drm/xe/xe_exec_queue.h
> @@ -62,6 +62,53 @@ static inline bool xe_exec_queue_uses_pxp(struct xe_exec_queue *q)
> return q->pxp.type;
> }
>
> +/**
> + * xe_exec_queue_is_multi_queue() - Whether an exec_queue is part of a queue group.
> + * @q: The exec_queue
> + *
> + * Return: True if the exec_queue is part of a queue group, false otherwise.
> + */
> +static inline bool xe_exec_queue_is_multi_queue(struct xe_exec_queue *q)
> +{
> + return q->multi_queue.valid;
> +}
> +
> +/**
> + * xe_exec_queue_is_multi_queue_primary() - Whether an exec_queue is primary queue
> + * of a multi queue group.
> + * @q: The exec_queue
> + *
> + * Return: True if @q is primary queue of a queue group, false otherwise.
> + */
> +static inline bool xe_exec_queue_is_multi_queue_primary(struct xe_exec_queue *q)
> +{
> + return q->multi_queue.is_primary;
> +}
> +
> +/**
> + * xe_exec_queue_is_multi_queue_secondary() - Whether an exec_queue is secondary queue
> + * of a multi queue group.
> + * @q: The exec_queue
> + *
> + * Return: True if @q is secondary queue of a queue group, false otherwise.
> + */
> +static inline bool xe_exec_queue_is_multi_queue_secondary(struct xe_exec_queue *q)
> +{
> + return xe_exec_queue_is_multi_queue(q) && !q->multi_queue.is_primary;
&& !xe_exec_queue_is_multi_queue_primary()
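i.e.:

	return xe_exec_queue_is_multi_queue(q) &&
	       !xe_exec_queue_is_multi_queue_primary(q);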
> +}
> +
> +/**
> + * xe_exec_queue_multi_queue_primary() - Get multi queue group's primary queue
> + * @q: The exec_queue
> + *
> + * If @q belongs to a multi queue group, then the primary queue of the group will
> + * be returned. Otherwise, @q will be returned.
> + */
> +static inline struct xe_exec_queue *xe_exec_queue_multi_queue_primary(struct xe_exec_queue *q)
> +{
> + return xe_exec_queue_is_multi_queue(q) ? q->multi_queue.group->primary : q;
> +}
> +
> bool xe_exec_queue_is_lr(struct xe_exec_queue *q);
>
> bool xe_exec_queue_is_idle(struct xe_exec_queue *q);
> diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h b/drivers/gpu/drm/xe/xe_exec_queue_types.h
> index c8807268ec6c..3856776df5c4 100644
> --- a/drivers/gpu/drm/xe/xe_exec_queue_types.h
> +++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h
> @@ -31,6 +31,24 @@ enum xe_exec_queue_priority {
> XE_EXEC_QUEUE_PRIORITY_COUNT
> };
>
> +/**
> + * struct xe_exec_queue_group - Execution multi queue group
> + *
> + * Contains multi queue group information.
> + */
> +struct xe_exec_queue_group {
> + /** @primary: Primary queue of this group */
> + struct xe_exec_queue *primary;
> + /** @lock: Queue group update lock */
> + struct mutex lock;
> + /** @cgp_bo: BO for the Context Group Page */
> + struct xe_bo *cgp_bo;
> + /** @xa: xarray to store LRCs */
> + struct xarray xa;
> + /** @list_lock: Secondary queue list lock */
> + struct mutex list_lock;
> +};
> +
> /**
> * struct xe_exec_queue - Execution queue
> *
> @@ -110,6 +128,18 @@ struct xe_exec_queue {
> struct xe_guc_exec_queue *guc;
> };
>
> + /** @multi_queue: Multi queue information */
> + struct {
> + /** @multi_queue.group: Queue group information */
> + struct xe_exec_queue_group *group;
> + /** @multi_queue.pos: Position of queue within the multi-queue group */
> + u8 pos;
> + /** @multi_queue.valid: Queue belongs to a multi queue group */
> + u8 valid:1;
> + /** @multi_queue.is_primary: Is primary queue (Q0) of the group */
> + u8 is_primary:1;
> + } multi_queue;
> +
> /** @sched_props: scheduling properties */
> struct {
> /** @sched_props.timeslice_us: timeslice period in micro-seconds */
> diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
> index 47853659a705..d903b3a55ec1 100644
> --- a/include/uapi/drm/xe_drm.h
> +++ b/include/uapi/drm/xe_drm.h
> @@ -1252,6 +1252,12 @@ struct drm_xe_vm_bind {
> * Given that going into a power-saving state kills PXP HWDRM sessions,
> * runtime PM will be blocked while queues of this type are alive.
> * All PXP queues will be killed if a PXP invalidation event occurs.
> + * - %DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_GROUP - Create a multi-queue group
> + * or add secondary queues to a multi-queue group.
> + * If the extension's 'value' field has %DRM_XE_MULTI_GROUP_CREATE flag set,
> + * then a new multi-queue group is created with this queue as the primary queue
> + * (Q0). Otherwise, the queue gets added to the multi-queue group whose primary
> + * queue id is specified in the 'value' field.
s/queue id/exec_queue_id
^^^ to match names in structure.
Matt
> *
> * The example below shows how to use @drm_xe_exec_queue_create to create
> * a simple exec_queue (no parallel submission) of class
> @@ -1292,6 +1298,8 @@ struct drm_xe_exec_queue_create {
> #define DRM_XE_EXEC_QUEUE_SET_PROPERTY_PRIORITY 0
> #define DRM_XE_EXEC_QUEUE_SET_PROPERTY_TIMESLICE 1
> #define DRM_XE_EXEC_QUEUE_SET_PROPERTY_PXP_TYPE 2
> +#define DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_GROUP 3
> +#define DRM_XE_MULTI_GROUP_CREATE (1ull << 63)
> /** @extensions: Pointer to the first extension struct, if any */
> __u64 extensions;
>
> --
> 2.43.0
>
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH 03/16] drm/xe/multi_queue: Add GuC interface for multi queue support
2025-10-31 18:29 ` [PATCH 03/16] drm/xe/multi_queue: Add GuC " Niranjana Vishwanathapura
2025-11-01 18:07 ` Matthew Brost
@ 2025-11-02 18:02 ` Matthew Brost
2025-11-04 5:02 ` Niranjana Vishwanathapura
1 sibling, 1 reply; 61+ messages in thread
From: Matthew Brost @ 2025-11-02 18:02 UTC (permalink / raw)
To: Niranjana Vishwanathapura; +Cc: intel-xe
On Fri, Oct 31, 2025 at 11:29:23AM -0700, Niranjana Vishwanathapura wrote:
> Implement the GuC commands and responses along with the Context
> Group Page (CGP) interface for multi queue support.
>
> Ensure that only the primary queue (q0) of a multi queue group
> communicates with GuC. The secondary queues of the group only
> need to maintain the LRCA and interface with the drm scheduler.
>
> Use the primary queue's submit_wq for all secondary queues of a multi
> queue group. This serialization avoids any locking around CGP
> synchronization with GuC.
>
> Signed-off-by: Stuart Summers <stuart.summers@intel.com>
> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
> ---
> drivers/gpu/drm/xe/abi/guc_actions_abi.h | 3 +
> drivers/gpu/drm/xe/xe_exec_queue_types.h | 2 +
> drivers/gpu/drm/xe/xe_guc_ct.c | 4 +
> drivers/gpu/drm/xe/xe_guc_fwif.h | 3 +
> drivers/gpu/drm/xe/xe_guc_submit.c | 302 +++++++++++++++++++----
> drivers/gpu/drm/xe/xe_guc_submit.h | 1 +
> 6 files changed, 270 insertions(+), 45 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/abi/guc_actions_abi.h b/drivers/gpu/drm/xe/abi/guc_actions_abi.h
> index 47756e4674a1..3e9fbed9cda6 100644
> --- a/drivers/gpu/drm/xe/abi/guc_actions_abi.h
> +++ b/drivers/gpu/drm/xe/abi/guc_actions_abi.h
> @@ -139,6 +139,9 @@ enum xe_guc_action {
> XE_GUC_ACTION_DEREGISTER_G2G = 0x4508,
> XE_GUC_ACTION_DEREGISTER_CONTEXT_DONE = 0x4600,
> XE_GUC_ACTION_REGISTER_CONTEXT_MULTI_LRC = 0x4601,
> + XE_GUC_ACTION_REGISTER_CONTEXT_MULTI_QUEUE = 0x4602,
> + XE_GUC_ACTION_MULTI_QUEUE_CONTEXT_CGP_SYNC = 0x4603,
> + XE_GUC_ACTION_NOTIFY_MULTI_QUEUE_CONTEXT_CGP_SYNC_DONE = 0x4604,
> XE_GUC_ACTION_CLIENT_SOFT_RESET = 0x5507,
> XE_GUC_ACTION_SET_ENG_UTIL_BUFF = 0x550A,
> XE_GUC_ACTION_SET_DEVICE_ENGINE_ACTIVITY_BUFFER = 0x550C,
> diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h b/drivers/gpu/drm/xe/xe_exec_queue_types.h
> index 3856776df5c4..38e47b003259 100644
> --- a/drivers/gpu/drm/xe/xe_exec_queue_types.h
> +++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h
> @@ -47,6 +47,8 @@ struct xe_exec_queue_group {
> struct xarray xa;
> /** @list_lock: Secondary queue list lock */
> struct mutex list_lock;
> + /** @sync_pending: CGP_SYNC_DONE g2h response pending */
> + bool sync_pending;
> };
>
> /**
> diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c
> index e68953ef3a00..48b5006eb080 100644
> --- a/drivers/gpu/drm/xe/xe_guc_ct.c
> +++ b/drivers/gpu/drm/xe/xe_guc_ct.c
> @@ -1304,6 +1304,7 @@ static int parse_g2h_event(struct xe_guc_ct *ct, u32 *msg, u32 len)
> lockdep_assert_held(&ct->lock);
>
> switch (action) {
> + case XE_GUC_ACTION_NOTIFY_MULTI_QUEUE_CONTEXT_CGP_SYNC_DONE:
> case XE_GUC_ACTION_SCHED_CONTEXT_MODE_DONE:
> case XE_GUC_ACTION_DEREGISTER_CONTEXT_DONE:
> case XE_GUC_ACTION_SCHED_ENGINE_MODE_DONE:
> @@ -1570,6 +1571,9 @@ static int process_g2h_msg(struct xe_guc_ct *ct, u32 *msg, u32 len)
> ret = xe_guc_g2g_test_notification(guc, payload, adj_len);
> break;
> #endif
> + case XE_GUC_ACTION_NOTIFY_MULTI_QUEUE_CONTEXT_CGP_SYNC_DONE:
> + ret = xe_guc_exec_queue_cgp_sync_done_handler(guc, payload, adj_len);
> + break;
> default:
> xe_gt_err(gt, "unexpected G2H action 0x%04x\n", action);
> }
> diff --git a/drivers/gpu/drm/xe/xe_guc_fwif.h b/drivers/gpu/drm/xe/xe_guc_fwif.h
> index c90dd266e9cf..610dfb2f1cb5 100644
> --- a/drivers/gpu/drm/xe/xe_guc_fwif.h
> +++ b/drivers/gpu/drm/xe/xe_guc_fwif.h
> @@ -16,6 +16,7 @@
> #define G2H_LEN_DW_DEREGISTER_CONTEXT 3
> #define G2H_LEN_DW_TLB_INVALIDATE 3
> #define G2H_LEN_DW_G2G_NOTIFY_MIN 3
> +#define G2H_LEN_DW_MULTI_QUEUE_CONTEXT 4
>
> #define GUC_ID_MAX 65535
> #define GUC_ID_UNKNOWN 0xffffffff
> @@ -62,6 +63,8 @@ struct guc_ctxt_registration_info {
Side note - this struct could probably move to a private struct in
xe_guc_submit.c.
> u32 wq_base_lo;
> u32 wq_base_hi;
> u32 wq_size;
> + u32 cgp_lo;
> + u32 cgp_hi;
> u32 hwlrca_lo;
> u32 hwlrca_hi;
> };
> diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
> index d4ffdb71ef3d..d2aa9a2524e7 100644
> --- a/drivers/gpu/drm/xe/xe_guc_submit.c
> +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
> @@ -46,6 +46,7 @@
> #include "xe_trace.h"
> #include "xe_uc_fw.h"
> #include "xe_vm.h"
> +#include "xe_bo.h"
Why do you need xe_bo.h? It is not obvious to me. If you do need it,
keep the includes in alphabetical order.
>
> static struct xe_guc *
> exec_queue_to_guc(struct xe_exec_queue *q)
> @@ -541,7 +542,8 @@ static void init_policies(struct xe_guc *guc, struct xe_exec_queue *q)
> u32 slpc_exec_queue_freq_req = 0;
> u32 preempt_timeout_us = q->sched_props.preempt_timeout_us;
>
> - xe_gt_assert(guc_to_gt(guc), exec_queue_registered(q));
> + xe_gt_assert(guc_to_gt(guc), exec_queue_registered(q) &&
> + !xe_exec_queue_is_multi_queue_secondary(q));
>
> if (q->flags & EXEC_QUEUE_FLAG_LOW_LATENCY)
> slpc_exec_queue_freq_req |= SLPC_CTX_FREQ_REQ_IS_COMPUTE;
> @@ -561,6 +563,8 @@ static void set_min_preemption_timeout(struct xe_guc *guc, struct xe_exec_queue
> {
> struct exec_queue_policy policy;
>
> + xe_assert(guc_to_xe(guc), !xe_exec_queue_is_multi_queue_secondary(q));
> +
> __guc_exec_queue_policy_start_klv(&policy, q->guc->id);
> __guc_exec_queue_policy_add_preemption_timeout(&policy, 1);
>
> @@ -575,6 +579,130 @@ static void set_min_preemption_timeout(struct xe_guc *guc, struct xe_exec_queue
> xe_map_wr_field(xe_, &map_, 0, struct guc_submit_parallel_scratch, \
> field_, val_)
>
> +#define CGP_VERSION_MAJOR_SHIFT 8
> +
> +static void xe_guc_exec_queue_group_cgp_update(struct xe_device *xe,
> + struct xe_exec_queue *q)
> +{
> + struct xe_exec_queue_group *group = q->multi_queue.group;
> + u32 guc_id = group->primary->guc->id;
> +
> + /* Currently implementing CGP version 1.0 */
> + xe_map_wr(xe, &group->cgp_bo->vmap, 0, u32,
> + 1 << CGP_VERSION_MAJOR_SHIFT);
> +
> + xe_map_wr(xe, &group->cgp_bo->vmap,
> + (32 + q->multi_queue.pos * 2) * sizeof(u32),
> + u32, lower_32_bits(xe_lrc_descriptor(q->lrc[0])));
> +
> + xe_map_wr(xe, &group->cgp_bo->vmap,
> + (33 + q->multi_queue.pos * 2) * sizeof(u32),
> + u32, guc_id);
> +
> + if (q->multi_queue.pos / 32) {
> + xe_map_wr(xe, &group->cgp_bo->vmap, 17 * sizeof(u32),
> + u32, BIT(q->multi_queue.pos % 32));
> + xe_map_wr(xe, &group->cgp_bo->vmap, 16 * sizeof(u32), u32, 0);
> + } else {
> + xe_map_wr(xe, &group->cgp_bo->vmap, 16 * sizeof(u32),
> + u32, BIT(q->multi_queue.pos));
> + xe_map_wr(xe, &group->cgp_bo->vmap, 17 * sizeof(u32), u32, 0);
Maybe some defines for all these numbers (16, 17, 32, 33) in this
function? Or some comments? It is very hard to look at this code and
know what it is doing.
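Something like this, perhaps (the names here are just a sketch, use
whatever the CGP spec calls these dwords):

#define CGP_VERSION_DW		0
#define CGP_QUEUE_MASK0_DW	16
#define CGP_QUEUE_MASK1_DW	17
#define CGP_LRC_TABLE_DW	32

with the LRC descriptor slot then computed as
(CGP_LRC_TABLE_DW + q->multi_queue.pos * 2) * sizeof(u32) and the guc_id
slot one dword after it.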
> + }
> +}
> +
> +static void xe_guc_exec_queue_group_cgp_sync(struct xe_guc *guc,
> + struct xe_exec_queue *q,
> + const u32 *action, u32 len)
> +{
> + struct xe_exec_queue_group *group = q->multi_queue.group;
> + struct xe_device *xe = guc_to_xe(guc);
> + long ret;
> +
> + /*
> + * As all queues of a multi queue group use a single drm scheduler
> + * submit workqueue, CGP synchronization with GuC is serialized.
> + * Hence, no locking is required here.
> + * Wait for any pending CGP_SYNC_DONE response before updating the
> + * CGP page and sending the CGP_SYNC message.
> + */
> + ret = wait_event_timeout(guc->ct.wq,
> + !READ_ONCE(group->sync_pending) ||
> + xe_guc_read_stopped(guc), HZ);
> + if (!ret || xe_guc_read_stopped(guc)) {
> + drm_err(&xe->drm, "Wait for CGP_SYNC_DONE response failed!\n");
> + /* Something wrong with the CTB or GuC, no need to proceed */
> + return;
> + }
> +
> + xe_guc_exec_queue_group_cgp_update(xe, q);
> +
> + WRITE_ONCE(group->sync_pending, true);
> + xe_guc_ct_send(&guc->ct, action, len, G2H_LEN_DW_MULTI_QUEUE_CONTEXT, 1);
> +}
> +
> +static void __register_exec_queue(struct xe_guc *guc,
> + struct guc_ctxt_registration_info *info)
> +{
> + u32 action[] = {
> + XE_GUC_ACTION_REGISTER_CONTEXT,
> + info->flags,
> + info->context_idx,
> + info->engine_class,
> + info->engine_submit_mask,
> + info->wq_desc_lo,
> + info->wq_desc_hi,
> + info->wq_base_lo,
> + info->wq_base_hi,
> + info->wq_size,
> + info->hwlrca_lo,
> + info->hwlrca_hi,
> + };
> +
> + /* explicitly checks some fields that we might fixup later */
> + xe_gt_assert(guc_to_gt(guc), info->wq_desc_lo ==
> + action[XE_GUC_REGISTER_CONTEXT_DATA_5_WQ_DESC_ADDR_LOWER]);
> + xe_gt_assert(guc_to_gt(guc), info->wq_base_lo ==
> + action[XE_GUC_REGISTER_CONTEXT_DATA_7_WQ_BUF_BASE_LOWER]);
> + xe_gt_assert(guc_to_gt(guc), info->hwlrca_lo ==
> + action[XE_GUC_REGISTER_CONTEXT_DATA_10_HW_LRC_ADDR]);
> +
> + xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action), 0, 0);
> +}
> +
> +static void __register_exec_queue_group(struct xe_guc *guc,
> + struct xe_exec_queue *q,
> + struct guc_ctxt_registration_info *info)
> +{
> +#define MAX_MULTI_QUEUE_REG_SIZE (8)
> + struct xe_device *xe = guc_to_xe(guc);
> + u32 action[MAX_MULTI_QUEUE_REG_SIZE];
> + int len = 0;
> +
> + if (xe_exec_queue_is_multi_queue_primary(q)) {
> + action[len++] = XE_GUC_ACTION_REGISTER_CONTEXT_MULTI_QUEUE;
> + action[len++] = info->flags;
> + action[len++] = info->context_idx;
> + action[len++] = info->engine_class;
> + action[len++] = info->engine_submit_mask;
> + action[len++] = 0; /* Reserved */
> + action[len++] = info->cgp_lo;
> + action[len++] = info->cgp_hi;
> + } else {
> + /*
> + * No need to wait before CGP sync since CT descriptors
> + * should be ordered.
> + */
> +
> + action[len++] = XE_GUC_ACTION_MULTI_QUEUE_CONTEXT_CGP_SYNC;
> + action[len++] = q->multi_queue.group->primary->guc->id;
> + }
> +
> + xe_assert(xe, len <= MAX_MULTI_QUEUE_REG_SIZE);
> +#undef MAX_MULTI_QUEUE_REG_SIZE
> +
> + xe_guc_exec_queue_group_cgp_sync(guc, q, action, len);
> +}
> +
> static void __register_mlrc_exec_queue(struct xe_guc *guc,
> struct xe_exec_queue *q,
> struct guc_ctxt_registration_info *info)
> @@ -622,35 +750,6 @@ static void __register_mlrc_exec_queue(struct xe_guc *guc,
> xe_guc_ct_send(&guc->ct, action, len, 0, 0);
> }
>
> -static void __register_exec_queue(struct xe_guc *guc,
> - struct guc_ctxt_registration_info *info)
> -{
> - u32 action[] = {
> - XE_GUC_ACTION_REGISTER_CONTEXT,
> - info->flags,
> - info->context_idx,
> - info->engine_class,
> - info->engine_submit_mask,
> - info->wq_desc_lo,
> - info->wq_desc_hi,
> - info->wq_base_lo,
> - info->wq_base_hi,
> - info->wq_size,
> - info->hwlrca_lo,
> - info->hwlrca_hi,
> - };
> -
> - /* explicitly checks some fields that we might fixup later */
> - xe_gt_assert(guc_to_gt(guc), info->wq_desc_lo ==
> - action[XE_GUC_REGISTER_CONTEXT_DATA_5_WQ_DESC_ADDR_LOWER]);
> - xe_gt_assert(guc_to_gt(guc), info->wq_base_lo ==
> - action[XE_GUC_REGISTER_CONTEXT_DATA_7_WQ_BUF_BASE_LOWER]);
> - xe_gt_assert(guc_to_gt(guc), info->hwlrca_lo ==
> - action[XE_GUC_REGISTER_CONTEXT_DATA_10_HW_LRC_ADDR]);
> -
> - xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action), 0, 0);
> -}
> -
> static void register_exec_queue(struct xe_exec_queue *q, int ctx_type)
> {
> struct xe_guc *guc = exec_queue_to_guc(q);
> @@ -670,6 +769,13 @@ static void register_exec_queue(struct xe_exec_queue *q, int ctx_type)
> info.flags = CONTEXT_REGISTRATION_FLAG_KMD |
> FIELD_PREP(CONTEXT_REGISTRATION_FLAG_TYPE, ctx_type);
>
> + if (xe_exec_queue_is_multi_queue(q)) {
> + struct xe_exec_queue_group *group = q->multi_queue.group;
> +
> + info.cgp_lo = xe_bo_ggtt_addr(group->cgp_bo);
> + info.cgp_hi = 0;
> + }
> +
> if (xe_exec_queue_is_parallel(q)) {
> u64 ggtt_addr = xe_lrc_parallel_ggtt_addr(lrc);
> struct iosys_map map = xe_lrc_parallel_map(lrc);
> @@ -700,11 +806,15 @@ static void register_exec_queue(struct xe_exec_queue *q, int ctx_type)
>
> set_exec_queue_registered(q);
> trace_xe_exec_queue_register(q);
> - if (xe_exec_queue_is_parallel(q))
> + if (xe_exec_queue_is_multi_queue(q))
> + __register_exec_queue_group(guc, q, &info);
> + else if (xe_exec_queue_is_parallel(q))
> __register_mlrc_exec_queue(guc, q, &info);
> else
> __register_exec_queue(guc, &info);
> - init_policies(guc, q);
> +
> + if (!xe_exec_queue_is_multi_queue_secondary(q))
> + init_policies(guc, q);
> }
>
> static u32 wq_space_until_wrap(struct xe_exec_queue *q)
> @@ -833,6 +943,12 @@ static void submit_exec_queue(struct xe_exec_queue *q, struct xe_sched_job *job)
> if (exec_queue_suspended(q) && !xe_exec_queue_is_parallel(q))
> return;
>
> + /*
> + * All queues in a multi-queue group will use the primary queue
> + * of the group to interface with GuC.
> + */
> + q = xe_exec_queue_multi_queue_primary(q);
> +
I think we need a bit more thought about which bits each queue owns in
q->guc->state. The state machine is pretty complicated, and now pointing
secondary -> primary in some cases makes this even worse. I guess I'd
ask that we figure out which bits are owned by the primary, which ones by
the secondary, and which ones are mirrored, and write this down somewhere.
Matt
> if (!exec_queue_enabled(q) && !exec_queue_suspended(q)) {
> action[len++] = XE_GUC_ACTION_SCHED_CONTEXT_MODE_SET;
> action[len++] = q->guc->id;
> @@ -879,6 +995,18 @@ guc_exec_queue_run_job(struct drm_sched_job *drm_job)
> trace_xe_sched_job_run(job);
>
> if (!killed_or_banned_or_wedged && !xe_sched_job_is_error(job)) {
> + if (xe_exec_queue_is_multi_queue_secondary(q)) {
> + struct xe_exec_queue *primary = xe_exec_queue_multi_queue_primary(q);
> +
> + if (exec_queue_killed_or_banned_or_wedged(primary)) {
> + killed_or_banned_or_wedged = true;
> + goto run_job_out;
> + }
> +
> + if (!exec_queue_registered(primary))
> + register_exec_queue(primary, GUC_CONTEXT_NORMAL);
> + }
> +
> if (!exec_queue_registered(q))
> register_exec_queue(q, GUC_CONTEXT_NORMAL);
> if (!job->skip_emit)
> @@ -887,6 +1015,7 @@ guc_exec_queue_run_job(struct drm_sched_job *drm_job)
> job->skip_emit = false;
> }
>
> +run_job_out:
> /*
> * We don't care about job-fence ordering in LR VMs because these fences
> * are never exported; they are used solely to keep jobs on the pending
> @@ -912,6 +1041,11 @@ int xe_guc_read_stopped(struct xe_guc *guc)
> return atomic_read(&guc->submission_state.stopped);
> }
>
> +static void handle_multi_queue_secondary_sched_done(struct xe_guc *guc,
> + struct xe_exec_queue *q,
> + u32 runnable_state);
> +static void handle_deregister_done(struct xe_guc *guc, struct xe_exec_queue *q);
> +
> #define MAKE_SCHED_CONTEXT_ACTION(q, enable_disable) \
> u32 action[] = { \
> XE_GUC_ACTION_SCHED_CONTEXT_MODE_SET, \
> @@ -925,7 +1059,9 @@ static void disable_scheduling_deregister(struct xe_guc *guc,
> MAKE_SCHED_CONTEXT_ACTION(q, DISABLE);
> int ret;
>
> - set_min_preemption_timeout(guc, q);
> + if (!xe_exec_queue_is_multi_queue_secondary(q))
> + set_min_preemption_timeout(guc, q);
> +
> smp_rmb();
> ret = wait_event_timeout(guc->ct.wq,
> (!exec_queue_pending_enable(q) &&
> @@ -953,9 +1089,12 @@ static void disable_scheduling_deregister(struct xe_guc *guc,
> * Reserve space for both G2H here as the 2nd G2H is sent from a G2H
> * handler and we are not allowed to reserved G2H space in handlers.
> */
> - xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action),
> - G2H_LEN_DW_SCHED_CONTEXT_MODE_SET +
> - G2H_LEN_DW_DEREGISTER_CONTEXT, 2);
> + if (xe_exec_queue_is_multi_queue_secondary(q))
> + handle_multi_queue_secondary_sched_done(guc, q, 0);
> + else
> + xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action),
> + G2H_LEN_DW_SCHED_CONTEXT_MODE_SET +
> + G2H_LEN_DW_DEREGISTER_CONTEXT, 2);
> }
>
> static void xe_guc_exec_queue_trigger_cleanup(struct xe_exec_queue *q)
> @@ -1161,8 +1300,11 @@ static void enable_scheduling(struct xe_exec_queue *q)
> set_exec_queue_enabled(q);
> trace_xe_exec_queue_scheduling_enable(q);
>
> - xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action),
> - G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, 1);
> + if (xe_exec_queue_is_multi_queue_secondary(q))
> + handle_multi_queue_secondary_sched_done(guc, q, 1);
> + else
> + xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action),
> + G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, 1);
>
> ret = wait_event_timeout(guc->ct.wq,
> !exec_queue_pending_enable(q) ||
> @@ -1186,14 +1328,17 @@ static void disable_scheduling(struct xe_exec_queue *q, bool immediate)
> xe_gt_assert(guc_to_gt(guc), exec_queue_registered(q));
> xe_gt_assert(guc_to_gt(guc), !exec_queue_pending_disable(q));
>
> - if (immediate)
> + if (immediate && !xe_exec_queue_is_multi_queue_secondary(q))
> set_min_preemption_timeout(guc, q);
> clear_exec_queue_enabled(q);
> set_exec_queue_pending_disable(q);
> trace_xe_exec_queue_scheduling_disable(q);
>
> - xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action),
> - G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, 1);
> + if (xe_exec_queue_is_multi_queue_secondary(q))
> + handle_multi_queue_secondary_sched_done(guc, q, 0);
> + else
> + xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action),
> + G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, 1);
> }
>
> static void __deregister_exec_queue(struct xe_guc *guc, struct xe_exec_queue *q)
> @@ -1211,8 +1356,11 @@ static void __deregister_exec_queue(struct xe_guc *guc, struct xe_exec_queue *q)
> set_exec_queue_destroyed(q);
> trace_xe_exec_queue_deregister(q);
>
> - xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action),
> - G2H_LEN_DW_DEREGISTER_CONTEXT, 1);
> + if (xe_exec_queue_is_multi_queue_secondary(q))
> + handle_deregister_done(guc, q);
> + else
> + xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action),
> + G2H_LEN_DW_DEREGISTER_CONTEXT, 1);
> }
>
> static enum drm_gpu_sched_stat
> @@ -1660,6 +1808,7 @@ static int guc_exec_queue_init(struct xe_exec_queue *q)
> {
> struct xe_gpu_scheduler *sched;
> struct xe_guc *guc = exec_queue_to_guc(q);
> + struct workqueue_struct *submit_wq = NULL;
> struct xe_guc_exec_queue *ge;
> long timeout;
> int err, i;
> @@ -1680,8 +1829,20 @@ static int guc_exec_queue_init(struct xe_exec_queue *q)
>
> timeout = (q->vm && xe_vm_in_lr_mode(q->vm)) ? MAX_SCHEDULE_TIMEOUT :
> msecs_to_jiffies(q->sched_props.job_timeout_ms);
> +
> + /*
> + * Use the primary queue's submit_wq for all secondary queues of a
> + * multi queue group. This serialization avoids any locking around
> + * CGP synchronization with GuC.
> + */
> + if (xe_exec_queue_is_multi_queue_secondary(q)) {
> + struct xe_exec_queue *primary = xe_exec_queue_multi_queue_primary(q);
> +
> + submit_wq = primary->guc->sched.base.submit_wq;
> + }
> +
> err = xe_sched_init(&ge->sched, &drm_sched_ops, &xe_sched_ops,
> - NULL, xe_lrc_ring_size() / MAX_JOB_SIZE_BYTES, 64,
> + submit_wq, xe_lrc_ring_size() / MAX_JOB_SIZE_BYTES, 64,
> timeout, guc_to_gt(guc)->ordered_wq, NULL,
> q->name, gt_to_xe(q->gt)->drm.dev);
> if (err)
> @@ -2418,7 +2579,11 @@ static void deregister_exec_queue(struct xe_guc *guc, struct xe_exec_queue *q)
>
> trace_xe_exec_queue_deregister(q);
>
> - xe_guc_ct_send_g2h_handler(&guc->ct, action, ARRAY_SIZE(action));
> + if (xe_exec_queue_is_multi_queue_secondary(q))
> + handle_deregister_done(guc, q);
> + else
> + xe_guc_ct_send_g2h_handler(&guc->ct, action,
> + ARRAY_SIZE(action));
> }
>
> static void handle_sched_done(struct xe_guc *guc, struct xe_exec_queue *q,
> @@ -2468,6 +2633,15 @@ static void handle_sched_done(struct xe_guc *guc, struct xe_exec_queue *q,
> }
> }
>
> +static void handle_multi_queue_secondary_sched_done(struct xe_guc *guc,
> + struct xe_exec_queue *q,
> + u32 runnable_state)
> +{
> + mutex_lock(&guc->ct.lock);
> + handle_sched_done(guc, q, runnable_state);
> + mutex_unlock(&guc->ct.lock);
> +}
> +
> int xe_guc_sched_done_handler(struct xe_guc *guc, u32 *msg, u32 len)
> {
> struct xe_exec_queue *q;
> @@ -2672,6 +2846,44 @@ int xe_guc_exec_queue_reset_failure_handler(struct xe_guc *guc, u32 *msg, u32 le
> return 0;
> }
>
> +/**
> + * xe_guc_exec_queue_cgp_sync_done_handler - CGP synchronization done handler
> + * @guc: guc
> + * @msg: message indicating CGP sync done
> + * @len: length of message
> + *
> + * Set the multi queue group's sync_pending flag to false and wake up
> + * anyone waiting for CGP synchronization to complete.
> + *
> + * Return: 0 on success, -EPROTO for malformed messages.
> + */
> +int xe_guc_exec_queue_cgp_sync_done_handler(struct xe_guc *guc, u32 *msg, u32 len)
> +{
> + struct xe_device *xe = guc_to_xe(guc);
> + struct xe_exec_queue *q;
> + u32 guc_id = msg[0];
> +
> + if (unlikely(len < 1)) {
> + drm_err(&xe->drm, "Invalid CGP_SYNC_DONE length %u", len);
> + return -EPROTO;
> + }
> +
> + q = g2h_exec_queue_lookup(guc, guc_id);
> + if (unlikely(!q))
> + return -EPROTO;
> +
> + if (!xe_exec_queue_is_multi_queue_primary(q)) {
> + drm_err(&xe->drm, "Unexpected CGP_SYNC_DONE response");
> + return -EPROTO;
> + }
> +
> + /* Wakeup the serialized cgp update wait */
> + WRITE_ONCE(q->multi_queue.group->sync_pending, false);
> + wake_up_all(&guc->ct.wq);
> +
> + return 0;
> +}
> +
> static void
> guc_exec_queue_wq_snapshot_capture(struct xe_exec_queue *q,
> struct xe_guc_submit_exec_queue_snapshot *snapshot)
> diff --git a/drivers/gpu/drm/xe/xe_guc_submit.h b/drivers/gpu/drm/xe/xe_guc_submit.h
> index b49a2748ec46..abfa94bce391 100644
> --- a/drivers/gpu/drm/xe/xe_guc_submit.h
> +++ b/drivers/gpu/drm/xe/xe_guc_submit.h
> @@ -34,6 +34,7 @@ int xe_guc_exec_queue_memory_cat_error_handler(struct xe_guc *guc, u32 *msg,
> u32 len);
> int xe_guc_exec_queue_reset_failure_handler(struct xe_guc *guc, u32 *msg, u32 len);
> int xe_guc_error_capture_handler(struct xe_guc *guc, u32 *msg, u32 len);
> +int xe_guc_exec_queue_cgp_sync_done_handler(struct xe_guc *guc, u32 *msg, u32 len);
>
> struct xe_guc_submit_exec_queue_snapshot *
> xe_guc_exec_queue_snapshot_capture(struct xe_exec_queue *q);
> --
> 2.43.0
>
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH 10/16] drm/xe/multi_queue: Set QUEUE_DRAIN_MODE for Multi Queue batches
2025-10-31 18:29 ` [PATCH 10/16] drm/xe/multi_queue: Set QUEUE_DRAIN_MODE for Multi Queue batches Niranjana Vishwanathapura
@ 2025-11-02 18:22 ` Matthew Brost
2025-11-03 17:09 ` Niranjana Vishwanathapura
0 siblings, 1 reply; 61+ messages in thread
From: Matthew Brost @ 2025-11-02 18:22 UTC (permalink / raw)
To: Niranjana Vishwanathapura; +Cc: intel-xe
On Fri, Oct 31, 2025 at 11:29:30AM -0700, Niranjana Vishwanathapura wrote:
> To properly support soft light restore between batches
> being arbitrated at the CFEG, PIPE_CONTROL instructions
> have a new bit in the first DW, QUEUE_DRAIN_MODE. When
> set, this indicates to the CFEG that it should only
> drain the current queue.
>
> Additionally, we no longer want to set the CS_STALL bit
> for queues in a multi queue group, as this causes the entire
> pipeline to stall waiting for completion of the prior
> batch, preventing this soft light restore from occurring
> between queues in a queue group.
>
Bspec: <number>
This will help in the review.
Matt
> Signed-off-by: Stuart Summers <stuart.summers@intel.com>
> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
> ---
> .../gpu/drm/xe/instructions/xe_gpu_commands.h | 1 +
> drivers/gpu/drm/xe/xe_ring_ops.c | 68 ++++++++++++-------
> 2 files changed, 45 insertions(+), 24 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/instructions/xe_gpu_commands.h b/drivers/gpu/drm/xe/instructions/xe_gpu_commands.h
> index 5d41ca297447..885fcf211e6d 100644
> --- a/drivers/gpu/drm/xe/instructions/xe_gpu_commands.h
> +++ b/drivers/gpu/drm/xe/instructions/xe_gpu_commands.h
> @@ -47,6 +47,7 @@
>
> #define GFX_OP_PIPE_CONTROL(len) ((0x3<<29)|(0x3<<27)|(0x2<<24)|((len)-2))
>
> +#define PIPE_CONTROL0_QUEUE_DRAIN_MODE BIT(12)
> #define PIPE_CONTROL0_L3_READ_ONLY_CACHE_INVALIDATE BIT(10) /* gen12 */
> #define PIPE_CONTROL0_HDC_PIPELINE_FLUSH BIT(9) /* gen12 */
>
> diff --git a/drivers/gpu/drm/xe/xe_ring_ops.c b/drivers/gpu/drm/xe/xe_ring_ops.c
> index ac0c6dcffe15..71f0e19fe8ba 100644
> --- a/drivers/gpu/drm/xe/xe_ring_ops.c
> +++ b/drivers/gpu/drm/xe/xe_ring_ops.c
> @@ -12,7 +12,7 @@
> #include "regs/xe_engine_regs.h"
> #include "regs/xe_gt_regs.h"
> #include "regs/xe_lrc_layout.h"
> -#include "xe_exec_queue_types.h"
> +#include "xe_exec_queue.h"
> #include "xe_gt.h"
> #include "xe_lrc.h"
> #include "xe_macros.h"
> @@ -135,12 +135,11 @@ emit_pipe_control(u32 *dw, int i, u32 bit_group_0, u32 bit_group_1, u32 offset,
> return i;
> }
>
> -static int emit_pipe_invalidate(u32 mask_flags, bool invalidate_tlb, u32 *dw,
> - int i)
> +static int emit_pipe_invalidate(struct xe_exec_queue *q, u32 mask_flags,
> + bool invalidate_tlb, u32 *dw, int i)
> {
> u32 flags0 = 0;
> - u32 flags1 = PIPE_CONTROL_CS_STALL |
> - PIPE_CONTROL_COMMAND_CACHE_INVALIDATE |
> + u32 flags1 = PIPE_CONTROL_COMMAND_CACHE_INVALIDATE |
> PIPE_CONTROL_INSTRUCTION_CACHE_INVALIDATE |
> PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE |
> PIPE_CONTROL_VF_CACHE_INVALIDATE |
> @@ -152,6 +151,11 @@ static int emit_pipe_invalidate(u32 mask_flags, bool invalidate_tlb, u32 *dw,
> if (invalidate_tlb)
> flags1 |= PIPE_CONTROL_TLB_INVALIDATE;
>
> + if (xe_exec_queue_is_multi_queue(q))
> + flags0 |= PIPE_CONTROL0_QUEUE_DRAIN_MODE;
> + else
> + flags1 |= PIPE_CONTROL_CS_STALL;
> +
> flags1 &= ~mask_flags;
>
> if (flags1 & PIPE_CONTROL_VF_CACHE_INVALIDATE)
> @@ -175,54 +179,70 @@ static int emit_store_imm_ppgtt_posted(u64 addr, u64 value,
>
> static int emit_render_cache_flush(struct xe_sched_job *job, u32 *dw, int i)
> {
> - struct xe_gt *gt = job->q->gt;
> + struct xe_exec_queue *q = job->q;
> + struct xe_gt *gt = q->gt;
> bool lacks_render = !(gt->info.engine_mask & XE_HW_ENGINE_RCS_MASK);
> - u32 flags;
> + u32 flags0, flags1;
>
> if (XE_GT_WA(gt, 14016712196))
> i = emit_pipe_control(dw, i, 0, PIPE_CONTROL_DEPTH_CACHE_FLUSH,
> LRC_PPHWSP_FLUSH_INVAL_SCRATCH_ADDR, 0);
>
> - flags = (PIPE_CONTROL_CS_STALL |
> - PIPE_CONTROL_TILE_CACHE_FLUSH |
> + flags0 = PIPE_CONTROL0_HDC_PIPELINE_FLUSH;
> + flags1 = (PIPE_CONTROL_TILE_CACHE_FLUSH |
> PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH |
> PIPE_CONTROL_DEPTH_CACHE_FLUSH |
> PIPE_CONTROL_DC_FLUSH_ENABLE |
> PIPE_CONTROL_FLUSH_ENABLE);
>
> if (XE_GT_WA(gt, 1409600907))
> - flags |= PIPE_CONTROL_DEPTH_STALL;
> + flags1 |= PIPE_CONTROL_DEPTH_STALL;
>
> if (lacks_render)
> - flags &= ~PIPE_CONTROL_3D_ARCH_FLAGS;
> + flags1 &= ~PIPE_CONTROL_3D_ARCH_FLAGS;
> else if (job->q->class == XE_ENGINE_CLASS_COMPUTE)
> - flags &= ~PIPE_CONTROL_3D_ENGINE_FLAGS;
> + flags1 &= ~PIPE_CONTROL_3D_ENGINE_FLAGS;
> +
> + if (xe_exec_queue_is_multi_queue(q))
> + flags0 |= PIPE_CONTROL0_QUEUE_DRAIN_MODE;
> + else
> + flags1 |= PIPE_CONTROL_CS_STALL;
>
> - return emit_pipe_control(dw, i, PIPE_CONTROL0_HDC_PIPELINE_FLUSH, flags, 0, 0);
> + return emit_pipe_control(dw, i, flags0, flags1, 0, 0);
> }
>
> -static int emit_pipe_control_to_ring_end(struct xe_hw_engine *hwe, u32 *dw, int i)
> +static int emit_pipe_control_to_ring_end(struct xe_exec_queue *q, u32 *dw, int i)
> {
> + u32 flags0 = 0, flags1 = PIPE_CONTROL_LRI_POST_SYNC;
> + struct xe_hw_engine *hwe = q->hwe;
> +
> if (hwe->class != XE_ENGINE_CLASS_RENDER)
> return i;
>
> + if (xe_exec_queue_is_multi_queue(q))
> + flags0 |= PIPE_CONTROL0_QUEUE_DRAIN_MODE;
> +
> if (XE_GT_WA(hwe->gt, 16020292621))
> - i = emit_pipe_control(dw, i, 0, PIPE_CONTROL_LRI_POST_SYNC,
> + i = emit_pipe_control(dw, i, flags0, flags1,
> RING_NOPID(hwe->mmio_base).addr, 0);
>
> return i;
> }
>
> -static int emit_pipe_imm_ggtt(u32 addr, u32 value, bool stall_only, u32 *dw,
> - int i)
> +static int emit_pipe_imm_ggtt(struct xe_exec_queue *q, u32 addr, u32 value,
> + bool stall_only, u32 *dw, int i)
> {
> - u32 flags = PIPE_CONTROL_CS_STALL | PIPE_CONTROL_GLOBAL_GTT_IVB |
> - PIPE_CONTROL_QW_WRITE;
> + u32 flags0 = 0, flags1 = PIPE_CONTROL_GLOBAL_GTT_IVB | PIPE_CONTROL_QW_WRITE;
>
> if (!stall_only)
> - flags |= PIPE_CONTROL_FLUSH_ENABLE;
> + flags1 |= PIPE_CONTROL_FLUSH_ENABLE;
> +
> + if (xe_exec_queue_is_multi_queue(q))
> + flags0 |= PIPE_CONTROL0_QUEUE_DRAIN_MODE;
> + else
> + flags1 |= PIPE_CONTROL_CS_STALL;
>
> - return emit_pipe_control(dw, i, 0, flags, addr, value);
> + return emit_pipe_control(dw, i, flags0, flags1, addr, value);
> }
>
> static u32 get_ppgtt_flag(struct xe_sched_job *job)
> @@ -371,7 +391,7 @@ static void __emit_job_gen12_render_compute(struct xe_sched_job *job,
> mask_flags = PIPE_CONTROL_3D_ENGINE_FLAGS;
>
> /* See __xe_pt_bind_vma() for a discussion on TLB invalidations. */
> - i = emit_pipe_invalidate(mask_flags, job->ring_ops_flush_tlb, dw, i);
> + i = emit_pipe_invalidate(job->q, mask_flags, job->ring_ops_flush_tlb, dw, i);
>
> /* hsdes: 1809175790 */
> if (has_aux_ccs(xe))
> @@ -391,11 +411,11 @@ static void __emit_job_gen12_render_compute(struct xe_sched_job *job,
> job->user_fence.value,
> dw, i);
>
> - i = emit_pipe_imm_ggtt(xe_lrc_seqno_ggtt_addr(lrc), seqno, lacks_render, dw, i);
> + i = emit_pipe_imm_ggtt(job->q, xe_lrc_seqno_ggtt_addr(lrc), seqno, lacks_render, dw, i);
>
> i = emit_user_interrupt(dw, i);
>
> - i = emit_pipe_control_to_ring_end(job->q->hwe, dw, i);
> + i = emit_pipe_control_to_ring_end(job->q, dw, i);
>
> xe_gt_assert(gt, i <= MAX_JOB_SIZE_DW);
>
> --
> 2.43.0
>
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH 11/16] drm/xe/multi_queue: Handle CGP context error
2025-10-31 18:29 ` [PATCH 11/16] drm/xe/multi_queue: Handle CGP context error Niranjana Vishwanathapura
@ 2025-11-02 18:29 ` Matthew Brost
2025-11-03 16:44 ` Niranjana Vishwanathapura
0 siblings, 1 reply; 61+ messages in thread
From: Matthew Brost @ 2025-11-02 18:29 UTC (permalink / raw)
To: Niranjana Vishwanathapura; +Cc: intel-xe
On Fri, Oct 31, 2025 at 11:29:31AM -0700, Niranjana Vishwanathapura wrote:
> Trigger multi-queue context cleanup upon CGP context error
> notification from GuC.
>
> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
> ---
> drivers/gpu/drm/xe/abi/guc_actions_abi.h | 1 +
> drivers/gpu/drm/xe/xe_guc_ct.c | 4 +++
> drivers/gpu/drm/xe/xe_guc_submit.c | 33 ++++++++++++++++++++++++
> drivers/gpu/drm/xe/xe_guc_submit.h | 2 ++
> drivers/gpu/drm/xe/xe_trace.h | 5 ++++
> 5 files changed, 45 insertions(+)
>
> diff --git a/drivers/gpu/drm/xe/abi/guc_actions_abi.h b/drivers/gpu/drm/xe/abi/guc_actions_abi.h
> index 3e9fbed9cda6..8af3691626bf 100644
> --- a/drivers/gpu/drm/xe/abi/guc_actions_abi.h
> +++ b/drivers/gpu/drm/xe/abi/guc_actions_abi.h
> @@ -142,6 +142,7 @@ enum xe_guc_action {
> XE_GUC_ACTION_REGISTER_CONTEXT_MULTI_QUEUE = 0x4602,
> XE_GUC_ACTION_MULTI_QUEUE_CONTEXT_CGP_SYNC = 0x4603,
> XE_GUC_ACTION_NOTIFY_MULTI_QUEUE_CONTEXT_CGP_SYNC_DONE = 0x4604,
> + XE_GUC_ACTION_NOTIFY_MULTI_QUEUE_CGP_CONTEXT_ERROR = 0x4605,
> XE_GUC_ACTION_CLIENT_SOFT_RESET = 0x5507,
> XE_GUC_ACTION_SET_ENG_UTIL_BUFF = 0x550A,
> XE_GUC_ACTION_SET_DEVICE_ENGINE_ACTIVITY_BUFFER = 0x550C,
> diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c
> index 48b5006eb080..d0e19af0b4d2 100644
> --- a/drivers/gpu/drm/xe/xe_guc_ct.c
> +++ b/drivers/gpu/drm/xe/xe_guc_ct.c
> @@ -1574,6 +1574,10 @@ static int process_g2h_msg(struct xe_guc_ct *ct, u32 *msg, u32 len)
> case XE_GUC_ACTION_NOTIFY_MULTI_QUEUE_CONTEXT_CGP_SYNC_DONE:
> ret = xe_guc_exec_queue_cgp_sync_done_handler(guc, payload, adj_len);
> break;
> + case XE_GUC_ACTION_NOTIFY_MULTI_QUEUE_CGP_CONTEXT_ERROR:
> + ret = xe_guc_exec_queue_cgp_context_error_handler(guc, payload,
> + adj_len);
> + break;
> default:
> xe_gt_err(gt, "unexpected G2H action 0x%04x\n", action);
> }
> diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
> index 87c13feb2cef..605352145d76 100644
> --- a/drivers/gpu/drm/xe/xe_guc_submit.c
> +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
> @@ -48,6 +48,8 @@
> #include "xe_vm.h"
> #include "xe_bo.h"
>
> +#define XE_GUC_EXEC_QUEUE_CGP_CONTEXT_ERROR_LEN 6
> +
> static struct xe_guc *
> exec_queue_to_guc(struct xe_exec_queue *q)
> {
> @@ -3001,6 +3003,37 @@ int xe_guc_exec_queue_reset_failure_handler(struct xe_guc *guc, u32 *msg, u32 le
> return 0;
> }
>
> +int xe_guc_exec_queue_cgp_context_error_handler(struct xe_guc *guc, u32 *msg,
> + u32 len)
> +{
> + struct xe_gt *gt = guc_to_gt(guc);
> + struct xe_device *xe = guc_to_xe(guc);
> + struct xe_exec_queue *q;
> + u32 guc_id = msg[2];
> +
> + if (unlikely(len != XE_GUC_EXEC_QUEUE_CGP_CONTEXT_ERROR_LEN)) {
> + drm_err(&xe->drm, "Invalid length %u", len);
> + return -EPROTO;
> + }
> +
> + q = g2h_exec_queue_lookup(guc, guc_id);
> + if (unlikely(!q))
> + return -EPROTO;
> +
> + xe_gt_dbg(gt,
> + "CGP context error: region=%s err=0x%x, context=0x%x LRCA=0x%x:0x%x SgId=0x%x",
> + msg[0] & 1 ? "uc" : "kmd", msg[1], msg[2], msg[4], msg[3], msg[5]);
> +
> + trace_xe_exec_queue_cgp_context_error(q);
> +
> + /* Treat the same as engine reset */
> + set_exec_queue_reset(q);
> + if (!exec_queue_banned(q) && !exec_queue_check_timeout(q))
I don't think you need the exec_queue_check_timeout check.
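i.e. just:

	if (!exec_queue_banned(q))
		xe_guc_exec_queue_trigger_cleanup(q);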
Otherwise LGTM.
Matt
> + xe_guc_exec_queue_trigger_cleanup(q);
> +
> + return 0;
> +}
> +
> /**
> * xe_guc_exec_queue_cgp_sync_done_handler - CGP synchronization done handler
> * @guc: guc
> diff --git a/drivers/gpu/drm/xe/xe_guc_submit.h b/drivers/gpu/drm/xe/xe_guc_submit.h
> index abfa94bce391..01b013a90b1b 100644
> --- a/drivers/gpu/drm/xe/xe_guc_submit.h
> +++ b/drivers/gpu/drm/xe/xe_guc_submit.h
> @@ -35,6 +35,8 @@ int xe_guc_exec_queue_memory_cat_error_handler(struct xe_guc *guc, u32 *msg,
> int xe_guc_exec_queue_reset_failure_handler(struct xe_guc *guc, u32 *msg, u32 len);
> int xe_guc_error_capture_handler(struct xe_guc *guc, u32 *msg, u32 len);
> int xe_guc_exec_queue_cgp_sync_done_handler(struct xe_guc *guc, u32 *msg, u32 len);
> +int xe_guc_exec_queue_cgp_context_error_handler(struct xe_guc *guc, u32 *msg,
> + u32 len);
>
> struct xe_guc_submit_exec_queue_snapshot *
> xe_guc_exec_queue_snapshot_capture(struct xe_exec_queue *q);
> diff --git a/drivers/gpu/drm/xe/xe_trace.h b/drivers/gpu/drm/xe/xe_trace.h
> index 79a97b086cb2..c9d0748dae9d 100644
> --- a/drivers/gpu/drm/xe/xe_trace.h
> +++ b/drivers/gpu/drm/xe/xe_trace.h
> @@ -172,6 +172,11 @@ DEFINE_EVENT(xe_exec_queue, xe_exec_queue_memory_cat_error,
> TP_ARGS(q)
> );
>
> +DEFINE_EVENT(xe_exec_queue, xe_exec_queue_cgp_context_error,
> + TP_PROTO(struct xe_exec_queue *q),
> + TP_ARGS(q)
> +);
> +
> DEFINE_EVENT(xe_exec_queue, xe_exec_queue_stop,
> TP_PROTO(struct xe_exec_queue *q),
> TP_ARGS(q)
> --
> 2.43.0
>
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH 08/16] drm/xe/multi_queue: Add multi queue information to guc_info dump
2025-11-01 18:31 ` Matthew Brost
@ 2025-11-03 1:15 ` Niranjana Vishwanathapura
0 siblings, 0 replies; 61+ messages in thread
From: Niranjana Vishwanathapura @ 2025-11-03 1:15 UTC (permalink / raw)
To: Matthew Brost; +Cc: intel-xe
On Sat, Nov 01, 2025 at 11:31:28AM -0700, Matthew Brost wrote:
>On Fri, Oct 31, 2025 at 11:29:28AM -0700, Niranjana Vishwanathapura wrote:
>> Dump multi queue specific information in the guc exec queue
>> dump.
>>
>> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
>> ---
>> drivers/gpu/drm/xe/xe_guc_submit.c | 10 ++++++++++
>> drivers/gpu/drm/xe/xe_guc_submit_types.h | 13 +++++++++++++
>> 2 files changed, 23 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
>> index 426b64ef8d99..b84a0be2eefe 100644
>> --- a/drivers/gpu/drm/xe/xe_guc_submit.c
>> +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
>> @@ -3032,6 +3032,11 @@ xe_guc_exec_queue_snapshot_capture(struct xe_exec_queue *q)
>> if (snapshot->parallel_execution)
>> guc_exec_queue_wq_snapshot_capture(q, snapshot);
>>
>> + snapshot->is_multi_queue = xe_exec_queue_is_multi_queue(q);
>> + if (snapshot->is_multi_queue) {
>> + snapshot->multi_queue.primary = xe_exec_queue_multi_queue_primary(q)->guc->id;
>> + snapshot->multi_queue.pos = q->multi_queue.pos;
>> + }
>> spin_lock(&sched->base.job_list_lock);
>> snapshot->pending_list_size = list_count_nodes(&sched->base.pending_list);
>> snapshot->pending_list = kmalloc_array(snapshot->pending_list_size,
>> @@ -3114,6 +3119,11 @@ xe_guc_exec_queue_snapshot_print(struct xe_guc_submit_exec_queue_snapshot *snaps
>> if (snapshot->parallel_execution)
>> guc_exec_queue_wq_snapshot_print(snapshot, p);
>>
>> + if (snapshot->is_multi_queue) {
>> + drm_printf(p, "\tMulti queue primary GuC ID: %d\n", snapshot->multi_queue.primary);
>> + drm_printf(p, "\tMulti queue position: %d\n", snapshot->multi_queue.pos);
>> + }
>> +
>> for (i = 0; snapshot->pending_list && i < snapshot->pending_list_size;
>> i++)
>> drm_printf(p, "\tJob: seqno=%d, fence=%d, finished=%d\n",
>> diff --git a/drivers/gpu/drm/xe/xe_guc_submit_types.h b/drivers/gpu/drm/xe/xe_guc_submit_types.h
>> index dc7456c34583..20dddf50d802 100644
>> --- a/drivers/gpu/drm/xe/xe_guc_submit_types.h
>> +++ b/drivers/gpu/drm/xe/xe_guc_submit_types.h
>> @@ -135,6 +135,19 @@ struct xe_guc_submit_exec_queue_snapshot {
>> u32 wq[WQ_SIZE / sizeof(u32)];
>> } parallel;
>>
>> + /** @is_multi_queue: The exec queue is part of a multi queue group */
>> + bool is_multi_queue;
>
>I'd stick this in the sub-structure.
>
I guess I just followed the 'parallel' structure example above :).
Yah, makes sense. Will put it as 'valid' inside the multi_queue struct below.
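Something like this (a sketch):

	/** @multi_queue: snapshot of the multi queue information */
	struct {
		/** @multi_queue.valid: exec queue is part of a multi queue group */
		bool valid;
		/**
		 * @multi_queue.primary: GuC id of the primary exec queue
		 * of the multi queue group.
		 */
		u32 primary;
		/** @multi_queue.pos: Position of the exec queue within the group */
		u8 pos;
	} multi_queue;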
Niranjana
>Otherwise LGTM.
>
>With this nit fixed:
>Reviewed-by: Matthew Brost <matthew.brost@intel.com>
>
>> + /** @multi_queue: snapshot of the multi queue information */
>> + struct {
>> + /**
>> + * @multi_queue.primary: GuC id of the primary exec queue
>> + * of the multi queue group.
>> + */
>> + u32 primary;
>> + /** @multi_queue.pos: Position of the exec queue within the multi queue group */
>> + u8 pos;
>> + } multi_queue;
>> +
>> /** @pending_list_size: Size of the pending list snapshot array */
>> int pending_list_size;
>> /** @pending_list: snapshot of the pending list info */
>> --
>> 2.43.0
>>
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH 01/16] drm/xe/multi_queue: Add multi_queue_enable_mask to gt information
2025-11-02 0:01 ` Matthew Brost
@ 2025-11-03 1:25 ` Niranjana Vishwanathapura
0 siblings, 0 replies; 61+ messages in thread
From: Niranjana Vishwanathapura @ 2025-11-03 1:25 UTC (permalink / raw)
To: Matthew Brost; +Cc: intel-xe
On Sat, Nov 01, 2025 at 05:01:44PM -0700, Matthew Brost wrote:
>On Fri, Oct 31, 2025 at 11:29:21AM -0700, Niranjana Vishwanathapura wrote:
>> Add multi_queue_enable_mask field to the gt information structure,
>> which is a bitmask of all engine classes with multi queue support
>> enabled.
>>
>> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
>> ---
>> drivers/gpu/drm/xe/xe_debugfs.c | 2 ++
>> drivers/gpu/drm/xe/xe_gt_types.h | 5 +++++
>> drivers/gpu/drm/xe/xe_pci.c | 1 +
>> drivers/gpu/drm/xe/xe_pci_types.h | 1 +
>> 4 files changed, 9 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_debugfs.c b/drivers/gpu/drm/xe/xe_debugfs.c
>> index e91da9589c5f..34460d7ef71c 100644
>> --- a/drivers/gpu/drm/xe/xe_debugfs.c
>> +++ b/drivers/gpu/drm/xe/xe_debugfs.c
>> @@ -93,6 +93,8 @@ static int info(struct seq_file *m, void *data)
>> xe_force_wake_ref(gt_to_fw(gt), XE_FW_GT));
>> drm_printf(&p, "gt%d engine_mask 0x%llx\n", id,
>> gt->info.engine_mask);
>> + drm_printf(&p, "gt%d multi_queue_enable_mask 0x%x\n", id,
>> + gt->info.multi_queue_enable_mask);
>> }
>>
>> xe_pm_runtime_put(xe);
>> diff --git a/drivers/gpu/drm/xe/xe_gt_types.h b/drivers/gpu/drm/xe/xe_gt_types.h
>> index 0b525643a048..4a18bf772b22 100644
>> --- a/drivers/gpu/drm/xe/xe_gt_types.h
>> +++ b/drivers/gpu/drm/xe/xe_gt_types.h
>> @@ -140,6 +140,11 @@ struct xe_gt {
>> u64 engine_mask;
>> /** @info.gmdid: raw GMD_ID value from hardware */
>> u32 gmdid;
>> + /**
>> + * @multi_queue_enable_mask: Bitmask of engine classes with
>> + * multi queue support enabled.
>> + */
>> + u16 multi_queue_enable_mask;
>
>s/multi_queue_enable_mask/multi_queue_class_enable_mask ?
>
Ok, will rename it as 'multi_queue_engine_class_mask'.
Niranjana
>Matt
>
>> /** @info.id: Unique ID of this GT within the PCI Device */
>> u8 id;
>> /** @info.has_indirect_ring_state: GT has indirect ring state support */
>> diff --git a/drivers/gpu/drm/xe/xe_pci.c b/drivers/gpu/drm/xe/xe_pci.c
>> index 6e59642e7820..b5eaf0fc105c 100644
>> --- a/drivers/gpu/drm/xe/xe_pci.c
>> +++ b/drivers/gpu/drm/xe/xe_pci.c
>> @@ -754,6 +754,7 @@ static struct xe_gt *alloc_primary_gt(struct xe_tile *tile,
>> gt->info.type = XE_GT_TYPE_MAIN;
>> gt->info.id = tile->id * xe->info.max_gt_per_tile;
>> gt->info.has_indirect_ring_state = graphics_desc->has_indirect_ring_state;
>> + gt->info.multi_queue_enable_mask = graphics_desc->multi_queue_enable_mask;
>> gt->info.engine_mask = graphics_desc->hw_engine_mask;
>>
>> /*
>> diff --git a/drivers/gpu/drm/xe/xe_pci_types.h b/drivers/gpu/drm/xe/xe_pci_types.h
>> index 9892c063a9c5..77e09a53da64 100644
>> --- a/drivers/gpu/drm/xe/xe_pci_types.h
>> +++ b/drivers/gpu/drm/xe/xe_pci_types.h
>> @@ -58,6 +58,7 @@ struct xe_device_desc {
>>
>> struct xe_graphics_desc {
>> u64 hw_engine_mask; /* hardware engines provided by graphics IP */
>> + u16 multi_queue_enable_mask; /* bitmask of engine classes which support multi queue */
>>
>> u8 has_asid:1;
>> u8 has_atomic_enable_pte_bit:1;
>> --
>> 2.43.0
>>
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH 06/16] drm/xe/multi_queue: Add exec_queue set_property ioctl support
2025-11-02 16:53 ` Matthew Brost
@ 2025-11-03 1:49 ` Niranjana Vishwanathapura
0 siblings, 0 replies; 61+ messages in thread
From: Niranjana Vishwanathapura @ 2025-11-03 1:49 UTC (permalink / raw)
To: Matthew Brost; +Cc: intel-xe
On Sun, Nov 02, 2025 at 08:53:51AM -0800, Matthew Brost wrote:
>On Fri, Oct 31, 2025 at 11:29:26AM -0700, Niranjana Vishwanathapura wrote:
>> This patch adds support for the exec_queue set_property ioctl.
>> It is derived from the original work which is part of
>> https://patchwork.freedesktop.org/series/112188/
>>
>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>> Signed-off-by: Pallavi Mishra <pallavi.mishra@intel.com>
>> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
>> ---
>> drivers/gpu/drm/xe/xe_device.c | 2 ++
>> drivers/gpu/drm/xe/xe_exec_queue.c | 31 ++++++++++++++++++++++++++++++
>> drivers/gpu/drm/xe/xe_exec_queue.h | 2 ++
>> include/uapi/drm/xe_drm.h | 24 +++++++++++++++++++++++
>> 4 files changed, 59 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
>> index 47f5391ad8e9..0b496676527a 100644
>> --- a/drivers/gpu/drm/xe/xe_device.c
>> +++ b/drivers/gpu/drm/xe/xe_device.c
>> @@ -208,6 +208,8 @@ static const struct drm_ioctl_desc xe_ioctls[] = {
>> DRM_IOCTL_DEF_DRV(XE_MADVISE, xe_vm_madvise_ioctl, DRM_RENDER_ALLOW),
>> DRM_IOCTL_DEF_DRV(XE_VM_QUERY_MEM_RANGE_ATTRS, xe_vm_query_vmas_attrs_ioctl,
>> DRM_RENDER_ALLOW),
>> + DRM_IOCTL_DEF_DRV(XE_EXEC_QUEUE_SET_PROPERTY, xe_exec_queue_set_property_ioctl,
>> + DRM_RENDER_ALLOW),
>> };
>>
>> static long xe_drm_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
>> diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
>> index 78b3a0e2ddd3..0264cab00fd4 100644
>> --- a/drivers/gpu/drm/xe/xe_exec_queue.c
>> +++ b/drivers/gpu/drm/xe/xe_exec_queue.c
>> @@ -747,6 +747,37 @@ static const xe_exec_queue_set_property_fn exec_queue_set_property_funcs[] = {
>> exec_queue_set_multi_queue_priority,
>> };
>>
>> +int xe_exec_queue_set_property_ioctl(struct drm_device *dev, void *data,
>> + struct drm_file *file)
>> +{
>> + struct xe_device *xe = to_xe_device(dev);
>> + struct xe_file *xef = to_xe_file(file);
>> + struct drm_xe_exec_queue_set_property *args = data;
>> + struct xe_exec_queue *q;
>> + int ret;
>> + u32 idx;
>> +
>> + if (XE_IOCTL_DBG(xe, args->reserved[0] || args->reserved[1]))
>> + return -EINVAL;
>> +
>> + q = xe_exec_queue_lookup(xef, args->exec_queue_id);
>> + if (XE_IOCTL_DBG(xe, !q))
>> + return -ENOENT;
>> +
>
>I didn't realize this was new code, so my comment here [1] about
>dropping the check around
>DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_QUEUE_PRIORITY is wrong, rather
>pull that check into this patch.
>
>[1] https://patchwork.freedesktop.org/patch/684856/?series=156865&rev=1#comment_1257589
>
My plan was to keep this patch generic and not mention multi-queue,
though currently that is the only use case. But I guess it need not
be so. Ok, will move the DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_QUEUE_PRIORITY
check into this patch.
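Something like this after the lookup, I think (a sketch; multi queue
priority is the only dynamic property for now):

	if (XE_IOCTL_DBG(xe, args->property !=
			 DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_QUEUE_PRIORITY)) {
		ret = -EINVAL;
		goto err_post_lookup;
	}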
>> + idx = array_index_nospec(args->property,
>> + ARRAY_SIZE(exec_queue_set_property_funcs));
>> + ret = exec_queue_set_property_funcs[idx](xe, q, args->value);
>> + if (XE_IOCTL_DBG(xe, ret))
>> + goto err_post_lookup;
>> +
>> + xe_exec_queue_put(q);
>> + return 0;
>> +
>> + err_post_lookup:
>> + xe_exec_queue_put(q);
>> + return ret;
>> +}
>> +
>> static int exec_queue_user_ext_check(struct xe_exec_queue *q, u64 properties)
>> {
>> u64 secondary_queue_valid_props = BIT_ULL(DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_GROUP) |
>> diff --git a/drivers/gpu/drm/xe/xe_exec_queue.h b/drivers/gpu/drm/xe/xe_exec_queue.h
>> index 8cd6487018fa..61478b2e883b 100644
>> --- a/drivers/gpu/drm/xe/xe_exec_queue.h
>> +++ b/drivers/gpu/drm/xe/xe_exec_queue.h
>> @@ -121,6 +121,8 @@ int xe_exec_queue_destroy_ioctl(struct drm_device *dev, void *data,
>> struct drm_file *file);
>> int xe_exec_queue_get_property_ioctl(struct drm_device *dev, void *data,
>> struct drm_file *file);
>> +int xe_exec_queue_set_property_ioctl(struct drm_device *dev, void *data,
>> + struct drm_file *file);
>> enum xe_exec_queue_priority xe_exec_queue_device_get_max_priority(struct xe_device *xe);
>>
>> void xe_exec_queue_last_fence_put(struct xe_exec_queue *e, struct xe_vm *vm);
>> diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
>> index 8ab44413646a..d72151163e77 100644
>> --- a/include/uapi/drm/xe_drm.h
>> +++ b/include/uapi/drm/xe_drm.h
>> @@ -106,6 +106,7 @@ extern "C" {
>> #define DRM_XE_OBSERVATION 0x0b
>> #define DRM_XE_MADVISE 0x0c
>> #define DRM_XE_VM_QUERY_MEM_RANGE_ATTRS 0x0d
>> +#define DRM_XE_EXEC_QUEUE_SET_PROPERTY 0x0e
>>
>> /* Must be kept compact -- no holes */
>>
>> @@ -123,6 +124,7 @@ extern "C" {
>> #define DRM_IOCTL_XE_OBSERVATION DRM_IOW(DRM_COMMAND_BASE + DRM_XE_OBSERVATION, struct drm_xe_observation_param)
>> #define DRM_IOCTL_XE_MADVISE DRM_IOW(DRM_COMMAND_BASE + DRM_XE_MADVISE, struct drm_xe_madvise)
>> #define DRM_IOCTL_XE_VM_QUERY_MEM_RANGE_ATTRS DRM_IOWR(DRM_COMMAND_BASE + DRM_XE_VM_QUERY_MEM_RANGE_ATTRS, struct drm_xe_vm_query_mem_range_attr)
>> +#define DRM_IOCTL_XE_EXEC_QUEUE_SET_PROPERTY DRM_IOW(DRM_COMMAND_BASE + DRM_XE_EXEC_QUEUE_SET_PROPERTY, struct drm_xe_exec_queue_set_property)
>>
>> /**
>> * DOC: Xe IOCTL Extensions
>> @@ -2284,6 +2286,28 @@ struct drm_xe_vm_query_mem_range_attr {
>>
>> };
>>
>> +/**
>> + * struct drm_xe_exec_queue_set_property - exec queue set property
>> + *
>> + * Sets execution queue properties dynamically.
>
>Mention which properties this IOCTL is valid for.
>
Ok.
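Something like (a sketch):

 * Currently only %DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_QUEUE_PRIORITY can
 * be set dynamically with this ioctl.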
Niranjana
>Matt
>
>> + */
>> +struct drm_xe_exec_queue_set_property {
>> + /** @extensions: Pointer to the first extension struct, if any */
>> + __u64 extensions;
>> +
>> + /** @exec_queue_id: Exec queue ID */
>> + __u32 exec_queue_id;
>> +
>> + /** @property: property to set */
>> + __u32 property;
>> +
>> + /** @value: property value */
>> + __u64 value;
>> +
>> + /** @reserved: Reserved */
>> + __u64 reserved[2];
>> +};
>> +
>> #if defined(__cplusplus)
>> }
>> #endif
>> --
>> 2.43.0
>>
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH 04/16] drm/xe/multi_queue: Add multi queue priority property
2025-11-01 23:59 ` Matthew Brost
@ 2025-11-03 4:45 ` Niranjana Vishwanathapura
0 siblings, 0 replies; 61+ messages in thread
From: Niranjana Vishwanathapura @ 2025-11-03 4:45 UTC (permalink / raw)
To: Matthew Brost; +Cc: intel-xe
On Sat, Nov 01, 2025 at 04:59:03PM -0700, Matthew Brost wrote:
>On Fri, Oct 31, 2025 at 11:29:24AM -0700, Niranjana Vishwanathapura wrote:
>> Add support for queues of a multi queue group to set
>> their priority within the queue group by adding property
>> DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_QUEUE_PRIORITY.
>> This is the only other property supported by secondary
>> queues of a multi queue group, other than
>> DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_GROUP.
>>
>> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
>> ---
>> drivers/gpu/drm/xe/xe_exec_queue.c | 17 ++++++++++++-
>> drivers/gpu/drm/xe/xe_exec_queue_types.h | 8 ++++++
>> drivers/gpu/drm/xe/xe_guc_submit.c | 1 +
>> drivers/gpu/drm/xe/xe_lrc.c | 32 ++++++++++++++++++++++++
>> drivers/gpu/drm/xe/xe_lrc.h | 5 ++++
>> include/uapi/drm/xe_drm.h | 3 +++
>> 6 files changed, 65 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
>> index 86404a7c9fe4..0da256428916 100644
>> --- a/drivers/gpu/drm/xe/xe_exec_queue.c
>> +++ b/drivers/gpu/drm/xe/xe_exec_queue.c
>> @@ -177,6 +177,7 @@ static struct xe_exec_queue *__xe_exec_queue_alloc(struct xe_device *xe,
>> INIT_LIST_HEAD(&q->multi_gt_link);
>> INIT_LIST_HEAD(&q->hw_engine_group_link);
>> INIT_LIST_HEAD(&q->pxp.link);
>> + q->multi_queue.priority = XE_MULTI_QUEUE_PRIORITY_NORMAL;
>>
>> q->sched_props.timeslice_us = hwe->eclass->sched_props.timeslice_us;
>> q->sched_props.preempt_timeout_us =
>> @@ -722,6 +723,17 @@ static int exec_queue_set_multi_group(struct xe_device *xe, struct xe_exec_queue
>> return xe_exec_queue_group_validate(xe, q, value);
>> }
>>
>> +static int exec_queue_set_multi_queue_priority(struct xe_device *xe, struct xe_exec_queue *q,
>> + u64 value)
>> +{
>> + if (XE_IOCTL_DBG(xe, value > XE_MULTI_QUEUE_PRIORITY_HIGH))
>> + return -EINVAL;
>> +
>> + q->multi_queue.priority = value;
>> +
>> + return 0;
>> +}
>> +
>> typedef int (*xe_exec_queue_set_property_fn)(struct xe_device *xe,
>> struct xe_exec_queue *q,
>> u64 value);
>> @@ -731,6 +743,8 @@ static const xe_exec_queue_set_property_fn exec_queue_set_property_funcs[] = {
>> [DRM_XE_EXEC_QUEUE_SET_PROPERTY_TIMESLICE] = exec_queue_set_timeslice,
>> [DRM_XE_EXEC_QUEUE_SET_PROPERTY_PXP_TYPE] = exec_queue_set_pxp_type,
>> [DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_GROUP] = exec_queue_set_multi_group,
>> + [DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_QUEUE_PRIORITY] =
>> + exec_queue_set_multi_queue_priority,
>> };
>>
>> static int exec_queue_user_ext_set_property(struct xe_device *xe,
>> @@ -752,7 +766,8 @@ static int exec_queue_user_ext_set_property(struct xe_device *xe,
>> XE_IOCTL_DBG(xe, ext.property != DRM_XE_EXEC_QUEUE_SET_PROPERTY_PRIORITY &&
>> ext.property != DRM_XE_EXEC_QUEUE_SET_PROPERTY_TIMESLICE &&
>> ext.property != DRM_XE_EXEC_QUEUE_SET_PROPERTY_PXP_TYPE &&
>> - ext.property != DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_GROUP))
>> + ext.property != DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_GROUP &&
>> + ext.property != DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_QUEUE_PRIORITY))
>> return -EINVAL;
>>
>> idx = array_index_nospec(ext.property, ARRAY_SIZE(exec_queue_set_property_funcs));
>> diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h b/drivers/gpu/drm/xe/xe_exec_queue_types.h
>> index 38e47b003259..964a0e6654c7 100644
>> --- a/drivers/gpu/drm/xe/xe_exec_queue_types.h
>> +++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h
>> @@ -31,6 +31,12 @@ enum xe_exec_queue_priority {
>> XE_EXEC_QUEUE_PRIORITY_COUNT
>> };
>>
>> +enum xe_multi_queue_priority {
>> + XE_MULTI_QUEUE_PRIORITY_LOW = 0,
>> + XE_MULTI_QUEUE_PRIORITY_NORMAL,
>> + XE_MULTI_QUEUE_PRIORITY_HIGH,
>> +};
>
>Kernel doc.
Ok
>
>> +
>> /**
>> * struct xe_exec_queue_group - Execution multi queue group
>> *
>> @@ -134,6 +140,8 @@ struct xe_exec_queue {
>> struct {
>> /** @multi_queue.group: Queue group information */
>> struct xe_exec_queue_group *group;
>> + /** @multi_queue.priority: Queue priority within the multi-queue group */
>> + enum xe_multi_queue_priority priority;
>> /** @multi_queue.pos: Position of queue within the multi-queue group */
>> u8 pos;
>> /** @multi_queue.valid: Queue belongs to a multi queue group */
>> diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
>> index d2aa9a2524e7..5ec144c1c2dc 100644
>> --- a/drivers/gpu/drm/xe/xe_guc_submit.c
>> +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
>> @@ -634,6 +634,7 @@ static void xe_guc_exec_queue_group_cgp_sync(struct xe_guc *guc,
>> return;
>> }
>>
>> + xe_lrc_set_multi_queue_priority(q->lrc[0], q->multi_queue.priority);
>> xe_guc_exec_queue_group_cgp_update(xe, q);
>>
>> WRITE_ONCE(group->sync_pending, true);
>> diff --git a/drivers/gpu/drm/xe/xe_lrc.c b/drivers/gpu/drm/xe/xe_lrc.c
>> index b5083c99dd50..45fc5bc5de5c 100644
>> --- a/drivers/gpu/drm/xe/xe_lrc.c
>> +++ b/drivers/gpu/drm/xe/xe_lrc.c
>> @@ -44,6 +44,11 @@
>> #define LRC_INDIRECT_CTX_BO_SIZE SZ_4K
>> #define LRC_INDIRECT_RING_STATE_SIZE SZ_4K
>>
>> +#define LRC_PRIORITY GENMASK_ULL(10, 9)
>> +#define LRC_PRIORITY_LOW 0
>> +#define LRC_PRIORITY_NORMAL 1
>> +#define LRC_PRIORITY_HIGH 2
>> +
>> /*
>> * Layout of the LRC and associated data allocated as
>> * lrc->bo:
>> @@ -1386,6 +1391,33 @@ setup_indirect_ctx(struct xe_lrc *lrc, struct xe_hw_engine *hwe)
>> return 0;
>> }
>>
>> +static u8 xe_multi_queue_prio_to_lrc(struct xe_lrc *lrc, enum xe_multi_queue_priority priority)
>> +{
>> + struct xe_device *xe = gt_to_xe(lrc->gt);
>> +
>> + /* xe_multi_queue_priority is directly mapped to LRC priority values */
>> + if (priority >= XE_MULTI_QUEUE_PRIORITY_LOW &&
>> + priority <= XE_MULTI_QUEUE_PRIORITY_HIGH)
>> + return priority;
>
>You sanitize at the IOCTL layer, so an assert here would be preferred.
>
Ok
>> +
>> + /* Fallback to NORMAL if out of range */
>> + drm_warn(&xe->drm, "Unknown multi queue priority: %d, defaulting to NORMAL\n", priority);
>> + return LRC_PRIORITY_NORMAL;
>> +}
>> +
>> +/**
>> + * xe_lrc_set_multi_queue_priority() - Set multi queue priority in LRC
>> + * @lrc: Logical Ring Context
>> + * @priority: Multi queue priority of the exec queue
>> + *
>> + * Convert @priority to LRC multi queue priority and update the @lrc descriptor
>> + */
>> +void xe_lrc_set_multi_queue_priority(struct xe_lrc *lrc, enum xe_multi_queue_priority priority)
>> +{
>> + lrc->desc &= ~LRC_PRIORITY;
>> + lrc->desc |= FIELD_PREP(LRC_PRIORITY, xe_multi_queue_prio_to_lrc(lrc, priority));
>> +}
>> +
>> static int xe_lrc_init(struct xe_lrc *lrc, struct xe_hw_engine *hwe,
>> struct xe_vm *vm, u32 ring_size, u16 msix_vec,
>> u32 init_flags)
>> diff --git a/drivers/gpu/drm/xe/xe_lrc.h b/drivers/gpu/drm/xe/xe_lrc.h
>> index 2fb628da5c43..3e6b356e0d1c 100644
>> --- a/drivers/gpu/drm/xe/xe_lrc.h
>> +++ b/drivers/gpu/drm/xe/xe_lrc.h
>> @@ -8,11 +8,14 @@
>> #include <linux/types.h>
>>
>> #include "xe_lrc_types.h"
>> +#include "xe_exec_queue_types.h"
>>
>> struct drm_printer;
>> struct xe_bb;
>> struct xe_device;
>> struct xe_exec_queue;
>> +enum xe_exec_queue_priority;
>
>Never needed in this file.
>
Ok
>> +enum xe_multi_queue_priority;
>
>No need to forward declare if xe_exec_queue_types.h is included. If this
>compiles without "xe_exec_queue_types.h", please drop that include. If
>it doesn't compile, drop this forward declaration.
>
Ok
>> enum xe_engine_class;
>> struct xe_gt;
>> struct xe_hw_engine;
>> @@ -133,6 +136,8 @@ void xe_lrc_dump_default(struct drm_printer *p,
>>
>> u32 *xe_lrc_emit_hwe_state_instructions(struct xe_exec_queue *q, u32 *cs);
>>
>> +void xe_lrc_set_multi_queue_priority(struct xe_lrc *lrc, enum xe_multi_queue_priority priority);
>> +
>> struct xe_lrc_snapshot *xe_lrc_snapshot_capture(struct xe_lrc *lrc);
>> void xe_lrc_snapshot_capture_delayed(struct xe_lrc_snapshot *snapshot);
>> void xe_lrc_snapshot_print(struct xe_lrc_snapshot *snapshot, struct drm_printer *p);
>> diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
>> index d903b3a55ec1..8ab44413646a 100644
>> --- a/include/uapi/drm/xe_drm.h
>> +++ b/include/uapi/drm/xe_drm.h
>> @@ -1258,6 +1258,8 @@ struct drm_xe_vm_bind {
>> * then a new multi-queue group is created with this queue as the primary queue
>> * (Q0). Otherwise, the queue gets added to the multi-queue group whose primary
>> * queue id is specified in the 'value' field.
>> + * - %DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_QUEUE_PRIORITY - Set the queue
>> + * priority within the multi-queue group.
>
>Should the valid values be in the uAPI as defines? At a minimum, the valid
>values should be mentioned in the kernel doc.
>
They are defined below, and also mentioned in the kernel-doc.
Niranjana
>Matt
>
>> *
>> * The example below shows how to use @drm_xe_exec_queue_create to create
>> * a simple exec_queue (no parallel submission) of class
>> @@ -1300,6 +1302,7 @@ struct drm_xe_exec_queue_create {
>> #define DRM_XE_EXEC_QUEUE_SET_PROPERTY_PXP_TYPE 2
>> #define DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_GROUP 3
>> #define DRM_XE_MULTI_GROUP_CREATE (1ull << 63)
>> +#define DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_QUEUE_PRIORITY 4
>> /** @extensions: Pointer to the first extension struct, if any */
>> __u64 extensions;
>>
>> --
>> 2.43.0
>>
^ permalink raw reply [flat|nested] 61+ messages in thread
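As an aside on the descriptor update in this patch: bits 10:9 of the 64-bit LRC descriptor carry the multi-queue priority, so the update is a plain read-modify-write with the usual GENMASK/FIELD_PREP pattern. A minimal standalone sketch (same macros as the diff above, not the driver code itself):

#include <linux/bitfield.h>
#include <linux/bits.h>
#include <linux/types.h>

#define LRC_PRIORITY	GENMASK_ULL(10, 9)

/* Clear the two-bit priority field of the descriptor and insert the new
 * value (0 = low, 1 = normal, 2 = high), leaving all other bits intact.
 */
static u64 lrc_desc_set_priority(u64 desc, u8 prio)
{
	desc &= ~LRC_PRIORITY;
	desc |= FIELD_PREP(LRC_PRIORITY, (u64)prio);
	return desc;
}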
* Re: [PATCH 11/16] drm/xe/multi_queue: Handle CGP context error
2025-11-02 18:29 ` Matthew Brost
@ 2025-11-03 16:44 ` Niranjana Vishwanathapura
2025-11-03 17:18 ` Matthew Brost
0 siblings, 1 reply; 61+ messages in thread
From: Niranjana Vishwanathapura @ 2025-11-03 16:44 UTC (permalink / raw)
To: Matthew Brost; +Cc: intel-xe
On Sun, Nov 02, 2025 at 10:29:32AM -0800, Matthew Brost wrote:
>On Fri, Oct 31, 2025 at 11:29:31AM -0700, Niranjana Vishwanathapura wrote:
>> Trigger multi-queue context cleanup upon CGP context error
>> notification from GuC.
>>
>> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
>> ---
>> drivers/gpu/drm/xe/abi/guc_actions_abi.h | 1 +
>> drivers/gpu/drm/xe/xe_guc_ct.c | 4 +++
>> drivers/gpu/drm/xe/xe_guc_submit.c | 33 ++++++++++++++++++++++++
>> drivers/gpu/drm/xe/xe_guc_submit.h | 2 ++
>> drivers/gpu/drm/xe/xe_trace.h | 5 ++++
>> 5 files changed, 45 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/xe/abi/guc_actions_abi.h b/drivers/gpu/drm/xe/abi/guc_actions_abi.h
>> index 3e9fbed9cda6..8af3691626bf 100644
>> --- a/drivers/gpu/drm/xe/abi/guc_actions_abi.h
>> +++ b/drivers/gpu/drm/xe/abi/guc_actions_abi.h
>> @@ -142,6 +142,7 @@ enum xe_guc_action {
>> XE_GUC_ACTION_REGISTER_CONTEXT_MULTI_QUEUE = 0x4602,
>> XE_GUC_ACTION_MULTI_QUEUE_CONTEXT_CGP_SYNC = 0x4603,
>> XE_GUC_ACTION_NOTIFY_MULTI_QUEUE_CONTEXT_CGP_SYNC_DONE = 0x4604,
>> + XE_GUC_ACTION_NOTIFY_MULTI_QUEUE_CGP_CONTEXT_ERROR = 0x4605,
>> XE_GUC_ACTION_CLIENT_SOFT_RESET = 0x5507,
>> XE_GUC_ACTION_SET_ENG_UTIL_BUFF = 0x550A,
>> XE_GUC_ACTION_SET_DEVICE_ENGINE_ACTIVITY_BUFFER = 0x550C,
>> diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c
>> index 48b5006eb080..d0e19af0b4d2 100644
>> --- a/drivers/gpu/drm/xe/xe_guc_ct.c
>> +++ b/drivers/gpu/drm/xe/xe_guc_ct.c
>> @@ -1574,6 +1574,10 @@ static int process_g2h_msg(struct xe_guc_ct *ct, u32 *msg, u32 len)
>> case XE_GUC_ACTION_NOTIFY_MULTI_QUEUE_CONTEXT_CGP_SYNC_DONE:
>> ret = xe_guc_exec_queue_cgp_sync_done_handler(guc, payload, adj_len);
>> break;
>> + case XE_GUC_ACTION_NOTIFY_MULTI_QUEUE_CGP_CONTEXT_ERROR:
>> + ret = xe_guc_exec_queue_cgp_context_error_handler(guc, payload,
>> + adj_len);
>> + break;
>> default:
>> xe_gt_err(gt, "unexpected G2H action 0x%04x\n", action);
>> }
>> diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
>> index 87c13feb2cef..605352145d76 100644
>> --- a/drivers/gpu/drm/xe/xe_guc_submit.c
>> +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
>> @@ -48,6 +48,8 @@
>> #include "xe_vm.h"
>> #include "xe_bo.h"
>>
>> +#define XE_GUC_EXEC_QUEUE_CGP_CONTEXT_ERROR_LEN 6
>> +
>> static struct xe_guc *
>> exec_queue_to_guc(struct xe_exec_queue *q)
>> {
>> @@ -3001,6 +3003,37 @@ int xe_guc_exec_queue_reset_failure_handler(struct xe_guc *guc, u32 *msg, u32 le
>> return 0;
>> }
>>
>> +int xe_guc_exec_queue_cgp_context_error_handler(struct xe_guc *guc, u32 *msg,
>> + u32 len)
>> +{
>> + struct xe_gt *gt = guc_to_gt(guc);
>> + struct xe_device *xe = guc_to_xe(guc);
>> + struct xe_exec_queue *q;
>> + u32 guc_id = msg[2];
>> +
>> + if (unlikely(len != XE_GUC_EXEC_QUEUE_CGP_CONTEXT_ERROR_LEN)) {
>> + drm_err(&xe->drm, "Invalid length %u", len);
>> + return -EPROTO;
>> + }
>> +
>> + q = g2h_exec_queue_lookup(guc, guc_id);
>> + if (unlikely(!q))
>> + return -EPROTO;
>> +
>> + xe_gt_dbg(gt,
>> + "CGP context error: region=%s err=0x%x, context=0x%x LRCA=0x%x:0x%x SgId=0x%x",
>> + msg[0] & 1 ? "uc" : "kmd", msg[1], msg[2], msg[4], msg[3], msg[5]);
>> +
>> + trace_xe_exec_queue_cgp_context_error(q);
>> +
>> + /* Treat the same as engine reset */
>> + set_exec_queue_reset(q);
>> + if (!exec_queue_banned(q) && !exec_queue_check_timeout(q))
>
>I don't think you need the exec_queue_check_timeout check.
>
The check here is the same as in other GuC error handlers like
xe_guc_exec_queue_reset_handler() and xe_guc_exec_queue_memory_cat_error_handler().
Hence the reason to keep it here as well. Doesn't exec_queue_check_timeout()
mean the TDR is already underway?
Niranjana
>Otherwise LGTM.
>
>Matt
>
>> + xe_guc_exec_queue_trigger_cleanup(q);
>> +
>> + return 0;
>> +}
>> +
>> /**
>> * xe_guc_exec_queue_cgp_sync_done_handler - CGP synchronization done handler
>> * @guc: guc
>> diff --git a/drivers/gpu/drm/xe/xe_guc_submit.h b/drivers/gpu/drm/xe/xe_guc_submit.h
>> index abfa94bce391..01b013a90b1b 100644
>> --- a/drivers/gpu/drm/xe/xe_guc_submit.h
>> +++ b/drivers/gpu/drm/xe/xe_guc_submit.h
>> @@ -35,6 +35,8 @@ int xe_guc_exec_queue_memory_cat_error_handler(struct xe_guc *guc, u32 *msg,
>> int xe_guc_exec_queue_reset_failure_handler(struct xe_guc *guc, u32 *msg, u32 len);
>> int xe_guc_error_capture_handler(struct xe_guc *guc, u32 *msg, u32 len);
>> int xe_guc_exec_queue_cgp_sync_done_handler(struct xe_guc *guc, u32 *msg, u32 len);
>> +int xe_guc_exec_queue_cgp_context_error_handler(struct xe_guc *guc, u32 *msg,
>> + u32 len);
>>
>> struct xe_guc_submit_exec_queue_snapshot *
>> xe_guc_exec_queue_snapshot_capture(struct xe_exec_queue *q);
>> diff --git a/drivers/gpu/drm/xe/xe_trace.h b/drivers/gpu/drm/xe/xe_trace.h
>> index 79a97b086cb2..c9d0748dae9d 100644
>> --- a/drivers/gpu/drm/xe/xe_trace.h
>> +++ b/drivers/gpu/drm/xe/xe_trace.h
>> @@ -172,6 +172,11 @@ DEFINE_EVENT(xe_exec_queue, xe_exec_queue_memory_cat_error,
>> TP_ARGS(q)
>> );
>>
>> +DEFINE_EVENT(xe_exec_queue, xe_exec_queue_cgp_context_error,
>> + TP_PROTO(struct xe_exec_queue *q),
>> + TP_ARGS(q)
>> +);
>> +
>> DEFINE_EVENT(xe_exec_queue, xe_exec_queue_stop,
>> TP_PROTO(struct xe_exec_queue *q),
>> TP_ARGS(q)
>> --
>> 2.43.0
>>
^ permalink raw reply [flat|nested] 61+ messages in thread
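For readability, the six-dword G2H payload decoded by the handler above can be summarized as follows (field names are informal, inferred from the handler and its debug message):

/* XE_GUC_ACTION_NOTIFY_MULTI_QUEUE_CGP_CONTEXT_ERROR payload, 6 dwords:
 *
 *   msg[0]  bit 0: error region (1 = uc, 0 = kmd)
 *   msg[1]  error code
 *   msg[2]  GuC context id, used to look up the exec queue
 *   msg[3]  LRCA dword (printed second in the debug message)
 *   msg[4]  LRCA dword (printed first in the debug message)
 *   msg[5]  submit group id (SgId)
 */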
* Re: [PATCH 10/16] drm/xe/multi_queue: Set QUEUE_DRAIN_MODE for Multi Queue batches
2025-11-02 18:22 ` Matthew Brost
@ 2025-11-03 17:09 ` Niranjana Vishwanathapura
0 siblings, 0 replies; 61+ messages in thread
From: Niranjana Vishwanathapura @ 2025-11-03 17:09 UTC (permalink / raw)
To: Matthew Brost; +Cc: intel-xe
On Sun, Nov 02, 2025 at 10:22:36AM -0800, Matthew Brost wrote:
>On Fri, Oct 31, 2025 at 11:29:30AM -0700, Niranjana Vishwanathapura wrote:
>> To properly support soft light restore between batches
>> being arbitrated at the CFEG, PIPE_CONTROL instructions
>> have a new bit in the first DW, QUEUE_DRAIN_MODE. When
>> set, this indicates to the CFEG that it should only
>> drain the current queue.
>>
>> Additionally, we no longer want to set the CS_STALL bit
>> for these multi-queue submissions, as it causes the entire
>> pipeline to stall waiting for completion of the prior
>> batch, preventing this soft light restore from occurring
>> between queues in a queue group.
>>
>
>Bspec: <number>
>
>This will help in the review.
>
56551.
Ok, will add it to the commit message.
Niranjana
>Matt
>
>> Signed-off-by: Stuart Summers <stuart.summers@intel.com>
>> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
>> ---
>> .../gpu/drm/xe/instructions/xe_gpu_commands.h | 1 +
>> drivers/gpu/drm/xe/xe_ring_ops.c | 68 ++++++++++++-------
>> 2 files changed, 45 insertions(+), 24 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/xe/instructions/xe_gpu_commands.h b/drivers/gpu/drm/xe/instructions/xe_gpu_commands.h
>> index 5d41ca297447..885fcf211e6d 100644
>> --- a/drivers/gpu/drm/xe/instructions/xe_gpu_commands.h
>> +++ b/drivers/gpu/drm/xe/instructions/xe_gpu_commands.h
>> @@ -47,6 +47,7 @@
>>
>> #define GFX_OP_PIPE_CONTROL(len) ((0x3<<29)|(0x3<<27)|(0x2<<24)|((len)-2))
>>
>> +#define PIPE_CONTROL0_QUEUE_DRAIN_MODE BIT(12)
>> #define PIPE_CONTROL0_L3_READ_ONLY_CACHE_INVALIDATE BIT(10) /* gen12 */
>> #define PIPE_CONTROL0_HDC_PIPELINE_FLUSH BIT(9) /* gen12 */
>>
>> diff --git a/drivers/gpu/drm/xe/xe_ring_ops.c b/drivers/gpu/drm/xe/xe_ring_ops.c
>> index ac0c6dcffe15..71f0e19fe8ba 100644
>> --- a/drivers/gpu/drm/xe/xe_ring_ops.c
>> +++ b/drivers/gpu/drm/xe/xe_ring_ops.c
>> @@ -12,7 +12,7 @@
>> #include "regs/xe_engine_regs.h"
>> #include "regs/xe_gt_regs.h"
>> #include "regs/xe_lrc_layout.h"
>> -#include "xe_exec_queue_types.h"
>> +#include "xe_exec_queue.h"
>> #include "xe_gt.h"
>> #include "xe_lrc.h"
>> #include "xe_macros.h"
>> @@ -135,12 +135,11 @@ emit_pipe_control(u32 *dw, int i, u32 bit_group_0, u32 bit_group_1, u32 offset,
>> return i;
>> }
>>
>> -static int emit_pipe_invalidate(u32 mask_flags, bool invalidate_tlb, u32 *dw,
>> - int i)
>> +static int emit_pipe_invalidate(struct xe_exec_queue *q, u32 mask_flags,
>> + bool invalidate_tlb, u32 *dw, int i)
>> {
>> u32 flags0 = 0;
>> - u32 flags1 = PIPE_CONTROL_CS_STALL |
>> - PIPE_CONTROL_COMMAND_CACHE_INVALIDATE |
>> + u32 flags1 = PIPE_CONTROL_COMMAND_CACHE_INVALIDATE |
>> PIPE_CONTROL_INSTRUCTION_CACHE_INVALIDATE |
>> PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE |
>> PIPE_CONTROL_VF_CACHE_INVALIDATE |
>> @@ -152,6 +151,11 @@ static int emit_pipe_invalidate(u32 mask_flags, bool invalidate_tlb, u32 *dw,
>> if (invalidate_tlb)
>> flags1 |= PIPE_CONTROL_TLB_INVALIDATE;
>>
>> + if (xe_exec_queue_is_multi_queue(q))
>> + flags0 |= PIPE_CONTROL0_QUEUE_DRAIN_MODE;
>> + else
>> + flags1 |= PIPE_CONTROL_CS_STALL;
>> +
>> flags1 &= ~mask_flags;
>>
>> if (flags1 & PIPE_CONTROL_VF_CACHE_INVALIDATE)
>> @@ -175,54 +179,70 @@ static int emit_store_imm_ppgtt_posted(u64 addr, u64 value,
>>
>> static int emit_render_cache_flush(struct xe_sched_job *job, u32 *dw, int i)
>> {
>> - struct xe_gt *gt = job->q->gt;
>> + struct xe_exec_queue *q = job->q;
>> + struct xe_gt *gt = q->gt;
>> bool lacks_render = !(gt->info.engine_mask & XE_HW_ENGINE_RCS_MASK);
>> - u32 flags;
>> + u32 flags0, flags1;
>>
>> if (XE_GT_WA(gt, 14016712196))
>> i = emit_pipe_control(dw, i, 0, PIPE_CONTROL_DEPTH_CACHE_FLUSH,
>> LRC_PPHWSP_FLUSH_INVAL_SCRATCH_ADDR, 0);
>>
>> - flags = (PIPE_CONTROL_CS_STALL |
>> - PIPE_CONTROL_TILE_CACHE_FLUSH |
>> + flags0 = PIPE_CONTROL0_HDC_PIPELINE_FLUSH;
>> + flags1 = (PIPE_CONTROL_TILE_CACHE_FLUSH |
>> PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH |
>> PIPE_CONTROL_DEPTH_CACHE_FLUSH |
>> PIPE_CONTROL_DC_FLUSH_ENABLE |
>> PIPE_CONTROL_FLUSH_ENABLE);
>>
>> if (XE_GT_WA(gt, 1409600907))
>> - flags |= PIPE_CONTROL_DEPTH_STALL;
>> + flags1 |= PIPE_CONTROL_DEPTH_STALL;
>>
>> if (lacks_render)
>> - flags &= ~PIPE_CONTROL_3D_ARCH_FLAGS;
>> + flags1 &= ~PIPE_CONTROL_3D_ARCH_FLAGS;
>> else if (job->q->class == XE_ENGINE_CLASS_COMPUTE)
>> - flags &= ~PIPE_CONTROL_3D_ENGINE_FLAGS;
>> + flags1 &= ~PIPE_CONTROL_3D_ENGINE_FLAGS;
>> +
>> + if (xe_exec_queue_is_multi_queue(q))
>> + flags0 |= PIPE_CONTROL0_QUEUE_DRAIN_MODE;
>> + else
>> + flags1 |= PIPE_CONTROL_CS_STALL;
>>
>> - return emit_pipe_control(dw, i, PIPE_CONTROL0_HDC_PIPELINE_FLUSH, flags, 0, 0);
>> + return emit_pipe_control(dw, i, flags0, flags1, 0, 0);
>> }
>>
>> -static int emit_pipe_control_to_ring_end(struct xe_hw_engine *hwe, u32 *dw, int i)
>> +static int emit_pipe_control_to_ring_end(struct xe_exec_queue *q, u32 *dw, int i)
>> {
>> + u32 flags0 = 0, flags1 = PIPE_CONTROL_LRI_POST_SYNC;
>> + struct xe_hw_engine *hwe = q->hwe;
>> +
>> if (hwe->class != XE_ENGINE_CLASS_RENDER)
>> return i;
>>
>> + if (xe_exec_queue_is_multi_queue(q))
>> + flags0 |= PIPE_CONTROL0_QUEUE_DRAIN_MODE;
>> +
>> if (XE_GT_WA(hwe->gt, 16020292621))
>> - i = emit_pipe_control(dw, i, 0, PIPE_CONTROL_LRI_POST_SYNC,
>> + i = emit_pipe_control(dw, i, flags0, flags1,
>> RING_NOPID(hwe->mmio_base).addr, 0);
>>
>> return i;
>> }
>>
>> -static int emit_pipe_imm_ggtt(u32 addr, u32 value, bool stall_only, u32 *dw,
>> - int i)
>> +static int emit_pipe_imm_ggtt(struct xe_exec_queue *q, u32 addr, u32 value,
>> + bool stall_only, u32 *dw, int i)
>> {
>> - u32 flags = PIPE_CONTROL_CS_STALL | PIPE_CONTROL_GLOBAL_GTT_IVB |
>> - PIPE_CONTROL_QW_WRITE;
>> + u32 flags0 = 0, flags1 = PIPE_CONTROL_GLOBAL_GTT_IVB | PIPE_CONTROL_QW_WRITE;
>>
>> if (!stall_only)
>> - flags |= PIPE_CONTROL_FLUSH_ENABLE;
>> + flags1 |= PIPE_CONTROL_FLUSH_ENABLE;
>> +
>> + if (xe_exec_queue_is_multi_queue(q))
>> + flags0 |= PIPE_CONTROL0_QUEUE_DRAIN_MODE;
>> + else
>> + flags1 |= PIPE_CONTROL_CS_STALL;
>>
>> - return emit_pipe_control(dw, i, 0, flags, addr, value);
>> + return emit_pipe_control(dw, i, flags0, flags1, addr, value);
>> }
>>
>> static u32 get_ppgtt_flag(struct xe_sched_job *job)
>> @@ -371,7 +391,7 @@ static void __emit_job_gen12_render_compute(struct xe_sched_job *job,
>> mask_flags = PIPE_CONTROL_3D_ENGINE_FLAGS;
>>
>> /* See __xe_pt_bind_vma() for a discussion on TLB invalidations. */
>> - i = emit_pipe_invalidate(mask_flags, job->ring_ops_flush_tlb, dw, i);
>> + i = emit_pipe_invalidate(job->q, mask_flags, job->ring_ops_flush_tlb, dw, i);
>>
>> /* hsdes: 1809175790 */
>> if (has_aux_ccs(xe))
>> @@ -391,11 +411,11 @@ static void __emit_job_gen12_render_compute(struct xe_sched_job *job,
>> job->user_fence.value,
>> dw, i);
>>
>> - i = emit_pipe_imm_ggtt(xe_lrc_seqno_ggtt_addr(lrc), seqno, lacks_render, dw, i);
>> + i = emit_pipe_imm_ggtt(job->q, xe_lrc_seqno_ggtt_addr(lrc), seqno, lacks_render, dw, i);
>>
>> i = emit_user_interrupt(dw, i);
>>
>> - i = emit_pipe_control_to_ring_end(job->q->hwe, dw, i);
>> + i = emit_pipe_control_to_ring_end(job->q, dw, i);
>>
>> xe_gt_assert(gt, i <= MAX_JOB_SIZE_DW);
>>
>> --
>> 2.43.0
>>
^ permalink raw reply [flat|nested] 61+ messages in thread
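The pattern repeated across the diff above — QUEUE_DRAIN_MODE in PIPE_CONTROL DW0 for multi-queue submissions, CS_STALL in DW1 otherwise — could equally be factored into a small helper. A sketch only (the series keeps the checks open-coded at each emit site):

/* Hypothetical helper mirroring the open-coded selection above. */
static void pipe_control_stall_flags(struct xe_exec_queue *q,
				     u32 *flags0, u32 *flags1)
{
	if (xe_exec_queue_is_multi_queue(q))
		*flags0 |= PIPE_CONTROL0_QUEUE_DRAIN_MODE; /* drain this queue only */
	else
		*flags1 |= PIPE_CONTROL_CS_STALL; /* full pipeline stall */
}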
* Re: [PATCH 11/16] drm/xe/multi_queue: Handle CGP context error
2025-11-03 16:44 ` Niranjana Vishwanathapura
@ 2025-11-03 17:18 ` Matthew Brost
0 siblings, 0 replies; 61+ messages in thread
From: Matthew Brost @ 2025-11-03 17:18 UTC (permalink / raw)
To: Niranjana Vishwanathapura; +Cc: intel-xe
On Mon, Nov 03, 2025 at 08:44:16AM -0800, Niranjana Vishwanathapura wrote:
> On Sun, Nov 02, 2025 at 10:29:32AM -0800, Matthew Brost wrote:
> > On Fri, Oct 31, 2025 at 11:29:31AM -0700, Niranjana Vishwanathapura wrote:
> > > Trigger multi-queue context cleanup upon CGP context error
> > > notification from GuC.
> > >
> > > Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
> > > ---
> > > drivers/gpu/drm/xe/abi/guc_actions_abi.h | 1 +
> > > drivers/gpu/drm/xe/xe_guc_ct.c | 4 +++
> > > drivers/gpu/drm/xe/xe_guc_submit.c | 33 ++++++++++++++++++++++++
> > > drivers/gpu/drm/xe/xe_guc_submit.h | 2 ++
> > > drivers/gpu/drm/xe/xe_trace.h | 5 ++++
> > > 5 files changed, 45 insertions(+)
> > >
> > > diff --git a/drivers/gpu/drm/xe/abi/guc_actions_abi.h b/drivers/gpu/drm/xe/abi/guc_actions_abi.h
> > > index 3e9fbed9cda6..8af3691626bf 100644
> > > --- a/drivers/gpu/drm/xe/abi/guc_actions_abi.h
> > > +++ b/drivers/gpu/drm/xe/abi/guc_actions_abi.h
> > > @@ -142,6 +142,7 @@ enum xe_guc_action {
> > > XE_GUC_ACTION_REGISTER_CONTEXT_MULTI_QUEUE = 0x4602,
> > > XE_GUC_ACTION_MULTI_QUEUE_CONTEXT_CGP_SYNC = 0x4603,
> > > XE_GUC_ACTION_NOTIFY_MULTI_QUEUE_CONTEXT_CGP_SYNC_DONE = 0x4604,
> > > + XE_GUC_ACTION_NOTIFY_MULTI_QUEUE_CGP_CONTEXT_ERROR = 0x4605,
> > > XE_GUC_ACTION_CLIENT_SOFT_RESET = 0x5507,
> > > XE_GUC_ACTION_SET_ENG_UTIL_BUFF = 0x550A,
> > > XE_GUC_ACTION_SET_DEVICE_ENGINE_ACTIVITY_BUFFER = 0x550C,
> > > diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c
> > > index 48b5006eb080..d0e19af0b4d2 100644
> > > --- a/drivers/gpu/drm/xe/xe_guc_ct.c
> > > +++ b/drivers/gpu/drm/xe/xe_guc_ct.c
> > > @@ -1574,6 +1574,10 @@ static int process_g2h_msg(struct xe_guc_ct *ct, u32 *msg, u32 len)
> > > case XE_GUC_ACTION_NOTIFY_MULTI_QUEUE_CONTEXT_CGP_SYNC_DONE:
> > > ret = xe_guc_exec_queue_cgp_sync_done_handler(guc, payload, adj_len);
> > > break;
> > > + case XE_GUC_ACTION_NOTIFY_MULTI_QUEUE_CGP_CONTEXT_ERROR:
> > > + ret = xe_guc_exec_queue_cgp_context_error_handler(guc, payload,
> > > + adj_len);
> > > + break;
> > > default:
> > > xe_gt_err(gt, "unexpected G2H action 0x%04x\n", action);
> > > }
> > > diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
> > > index 87c13feb2cef..605352145d76 100644
> > > --- a/drivers/gpu/drm/xe/xe_guc_submit.c
> > > +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
> > > @@ -48,6 +48,8 @@
> > > #include "xe_vm.h"
> > > #include "xe_bo.h"
> > >
> > > +#define XE_GUC_EXEC_QUEUE_CGP_CONTEXT_ERROR_LEN 6
> > > +
> > > static struct xe_guc *
> > > exec_queue_to_guc(struct xe_exec_queue *q)
> > > {
> > > @@ -3001,6 +3003,37 @@ int xe_guc_exec_queue_reset_failure_handler(struct xe_guc *guc, u32 *msg, u32 le
> > > return 0;
> > > }
> > >
> > > +int xe_guc_exec_queue_cgp_context_error_handler(struct xe_guc *guc, u32 *msg,
> > > + u32 len)
> > > +{
> > > + struct xe_gt *gt = guc_to_gt(guc);
> > > + struct xe_device *xe = guc_to_xe(guc);
> > > + struct xe_exec_queue *q;
> > > + u32 guc_id = msg[2];
> > > +
> > > + if (unlikely(len != XE_GUC_EXEC_QUEUE_CGP_CONTEXT_ERROR_LEN)) {
> > > + drm_err(&xe->drm, "Invalid length %u", len);
> > > + return -EPROTO;
> > > + }
> > > +
> > > + q = g2h_exec_queue_lookup(guc, guc_id);
> > > + if (unlikely(!q))
> > > + return -EPROTO;
> > > +
> > > + xe_gt_dbg(gt,
> > > + "CGP context error: region=%s err=0x%x, context=0x%x LRCA=0x%x:0x%x SgId=0x%x",
> > > + msg[0] & 1 ? "uc" : "kmd", msg[1], msg[2], msg[4], msg[3], msg[5]);
> > > +
> > > + trace_xe_exec_queue_cgp_context_error(q);
> > > +
> > > + /* Treat the same as engine reset */
> > > + set_exec_queue_reset(q);
> > > + if (!exec_queue_banned(q) && !exec_queue_check_timeout(q))
> >
> > I don't think you need the exec_queue_check_timeout check.
> >
>
> The check here is the same as in other GuC error handlers like
> xe_guc_exec_queue_reset_handler() and xe_guc_exec_queue_memory_cat_error_handler().
Ah, it is, but it doesn't seem right in those cases either. Maybe I'm
forgetting something; I will try to page this information back in.
> Hence the reason to keep it here as well. Doesn't exec_queue_check_timeout()
> mean the TDR is already underway?
Yes, it does.
With the current state of the code, this is correct, so:
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
If I conclude it can be dropped, let's drop it everywhere in a single
shot.
Matt
>
> Niranjana
>
> > Otherwise LGTM.
> >
> > Matt
> >
> > > + xe_guc_exec_queue_trigger_cleanup(q);
> > > +
> > > + return 0;
> > > +}
> > > +
> > > /**
> > > * xe_guc_exec_queue_cgp_sync_done_handler - CGP synchronization done handler
> > > * @guc: guc
> > > diff --git a/drivers/gpu/drm/xe/xe_guc_submit.h b/drivers/gpu/drm/xe/xe_guc_submit.h
> > > index abfa94bce391..01b013a90b1b 100644
> > > --- a/drivers/gpu/drm/xe/xe_guc_submit.h
> > > +++ b/drivers/gpu/drm/xe/xe_guc_submit.h
> > > @@ -35,6 +35,8 @@ int xe_guc_exec_queue_memory_cat_error_handler(struct xe_guc *guc, u32 *msg,
> > > int xe_guc_exec_queue_reset_failure_handler(struct xe_guc *guc, u32 *msg, u32 len);
> > > int xe_guc_error_capture_handler(struct xe_guc *guc, u32 *msg, u32 len);
> > > int xe_guc_exec_queue_cgp_sync_done_handler(struct xe_guc *guc, u32 *msg, u32 len);
> > > +int xe_guc_exec_queue_cgp_context_error_handler(struct xe_guc *guc, u32 *msg,
> > > + u32 len);
> > >
> > > struct xe_guc_submit_exec_queue_snapshot *
> > > xe_guc_exec_queue_snapshot_capture(struct xe_exec_queue *q);
> > > diff --git a/drivers/gpu/drm/xe/xe_trace.h b/drivers/gpu/drm/xe/xe_trace.h
> > > index 79a97b086cb2..c9d0748dae9d 100644
> > > --- a/drivers/gpu/drm/xe/xe_trace.h
> > > +++ b/drivers/gpu/drm/xe/xe_trace.h
> > > @@ -172,6 +172,11 @@ DEFINE_EVENT(xe_exec_queue, xe_exec_queue_memory_cat_error,
> > > TP_ARGS(q)
> > > );
> > >
> > > +DEFINE_EVENT(xe_exec_queue, xe_exec_queue_cgp_context_error,
> > > + TP_PROTO(struct xe_exec_queue *q),
> > > + TP_ARGS(q)
> > > +);
> > > +
> > > DEFINE_EVENT(xe_exec_queue, xe_exec_queue_stop,
> > > TP_PROTO(struct xe_exec_queue *q),
> > > TP_ARGS(q)
> > > --
> > > 2.43.0
> > >
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH 07/16] drm/xe/multi_queue: Add support for multi queue dynamic priority change
2025-11-01 23:23 ` Matthew Brost
@ 2025-11-03 18:06 ` Niranjana Vishwanathapura
0 siblings, 0 replies; 61+ messages in thread
From: Niranjana Vishwanathapura @ 2025-11-03 18:06 UTC (permalink / raw)
To: Matthew Brost; +Cc: intel-xe
On Sat, Nov 01, 2025 at 04:23:28PM -0700, Matthew Brost wrote:
>On Fri, Oct 31, 2025 at 11:29:27AM -0700, Niranjana Vishwanathapura wrote:
>> Support dynamic priority change for multi queue group queues via
>> exec queue set_property ioctl. Issue CGP_SYNC command to GuC through
>> the drm scheduler message interface for priority to take effect.
>>
>> Signed-off-by: Pallavi Mishra <pallavi.mishra@intel.com>
>> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
>> ---
>> drivers/gpu/drm/xe/xe_exec_queue.c | 12 ++++-
>> drivers/gpu/drm/xe/xe_exec_queue_types.h | 3 ++
>> drivers/gpu/drm/xe/xe_guc_submit.c | 56 ++++++++++++++++++++++--
>> 3 files changed, 65 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
>> index 0264cab00fd4..98f8f1c7f13b 100644
>> --- a/drivers/gpu/drm/xe/xe_exec_queue.c
>> +++ b/drivers/gpu/drm/xe/xe_exec_queue.c
>> @@ -729,9 +729,13 @@ static int exec_queue_set_multi_queue_priority(struct xe_device *xe, struct xe_e
>> if (XE_IOCTL_DBG(xe, value > XE_MULTI_QUEUE_PRIORITY_HIGH))
>> return -EINVAL;
>>
>> - q->multi_queue.priority = value;
>> + /* For queue creation time (!q->xef) setting, just store the priority value */
>> + if (!q->xef) {
>> + q->multi_queue.priority = value;
>> + return 0;
>> + }
>>
>> - return 0;
>> + return q->ops->set_multi_queue_priority(q, value);
>> }
>>
>> typedef int (*xe_exec_queue_set_property_fn)(struct xe_device *xe,
>> @@ -760,6 +764,10 @@ int xe_exec_queue_set_property_ioctl(struct drm_device *dev, void *data,
>> if (XE_IOCTL_DBG(xe, args->reserved[0] || args->reserved[1]))
>> return -EINVAL;
>>
>> + if (XE_IOCTL_DBG(xe, args->property !=
>> + DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_QUEUE_PRIORITY))
>> + return -EINVAL;
>
>This check looks to be in the wrong place. There are other valid
>properties, e.g., DRM_XE_EXEC_QUEUE_SET_PROPERTY_TIMESLICE, etc...
>
>I think CI is complaining about this too.
>
Yeah, as you mentioned in the other patch, currently only the property
DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_QUEUE_PRIORITY can be dynamically changed.
>> +
>> q = xe_exec_queue_lookup(xef, args->exec_queue_id);
>> if (XE_IOCTL_DBG(xe, !q))
>> return -ENOENT;
>> diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h b/drivers/gpu/drm/xe/xe_exec_queue_types.h
>> index 964a0e6654c7..dcb55b069ed8 100644
>> --- a/drivers/gpu/drm/xe/xe_exec_queue_types.h
>> +++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h
>> @@ -241,6 +241,9 @@ struct xe_exec_queue_ops {
>> int (*set_timeslice)(struct xe_exec_queue *q, u32 timeslice_us);
>> /** @set_preempt_timeout: Set preemption timeout for exec queue */
>> int (*set_preempt_timeout)(struct xe_exec_queue *q, u32 preempt_timeout_us);
>> + /** @set_multi_queue_priority: Set multi queue priority */
>> + int (*set_multi_queue_priority)(struct xe_exec_queue *q,
>> + enum xe_multi_queue_priority priority);
>> /**
>> * @suspend: Suspend exec queue from executing, allowed to be called
>> * multiple times in a row before resume with the caveat that
>> diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
>> index 5ec144c1c2dc..426b64ef8d99 100644
>> --- a/drivers/gpu/drm/xe/xe_guc_submit.c
>> +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
>> @@ -1761,10 +1761,32 @@ static void __guc_exec_queue_process_msg_resume(struct xe_sched_msg *msg)
>> }
>> }
>>
>> -#define CLEANUP 1 /* Non-zero values to catch uninitialized msg */
>> -#define SET_SCHED_PROPS 2
>> -#define SUSPEND 3
>> -#define RESUME 4
>> +static void __guc_exec_queue_process_msg_set_multi_queue_priority(struct xe_sched_msg *msg)
>> +{
>> + struct xe_exec_queue *q = msg->private_data;
>> +
>> + if (guc_exec_queue_allowed_to_change_state(q)) {
>> +#define MAX_MULTI_QUEUE_REG_SIZE (2)
>> + struct xe_guc *guc = exec_queue_to_guc(q);
>> + struct xe_exec_queue_group *group = q->multi_queue.group;
>> + u32 action[MAX_MULTI_QUEUE_REG_SIZE];
>> + int len = 0;
>> +
>> + action[len++] = XE_GUC_ACTION_MULTI_QUEUE_CONTEXT_CGP_SYNC;
>> + action[len++] = group->primary->guc->id;
>> +#undef MAX_MULTI_QUEUE_REG_SIZE
>> +
>> + xe_guc_exec_queue_group_cgp_sync(guc, q, action, len);
>
>Just to confirm - even though multi-group priority isn't part of the CGP
>but rather the LRC, a CGP_SYNC is needed?
>
Yes, it is needed. I think the GuC will issue a lite-restore so that the updated
priorities take effect.
>> + }
>> +
>> + kfree(msg);
>> +}
>> +
>> +#define CLEANUP 1 /* Non-zero values to catch uninitialized msg */
>> +#define SET_SCHED_PROPS 2
>> +#define SUSPEND 3
>> +#define RESUME 4
>> +#define SET_MULTI_QUEUE_PRIORITY 5
>> #define OPCODE_MASK 0xf
>> #define MSG_LOCKED BIT(8)
>> #define MSG_HEAD BIT(9)
>> @@ -1788,6 +1810,9 @@ static void guc_exec_queue_process_msg(struct xe_sched_msg *msg)
>> case RESUME:
>> __guc_exec_queue_process_msg_resume(msg);
>> break;
>> + case SET_MULTI_QUEUE_PRIORITY:
>> + __guc_exec_queue_process_msg_set_multi_queue_priority(msg);
>> + break;
>> default:
>> XE_WARN_ON("Unknown message type");
>> }
>> @@ -2004,6 +2029,28 @@ static int guc_exec_queue_set_preempt_timeout(struct xe_exec_queue *q,
>> return 0;
>> }
>>
>> +static int guc_exec_queue_set_multi_queue_priority(struct xe_exec_queue *q,
>> + enum xe_multi_queue_priority priority)
>> +{
>> + struct xe_sched_msg *msg;
>> +
>> + if (!xe_exec_queue_is_multi_queue(q))
>> + return -EINVAL;
>
>Minor nit, I'd move this check to the caller as
>xe_exec_queue_is_multi_queue is available there, then assert it is a
>xe_exec_queue_is_multi_queue here.
>
Ok, will do.
Niranjana
>Matt
>
>> +
>> + if (q->multi_queue.priority == priority ||
>> + exec_queue_killed_or_banned_or_wedged(q))
>> + return 0;
>> +
>> + msg = kmalloc(sizeof(*msg), GFP_KERNEL);
>> + if (!msg)
>> + return -ENOMEM;
>> +
>> + q->multi_queue.priority = priority;
>> + guc_exec_queue_add_msg(q, msg, SET_MULTI_QUEUE_PRIORITY);
>> +
>> + return 0;
>> +}
>> +
>> static int guc_exec_queue_suspend(struct xe_exec_queue *q)
>> {
>> struct xe_gpu_scheduler *sched = &q->guc->sched;
>> @@ -2095,6 +2142,7 @@ static const struct xe_exec_queue_ops guc_exec_queue_ops = {
>> .set_priority = guc_exec_queue_set_priority,
>> .set_timeslice = guc_exec_queue_set_timeslice,
>> .set_preempt_timeout = guc_exec_queue_set_preempt_timeout,
>> + .set_multi_queue_priority = guc_exec_queue_set_multi_queue_priority,
>> .suspend = guc_exec_queue_suspend,
>> .suspend_wait = guc_exec_queue_suspend_wait,
>> .resume = guc_exec_queue_resume,
>> --
>> 2.43.0
>>
^ permalink raw reply [flat|nested] 61+ messages in thread
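One way to realize the nit above — validate at the ioctl layer where xe_exec_queue_is_multi_queue() is already available, and downgrade the backend check to an assert — might look like this (a sketch under those assumptions, not the posted code):

	/* ioctl layer: reject the property for non-multi-queue queues. */
	if (XE_IOCTL_DBG(xe, !xe_exec_queue_is_multi_queue(q)))
		return -EINVAL;
	return q->ops->set_multi_queue_priority(q, value);

and in the GuC backend:

	static int guc_exec_queue_set_multi_queue_priority(struct xe_exec_queue *q,
							   enum xe_multi_queue_priority priority)
	{
		struct xe_guc *guc = exec_queue_to_guc(q);

		/* Sanitized at the ioctl layer; must hold here. */
		xe_assert(guc_to_xe(guc), xe_exec_queue_is_multi_queue(q));
		/* ... queue the SET_MULTI_QUEUE_PRIORITY message as above ... */
		return 0;
	}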
* Re: [PATCH 07/16] drm/xe/multi_queue: Add support for multi queue dynamic priority change
2025-11-01 23:41 ` Matthew Brost
@ 2025-11-03 18:14 ` Niranjana Vishwanathapura
2025-11-03 19:05 ` Matthew Brost
0 siblings, 1 reply; 61+ messages in thread
From: Niranjana Vishwanathapura @ 2025-11-03 18:14 UTC (permalink / raw)
To: Matthew Brost; +Cc: intel-xe
On Sat, Nov 01, 2025 at 04:41:58PM -0700, Matthew Brost wrote:
>On Fri, Oct 31, 2025 at 11:29:27AM -0700, Niranjana Vishwanathapura wrote:
>> Support dynamic priority change for multi queue group queues via
>> exec queue set_property ioctl. Issue CGP_SYNC command to GuC through
>> the drm scheduler message interface for priority to take effect.
>>
>> Signed-off-by: Pallavi Mishra <pallavi.mishra@intel.com>
>> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
>> ---
>> drivers/gpu/drm/xe/xe_exec_queue.c | 12 ++++-
>> drivers/gpu/drm/xe/xe_exec_queue_types.h | 3 ++
>> drivers/gpu/drm/xe/xe_guc_submit.c | 56 ++++++++++++++++++++++--
>> 3 files changed, 65 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
>> index 0264cab00fd4..98f8f1c7f13b 100644
>> --- a/drivers/gpu/drm/xe/xe_exec_queue.c
>> +++ b/drivers/gpu/drm/xe/xe_exec_queue.c
>> @@ -729,9 +729,13 @@ static int exec_queue_set_multi_queue_priority(struct xe_device *xe, struct xe_e
>> if (XE_IOCTL_DBG(xe, value > XE_MULTI_QUEUE_PRIORITY_HIGH))
>> return -EINVAL;
>>
>> - q->multi_queue.priority = value;
>> + /* For queue creation time (!q->xef) setting, just store the priority value */
>> + if (!q->xef) {
>> + q->multi_queue.priority = value;
>> + return 0;
>> + }
>
>I also don't love this check here, as this code breaks if the exec queue
>creation order changes. I'm pretty sure you can just delete this and
>send the message on to the backend, given that the
>guc_exec_queue_allowed_to_change_state check will change the backend op to
>a NOP.
>
The multi_queue_priority property can be set either at queue creation
time (via the multi_queue_priority extension) or later via the
set_property ioctl. This function gets called in both cases.

When this function gets called at queue creation time (during user
extension parsing), the GuC initialization of the queue is not yet done
(i.e., q->ops->init() is not called yet). So, we shouldn't call the
below q->ops->set_multi_queue_priority() function. That is what the
above code is handling.
Niranjana
>Matt
>
>>
>> - return 0;
>> + return q->ops->set_multi_queue_priority(q, value);
>> }
>>
>> typedef int (*xe_exec_queue_set_property_fn)(struct xe_device *xe,
>> @@ -760,6 +764,10 @@ int xe_exec_queue_set_property_ioctl(struct drm_device *dev, void *data,
>> if (XE_IOCTL_DBG(xe, args->reserved[0] || args->reserved[1]))
>> return -EINVAL;
>>
>> + if (XE_IOCTL_DBG(xe, args->property !=
>> + DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_QUEUE_PRIORITY))
>> + return -EINVAL;
>> +
>> q = xe_exec_queue_lookup(xef, args->exec_queue_id);
>> if (XE_IOCTL_DBG(xe, !q))
>> return -ENOENT;
>> diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h b/drivers/gpu/drm/xe/xe_exec_queue_types.h
>> index 964a0e6654c7..dcb55b069ed8 100644
>> --- a/drivers/gpu/drm/xe/xe_exec_queue_types.h
>> +++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h
>> @@ -241,6 +241,9 @@ struct xe_exec_queue_ops {
>> int (*set_timeslice)(struct xe_exec_queue *q, u32 timeslice_us);
>> /** @set_preempt_timeout: Set preemption timeout for exec queue */
>> int (*set_preempt_timeout)(struct xe_exec_queue *q, u32 preempt_timeout_us);
>> + /** @set_multi_queue_priority: Set multi queue priority */
>> + int (*set_multi_queue_priority)(struct xe_exec_queue *q,
>> + enum xe_multi_queue_priority priority);
>> /**
>> * @suspend: Suspend exec queue from executing, allowed to be called
>> * multiple times in a row before resume with the caveat that
>> diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
>> index 5ec144c1c2dc..426b64ef8d99 100644
>> --- a/drivers/gpu/drm/xe/xe_guc_submit.c
>> +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
>> @@ -1761,10 +1761,32 @@ static void __guc_exec_queue_process_msg_resume(struct xe_sched_msg *msg)
>> }
>> }
>>
>> -#define CLEANUP 1 /* Non-zero values to catch uninitialized msg */
>> -#define SET_SCHED_PROPS 2
>> -#define SUSPEND 3
>> -#define RESUME 4
>> +static void __guc_exec_queue_process_msg_set_multi_queue_priority(struct xe_sched_msg *msg)
>> +{
>> + struct xe_exec_queue *q = msg->private_data;
>> +
>> + if (guc_exec_queue_allowed_to_change_state(q)) {
>> +#define MAX_MULTI_QUEUE_REG_SIZE (2)
>> + struct xe_guc *guc = exec_queue_to_guc(q);
>> + struct xe_exec_queue_group *group = q->multi_queue.group;
>> + u32 action[MAX_MULTI_QUEUE_REG_SIZE];
>> + int len = 0;
>> +
>> + action[len++] = XE_GUC_ACTION_MULTI_QUEUE_CONTEXT_CGP_SYNC;
>> + action[len++] = group->primary->guc->id;
>> +#undef MAX_MULTI_QUEUE_REG_SIZE
>> +
>> + xe_guc_exec_queue_group_cgp_sync(guc, q, action, len);
>> + }
>> +
>> + kfree(msg);
>> +}
>> +
>> +#define CLEANUP 1 /* Non-zero values to catch uninitialized msg */
>> +#define SET_SCHED_PROPS 2
>> +#define SUSPEND 3
>> +#define RESUME 4
>> +#define SET_MULTI_QUEUE_PRIORITY 5
>> #define OPCODE_MASK 0xf
>> #define MSG_LOCKED BIT(8)
>> #define MSG_HEAD BIT(9)
>> @@ -1788,6 +1810,9 @@ static void guc_exec_queue_process_msg(struct xe_sched_msg *msg)
>> case RESUME:
>> __guc_exec_queue_process_msg_resume(msg);
>> break;
>> + case SET_MULTI_QUEUE_PRIORITY:
>> + __guc_exec_queue_process_msg_set_multi_queue_priority(msg);
>> + break;
>> default:
>> XE_WARN_ON("Unknown message type");
>> }
>> @@ -2004,6 +2029,28 @@ static int guc_exec_queue_set_preempt_timeout(struct xe_exec_queue *q,
>> return 0;
>> }
>>
>> +static int guc_exec_queue_set_multi_queue_priority(struct xe_exec_queue *q,
>> + enum xe_multi_queue_priority priority)
>> +{
>> + struct xe_sched_msg *msg;
>> +
>> + if (!xe_exec_queue_is_multi_queue(q))
>> + return -EINVAL;
>> +
>> + if (q->multi_queue.priority == priority ||
>> + exec_queue_killed_or_banned_or_wedged(q))
>> + return 0;
>> +
>> + msg = kmalloc(sizeof(*msg), GFP_KERNEL);
>> + if (!msg)
>> + return -ENOMEM;
>> +
>> + q->multi_queue.priority = priority;
>> + guc_exec_queue_add_msg(q, msg, SET_MULTI_QUEUE_PRIORITY);
>> +
>> + return 0;
>> +}
>> +
>> static int guc_exec_queue_suspend(struct xe_exec_queue *q)
>> {
>> struct xe_gpu_scheduler *sched = &q->guc->sched;
>> @@ -2095,6 +2142,7 @@ static const struct xe_exec_queue_ops guc_exec_queue_ops = {
>> .set_priority = guc_exec_queue_set_priority,
>> .set_timeslice = guc_exec_queue_set_timeslice,
>> .set_preempt_timeout = guc_exec_queue_set_preempt_timeout,
>> + .set_multi_queue_priority = guc_exec_queue_set_multi_queue_priority,
>> .suspend = guc_exec_queue_suspend,
>> .suspend_wait = guc_exec_queue_suspend_wait,
>> .resume = guc_exec_queue_resume,
>> --
>> 2.43.0
>>
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH 07/16] drm/xe/multi_queue: Add support for multi queue dynamic priority change
2025-11-03 18:14 ` Niranjana Vishwanathapura
@ 2025-11-03 19:05 ` Matthew Brost
0 siblings, 0 replies; 61+ messages in thread
From: Matthew Brost @ 2025-11-03 19:05 UTC (permalink / raw)
To: Niranjana Vishwanathapura; +Cc: intel-xe
On Mon, Nov 03, 2025 at 10:14:28AM -0800, Niranjana Vishwanathapura wrote:
> On Sat, Nov 01, 2025 at 04:41:58PM -0700, Matthew Brost wrote:
> > On Fri, Oct 31, 2025 at 11:29:27AM -0700, Niranjana Vishwanathapura wrote:
> > > Support dynamic priority change for multi queue group queues via
> > > exec queue set_property ioctl. Issue CGP_SYNC command to GuC through
> > > the drm scheduler message interface for priority to take effect.
> > >
> > > Signed-off-by: Pallavi Mishra <pallavi.mishra@intel.com>
> > > Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
> > > ---
> > > drivers/gpu/drm/xe/xe_exec_queue.c | 12 ++++-
> > > drivers/gpu/drm/xe/xe_exec_queue_types.h | 3 ++
> > > drivers/gpu/drm/xe/xe_guc_submit.c | 56 ++++++++++++++++++++++--
> > > 3 files changed, 65 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
> > > index 0264cab00fd4..98f8f1c7f13b 100644
> > > --- a/drivers/gpu/drm/xe/xe_exec_queue.c
> > > +++ b/drivers/gpu/drm/xe/xe_exec_queue.c
> > > @@ -729,9 +729,13 @@ static int exec_queue_set_multi_queue_priority(struct xe_device *xe, struct xe_e
> > > if (XE_IOCTL_DBG(xe, value > XE_MULTI_QUEUE_PRIORITY_HIGH))
> > > return -EINVAL;
> > >
> > > - q->multi_queue.priority = value;
> > > + /* For queue creation time (!q->xef) setting, just store the priority value */
> > > + if (!q->xef) {
> > > + q->multi_queue.priority = value;
> > > + return 0;
> > > + }
> >
> > I also don't love this check here, as this code breaks if the exec queue
> > creation order changes. I'm pretty sure you can just delete this and
> > send the message on to the backend, given that the
> > guc_exec_queue_allowed_to_change_state check will change the backend op to
> > a NOP.
> >
>
> The multi_queue_priority property can be set either at queue creation
> time (via the multi_queue_priority extension) or later via the
> set_property ioctl. This function gets called in both cases.
>
> When this function gets called at queue creation time (during user
> extension parsing), the GuC initialization of the queue is not yet done
> (i.e., q->ops->init() is not called yet). So, we shouldn't call the
> below q->ops->set_multi_queue_priority() function. That is what the
> above code is handling.
>
I was thinking the vfunc '->init' would have been called by the time the
extension was parsed, but I guess it isn't. I still don't love tying
this to q->xef, which could end up being set earlier, but I guess this
is ok for now.
Matt
> Niranjana
>
> > Matt
> >
> > >
> > > - return 0;
> > > + return q->ops->set_multi_queue_priority(q, value);
> > > }
> > >
> > > typedef int (*xe_exec_queue_set_property_fn)(struct xe_device *xe,
> > > @@ -760,6 +764,10 @@ int xe_exec_queue_set_property_ioctl(struct drm_device *dev, void *data,
> > > if (XE_IOCTL_DBG(xe, args->reserved[0] || args->reserved[1]))
> > > return -EINVAL;
> > >
> > > + if (XE_IOCTL_DBG(xe, args->property !=
> > > + DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_QUEUE_PRIORITY))
> > > + return -EINVAL;
> > > +
> > > q = xe_exec_queue_lookup(xef, args->exec_queue_id);
> > > if (XE_IOCTL_DBG(xe, !q))
> > > return -ENOENT;
> > > diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h b/drivers/gpu/drm/xe/xe_exec_queue_types.h
> > > index 964a0e6654c7..dcb55b069ed8 100644
> > > --- a/drivers/gpu/drm/xe/xe_exec_queue_types.h
> > > +++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h
> > > @@ -241,6 +241,9 @@ struct xe_exec_queue_ops {
> > > int (*set_timeslice)(struct xe_exec_queue *q, u32 timeslice_us);
> > > /** @set_preempt_timeout: Set preemption timeout for exec queue */
> > > int (*set_preempt_timeout)(struct xe_exec_queue *q, u32 preempt_timeout_us);
> > > + /** @set_multi_queue_priority: Set multi queue priority */
> > > + int (*set_multi_queue_priority)(struct xe_exec_queue *q,
> > > + enum xe_multi_queue_priority priority);
> > > /**
> > > * @suspend: Suspend exec queue from executing, allowed to be called
> > > * multiple times in a row before resume with the caveat that
> > > diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
> > > index 5ec144c1c2dc..426b64ef8d99 100644
> > > --- a/drivers/gpu/drm/xe/xe_guc_submit.c
> > > +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
> > > @@ -1761,10 +1761,32 @@ static void __guc_exec_queue_process_msg_resume(struct xe_sched_msg *msg)
> > > }
> > > }
> > >
> > > -#define CLEANUP 1 /* Non-zero values to catch uninitialized msg */
> > > -#define SET_SCHED_PROPS 2
> > > -#define SUSPEND 3
> > > -#define RESUME 4
> > > +static void __guc_exec_queue_process_msg_set_multi_queue_priority(struct xe_sched_msg *msg)
> > > +{
> > > + struct xe_exec_queue *q = msg->private_data;
> > > +
> > > + if (guc_exec_queue_allowed_to_change_state(q)) {
> > > +#define MAX_MULTI_QUEUE_REG_SIZE (2)
> > > + struct xe_guc *guc = exec_queue_to_guc(q);
> > > + struct xe_exec_queue_group *group = q->multi_queue.group;
> > > + u32 action[MAX_MULTI_QUEUE_REG_SIZE];
> > > + int len = 0;
> > > +
> > > + action[len++] = XE_GUC_ACTION_MULTI_QUEUE_CONTEXT_CGP_SYNC;
> > > + action[len++] = group->primary->guc->id;
> > > +#undef MAX_MULTI_QUEUE_REG_SIZE
> > > +
> > > + xe_guc_exec_queue_group_cgp_sync(guc, q, action, len);
> > > + }
> > > +
> > > + kfree(msg);
> > > +}
> > > +
> > > +#define CLEANUP 1 /* Non-zero values to catch uninitialized msg */
> > > +#define SET_SCHED_PROPS 2
> > > +#define SUSPEND 3
> > > +#define RESUME 4
> > > +#define SET_MULTI_QUEUE_PRIORITY 5
> > > #define OPCODE_MASK 0xf
> > > #define MSG_LOCKED BIT(8)
> > > #define MSG_HEAD BIT(9)
> > > @@ -1788,6 +1810,9 @@ static void guc_exec_queue_process_msg(struct xe_sched_msg *msg)
> > > case RESUME:
> > > __guc_exec_queue_process_msg_resume(msg);
> > > break;
> > > + case SET_MULTI_QUEUE_PRIORITY:
> > > + __guc_exec_queue_process_msg_set_multi_queue_priority(msg);
> > > + break;
> > > default:
> > > XE_WARN_ON("Unknown message type");
> > > }
> > > @@ -2004,6 +2029,28 @@ static int guc_exec_queue_set_preempt_timeout(struct xe_exec_queue *q,
> > > return 0;
> > > }
> > >
> > > +static int guc_exec_queue_set_multi_queue_priority(struct xe_exec_queue *q,
> > > + enum xe_multi_queue_priority priority)
> > > +{
> > > + struct xe_sched_msg *msg;
> > > +
> > > + if (!xe_exec_queue_is_multi_queue(q))
> > > + return -EINVAL;
> > > +
> > > + if (q->multi_queue.priority == priority ||
> > > + exec_queue_killed_or_banned_or_wedged(q))
> > > + return 0;
> > > +
> > > + msg = kmalloc(sizeof(*msg), GFP_KERNEL);
> > > + if (!msg)
> > > + return -ENOMEM;
> > > +
> > > + q->multi_queue.priority = priority;
> > > + guc_exec_queue_add_msg(q, msg, SET_MULTI_QUEUE_PRIORITY);
> > > +
> > > + return 0;
> > > +}
> > > +
> > > static int guc_exec_queue_suspend(struct xe_exec_queue *q)
> > > {
> > > struct xe_gpu_scheduler *sched = &q->guc->sched;
> > > @@ -2095,6 +2142,7 @@ static const struct xe_exec_queue_ops guc_exec_queue_ops = {
> > > .set_priority = guc_exec_queue_set_priority,
> > > .set_timeslice = guc_exec_queue_set_timeslice,
> > > .set_preempt_timeout = guc_exec_queue_set_preempt_timeout,
> > > + .set_multi_queue_priority = guc_exec_queue_set_multi_queue_priority,
> > > .suspend = guc_exec_queue_suspend,
> > > .suspend_wait = guc_exec_queue_suspend_wait,
> > > .resume = guc_exec_queue_resume,
> > > --
> > > 2.43.0
> > >
^ permalink raw reply [flat|nested] 61+ messages in thread
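Condensing the flow under discussion: the property handler has to distinguish creation-time calls (extension parsing, before q->ops->init() has run) from post-creation ioctl calls. The shape in this revision, with the !q->xef test as the creation-time marker Matt is uneasy about (comments added here for clarity):

	static int exec_queue_set_multi_queue_priority(struct xe_device *xe,
						       struct xe_exec_queue *q, u64 value)
	{
		if (XE_IOCTL_DBG(xe, value > XE_MULTI_QUEUE_PRIORITY_HIGH))
			return -EINVAL;

		/* Creation time: q->xef is not yet set and the backend is not
		 * initialized, so only record the value; it is applied when
		 * the LRC/CGP state is first programmed.
		 */
		if (!q->xef) {
			q->multi_queue.priority = value;
			return 0;
		}

		/* Post-creation: route through the backend so a CGP_SYNC is
		 * issued and the new priority takes effect.
		 */
		return q->ops->set_multi_queue_priority(q, value);
	}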
* Re: [PATCH 13/16] drm/xe/multi_queue: Support active group after primary is destroyed
2025-10-31 18:29 ` [PATCH 13/16] drm/xe/multi_queue: Support active group after primary is destroyed Niranjana Vishwanathapura
@ 2025-11-03 22:05 ` Matthew Brost
2025-11-04 17:24 ` Niranjana Vishwanathapura
0 siblings, 1 reply; 61+ messages in thread
From: Matthew Brost @ 2025-11-03 22:05 UTC (permalink / raw)
To: Niranjana Vishwanathapura; +Cc: intel-xe
On Fri, Oct 31, 2025 at 11:29:33AM -0700, Niranjana Vishwanathapura wrote:
> Add support to keep the group active after the primary queue is
> destroyed. Instead of killing the primary queue during exec_queue
> destroy ioctl, kill it when all the secondary queues of the group
> are killed.
>
> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
> ---
> drivers/gpu/drm/xe/xe_device.c | 7 ++-
> drivers/gpu/drm/xe/xe_exec_queue.c | 55 +++++++++++++++++++++++-
> drivers/gpu/drm/xe/xe_exec_queue.h | 2 +
> drivers/gpu/drm/xe/xe_exec_queue_types.h | 4 ++
> include/uapi/drm/xe_drm.h | 5 +++
> 5 files changed, 70 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
> index 0b496676527a..708a17c357e6 100644
> --- a/drivers/gpu/drm/xe/xe_device.c
> +++ b/drivers/gpu/drm/xe/xe_device.c
> @@ -176,7 +176,12 @@ static void xe_file_close(struct drm_device *dev, struct drm_file *file)
> xa_for_each(&xef->exec_queue.xa, idx, q) {
> if (q->vm && q->hwe->hw_engine_group)
> xe_hw_engine_group_del_exec_queue(q->hwe->hw_engine_group, q);
> - xe_exec_queue_kill(q);
> +
> + if (xe_exec_queue_is_multi_queue_primary(q))
> + xe_exec_queue_group_kill_put(q->multi_queue.group);
> + else
> + xe_exec_queue_kill(q);
> +
> xe_exec_queue_put(q);
> }
> xa_for_each(&xef->vm.xa, idx, vm)
> diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
> index 3c1bb4f10fd5..d7b0173691c1 100644
> --- a/drivers/gpu/drm/xe/xe_exec_queue.c
> +++ b/drivers/gpu/drm/xe/xe_exec_queue.c
> @@ -405,6 +405,26 @@ struct xe_exec_queue *xe_exec_queue_create_bind(struct xe_device *xe,
> }
> ALLOW_ERROR_INJECTION(xe_exec_queue_create_bind, ERRNO);
>
> +static void xe_exec_queue_group_kill(struct kref *ref)
> +{
> + struct xe_exec_queue_group *group = container_of(ref, struct xe_exec_queue_group,
> + kill_refcount);
> + xe_exec_queue_kill(group->primary);
> +}
> +
> +static inline void xe_exec_queue_group_kill_get(struct xe_exec_queue_group *group)
> +{
> + kref_get(&group->kill_refcount);
> +}
> +
> +void xe_exec_queue_group_kill_put(struct xe_exec_queue_group *group)
> +{
> + if (!group)
> + return;
> +
> + kref_put(&group->kill_refcount, xe_exec_queue_group_kill);
> +}
> +
> void xe_exec_queue_destroy(struct kref *ref)
> {
> struct xe_exec_queue *q = container_of(ref, struct xe_exec_queue, refcount);
> @@ -607,6 +627,7 @@ static int xe_exec_queue_group_init(struct xe_device *xe, struct xe_exec_queue *
> group->primary = q;
> group->cgp_bo = bo;
> INIT_LIST_HEAD(&group->list);
> + kref_init(&group->kill_refcount);
> xa_init_flags(&group->xa, XA_FLAGS_ALLOC1);
> mutex_init(&group->lock);
> mutex_init(&group->list_lock);
> @@ -675,6 +696,11 @@ static int xe_exec_queue_group_add(struct xe_device *xe, struct xe_exec_queue *q
> q->multi_queue.pos = pos;
> mutex_unlock(&group->lock);
>
> + if (group->primary->multi_queue.keep_active) {
> + xe_exec_queue_group_kill_get(group);
> + q->multi_queue.keep_active = true;
> + }
> +
> return 0;
> }
>
> @@ -691,6 +717,11 @@ static void xe_exec_queue_group_delete(struct xe_exec_queue *q)
> if (lrc)
> xe_lrc_put(lrc);
> mutex_unlock(&group->lock);
> +
> + if (q->multi_queue.keep_active) {
> + xe_exec_queue_group_kill_put(group);
> + q->multi_queue.keep_active = false;
> + }
> }
>
> static int exec_queue_set_multi_group(struct xe_device *xe, struct xe_exec_queue *q,
> @@ -709,12 +740,24 @@ static int exec_queue_set_multi_group(struct xe_device *xe, struct xe_exec_queue
> return -EINVAL;
>
> if (value & DRM_XE_MULTI_GROUP_CREATE) {
> - if (XE_IOCTL_DBG(xe, value & ~DRM_XE_MULTI_GROUP_CREATE))
> + if (XE_IOCTL_DBG(xe, value & ~(DRM_XE_MULTI_GROUP_CREATE |
> + DRM_XE_MULTI_GROUP_KEEP_ACTIVE)))
> + return -EINVAL;
> +
> + /*
> + * KEEP_ACTIVE is not supported in preempt fence mode as in that mode,
> + * VM_DESTROY ioctl expects all exec queues of that VM are already killed.
> + */
> + if (XE_IOCTL_DBG(xe, (value & DRM_XE_MULTI_GROUP_KEEP_ACTIVE) &&
> + xe_vm_in_preempt_fence_mode(q->vm)))
> return -EINVAL;
>
> q->multi_queue.valid = true;
> q->multi_queue.is_primary = true;
> q->multi_queue.pos = 0;
> + if (value & DRM_XE_MULTI_GROUP_KEEP_ACTIVE)
> + q->multi_queue.keep_active = true;
> +
> return 0;
> }
>
> @@ -1254,6 +1297,11 @@ void xe_exec_queue_kill(struct xe_exec_queue *q)
>
> q->ops->kill(q);
> xe_vm_remove_compute_exec_queue(q->vm, q);
> +
> + if (!xe_exec_queue_is_multi_queue_primary(q) && q->multi_queue.keep_active) {
> + xe_exec_queue_group_kill_put(q->multi_queue.group);
> + q->multi_queue.keep_active = false;
> + }
This looks a little odd. Either you don't need to clear
multi_queue.keep_active because xe_exec_queue_kill can be called at most
once (though IIRC it can be called multiple times), or you need some
locking around multi_queue.keep_active, or it needs to be atomic, to
prevent multiple threads from calling xe_exec_queue_group_kill_put twice.
Matt
> }
>
> int xe_exec_queue_destroy_ioctl(struct drm_device *dev, void *data,
> @@ -1280,7 +1328,10 @@ int xe_exec_queue_destroy_ioctl(struct drm_device *dev, void *data,
> if (q->vm && q->hwe->hw_engine_group)
> xe_hw_engine_group_del_exec_queue(q->hwe->hw_engine_group, q);
>
> - xe_exec_queue_kill(q);
> + if (xe_exec_queue_is_multi_queue_primary(q))
> + xe_exec_queue_group_kill_put(q->multi_queue.group);
> + else
> + xe_exec_queue_kill(q);
>
> trace_xe_exec_queue_close(q);
> xe_exec_queue_put(q);
> diff --git a/drivers/gpu/drm/xe/xe_exec_queue.h b/drivers/gpu/drm/xe/xe_exec_queue.h
> index 61478b2e883b..b642341f1ede 100644
> --- a/drivers/gpu/drm/xe/xe_exec_queue.h
> +++ b/drivers/gpu/drm/xe/xe_exec_queue.h
> @@ -109,6 +109,8 @@ static inline struct xe_exec_queue *xe_exec_queue_multi_queue_primary(struct xe_
> return xe_exec_queue_is_multi_queue(q) ? q->multi_queue.group->primary : q;
> }
>
> +void xe_exec_queue_group_kill_put(struct xe_exec_queue_group *group);
> +
> bool xe_exec_queue_is_lr(struct xe_exec_queue *q);
>
> bool xe_exec_queue_is_idle(struct xe_exec_queue *q);
> diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h b/drivers/gpu/drm/xe/xe_exec_queue_types.h
> index e64b6588923e..cdca3afe838c 100644
> --- a/drivers/gpu/drm/xe/xe_exec_queue_types.h
> +++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h
> @@ -55,6 +55,8 @@ struct xe_exec_queue_group {
> struct list_head list;
> /** @list_lock: Secondary queue list lock */
> struct mutex list_lock;
> + /** @kill_refcount: ref count to kill primary queue */
> + struct kref kill_refcount;
> /** @sync_pending: CGP_SYNC_DONE g2h response pending */
> bool sync_pending;
> };
> @@ -152,6 +154,8 @@ struct xe_exec_queue {
> u8 valid:1;
> /** @multi_queue.is_primary: Is primary queue (Q0) of the group */
> u8 is_primary:1;
> + /** @multi_queue.keep_active: Keep the group active after primary is destroyed */
> + u8 keep_active:1;
> } multi_queue;
>
> /** @sched_props: scheduling properties */
> diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
> index d72151163e77..333fb38b3404 100644
> --- a/include/uapi/drm/xe_drm.h
> +++ b/include/uapi/drm/xe_drm.h
> @@ -1260,6 +1260,10 @@ struct drm_xe_vm_bind {
> * then a new multi-queue group is created with this queue as the primary queue
> * (Q0). Otherwise, the queue gets added to the multi-queue group whose primary
> * queue id is specified in the 'value' field.
> + * If the extension's 'value' field has %DRM_XE_MULTI_GROUP_KEEP_ACTIVE flag
> + * set, then the multi-queue group is kept active after the primary queue is
> + * destroyed.
> + *
> * - %DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_QUEUE_PRIORITY - Set the queue
> * priority within the multi-queue group.
> *
> @@ -1304,6 +1308,7 @@ struct drm_xe_exec_queue_create {
> #define DRM_XE_EXEC_QUEUE_SET_PROPERTY_PXP_TYPE 2
> #define DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_GROUP 3
> #define DRM_XE_MULTI_GROUP_CREATE (1ull << 63)
> +#define DRM_XE_MULTI_GROUP_KEEP_ACTIVE (1ull << 62)
> #define DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_QUEUE_PRIORITY 4
> /** @extensions: Pointer to the first extension struct, if any */
> __u64 extensions;
> --
> 2.43.0
>
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH 05/16] drm/xe/multi_queue: Handle invalid exec queue property setting
2025-10-31 18:29 ` [PATCH 05/16] drm/xe/multi_queue: Handle invalid exec queue property setting Niranjana Vishwanathapura
@ 2025-11-03 22:41 ` Matthew Brost
0 siblings, 0 replies; 61+ messages in thread
From: Matthew Brost @ 2025-11-03 22:41 UTC (permalink / raw)
To: Niranjana Vishwanathapura; +Cc: intel-xe
On Fri, Oct 31, 2025 at 11:29:25AM -0700, Niranjana Vishwanathapura wrote:
> Only the MULTI_QUEUE_PRIORITY property is valid for secondary queues
> of a multi queue group, and MULTI_QUEUE_PRIORITY only applies to
> queues of a multi queue group. Detect invalid user queue property
> settings and return an error.
>
> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
> ---
> drivers/gpu/drm/xe/xe_exec_queue.c | 66 ++++++++++++++++++++++++++----
> 1 file changed, 57 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
> index 0da256428916..78b3a0e2ddd3 100644
> --- a/drivers/gpu/drm/xe/xe_exec_queue.c
> +++ b/drivers/gpu/drm/xe/xe_exec_queue.c
> @@ -61,7 +61,7 @@ enum xe_exec_queue_sched_prop {
> };
>
> static int exec_queue_user_extensions(struct xe_device *xe, struct xe_exec_queue *q,
> - u64 extensions, int ext_number);
> + u64 extensions);
>
> static void xe_exec_queue_group_cleanup(struct xe_exec_queue *q)
> {
> @@ -206,7 +206,7 @@ static struct xe_exec_queue *__xe_exec_queue_alloc(struct xe_device *xe,
> * may set q->usm, must come before xe_lrc_create(),
> * may overwrite q->sched_props, must come before q->ops->init()
> */
> - err = exec_queue_user_extensions(xe, q, extensions, 0);
> + err = exec_queue_user_extensions(xe, q, extensions);
> if (err) {
> __xe_exec_queue_free(q);
> return ERR_PTR(err);
> @@ -747,9 +747,35 @@ static const xe_exec_queue_set_property_fn exec_queue_set_property_funcs[] = {
> exec_queue_set_multi_queue_priority,
> };
>
> +static int exec_queue_user_ext_check(struct xe_exec_queue *q, u64 properties)
> +{
> + u64 secondary_queue_valid_props = BIT_ULL(DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_GROUP) |
> + BIT_ULL(DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_QUEUE_PRIORITY);
> +
> + /*
> + * Only MULTI_QUEUE_PRIORITY property is valid for secondary queues of a
> + * multi-queue group.
> + */
> + if (xe_exec_queue_is_multi_queue_secondary(q) &&
> + properties & ~secondary_queue_valid_props)
> + return -EINVAL;
> +
> + return 0;
> +}
> +
> +static int exec_queue_user_ext_check_final(struct xe_exec_queue *q, u64 properties)
> +{
> + /* MULTI_QUEUE_PRIORITY only applies to multi-queue group queues */
> + if ((properties & BIT_ULL(DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_QUEUE_PRIORITY)) &&
> + !(properties & BIT_ULL(DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_GROUP)))
> + return -EINVAL;
> +
> + return 0;
> +}
> +
> static int exec_queue_user_ext_set_property(struct xe_device *xe,
> struct xe_exec_queue *q,
> - u64 extension)
> + u64 extension, u64 *properties)
> {
> u64 __user *address = u64_to_user_ptr(extension);
> struct drm_xe_ext_set_property ext;
> @@ -774,20 +800,25 @@ static int exec_queue_user_ext_set_property(struct xe_device *xe,
> if (!exec_queue_set_property_funcs[idx])
> return -EINVAL;
>
> + *properties |= BIT_ULL(idx);
> + err = exec_queue_user_ext_check(q, *properties);
> + if (XE_IOCTL_DBG(xe, err))
> + return err;
> +
> return exec_queue_set_property_funcs[idx](xe, q, ext.value);
> }
>
> typedef int (*xe_exec_queue_user_extension_fn)(struct xe_device *xe,
> struct xe_exec_queue *q,
> - u64 extension);
> + u64 extension, u64 *properties);
>
> static const xe_exec_queue_user_extension_fn exec_queue_user_extension_funcs[] = {
> [DRM_XE_EXEC_QUEUE_EXTENSION_SET_PROPERTY] = exec_queue_user_ext_set_property,
> };
>
> #define MAX_USER_EXTENSIONS 16
> -static int exec_queue_user_extensions(struct xe_device *xe, struct xe_exec_queue *q,
> - u64 extensions, int ext_number)
> +static int __exec_queue_user_extensions(struct xe_device *xe, struct xe_exec_queue *q,
> + u64 extensions, int ext_number, u64 *properties)
> {
> u64 __user *address = u64_to_user_ptr(extensions);
> struct drm_xe_user_extension ext;
> @@ -808,13 +839,30 @@ static int exec_queue_user_extensions(struct xe_device *xe, struct xe_exec_queue
>
> idx = array_index_nospec(ext.name,
> ARRAY_SIZE(exec_queue_user_extension_funcs));
> - err = exec_queue_user_extension_funcs[idx](xe, q, extensions);
> + err = exec_queue_user_extension_funcs[idx](xe, q, extensions, properties);
> if (XE_IOCTL_DBG(xe, err))
> return err;
>
> if (ext.next_extension)
> - return exec_queue_user_extensions(xe, q, ext.next_extension,
> - ++ext_number);
> + return __exec_queue_user_extensions(xe, q, ext.next_extension,
> + ++ext_number, properties);
> +
> + return 0;
> +}
> +
> +static int exec_queue_user_extensions(struct xe_device *xe, struct xe_exec_queue *q,
> + u64 extensions)
> +{
> + u64 properties = 0;
> + int err;
> +
> + err = __exec_queue_user_extensions(xe, q, extensions, 0, &properties);
> + if (XE_IOCTL_DBG(xe, err))
> + return err;
> +
> + err = exec_queue_user_ext_check_final(q, properties);
> + if (XE_IOCTL_DBG(xe, err))
> + return err;
>
> if (xe_exec_queue_is_multi_queue_primary(q)) {
> err = xe_exec_queue_group_init(xe, q);
> --
> 2.43.0
>
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH 02/16] drm/xe/multi_queue: Add user interface for multi queue support
2025-10-31 19:31 ` Matthew Brost
@ 2025-11-03 22:58 ` Niranjana Vishwanathapura
0 siblings, 0 replies; 61+ messages in thread
From: Niranjana Vishwanathapura @ 2025-11-03 22:58 UTC (permalink / raw)
To: Matthew Brost; +Cc: intel-xe
On Fri, Oct 31, 2025 at 12:31:24PM -0700, Matthew Brost wrote:
>On Fri, Oct 31, 2025 at 11:29:22AM -0700, Niranjana Vishwanathapura wrote:
>> Multi Queue is a new mode of execution supported by the compute and
>> blitter copy command streamers (CCS and BCS, respectively). It is an
>> enhancement of the existing hardware architecture and leverages the
>> same submission model. It enables support for efficient, parallel
>> execution of multiple queues within a single context. All the queues
>> of a group must use the same address space (VM).
>>
>> The new DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_GROUP execution queue
>> property supports creating a multi queue group and adding queues to
>> a queue group. All queues of a multi queue group share the same
>> context.
>>
>> An exec queue create ioctl call with the above property specified with
>> value DRM_XE_MULTI_GROUP_CREATE will create a new multi queue group
>> with the queue being created as the primary queue (aka q0) of the
>> group. To add secondary queues to the group, they need to be created
>> with the above property with the id of the primary queue as the value.
>> The properties of the primary queue (like priority, timeslice) apply
>> to the whole group, so these properties can't be set for secondary
>> queues of a group.
>>
>> Once destroyed, the secondary queues of a multi queue group can't be
>> replaced. However, they can be dynamically added to the group up to a
>> total of 64 queues per group. Once the primary queue is destroyed,
>> secondary queues can't be added to the queue group.
>>
>> Signed-off-by: Stuart Summers <stuart.summers@intel.com>
>> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
>> ---
>> drivers/gpu/drm/xe/xe_exec_queue.c | 191 ++++++++++++++++++++++-
>> drivers/gpu/drm/xe/xe_exec_queue.h | 47 ++++++
>> drivers/gpu/drm/xe/xe_exec_queue_types.h | 30 ++++
>> include/uapi/drm/xe_drm.h | 8 +
>> 4 files changed, 274 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
>> index 1b57d7c2cc94..86404a7c9fe4 100644
>> --- a/drivers/gpu/drm/xe/xe_exec_queue.c
>> +++ b/drivers/gpu/drm/xe/xe_exec_queue.c
>> @@ -12,6 +12,7 @@
>> #include <drm/drm_file.h>
>> #include <uapi/drm/xe_drm.h>
>>
>> +#include "xe_bo.h"
>> #include "xe_dep_scheduler.h"
>> #include "xe_device.h"
>> #include "xe_gt.h"
>> @@ -62,6 +63,32 @@ enum xe_exec_queue_sched_prop {
>> static int exec_queue_user_extensions(struct xe_device *xe, struct xe_exec_queue *q,
>> u64 extensions, int ext_number);
>>
>> +static void xe_exec_queue_group_cleanup(struct xe_exec_queue *q)
>> +{
>> + struct xe_exec_queue_group *group = q->multi_queue.group;
>> + struct xe_lrc *lrc;
>> + unsigned long idx;
>> +
>> + if (xe_exec_queue_is_multi_queue_secondary(q)) {
>> + xe_exec_queue_put(xe_exec_queue_multi_queue_primary(q));
>> + return;
>> + }
>> +
>> + if (!group)
>> + return;
>> +
>> + /* Primary queue cleanup */
>> + mutex_lock(&group->lock);
>
>I don't think you need the group->lock here. Xarrays have their own
>internal locking.
>
>We do use mutexes around xarrays in Xe, but that's to protect the object
>reference—not the xarray itself.
>
>For example, we follow this pattern:
>
>lock();
>obj = xa_find();
>if (obj)
> xe_obj_get(obj);
>unlock();
>
>Similarly, we apply a lock on the removal side. This prevents the object
>from being removed and a reference being dropped in parallel with a
>lookup (i.e., it avoids a use-after-free).
>
>We don’t always use this pattern correctly—some of that is legacy code
>we haven’t cleaned up yet—but we should.
>
>In your case, you're not protecting any object references (i.e., there's
>no lookup function involved), as far as I can tell. So there's no need
>for a lock here.
>
Ok, will remove.
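The primary queue cleanup then becomes roughly (sketch, relying on the
xarray's internal locking):

	xa_for_each(&group->xa, idx, lrc)
		xe_lrc_put(lrc);

	xa_destroy(&group->xa);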
Niranjana
>Matt
>
>> + xa_for_each(&group->xa, idx, lrc)
>> + xe_lrc_put(lrc);
>> + mutex_unlock(&group->lock);
>> +
>> + xa_destroy(&group->xa);
>> + mutex_destroy(&group->lock);
>> + xe_bo_unpin_map_no_vm(group->cgp_bo);
>> + kfree(group);
>> +}
>> +
>> static void __xe_exec_queue_free(struct xe_exec_queue *q)
>> {
>> int i;
>> @@ -72,6 +99,10 @@ static void __xe_exec_queue_free(struct xe_exec_queue *q)
>>
>> if (xe_exec_queue_uses_pxp(q))
>> xe_pxp_exec_queue_remove(gt_to_xe(q->gt)->pxp, q);
>> +
>> + if (xe_exec_queue_is_multi_queue(q))
>> + xe_exec_queue_group_cleanup(q);
>> +
>> if (q->vm)
>> xe_vm_put(q->vm);
>>
>> @@ -549,6 +580,148 @@ exec_queue_set_pxp_type(struct xe_device *xe, struct xe_exec_queue *q, u64 value
>> return xe_pxp_exec_queue_set_type(xe->pxp, q, DRM_XE_PXP_TYPE_HWDRM);
>> }
>>
>> +static int xe_exec_queue_group_init(struct xe_device *xe, struct xe_exec_queue *q)
>> +{
>> + struct xe_tile *tile = gt_to_tile(q->gt);
>> + struct xe_exec_queue_group *group;
>> + struct xe_bo *bo;
>> +
>> + group = kzalloc(sizeof(*group), GFP_KERNEL);
>> + if (!group)
>> + return -ENOMEM;
>> +
>> + bo = xe_bo_create_pin_map_novm(xe, tile, SZ_4K, ttm_bo_type_kernel,
>> + XE_BO_FLAG_VRAM_IF_DGFX(tile) |
>> + XE_BO_FLAG_GGTT, false);
>> + if (IS_ERR(bo)) {
>> + drm_err(&xe->drm, "CGP bo allocation for queue group failed: %ld\n",
>> + PTR_ERR(bo));
>> + kfree(group);
>> + return PTR_ERR(bo);
>> + }
>> +
>> + xe_map_memset(xe, &bo->vmap, 0, 0, SZ_4K);
>> +
>> + group->primary = q;
>> + group->cgp_bo = bo;
>> + xa_init_flags(&group->xa, XA_FLAGS_ALLOC1);
>> + mutex_init(&group->lock);
>> + mutex_init(&group->list_lock);
>> + q->multi_queue.group = group;
>> +
>> + return 0;
>> +}
>> +
>> +static inline bool xe_exec_queue_supports_multi_queue(struct xe_exec_queue *q)
>> +{
>> + return q->gt->info.multi_queue_enable_mask & BIT(q->class);
>> +}
>> +
>> +static int xe_exec_queue_group_validate(struct xe_device *xe, struct xe_exec_queue *q,
>> + u32 primary_id)
>> +{
>> + struct xe_exec_queue_group *group;
>> + struct xe_exec_queue *primary;
>> + int ret;
>> +
>> + primary = xe_exec_queue_lookup(q->vm->xef, primary_id);
>> + if (XE_IOCTL_DBG(xe, !primary))
>> + return -ENOENT;
>> +
>> + if (XE_IOCTL_DBG(xe, !xe_exec_queue_is_multi_queue_primary(primary)) ||
>> + XE_IOCTL_DBG(xe, q->vm != primary->vm) ||
>> + XE_IOCTL_DBG(xe, q->logical_mask != primary->logical_mask)) {
>> + ret = -EINVAL;
>> + goto put_primary;
>> + }
>> +
>> + group = primary->multi_queue.group;
>> + q->multi_queue.valid = true;
>> + q->multi_queue.group = group;
>> +
>> + return 0;
>> +put_primary:
>> + xe_exec_queue_put(primary);
>> + return ret;
>> +}
>> +
>> +#define XE_MAX_GROUP_SIZE 64
>> +static int xe_exec_queue_group_add(struct xe_device *xe, struct xe_exec_queue *q)
>> +{
>> + struct xe_exec_queue_group *group = q->multi_queue.group;
>> + u32 pos;
>> + int err;
>> +
>> + if (!xe_exec_queue_is_multi_queue_secondary(q))
>> + return 0;
>> +
>> + mutex_lock(&group->lock);
>> + err = xa_alloc(&group->xa, &pos, xe_lrc_get(q->lrc[0]),
>> + XA_LIMIT(1, XE_MAX_GROUP_SIZE - 1), GFP_KERNEL);
>> + if (XE_IOCTL_DBG(xe, err)) {
>> + xe_lrc_put(q->lrc[0]);
>> + mutex_unlock(&group->lock);
>> +
>> + /* It is invalid if queue group limit is exceeded */
>> + if (err == -EBUSY)
>> + err = -EINVAL;
>> +
>> + return err;
>> + }
>> +
>> + q->multi_queue.pos = pos;
>> + mutex_unlock(&group->lock);
>> +
>> + return 0;
>> +}
>> +
>> +static void xe_exec_queue_group_delete(struct xe_exec_queue *q)
>> +{
>> + struct xe_exec_queue_group *group = q->multi_queue.group;
>> + struct xe_lrc *lrc;
>> +
>> + if (!xe_exec_queue_is_multi_queue_secondary(q))
>> + return;
>> +
>> + mutex_lock(&group->lock);
>> + lrc = xa_erase(&group->xa, q->multi_queue.pos);
>> + if (lrc)
>> + xe_lrc_put(lrc);
>> + mutex_unlock(&group->lock);
>> +}
>> +
>> +static int exec_queue_set_multi_group(struct xe_device *xe, struct xe_exec_queue *q,
>> + u64 value)
>> +{
>> + if (XE_IOCTL_DBG(xe, !xe_exec_queue_supports_multi_queue(q)))
>> + return -ENODEV;
>> +
>> + if (XE_IOCTL_DBG(xe, !xe_device_uc_enabled(xe)))
>> + return -EOPNOTSUPP;
>> +
>> + if (XE_IOCTL_DBG(xe, xe_exec_queue_is_parallel(q)))
>> + return -EINVAL;
>> +
>> + if (XE_IOCTL_DBG(xe, xe_exec_queue_is_multi_queue(q)))
>> + return -EINVAL;
>> +
>> + if (value & DRM_XE_MULTI_GROUP_CREATE) {
>> + if (XE_IOCTL_DBG(xe, value & ~DRM_XE_MULTI_GROUP_CREATE))
>> + return -EINVAL;
>> +
>> + q->multi_queue.valid = true;
>> + q->multi_queue.is_primary = true;
>> + q->multi_queue.pos = 0;
>> + return 0;
>> + }
>> +
>> + /* While adding secondary queues, the upper 32 bits must be 0 */
>> + if (XE_IOCTL_DBG(xe, value & (~0ull << 32)))
>> + return -EINVAL;
>> +
>> + return xe_exec_queue_group_validate(xe, q, value);
>> +}
>> +
>> typedef int (*xe_exec_queue_set_property_fn)(struct xe_device *xe,
>> struct xe_exec_queue *q,
>> u64 value);
>> @@ -557,6 +730,7 @@ static const xe_exec_queue_set_property_fn exec_queue_set_property_funcs[] = {
>> [DRM_XE_EXEC_QUEUE_SET_PROPERTY_PRIORITY] = exec_queue_set_priority,
>> [DRM_XE_EXEC_QUEUE_SET_PROPERTY_TIMESLICE] = exec_queue_set_timeslice,
>> [DRM_XE_EXEC_QUEUE_SET_PROPERTY_PXP_TYPE] = exec_queue_set_pxp_type,
>> + [DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_GROUP] = exec_queue_set_multi_group,
>> };
>>
>> static int exec_queue_user_ext_set_property(struct xe_device *xe,
>> @@ -577,7 +751,8 @@ static int exec_queue_user_ext_set_property(struct xe_device *xe,
>> XE_IOCTL_DBG(xe, ext.pad) ||
>> XE_IOCTL_DBG(xe, ext.property != DRM_XE_EXEC_QUEUE_SET_PROPERTY_PRIORITY &&
>> ext.property != DRM_XE_EXEC_QUEUE_SET_PROPERTY_TIMESLICE &&
>> - ext.property != DRM_XE_EXEC_QUEUE_SET_PROPERTY_PXP_TYPE))
>> + ext.property != DRM_XE_EXEC_QUEUE_SET_PROPERTY_PXP_TYPE &&
>> + ext.property != DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_GROUP))
>> return -EINVAL;
>>
>> idx = array_index_nospec(ext.property, ARRAY_SIZE(exec_queue_set_property_funcs));
>> @@ -626,6 +801,12 @@ static int exec_queue_user_extensions(struct xe_device *xe, struct xe_exec_queue
>> return exec_queue_user_extensions(xe, q, ext.next_extension,
>> ++ext_number);
>>
>> + if (xe_exec_queue_is_multi_queue_primary(q)) {
>> + err = xe_exec_queue_group_init(xe, q);
>> + if (XE_IOCTL_DBG(xe, err))
>> + return err;
>> + }
>> +
>> return 0;
>> }
>>
>> @@ -780,12 +961,16 @@ int xe_exec_queue_create_ioctl(struct drm_device *dev, void *data,
>> if (IS_ERR(q))
>> return PTR_ERR(q);
>>
>> + err = xe_exec_queue_group_add(xe, q);
>> + if (XE_IOCTL_DBG(xe, err))
>> + goto put_exec_queue;
>> +
>> if (xe_vm_in_preempt_fence_mode(vm)) {
>> q->lr.context = dma_fence_context_alloc(1);
>>
>> err = xe_vm_add_compute_exec_queue(vm, q);
>> if (XE_IOCTL_DBG(xe, err))
>> - goto put_exec_queue;
>> + goto delete_queue_group;
>> }
>>
>> if (q->vm && q->hwe->hw_engine_group) {
>> @@ -808,6 +993,8 @@ int xe_exec_queue_create_ioctl(struct drm_device *dev, void *data,
>>
>> kill_exec_queue:
>> xe_exec_queue_kill(q);
>> +delete_queue_group:
>> + xe_exec_queue_group_delete(q);
>> put_exec_queue:
>> xe_exec_queue_put(q);
>> return err;
>> diff --git a/drivers/gpu/drm/xe/xe_exec_queue.h b/drivers/gpu/drm/xe/xe_exec_queue.h
>> index a4dfbe858bda..8cd6487018fa 100644
>> --- a/drivers/gpu/drm/xe/xe_exec_queue.h
>> +++ b/drivers/gpu/drm/xe/xe_exec_queue.h
>> @@ -62,6 +62,53 @@ static inline bool xe_exec_queue_uses_pxp(struct xe_exec_queue *q)
>> return q->pxp.type;
>> }
>>
>> +/**
>> + * xe_exec_queue_is_multi_queue() - Whether an exec_queue is part of a queue group.
>> + * @q: The exec_queue
>> + *
>> + * Return: True if the exec_queue is part of a queue group, false otherwise.
>> + */
>> +static inline bool xe_exec_queue_is_multi_queue(struct xe_exec_queue *q)
>> +{
>> + return q->multi_queue.valid;
>> +}
>> +
>> +/**
>> + * xe_exec_queue_is_multi_queue_primary() - Whether an exec_queue is primary queue
>> + * of a multi queue group.
>> + * @q: The exec_queue
>> + *
>> + * Return: True if @q is primary queue of a queue group, false otherwise.
>> + */
>> +static inline bool xe_exec_queue_is_multi_queue_primary(struct xe_exec_queue *q)
>> +{
>> + return q->multi_queue.is_primary;
>> +}
>> +
>> +/**
>> + * xe_exec_queue_is_multi_queue_secondary() - Whether an exec_queue is secondary queue
>> + * of a multi queue group.
>> + * @q: The exec_queue
>> + *
>> + * Return: True if @q is secondary queue of a queue group, false otherwise.
>> + */
>> +static inline bool xe_exec_queue_is_multi_queue_secondary(struct xe_exec_queue *q)
>> +{
>> + return xe_exec_queue_is_multi_queue(q) && !q->multi_queue.is_primary;
>> +}
>> +
>> +/**
>> + * xe_exec_queue_multi_queue_primary() - Get multi queue group's primary queue
>> + * @q: The exec_queue
>> + *
>> + * If @q belongs to a multi queue group, then the primary queue of the group will
>> + * be returned. Otherwise, @q will be returned.
>> + */
>> +static inline struct xe_exec_queue *xe_exec_queue_multi_queue_primary(struct xe_exec_queue *q)
>> +{
>> + return xe_exec_queue_is_multi_queue(q) ? q->multi_queue.group->primary : q;
>> +}
>> +
>> bool xe_exec_queue_is_lr(struct xe_exec_queue *q);
>>
>> bool xe_exec_queue_is_idle(struct xe_exec_queue *q);
>> diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h b/drivers/gpu/drm/xe/xe_exec_queue_types.h
>> index c8807268ec6c..3856776df5c4 100644
>> --- a/drivers/gpu/drm/xe/xe_exec_queue_types.h
>> +++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h
>> @@ -31,6 +31,24 @@ enum xe_exec_queue_priority {
>> XE_EXEC_QUEUE_PRIORITY_COUNT
>> };
>>
>> +/**
>> + * struct xe_exec_queue_group - Execution multi queue group
>> + *
>> + * Contains multi queue group information.
>> + */
>> +struct xe_exec_queue_group {
>> + /** @primary: Primary queue of this group */
>> + struct xe_exec_queue *primary;
>> + /** @lock: Queue group update lock */
>> + struct mutex lock;
>> + /** @cgp_bo: BO for the Context Group Page */
>> + struct xe_bo *cgp_bo;
>> + /** @xa: xarray to store LRCs */
>> + struct xarray xa;
>> + /** @list_lock: Secondary queue list lock */
>> + struct mutex list_lock;
>> +};
>> +
>> /**
>> * struct xe_exec_queue - Execution queue
>> *
>> @@ -110,6 +128,18 @@ struct xe_exec_queue {
>> struct xe_guc_exec_queue *guc;
>> };
>>
>> + /** @multi_queue: Multi queue information */
>> + struct {
>> + /** @multi_queue.group: Queue group information */
>> + struct xe_exec_queue_group *group;
>> + /** @multi_queue.pos: Position of queue within the multi-queue group */
>> + u8 pos;
>> + /** @multi_queue.valid: Queue belongs to a multi queue group */
>> + u8 valid:1;
>> + /** @multi_queue.is_primary: Is primary queue (Q0) of the group */
>> + u8 is_primary:1;
>> + } multi_queue;
>> +
>> /** @sched_props: scheduling properties */
>> struct {
>> /** @sched_props.timeslice_us: timeslice period in micro-seconds */
>> diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
>> index 47853659a705..d903b3a55ec1 100644
>> --- a/include/uapi/drm/xe_drm.h
>> +++ b/include/uapi/drm/xe_drm.h
>> @@ -1252,6 +1252,12 @@ struct drm_xe_vm_bind {
>> * Given that going into a power-saving state kills PXP HWDRM sessions,
>> * runtime PM will be blocked while queues of this type are alive.
>> * All PXP queues will be killed if a PXP invalidation event occurs.
>> + * - %DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_GROUP - Create a multi-queue group
>> + * or add secondary queues to a multi-queue group.
>> + * If the extension's 'value' field has %DRM_XE_MULTI_GROUP_CREATE flag set,
>> + * then a new multi-queue group is created with this queue as the primary queue
>> + * (Q0). Otherwise, the queue gets added to the multi-queue group whose primary
>> + * queue id is specified in the 'value' field.
>> *
>> * The example below shows how to use @drm_xe_exec_queue_create to create
>> * a simple exec_queue (no parallel submission) of class
>> @@ -1292,6 +1298,8 @@ struct drm_xe_exec_queue_create {
>> #define DRM_XE_EXEC_QUEUE_SET_PROPERTY_PRIORITY 0
>> #define DRM_XE_EXEC_QUEUE_SET_PROPERTY_TIMESLICE 1
>> #define DRM_XE_EXEC_QUEUE_SET_PROPERTY_PXP_TYPE 2
>> +#define DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_GROUP 3
>> +#define DRM_XE_MULTI_GROUP_CREATE (1ull << 63)
>> /** @extensions: Pointer to the first extension struct, if any */
>> __u64 extensions;
>>
>> --
>> 2.43.0
>>
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH 02/16] drm/xe/multi_queue: Add user interface for multi queue support
2025-11-02 0:23 ` Matthew Brost
@ 2025-11-03 22:59 ` Niranjana Vishwanathapura
0 siblings, 0 replies; 61+ messages in thread
From: Niranjana Vishwanathapura @ 2025-11-03 22:59 UTC (permalink / raw)
To: Matthew Brost; +Cc: intel-xe
On Sat, Nov 01, 2025 at 05:23:19PM -0700, Matthew Brost wrote:
>On Fri, Oct 31, 2025 at 11:29:22AM -0700, Niranjana Vishwanathapura wrote:
>> Multi Queue is a new mode of execution supported by the compute and
>> blitter copy command streamers (CCS and BCS, respectively). It is an
>> enhancement of the existing hardware architecture and leverages the
>> same submission model. It enables support for efficient, parallel
>> execution of multiple queues within a single context. All the queues
>> of a group must use the same address space (VM).
>>
>> The new DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_GROUP execution queue
>> property supports creating a multi queue group and adding queues to
>> a queue group. All queues of a multi queue group share the same
>> context.
>>
>> An exec queue create ioctl call with the above property specified with
>> value DRM_XE_MULTI_GROUP_CREATE will create a new multi queue group
>> with the queue being created as the primary queue (aka q0) of the
>> group. To add secondary queues to the group, they need to be created
>> with the above property with the id of the primary queue as the value.
>> The properties of the primary queue (like priority, timeslice) apply
>> to the whole group, so these properties can't be set for secondary
>> queues of a group.
>>
>> Once destroyed, the secondary queues of a multi queue group can't be
>> replaced. However, they can be dynamically added to the group up to a
>> total of 64 queues per group. Once the primary queue is destroyed,
>> secondary queues can't be added to the queue group.
>>
>> Signed-off-by: Stuart Summers <stuart.summers@intel.com>
>> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
>> ---
>> drivers/gpu/drm/xe/xe_exec_queue.c | 191 ++++++++++++++++++++++-
>> drivers/gpu/drm/xe/xe_exec_queue.h | 47 ++++++
>> drivers/gpu/drm/xe/xe_exec_queue_types.h | 30 ++++
>> include/uapi/drm/xe_drm.h | 8 +
>> 4 files changed, 274 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
>> index 1b57d7c2cc94..86404a7c9fe4 100644
>> --- a/drivers/gpu/drm/xe/xe_exec_queue.c
>> +++ b/drivers/gpu/drm/xe/xe_exec_queue.c
>> @@ -12,6 +12,7 @@
>> #include <drm/drm_file.h>
>> #include <uapi/drm/xe_drm.h>
>>
>> +#include "xe_bo.h"
>> #include "xe_dep_scheduler.h"
>> #include "xe_device.h"
>> #include "xe_gt.h"
>> @@ -62,6 +63,32 @@ enum xe_exec_queue_sched_prop {
>> static int exec_queue_user_extensions(struct xe_device *xe, struct xe_exec_queue *q,
>> u64 extensions, int ext_number);
>>
>> +static void xe_exec_queue_group_cleanup(struct xe_exec_queue *q)
>> +{
>> + struct xe_exec_queue_group *group = q->multi_queue.group;
>> + struct xe_lrc *lrc;
>> + unsigned long idx;
>> +
>> + if (xe_exec_queue_is_multi_queue_secondary(q)) {
>> + xe_exec_queue_put(xe_exec_queue_multi_queue_primary(q));
>> + return;
>> + }
>> +
>> + if (!group)
>> + return;
>> +
>> + /* Primary queue cleanup */
>> + mutex_lock(&group->lock);
>> + xa_for_each(&group->xa, idx, lrc)
>> + xe_lrc_put(lrc);
>> + mutex_unlock(&group->lock);
>> +
>> + xa_destroy(&group->xa);
>> + mutex_destroy(&group->lock);
>> + xe_bo_unpin_map_no_vm(group->cgp_bo);
>> + kfree(group);
>> +}
>> +
>> static void __xe_exec_queue_free(struct xe_exec_queue *q)
>> {
>> int i;
>> @@ -72,6 +99,10 @@ static void __xe_exec_queue_free(struct xe_exec_queue *q)
>>
>> if (xe_exec_queue_uses_pxp(q))
>> xe_pxp_exec_queue_remove(gt_to_xe(q->gt)->pxp, q);
>> +
>> + if (xe_exec_queue_is_multi_queue(q))
>> + xe_exec_queue_group_cleanup(q);
>> +
>> if (q->vm)
>> xe_vm_put(q->vm);
>>
>> @@ -549,6 +580,148 @@ exec_queue_set_pxp_type(struct xe_device *xe, struct xe_exec_queue *q, u64 value
>> return xe_pxp_exec_queue_set_type(xe->pxp, q, DRM_XE_PXP_TYPE_HWDRM);
>> }
>>
>> +static int xe_exec_queue_group_init(struct xe_device *xe, struct xe_exec_queue *q)
>> +{
>> + struct xe_tile *tile = gt_to_tile(q->gt);
>> + struct xe_exec_queue_group *group;
>> + struct xe_bo *bo;
>> +
>> + group = kzalloc(sizeof(*group), GFP_KERNEL);
>> + if (!group)
>> + return -ENOMEM;
>> +
>> + bo = xe_bo_create_pin_map_novm(xe, tile, SZ_4K, ttm_bo_type_kernel,
>> + XE_BO_FLAG_VRAM_IF_DGFX(tile) |
>> + XE_BO_FLAG_GGTT, false);
>> + if (IS_ERR(bo)) {
>> + drm_err(&xe->drm, "CGP bo allocation for queue group failed: %ld\n",
>> + PTR_ERR(bo));
>> + kfree(group);
>> + return PTR_ERR(bo);
>> + }
>> +
>> + xe_map_memset(xe, &bo->vmap, 0, 0, SZ_4K);
>> +
>> + group->primary = q;
>> + group->cgp_bo = bo;
>> + xa_init_flags(&group->xa, XA_FLAGS_ALLOC1);
>> + mutex_init(&group->lock);
>> + mutex_init(&group->list_lock);
>> + q->multi_queue.group = group;
>> +
>> + return 0;
>> +}
>> +
>> +static inline bool xe_exec_queue_supports_multi_queue(struct xe_exec_queue *q)
>> +{
>> + return q->gt->info.multi_queue_enable_mask & BIT(q->class);
>> +}
>> +
>> +static int xe_exec_queue_group_validate(struct xe_device *xe, struct xe_exec_queue *q,
>> + u32 primary_id)
>> +{
>> + struct xe_exec_queue_group *group;
>> + struct xe_exec_queue *primary;
>> + int ret;
>> +
>> + primary = xe_exec_queue_lookup(q->vm->xef, primary_id);
>> + if (XE_IOCTL_DBG(xe, !primary))
>> + return -ENOENT;
>> +
>> + if (XE_IOCTL_DBG(xe, !xe_exec_queue_is_multi_queue_primary(primary)) ||
>> + XE_IOCTL_DBG(xe, q->vm != primary->vm) ||
>> + XE_IOCTL_DBG(xe, q->logical_mask != primary->logical_mask)) {
>> + ret = -EINVAL;
>> + goto put_primary;
>> + }
>> +
>> + group = primary->multi_queue.group;
>> + q->multi_queue.valid = true;
>> + q->multi_queue.group = group;
>> +
>> + return 0;
>> +put_primary:
>> + xe_exec_queue_put(primary);
>> + return ret;
>> +}
>> +
>> +#define XE_MAX_GROUP_SIZE 64
>> +static int xe_exec_queue_group_add(struct xe_device *xe, struct xe_exec_queue *q)
>> +{
>> + struct xe_exec_queue_group *group = q->multi_queue.group;
>> + u32 pos;
>> + int err;
>> +
>> + if (!xe_exec_queue_is_multi_queue_secondary(q))
>> + return 0;
>> +
>> + mutex_lock(&group->lock);
>> + err = xa_alloc(&group->xa, &pos, xe_lrc_get(q->lrc[0]),
>> + XA_LIMIT(1, XE_MAX_GROUP_SIZE - 1), GFP_KERNEL);
>
>Not a complete review, just a few more quick comments.
>
>I see the documentation patches mention the ref counting scheme, but it
>is easy to overlook that. Can we have a quick inline comment indicating
>that the primary holds a reference to the secondary queues' LRCs?
>
Ok, will add the comment.
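Something along these lines, for example (sketch):

	/* The group (via the primary queue) holds a reference to each
	 * secondary queue's LRC until it is erased from the xarray.
	 */
	err = xa_alloc(&group->xa, &pos, xe_lrc_get(q->lrc[0]),
		       XA_LIMIT(1, XE_MAX_GROUP_SIZE - 1), GFP_KERNEL);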
Niranjana
>Matt
>
>> + if (XE_IOCTL_DBG(xe, err)) {
>> + xe_lrc_put(q->lrc[0]);
>> + mutex_unlock(&group->lock);
>> +
>> + /* It is invalid if queue group limit is exceeded */
>> + if (err == -EBUSY)
>> + err = -EINVAL;
>> +
>> + return err;
>> + }
>> +
>> + q->multi_queue.pos = pos;
>> + mutex_unlock(&group->lock);
>> +
>> + return 0;
>> +}
>> +
>> +static void xe_exec_queue_group_delete(struct xe_exec_queue *q)
>> +{
>> + struct xe_exec_queue_group *group = q->multi_queue.group;
>> + struct xe_lrc *lrc;
>> +
>> + if (!xe_exec_queue_is_multi_queue_secondary(q))
>> + return;
>> +
>> + mutex_lock(&group->lock);
>> + lrc = xa_erase(&group->xa, q->multi_queue.pos);
>> + if (lrc)
>> + xe_lrc_put(lrc);
>> + mutex_unlock(&group->lock);
>> +}
>> +
>> +static int exec_queue_set_multi_group(struct xe_device *xe, struct xe_exec_queue *q,
>> + u64 value)
>> +{
>> + if (XE_IOCTL_DBG(xe, !xe_exec_queue_supports_multi_queue(q)))
>> + return -ENODEV;
>> +
>> + if (XE_IOCTL_DBG(xe, !xe_device_uc_enabled(xe)))
>> + return -EOPNOTSUPP;
>> +
>> + if (XE_IOCTL_DBG(xe, xe_exec_queue_is_parallel(q)))
>> + return -EINVAL;
>> +
>> + if (XE_IOCTL_DBG(xe, xe_exec_queue_is_multi_queue(q)))
>> + return -EINVAL;
>> +
>> + if (value & DRM_XE_MULTI_GROUP_CREATE) {
>> + if (XE_IOCTL_DBG(xe, value & ~DRM_XE_MULTI_GROUP_CREATE))
>> + return -EINVAL;
>> +
>> + q->multi_queue.valid = true;
>> + q->multi_queue.is_primary = true;
>> + q->multi_queue.pos = 0;
>> + return 0;
>> + }
>> +
>> + /* While adding secondary queues, the upper 32 bits must be 0 */
>> + if (XE_IOCTL_DBG(xe, value & (~0ull << 32)))
>> + return -EINVAL;
>> +
>> + return xe_exec_queue_group_validate(xe, q, value);
>> +}
>> +
>> typedef int (*xe_exec_queue_set_property_fn)(struct xe_device *xe,
>> struct xe_exec_queue *q,
>> u64 value);
>> @@ -557,6 +730,7 @@ static const xe_exec_queue_set_property_fn exec_queue_set_property_funcs[] = {
>> [DRM_XE_EXEC_QUEUE_SET_PROPERTY_PRIORITY] = exec_queue_set_priority,
>> [DRM_XE_EXEC_QUEUE_SET_PROPERTY_TIMESLICE] = exec_queue_set_timeslice,
>> [DRM_XE_EXEC_QUEUE_SET_PROPERTY_PXP_TYPE] = exec_queue_set_pxp_type,
>> + [DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_GROUP] = exec_queue_set_multi_group,
>> };
>>
>> static int exec_queue_user_ext_set_property(struct xe_device *xe,
>> @@ -577,7 +751,8 @@ static int exec_queue_user_ext_set_property(struct xe_device *xe,
>> XE_IOCTL_DBG(xe, ext.pad) ||
>> XE_IOCTL_DBG(xe, ext.property != DRM_XE_EXEC_QUEUE_SET_PROPERTY_PRIORITY &&
>> ext.property != DRM_XE_EXEC_QUEUE_SET_PROPERTY_TIMESLICE &&
>> - ext.property != DRM_XE_EXEC_QUEUE_SET_PROPERTY_PXP_TYPE))
>> + ext.property != DRM_XE_EXEC_QUEUE_SET_PROPERTY_PXP_TYPE &&
>> + ext.property != DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_GROUP))
>> return -EINVAL;
>>
>> idx = array_index_nospec(ext.property, ARRAY_SIZE(exec_queue_set_property_funcs));
>> @@ -626,6 +801,12 @@ static int exec_queue_user_extensions(struct xe_device *xe, struct xe_exec_queue
>> return exec_queue_user_extensions(xe, q, ext.next_extension,
>> ++ext_number);
>>
>> + if (xe_exec_queue_is_multi_queue_primary(q)) {
>> + err = xe_exec_queue_group_init(xe, q);
>> + if (XE_IOCTL_DBG(xe, err))
>> + return err;
>> + }
>> +
>> return 0;
>> }
>>
>> @@ -780,12 +961,16 @@ int xe_exec_queue_create_ioctl(struct drm_device *dev, void *data,
>> if (IS_ERR(q))
>> return PTR_ERR(q);
>>
>> + err = xe_exec_queue_group_add(xe, q);
>> + if (XE_IOCTL_DBG(xe, err))
>> + goto put_exec_queue;
>> +
>> if (xe_vm_in_preempt_fence_mode(vm)) {
>> q->lr.context = dma_fence_context_alloc(1);
>>
>> err = xe_vm_add_compute_exec_queue(vm, q);
>> if (XE_IOCTL_DBG(xe, err))
>> - goto put_exec_queue;
>> + goto delete_queue_group;
>> }
>>
>> if (q->vm && q->hwe->hw_engine_group) {
>> @@ -808,6 +993,8 @@ int xe_exec_queue_create_ioctl(struct drm_device *dev, void *data,
>>
>> kill_exec_queue:
>> xe_exec_queue_kill(q);
>> +delete_queue_group:
>> + xe_exec_queue_group_delete(q);
>> put_exec_queue:
>> xe_exec_queue_put(q);
>> return err;
>> diff --git a/drivers/gpu/drm/xe/xe_exec_queue.h b/drivers/gpu/drm/xe/xe_exec_queue.h
>> index a4dfbe858bda..8cd6487018fa 100644
>> --- a/drivers/gpu/drm/xe/xe_exec_queue.h
>> +++ b/drivers/gpu/drm/xe/xe_exec_queue.h
>> @@ -62,6 +62,53 @@ static inline bool xe_exec_queue_uses_pxp(struct xe_exec_queue *q)
>> return q->pxp.type;
>> }
>>
>> +/**
>> + * xe_exec_queue_is_multi_queue() - Whether an exec_queue is part of a queue group.
>> + * @q: The exec_queue
>> + *
>> + * Return: True if the exec_queue is part of a queue group, false otherwise.
>> + */
>> +static inline bool xe_exec_queue_is_multi_queue(struct xe_exec_queue *q)
>> +{
>> + return q->multi_queue.valid;
>> +}
>> +
>> +/**
>> + * xe_exec_queue_is_multi_queue_primary() - Whether an exec_queue is primary queue
>> + * of a multi queue group.
>> + * @q: The exec_queue
>> + *
>> + * Return: True if @q is primary queue of a queue group, false otherwise.
>> + */
>> +static inline bool xe_exec_queue_is_multi_queue_primary(struct xe_exec_queue *q)
>> +{
>> + return q->multi_queue.is_primary;
>> +}
>> +
>> +/**
>> + * xe_exec_queue_is_multi_queue_secondary() - Whether an exec_queue is secondary queue
>> + * of a multi queue group.
>> + * @q: The exec_queue
>> + *
>> + * Return: True if @q is secondary queue of a queue group, false otherwise.
>> + */
>> +static inline bool xe_exec_queue_is_multi_queue_secondary(struct xe_exec_queue *q)
>> +{
>> + return xe_exec_queue_is_multi_queue(q) && !q->multi_queue.is_primary;
>> +}
>> +
>> +/**
>> + * xe_exec_queue_multi_queue_primary() - Get multi queue group's primary queue
>> + * @q: The exec_queue
>> + *
>> + * If @q belongs to a multi queue group, then the primary queue of the group will
>> + * be returned. Otherwise, @q will be returned.
>> + */
>> +static inline struct xe_exec_queue *xe_exec_queue_multi_queue_primary(struct xe_exec_queue *q)
>> +{
>> + return xe_exec_queue_is_multi_queue(q) ? q->multi_queue.group->primary : q;
>> +}
>> +
>> bool xe_exec_queue_is_lr(struct xe_exec_queue *q);
>>
>> bool xe_exec_queue_is_idle(struct xe_exec_queue *q);
>> diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h b/drivers/gpu/drm/xe/xe_exec_queue_types.h
>> index c8807268ec6c..3856776df5c4 100644
>> --- a/drivers/gpu/drm/xe/xe_exec_queue_types.h
>> +++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h
>> @@ -31,6 +31,24 @@ enum xe_exec_queue_priority {
>> XE_EXEC_QUEUE_PRIORITY_COUNT
>> };
>>
>> +/**
>> + * struct xe_exec_queue_group - Execution multi queue group
>> + *
>> + * Contains multi queue group information.
>> + */
>> +struct xe_exec_queue_group {
>> + /** @primary: Primary queue of this group */
>> + struct xe_exec_queue *primary;
>> + /** @lock: Queue group update lock */
>> + struct mutex lock;
>> + /** @cgp_bo: BO for the Context Group Page */
>> + struct xe_bo *cgp_bo;
>> + /** @xa: xarray to store LRCs */
>> + struct xarray xa;
>> + /** @list_lock: Secondary queue list lock */
>> + struct mutex list_lock;
>> +};
>> +
>> /**
>> * struct xe_exec_queue - Execution queue
>> *
>> @@ -110,6 +128,18 @@ struct xe_exec_queue {
>> struct xe_guc_exec_queue *guc;
>> };
>>
>> + /** @multi_queue: Multi queue information */
>> + struct {
>> + /** @multi_queue.group: Queue group information */
>> + struct xe_exec_queue_group *group;
>> + /** @multi_queue.pos: Position of queue within the multi-queue group */
>> + u8 pos;
>> + /** @multi_queue.valid: Queue belongs to a multi queue group */
>> + u8 valid:1;
>> + /** @multi_queue.is_primary: Is primary queue (Q0) of the group */
>> + u8 is_primary:1;
>> + } multi_queue;
>> +
>> /** @sched_props: scheduling properties */
>> struct {
>> /** @sched_props.timeslice_us: timeslice period in micro-seconds */
>> diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
>> index 47853659a705..d903b3a55ec1 100644
>> --- a/include/uapi/drm/xe_drm.h
>> +++ b/include/uapi/drm/xe_drm.h
>> @@ -1252,6 +1252,12 @@ struct drm_xe_vm_bind {
>> * Given that going into a power-saving state kills PXP HWDRM sessions,
>> * runtime PM will be blocked while queues of this type are alive.
>> * All PXP queues will be killed if a PXP invalidation event occurs.
>> + * - %DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_GROUP - Create a multi-queue group
>> + * or add secondary queues to a multi-queue group.
>> + * If the extension's 'value' field has %DRM_XE_MULTI_GROUP_CREATE flag set,
>> + * then a new multi-queue group is created with this queue as the primary queue
>> + * (Q0). Otherwise, the queue gets added to the multi-queue group whose primary
>> + * queue id is specified in the 'value' field.
>> *
>> * The example below shows how to use @drm_xe_exec_queue_create to create
>> * a simple exec_queue (no parallel submission) of class
>> @@ -1292,6 +1298,8 @@ struct drm_xe_exec_queue_create {
>> #define DRM_XE_EXEC_QUEUE_SET_PROPERTY_PRIORITY 0
>> #define DRM_XE_EXEC_QUEUE_SET_PROPERTY_TIMESLICE 1
>> #define DRM_XE_EXEC_QUEUE_SET_PROPERTY_PXP_TYPE 2
>> +#define DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_GROUP 3
>> +#define DRM_XE_MULTI_GROUP_CREATE (1ull << 63)
>> /** @extensions: Pointer to the first extension struct, if any */
>> __u64 extensions;
>>
>> --
>> 2.43.0
>>
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH 02/16] drm/xe/multi_queue: Add user interface for multi queue support
2025-11-02 17:37 ` Matthew Brost
@ 2025-11-03 23:06 ` Niranjana Vishwanathapura
0 siblings, 0 replies; 61+ messages in thread
From: Niranjana Vishwanathapura @ 2025-11-03 23:06 UTC (permalink / raw)
To: Matthew Brost; +Cc: intel-xe
On Sun, Nov 02, 2025 at 09:37:35AM -0800, Matthew Brost wrote:
>On Fri, Oct 31, 2025 at 11:29:22AM -0700, Niranjana Vishwanathapura wrote:
>> Multi Queue is a new mode of execution supported by the compute and
>> blitter copy command streamers (CCS and BCS, respectively). It is an
>> enhancement of the existing hardware architecture and leverages the
>> same submission model. It enables support for efficient, parallel
>> execution of multiple queues within a single context. All the queues
>> of a group must use the same address space (VM).
>>
>> The new DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_GROUP execution queue
>> property supports creating a multi queue group and adding queues to
>> a queue group. All queues of a multi queue group share the same
>> context.
>>
>> An exec queue create ioctl call with the above property specified with
>> value DRM_XE_MULTI_GROUP_CREATE will create a new multi queue group
>> with the queue being created as the primary queue (aka q0) of the
>> group. To add secondary queues to the group, they need to be created
>> with the above property with the id of the primary queue as the value.
>> The properties of the primary queue (like priority, timeslice) apply
>> to the whole group, so these properties can't be set for secondary
>> queues of a group.
>>
>> Once destroyed, the secondary queues of a multi queue group can't be
>> replaced. However, they can be dynamically added to the group up to a
>> total of 64 queues per group. Once the primary queue is destroyed,
>> secondary queues can't be added to the queue group.
>>
>> Signed-off-by: Stuart Summers <stuart.summers@intel.com>
>> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
>> ---
>> drivers/gpu/drm/xe/xe_exec_queue.c | 191 ++++++++++++++++++++++-
>> drivers/gpu/drm/xe/xe_exec_queue.h | 47 ++++++
>> drivers/gpu/drm/xe/xe_exec_queue_types.h | 30 ++++
>> include/uapi/drm/xe_drm.h | 8 +
>> 4 files changed, 274 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
>> index 1b57d7c2cc94..86404a7c9fe4 100644
>> --- a/drivers/gpu/drm/xe/xe_exec_queue.c
>> +++ b/drivers/gpu/drm/xe/xe_exec_queue.c
>> @@ -12,6 +12,7 @@
>> #include <drm/drm_file.h>
>> #include <uapi/drm/xe_drm.h>
>>
>> +#include "xe_bo.h"
>> #include "xe_dep_scheduler.h"
>> #include "xe_device.h"
>> #include "xe_gt.h"
>> @@ -62,6 +63,32 @@ enum xe_exec_queue_sched_prop {
>> static int exec_queue_user_extensions(struct xe_device *xe, struct xe_exec_queue *q,
>> u64 extensions, int ext_number);
>>
>> +static void xe_exec_queue_group_cleanup(struct xe_exec_queue *q)
>> +{
>
>A little incongruent with xe_exec_queue_group_add/delete, as those
>functions are called unconditionally and check internally whether they
>apply, whereas for this function the caller checks if the queue is
>multi-queue. I don't have a huge preference, but I'd at least make the
>call semantics consistent.
>
Ok, will fix by calling xe_exec_queue_group_add/delete only for the
secondary queues.
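That is, roughly (sketch):

	if (xe_exec_queue_is_multi_queue_secondary(q)) {
		err = xe_exec_queue_group_add(xe, q);
		if (XE_IOCTL_DBG(xe, err))
			goto put_exec_queue;
	}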
>> + struct xe_exec_queue_group *group = q->multi_queue.group;
>> + struct xe_lrc *lrc;
>> + unsigned long idx;
>> +
>> + if (xe_exec_queue_is_multi_queue_secondary(q)) {
>> + xe_exec_queue_put(xe_exec_queue_multi_queue_primary(q));
>
>It took me a minute to figure out where the associated get on the
>primary came from - it is from xe_exec_queue_lookup in
>xe_exec_queue_group_validate. Can you add comments along these lines:
>
>/* Put pairs with get from ... */
>
>/* Get pairs with put in ... */
>
Ok, will add.
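For example (sketch):

	/* Put pairs with the get from xe_exec_queue_lookup() in
	 * xe_exec_queue_group_validate().
	 */
	xe_exec_queue_put(xe_exec_queue_multi_queue_primary(q));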
>> + return;
>> + }
>> +
>> + if (!group)
>> + return;
>> +
>> + /* Primary queue cleanup */
>> + mutex_lock(&group->lock);
>
>As discussed in [1], group->lock is not needed.
>
>[1] https://patchwork.freedesktop.org/patch/684847/?series=156865&rev=1#comment_1257408
>
Ok
>> + xa_for_each(&group->xa, idx, lrc)
>> + xe_lrc_put(lrc);
>> + mutex_unlock(&group->lock);
>> +
>> + xa_destroy(&group->xa);
>> + mutex_destroy(&group->lock);
>> + xe_bo_unpin_map_no_vm(group->cgp_bo);
>> + kfree(group);
>> +}
>> +
>> static void __xe_exec_queue_free(struct xe_exec_queue *q)
>> {
>> int i;
>> @@ -72,6 +99,10 @@ static void __xe_exec_queue_free(struct xe_exec_queue *q)
>>
>> if (xe_exec_queue_uses_pxp(q))
>> xe_pxp_exec_queue_remove(gt_to_xe(q->gt)->pxp, q);
>> +
>> + if (xe_exec_queue_is_multi_queue(q))
>> + xe_exec_queue_group_cleanup(q);
>> +
>> if (q->vm)
>> xe_vm_put(q->vm);
>>
>> @@ -549,6 +580,148 @@ exec_queue_set_pxp_type(struct xe_device *xe, struct xe_exec_queue *q, u64 value
>> return xe_pxp_exec_queue_set_type(xe->pxp, q, DRM_XE_PXP_TYPE_HWDRM);
>> }
>>
>> +static int xe_exec_queue_group_init(struct xe_device *xe, struct xe_exec_queue *q)
>> +{
>> + struct xe_tile *tile = gt_to_tile(q->gt);
>> + struct xe_exec_queue_group *group;
>> + struct xe_bo *bo;
>> +
>> + group = kzalloc(sizeof(*group), GFP_KERNEL);
>> + if (!group)
>> + return -ENOMEM;
>> +
>> + bo = xe_bo_create_pin_map_novm(xe, tile, SZ_4K, ttm_bo_type_kernel,
>> + XE_BO_FLAG_VRAM_IF_DGFX(tile) |
>> + XE_BO_FLAG_GGTT, false);
>
>XE_BO_FLAG_GGTT_INVALIDATE | XE_BO_FLAG_PINNED_LATE_RESTORE are needed.
>
>I believe XE_BO_FLAG_FORCE_USER_VRAM is needed too, that's new so not
>100% sure but I'd check the git blame on that to figure out if it is
>needed.
>
Will add XE_BO_FLAG_GGTT_INVALIDATE here.
Looking at the xe_bo_create_pin_map_novm() use case in xe_lrc.c, it
looks like XE_BO_FLAG_PINNED_LATE_RESTORE and XE_BO_FLAG_FORCE_USER_VRAM
are only used for user BOs. There isn't much documentation for these flags.
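The allocation would then look roughly like (sketch):

	bo = xe_bo_create_pin_map_novm(xe, tile, SZ_4K, ttm_bo_type_kernel,
				       XE_BO_FLAG_VRAM_IF_DGFX(tile) |
				       XE_BO_FLAG_GGTT |
				       XE_BO_FLAG_GGTT_INVALIDATE, false);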
>> + if (IS_ERR(bo)) {
>> + drm_err(&xe->drm, "CGP bo allocation for queue group failed: %ld\n",
>> + PTR_ERR(bo));
>> + kfree(group);
>> + return PTR_ERR(bo);
>> + }
>> +
>> + xe_map_memset(xe, &bo->vmap, 0, 0, SZ_4K);
>> +
>> + group->primary = q;
>> + group->cgp_bo = bo;
>> + xa_init_flags(&group->xa, XA_FLAGS_ALLOC1);
>> + mutex_init(&group->lock);
>> + mutex_init(&group->list_lock);
>
>See my comments in [1] about the list lock being initialized here but
>used/destroyed in [1].
>
>[1] https://patchwork.freedesktop.org/patch/684850/?series=156865&rev=1#comment_1257596
>
Ok, will fix.
>> + q->multi_queue.group = group;
>> +
>> + return 0;
>> +}
>> +
>> +static inline bool xe_exec_queue_supports_multi_queue(struct xe_exec_queue *q)
>> +{
>> + return q->gt->info.multi_queue_enable_mask & BIT(q->class);
>> +}
>> +
>> +static int xe_exec_queue_group_validate(struct xe_device *xe, struct xe_exec_queue *q,
>> + u32 primary_id)
>> +{
>> + struct xe_exec_queue_group *group;
>> + struct xe_exec_queue *primary;
>> + int ret;
>> +
>> + primary = xe_exec_queue_lookup(q->vm->xef, primary_id);
>> + if (XE_IOCTL_DBG(xe, !primary))
>> + return -ENOENT;
>> +
>> + if (XE_IOCTL_DBG(xe, !xe_exec_queue_is_multi_queue_primary(primary)) ||
>> + XE_IOCTL_DBG(xe, q->vm != primary->vm) ||
>> + XE_IOCTL_DBG(xe, q->logical_mask != primary->logical_mask)) {
>> + ret = -EINVAL;
>> + goto put_primary;
>> + }
>> +
>> + group = primary->multi_queue.group;
>> + q->multi_queue.valid = true;
>> + q->multi_queue.group = group;
>> +
>> + return 0;
>> +put_primary:
>> + xe_exec_queue_put(primary);
>> + return ret;
>> +}
>> +
>> +#define XE_MAX_GROUP_SIZE 64
>> +static int xe_exec_queue_group_add(struct xe_device *xe, struct xe_exec_queue *q)
>> +{
>> + struct xe_exec_queue_group *group = q->multi_queue.group;
>> + u32 pos;
>> + int err;
>> +
>> + if (!xe_exec_queue_is_multi_queue_secondary(q))
>> + return 0;
>> +
>> + mutex_lock(&group->lock);
>> + err = xa_alloc(&group->xa, &pos, xe_lrc_get(q->lrc[0]),
>> + XA_LIMIT(1, XE_MAX_GROUP_SIZE - 1), GFP_KERNEL);
>
>To consolidate threads [2], add quick inline comments here around ref
>counting.
>
>[2] https://patchwork.freedesktop.org/patch/684847/?series=156865&rev=1#comment_1257594
>
Ok
>> + if (XE_IOCTL_DBG(xe, err)) {
>> + xe_lrc_put(q->lrc[0]);
>> + mutex_unlock(&group->lock);
>> +
>> + /* It is invalid if queue group limit is exceeded */
>> + if (err == -EBUSY)
>> + err = -EINVAL;
>> +
>> + return err;
>> + }
>> +
>> + q->multi_queue.pos = pos;
>> + mutex_unlock(&group->lock);
>> +
>> + return 0;
>> +}
>> +
>> +static void xe_exec_queue_group_delete(struct xe_exec_queue *q)
>> +{
>> + struct xe_exec_queue_group *group = q->multi_queue.group;
>> + struct xe_lrc *lrc;
>> +
>> + if (!xe_exec_queue_is_multi_queue_secondary(q))
>> + return;
>> +
>> + mutex_lock(&group->lock);
>> + lrc = xa_erase(&group->xa, q->multi_queue.pos);
>> + if (lrc)
>>
>
>I think this should be an assert if lrc is NULL? I don't think it can
>be NULL unless there is a bug somewhere, right? If so, let's do an
>assert to ensure software correctness.
>
Yah, will add assert.
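e.g. (sketch):

	lrc = xa_erase(&group->xa, q->multi_queue.pos);
	xe_assert(gt_to_xe(q->gt), lrc);
	xe_lrc_put(lrc);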
>> + xe_lrc_put(lrc);
>> + mutex_unlock(&group->lock);
>> +}
>> +
>> +static int exec_queue_set_multi_group(struct xe_device *xe, struct xe_exec_queue *q,
>> + u64 value)
>> +{
>> + if (XE_IOCTL_DBG(xe, !xe_exec_queue_supports_multi_queue(q)))
>> + return -ENODEV;
>> +
>> + if (XE_IOCTL_DBG(xe, !xe_device_uc_enabled(xe)))
>> + return -EOPNOTSUPP;
>> +
>> + if (XE_IOCTL_DBG(xe, xe_exec_queue_is_parallel(q)))
>> + return -EINVAL;
>> +
>> + if (XE_IOCTL_DBG(xe, xe_exec_queue_is_multi_queue(q)))
>> + return -EINVAL;
>> +
>> + if (value & DRM_XE_MULTI_GROUP_CREATE) {
>> + if (XE_IOCTL_DBG(xe, value & ~DRM_XE_MULTI_GROUP_CREATE))
>> + return -EINVAL;
>> +
>> + q->multi_queue.valid = true;
>> + q->multi_queue.is_primary = true;
>> + q->multi_queue.pos = 0;
>> + return 0;
>> + }
>> +
>> + /* While adding secondary queues, the upper 32 bits must be 0 */
>
>State this in uAPI doc too.
>
Ok, will do.
>> + if (XE_IOCTL_DBG(xe, value & (~0ull << 32)))
>> + return -EINVAL;
>> +
>> + return xe_exec_queue_group_validate(xe, q, value);
>> +}
>> +
>> typedef int (*xe_exec_queue_set_property_fn)(struct xe_device *xe,
>> struct xe_exec_queue *q,
>> u64 value);
>> @@ -557,6 +730,7 @@ static const xe_exec_queue_set_property_fn exec_queue_set_property_funcs[] = {
>> [DRM_XE_EXEC_QUEUE_SET_PROPERTY_PRIORITY] = exec_queue_set_priority,
>> [DRM_XE_EXEC_QUEUE_SET_PROPERTY_TIMESLICE] = exec_queue_set_timeslice,
>> [DRM_XE_EXEC_QUEUE_SET_PROPERTY_PXP_TYPE] = exec_queue_set_pxp_type,
>> + [DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_GROUP] = exec_queue_set_multi_group,
>> };
>>
>> static int exec_queue_user_ext_set_property(struct xe_device *xe,
>> @@ -577,7 +751,8 @@ static int exec_queue_user_ext_set_property(struct xe_device *xe,
>> XE_IOCTL_DBG(xe, ext.pad) ||
>> XE_IOCTL_DBG(xe, ext.property != DRM_XE_EXEC_QUEUE_SET_PROPERTY_PRIORITY &&
>> ext.property != DRM_XE_EXEC_QUEUE_SET_PROPERTY_TIMESLICE &&
>> - ext.property != DRM_XE_EXEC_QUEUE_SET_PROPERTY_PXP_TYPE))
>> + ext.property != DRM_XE_EXEC_QUEUE_SET_PROPERTY_PXP_TYPE &&
>> + ext.property != DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_GROUP))
>> return -EINVAL;
>>
>> idx = array_index_nospec(ext.property, ARRAY_SIZE(exec_queue_set_property_funcs));
>> @@ -626,6 +801,12 @@ static int exec_queue_user_extensions(struct xe_device *xe, struct xe_exec_queue
>> return exec_queue_user_extensions(xe, q, ext.next_extension,
>> ++ext_number);
>>
>> + if (xe_exec_queue_is_multi_queue_primary(q)) {
>> + err = xe_exec_queue_group_init(xe, q);
>> + if (XE_IOCTL_DBG(xe, err))
>> + return err;
>> + }
>
>Any particular reason this isn't in exec_queue_set_multi_group? Or
>perhaps in xe_exec_queue_create_ioctl? It is a bit goofy to have it in
>a very generic function here.
>
xe_exec_queue_group_init() does the group and CGP allocations, and I
want to do those only after all extensions have been parsed. As
extensions can be at odds with each other and end up in an error
scenario, I think it is better to do the allocations only after
extension parsing has succeeded.
This code will look much better after the below patch in the series.
https://patchwork.freedesktop.org/patch/684848/?series=156865&rev=1
>> +
>> return 0;
>> }
>>
>> @@ -780,12 +961,16 @@ int xe_exec_queue_create_ioctl(struct drm_device *dev, void *data,
>> if (IS_ERR(q))
>> return PTR_ERR(q);
>>
>> + err = xe_exec_queue_group_add(xe, q);
>> + if (XE_IOCTL_DBG(xe, err))
>> + goto put_exec_queue;
>> +
>> if (xe_vm_in_preempt_fence_mode(vm)) {
>> q->lr.context = dma_fence_context_alloc(1);
>>
>> err = xe_vm_add_compute_exec_queue(vm, q);
>> if (XE_IOCTL_DBG(xe, err))
>> - goto put_exec_queue;
>> + goto delete_queue_group;
>> }
>>
>> if (q->vm && q->hwe->hw_engine_group) {
>> @@ -808,6 +993,8 @@ int xe_exec_queue_create_ioctl(struct drm_device *dev, void *data,
>>
>> kill_exec_queue:
>> xe_exec_queue_kill(q);
>> +delete_queue_group:
>> + xe_exec_queue_group_delete(q);
>> put_exec_queue:
>> xe_exec_queue_put(q);
>> return err;
>> diff --git a/drivers/gpu/drm/xe/xe_exec_queue.h b/drivers/gpu/drm/xe/xe_exec_queue.h
>> index a4dfbe858bda..8cd6487018fa 100644
>> --- a/drivers/gpu/drm/xe/xe_exec_queue.h
>> +++ b/drivers/gpu/drm/xe/xe_exec_queue.h
>> @@ -62,6 +62,53 @@ static inline bool xe_exec_queue_uses_pxp(struct xe_exec_queue *q)
>> return q->pxp.type;
>> }
>>
>> +/**
>> + * xe_exec_queue_is_multi_queue() - Whether an exec_queue is part of a queue group.
>> + * @q: The exec_queue
>> + *
>> + * Return: True if the exec_queue is part of a queue group, false otherwise.
>> + */
>> +static inline bool xe_exec_queue_is_multi_queue(struct xe_exec_queue *q)
>> +{
>> + return q->multi_queue.valid;
>> +}
>> +
>> +/**
>> + * xe_exec_queue_is_multi_queue_primary() - Whether an exec_queue is primary queue
>> + * of a multi queue group.
>> + * @q: The exec_queue
>> + *
>> + * Return: True if @q is primary queue of a queue group, false otherwise.
>> + */
>> +static inline bool xe_exec_queue_is_multi_queue_primary(struct xe_exec_queue *q)
>> +{
>> + return q->multi_queue.is_primary;
>> +}
>> +
>> +/**
>> + * xe_exec_queue_is_multi_queue_secondary() - Whether an exec_queue is secondary queue
>> + * of a multi queue group.
>> + * @q: The exec_queue
>> + *
>> + * Return: True if @q is secondary queue of a queue group, false otherwise.
>> + */
>> +static inline bool xe_exec_queue_is_multi_queue_secondary(struct xe_exec_queue *q)
>> +{
>> + return xe_exec_queue_is_multi_queue(q) && !q->multi_queue.is_primary;
>
>
>&& !xe_exec_queue_is_multi_queue_primary()
>
Ok
>> +}
>> +
>> +/**
>> + * xe_exec_queue_multi_queue_primary() - Get multi queue group's primary queue
>> + * @q: The exec_queue
>> + *
>> + * If @q belongs to a multi queue group, then the primary queue of the group will
>> + * be returned. Otherwise, @q will be returned.
>> + */
>> +static inline struct xe_exec_queue *xe_exec_queue_multi_queue_primary(struct xe_exec_queue *q)
>> +{
>> + return xe_exec_queue_is_multi_queue(q) ? q->multi_queue.group->primary : q;
>> +}
>> +
>> bool xe_exec_queue_is_lr(struct xe_exec_queue *q);
>>
>> bool xe_exec_queue_is_idle(struct xe_exec_queue *q);
>> diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h b/drivers/gpu/drm/xe/xe_exec_queue_types.h
>> index c8807268ec6c..3856776df5c4 100644
>> --- a/drivers/gpu/drm/xe/xe_exec_queue_types.h
>> +++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h
>> @@ -31,6 +31,24 @@ enum xe_exec_queue_priority {
>> XE_EXEC_QUEUE_PRIORITY_COUNT
>> };
>>
>> +/**
>> + * struct xe_exec_queue_group - Execution multi queue group
>> + *
>> + * Contains multi queue group information.
>> + */
>> +struct xe_exec_queue_group {
>> + /** @primary: Primary queue of this group */
>> + struct xe_exec_queue *primary;
>> + /** @lock: Queue group update lock */
>> + struct mutex lock;
>> + /** @cgp_bo: BO for the Context Group Page */
>> + struct xe_bo *cgp_bo;
>> + /** @xa: xarray to store LRCs */
>> + struct xarray xa;
>> + /** @list_lock: Secondary queue list lock */
>> + struct mutex list_lock;
>> +};
>> +
>> /**
>> * struct xe_exec_queue - Execution queue
>> *
>> @@ -110,6 +128,18 @@ struct xe_exec_queue {
>> struct xe_guc_exec_queue *guc;
>> };
>>
>> + /** @multi_queue: Multi queue information */
>> + struct {
>> + /** @multi_queue.group: Queue group information */
>> + struct xe_exec_queue_group *group;
>> + /** @multi_queue.pos: Position of queue within the multi-queue group */
>> + u8 pos;
>> + /** @multi_queue.valid: Queue belongs to a multi queue group */
>> + u8 valid:1;
>> + /** @multi_queue.is_primary: Is primary queue (Q0) of the group */
>> + u8 is_primary:1;
>> + } multi_queue;
>> +
>> /** @sched_props: scheduling properties */
>> struct {
>> /** @sched_props.timeslice_us: timeslice period in micro-seconds */
>> diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
>> index 47853659a705..d903b3a55ec1 100644
>> --- a/include/uapi/drm/xe_drm.h
>> +++ b/include/uapi/drm/xe_drm.h
>> @@ -1252,6 +1252,12 @@ struct drm_xe_vm_bind {
>> * Given that going into a power-saving state kills PXP HWDRM sessions,
>> * runtime PM will be blocked while queues of this type are alive.
>> * All PXP queues will be killed if a PXP invalidation event occurs.
>> + * - %DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_GROUP - Create a multi-queue group
>> + * or add secondary queues to a multi-queue group.
>> + * If the extension's 'value' field has %DRM_XE_MULTI_GROUP_CREATE flag set,
>> + * then a new multi-queue group is created with this queue as the primary queue
>> + * (Q0). Otherwise, the queue gets added to the multi-queue group whose primary
>> + * queue id is specified in the 'value' field.
>
>s/queue id/exec_queue_id
>
>^^^ to match names in structure.
>
Ok, will change.
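For reference, the flow described in the uapi comment above would look
roughly like this userspace sketch (a minimal sketch only; error handling
and the usual width/placement/vm setup are elided):

	struct drm_xe_ext_set_property ext = {
		.base.name = DRM_XE_EXEC_QUEUE_EXTENSION_SET_PROPERTY,
		.property = DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_GROUP,
		.value = DRM_XE_MULTI_GROUP_CREATE,	/* create group, this queue is Q0 */
	};
	struct drm_xe_exec_queue_create create = {
		.extensions = (__u64)(uintptr_t)&ext,
		/* width, num_placements, vm_id, instances set up as usual */
	};

	/* Primary queue (Q0) */
	ioctl(fd, DRM_IOCTL_XE_EXEC_QUEUE_CREATE, &create);

	/* Secondary queue: 'value' carries the primary's exec_queue_id */
	ext.value = create.exec_queue_id;
	ioctl(fd, DRM_IOCTL_XE_EXEC_QUEUE_CREATE, &create);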
Niranjana
>Matt
>
>> *
>> * The example below shows how to use @drm_xe_exec_queue_create to create
>> * a simple exec_queue (no parallel submission) of class
>> @@ -1292,6 +1298,8 @@ struct drm_xe_exec_queue_create {
>> #define DRM_XE_EXEC_QUEUE_SET_PROPERTY_PRIORITY 0
>> #define DRM_XE_EXEC_QUEUE_SET_PROPERTY_TIMESLICE 1
>> #define DRM_XE_EXEC_QUEUE_SET_PROPERTY_PXP_TYPE 2
>> +#define DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_GROUP 3
>> +#define DRM_XE_MULTI_GROUP_CREATE (1ull << 63)
>> /** @extensions: Pointer to the first extension struct, if any */
>> __u64 extensions;
>>
>> --
>> 2.43.0
>>
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH 09/16] drm/xe/multi_queue: Handle tearing down of a multi queue
2025-11-02 0:39 ` Matthew Brost
@ 2025-11-04 3:35 ` Niranjana Vishwanathapura
0 siblings, 0 replies; 61+ messages in thread
From: Niranjana Vishwanathapura @ 2025-11-04 3:35 UTC (permalink / raw)
To: Matthew Brost; +Cc: intel-xe
On Sat, Nov 01, 2025 at 05:39:10PM -0700, Matthew Brost wrote:
>On Fri, Oct 31, 2025 at 11:29:29AM -0700, Niranjana Vishwanathapura wrote:
>> All queues of a multi queue group use the primary queue of the group
>> to interface with GuC, so there is a dependency between the queues of
>> the group. Hence, when the primary queue of a multi queue group is
>> cleaned up, also trigger a cleanup of the secondary queues. During
>> cleanup, stop and re-start submission for all queues of the group to
>> avoid any submission happening in parallel while a queue is being
>> cleaned up.
>>
>> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
>> ---
>> drivers/gpu/drm/xe/xe_exec_queue.c | 2 +
>> drivers/gpu/drm/xe/xe_exec_queue_types.h | 4 +
>> drivers/gpu/drm/xe/xe_guc_submit.c | 150 +++++++++++++++++++----
>> 3 files changed, 134 insertions(+), 22 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
>> index 98f8f1c7f13b..3c1bb4f10fd5 100644
>> --- a/drivers/gpu/drm/xe/xe_exec_queue.c
>> +++ b/drivers/gpu/drm/xe/xe_exec_queue.c
>> @@ -85,6 +85,7 @@ static void xe_exec_queue_group_cleanup(struct xe_exec_queue *q)
>>
>> xa_destroy(&group->xa);
>> mutex_destroy(&group->lock);
>> + mutex_destroy(&group->list_lock);
>
>You init this lock in an earlier patch but destroy it in this one. Can we get
>the init/destroy/instantiation in a single patch?
>
Ok, will fix.
>> xe_bo_unpin_map_no_vm(group->cgp_bo);
>> kfree(group);
>> }
>> @@ -605,6 +606,7 @@ static int xe_exec_queue_group_init(struct xe_device *xe, struct xe_exec_queue *
>>
>> group->primary = q;
>> group->cgp_bo = bo;
>> + INIT_LIST_HEAD(&group->list);
>> xa_init_flags(&group->xa, XA_FLAGS_ALLOC1);
>> mutex_init(&group->lock);
>> mutex_init(&group->list_lock);
>
>group->list_lock is taken in the submission backend, which is entirely
>in the path of reclaim. Can we teach lockdep that this lock is in the
>path of reclaim?
>
>e.g.,
>
>fs_reclaim_acquire(GFP_KERNEL);
>might_lock(&group->list_lock);
>fs_reclaim_release(GFP_KERNEL);
>
Ok, will add.
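Something like this in xe_exec_queue_group_init(), as a sketch (using the
calls from your snippet; assumes a GFP_KERNEL context):

	mutex_init(&group->list_lock);

	/* Prime lockdep: list_lock may be taken in the reclaim path */
	fs_reclaim_acquire(GFP_KERNEL);
	might_lock(&group->list_lock);
	fs_reclaim_release(GFP_KERNEL);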
>> diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h b/drivers/gpu/drm/xe/xe_exec_queue_types.h
>> index dcb55b069ed8..e64b6588923e 100644
>> --- a/drivers/gpu/drm/xe/xe_exec_queue_types.h
>> +++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h
>> @@ -51,6 +51,8 @@ struct xe_exec_queue_group {
>> struct xe_bo *cgp_bo;
>> /** @xa: xarray to store LRCs */
>> struct xarray xa;
>> + /** @list: List of all secondary queues in the group */
>> + struct list_head list;
>> /** @list_lock: Secondary queue list lock */
>> struct mutex list_lock;
>> /** @sync_pending: CGP_SYNC_DONE g2h response pending */
>> @@ -140,6 +142,8 @@ struct xe_exec_queue {
>> struct {
>> /** @multi_queue.group: Queue group information */
>> struct xe_exec_queue_group *group;
>> + /** @multi_queue.link: Link into group's secondary queues list */
>> + struct list_head link;
>> /** @multi_queue.priority: Queue priority within the multi-queue group */
>> enum xe_multi_queue_priority priority;
>> /** @multi_queue.pos: Position of queue within the multi-queue group */
>> diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
>> index b84a0be2eefe..87c13feb2cef 100644
>> --- a/drivers/gpu/drm/xe/xe_guc_submit.c
>> +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
>> @@ -920,6 +920,81 @@ static void wq_item_append(struct xe_exec_queue *q)
>> parallel_write(xe, map, wq_desc.tail, q->guc->wqi_tail);
>> }
>>
>> +static void xe_guc_exec_queue_submission_start(struct xe_exec_queue *q)
>> +{
>> + /*
>> + * If the exec queue is part of a multi queue group, then start submission
>> + * on all queues of the multi queue group.
>> + */
>> + if (xe_exec_queue_is_multi_queue(q)) {
>> + struct xe_exec_queue *primary = xe_exec_queue_multi_queue_primary(q);
>> + struct xe_exec_queue_group *group = q->multi_queue.group;
>> + struct xe_exec_queue *eq;
>> +
>> + xe_sched_submission_start(&primary->guc->sched);
>> +
>> + mutex_lock(&group->list_lock);
>> + list_for_each_entry(eq, &group->list, multi_queue.link)
>> + xe_sched_submission_start(&eq->guc->sched);
>> + mutex_unlock(&group->list_lock);
>> + } else {
>> + xe_sched_submission_start(&q->guc->sched);
>> + }
>> +}
>> +
>> +static void xe_guc_exec_queue_submission_stop(struct xe_exec_queue *q)
>> +{
>> + /*
>> + * If the exec queue is part of a multi queue group, then stop submission
>> + * on all queues of the multi queue group.
>> + */
>> + if (xe_exec_queue_is_multi_queue(q)) {
>> + struct xe_exec_queue *primary = xe_exec_queue_multi_queue_primary(q);
>> + struct xe_exec_queue_group *group = q->multi_queue.group;
>> + struct xe_exec_queue *eq;
>> +
>> + xe_sched_submission_stop(&primary->guc->sched);
>> +
>> + mutex_lock(&group->list_lock);
>> + list_for_each_entry(eq, &group->list, multi_queue.link)
>> + xe_sched_submission_stop(&eq->guc->sched);
>> + mutex_unlock(&group->list_lock);
>> + } else {
>> + xe_sched_submission_stop(&q->guc->sched);
>> + }
>> +}
>> +
>> +static void xe_guc_exec_queue_trigger_cleanup(struct xe_exec_queue *q)
>> +{
>> + struct xe_guc *guc = exec_queue_to_guc(q);
>> + struct xe_device *xe = guc_to_xe(guc);
>> +
>> + /** to wakeup xe_wait_user_fence ioctl if exec queue is reset */
>> + wake_up_all(&xe->ufence_wq);
>> +
>> + if (xe_exec_queue_is_lr(q))
>> + queue_work(guc_to_gt(guc)->ordered_wq, &q->guc->lr_tdr);
>> + else
>> + xe_sched_tdr_queue_imm(&q->guc->sched);
>> +}
>> +
>> +static void xe_guc_exec_queue_trigger_secondary_cleanup(struct xe_exec_queue *q)
>> +{
>> + struct xe_exec_queue *primary = xe_exec_queue_multi_queue_primary(q);
>> + struct xe_exec_queue_group *group = q->multi_queue.group;
>> + struct xe_exec_queue *eq;
>> +
>> + mutex_lock(&group->list_lock);
>> + list_for_each_entry(eq, &group->list, multi_queue.link) {
>> + if (exec_queue_reset(primary))
>
>Do we need to propagate banned or killed?
>
As the guc interface is only with the primary queue, I am only
propagating the reset here, which gets set in the g2h error handlers.
The drm scheduler interface is on all queues, so I am not propagating
the others.
>Also, what happens if a secondary queue is reset or a job times out? Does that
>affect any of the other LRCs in the group?
>
As the guc interface is only with the primary, it can only reset the primary.
Timeout on the secondaries should not affect the other queues.
>> + set_exec_queue_reset(eq);
>> +
>> + if (!exec_queue_banned(eq))
>> + xe_guc_exec_queue_trigger_cleanup(eq);
>> + }
>> + mutex_unlock(&group->list_lock);
>> +}
>> +
>> #define RESUME_PENDING ~0x0ull
>> static void submit_exec_queue(struct xe_exec_queue *q, struct xe_sched_job *job)
>> {
>> @@ -1098,20 +1173,6 @@ static void disable_scheduling_deregister(struct xe_guc *guc,
>> G2H_LEN_DW_DEREGISTER_CONTEXT, 2);
>> }
>>
>> -static void xe_guc_exec_queue_trigger_cleanup(struct xe_exec_queue *q)
>> -{
>> - struct xe_guc *guc = exec_queue_to_guc(q);
>> - struct xe_device *xe = guc_to_xe(guc);
>> -
>> - /** to wakeup xe_wait_user_fence ioctl if exec queue is reset */
>> - wake_up_all(&xe->ufence_wq);
>> -
>> - if (xe_exec_queue_is_lr(q))
>> - queue_work(guc_to_gt(guc)->ordered_wq, &q->guc->lr_tdr);
>> - else
>> - xe_sched_tdr_queue_imm(&q->guc->sched);
>> -}
>> -
>> /**
>> * xe_guc_submit_wedge() - Wedge GuC submission
>> * @guc: the GuC object
>> @@ -1185,8 +1246,12 @@ static void xe_guc_exec_queue_lr_cleanup(struct work_struct *w)
>> if (!exec_queue_killed(q))
>> wedged = guc_submit_hint_wedged(exec_queue_to_guc(q));
>>
>> - /* Kill the run_job / process_msg entry points */
>> - xe_sched_submission_stop(sched);
>> + /*
>> + * Kill the run_job / process_msg entry points.
>> + * As this function is serialized across exec queues, it is safe to
>> + * stop and restart submission on all queues of a multi queue group.
>> + */
>> + xe_guc_exec_queue_submission_stop(q);
>>
>> /*
>> * Engine state now mostly stable, disable scheduling / deregister if
>> @@ -1222,7 +1287,7 @@ static void xe_guc_exec_queue_lr_cleanup(struct work_struct *w)
>> q->guc->id);
>> xe_devcoredump(q, NULL, "Schedule disable failed to respond, guc_id=%d\n",
>> q->guc->id);
>> - xe_sched_submission_start(sched);
>> + xe_guc_exec_queue_submission_start(q);
>> xe_gt_reset_async(q->gt);
>> return;
>> }
>> @@ -1233,7 +1298,11 @@ static void xe_guc_exec_queue_lr_cleanup(struct work_struct *w)
>>
>> xe_hw_fence_irq_stop(q->fence_irq);
>>
>> - xe_sched_submission_start(sched);
>> + xe_guc_exec_queue_submission_start(q);
>> +
>> + /* Trigger cleanup of secondary queues of multi queue group */
>> + if (xe_exec_queue_is_multi_queue_primary(q))
>> + xe_guc_exec_queue_trigger_secondary_cleanup(q);
>>
>> spin_lock(&sched->base.job_list_lock);
>> list_for_each_entry(job, &sched->base.pending_list, drm.list)
>> @@ -1392,8 +1461,12 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
>> vf_recovery(guc))
>> return DRM_GPU_SCHED_STAT_NO_HANG;
>>
>> - /* Kill the run_job entry point */
>> - xe_sched_submission_stop(sched);
>> + /*
>> + * Kill the run_job entry point.
>> + * As this function is serialized across exec queues, it is safe to
>> + * stop and restart submission on all queues of a multi queue group.
>> + */
>> + xe_guc_exec_queue_submission_stop(q);
>>
>
>I don't know where to stick this comment, but disable_scheduling() looks
>like a pure software things for secondary queues. We currently need the
>LRC to not be running to accurately sample the timestamp - I think we
>could fix that part, Umesh would likely know for sure. But until then
>I'm pretty sure we'd need to disable scheduling on the primary for an
>accurate sample of a secondary queue's LRC timestamp.
>
Hmm...not sure about LRC timestamping. Timestamping is done during TDR,
right? Looks like it should be fine for all cases except secondary job
timeouts. It is not trivial to disable scheduling of the primary in the
TDR handling of a secondary. It would be good if we don't have to do it.
Let me check with Umesh.
>> /* Must check all state after stopping scheduler */
>> skip_timeout_check = exec_queue_reset(q) ||
>> @@ -1552,7 +1625,7 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
>> * fences that are complete
>> */
>> xe_sched_add_pending_job(sched, job);
>> - xe_sched_submission_start(sched);
>> + xe_guc_exec_queue_submission_start(q);
>>
>> xe_guc_exec_queue_trigger_cleanup(q);
>>
>> @@ -1565,6 +1638,10 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
>> /* Start fence signaling */
>> xe_hw_fence_irq_start(q->fence_irq);
>>
>> + /* Trigger cleanup of secondary queues of multi queue group */
>> + if (xe_exec_queue_is_multi_queue_primary(q))
>> + xe_guc_exec_queue_trigger_secondary_cleanup(q);
>> +
>
>I'd stick this part by xe_guc_exec_queue_trigger_cleanup above.
>
I don't think that works. I had tried that before, but ran into issues.
I don't quite remember the issue exactly, but it is some kind of race
where the secondary cleanup triggers before or while the primary's TDR
is being run. So, it seemed much safer and cleaner to trigger the
secondary cleanups after the primary cleanup is done.
>> return DRM_GPU_SCHED_STAT_RESET;
>>
>> sched_enable:
>> @@ -1576,7 +1653,11 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
>> * but there is not currently an easy way to do in DRM scheduler. With
>> * some thought, do this in a follow up.
>> */
>> - xe_sched_submission_start(sched);
>> + xe_guc_exec_queue_submission_start(q);
>> +
>> + /* Trigger cleanup of secondary queues of multi queue group */
>> + if (xe_exec_queue_is_multi_queue_primary(q))
>> + xe_guc_exec_queue_trigger_secondary_cleanup(q);
>
>I don't think you need to trigger a cleanup here - this is no hang
>situation, rather a false timeout.
>
Ok, will remove.
Niranjana
>Matt
>
>> handle_vf_resume:
>> return DRM_GPU_SCHED_STAT_NO_HANG;
>> }
>> @@ -1607,6 +1688,14 @@ static void __guc_exec_queue_destroy_async(struct work_struct *w)
>> xe_pm_runtime_get(guc_to_xe(guc));
>> trace_xe_exec_queue_destroy(q);
>>
>> + if (xe_exec_queue_is_multi_queue_secondary(q)) {
>> + struct xe_exec_queue_group *group = q->multi_queue.group;
>> +
>> + mutex_lock(&group->list_lock);
>> + list_del(&q->multi_queue.link);
>> + mutex_unlock(&group->list_lock);
>> + }
>> +
>> if (xe_exec_queue_is_lr(q))
>> cancel_work_sync(&ge->lr_tdr);
>> /* Confirm no work left behind accessing device structures */
>> @@ -1897,6 +1986,19 @@ static int guc_exec_queue_init(struct xe_exec_queue *q)
>>
>> xe_exec_queue_assign_name(q, q->guc->id);
>>
>> + /*
>> + * Maintain secondary queues of the multi queue group in a list
>> + * for handling dependencies across the queues in the group.
>> + */
>> + if (xe_exec_queue_is_multi_queue_secondary(q)) {
>> + struct xe_exec_queue_group *group = q->multi_queue.group;
>> +
>> + INIT_LIST_HEAD(&q->multi_queue.link);
>> + mutex_lock(&group->list_lock);
>> + list_add_tail(&q->multi_queue.link, &group->list);
>> + mutex_unlock(&group->list_lock);
>> + }
>> +
>> trace_xe_exec_queue_create(q);
>>
>> return 0;
>> @@ -2125,6 +2227,10 @@ static void guc_exec_queue_resume(struct xe_exec_queue *q)
>>
>> static bool guc_exec_queue_reset_status(struct xe_exec_queue *q)
>> {
>> + if (xe_exec_queue_is_multi_queue_secondary(q) &&
>> + guc_exec_queue_reset_status(xe_exec_queue_multi_queue_primary(q)))
>> + return true;
>> +
>> return exec_queue_reset(q) || exec_queue_killed_or_banned_or_wedged(q);
>> }
>>
>> --
>> 2.43.0
>>
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH 03/16] drm/xe/multi_queue: Add GuC interface for multi queue support
2025-11-01 18:07 ` Matthew Brost
@ 2025-11-04 4:56 ` Niranjana Vishwanathapura
2025-11-04 17:41 ` Matthew Brost
0 siblings, 1 reply; 61+ messages in thread
From: Niranjana Vishwanathapura @ 2025-11-04 4:56 UTC (permalink / raw)
To: Matthew Brost; +Cc: intel-xe
On Sat, Nov 01, 2025 at 11:07:08AM -0700, Matthew Brost wrote:
>On Fri, Oct 31, 2025 at 11:29:23AM -0700, Niranjana Vishwanathapura wrote:
>> Implement GuC commands and response along with the Context
>> Group Page (CGP) interface for multi queue support.
>>
>> Ensure that only the primary queue (q0) of a multi queue group
>> communicates with GuC. The secondary queues of the group only
>> need to maintain the LRCA and interface with the drm scheduler.
>>
>> Use primary queue's submit_wq for all secondary queues of a multi
>> queue group. This serialization avoids any locking around CGP
>> synchronization with GuC.
>>
>
>Not a complete review, but a few comments.
>
>> Signed-off-by: Stuart Summers <stuart.summers@intel.com>
>> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
>> ---
>> drivers/gpu/drm/xe/abi/guc_actions_abi.h | 3 +
>> drivers/gpu/drm/xe/xe_exec_queue_types.h | 2 +
>> drivers/gpu/drm/xe/xe_guc_ct.c | 4 +
>> drivers/gpu/drm/xe/xe_guc_fwif.h | 3 +
>> drivers/gpu/drm/xe/xe_guc_submit.c | 302 +++++++++++++++++++----
>> drivers/gpu/drm/xe/xe_guc_submit.h | 1 +
>> 6 files changed, 270 insertions(+), 45 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/xe/abi/guc_actions_abi.h b/drivers/gpu/drm/xe/abi/guc_actions_abi.h
>> index 47756e4674a1..3e9fbed9cda6 100644
>> --- a/drivers/gpu/drm/xe/abi/guc_actions_abi.h
>> +++ b/drivers/gpu/drm/xe/abi/guc_actions_abi.h
>> @@ -139,6 +139,9 @@ enum xe_guc_action {
>> XE_GUC_ACTION_DEREGISTER_G2G = 0x4508,
>> XE_GUC_ACTION_DEREGISTER_CONTEXT_DONE = 0x4600,
>> XE_GUC_ACTION_REGISTER_CONTEXT_MULTI_LRC = 0x4601,
>> + XE_GUC_ACTION_REGISTER_CONTEXT_MULTI_QUEUE = 0x4602,
>> + XE_GUC_ACTION_MULTI_QUEUE_CONTEXT_CGP_SYNC = 0x4603,
>> + XE_GUC_ACTION_NOTIFY_MULTI_QUEUE_CONTEXT_CGP_SYNC_DONE = 0x4604,
>> XE_GUC_ACTION_CLIENT_SOFT_RESET = 0x5507,
>> XE_GUC_ACTION_SET_ENG_UTIL_BUFF = 0x550A,
>> XE_GUC_ACTION_SET_DEVICE_ENGINE_ACTIVITY_BUFFER = 0x550C,
>> diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h b/drivers/gpu/drm/xe/xe_exec_queue_types.h
>> index 3856776df5c4..38e47b003259 100644
>> --- a/drivers/gpu/drm/xe/xe_exec_queue_types.h
>> +++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h
>> @@ -47,6 +47,8 @@ struct xe_exec_queue_group {
>> struct xarray xa;
>> /** @list_lock: Secondary queue list lock */
>> struct mutex list_lock;
>> + /** @sync_pending: CGP_SYNC_DONE g2h response pending */
>> + bool sync_pending;
>> };
>>
>> /**
>> diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c
>> index e68953ef3a00..48b5006eb080 100644
>> --- a/drivers/gpu/drm/xe/xe_guc_ct.c
>> +++ b/drivers/gpu/drm/xe/xe_guc_ct.c
>> @@ -1304,6 +1304,7 @@ static int parse_g2h_event(struct xe_guc_ct *ct, u32 *msg, u32 len)
>> lockdep_assert_held(&ct->lock);
>>
>> switch (action) {
>> + case XE_GUC_ACTION_NOTIFY_MULTI_QUEUE_CONTEXT_CGP_SYNC_DONE:
>> case XE_GUC_ACTION_SCHED_CONTEXT_MODE_DONE:
>> case XE_GUC_ACTION_DEREGISTER_CONTEXT_DONE:
>> case XE_GUC_ACTION_SCHED_ENGINE_MODE_DONE:
>> @@ -1570,6 +1571,9 @@ static int process_g2h_msg(struct xe_guc_ct *ct, u32 *msg, u32 len)
>> ret = xe_guc_g2g_test_notification(guc, payload, adj_len);
>> break;
>> #endif
>> + case XE_GUC_ACTION_NOTIFY_MULTI_QUEUE_CONTEXT_CGP_SYNC_DONE:
>> + ret = xe_guc_exec_queue_cgp_sync_done_handler(guc, payload, adj_len);
>> + break;
>> default:
>> xe_gt_err(gt, "unexpected G2H action 0x%04x\n", action);
>> }
>> diff --git a/drivers/gpu/drm/xe/xe_guc_fwif.h b/drivers/gpu/drm/xe/xe_guc_fwif.h
>> index c90dd266e9cf..610dfb2f1cb5 100644
>> --- a/drivers/gpu/drm/xe/xe_guc_fwif.h
>> +++ b/drivers/gpu/drm/xe/xe_guc_fwif.h
>> @@ -16,6 +16,7 @@
>> #define G2H_LEN_DW_DEREGISTER_CONTEXT 3
>> #define G2H_LEN_DW_TLB_INVALIDATE 3
>> #define G2H_LEN_DW_G2G_NOTIFY_MIN 3
>> +#define G2H_LEN_DW_MULTI_QUEUE_CONTEXT 4
>
>This value doesn't look right. I'm not sure where 4 is coming from.
>
>The length of XE_GUC_ACTION_NOTIFY_MULTI_QUEUE_CONTEXT_CGP_SYNC_DONE
>appears to be 2. So with a value of 4, I believe the G2H credits will
>leak.
>
>You can run a multi-q test, then check the following debugfs:
>
>cat /sys/kernel/debug/dri/0/gt0/uc/guc_info
>
>In particular, these are the interesting fields:
>
>G2H CTB (all sizes in DW):
> ...
> resv_space: 16384
> ...
> g2h outstanding: 0
>
>^^^ This is what an idle G2H should look like. I suspect both G2H
>outstanding values will be non-zero, and resv_space will continuously
>decrease when running a multi-queue test.
>
Looks like G2H_LEN_DW_MULTI_QUEUE_CONTEXT should be 3:
2 dwords of header (HXG event) and 1 dword of payload. Will change.
However, I always saw 'g2h outstanding' being 0 and resv_space being 16384
after running the multi-queue tests, irrespective of whether I set
G2H_LEN_DW_MULTI_QUEUE_CONTEXT to 3 or 4.
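As a sketch, the corrected define and its breakdown would be:

	/* CGP_SYNC_DONE: 2 DW HXG event header + 1 DW payload (guc_id) */
	#define G2H_LEN_DW_MULTI_QUEUE_CONTEXT	3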
>>
>> #define GUC_ID_MAX 65535
>> #define GUC_ID_UNKNOWN 0xffffffff
>> @@ -62,6 +63,8 @@ struct guc_ctxt_registration_info {
>> u32 wq_base_lo;
>> u32 wq_base_hi;
>> u32 wq_size;
>> + u32 cgp_lo;
>> + u32 cgp_hi;
>> u32 hwlrca_lo;
>> u32 hwlrca_hi;
>> };
>> diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
>> index d4ffdb71ef3d..d2aa9a2524e7 100644
>> --- a/drivers/gpu/drm/xe/xe_guc_submit.c
>> +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
>> @@ -46,6 +46,7 @@
>> #include "xe_trace.h"
>> #include "xe_uc_fw.h"
>> #include "xe_vm.h"
>> +#include "xe_bo.h"
>>
>> static struct xe_guc *
>> exec_queue_to_guc(struct xe_exec_queue *q)
>> @@ -541,7 +542,8 @@ static void init_policies(struct xe_guc *guc, struct xe_exec_queue *q)
>> u32 slpc_exec_queue_freq_req = 0;
>> u32 preempt_timeout_us = q->sched_props.preempt_timeout_us;
>>
>> - xe_gt_assert(guc_to_gt(guc), exec_queue_registered(q));
>> + xe_gt_assert(guc_to_gt(guc), exec_queue_registered(q) &&
>> + !xe_exec_queue_is_multi_queue_secondary(q));
>>
>> if (q->flags & EXEC_QUEUE_FLAG_LOW_LATENCY)
>> slpc_exec_queue_freq_req |= SLPC_CTX_FREQ_REQ_IS_COMPUTE;
>> @@ -561,6 +563,8 @@ static void set_min_preemption_timeout(struct xe_guc *guc, struct xe_exec_queue
>> {
>> struct exec_queue_policy policy;
>>
>> + xe_assert(guc_to_xe(guc), !xe_exec_queue_is_multi_queue_secondary(q));
>> +
>> __guc_exec_queue_policy_start_klv(&policy, q->guc->id);
>> __guc_exec_queue_policy_add_preemption_timeout(&policy, 1);
>>
>> @@ -575,6 +579,130 @@ static void set_min_preemption_timeout(struct xe_guc *guc, struct xe_exec_queue
>> xe_map_wr_field(xe_, &map_, 0, struct guc_submit_parallel_scratch, \
>> field_, val_)
>>
>> +#define CGP_VERSION_MAJOR_SHIFT 8
>> +
>> +static void xe_guc_exec_queue_group_cgp_update(struct xe_device *xe,
>> + struct xe_exec_queue *q)
>> +{
>> + struct xe_exec_queue_group *group = q->multi_queue.group;
>> + u32 guc_id = group->primary->guc->id;
>> +
>> + /* Currently implementing CGP version 1.0 */
>> + xe_map_wr(xe, &group->cgp_bo->vmap, 0, u32,
>> + 1 << CGP_VERSION_MAJOR_SHIFT);
>> +
>> + xe_map_wr(xe, &group->cgp_bo->vmap,
>> + (32 + q->multi_queue.pos * 2) * sizeof(u32),
>> + u32, lower_32_bits(xe_lrc_descriptor(q->lrc[0])));
>> +
>> + xe_map_wr(xe, &group->cgp_bo->vmap,
>> + (33 + q->multi_queue.pos * 2) * sizeof(u32),
>> + u32, guc_id);
>> +
>> + if (q->multi_queue.pos / 32) {
>> + xe_map_wr(xe, &group->cgp_bo->vmap, 17 * sizeof(u32),
>> + u32, BIT(q->multi_queue.pos % 32));
>> + xe_map_wr(xe, &group->cgp_bo->vmap, 16 * sizeof(u32), u32, 0);
>> + } else {
>> + xe_map_wr(xe, &group->cgp_bo->vmap, 16 * sizeof(u32),
>> + u32, BIT(q->multi_queue.pos));
>> + xe_map_wr(xe, &group->cgp_bo->vmap, 17 * sizeof(u32), u32, 0);
>> + }
>> +}
>> +
>> +static void xe_guc_exec_queue_group_cgp_sync(struct xe_guc *guc,
>> + struct xe_exec_queue *q,
>> + const u32 *action, u32 len)
>> +{
>> + struct xe_exec_queue_group *group = q->multi_queue.group;
>> + struct xe_device *xe = guc_to_xe(guc);
>> + long ret;
>> +
>> + /*
>> + * As all queues of a multi queue group use a single drm scheduler
>> + * submit workqueue, CGP synchronization with GuC is serialized.
>> + * Hence, no locking is required here.
>> + * Wait for any pending CGP_SYNC_DONE response before updating the
>> + * CGP page and sending CGP_SYNC message.
>> + */
>> + ret = wait_event_timeout(guc->ct.wq,
>> + !READ_ONCE(group->sync_pending) ||
>> + xe_guc_read_stopped(guc), HZ);
>> + if (!ret || xe_guc_read_stopped(guc)) {
>> + drm_err(&xe->drm, "Wait for CGP_SYNC_DONE response failed!\n");
>
>If this occurs you need a GT reset which should detect
>group->sync_pending in guc_exec_queue_stop and clean it up.
>
hmm...ok, let me give that a try. Not sure how urgent this is, as
ideally it should never occur.
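A rough sketch of that cleanup (hypothetical placement in
guc_exec_queue_stop(); names per this series):

	if (xe_exec_queue_is_multi_queue_primary(q)) {
		struct xe_exec_queue_group *group = q->multi_queue.group;

		/* GuC is stopped; CGP_SYNC_DONE will never arrive, unblock waiters */
		WRITE_ONCE(group->sync_pending, false);
		wake_up_all(&guc->ct.wq);
	}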
>Also here is where VF migration needs to be considered. The
>wait_event_timeout should pop out on vf_recovery being set, but not
>trigger a GT reset. In this case we likely need some per-secondary
>queue tracking state to figure out which secondary queues lost the CGP
>syncs so that flow can recover. We can figure that part out a bit later
>though.
Hmm...ok.
>
>> + /* Something wrong with the CTB or GuC, no need to proceed */
>> + return;
>> + }
>> +
>> + xe_guc_exec_queue_group_cgp_update(xe, q);
>> +
>> + WRITE_ONCE(group->sync_pending, true);
>> + xe_guc_ct_send(&guc->ct, action, len, G2H_LEN_DW_MULTI_QUEUE_CONTEXT, 1);
>
>The problem here appears to be twofold:
>
>- The value of G2H_LEN_DW_MULTI_QUEUE_CONTEXT looks incorrect
>- On multi-q registration both G2H credits and count are set but multi-q
> register doesn't produce a G2H response. See my comment above about
> things getting leaked - that can't happen, as PM will be off and eventually
> G2H credits will run out and deadlock the CT channel, leading to a GT reset.
>
Responded above.
>> +}
>> +
>> +static void __register_exec_queue(struct xe_guc *guc,
>> + struct guc_ctxt_registration_info *info)
>> +{
>> + u32 action[] = {
>> + XE_GUC_ACTION_REGISTER_CONTEXT,
>> + info->flags,
>> + info->context_idx,
>> + info->engine_class,
>> + info->engine_submit_mask,
>> + info->wq_desc_lo,
>> + info->wq_desc_hi,
>> + info->wq_base_lo,
>> + info->wq_base_hi,
>> + info->wq_size,
>> + info->hwlrca_lo,
>> + info->hwlrca_hi,
>> + };
>> +
>> + /* explicitly checks some fields that we might fixup later */
>> + xe_gt_assert(guc_to_gt(guc), info->wq_desc_lo ==
>> + action[XE_GUC_REGISTER_CONTEXT_DATA_5_WQ_DESC_ADDR_LOWER]);
>> + xe_gt_assert(guc_to_gt(guc), info->wq_base_lo ==
>> + action[XE_GUC_REGISTER_CONTEXT_DATA_7_WQ_BUF_BASE_LOWER]);
>> + xe_gt_assert(guc_to_gt(guc), info->hwlrca_lo ==
>> + action[XE_GUC_REGISTER_CONTEXT_DATA_10_HW_LRC_ADDR]);
>> +
>> + xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action), 0, 0);
>> +}
>> +
>> +static void __register_exec_queue_group(struct xe_guc *guc,
>> + struct xe_exec_queue *q,
>> + struct guc_ctxt_registration_info *info)
>> +{
>> +#define MAX_MULTI_QUEUE_REG_SIZE (8)
>> + struct xe_device *xe = guc_to_xe(guc);
>> + u32 action[MAX_MULTI_QUEUE_REG_SIZE];
>> + int len = 0;
>> +
>> + if (xe_exec_queue_is_multi_queue_primary(q)) {
>> + action[len++] = XE_GUC_ACTION_REGISTER_CONTEXT_MULTI_QUEUE;
>
>Again as mentioned above, this command doesn't require G2H credits
>unless this produces a XE_GUC_ACTION_NOTIFY_MULTI_QUEUE_CONTEXT_CGP_SYNC_DONE
>response.
>
Yes, XE_GUC_ACTION_REGISTER_CONTEXT_MULTI_QUEUE will have a
XE_GUC_ACTION_NOTIFY_MULTI_QUEUE_CONTEXT_CGP_SYNC_DONE response from GuC.
>> + action[len++] = info->flags;
>> + action[len++] = info->context_idx;
>> + action[len++] = info->engine_class;
>> + action[len++] = info->engine_submit_mask;
>> + action[len++] = 0; /* Reserved */
>> + action[len++] = info->cgp_lo;
>> + action[len++] = info->cgp_hi;
>> + } else {
>> + /*
>> + * No need to wait before CGP sync since CT descriptors
>> + * should be ordered.
>> + */
>> +
>> + action[len++] = XE_GUC_ACTION_MULTI_QUEUE_CONTEXT_CGP_SYNC;
>> + action[len++] = q->multi_queue.group->primary->guc->id;
>> + }
>> +
>> + xe_assert(xe, len <= MAX_MULTI_QUEUE_REG_SIZE);
>> +#undef MAX_MULTI_QUEUE_REG_SIZE
>> +
>> + xe_guc_exec_queue_group_cgp_sync(guc, q, action, len);
>> +}
>> +
>> static void __register_mlrc_exec_queue(struct xe_guc *guc,
>> struct xe_exec_queue *q,
>> struct guc_ctxt_registration_info *info)
>> @@ -622,35 +750,6 @@ static void __register_mlrc_exec_queue(struct xe_guc *guc,
>> xe_guc_ct_send(&guc->ct, action, len, 0, 0);
>> }
>>
>> -static void __register_exec_queue(struct xe_guc *guc,
>> - struct guc_ctxt_registration_info *info)
>> -{
>> - u32 action[] = {
>> - XE_GUC_ACTION_REGISTER_CONTEXT,
>> - info->flags,
>> - info->context_idx,
>> - info->engine_class,
>> - info->engine_submit_mask,
>> - info->wq_desc_lo,
>> - info->wq_desc_hi,
>> - info->wq_base_lo,
>> - info->wq_base_hi,
>> - info->wq_size,
>> - info->hwlrca_lo,
>> - info->hwlrca_hi,
>> - };
>> -
>> - /* explicitly checks some fields that we might fixup later */
>> - xe_gt_assert(guc_to_gt(guc), info->wq_desc_lo ==
>> - action[XE_GUC_REGISTER_CONTEXT_DATA_5_WQ_DESC_ADDR_LOWER]);
>> - xe_gt_assert(guc_to_gt(guc), info->wq_base_lo ==
>> - action[XE_GUC_REGISTER_CONTEXT_DATA_7_WQ_BUF_BASE_LOWER]);
>> - xe_gt_assert(guc_to_gt(guc), info->hwlrca_lo ==
>> - action[XE_GUC_REGISTER_CONTEXT_DATA_10_HW_LRC_ADDR]);
>> -
>> - xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action), 0, 0);
>> -}
>> -
>> static void register_exec_queue(struct xe_exec_queue *q, int ctx_type)
>> {
>> struct xe_guc *guc = exec_queue_to_guc(q);
>> @@ -670,6 +769,13 @@ static void register_exec_queue(struct xe_exec_queue *q, int ctx_type)
>> info.flags = CONTEXT_REGISTRATION_FLAG_KMD |
>> FIELD_PREP(CONTEXT_REGISTRATION_FLAG_TYPE, ctx_type);
>>
>> + if (xe_exec_queue_is_multi_queue(q)) {
>> + struct xe_exec_queue_group *group = q->multi_queue.group;
>> +
>> + info.cgp_lo = xe_bo_ggtt_addr(group->cgp_bo);
>> + info.cgp_hi = 0;
>> + }
>> +
>> if (xe_exec_queue_is_parallel(q)) {
>> u64 ggtt_addr = xe_lrc_parallel_ggtt_addr(lrc);
>> struct iosys_map map = xe_lrc_parallel_map(lrc);
>> @@ -700,11 +806,15 @@ static void register_exec_queue(struct xe_exec_queue *q, int ctx_type)
>>
>> set_exec_queue_registered(q);
>> trace_xe_exec_queue_register(q);
>> - if (xe_exec_queue_is_parallel(q))
>> + if (xe_exec_queue_is_multi_queue(q))
>> + __register_exec_queue_group(guc, q, &info);
>> + else if (xe_exec_queue_is_parallel(q))
>> __register_mlrc_exec_queue(guc, q, &info);
>> else
>> __register_exec_queue(guc, &info);
>> - init_policies(guc, q);
>> +
>> + if (!xe_exec_queue_is_multi_queue_secondary(q))
>> + init_policies(guc, q);
>> }
>>
>> static u32 wq_space_until_wrap(struct xe_exec_queue *q)
>> @@ -833,6 +943,12 @@ static void submit_exec_queue(struct xe_exec_queue *q, struct xe_sched_job *job)
>> if (exec_queue_suspended(q) && !xe_exec_queue_is_parallel(q))
>> return;
>>
>> + /*
>> + * All queues in a multi-queue group will use the primary queue
>> + * of the group to interface with GuC.
>> + */
>> + q = xe_exec_queue_multi_queue_primary(q);
>> +
>> if (!exec_queue_enabled(q) && !exec_queue_suspended(q)) {
>> action[len++] = XE_GUC_ACTION_SCHED_CONTEXT_MODE_SET;
>> action[len++] = q->guc->id;
>> @@ -879,6 +995,18 @@ guc_exec_queue_run_job(struct drm_sched_job *drm_job)
>> trace_xe_sched_job_run(job);
>>
>> if (!killed_or_banned_or_wedged && !xe_sched_job_is_error(job)) {
>> + if (xe_exec_queue_is_multi_queue_secondary(q)) {
>> + struct xe_exec_queue *primary = xe_exec_queue_multi_queue_primary(q);
>> +
>> + if (exec_queue_killed_or_banned_or_wedged(primary)) {
>> + killed_or_banned_or_wedged = true;
>> + goto run_job_out;
>> + }
>> +
>> + if (!exec_queue_registered(primary))
>> + register_exec_queue(primary, GUC_CONTEXT_NORMAL);
>> + }
>> +
>> if (!exec_queue_registered(q))
>> register_exec_queue(q, GUC_CONTEXT_NORMAL);
>> if (!job->skip_emit)
>> @@ -887,6 +1015,7 @@ guc_exec_queue_run_job(struct drm_sched_job *drm_job)
>> job->skip_emit = false;
>> }
>>
>> +run_job_out:
>> /*
>> * We don't care about job-fence ordering in LR VMs because these fences
>> * are never exported; they are used solely to keep jobs on the pending
>> @@ -912,6 +1041,11 @@ int xe_guc_read_stopped(struct xe_guc *guc)
>> return atomic_read(&guc->submission_state.stopped);
>> }
>>
>> +static void handle_multi_queue_secondary_sched_done(struct xe_guc *guc,
>> + struct xe_exec_queue *q,
>> + u32 runnable_state);
>> +static void handle_deregister_done(struct xe_guc *guc, struct xe_exec_queue *q);
>> +
>> #define MAKE_SCHED_CONTEXT_ACTION(q, enable_disable) \
>> u32 action[] = { \
>> XE_GUC_ACTION_SCHED_CONTEXT_MODE_SET, \
>> @@ -925,7 +1059,9 @@ static void disable_scheduling_deregister(struct xe_guc *guc,
>> MAKE_SCHED_CONTEXT_ACTION(q, DISABLE);
>> int ret;
>>
>> - set_min_preemption_timeout(guc, q);
>> + if (!xe_exec_queue_is_multi_queue_secondary(q))
>> + set_min_preemption_timeout(guc, q);
>> +
>> smp_rmb();
>> ret = wait_event_timeout(guc->ct.wq,
>> (!exec_queue_pending_enable(q) &&
>> @@ -953,9 +1089,12 @@ static void disable_scheduling_deregister(struct xe_guc *guc,
>> * Reserve space for both G2H here as the 2nd G2H is sent from a G2H
>> * handler and we are not allowed to reserved G2H space in handlers.
>> */
>> - xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action),
>> - G2H_LEN_DW_SCHED_CONTEXT_MODE_SET +
>> - G2H_LEN_DW_DEREGISTER_CONTEXT, 2);
>> + if (xe_exec_queue_is_multi_queue_secondary(q))
>> + handle_multi_queue_secondary_sched_done(guc, q, 0);
>> + else
>> + xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action),
>> + G2H_LEN_DW_SCHED_CONTEXT_MODE_SET +
>> + G2H_LEN_DW_DEREGISTER_CONTEXT, 2);
>> }
>>
>> static void xe_guc_exec_queue_trigger_cleanup(struct xe_exec_queue *q)
>> @@ -1161,8 +1300,11 @@ static void enable_scheduling(struct xe_exec_queue *q)
>> set_exec_queue_enabled(q);
>> trace_xe_exec_queue_scheduling_enable(q);
>>
>> - xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action),
>> - G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, 1);
>> + if (xe_exec_queue_is_multi_queue_secondary(q))
>> + handle_multi_queue_secondary_sched_done(guc, q, 1);
>> + else
>> + xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action),
>> + G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, 1);
>>
>> ret = wait_event_timeout(guc->ct.wq,
>> !exec_queue_pending_enable(q) ||
>> @@ -1186,14 +1328,17 @@ static void disable_scheduling(struct xe_exec_queue *q, bool immediate)
>> xe_gt_assert(guc_to_gt(guc), exec_queue_registered(q));
>> xe_gt_assert(guc_to_gt(guc), !exec_queue_pending_disable(q));
>>
>> - if (immediate)
>> + if (immediate && !xe_exec_queue_is_multi_queue_secondary(q))
>> set_min_preemption_timeout(guc, q);
>> clear_exec_queue_enabled(q);
>> set_exec_queue_pending_disable(q);
>> trace_xe_exec_queue_scheduling_disable(q);
>>
>> - xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action),
>> - G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, 1);
>> + if (xe_exec_queue_is_multi_queue_secondary(q))
>> + handle_multi_queue_secondary_sched_done(guc, q, 0);
>> + else
>> + xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action),
>> + G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, 1);
>> }
>>
>> static void __deregister_exec_queue(struct xe_guc *guc, struct xe_exec_queue *q)
>> @@ -1211,8 +1356,11 @@ static void __deregister_exec_queue(struct xe_guc *guc, struct xe_exec_queue *q)
>> set_exec_queue_destroyed(q);
>> trace_xe_exec_queue_deregister(q);
>>
>> - xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action),
>> - G2H_LEN_DW_DEREGISTER_CONTEXT, 1);
>> + if (xe_exec_queue_is_multi_queue_secondary(q))
>> + handle_deregister_done(guc, q);
>> + else
>> + xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action),
>> + G2H_LEN_DW_DEREGISTER_CONTEXT, 1);
>> }
>>
>> static enum drm_gpu_sched_stat
>> @@ -1660,6 +1808,7 @@ static int guc_exec_queue_init(struct xe_exec_queue *q)
>> {
>> struct xe_gpu_scheduler *sched;
>> struct xe_guc *guc = exec_queue_to_guc(q);
>> + struct workqueue_struct *submit_wq = NULL;
>> struct xe_guc_exec_queue *ge;
>> long timeout;
>> int err, i;
>> @@ -1680,8 +1829,20 @@ static int guc_exec_queue_init(struct xe_exec_queue *q)
>>
>> timeout = (q->vm && xe_vm_in_lr_mode(q->vm)) ? MAX_SCHEDULE_TIMEOUT :
>> msecs_to_jiffies(q->sched_props.job_timeout_ms);
>> +
>> + /*
>> + * Use primary queue's submit_wq for all secondary queues of a
>> + * multi queue group. This serialization avoids any locking around
>> + * CGP synchronization with GuC.
>> + */
>> + if (xe_exec_queue_is_multi_queue_secondary(q)) {
>> + struct xe_exec_queue *primary = xe_exec_queue_multi_queue_primary(q);
>> +
>> + submit_wq = primary->guc->sched.base.submit_wq;
>> + }
>> +
>> err = xe_sched_init(&ge->sched, &drm_sched_ops, &xe_sched_ops,
>> - NULL, xe_lrc_ring_size() / MAX_JOB_SIZE_BYTES, 64,
>> + submit_wq, xe_lrc_ring_size() / MAX_JOB_SIZE_BYTES, 64,
>> timeout, guc_to_gt(guc)->ordered_wq, NULL,
>> q->name, gt_to_xe(q->gt)->drm.dev);
>> if (err)
>> @@ -2418,7 +2579,11 @@ static void deregister_exec_queue(struct xe_guc *guc, struct xe_exec_queue *q)
>>
>> trace_xe_exec_queue_deregister(q);
>>
>> - xe_guc_ct_send_g2h_handler(&guc->ct, action, ARRAY_SIZE(action));
>> + if (xe_exec_queue_is_multi_queue_secondary(q))
>> + handle_deregister_done(guc, q);
>> + else
>> + xe_guc_ct_send_g2h_handler(&guc->ct, action,
>> + ARRAY_SIZE(action));
>> }
>>
>> static void handle_sched_done(struct xe_guc *guc, struct xe_exec_queue *q,
>> @@ -2468,6 +2633,15 @@ static void handle_sched_done(struct xe_guc *guc, struct xe_exec_queue *q,
>> }
>> }
>>
>> +static void handle_multi_queue_secondary_sched_done(struct xe_guc *guc,
>> + struct xe_exec_queue *q,
>> + u32 runnable_state)
>> +{
>> + mutex_lock(&guc->ct.lock);
>
>I don't think you need the CT lock here. This is per-queue state, which
>should be safe to modify without any lock. The CT lock never
>protects queue state, we just happen to have it in G2H responses because
>of how the CT layer works.
>
Without the CT lock here, I get a lockdep warning from _guc_ct_send_locked(),
h2g_has_room() etc. So, I guess we need to keep it.
>> + handle_sched_done(guc, q, runnable_state);
>> + mutex_unlock(&guc->ct.lock);
>> +}
>> +
>> int xe_guc_sched_done_handler(struct xe_guc *guc, u32 *msg, u32 len)
>> {
>> struct xe_exec_queue *q;
>> @@ -2672,6 +2846,44 @@ int xe_guc_exec_queue_reset_failure_handler(struct xe_guc *guc, u32 *msg, u32 le
>> return 0;
>> }
>>
>> +/**
>> + * xe_guc_exec_queue_cgp_sync_done_handler - CGP synchronization done handler
>> + * @guc: guc
>> + * @msg: message indicating CGP sync done
>> + * @len: length of message
>> + *
>> + * Set multi queue group's sync_pending flag to false and wakeup anyone waiting
>> + * for CGP synchronization to complete.
>> + *
>> + * Return: 0 on success, -EPROTO for malformed messages.
>> + */
>> +int xe_guc_exec_queue_cgp_sync_done_handler(struct xe_guc *guc, u32 *msg, u32 len)
>> +{
>> + struct xe_device *xe = guc_to_xe(guc);
>> + struct xe_exec_queue *q;
>> + u32 guc_id = msg[0];
>> +
>> + if (unlikely(len < 1)) {
>> + drm_err(&xe->drm, "Invalid CGP_SYNC_DONE length %u", len);
>> + return -EPROTO;
>> + }
>> +
>> + q = g2h_exec_queue_lookup(guc, guc_id);
>> + if (unlikely(!q))
>> + return -EPROTO;
>> +
>> + if (!xe_exec_queue_is_multi_queue_primary(q)) {
>> + drm_err(&xe->drm, "Unexpected CGP_SYNC_DONE response");
>> + return -EPROTO;
>> + }
>> +
>> + /* Wakeup the serialized cgp update wait */
>> + WRITE_ONCE(q->multi_queue.group->sync_pending, false);
>
>So here - I suspect we need to associate the CGP_SYNC_DONE with some
>secondary queue state tracking in order to get VF migration to work.
>Again, we can figure this part out a bit later, but it should be considered.
>
Hmm..ok.
>Matt
>
>> + wake_up_all(&guc->ct.wq);
>> +
>> + return 0;
>> +}
>> +
>> static void
>> guc_exec_queue_wq_snapshot_capture(struct xe_exec_queue *q,
>> struct xe_guc_submit_exec_queue_snapshot *snapshot)
>> diff --git a/drivers/gpu/drm/xe/xe_guc_submit.h b/drivers/gpu/drm/xe/xe_guc_submit.h
>> index b49a2748ec46..abfa94bce391 100644
>> --- a/drivers/gpu/drm/xe/xe_guc_submit.h
>> +++ b/drivers/gpu/drm/xe/xe_guc_submit.h
>> @@ -34,6 +34,7 @@ int xe_guc_exec_queue_memory_cat_error_handler(struct xe_guc *guc, u32 *msg,
>> u32 len);
>> int xe_guc_exec_queue_reset_failure_handler(struct xe_guc *guc, u32 *msg, u32 len);
>> int xe_guc_error_capture_handler(struct xe_guc *guc, u32 *msg, u32 len);
>> +int xe_guc_exec_queue_cgp_sync_done_handler(struct xe_guc *guc, u32 *msg, u32 len);
>>
>> struct xe_guc_submit_exec_queue_snapshot *
>> xe_guc_exec_queue_snapshot_capture(struct xe_exec_queue *q);
>> --
>> 2.43.0
>>
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH 03/16] drm/xe/multi_queue: Add GuC interface for multi queue support
2025-11-02 18:02 ` Matthew Brost
@ 2025-11-04 5:02 ` Niranjana Vishwanathapura
0 siblings, 0 replies; 61+ messages in thread
From: Niranjana Vishwanathapura @ 2025-11-04 5:02 UTC (permalink / raw)
To: Matthew Brost; +Cc: intel-xe
On Sun, Nov 02, 2025 at 10:02:28AM -0800, Matthew Brost wrote:
>On Fri, Oct 31, 2025 at 11:29:23AM -0700, Niranjana Vishwanathapura wrote:
>> Implement GuC commands and response along with the Context
>> Group Page (CGP) interface for multi queue support.
>>
>> Ensure that only the primary queue (q0) of a multi queue group
>> communicates with GuC. The secondary queues of the group only
>> need to maintain the LRCA and interface with the drm scheduler.
>>
>> Use primary queue's submit_wq for all secondary queues of a multi
>> queue group. This serialization avoids any locking around CGP
>> synchronization with GuC.
>>
>> Signed-off-by: Stuart Summers <stuart.summers@intel.com>
>> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
>> ---
>> drivers/gpu/drm/xe/abi/guc_actions_abi.h | 3 +
>> drivers/gpu/drm/xe/xe_exec_queue_types.h | 2 +
>> drivers/gpu/drm/xe/xe_guc_ct.c | 4 +
>> drivers/gpu/drm/xe/xe_guc_fwif.h | 3 +
>> drivers/gpu/drm/xe/xe_guc_submit.c | 302 +++++++++++++++++++----
>> drivers/gpu/drm/xe/xe_guc_submit.h | 1 +
>> 6 files changed, 270 insertions(+), 45 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/xe/abi/guc_actions_abi.h b/drivers/gpu/drm/xe/abi/guc_actions_abi.h
>> index 47756e4674a1..3e9fbed9cda6 100644
>> --- a/drivers/gpu/drm/xe/abi/guc_actions_abi.h
>> +++ b/drivers/gpu/drm/xe/abi/guc_actions_abi.h
>> @@ -139,6 +139,9 @@ enum xe_guc_action {
>> XE_GUC_ACTION_DEREGISTER_G2G = 0x4508,
>> XE_GUC_ACTION_DEREGISTER_CONTEXT_DONE = 0x4600,
>> XE_GUC_ACTION_REGISTER_CONTEXT_MULTI_LRC = 0x4601,
>> + XE_GUC_ACTION_REGISTER_CONTEXT_MULTI_QUEUE = 0x4602,
>> + XE_GUC_ACTION_MULTI_QUEUE_CONTEXT_CGP_SYNC = 0x4603,
>> + XE_GUC_ACTION_NOTIFY_MULTI_QUEUE_CONTEXT_CGP_SYNC_DONE = 0x4604,
>> XE_GUC_ACTION_CLIENT_SOFT_RESET = 0x5507,
>> XE_GUC_ACTION_SET_ENG_UTIL_BUFF = 0x550A,
>> XE_GUC_ACTION_SET_DEVICE_ENGINE_ACTIVITY_BUFFER = 0x550C,
>> diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h b/drivers/gpu/drm/xe/xe_exec_queue_types.h
>> index 3856776df5c4..38e47b003259 100644
>> --- a/drivers/gpu/drm/xe/xe_exec_queue_types.h
>> +++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h
>> @@ -47,6 +47,8 @@ struct xe_exec_queue_group {
>> struct xarray xa;
>> /** @list_lock: Secondary queue list lock */
>> struct mutex list_lock;
>> + /** @sync_pending: CGP_SYNC_DONE g2h response pending */
>> + bool sync_pending;
>> };
>>
>> /**
>> diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c
>> index e68953ef3a00..48b5006eb080 100644
>> --- a/drivers/gpu/drm/xe/xe_guc_ct.c
>> +++ b/drivers/gpu/drm/xe/xe_guc_ct.c
>> @@ -1304,6 +1304,7 @@ static int parse_g2h_event(struct xe_guc_ct *ct, u32 *msg, u32 len)
>> lockdep_assert_held(&ct->lock);
>>
>> switch (action) {
>> + case XE_GUC_ACTION_NOTIFY_MULTI_QUEUE_CONTEXT_CGP_SYNC_DONE:
>> case XE_GUC_ACTION_SCHED_CONTEXT_MODE_DONE:
>> case XE_GUC_ACTION_DEREGISTER_CONTEXT_DONE:
>> case XE_GUC_ACTION_SCHED_ENGINE_MODE_DONE:
>> @@ -1570,6 +1571,9 @@ static int process_g2h_msg(struct xe_guc_ct *ct, u32 *msg, u32 len)
>> ret = xe_guc_g2g_test_notification(guc, payload, adj_len);
>> break;
>> #endif
>> + case XE_GUC_ACTION_NOTIFY_MULTI_QUEUE_CONTEXT_CGP_SYNC_DONE:
>> + ret = xe_guc_exec_queue_cgp_sync_done_handler(guc, payload, adj_len);
>> + break;
>> default:
>> xe_gt_err(gt, "unexpected G2H action 0x%04x\n", action);
>> }
>> diff --git a/drivers/gpu/drm/xe/xe_guc_fwif.h b/drivers/gpu/drm/xe/xe_guc_fwif.h
>> index c90dd266e9cf..610dfb2f1cb5 100644
>> --- a/drivers/gpu/drm/xe/xe_guc_fwif.h
>> +++ b/drivers/gpu/drm/xe/xe_guc_fwif.h
>> @@ -16,6 +16,7 @@
>> #define G2H_LEN_DW_DEREGISTER_CONTEXT 3
>> #define G2H_LEN_DW_TLB_INVALIDATE 3
>> #define G2H_LEN_DW_G2G_NOTIFY_MIN 3
>> +#define G2H_LEN_DW_MULTI_QUEUE_CONTEXT 4
>>
>> #define GUC_ID_MAX 65535
>> #define GUC_ID_UNKNOWN 0xffffffff
>> @@ -62,6 +63,8 @@ struct guc_ctxt_registration_info {
>
>Side note - this struct could probably be made private to
>xe_guc_submit.c.
>
Yah, looks like it should be.
>> u32 wq_base_lo;
>> u32 wq_base_hi;
>> u32 wq_size;
>> + u32 cgp_lo;
>> + u32 cgp_hi;
>> u32 hwlrca_lo;
>> u32 hwlrca_hi;
>> };
>> diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
>> index d4ffdb71ef3d..d2aa9a2524e7 100644
>> --- a/drivers/gpu/drm/xe/xe_guc_submit.c
>> +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
>> @@ -46,6 +46,7 @@
>> #include "xe_trace.h"
>> #include "xe_uc_fw.h"
>> #include "xe_vm.h"
>> +#include "xe_bo.h"
>
>Why do you need xe_bo.h? It is not obvious to me. If you need it,
>alphabetical order.
>
It is needed as we call xe_bo_ggtt_addr() below.
Ok, will put it in alphabetical order.
>>
>> static struct xe_guc *
>> exec_queue_to_guc(struct xe_exec_queue *q)
>> @@ -541,7 +542,8 @@ static void init_policies(struct xe_guc *guc, struct xe_exec_queue *q)
>> u32 slpc_exec_queue_freq_req = 0;
>> u32 preempt_timeout_us = q->sched_props.preempt_timeout_us;
>>
>> - xe_gt_assert(guc_to_gt(guc), exec_queue_registered(q));
>> + xe_gt_assert(guc_to_gt(guc), exec_queue_registered(q) &&
>> + !xe_exec_queue_is_multi_queue_secondary(q));
>>
>> if (q->flags & EXEC_QUEUE_FLAG_LOW_LATENCY)
>> slpc_exec_queue_freq_req |= SLPC_CTX_FREQ_REQ_IS_COMPUTE;
>> @@ -561,6 +563,8 @@ static void set_min_preemption_timeout(struct xe_guc *guc, struct xe_exec_queue
>> {
>> struct exec_queue_policy policy;
>>
>> + xe_assert(guc_to_xe(guc), !xe_exec_queue_is_multi_queue_secondary(q));
>> +
>> __guc_exec_queue_policy_start_klv(&policy, q->guc->id);
>> __guc_exec_queue_policy_add_preemption_timeout(&policy, 1);
>>
>> @@ -575,6 +579,130 @@ static void set_min_preemption_timeout(struct xe_guc *guc, struct xe_exec_queue
>> xe_map_wr_field(xe_, &map_, 0, struct guc_submit_parallel_scratch, \
>> field_, val_)
>>
>> +#define CGP_VERSION_MAJOR_SHIFT 8
>> +
>> +static void xe_guc_exec_queue_group_cgp_update(struct xe_device *xe,
>> + struct xe_exec_queue *q)
>> +{
>> + struct xe_exec_queue_group *group = q->multi_queue.group;
>> + u32 guc_id = group->primary->guc->id;
>> +
>> + /* Currently implementing CGP version 1.0 */
>> + xe_map_wr(xe, &group->cgp_bo->vmap, 0, u32,
>> + 1 << CGP_VERSION_MAJOR_SHIFT);
>> +
>> + xe_map_wr(xe, &group->cgp_bo->vmap,
>> + (32 + q->multi_queue.pos * 2) * sizeof(u32),
>> + u32, lower_32_bits(xe_lrc_descriptor(q->lrc[0])));
>> +
>> + xe_map_wr(xe, &group->cgp_bo->vmap,
>> + (33 + q->multi_queue.pos * 2) * sizeof(u32),
>> + u32, guc_id);
>> +
>> + if (q->multi_queue.pos / 32) {
>> + xe_map_wr(xe, &group->cgp_bo->vmap, 17 * sizeof(u32),
>> + u32, BIT(q->multi_queue.pos % 32));
>> + xe_map_wr(xe, &group->cgp_bo->vmap, 16 * sizeof(u32), u32, 0);
>> + } else {
>> + xe_map_wr(xe, &group->cgp_bo->vmap, 16 * sizeof(u32),
>> + u32, BIT(q->multi_queue.pos));
>> + xe_map_wr(xe, &group->cgp_bo->vmap, 17 * sizeof(u32), u32, 0);
>
>Maybe some defines for all these numbers (16, 17, 32, 33) in this
>function? Or some comments? It is very hard to look at this code and
>know what it is doing.
>
It is added in the kernel-doc patch below.
https://patchwork.freedesktop.org/patch/684846/?series=156865&rev=1
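Meanwhile, as a sketch, defines matching the writes above would be
something like (names invented here for illustration):

	/* CGP dword offsets, per the writes above */
	#define CGP_DW_VERSION		0	/* major version in bits 15:8 */
	#define CGP_DW_UPDATE_MASK_LO	16	/* update bits for queues 0..31 */
	#define CGP_DW_UPDATE_MASK_HI	17	/* update bits for queues 32..63 */
	#define CGP_DW_QUEUE_PAIRS	32	/* {lrca_lo, guc_id} pair per queue */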
>> + }
>> +}
>> +
>> +static void xe_guc_exec_queue_group_cgp_sync(struct xe_guc *guc,
>> + struct xe_exec_queue *q,
>> + const u32 *action, u32 len)
>> +{
>> + struct xe_exec_queue_group *group = q->multi_queue.group;
>> + struct xe_device *xe = guc_to_xe(guc);
>> + long ret;
>> +
>> + /*
>> + * As all queues of a multi queue group use a single drm scheduler
>> + * submit workqueue, CGP synchronization with GuC is serialized.
>> + * Hence, no locking is required here.
>> + * Wait for any pending CGP_SYNC_DONE response before updating the
>> + * CGP page and sending CGP_SYNC message.
>> + */
>> + ret = wait_event_timeout(guc->ct.wq,
>> + !READ_ONCE(group->sync_pending) ||
>> + xe_guc_read_stopped(guc), HZ);
>> + if (!ret || xe_guc_read_stopped(guc)) {
>> + drm_err(&xe->drm, "Wait for CGP_SYNC_DONE response failed!\n");
>> + /* Something wrong with the CTB or GuC, no need to proceed */
>> + return;
>> + }
>> +
>> + xe_guc_exec_queue_group_cgp_update(xe, q);
>> +
>> + WRITE_ONCE(group->sync_pending, true);
>> + xe_guc_ct_send(&guc->ct, action, len, G2H_LEN_DW_MULTI_QUEUE_CONTEXT, 1);
>> +}
>> +
>> +static void __register_exec_queue(struct xe_guc *guc,
>> + struct guc_ctxt_registration_info *info)
>> +{
>> + u32 action[] = {
>> + XE_GUC_ACTION_REGISTER_CONTEXT,
>> + info->flags,
>> + info->context_idx,
>> + info->engine_class,
>> + info->engine_submit_mask,
>> + info->wq_desc_lo,
>> + info->wq_desc_hi,
>> + info->wq_base_lo,
>> + info->wq_base_hi,
>> + info->wq_size,
>> + info->hwlrca_lo,
>> + info->hwlrca_hi,
>> + };
>> +
>> + /* explicitly checks some fields that we might fixup later */
>> + xe_gt_assert(guc_to_gt(guc), info->wq_desc_lo ==
>> + action[XE_GUC_REGISTER_CONTEXT_DATA_5_WQ_DESC_ADDR_LOWER]);
>> + xe_gt_assert(guc_to_gt(guc), info->wq_base_lo ==
>> + action[XE_GUC_REGISTER_CONTEXT_DATA_7_WQ_BUF_BASE_LOWER]);
>> + xe_gt_assert(guc_to_gt(guc), info->hwlrca_lo ==
>> + action[XE_GUC_REGISTER_CONTEXT_DATA_10_HW_LRC_ADDR]);
>> +
>> + xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action), 0, 0);
>> +}
>> +
>> +static void __register_exec_queue_group(struct xe_guc *guc,
>> + struct xe_exec_queue *q,
>> + struct guc_ctxt_registration_info *info)
>> +{
>> +#define MAX_MULTI_QUEUE_REG_SIZE (8)
>> + struct xe_device *xe = guc_to_xe(guc);
>> + u32 action[MAX_MULTI_QUEUE_REG_SIZE];
>> + int len = 0;
>> +
>> + if (xe_exec_queue_is_multi_queue_primary(q)) {
>> + action[len++] = XE_GUC_ACTION_REGISTER_CONTEXT_MULTI_QUEUE;
>> + action[len++] = info->flags;
>> + action[len++] = info->context_idx;
>> + action[len++] = info->engine_class;
>> + action[len++] = info->engine_submit_mask;
>> + action[len++] = 0; /* Reserved */
>> + action[len++] = info->cgp_lo;
>> + action[len++] = info->cgp_hi;
>> + } else {
>> + /*
>> + * No need to wait before CGP sync since CT descriptors
>> + * should be ordered.
>> + */
>> +
>> + action[len++] = XE_GUC_ACTION_MULTI_QUEUE_CONTEXT_CGP_SYNC;
>> + action[len++] = q->multi_queue.group->primary->guc->id;
>> + }
>> +
>> + xe_assert(xe, len <= MAX_MULTI_QUEUE_REG_SIZE);
>> +#undef MAX_MULTI_QUEUE_REG_SIZE
>> +
>> + xe_guc_exec_queue_group_cgp_sync(guc, q, action, len);
>> +}
>> +
>> static void __register_mlrc_exec_queue(struct xe_guc *guc,
>> struct xe_exec_queue *q,
>> struct guc_ctxt_registration_info *info)
>> @@ -622,35 +750,6 @@ static void __register_mlrc_exec_queue(struct xe_guc *guc,
>> xe_guc_ct_send(&guc->ct, action, len, 0, 0);
>> }
>>
>> -static void __register_exec_queue(struct xe_guc *guc,
>> - struct guc_ctxt_registration_info *info)
>> -{
>> - u32 action[] = {
>> - XE_GUC_ACTION_REGISTER_CONTEXT,
>> - info->flags,
>> - info->context_idx,
>> - info->engine_class,
>> - info->engine_submit_mask,
>> - info->wq_desc_lo,
>> - info->wq_desc_hi,
>> - info->wq_base_lo,
>> - info->wq_base_hi,
>> - info->wq_size,
>> - info->hwlrca_lo,
>> - info->hwlrca_hi,
>> - };
>> -
>> - /* explicitly checks some fields that we might fixup later */
>> - xe_gt_assert(guc_to_gt(guc), info->wq_desc_lo ==
>> - action[XE_GUC_REGISTER_CONTEXT_DATA_5_WQ_DESC_ADDR_LOWER]);
>> - xe_gt_assert(guc_to_gt(guc), info->wq_base_lo ==
>> - action[XE_GUC_REGISTER_CONTEXT_DATA_7_WQ_BUF_BASE_LOWER]);
>> - xe_gt_assert(guc_to_gt(guc), info->hwlrca_lo ==
>> - action[XE_GUC_REGISTER_CONTEXT_DATA_10_HW_LRC_ADDR]);
>> -
>> - xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action), 0, 0);
>> -}
>> -
>> static void register_exec_queue(struct xe_exec_queue *q, int ctx_type)
>> {
>> struct xe_guc *guc = exec_queue_to_guc(q);
>> @@ -670,6 +769,13 @@ static void register_exec_queue(struct xe_exec_queue *q, int ctx_type)
>> info.flags = CONTEXT_REGISTRATION_FLAG_KMD |
>> FIELD_PREP(CONTEXT_REGISTRATION_FLAG_TYPE, ctx_type);
>>
>> + if (xe_exec_queue_is_multi_queue(q)) {
>> + struct xe_exec_queue_group *group = q->multi_queue.group;
>> +
>> + info.cgp_lo = xe_bo_ggtt_addr(group->cgp_bo);
>> + info.cgp_hi = 0;
>> + }
>> +
>> if (xe_exec_queue_is_parallel(q)) {
>> u64 ggtt_addr = xe_lrc_parallel_ggtt_addr(lrc);
>> struct iosys_map map = xe_lrc_parallel_map(lrc);
>> @@ -700,11 +806,15 @@ static void register_exec_queue(struct xe_exec_queue *q, int ctx_type)
>>
>> set_exec_queue_registered(q);
>> trace_xe_exec_queue_register(q);
>> - if (xe_exec_queue_is_parallel(q))
>> + if (xe_exec_queue_is_multi_queue(q))
>> + __register_exec_queue_group(guc, q, &info);
>> + else if (xe_exec_queue_is_parallel(q))
>> __register_mlrc_exec_queue(guc, q, &info);
>> else
>> __register_exec_queue(guc, &info);
>> - init_policies(guc, q);
>> +
>> + if (!xe_exec_queue_is_multi_queue_secondary(q))
>> + init_policies(guc, q);
>> }
>>
>> static u32 wq_space_until_wrap(struct xe_exec_queue *q)
>> @@ -833,6 +943,12 @@ static void submit_exec_queue(struct xe_exec_queue *q, struct xe_sched_job *job)
>> if (exec_queue_suspended(q) && !xe_exec_queue_is_parallel(q))
>> return;
>>
>> + /*
>> + * All queues in a multi-queue group will use the primary queue
>> + * of the group to interface with GuC.
>> + */
>> + q = xe_exec_queue_multi_queue_primary(q);
>> +
>
>I think we might need a bit more thought about which bits each queue owns
>in q->guc->state. The state machine is pretty complicated and now pointing
>secondary -> primary in some cases makes this even worse. I guess I'd
>ask to figure out which bits are owned by the primary, which ones by the
>secondary, and which ones are mirrored, and write this down somewhere.
>
The way I thought about it is that all queues (secondaries as well)
will have all the q->guc->state bits. Though we will probably short
circuit SCHED_MODE setting for secondaries and trigger cleanup of
secondaries upon primary cleanup etc., the bits will still be valid.
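As a sketch, mirroring for the killed/banned case could follow the
reset_status pattern from patch 09 (hypothetical helper, not in the series):

	static bool
	exec_queue_group_killed_or_banned_or_wedged(struct xe_exec_queue *q)
	{
		struct xe_exec_queue *primary = xe_exec_queue_multi_queue_primary(q);

		/* A secondary is effectively dead if its primary is */
		return exec_queue_killed_or_banned_or_wedged(q) ||
		       (xe_exec_queue_is_multi_queue_secondary(q) &&
			exec_queue_killed_or_banned_or_wedged(primary));
	}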
Niranjana
>Matt
>
>> if (!exec_queue_enabled(q) && !exec_queue_suspended(q)) {
>> action[len++] = XE_GUC_ACTION_SCHED_CONTEXT_MODE_SET;
>> action[len++] = q->guc->id;
>> @@ -879,6 +995,18 @@ guc_exec_queue_run_job(struct drm_sched_job *drm_job)
>> trace_xe_sched_job_run(job);
>>
>> if (!killed_or_banned_or_wedged && !xe_sched_job_is_error(job)) {
>> + if (xe_exec_queue_is_multi_queue_secondary(q)) {
>> + struct xe_exec_queue *primary = xe_exec_queue_multi_queue_primary(q);
>> +
>> + if (exec_queue_killed_or_banned_or_wedged(primary)) {
>> + killed_or_banned_or_wedged = true;
>> + goto run_job_out;
>> + }
>> +
>> + if (!exec_queue_registered(primary))
>> + register_exec_queue(primary, GUC_CONTEXT_NORMAL);
>> + }
>> +
>> if (!exec_queue_registered(q))
>> register_exec_queue(q, GUC_CONTEXT_NORMAL);
>> if (!job->skip_emit)
>> @@ -887,6 +1015,7 @@ guc_exec_queue_run_job(struct drm_sched_job *drm_job)
>> job->skip_emit = false;
>> }
>>
>> +run_job_out:
>> /*
>> * We don't care about job-fence ordering in LR VMs because these fences
>> * are never exported; they are used solely to keep jobs on the pending
>> @@ -912,6 +1041,11 @@ int xe_guc_read_stopped(struct xe_guc *guc)
>> return atomic_read(&guc->submission_state.stopped);
>> }
>>
>> +static void handle_multi_queue_secondary_sched_done(struct xe_guc *guc,
>> + struct xe_exec_queue *q,
>> + u32 runnable_state);
>> +static void handle_deregister_done(struct xe_guc *guc, struct xe_exec_queue *q);
>> +
>> #define MAKE_SCHED_CONTEXT_ACTION(q, enable_disable) \
>> u32 action[] = { \
>> XE_GUC_ACTION_SCHED_CONTEXT_MODE_SET, \
>> @@ -925,7 +1059,9 @@ static void disable_scheduling_deregister(struct xe_guc *guc,
>> MAKE_SCHED_CONTEXT_ACTION(q, DISABLE);
>> int ret;
>>
>> - set_min_preemption_timeout(guc, q);
>> + if (!xe_exec_queue_is_multi_queue_secondary(q))
>> + set_min_preemption_timeout(guc, q);
>> +
>> smp_rmb();
>> ret = wait_event_timeout(guc->ct.wq,
>> (!exec_queue_pending_enable(q) &&
>> @@ -953,9 +1089,12 @@ static void disable_scheduling_deregister(struct xe_guc *guc,
>> * Reserve space for both G2H here as the 2nd G2H is sent from a G2H
>> * handler and we are not allowed to reserved G2H space in handlers.
>> */
>> - xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action),
>> - G2H_LEN_DW_SCHED_CONTEXT_MODE_SET +
>> - G2H_LEN_DW_DEREGISTER_CONTEXT, 2);
>> + if (xe_exec_queue_is_multi_queue_secondary(q))
>> + handle_multi_queue_secondary_sched_done(guc, q, 0);
>> + else
>> + xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action),
>> + G2H_LEN_DW_SCHED_CONTEXT_MODE_SET +
>> + G2H_LEN_DW_DEREGISTER_CONTEXT, 2);
>> }
>>
>> static void xe_guc_exec_queue_trigger_cleanup(struct xe_exec_queue *q)
>> @@ -1161,8 +1300,11 @@ static void enable_scheduling(struct xe_exec_queue *q)
>> set_exec_queue_enabled(q);
>> trace_xe_exec_queue_scheduling_enable(q);
>>
>> - xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action),
>> - G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, 1);
>> + if (xe_exec_queue_is_multi_queue_secondary(q))
>> + handle_multi_queue_secondary_sched_done(guc, q, 1);
>> + else
>> + xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action),
>> + G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, 1);
>>
>> ret = wait_event_timeout(guc->ct.wq,
>> !exec_queue_pending_enable(q) ||
>> @@ -1186,14 +1328,17 @@ static void disable_scheduling(struct xe_exec_queue *q, bool immediate)
>> xe_gt_assert(guc_to_gt(guc), exec_queue_registered(q));
>> xe_gt_assert(guc_to_gt(guc), !exec_queue_pending_disable(q));
>>
>> - if (immediate)
>> + if (immediate && !xe_exec_queue_is_multi_queue_secondary(q))
>> set_min_preemption_timeout(guc, q);
>> clear_exec_queue_enabled(q);
>> set_exec_queue_pending_disable(q);
>> trace_xe_exec_queue_scheduling_disable(q);
>>
>> - xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action),
>> - G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, 1);
>> + if (xe_exec_queue_is_multi_queue_secondary(q))
>> + handle_multi_queue_secondary_sched_done(guc, q, 0);
>> + else
>> + xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action),
>> + G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, 1);
>> }
>>
>> static void __deregister_exec_queue(struct xe_guc *guc, struct xe_exec_queue *q)
>> @@ -1211,8 +1356,11 @@ static void __deregister_exec_queue(struct xe_guc *guc, struct xe_exec_queue *q)
>> set_exec_queue_destroyed(q);
>> trace_xe_exec_queue_deregister(q);
>>
>> - xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action),
>> - G2H_LEN_DW_DEREGISTER_CONTEXT, 1);
>> + if (xe_exec_queue_is_multi_queue_secondary(q))
>> + handle_deregister_done(guc, q);
>> + else
>> + xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action),
>> + G2H_LEN_DW_DEREGISTER_CONTEXT, 1);
>> }
>>
>> static enum drm_gpu_sched_stat
>> @@ -1660,6 +1808,7 @@ static int guc_exec_queue_init(struct xe_exec_queue *q)
>> {
>> struct xe_gpu_scheduler *sched;
>> struct xe_guc *guc = exec_queue_to_guc(q);
>> + struct workqueue_struct *submit_wq = NULL;
>> struct xe_guc_exec_queue *ge;
>> long timeout;
>> int err, i;
>> @@ -1680,8 +1829,20 @@ static int guc_exec_queue_init(struct xe_exec_queue *q)
>>
>> timeout = (q->vm && xe_vm_in_lr_mode(q->vm)) ? MAX_SCHEDULE_TIMEOUT :
>> msecs_to_jiffies(q->sched_props.job_timeout_ms);
>> +
>> + /*
>> + * Use primary queue's submit_wq for all secondary queues of a
>> + * multi queue group. This serialization avoids any locking around
>> + * CGP synchronization with GuC.
>> + */
>> + if (xe_exec_queue_is_multi_queue_secondary(q)) {
>> + struct xe_exec_queue *primary = xe_exec_queue_multi_queue_primary(q);
>> +
>> + submit_wq = primary->guc->sched.base.submit_wq;
>> + }
>> +
>> err = xe_sched_init(&ge->sched, &drm_sched_ops, &xe_sched_ops,
>> - NULL, xe_lrc_ring_size() / MAX_JOB_SIZE_BYTES, 64,
>> + submit_wq, xe_lrc_ring_size() / MAX_JOB_SIZE_BYTES, 64,
>> timeout, guc_to_gt(guc)->ordered_wq, NULL,
>> q->name, gt_to_xe(q->gt)->drm.dev);
>> if (err)
>> @@ -2418,7 +2579,11 @@ static void deregister_exec_queue(struct xe_guc *guc, struct xe_exec_queue *q)
>>
>> trace_xe_exec_queue_deregister(q);
>>
>> - xe_guc_ct_send_g2h_handler(&guc->ct, action, ARRAY_SIZE(action));
>> + if (xe_exec_queue_is_multi_queue_secondary(q))
>> + handle_deregister_done(guc, q);
>> + else
>> + xe_guc_ct_send_g2h_handler(&guc->ct, action,
>> + ARRAY_SIZE(action));
>> }
>>
>> static void handle_sched_done(struct xe_guc *guc, struct xe_exec_queue *q,
>> @@ -2468,6 +2633,15 @@ static void handle_sched_done(struct xe_guc *guc, struct xe_exec_queue *q,
>> }
>> }
>>
>> +static void handle_multi_queue_secondary_sched_done(struct xe_guc *guc,
>> + struct xe_exec_queue *q,
>> + u32 runnable_state)
>> +{
>> + mutex_lock(&guc->ct.lock);
>> + handle_sched_done(guc, q, runnable_state);
>> + mutex_unlock(&guc->ct.lock);
>> +}
>> +
>> int xe_guc_sched_done_handler(struct xe_guc *guc, u32 *msg, u32 len)
>> {
>> struct xe_exec_queue *q;
>> @@ -2672,6 +2846,44 @@ int xe_guc_exec_queue_reset_failure_handler(struct xe_guc *guc, u32 *msg, u32 le
>> return 0;
>> }
>>
>> +/**
>> + * xe_guc_exec_queue_cgp_sync_done_handler - CGP synchronization done handler
>> + * @guc: guc
>> + * @msg: message indicating CGP sync done
>> + * @len: length of message
>> + *
>> + * Set multi queue group's sync_pending flag to false and wakeup anyone waiting
>> + * for CGP synchronization to complete.
>> + *
>> + * Return: 0 on success, -EPROTO for malformed messages.
>> + */
>> +int xe_guc_exec_queue_cgp_sync_done_handler(struct xe_guc *guc, u32 *msg, u32 len)
>> +{
>> + struct xe_device *xe = guc_to_xe(guc);
>> + struct xe_exec_queue *q;
>> + u32 guc_id = msg[0];
>> +
>> + if (unlikely(len < 1)) {
>> + drm_err(&xe->drm, "Invalid CGP_SYNC_DONE length %u", len);
>> + return -EPROTO;
>> + }
>> +
>> + q = g2h_exec_queue_lookup(guc, guc_id);
>> + if (unlikely(!q))
>> + return -EPROTO;
>> +
>> + if (!xe_exec_queue_is_multi_queue_primary(q)) {
>> + drm_err(&xe->drm, "Unexpected CGP_SYNC_DONE response");
>> + return -EPROTO;
>> + }
>> +
>> + /* Wakeup the serialized cgp update wait */
>> + WRITE_ONCE(q->multi_queue.group->sync_pending, false);
>> + wake_up_all(&guc->ct.wq);
>> +
>> + return 0;
>> +}
>> +
>> static void
>> guc_exec_queue_wq_snapshot_capture(struct xe_exec_queue *q,
>> struct xe_guc_submit_exec_queue_snapshot *snapshot)
>> diff --git a/drivers/gpu/drm/xe/xe_guc_submit.h b/drivers/gpu/drm/xe/xe_guc_submit.h
>> index b49a2748ec46..abfa94bce391 100644
>> --- a/drivers/gpu/drm/xe/xe_guc_submit.h
>> +++ b/drivers/gpu/drm/xe/xe_guc_submit.h
>> @@ -34,6 +34,7 @@ int xe_guc_exec_queue_memory_cat_error_handler(struct xe_guc *guc, u32 *msg,
>> u32 len);
>> int xe_guc_exec_queue_reset_failure_handler(struct xe_guc *guc, u32 *msg, u32 len);
>> int xe_guc_error_capture_handler(struct xe_guc *guc, u32 *msg, u32 len);
>> +int xe_guc_exec_queue_cgp_sync_done_handler(struct xe_guc *guc, u32 *msg, u32 len);
>>
>> struct xe_guc_submit_exec_queue_snapshot *
>> xe_guc_exec_queue_snapshot_capture(struct xe_exec_queue *q);
>> --
>> 2.43.0
>>
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH 13/16] drm/xe/multi_queue: Support active group after primary is destroyed
2025-11-03 22:05 ` Matthew Brost
@ 2025-11-04 17:24 ` Niranjana Vishwanathapura
2025-11-04 17:30 ` Niranjana Vishwanathapura
0 siblings, 1 reply; 61+ messages in thread
From: Niranjana Vishwanathapura @ 2025-11-04 17:24 UTC (permalink / raw)
To: Matthew Brost; +Cc: intel-xe
On Mon, Nov 03, 2025 at 02:05:53PM -0800, Matthew Brost wrote:
>On Fri, Oct 31, 2025 at 11:29:33AM -0700, Niranjana Vishwanathapura wrote:
>> Add support to keep the group active after the primary queue is
>> destroyed. Instead of killing the primary queue during exec_queue
>> destroy ioctl, kill it when all the secondary queues of the group
>> are killed.
>>
>> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
>> ---
>> drivers/gpu/drm/xe/xe_device.c | 7 ++-
>> drivers/gpu/drm/xe/xe_exec_queue.c | 55 +++++++++++++++++++++++-
>> drivers/gpu/drm/xe/xe_exec_queue.h | 2 +
>> drivers/gpu/drm/xe/xe_exec_queue_types.h | 4 ++
>> include/uapi/drm/xe_drm.h | 5 +++
>> 5 files changed, 70 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
>> index 0b496676527a..708a17c357e6 100644
>> --- a/drivers/gpu/drm/xe/xe_device.c
>> +++ b/drivers/gpu/drm/xe/xe_device.c
>> @@ -176,7 +176,12 @@ static void xe_file_close(struct drm_device *dev, struct drm_file *file)
>> xa_for_each(&xef->exec_queue.xa, idx, q) {
>> if (q->vm && q->hwe->hw_engine_group)
>> xe_hw_engine_group_del_exec_queue(q->hwe->hw_engine_group, q);
>> - xe_exec_queue_kill(q);
>> +
>> + if (xe_exec_queue_is_multi_queue_primary(q))
>> + xe_exec_queue_group_kill_put(q->multi_queue.group);
>> + else
>> + xe_exec_queue_kill(q);
>> +
>> xe_exec_queue_put(q);
>> }
>> xa_for_each(&xef->vm.xa, idx, vm)
>> diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
>> index 3c1bb4f10fd5..d7b0173691c1 100644
>> --- a/drivers/gpu/drm/xe/xe_exec_queue.c
>> +++ b/drivers/gpu/drm/xe/xe_exec_queue.c
>> @@ -405,6 +405,26 @@ struct xe_exec_queue *xe_exec_queue_create_bind(struct xe_device *xe,
>> }
>> ALLOW_ERROR_INJECTION(xe_exec_queue_create_bind, ERRNO);
>>
>> +static void xe_exec_queue_group_kill(struct kref *ref)
>> +{
>> + struct xe_exec_queue_group *group = container_of(ref, struct xe_exec_queue_group,
>> + kill_refcount);
>> + xe_exec_queue_kill(group->primary);
>> +}
>> +
>> +static inline void xe_exec_queue_group_kill_get(struct xe_exec_queue_group *group)
>> +{
>> + kref_get(&group->kill_refcount);
>> +}
>> +
>> +void xe_exec_queue_group_kill_put(struct xe_exec_queue_group *group)
>> +{
>> + if (!group)
>> + return;
>> +
>> + kref_put(&group->kill_refcount, xe_exec_queue_group_kill);
>> +}
>> +
>> void xe_exec_queue_destroy(struct kref *ref)
>> {
>> struct xe_exec_queue *q = container_of(ref, struct xe_exec_queue, refcount);
>> @@ -607,6 +627,7 @@ static int xe_exec_queue_group_init(struct xe_device *xe, struct xe_exec_queue *
>> group->primary = q;
>> group->cgp_bo = bo;
>> INIT_LIST_HEAD(&group->list);
>> + kref_init(&group->kill_refcount);
>> xa_init_flags(&group->xa, XA_FLAGS_ALLOC1);
>> mutex_init(&group->lock);
>> mutex_init(&group->list_lock);
>> @@ -675,6 +696,11 @@ static int xe_exec_queue_group_add(struct xe_device *xe, struct xe_exec_queue *q
>> q->multi_queue.pos = pos;
>> mutex_unlock(&group->lock);
>>
>> + if (group->primary->multi_queue.keep_active) {
>> + xe_exec_queue_group_kill_get(group);
>> + q->multi_queue.keep_active = true;
>> + }
>> +
>> return 0;
>> }
>>
>> @@ -691,6 +717,11 @@ static void xe_exec_queue_group_delete(struct xe_exec_queue *q)
>> if (lrc)
>> xe_lrc_put(lrc);
>> mutex_unlock(&group->lock);
>> +
>> + if (q->multi_queue.keep_active) {
>> + xe_exec_queue_group_kill_put(group);
>> + q->multi_queue.keep_active = false;
>> + }
>> }
>>
>> static int exec_queue_set_multi_group(struct xe_device *xe, struct xe_exec_queue *q,
>> @@ -709,12 +740,24 @@ static int exec_queue_set_multi_group(struct xe_device *xe, struct xe_exec_queue
>> return -EINVAL;
>>
>> if (value & DRM_XE_MULTI_GROUP_CREATE) {
>> - if (XE_IOCTL_DBG(xe, value & ~DRM_XE_MULTI_GROUP_CREATE))
>> + if (XE_IOCTL_DBG(xe, value & ~(DRM_XE_MULTI_GROUP_CREATE |
>> + DRM_XE_MULTI_GROUP_KEEP_ACTIVE)))
>> + return -EINVAL;
>> +
>> + /*
>> + * KEEP_ACTIVE is not supported in preempt fence mode as in that mode,
>> + * VM_DESTROY ioctl expects all exec queues of that VM are already killed.
>> + */
>> + if (XE_IOCTL_DBG(xe, (value & DRM_XE_MULTI_GROUP_KEEP_ACTIVE) &&
>> + xe_vm_in_preempt_fence_mode(q->vm)))
>> return -EINVAL;
>>
>> q->multi_queue.valid = true;
>> q->multi_queue.is_primary = true;
>> q->multi_queue.pos = 0;
>> + if (value & DRM_XE_MULTI_GROUP_KEEP_ACTIVE)
>> + q->multi_queue.keep_active = true;
>> +
>> return 0;
>> }
>>
>> @@ -1254,6 +1297,11 @@ void xe_exec_queue_kill(struct xe_exec_queue *q)
>>
>> q->ops->kill(q);
>> xe_vm_remove_compute_exec_queue(q->vm, q);
>> +
>> + if (!xe_exec_queue_is_multi_queue_primary(q) && q->multi_queue.keep_active) {
>> + xe_exec_queue_group_kill_put(q->multi_queue.group);
>> + q->multi_queue.keep_active = false;
>> + }
>
>This looks a little odd. Either you don't need to clear
>multi_queue.keep_active, as xe_exec_queue_kill can be called at most
>once (though IIRC it can be called multiple times), or you need some
>locking around multi_queue.keep_active, or it needs to be an atomic to
>prevent multiple threads from calling xe_exec_queue_group_kill_put
>twice.
>
It looks like xe_exec_queue_kill() will only get called once, so we
really don't need to reset the keep_active flag to 'false' here; I just
added that for completeness.
Ok, will remove.
Niranjana
>Matt
>
>> }
>>
>> int xe_exec_queue_destroy_ioctl(struct drm_device *dev, void *data,
>> @@ -1280,7 +1328,10 @@ int xe_exec_queue_destroy_ioctl(struct drm_device *dev, void *data,
>> if (q->vm && q->hwe->hw_engine_group)
>> xe_hw_engine_group_del_exec_queue(q->hwe->hw_engine_group, q);
>>
>> - xe_exec_queue_kill(q);
>> + if (xe_exec_queue_is_multi_queue_primary(q))
>> + xe_exec_queue_group_kill_put(q->multi_queue.group);
>> + else
>> + xe_exec_queue_kill(q);
>>
>> trace_xe_exec_queue_close(q);
>> xe_exec_queue_put(q);
>> diff --git a/drivers/gpu/drm/xe/xe_exec_queue.h b/drivers/gpu/drm/xe/xe_exec_queue.h
>> index 61478b2e883b..b642341f1ede 100644
>> --- a/drivers/gpu/drm/xe/xe_exec_queue.h
>> +++ b/drivers/gpu/drm/xe/xe_exec_queue.h
>> @@ -109,6 +109,8 @@ static inline struct xe_exec_queue *xe_exec_queue_multi_queue_primary(struct xe_
>> return xe_exec_queue_is_multi_queue(q) ? q->multi_queue.group->primary : q;
>> }
>>
>> +void xe_exec_queue_group_kill_put(struct xe_exec_queue_group *group);
>> +
>> bool xe_exec_queue_is_lr(struct xe_exec_queue *q);
>>
>> bool xe_exec_queue_is_idle(struct xe_exec_queue *q);
>> diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h b/drivers/gpu/drm/xe/xe_exec_queue_types.h
>> index e64b6588923e..cdca3afe838c 100644
>> --- a/drivers/gpu/drm/xe/xe_exec_queue_types.h
>> +++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h
>> @@ -55,6 +55,8 @@ struct xe_exec_queue_group {
>> struct list_head list;
>> /** @list_lock: Secondary queue list lock */
>> struct mutex list_lock;
>> + /** @kill_refcount: ref count to kill primary queue */
>> + struct kref kill_refcount;
>> /** @sync_pending: CGP_SYNC_DONE g2h response pending */
>> bool sync_pending;
>> };
>> @@ -152,6 +154,8 @@ struct xe_exec_queue {
>> u8 valid:1;
>> /** @multi_queue.is_primary: Is primary queue (Q0) of the group */
>> u8 is_primary:1;
>> + /** @multi_queue.keep_active: Keep the group active after primary is destroyed */
>> + u8 keep_active:1;
>> } multi_queue;
>>
>> /** @sched_props: scheduling properties */
>> diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
>> index d72151163e77..333fb38b3404 100644
>> --- a/include/uapi/drm/xe_drm.h
>> +++ b/include/uapi/drm/xe_drm.h
>> @@ -1260,6 +1260,10 @@ struct drm_xe_vm_bind {
>> * then a new multi-queue group is created with this queue as the primary queue
>> * (Q0). Otherwise, the queue gets added to the multi-queue group whose primary
>> * queue id is specified in the 'value' field.
>> + * If the extension's 'value' field has %DRM_XE_MULTI_GROUP_KEEP_ACTIVE flag
>> + * set, then the multi-queue group is kept active after the primary queue is
>> + * destroyed.
>> + *
>> * - %DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_QUEUE_PRIORITY - Set the queue
>> * priority within the multi-queue group.
>> *
>> @@ -1304,6 +1308,7 @@ struct drm_xe_exec_queue_create {
>> #define DRM_XE_EXEC_QUEUE_SET_PROPERTY_PXP_TYPE 2
>> #define DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_GROUP 3
>> #define DRM_XE_MULTI_GROUP_CREATE (1ull << 63)
>> +#define DRM_XE_MULTI_GROUP_KEEP_ACTIVE (1ull << 62)
>> #define DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_QUEUE_PRIORITY 4
>> /** @extensions: Pointer to the first extension struct, if any */
>> __u64 extensions;
>> --
>> 2.43.0
>>
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH 13/16] drm/xe/multi_queue: Support active group after primary is destroyed
2025-11-04 17:24 ` Niranjana Vishwanathapura
@ 2025-11-04 17:30 ` Niranjana Vishwanathapura
0 siblings, 0 replies; 61+ messages in thread
From: Niranjana Vishwanathapura @ 2025-11-04 17:30 UTC (permalink / raw)
To: Matthew Brost; +Cc: intel-xe
On Tue, Nov 04, 2025 at 09:24:19AM -0800, Niranjana Vishwanathapura wrote:
>On Mon, Nov 03, 2025 at 02:05:53PM -0800, Matthew Brost wrote:
>>On Fri, Oct 31, 2025 at 11:29:33AM -0700, Niranjana Vishwanathapura wrote:
>>>Add support to keep the group active after the primary queue is
>>>destroyed. Instead of killing the primary queue during exec_queue
>>>destroy ioctl, kill it when all the secondary queues of the group
>>>are killed.
>>>
>>>Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
>>>---
>>> drivers/gpu/drm/xe/xe_device.c | 7 ++-
>>> drivers/gpu/drm/xe/xe_exec_queue.c | 55 +++++++++++++++++++++++-
>>> drivers/gpu/drm/xe/xe_exec_queue.h | 2 +
>>> drivers/gpu/drm/xe/xe_exec_queue_types.h | 4 ++
>>> include/uapi/drm/xe_drm.h | 5 +++
>>> 5 files changed, 70 insertions(+), 3 deletions(-)
>>>
>>>diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
>>>index 0b496676527a..708a17c357e6 100644
>>>--- a/drivers/gpu/drm/xe/xe_device.c
>>>+++ b/drivers/gpu/drm/xe/xe_device.c
>>>@@ -176,7 +176,12 @@ static void xe_file_close(struct drm_device *dev, struct drm_file *file)
>>> xa_for_each(&xef->exec_queue.xa, idx, q) {
>>> if (q->vm && q->hwe->hw_engine_group)
>>> xe_hw_engine_group_del_exec_queue(q->hwe->hw_engine_group, q);
>>>- xe_exec_queue_kill(q);
>>>+
>>>+ if (xe_exec_queue_is_multi_queue_primary(q))
>>>+ xe_exec_queue_group_kill_put(q->multi_queue.group);
>>>+ else
>>>+ xe_exec_queue_kill(q);
>>>+
>>> xe_exec_queue_put(q);
>>> }
>>> xa_for_each(&xef->vm.xa, idx, vm)
>>>diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
>>>index 3c1bb4f10fd5..d7b0173691c1 100644
>>>--- a/drivers/gpu/drm/xe/xe_exec_queue.c
>>>+++ b/drivers/gpu/drm/xe/xe_exec_queue.c
>>>@@ -405,6 +405,26 @@ struct xe_exec_queue *xe_exec_queue_create_bind(struct xe_device *xe,
>>> }
>>> ALLOW_ERROR_INJECTION(xe_exec_queue_create_bind, ERRNO);
>>>
>>>+static void xe_exec_queue_group_kill(struct kref *ref)
>>>+{
>>>+ struct xe_exec_queue_group *group = container_of(ref, struct xe_exec_queue_group,
>>>+ kill_refcount);
>>>+ xe_exec_queue_kill(group->primary);
>>>+}
>>>+
>>>+static inline void xe_exec_queue_group_kill_get(struct xe_exec_queue_group *group)
>>>+{
>>>+ kref_get(&group->kill_refcount);
>>>+}
>>>+
>>>+void xe_exec_queue_group_kill_put(struct xe_exec_queue_group *group)
>>>+{
>>>+ if (!group)
>>>+ return;
>>>+
>>>+ kref_put(&group->kill_refcount, xe_exec_queue_group_kill);
>>>+}
>>>+
>>> void xe_exec_queue_destroy(struct kref *ref)
>>> {
>>> struct xe_exec_queue *q = container_of(ref, struct xe_exec_queue, refcount);
>>>@@ -607,6 +627,7 @@ static int xe_exec_queue_group_init(struct xe_device *xe, struct xe_exec_queue *
>>> group->primary = q;
>>> group->cgp_bo = bo;
>>> INIT_LIST_HEAD(&group->list);
>>>+ kref_init(&group->kill_refcount);
>>> xa_init_flags(&group->xa, XA_FLAGS_ALLOC1);
>>> mutex_init(&group->lock);
>>> mutex_init(&group->list_lock);
>>>@@ -675,6 +696,11 @@ static int xe_exec_queue_group_add(struct xe_device *xe, struct xe_exec_queue *q
>>> q->multi_queue.pos = pos;
>>> mutex_unlock(&group->lock);
>>>
>>>+ if (group->primary->multi_queue.keep_active) {
>>>+ xe_exec_queue_group_kill_get(group);
>>>+ q->multi_queue.keep_active = true;
>>>+ }
>>>+
>>> return 0;
>>> }
>>>
>>>@@ -691,6 +717,11 @@ static void xe_exec_queue_group_delete(struct xe_exec_queue *q)
>>> if (lrc)
>>> xe_lrc_put(lrc);
>>> mutex_unlock(&group->lock);
>>>+
>>>+ if (q->multi_queue.keep_active) {
>>>+ xe_exec_queue_group_kill_put(group);
>>>+ q->multi_queue.keep_active = false;
>>>+ }
>>> }
>>>
>>> static int exec_queue_set_multi_group(struct xe_device *xe, struct xe_exec_queue *q,
>>>@@ -709,12 +740,24 @@ static int exec_queue_set_multi_group(struct xe_device *xe, struct xe_exec_queue
>>> return -EINVAL;
>>>
>>> if (value & DRM_XE_MULTI_GROUP_CREATE) {
>>>- if (XE_IOCTL_DBG(xe, value & ~DRM_XE_MULTI_GROUP_CREATE))
>>>+ if (XE_IOCTL_DBG(xe, value & ~(DRM_XE_MULTI_GROUP_CREATE |
>>>+ DRM_XE_MULTI_GROUP_KEEP_ACTIVE)))
>>>+ return -EINVAL;
>>>+
>>>+ /*
>>>+ * KEEP_ACTIVE is not supported in preempt fence mode as in that mode,
>>>+ * VM_DESTROY ioctl expects all exec queues of that VM are already killed.
>>>+ */
>>>+ if (XE_IOCTL_DBG(xe, (value & DRM_XE_MULTI_GROUP_KEEP_ACTIVE) &&
>>>+ xe_vm_in_preempt_fence_mode(q->vm)))
>>> return -EINVAL;
>>>
>>> q->multi_queue.valid = true;
>>> q->multi_queue.is_primary = true;
>>> q->multi_queue.pos = 0;
>>>+ if (value & DRM_XE_MULTI_GROUP_KEEP_ACTIVE)
>>>+ q->multi_queue.keep_active = true;
>>>+
>>> return 0;
>>> }
>>>
>>>@@ -1254,6 +1297,11 @@ void xe_exec_queue_kill(struct xe_exec_queue *q)
>>>
>>> q->ops->kill(q);
>>> xe_vm_remove_compute_exec_queue(q->vm, q);
>>>+
>>>+ if (!xe_exec_queue_is_multi_queue_primary(q) && q->multi_queue.keep_active) {
>>>+ xe_exec_queue_group_kill_put(q->multi_queue.group);
>>>+ q->multi_queue.keep_active = false;
>>>+ }
>>
>>This looks a little odd. Either you don't need to clear
>>multi_queue.keep_active as xe_exec_queue_kill can be called at most once
>>(IIRC it can be called multiple times) or you need some locking around
>>multi_queue.keep_active or it needs to be an atomic to prevent multiple
>>threads from calling xe_exec_queue_group_kill_put twice.
>>
>
>It looks like xe_exec_queue_kill() will only get called once, so we
>really don't need to reset the keep_active flag to 'false' here; I just
>added that for completeness.
>Ok, will remove.
>
Or maybe just keep it, as it looks more readable to me. I.e., for the
secondary queues, we set keep_active when calling group_kill_get() and
reset it when calling group_kill_put().
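To illustrate, the pairing I mean (a sketch using the names from this
patch):

	/* secondary queue joins a KEEP_ACTIVE group */
	if (group->primary->multi_queue.keep_active) {
		xe_exec_queue_group_kill_get(group);
		q->multi_queue.keep_active = true;
	}

	/* secondary queue is killed or deleted from the group */
	if (q->multi_queue.keep_active) {
		xe_exec_queue_group_kill_put(q->multi_queue.group);
		q->multi_queue.keep_active = false;
	}
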
Niranjana
>Niranjana
>
>>Matt
>>
>>> }
>>>
>>> int xe_exec_queue_destroy_ioctl(struct drm_device *dev, void *data,
>>>@@ -1280,7 +1328,10 @@ int xe_exec_queue_destroy_ioctl(struct drm_device *dev, void *data,
>>> if (q->vm && q->hwe->hw_engine_group)
>>> xe_hw_engine_group_del_exec_queue(q->hwe->hw_engine_group, q);
>>>
>>>- xe_exec_queue_kill(q);
>>>+ if (xe_exec_queue_is_multi_queue_primary(q))
>>>+ xe_exec_queue_group_kill_put(q->multi_queue.group);
>>>+ else
>>>+ xe_exec_queue_kill(q);
>>>
>>> trace_xe_exec_queue_close(q);
>>> xe_exec_queue_put(q);
>>>diff --git a/drivers/gpu/drm/xe/xe_exec_queue.h b/drivers/gpu/drm/xe/xe_exec_queue.h
>>>index 61478b2e883b..b642341f1ede 100644
>>>--- a/drivers/gpu/drm/xe/xe_exec_queue.h
>>>+++ b/drivers/gpu/drm/xe/xe_exec_queue.h
>>>@@ -109,6 +109,8 @@ static inline struct xe_exec_queue *xe_exec_queue_multi_queue_primary(struct xe_
>>> return xe_exec_queue_is_multi_queue(q) ? q->multi_queue.group->primary : q;
>>> }
>>>
>>>+void xe_exec_queue_group_kill_put(struct xe_exec_queue_group *group);
>>>+
>>> bool xe_exec_queue_is_lr(struct xe_exec_queue *q);
>>>
>>> bool xe_exec_queue_is_idle(struct xe_exec_queue *q);
>>>diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h b/drivers/gpu/drm/xe/xe_exec_queue_types.h
>>>index e64b6588923e..cdca3afe838c 100644
>>>--- a/drivers/gpu/drm/xe/xe_exec_queue_types.h
>>>+++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h
>>>@@ -55,6 +55,8 @@ struct xe_exec_queue_group {
>>> struct list_head list;
>>> /** @list_lock: Secondary queue list lock */
>>> struct mutex list_lock;
>>>+ /** @kill_refcount: ref count to kill primary queue */
>>>+ struct kref kill_refcount;
>>> /** @sync_pending: CGP_SYNC_DONE g2h response pending */
>>> bool sync_pending;
>>> };
>>>@@ -152,6 +154,8 @@ struct xe_exec_queue {
>>> u8 valid:1;
>>> /** @multi_queue.is_primary: Is primary queue (Q0) of the group */
>>> u8 is_primary:1;
>>>+ /** @multi_queue.keep_active: Keep the group active after primary is destroyed */
>>>+ u8 keep_active:1;
>>> } multi_queue;
>>>
>>> /** @sched_props: scheduling properties */
>>>diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
>>>index d72151163e77..333fb38b3404 100644
>>>--- a/include/uapi/drm/xe_drm.h
>>>+++ b/include/uapi/drm/xe_drm.h
>>>@@ -1260,6 +1260,10 @@ struct drm_xe_vm_bind {
>>> * then a new multi-queue group is created with this queue as the primary queue
>>> * (Q0). Otherwise, the queue gets added to the multi-queue group whose primary
>>> * queue id is specified in the 'value' field.
>>>+ * If the extension's 'value' field has %DRM_XE_MULTI_GROUP_KEEP_ACTIVE flag
>>>+ * set, then the multi-queue group is kept active after the primary queue is
>>>+ * destroyed.
>>>+ *
>>> * - %DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_QUEUE_PRIORITY - Set the queue
>>> * priority within the multi-queue group.
>>> *
>>>@@ -1304,6 +1308,7 @@ struct drm_xe_exec_queue_create {
>>> #define DRM_XE_EXEC_QUEUE_SET_PROPERTY_PXP_TYPE 2
>>> #define DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_GROUP 3
>>> #define DRM_XE_MULTI_GROUP_CREATE (1ull << 63)
>>>+#define DRM_XE_MULTI_GROUP_KEEP_ACTIVE (1ull << 62)
>>> #define DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_QUEUE_PRIORITY 4
>>> /** @extensions: Pointer to the first extension struct, if any */
>>> __u64 extensions;
>>>--
>>>2.43.0
>>>
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH 03/16] drm/xe/multi_queue: Add GuC interface for multi queue support
2025-11-04 4:56 ` Niranjana Vishwanathapura
@ 2025-11-04 17:41 ` Matthew Brost
2025-11-04 18:55 ` Niranjana Vishwanathapura
0 siblings, 1 reply; 61+ messages in thread
From: Matthew Brost @ 2025-11-04 17:41 UTC (permalink / raw)
To: Niranjana Vishwanathapura; +Cc: intel-xe
On Mon, Nov 03, 2025 at 08:56:39PM -0800, Niranjana Vishwanathapura wrote:
> On Sat, Nov 01, 2025 at 11:07:08AM -0700, Matthew Brost wrote:
> > On Fri, Oct 31, 2025 at 11:29:23AM -0700, Niranjana Vishwanathapura wrote:
> > > Implement GuC commands and responses along with the Context
> > > Group Page (CGP) interface for multi queue support.
> > >
> > > Ensure that only the primary queue (q0) of a multi queue group
> > > communicates with GuC. The secondary queues of the group only
> > > need to maintain the LRCA and interface with the DRM scheduler.
> > >
> > > Use primary queue's submit_wq for all secondary queues of a multi
> > > queue group. This serialization avoids any locking around CGP
> > > synchronization with GuC.
> > >
> >
> > Not a complete review, but a few comments.
> >
> > > Signed-off-by: Stuart Summers <stuart.summers@intel.com>
> > > Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
> > > ---
> > > drivers/gpu/drm/xe/abi/guc_actions_abi.h | 3 +
> > > drivers/gpu/drm/xe/xe_exec_queue_types.h | 2 +
> > > drivers/gpu/drm/xe/xe_guc_ct.c | 4 +
> > > drivers/gpu/drm/xe/xe_guc_fwif.h | 3 +
> > > drivers/gpu/drm/xe/xe_guc_submit.c | 302 +++++++++++++++++++----
> > > drivers/gpu/drm/xe/xe_guc_submit.h | 1 +
> > > 6 files changed, 270 insertions(+), 45 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/xe/abi/guc_actions_abi.h b/drivers/gpu/drm/xe/abi/guc_actions_abi.h
> > > index 47756e4674a1..3e9fbed9cda6 100644
> > > --- a/drivers/gpu/drm/xe/abi/guc_actions_abi.h
> > > +++ b/drivers/gpu/drm/xe/abi/guc_actions_abi.h
> > > @@ -139,6 +139,9 @@ enum xe_guc_action {
> > > XE_GUC_ACTION_DEREGISTER_G2G = 0x4508,
> > > XE_GUC_ACTION_DEREGISTER_CONTEXT_DONE = 0x4600,
> > > XE_GUC_ACTION_REGISTER_CONTEXT_MULTI_LRC = 0x4601,
> > > + XE_GUC_ACTION_REGISTER_CONTEXT_MULTI_QUEUE = 0x4602,
> > > + XE_GUC_ACTION_MULTI_QUEUE_CONTEXT_CGP_SYNC = 0x4603,
> > > + XE_GUC_ACTION_NOTIFY_MULTI_QUEUE_CONTEXT_CGP_SYNC_DONE = 0x4604,
> > > XE_GUC_ACTION_CLIENT_SOFT_RESET = 0x5507,
> > > XE_GUC_ACTION_SET_ENG_UTIL_BUFF = 0x550A,
> > > XE_GUC_ACTION_SET_DEVICE_ENGINE_ACTIVITY_BUFFER = 0x550C,
> > > diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h b/drivers/gpu/drm/xe/xe_exec_queue_types.h
> > > index 3856776df5c4..38e47b003259 100644
> > > --- a/drivers/gpu/drm/xe/xe_exec_queue_types.h
> > > +++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h
> > > @@ -47,6 +47,8 @@ struct xe_exec_queue_group {
> > > struct xarray xa;
> > > /** @list_lock: Secondary queue list lock */
> > > struct mutex list_lock;
> > > + /** @sync_pending: CGP_SYNC_DONE g2h response pending */
> > > + bool sync_pending;
> > > };
> > >
> > > /**
> > > diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c
> > > index e68953ef3a00..48b5006eb080 100644
> > > --- a/drivers/gpu/drm/xe/xe_guc_ct.c
> > > +++ b/drivers/gpu/drm/xe/xe_guc_ct.c
> > > @@ -1304,6 +1304,7 @@ static int parse_g2h_event(struct xe_guc_ct *ct, u32 *msg, u32 len)
> > > lockdep_assert_held(&ct->lock);
> > >
> > > switch (action) {
> > > + case XE_GUC_ACTION_NOTIFY_MULTI_QUEUE_CONTEXT_CGP_SYNC_DONE:
> > > case XE_GUC_ACTION_SCHED_CONTEXT_MODE_DONE:
> > > case XE_GUC_ACTION_DEREGISTER_CONTEXT_DONE:
> > > case XE_GUC_ACTION_SCHED_ENGINE_MODE_DONE:
> > > @@ -1570,6 +1571,9 @@ static int process_g2h_msg(struct xe_guc_ct *ct, u32 *msg, u32 len)
> > > ret = xe_guc_g2g_test_notification(guc, payload, adj_len);
> > > break;
> > > #endif
> > > + case XE_GUC_ACTION_NOTIFY_MULTI_QUEUE_CONTEXT_CGP_SYNC_DONE:
> > > + ret = xe_guc_exec_queue_cgp_sync_done_handler(guc, payload, adj_len);
> > > + break;
> > > default:
> > > xe_gt_err(gt, "unexpected G2H action 0x%04x\n", action);
> > > }
> > > diff --git a/drivers/gpu/drm/xe/xe_guc_fwif.h b/drivers/gpu/drm/xe/xe_guc_fwif.h
> > > index c90dd266e9cf..610dfb2f1cb5 100644
> > > --- a/drivers/gpu/drm/xe/xe_guc_fwif.h
> > > +++ b/drivers/gpu/drm/xe/xe_guc_fwif.h
> > > @@ -16,6 +16,7 @@
> > > #define G2H_LEN_DW_DEREGISTER_CONTEXT 3
> > > #define G2H_LEN_DW_TLB_INVALIDATE 3
> > > #define G2H_LEN_DW_G2G_NOTIFY_MIN 3
> > > +#define G2H_LEN_DW_MULTI_QUEUE_CONTEXT 4
> >
> > This value doesn't look right. I'm not sure where 4 is coming from.
> >
> > The length of XE_GUC_ACTION_NOTIFY_MULTI_QUEUE_CONTEXT_CGP_SYNC_DONE
> > appears to be 2. So with a value of 4, I believe the G2H credits will
> > leak.
> >
> > You can run a multi-q test, then check the following debugfs:
> >
> > cat /sys/kernel/debug/dri/0/gt0/uc/guc_info
> >
> > In particular, these are the interesting fields:
> >
> > G2H CTB (all sizes in DW):
> > ...
> > resv_space: 16384
> > ...
> > g2h outstanding: 0
> >
> > ^^^ This is what an idle G2H should look like. I suspect both G2H
> > outstanding values will be non-zero, and resv_space will continuously
> > decrease when running a multi-queue test.
> >
>
> Looks like G2H_LEN_DW_MULTI_QUEUE_CONTEXT should be 3: 2 dwords of
> header (HXG event) and 1 dword of payload. Will change.
>
> However, I always saw 'g2h outstanding' at 0 and resv_space at 16384
> after running the multi-queue tests, irrespective of whether I set
> G2H_LEN_DW_MULTI_QUEUE_CONTEXT to 3 or 4.
>
That is really odd that the credits didn't get screwed up. I'd double
check this, as it doesn't seem right. Perhaps the runtime PM refs drop
to zero and the GuC gets reloaded? We are removing that behavior in
[1], so maybe try with that series applied.
[1] https://patchwork.freedesktop.org/series/154017/
I still think the value should be 2 here, as this is like
deregister_done, which delivers a guc_id in msg[0].
> > >
> > > #define GUC_ID_MAX 65535
> > > #define GUC_ID_UNKNOWN 0xffffffff
> > > @@ -62,6 +63,8 @@ struct guc_ctxt_registration_info {
> > > u32 wq_base_lo;
> > > u32 wq_base_hi;
> > > u32 wq_size;
> > > + u32 cgp_lo;
> > > + u32 cgp_hi;
> > > u32 hwlrca_lo;
> > > u32 hwlrca_hi;
> > > };
> > > diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
> > > index d4ffdb71ef3d..d2aa9a2524e7 100644
> > > --- a/drivers/gpu/drm/xe/xe_guc_submit.c
> > > +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
> > > @@ -46,6 +46,7 @@
> > > #include "xe_trace.h"
> > > #include "xe_uc_fw.h"
> > > #include "xe_vm.h"
> > > +#include "xe_bo.h"
> > >
> > > static struct xe_guc *
> > > exec_queue_to_guc(struct xe_exec_queue *q)
> > > @@ -541,7 +542,8 @@ static void init_policies(struct xe_guc *guc, struct xe_exec_queue *q)
> > > u32 slpc_exec_queue_freq_req = 0;
> > > u32 preempt_timeout_us = q->sched_props.preempt_timeout_us;
> > >
> > > - xe_gt_assert(guc_to_gt(guc), exec_queue_registered(q));
> > > + xe_gt_assert(guc_to_gt(guc), exec_queue_registered(q) &&
> > > + !xe_exec_queue_is_multi_queue_secondary(q));
> > >
> > > if (q->flags & EXEC_QUEUE_FLAG_LOW_LATENCY)
> > > slpc_exec_queue_freq_req |= SLPC_CTX_FREQ_REQ_IS_COMPUTE;
> > > @@ -561,6 +563,8 @@ static void set_min_preemption_timeout(struct xe_guc *guc, struct xe_exec_queue
> > > {
> > > struct exec_queue_policy policy;
> > >
> > > + xe_assert(guc_to_xe(guc), !xe_exec_queue_is_multi_queue_secondary(q));
> > > +
> > > __guc_exec_queue_policy_start_klv(&policy, q->guc->id);
> > > __guc_exec_queue_policy_add_preemption_timeout(&policy, 1);
> > >
> > > @@ -575,6 +579,130 @@ static void set_min_preemption_timeout(struct xe_guc *guc, struct xe_exec_queue
> > > xe_map_wr_field(xe_, &map_, 0, struct guc_submit_parallel_scratch, \
> > > field_, val_)
> > >
> > > +#define CGP_VERSION_MAJOR_SHIFT 8
> > > +
> > > +static void xe_guc_exec_queue_group_cgp_update(struct xe_device *xe,
> > > + struct xe_exec_queue *q)
> > > +{
> > > + struct xe_exec_queue_group *group = q->multi_queue.group;
> > > + u32 guc_id = group->primary->guc->id;
> > > +
> > > + /* Currently implementing CGP version 1.0 */
> > > + xe_map_wr(xe, &group->cgp_bo->vmap, 0, u32,
> > > + 1 << CGP_VERSION_MAJOR_SHIFT);
> > > +
> > > + xe_map_wr(xe, &group->cgp_bo->vmap,
> > > + (32 + q->multi_queue.pos * 2) * sizeof(u32),
> > > + u32, lower_32_bits(xe_lrc_descriptor(q->lrc[0])));
> > > +
> > > + xe_map_wr(xe, &group->cgp_bo->vmap,
> > > + (33 + q->multi_queue.pos * 2) * sizeof(u32),
> > > + u32, guc_id);
> > > +
> > > + if (q->multi_queue.pos / 32) {
> > > + xe_map_wr(xe, &group->cgp_bo->vmap, 17 * sizeof(u32),
> > > + u32, BIT(q->multi_queue.pos % 32));
> > > + xe_map_wr(xe, &group->cgp_bo->vmap, 16 * sizeof(u32), u32, 0);
> > > + } else {
> > > + xe_map_wr(xe, &group->cgp_bo->vmap, 16 * sizeof(u32),
> > > + u32, BIT(q->multi_queue.pos));
> > > + xe_map_wr(xe, &group->cgp_bo->vmap, 17 * sizeof(u32), u32, 0);
> > > + }
> > > +}
> > > +
> > > +static void xe_guc_exec_queue_group_cgp_sync(struct xe_guc *guc,
> > > + struct xe_exec_queue *q,
> > > + const u32 *action, u32 len)
> > > +{
> > > + struct xe_exec_queue_group *group = q->multi_queue.group;
> > > + struct xe_device *xe = guc_to_xe(guc);
> > > + long ret;
> > > +
> > > + /*
> > > + * As all queues of a multi queue group use single drm scheduler
> > > + * submit workqueue, CGP synchronization with GuC are serialized.
> > > + * Hence, no locking is required here.
> > > + * Wait for any pending CGP_SYNC_DONE response before updating the
> > > + * CGP page and sending CGP_SYNC message.
> > > + */
> > > + ret = wait_event_timeout(guc->ct.wq,
> > > + !READ_ONCE(group->sync_pending) ||
> > > + xe_guc_read_stopped(guc), HZ);
> > > + if (!ret || xe_guc_read_stopped(guc)) {
> > > + drm_err(&xe->drm, "Wait for CGP_SYNC_DONE response failed!\n");
> >
> > If this occurs you need a GT reset which should detect
> > group->sync_pending in guc_exec_queue_stop and clean it up.
> >
>
> Hmm... ok, let me give that a try. Not sure how urgent this is, as
> ideally it should never occur.
>
It shouldn't occur, but for correctness it's best to at least attempt
to get this right upfront.
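Something along these lines in the stop path is what I have in mind
(just a sketch, assuming per-group state as in this patch):

	/* A stopped GuC will never deliver CGP_SYNC_DONE; clear the
	 * pending flag so waiters in xe_guc_exec_queue_group_cgp_sync()
	 * can make progress once xe_guc_read_stopped() is set.
	 */
	if (xe_exec_queue_is_multi_queue_primary(q))
		WRITE_ONCE(q->multi_queue.group->sync_pending, false);
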
> > Also here is where VF migration needs to be considered. The
> > wait_event_timeout should pop out on vf_recovery being set, but not
> > trigger a GT reset. In this case we need likely need some per secondary
> > queue tracking state to figure out which secondary queues lost the CPG
> > syncs so that flow can recover. We can figure out part out a bit later
> > though.
>
> Hmm...ok.
>
> >
> > > + /* Something wrong with the CTB or GuC, no need to proceed */
> > > + return;
> > > + }
> > > +
> > > + xe_guc_exec_queue_group_cgp_update(xe, q);
> > > +
> > > + WRITE_ONCE(group->sync_pending, true);
> > > + xe_guc_ct_send(&guc->ct, action, len, G2H_LEN_DW_MULTI_QUEUE_CONTEXT, 1);
> >
> > The problem here appears to be twofold:
> >
> > - The value of G2H_LEN_DW_MULTI_QUEUE_CONTEXT looks incorrect
> > - On multi-q registration both G2H credits and count are set, but multi-q
> > register doesn't produce a G2H response. See my comment above about
> > things getting leaked; that can't happen, as PM will be off and
> > eventually G2H credits will run out and deadlock the CT channel,
> > leading to a GT reset.
> >
>
> Responded above.
>
> > > +}
> > > +
> > > +static void __register_exec_queue(struct xe_guc *guc,
> > > + struct guc_ctxt_registration_info *info)
> > > +{
> > > + u32 action[] = {
> > > + XE_GUC_ACTION_REGISTER_CONTEXT,
> > > + info->flags,
> > > + info->context_idx,
> > > + info->engine_class,
> > > + info->engine_submit_mask,
> > > + info->wq_desc_lo,
> > > + info->wq_desc_hi,
> > > + info->wq_base_lo,
> > > + info->wq_base_hi,
> > > + info->wq_size,
> > > + info->hwlrca_lo,
> > > + info->hwlrca_hi,
> > > + };
> > > +
> > > + /* explicitly checks some fields that we might fixup later */
> > > + xe_gt_assert(guc_to_gt(guc), info->wq_desc_lo ==
> > > + action[XE_GUC_REGISTER_CONTEXT_DATA_5_WQ_DESC_ADDR_LOWER]);
> > > + xe_gt_assert(guc_to_gt(guc), info->wq_base_lo ==
> > > + action[XE_GUC_REGISTER_CONTEXT_DATA_7_WQ_BUF_BASE_LOWER]);
> > > + xe_gt_assert(guc_to_gt(guc), info->hwlrca_lo ==
> > > + action[XE_GUC_REGISTER_CONTEXT_DATA_10_HW_LRC_ADDR]);
> > > +
> > > + xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action), 0, 0);
> > > +}
> > > +
> > > +static void __register_exec_queue_group(struct xe_guc *guc,
> > > + struct xe_exec_queue *q,
> > > + struct guc_ctxt_registration_info *info)
> > > +{
> > > +#define MAX_MULTI_QUEUE_REG_SIZE (8)
> > > + struct xe_device *xe = guc_to_xe(guc);
> > > + u32 action[MAX_MULTI_QUEUE_REG_SIZE];
> > > + int len = 0;
> > > +
> > > + if (xe_exec_queue_is_multi_queue_primary(q)) {
> > > + action[len++] = XE_GUC_ACTION_REGISTER_CONTEXT_MULTI_QUEUE;
> >
> > Again as mentioned above, this command doesn't require G2H credits
> > unless this produces a XE_GUC_ACTION_NOTIFY_MULTI_QUEUE_CONTEXT_CGP_SYNC_DONE
> > response.
> >
>
> Yes, XE_GUC_ACTION_REGISTER_CONTEXT_MULTI_QUEUE will have a
> XE_GUC_ACTION_NOTIFY_MULTI_QUEUE_CONTEXT_CGP_SYNC_DONE response from GuC.
>
Ah, ok. That at least explains why 'g2h outstanding' is the correct
value; it doesn't explain the credits though. Can you add a comment
indicating that XE_GUC_ACTION_REGISTER_CONTEXT_MULTI_QUEUE results in a
XE_GUC_ACTION_NOTIFY_MULTI_QUEUE_CONTEXT_CGP_SYNC_DONE response?
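Something like this above the action setup would do (just a sketch):

	/*
	 * Unlike plain REGISTER_CONTEXT, REGISTER_CONTEXT_MULTI_QUEUE is
	 * acknowledged by the GuC with a
	 * NOTIFY_MULTI_QUEUE_CONTEXT_CGP_SYNC_DONE event, so G2H space
	 * must be reserved for it like any other CGP sync.
	 */
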
> > > + action[len++] = info->flags;
> > > + action[len++] = info->context_idx;
> > > + action[len++] = info->engine_class;
> > > + action[len++] = info->engine_submit_mask;
> > > + action[len++] = 0; /* Reserved */
> > > + action[len++] = info->cgp_lo;
> > > + action[len++] = info->cgp_hi;
> > > + } else {
> > > + /*
> > > + * No need to wait before CGP sync since CT descriptors
> > > + * should be ordered.
> > > + */
> > > +
> > > + action[len++] = XE_GUC_ACTION_MULTI_QUEUE_CONTEXT_CGP_SYNC;
> > > + action[len++] = q->multi_queue.group->primary->guc->id;
> > > + }
> > > +
> > > + xe_assert(xe, len <= MAX_MULTI_QUEUE_REG_SIZE);
> > > +#undef MAX_MULTI_QUEUE_REG_SIZE
> > > +
> > > + xe_guc_exec_queue_group_cgp_sync(guc, q, action, len);
> > > +}
> > > +
> > > static void __register_mlrc_exec_queue(struct xe_guc *guc,
> > > struct xe_exec_queue *q,
> > > struct guc_ctxt_registration_info *info)
> > > @@ -622,35 +750,6 @@ static void __register_mlrc_exec_queue(struct xe_guc *guc,
> > > xe_guc_ct_send(&guc->ct, action, len, 0, 0);
> > > }
> > >
> > > -static void __register_exec_queue(struct xe_guc *guc,
> > > - struct guc_ctxt_registration_info *info)
> > > -{
> > > - u32 action[] = {
> > > - XE_GUC_ACTION_REGISTER_CONTEXT,
> > > - info->flags,
> > > - info->context_idx,
> > > - info->engine_class,
> > > - info->engine_submit_mask,
> > > - info->wq_desc_lo,
> > > - info->wq_desc_hi,
> > > - info->wq_base_lo,
> > > - info->wq_base_hi,
> > > - info->wq_size,
> > > - info->hwlrca_lo,
> > > - info->hwlrca_hi,
> > > - };
> > > -
> > > - /* explicitly checks some fields that we might fixup later */
> > > - xe_gt_assert(guc_to_gt(guc), info->wq_desc_lo ==
> > > - action[XE_GUC_REGISTER_CONTEXT_DATA_5_WQ_DESC_ADDR_LOWER]);
> > > - xe_gt_assert(guc_to_gt(guc), info->wq_base_lo ==
> > > - action[XE_GUC_REGISTER_CONTEXT_DATA_7_WQ_BUF_BASE_LOWER]);
> > > - xe_gt_assert(guc_to_gt(guc), info->hwlrca_lo ==
> > > - action[XE_GUC_REGISTER_CONTEXT_DATA_10_HW_LRC_ADDR]);
> > > -
> > > - xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action), 0, 0);
> > > -}
> > > -
> > > static void register_exec_queue(struct xe_exec_queue *q, int ctx_type)
> > > {
> > > struct xe_guc *guc = exec_queue_to_guc(q);
> > > @@ -670,6 +769,13 @@ static void register_exec_queue(struct xe_exec_queue *q, int ctx_type)
> > > info.flags = CONTEXT_REGISTRATION_FLAG_KMD |
> > > FIELD_PREP(CONTEXT_REGISTRATION_FLAG_TYPE, ctx_type);
> > >
> > > + if (xe_exec_queue_is_multi_queue(q)) {
> > > + struct xe_exec_queue_group *group = q->multi_queue.group;
> > > +
> > > + info.cgp_lo = xe_bo_ggtt_addr(group->cgp_bo);
> > > + info.cgp_hi = 0;
> > > + }
> > > +
> > > if (xe_exec_queue_is_parallel(q)) {
> > > u64 ggtt_addr = xe_lrc_parallel_ggtt_addr(lrc);
> > > struct iosys_map map = xe_lrc_parallel_map(lrc);
> > > @@ -700,11 +806,15 @@ static void register_exec_queue(struct xe_exec_queue *q, int ctx_type)
> > >
> > > set_exec_queue_registered(q);
> > > trace_xe_exec_queue_register(q);
> > > - if (xe_exec_queue_is_parallel(q))
> > > + if (xe_exec_queue_is_multi_queue(q))
> > > + __register_exec_queue_group(guc, q, &info);
> > > + else if (xe_exec_queue_is_parallel(q))
> > > __register_mlrc_exec_queue(guc, q, &info);
> > > else
> > > __register_exec_queue(guc, &info);
> > > - init_policies(guc, q);
> > > +
> > > + if (!xe_exec_queue_is_multi_queue_secondary(q))
> > > + init_policies(guc, q);
> > > }
> > >
> > > static u32 wq_space_until_wrap(struct xe_exec_queue *q)
> > > @@ -833,6 +943,12 @@ static void submit_exec_queue(struct xe_exec_queue *q, struct xe_sched_job *job)
> > > if (exec_queue_suspended(q) && !xe_exec_queue_is_parallel(q))
> > > return;
> > >
> > > + /*
> > > + * All queues in a multi-queue group will use the primary queue
> > > + * of the group to interface with GuC.
> > > + */
> > > + q = xe_exec_queue_multi_queue_primary(q);
> > > +
> > > if (!exec_queue_enabled(q) && !exec_queue_suspended(q)) {
> > > action[len++] = XE_GUC_ACTION_SCHED_CONTEXT_MODE_SET;
> > > action[len++] = q->guc->id;
> > > @@ -879,6 +995,18 @@ guc_exec_queue_run_job(struct drm_sched_job *drm_job)
> > > trace_xe_sched_job_run(job);
> > >
> > > if (!killed_or_banned_or_wedged && !xe_sched_job_is_error(job)) {
> > > + if (xe_exec_queue_is_multi_queue_secondary(q)) {
> > > + struct xe_exec_queue *primary = xe_exec_queue_multi_queue_primary(q);
> > > +
> > > + if (exec_queue_killed_or_banned_or_wedged(primary)) {
> > > + killed_or_banned_or_wedged = true;
> > > + goto run_job_out;
> > > + }
> > > +
> > > + if (!exec_queue_registered(primary))
> > > + register_exec_queue(primary, GUC_CONTEXT_NORMAL);
> > > + }
> > > +
> > > if (!exec_queue_registered(q))
> > > register_exec_queue(q, GUC_CONTEXT_NORMAL);
> > > if (!job->skip_emit)
> > > @@ -887,6 +1015,7 @@ guc_exec_queue_run_job(struct drm_sched_job *drm_job)
> > > job->skip_emit = false;
> > > }
> > >
> > > +run_job_out:
> > > /*
> > > * We don't care about job-fence ordering in LR VMs because these fences
> > > * are never exported; they are used solely to keep jobs on the pending
> > > @@ -912,6 +1041,11 @@ int xe_guc_read_stopped(struct xe_guc *guc)
> > > return atomic_read(&guc->submission_state.stopped);
> > > }
> > >
> > > +static void handle_multi_queue_secondary_sched_done(struct xe_guc *guc,
> > > + struct xe_exec_queue *q,
> > > + u32 runnable_state);
> > > +static void handle_deregister_done(struct xe_guc *guc, struct xe_exec_queue *q);
> > > +
> > > #define MAKE_SCHED_CONTEXT_ACTION(q, enable_disable) \
> > > u32 action[] = { \
> > > XE_GUC_ACTION_SCHED_CONTEXT_MODE_SET, \
> > > @@ -925,7 +1059,9 @@ static void disable_scheduling_deregister(struct xe_guc *guc,
> > > MAKE_SCHED_CONTEXT_ACTION(q, DISABLE);
> > > int ret;
> > >
> > > - set_min_preemption_timeout(guc, q);
> > > + if (!xe_exec_queue_is_multi_queue_secondary(q))
> > > + set_min_preemption_timeout(guc, q);
> > > +
> > > smp_rmb();
> > > ret = wait_event_timeout(guc->ct.wq,
> > > (!exec_queue_pending_enable(q) &&
> > > @@ -953,9 +1089,12 @@ static void disable_scheduling_deregister(struct xe_guc *guc,
> > > * Reserve space for both G2H here as the 2nd G2H is sent from a G2H
> > > * handler and we are not allowed to reserved G2H space in handlers.
> > > */
> > > - xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action),
> > > - G2H_LEN_DW_SCHED_CONTEXT_MODE_SET +
> > > - G2H_LEN_DW_DEREGISTER_CONTEXT, 2);
> > > + if (xe_exec_queue_is_multi_queue_secondary(q))
> > > + handle_multi_queue_secondary_sched_done(guc, q, 0);
> > > + else
> > > + xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action),
> > > + G2H_LEN_DW_SCHED_CONTEXT_MODE_SET +
> > > + G2H_LEN_DW_DEREGISTER_CONTEXT, 2);
> > > }
> > >
> > > static void xe_guc_exec_queue_trigger_cleanup(struct xe_exec_queue *q)
> > > @@ -1161,8 +1300,11 @@ static void enable_scheduling(struct xe_exec_queue *q)
> > > set_exec_queue_enabled(q);
> > > trace_xe_exec_queue_scheduling_enable(q);
> > >
> > > - xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action),
> > > - G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, 1);
> > > + if (xe_exec_queue_is_multi_queue_secondary(q))
> > > + handle_multi_queue_secondary_sched_done(guc, q, 1);
> > > + else
> > > + xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action),
> > > + G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, 1);
> > >
> > > ret = wait_event_timeout(guc->ct.wq,
> > > !exec_queue_pending_enable(q) ||
> > > @@ -1186,14 +1328,17 @@ static void disable_scheduling(struct xe_exec_queue *q, bool immediate)
> > > xe_gt_assert(guc_to_gt(guc), exec_queue_registered(q));
> > > xe_gt_assert(guc_to_gt(guc), !exec_queue_pending_disable(q));
> > >
> > > - if (immediate)
> > > + if (immediate && !xe_exec_queue_is_multi_queue_secondary(q))
> > > set_min_preemption_timeout(guc, q);
> > > clear_exec_queue_enabled(q);
> > > set_exec_queue_pending_disable(q);
> > > trace_xe_exec_queue_scheduling_disable(q);
> > >
> > > - xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action),
> > > - G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, 1);
> > > + if (xe_exec_queue_is_multi_queue_secondary(q))
> > > + handle_multi_queue_secondary_sched_done(guc, q, 0);
> > > + else
> > > + xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action),
> > > + G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, 1);
> > > }
> > >
> > > static void __deregister_exec_queue(struct xe_guc *guc, struct xe_exec_queue *q)
> > > @@ -1211,8 +1356,11 @@ static void __deregister_exec_queue(struct xe_guc *guc, struct xe_exec_queue *q)
> > > set_exec_queue_destroyed(q);
> > > trace_xe_exec_queue_deregister(q);
> > >
> > > - xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action),
> > > - G2H_LEN_DW_DEREGISTER_CONTEXT, 1);
> > > + if (xe_exec_queue_is_multi_queue_secondary(q))
> > > + handle_deregister_done(guc, q);
> > > + else
> > > + xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action),
> > > + G2H_LEN_DW_DEREGISTER_CONTEXT, 1);
> > > }
> > >
> > > static enum drm_gpu_sched_stat
> > > @@ -1660,6 +1808,7 @@ static int guc_exec_queue_init(struct xe_exec_queue *q)
> > > {
> > > struct xe_gpu_scheduler *sched;
> > > struct xe_guc *guc = exec_queue_to_guc(q);
> > > + struct workqueue_struct *submit_wq = NULL;
> > > struct xe_guc_exec_queue *ge;
> > > long timeout;
> > > int err, i;
> > > @@ -1680,8 +1829,20 @@ static int guc_exec_queue_init(struct xe_exec_queue *q)
> > >
> > > timeout = (q->vm && xe_vm_in_lr_mode(q->vm)) ? MAX_SCHEDULE_TIMEOUT :
> > > msecs_to_jiffies(q->sched_props.job_timeout_ms);
> > > +
> > > + /*
> > > + * Use primary queue's submit_wq for all secondary queues of a
> > > + * multi queue group. This serialization avoids any locking around
> > > + * CGP synchronization with GuC.
> > > + */
> > > + if (xe_exec_queue_is_multi_queue_secondary(q)) {
> > > + struct xe_exec_queue *primary = xe_exec_queue_multi_queue_primary(q);
> > > +
> > > + submit_wq = primary->guc->sched.base.submit_wq;
> > > + }
> > > +
> > > err = xe_sched_init(&ge->sched, &drm_sched_ops, &xe_sched_ops,
> > > - NULL, xe_lrc_ring_size() / MAX_JOB_SIZE_BYTES, 64,
> > > + submit_wq, xe_lrc_ring_size() / MAX_JOB_SIZE_BYTES, 64,
> > > timeout, guc_to_gt(guc)->ordered_wq, NULL,
> > > q->name, gt_to_xe(q->gt)->drm.dev);
> > > if (err)
> > > @@ -2418,7 +2579,11 @@ static void deregister_exec_queue(struct xe_guc *guc, struct xe_exec_queue *q)
> > >
> > > trace_xe_exec_queue_deregister(q);
> > >
> > > - xe_guc_ct_send_g2h_handler(&guc->ct, action, ARRAY_SIZE(action));
> > > + if (xe_exec_queue_is_multi_queue_secondary(q))
> > > + handle_deregister_done(guc, q);
> > > + else
> > > + xe_guc_ct_send_g2h_handler(&guc->ct, action,
> > > + ARRAY_SIZE(action));
> > > }
> > >
> > > static void handle_sched_done(struct xe_guc *guc, struct xe_exec_queue *q,
> > > @@ -2468,6 +2633,15 @@ static void handle_sched_done(struct xe_guc *guc, struct xe_exec_queue *q,
> > > }
> > > }
> > >
> > > +static void handle_multi_queue_secondary_sched_done(struct xe_guc *guc,
> > > + struct xe_exec_queue *q,
> > > + u32 runnable_state)
> > > +{
> > > + mutex_lock(&guc->ct.lock);
> >
> > I don't think you need the CT lock here. This is per-queue state,
> > which should be safe to modify without any lock. The CT lock never
> > protects queue state; we just happen to hold it in G2H responses
> > because of how the CT layer works.
> >
>
> Without the CT lock here, I get lockdep warnings from
> _guc_ct_send_locked(), h2g_has_room(), etc. So I guess we need to
> keep it.
>
Ah, yes, I missed that part. If you send another H2G you will indeed
need the CT lock. Can you add a comment about that? It's easy to
forget.
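For example (a sketch of the kind of comment I mean):

	/*
	 * Hold the CT lock: handle_sched_done() can issue further H2Gs
	 * (e.g. a deregister), and the CT send paths such as
	 * _guc_ct_send_locked() and h2g_has_room() expect ct.lock to be
	 * held (lockdep complains otherwise).
	 */
	mutex_lock(&guc->ct.lock);
	handle_sched_done(guc, q, runnable_state);
	mutex_unlock(&guc->ct.lock);
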
Matt
> > > + handle_sched_done(guc, q, runnable_state);
> > > + mutex_unlock(&guc->ct.lock);
> > > +}
> > > +
> > > int xe_guc_sched_done_handler(struct xe_guc *guc, u32 *msg, u32 len)
> > > {
> > > struct xe_exec_queue *q;
> > > @@ -2672,6 +2846,44 @@ int xe_guc_exec_queue_reset_failure_handler(struct xe_guc *guc, u32 *msg, u32 le
> > > return 0;
> > > }
> > >
> > > +/**
> > > + * xe_guc_exec_queue_cgp_sync_done_handler - CGP synchronization done handler
> > > + * @guc: guc
> > > + * @msg: message indicating CGP sync done
> > > + * @len: length of message
> > > + *
> > > + * Set multi queue group's sync_pending flag to false and wakeup anyone waiting
> > > + * for CGP synchronization to complete.
> > > + *
> > > + * Return: 0 on success, -EPROTO for malformed messages.
> > > + */
> > > +int xe_guc_exec_queue_cgp_sync_done_handler(struct xe_guc *guc, u32 *msg, u32 len)
> > > +{
> > > + struct xe_device *xe = guc_to_xe(guc);
> > > + struct xe_exec_queue *q;
> > > + u32 guc_id = msg[0];
> > > +
> > > + if (unlikely(len < 1)) {
> > > + drm_err(&xe->drm, "Invalid CGP_SYNC_DONE length %u", len);
> > > + return -EPROTO;
> > > + }
> > > +
> > > + q = g2h_exec_queue_lookup(guc, guc_id);
> > > + if (unlikely(!q))
> > > + return -EPROTO;
> > > +
> > > + if (!xe_exec_queue_is_multi_queue_primary(q)) {
> > > + drm_err(&xe->drm, "Unexpected CGP_SYNC_DONE response");
> > > + return -EPROTO;
> > > + }
> > > +
> > > + /* Wakeup the serialized cgp update wait */
> > > + WRITE_ONCE(q->multi_queue.group->sync_pending, false);
> >
> > So here - I suspect we need to associate the CGP_SYNC_DONE with
> > secondary queue state tracking in order to get VF migration to work.
> > Again, we can figure this part out a bit later, but it should be
> > considered.
> >
>
> Hmm... ok.
>
> > Matt
> >
> > > + wake_up_all(&guc->ct.wq);
> > > +
> > > + return 0;
> > > +}
> > > +
> > > static void
> > > guc_exec_queue_wq_snapshot_capture(struct xe_exec_queue *q,
> > > struct xe_guc_submit_exec_queue_snapshot *snapshot)
> > > diff --git a/drivers/gpu/drm/xe/xe_guc_submit.h b/drivers/gpu/drm/xe/xe_guc_submit.h
> > > index b49a2748ec46..abfa94bce391 100644
> > > --- a/drivers/gpu/drm/xe/xe_guc_submit.h
> > > +++ b/drivers/gpu/drm/xe/xe_guc_submit.h
> > > @@ -34,6 +34,7 @@ int xe_guc_exec_queue_memory_cat_error_handler(struct xe_guc *guc, u32 *msg,
> > > u32 len);
> > > int xe_guc_exec_queue_reset_failure_handler(struct xe_guc *guc, u32 *msg, u32 len);
> > > int xe_guc_error_capture_handler(struct xe_guc *guc, u32 *msg, u32 len);
> > > +int xe_guc_exec_queue_cgp_sync_done_handler(struct xe_guc *guc, u32 *msg, u32 len);
> > >
> > > struct xe_guc_submit_exec_queue_snapshot *
> > > xe_guc_exec_queue_snapshot_capture(struct xe_exec_queue *q);
> > > --
> > > 2.43.0
> > >
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH 03/16] drm/xe/multi_queue: Add GuC interface for multi queue support
2025-11-04 17:41 ` Matthew Brost
@ 2025-11-04 18:55 ` Niranjana Vishwanathapura
2025-11-04 19:26 ` Matthew Brost
0 siblings, 1 reply; 61+ messages in thread
From: Niranjana Vishwanathapura @ 2025-11-04 18:55 UTC (permalink / raw)
To: Matthew Brost; +Cc: intel-xe
On Tue, Nov 04, 2025 at 09:41:48AM -0800, Matthew Brost wrote:
>On Mon, Nov 03, 2025 at 08:56:39PM -0800, Niranjana Vishwanathapura wrote:
>> On Sat, Nov 01, 2025 at 11:07:08AM -0700, Matthew Brost wrote:
>> > On Fri, Oct 31, 2025 at 11:29:23AM -0700, Niranjana Vishwanathapura wrote:
>> > > Implement GuC commands and responses along with the Context
>> > > Group Page (CGP) interface for multi queue support.
>> > >
>> > > Ensure that only the primary queue (q0) of a multi queue group
>> > > communicates with GuC. The secondary queues of the group only
>> > > need to maintain the LRCA and interface with the DRM scheduler.
>> > >
>> > > Use primary queue's submit_wq for all secondary queues of a multi
>> > > queue group. This serialization avoids any locking around CGP
>> > > synchronization with GuC.
>> > >
>> >
>> > Not a complete review, but a few comments.
>> >
>> > > Signed-off-by: Stuart Summers <stuart.summers@intel.com>
>> > > Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
>> > > ---
>> > > drivers/gpu/drm/xe/abi/guc_actions_abi.h | 3 +
>> > > drivers/gpu/drm/xe/xe_exec_queue_types.h | 2 +
>> > > drivers/gpu/drm/xe/xe_guc_ct.c | 4 +
>> > > drivers/gpu/drm/xe/xe_guc_fwif.h | 3 +
>> > > drivers/gpu/drm/xe/xe_guc_submit.c | 302 +++++++++++++++++++----
>> > > drivers/gpu/drm/xe/xe_guc_submit.h | 1 +
>> > > 6 files changed, 270 insertions(+), 45 deletions(-)
>> > >
>> > > diff --git a/drivers/gpu/drm/xe/abi/guc_actions_abi.h b/drivers/gpu/drm/xe/abi/guc_actions_abi.h
>> > > index 47756e4674a1..3e9fbed9cda6 100644
>> > > --- a/drivers/gpu/drm/xe/abi/guc_actions_abi.h
>> > > +++ b/drivers/gpu/drm/xe/abi/guc_actions_abi.h
>> > > @@ -139,6 +139,9 @@ enum xe_guc_action {
>> > > XE_GUC_ACTION_DEREGISTER_G2G = 0x4508,
>> > > XE_GUC_ACTION_DEREGISTER_CONTEXT_DONE = 0x4600,
>> > > XE_GUC_ACTION_REGISTER_CONTEXT_MULTI_LRC = 0x4601,
>> > > + XE_GUC_ACTION_REGISTER_CONTEXT_MULTI_QUEUE = 0x4602,
>> > > + XE_GUC_ACTION_MULTI_QUEUE_CONTEXT_CGP_SYNC = 0x4603,
>> > > + XE_GUC_ACTION_NOTIFY_MULTI_QUEUE_CONTEXT_CGP_SYNC_DONE = 0x4604,
>> > > XE_GUC_ACTION_CLIENT_SOFT_RESET = 0x5507,
>> > > XE_GUC_ACTION_SET_ENG_UTIL_BUFF = 0x550A,
>> > > XE_GUC_ACTION_SET_DEVICE_ENGINE_ACTIVITY_BUFFER = 0x550C,
>> > > diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h b/drivers/gpu/drm/xe/xe_exec_queue_types.h
>> > > index 3856776df5c4..38e47b003259 100644
>> > > --- a/drivers/gpu/drm/xe/xe_exec_queue_types.h
>> > > +++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h
>> > > @@ -47,6 +47,8 @@ struct xe_exec_queue_group {
>> > > struct xarray xa;
>> > > /** @list_lock: Secondary queue list lock */
>> > > struct mutex list_lock;
>> > > + /** @sync_pending: CGP_SYNC_DONE g2h response pending */
>> > > + bool sync_pending;
>> > > };
>> > >
>> > > /**
>> > > diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c
>> > > index e68953ef3a00..48b5006eb080 100644
>> > > --- a/drivers/gpu/drm/xe/xe_guc_ct.c
>> > > +++ b/drivers/gpu/drm/xe/xe_guc_ct.c
>> > > @@ -1304,6 +1304,7 @@ static int parse_g2h_event(struct xe_guc_ct *ct, u32 *msg, u32 len)
>> > > lockdep_assert_held(&ct->lock);
>> > >
>> > > switch (action) {
>> > > + case XE_GUC_ACTION_NOTIFY_MULTI_QUEUE_CONTEXT_CGP_SYNC_DONE:
>> > > case XE_GUC_ACTION_SCHED_CONTEXT_MODE_DONE:
>> > > case XE_GUC_ACTION_DEREGISTER_CONTEXT_DONE:
>> > > case XE_GUC_ACTION_SCHED_ENGINE_MODE_DONE:
>> > > @@ -1570,6 +1571,9 @@ static int process_g2h_msg(struct xe_guc_ct *ct, u32 *msg, u32 len)
>> > > ret = xe_guc_g2g_test_notification(guc, payload, adj_len);
>> > > break;
>> > > #endif
>> > > + case XE_GUC_ACTION_NOTIFY_MULTI_QUEUE_CONTEXT_CGP_SYNC_DONE:
>> > > + ret = xe_guc_exec_queue_cgp_sync_done_handler(guc, payload, adj_len);
>> > > + break;
>> > > default:
>> > > xe_gt_err(gt, "unexpected G2H action 0x%04x\n", action);
>> > > }
>> > > diff --git a/drivers/gpu/drm/xe/xe_guc_fwif.h b/drivers/gpu/drm/xe/xe_guc_fwif.h
>> > > index c90dd266e9cf..610dfb2f1cb5 100644
>> > > --- a/drivers/gpu/drm/xe/xe_guc_fwif.h
>> > > +++ b/drivers/gpu/drm/xe/xe_guc_fwif.h
>> > > @@ -16,6 +16,7 @@
>> > > #define G2H_LEN_DW_DEREGISTER_CONTEXT 3
>> > > #define G2H_LEN_DW_TLB_INVALIDATE 3
>> > > #define G2H_LEN_DW_G2G_NOTIFY_MIN 3
>> > > +#define G2H_LEN_DW_MULTI_QUEUE_CONTEXT 4
>> >
>> > This value doesn't look right. I'm not sure where 4 is coming from.
>> >
>> > The length of XE_GUC_ACTION_NOTIFY_MULTI_QUEUE_CONTEXT_CGP_SYNC_DONE
>> > appears to be 2. So with a value of 4, I believe the G2H credits will
>> > leak.
>> >
>> > You can run a multi-q test, then check the following debugfs:
>> >
>> > cat /sys/kernel/debug/dri/0/gt0/uc/guc_info
>> >
>> > In particular, these are the interesting fields:
>> >
>> > G2H CTB (all sizes in DW):
>> > ...
>> > resv_space: 16384
>> > ...
>> > g2h outstanding: 0
>> >
>> > ^^^ This is what an idle G2H should look like. I suspect both G2H
>> > outstanding values will be non-zero, and resv_space will continuously
>> > decrease when running a multi-queue test.
>> >
>>
>> Looks like G2H_LEN_DW_MULTI_QUEUE_CONTEXT should be 3: 2 dwords of
>> header (HXG event) and 1 dword of payload. Will change.
>>
>> However, I always saw 'g2h outstanding' at 0 and resv_space at 16384
>> after running the multi-queue tests, irrespective of whether I set
>> G2H_LEN_DW_MULTI_QUEUE_CONTEXT to 3 or 4.
>>
>
>That is really odd the credits didn't get screwed up. I'd double-check
>this as that doesn't seem right. Perhaps the runtime PM refs drop to
>zero and the GuC gets reloaded? We are removing that though, here [1].
>Maybe try with this series.
>
>[1] https://patchwork.freedesktop.org/series/154017/
>
>I still think the value should be 2 here as this is like deregister_done,
>which delivers a guc_id in msg[0].
>
Yeah, it is similar to deregister_done. The value of
G2H_LEN_DW_DEREGISTER_CONTEXT is also set to 3 above.
I tried with 2, but got a GuC error.
I have set it to 3 in the v2 patch series.
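
For reference, a minimal sketch of the accounting behind that value (the
2+1 split is from the discussion above; the layout notes are illustrative,
not taken from the GuC spec):

    /*
     * CGP_SYNC_DONE G2H message:
     *   2 dwords - HXG event header
     *   1 dword  - payload: guc_id of the group's primary queue
     */
    #define G2H_LEN_DW_MULTI_QUEUE_CONTEXT (2 + 1) /* = 3 */
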
>> > >
>> > > #define GUC_ID_MAX 65535
>> > > #define GUC_ID_UNKNOWN 0xffffffff
>> > > @@ -62,6 +63,8 @@ struct guc_ctxt_registration_info {
>> > > u32 wq_base_lo;
>> > > u32 wq_base_hi;
>> > > u32 wq_size;
>> > > + u32 cgp_lo;
>> > > + u32 cgp_hi;
>> > > u32 hwlrca_lo;
>> > > u32 hwlrca_hi;
>> > > };
>> > > diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
>> > > index d4ffdb71ef3d..d2aa9a2524e7 100644
>> > > --- a/drivers/gpu/drm/xe/xe_guc_submit.c
>> > > +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
>> > > @@ -46,6 +46,7 @@
>> > > #include "xe_trace.h"
>> > > #include "xe_uc_fw.h"
>> > > #include "xe_vm.h"
>> > > +#include "xe_bo.h"
>> > >
>> > > static struct xe_guc *
>> > > exec_queue_to_guc(struct xe_exec_queue *q)
>> > > @@ -541,7 +542,8 @@ static void init_policies(struct xe_guc *guc, struct xe_exec_queue *q)
>> > > u32 slpc_exec_queue_freq_req = 0;
>> > > u32 preempt_timeout_us = q->sched_props.preempt_timeout_us;
>> > >
>> > > - xe_gt_assert(guc_to_gt(guc), exec_queue_registered(q));
>> > > + xe_gt_assert(guc_to_gt(guc), exec_queue_registered(q) &&
>> > > + !xe_exec_queue_is_multi_queue_secondary(q));
>> > >
>> > > if (q->flags & EXEC_QUEUE_FLAG_LOW_LATENCY)
>> > > slpc_exec_queue_freq_req |= SLPC_CTX_FREQ_REQ_IS_COMPUTE;
>> > > @@ -561,6 +563,8 @@ static void set_min_preemption_timeout(struct xe_guc *guc, struct xe_exec_queue
>> > > {
>> > > struct exec_queue_policy policy;
>> > >
>> > > + xe_assert(guc_to_xe(guc), !xe_exec_queue_is_multi_queue_secondary(q));
>> > > +
>> > > __guc_exec_queue_policy_start_klv(&policy, q->guc->id);
>> > > __guc_exec_queue_policy_add_preemption_timeout(&policy, 1);
>> > >
>> > > @@ -575,6 +579,130 @@ static void set_min_preemption_timeout(struct xe_guc *guc, struct xe_exec_queue
>> > > xe_map_wr_field(xe_, &map_, 0, struct guc_submit_parallel_scratch, \
>> > > field_, val_)
>> > >
>> > > +#define CGP_VERSION_MAJOR_SHIFT 8
>> > > +
>> > > +static void xe_guc_exec_queue_group_cgp_update(struct xe_device *xe,
>> > > + struct xe_exec_queue *q)
>> > > +{
>> > > + struct xe_exec_queue_group *group = q->multi_queue.group;
>> > > + u32 guc_id = group->primary->guc->id;
>> > > +
>> > > + /* Currently implementing CGP version 1.0 */
>> > > + xe_map_wr(xe, &group->cgp_bo->vmap, 0, u32,
>> > > + 1 << CGP_VERSION_MAJOR_SHIFT);
>> > > +
>> > > + xe_map_wr(xe, &group->cgp_bo->vmap,
>> > > + (32 + q->multi_queue.pos * 2) * sizeof(u32),
>> > > + u32, lower_32_bits(xe_lrc_descriptor(q->lrc[0])));
>> > > +
>> > > + xe_map_wr(xe, &group->cgp_bo->vmap,
>> > > + (33 + q->multi_queue.pos * 2) * sizeof(u32),
>> > > + u32, guc_id);
>> > > +
>> > > + if (q->multi_queue.pos / 32) {
>> > > + xe_map_wr(xe, &group->cgp_bo->vmap, 17 * sizeof(u32),
>> > > + u32, BIT(q->multi_queue.pos % 32));
>> > > + xe_map_wr(xe, &group->cgp_bo->vmap, 16 * sizeof(u32), u32, 0);
>> > > + } else {
>> > > + xe_map_wr(xe, &group->cgp_bo->vmap, 16 * sizeof(u32),
>> > > + u32, BIT(q->multi_queue.pos));
>> > > + xe_map_wr(xe, &group->cgp_bo->vmap, 17 * sizeof(u32), u32, 0);
>> > > + }
>> > > +}
>> > > +
>> > > +static void xe_guc_exec_queue_group_cgp_sync(struct xe_guc *guc,
>> > > + struct xe_exec_queue *q,
>> > > + const u32 *action, u32 len)
>> > > +{
>> > > + struct xe_exec_queue_group *group = q->multi_queue.group;
>> > > + struct xe_device *xe = guc_to_xe(guc);
>> > > + long ret;
>> > > +
>> > > + /*
>> > > +	 * As all queues of a multi queue group use a single drm scheduler
>> > > +	 * submit workqueue, CGP synchronization with GuC is serialized.
>> > > + * Hence, no locking is required here.
>> > > + * Wait for any pending CGP_SYNC_DONE response before updating the
>> > > + * CGP page and sending CGP_SYNC message.
>> > > + */
>> > > + ret = wait_event_timeout(guc->ct.wq,
>> > > + !READ_ONCE(group->sync_pending) ||
>> > > + xe_guc_read_stopped(guc), HZ);
>> > > + if (!ret || xe_guc_read_stopped(guc)) {
>> > > + drm_err(&xe->drm, "Wait for CGP_SYNC_DONE response failed!\n");
>> >
>> > If this occurs you need a GT reset which should detect
>> > group->sync_pending in guc_exec_queue_stop and clean it up.
>> >
>>
>> hmm...ok, let me give that a try. Not sure how urgent this is, as ideally
>> it should never occur.
>>
>
>It shouldn't occur, but for correctness best to at least attempt to get
>this right upfront.
Ok, will work on it.
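
As a rough illustration of that cleanup (a sketch only - the helper name
and its placement in the GT-reset path are assumptions, not code from v2):

    /* Hypothetical helper, called per queue from guc_exec_queue_stop() */
    static void guc_exec_queue_stop_cgp_sync(struct xe_guc *guc,
                                             struct xe_exec_queue *q)
    {
            struct xe_exec_queue_group *group = q->multi_queue.group;

            /* Release a waiter stuck in xe_guc_exec_queue_group_cgp_sync() */
            if (xe_exec_queue_is_multi_queue_primary(q) &&
                READ_ONCE(group->sync_pending)) {
                    WRITE_ONCE(group->sync_pending, false);
                    wake_up_all(&guc->ct.wq);
            }
    }
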
>
>> > Also here is where VF migration needs to be considered. The
>> > wait_event_timeout should pop out on vf_recovery being set, but not
>> > trigger a GT reset. In this case we likely need some per-secondary-queue
>> > tracking state to figure out which secondary queues lost the CGP
>> > syncs so that flow can recover. We can figure that part out a bit later
>> > though.
>>
>> Hmm...ok.
>>
>> >
>> > > + /* Something wrong with the CTB or GuC, no need to proceed */
>> > > + return;
>> > > + }
>> > > +
>> > > + xe_guc_exec_queue_group_cgp_update(xe, q);
>> > > +
>> > > + WRITE_ONCE(group->sync_pending, true);
>> > > + xe_guc_ct_send(&guc->ct, action, len, G2H_LEN_DW_MULTI_QUEUE_CONTEXT, 1);
>> >
>> > The problem here appears to be twofold:
>> >
>> > - The value of G2H_LEN_DW_MULTI_QUEUE_CONTEXT looks incorrect
>> > - On multi-q registration both G2H credits and count are set, but multi-q
>> >   register doesn't produce a G2H response. See my comment above about things
>> >   getting leaked; that can't happen, as PM will be off and eventually G2H
>> >   credits will run out and deadlock the CT channel, leading to a GT reset.
>> >
>>
>> Responded above.
>>
>> > > +}
>> > > +
>> > > +static void __register_exec_queue(struct xe_guc *guc,
>> > > + struct guc_ctxt_registration_info *info)
>> > > +{
>> > > + u32 action[] = {
>> > > + XE_GUC_ACTION_REGISTER_CONTEXT,
>> > > + info->flags,
>> > > + info->context_idx,
>> > > + info->engine_class,
>> > > + info->engine_submit_mask,
>> > > + info->wq_desc_lo,
>> > > + info->wq_desc_hi,
>> > > + info->wq_base_lo,
>> > > + info->wq_base_hi,
>> > > + info->wq_size,
>> > > + info->hwlrca_lo,
>> > > + info->hwlrca_hi,
>> > > + };
>> > > +
>> > > + /* explicitly checks some fields that we might fixup later */
>> > > + xe_gt_assert(guc_to_gt(guc), info->wq_desc_lo ==
>> > > + action[XE_GUC_REGISTER_CONTEXT_DATA_5_WQ_DESC_ADDR_LOWER]);
>> > > + xe_gt_assert(guc_to_gt(guc), info->wq_base_lo ==
>> > > + action[XE_GUC_REGISTER_CONTEXT_DATA_7_WQ_BUF_BASE_LOWER]);
>> > > + xe_gt_assert(guc_to_gt(guc), info->hwlrca_lo ==
>> > > + action[XE_GUC_REGISTER_CONTEXT_DATA_10_HW_LRC_ADDR]);
>> > > +
>> > > + xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action), 0, 0);
>> > > +}
>> > > +
>> > > +static void __register_exec_queue_group(struct xe_guc *guc,
>> > > + struct xe_exec_queue *q,
>> > > + struct guc_ctxt_registration_info *info)
>> > > +{
>> > > +#define MAX_MULTI_QUEUE_REG_SIZE (8)
>> > > + struct xe_device *xe = guc_to_xe(guc);
>> > > + u32 action[MAX_MULTI_QUEUE_REG_SIZE];
>> > > + int len = 0;
>> > > +
>> > > + if (xe_exec_queue_is_multi_queue_primary(q)) {
>> > > + action[len++] = XE_GUC_ACTION_REGISTER_CONTEXT_MULTI_QUEUE;
>> >
>> > Again as mentioned above, this command doesn't require G2H credits
>> > unless this produces a XE_GUC_ACTION_NOTIFY_MULTI_QUEUE_CONTEXT_CGP_SYNC_DONE
>> > response.
>> >
>>
>> Yes, XE_GUC_ACTION_REGISTER_CONTEXT_MULTI_QUEUE will have a
>> XE_GUC_ACTION_NOTIFY_MULTI_QUEUE_CONTEXT_CGP_SYNC_DONE response from GuC.
>>
>
>Ah, ok. That at least explains why g2h outstanding is the correct value;
>it doesn't explain the credits though. Can you add a comment indicating
>XE_GUC_ACTION_REGISTER_CONTEXT_MULTI_QUEUE results in
>XE_GUC_ACTION_NOTIFY_MULTI_QUEUE_CONTEXT_CGP_SYNC_DONE?
>
Ok, added comment in v2.
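
For reference, the full pair of H2G messages __register_exec_queue_group()
builds (recapped from this hunk; per the discussion above, both are answered
by a CGP_SYNC_DONE G2H carrying the primary guc_id):

    Primary (q0), 8 dwords:                 Secondary, 2 dwords:
      [0] REGISTER_CONTEXT_MULTI_QUEUE        [0] MULTI_QUEUE_CONTEXT_CGP_SYNC
      [1] info->flags                         [1] primary guc->id
      [2] info->context_idx
      [3] info->engine_class
      [4] info->engine_submit_mask
      [5] 0 (reserved)
      [6] info->cgp_lo
      [7] info->cgp_hi
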
>> > > + action[len++] = info->flags;
>> > > + action[len++] = info->context_idx;
>> > > + action[len++] = info->engine_class;
>> > > + action[len++] = info->engine_submit_mask;
>> > > + action[len++] = 0; /* Reserved */
>> > > + action[len++] = info->cgp_lo;
>> > > + action[len++] = info->cgp_hi;
>> > > + } else {
>> > > + /*
>> > > + * No need to wait before CGP sync since CT descriptors
>> > > + * should be ordered.
>> > > + */
>> > > +
>> > > + action[len++] = XE_GUC_ACTION_MULTI_QUEUE_CONTEXT_CGP_SYNC;
>> > > + action[len++] = q->multi_queue.group->primary->guc->id;
>> > > + }
>> > > +
>> > > + xe_assert(xe, len <= MAX_MULTI_QUEUE_REG_SIZE);
>> > > +#undef MAX_MULTI_QUEUE_REG_SIZE
>> > > +
>> > > + xe_guc_exec_queue_group_cgp_sync(guc, q, action, len);
>> > > +}
>> > > +
>> > > static void __register_mlrc_exec_queue(struct xe_guc *guc,
>> > > struct xe_exec_queue *q,
>> > > struct guc_ctxt_registration_info *info)
>> > > @@ -622,35 +750,6 @@ static void __register_mlrc_exec_queue(struct xe_guc *guc,
>> > > xe_guc_ct_send(&guc->ct, action, len, 0, 0);
>> > > }
>> > >
>> > > -static void __register_exec_queue(struct xe_guc *guc,
>> > > - struct guc_ctxt_registration_info *info)
>> > > -{
>> > > - u32 action[] = {
>> > > - XE_GUC_ACTION_REGISTER_CONTEXT,
>> > > - info->flags,
>> > > - info->context_idx,
>> > > - info->engine_class,
>> > > - info->engine_submit_mask,
>> > > - info->wq_desc_lo,
>> > > - info->wq_desc_hi,
>> > > - info->wq_base_lo,
>> > > - info->wq_base_hi,
>> > > - info->wq_size,
>> > > - info->hwlrca_lo,
>> > > - info->hwlrca_hi,
>> > > - };
>> > > -
>> > > - /* explicitly checks some fields that we might fixup later */
>> > > - xe_gt_assert(guc_to_gt(guc), info->wq_desc_lo ==
>> > > - action[XE_GUC_REGISTER_CONTEXT_DATA_5_WQ_DESC_ADDR_LOWER]);
>> > > - xe_gt_assert(guc_to_gt(guc), info->wq_base_lo ==
>> > > - action[XE_GUC_REGISTER_CONTEXT_DATA_7_WQ_BUF_BASE_LOWER]);
>> > > - xe_gt_assert(guc_to_gt(guc), info->hwlrca_lo ==
>> > > - action[XE_GUC_REGISTER_CONTEXT_DATA_10_HW_LRC_ADDR]);
>> > > -
>> > > - xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action), 0, 0);
>> > > -}
>> > > -
>> > > static void register_exec_queue(struct xe_exec_queue *q, int ctx_type)
>> > > {
>> > > struct xe_guc *guc = exec_queue_to_guc(q);
>> > > @@ -670,6 +769,13 @@ static void register_exec_queue(struct xe_exec_queue *q, int ctx_type)
>> > > info.flags = CONTEXT_REGISTRATION_FLAG_KMD |
>> > > FIELD_PREP(CONTEXT_REGISTRATION_FLAG_TYPE, ctx_type);
>> > >
>> > > + if (xe_exec_queue_is_multi_queue(q)) {
>> > > + struct xe_exec_queue_group *group = q->multi_queue.group;
>> > > +
>> > > + info.cgp_lo = xe_bo_ggtt_addr(group->cgp_bo);
>> > > + info.cgp_hi = 0;
>> > > + }
>> > > +
>> > > if (xe_exec_queue_is_parallel(q)) {
>> > > u64 ggtt_addr = xe_lrc_parallel_ggtt_addr(lrc);
>> > > struct iosys_map map = xe_lrc_parallel_map(lrc);
>> > > @@ -700,11 +806,15 @@ static void register_exec_queue(struct xe_exec_queue *q, int ctx_type)
>> > >
>> > > set_exec_queue_registered(q);
>> > > trace_xe_exec_queue_register(q);
>> > > - if (xe_exec_queue_is_parallel(q))
>> > > + if (xe_exec_queue_is_multi_queue(q))
>> > > + __register_exec_queue_group(guc, q, &info);
>> > > + else if (xe_exec_queue_is_parallel(q))
>> > > __register_mlrc_exec_queue(guc, q, &info);
>> > > else
>> > > __register_exec_queue(guc, &info);
>> > > - init_policies(guc, q);
>> > > +
>> > > + if (!xe_exec_queue_is_multi_queue_secondary(q))
>> > > + init_policies(guc, q);
>> > > }
>> > >
>> > > static u32 wq_space_until_wrap(struct xe_exec_queue *q)
>> > > @@ -833,6 +943,12 @@ static void submit_exec_queue(struct xe_exec_queue *q, struct xe_sched_job *job)
>> > > if (exec_queue_suspended(q) && !xe_exec_queue_is_parallel(q))
>> > > return;
>> > >
>> > > + /*
>> > > + * All queues in a multi-queue group will use the primary queue
>> > > + * of the group to interface with GuC.
>> > > + */
>> > > + q = xe_exec_queue_multi_queue_primary(q);
>> > > +
>> > > if (!exec_queue_enabled(q) && !exec_queue_suspended(q)) {
>> > > action[len++] = XE_GUC_ACTION_SCHED_CONTEXT_MODE_SET;
>> > > action[len++] = q->guc->id;
>> > > @@ -879,6 +995,18 @@ guc_exec_queue_run_job(struct drm_sched_job *drm_job)
>> > > trace_xe_sched_job_run(job);
>> > >
>> > > if (!killed_or_banned_or_wedged && !xe_sched_job_is_error(job)) {
>> > > + if (xe_exec_queue_is_multi_queue_secondary(q)) {
>> > > + struct xe_exec_queue *primary = xe_exec_queue_multi_queue_primary(q);
>> > > +
>> > > + if (exec_queue_killed_or_banned_or_wedged(primary)) {
>> > > + killed_or_banned_or_wedged = true;
>> > > + goto run_job_out;
>> > > + }
>> > > +
>> > > + if (!exec_queue_registered(primary))
>> > > + register_exec_queue(primary, GUC_CONTEXT_NORMAL);
>> > > + }
>> > > +
>> > > if (!exec_queue_registered(q))
>> > > register_exec_queue(q, GUC_CONTEXT_NORMAL);
>> > > if (!job->skip_emit)
>> > > @@ -887,6 +1015,7 @@ guc_exec_queue_run_job(struct drm_sched_job *drm_job)
>> > > job->skip_emit = false;
>> > > }
>> > >
>> > > +run_job_out:
>> > > /*
>> > > * We don't care about job-fence ordering in LR VMs because these fences
>> > > * are never exported; they are used solely to keep jobs on the pending
>> > > @@ -912,6 +1041,11 @@ int xe_guc_read_stopped(struct xe_guc *guc)
>> > > return atomic_read(&guc->submission_state.stopped);
>> > > }
>> > >
>> > > +static void handle_multi_queue_secondary_sched_done(struct xe_guc *guc,
>> > > + struct xe_exec_queue *q,
>> > > + u32 runnable_state);
>> > > +static void handle_deregister_done(struct xe_guc *guc, struct xe_exec_queue *q);
>> > > +
>> > > #define MAKE_SCHED_CONTEXT_ACTION(q, enable_disable) \
>> > > u32 action[] = { \
>> > > XE_GUC_ACTION_SCHED_CONTEXT_MODE_SET, \
>> > > @@ -925,7 +1059,9 @@ static void disable_scheduling_deregister(struct xe_guc *guc,
>> > > MAKE_SCHED_CONTEXT_ACTION(q, DISABLE);
>> > > int ret;
>> > >
>> > > - set_min_preemption_timeout(guc, q);
>> > > + if (!xe_exec_queue_is_multi_queue_secondary(q))
>> > > + set_min_preemption_timeout(guc, q);
>> > > +
>> > > smp_rmb();
>> > > ret = wait_event_timeout(guc->ct.wq,
>> > > (!exec_queue_pending_enable(q) &&
>> > > @@ -953,9 +1089,12 @@ static void disable_scheduling_deregister(struct xe_guc *guc,
>> > > * Reserve space for both G2H here as the 2nd G2H is sent from a G2H
>> > >  * handler and we are not allowed to reserve G2H space in handlers.
>> > > */
>> > > - xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action),
>> > > - G2H_LEN_DW_SCHED_CONTEXT_MODE_SET +
>> > > - G2H_LEN_DW_DEREGISTER_CONTEXT, 2);
>> > > + if (xe_exec_queue_is_multi_queue_secondary(q))
>> > > + handle_multi_queue_secondary_sched_done(guc, q, 0);
>> > > + else
>> > > + xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action),
>> > > + G2H_LEN_DW_SCHED_CONTEXT_MODE_SET +
>> > > + G2H_LEN_DW_DEREGISTER_CONTEXT, 2);
>> > > }
>> > >
>> > > static void xe_guc_exec_queue_trigger_cleanup(struct xe_exec_queue *q)
>> > > @@ -1161,8 +1300,11 @@ static void enable_scheduling(struct xe_exec_queue *q)
>> > > set_exec_queue_enabled(q);
>> > > trace_xe_exec_queue_scheduling_enable(q);
>> > >
>> > > - xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action),
>> > > - G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, 1);
>> > > + if (xe_exec_queue_is_multi_queue_secondary(q))
>> > > + handle_multi_queue_secondary_sched_done(guc, q, 1);
>> > > + else
>> > > + xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action),
>> > > + G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, 1);
>> > >
>> > > ret = wait_event_timeout(guc->ct.wq,
>> > > !exec_queue_pending_enable(q) ||
>> > > @@ -1186,14 +1328,17 @@ static void disable_scheduling(struct xe_exec_queue *q, bool immediate)
>> > > xe_gt_assert(guc_to_gt(guc), exec_queue_registered(q));
>> > > xe_gt_assert(guc_to_gt(guc), !exec_queue_pending_disable(q));
>> > >
>> > > - if (immediate)
>> > > + if (immediate && !xe_exec_queue_is_multi_queue_secondary(q))
>> > > set_min_preemption_timeout(guc, q);
>> > > clear_exec_queue_enabled(q);
>> > > set_exec_queue_pending_disable(q);
>> > > trace_xe_exec_queue_scheduling_disable(q);
>> > >
>> > > - xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action),
>> > > - G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, 1);
>> > > + if (xe_exec_queue_is_multi_queue_secondary(q))
>> > > + handle_multi_queue_secondary_sched_done(guc, q, 0);
>> > > + else
>> > > + xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action),
>> > > + G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, 1);
>> > > }
>> > >
>> > > static void __deregister_exec_queue(struct xe_guc *guc, struct xe_exec_queue *q)
>> > > @@ -1211,8 +1356,11 @@ static void __deregister_exec_queue(struct xe_guc *guc, struct xe_exec_queue *q)
>> > > set_exec_queue_destroyed(q);
>> > > trace_xe_exec_queue_deregister(q);
>> > >
>> > > - xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action),
>> > > - G2H_LEN_DW_DEREGISTER_CONTEXT, 1);
>> > > + if (xe_exec_queue_is_multi_queue_secondary(q))
>> > > + handle_deregister_done(guc, q);
>> > > + else
>> > > + xe_guc_ct_send(&guc->ct, action, ARRAY_SIZE(action),
>> > > + G2H_LEN_DW_DEREGISTER_CONTEXT, 1);
>> > > }
>> > >
>> > > static enum drm_gpu_sched_stat
>> > > @@ -1660,6 +1808,7 @@ static int guc_exec_queue_init(struct xe_exec_queue *q)
>> > > {
>> > > struct xe_gpu_scheduler *sched;
>> > > struct xe_guc *guc = exec_queue_to_guc(q);
>> > > + struct workqueue_struct *submit_wq = NULL;
>> > > struct xe_guc_exec_queue *ge;
>> > > long timeout;
>> > > int err, i;
>> > > @@ -1680,8 +1829,20 @@ static int guc_exec_queue_init(struct xe_exec_queue *q)
>> > >
>> > > timeout = (q->vm && xe_vm_in_lr_mode(q->vm)) ? MAX_SCHEDULE_TIMEOUT :
>> > > msecs_to_jiffies(q->sched_props.job_timeout_ms);
>> > > +
>> > > + /*
>> > > + * Use primary queue's submit_wq for all secondary queues of a
>> > > + * multi queue group. This serialization avoids any locking around
>> > > + * CGP synchronization with GuC.
>> > > + */
>> > > + if (xe_exec_queue_is_multi_queue_secondary(q)) {
>> > > + struct xe_exec_queue *primary = xe_exec_queue_multi_queue_primary(q);
>> > > +
>> > > + submit_wq = primary->guc->sched.base.submit_wq;
>> > > + }
>> > > +
>> > > err = xe_sched_init(&ge->sched, &drm_sched_ops, &xe_sched_ops,
>> > > - NULL, xe_lrc_ring_size() / MAX_JOB_SIZE_BYTES, 64,
>> > > + submit_wq, xe_lrc_ring_size() / MAX_JOB_SIZE_BYTES, 64,
>> > > timeout, guc_to_gt(guc)->ordered_wq, NULL,
>> > > q->name, gt_to_xe(q->gt)->drm.dev);
>> > > if (err)
>> > > @@ -2418,7 +2579,11 @@ static void deregister_exec_queue(struct xe_guc *guc, struct xe_exec_queue *q)
>> > >
>> > > trace_xe_exec_queue_deregister(q);
>> > >
>> > > - xe_guc_ct_send_g2h_handler(&guc->ct, action, ARRAY_SIZE(action));
>> > > + if (xe_exec_queue_is_multi_queue_secondary(q))
>> > > + handle_deregister_done(guc, q);
>> > > + else
>> > > + xe_guc_ct_send_g2h_handler(&guc->ct, action,
>> > > + ARRAY_SIZE(action));
>> > > }
>> > >
>> > > static void handle_sched_done(struct xe_guc *guc, struct xe_exec_queue *q,
>> > > @@ -2468,6 +2633,15 @@ static void handle_sched_done(struct xe_guc *guc, struct xe_exec_queue *q,
>> > > }
>> > > }
>> > >
>> > > +static void handle_multi_queue_secondary_sched_done(struct xe_guc *guc,
>> > > + struct xe_exec_queue *q,
>> > > + u32 runnable_state)
>> > > +{
>> > > + mutex_lock(&guc->ct.lock);
>> >
>> > I don't think you need the CT lock here. This is per-queue state, which
>> > should be safe to modify without any lock. The CT lock never
>> > protects queue state; we just happen to hold it in G2H responses because
>> > of how the CT layer works.
>> >
>>
>> Without the CT lock here, I get a lockdep warning from _guc_ct_send_locked(),
>> h2g_has_room() etc. So, I guess we need to keep it.
>>
>
>Ah, yes. I missed that part. If you send another H2G you will indeed
>need the CT lock. Can you add a comment around that? Easy to forget
>this.
>
Ok, added comment in v2.
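
A sketch of how that comment might read in context (an assumption on my
part - the v2 wording is not shown in this thread):

    static void handle_multi_queue_secondary_sched_done(struct xe_guc *guc,
                                                        struct xe_exec_queue *q,
                                                        u32 runnable_state)
    {
            /*
             * The CT lock is not protecting q's state here; it is taken
             * because handle_sched_done() can send a follow-up H2G (e.g.
             * a deregister), and the H2G send path asserts ct->lock.
             */
            mutex_lock(&guc->ct.lock);
            handle_sched_done(guc, q, runnable_state);
            mutex_unlock(&guc->ct.lock);
    }
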
Niranjana
>Matt
>
>> > > + handle_sched_done(guc, q, runnable_state);
>> > > + mutex_unlock(&guc->ct.lock);
>> > > +}
>> > > +
>> > > int xe_guc_sched_done_handler(struct xe_guc *guc, u32 *msg, u32 len)
>> > > {
>> > > struct xe_exec_queue *q;
>> > > @@ -2672,6 +2846,44 @@ int xe_guc_exec_queue_reset_failure_handler(struct xe_guc *guc, u32 *msg, u32 le
>> > > return 0;
>> > > }
>> > >
>> > > +/**
>> > > + * xe_guc_exec_queue_cgp_sync_done_handler - CGP synchronization done handler
>> > > + * @guc: guc
>> > > + * @msg: message indicating CGP sync done
>> > > + * @len: length of message
>> > > + *
>> > > + * Set multi queue group's sync_pending flag to false and wake up anyone waiting
>> > > + * for CGP synchronization to complete.
>> > > + *
>> > > + * Return: 0 on success, -EPROTO for malformed messages.
>> > > + */
>> > > +int xe_guc_exec_queue_cgp_sync_done_handler(struct xe_guc *guc, u32 *msg, u32 len)
>> > > +{
>> > > + struct xe_device *xe = guc_to_xe(guc);
>> > > + struct xe_exec_queue *q;
>> > > + u32 guc_id = msg[0];
>> > > +
>> > > + if (unlikely(len < 1)) {
>> > > +	drm_err(&xe->drm, "Invalid CGP_SYNC_DONE length %u\n", len);
>> > > + return -EPROTO;
>> > > + }
>> > > +
>> > > + q = g2h_exec_queue_lookup(guc, guc_id);
>> > > + if (unlikely(!q))
>> > > + return -EPROTO;
>> > > +
>> > > + if (!xe_exec_queue_is_multi_queue_primary(q)) {
>> > > +	drm_err(&xe->drm, "Unexpected CGP_SYNC_DONE response\n");
>> > > + return -EPROTO;
>> > > + }
>> > > +
>> > > + /* Wakeup the serialized cgp update wait */
>> > > + WRITE_ONCE(q->multi_queue.group->sync_pending, false);
>> >
>> > So here - I suspect we need to associate the CGP_SYNC_DONE with
>> > per-secondary-queue state tracking in order to get VF migration to work.
>> > Again, we can figure this part out a bit later, but it should be considered.
>> >
>>
>> Hmm..ok.
>>
>> > Matt
>> >
>> > > + wake_up_all(&guc->ct.wq);
>> > > +
>> > > + return 0;
>> > > +}
>> > > +
>> > > static void
>> > > guc_exec_queue_wq_snapshot_capture(struct xe_exec_queue *q,
>> > > struct xe_guc_submit_exec_queue_snapshot *snapshot)
>> > > diff --git a/drivers/gpu/drm/xe/xe_guc_submit.h b/drivers/gpu/drm/xe/xe_guc_submit.h
>> > > index b49a2748ec46..abfa94bce391 100644
>> > > --- a/drivers/gpu/drm/xe/xe_guc_submit.h
>> > > +++ b/drivers/gpu/drm/xe/xe_guc_submit.h
>> > > @@ -34,6 +34,7 @@ int xe_guc_exec_queue_memory_cat_error_handler(struct xe_guc *guc, u32 *msg,
>> > > u32 len);
>> > > int xe_guc_exec_queue_reset_failure_handler(struct xe_guc *guc, u32 *msg, u32 len);
>> > > int xe_guc_error_capture_handler(struct xe_guc *guc, u32 *msg, u32 len);
>> > > +int xe_guc_exec_queue_cgp_sync_done_handler(struct xe_guc *guc, u32 *msg, u32 len);
>> > >
>> > > struct xe_guc_submit_exec_queue_snapshot *
>> > > xe_guc_exec_queue_snapshot_capture(struct xe_exec_queue *q);
>> > > --
>> > > 2.43.0
>> > >
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH 03/16] drm/xe/multi_queue: Add GuC interface for multi queue support
2025-11-04 18:55 ` Niranjana Vishwanathapura
@ 2025-11-04 19:26 ` Matthew Brost
0 siblings, 0 replies; 61+ messages in thread
From: Matthew Brost @ 2025-11-04 19:26 UTC (permalink / raw)
To: Niranjana Vishwanathapura; +Cc: intel-xe
On Tue, Nov 04, 2025 at 10:55:54AM -0800, Niranjana Vishwanathapura wrote:
> On Tue, Nov 04, 2025 at 09:41:48AM -0800, Matthew Brost wrote:
> > On Mon, Nov 03, 2025 at 08:56:39PM -0800, Niranjana Vishwanathapura wrote:
> > > On Sat, Nov 01, 2025 at 11:07:08AM -0700, Matthew Brost wrote:
> > > > On Fri, Oct 31, 2025 at 11:29:23AM -0700, Niranjana Vishwanathapura wrote:
> > > > > Implement GuC commands and response along with the Context
> > > > > Group Page (CGP) interface for multi queue support.
> > > > >
> > > > > Ensure that only the primary queue (q0) of a multi queue group
> > > > > communicates with GuC. The secondary queues of the group only
> > > > > need to maintain the LRCA and interface with the drm scheduler.
> > > > >
> > > > > Use primary queue's submit_wq for all secondary queues of a multi
> > > > > queue group. This serialization avoids any locking around CGP
> > > > > synchronization with GuC.
> > > > >
> > > >
> > > > Not a complete review, but a few comments.
> > > >
> > > > > Signed-off-by: Stuart Summers <stuart.summers@intel.com>
> > > > > Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
> > > > > ---
> > > > > drivers/gpu/drm/xe/abi/guc_actions_abi.h | 3 +
> > > > > drivers/gpu/drm/xe/xe_exec_queue_types.h | 2 +
> > > > > drivers/gpu/drm/xe/xe_guc_ct.c | 4 +
> > > > > drivers/gpu/drm/xe/xe_guc_fwif.h | 3 +
> > > > > drivers/gpu/drm/xe/xe_guc_submit.c | 302 +++++++++++++++++++----
> > > > > drivers/gpu/drm/xe/xe_guc_submit.h | 1 +
> > > > > 6 files changed, 270 insertions(+), 45 deletions(-)
> > > > >
> > > > > diff --git a/drivers/gpu/drm/xe/abi/guc_actions_abi.h b/drivers/gpu/drm/xe/abi/guc_actions_abi.h
> > > > > index 47756e4674a1..3e9fbed9cda6 100644
> > > > > --- a/drivers/gpu/drm/xe/abi/guc_actions_abi.h
> > > > > +++ b/drivers/gpu/drm/xe/abi/guc_actions_abi.h
> > > > > @@ -139,6 +139,9 @@ enum xe_guc_action {
> > > > > XE_GUC_ACTION_DEREGISTER_G2G = 0x4508,
> > > > > XE_GUC_ACTION_DEREGISTER_CONTEXT_DONE = 0x4600,
> > > > > XE_GUC_ACTION_REGISTER_CONTEXT_MULTI_LRC = 0x4601,
> > > > > + XE_GUC_ACTION_REGISTER_CONTEXT_MULTI_QUEUE = 0x4602,
> > > > > + XE_GUC_ACTION_MULTI_QUEUE_CONTEXT_CGP_SYNC = 0x4603,
> > > > > + XE_GUC_ACTION_NOTIFY_MULTI_QUEUE_CONTEXT_CGP_SYNC_DONE = 0x4604,
> > > > > XE_GUC_ACTION_CLIENT_SOFT_RESET = 0x5507,
> > > > > XE_GUC_ACTION_SET_ENG_UTIL_BUFF = 0x550A,
> > > > > XE_GUC_ACTION_SET_DEVICE_ENGINE_ACTIVITY_BUFFER = 0x550C,
> > > > > diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h b/drivers/gpu/drm/xe/xe_exec_queue_types.h
> > > > > index 3856776df5c4..38e47b003259 100644
> > > > > --- a/drivers/gpu/drm/xe/xe_exec_queue_types.h
> > > > > +++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h
> > > > > @@ -47,6 +47,8 @@ struct xe_exec_queue_group {
> > > > > struct xarray xa;
> > > > > /** @list_lock: Secondary queue list lock */
> > > > > struct mutex list_lock;
> > > > > + /** @sync_pending: CGP_SYNC_DONE g2h response pending */
> > > > > + bool sync_pending;
> > > > > };
> > > > >
> > > > > /**
> > > > > diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c
> > > > > index e68953ef3a00..48b5006eb080 100644
> > > > > --- a/drivers/gpu/drm/xe/xe_guc_ct.c
> > > > > +++ b/drivers/gpu/drm/xe/xe_guc_ct.c
> > > > > @@ -1304,6 +1304,7 @@ static int parse_g2h_event(struct xe_guc_ct *ct, u32 *msg, u32 len)
> > > > > lockdep_assert_held(&ct->lock);
> > > > >
> > > > > switch (action) {
> > > > > + case XE_GUC_ACTION_NOTIFY_MULTI_QUEUE_CONTEXT_CGP_SYNC_DONE:
> > > > > case XE_GUC_ACTION_SCHED_CONTEXT_MODE_DONE:
> > > > > case XE_GUC_ACTION_DEREGISTER_CONTEXT_DONE:
> > > > > case XE_GUC_ACTION_SCHED_ENGINE_MODE_DONE:
> > > > > @@ -1570,6 +1571,9 @@ static int process_g2h_msg(struct xe_guc_ct *ct, u32 *msg, u32 len)
> > > > > ret = xe_guc_g2g_test_notification(guc, payload, adj_len);
> > > > > break;
> > > > > #endif
> > > > > + case XE_GUC_ACTION_NOTIFY_MULTI_QUEUE_CONTEXT_CGP_SYNC_DONE:
> > > > > + ret = xe_guc_exec_queue_cgp_sync_done_handler(guc, payload, adj_len);
> > > > > + break;
> > > > > default:
> > > > > xe_gt_err(gt, "unexpected G2H action 0x%04x\n", action);
> > > > > }
> > > > > diff --git a/drivers/gpu/drm/xe/xe_guc_fwif.h b/drivers/gpu/drm/xe/xe_guc_fwif.h
> > > > > index c90dd266e9cf..610dfb2f1cb5 100644
> > > > > --- a/drivers/gpu/drm/xe/xe_guc_fwif.h
> > > > > +++ b/drivers/gpu/drm/xe/xe_guc_fwif.h
> > > > > @@ -16,6 +16,7 @@
> > > > > #define G2H_LEN_DW_DEREGISTER_CONTEXT 3
> > > > > #define G2H_LEN_DW_TLB_INVALIDATE 3
> > > > > #define G2H_LEN_DW_G2G_NOTIFY_MIN 3
> > > > > +#define G2H_LEN_DW_MULTI_QUEUE_CONTEXT 4
> > > >
> > > > This value doesn't look right. I'm not sure where 4 is coming from.
> > > >
> > > > The length of XE_GUC_ACTION_NOTIFY_MULTI_QUEUE_CONTEXT_CGP_SYNC_DONE
> > > > appears to be 2. So with a value of 4, I believe the G2H credits will
> > > > leak.
> > > >
> > > > You can run a multi-q test, then check the following debugfs:
> > > >
> > > > cat /sys/kernel/debug/dri/0/gt0/uc/guc_info
> > > >
> > > > In particular, these are the interesting fields:
> > > >
> > > > G2H CTB (all sizes in DW):
> > > > ...
> > > > resv_space: 16384
> > > > ...
> > > > g2h outstanding: 0
> > > >
> > > > ^^^ This is what an idle G2H should look like. I suspect both G2H
> > > > outstanding values will be non-zero, and resv_space will continuously
> > > > decrease when running a multi-queue test.
> > > >
> > >
> > > Looks like G2H_LEN_DW_MULTI_QUEUE_CONTEXT should be 3.
> > > 2 dwords header (HXG event) and 1 dword payload. Will change.
> > >
> > > However, I always saw 'g2h outstanding' being 0 and resv_space being 16384,
> > > after running the multi-queue tests, irrespective of whether I set
> > > G2H_LEN_DW_MULTI_QUEUE_CONTEXT to 3 or 4.
> > >
> >
> > That is really odd the credits didn't get screwed up. I'd double-check
> > this as that doesn't seem right. Perhaps the runtime PM refs drop to
> > zero and the GuC gets reloaded? We are removing that though, here [1].
> > Maybe try with this series.
> >
> > [1] https://patchwork.freedesktop.org/series/154017/
> >
> > I still think the value should be 2 here as this is like deregister_done,
> > which delivers a guc_id in msg[0].
> >
>
> Yeah, it is similar to deregister_done. The value of
> G2H_LEN_DW_DEREGISTER_CONTEXT is also set to 3 above.
> I tried with 2, but got a GuC error.
> I have set it to 3 in the v2 patch series.
>
Ah, it is 3. I thought it was 2. So yes, 3 appears to be the correct value.
Matt
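
For reference, the CGP page layout implied by
xe_guc_exec_queue_group_cgp_update() in this patch (dword offsets
reconstructed from the writes in the diff, not from a hardware spec;
n is the queue's multi_queue.pos):

    DW 0         : CGP version; 1.0 is written as (1 << 8)
    DW 16        : queue-enable mask for positions 0..31 (BIT(pos))
    DW 17        : queue-enable mask for positions 32..63 (BIT(pos % 32))
    DW 32 + 2*n  : lower 32 bits of queue n's LRC descriptor
    DW 33 + 2*n  : guc_id of the group's primary queue
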
^ permalink raw reply [flat|nested] 61+ messages in thread
end of thread, other threads:[~2025-11-04 19:26 UTC | newest]
Thread overview: 61+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-10-31 18:29 [PATCH 00/16] drm/xe: Multi Queue feature support Niranjana Vishwanathapura
2025-10-31 18:29 ` [PATCH 01/16] drm/xe/multi_queue: Add multi_queue_enable_mask to gt information Niranjana Vishwanathapura
2025-11-02 0:01 ` Matthew Brost
2025-11-03 1:25 ` Niranjana Vishwanathapura
2025-10-31 18:29 ` [PATCH 02/16] drm/xe/multi_queue: Add user interface for multi queue support Niranjana Vishwanathapura
2025-10-31 19:31 ` Matthew Brost
2025-11-03 22:58 ` Niranjana Vishwanathapura
2025-11-02 0:23 ` Matthew Brost
2025-11-03 22:59 ` Niranjana Vishwanathapura
2025-11-02 17:37 ` Matthew Brost
2025-11-03 23:06 ` Niranjana Vishwanathapura
2025-10-31 18:29 ` [PATCH 03/16] drm/xe/multi_queue: Add GuC " Niranjana Vishwanathapura
2025-11-01 18:07 ` Matthew Brost
2025-11-04 4:56 ` Niranjana Vishwanathapura
2025-11-04 17:41 ` Matthew Brost
2025-11-04 18:55 ` Niranjana Vishwanathapura
2025-11-04 19:26 ` Matthew Brost
2025-11-02 18:02 ` Matthew Brost
2025-11-04 5:02 ` Niranjana Vishwanathapura
2025-10-31 18:29 ` [PATCH 04/16] drm/xe/multi_queue: Add multi queue priority property Niranjana Vishwanathapura
2025-11-01 23:59 ` Matthew Brost
2025-11-03 4:45 ` Niranjana Vishwanathapura
2025-10-31 18:29 ` [PATCH 05/16] drm/xe/multi_queue: Handle invalid exec queue property setting Niranjana Vishwanathapura
2025-11-03 22:41 ` Matthew Brost
2025-10-31 18:29 ` [PATCH 06/16] drm/xe/multi_queue: Add exec_queue set_property ioctl support Niranjana Vishwanathapura
2025-11-02 16:53 ` Matthew Brost
2025-11-03 1:49 ` Niranjana Vishwanathapura
2025-10-31 18:29 ` [PATCH 07/16] drm/xe/multi_queue: Add support for multi queue dynamic priority change Niranjana Vishwanathapura
2025-11-01 23:23 ` Matthew Brost
2025-11-03 18:06 ` Niranjana Vishwanathapura
2025-11-01 23:41 ` Matthew Brost
2025-11-03 18:14 ` Niranjana Vishwanathapura
2025-11-03 19:05 ` Matthew Brost
2025-10-31 18:29 ` [PATCH 08/16] drm/xe/multi_queue: Add multi queue information to guc_info dump Niranjana Vishwanathapura
2025-11-01 18:31 ` Matthew Brost
2025-11-03 1:15 ` Niranjana Vishwanathapura
2025-10-31 18:29 ` [PATCH 09/16] drm/xe/multi_queue: Handle tearing down of a multi queue Niranjana Vishwanathapura
2025-11-02 0:39 ` Matthew Brost
2025-11-04 3:35 ` Niranjana Vishwanathapura
2025-10-31 18:29 ` [PATCH 10/16] drm/xe/multi_queue: Set QUEUE_DRAIN_MODE for Multi Queue batches Niranjana Vishwanathapura
2025-11-02 18:22 ` Matthew Brost
2025-11-03 17:09 ` Niranjana Vishwanathapura
2025-10-31 18:29 ` [PATCH 11/16] drm/xe/multi_queue: Handle CGP context error Niranjana Vishwanathapura
2025-11-02 18:29 ` Matthew Brost
2025-11-03 16:44 ` Niranjana Vishwanathapura
2025-11-03 17:18 ` Matthew Brost
2025-10-31 18:29 ` [PATCH 12/16] drm/xe/multi_queue: Tracepoint support Niranjana Vishwanathapura
2025-11-01 18:32 ` Matthew Brost
2025-10-31 18:29 ` [PATCH 13/16] drm/xe/multi_queue: Support active group after primary is destroyed Niranjana Vishwanathapura
2025-11-03 22:05 ` Matthew Brost
2025-11-04 17:24 ` Niranjana Vishwanathapura
2025-11-04 17:30 ` Niranjana Vishwanathapura
2025-10-31 18:29 ` [PATCH 14/16] drm/xe/doc: Add documentation for Multi Queue Group Niranjana Vishwanathapura
2025-10-31 18:29 ` [PATCH 15/16] drm/xe/doc: Add documentation for Multi Queue Group GuC interface Niranjana Vishwanathapura
2025-10-31 18:29 ` [PATCH 16/16] drm/xe/multi_queue: Enable multi_queue on xe3p_xpc Niranjana Vishwanathapura
2025-11-02 0:05 ` Matthew Brost
2025-10-31 18:47 ` [PATCH 00/16] drm/xe: Multi Queue feature support Niranjana Vishwanathapura
2025-10-31 21:15 ` ✗ CI.checkpatch: warning for " Patchwork
2025-10-31 21:16 ` ✓ CI.KUnit: success " Patchwork
2025-10-31 22:19 ` ✗ Xe.CI.BAT: failure " Patchwork
2025-11-01 11:25 ` ✗ Xe.CI.Full: " Patchwork
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox