Intel-XE Archive on lore.kernel.org
* [PATCH v1 0/3] drm/xe/vf: Post-migration recovery of GGTT nodes and CTB
@ 2024-11-16  1:27 Tomasz Lis
  2024-11-16  1:27 ` [PATCH v1 1/3] drm/drm_mm: Safe macro for iterating through nodes in range Tomasz Lis
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: Tomasz Lis @ 2024-11-16  1:27 UTC (permalink / raw)
  To: intel-xe
  Cc: Michał Winiarski, Michał Wajdeczko,
	Piotr Piórkowski

To support VF Migration, it is necessary to do fixups to any
non-virtualized resources. These fixups need to be applied within
the VM, by the KMD working on behalf of the VF.

This series adds two fixup functions to the recovery worker:
* one for fixing the drm_mm nodes which represent GGTT allocations
* one for fixing the content of the outgoing CTB send buffer

Tomasz Lis (3):
  drm/drm_mm: Safe macro for iterating through nodes in range
  drm/xe/sriov: Shifting GGTT area post migration
  drm/xe/vf: Fixup CTB send buffer messages after migration

 drivers/gpu/drm/xe/xe_gt_sriov_vf.c       | 177 ++++++++++++++++++++++
 drivers/gpu/drm/xe/xe_gt_sriov_vf.h       |   1 +
 drivers/gpu/drm/xe/xe_gt_sriov_vf_types.h |   2 +
 drivers/gpu/drm/xe/xe_guc_ct.c            | 144 ++++++++++++++++++
 drivers/gpu/drm/xe/xe_guc_ct.h            |   2 +
 drivers/gpu/drm/xe/xe_sriov_vf.c          |  26 ++++
 include/drm/drm_mm.h                      |  19 +++
 7 files changed, 371 insertions(+)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 7+ messages in thread
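The ordering implied by the cover letter can be sketched in plain C: the GuC ignores the VF until it is told that resource fixups are done, so both fixup steps must complete before the notification. All names below are hypothetical stand-ins, not the xe driver API.

```c
#include <assert.h>

/* Toy model of the post-migration recovery ordering; the function
 * names are illustrative stand-ins for the driver's fixup steps. */
enum { FIXED_GGTT = 1 << 0, FIXED_CTB = 1 << 1 };

static int recovery_state;

static void fixup_ggtt_nodes(void) { recovery_state |= FIXED_GGTT; } /* patch 2 */
static void fixup_ctb(void)        { recovery_state |= FIXED_CTB; }  /* patch 3 */

/* The "resfix done" notification is only valid once both fixups ran. */
static int notify_resfix_done(void)
{
	return recovery_state == (FIXED_GGTT | FIXED_CTB) ? 0 : -1;
}

static int run_recovery(void)
{
	fixup_ggtt_nodes();	/* shift drm_mm nodes tracking GGTT */
	fixup_ctb();		/* rewrite queued H2G messages */
	return notify_resfix_done();
}
```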

* [PATCH v1 1/3] drm/drm_mm: Safe macro for iterating through nodes in range
  2024-11-16  1:27 [PATCH v1 0/3] drm/xe/vf: Post-migration recovery of GGTT nodes and CTB Tomasz Lis
@ 2024-11-16  1:27 ` Tomasz Lis
  2024-11-16  1:27 ` [PATCH v1 2/3] drm/xe/sriov: Shifting GGTT area post migration Tomasz Lis
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Tomasz Lis @ 2024-11-16  1:27 UTC (permalink / raw)
  To: intel-xe
  Cc: Michał Winiarski, Michał Wajdeczko,
	Piotr Piórkowski

Combine the benefits of drm_mm_for_each_node_safe() and
drm_mm_for_each_node_in_range() into a single macro.

Signed-off-by: Tomasz Lis <tomasz.lis@intel.com>
---
 include/drm/drm_mm.h | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/include/drm/drm_mm.h b/include/drm/drm_mm.h
index f654874c4ce6..43e99441f6ba 100644
--- a/include/drm/drm_mm.h
+++ b/include/drm/drm_mm.h
@@ -504,6 +504,25 @@ __drm_mm_interval_first(const struct drm_mm *mm, u64 start, u64 last);
 	     node__->start < (end__);					\
 	     node__ = list_next_entry(node__, node_list))
 
+/**
+ * drm_mm_for_each_node_in_range_safe - iterator to walk over a range of
+ * allocated nodes
+ * @node__: drm_mm_node structure to assign to in each iteration step
+ * @next__: &struct drm_mm_node to store the next step
+ * @mm__: drm_mm allocator to walk
+ * @start__: starting offset, the first node will overlap this
+ * @end__: ending offset, the last node will start before this (but may overlap)
+ *
+ * This iterator walks over all nodes in the range allocator that lie
+ * between @start and @end. It is implemented similarly to list_for_each_safe(),
+ * so it is safe against removal of elements.
+ */
+#define drm_mm_for_each_node_in_range_safe(node__, next__, mm__, start__, end__)	\
+	for (node__ = __drm_mm_interval_first((mm__), (start__), (end__)-1), \
+		next__ = list_next_entry(node__, node_list); \
+	     node__->start < (end__);					\
+	     node__ = next__, next__ = list_next_entry(next__, node_list))
+
 void drm_mm_scan_init_with_range(struct drm_mm_scan *scan,
 				 struct drm_mm *mm,
 				 u64 size, u64 alignment, unsigned long color,
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread
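The point of the new macro, caching the successor before the loop body runs so the body may unlink the current node, can be illustrated outside the kernel with a toy list. This is only a sketch: `toy_node` and friends are made-up names, and the range narrowing that the real macro gets from __drm_mm_interval_first() is modeled here with a simple bounds check in the body.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for drm_mm_node: an address-sorted singly linked list. */
struct toy_node {
	unsigned long long start;
	unsigned long long size;
	struct toy_node *next;
};

/* Safe walk: cache the successor before the body runs, so the body may
 * unlink the current node -- the same idea the proposed
 * drm_mm_for_each_node_in_range_safe() applies to the drm_mm list. */
#define toy_for_each_node_safe(n, tmp, head)				\
	for ((n) = (head); (n) && (((tmp) = (n)->next), 1); (n) = (tmp))

/* Unlink every node whose start lies in [start, end); return the count. */
static int toy_remove_range(struct toy_node **headp,
			    unsigned long long start,
			    unsigned long long end)
{
	struct toy_node *n, *tmp, **prev = headp;
	int removed = 0;

	toy_for_each_node_safe(n, tmp, *headp) {
		if (n->start >= start && n->start < end) {
			*prev = n->next;	/* unlink current node */
			removed++;
		} else {
			prev = &n->next;	/* keep node, advance slot */
		}
	}
	return removed;
}
```

Without the cached `tmp`, the advance expression would dereference a node the body just unlinked.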

* [PATCH v1 2/3] drm/xe/sriov: Shifting GGTT area post migration
  2024-11-16  1:27 [PATCH v1 0/3] drm/xe/vf: Post-migration recovery of GGTT nodes and CTB Tomasz Lis
  2024-11-16  1:27 ` [PATCH v1 1/3] drm/drm_mm: Safe macro for iterating through nodes in range Tomasz Lis
@ 2024-11-16  1:27 ` Tomasz Lis
  2024-11-16  1:27 ` [PATCH v1 3/3] drm/xe/vf: Fixup CTB send buffer messages after migration Tomasz Lis
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Tomasz Lis @ 2024-11-16  1:27 UTC (permalink / raw)
  To: intel-xe
  Cc: Michał Winiarski, Michał Wajdeczko,
	Piotr Piórkowski

We have only one GGTT for all IOV functions, with each VF being
assigned a range of addresses for its use. After migration, a VF can
receive a different range of addresses than it had initially.

This patch implements shifting of GGTT addresses within drm_mm nodes,
so that VMAs stay valid after migration. From the moment the shifting
ends, the driver will use the new addresses when accessing the GGTT.

By taking the ggtt->lock for the duration of the VMA fixups, this
change also adds a constraint on that mutex: any lock used during the
recovery can never wait for a hardware response, because after
migration the hardware will not do anything until the fixups are
finished.

Signed-off-by: Tomasz Lis <tomasz.lis@intel.com>
---
 drivers/gpu/drm/xe/xe_gt_sriov_vf.c | 175 ++++++++++++++++++++++++++++
 drivers/gpu/drm/xe/xe_gt_sriov_vf.h |   1 +
 drivers/gpu/drm/xe/xe_sriov_vf.c    |  14 +++
 3 files changed, 190 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_vf.c b/drivers/gpu/drm/xe/xe_gt_sriov_vf.c
index cca5d5732802..ae24c47ed8f8 100644
--- a/drivers/gpu/drm/xe/xe_gt_sriov_vf.c
+++ b/drivers/gpu/drm/xe/xe_gt_sriov_vf.c
@@ -912,6 +912,181 @@ int xe_gt_sriov_vf_query_runtime(struct xe_gt *gt)
 	return err;
 }
 
+static u64 drm_mm_node_end(struct drm_mm_node *node)
+{
+	return node->start + node->size;
+}
+
+static s64 vf_get_post_migration_ggtt_shift(struct xe_gt *gt)
+{
+	struct xe_gt_sriov_vf_selfconfig *config = &gt->sriov.vf.self_config;
+	struct xe_tile *tile = gt_to_tile(gt);
+	u64 old_base;
+	s64 ggtt_shift;
+
+	old_base = drm_mm_node_end(&tile->sriov.vf.ggtt_balloon[0]->base);
+	ggtt_shift = config->ggtt_base - (s64)old_base;
+
+	xe_gt_sriov_info(gt, "GGTT base shifted from %#llx to %#llx\n",
+		  old_base, old_base + ggtt_shift);
+
+	return ggtt_shift;
+}
+
+static void xe_ggtt_mm_shift_nodes(struct xe_ggtt *ggtt, struct drm_mm_node *balloon_beg,
+				struct drm_mm_node *balloon_fin, s64 shift)
+{
+	struct drm_mm_node *node, *tmpn;
+	int err;
+	LIST_HEAD(temp_list_head);
+
+	lockdep_assert_held(&ggtt->lock);
+
+	/*
+	 * Move nodes, from range previously assigned to this VF, into temp list.
+	 *
+	 * The balloon_beg and balloon_fin nodes are there to eliminate unavailable
+	 * ranges from use: first reserves the GGTT area below the range for current VF,
+	 * and second reserves area above. There may also exist extra nodes at the bottom
+	 * or top of GGTT range, as long as there are no free spaces in between. Such
+	 * extra nodes will be left unchanged.
+	 *
+	 * Below is a GGTT layout of example VF, with a certain address range assigned to
+	 * said VF, and inaccessible areas above and below:
+	 *
+	 *  0                                                                  ggtt->size
+	 *  |<--------------------------- Total GGTT size ----------------------------->|
+	 *
+	 *  +-----------+-------------------------+----------+--------------+-----------+
+	 *  |\\\\\\\\\\\|/////////////////////////|  VF mem  |//////////////|\\\\\\\\\\\|
+	 *  +-----------+-------------------------+----------+--------------+-----------+
+	 *
+	 * Hardware enforced access rules before migration:
+	 *
+	 *  |<------- inaccessible for VF ------->|<VF owned>|<-- inaccessible for VF ->|
+	 *
+	 * drm_mm nodes used for tracking allocations:
+	 *
+	 *  |<- extra ->|<------- balloon ------->|<- nodes->|<-- balloon ->|<- extra ->|
+	 *
+	 * After the migration, GGTT area assigned to the VF might have shifted, either
+	 * to lower or to higher address. But we expect the total size and extra areas to
+	 * be identical, as migration can only happen between matching platforms.
+	 * Below is an example of GGTT layout of the VF after migration. Content of the
+	 * GGTT for VF has been moved to a new area, and we receive its address from GuC:
+	 *
+	 *  +-----------+--------------+----------+-------------------------+-----------+
+	 *  |\\\\\\\\\\\|//////////////|  VF mem  |/////////////////////////|\\\\\\\\\\\|
+	 *  +-----------+--------------+----------+-------------------------+-----------+
+	 *
+	 * Hardware enforced access rules after migration:
+	 *
+	 *  |<- inaccessible for VF -->|<VF owned>|<------- inaccessible for VF ------->|
+	 *
+	 * So the VF has a new slice of GGTT assigned, and during migration process, the
+	 * memory content was copied to that new area. But the drm_mm nodes within the
+	 * driver are still tracking allocations using the old addresses. The nodes within VF
+	 * owned area have to be shifted, and balloon nodes need to be resized to
+	 * properly mask out areas not owned by the VF.
+	 *
+	 * Fixed drm_mm nodes used for tracking allocations:
+	 *
+	 *  |<- extra  ->|<- balloon ->|<-- VF -->|<-------- balloon ------>|<- extra ->|
+	 *
+	 * Due to use of GPU profiles, we do not expect the old and new GGTT areas to
+	 * overlap; but our node shifting will fix addresses properly regardless.
+	 *
+	 */
+	drm_mm_for_each_node_in_range_safe(node, tmpn, &ggtt->mm,
+					   drm_mm_node_end(balloon_beg),
+					   balloon_fin->start) {
+		drm_mm_remove_node(node);
+		list_add(&node->node_list, &temp_list_head);
+	}
+
+	/* shift and re-add ballooning nodes */
+	if (drm_mm_node_allocated(balloon_beg))
+		drm_mm_remove_node(balloon_beg);
+	if (drm_mm_node_allocated(balloon_fin))
+		drm_mm_remove_node(balloon_fin);
+	balloon_beg->size += shift;
+	balloon_fin->start += shift;
+	balloon_fin->size -= shift;
+	if (balloon_beg->size != 0) {
+		err = drm_mm_reserve_node(&ggtt->mm, balloon_beg);
+		XE_WARN_ON(err);
+	}
+	if (balloon_fin->size != 0) {
+		err = drm_mm_reserve_node(&ggtt->mm, balloon_fin);
+		XE_WARN_ON(err);
+	}
+
+	/*
+	 * Now the GGTT VM contains only nodes outside of area assigned to this VF.
+	 * We can re-add all VF nodes with shifted offsets.
+	 */
+	list_for_each_entry_safe(node, tmpn, &temp_list_head, node_list) {
+		list_del(&node->node_list);
+		node->start += shift;
+		err = drm_mm_reserve_node(&ggtt->mm, node);
+		XE_WARN_ON(err);
+	}
+}
+
+static void xe_ggtt_node_shift_nodes(struct xe_ggtt *ggtt, struct xe_ggtt_node *balloon_beg,
+				struct xe_ggtt_node *balloon_fin, s64 shift)
+{
+	struct drm_mm_node *balloon_mm_beg, *balloon_mm_end;
+	struct drm_mm_node loc_beg, loc_end;
+
+	if (balloon_beg && balloon_beg->ggtt)
+		balloon_mm_beg = &balloon_beg->base;
+	else {
+		loc_beg.color = 0;
+		loc_beg.flags = 0;
+		loc_beg.start = xe_wopcm_size(ggtt->tile->xe);
+		loc_beg.size = 0;
+		balloon_mm_beg = &loc_beg;
+	}
+
+	if (balloon_fin && balloon_fin->ggtt)
+		balloon_mm_end = &balloon_fin->base;
+	else {
+		loc_end.color = 0;
+		loc_end.flags = 0;
+		loc_end.start = GUC_GGTT_TOP;
+		loc_end.size = 0;
+		balloon_mm_end = &loc_end;
+	}
+
+	drm_dbg(&ggtt->tile->xe->drm, "node shift start: beg %llx %llx end %llx %llx\n",
+		balloon_mm_beg->start, balloon_mm_beg->size,
+		balloon_mm_end->start, balloon_mm_end->size);
+	xe_ggtt_mm_shift_nodes(ggtt, balloon_mm_beg, balloon_mm_end, shift);
+	drm_dbg(&ggtt->tile->xe->drm, "node shift end\n");
+}
+
+/**
+ * xe_gt_sriov_vf_fixup_ggtt_nodes - Shift GGTT allocations to match assigned range.
+ * @gt: the &xe_gt struct instance
+ *
+ * Since Global GTT is not virtualized, each VF has an assigned range
+ * within the global space. This range might have changed during migration,
+ * which requires all memory addresses pointing to GGTT to be shifted.
+ */
+void xe_gt_sriov_vf_fixup_ggtt_nodes(struct xe_gt *gt)
+{
+	struct xe_tile *tile = gt_to_tile(gt);
+	struct xe_ggtt *ggtt = tile->mem.ggtt;
+	s64 ggtt_shift;
+
+	mutex_lock(&ggtt->lock);
+	ggtt_shift = vf_get_post_migration_ggtt_shift(gt);
+	xe_ggtt_node_shift_nodes(ggtt, tile->sriov.vf.ggtt_balloon[0],
+				 tile->sriov.vf.ggtt_balloon[1], ggtt_shift);
+	mutex_unlock(&ggtt->lock);
+}
+
 static int vf_runtime_reg_cmp(const void *a, const void *b)
 {
 	const struct vf_runtime_reg *ra = a;
diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_vf.h b/drivers/gpu/drm/xe/xe_gt_sriov_vf.h
index 912d20814261..a8745ec23380 100644
--- a/drivers/gpu/drm/xe/xe_gt_sriov_vf.h
+++ b/drivers/gpu/drm/xe/xe_gt_sriov_vf.h
@@ -17,6 +17,7 @@ int xe_gt_sriov_vf_query_config(struct xe_gt *gt);
 int xe_gt_sriov_vf_connect(struct xe_gt *gt);
 int xe_gt_sriov_vf_query_runtime(struct xe_gt *gt);
 int xe_gt_sriov_vf_prepare_ggtt(struct xe_gt *gt);
+void xe_gt_sriov_vf_fixup_ggtt_nodes(struct xe_gt *gt);
 int xe_gt_sriov_vf_notify_resfix_done(struct xe_gt *gt);
 void xe_gt_sriov_vf_migrated_event_handler(struct xe_gt *gt);
 
diff --git a/drivers/gpu/drm/xe/xe_sriov_vf.c b/drivers/gpu/drm/xe/xe_sriov_vf.c
index c1275e64aa9c..1d2559343706 100644
--- a/drivers/gpu/drm/xe/xe_sriov_vf.c
+++ b/drivers/gpu/drm/xe/xe_sriov_vf.c
@@ -170,6 +170,19 @@ static bool vf_post_migration_imminent(struct xe_device *xe)
 	work_pending(&xe->sriov.vf.migration.worker);
 }
 
+static void vf_post_migration_fixup_ggtt_nodes(struct xe_device *xe)
+{
+	struct xe_gt *gt;
+	unsigned int id;
+
+	for_each_gt(gt, xe, id) {
+		/* media doesn't have its own ggtt */
+		if (xe_gt_is_media_type(gt))
+			continue;
+		xe_gt_sriov_vf_fixup_ggtt_nodes(gt);
+	}
+}
+
 /*
  * Notify all GuCs about resource fixups apply finished.
  */
@@ -201,6 +214,7 @@ static void vf_post_migration_recovery(struct xe_device *xe)
 	if (unlikely(err))
 		goto fail;
 
+	vf_post_migration_fixup_ggtt_nodes(xe);
 	/* FIXME: add the recovery steps */
 	vf_post_migration_notify_resfix_done(xe);
 	xe_pm_runtime_put(xe);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread
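The arithmetic at the heart of xe_ggtt_mm_shift_nodes() can be shown standalone: the balloon below the VF range grows by `shift`, the balloon above moves up and shrinks by `shift`, and every VF node follows. This is a toy model under made-up types (`struct region`), not the driver code, but the invariants it asserts are the ones the diagram in the patch describes.

```c
#include <assert.h>

/* Toy region: [start, start + size). */
struct region {
	unsigned long long start;
	unsigned long long size;
};

/* End offset of a region, mirroring drm_mm_node_end() from the patch. */
static unsigned long long region_end(const struct region *r)
{
	return r->start + r->size;
}

/* Shift the VF-owned slice while keeping the total layout covered:
 * bottom balloon absorbs the shift, top balloon yields it, and every
 * VF allocation keeps its relative position within the slice. */
static void shift_vf_layout(struct region *balloon_lo,
			    struct region *balloon_hi,
			    struct region *vf_nodes, int count,
			    long long shift)
{
	balloon_lo->size += shift;	/* balloon below grows/shrinks */
	balloon_hi->start += shift;	/* balloon above starts later ... */
	balloon_hi->size -= shift;	/* ... and masks less space */
	for (int i = 0; i < count; i++)
		vf_nodes[i].start += shift;
}
```

After the shift, the bottom balloon must still end exactly where the first VF node starts, and the last VF node must end exactly where the top balloon starts, with the total GGTT size unchanged.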

* [PATCH v1 3/3] drm/xe/vf: Fixup CTB send buffer messages after migration
  2024-11-16  1:27 [PATCH v1 0/3] drm/xe/vf: Post-migration recovery of GGTT nodes and CTB Tomasz Lis
  2024-11-16  1:27 ` [PATCH v1 1/3] drm/drm_mm: Safe macro for iterating through nodes in range Tomasz Lis
  2024-11-16  1:27 ` [PATCH v1 2/3] drm/xe/sriov: Shifting GGTT area post migration Tomasz Lis
@ 2024-11-16  1:27 ` Tomasz Lis
  2024-11-16  1:37 ` ✓ CI.Patch_applied: success for drm/xe/vf: Post-migration recovery of GGTT nodes and CTB Patchwork
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Tomasz Lis @ 2024-11-16  1:27 UTC (permalink / raw)
  To: intel-xe
  Cc: Michał Winiarski, Michał Wajdeczko,
	Piotr Piórkowski

During post-migration recovery of a VF, it is necessary to update
GGTT references included in messages which are going to be sent
to the GuC. The GuC will start consuming messages only after the
VF KMD informs it that the fixups are done; before that, the VF
KMD is expected to update any H2G messages which are already in
the send buffer but were not yet consumed by the GuC.

Only a small subset of messages allowed for VFs have GGTT references
in them. This patch adds the functionality to parse the CTB send
ring buffer and shift addresses contained within.

While fixing the CTB content, ct->lock is not taken. This means
the only barrier taken remains the GGTT address lock - which is
fine, because only requests with GGTT addresses matter, but it
also means tail changes can happen during the CTB fixup execution
(which can be ignored, as any new messages will not have anything
to fix).

The GGTT address locking will be introduced in a future series.

Signed-off-by: Tomasz Lis <tomasz.lis@intel.com>
---
 drivers/gpu/drm/xe/xe_gt_sriov_vf.c       |   2 +
 drivers/gpu/drm/xe/xe_gt_sriov_vf_types.h |   2 +
 drivers/gpu/drm/xe/xe_guc_ct.c            | 144 ++++++++++++++++++++++
 drivers/gpu/drm/xe/xe_guc_ct.h            |   2 +
 drivers/gpu/drm/xe/xe_sriov_vf.c          |  12 ++
 5 files changed, 162 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_vf.c b/drivers/gpu/drm/xe/xe_gt_sriov_vf.c
index ae24c47ed8f8..604cbbf55d4f 100644
--- a/drivers/gpu/drm/xe/xe_gt_sriov_vf.c
+++ b/drivers/gpu/drm/xe/xe_gt_sriov_vf.c
@@ -920,12 +920,14 @@ static u64 drm_mm_node_end(struct drm_mm_node *node)
 static s64 vf_get_post_migration_ggtt_shift(struct xe_gt *gt)
 {
 	struct xe_gt_sriov_vf_selfconfig *config = &gt->sriov.vf.self_config;
+	struct xe_gt_sriov_vf_runtime *runtime = &gt->sriov.vf.runtime;
 	struct xe_tile *tile = gt_to_tile(gt);
 	u64 old_base;
 	s64 ggtt_shift;
 
 	old_base = drm_mm_node_end(&tile->sriov.vf.ggtt_balloon[0]->base);
 	ggtt_shift = config->ggtt_base - (s64)old_base;
+	runtime->ggtt_shift = ggtt_shift;
 
 	xe_gt_sriov_info(gt, "GGTT base shifted from %#llx to %#llx\n",
 		  old_base, old_base + ggtt_shift);
diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_vf_types.h b/drivers/gpu/drm/xe/xe_gt_sriov_vf_types.h
index a57f13b5afcd..6af219b0eb1e 100644
--- a/drivers/gpu/drm/xe/xe_gt_sriov_vf_types.h
+++ b/drivers/gpu/drm/xe/xe_gt_sriov_vf_types.h
@@ -65,6 +65,8 @@ struct xe_gt_sriov_vf_runtime {
 		/** @regs.value: register value. */
 		u32 value;
 	} *regs;
+	/** @ggtt_shift: difference in ggtt_base on last migration */
+	s64 ggtt_shift;
 };
 
 /**
diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c
index 7eb175a0b874..7e8a8f925589 100644
--- a/drivers/gpu/drm/xe/xe_guc_ct.c
+++ b/drivers/gpu/drm/xe/xe_guc_ct.c
@@ -84,6 +84,8 @@ struct g2h_fence {
 	bool done;
 };
 
+#define make_u64(hi, lo) ((u64)((u64)(u32)(hi) << 32 | (u32)(lo)))
+
 static void g2h_fence_init(struct g2h_fence *g2h_fence, u32 *response_buffer)
 {
 	g2h_fence->response_buffer = response_buffer;
@@ -1620,6 +1622,148 @@ static void g2h_worker_func(struct work_struct *w)
 	receive_g2h(ct);
 }
 
+/*
+ * ct_update_addresses_in_message - Shift any GGTT addresses within
+ * a single message left in the CTB from before post-migration recovery.
+ * @ct: pointer to CT struct of the target GuC
+ * @cmds: iomap buffer containing CT messages
+ * @head: start of the target message within the buffer
+ * @len: length of the target message
+ * @size: size of the commands buffer
+ * @shift: the address shift to be added to each GGTT reference
+ */
+static void ct_update_addresses_in_message(struct xe_guc_ct *ct,
+					    struct iosys_map *cmds, u32 head,
+					    u32 len, u32 size, s64 shift)
+{
+	struct xe_device *xe = ct_to_xe(ct);
+	u32 action, i, n;
+	u32 msg[2];
+	u64 offset;
+
+#define read32(o, p)							\
+	xe_map_memcpy_from(xe, msg, cmds, (head + p) * sizeof(u32),	\
+			   1 * sizeof(u32));				\
+	o = msg[0]
+#define fixup64(p)							\
+	xe_map_memcpy_from(xe, msg, cmds, (head + p) * sizeof(u32),	\
+			   2 * sizeof(u32));				\
+	offset = make_u64(msg[1], msg[0]);				\
+	offset += shift;						\
+	msg[0] = lower_32_bits(offset);					\
+	msg[1] = upper_32_bits(offset);					\
+	xe_map_memcpy_to(xe, cmds, (head + p) * sizeof(u32), msg, 2 * sizeof(u32))
+
+	xe_map_memcpy_from(xe, msg, cmds, head * sizeof(u32),
+			   1 * sizeof(u32));
+	action = FIELD_GET(GUC_HXG_REQUEST_MSG_0_ACTION, msg[0]);
+	switch (action)
+	{
+	case XE_GUC_ACTION_REGISTER_CONTEXT:
+	case XE_GUC_ACTION_REGISTER_CONTEXT_MULTI_LRC:
+		/* field wq_desc */
+		fixup64(5);
+		/* field wq_base */
+		fixup64(7);
+		if (action == XE_GUC_ACTION_REGISTER_CONTEXT_MULTI_LRC) {
+			/* field number_children */
+			read32(n, 10);
+			/* field hwlrca and child lrcas */
+			for (i = 0; i < n; i++) {
+				fixup64(11 + 2 * i);
+			}
+		} else {
+			/* field hwlrca */
+			fixup64(10);
+		}
+		break;
+	default:
+		break;
+	}
+#undef fixup64
+#undef read32
+}
+
+static int ct_update_addresses_in_buffer(struct xe_guc_ct *ct,
+					 struct guc_ctb *h2g,
+					 s64 shift, u32 *mhead, s32 avail)
+{
+	struct xe_device *xe = ct_to_xe(ct);
+	u32 head = *mhead;
+	u32 size = h2g->info.size;
+	u32 msg[1];
+	u32 len;
+
+	/* Read header */
+	xe_map_memcpy_from(xe, msg, &h2g->cmds, sizeof(u32) * head,
+			   sizeof(u32));
+	len = FIELD_GET(GUC_CTB_MSG_0_NUM_DWORDS, msg[0]) + GUC_CTB_MSG_MIN_LEN;
+
+	if (unlikely(len > (u32)avail)) {
+		struct xe_gt *gt = ct_to_gt(ct);
+
+		xe_gt_err(gt, "H2G channel broken on read, avail=%d, len=%d, fixups skipped\n",
+			  avail, len);
+		return 0;
+	}
+
+	head = (head + 1) % size;
+	ct_update_addresses_in_message(ct, &h2g->cmds, head, len - 1, size, shift);
+	*mhead = (head + len - 1) % size;
+
+	return avail - len;
+}
+
+/**
+ * xe_guc_ct_update_addresses - Shifts any GGTT addresses left
+ * in the CTB from before post-migration recovery.
+ * @ct: pointer to CT struct of the target GuC
+ */
+int xe_guc_ct_update_addresses(struct xe_guc_ct *ct)
+{
+	struct xe_guc *guc = ct_to_guc(ct);
+	struct xe_gt *gt = guc_to_gt(guc);
+	struct xe_gt_sriov_vf_runtime *runtime = &gt->sriov.vf.runtime;
+	struct guc_ctb *h2g = &ct->ctbs.h2g;
+	u32 head = h2g->info.head;
+	u32 tail = READ_ONCE(h2g->info.tail);
+	u32 size = h2g->info.size;
+	s32 avail;
+	s64 ggtt_shift;
+
+	if (unlikely(h2g->info.broken))
+		return -EPIPE;
+
+	XE_WARN_ON(head > size);
+
+	if (unlikely(tail >= size)) {
+		xe_gt_err(gt, "H2G channel has Invalid tail offset (%u >= %u)\n",
+			 tail, size);
+		goto corrupted;
+	}
+
+	avail = tail - head;
+
+	/* beware of buffer wrap case */
+	if (unlikely(avail < 0))
+		avail += size;
+	xe_gt_dbg(gt, "available %d (%u:%u:%u)\n", avail, head, tail, size);
+	XE_WARN_ON(avail < 0);
+
+	ggtt_shift = runtime->ggtt_shift;
+
+	while (avail > 0)
+		avail = ct_update_addresses_in_buffer(ct, h2g, ggtt_shift, &head, avail);
+
+	return 0;
+
+corrupted:
+	xe_gt_err(gt, "Corrupted descriptor head=%u tail=%u\n",
+		 head, tail);
+	h2g->info.broken = true;
+	return -EPIPE;
+}
+
 static struct xe_guc_ct_snapshot *guc_ct_snapshot_alloc(struct xe_guc_ct *ct, bool atomic,
 							bool want_ctb)
 {
diff --git a/drivers/gpu/drm/xe/xe_guc_ct.h b/drivers/gpu/drm/xe/xe_guc_ct.h
index 82c4ae458dda..6b04fd4b1e03 100644
--- a/drivers/gpu/drm/xe/xe_guc_ct.h
+++ b/drivers/gpu/drm/xe/xe_guc_ct.h
@@ -22,6 +22,8 @@ void xe_guc_ct_snapshot_print(struct xe_guc_ct_snapshot *snapshot, struct drm_pr
 void xe_guc_ct_snapshot_free(struct xe_guc_ct_snapshot *snapshot);
 void xe_guc_ct_print(struct xe_guc_ct *ct, struct drm_printer *p, bool want_ctb);
 
+int xe_guc_ct_update_addresses(struct xe_guc_ct *ct);
+
 static inline bool xe_guc_ct_enabled(struct xe_guc_ct *ct)
 {
 	return ct->state == XE_GUC_CT_STATE_ENABLED;
diff --git a/drivers/gpu/drm/xe/xe_sriov_vf.c b/drivers/gpu/drm/xe/xe_sriov_vf.c
index 1d2559343706..5bd6172815ae 100644
--- a/drivers/gpu/drm/xe/xe_sriov_vf.c
+++ b/drivers/gpu/drm/xe/xe_sriov_vf.c
@@ -9,6 +9,7 @@
 #include "xe_device.h"
 #include "xe_gt_sriov_printk.h"
 #include "xe_gt_sriov_vf.h"
+#include "xe_guc_ct.h"
 #include "xe_pm.h"
 #include "xe_sriov.h"
 #include "xe_sriov_printk.h"
@@ -157,6 +158,15 @@ static int vf_post_migration_requery_guc(struct xe_device *xe)
 	return ret;
 }
 
+static void vf_post_migration_fixup_ctb(struct xe_device *xe)
+{
+	struct xe_gt *gt;
+	unsigned int id;
+
+	for_each_gt(gt, xe, id)
+		xe_guc_ct_update_addresses(&gt->uc.guc.ct);
+}
+
 /*
  * vf_post_migration_imminent - Check if post-restore recovery is coming.
  * @xe: the &xe_device struct instance
@@ -216,6 +226,8 @@ static void vf_post_migration_recovery(struct xe_device *xe)
 
 	vf_post_migration_fixup_ggtt_nodes(xe);
 	/* FIXME: add the recovery steps */
+	vf_post_migration_fixup_ctb(xe);
+
 	vf_post_migration_notify_resfix_done(xe);
 	xe_pm_runtime_put(xe);
 	drm_notice(&xe->drm, "migration recovery ended\n");
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread
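The core operation of the CTB fixup, patching a 64-bit GGTT reference stored as two consecutive dwords inside a message, can be shown standalone. This sketch mirrors the patch's make_u64()/fixup64 helpers but uses made-up names and plain C, not the driver's iosys_map accessors.

```c
#include <assert.h>
#include <stdint.h>

/* Join two dwords into one address, like the patch's make_u64(hi, lo). */
static uint64_t join_u64(uint32_t hi, uint32_t lo)
{
	return ((uint64_t)hi << 32) | lo;
}

/* Apply `shift` to a 64-bit GGTT reference stored at msg[p] (low dword)
 * and msg[p + 1] (high dword), as the patch's fixup64() helper does;
 * a negative shift moves the address down. */
static void shift_addr64(uint32_t *msg, unsigned int p, int64_t shift)
{
	uint64_t addr = join_u64(msg[p + 1], msg[p]) + shift;

	msg[p] = (uint32_t)addr;		/* lower_32_bits() */
	msg[p + 1] = (uint32_t)(addr >> 32);	/* upper_32_bits() */
}
```

The real code walks the H2G ring from head to tail, reads each message header to learn its length, and applies this per-field fixup only at the dword offsets where the given action stores GGTT addresses (wq_desc, wq_base, hwlrca).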

* ✓ CI.Patch_applied: success for drm/xe/vf: Post-migration recovery of GGTT nodes and CTB
  2024-11-16  1:27 [PATCH v1 0/3] drm/xe/vf: Post-migration recovery of GGTT nodes and CTB Tomasz Lis
                   ` (2 preceding siblings ...)
  2024-11-16  1:27 ` [PATCH v1 3/3] drm/xe/vf: Fixup CTB send buffer messages after migration Tomasz Lis
@ 2024-11-16  1:37 ` Patchwork
  2024-11-16  1:37 ` ✗ CI.checkpatch: warning " Patchwork
  2024-11-16  1:38 ` ✗ CI.KUnit: failure " Patchwork
  5 siblings, 0 replies; 7+ messages in thread
From: Patchwork @ 2024-11-16  1:37 UTC (permalink / raw)
  To: Tomasz Lis; +Cc: intel-xe

== Series Details ==

Series: drm/xe/vf: Post-migration recovery of GGTT nodes and CTB
URL   : https://patchwork.freedesktop.org/series/141439/
State : success

== Summary ==

=== Applying kernel patches on branch 'drm-tip' with base: ===
Base commit: 9a7388467f79 drm-tip: 2024y-11m-16d-00h-00m-45s UTC integration manifest
=== git am output follows ===
Applying: drm/drm_mm: Safe macro for iterating through nodes in range
Applying: drm/xe/sriov: Shifting GGTT area post migration
Applying: drm/xe/vf: Fixup CTB send buffer messages after migration



^ permalink raw reply	[flat|nested] 7+ messages in thread

* ✗ CI.checkpatch: warning for drm/xe/vf: Post-migration recovery of GGTT nodes and CTB
  2024-11-16  1:27 [PATCH v1 0/3] drm/xe/vf: Post-migration recovery of GGTT nodes and CTB Tomasz Lis
                   ` (3 preceding siblings ...)
  2024-11-16  1:37 ` ✓ CI.Patch_applied: success for drm/xe/vf: Post-migration recovery of GGTT nodes and CTB Patchwork
@ 2024-11-16  1:37 ` Patchwork
  2024-11-16  1:38 ` ✗ CI.KUnit: failure " Patchwork
  5 siblings, 0 replies; 7+ messages in thread
From: Patchwork @ 2024-11-16  1:37 UTC (permalink / raw)
  To: Tomasz Lis; +Cc: intel-xe

== Series Details ==

Series: drm/xe/vf: Post-migration recovery of GGTT nodes and CTB
URL   : https://patchwork.freedesktop.org/series/141439/
State : warning

== Summary ==

+ KERNEL=/kernel
+ git clone https://gitlab.freedesktop.org/drm/maintainer-tools mt
Cloning into 'mt'...
warning: redirecting to https://gitlab.freedesktop.org/drm/maintainer-tools.git/
+ git -C mt rev-list -n1 origin/master
30ab6715fc09baee6cc14cb3c89ad8858688d474
+ cd /kernel
+ git config --global --add safe.directory /kernel
+ git log -n1
commit a381e4c3664a73f4acc56650dfb4440a255a98b3
Author: Tomasz Lis <tomasz.lis@intel.com>
Date:   Sat Nov 16 02:27:55 2024 +0100

    drm/xe/vf: Fixup CTB send buffer messages after migration
    
    During post-migration recovery of a VF, it is necessary to update
    GGTT references included in messages which are going to be sent
    to the GuC. The GuC will start consuming messages only after the
    VF KMD informs it that the fixups are done; before that, the VF
    KMD is expected to update any H2G messages which are already in
    the send buffer but were not yet consumed by the GuC.
    
    Only a small subset of messages allowed for VFs have GGTT references
    in them. This patch adds the functionality to parse the CTB send
    ring buffer and shift addresses contained within.
    
    While fixing the CTB content, ct->lock is not taken. This means
    the only barrier taken remains the GGTT address lock - which is
    fine, because only requests with GGTT addresses matter, but it
    also means tail changes can happen during the CTB fixup execution
    (which can be ignored, as any new messages will not have anything
    to fix).
    
    The GGTT address locking will be introduced in a future series.
    
    Signed-off-by: Tomasz Lis <tomasz.lis@intel.com>
+ /mt/dim checkpatch 9a7388467f79fb74c67a2444c5b1add91652f89e drm-intel
b9e1c651ef2e drm/drm_mm: Safe macro for iterating through nodes in range
-:32: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'node__' - possible side-effects?
#32: FILE: include/drm/drm_mm.h:520:
+#define drm_mm_for_each_node_in_range_safe(node__, next__, mm__, start__, end__)	\
+	for (node__ = __drm_mm_interval_first((mm__), (start__), (end__)-1), \
+		next__ = list_next_entry(node__, node_list); \
+	     node__->start < (end__);					\
+	     node__ = next__, next__ = list_next_entry(next__, node_list))

-:32: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'next__' - possible side-effects?
#32: FILE: include/drm/drm_mm.h:520:
+#define drm_mm_for_each_node_in_range_safe(node__, next__, mm__, start__, end__)	\
+	for (node__ = __drm_mm_interval_first((mm__), (start__), (end__)-1), \
+		next__ = list_next_entry(node__, node_list); \
+	     node__->start < (end__);					\
+	     node__ = next__, next__ = list_next_entry(next__, node_list))

-:32: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'end__' - possible side-effects?
#32: FILE: include/drm/drm_mm.h:520:
+#define drm_mm_for_each_node_in_range_safe(node__, next__, mm__, start__, end__)	\
+	for (node__ = __drm_mm_interval_first((mm__), (start__), (end__)-1), \
+		next__ = list_next_entry(node__, node_list); \
+	     node__->start < (end__);					\
+	     node__ = next__, next__ = list_next_entry(next__, node_list))

-:33: CHECK:SPACING: spaces preferred around that '-' (ctx:VxV)
#33: FILE: include/drm/drm_mm.h:521:
+	for (node__ = __drm_mm_interval_first((mm__), (start__), (end__)-1), \
 	                                                                ^

total: 0 errors, 0 warnings, 4 checks, 25 lines checked
26ae2cf9c496 drm/xe/sriov: Shifting GGTT area post migration
-:45: CHECK:PARENTHESIS_ALIGNMENT: Alignment should match open parenthesis
#45: FILE: drivers/gpu/drm/xe/xe_gt_sriov_vf.c:931:
+	xe_gt_sriov_info(gt, "GGTT base shifted from %#llx to %#llx\n",
+		  old_base, old_base + ggtt_shift);

-:51: CHECK:PARENTHESIS_ALIGNMENT: Alignment should match open parenthesis
#51: FILE: drivers/gpu/drm/xe/xe_gt_sriov_vf.c:937:
+static void xe_ggtt_mm_shift_nodes(struct xe_ggtt *ggtt, struct drm_mm_node *balloon_beg,
+				struct drm_mm_node *balloon_fin, s64 shift)

-:151: CHECK:PARENTHESIS_ALIGNMENT: Alignment should match open parenthesis
#151: FILE: drivers/gpu/drm/xe/xe_gt_sriov_vf.c:1037:
+static void xe_ggtt_node_shift_nodes(struct xe_ggtt *ggtt, struct xe_ggtt_node *balloon_beg,
+				struct xe_ggtt_node *balloon_fin, s64 shift)

-:156: CHECK:BRACES: braces {} should be used on all arms of this statement
#156: FILE: drivers/gpu/drm/xe/xe_gt_sriov_vf.c:1042:
+	if (balloon_beg && balloon_beg->ggtt)
[...]
+	else {
[...]

-:158: CHECK:BRACES: Unbalanced braces around else statement
#158: FILE: drivers/gpu/drm/xe/xe_gt_sriov_vf.c:1044:
+	else {

-:166: CHECK:BRACES: braces {} should be used on all arms of this statement
#166: FILE: drivers/gpu/drm/xe/xe_gt_sriov_vf.c:1052:
+	if (balloon_fin && balloon_fin->ggtt)
[...]
+	else {
[...]

-:168: CHECK:BRACES: Unbalanced braces around else statement
#168: FILE: drivers/gpu/drm/xe/xe_gt_sriov_vf.c:1054:
+	else {

total: 0 errors, 0 warnings, 7 checks, 214 lines checked
a381e4c3664a drm/xe/vf: Fixup CTB send buffer messages after migration
-:87: CHECK:PARENTHESIS_ALIGNMENT: Alignment should match open parenthesis
#87: FILE: drivers/gpu/drm/xe/xe_guc_ct.c:1636:
+static void ct_update_addresses_in_message(struct xe_guc_ct *ct,
+					    struct iosys_map *cmds, u32 head,

-:95: ERROR:MULTISTATEMENT_MACRO_USE_DO_WHILE: Macros with multiple statements should be enclosed in a do - while loop
#95: FILE: drivers/gpu/drm/xe/xe_guc_ct.c:1644:
+#define read32(o, p)							\
+	xe_map_memcpy_from(xe, msg, cmds, (head + p) * sizeof(u32),	\
+			   1 * sizeof(u32));				\
+	o = msg[0]

-:95: CHECK:MACRO_ARG_PRECEDENCE: Macro argument 'p' may be better as '(p)' to avoid precedence issues
#95: FILE: drivers/gpu/drm/xe/xe_guc_ct.c:1644:
+#define read32(o, p)							\
+	xe_map_memcpy_from(xe, msg, cmds, (head + p) * sizeof(u32),	\
+			   1 * sizeof(u32));				\
+	o = msg[0]

-:99: ERROR:MULTISTATEMENT_MACRO_USE_DO_WHILE: Macros with multiple statements should be enclosed in a do - while loop
#99: FILE: drivers/gpu/drm/xe/xe_guc_ct.c:1648:
+#define fixup64(p)							\
+	xe_map_memcpy_from(xe, msg, cmds, (head + p) * sizeof(u32),	\
+			   2 * sizeof(u32));				\
+	offset = make_u64(msg[1], msg[0]);				\
+	offset += shift;						\
+	msg[0] = lower_32_bits(offset);					\
+	msg[1] = upper_32_bits(offset);					\
+	xe_map_memcpy_to(xe, cmds, (head + p) * sizeof(u32), msg, 2 * sizeof(u32))

-:99: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'p' - possible side-effects?
#99: FILE: drivers/gpu/drm/xe/xe_guc_ct.c:1648:
+#define fixup64(p)							\
+	xe_map_memcpy_from(xe, msg, cmds, (head + p) * sizeof(u32),	\
+			   2 * sizeof(u32));				\
+	offset = make_u64(msg[1], msg[0]);				\
+	offset += shift;						\
+	msg[0] = lower_32_bits(offset);					\
+	msg[1] = upper_32_bits(offset);					\
+	xe_map_memcpy_to(xe, cmds, (head + p) * sizeof(u32), msg, 2 * sizeof(u32))

-:99: CHECK:MACRO_ARG_PRECEDENCE: Macro argument 'p' may be better as '(p)' to avoid precedence issues
#99: FILE: drivers/gpu/drm/xe/xe_guc_ct.c:1648:
+#define fixup64(p)							\
+	xe_map_memcpy_from(xe, msg, cmds, (head + p) * sizeof(u32),	\
+			   2 * sizeof(u32));				\
+	offset = make_u64(msg[1], msg[0]);				\
+	offset += shift;						\
+	msg[0] = lower_32_bits(offset);					\
+	msg[1] = upper_32_bits(offset);					\
+	xe_map_memcpy_to(xe, cmds, (head + p) * sizeof(u32), msg, 2 * sizeof(u32))

-:111: ERROR:OPEN_BRACE: that open brace { should be on the previous line
#111: FILE: drivers/gpu/drm/xe/xe_guc_ct.c:1660:
+	switch (action)
+	{

-:123: WARNING:BRACES: braces {} are not necessary for single statement blocks
#123: FILE: drivers/gpu/drm/xe/xe_guc_ct.c:1672:
+			for (i = 0; i < n; i++) {
+				fixup64(11 + 2 * i);
+			}

-:192: CHECK:PARENTHESIS_ALIGNMENT: Alignment should match open parenthesis
#192: FILE: drivers/gpu/drm/xe/xe_guc_ct.c:1741:
+		xe_gt_err(gt, "H2G channel has Invalid tail offset (%u >= %u)\n",
+			 tail, size);

-:213: CHECK:PARENTHESIS_ALIGNMENT: Alignment should match open parenthesis
#213: FILE: drivers/gpu/drm/xe/xe_guc_ct.c:1762:
+	xe_gt_err(gt, "Corrupted descriptor head=%u tail=%u\n",
+		 head, tail);

total: 3 errors, 1 warnings, 6 checks, 216 lines checked



^ permalink raw reply	[flat|nested] 7+ messages in thread

* ✗ CI.KUnit: failure for drm/xe/vf: Post-migration recovery of GGTT nodes and CTB
  2024-11-16  1:27 [PATCH v1 0/3] drm/xe/vf: Post-migration recovery of GGTT nodes and CTB Tomasz Lis
                   ` (4 preceding siblings ...)
  2024-11-16  1:37 ` ✗ CI.checkpatch: warning " Patchwork
@ 2024-11-16  1:38 ` Patchwork
  5 siblings, 0 replies; 7+ messages in thread
From: Patchwork @ 2024-11-16  1:38 UTC (permalink / raw)
  To: Tomasz Lis; +Cc: intel-xe

== Series Details ==

Series: drm/xe/vf: Post-migration recovery of GGTT nodes and CTB
URL   : https://patchwork.freedesktop.org/series/141439/
State : failure

== Summary ==

+ trap cleanup EXIT
+ /kernel/tools/testing/kunit/kunit.py run --kunitconfig /kernel/drivers/gpu/drm/xe/.kunitconfig
ERROR:root:../lib/iomap.c:156:5: warning: no previous prototype for ‘ioread64_lo_hi’ [-Wmissing-prototypes]
  156 | u64 ioread64_lo_hi(const void __iomem *addr)
      |     ^~~~~~~~~~~~~~
../lib/iomap.c:163:5: warning: no previous prototype for ‘ioread64_hi_lo’ [-Wmissing-prototypes]
  163 | u64 ioread64_hi_lo(const void __iomem *addr)
      |     ^~~~~~~~~~~~~~
../lib/iomap.c:170:5: warning: no previous prototype for ‘ioread64be_lo_hi’ [-Wmissing-prototypes]
  170 | u64 ioread64be_lo_hi(const void __iomem *addr)
      |     ^~~~~~~~~~~~~~~~
../lib/iomap.c:178:5: warning: no previous prototype for ‘ioread64be_hi_lo’ [-Wmissing-prototypes]
  178 | u64 ioread64be_hi_lo(const void __iomem *addr)
      |     ^~~~~~~~~~~~~~~~
../lib/iomap.c:264:6: warning: no previous prototype for ‘iowrite64_lo_hi’ [-Wmissing-prototypes]
  264 | void iowrite64_lo_hi(u64 val, void __iomem *addr)
      |      ^~~~~~~~~~~~~~~
../lib/iomap.c:272:6: warning: no previous prototype for ‘iowrite64_hi_lo’ [-Wmissing-prototypes]
  272 | void iowrite64_hi_lo(u64 val, void __iomem *addr)
      |      ^~~~~~~~~~~~~~~
../lib/iomap.c:280:6: warning: no previous prototype for ‘iowrite64be_lo_hi’ [-Wmissing-prototypes]
  280 | void iowrite64be_lo_hi(u64 val, void __iomem *addr)
      |      ^~~~~~~~~~~~~~~~~
../lib/iomap.c:288:6: warning: no previous prototype for ‘iowrite64be_hi_lo’ [-Wmissing-prototypes]
  288 | void iowrite64be_hi_lo(u64 val, void __iomem *addr)
      |      ^~~~~~~~~~~~~~~~~
../drivers/gpu/drm/xe/xe_sriov_vf.c: In function ‘vf_post_migration_fixup_ggtt_nodes’:
../drivers/gpu/drm/xe/xe_sriov_vf.c:190:21: error: implicit declaration of function ‘xe_gt_is_media_type’ [-Werror=implicit-function-declaration]
  190 |                 if (xe_gt_is_media_type(gt))
      |                     ^~~~~~~~~~~~~~~~~~~
cc1: some warnings being treated as errors
make[7]: *** [../scripts/Makefile.build:229: drivers/gpu/drm/xe/xe_sriov_vf.o] Error 1
make[7]: *** Waiting for unfinished jobs....
make[6]: *** [../scripts/Makefile.build:478: drivers/gpu/drm/xe] Error 2
make[5]: *** [../scripts/Makefile.build:478: drivers/gpu/drm] Error 2
make[4]: *** [../scripts/Makefile.build:478: drivers/gpu] Error 2
make[3]: *** [../scripts/Makefile.build:478: drivers] Error 2
make[2]: *** [/kernel/Makefile:1936: .] Error 2
make[1]: *** [/kernel/Makefile:224: __sub-make] Error 2
make: *** [Makefile:224: __sub-make] Error 2

[01:37:43] Configuring KUnit Kernel ...
Generating .config ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
[01:37:47] Building KUnit Kernel ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
Building with:
$ make all compile_commands.json ARCH=um O=.kunit --jobs=48
+ cleanup
++ stat -c %u:%g /kernel
+ chown -R 1003:1003 /kernel



^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2024-11-16  1:38 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-11-16  1:27 [PATCH v1 0/3] drm/xe/vf: Post-migration recovery of GGTT nodes and CTB Tomasz Lis
2024-11-16  1:27 ` [PATCH v1 1/3] drm/drm_mm: Safe macro for iterating through nodes in range Tomasz Lis
2024-11-16  1:27 ` [PATCH v1 2/3] drm/xe/sriov: Shifting GGTT area post migration Tomasz Lis
2024-11-16  1:27 ` [PATCH v1 3/3] drm/xe/vf: Fixup CTB send buffer messages after migration Tomasz Lis
2024-11-16  1:37 ` ✓ CI.Patch_applied: success for drm/xe/vf: Post-migration recovery of GGTT nodes and CTB Patchwork
2024-11-16  1:37 ` ✗ CI.checkpatch: warning " Patchwork
2024-11-16  1:38 ` ✗ CI.KUnit: failure " Patchwork

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox