* [PATCH v7 0/3] drm/xe/migrate: Atomicize CCS copy command setup
@ 2025-10-17 14:12 Satyanarayana K V P
2025-10-17 14:12 ` [PATCH v7 1/3] " Satyanarayana K V P
` (6 more replies)
0 siblings, 7 replies; 21+ messages in thread
From: Satyanarayana K V P @ 2025-10-17 14:12 UTC (permalink / raw)
To: intel-xe; +Cc: Satyanarayana K V P
The CCS copy command is a 5-dword sequence. If the vCPU halts during
save/restore while this sequence is being programmed, partial writes may
trigger page faults when saving iGPU CCS metadata. Use the VMOVDQU
instruction to write the sequence atomically. Since VMOVDQU operates on
256-bit chunks, update EMIT_COPY_CCS_DW to emit 8 dwords instead of 5.
Update emit_flush_invalidate() to use VMOVDQU with 128-bit chunks.
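The idea can be illustrated with a userspace sketch (not the kernel code — the patch itself uses inline asm inside kernel_fpu_begin()/kernel_fpu_end(); the helper names below are made up). A single 256-bit store publishes all 8 padded dwords in one instruction, so there is no window where only some of the 5 command dwords are visible. Note that x86 does not architecturally guarantee 256-bit store atomicity in general; the patch relies on a single instruction plus no vCPU preemption inside the FPU section:

```c
#include <immintrin.h>
#include <stdint.h>
#include <string.h>

/* Sketch: copy 8 dwords (32 bytes) with one 256-bit store, mirroring
 * what memcpy_vmovdqu() in the patch does with vmovdqu/vmovups.
 * Hypothetical userspace helper; the kernel version must wrap the wide
 * store in kernel_fpu_begin()/kernel_fpu_end().
 */
__attribute__((target("avx")))
static void copy_8_dwords_avx(uint32_t *dst, const uint32_t *src)
{
	__m256i v = _mm256_loadu_si256((const __m256i *)src);

	_mm256_storeu_si256((__m256i *)dst, v);
}

static void copy_8_dwords(uint32_t *dst, const uint32_t *src)
{
	if (__builtin_cpu_supports("avx"))
		copy_8_dwords_avx(dst, src);
	else
		memcpy(dst, src, 8 * sizeof(uint32_t)); /* non-AVX fallback */
}
```

The runtime AVX check mirrors the patch's static_cpu_has(X86_FEATURE_AVX) assert, falling back to plain memcpy as emit_atomic() does on the non-VF path.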
The MI_STORE_DATA_IMM instruction header is a quad-dword in size. If the
vCPU halts during save/restore while the header is being programmed,
partial writes may trigger page faults when saving iGPU CCS metadata.
Write the instruction header atomically.
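The principle behind the atomic header write can be sketched in plain C (illustrative helper name, not from the patch — the actual series routes this through emit_atomic()): two consecutive command dwords published with a single 64-bit store, so a reader never observes the first dword without the second:

```c
#include <stdint.h>

/* Sketch of the principle: publish two adjacent command dwords with one
 * aligned 64-bit store. Illustrative only; the patch series uses
 * emit_atomic() with 128/256-bit SIMD stores for the real sequences.
 */
static void write_header_atomic(uint32_t *cs, uint32_t dw0, uint32_t dw1)
{
	/* little-endian layout: dw0 lands at cs[0], dw1 at cs[1] */
	uint64_t v = ((uint64_t)dw1 << 32) | dw0;

	__atomic_store_n((uint64_t *)cs, v, __ATOMIC_RELAXED);
}
```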
Clear the contents of the CCS read/write batch buffers, ensuring no page
faults or GPU hangs occur if migration happens midway.
---
V6 -> V7:
- Added description explaining why to use assembly instructions for
atomicity.
- Assert if DGFX tries to use memcpy_vmovdqu(). (Rodrigo)
- Include <asm/cpufeature.h> even though checkpatch complains; with
  <linux/cpufeature.h>, KUnit throws errors.
V5 -> V6:
- Used xe_gt_assert() instead of xe_assert() (Matt B).
- Use emit_atomic() function to write MI_STORE_DATA_IMM instruction
(Matt B).
- Fixed review comments (Rodrigo)
V4 -> V5:
- Fixed review comments (Matt B)
V3 -> V4:
- Fixed review comments (Wajdeczko)
- Fix issues reported by patchworks.
V2 -> V3:
- Added support for 128 bit and 256 bit instructions with memcpy_vmovdqu
- Updated emit_flush_invalidate() to use vmovdqu instruction.
V1 -> V2:
- Use memcpy_vmovdqu only for x86 arch and for VF. Else use memcpy
(Auld, Matthew)
- Fix issues reported by patchworks.
Satyanarayana K V P (3):
drm/xe/migrate: Atomicize CCS copy command setup
drm/xe/migrate: Make emit_pte() header write atomic
drm/xe/vf: Clear CCS read/write buffers in atomic way
drivers/gpu/drm/xe/xe_migrate.c | 260 ++++++++++++++++++++++++---
drivers/gpu/drm/xe/xe_migrate.h | 3 +
drivers/gpu/drm/xe/xe_sriov_vf_ccs.c | 5 +-
3 files changed, 243 insertions(+), 25 deletions(-)
--
2.51.0
^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH v7 1/3] drm/xe/migrate: Atomicize CCS copy command setup
  2025-10-17 14:12 [PATCH v7 0/3] drm/xe/migrate: Atomicize CCS copy command setup Satyanarayana K V P
@ 2025-10-17 14:12 ` Satyanarayana K V P
  2025-10-17 14:27   ` Ville Syrjälä
  2025-10-17 18:11   ` Ville Syrjälä
  2025-10-17 14:12 ` [PATCH v7 2/3] drm/xe/migrate: Make emit_pte() header write atomic Satyanarayana K V P
  ` (5 subsequent siblings)
  6 siblings, 2 replies; 21+ messages in thread
From: Satyanarayana K V P @ 2025-10-17 14:12 UTC (permalink / raw)
  To: intel-xe
  Cc: Satyanarayana K V P, Michal Wajdeczko, Matthew Brost,
	Matthew Auld, Rodrigo Vivi, Matt Roper

The CCS copy command is a 5-dword sequence. If the vCPU halts during
save/restore while this sequence is being programmed, partial writes may
trigger page faults when saving IGPU CCS metadata. Use the VMOVDQU
instruction to write the sequence atomically.

Since VMOVDQU operates on 256-bit chunks, update EMIT_COPY_CCS_DW to emit
8 dwords instead of 5 dwords.

Update emit_flush_invalidate() to use VMOVDQU operating with 128-bit
chunks.

Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>

---
V6 -> V7:
- Added description explaining why to use assembly instructions for
  atomicity.
- Assert if DGFX tries to use memcpy_vmovdqu(). (Rodrigo)
- Include <asm/cpufeature.h> though checkpatch complains. With
  <linux/cpufeature.h> KUnit is throwing errors.

V5 -> V6:
- Fixed review comments (Rodrigo)

V4 -> V5:
- Fixed review comments. (Matt B)

V3 -> V4:
- Fixed review comments. (Wajdeczko)
- Fix issues reported by patchworks.

V2 -> V3:
- Added support for 128 bit and 256 bit instructions with memcpy_vmovdqu
- Updated emit_flush_invalidate() to use vmovdqu instruction.

V1 -> V2:
- Use memcpy_vmovdqu only for x86 arch and for VF. Else use memcpy
  (Auld, Matthew)
- Fix issues reported by patchworks.
---
 drivers/gpu/drm/xe/xe_migrate.c | 112 ++++++++++++++++++++++++++------
 1 file changed, 91 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
index 3112c966c67d..e0be7396a0ab 100644
--- a/drivers/gpu/drm/xe/xe_migrate.c
+++ b/drivers/gpu/drm/xe/xe_migrate.c
@@ -5,6 +5,8 @@
 
 #include "xe_migrate.h"
 
+#include <asm/fpu/api.h>
+#include <asm/cpufeature.h>
 #include <linux/bitfield.h>
 #include <linux/sizes.h>
 
@@ -33,6 +35,7 @@
 #include "xe_res_cursor.h"
 #include "xe_sa.h"
 #include "xe_sched_job.h"
+#include "xe_sriov_vf_ccs.h"
 #include "xe_sync.h"
 #include "xe_trace_bo.h"
 #include "xe_validation.h"
@@ -657,18 +660,68 @@ static void emit_pte(struct xe_migrate *m,
 	}
 }
 
-#define EMIT_COPY_CCS_DW 5
+/*
+ * VF KMD registers two specialized LRCs with the GuC to handle save/restore
+ * operations for CCS metadata on IGPU. The GuC executes these LRCAs during
+ * VF save/restore operations.
+ *
+ * Each LRC contains a batch buffer pool that GuC submits to hardware during
+ * VF state save/restore operations. Since these operations can occur
+ * asynchronously at any time, we must ensure GPU instructions in the batch
+ * buffer are written atomically to prevent corruption from incomplete writes.
+ *
+ * To guarantee atomic instruction writes, we use x86 SIMD instructions
+ * (128-bit XMM and 256-bit YMM) within kernel_fpu_begin()/kernel_fpu_end()
+ * sections. This prevents vCPU preemption during instruction generation,
+ * ensuring complete GPU commands are written to the batch buffer.
+ */
+
+static void memcpy_vmovdqu(struct xe_device *xe, void *dst, const void *src, u32 size)
+{
+	xe_assert(xe, !IS_DGFX(xe));
+#ifdef CONFIG_X86
+	kernel_fpu_begin();
+	if (size == SZ_128) {
+		asm("vmovdqu (%0), %%xmm0\n"
+		    "vmovups %%xmm0, (%1)\n"
+		    :: "r" (src), "r" (dst) : "memory");
+	} else if (size == SZ_256) {
+		asm("vmovdqu (%0), %%ymm0\n"
+		    "vmovups %%ymm0, (%1)\n"
+		    :: "r" (src), "r" (dst) : "memory");
+	}
+	kernel_fpu_end();
+#endif
+}
+
+static void emit_atomic(struct xe_gt *gt, void *dst, const void *src, u32 size)
+{
+	u32 instr_size = size * BITS_PER_BYTE;
+
+	xe_gt_assert(gt, instr_size == SZ_128 || instr_size == SZ_256);
+
+	if (IS_VF_CCS_READY(gt_to_xe(gt))) {
+		xe_gt_assert(gt, static_cpu_has(X86_FEATURE_AVX));
+		memcpy_vmovdqu(gt_to_xe(gt), dst, src, instr_size);
+	} else {
+		memcpy(dst, src, size);
+	}
+}
+
+#define EMIT_COPY_CCS_DW 8
 static void emit_copy_ccs(struct xe_gt *gt, struct xe_bb *bb,
 			  u64 dst_ofs, bool dst_is_indirect,
 			  u64 src_ofs, bool src_is_indirect,
 			  u32 size)
 {
+	u32 dw[EMIT_COPY_CCS_DW] = {MI_NOOP};
 	struct xe_device *xe = gt_to_xe(gt);
 	u32 *cs = bb->cs + bb->len;
 	u32 num_ccs_blks;
 	u32 num_pages;
 	u32 ccs_copy_size;
 	u32 mocs;
+	u32 i = 0;
 
 	if (GRAPHICS_VERx100(xe) >= 2000) {
 		num_pages = DIV_ROUND_UP(size, XE_PAGE_SIZE);
@@ -686,15 +739,23 @@ static void emit_copy_ccs(struct xe_gt *gt, struct xe_bb *bb,
 		mocs = FIELD_PREP(XY_CTRL_SURF_MOCS_MASK, gt->mocs.uc_index);
 	}
 
-	*cs++ = XY_CTRL_SURF_COPY_BLT |
-		(src_is_indirect ? 0x0 : 0x1) << SRC_ACCESS_TYPE_SHIFT |
-		(dst_is_indirect ? 0x0 : 0x1) << DST_ACCESS_TYPE_SHIFT |
-		ccs_copy_size;
-	*cs++ = lower_32_bits(src_ofs);
-	*cs++ = upper_32_bits(src_ofs) | mocs;
-	*cs++ = lower_32_bits(dst_ofs);
-	*cs++ = upper_32_bits(dst_ofs) | mocs;
+	dw[i++] = XY_CTRL_SURF_COPY_BLT |
+		  (src_is_indirect ? 0x0 : 0x1) << SRC_ACCESS_TYPE_SHIFT |
+		  (dst_is_indirect ? 0x0 : 0x1) << DST_ACCESS_TYPE_SHIFT |
+		  ccs_copy_size;
+	dw[i++] = lower_32_bits(src_ofs);
+	dw[i++] = upper_32_bits(src_ofs) | mocs;
+	dw[i++] = lower_32_bits(dst_ofs);
+	dw[i++] = upper_32_bits(dst_ofs) | mocs;
 
+	/*
+	 * The CCS copy command is a 5-dword sequence. If the vCPU halts during
+	 * save/restore while this sequence is being issued, partial writes may trigger
+	 * page faults when saving iGPU CCS metadata. Use the VMOVDQU instruction to
+	 * write the sequence atomically.
+	 */
+	emit_atomic(gt, cs, dw, sizeof(dw));
+	cs += EMIT_COPY_CCS_DW;
 	bb->len = cs - bb->cs;
 }
 
@@ -1006,18 +1067,27 @@ static u64 migrate_vm_ppgtt_addr_tlb_inval(void)
 	return (NUM_KERNEL_PDE - 2) * XE_PAGE_SIZE;
 }
 
-static int emit_flush_invalidate(u32 *dw, int i, u32 flags)
+/*
+ * The MI_FLUSH_DW command is a 4-dword sequence. If the vCPU halts during
+ * save/restore while this sequence is being issued, partial writes may
+ * trigger page faults when saving iGPU CCS metadata. Use
+ * emit_atomic() to write the sequence atomically.
+ */
+#define EMIT_FLUSH_INVALIDATE_DW 4
+static int emit_flush_invalidate(struct xe_exec_queue *q, u32 *cs, int i, u32 flags)
 {
 	u64 addr = migrate_vm_ppgtt_addr_tlb_inval();
+	u32 dw[EMIT_FLUSH_INVALIDATE_DW] = {MI_NOOP}, j = 0;
+
+	dw[j++] = MI_FLUSH_DW | MI_INVALIDATE_TLB | MI_FLUSH_DW_OP_STOREDW |
+		  MI_FLUSH_IMM_DW | flags;
+	dw[j++] = lower_32_bits(addr);
+	dw[j++] = upper_32_bits(addr);
+	dw[j++] = MI_NOOP;
 
-	dw[i++] = MI_FLUSH_DW | MI_INVALIDATE_TLB | MI_FLUSH_DW_OP_STOREDW |
-		  MI_FLUSH_IMM_DW | flags;
-	dw[i++] = lower_32_bits(addr);
-	dw[i++] = upper_32_bits(addr);
-	dw[i++] = MI_NOOP;
-	dw[i++] = MI_NOOP;
+	emit_atomic(q->gt, &cs[i], dw, sizeof(dw));
 
-	return i;
+	return i + j;
 }
 
 /**
@@ -1062,7 +1132,7 @@ int xe_migrate_ccs_rw_copy(struct xe_tile *tile, struct xe_exec_queue *q,
 	/* Calculate Batch buffer size */
 	batch_size = 0;
 	while (size) {
-		batch_size += 10; /* Flush + ggtt addr + 2 NOP */
+		batch_size += EMIT_FLUSH_INVALIDATE_DW * 2; /* Flush + ggtt addr + 1 NOP */
 		u64 ccs_ofs, ccs_size;
 		u32 ccs_pt;
 
@@ -1103,7 +1173,7 @@ int xe_migrate_ccs_rw_copy(struct xe_tile *tile, struct xe_exec_queue *q,
 	 * sizes here again before copy command is emitted.
 	 */
 	while (size) {
-		batch_size += 10; /* Flush + ggtt addr + 2 NOP */
+		batch_size += EMIT_FLUSH_INVALIDATE_DW * 2; /* Flush + ggtt addr + 1 NOP */
 		u32 flush_flags = 0;
 		u64 ccs_ofs, ccs_size;
 		u32 ccs_pt;
@@ -1126,11 +1196,11 @@ int xe_migrate_ccs_rw_copy(struct xe_tile *tile, struct xe_exec_queue *q,
 
 		emit_pte(m, bb, ccs_pt, false, false, &ccs_it, ccs_size, src);
 
-		bb->len = emit_flush_invalidate(bb->cs, bb->len, flush_flags);
+		bb->len = emit_flush_invalidate(q, bb->cs, bb->len, flush_flags);
 		flush_flags = xe_migrate_ccs_copy(m, bb, src_L0_ofs, src_is_pltt,
 						  src_L0_ofs, dst_is_pltt,
 						  src_L0, ccs_ofs, true);
-		bb->len = emit_flush_invalidate(bb->cs, bb->len, flush_flags);
+		bb->len = emit_flush_invalidate(q, bb->cs, bb->len, flush_flags);
 
 		size -= src_L0;
 	}
-- 
2.51.0

^ permalink raw reply related	[flat|nested] 21+ messages in thread
* Re: [PATCH v7 1/3] drm/xe/migrate: Atomicize CCS copy command setup
  2025-10-17 14:12 ` [PATCH v7 1/3] " Satyanarayana K V P
@ 2025-10-17 14:27   ` Ville Syrjälä
  2025-10-17 15:16     ` K V P, Satyanarayana
  2025-10-17 18:11   ` Ville Syrjälä
  1 sibling, 1 reply; 21+ messages in thread
From: Ville Syrjälä @ 2025-10-17 14:27 UTC (permalink / raw)
  To: Satyanarayana K V P
  Cc: intel-xe, Michal Wajdeczko, Matthew Brost, Matthew Auld,
	Rodrigo Vivi, Matt Roper

On Fri, Oct 17, 2025 at 07:42:28PM +0530, Satyanarayana K V P wrote:
> The CCS copy command is a 5-dword sequence. If the vCPU halts during
> save/restore while this sequence is being programmed, partial writes may
> trigger page faults when saving IGPU CCS metadata. Use the VMOVDQU
> instruction to write the sequence atomically.

If this whole thing is so racy why don't you always add a new
BB_END after new commands, and only replace the previous BB_END
with NOOP _after_ the new commands have been fully written?

> [snip]

-- 
Ville Syrjälä
Intel

^ permalink raw reply	[flat|nested] 21+ messages in thread
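Ville's suggestion above — append the new commands terminated by their own BB_END, and only then retire the old BB_END — can be sketched as a plain C simulation (made-up opcode values and helper name; real code would also need the writes to actually reach memory in this order, e.g. via a write barrier):

```c
#include <stdint.h>

#define MI_NOOP			0x00000000u
#define MI_BATCH_BUFFER_END	0x05000000u	/* illustrative encoding */

/* Append @n new command dwords after the current BB_END at bb[*end_idx],
 * terminate them with a fresh BB_END, and only then turn the old BB_END
 * into a NOOP. A consumer that samples the buffer mid-update either stops
 * at the old BB_END (sees none of the new commands) or runs through to
 * the new one (sees all of them) -- never a partial sequence.
 */
static void append_cmds(uint32_t *bb, uint32_t *end_idx,
			const uint32_t *cmds, uint32_t n)
{
	uint32_t old_end = *end_idx;
	uint32_t i;

	for (i = 0; i < n; i++)
		bb[old_end + 1 + i] = cmds[i];
	bb[old_end + 1 + n] = MI_BATCH_BUFFER_END;	/* new terminator */

	__atomic_thread_fence(__ATOMIC_RELEASE);	/* publish before retiring */
	bb[old_end] = MI_NOOP;				/* retire old terminator */
	*end_idx = old_end + 1 + n;
}
```

As the follow-up discussion notes, this trades the SIMD-store approach for an ordering-based one; it assumes the terminator really is the last thing overwritten.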
* Re: [PATCH v7 1/3] drm/xe/migrate: Atomicize CCS copy command setup
  2025-10-17 14:27   ` Ville Syrjälä
@ 2025-10-17 15:16     ` K V P, Satyanarayana
  2025-10-17 15:26       ` Ville Syrjälä
  0 siblings, 1 reply; 21+ messages in thread
From: K V P, Satyanarayana @ 2025-10-17 15:16 UTC (permalink / raw)
  To: Ville Syrjälä
  Cc: intel-xe, Michal Wajdeczko, Matthew Brost, Matthew Auld,
	Rodrigo Vivi, Matt Roper

On 17-10-2025 19:57, Ville Syrjälä wrote:
> On Fri, Oct 17, 2025 at 07:42:28PM +0530, Satyanarayana K V P wrote:
>> The CCS copy command is a 5-dword sequence. If the vCPU halts during
>> save/restore while this sequence is being programmed, partial writes may
>> trigger page faults when saving IGPU CCS metadata. Use the VMOVDQU
>> instruction to write the sequence atomically.
>
> If this whole thing is so racy why don't you always add a new
> BB_END after new commands, and only replace the previous BB_END
> with NOOP _after_ the new commands have been fully written?
>
We maintain a suballocator for batch buffer management, with size
proportional to system memory (e.g., a 16MB suballocator for 8GB SMEM).
Batch buffers are dynamically allocated from this pool based on the
number of active workloads. The entire suballocator region is submitted
to hardware for CCS metadata copy operations.

We cannot insert BB_END commands after each individual instruction
sequence because additional GPU instructions may be appended later.
Instead, a single BB_END marker is placed at the suballocator's end to
terminate execution.

This patch ensures race-condition-safe CCS metadata save/restore
operations by guaranteeing atomic writes to the batch buffer, preventing
corruption regardless of when save/restore operations are triggered.

-Satya.

>> [snip]

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: [PATCH v7 1/3] drm/xe/migrate: Atomicize CCS copy command setup
  2025-10-17 15:16     ` K V P, Satyanarayana
@ 2025-10-17 15:26       ` Ville Syrjälä
  2025-10-17 16:29         ` K V P, Satyanarayana
  0 siblings, 1 reply; 21+ messages in thread
From: Ville Syrjälä @ 2025-10-17 15:26 UTC (permalink / raw)
  To: K V P, Satyanarayana
  Cc: intel-xe, Michal Wajdeczko, Matthew Brost, Matthew Auld,
	Rodrigo Vivi, Matt Roper

On Fri, Oct 17, 2025 at 08:46:37PM +0530, K V P, Satyanarayana wrote:
> On 17-10-2025 19:57, Ville Syrjälä wrote:
> > On Fri, Oct 17, 2025 at 07:42:28PM +0530, Satyanarayana K V P wrote:
> >> The CCS copy command is a 5-dword sequence. If the vCPU halts during
> >> save/restore while this sequence is being programmed, partial writes may
> >> trigger page faults when saving IGPU CCS metadata. Use the VMOVDQU
> >> instruction to write the sequence atomically.
> >
> > If this whole thing is so racy why don't you always add a new
> > BB_END after new commands, and only replace the previous BB_END
> > with NOOP _after_ the new commands have been fully written?
> >
> We maintain a suballocator for batch buffer management, with size
> proportional to system memory (e.g., 16MB suballocator for 8GB SMEM).
> Batch buffers are dynamically allocated from this pool based on the
> number of active workloads. The entire suballocator region is submitted
> to hardware for CCS metadata copy operations.
>
> We cannot insert BB_END commands after each individual instruction
> sequence because additional GPU instructions may be appended later.

You *overwrite* the previous BB_END after the new commands have
been appended.

> Instead, a single BB_END marker is placed at the suballocator's end to
> terminate execution.
>
> This patch ensures race-condition-safe CCS metadata save/restore
> operations by guaranteeing atomic writes to the batch buffer, preventing
> corruption regardless of when save/restore operations are triggered.
>
> -Satya.
>
> >> [snip]
> >> + */ > >> + > >> +static void memcpy_vmovdqu(struct xe_device *xe, void *dst, const void *src, u32 size) > >> +{ > >> + xe_assert(xe, !IS_DGFX(xe)); > >> +#ifdef CONFIG_X86 > >> + kernel_fpu_begin(); > >> + if (size == SZ_128) { > >> + asm("vmovdqu (%0), %%xmm0\n" > >> + "vmovups %%xmm0, (%1)\n" > >> + :: "r" (src), "r" (dst) : "memory"); > >> + } else if (size == SZ_256) { > >> + asm("vmovdqu (%0), %%ymm0\n" > >> + "vmovups %%ymm0, (%1)\n" > >> + :: "r" (src), "r" (dst) : "memory"); > >> + } > >> + kernel_fpu_end(); > >> +#endif > >> +} > >> + > >> +static void emit_atomic(struct xe_gt *gt, void *dst, const void *src, u32 size) > >> +{ > >> + u32 instr_size = size * BITS_PER_BYTE; > >> + > >> + xe_gt_assert(gt, instr_size == SZ_128 || instr_size == SZ_256); > >> + > >> + if (IS_VF_CCS_READY(gt_to_xe(gt))) { > >> + xe_gt_assert(gt, static_cpu_has(X86_FEATURE_AVX)); > >> + memcpy_vmovdqu(gt_to_xe(gt), dst, src, instr_size); > >> + } else { > >> + memcpy(dst, src, size); > >> + } > >> +} > >> + > >> +#define EMIT_COPY_CCS_DW 8 > >> static void emit_copy_ccs(struct xe_gt *gt, struct xe_bb *bb, > >> u64 dst_ofs, bool dst_is_indirect, > >> u64 src_ofs, bool src_is_indirect, > >> u32 size) > >> { > >> + u32 dw[EMIT_COPY_CCS_DW] = {MI_NOOP}; > >> struct xe_device *xe = gt_to_xe(gt); > >> u32 *cs = bb->cs + bb->len; > >> u32 num_ccs_blks; > >> u32 num_pages; > >> u32 ccs_copy_size; > >> u32 mocs; > >> + u32 i = 0; > >> > >> if (GRAPHICS_VERx100(xe) >= 2000) { > >> num_pages = DIV_ROUND_UP(size, XE_PAGE_SIZE); > >> @@ -686,15 +739,23 @@ static void emit_copy_ccs(struct xe_gt *gt, struct xe_bb *bb, > >> mocs = FIELD_PREP(XY_CTRL_SURF_MOCS_MASK, gt->mocs.uc_index); > >> } > >> > >> - *cs++ = XY_CTRL_SURF_COPY_BLT | > >> - (src_is_indirect ? 0x0 : 0x1) << SRC_ACCESS_TYPE_SHIFT | > >> - (dst_is_indirect ? 
0x0 : 0x1) << DST_ACCESS_TYPE_SHIFT | > >> - ccs_copy_size; > >> - *cs++ = lower_32_bits(src_ofs); > >> - *cs++ = upper_32_bits(src_ofs) | mocs; > >> - *cs++ = lower_32_bits(dst_ofs); > >> - *cs++ = upper_32_bits(dst_ofs) | mocs; > >> + dw[i++] = XY_CTRL_SURF_COPY_BLT | > >> + (src_is_indirect ? 0x0 : 0x1) << SRC_ACCESS_TYPE_SHIFT | > >> + (dst_is_indirect ? 0x0 : 0x1) << DST_ACCESS_TYPE_SHIFT | > >> + ccs_copy_size; > >> + dw[i++] = lower_32_bits(src_ofs); > >> + dw[i++] = upper_32_bits(src_ofs) | mocs; > >> + dw[i++] = lower_32_bits(dst_ofs); > >> + dw[i++] = upper_32_bits(dst_ofs) | mocs; > >> > >> + /* > >> + * The CCS copy command is a 5-dword sequence. If the vCPU halts during > >> + * save/restore while this sequence is being issued, partial writes may trigger > >> + * page faults when saving iGPU CCS metadata. Use the VMOVDQU instruction to > >> + * write the sequence atomically. > >> + */ > >> + emit_atomic(gt, cs, dw, sizeof(dw)); > >> + cs += EMIT_COPY_CCS_DW; > >> bb->len = cs - bb->cs; > >> } > >> > >> @@ -1006,18 +1067,27 @@ static u64 migrate_vm_ppgtt_addr_tlb_inval(void) > >> return (NUM_KERNEL_PDE - 2) * XE_PAGE_SIZE; > >> } > >> > >> -static int emit_flush_invalidate(u32 *dw, int i, u32 flags) > >> +/* > >> + * The MI_FLUSH_DW command is a 4-dword sequence. If the vCPU halts during > >> + * save/restore while this sequence is being issued, partial writes may > >> + * trigger page faults when saving iGPU CCS metadata. Use > >> + * emit_atomic() to write the sequence atomically. 
> >> + */ > >> +#define EMIT_FLUSH_INVALIDATE_DW 4 > >> +static int emit_flush_invalidate(struct xe_exec_queue *q, u32 *cs, int i, u32 flags) > >> { > >> u64 addr = migrate_vm_ppgtt_addr_tlb_inval(); > >> + u32 dw[EMIT_FLUSH_INVALIDATE_DW] = {MI_NOOP}, j = 0; > >> + > >> + dw[j++] = MI_FLUSH_DW | MI_INVALIDATE_TLB | MI_FLUSH_DW_OP_STOREDW | > >> + MI_FLUSH_IMM_DW | flags; > >> + dw[j++] = lower_32_bits(addr); > >> + dw[j++] = upper_32_bits(addr); > >> + dw[j++] = MI_NOOP; > >> > >> - dw[i++] = MI_FLUSH_DW | MI_INVALIDATE_TLB | MI_FLUSH_DW_OP_STOREDW | > >> - MI_FLUSH_IMM_DW | flags; > >> - dw[i++] = lower_32_bits(addr); > >> - dw[i++] = upper_32_bits(addr); > >> - dw[i++] = MI_NOOP; > >> - dw[i++] = MI_NOOP; > >> + emit_atomic(q->gt, &cs[i], dw, sizeof(dw)); > >> > >> - return i; > >> + return i + j; > >> } > >> > >> /** > >> @@ -1062,7 +1132,7 @@ int xe_migrate_ccs_rw_copy(struct xe_tile *tile, struct xe_exec_queue *q, > >> /* Calculate Batch buffer size */ > >> batch_size = 0; > >> while (size) { > >> - batch_size += 10; /* Flush + ggtt addr + 2 NOP */ > >> + batch_size += EMIT_FLUSH_INVALIDATE_DW * 2; /* Flush + ggtt addr + 1 NOP */ > >> u64 ccs_ofs, ccs_size; > >> u32 ccs_pt; > >> > >> @@ -1103,7 +1173,7 @@ int xe_migrate_ccs_rw_copy(struct xe_tile *tile, struct xe_exec_queue *q, > >> * sizes here again before copy command is emitted. 
> >> */ > >> while (size) { > >> - batch_size += 10; /* Flush + ggtt addr + 2 NOP */ > >> + batch_size += EMIT_FLUSH_INVALIDATE_DW * 2; /* Flush + ggtt addr + 1 NOP */ > >> u32 flush_flags = 0; > >> u64 ccs_ofs, ccs_size; > >> u32 ccs_pt; > >> @@ -1126,11 +1196,11 @@ int xe_migrate_ccs_rw_copy(struct xe_tile *tile, struct xe_exec_queue *q, > >> > >> emit_pte(m, bb, ccs_pt, false, false, &ccs_it, ccs_size, src); > >> > >> - bb->len = emit_flush_invalidate(bb->cs, bb->len, flush_flags); > >> + bb->len = emit_flush_invalidate(q, bb->cs, bb->len, flush_flags); > >> flush_flags = xe_migrate_ccs_copy(m, bb, src_L0_ofs, src_is_pltt, > >> src_L0_ofs, dst_is_pltt, > >> src_L0, ccs_ofs, true); > >> - bb->len = emit_flush_invalidate(bb->cs, bb->len, flush_flags); > >> + bb->len = emit_flush_invalidate(q, bb->cs, bb->len, flush_flags); > >> > >> size -= src_L0; > >> } > >> -- > >> 2.51.0 > > -- Ville Syrjälä Intel ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v7 1/3] drm/xe/migrate: Atomicize CCS copy command setup 2025-10-17 15:26 ` Ville Syrjälä @ 2025-10-17 16:29 ` K V P, Satyanarayana 2025-10-17 16:41 ` Rodrigo Vivi 2025-10-17 16:51 ` Ville Syrjälä 0 siblings, 2 replies; 21+ messages in thread From: K V P, Satyanarayana @ 2025-10-17 16:29 UTC (permalink / raw) To: Ville Syrjälä Cc: intel-xe, Michal Wajdeczko, Matthew Brost, Matthew Auld, Rodrigo Vivi, Matt Roper On 17-10-2025 20:56, Ville Syrjälä wrote: > On Fri, Oct 17, 2025 at 08:46:37PM +0530, K V P, Satyanarayana wrote: >> >> >> On 17-10-2025 19:57, Ville Syrjälä wrote: >>> On Fri, Oct 17, 2025 at 07:42:28PM +0530, Satyanarayana K V P wrote: >>>> The CCS copy command is a 5-dword sequence. If the vCPU halts during >>>> save/restore while this sequence is being programmed, partial writes may >>>> trigger page faults when saving IGPU CCS metadata. Use the VMOVDQU >>>> instruction to write the sequence atomically. >>> >>> If this whole thing is so racy why don't you always add a new >>> BB_END after new commands, and only replace the previous BB_END >>> with NOOP _after_ the new commands have been fully written? >>> >> We maintain a suballocator for batch buffer management, with size >> proportional to system memory (e.g., 16MB suballocator for 8GB SMEM). >> Batch buffers are dynamically allocated from this pool based on the >> number of active workloads. The entire suballocator region is submitted >> to hardware for CCS metadata copy operations. >> >> We cannot insert BB_END commands after each individual instruction >> sequence because additional GPU instructions may be appended later. > > You *overwrite* the previous BB_END after the new commands have been > appended. We do not know where the new BB allocation will be. It may not be sequential and every BO has a BB. BBs are allocated and freed so often based on BOs getting created and destroyed. So, we can't use that approach. 
-Satya.> >> Instead, a single BB_END marker is placed at the suballocator's end to >> terminate execution. >> >> This patch ensures race-condition-safe CCS metadata save/restore >> operations by guaranteeing atomic writes to the batch buffer, preventing >> corruption regardless of when save/restore operations are triggered. >> >> -Satya.>> >>>> Since VMOVDQU operates on 256-bit chunks, update EMIT_COPY_CCS_DW to emit >>>> 8 dwords instead of 5 dwords. >>>> >>>> Update emit_flush_invalidate() to use VMOVDQU operating with 128-bit >>>> chunks. >>>> >>>> Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> >>>> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> >>>> Cc: Matthew Brost <matthew.brost@intel.com> >>>> Cc: Matthew Auld <matthew.auld@intel.com> >>>> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> >>>> Cc: Matt Roper <matthew.d.roper@intel.com> >>>> >>>> --- >>>> V6 -> V7: >>>> - Added description explaining why to use assembly instructions for >>>> atomicity. >>>> - Assert if DGFX tries to use memcpy_vmovdqu(). (Rodrigo) >>>> - Include <asm/cpufeature.h> though checkpatch complains. With >>>> <linux/cpufeature.h> KUnit is throwing errors. >>>> >>>> V5 -> V6: >>>> - Fixed review comments (Rodrigo) >>>> >>>> V4 -> V5: >>>> - Fixed review comments. (Matt B) >>>> >>>> V3 -> V4: >>>> - Fixed review comments. (Wajdeczko) >>>> - Fix issues reported by patchworks. >>>> >>>> V2 -> V3: >>>> - Added support for 128 bit and 256 bit instructions with memcpy_vmovdqu >>>> - Updated emit_flush_invalidate() to use vmovdqu instruction. >>>> >>>> V1 -> V2: >>>> - Use memcpy_vmovdqu only for x86 arch and for VF. Else use memcpy >>>> (Auld, Matthew) >>>> - Fix issues reported by patchworks. 
>>>> --- >>>> drivers/gpu/drm/xe/xe_migrate.c | 112 ++++++++++++++++++++++++++------ >>>> 1 file changed, 91 insertions(+), 21 deletions(-) >>>> >>>> diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c >>>> index 3112c966c67d..e0be7396a0ab 100644 >>>> --- a/drivers/gpu/drm/xe/xe_migrate.c >>>> +++ b/drivers/gpu/drm/xe/xe_migrate.c >>>> @@ -5,6 +5,8 @@ >>>> >>>> #include "xe_migrate.h" >>>> >>>> +#include <asm/fpu/api.h> >>>> +#include <asm/cpufeature.h> >>>> #include <linux/bitfield.h> >>>> #include <linux/sizes.h> >>>> >>>> @@ -33,6 +35,7 @@ >>>> #include "xe_res_cursor.h" >>>> #include "xe_sa.h" >>>> #include "xe_sched_job.h" >>>> +#include "xe_sriov_vf_ccs.h" >>>> #include "xe_sync.h" >>>> #include "xe_trace_bo.h" >>>> #include "xe_validation.h" >>>> @@ -657,18 +660,68 @@ static void emit_pte(struct xe_migrate *m, >>>> } >>>> } >>>> >>>> -#define EMIT_COPY_CCS_DW 5 >>>> +/* >>>> + * VF KMD registers two specialized LRCs with the GuC to handle save/restore >>>> + * operations for CCS metadata on IGPU. The GuC executes these LRCAs during >>>> + * VF state/restore operations. >>>> + * >>>> + * Each LRC contains a batch buffer pool that GuC submits to hardware during >>>> + * VF state save/restore operations. Since these operations can occur >>>> + * asynchronously at any time, we must ensure GPU instructions in the batch >>>> + * buffer are written atomically to prevent corruption from incomplete writes. >>>> + * >>>> + * To guarantee atomic instruction writes, we use x86 SIMD instructions >>>> + * (128-bit XMM and 256-bit YMM) within kernel_fpu_begin()/kernel_fpu_end() >>>> + * sections. This prevents vCPU preemption during instruction generation, >>>> + * ensuring complete GPU commands are written to the batch buffer. 
>>>> + */ >>>> + >>>> +static void memcpy_vmovdqu(struct xe_device *xe, void *dst, const void *src, u32 size) >>>> +{ >>>> + xe_assert(xe, !IS_DGFX(xe)); >>>> +#ifdef CONFIG_X86 >>>> + kernel_fpu_begin(); >>>> + if (size == SZ_128) { >>>> + asm("vmovdqu (%0), %%xmm0\n" >>>> + "vmovups %%xmm0, (%1)\n" >>>> + :: "r" (src), "r" (dst) : "memory"); >>>> + } else if (size == SZ_256) { >>>> + asm("vmovdqu (%0), %%ymm0\n" >>>> + "vmovups %%ymm0, (%1)\n" >>>> + :: "r" (src), "r" (dst) : "memory"); >>>> + } >>>> + kernel_fpu_end(); >>>> +#endif >>>> +} >>>> + >>>> +static void emit_atomic(struct xe_gt *gt, void *dst, const void *src, u32 size) >>>> +{ >>>> + u32 instr_size = size * BITS_PER_BYTE; >>>> + >>>> + xe_gt_assert(gt, instr_size == SZ_128 || instr_size == SZ_256); >>>> + >>>> + if (IS_VF_CCS_READY(gt_to_xe(gt))) { >>>> + xe_gt_assert(gt, static_cpu_has(X86_FEATURE_AVX)); >>>> + memcpy_vmovdqu(gt_to_xe(gt), dst, src, instr_size); >>>> + } else { >>>> + memcpy(dst, src, size); >>>> + } >>>> +} >>>> + >>>> +#define EMIT_COPY_CCS_DW 8 >>>> static void emit_copy_ccs(struct xe_gt *gt, struct xe_bb *bb, >>>> u64 dst_ofs, bool dst_is_indirect, >>>> u64 src_ofs, bool src_is_indirect, >>>> u32 size) >>>> { >>>> + u32 dw[EMIT_COPY_CCS_DW] = {MI_NOOP}; >>>> struct xe_device *xe = gt_to_xe(gt); >>>> u32 *cs = bb->cs + bb->len; >>>> u32 num_ccs_blks; >>>> u32 num_pages; >>>> u32 ccs_copy_size; >>>> u32 mocs; >>>> + u32 i = 0; >>>> >>>> if (GRAPHICS_VERx100(xe) >= 2000) { >>>> num_pages = DIV_ROUND_UP(size, XE_PAGE_SIZE); >>>> @@ -686,15 +739,23 @@ static void emit_copy_ccs(struct xe_gt *gt, struct xe_bb *bb, >>>> mocs = FIELD_PREP(XY_CTRL_SURF_MOCS_MASK, gt->mocs.uc_index); >>>> } >>>> >>>> - *cs++ = XY_CTRL_SURF_COPY_BLT | >>>> - (src_is_indirect ? 0x0 : 0x1) << SRC_ACCESS_TYPE_SHIFT | >>>> - (dst_is_indirect ? 
0x0 : 0x1) << DST_ACCESS_TYPE_SHIFT | >>>> - ccs_copy_size; >>>> - *cs++ = lower_32_bits(src_ofs); >>>> - *cs++ = upper_32_bits(src_ofs) | mocs; >>>> - *cs++ = lower_32_bits(dst_ofs); >>>> - *cs++ = upper_32_bits(dst_ofs) | mocs; >>>> + dw[i++] = XY_CTRL_SURF_COPY_BLT | >>>> + (src_is_indirect ? 0x0 : 0x1) << SRC_ACCESS_TYPE_SHIFT | >>>> + (dst_is_indirect ? 0x0 : 0x1) << DST_ACCESS_TYPE_SHIFT | >>>> + ccs_copy_size; >>>> + dw[i++] = lower_32_bits(src_ofs); >>>> + dw[i++] = upper_32_bits(src_ofs) | mocs; >>>> + dw[i++] = lower_32_bits(dst_ofs); >>>> + dw[i++] = upper_32_bits(dst_ofs) | mocs; >>>> >>>> + /* >>>> + * The CCS copy command is a 5-dword sequence. If the vCPU halts during >>>> + * save/restore while this sequence is being issued, partial writes may trigger >>>> + * page faults when saving iGPU CCS metadata. Use the VMOVDQU instruction to >>>> + * write the sequence atomically. >>>> + */ >>>> + emit_atomic(gt, cs, dw, sizeof(dw)); >>>> + cs += EMIT_COPY_CCS_DW; >>>> bb->len = cs - bb->cs; >>>> } >>>> >>>> @@ -1006,18 +1067,27 @@ static u64 migrate_vm_ppgtt_addr_tlb_inval(void) >>>> return (NUM_KERNEL_PDE - 2) * XE_PAGE_SIZE; >>>> } >>>> >>>> -static int emit_flush_invalidate(u32 *dw, int i, u32 flags) >>>> +/* >>>> + * The MI_FLUSH_DW command is a 4-dword sequence. If the vCPU halts during >>>> + * save/restore while this sequence is being issued, partial writes may >>>> + * trigger page faults when saving iGPU CCS metadata. Use >>>> + * emit_atomic() to write the sequence atomically. 
>>>> + */ >>>> +#define EMIT_FLUSH_INVALIDATE_DW 4 >>>> +static int emit_flush_invalidate(struct xe_exec_queue *q, u32 *cs, int i, u32 flags) >>>> { >>>> u64 addr = migrate_vm_ppgtt_addr_tlb_inval(); >>>> + u32 dw[EMIT_FLUSH_INVALIDATE_DW] = {MI_NOOP}, j = 0; >>>> + >>>> + dw[j++] = MI_FLUSH_DW | MI_INVALIDATE_TLB | MI_FLUSH_DW_OP_STOREDW | >>>> + MI_FLUSH_IMM_DW | flags; >>>> + dw[j++] = lower_32_bits(addr); >>>> + dw[j++] = upper_32_bits(addr); >>>> + dw[j++] = MI_NOOP; >>>> >>>> - dw[i++] = MI_FLUSH_DW | MI_INVALIDATE_TLB | MI_FLUSH_DW_OP_STOREDW | >>>> - MI_FLUSH_IMM_DW | flags; >>>> - dw[i++] = lower_32_bits(addr); >>>> - dw[i++] = upper_32_bits(addr); >>>> - dw[i++] = MI_NOOP; >>>> - dw[i++] = MI_NOOP; >>>> + emit_atomic(q->gt, &cs[i], dw, sizeof(dw)); >>>> >>>> - return i; >>>> + return i + j; >>>> } >>>> >>>> /** >>>> @@ -1062,7 +1132,7 @@ int xe_migrate_ccs_rw_copy(struct xe_tile *tile, struct xe_exec_queue *q, >>>> /* Calculate Batch buffer size */ >>>> batch_size = 0; >>>> while (size) { >>>> - batch_size += 10; /* Flush + ggtt addr + 2 NOP */ >>>> + batch_size += EMIT_FLUSH_INVALIDATE_DW * 2; /* Flush + ggtt addr + 1 NOP */ >>>> u64 ccs_ofs, ccs_size; >>>> u32 ccs_pt; >>>> >>>> @@ -1103,7 +1173,7 @@ int xe_migrate_ccs_rw_copy(struct xe_tile *tile, struct xe_exec_queue *q, >>>> * sizes here again before copy command is emitted. 
>>>> */ >>>> while (size) { >>>> - batch_size += 10; /* Flush + ggtt addr + 2 NOP */ >>>> + batch_size += EMIT_FLUSH_INVALIDATE_DW * 2; /* Flush + ggtt addr + 1 NOP */ >>>> u32 flush_flags = 0; >>>> u64 ccs_ofs, ccs_size; >>>> u32 ccs_pt; >>>> @@ -1126,11 +1196,11 @@ int xe_migrate_ccs_rw_copy(struct xe_tile *tile, struct xe_exec_queue *q, >>>> >>>> emit_pte(m, bb, ccs_pt, false, false, &ccs_it, ccs_size, src); >>>> >>>> - bb->len = emit_flush_invalidate(bb->cs, bb->len, flush_flags); >>>> + bb->len = emit_flush_invalidate(q, bb->cs, bb->len, flush_flags); >>>> flush_flags = xe_migrate_ccs_copy(m, bb, src_L0_ofs, src_is_pltt, >>>> src_L0_ofs, dst_is_pltt, >>>> src_L0, ccs_ofs, true); >>>> - bb->len = emit_flush_invalidate(bb->cs, bb->len, flush_flags); >>>> + bb->len = emit_flush_invalidate(q, bb->cs, bb->len, flush_flags); >>>> >>>> size -= src_L0; >>>> } >>>> -- >>>> 2.51.0 >>> > ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v7 1/3] drm/xe/migrate: Atomicize CCS copy command setup 2025-10-17 16:29 ` K V P, Satyanarayana @ 2025-10-17 16:41 ` Rodrigo Vivi 2025-10-17 16:51 ` Ville Syrjälä 1 sibling, 0 replies; 21+ messages in thread From: Rodrigo Vivi @ 2025-10-17 16:41 UTC (permalink / raw) To: K V P, Satyanarayana Cc: Ville Syrjälä, intel-xe, Michal Wajdeczko, Matthew Brost, Matthew Auld, Matt Roper On Fri, Oct 17, 2025 at 09:59:48PM +0530, K V P, Satyanarayana wrote: > > > On 17-10-2025 20:56, Ville Syrjälä wrote: > > On Fri, Oct 17, 2025 at 08:46:37PM +0530, K V P, Satyanarayana wrote: > > > > > > > > > On 17-10-2025 19:57, Ville Syrjälä wrote: > > > > On Fri, Oct 17, 2025 at 07:42:28PM +0530, Satyanarayana K V P wrote: > > > > > The CCS copy command is a 5-dword sequence. If the vCPU halts during > > > > > save/restore while this sequence is being programmed, partial writes may > > > > > trigger page faults when saving IGPU CCS metadata. Use the VMOVDQU > > > > > instruction to write the sequence atomically. > > > > > > > > If this whole thing is so racy why don't you always add a new > > > > BB_END after new commands, and only replace the previous BB_END > > > > with NOOP _after_ the new commands have been fully written? > > > > > > > We maintain a suballocator for batch buffer management, with size > > > proportional to system memory (e.g., 16MB suballocator for 8GB SMEM). > > > Batch buffers are dynamically allocated from this pool based on the > > > number of active workloads. The entire suballocator region is submitted > > > to hardware for CCS metadata copy operations. > > > > > > We cannot insert BB_END commands after each individual instruction > > > sequence because additional GPU instructions may be appended later. > > > > You *overwrite* the previous BB_END after the new commands have been > > appended. > We do not know where the new BB allocation will be. It may not be sequential > and every BO has a BB. 
BBs are allocated and freed so often based on BOs > getting created and destroyed. So, we can't use that approach. Satya, the thing is that Ville's question here proves that this commit message and comment are still not good enough. Ville, the thing is that this buffer here needs to be written entirely to the memory. The execution of this buffer will start right after the VM-pause-stop. You cannot stop the VM while you are writing this BB. Adding BB_END might possibly ensure it doesn't hang, but it doesn't ensure that this buffer is entirely executed. But I believe that even the write of BB_END may perhaps be cut in the middle here. The only way to block the vm-pause while you write the buffer is with this AVX command. So, the asm here seems to be the safest way. > > -Satya.> > > > Instead, a single BB_END marker is placed at the suballocator's end to > > > terminate execution. > > > > > > This patch ensures race-condition-safe CCS metadata save/restore > > > operations by guaranteeing atomic writes to the batch buffer, preventing > > > corruption regardless of when save/restore operations are triggered. > > > > > > -Satya.>> > > > > > Since VMOVDQU operates on 256-bit chunks, update EMIT_COPY_CCS_DW to emit > > > > > 8 dwords instead of 5 dwords. > > > > > > > > > > Update emit_flush_invalidate() to use VMOVDQU operating with 128-bit > > > > > chunks. > > > > > > > > > > Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> > > > > > Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> > > > > > Cc: Matthew Brost <matthew.brost@intel.com> > > > > > Cc: Matthew Auld <matthew.auld@intel.com> > > > > > Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> > > > > > Cc: Matt Roper <matthew.d.roper@intel.com> > > > > > > > > > > --- > > > > > V6 -> V7: > > > > > - Added description explaining why to use assembly instructions for > > > > > atomicity. > > > > > - Assert if DGFX tries to use memcpy_vmovdqu(). 
(Rodrigo) > > > > > - Include <asm/cpufeature.h> though checkpatch complains. With > > > > > <linux/cpufeature.h> KUnit is throwing errors. > > > > > > > > > > V5 -> V6: > > > > > - Fixed review comments (Rodrigo) > > > > > > > > > > V4 -> V5: > > > > > - Fixed review comments. (Matt B) > > > > > > > > > > V3 -> V4: > > > > > - Fixed review comments. (Wajdeczko) > > > > > - Fix issues reported by patchworks. > > > > > > > > > > V2 -> V3: > > > > > - Added support for 128 bit and 256 bit instructions with memcpy_vmovdqu > > > > > - Updated emit_flush_invalidate() to use vmovdqu instruction. > > > > > > > > > > V1 -> V2: > > > > > - Use memcpy_vmovdqu only for x86 arch and for VF. Else use memcpy > > > > > (Auld, Matthew) > > > > > - Fix issues reported by patchworks. > > > > > --- > > > > > drivers/gpu/drm/xe/xe_migrate.c | 112 ++++++++++++++++++++++++++------ > > > > > 1 file changed, 91 insertions(+), 21 deletions(-) > > > > > > > > > > diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c > > > > > index 3112c966c67d..e0be7396a0ab 100644 > > > > > --- a/drivers/gpu/drm/xe/xe_migrate.c > > > > > +++ b/drivers/gpu/drm/xe/xe_migrate.c > > > > > @@ -5,6 +5,8 @@ > > > > > #include "xe_migrate.h" > > > > > +#include <asm/fpu/api.h> > > > > > +#include <asm/cpufeature.h> > > > > > #include <linux/bitfield.h> > > > > > #include <linux/sizes.h> > > > > > @@ -33,6 +35,7 @@ > > > > > #include "xe_res_cursor.h" > > > > > #include "xe_sa.h" > > > > > #include "xe_sched_job.h" > > > > > +#include "xe_sriov_vf_ccs.h" > > > > > #include "xe_sync.h" > > > > > #include "xe_trace_bo.h" > > > > > #include "xe_validation.h" > > > > > @@ -657,18 +660,68 @@ static void emit_pte(struct xe_migrate *m, > > > > > } > > > > > } > > > > > -#define EMIT_COPY_CCS_DW 5 > > > > > +/* > > > > > + * VF KMD registers two specialized LRCs with the GuC to handle save/restore > > > > > + * operations for CCS metadata on IGPU. 
The GuC executes these LRCAs during > > > > > + * VF state/restore operations. > > > > > + * > > > > > + * Each LRC contains a batch buffer pool that GuC submits to hardware during > > > > > + * VF state save/restore operations. Since these operations can occur > > > > > + * asynchronously at any time, we must ensure GPU instructions in the batch > > > > > + * buffer are written atomically to prevent corruption from incomplete writes. > > > > > + * > > > > > + * To guarantee atomic instruction writes, we use x86 SIMD instructions > > > > > + * (128-bit XMM and 256-bit YMM) within kernel_fpu_begin()/kernel_fpu_end() > > > > > + * sections. This prevents vCPU preemption during instruction generation, > > > > > + * ensuring complete GPU commands are written to the batch buffer. > > > > > + */ > > > > > + > > > > > +static void memcpy_vmovdqu(struct xe_device *xe, void *dst, const void *src, u32 size) > > > > > +{ > > > > > + xe_assert(xe, !IS_DGFX(xe)); > > > > > +#ifdef CONFIG_X86 > > > > > + kernel_fpu_begin(); > > > > > + if (size == SZ_128) { > > > > > + asm("vmovdqu (%0), %%xmm0\n" > > > > > + "vmovups %%xmm0, (%1)\n" > > > > > + :: "r" (src), "r" (dst) : "memory"); > > > > > + } else if (size == SZ_256) { > > > > > + asm("vmovdqu (%0), %%ymm0\n" > > > > > + "vmovups %%ymm0, (%1)\n" > > > > > + :: "r" (src), "r" (dst) : "memory"); > > > > > + } > > > > > + kernel_fpu_end(); > > > > > +#endif > > > > > +} > > > > > + > > > > > +static void emit_atomic(struct xe_gt *gt, void *dst, const void *src, u32 size) > > > > > +{ > > > > > + u32 instr_size = size * BITS_PER_BYTE; > > > > > + > > > > > + xe_gt_assert(gt, instr_size == SZ_128 || instr_size == SZ_256); > > > > > + > > > > > + if (IS_VF_CCS_READY(gt_to_xe(gt))) { > > > > > + xe_gt_assert(gt, static_cpu_has(X86_FEATURE_AVX)); > > > > > + memcpy_vmovdqu(gt_to_xe(gt), dst, src, instr_size); > > > > > + } else { > > > > > + memcpy(dst, src, size); > > > > > + } > > > > > +} > > > > > + > > > > > +#define 
EMIT_COPY_CCS_DW 8 > > > > > static void emit_copy_ccs(struct xe_gt *gt, struct xe_bb *bb, > > > > > u64 dst_ofs, bool dst_is_indirect, > > > > > u64 src_ofs, bool src_is_indirect, > > > > > u32 size) > > > > > { > > > > > + u32 dw[EMIT_COPY_CCS_DW] = {MI_NOOP}; > > > > > struct xe_device *xe = gt_to_xe(gt); > > > > > u32 *cs = bb->cs + bb->len; > > > > > u32 num_ccs_blks; > > > > > u32 num_pages; > > > > > u32 ccs_copy_size; > > > > > u32 mocs; > > > > > + u32 i = 0; > > > > > if (GRAPHICS_VERx100(xe) >= 2000) { > > > > > num_pages = DIV_ROUND_UP(size, XE_PAGE_SIZE); > > > > > @@ -686,15 +739,23 @@ static void emit_copy_ccs(struct xe_gt *gt, struct xe_bb *bb, > > > > > mocs = FIELD_PREP(XY_CTRL_SURF_MOCS_MASK, gt->mocs.uc_index); > > > > > } > > > > > - *cs++ = XY_CTRL_SURF_COPY_BLT | > > > > > - (src_is_indirect ? 0x0 : 0x1) << SRC_ACCESS_TYPE_SHIFT | > > > > > - (dst_is_indirect ? 0x0 : 0x1) << DST_ACCESS_TYPE_SHIFT | > > > > > - ccs_copy_size; > > > > > - *cs++ = lower_32_bits(src_ofs); > > > > > - *cs++ = upper_32_bits(src_ofs) | mocs; > > > > > - *cs++ = lower_32_bits(dst_ofs); > > > > > - *cs++ = upper_32_bits(dst_ofs) | mocs; > > > > > + dw[i++] = XY_CTRL_SURF_COPY_BLT | > > > > > + (src_is_indirect ? 0x0 : 0x1) << SRC_ACCESS_TYPE_SHIFT | > > > > > + (dst_is_indirect ? 0x0 : 0x1) << DST_ACCESS_TYPE_SHIFT | > > > > > + ccs_copy_size; > > > > > + dw[i++] = lower_32_bits(src_ofs); > > > > > + dw[i++] = upper_32_bits(src_ofs) | mocs; > > > > > + dw[i++] = lower_32_bits(dst_ofs); > > > > > + dw[i++] = upper_32_bits(dst_ofs) | mocs; > > > > > + /* > > > > > + * The CCS copy command is a 5-dword sequence. If the vCPU halts during > > > > > + * save/restore while this sequence is being issued, partial writes may trigger > > > > > + * page faults when saving iGPU CCS metadata. Use the VMOVDQU instruction to > > > > > + * write the sequence atomically. 
> > > > > + */ > > > > > + emit_atomic(gt, cs, dw, sizeof(dw)); > > > > > + cs += EMIT_COPY_CCS_DW; > > > > > bb->len = cs - bb->cs; > > > > > } > > > > > @@ -1006,18 +1067,27 @@ static u64 migrate_vm_ppgtt_addr_tlb_inval(void) > > > > > return (NUM_KERNEL_PDE - 2) * XE_PAGE_SIZE; > > > > > } > > > > > -static int emit_flush_invalidate(u32 *dw, int i, u32 flags) > > > > > +/* > > > > > + * The MI_FLUSH_DW command is a 4-dword sequence. If the vCPU halts during > > > > > + * save/restore while this sequence is being issued, partial writes may > > > > > + * trigger page faults when saving iGPU CCS metadata. Use > > > > > + * emit_atomic() to write the sequence atomically. > > > > > + */ > > > > > +#define EMIT_FLUSH_INVALIDATE_DW 4 > > > > > +static int emit_flush_invalidate(struct xe_exec_queue *q, u32 *cs, int i, u32 flags) > > > > > { > > > > > u64 addr = migrate_vm_ppgtt_addr_tlb_inval(); > > > > > + u32 dw[EMIT_FLUSH_INVALIDATE_DW] = {MI_NOOP}, j = 0; > > > > > + > > > > > + dw[j++] = MI_FLUSH_DW | MI_INVALIDATE_TLB | MI_FLUSH_DW_OP_STOREDW | > > > > > + MI_FLUSH_IMM_DW | flags; > > > > > + dw[j++] = lower_32_bits(addr); > > > > > + dw[j++] = upper_32_bits(addr); > > > > > + dw[j++] = MI_NOOP; > > > > > - dw[i++] = MI_FLUSH_DW | MI_INVALIDATE_TLB | MI_FLUSH_DW_OP_STOREDW | > > > > > - MI_FLUSH_IMM_DW | flags; > > > > > - dw[i++] = lower_32_bits(addr); > > > > > - dw[i++] = upper_32_bits(addr); > > > > > - dw[i++] = MI_NOOP; > > > > > - dw[i++] = MI_NOOP; > > > > > + emit_atomic(q->gt, &cs[i], dw, sizeof(dw)); > > > > > - return i; > > > > > + return i + j; > > > > > } > > > > > /** > > > > > @@ -1062,7 +1132,7 @@ int xe_migrate_ccs_rw_copy(struct xe_tile *tile, struct xe_exec_queue *q, > > > > > /* Calculate Batch buffer size */ > > > > > batch_size = 0; > > > > > while (size) { > > > > > - batch_size += 10; /* Flush + ggtt addr + 2 NOP */ > > > > > + batch_size += EMIT_FLUSH_INVALIDATE_DW * 2; /* Flush + ggtt addr + 1 NOP */ > > > > > u64 ccs_ofs, ccs_size; > > 
> > > u32 ccs_pt; > > > > > @@ -1103,7 +1173,7 @@ int xe_migrate_ccs_rw_copy(struct xe_tile *tile, struct xe_exec_queue *q, > > > > > * sizes here again before copy command is emitted. > > > > > */ > > > > > while (size) { > > > > > - batch_size += 10; /* Flush + ggtt addr + 2 NOP */ > > > > > + batch_size += EMIT_FLUSH_INVALIDATE_DW * 2; /* Flush + ggtt addr + 1 NOP */ > > > > > u32 flush_flags = 0; > > > > > u64 ccs_ofs, ccs_size; > > > > > u32 ccs_pt; > > > > > @@ -1126,11 +1196,11 @@ int xe_migrate_ccs_rw_copy(struct xe_tile *tile, struct xe_exec_queue *q, > > > > > emit_pte(m, bb, ccs_pt, false, false, &ccs_it, ccs_size, src); > > > > > - bb->len = emit_flush_invalidate(bb->cs, bb->len, flush_flags); > > > > > + bb->len = emit_flush_invalidate(q, bb->cs, bb->len, flush_flags); > > > > > flush_flags = xe_migrate_ccs_copy(m, bb, src_L0_ofs, src_is_pltt, > > > > > src_L0_ofs, dst_is_pltt, > > > > > src_L0, ccs_ofs, true); > > > > > - bb->len = emit_flush_invalidate(bb->cs, bb->len, flush_flags); > > > > > + bb->len = emit_flush_invalidate(q, bb->cs, bb->len, flush_flags); > > > > > size -= src_L0; > > > > > } > > > > > -- > > > > > 2.51.0 > > > > > > > ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v7 1/3] drm/xe/migrate: Atomicize CCS copy command setup 2025-10-17 16:29 ` K V P, Satyanarayana 2025-10-17 16:41 ` Rodrigo Vivi @ 2025-10-17 16:51 ` Ville Syrjälä 2025-10-17 18:21 ` Rodrigo Vivi 1 sibling, 1 reply; 21+ messages in thread From: Ville Syrjälä @ 2025-10-17 16:51 UTC (permalink / raw) To: K V P, Satyanarayana Cc: intel-xe, Michal Wajdeczko, Matthew Brost, Matthew Auld, Rodrigo Vivi, Matt Roper On Fri, Oct 17, 2025 at 09:59:48PM +0530, K V P, Satyanarayana wrote: > > > On 17-10-2025 20:56, Ville Syrjälä wrote: > > On Fri, Oct 17, 2025 at 08:46:37PM +0530, K V P, Satyanarayana wrote: > >> > >> > >> On 17-10-2025 19:57, Ville Syrjälä wrote: > >>> On Fri, Oct 17, 2025 at 07:42:28PM +0530, Satyanarayana K V P wrote: > >>>> The CCS copy command is a 5-dword sequence. If the vCPU halts during > >>>> save/restore while this sequence is being programmed, partial writes may > >>>> trigger page faults when saving IGPU CCS metadata. Use the VMOVDQU > >>>> instruction to write the sequence atomically. > >>> > >>> If this whole thing is so racy why don't you always add a new > >>> BB_END after new commands, and only replace the previous BB_END > >>> with NOOP _after_ the new commands have been fully written? > >>> > >> We maintain a suballocator for batch buffer management, with size > >> proportional to system memory (e.g., 16MB suballocator for 8GB SMEM). > >> Batch buffers are dynamically allocated from this pool based on the > >> number of active workloads. The entire suballocator region is submitted > >> to hardware for CCS metadata copy operations. > >> > >> We cannot insert BB_END commands after each individual instruction > >> sequence because additional GPU instructions may be appended later. > > > > You *overwrite* the previous BB_END after the new commands have been > > appended. > We do not know where the new BB allocation will be. It may not be > sequential and every BO has a BB. 
BBs are allocated and freed so often > based on BOs getting created and destroyed. So, we can't use that approach. Hmm, could perhaps use second level batches then. Each BO would get its own second level batch, and the first level would just call them in sequence. Or is this already running as a second level batch? It might also be getting a bit complicated I guess, but at least it wouldn't have all the obvious problems of the SIMD stuff: - looks like it will explode on non-AVX capable x86 - will be broken on other arches until someone implements the equivalent code (assuming the arch has such an atomic copy instruction and supports in kernel SIMD stuff sufficiently to use it) > > -Satya.> > >> Instead, a single BB_END marker is placed at the suballocator's end to > >> terminate execution. > >> > >> This patch ensures race-condition-safe CCS metadata save/restore > >> operations by guaranteeing atomic writes to the batch buffer, preventing > >> corruption regardless of when save/restore operations are triggered. > >> > >> -Satya.>> > >>>> Since VMOVDQU operates on 256-bit chunks, update EMIT_COPY_CCS_DW to emit > >>>> 8 dwords instead of 5 dwords. > >>>> > >>>> Update emit_flush_invalidate() to use VMOVDQU operating with 128-bit > >>>> chunks. > >>>> > >>>> Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> > >>>> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> > >>>> Cc: Matthew Brost <matthew.brost@intel.com> > >>>> Cc: Matthew Auld <matthew.auld@intel.com> > >>>> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> > >>>> Cc: Matt Roper <matthew.d.roper@intel.com> > >>>> > >>>> --- > >>>> V6 -> V7: > >>>> - Added description explaining why to use assembly instructions for > >>>> atomicity. > >>>> - Assert if DGFX tries to use memcpy_vmovdqu(). (Rodrigo) > >>>> - Include <asm/cpufeature.h> though checkpatch complains. With > >>>> <linux/cpufeature.h> KUnit is throwing errors.
> >>>> > >>>> V5 -> V6: > >>>> - Fixed review comments (Rodrigo) > >>>> > >>>> V4 -> V5: > >>>> - Fixed review comments. (Matt B) > >>>> > >>>> V3 -> V4: > >>>> - Fixed review comments. (Wajdeczko) > >>>> - Fix issues reported by patchworks. > >>>> > >>>> V2 -> V3: > >>>> - Added support for 128 bit and 256 bit instructions with memcpy_vmovdqu > >>>> - Updated emit_flush_invalidate() to use vmovdqu instruction. > >>>> > >>>> V1 -> V2: > >>>> - Use memcpy_vmovdqu only for x86 arch and for VF. Else use memcpy > >>>> (Auld, Matthew) > >>>> - Fix issues reported by patchworks. > >>>> --- > >>>> drivers/gpu/drm/xe/xe_migrate.c | 112 ++++++++++++++++++++++++++------ > >>>> 1 file changed, 91 insertions(+), 21 deletions(-) > >>>> > >>>> diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c > >>>> index 3112c966c67d..e0be7396a0ab 100644 > >>>> --- a/drivers/gpu/drm/xe/xe_migrate.c > >>>> +++ b/drivers/gpu/drm/xe/xe_migrate.c > >>>> @@ -5,6 +5,8 @@ > >>>> > >>>> #include "xe_migrate.h" > >>>> > >>>> +#include <asm/fpu/api.h> > >>>> +#include <asm/cpufeature.h> > >>>> #include <linux/bitfield.h> > >>>> #include <linux/sizes.h> > >>>> > >>>> @@ -33,6 +35,7 @@ > >>>> #include "xe_res_cursor.h" > >>>> #include "xe_sa.h" > >>>> #include "xe_sched_job.h" > >>>> +#include "xe_sriov_vf_ccs.h" > >>>> #include "xe_sync.h" > >>>> #include "xe_trace_bo.h" > >>>> #include "xe_validation.h" > >>>> @@ -657,18 +660,68 @@ static void emit_pte(struct xe_migrate *m, > >>>> } > >>>> } > >>>> > >>>> -#define EMIT_COPY_CCS_DW 5 > >>>> +/* > >>>> + * VF KMD registers two specialized LRCs with the GuC to handle save/restore > >>>> + * operations for CCS metadata on IGPU. The GuC executes these LRCAs during > >>>> + * VF state/restore operations. > >>>> + * > >>>> + * Each LRC contains a batch buffer pool that GuC submits to hardware during > >>>> + * VF state save/restore operations. 
Since these operations can occur > >>>> + * asynchronously at any time, we must ensure GPU instructions in the batch > >>>> + * buffer are written atomically to prevent corruption from incomplete writes. > >>>> + * > >>>> + * To guarantee atomic instruction writes, we use x86 SIMD instructions > >>>> + * (128-bit XMM and 256-bit YMM) within kernel_fpu_begin()/kernel_fpu_end() > >>>> + * sections. This prevents vCPU preemption during instruction generation, > >>>> + * ensuring complete GPU commands are written to the batch buffer. > >>>> + */ > >>>> + > >>>> +static void memcpy_vmovdqu(struct xe_device *xe, void *dst, const void *src, u32 size) > >>>> +{ > >>>> + xe_assert(xe, !IS_DGFX(xe)); > >>>> +#ifdef CONFIG_X86 > >>>> + kernel_fpu_begin(); > >>>> + if (size == SZ_128) { > >>>> + asm("vmovdqu (%0), %%xmm0\n" > >>>> + "vmovups %%xmm0, (%1)\n" > >>>> + :: "r" (src), "r" (dst) : "memory"); > >>>> + } else if (size == SZ_256) { > >>>> + asm("vmovdqu (%0), %%ymm0\n" > >>>> + "vmovups %%ymm0, (%1)\n" > >>>> + :: "r" (src), "r" (dst) : "memory"); > >>>> + } > >>>> + kernel_fpu_end(); > >>>> +#endif > >>>> +} > >>>> + > >>>> +static void emit_atomic(struct xe_gt *gt, void *dst, const void *src, u32 size) > >>>> +{ > >>>> + u32 instr_size = size * BITS_PER_BYTE; > >>>> + > >>>> + xe_gt_assert(gt, instr_size == SZ_128 || instr_size == SZ_256); > >>>> + > >>>> + if (IS_VF_CCS_READY(gt_to_xe(gt))) { > >>>> + xe_gt_assert(gt, static_cpu_has(X86_FEATURE_AVX)); > >>>> + memcpy_vmovdqu(gt_to_xe(gt), dst, src, instr_size); > >>>> + } else { > >>>> + memcpy(dst, src, size); > >>>> + } > >>>> +} > >>>> + > >>>> +#define EMIT_COPY_CCS_DW 8 > >>>> static void emit_copy_ccs(struct xe_gt *gt, struct xe_bb *bb, > >>>> u64 dst_ofs, bool dst_is_indirect, > >>>> u64 src_ofs, bool src_is_indirect, > >>>> u32 size) > >>>> { > >>>> + u32 dw[EMIT_COPY_CCS_DW] = {MI_NOOP}; > >>>> struct xe_device *xe = gt_to_xe(gt); > >>>> u32 *cs = bb->cs + bb->len; > >>>> u32 num_ccs_blks; > >>>> u32 
num_pages; > >>>> u32 ccs_copy_size; > >>>> u32 mocs; > >>>> + u32 i = 0; > >>>> > >>>> if (GRAPHICS_VERx100(xe) >= 2000) { > >>>> num_pages = DIV_ROUND_UP(size, XE_PAGE_SIZE); > >>>> @@ -686,15 +739,23 @@ static void emit_copy_ccs(struct xe_gt *gt, struct xe_bb *bb, > >>>> mocs = FIELD_PREP(XY_CTRL_SURF_MOCS_MASK, gt->mocs.uc_index); > >>>> } > >>>> > >>>> - *cs++ = XY_CTRL_SURF_COPY_BLT | > >>>> - (src_is_indirect ? 0x0 : 0x1) << SRC_ACCESS_TYPE_SHIFT | > >>>> - (dst_is_indirect ? 0x0 : 0x1) << DST_ACCESS_TYPE_SHIFT | > >>>> - ccs_copy_size; > >>>> - *cs++ = lower_32_bits(src_ofs); > >>>> - *cs++ = upper_32_bits(src_ofs) | mocs; > >>>> - *cs++ = lower_32_bits(dst_ofs); > >>>> - *cs++ = upper_32_bits(dst_ofs) | mocs; > >>>> + dw[i++] = XY_CTRL_SURF_COPY_BLT | > >>>> + (src_is_indirect ? 0x0 : 0x1) << SRC_ACCESS_TYPE_SHIFT | > >>>> + (dst_is_indirect ? 0x0 : 0x1) << DST_ACCESS_TYPE_SHIFT | > >>>> + ccs_copy_size; > >>>> + dw[i++] = lower_32_bits(src_ofs); > >>>> + dw[i++] = upper_32_bits(src_ofs) | mocs; > >>>> + dw[i++] = lower_32_bits(dst_ofs); > >>>> + dw[i++] = upper_32_bits(dst_ofs) | mocs; > >>>> > >>>> + /* > >>>> + * The CCS copy command is a 5-dword sequence. If the vCPU halts during > >>>> + * save/restore while this sequence is being issued, partial writes may trigger > >>>> + * page faults when saving iGPU CCS metadata. Use the VMOVDQU instruction to > >>>> + * write the sequence atomically. > >>>> + */ > >>>> + emit_atomic(gt, cs, dw, sizeof(dw)); > >>>> + cs += EMIT_COPY_CCS_DW; > >>>> bb->len = cs - bb->cs; > >>>> } > >>>> > >>>> @@ -1006,18 +1067,27 @@ static u64 migrate_vm_ppgtt_addr_tlb_inval(void) > >>>> return (NUM_KERNEL_PDE - 2) * XE_PAGE_SIZE; > >>>> } > >>>> > >>>> -static int emit_flush_invalidate(u32 *dw, int i, u32 flags) > >>>> +/* > >>>> + * The MI_FLUSH_DW command is a 4-dword sequence. 
If the vCPU halts during > >>>> + * save/restore while this sequence is being issued, partial writes may > >>>> + * trigger page faults when saving iGPU CCS metadata. Use > >>>> + * emit_atomic() to write the sequence atomically. > >>>> + */ > >>>> +#define EMIT_FLUSH_INVALIDATE_DW 4 > >>>> +static int emit_flush_invalidate(struct xe_exec_queue *q, u32 *cs, int i, u32 flags) > >>>> { > >>>> u64 addr = migrate_vm_ppgtt_addr_tlb_inval(); > >>>> + u32 dw[EMIT_FLUSH_INVALIDATE_DW] = {MI_NOOP}, j = 0; > >>>> + > >>>> + dw[j++] = MI_FLUSH_DW | MI_INVALIDATE_TLB | MI_FLUSH_DW_OP_STOREDW | > >>>> + MI_FLUSH_IMM_DW | flags; > >>>> + dw[j++] = lower_32_bits(addr); > >>>> + dw[j++] = upper_32_bits(addr); > >>>> + dw[j++] = MI_NOOP; > >>>> > >>>> - dw[i++] = MI_FLUSH_DW | MI_INVALIDATE_TLB | MI_FLUSH_DW_OP_STOREDW | > >>>> - MI_FLUSH_IMM_DW | flags; > >>>> - dw[i++] = lower_32_bits(addr); > >>>> - dw[i++] = upper_32_bits(addr); > >>>> - dw[i++] = MI_NOOP; > >>>> - dw[i++] = MI_NOOP; > >>>> + emit_atomic(q->gt, &cs[i], dw, sizeof(dw)); > >>>> > >>>> - return i; > >>>> + return i + j; > >>>> } > >>>> > >>>> /** > >>>> @@ -1062,7 +1132,7 @@ int xe_migrate_ccs_rw_copy(struct xe_tile *tile, struct xe_exec_queue *q, > >>>> /* Calculate Batch buffer size */ > >>>> batch_size = 0; > >>>> while (size) { > >>>> - batch_size += 10; /* Flush + ggtt addr + 2 NOP */ > >>>> + batch_size += EMIT_FLUSH_INVALIDATE_DW * 2; /* Flush + ggtt addr + 1 NOP */ > >>>> u64 ccs_ofs, ccs_size; > >>>> u32 ccs_pt; > >>>> > >>>> @@ -1103,7 +1173,7 @@ int xe_migrate_ccs_rw_copy(struct xe_tile *tile, struct xe_exec_queue *q, > >>>> * sizes here again before copy command is emitted. 
> >>>> */ > >>>> while (size) { > >>>> - batch_size += 10; /* Flush + ggtt addr + 2 NOP */ > >>>> + batch_size += EMIT_FLUSH_INVALIDATE_DW * 2; /* Flush + ggtt addr + 1 NOP */ > >>>> u32 flush_flags = 0; > >>>> u64 ccs_ofs, ccs_size; > >>>> u32 ccs_pt; > >>>> @@ -1126,11 +1196,11 @@ int xe_migrate_ccs_rw_copy(struct xe_tile *tile, struct xe_exec_queue *q, > >>>> > >>>> emit_pte(m, bb, ccs_pt, false, false, &ccs_it, ccs_size, src); > >>>> > >>>> - bb->len = emit_flush_invalidate(bb->cs, bb->len, flush_flags); > >>>> + bb->len = emit_flush_invalidate(q, bb->cs, bb->len, flush_flags); > >>>> flush_flags = xe_migrate_ccs_copy(m, bb, src_L0_ofs, src_is_pltt, > >>>> src_L0_ofs, dst_is_pltt, > >>>> src_L0, ccs_ofs, true); > >>>> - bb->len = emit_flush_invalidate(bb->cs, bb->len, flush_flags); > >>>> + bb->len = emit_flush_invalidate(q, bb->cs, bb->len, flush_flags); > >>>> > >>>> size -= src_L0; > >>>> } > >>>> -- > >>>> 2.51.0 > >>> > > -- Ville Syrjälä Intel ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v7 1/3] drm/xe/migrate: Atomicize CCS copy command setup 2025-10-17 16:51 ` Ville Syrjälä @ 2025-10-17 18:21 ` Rodrigo Vivi 2025-10-17 22:35 ` Matthew Brost 2025-10-17 22:35 ` Matt Roper 0 siblings, 2 replies; 21+ messages in thread From: Rodrigo Vivi @ 2025-10-17 18:21 UTC (permalink / raw) To: Ville Syrjälä, Matthew Brost Cc: K V P, Satyanarayana, intel-xe, Michal Wajdeczko, Matthew Brost, Matthew Auld, Matt Roper On Fri, Oct 17, 2025 at 07:51:47PM +0300, Ville Syrjälä wrote: > On Fri, Oct 17, 2025 at 09:59:48PM +0530, K V P, Satyanarayana wrote: > > > > > > On 17-10-2025 20:56, Ville Syrjälä wrote: > > > On Fri, Oct 17, 2025 at 08:46:37PM +0530, K V P, Satyanarayana wrote: > > >> > > >> > > >> On 17-10-2025 19:57, Ville Syrjälä wrote: > > >>> On Fri, Oct 17, 2025 at 07:42:28PM +0530, Satyanarayana K V P wrote: > > >>>> The CCS copy command is a 5-dword sequence. If the vCPU halts during > > >>>> save/restore while this sequence is being programmed, partial writes may > > >>>> trigger page faults when saving IGPU CCS metadata. Use the VMOVDQU > > >>>> instruction to write the sequence atomically. > > >>> > > >>> If this whole thing is so racy why don't you always add a new > > >>> BB_END after new commands, and only replace the previous BB_END > > >>> with NOOP _after_ the new commands have been fully written? > > >>> > > >> We maintain a suballocator for batch buffer management, with size > > >> proportional to system memory (e.g., 16MB suballocator for 8GB SMEM). > > >> Batch buffers are dynamically allocated from this pool based on the > > >> number of active workloads. The entire suballocator region is submitted > > >> to hardware for CCS metadata copy operations. > > >> > > >> We cannot insert BB_END commands after each individual instruction > > >> sequence because additional GPU instructions may be appended later. > > > > > > You *overwrite* the previous BB_END after the new commands have been > > > appended. 
> > We do not know where the new BB allocation will be. It may not be > > sequential and every BO has a BB. BBs are allocated and freed so often > > based on BOs getting created and destroyed. So, we can't use that approach. > > Hmm, could perhaps use second level batches then. Each BO would gets > its own second level batch, and the first level would just call them > in sequence. Or is this already running as a second level batch? This I'm not sure... Matt, do you know? > > It might also be getting a bit complicated I guess, but at least it > wouldn't have all obvious problems of the SIMD stuff: > - looks like it will explode on non-AVX capable x86 > - will be broken on other arches until someone implements the equivalent > code (assuming the arch has such an atomic copy instruction > and supports in kernel SIMD stuff sufficiently to use it) This is Pantherlake only. And the reason why I asked to add a check with error/warn for IS_DGFX()... which by the way is an assert... I still don't believe it is enough. I believe a return with warn_on seems more appropriate to really never try to run that code in case of a big future mistake. > > > > > -Satya.> > > >> Instead, a single BB_END marker is placed at the suballocator's end to > > >> terminate execution. > > >> > > >> This patch ensures race-condition-safe CCS metadata save/restore > > >> operations by guaranteeing atomic writes to the batch buffer, preventing > > >> corruption regardless of when save/restore operations are triggered. > > >> > > >> -Satya.>> > > >>>> Since VMOVDQU operates on 256-bit chunks, update EMIT_COPY_CCS_DW to emit > > >>>> 8 dwords instead of 5 dwords. > > >>>> > > >>>> Update emit_flush_invalidate() to use VMOVDQU operating with 128-bit > > >>>> chunks. 
> > >>>> > > >>>> Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> > > >>>> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> > > >>>> Cc: Matthew Brost <matthew.brost@intel.com> > > >>>> Cc: Matthew Auld <matthew.auld@intel.com> > > >>>> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> > > >>>> Cc: Matt Roper <matthew.d.roper@intel.com> > > >>>> > > >>>> --- > > >>>> V6 -> V7: > > >>>> - Added description explaining why to use assembly instructions for > > >>>> atomicity. > > >>>> - Assert if DGFX tries to use memcpy_vmovdqu(). (Rodrigo) > > >>>> - Include <asm/cpufeature.h> though checkpatch complains. With > > >>>> <linux/cpufeature.h> KUnit is throwing errors. > > >>>> > > >>>> V5 -> V6: > > >>>> - Fixed review comments (Rodrigo) > > >>>> > > >>>> V4 -> V5: > > >>>> - Fixed review comments. (Matt B) > > >>>> > > >>>> V3 -> V4: > > >>>> - Fixed review comments. (Wajdeczko) > > >>>> - Fix issues reported by patchworks. > > >>>> > > >>>> V2 -> V3: > > >>>> - Added support for 128 bit and 256 bit instructions with memcpy_vmovdqu > > >>>> - Updated emit_flush_invalidate() to use vmovdqu instruction. > > >>>> > > >>>> V1 -> V2: > > >>>> - Use memcpy_vmovdqu only for x86 arch and for VF. Else use memcpy > > >>>> (Auld, Matthew) > > >>>> - Fix issues reported by patchworks. 
> > >>>> --- > > >>>> drivers/gpu/drm/xe/xe_migrate.c | 112 ++++++++++++++++++++++++++------ > > >>>> 1 file changed, 91 insertions(+), 21 deletions(-) > > >>>> > > >>>> diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c > > >>>> index 3112c966c67d..e0be7396a0ab 100644 > > >>>> --- a/drivers/gpu/drm/xe/xe_migrate.c > > >>>> +++ b/drivers/gpu/drm/xe/xe_migrate.c > > >>>> @@ -5,6 +5,8 @@ > > >>>> > > >>>> #include "xe_migrate.h" > > >>>> > > >>>> +#include <asm/fpu/api.h> > > >>>> +#include <asm/cpufeature.h> > > >>>> #include <linux/bitfield.h> > > >>>> #include <linux/sizes.h> > > >>>> > > >>>> @@ -33,6 +35,7 @@ > > >>>> #include "xe_res_cursor.h" > > >>>> #include "xe_sa.h" > > >>>> #include "xe_sched_job.h" > > >>>> +#include "xe_sriov_vf_ccs.h" > > >>>> #include "xe_sync.h" > > >>>> #include "xe_trace_bo.h" > > >>>> #include "xe_validation.h" > > >>>> @@ -657,18 +660,68 @@ static void emit_pte(struct xe_migrate *m, > > >>>> } > > >>>> } > > >>>> > > >>>> -#define EMIT_COPY_CCS_DW 5 > > >>>> +/* > > >>>> + * VF KMD registers two specialized LRCs with the GuC to handle save/restore > > >>>> + * operations for CCS metadata on IGPU. The GuC executes these LRCAs during > > >>>> + * VF state/restore operations. > > >>>> + * > > >>>> + * Each LRC contains a batch buffer pool that GuC submits to hardware during > > >>>> + * VF state save/restore operations. Since these operations can occur > > >>>> + * asynchronously at any time, we must ensure GPU instructions in the batch > > >>>> + * buffer are written atomically to prevent corruption from incomplete writes. > > >>>> + * > > >>>> + * To guarantee atomic instruction writes, we use x86 SIMD instructions > > >>>> + * (128-bit XMM and 256-bit YMM) within kernel_fpu_begin()/kernel_fpu_end() > > >>>> + * sections. This prevents vCPU preemption during instruction generation, > > >>>> + * ensuring complete GPU commands are written to the batch buffer. 
> > >>>> + */ > > >>>> + > > >>>> +static void memcpy_vmovdqu(struct xe_device *xe, void *dst, const void *src, u32 size) > > >>>> +{ > > >>>> + xe_assert(xe, !IS_DGFX(xe)); > > >>>> +#ifdef CONFIG_X86 > > >>>> + kernel_fpu_begin(); > > >>>> + if (size == SZ_128) { > > >>>> + asm("vmovdqu (%0), %%xmm0\n" > > >>>> + "vmovups %%xmm0, (%1)\n" > > >>>> + :: "r" (src), "r" (dst) : "memory"); > > >>>> + } else if (size == SZ_256) { > > >>>> + asm("vmovdqu (%0), %%ymm0\n" > > >>>> + "vmovups %%ymm0, (%1)\n" > > >>>> + :: "r" (src), "r" (dst) : "memory"); > > >>>> + } > > >>>> + kernel_fpu_end(); > > >>>> +#endif > > >>>> +} > > >>>> + > > >>>> +static void emit_atomic(struct xe_gt *gt, void *dst, const void *src, u32 size) > > >>>> +{ > > >>>> + u32 instr_size = size * BITS_PER_BYTE; > > >>>> + > > >>>> + xe_gt_assert(gt, instr_size == SZ_128 || instr_size == SZ_256); > > >>>> + > > >>>> + if (IS_VF_CCS_READY(gt_to_xe(gt))) { > > >>>> + xe_gt_assert(gt, static_cpu_has(X86_FEATURE_AVX)); > > >>>> + memcpy_vmovdqu(gt_to_xe(gt), dst, src, instr_size); > > >>>> + } else { > > >>>> + memcpy(dst, src, size); > > >>>> + } > > >>>> +} > > >>>> + > > >>>> +#define EMIT_COPY_CCS_DW 8 > > >>>> static void emit_copy_ccs(struct xe_gt *gt, struct xe_bb *bb, > > >>>> u64 dst_ofs, bool dst_is_indirect, > > >>>> u64 src_ofs, bool src_is_indirect, > > >>>> u32 size) > > >>>> { > > >>>> + u32 dw[EMIT_COPY_CCS_DW] = {MI_NOOP}; > > >>>> struct xe_device *xe = gt_to_xe(gt); > > >>>> u32 *cs = bb->cs + bb->len; > > >>>> u32 num_ccs_blks; > > >>>> u32 num_pages; > > >>>> u32 ccs_copy_size; > > >>>> u32 mocs; > > >>>> + u32 i = 0; > > >>>> > > >>>> if (GRAPHICS_VERx100(xe) >= 2000) { > > >>>> num_pages = DIV_ROUND_UP(size, XE_PAGE_SIZE); > > >>>> @@ -686,15 +739,23 @@ static void emit_copy_ccs(struct xe_gt *gt, struct xe_bb *bb, > > >>>> mocs = FIELD_PREP(XY_CTRL_SURF_MOCS_MASK, gt->mocs.uc_index); > > >>>> } > > >>>> > > >>>> - *cs++ = XY_CTRL_SURF_COPY_BLT | > > >>>> - (src_is_indirect ? 
0x0 : 0x1) << SRC_ACCESS_TYPE_SHIFT | > > >>>> - (dst_is_indirect ? 0x0 : 0x1) << DST_ACCESS_TYPE_SHIFT | > > >>>> - ccs_copy_size; > > >>>> - *cs++ = lower_32_bits(src_ofs); > > >>>> - *cs++ = upper_32_bits(src_ofs) | mocs; > > >>>> - *cs++ = lower_32_bits(dst_ofs); > > >>>> - *cs++ = upper_32_bits(dst_ofs) | mocs; > > >>>> + dw[i++] = XY_CTRL_SURF_COPY_BLT | > > >>>> + (src_is_indirect ? 0x0 : 0x1) << SRC_ACCESS_TYPE_SHIFT | > > >>>> + (dst_is_indirect ? 0x0 : 0x1) << DST_ACCESS_TYPE_SHIFT | > > >>>> + ccs_copy_size; > > >>>> + dw[i++] = lower_32_bits(src_ofs); > > >>>> + dw[i++] = upper_32_bits(src_ofs) | mocs; > > >>>> + dw[i++] = lower_32_bits(dst_ofs); > > >>>> + dw[i++] = upper_32_bits(dst_ofs) | mocs; > > >>>> > > >>>> + /* > > >>>> + * The CCS copy command is a 5-dword sequence. If the vCPU halts during > > >>>> + * save/restore while this sequence is being issued, partial writes may trigger > > >>>> + * page faults when saving iGPU CCS metadata. Use the VMOVDQU instruction to > > >>>> + * write the sequence atomically. > > >>>> + */ > > >>>> + emit_atomic(gt, cs, dw, sizeof(dw)); > > >>>> + cs += EMIT_COPY_CCS_DW; > > >>>> bb->len = cs - bb->cs; > > >>>> } > > >>>> > > >>>> @@ -1006,18 +1067,27 @@ static u64 migrate_vm_ppgtt_addr_tlb_inval(void) > > >>>> return (NUM_KERNEL_PDE - 2) * XE_PAGE_SIZE; > > >>>> } > > >>>> > > >>>> -static int emit_flush_invalidate(u32 *dw, int i, u32 flags) > > >>>> +/* > > >>>> + * The MI_FLUSH_DW command is a 4-dword sequence. If the vCPU halts during > > >>>> + * save/restore while this sequence is being issued, partial writes may > > >>>> + * trigger page faults when saving iGPU CCS metadata. Use > > >>>> + * emit_atomic() to write the sequence atomically. 
> > >>>> + */ > > >>>> +#define EMIT_FLUSH_INVALIDATE_DW 4 > > >>>> +static int emit_flush_invalidate(struct xe_exec_queue *q, u32 *cs, int i, u32 flags) > > >>>> { > > >>>> u64 addr = migrate_vm_ppgtt_addr_tlb_inval(); > > >>>> + u32 dw[EMIT_FLUSH_INVALIDATE_DW] = {MI_NOOP}, j = 0; > > >>>> + > > >>>> + dw[j++] = MI_FLUSH_DW | MI_INVALIDATE_TLB | MI_FLUSH_DW_OP_STOREDW | > > >>>> + MI_FLUSH_IMM_DW | flags; > > >>>> + dw[j++] = lower_32_bits(addr); > > >>>> + dw[j++] = upper_32_bits(addr); > > >>>> + dw[j++] = MI_NOOP; > > >>>> > > >>>> - dw[i++] = MI_FLUSH_DW | MI_INVALIDATE_TLB | MI_FLUSH_DW_OP_STOREDW | > > >>>> - MI_FLUSH_IMM_DW | flags; > > >>>> - dw[i++] = lower_32_bits(addr); > > >>>> - dw[i++] = upper_32_bits(addr); > > >>>> - dw[i++] = MI_NOOP; > > >>>> - dw[i++] = MI_NOOP; > > >>>> + emit_atomic(q->gt, &cs[i], dw, sizeof(dw)); > > >>>> > > >>>> - return i; > > >>>> + return i + j; > > >>>> } > > >>>> > > >>>> /** > > >>>> @@ -1062,7 +1132,7 @@ int xe_migrate_ccs_rw_copy(struct xe_tile *tile, struct xe_exec_queue *q, > > >>>> /* Calculate Batch buffer size */ > > >>>> batch_size = 0; > > >>>> while (size) { > > >>>> - batch_size += 10; /* Flush + ggtt addr + 2 NOP */ > > >>>> + batch_size += EMIT_FLUSH_INVALIDATE_DW * 2; /* Flush + ggtt addr + 1 NOP */ > > >>>> u64 ccs_ofs, ccs_size; > > >>>> u32 ccs_pt; > > >>>> > > >>>> @@ -1103,7 +1173,7 @@ int xe_migrate_ccs_rw_copy(struct xe_tile *tile, struct xe_exec_queue *q, > > >>>> * sizes here again before copy command is emitted. 
> > >>>> */ > > >>>> while (size) { > > >>>> - batch_size += 10; /* Flush + ggtt addr + 2 NOP */ > > >>>> + batch_size += EMIT_FLUSH_INVALIDATE_DW * 2; /* Flush + ggtt addr + 1 NOP */ > > >>>> u32 flush_flags = 0; > > >>>> u64 ccs_ofs, ccs_size; > > >>>> u32 ccs_pt; > > >>>> @@ -1126,11 +1196,11 @@ int xe_migrate_ccs_rw_copy(struct xe_tile *tile, struct xe_exec_queue *q, > > >>>> > > >>>> emit_pte(m, bb, ccs_pt, false, false, &ccs_it, ccs_size, src); > > >>>> > > >>>> - bb->len = emit_flush_invalidate(bb->cs, bb->len, flush_flags); > > >>>> + bb->len = emit_flush_invalidate(q, bb->cs, bb->len, flush_flags); > > >>>> flush_flags = xe_migrate_ccs_copy(m, bb, src_L0_ofs, src_is_pltt, > > >>>> src_L0_ofs, dst_is_pltt, > > >>>> src_L0, ccs_ofs, true); > > >>>> - bb->len = emit_flush_invalidate(bb->cs, bb->len, flush_flags); > > >>>> + bb->len = emit_flush_invalidate(q, bb->cs, bb->len, flush_flags); > > >>>> > > >>>> size -= src_L0; > > >>>> } > > >>>> -- > > >>>> 2.51.0 > > >>> > > > > > -- > Ville Syrjälä > Intel ^ permalink raw reply [flat|nested] 21+ messages in thread
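[Editor's note: a sketch in the spirit of Rodrigo's warn-and-return suggestion above. Rather than only asserting, the SIMD path is refused outright (with a one-time warning) on any configuration it was never meant for, falling back to a plain copy so the driver keeps functioning. All names here are hypothetical userspace stand-ins; WARN_ON_ONCE() and kernel_fpu_begin()/kernel_fpu_end() are the kernel equivalents.]

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Gate the atomic-emit path: dGPU or a non-AVX CPU must never reach it. */
static bool emit_atomic_allowed(bool is_dgfx, bool cpu_has_avx)
{
    static bool warned;

    if (is_dgfx || !cpu_has_avx) {
        if (!warned) {   /* kernel: WARN_ON_ONCE(...) */
            fprintf(stderr,
                    "xe: atomic CCS emit unsupported here, using memcpy\n");
            warned = true;
        }
        return false;
    }
    return true;
}

static void emit_guarded(bool is_dgfx, bool cpu_has_avx,
                         void *dst, const void *src, size_t size)
{
    if (!emit_atomic_allowed(is_dgfx, cpu_has_avx)) {
        memcpy(dst, src, size);   /* non-atomic fallback, still correct data */
        return;
    }
    /* kernel_fpu_begin(); single vmovdqu; kernel_fpu_end(); */
    memcpy(dst, src, size);       /* stand-in for the SIMD copy */
}
```

The fallback trades atomicity for availability, which is acceptable here because only the VF-CCS iGPU case needs the atomic guarantee in the first place.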
* Re: [PATCH v7 1/3] drm/xe/migrate: Atomicize CCS copy command setup 2025-10-17 18:21 ` Rodrigo Vivi @ 2025-10-17 22:35 ` Matthew Brost 2025-10-17 22:45 ` Matt Roper 2025-10-17 22:35 ` Matt Roper 1 sibling, 1 reply; 21+ messages in thread From: Matthew Brost @ 2025-10-17 22:35 UTC (permalink / raw) To: Rodrigo Vivi Cc: Ville Syrjälä, K V P, Satyanarayana, intel-xe, Michal Wajdeczko, Matthew Auld, Matt Roper On Fri, Oct 17, 2025 at 02:21:59PM -0400, Rodrigo Vivi wrote: > On Fri, Oct 17, 2025 at 07:51:47PM +0300, Ville Syrjälä wrote: > > On Fri, Oct 17, 2025 at 09:59:48PM +0530, K V P, Satyanarayana wrote: > > > > > > > > > On 17-10-2025 20:56, Ville Syrjälä wrote: > > > > On Fri, Oct 17, 2025 at 08:46:37PM +0530, K V P, Satyanarayana wrote: > > > >> > > > >> > > > >> On 17-10-2025 19:57, Ville Syrjälä wrote: > > > >>> On Fri, Oct 17, 2025 at 07:42:28PM +0530, Satyanarayana K V P wrote: > > > >>>> The CCS copy command is a 5-dword sequence. If the vCPU halts during > > > >>>> save/restore while this sequence is being programmed, partial writes may > > > >>>> trigger page faults when saving IGPU CCS metadata. Use the VMOVDQU > > > >>>> instruction to write the sequence atomically. > > > >>> > > > >>> If this whole thing is so racy why don't you always add a new > > > >>> BB_END after new commands, and only replace the previous BB_END > > > >>> with NOOP _after_ the new commands have been fully written? > > > >>> > > > >> We maintain a suballocator for batch buffer management, with size > > > >> proportional to system memory (e.g., 16MB suballocator for 8GB SMEM). > > > >> Batch buffers are dynamically allocated from this pool based on the > > > >> number of active workloads. The entire suballocator region is submitted > > > >> to hardware for CCS metadata copy operations. > > > >> > > > >> We cannot insert BB_END commands after each individual instruction > > > >> sequence because additional GPU instructions may be appended later. 
> > > > > > > > You *overwrite* the previous BB_END after the new commands have been > > > > appended. > > > We do not know where the new BB allocation will be. It may not be > > > sequential and every BO has a BB. BBs are allocated and freed so often > > > based on BOs getting created and destroyed. So, we can't use that approach. > > > > Hmm, could perhaps use second level batches then. Each BO would gets > > its own second level batch, and the first level would just call them > > in sequence. Or is this already running as a second level batch? > > This I'm not sure... > Embarrassingly, I’m not exactly sure what “second-level batch” means. What I can tell is that this is a batch buffer (BB) executed from a single BB start command in the ring. > Matt, do you know? > I actually thought about this, and I believe it could be made to work. However, we would need two BOs and suballocators. The first BO would contain only jump-to-second-level batch instructions, while the second BO would contain the CCS copy commands. Even in this mode, the jump-to-second-level batch instruction would have to be written using AVX instructions. Maybe this approach is better, but it would also require a significantly larger rewrite. > > > > It might also be getting a bit complicated I guess, but at least it > > wouldn't have all obvious problems of the SIMD stuff: > > - looks like it will explode on non-AVX capable x86 > > - will be broken on other arches until someone implements the equivalent > > code (assuming the arch has such an atomic copy instruction > > and supports in kernel SIMD stuff sufficiently to use it) > > This is Pantherlake only. And the reason why I asked to add a > check with error/warn for IS_DGFX()... > > which by the way is an assert... I still don't believe it is enough. > I believe a return with warn_on seems more appropriate to really > never try to run that code in case of a big future mistake. > We don’t have an issue here since this is an iGPU—for now. 
Let’s hope that a future dGPU doesn’t consider a solution like this a good idea for anything, as this PTL approach is questionable at best. A big WARN_ON with a return is probably not a bad idea. Matt > > > > > > > > -Satya.> > > > >> Instead, a single BB_END marker is placed at the suballocator's end to > > > >> terminate execution. > > > >> > > > >> This patch ensures race-condition-safe CCS metadata save/restore > > > >> operations by guaranteeing atomic writes to the batch buffer, preventing > > > >> corruption regardless of when save/restore operations are triggered. > > > >> > > > >> -Satya.>> > > > >>>> Since VMOVDQU operates on 256-bit chunks, update EMIT_COPY_CCS_DW to emit > > > >>>> 8 dwords instead of 5 dwords. > > > >>>> > > > >>>> Update emit_flush_invalidate() to use VMOVDQU operating with 128-bit > > > >>>> chunks. > > > >>>> > > > >>>> Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> > > > >>>> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> > > > >>>> Cc: Matthew Brost <matthew.brost@intel.com> > > > >>>> Cc: Matthew Auld <matthew.auld@intel.com> > > > >>>> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> > > > >>>> Cc: Matt Roper <matthew.d.roper@intel.com> > > > >>>> > > > >>>> --- > > > >>>> V6 -> V7: > > > >>>> - Added description explaining why to use assembly instructions for > > > >>>> atomicity. > > > >>>> - Assert if DGFX tries to use memcpy_vmovdqu(). (Rodrigo) > > > >>>> - Include <asm/cpufeature.h> though checkpatch complains. With > > > >>>> <linux/cpufeature.h> KUnit is throwing errors. > > > >>>> > > > >>>> V5 -> V6: > > > >>>> - Fixed review comments (Rodrigo) > > > >>>> > > > >>>> V4 -> V5: > > > >>>> - Fixed review comments. (Matt B) > > > >>>> > > > >>>> V3 -> V4: > > > >>>> - Fixed review comments. (Wajdeczko) > > > >>>> - Fix issues reported by patchworks. 
> > > >>>> > > > >>>> V2 -> V3: > > > >>>> - Added support for 128 bit and 256 bit instructions with memcpy_vmovdqu > > > >>>> - Updated emit_flush_invalidate() to use vmovdqu instruction. > > > >>>> > > > >>>> V1 -> V2: > > > >>>> - Use memcpy_vmovdqu only for x86 arch and for VF. Else use memcpy > > > >>>> (Auld, Matthew) > > > >>>> - Fix issues reported by patchworks. > > > >>>> --- > > > >>>> drivers/gpu/drm/xe/xe_migrate.c | 112 ++++++++++++++++++++++++++------ > > > >>>> 1 file changed, 91 insertions(+), 21 deletions(-) > > > >>>> > > > >>>> diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c > > > >>>> index 3112c966c67d..e0be7396a0ab 100644 > > > >>>> --- a/drivers/gpu/drm/xe/xe_migrate.c > > > >>>> +++ b/drivers/gpu/drm/xe/xe_migrate.c > > > >>>> @@ -5,6 +5,8 @@ > > > >>>> > > > >>>> #include "xe_migrate.h" > > > >>>> > > > >>>> +#include <asm/fpu/api.h> > > > >>>> +#include <asm/cpufeature.h> > > > >>>> #include <linux/bitfield.h> > > > >>>> #include <linux/sizes.h> > > > >>>> > > > >>>> @@ -33,6 +35,7 @@ > > > >>>> #include "xe_res_cursor.h" > > > >>>> #include "xe_sa.h" > > > >>>> #include "xe_sched_job.h" > > > >>>> +#include "xe_sriov_vf_ccs.h" > > > >>>> #include "xe_sync.h" > > > >>>> #include "xe_trace_bo.h" > > > >>>> #include "xe_validation.h" > > > >>>> @@ -657,18 +660,68 @@ static void emit_pte(struct xe_migrate *m, > > > >>>> } > > > >>>> } > > > >>>> > > > >>>> -#define EMIT_COPY_CCS_DW 5 > > > >>>> +/* > > > >>>> + * VF KMD registers two specialized LRCs with the GuC to handle save/restore > > > >>>> + * operations for CCS metadata on IGPU. The GuC executes these LRCAs during > > > >>>> + * VF state/restore operations. > > > >>>> + * > > > >>>> + * Each LRC contains a batch buffer pool that GuC submits to hardware during > > > >>>> + * VF state save/restore operations. 
Since these operations can occur > > > >>>> + * asynchronously at any time, we must ensure GPU instructions in the batch > > > >>>> + * buffer are written atomically to prevent corruption from incomplete writes. > > > >>>> + * > > > >>>> + * To guarantee atomic instruction writes, we use x86 SIMD instructions > > > >>>> + * (128-bit XMM and 256-bit YMM) within kernel_fpu_begin()/kernel_fpu_end() > > > >>>> + * sections. This prevents vCPU preemption during instruction generation, > > > >>>> + * ensuring complete GPU commands are written to the batch buffer. > > > >>>> + */ > > > >>>> + > > > >>>> +static void memcpy_vmovdqu(struct xe_device *xe, void *dst, const void *src, u32 size) > > > >>>> +{ > > > >>>> + xe_assert(xe, !IS_DGFX(xe)); > > > >>>> +#ifdef CONFIG_X86 > > > >>>> + kernel_fpu_begin(); > > > >>>> + if (size == SZ_128) { > > > >>>> + asm("vmovdqu (%0), %%xmm0\n" > > > >>>> + "vmovups %%xmm0, (%1)\n" > > > >>>> + :: "r" (src), "r" (dst) : "memory"); > > > >>>> + } else if (size == SZ_256) { > > > >>>> + asm("vmovdqu (%0), %%ymm0\n" > > > >>>> + "vmovups %%ymm0, (%1)\n" > > > >>>> + :: "r" (src), "r" (dst) : "memory"); > > > >>>> + } > > > >>>> + kernel_fpu_end(); > > > >>>> +#endif > > > >>>> +} > > > >>>> + > > > >>>> +static void emit_atomic(struct xe_gt *gt, void *dst, const void *src, u32 size) > > > >>>> +{ > > > >>>> + u32 instr_size = size * BITS_PER_BYTE; > > > >>>> + > > > >>>> + xe_gt_assert(gt, instr_size == SZ_128 || instr_size == SZ_256); > > > >>>> + > > > >>>> + if (IS_VF_CCS_READY(gt_to_xe(gt))) { > > > >>>> + xe_gt_assert(gt, static_cpu_has(X86_FEATURE_AVX)); > > > >>>> + memcpy_vmovdqu(gt_to_xe(gt), dst, src, instr_size); > > > >>>> + } else { > > > >>>> + memcpy(dst, src, size); > > > >>>> + } > > > >>>> +} > > > >>>> + > > > >>>> +#define EMIT_COPY_CCS_DW 8 > > > >>>> static void emit_copy_ccs(struct xe_gt *gt, struct xe_bb *bb, > > > >>>> u64 dst_ofs, bool dst_is_indirect, > > > >>>> u64 src_ofs, bool src_is_indirect, > > > >>>> u32 
size) > > > >>>> { > > > >>>> + u32 dw[EMIT_COPY_CCS_DW] = {MI_NOOP}; > > > >>>> struct xe_device *xe = gt_to_xe(gt); > > > >>>> u32 *cs = bb->cs + bb->len; > > > >>>> u32 num_ccs_blks; > > > >>>> u32 num_pages; > > > >>>> u32 ccs_copy_size; > > > >>>> u32 mocs; > > > >>>> + u32 i = 0; > > > >>>> > > > >>>> if (GRAPHICS_VERx100(xe) >= 2000) { > > > >>>> num_pages = DIV_ROUND_UP(size, XE_PAGE_SIZE); > > > >>>> @@ -686,15 +739,23 @@ static void emit_copy_ccs(struct xe_gt *gt, struct xe_bb *bb, > > > >>>> mocs = FIELD_PREP(XY_CTRL_SURF_MOCS_MASK, gt->mocs.uc_index); > > > >>>> } > > > >>>> > > > >>>> - *cs++ = XY_CTRL_SURF_COPY_BLT | > > > >>>> - (src_is_indirect ? 0x0 : 0x1) << SRC_ACCESS_TYPE_SHIFT | > > > >>>> - (dst_is_indirect ? 0x0 : 0x1) << DST_ACCESS_TYPE_SHIFT | > > > >>>> - ccs_copy_size; > > > >>>> - *cs++ = lower_32_bits(src_ofs); > > > >>>> - *cs++ = upper_32_bits(src_ofs) | mocs; > > > >>>> - *cs++ = lower_32_bits(dst_ofs); > > > >>>> - *cs++ = upper_32_bits(dst_ofs) | mocs; > > > >>>> + dw[i++] = XY_CTRL_SURF_COPY_BLT | > > > >>>> + (src_is_indirect ? 0x0 : 0x1) << SRC_ACCESS_TYPE_SHIFT | > > > >>>> + (dst_is_indirect ? 0x0 : 0x1) << DST_ACCESS_TYPE_SHIFT | > > > >>>> + ccs_copy_size; > > > >>>> + dw[i++] = lower_32_bits(src_ofs); > > > >>>> + dw[i++] = upper_32_bits(src_ofs) | mocs; > > > >>>> + dw[i++] = lower_32_bits(dst_ofs); > > > >>>> + dw[i++] = upper_32_bits(dst_ofs) | mocs; > > > >>>> > > > >>>> + /* > > > >>>> + * The CCS copy command is a 5-dword sequence. If the vCPU halts during > > > >>>> + * save/restore while this sequence is being issued, partial writes may trigger > > > >>>> + * page faults when saving iGPU CCS metadata. Use the VMOVDQU instruction to > > > >>>> + * write the sequence atomically. 
> > > >>>> + */ > > > >>>> + emit_atomic(gt, cs, dw, sizeof(dw)); > > > >>>> + cs += EMIT_COPY_CCS_DW; > > > >>>> bb->len = cs - bb->cs; > > > >>>> } > > > >>>> > > > >>>> @@ -1006,18 +1067,27 @@ static u64 migrate_vm_ppgtt_addr_tlb_inval(void) > > > >>>> return (NUM_KERNEL_PDE - 2) * XE_PAGE_SIZE; > > > >>>> } > > > >>>> > > > >>>> -static int emit_flush_invalidate(u32 *dw, int i, u32 flags) > > > >>>> +/* > > > >>>> + * The MI_FLUSH_DW command is a 4-dword sequence. If the vCPU halts during > > > >>>> + * save/restore while this sequence is being issued, partial writes may > > > >>>> + * trigger page faults when saving iGPU CCS metadata. Use > > > >>>> + * emit_atomic() to write the sequence atomically. > > > >>>> + */ > > > >>>> +#define EMIT_FLUSH_INVALIDATE_DW 4 > > > >>>> +static int emit_flush_invalidate(struct xe_exec_queue *q, u32 *cs, int i, u32 flags) > > > >>>> { > > > >>>> u64 addr = migrate_vm_ppgtt_addr_tlb_inval(); > > > >>>> + u32 dw[EMIT_FLUSH_INVALIDATE_DW] = {MI_NOOP}, j = 0; > > > >>>> + > > > >>>> + dw[j++] = MI_FLUSH_DW | MI_INVALIDATE_TLB | MI_FLUSH_DW_OP_STOREDW | > > > >>>> + MI_FLUSH_IMM_DW | flags; > > > >>>> + dw[j++] = lower_32_bits(addr); > > > >>>> + dw[j++] = upper_32_bits(addr); > > > >>>> + dw[j++] = MI_NOOP; > > > >>>> > > > >>>> - dw[i++] = MI_FLUSH_DW | MI_INVALIDATE_TLB | MI_FLUSH_DW_OP_STOREDW | > > > >>>> - MI_FLUSH_IMM_DW | flags; > > > >>>> - dw[i++] = lower_32_bits(addr); > > > >>>> - dw[i++] = upper_32_bits(addr); > > > >>>> - dw[i++] = MI_NOOP; > > > >>>> - dw[i++] = MI_NOOP; > > > >>>> + emit_atomic(q->gt, &cs[i], dw, sizeof(dw)); > > > >>>> > > > >>>> - return i; > > > >>>> + return i + j; > > > >>>> } > > > >>>> > > > >>>> /** > > > >>>> @@ -1062,7 +1132,7 @@ int xe_migrate_ccs_rw_copy(struct xe_tile *tile, struct xe_exec_queue *q, > > > >>>> /* Calculate Batch buffer size */ > > > >>>> batch_size = 0; > > > >>>> while (size) { > > > >>>> - batch_size += 10; /* Flush + ggtt addr + 2 NOP */ > > > >>>> + batch_size += 
EMIT_FLUSH_INVALIDATE_DW * 2; /* Flush + ggtt addr + 1 NOP */ > > > >>>> u64 ccs_ofs, ccs_size; > > > >>>> u32 ccs_pt; > > > >>>> > > > >>>> @@ -1103,7 +1173,7 @@ int xe_migrate_ccs_rw_copy(struct xe_tile *tile, struct xe_exec_queue *q, > > > >>>> * sizes here again before copy command is emitted. > > > >>>> */ > > > >>>> while (size) { > > > >>>> - batch_size += 10; /* Flush + ggtt addr + 2 NOP */ > > > >>>> + batch_size += EMIT_FLUSH_INVALIDATE_DW * 2; /* Flush + ggtt addr + 1 NOP */ > > > >>>> u32 flush_flags = 0; > > > >>>> u64 ccs_ofs, ccs_size; > > > >>>> u32 ccs_pt; > > > >>>> @@ -1126,11 +1196,11 @@ int xe_migrate_ccs_rw_copy(struct xe_tile *tile, struct xe_exec_queue *q, > > > >>>> > > > >>>> emit_pte(m, bb, ccs_pt, false, false, &ccs_it, ccs_size, src); > > > >>>> > > > >>>> - bb->len = emit_flush_invalidate(bb->cs, bb->len, flush_flags); > > > >>>> + bb->len = emit_flush_invalidate(q, bb->cs, bb->len, flush_flags); > > > >>>> flush_flags = xe_migrate_ccs_copy(m, bb, src_L0_ofs, src_is_pltt, > > > >>>> src_L0_ofs, dst_is_pltt, > > > >>>> src_L0, ccs_ofs, true); > > > >>>> - bb->len = emit_flush_invalidate(bb->cs, bb->len, flush_flags); > > > >>>> + bb->len = emit_flush_invalidate(q, bb->cs, bb->len, flush_flags); > > > >>>> > > > >>>> size -= src_L0; > > > >>>> } > > > >>>> -- > > > >>>> 2.51.0 > > > >>> > > > > > > > > -- > > Ville Syrjälä > > Intel ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v7 1/3] drm/xe/migrate: Atomicize CCS copy command setup
  2025-10-17 22:35             ` Matthew Brost
@ 2025-10-17 22:45               ` Matt Roper
  0 siblings, 0 replies; 21+ messages in thread
From: Matt Roper @ 2025-10-17 22:45 UTC (permalink / raw)
  To: Matthew Brost
  Cc: Rodrigo Vivi, Ville Syrjälä, K V P, Satyanarayana, intel-xe,
	Michal Wajdeczko, Matthew Auld

On Fri, Oct 17, 2025 at 03:35:07PM -0700, Matthew Brost wrote:
> On Fri, Oct 17, 2025 at 02:21:59PM -0400, Rodrigo Vivi wrote:
> > On Fri, Oct 17, 2025 at 07:51:47PM +0300, Ville Syrjälä wrote:
> > > On Fri, Oct 17, 2025 at 09:59:48PM +0530, K V P, Satyanarayana wrote:
> > > >
> > > >
> > > > On 17-10-2025 20:56, Ville Syrjälä wrote:
> > > > > On Fri, Oct 17, 2025 at 08:46:37PM +0530, K V P, Satyanarayana wrote:
> > > > >>
> > > > >>
> > > > >> On 17-10-2025 19:57, Ville Syrjälä wrote:
> > > > >>> On Fri, Oct 17, 2025 at 07:42:28PM +0530, Satyanarayana K V P wrote:
> > > > >>>> The CCS copy command is a 5-dword sequence. If the vCPU halts during
> > > > >>>> save/restore while this sequence is being programmed, partial writes may
> > > > >>>> trigger page faults when saving IGPU CCS metadata. Use the VMOVDQU
> > > > >>>> instruction to write the sequence atomically.
> > > > >>>
> > > > >>> If this whole thing is so racy why don't you always add a new
> > > > >>> BB_END after new commands, and only replace the previous BB_END
> > > > >>> with NOOP _after_ the new commands have been fully written?
> > > > >>>
> > > > >> We maintain a suballocator for batch buffer management, with size
> > > > >> proportional to system memory (e.g., 16MB suballocator for 8GB SMEM).
> > > > >> Batch buffers are dynamically allocated from this pool based on the
> > > > >> number of active workloads. The entire suballocator region is submitted
> > > > >> to hardware for CCS metadata copy operations.
> > > > >>
> > > > >> We cannot insert BB_END commands after each individual instruction
> > > > >> sequence because additional GPU instructions may be appended later.
> > > > >
> > > > > You *overwrite* the previous BB_END after the new commands have been
> > > > > appended.
> > > > We do not know where the new BB allocation will be. It may not be
> > > > sequential, and every BO has a BB. BBs are allocated and freed often,
> > > > based on BOs getting created and destroyed. So, we can't use that approach.
> > > Hmm, could perhaps use second level batches then. Each BO would get
> > > its own second level batch, and the first level would just call them
> > > in sequence. Or is this already running as a second level batch?
> > This I'm not sure...
>
> Embarrassingly, I’m not exactly sure what “second-level batch” means.

That's just referring to batchbuffer nesting.  MI_BATCH_BUFFER_START can
be executed within a batchbuffer to jump off to a 2nd level (or deeper)
batchbuffer.  When the nested batch buffer hits MI_BATCH_BUFFER_END,
execution will return to the spot in the first batchbuffer where it left
off:

  - <in ring>
  - MI_BATCH_BUFFER_START  --> start executing first-level batch
    - MI_BATCH_BUFFER_START  --> start executing second-level batch
    - MI_BATCH_BUFFER_END    --> return to first-level batch
  - MI_BATCH_BUFFER_END  --> return to ring

I believe current hardware allows up to three levels of nesting; older
hardware allowed fewer (and early hardware didn't support nesting at
all).  If you do a MI_BATCH_BUFFER_START at the highest level possible
on your platform, then it's treated like a 'goto' (i.e., the next
MI_BATCH_BUFFER_END won't return to the current batch, but rather to
the parent's batch).

There used to be an MI_MODE option that disabled nested batchbuffers
completely; I think we set that in i915 (see
fakewa_disable_nestedbb_mode()).

Matt

> What I can tell is that this is a batch buffer (BB) executed from a
> single BB start command in the ring.
> > Matt, do you know?
>
> I actually thought about this, and I believe it could be made to work.
> However, we would need two BOs and suballocators. The first BO would
> contain only jump-to-second-level batch instructions, while the second
> BO would contain the CCS copy commands. Even in this mode, the
> jump-to-second-level batch instruction would have to be written using
> AVX instructions. Maybe this approach is better, but it would also
> require a significantly larger rewrite.
>
> > >
> > > It might also be getting a bit complicated I guess, but at least it
> > > wouldn't have all obvious problems of the SIMD stuff:
> > > - looks like it will explode on non-AVX capable x86
> > > - will be broken on other arches until someone implements the equivalent
> > >   code (assuming the arch has such an atomic copy instruction
> > >   and supports in kernel SIMD stuff sufficiently to use it)
> >
> > This is Pantherlake only. And the reason why I asked to add a
> > check with error/warn for IS_DGFX()...
> >
> > which by the way is an assert... I still don't believe it is enough.
> > I believe a return with warn_on seems more appropriate to really
> > never try to run that code in case of a big future mistake.
>
> We don’t have an issue here since this is an iGPU—for now. Let’s hope
> that a future dGPU doesn’t consider a solution like this a good idea for
> anything, as this PTL approach is questionable at best. A big WARN_ON
> with a return is probably not a bad idea.
>
> Matt
> > > > >
> > > > > -Satya.>
> > > >> Instead, a single BB_END marker is placed at the suballocator's end to
> > > >> terminate execution.
> > > >>
> > > >> This patch ensures race-condition-safe CCS metadata save/restore
> > > >> operations by guaranteeing atomic writes to the batch buffer, preventing
> > > >> corruption regardless of when save/restore operations are triggered.
-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: [PATCH v7 1/3] drm/xe/migrate: Atomicize CCS copy command setup
  2025-10-17 18:21           ` Rodrigo Vivi
  2025-10-17 22:35             ` Matthew Brost
@ 2025-10-17 22:35             ` Matt Roper
  2025-10-17 22:59               ` Matthew Brost
  1 sibling, 1 reply; 21+ messages in thread
From: Matt Roper @ 2025-10-17 22:35 UTC (permalink / raw)
  To: Rodrigo Vivi
  Cc: Ville Syrjälä, Matthew Brost, K V P, Satyanarayana, intel-xe,
	Michal Wajdeczko, Matthew Auld

On Fri, Oct 17, 2025 at 02:21:59PM -0400, Rodrigo Vivi wrote:
> On Fri, Oct 17, 2025 at 07:51:47PM +0300, Ville Syrjälä wrote:
> > On Fri, Oct 17, 2025 at 09:59:48PM +0530, K V P, Satyanarayana wrote:
> > >
> > >
> > > On 17-10-2025 20:56, Ville Syrjälä wrote:
> > > > On Fri, Oct 17, 2025 at 08:46:37PM +0530, K V P, Satyanarayana wrote:
> > > >>
> > > >>
> > > >> On 17-10-2025 19:57, Ville Syrjälä wrote:
> > > >>> On Fri, Oct 17, 2025 at 07:42:28PM +0530, Satyanarayana K V P wrote:
> > > >>>> The CCS copy command is a 5-dword sequence. If the vCPU halts during
> > > >>>> save/restore while this sequence is being programmed, partial writes may
> > > >>>> trigger page faults when saving IGPU CCS metadata. Use the VMOVDQU
> > > >>>> instruction to write the sequence atomically.
> > > >>>
> > > >>> If this whole thing is so racy why don't you always add a new
> > > >>> BB_END after new commands, and only replace the previous BB_END
> > > >>> with NOOP _after_ the new commands have been fully written?
> > > >>>
> > > >> We maintain a suballocator for batch buffer management, with size
> > > >> proportional to system memory (e.g., 16MB suballocator for 8GB SMEM).
> > > >> Batch buffers are dynamically allocated from this pool based on the
> > > >> number of active workloads. The entire suballocator region is submitted
> > > >> to hardware for CCS metadata copy operations.
> > > >
> > > > You *overwrite* the previous BB_END after the new commands have been
> > > > appended.
> > > We do not know where the new BB allocation will be. It may not be
> > > sequential, and every BO has a BB. BBs are allocated and freed often,
> > > based on BOs getting created and destroyed. So, we can't use that approach.
> >
> > Hmm, could perhaps use second level batches then. Each BO would get
> > its own second level batch, and the first level would just call them
> > in sequence. Or is this already running as a second level batch?
>
> This I'm not sure...
>
> Matt, do you know?

My understanding of this feature is that we create two additional
contexts (LRC's) and tell the GuC that they're special --- one should be
scheduled whenever a VF is being stopped and the other should be
scheduled when a VF is being started.  The intent is to use these
contexts to do a "context switch" of the VF's CCS data --- saving it out
when the VF is stopping and bringing it back in when the VF is resumed.

From a hardware point of view I think we could handle things however we
like in the LRC's ring and/or batch buffers.  We could add all the
necessary copy commands directly to the LRCs' rings if we wanted, or we
can add them to batchbuffers, or add them to 2- or 3-level nested
batchbuffers.

The current approach actually taken appears to be to allocate a large
chunk of memory (ctx->mem.ccs_bb_pool) and do an MI_BATCH_BUFFER_START
off to it.  That ccs_bb_pool is originally full of NOOP (nothing to do),
but as the VF allocates buffers, suballocations of the pool are done for
each buffer allocated, and those suballocations are filled with the
necessary commands to copy the CCS data in/out.  When a buffer is
released by the VF, its suballocation of the pool is wiped over with
NOOPs.
Any time a VF starts/stops, the save and restore LRCs get scheduled by
the GuC and execute their entire ccs_bb_pool as a single batchbuffer
(mostly containing noops, but with sections of copy instructions
scattered around).

     Save LRC
   H +-----------------------+
     | MI_BATCH_BUFFER_START |-------> +-------------------------------+
   T +-----------------------+         |            <noops>            |
                                       +-------------------------------+
                                       | instr's to copy CCS for a bo  |
                                       +-------------------------------+
                                       |            <noops>            |
                                       +-------------------------------+
                                       | instr's to copy CCS for a bo  |
                                       +-------------------------------+
                                       |            <noops>            |
                                       +-------------------------------+
                                       ~              ...              ~
                                       +-------------------------------+

So the problem this patch is trying to address arises when a
suballocation of the pool has been made and the CPU is still in the
process of poking instructions into it when the VF is stopped; part of
the copy instructions will have landed in memory, other parts may not
have.  The GuC will still execute the entire pool as one giant
batchbuffer, but the subsection that was being updated will be
incomplete, possibly in harmful ways (e.g., an instruction started, but
the addresses it references not yet filled in).

The architecture document for this feature suggests the following:

"""
VF KMD can utilize standard cmd programming technique like Shadow cmd
buffer or ping/pong and swap BB_Start address to avoid partially-updated
BB is executed by GuC in case of middle update pause.
"""

So it sounds like you could have two "save LRCs," make the updates to
the one that's currently inactive, and then tell the GuC to replace the
current save context with the other one once you finish an update (if
the GuC interface lets you do that --- I haven't checked).  Any time the
GuC actually starts running something, it's a complete LRC, ring, and
batchbuffer, and there are no racing updates to those from the VF KMD.
Alternatively, you could stick with a single context for each, but just
create a shadow batch buffer instead of a whole shadow context, and then
patch the address that the MI_BATCH_BUFFER_START is jumping to after you
update the inactive buffer.  If the batchbuffer is in the GGTT, then we
only need an uninterrupted 32-bit update, since the upper dword of the
address is always 0.

There are probably also ways you could use MI_PREDICATE and/or
MI_COND_BATCH_BUFFER_END to make the hardware execute the "ready"
batchbuffer and not the "inactive, updating" batchbuffer without having
to go back and patch the ring contents via the CPU, but I'd need to
refresh my memory on the exact usage of those kinds of instructions.

Matt

> >
> > It might also be getting a bit complicated I guess, but at least it
> > wouldn't have all obvious problems of the SIMD stuff:
> > - looks like it will explode on non-AVX capable x86
> > - will be broken on other arches until someone implements the equivalent
> >   code (assuming the arch has such an atomic copy instruction
> >   and supports in kernel SIMD stuff sufficiently to use it)
>
> This is Pantherlake only. And the reason why I asked to add a
> check with error/warn for IS_DGFX()...
>
> which by the way is an assert... I still don't believe it is enough.
> I believe a return with warn_on seems more appropriate to really
> never try to run that code in case of a big future mistake.
>
> > > >
> > > > -Satya.>
> > > >> Instead, a single BB_END marker is placed at the suballocator's end to
> > > >> terminate execution.
> > > >>
> > > >> This patch ensures race-condition-safe CCS metadata save/restore
> > > >> operations by guaranteeing atomic writes to the batch buffer, preventing
> > > >> corruption regardless of when save/restore operations are triggered.
> > > >>
> > > >> -Satya.>>
> > > >>>> Since VMOVDQU operates on 256-bit chunks, update EMIT_COPY_CCS_DW to emit
> > > >>>> 8 dwords instead of 5 dwords.
> > > >>>> > > > >>>> Update emit_flush_invalidate() to use VMOVDQU operating with 128-bit > > > >>>> chunks. > > > >>>> > > > >>>> Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> > > > >>>> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> > > > >>>> Cc: Matthew Brost <matthew.brost@intel.com> > > > >>>> Cc: Matthew Auld <matthew.auld@intel.com> > > > >>>> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> > > > >>>> Cc: Matt Roper <matthew.d.roper@intel.com> > > > >>>> > > > >>>> --- > > > >>>> V6 -> V7: > > > >>>> - Added description explaining why to use assembly instructions for > > > >>>> atomicity. > > > >>>> - Assert if DGFX tries to use memcpy_vmovdqu(). (Rodrigo) > > > >>>> - Include <asm/cpufeature.h> though checkpatch complains. With > > > >>>> <linux/cpufeature.h> KUnit is throwing errors. > > > >>>> > > > >>>> V5 -> V6: > > > >>>> - Fixed review comments (Rodrigo) > > > >>>> > > > >>>> V4 -> V5: > > > >>>> - Fixed review comments. (Matt B) > > > >>>> > > > >>>> V3 -> V4: > > > >>>> - Fixed review comments. (Wajdeczko) > > > >>>> - Fix issues reported by patchworks. > > > >>>> > > > >>>> V2 -> V3: > > > >>>> - Added support for 128 bit and 256 bit instructions with memcpy_vmovdqu > > > >>>> - Updated emit_flush_invalidate() to use vmovdqu instruction. > > > >>>> > > > >>>> V1 -> V2: > > > >>>> - Use memcpy_vmovdqu only for x86 arch and for VF. Else use memcpy > > > >>>> (Auld, Matthew) > > > >>>> - Fix issues reported by patchworks. 
> > > >>>> --- > > > >>>> drivers/gpu/drm/xe/xe_migrate.c | 112 ++++++++++++++++++++++++++------ > > > >>>> 1 file changed, 91 insertions(+), 21 deletions(-) > > > >>>> > > > >>>> diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c > > > >>>> index 3112c966c67d..e0be7396a0ab 100644 > > > >>>> --- a/drivers/gpu/drm/xe/xe_migrate.c > > > >>>> +++ b/drivers/gpu/drm/xe/xe_migrate.c > > > >>>> @@ -5,6 +5,8 @@ > > > >>>> > > > >>>> #include "xe_migrate.h" > > > >>>> > > > >>>> +#include <asm/fpu/api.h> > > > >>>> +#include <asm/cpufeature.h> > > > >>>> #include <linux/bitfield.h> > > > >>>> #include <linux/sizes.h> > > > >>>> > > > >>>> @@ -33,6 +35,7 @@ > > > >>>> #include "xe_res_cursor.h" > > > >>>> #include "xe_sa.h" > > > >>>> #include "xe_sched_job.h" > > > >>>> +#include "xe_sriov_vf_ccs.h" > > > >>>> #include "xe_sync.h" > > > >>>> #include "xe_trace_bo.h" > > > >>>> #include "xe_validation.h" > > > >>>> @@ -657,18 +660,68 @@ static void emit_pte(struct xe_migrate *m, > > > >>>> } > > > >>>> } > > > >>>> > > > >>>> -#define EMIT_COPY_CCS_DW 5 > > > >>>> +/* > > > >>>> + * VF KMD registers two specialized LRCs with the GuC to handle save/restore > > > >>>> + * operations for CCS metadata on IGPU. The GuC executes these LRCAs during > > > >>>> + * VF state/restore operations. > > > >>>> + * > > > >>>> + * Each LRC contains a batch buffer pool that GuC submits to hardware during > > > >>>> + * VF state save/restore operations. Since these operations can occur > > > >>>> + * asynchronously at any time, we must ensure GPU instructions in the batch > > > >>>> + * buffer are written atomically to prevent corruption from incomplete writes. > > > >>>> + * > > > >>>> + * To guarantee atomic instruction writes, we use x86 SIMD instructions > > > >>>> + * (128-bit XMM and 256-bit YMM) within kernel_fpu_begin()/kernel_fpu_end() > > > >>>> + * sections. 
This prevents vCPU preemption during instruction generation, > > > >>>> + * ensuring complete GPU commands are written to the batch buffer. > > > >>>> + */ > > > >>>> + > > > >>>> +static void memcpy_vmovdqu(struct xe_device *xe, void *dst, const void *src, u32 size) > > > >>>> +{ > > > >>>> + xe_assert(xe, !IS_DGFX(xe)); > > > >>>> +#ifdef CONFIG_X86 > > > >>>> + kernel_fpu_begin(); > > > >>>> + if (size == SZ_128) { > > > >>>> + asm("vmovdqu (%0), %%xmm0\n" > > > >>>> + "vmovups %%xmm0, (%1)\n" > > > >>>> + :: "r" (src), "r" (dst) : "memory"); > > > >>>> + } else if (size == SZ_256) { > > > >>>> + asm("vmovdqu (%0), %%ymm0\n" > > > >>>> + "vmovups %%ymm0, (%1)\n" > > > >>>> + :: "r" (src), "r" (dst) : "memory"); > > > >>>> + } > > > >>>> + kernel_fpu_end(); > > > >>>> +#endif > > > >>>> +} > > > >>>> + > > > >>>> +static void emit_atomic(struct xe_gt *gt, void *dst, const void *src, u32 size) > > > >>>> +{ > > > >>>> + u32 instr_size = size * BITS_PER_BYTE; > > > >>>> + > > > >>>> + xe_gt_assert(gt, instr_size == SZ_128 || instr_size == SZ_256); > > > >>>> + > > > >>>> + if (IS_VF_CCS_READY(gt_to_xe(gt))) { > > > >>>> + xe_gt_assert(gt, static_cpu_has(X86_FEATURE_AVX)); > > > >>>> + memcpy_vmovdqu(gt_to_xe(gt), dst, src, instr_size); > > > >>>> + } else { > > > >>>> + memcpy(dst, src, size); > > > >>>> + } > > > >>>> +} > > > >>>> + > > > >>>> +#define EMIT_COPY_CCS_DW 8 > > > >>>> static void emit_copy_ccs(struct xe_gt *gt, struct xe_bb *bb, > > > >>>> u64 dst_ofs, bool dst_is_indirect, > > > >>>> u64 src_ofs, bool src_is_indirect, > > > >>>> u32 size) > > > >>>> { > > > >>>> + u32 dw[EMIT_COPY_CCS_DW] = {MI_NOOP}; > > > >>>> struct xe_device *xe = gt_to_xe(gt); > > > >>>> u32 *cs = bb->cs + bb->len; > > > >>>> u32 num_ccs_blks; > > > >>>> u32 num_pages; > > > >>>> u32 ccs_copy_size; > > > >>>> u32 mocs; > > > >>>> + u32 i = 0; > > > >>>> > > > >>>> if (GRAPHICS_VERx100(xe) >= 2000) { > > > >>>> num_pages = DIV_ROUND_UP(size, XE_PAGE_SIZE); > > > >>>> @@ -686,15 
+739,23 @@ static void emit_copy_ccs(struct xe_gt *gt, struct xe_bb *bb, > > > >>>> mocs = FIELD_PREP(XY_CTRL_SURF_MOCS_MASK, gt->mocs.uc_index); > > > >>>> } > > > >>>> > > > >>>> - *cs++ = XY_CTRL_SURF_COPY_BLT | > > > >>>> - (src_is_indirect ? 0x0 : 0x1) << SRC_ACCESS_TYPE_SHIFT | > > > >>>> - (dst_is_indirect ? 0x0 : 0x1) << DST_ACCESS_TYPE_SHIFT | > > > >>>> - ccs_copy_size; > > > >>>> - *cs++ = lower_32_bits(src_ofs); > > > >>>> - *cs++ = upper_32_bits(src_ofs) | mocs; > > > >>>> - *cs++ = lower_32_bits(dst_ofs); > > > >>>> - *cs++ = upper_32_bits(dst_ofs) | mocs; > > > >>>> + dw[i++] = XY_CTRL_SURF_COPY_BLT | > > > >>>> + (src_is_indirect ? 0x0 : 0x1) << SRC_ACCESS_TYPE_SHIFT | > > > >>>> + (dst_is_indirect ? 0x0 : 0x1) << DST_ACCESS_TYPE_SHIFT | > > > >>>> + ccs_copy_size; > > > >>>> + dw[i++] = lower_32_bits(src_ofs); > > > >>>> + dw[i++] = upper_32_bits(src_ofs) | mocs; > > > >>>> + dw[i++] = lower_32_bits(dst_ofs); > > > >>>> + dw[i++] = upper_32_bits(dst_ofs) | mocs; > > > >>>> > > > >>>> + /* > > > >>>> + * The CCS copy command is a 5-dword sequence. If the vCPU halts during > > > >>>> + * save/restore while this sequence is being issued, partial writes may trigger > > > >>>> + * page faults when saving iGPU CCS metadata. Use the VMOVDQU instruction to > > > >>>> + * write the sequence atomically. > > > >>>> + */ > > > >>>> + emit_atomic(gt, cs, dw, sizeof(dw)); > > > >>>> + cs += EMIT_COPY_CCS_DW; > > > >>>> bb->len = cs - bb->cs; > > > >>>> } > > > >>>> > > > >>>> @@ -1006,18 +1067,27 @@ static u64 migrate_vm_ppgtt_addr_tlb_inval(void) > > > >>>> return (NUM_KERNEL_PDE - 2) * XE_PAGE_SIZE; > > > >>>> } > > > >>>> > > > >>>> -static int emit_flush_invalidate(u32 *dw, int i, u32 flags) > > > >>>> +/* > > > >>>> + * The MI_FLUSH_DW command is a 4-dword sequence. If the vCPU halts during > > > >>>> + * save/restore while this sequence is being issued, partial writes may > > > >>>> + * trigger page faults when saving iGPU CCS metadata. 
Use > > > >>>> + * emit_atomic() to write the sequence atomically. > > > >>>> + */ > > > >>>> +#define EMIT_FLUSH_INVALIDATE_DW 4 > > > >>>> +static int emit_flush_invalidate(struct xe_exec_queue *q, u32 *cs, int i, u32 flags) > > > >>>> { > > > >>>> u64 addr = migrate_vm_ppgtt_addr_tlb_inval(); > > > >>>> + u32 dw[EMIT_FLUSH_INVALIDATE_DW] = {MI_NOOP}, j = 0; > > > >>>> + > > > >>>> + dw[j++] = MI_FLUSH_DW | MI_INVALIDATE_TLB | MI_FLUSH_DW_OP_STOREDW | > > > >>>> + MI_FLUSH_IMM_DW | flags; > > > >>>> + dw[j++] = lower_32_bits(addr); > > > >>>> + dw[j++] = upper_32_bits(addr); > > > >>>> + dw[j++] = MI_NOOP; > > > >>>> > > > >>>> - dw[i++] = MI_FLUSH_DW | MI_INVALIDATE_TLB | MI_FLUSH_DW_OP_STOREDW | > > > >>>> - MI_FLUSH_IMM_DW | flags; > > > >>>> - dw[i++] = lower_32_bits(addr); > > > >>>> - dw[i++] = upper_32_bits(addr); > > > >>>> - dw[i++] = MI_NOOP; > > > >>>> - dw[i++] = MI_NOOP; > > > >>>> + emit_atomic(q->gt, &cs[i], dw, sizeof(dw)); > > > >>>> > > > >>>> - return i; > > > >>>> + return i + j; > > > >>>> } > > > >>>> > > > >>>> /** > > > >>>> @@ -1062,7 +1132,7 @@ int xe_migrate_ccs_rw_copy(struct xe_tile *tile, struct xe_exec_queue *q, > > > >>>> /* Calculate Batch buffer size */ > > > >>>> batch_size = 0; > > > >>>> while (size) { > > > >>>> - batch_size += 10; /* Flush + ggtt addr + 2 NOP */ > > > >>>> + batch_size += EMIT_FLUSH_INVALIDATE_DW * 2; /* Flush + ggtt addr + 1 NOP */ > > > >>>> u64 ccs_ofs, ccs_size; > > > >>>> u32 ccs_pt; > > > >>>> > > > >>>> @@ -1103,7 +1173,7 @@ int xe_migrate_ccs_rw_copy(struct xe_tile *tile, struct xe_exec_queue *q, > > > >>>> * sizes here again before copy command is emitted. 
> > > >>>> */ > > > >>>> while (size) { > > > >>>> - batch_size += 10; /* Flush + ggtt addr + 2 NOP */ > > > >>>> + batch_size += EMIT_FLUSH_INVALIDATE_DW * 2; /* Flush + ggtt addr + 1 NOP */ > > > >>>> u32 flush_flags = 0; > > > >>>> u64 ccs_ofs, ccs_size; > > > >>>> u32 ccs_pt; > > > >>>> @@ -1126,11 +1196,11 @@ int xe_migrate_ccs_rw_copy(struct xe_tile *tile, struct xe_exec_queue *q, > > > >>>> > > > >>>> emit_pte(m, bb, ccs_pt, false, false, &ccs_it, ccs_size, src); > > > >>>> > > > >>>> - bb->len = emit_flush_invalidate(bb->cs, bb->len, flush_flags); > > > >>>> + bb->len = emit_flush_invalidate(q, bb->cs, bb->len, flush_flags); > > > >>>> flush_flags = xe_migrate_ccs_copy(m, bb, src_L0_ofs, src_is_pltt, > > > >>>> src_L0_ofs, dst_is_pltt, > > > >>>> src_L0, ccs_ofs, true); > > > >>>> - bb->len = emit_flush_invalidate(bb->cs, bb->len, flush_flags); > > > >>>> + bb->len = emit_flush_invalidate(q, bb->cs, bb->len, flush_flags); > > > >>>> > > > >>>> size -= src_L0; > > > >>>> } > > > >>>> -- > > > >>>> 2.51.0 > > > >>> > > > > > > > > -- > > Ville Syrjälä > > Intel -- Matt Roper Graphics Software Engineer Linux GPU Platform Enablement Intel Corporation ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v7 1/3] drm/xe/migrate: Atomicize CCS copy command setup 2025-10-17 22:35 ` Matt Roper @ 2025-10-17 22:59 ` Matthew Brost 0 siblings, 0 replies; 21+ messages in thread From: Matthew Brost @ 2025-10-17 22:59 UTC (permalink / raw) To: Matt Roper Cc: Rodrigo Vivi, Ville Syrjälä, K V P, Satyanarayana, intel-xe, Michal Wajdeczko, Matthew Auld On Fri, Oct 17, 2025 at 03:35:16PM -0700, Matt Roper wrote: > On Fri, Oct 17, 2025 at 02:21:59PM -0400, Rodrigo Vivi wrote: > > On Fri, Oct 17, 2025 at 07:51:47PM +0300, Ville Syrjälä wrote: > > > On Fri, Oct 17, 2025 at 09:59:48PM +0530, K V P, Satyanarayana wrote: > > > > > > > > > > > > On 17-10-2025 20:56, Ville Syrjälä wrote: > > > > > On Fri, Oct 17, 2025 at 08:46:37PM +0530, K V P, Satyanarayana wrote: > > > > >> > > > > >> > > > > >> On 17-10-2025 19:57, Ville Syrjälä wrote: > > > > >>> On Fri, Oct 17, 2025 at 07:42:28PM +0530, Satyanarayana K V P wrote: > > > > >>>> The CCS copy command is a 5-dword sequence. If the vCPU halts during > > > > >>>> save/restore while this sequence is being programmed, partial writes may > > > > >>>> trigger page faults when saving IGPU CCS metadata. Use the VMOVDQU > > > > >>>> instruction to write the sequence atomically. > > > > >>> > > > > >>> If this whole thing is so racy why don't you always add a new > > > > >>> BB_END after new commands, and only replace the previous BB_END > > > > >>> with NOOP _after_ the new commands have been fully written? > > > > >>> > > > > >> We maintain a suballocator for batch buffer management, with size > > > > >> proportional to system memory (e.g., 16MB suballocator for 8GB SMEM). > > > > >> Batch buffers are dynamically allocated from this pool based on the > > > > >> number of active workloads. The entire suballocator region is submitted > > > > >> to hardware for CCS metadata copy operations. 
> > > > >> > > > > >> We cannot insert BB_END commands after each individual instruction > > > > >> sequence because additional GPU instructions may be appended later. > > > > > > > > > > You *overwrite* the previous BB_END after the new commands have been > > > > > appended. > > > > We do not know where the new BB allocation will be. It may not be > > > > sequential and every BO has a BB. BBs are allocated and freed so often > > > > based on BOs getting created and destroyed. So, we can't use that approach. > > > > > > Hmm, could perhaps use second level batches then. Each BO would get > > > its own second level batch, and the first level would just call them > > > in sequence. Or is this already running as a second level batch? > > > > This I'm not sure... > > > > Matt, do you know? > > My understanding of this feature is that we create two additional > contexts (LRCs) and tell the GuC that they're special --- one should be > scheduled whenever a VF is being stopped and the other should be > scheduled when a VF is being started. The intent is to use these > contexts to do a "context switch" of the VF's CCS data --- saving it out > when the VF is stopping and bringing it back in when the VF is resumed. > > From a hardware point of view I think we could handle things however we > like in the LRC's ring and/or batch buffers. We could add all the > necessary copy commands directly to the LRCs' rings if we wanted, or we > can add them to batchbuffers, or add them to 2- or 3-level nested > batchbuffers. The current approach actually taken appears to be to > allocate a large chunk of memory (ctx->mem.ccs_bb_pool) and do an > MI_BATCH_BUFFER_START off to it. That ccs_bb_pool is originally full of > NOOP (nothing to do), but as the VF allocates buffers, suballocations of > the pool are done for each buffer allocated, and those suballocations > are filled with the necessary commands to copy the CCS data in/out.
> When a buffer is released by the VF, its suballocation of the pool is
> wiped over with NOOPs. Any time a VF starts/stops, the save and restore
> LRCs get scheduled by the GuC and execute their entire ccs_bb_pool as a
> single batchbuffer (mostly containing noops, but with sections of copy
> instructions scattered around).
>
>     Save LRC
>  H +-----------------------+
>    | MI_BATCH_BUFFER_START |-------> +-------------------------------+
>  T +-----------------------+         |            <noops>            |
>                                      +-------------------------------+
>                                      | instr's to copy CCS for a bo  |
>                                      +-------------------------------+
>                                      |            <noops>            |
>                                      +-------------------------------+
>                                      | instr's to copy CCS for a bo  |
>                                      +-------------------------------+
>                                      |            <noops>            |
>                                      +-------------------------------+
>                                      ~              ...              ~
>                                      +-------------------------------+
>
> So the problem this patch is trying to address is when a suballocation
> of the pool is made and the CPU is in the process of poking
> instructions into it when the VF is stopped; part of the copy
> instructions will have landed in memory, other parts may not have.
> But the GuC will still execute the entire pool as one giant batchbuffer,
> and the subsection that was being updated will be incomplete, possibly
> in harmful ways (e.g., an instruction started, but the addresses it
> referenced not yet filled in).
>
> The architecture document for this feature suggests the following:
>
> """
> VF KMD can utilize standard cmd programming technique like
> Shadow cmd buffer or ping/pong and swap BB_Start address to
> avoid partially-updated BB is executed by GuC in case of middle
> update pause.
> """

I totally missed this in the SaS. I wonder if this was added recently or I'm just bad at reading.
> > So it sounds like you could have two "save LRCs," make the updates to > the one that's currently inactive, and then tell the GuC to replace the > current save context with the other one once you finish an update (if > the GuC interface lets you do that --- I haven't checked). Any time the > GuC actually starts running something, it's a complete LRC, ring, and > batchbuffer, and there are no racing updates to those from the VF KMD. > > Alternatively, you could stick with a single context for each, but just > create a shadow batch buffer instead of a whole shadow context and then > patch the address that the MI_BATCH_BUFFER_START is jumping to after you > update the inactive buffer. If the batchbuffer is in the GGTT, then we > only need an uninterrupted 32-bit update since the upper dword of the > address is always 0. Yes, I don't think it would be "two saved LRCs" but rather "two saved BOs". On the surface, using two BOs sounds promising, but we’d need a device-level lock to ensure that at most one thread controls which BO is the shadow and which one the GuC sees. That might be fine, but it’s something that needs to be considered. Matt
And the reason why I asked to add a > > check with error/warn for IS_DGFX()... > > > > which by the way is an assert... I still don't believe it is enough. > > I believe a return with warn_on seems more appropriate to really > > never try to run that code in case of a big future mistake. > > > > [snip: remainder of Matt Roper's mail, quoting the full v7 patch verbatim as above] > -- > Matt Roper > Graphics Software Engineer > Linux GPU Platform Enablement > Intel Corporation ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v7 1/3] drm/xe/migrate: Atomicize CCS copy command setup 2025-10-17 14:12 ` [PATCH v7 1/3] " Satyanarayana K V P 2025-10-17 14:27 ` Ville Syrjälä @ 2025-10-17 18:11 ` Ville Syrjälä 2025-10-17 18:24 ` Rodrigo Vivi 1 sibling, 1 reply; 21+ messages in thread From: Ville Syrjälä @ 2025-10-17 18:11 UTC (permalink / raw) To: Satyanarayana K V P Cc: intel-xe, Michal Wajdeczko, Matthew Brost, Matthew Auld, Rodrigo Vivi, Matt Roper On Fri, Oct 17, 2025 at 07:42:28PM +0530, Satyanarayana K V P wrote: > The CCS copy command is a 5-dword sequence. If the vCPU halts during > save/restore while this sequence is being programmed, partial writes may > trigger page faults when saving IGPU CCS metadata. Use the VMOVDQU > instruction to write the sequence atomically. > > Since VMOVDQU operates on 256-bit chunks, update EMIT_COPY_CCS_DW to emit > 8 dwords instead of 5 dwords. > > Update emit_flush_invalidate() to use VMOVDQU operating with 128-bit > chunks. > > Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> > Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> > Cc: Matthew Brost <matthew.brost@intel.com> > Cc: Matthew Auld <matthew.auld@intel.com> > Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> > Cc: Matt Roper <matthew.d.roper@intel.com> > > --- > V6 -> V7: > - Added description explaining why to use assembly instructions for > atomicity. > - Assert if DGFX tries to use memcpy_vmovdqu(). (Rodrigo) > - Include <asm/cpufeature.h> though checkpatch complains. With > <linux/cpufeature.h> KUnit is throwing errors. > > V5 -> V6: > - Fixed review comments (Rodrigo) > > V4 -> V5: > - Fixed review comments. (Matt B) > > V3 -> V4: > - Fixed review comments. (Wajdeczko) > - Fix issues reported by patchworks. > > V2 -> V3: > - Added support for 128 bit and 256 bit instructions with memcpy_vmovdqu > - Updated emit_flush_invalidate() to use vmovdqu instruction. > > V1 -> V2: > - Use memcpy_vmovdqu only for x86 arch and for VF. 
Else use memcpy
> (Auld, Matthew)
> - Fix issues reported by patchworks.
> ---
>  drivers/gpu/drm/xe/xe_migrate.c | 112 ++++++++++++++++++++++++++------
>  1 file changed, 91 insertions(+), 21 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
> index 3112c966c67d..e0be7396a0ab 100644
> --- a/drivers/gpu/drm/xe/xe_migrate.c
> +++ b/drivers/gpu/drm/xe/xe_migrate.c
> @@ -5,6 +5,8 @@
>  
>  #include "xe_migrate.h"
>  
> +#include <asm/fpu/api.h>
> +#include <asm/cpufeature.h>
>  #include <linux/bitfield.h>
>  #include <linux/sizes.h>
>  
> @@ -33,6 +35,7 @@
>  #include "xe_res_cursor.h"
>  #include "xe_sa.h"
>  #include "xe_sched_job.h"
> +#include "xe_sriov_vf_ccs.h"
>  #include "xe_sync.h"
>  #include "xe_trace_bo.h"
>  #include "xe_validation.h"
> @@ -657,18 +660,68 @@ static void emit_pte(struct xe_migrate *m,
>  	}
>  }
>  
> -#define EMIT_COPY_CCS_DW 5
> +/*
> + * VF KMD registers two specialized LRCs with the GuC to handle save/restore
> + * operations for CCS metadata on IGPU. The GuC executes these LRCAs during
> + * VF save/restore operations.
> + *
> + * Each LRC contains a batch buffer pool that GuC submits to hardware during
> + * VF state save/restore operations. Since these operations can occur
> + * asynchronously at any time, we must ensure GPU instructions in the batch
> + * buffer are written atomically to prevent corruption from incomplete writes.
> + *
> + * To guarantee atomic instruction writes, we use x86 SIMD instructions
> + * (128-bit XMM and 256-bit YMM) within kernel_fpu_begin()/kernel_fpu_end()
> + * sections. This prevents vCPU preemption during instruction generation,
> + * ensuring complete GPU commands are written to the batch buffer.
> + */
> +
> +static void memcpy_vmovdqu(struct xe_device *xe, void *dst, const void *src, u32 size)
> +{
> +	xe_assert(xe, !IS_DGFX(xe));
> +#ifdef CONFIG_X86
> +	kernel_fpu_begin();
> +	if (size == SZ_128) {
> +		asm("vmovdqu (%0), %%xmm0\n"
> +		    "vmovups %%xmm0, (%1)\n"
> +		    :: "r" (src), "r" (dst) : "memory");

AFAICS atomicity guarantee is only given for the aligned variants.

> +	} else if (size == SZ_256) {
> +		asm("vmovdqu (%0), %%ymm0\n"
> +		    "vmovups %%ymm0, (%1)\n"
> +		    :: "r" (src), "r" (dst) : "memory");

There is no 32B atomicity guarantee listed in the docs.

The only bigger guaranteed atomic thing I can see is
MOVDIR64B but dunno what subset of CPUs have that.

> +	}
> +	kernel_fpu_end();
> +#endif
> +}
> +
> +static void emit_atomic(struct xe_gt *gt, void *dst, const void *src, u32 size)
> +{
> +	u32 instr_size = size * BITS_PER_BYTE;
> +
> +	xe_gt_assert(gt, instr_size == SZ_128 || instr_size == SZ_256);
> +
> +	if (IS_VF_CCS_READY(gt_to_xe(gt))) {
> +		xe_gt_assert(gt, static_cpu_has(X86_FEATURE_AVX));
> +		memcpy_vmovdqu(gt_to_xe(gt), dst, src, instr_size);
> +	} else {
> +		memcpy(dst, src, size);
> +	}
> +}
> +
> +#define EMIT_COPY_CCS_DW 8
>  static void emit_copy_ccs(struct xe_gt *gt, struct xe_bb *bb,
>  			  u64 dst_ofs, bool dst_is_indirect,
>  			  u64 src_ofs, bool src_is_indirect,
>  			  u32 size)
>  {
> +	u32 dw[EMIT_COPY_CCS_DW] = {MI_NOOP};
>  	struct xe_device *xe = gt_to_xe(gt);
>  	u32 *cs = bb->cs + bb->len;
>  	u32 num_ccs_blks;
>  	u32 num_pages;
>  	u32 ccs_copy_size;
>  	u32 mocs;
> +	u32 i = 0;
>  
>  	if (GRAPHICS_VERx100(xe) >= 2000) {
>  		num_pages = DIV_ROUND_UP(size, XE_PAGE_SIZE);
> @@ -686,15 +739,23 @@ static void emit_copy_ccs(struct xe_gt *gt, struct xe_bb *bb,
>  		mocs = FIELD_PREP(XY_CTRL_SURF_MOCS_MASK, gt->mocs.uc_index);
>  	}
>  
> -	*cs++ = XY_CTRL_SURF_COPY_BLT |
> -		(src_is_indirect ? 0x0 : 0x1) << SRC_ACCESS_TYPE_SHIFT |
> -		(dst_is_indirect ? 0x0 : 0x1) << DST_ACCESS_TYPE_SHIFT |
> -		ccs_copy_size;
> -	*cs++ = lower_32_bits(src_ofs);
> -	*cs++ = upper_32_bits(src_ofs) | mocs;
> -	*cs++ = lower_32_bits(dst_ofs);
> -	*cs++ = upper_32_bits(dst_ofs) | mocs;
> +	dw[i++] = XY_CTRL_SURF_COPY_BLT |
> +		  (src_is_indirect ? 0x0 : 0x1) << SRC_ACCESS_TYPE_SHIFT |
> +		  (dst_is_indirect ? 0x0 : 0x1) << DST_ACCESS_TYPE_SHIFT |
> +		  ccs_copy_size;
> +	dw[i++] = lower_32_bits(src_ofs);
> +	dw[i++] = upper_32_bits(src_ofs) | mocs;
> +	dw[i++] = lower_32_bits(dst_ofs);
> +	dw[i++] = upper_32_bits(dst_ofs) | mocs;
>  
> +	/*
> +	 * The CCS copy command is a 5-dword sequence. If the vCPU halts during
> +	 * save/restore while this sequence is being issued, partial writes may trigger
> +	 * page faults when saving iGPU CCS metadata. Use the VMOVDQU instruction to
> +	 * write the sequence atomically.
> +	 */
> +	emit_atomic(gt, cs, dw, sizeof(dw));
> +	cs += EMIT_COPY_CCS_DW;
>  	bb->len = cs - bb->cs;
>  }
>  
> @@ -1006,18 +1067,27 @@ static u64 migrate_vm_ppgtt_addr_tlb_inval(void)
>  	return (NUM_KERNEL_PDE - 2) * XE_PAGE_SIZE;
>  }
>  
> -static int emit_flush_invalidate(u32 *dw, int i, u32 flags)
> +/*
> + * The MI_FLUSH_DW command is a 4-dword sequence. If the vCPU halts during
> + * save/restore while this sequence is being issued, partial writes may
> + * trigger page faults when saving iGPU CCS metadata. Use
> + * emit_atomic() to write the sequence atomically.
> + */
> +#define EMIT_FLUSH_INVALIDATE_DW 4
> +static int emit_flush_invalidate(struct xe_exec_queue *q, u32 *cs, int i, u32 flags)
>  {
>  	u64 addr = migrate_vm_ppgtt_addr_tlb_inval();
> +	u32 dw[EMIT_FLUSH_INVALIDATE_DW] = {MI_NOOP}, j = 0;
> +
> +	dw[j++] = MI_FLUSH_DW | MI_INVALIDATE_TLB | MI_FLUSH_DW_OP_STOREDW |
> +		  MI_FLUSH_IMM_DW | flags;
> +	dw[j++] = lower_32_bits(addr);
> +	dw[j++] = upper_32_bits(addr);
> +	dw[j++] = MI_NOOP;
>  
> -	dw[i++] = MI_FLUSH_DW | MI_INVALIDATE_TLB | MI_FLUSH_DW_OP_STOREDW |
> -		  MI_FLUSH_IMM_DW | flags;
> -	dw[i++] = lower_32_bits(addr);
> -	dw[i++] = upper_32_bits(addr);
> -	dw[i++] = MI_NOOP;
> -	dw[i++] = MI_NOOP;
> +	emit_atomic(q->gt, &cs[i], dw, sizeof(dw));
>  
> -	return i;
> +	return i + j;
>  }
>  
>  /**
> @@ -1062,7 +1132,7 @@ int xe_migrate_ccs_rw_copy(struct xe_tile *tile, struct xe_exec_queue *q,
>  	/* Calculate Batch buffer size */
>  	batch_size = 0;
>  	while (size) {
> -		batch_size += 10; /* Flush + ggtt addr + 2 NOP */
> +		batch_size += EMIT_FLUSH_INVALIDATE_DW * 2; /* Flush + ggtt addr + 1 NOP */
>  		u64 ccs_ofs, ccs_size;
>  		u32 ccs_pt;
>  
> @@ -1103,7 +1173,7 @@ int xe_migrate_ccs_rw_copy(struct xe_tile *tile, struct xe_exec_queue *q,
>  	 * sizes here again before copy command is emitted.
>  	 */
>  	while (size) {
> -		batch_size += 10; /* Flush + ggtt addr + 2 NOP */
> +		batch_size += EMIT_FLUSH_INVALIDATE_DW * 2; /* Flush + ggtt addr + 1 NOP */
>  		u32 flush_flags = 0;
>  		u64 ccs_ofs, ccs_size;
>  		u32 ccs_pt;
> @@ -1126,11 +1196,11 @@ int xe_migrate_ccs_rw_copy(struct xe_tile *tile, struct xe_exec_queue *q,
>  
>  		emit_pte(m, bb, ccs_pt, false, false, &ccs_it, ccs_size, src);
>  
> -		bb->len = emit_flush_invalidate(bb->cs, bb->len, flush_flags);
> +		bb->len = emit_flush_invalidate(q, bb->cs, bb->len, flush_flags);
>  		flush_flags = xe_migrate_ccs_copy(m, bb, src_L0_ofs, src_is_pltt,
>  						  src_L0_ofs, dst_is_pltt,
>  						  src_L0, ccs_ofs, true);
> -		bb->len = emit_flush_invalidate(bb->cs, bb->len, flush_flags);
> +		bb->len = emit_flush_invalidate(q, bb->cs, bb->len, flush_flags);
>  
>  		size -= src_L0;
>  	}
> -- 
> 2.51.0

-- 
Ville Syrjälä
Intel

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: [PATCH v7 1/3] drm/xe/migrate: Atomicize CCS copy command setup
  2025-10-17 18:11   ` Ville Syrjälä
@ 2025-10-17 18:24     ` Rodrigo Vivi
  0 siblings, 0 replies; 21+ messages in thread
From: Rodrigo Vivi @ 2025-10-17 18:24 UTC (permalink / raw)
  To: Ville Syrjälä
  Cc: Satyanarayana K V P, intel-xe, Michal Wajdeczko, Matthew Brost,
	Matthew Auld, Matt Roper

On Fri, Oct 17, 2025 at 09:11:37PM +0300, Ville Syrjälä wrote:
> On Fri, Oct 17, 2025 at 07:42:28PM +0530, Satyanarayana K V P wrote:
> > The CCS copy command is a 5-dword sequence. If the vCPU halts during
> > save/restore while this sequence is being programmed, partial writes may
> > trigger page faults when saving IGPU CCS metadata. Use the VMOVDQU
> > instruction to write the sequence atomically.

[snip]

> > +static void memcpy_vmovdqu(struct xe_device *xe, void *dst, const void *src, u32 size)
> > +{
> > +	xe_assert(xe, !IS_DGFX(xe));
> > +#ifdef CONFIG_X86
> > +	kernel_fpu_begin();
> > +	if (size == SZ_128) {
> > +		asm("vmovdqu (%0), %%xmm0\n"
> > +		    "vmovups %%xmm0, (%1)\n"
> > +		    :: "r" (src), "r" (dst) : "memory");
> 
> AFAICS atomicity guarantee is only given for the aligned variants.

Yes, I already told the same. We should probably avoid at all the word
'atomic' even in the subject and anywhere else in this code. This is not
an atomic memory write. It is just an ugly hack to block the VM-stop in
the middle of the BB write.

> > +	} else if (size == SZ_256) {
> > +		asm("vmovdqu (%0), %%ymm0\n"
> > +		    "vmovups %%ymm0, (%1)\n"
> > +		    :: "r" (src), "r" (dst) : "memory");
> 
> There is no 32B atomicity guarantee listed in the docs.
> 
> The only bigger guaranteed atomic thing I can see is
> MOVDIR64B but dunno what subset of CPUs have that.

[snip]

> -- 
> Ville Syrjälä
> Intel

^ permalink raw reply	[flat|nested] 21+ messages in thread
* [PATCH v7 2/3] drm/xe/migrate: Make emit_pte() header write atomic
  2025-10-17 14:12 [PATCH v7 0/3] drm/xe/migrate: Atomicize CCS copy command setup Satyanarayana K V P
  2025-10-17 14:12 ` [PATCH v7 1/3] " Satyanarayana K V P
@ 2025-10-17 14:12 ` Satyanarayana K V P
  2025-10-17 14:12 ` [PATCH v7 3/3] drm/xe/vf: Clear CCS read/write buffers in atomic way Satyanarayana K V P
  ` (4 subsequent siblings)
  6 siblings, 0 replies; 21+ messages in thread
From: Satyanarayana K V P @ 2025-10-17 14:12 UTC (permalink / raw)
  To: intel-xe; +Cc: Satyanarayana K V P, Michal Wajdeczko, Matthew Brost, Matthew Auld

The MI_STORE_DATA_IMM instruction header is quad dword in size. If the
vCPU halts during save/restore while this sequence is being programmed,
partial writes may trigger page faults when saving IGPU CCS metadata.
Update the instruction header atomically.

Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
---
V6 -> V7:
- None.

V5 -> V6:
- Use emit_atomic() function to write MI_STORE_DATA_IMM instruction
  (Matt B).

V4 -> V5:
- Fixed review comments (Matt B).

V3 -> V4:
- New commit added.

V2 -> V3:
- None

V1 -> V2:
- None
---
 drivers/gpu/drm/xe/xe_migrate.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
index e0be7396a0ab..88634a26ebf5 100644
--- a/drivers/gpu/drm/xe/xe_migrate.c
+++ b/drivers/gpu/drm/xe/xe_migrate.c
@@ -89,6 +89,8 @@ struct xe_migrate {
 #define MAX_NUM_PTE 512
 #define IDENTITY_OFFSET 256ULL
 
+static void emit_atomic(struct xe_gt *gt, void *dst, const void *src, u32 size);
+
 /*
  * Although MI_STORE_DATA_IMM's "length" field is 10-bits, 0x3FE is the largest
  * legal value accepted. Since that instruction field is always stored in
@@ -596,6 +598,7 @@ static u32 pte_update_size(struct xe_migrate *m,
 	return cmds;
 }
 
+#define EMIT_STORE_DATA_IMM_DW 4
 static void emit_pte(struct xe_migrate *m,
 		     struct xe_bb *bb, u32 at_pt,
 		     bool is_vram, bool is_comp_pte,
@@ -619,11 +622,16 @@ static void emit_pte(struct xe_migrate *m,
 	ptes = DIV_ROUND_UP(size, XE_PAGE_SIZE);
 
 	while (ptes) {
+		u32 dw[EMIT_STORE_DATA_IMM_DW] = {MI_NOOP}, i = 0;
 		u32 chunk = min(MAX_PTE_PER_SDI, ptes);
 
-		bb->cs[bb->len++] = MI_STORE_DATA_IMM | MI_SDI_NUM_QW(chunk);
-		bb->cs[bb->len++] = ofs;
-		bb->cs[bb->len++] = 0;
+		dw[i++] = MI_STORE_DATA_IMM | MI_SDI_NUM_QW(chunk);
+		dw[i++] = ofs;
+		dw[i++] = 0;
+
+		emit_atomic(m->q->gt, &bb->cs[bb->len], dw, sizeof(dw));
+
+		bb->len += i;
 
 		cur_ofs = ofs;
 		ofs += chunk * 8;
-- 
2.51.0

^ permalink raw reply related	[flat|nested] 21+ messages in thread
* [PATCH v7 3/3] drm/xe/vf: Clear CCS read/write buffers in atomic way
  2025-10-17 14:12 [PATCH v7 0/3] drm/xe/migrate: Atomicize CCS copy command setup Satyanarayana K V P
  2025-10-17 14:12 ` [PATCH v7 1/3] " Satyanarayana K V P
  2025-10-17 14:12 ` [PATCH v7 2/3] drm/xe/migrate: Make emit_pte() header write atomic Satyanarayana K V P
@ 2025-10-17 14:12 ` Satyanarayana K V P
  2025-10-17 14:17 ` ✗ CI.checkpatch: warning for drm/xe/migrate: Atomicize CCS copy command setup Patchwork
  ` (3 subsequent siblings)
  6 siblings, 0 replies; 21+ messages in thread
From: Satyanarayana K V P @ 2025-10-17 14:12 UTC (permalink / raw)
  To: intel-xe; +Cc: Satyanarayana K V P, Michal Wajdeczko, Matthew Brost, Matthew Auld

Clear the contents of the CCS read/write batch buffer, ensuring no page
faults / GPU hang occur if migration happens midway.

Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
---
V6 -> V7:
- None.

V5 -> V6:
- Used xe_gt_assert() instead of xe_assert() (Matt B).

V4 -> V5:
- Fixed review comments (Matt B).

V3 -> V4:
- New commit added.

V2 -> V3:
- None

V1 -> V2:
- None
---
 drivers/gpu/drm/xe/xe_migrate.c      | 134 +++++++++++++++++++++++++++
 drivers/gpu/drm/xe/xe_migrate.h      |   3 +
 drivers/gpu/drm/xe/xe_sriov_vf_ccs.c |   5 +-
 3 files changed, 141 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
index 88634a26ebf5..8d9d79018555 100644
--- a/drivers/gpu/drm/xe/xe_migrate.c
+++ b/drivers/gpu/drm/xe/xe_migrate.c
@@ -668,6 +668,43 @@ static void emit_pte(struct xe_migrate *m,
 	}
 }
 
+static void emit_pte_clear(struct xe_gt *gt, struct xe_bb *bb, int start_offset,
+			   int end_offset)
+{
+	u32 dw_nop[SZ_2] = {MI_NOOP};
+	int i = start_offset;
+	int len = end_offset;
+	u32 *cs = bb->cs;
+
+	/* Reverses the operations performed by emit_pte() */
+	while (i < len) {
+		u32 dwords, qwords;
+
+		xe_gt_assert(gt, (REG_FIELD_GET(REG_GENMASK(31, 23), cs[i]) == 0x20));
+
+		qwords = REG_FIELD_GET(MI_SDI_LEN_DW, cs[i]);
+		/*
+		 * If Store QW is enabled, then the value of the dw length
+		 * includes the header, address and multiple QW pairs of data
+		 * which means the values will be limited to odd values starting
+		 * at a value of 3 (3 representing the size of a 5 DW command
+		 * including header, 2 dw address and 2 dw data).
+		 */
+		dwords = qwords - 1;
+		/*
+		 * Do not clear header first. Clear PTEs first and then clear the
+		 * header to avoid page faults.
+		 */
+		memset(&cs[i + 3], MI_NOOP, (dwords) * sizeof(u32));
+
+		xe_device_wmb(gt_to_xe(gt));
+		WRITE_ONCE(*(u64 *)&cs[i], *(u64 *)dw_nop);
+
+		cs[i + 2] = MI_NOOP;
+		i += (dwords + 3);
+	}
+}
+
 /*
  * VF KMD registers two specialized LRCs with the GuC to handle save/restore
  * operations for CCS metadata on IGPU. The GuC executes these LRCAs during
@@ -767,6 +804,18 @@ static void emit_copy_ccs(struct xe_gt *gt, struct xe_bb *bb,
 	bb->len = cs - bb->cs;
 }
 
+static u32 emit_copy_ccs_clear(struct xe_gt *gt, struct xe_bb *bb, u32 offset)
+{
+	u32 dw[EMIT_COPY_CCS_DW] = {MI_NOOP};
+	u32 *cs = bb->cs + offset - EMIT_COPY_CCS_DW;
+
+	xe_gt_assert(gt, (REG_FIELD_GET(REG_GENMASK(31, 22), *cs) == 0x148));
+	emit_atomic(gt, cs, dw, sizeof(dw));
+	xe_device_wmb(gt_to_xe(gt));
+
+	return offset - EMIT_COPY_CCS_DW;
+}
+
 #define EMIT_COPY_DW 10
 static void emit_copy(struct xe_gt *gt, struct xe_bb *bb,
 		      u64 src_ofs, u64 dst_ofs, unsigned int size,
@@ -1098,6 +1147,19 @@ static int emit_flush_invalidate(struct xe_exec_queue *q, u32 *cs, int i, u32 fl
 	return i + j;
 }
 
+static u32 emit_flush_invalidate_clear(struct xe_gt *gt, struct xe_bb *bb,
+				       u32 offset)
+{
+	u32 dw[EMIT_FLUSH_INVALIDATE_DW] = {MI_NOOP};
+	u32 *cs = bb->cs + offset - EMIT_FLUSH_INVALIDATE_DW;
+
+	xe_gt_assert(gt, (REG_FIELD_GET(REG_GENMASK(31, 23), *cs) == 0x26));
+
+	emit_atomic(gt, cs, dw, sizeof(dw));
+
+	return offset - EMIT_FLUSH_INVALIDATE_DW;
+}
+
 /**
  * xe_migrate_ccs_rw_copy() - Copy content of TTM resources.
  * @tile: Tile whose migration context to be used.
@@ -1222,6 +1284,78 @@ int xe_migrate_ccs_rw_copy(struct xe_tile *tile, struct xe_exec_queue *q,
 	return err;
 }
 
+static u32 ccs_rw_pte_size(struct xe_gt *gt, struct xe_bb *bb, u32 offset)
+{
+	int len = bb->len;
+	u32 *cs = bb->cs;
+	u32 i = offset;
+
+	while (i < len) {
+		u32 dwords, qwords;
+
+		xe_gt_assert(gt, (REG_FIELD_GET(REG_GENMASK(31, 23), cs[i]) == 0x20));
+
+		qwords = REG_FIELD_GET(MI_SDI_LEN_DW, cs[i]);
+		/*
+		 * If Store QW is enabled, then the value of the dw length
+		 * includes the header, address and multiple QW pairs of data
+		 * which means the values will be limited to odd values starting
+		 * at a value of 3 (3 representing the size of a 5 DW command
+		 * including header, 2 dw address and 2 dw data).
+		 */
+		dwords = qwords - 1;
+		i += dwords + 3;
+
+		/*
+		 * Break if the next dword is for emit_flush_invalidate_clear()
+		 * or emit_copy_ccs_clear()
+		 */
+		if ((REG_FIELD_GET(REG_GENMASK(31, 23), cs[i]) == 0x26) ||
+		    (REG_FIELD_GET(REG_GENMASK(31, 22), cs[i]) == 0x148))
+			break;
+	}
+	return i;
+}
+
+/**
+ * xe_migrate_ccs_rw_copy_clear() - Clear the CCS read/write batch buffer
+ * content.
+ * @tile: Tile whose migration context to be used.
+ * @src_bo: The buffer object @src is currently bound to.
+ * @read_write : Creates BB commands for CCS read/write.
+ *
+ * The CCS copy command has three stages: PTE setup, TLB invalidation, and CCS
+ * copy. Each stage includes a header followed by instructions. When clearing,
+ * remove the instructions first, then the header. For the TLB invalidation and
+ * CCS copy stages, ensure the writes are atomic.
+ *
+ * This reverses the operations performed by xe_migrate_ccs_rw_copy().
+ *
+ * Returns: None.
+ */
+void xe_migrate_ccs_rw_copy_clear(struct xe_tile *tile, struct xe_bo *src_bo,
+				  enum xe_sriov_vf_ccs_rw_ctxs read_write)
+{
+	struct xe_bb *bb = src_bo->bb_ccs[read_write];
+	u32 bb_offset = 0, bb_offset_chunk = 0;
+	struct xe_gt *gt = tile->primary_gt;
+
+	while (bb_offset_chunk >= 0 && bb_offset_chunk < bb->len) {
+		bb_offset = ccs_rw_pte_size(gt, bb, bb_offset_chunk);
+		/*
+		 * After PTE entries, we have one TLB invalidation, CCS copy
+		 * command and another TLB invalidation command.
+		 */
+		bb_offset_chunk = bb_offset + EMIT_FLUSH_INVALIDATE_DW +
+				  EMIT_COPY_CCS_DW + EMIT_FLUSH_INVALIDATE_DW;
+
+		bb_offset = emit_flush_invalidate_clear(gt, bb, bb_offset_chunk);
+		bb_offset = emit_copy_ccs_clear(gt, bb, bb_offset);
+		bb_offset = emit_flush_invalidate_clear(gt, bb, bb_offset);
+		emit_pte_clear(gt, bb, bb_offset_chunk, bb_offset);
+	}
+}
+
 /**
  * xe_get_migrate_exec_queue() - Get the execution queue from migrate context.
  * @migrate: Migrate context.
diff --git a/drivers/gpu/drm/xe/xe_migrate.h b/drivers/gpu/drm/xe/xe_migrate.h
index 4fad324b6253..7d3d4c5109dd 100644
--- a/drivers/gpu/drm/xe/xe_migrate.h
+++ b/drivers/gpu/drm/xe/xe_migrate.h
@@ -129,6 +129,9 @@ int xe_migrate_ccs_rw_copy(struct xe_tile *tile, struct xe_exec_queue *q,
 			   struct xe_bo *src_bo,
 			   enum xe_sriov_vf_ccs_rw_ctxs read_write);
 
+void xe_migrate_ccs_rw_copy_clear(struct xe_tile *tile, struct xe_bo *src_bo,
+				  enum xe_sriov_vf_ccs_rw_ctxs read_write);
+
 struct xe_lrc *xe_migrate_lrc(struct xe_migrate *migrate);
 struct xe_exec_queue *xe_migrate_exec_queue(struct xe_migrate *migrate);
 int xe_migrate_access_memory(struct xe_migrate *m, struct xe_bo *bo,
diff --git a/drivers/gpu/drm/xe/xe_sriov_vf_ccs.c b/drivers/gpu/drm/xe/xe_sriov_vf_ccs.c
index 790249801364..2d3728cb24ca 100644
--- a/drivers/gpu/drm/xe/xe_sriov_vf_ccs.c
+++ b/drivers/gpu/drm/xe/xe_sriov_vf_ccs.c
@@ -387,6 +387,7 @@ int xe_sriov_vf_ccs_detach_bo(struct xe_bo *bo)
 {
 	struct xe_device *xe = xe_bo_device(bo);
 	enum xe_sriov_vf_ccs_rw_ctxs ctx_id;
+	struct xe_tile *tile;
 	struct xe_bb *bb;
 
 	xe_assert(xe, IS_VF_CCS_READY(xe));
@@ -394,12 +395,14 @@ int xe_sriov_vf_ccs_detach_bo(struct xe_bo *bo)
 	if (!xe_bo_has_valid_ccs_bb(bo))
 		return 0;
 
+	tile = xe_device_get_root_tile(xe);
+
 	for_each_ccs_rw_ctx(ctx_id) {
 		bb = bo->bb_ccs[ctx_id];
 		if (!bb)
 			continue;
 
-		memset(bb->cs, MI_NOOP, bb->len * sizeof(u32));
+		xe_migrate_ccs_rw_copy_clear(tile, bo, ctx_id);
 		xe_bb_free(bb, NULL);
 		bo->bb_ccs[ctx_id] = NULL;
 	}
-- 
2.51.0

^ permalink raw reply related	[flat|nested] 21+ messages in thread
* ✗ CI.checkpatch: warning for drm/xe/migrate: Atomicize CCS copy command setup
  2025-10-17 14:12 [PATCH v7 0/3] drm/xe/migrate: Atomicize CCS copy command setup Satyanarayana K V P
  ` (2 preceding siblings ...)
  2025-10-17 14:12 ` [PATCH v7 3/3] drm/xe/vf: Clear CCS read/write buffers in atomic way Satyanarayana K V P
@ 2025-10-17 14:17 ` Patchwork
  2025-10-17 14:18 ` ✓ CI.KUnit: success " Patchwork
  ` (2 subsequent siblings)
  6 siblings, 0 replies; 21+ messages in thread
From: Patchwork @ 2025-10-17 14:17 UTC (permalink / raw)
  To: Satyanarayana K V P; +Cc: intel-xe

== Series Details ==

Series: drm/xe/migrate: Atomicize CCS copy command setup
URL   : https://patchwork.freedesktop.org/series/156128/
State : warning

== Summary ==

+ KERNEL=/kernel
+ git clone https://gitlab.freedesktop.org/drm/maintainer-tools mt
Cloning into 'mt'...
warning: redirecting to https://gitlab.freedesktop.org/drm/maintainer-tools.git/
+ git -C mt rev-list -n1 origin/master
fbd08a78c3a3bb17964db2a326514c69c1dca660
+ cd /kernel
+ git config --global --add safe.directory /kernel
+ git log -n1
commit 59e7b6b2b86b49601b0669f0facb308371b1e7cc
Author: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Date:   Fri Oct 17 19:42:30 2025 +0530

    drm/xe/vf: Clear CCS read/write buffers in atomic way

    Clear the contents of the CCS read/write batch buffer, ensuring no page
    faults / GPU hang occur if migration happens midway.

    Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
    Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
    Cc: Matthew Brost <matthew.brost@intel.com>
    Cc: Matthew Auld <matthew.auld@intel.com>
+ /mt/dim checkpatch c809fdcf60b85e8a261eaa1b49f18b9c5731b18c drm-intel
b68b94ce1e47 drm/xe/migrate: Atomicize CCS copy command setup
-:33: WARNING:INCLUDE_LINUX: Use #include <linux/cpufeature.h> instead of <asm/cpufeature.h>
#33: FILE: drivers/gpu/drm/xe/xe_migrate.c:9:
+#include <asm/cpufeature.h>

total: 0 errors, 1 warnings, 0 checks, 179 lines checked
6de49bc1b74b drm/xe/migrate: Make emit_pte() header write atomic
59e7b6b2b86b drm/xe/vf: Clear CCS read/write buffers in atomic way

^ permalink raw reply	[flat|nested] 21+ messages in thread
* ✓ CI.KUnit: success for drm/xe/migrate: Atomicize CCS copy command setup
  2025-10-17 14:12 [PATCH v7 0/3] drm/xe/migrate: Atomicize CCS copy command setup Satyanarayana K V P
  ` (3 preceding siblings ...)
  2025-10-17 14:17 ` ✗ CI.checkpatch: warning for drm/xe/migrate: Atomicize CCS copy command setup Patchwork
@ 2025-10-17 14:18 ` Patchwork
  2025-10-17 15:23 ` ✓ Xe.CI.BAT: " Patchwork
  2025-10-18 12:27 ` ✗ Xe.CI.Full: failure " Patchwork
  6 siblings, 0 replies; 21+ messages in thread
From: Patchwork @ 2025-10-17 14:18 UTC (permalink / raw)
  To: Satyanarayana K V P; +Cc: intel-xe

== Series Details ==

Series: drm/xe/migrate: Atomicize CCS copy command setup
URL   : https://patchwork.freedesktop.org/series/156128/
State : success

== Summary ==

+ trap cleanup EXIT
+ /kernel/tools/testing/kunit/kunit.py run --kunitconfig /kernel/drivers/gpu/drm/xe/.kunitconfig
[14:17:01] Configuring KUnit Kernel ...
Generating .config ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
[14:17:05] Building KUnit Kernel ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
Building with:
$ make all compile_commands.json scripts_gdb ARCH=um O=.kunit --jobs=48
[14:17:36] Starting KUnit Kernel (1/1)...
[14:17:36] ============================================================ Running tests with: $ .kunit/linux kunit.enable=1 mem=1G console=tty kunit_shutdown=halt [14:17:36] ================== guc_buf (11 subtests) =================== [14:17:36] [PASSED] test_smallest [14:17:36] [PASSED] test_largest [14:17:36] [PASSED] test_granular [14:17:36] [PASSED] test_unique [14:17:36] [PASSED] test_overlap [14:17:36] [PASSED] test_reusable [14:17:36] [PASSED] test_too_big [14:17:36] [PASSED] test_flush [14:17:36] [PASSED] test_lookup [14:17:36] [PASSED] test_data [14:17:36] [PASSED] test_class [14:17:36] ===================== [PASSED] guc_buf ===================== [14:17:36] =================== guc_dbm (7 subtests) =================== [14:17:36] [PASSED] test_empty [14:17:36] [PASSED] test_default [14:17:36] ======================== test_size ======================== [14:17:36] [PASSED] 4 [14:17:36] [PASSED] 8 [14:17:36] [PASSED] 32 [14:17:36] [PASSED] 256 [14:17:36] ==================== [PASSED] test_size ==================== [14:17:36] ======================= test_reuse ======================== [14:17:36] [PASSED] 4 [14:17:36] [PASSED] 8 [14:17:36] [PASSED] 32 [14:17:36] [PASSED] 256 [14:17:36] =================== [PASSED] test_reuse ==================== [14:17:36] =================== test_range_overlap ==================== [14:17:36] [PASSED] 4 [14:17:36] [PASSED] 8 [14:17:36] [PASSED] 32 [14:17:36] [PASSED] 256 [14:17:36] =============== [PASSED] test_range_overlap ================ [14:17:36] =================== test_range_compact ==================== [14:17:36] [PASSED] 4 [14:17:36] [PASSED] 8 [14:17:36] [PASSED] 32 [14:17:36] [PASSED] 256 [14:17:36] =============== [PASSED] test_range_compact ================ [14:17:36] ==================== test_range_spare ===================== [14:17:36] [PASSED] 4 [14:17:36] [PASSED] 8 [14:17:36] [PASSED] 32 [14:17:36] [PASSED] 256 [14:17:36] ================ [PASSED] test_range_spare ================= [14:17:36] 
===================== [PASSED] guc_dbm ===================== [14:17:36] =================== guc_idm (6 subtests) =================== [14:17:36] [PASSED] bad_init [14:17:36] [PASSED] no_init [14:17:36] [PASSED] init_fini [14:17:36] [PASSED] check_used [14:17:36] [PASSED] check_quota [14:17:36] [PASSED] check_all [14:17:36] ===================== [PASSED] guc_idm ===================== [14:17:36] ================== no_relay (3 subtests) =================== [14:17:36] [PASSED] xe_drops_guc2pf_if_not_ready [14:17:36] [PASSED] xe_drops_guc2vf_if_not_ready [14:17:36] [PASSED] xe_rejects_send_if_not_ready [14:17:36] ==================== [PASSED] no_relay ===================== [14:17:36] ================== pf_relay (14 subtests) ================== [14:17:36] [PASSED] pf_rejects_guc2pf_too_short [14:17:36] [PASSED] pf_rejects_guc2pf_too_long [14:17:36] [PASSED] pf_rejects_guc2pf_no_payload [14:17:36] [PASSED] pf_fails_no_payload [14:17:36] [PASSED] pf_fails_bad_origin [14:17:36] [PASSED] pf_fails_bad_type [14:17:36] [PASSED] pf_txn_reports_error [14:17:36] [PASSED] pf_txn_sends_pf2guc [14:17:36] [PASSED] pf_sends_pf2guc [14:17:36] [SKIPPED] pf_loopback_nop [14:17:36] [SKIPPED] pf_loopback_echo [14:17:36] [SKIPPED] pf_loopback_fail [14:17:36] [SKIPPED] pf_loopback_busy [14:17:36] [SKIPPED] pf_loopback_retry [14:17:36] ==================== [PASSED] pf_relay ===================== [14:17:36] ================== vf_relay (3 subtests) =================== [14:17:36] [PASSED] vf_rejects_guc2vf_too_short [14:17:36] [PASSED] vf_rejects_guc2vf_too_long [14:17:36] [PASSED] vf_rejects_guc2vf_no_payload [14:17:36] ==================== [PASSED] vf_relay ===================== [14:17:36] ===================== lmtt (1 subtest) ===================== [14:17:36] ======================== test_ops ========================= [14:17:36] [PASSED] 2-level [14:17:36] [PASSED] multi-level [14:17:36] ==================== [PASSED] test_ops ===================== [14:17:36] ====================== [PASSED] lmtt 
======================= [14:17:36] ================= pf_service (11 subtests) ================= [14:17:36] [PASSED] pf_negotiate_any [14:17:36] [PASSED] pf_negotiate_base_match [14:17:36] [PASSED] pf_negotiate_base_newer [14:17:36] [PASSED] pf_negotiate_base_next [14:17:36] [SKIPPED] pf_negotiate_base_older [14:17:36] [PASSED] pf_negotiate_base_prev [14:17:36] [PASSED] pf_negotiate_latest_match [14:17:36] [PASSED] pf_negotiate_latest_newer [14:17:36] [PASSED] pf_negotiate_latest_next [14:17:36] [SKIPPED] pf_negotiate_latest_older [14:17:36] [SKIPPED] pf_negotiate_latest_prev [14:17:36] =================== [PASSED] pf_service ==================== [14:17:36] ================= xe_guc_g2g (2 subtests) ================== [14:17:36] ============== xe_live_guc_g2g_kunit_default ============== [14:17:36] ========= [SKIPPED] xe_live_guc_g2g_kunit_default ========== [14:17:36] ============== xe_live_guc_g2g_kunit_allmem =============== [14:17:36] ========== [SKIPPED] xe_live_guc_g2g_kunit_allmem ========== [14:17:36] =================== [SKIPPED] xe_guc_g2g =================== [14:17:36] =================== xe_mocs (2 subtests) =================== [14:17:36] ================ xe_live_mocs_kernel_kunit ================ [14:17:36] =========== [SKIPPED] xe_live_mocs_kernel_kunit ============ [14:17:36] ================ xe_live_mocs_reset_kunit ================= [14:17:36] ============ [SKIPPED] xe_live_mocs_reset_kunit ============ [14:17:36] ==================== [SKIPPED] xe_mocs ===================== [14:17:36] ================= xe_migrate (2 subtests) ================== [14:17:36] ================= xe_migrate_sanity_kunit ================= [14:17:36] ============ [SKIPPED] xe_migrate_sanity_kunit ============= [14:17:36] ================== xe_validate_ccs_kunit ================== [14:17:36] ============= [SKIPPED] xe_validate_ccs_kunit ============== [14:17:36] =================== [SKIPPED] xe_migrate =================== [14:17:36] ================== xe_dma_buf (1 subtest) 
================== [14:17:36] ==================== xe_dma_buf_kunit ===================== [14:17:36] ================ [SKIPPED] xe_dma_buf_kunit ================ [14:17:36] =================== [SKIPPED] xe_dma_buf =================== [14:17:36] ================= xe_bo_shrink (1 subtest) ================= [14:17:36] =================== xe_bo_shrink_kunit ==================== [14:17:36] =============== [SKIPPED] xe_bo_shrink_kunit =============== [14:17:36] ================== [SKIPPED] xe_bo_shrink ================== [14:17:36] ==================== xe_bo (2 subtests) ==================== [14:17:36] ================== xe_ccs_migrate_kunit =================== [14:17:36] ============== [SKIPPED] xe_ccs_migrate_kunit ============== [14:17:36] ==================== xe_bo_evict_kunit ==================== [14:17:36] =============== [SKIPPED] xe_bo_evict_kunit ================ [14:17:36] ===================== [SKIPPED] xe_bo ====================== [14:17:36] ==================== args (11 subtests) ==================== [14:17:36] [PASSED] count_args_test [14:17:36] [PASSED] call_args_example [14:17:36] [PASSED] call_args_test [14:17:36] [PASSED] drop_first_arg_example [14:17:36] [PASSED] drop_first_arg_test [14:17:36] [PASSED] first_arg_example [14:17:36] [PASSED] first_arg_test [14:17:36] [PASSED] last_arg_example [14:17:36] [PASSED] last_arg_test [14:17:36] [PASSED] pick_arg_example [14:17:36] [PASSED] sep_comma_example [14:17:36] ====================== [PASSED] args ======================= [14:17:36] =================== xe_pci (3 subtests) ==================== [14:17:36] ==================== check_graphics_ip ==================== [14:17:36] [PASSED] 12.00 Xe_LP [14:17:36] [PASSED] 12.10 Xe_LP+ [14:17:36] [PASSED] 12.55 Xe_HPG [14:17:36] [PASSED] 12.60 Xe_HPC [14:17:36] [PASSED] 12.70 Xe_LPG [14:17:36] [PASSED] 12.71 Xe_LPG [14:17:36] [PASSED] 12.74 Xe_LPG+ [14:17:36] [PASSED] 20.01 Xe2_HPG [14:17:36] [PASSED] 20.02 Xe2_HPG [14:17:36] [PASSED] 20.04 Xe2_LPG [14:17:36] 
[PASSED] 30.00 Xe3_LPG [14:17:36] [PASSED] 30.01 Xe3_LPG [14:17:36] [PASSED] 30.03 Xe3_LPG [14:17:36] ================ [PASSED] check_graphics_ip ================ [14:17:36] ===================== check_media_ip ====================== [14:17:36] [PASSED] 12.00 Xe_M [14:17:36] [PASSED] 12.55 Xe_HPM [14:17:36] [PASSED] 13.00 Xe_LPM+ [14:17:36] [PASSED] 13.01 Xe2_HPM [14:17:36] [PASSED] 20.00 Xe2_LPM [14:17:36] [PASSED] 30.00 Xe3_LPM [14:17:36] [PASSED] 30.02 Xe3_LPM [14:17:36] ================= [PASSED] check_media_ip ================== [14:17:36] ================= check_platform_gt_count ================= [14:17:36] [PASSED] 0x9A60 (TIGERLAKE) [14:17:36] [PASSED] 0x9A68 (TIGERLAKE) [14:17:36] [PASSED] 0x9A70 (TIGERLAKE) [14:17:36] [PASSED] 0x9A40 (TIGERLAKE) [14:17:36] [PASSED] 0x9A49 (TIGERLAKE) [14:17:36] [PASSED] 0x9A59 (TIGERLAKE) [14:17:36] [PASSED] 0x9A78 (TIGERLAKE) [14:17:36] [PASSED] 0x9AC0 (TIGERLAKE) [14:17:36] [PASSED] 0x9AC9 (TIGERLAKE) [14:17:36] [PASSED] 0x9AD9 (TIGERLAKE) [14:17:36] [PASSED] 0x9AF8 (TIGERLAKE) [14:17:36] [PASSED] 0x4C80 (ROCKETLAKE) [14:17:36] [PASSED] 0x4C8A (ROCKETLAKE) [14:17:36] [PASSED] 0x4C8B (ROCKETLAKE) [14:17:36] [PASSED] 0x4C8C (ROCKETLAKE) [14:17:36] [PASSED] 0x4C90 (ROCKETLAKE) [14:17:36] [PASSED] 0x4C9A (ROCKETLAKE) [14:17:36] [PASSED] 0x4680 (ALDERLAKE_S) [14:17:36] [PASSED] 0x4682 (ALDERLAKE_S) [14:17:36] [PASSED] 0x4688 (ALDERLAKE_S) [14:17:36] [PASSED] 0x468A (ALDERLAKE_S) [14:17:36] [PASSED] 0x468B (ALDERLAKE_S) [14:17:36] [PASSED] 0x4690 (ALDERLAKE_S) [14:17:36] [PASSED] 0x4692 (ALDERLAKE_S) [14:17:36] [PASSED] 0x4693 (ALDERLAKE_S) [14:17:36] [PASSED] 0x46A0 (ALDERLAKE_P) [14:17:36] [PASSED] 0x46A1 (ALDERLAKE_P) [14:17:36] [PASSED] 0x46A2 (ALDERLAKE_P) [14:17:36] [PASSED] 0x46A3 (ALDERLAKE_P) [14:17:36] [PASSED] 0x46A6 (ALDERLAKE_P) [14:17:36] [PASSED] 0x46A8 (ALDERLAKE_P) [14:17:36] [PASSED] 0x46AA (ALDERLAKE_P) [14:17:36] [PASSED] 0x462A (ALDERLAKE_P) [14:17:36] [PASSED] 0x4626 (ALDERLAKE_P) [14:17:36] [PASSED] 
0x4628 (ALDERLAKE_P) [14:17:36] [PASSED] 0x46B0 (ALDERLAKE_P) [14:17:36] [PASSED] 0x46B1 (ALDERLAKE_P) [14:17:36] [PASSED] 0x46B2 (ALDERLAKE_P) [14:17:36] [PASSED] 0x46B3 (ALDERLAKE_P) [14:17:36] [PASSED] 0x46C0 (ALDERLAKE_P) [14:17:36] [PASSED] 0x46C1 (ALDERLAKE_P) [14:17:36] [PASSED] 0x46C2 (ALDERLAKE_P) [14:17:36] [PASSED] 0x46C3 (ALDERLAKE_P) [14:17:36] [PASSED] 0x46D0 (ALDERLAKE_N) [14:17:36] [PASSED] 0x46D1 (ALDERLAKE_N) [14:17:36] [PASSED] 0x46D2 (ALDERLAKE_N) [14:17:36] [PASSED] 0x46D3 (ALDERLAKE_N) [14:17:36] [PASSED] 0x46D4 (ALDERLAKE_N) [14:17:36] [PASSED] 0xA721 (ALDERLAKE_P) [14:17:36] [PASSED] 0xA7A1 (ALDERLAKE_P) [14:17:36] [PASSED] 0xA7A9 (ALDERLAKE_P) [14:17:36] [PASSED] 0xA7AC (ALDERLAKE_P) [14:17:36] [PASSED] 0xA7AD (ALDERLAKE_P) [14:17:36] [PASSED] 0xA720 (ALDERLAKE_P) [14:17:36] [PASSED] 0xA7A0 (ALDERLAKE_P) [14:17:36] [PASSED] 0xA7A8 (ALDERLAKE_P) [14:17:36] [PASSED] 0xA7AA (ALDERLAKE_P) [14:17:36] [PASSED] 0xA7AB (ALDERLAKE_P) [14:17:36] [PASSED] 0xA780 (ALDERLAKE_S) [14:17:36] [PASSED] 0xA781 (ALDERLAKE_S) [14:17:36] [PASSED] 0xA782 (ALDERLAKE_S) [14:17:36] [PASSED] 0xA783 (ALDERLAKE_S) [14:17:36] [PASSED] 0xA788 (ALDERLAKE_S) [14:17:36] [PASSED] 0xA789 (ALDERLAKE_S) [14:17:36] [PASSED] 0xA78A (ALDERLAKE_S) [14:17:36] [PASSED] 0xA78B (ALDERLAKE_S) [14:17:36] [PASSED] 0x4905 (DG1) [14:17:36] [PASSED] 0x4906 (DG1) [14:17:36] [PASSED] 0x4907 (DG1) [14:17:36] [PASSED] 0x4908 (DG1) [14:17:36] [PASSED] 0x4909 (DG1) [14:17:36] [PASSED] 0x56C0 (DG2) [14:17:36] [PASSED] 0x56C2 (DG2) [14:17:36] [PASSED] 0x56C1 (DG2) [14:17:36] [PASSED] 0x7D51 (METEORLAKE) [14:17:36] [PASSED] 0x7DD1 (METEORLAKE) [14:17:36] [PASSED] 0x7D41 (METEORLAKE) [14:17:36] [PASSED] 0x7D67 (METEORLAKE) [14:17:36] [PASSED] 0xB640 (METEORLAKE) [14:17:36] [PASSED] 0x56A0 (DG2) [14:17:36] [PASSED] 0x56A1 (DG2) [14:17:36] [PASSED] 0x56A2 (DG2) [14:17:36] [PASSED] 0x56BE (DG2) [14:17:36] [PASSED] 0x56BF (DG2) [14:17:36] [PASSED] 0x5690 (DG2) [14:17:36] [PASSED] 0x5691 (DG2) [14:17:36] 
[PASSED] 0x5692 (DG2) [14:17:36] [PASSED] 0x56A5 (DG2) [14:17:36] [PASSED] 0x56A6 (DG2) [14:17:36] [PASSED] 0x56B0 (DG2) [14:17:36] [PASSED] 0x56B1 (DG2) [14:17:36] [PASSED] 0x56BA (DG2) [14:17:36] [PASSED] 0x56BB (DG2) [14:17:36] [PASSED] 0x56BC (DG2) [14:17:36] [PASSED] 0x56BD (DG2) [14:17:36] [PASSED] 0x5693 (DG2) [14:17:36] [PASSED] 0x5694 (DG2) [14:17:36] [PASSED] 0x5695 (DG2) [14:17:36] [PASSED] 0x56A3 (DG2) [14:17:36] [PASSED] 0x56A4 (DG2) [14:17:36] [PASSED] 0x56B2 (DG2) [14:17:36] [PASSED] 0x56B3 (DG2) [14:17:36] [PASSED] 0x5696 (DG2) [14:17:36] [PASSED] 0x5697 (DG2) [14:17:36] [PASSED] 0xB69 (PVC) [14:17:36] [PASSED] 0xB6E (PVC) [14:17:36] [PASSED] 0xBD4 (PVC) [14:17:36] [PASSED] 0xBD5 (PVC) [14:17:36] [PASSED] 0xBD6 (PVC) [14:17:36] [PASSED] 0xBD7 (PVC) [14:17:36] [PASSED] 0xBD8 (PVC) [14:17:36] [PASSED] 0xBD9 (PVC) [14:17:36] [PASSED] 0xBDA (PVC) [14:17:36] [PASSED] 0xBDB (PVC) [14:17:36] [PASSED] 0xBE0 (PVC) [14:17:36] [PASSED] 0xBE1 (PVC) [14:17:36] [PASSED] 0xBE5 (PVC) [14:17:36] [PASSED] 0x7D40 (METEORLAKE) [14:17:36] [PASSED] 0x7D45 (METEORLAKE) [14:17:36] [PASSED] 0x7D55 (METEORLAKE) [14:17:36] [PASSED] 0x7D60 (METEORLAKE) [14:17:36] [PASSED] 0x7DD5 (METEORLAKE) [14:17:36] [PASSED] 0x6420 (LUNARLAKE) [14:17:36] [PASSED] 0x64A0 (LUNARLAKE) [14:17:36] [PASSED] 0x64B0 (LUNARLAKE) [14:17:36] [PASSED] 0xE202 (BATTLEMAGE) [14:17:36] [PASSED] 0xE209 (BATTLEMAGE) [14:17:36] [PASSED] 0xE20B (BATTLEMAGE) [14:17:36] [PASSED] 0xE20C (BATTLEMAGE) [14:17:36] [PASSED] 0xE20D (BATTLEMAGE) [14:17:36] [PASSED] 0xE210 (BATTLEMAGE) [14:17:36] [PASSED] 0xE211 (BATTLEMAGE) [14:17:36] [PASSED] 0xE212 (BATTLEMAGE) [14:17:36] [PASSED] 0xE216 (BATTLEMAGE) [14:17:36] [PASSED] 0xE220 (BATTLEMAGE) [14:17:36] [PASSED] 0xE221 (BATTLEMAGE) [14:17:36] [PASSED] 0xE222 (BATTLEMAGE) [14:17:36] [PASSED] 0xE223 (BATTLEMAGE) [14:17:36] [PASSED] 0xB080 (PANTHERLAKE) [14:17:36] [PASSED] 0xB081 (PANTHERLAKE) [14:17:36] [PASSED] 0xB082 (PANTHERLAKE) [14:17:36] [PASSED] 0xB083 (PANTHERLAKE) 
[14:17:36] [PASSED] 0xB084 (PANTHERLAKE) [14:17:36] [PASSED] 0xB085 (PANTHERLAKE) [14:17:36] [PASSED] 0xB086 (PANTHERLAKE) [14:17:36] [PASSED] 0xB087 (PANTHERLAKE) [14:17:36] [PASSED] 0xB08F (PANTHERLAKE) [14:17:36] [PASSED] 0xB090 (PANTHERLAKE) [14:17:36] [PASSED] 0xB0A0 (PANTHERLAKE) [14:17:36] [PASSED] 0xB0B0 (PANTHERLAKE) [14:17:36] [PASSED] 0xFD80 (PANTHERLAKE) [14:17:36] [PASSED] 0xFD81 (PANTHERLAKE) [14:17:36] ============= [PASSED] check_platform_gt_count ============= [14:17:36] ===================== [PASSED] xe_pci ====================== [14:17:36] =================== xe_rtp (2 subtests) ==================== [14:17:36] =============== xe_rtp_process_to_sr_tests ================ [14:17:36] [PASSED] coalesce-same-reg [14:17:36] [PASSED] no-match-no-add [14:17:36] [PASSED] match-or [14:17:36] [PASSED] match-or-xfail [14:17:36] [PASSED] no-match-no-add-multiple-rules [14:17:36] [PASSED] two-regs-two-entries [14:17:36] [PASSED] clr-one-set-other [14:17:36] [PASSED] set-field [14:17:36] [PASSED] conflict-duplicate [14:17:36] [PASSED] conflict-not-disjoint [14:17:36] [PASSED] conflict-reg-type [14:17:36] =========== [PASSED] xe_rtp_process_to_sr_tests ============ [14:17:36] ================== xe_rtp_process_tests =================== [14:17:36] [PASSED] active1 [14:17:36] [PASSED] active2 [14:17:36] [PASSED] active-inactive [14:17:36] [PASSED] inactive-active [14:17:36] [PASSED] inactive-1st_or_active-inactive [14:17:36] [PASSED] inactive-2nd_or_active-inactive [14:17:36] [PASSED] inactive-last_or_active-inactive [14:17:36] [PASSED] inactive-no_or_active-inactive [14:17:36] ============== [PASSED] xe_rtp_process_tests =============== [14:17:36] ===================== [PASSED] xe_rtp ====================== [14:17:36] ==================== xe_wa (1 subtest) ===================== [14:17:36] ======================== xe_wa_gt ========================= [14:17:36] [PASSED] TIGERLAKE B0 [14:17:36] [PASSED] DG1 A0 [14:17:36] [PASSED] DG1 B0 [14:17:36] [PASSED] ALDERLAKE_S 
A0 [14:17:36] [PASSED] ALDERLAKE_S B0 [14:17:36] [PASSED] ALDERLAKE_S C0 [14:17:36] [PASSED] ALDERLAKE_S D0 [14:17:36] [PASSED] ALDERLAKE_P A0 [14:17:36] [PASSED] ALDERLAKE_P B0 [14:17:36] [PASSED] ALDERLAKE_P C0 [14:17:36] [PASSED] ALDERLAKE_S RPLS D0 [14:17:36] [PASSED] ALDERLAKE_P RPLU E0 [14:17:36] [PASSED] DG2 G10 C0 [14:17:36] [PASSED] DG2 G11 B1 [14:17:36] [PASSED] DG2 G12 A1 [14:17:36] [PASSED] METEORLAKE 12.70(Xe_LPG) A0 13.00(Xe_LPM+) A0 [14:17:36] [PASSED] METEORLAKE 12.71(Xe_LPG) A0 13.00(Xe_LPM+) A0 [14:17:36] [PASSED] METEORLAKE 12.74(Xe_LPG+) A0 13.00(Xe_LPM+) A0 [14:17:36] [PASSED] LUNARLAKE 20.04(Xe2_LPG) A0 20.00(Xe2_LPM) A0 [14:17:36] [PASSED] LUNARLAKE 20.04(Xe2_LPG) B0 20.00(Xe2_LPM) A0 [14:17:36] [PASSED] BATTLEMAGE 20.01(Xe2_HPG) A0 13.01(Xe2_HPM) A1 [14:17:36] [PASSED] PANTHERLAKE 30.00(Xe3_LPG) A0 30.00(Xe3_LPM) A0 [14:17:36] ==================== [PASSED] xe_wa_gt ===================== [14:17:36] ====================== [PASSED] xe_wa ====================== [14:17:36] ============================================================ [14:17:36] Testing complete. Ran 306 tests: passed: 288, skipped: 18 [14:17:36] Elapsed time: 35.162s total, 4.238s configuring, 30.558s building, 0.336s running + /kernel/tools/testing/kunit/kunit.py run --kunitconfig /kernel/drivers/gpu/drm/tests/.kunitconfig [14:17:36] Configuring KUnit Kernel ... Regenerating .config ... Populating config with: $ make ARCH=um O=.kunit olddefconfig [14:17:38] Building KUnit Kernel ... Populating config with: $ make ARCH=um O=.kunit olddefconfig Building with: $ make all compile_commands.json scripts_gdb ARCH=um O=.kunit --jobs=48 [14:18:03] Starting KUnit Kernel (1/1)... 
[14:18:03] ============================================================ Running tests with: $ .kunit/linux kunit.enable=1 mem=1G console=tty kunit_shutdown=halt [14:18:03] ============ drm_test_pick_cmdline (2 subtests) ============ [14:18:03] [PASSED] drm_test_pick_cmdline_res_1920_1080_60 [14:18:03] =============== drm_test_pick_cmdline_named =============== [14:18:03] [PASSED] NTSC [14:18:03] [PASSED] NTSC-J [14:18:03] [PASSED] PAL [14:18:03] [PASSED] PAL-M [14:18:03] =========== [PASSED] drm_test_pick_cmdline_named =========== [14:18:03] ============== [PASSED] drm_test_pick_cmdline ============== [14:18:03] == drm_test_atomic_get_connector_for_encoder (1 subtest) === [14:18:03] [PASSED] drm_test_drm_atomic_get_connector_for_encoder [14:18:03] ==== [PASSED] drm_test_atomic_get_connector_for_encoder ==== [14:18:03] =========== drm_validate_clone_mode (2 subtests) =========== [14:18:03] ============== drm_test_check_in_clone_mode =============== [14:18:03] [PASSED] in_clone_mode [14:18:03] [PASSED] not_in_clone_mode [14:18:03] ========== [PASSED] drm_test_check_in_clone_mode =========== [14:18:03] =============== drm_test_check_valid_clones =============== [14:18:03] [PASSED] not_in_clone_mode [14:18:03] [PASSED] valid_clone [14:18:03] [PASSED] invalid_clone [14:18:03] =========== [PASSED] drm_test_check_valid_clones =========== [14:18:03] ============= [PASSED] drm_validate_clone_mode ============= [14:18:03] ============= drm_validate_modeset (1 subtest) ============= [14:18:03] [PASSED] drm_test_check_connector_changed_modeset [14:18:03] ============== [PASSED] drm_validate_modeset =============== [14:18:03] ====== drm_test_bridge_get_current_state (2 subtests) ====== [14:18:03] [PASSED] drm_test_drm_bridge_get_current_state_atomic [14:18:03] [PASSED] drm_test_drm_bridge_get_current_state_legacy [14:18:03] ======== [PASSED] drm_test_bridge_get_current_state ======== [14:18:03] ====== drm_test_bridge_helper_reset_crtc (3 subtests) ====== [14:18:03] [PASSED] 
drm_test_drm_bridge_helper_reset_crtc_atomic [14:18:03] [PASSED] drm_test_drm_bridge_helper_reset_crtc_atomic_disabled [14:18:03] [PASSED] drm_test_drm_bridge_helper_reset_crtc_legacy [14:18:03] ======== [PASSED] drm_test_bridge_helper_reset_crtc ======== [14:18:03] ============== drm_bridge_alloc (2 subtests) =============== [14:18:03] [PASSED] drm_test_drm_bridge_alloc_basic [14:18:03] [PASSED] drm_test_drm_bridge_alloc_get_put [14:18:03] ================ [PASSED] drm_bridge_alloc ================= [14:18:03] ================== drm_buddy (8 subtests) ================== [14:18:03] [PASSED] drm_test_buddy_alloc_limit [14:18:03] [PASSED] drm_test_buddy_alloc_optimistic [14:18:03] [PASSED] drm_test_buddy_alloc_pessimistic [14:18:03] [PASSED] drm_test_buddy_alloc_pathological [14:18:03] [PASSED] drm_test_buddy_alloc_contiguous [14:18:03] [PASSED] drm_test_buddy_alloc_clear [14:18:03] [PASSED] drm_test_buddy_alloc_range_bias [14:18:03] [PASSED] drm_test_buddy_fragmentation_performance [14:18:03] ==================== [PASSED] drm_buddy ==================== [14:18:03] ============= drm_cmdline_parser (40 subtests) ============= [14:18:03] [PASSED] drm_test_cmdline_force_d_only [14:18:03] [PASSED] drm_test_cmdline_force_D_only_dvi [14:18:03] [PASSED] drm_test_cmdline_force_D_only_hdmi [14:18:03] [PASSED] drm_test_cmdline_force_D_only_not_digital [14:18:03] [PASSED] drm_test_cmdline_force_e_only [14:18:03] [PASSED] drm_test_cmdline_res [14:18:03] [PASSED] drm_test_cmdline_res_vesa [14:18:03] [PASSED] drm_test_cmdline_res_vesa_rblank [14:18:03] [PASSED] drm_test_cmdline_res_rblank [14:18:03] [PASSED] drm_test_cmdline_res_bpp [14:18:03] [PASSED] drm_test_cmdline_res_refresh [14:18:03] [PASSED] drm_test_cmdline_res_bpp_refresh [14:18:03] [PASSED] drm_test_cmdline_res_bpp_refresh_interlaced [14:18:03] [PASSED] drm_test_cmdline_res_bpp_refresh_margins [14:18:03] [PASSED] drm_test_cmdline_res_bpp_refresh_force_off [14:18:03] [PASSED] drm_test_cmdline_res_bpp_refresh_force_on 
[14:18:03] [PASSED] drm_test_cmdline_res_bpp_refresh_force_on_analog [14:18:03] [PASSED] drm_test_cmdline_res_bpp_refresh_force_on_digital [14:18:03] [PASSED] drm_test_cmdline_res_bpp_refresh_interlaced_margins_force_on [14:18:03] [PASSED] drm_test_cmdline_res_margins_force_on [14:18:03] [PASSED] drm_test_cmdline_res_vesa_margins [14:18:03] [PASSED] drm_test_cmdline_name [14:18:03] [PASSED] drm_test_cmdline_name_bpp [14:18:03] [PASSED] drm_test_cmdline_name_option [14:18:03] [PASSED] drm_test_cmdline_name_bpp_option [14:18:03] [PASSED] drm_test_cmdline_rotate_0 [14:18:03] [PASSED] drm_test_cmdline_rotate_90 [14:18:03] [PASSED] drm_test_cmdline_rotate_180 [14:18:03] [PASSED] drm_test_cmdline_rotate_270 [14:18:03] [PASSED] drm_test_cmdline_hmirror [14:18:03] [PASSED] drm_test_cmdline_vmirror [14:18:03] [PASSED] drm_test_cmdline_margin_options [14:18:03] [PASSED] drm_test_cmdline_multiple_options [14:18:03] [PASSED] drm_test_cmdline_bpp_extra_and_option [14:18:03] [PASSED] drm_test_cmdline_extra_and_option [14:18:03] [PASSED] drm_test_cmdline_freestanding_options [14:18:03] [PASSED] drm_test_cmdline_freestanding_force_e_and_options [14:18:03] [PASSED] drm_test_cmdline_panel_orientation [14:18:03] ================ drm_test_cmdline_invalid ================= [14:18:03] [PASSED] margin_only [14:18:03] [PASSED] interlace_only [14:18:03] [PASSED] res_missing_x [14:18:03] [PASSED] res_missing_y [14:18:03] [PASSED] res_bad_y [14:18:03] [PASSED] res_missing_y_bpp [14:18:03] [PASSED] res_bad_bpp [14:18:03] [PASSED] res_bad_refresh [14:18:03] [PASSED] res_bpp_refresh_force_on_off [14:18:03] [PASSED] res_invalid_mode [14:18:03] [PASSED] res_bpp_wrong_place_mode [14:18:03] [PASSED] name_bpp_refresh [14:18:03] [PASSED] name_refresh [14:18:03] [PASSED] name_refresh_wrong_mode [14:18:03] [PASSED] name_refresh_invalid_mode [14:18:03] [PASSED] rotate_multiple [14:18:03] [PASSED] rotate_invalid_val [14:18:03] [PASSED] rotate_truncated [14:18:03] [PASSED] invalid_option [14:18:03] 
[PASSED] invalid_tv_option [14:18:03] [PASSED] truncated_tv_option [14:18:03] ============ [PASSED] drm_test_cmdline_invalid ============= [14:18:03] =============== drm_test_cmdline_tv_options =============== [14:18:03] [PASSED] NTSC [14:18:03] [PASSED] NTSC_443 [14:18:03] [PASSED] NTSC_J [14:18:03] [PASSED] PAL [14:18:03] [PASSED] PAL_M [14:18:03] [PASSED] PAL_N [14:18:03] [PASSED] SECAM [14:18:03] [PASSED] MONO_525 [14:18:03] [PASSED] MONO_625 [14:18:03] =========== [PASSED] drm_test_cmdline_tv_options =========== [14:18:03] =============== [PASSED] drm_cmdline_parser ================ [14:18:03] ========== drmm_connector_hdmi_init (20 subtests) ========== [14:18:03] [PASSED] drm_test_connector_hdmi_init_valid [14:18:03] [PASSED] drm_test_connector_hdmi_init_bpc_8 [14:18:03] [PASSED] drm_test_connector_hdmi_init_bpc_10 [14:18:03] [PASSED] drm_test_connector_hdmi_init_bpc_12 [14:18:03] [PASSED] drm_test_connector_hdmi_init_bpc_invalid [14:18:03] [PASSED] drm_test_connector_hdmi_init_bpc_null [14:18:03] [PASSED] drm_test_connector_hdmi_init_formats_empty [14:18:03] [PASSED] drm_test_connector_hdmi_init_formats_no_rgb [14:18:03] === drm_test_connector_hdmi_init_formats_yuv420_allowed === [14:18:03] [PASSED] supported_formats=0x9 yuv420_allowed=1 [14:18:03] [PASSED] supported_formats=0x9 yuv420_allowed=0 [14:18:03] [PASSED] supported_formats=0x3 yuv420_allowed=1 [14:18:03] [PASSED] supported_formats=0x3 yuv420_allowed=0 [14:18:03] === [PASSED] drm_test_connector_hdmi_init_formats_yuv420_allowed === [14:18:03] [PASSED] drm_test_connector_hdmi_init_null_ddc [14:18:03] [PASSED] drm_test_connector_hdmi_init_null_product [14:18:03] [PASSED] drm_test_connector_hdmi_init_null_vendor [14:18:03] [PASSED] drm_test_connector_hdmi_init_product_length_exact [14:18:03] [PASSED] drm_test_connector_hdmi_init_product_length_too_long [14:18:03] [PASSED] drm_test_connector_hdmi_init_product_valid [14:18:03] [PASSED] drm_test_connector_hdmi_init_vendor_length_exact [14:18:03] [PASSED] 
drm_test_connector_hdmi_init_vendor_length_too_long [14:18:03] [PASSED] drm_test_connector_hdmi_init_vendor_valid [14:18:03] ========= drm_test_connector_hdmi_init_type_valid ========= [14:18:03] [PASSED] HDMI-A [14:18:03] [PASSED] HDMI-B [14:18:03] ===== [PASSED] drm_test_connector_hdmi_init_type_valid ===== [14:18:03] ======== drm_test_connector_hdmi_init_type_invalid ======== [14:18:03] [PASSED] Unknown [14:18:03] [PASSED] VGA [14:18:03] [PASSED] DVI-I [14:18:03] [PASSED] DVI-D [14:18:03] [PASSED] DVI-A [14:18:03] [PASSED] Composite [14:18:03] [PASSED] SVIDEO [14:18:03] [PASSED] LVDS [14:18:03] [PASSED] Component [14:18:03] [PASSED] DIN [14:18:03] [PASSED] DP [14:18:03] [PASSED] TV [14:18:03] [PASSED] eDP [14:18:03] [PASSED] Virtual [14:18:03] [PASSED] DSI [14:18:03] [PASSED] DPI [14:18:03] [PASSED] Writeback [14:18:03] [PASSED] SPI [14:18:03] [PASSED] USB [14:18:03] ==== [PASSED] drm_test_connector_hdmi_init_type_invalid ==== [14:18:03] ============ [PASSED] drmm_connector_hdmi_init ============= [14:18:03] ============= drmm_connector_init (3 subtests) ============= [14:18:03] [PASSED] drm_test_drmm_connector_init [14:18:03] [PASSED] drm_test_drmm_connector_init_null_ddc [14:18:03] ========= drm_test_drmm_connector_init_type_valid ========= [14:18:03] [PASSED] Unknown [14:18:03] [PASSED] VGA [14:18:03] [PASSED] DVI-I [14:18:03] [PASSED] DVI-D [14:18:03] [PASSED] DVI-A [14:18:03] [PASSED] Composite [14:18:03] [PASSED] SVIDEO [14:18:03] [PASSED] LVDS [14:18:03] [PASSED] Component [14:18:03] [PASSED] DIN [14:18:03] [PASSED] DP [14:18:03] [PASSED] HDMI-A [14:18:03] [PASSED] HDMI-B [14:18:03] [PASSED] TV [14:18:03] [PASSED] eDP [14:18:03] [PASSED] Virtual [14:18:03] [PASSED] DSI [14:18:03] [PASSED] DPI [14:18:03] [PASSED] Writeback [14:18:03] [PASSED] SPI [14:18:03] [PASSED] USB [14:18:03] ===== [PASSED] drm_test_drmm_connector_init_type_valid ===== [14:18:03] =============== [PASSED] drmm_connector_init =============== [14:18:03] ========= 
drm_connector_dynamic_init (6 subtests) ========== [14:18:03] [PASSED] drm_test_drm_connector_dynamic_init [14:18:03] [PASSED] drm_test_drm_connector_dynamic_init_null_ddc [14:18:03] [PASSED] drm_test_drm_connector_dynamic_init_not_added [14:18:03] [PASSED] drm_test_drm_connector_dynamic_init_properties [14:18:03] ===== drm_test_drm_connector_dynamic_init_type_valid ====== [14:18:03] [PASSED] Unknown [14:18:03] [PASSED] VGA [14:18:03] [PASSED] DVI-I [14:18:03] [PASSED] DVI-D [14:18:03] [PASSED] DVI-A [14:18:03] [PASSED] Composite [14:18:03] [PASSED] SVIDEO [14:18:03] [PASSED] LVDS [14:18:03] [PASSED] Component [14:18:03] [PASSED] DIN [14:18:03] [PASSED] DP [14:18:03] [PASSED] HDMI-A [14:18:03] [PASSED] HDMI-B [14:18:03] [PASSED] TV [14:18:03] [PASSED] eDP [14:18:03] [PASSED] Virtual [14:18:03] [PASSED] DSI [14:18:03] [PASSED] DPI [14:18:03] [PASSED] Writeback [14:18:03] [PASSED] SPI [14:18:03] [PASSED] USB [14:18:03] = [PASSED] drm_test_drm_connector_dynamic_init_type_valid == [14:18:03] ======== drm_test_drm_connector_dynamic_init_name ========= [14:18:03] [PASSED] Unknown [14:18:03] [PASSED] VGA [14:18:03] [PASSED] DVI-I [14:18:03] [PASSED] DVI-D [14:18:03] [PASSED] DVI-A [14:18:03] [PASSED] Composite [14:18:03] [PASSED] SVIDEO [14:18:03] [PASSED] LVDS [14:18:03] [PASSED] Component [14:18:03] [PASSED] DIN [14:18:03] [PASSED] DP [14:18:03] [PASSED] HDMI-A [14:18:03] [PASSED] HDMI-B [14:18:03] [PASSED] TV [14:18:03] [PASSED] eDP [14:18:03] [PASSED] Virtual [14:18:03] [PASSED] DSI [14:18:03] [PASSED] DPI [14:18:03] [PASSED] Writeback [14:18:03] [PASSED] SPI [14:18:03] [PASSED] USB [14:18:03] ==== [PASSED] drm_test_drm_connector_dynamic_init_name ===== [14:18:03] =========== [PASSED] drm_connector_dynamic_init ============ [14:18:03] ==== drm_connector_dynamic_register_early (4 subtests) ===== [14:18:03] [PASSED] drm_test_drm_connector_dynamic_register_early_on_list [14:18:03] [PASSED] drm_test_drm_connector_dynamic_register_early_defer [14:18:03] [PASSED] 
drm_test_drm_connector_dynamic_register_early_no_init [14:18:03] [PASSED] drm_test_drm_connector_dynamic_register_early_no_mode_object [14:18:03] ====== [PASSED] drm_connector_dynamic_register_early ======= [14:18:03] ======= drm_connector_dynamic_register (7 subtests) ======== [14:18:03] [PASSED] drm_test_drm_connector_dynamic_register_on_list [14:18:03] [PASSED] drm_test_drm_connector_dynamic_register_no_defer [14:18:03] [PASSED] drm_test_drm_connector_dynamic_register_no_init [14:18:03] [PASSED] drm_test_drm_connector_dynamic_register_mode_object [14:18:03] [PASSED] drm_test_drm_connector_dynamic_register_sysfs [14:18:03] [PASSED] drm_test_drm_connector_dynamic_register_sysfs_name [14:18:03] [PASSED] drm_test_drm_connector_dynamic_register_debugfs [14:18:03] ========= [PASSED] drm_connector_dynamic_register ========== [14:18:03] = drm_connector_attach_broadcast_rgb_property (2 subtests) = [14:18:03] [PASSED] drm_test_drm_connector_attach_broadcast_rgb_property [14:18:03] [PASSED] drm_test_drm_connector_attach_broadcast_rgb_property_hdmi_connector [14:18:03] === [PASSED] drm_connector_attach_broadcast_rgb_property === [14:18:03] ========== drm_get_tv_mode_from_name (2 subtests) ========== [14:18:03] ========== drm_test_get_tv_mode_from_name_valid =========== [14:18:03] [PASSED] NTSC [14:18:03] [PASSED] NTSC-443 [14:18:03] [PASSED] NTSC-J [14:18:03] [PASSED] PAL [14:18:03] [PASSED] PAL-M [14:18:03] [PASSED] PAL-N [14:18:03] [PASSED] SECAM [14:18:03] [PASSED] Mono [14:18:03] ====== [PASSED] drm_test_get_tv_mode_from_name_valid ======= [14:18:03] [PASSED] drm_test_get_tv_mode_from_name_truncated [14:18:03] ============ [PASSED] drm_get_tv_mode_from_name ============ [14:18:03] = drm_test_connector_hdmi_compute_mode_clock (12 subtests) = [14:18:03] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb [14:18:03] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb_10bpc [14:18:03] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb_10bpc_vic_1 [14:18:03] [PASSED] 
drm_test_drm_hdmi_compute_mode_clock_rgb_12bpc [14:18:03] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb_12bpc_vic_1 [14:18:03] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb_double [14:18:03] = drm_test_connector_hdmi_compute_mode_clock_yuv420_valid = [14:18:03] [PASSED] VIC 96 [14:18:03] [PASSED] VIC 97 [14:18:03] [PASSED] VIC 101 [14:18:03] [PASSED] VIC 102 [14:18:03] [PASSED] VIC 106 [14:18:03] [PASSED] VIC 107 [14:18:03] === [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv420_valid === [14:18:03] [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv420_10_bpc [14:18:03] [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv420_12_bpc [14:18:03] [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv422_8_bpc [14:18:03] [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv422_10_bpc [14:18:03] [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv422_12_bpc [14:18:03] === [PASSED] drm_test_connector_hdmi_compute_mode_clock ==== [14:18:03] == drm_hdmi_connector_get_broadcast_rgb_name (2 subtests) == [14:18:03] === drm_test_drm_hdmi_connector_get_broadcast_rgb_name ==== [14:18:03] [PASSED] Automatic [14:18:03] [PASSED] Full [14:18:03] [PASSED] Limited 16:235 [14:18:03] === [PASSED] drm_test_drm_hdmi_connector_get_broadcast_rgb_name === [14:18:03] [PASSED] drm_test_drm_hdmi_connector_get_broadcast_rgb_name_invalid [14:18:03] ==== [PASSED] drm_hdmi_connector_get_broadcast_rgb_name ==== [14:18:03] == drm_hdmi_connector_get_output_format_name (2 subtests) == [14:18:03] === drm_test_drm_hdmi_connector_get_output_format_name ==== [14:18:03] [PASSED] RGB [14:18:03] [PASSED] YUV 4:2:0 [14:18:03] [PASSED] YUV 4:2:2 [14:18:03] [PASSED] YUV 4:4:4 [14:18:03] === [PASSED] drm_test_drm_hdmi_connector_get_output_format_name === [14:18:03] [PASSED] drm_test_drm_hdmi_connector_get_output_format_name_invalid [14:18:03] ==== [PASSED] drm_hdmi_connector_get_output_format_name ==== [14:18:03] ============= drm_damage_helper (21 subtests) ============== [14:18:03] 
[PASSED] drm_test_damage_iter_no_damage [14:18:03] [PASSED] drm_test_damage_iter_no_damage_fractional_src [14:18:03] [PASSED] drm_test_damage_iter_no_damage_src_moved [14:18:03] [PASSED] drm_test_damage_iter_no_damage_fractional_src_moved [14:18:03] [PASSED] drm_test_damage_iter_no_damage_not_visible [14:18:03] [PASSED] drm_test_damage_iter_no_damage_no_crtc [14:18:03] [PASSED] drm_test_damage_iter_no_damage_no_fb [14:18:03] [PASSED] drm_test_damage_iter_simple_damage [14:18:03] [PASSED] drm_test_damage_iter_single_damage [14:18:03] [PASSED] drm_test_damage_iter_single_damage_intersect_src [14:18:03] [PASSED] drm_test_damage_iter_single_damage_outside_src [14:18:03] [PASSED] drm_test_damage_iter_single_damage_fractional_src [14:18:03] [PASSED] drm_test_damage_iter_single_damage_intersect_fractional_src [14:18:03] [PASSED] drm_test_damage_iter_single_damage_outside_fractional_src [14:18:03] [PASSED] drm_test_damage_iter_single_damage_src_moved [14:18:03] [PASSED] drm_test_damage_iter_single_damage_fractional_src_moved [14:18:03] [PASSED] drm_test_damage_iter_damage [14:18:03] [PASSED] drm_test_damage_iter_damage_one_intersect [14:18:03] [PASSED] drm_test_damage_iter_damage_one_outside [14:18:03] [PASSED] drm_test_damage_iter_damage_src_moved [14:18:03] [PASSED] drm_test_damage_iter_damage_not_visible [14:18:03] ================ [PASSED] drm_damage_helper ================ [14:18:03] ============== drm_dp_mst_helper (3 subtests) ============== [14:18:03] ============== drm_test_dp_mst_calc_pbn_mode ============== [14:18:03] [PASSED] Clock 154000 BPP 30 DSC disabled [14:18:03] [PASSED] Clock 234000 BPP 30 DSC disabled [14:18:03] [PASSED] Clock 297000 BPP 24 DSC disabled [14:18:03] [PASSED] Clock 332880 BPP 24 DSC enabled [14:18:03] [PASSED] Clock 324540 BPP 24 DSC enabled [14:18:03] ========== [PASSED] drm_test_dp_mst_calc_pbn_mode ========== [14:18:03] ============== drm_test_dp_mst_calc_pbn_div =============== [14:18:03] [PASSED] Link rate 2000000 lane count 4 
[14:18:03] [PASSED] Link rate 2000000 lane count 2 [14:18:03] [PASSED] Link rate 2000000 lane count 1 [14:18:03] [PASSED] Link rate 1350000 lane count 4 [14:18:03] [PASSED] Link rate 1350000 lane count 2 [14:18:03] [PASSED] Link rate 1350000 lane count 1 [14:18:03] [PASSED] Link rate 1000000 lane count 4 [14:18:03] [PASSED] Link rate 1000000 lane count 2 [14:18:03] [PASSED] Link rate 1000000 lane count 1 [14:18:03] [PASSED] Link rate 810000 lane count 4 [14:18:03] [PASSED] Link rate 810000 lane count 2 [14:18:03] [PASSED] Link rate 810000 lane count 1 [14:18:03] [PASSED] Link rate 540000 lane count 4 [14:18:03] [PASSED] Link rate 540000 lane count 2 [14:18:03] [PASSED] Link rate 540000 lane count 1 [14:18:03] [PASSED] Link rate 270000 lane count 4 [14:18:03] [PASSED] Link rate 270000 lane count 2 [14:18:03] [PASSED] Link rate 270000 lane count 1 [14:18:03] [PASSED] Link rate 162000 lane count 4 [14:18:03] [PASSED] Link rate 162000 lane count 2 [14:18:03] [PASSED] Link rate 162000 lane count 1 [14:18:03] ========== [PASSED] drm_test_dp_mst_calc_pbn_div =========== [14:18:03] ========= drm_test_dp_mst_sideband_msg_req_decode ========= [14:18:03] [PASSED] DP_ENUM_PATH_RESOURCES with port number [14:18:03] [PASSED] DP_POWER_UP_PHY with port number [14:18:03] [PASSED] DP_POWER_DOWN_PHY with port number [14:18:03] [PASSED] DP_ALLOCATE_PAYLOAD with SDP stream sinks [14:18:03] [PASSED] DP_ALLOCATE_PAYLOAD with port number [14:18:03] [PASSED] DP_ALLOCATE_PAYLOAD with VCPI [14:18:03] [PASSED] DP_ALLOCATE_PAYLOAD with PBN [14:18:03] [PASSED] DP_QUERY_PAYLOAD with port number [14:18:03] [PASSED] DP_QUERY_PAYLOAD with VCPI [14:18:03] [PASSED] DP_REMOTE_DPCD_READ with port number [14:18:03] [PASSED] DP_REMOTE_DPCD_READ with DPCD address [14:18:03] [PASSED] DP_REMOTE_DPCD_READ with max number of bytes [14:18:03] [PASSED] DP_REMOTE_DPCD_WRITE with port number [14:18:03] [PASSED] DP_REMOTE_DPCD_WRITE with DPCD address [14:18:03] [PASSED] DP_REMOTE_DPCD_WRITE with data array 
[14:18:03] [PASSED] DP_REMOTE_I2C_READ with port number
[14:18:03] [PASSED] DP_REMOTE_I2C_READ with I2C device ID
[14:18:03] [PASSED] DP_REMOTE_I2C_READ with transactions array
[14:18:03] [PASSED] DP_REMOTE_I2C_WRITE with port number
[14:18:03] [PASSED] DP_REMOTE_I2C_WRITE with I2C device ID
[14:18:03] [PASSED] DP_REMOTE_I2C_WRITE with data array
[14:18:03] [PASSED] DP_QUERY_STREAM_ENC_STATUS with stream ID
[14:18:03] [PASSED] DP_QUERY_STREAM_ENC_STATUS with client ID
[14:18:03] [PASSED] DP_QUERY_STREAM_ENC_STATUS with stream event
[14:18:03] [PASSED] DP_QUERY_STREAM_ENC_STATUS with valid stream event
[14:18:03] [PASSED] DP_QUERY_STREAM_ENC_STATUS with stream behavior
[14:18:03] [PASSED] DP_QUERY_STREAM_ENC_STATUS with a valid stream behavior
[14:18:03] ===== [PASSED] drm_test_dp_mst_sideband_msg_req_decode =====
[14:18:03] ================ [PASSED] drm_dp_mst_helper ================
[14:18:03] ================== drm_exec (7 subtests) ===================
[14:18:03] [PASSED] sanitycheck
[14:18:03] [PASSED] test_lock
[14:18:03] [PASSED] test_lock_unlock
[14:18:03] [PASSED] test_duplicates
[14:18:03] [PASSED] test_prepare
[14:18:03] [PASSED] test_prepare_array
[14:18:03] [PASSED] test_multiple_loops
[14:18:03] ==================== [PASSED] drm_exec =====================
[14:18:03] =========== drm_format_helper_test (17 subtests) ===========
[14:18:03] ============== drm_test_fb_xrgb8888_to_gray8 ==============
[14:18:03] [PASSED] single_pixel_source_buffer
[14:18:03] [PASSED] single_pixel_clip_rectangle
[14:18:03] [PASSED] well_known_colors
[14:18:03] [PASSED] destination_pitch
[14:18:03] ========== [PASSED] drm_test_fb_xrgb8888_to_gray8 ==========
[14:18:03] ============= drm_test_fb_xrgb8888_to_rgb332 ==============
[14:18:03] [PASSED] single_pixel_source_buffer
[14:18:03] [PASSED] single_pixel_clip_rectangle
[14:18:03] [PASSED] well_known_colors
[14:18:03] [PASSED] destination_pitch
[14:18:03] ========= [PASSED] drm_test_fb_xrgb8888_to_rgb332 ==========
[14:18:03] ============= drm_test_fb_xrgb8888_to_rgb565 ==============
[14:18:03] [PASSED] single_pixel_source_buffer
[14:18:03] [PASSED] single_pixel_clip_rectangle
[14:18:03] [PASSED] well_known_colors
[14:18:03] [PASSED] destination_pitch
[14:18:03] ========= [PASSED] drm_test_fb_xrgb8888_to_rgb565 ==========
[14:18:03] ============ drm_test_fb_xrgb8888_to_xrgb1555 =============
[14:18:03] [PASSED] single_pixel_source_buffer
[14:18:03] [PASSED] single_pixel_clip_rectangle
[14:18:03] [PASSED] well_known_colors
[14:18:03] [PASSED] destination_pitch
[14:18:03] ======== [PASSED] drm_test_fb_xrgb8888_to_xrgb1555 =========
[14:18:03] ============ drm_test_fb_xrgb8888_to_argb1555 =============
[14:18:03] [PASSED] single_pixel_source_buffer
[14:18:03] [PASSED] single_pixel_clip_rectangle
[14:18:03] [PASSED] well_known_colors
[14:18:03] [PASSED] destination_pitch
[14:18:03] ======== [PASSED] drm_test_fb_xrgb8888_to_argb1555 =========
[14:18:03] ============ drm_test_fb_xrgb8888_to_rgba5551 =============
[14:18:03] [PASSED] single_pixel_source_buffer
[14:18:03] [PASSED] single_pixel_clip_rectangle
[14:18:03] [PASSED] well_known_colors
[14:18:03] [PASSED] destination_pitch
[14:18:03] ======== [PASSED] drm_test_fb_xrgb8888_to_rgba5551 =========
[14:18:03] ============= drm_test_fb_xrgb8888_to_rgb888 ==============
[14:18:03] [PASSED] single_pixel_source_buffer
[14:18:03] [PASSED] single_pixel_clip_rectangle
[14:18:03] [PASSED] well_known_colors
[14:18:03] [PASSED] destination_pitch
[14:18:03] ========= [PASSED] drm_test_fb_xrgb8888_to_rgb888 ==========
[14:18:03] ============= drm_test_fb_xrgb8888_to_bgr888 ==============
[14:18:03] [PASSED] single_pixel_source_buffer
[14:18:03] [PASSED] single_pixel_clip_rectangle
[14:18:03] [PASSED] well_known_colors
[14:18:03] [PASSED] destination_pitch
[14:18:03] ========= [PASSED] drm_test_fb_xrgb8888_to_bgr888 ==========
[14:18:03] ============ drm_test_fb_xrgb8888_to_argb8888 =============
[14:18:03] [PASSED] single_pixel_source_buffer
[14:18:03] [PASSED] single_pixel_clip_rectangle
[14:18:03] [PASSED] well_known_colors
[14:18:03] [PASSED] destination_pitch
[14:18:03] ======== [PASSED] drm_test_fb_xrgb8888_to_argb8888 =========
[14:18:03] =========== drm_test_fb_xrgb8888_to_xrgb2101010 ===========
[14:18:03] [PASSED] single_pixel_source_buffer
[14:18:03] [PASSED] single_pixel_clip_rectangle
[14:18:03] [PASSED] well_known_colors
[14:18:03] [PASSED] destination_pitch
[14:18:03] ======= [PASSED] drm_test_fb_xrgb8888_to_xrgb2101010 =======
[14:18:03] =========== drm_test_fb_xrgb8888_to_argb2101010 ===========
[14:18:03] [PASSED] single_pixel_source_buffer
[14:18:03] [PASSED] single_pixel_clip_rectangle
[14:18:03] [PASSED] well_known_colors
[14:18:03] [PASSED] destination_pitch
[14:18:03] ======= [PASSED] drm_test_fb_xrgb8888_to_argb2101010 =======
[14:18:03] ============== drm_test_fb_xrgb8888_to_mono ===============
[14:18:03] [PASSED] single_pixel_source_buffer
[14:18:03] [PASSED] single_pixel_clip_rectangle
[14:18:03] [PASSED] well_known_colors
[14:18:03] [PASSED] destination_pitch
[14:18:03] ========== [PASSED] drm_test_fb_xrgb8888_to_mono ===========
[14:18:03] ==================== drm_test_fb_swab =====================
[14:18:03] [PASSED] single_pixel_source_buffer
[14:18:03] [PASSED] single_pixel_clip_rectangle
[14:18:03] [PASSED] well_known_colors
[14:18:03] [PASSED] destination_pitch
[14:18:03] ================ [PASSED] drm_test_fb_swab =================
[14:18:03] ============ drm_test_fb_xrgb8888_to_xbgr8888 =============
[14:18:03] [PASSED] single_pixel_source_buffer
[14:18:03] [PASSED] single_pixel_clip_rectangle
[14:18:03] [PASSED] well_known_colors
[14:18:03] [PASSED] destination_pitch
[14:18:03] ======== [PASSED] drm_test_fb_xrgb8888_to_xbgr8888 =========
[14:18:03] ============ drm_test_fb_xrgb8888_to_abgr8888 =============
[14:18:03] [PASSED] single_pixel_source_buffer
[14:18:03] [PASSED] single_pixel_clip_rectangle
[14:18:03] [PASSED] well_known_colors
[14:18:03] [PASSED] destination_pitch
[14:18:03] ======== [PASSED] drm_test_fb_xrgb8888_to_abgr8888 =========
[14:18:03] ================= drm_test_fb_clip_offset =================
[14:18:03] [PASSED] pass through
[14:18:03] [PASSED] horizontal offset
[14:18:03] [PASSED] vertical offset
[14:18:03] [PASSED] horizontal and vertical offset
[14:18:03] [PASSED] horizontal offset (custom pitch)
[14:18:03] [PASSED] vertical offset (custom pitch)
[14:18:03] [PASSED] horizontal and vertical offset (custom pitch)
[14:18:03] ============= [PASSED] drm_test_fb_clip_offset =============
[14:18:03] =================== drm_test_fb_memcpy ====================
[14:18:03] [PASSED] single_pixel_source_buffer: XR24 little-endian (0x34325258)
[14:18:03] [PASSED] single_pixel_source_buffer: XRA8 little-endian (0x38415258)
[14:18:03] [PASSED] single_pixel_source_buffer: YU24 little-endian (0x34325559)
[14:18:03] [PASSED] single_pixel_clip_rectangle: XB24 little-endian (0x34324258)
[14:18:03] [PASSED] single_pixel_clip_rectangle: XRA8 little-endian (0x38415258)
[14:18:03] [PASSED] single_pixel_clip_rectangle: YU24 little-endian (0x34325559)
[14:18:03] [PASSED] well_known_colors: XB24 little-endian (0x34324258)
[14:18:03] [PASSED] well_known_colors: XRA8 little-endian (0x38415258)
[14:18:03] [PASSED] well_known_colors: YU24 little-endian (0x34325559)
[14:18:03] [PASSED] destination_pitch: XB24 little-endian (0x34324258)
[14:18:03] [PASSED] destination_pitch: XRA8 little-endian (0x38415258)
[14:18:03] [PASSED] destination_pitch: YU24 little-endian (0x34325559)
[14:18:03] =============== [PASSED] drm_test_fb_memcpy ================
[14:18:03] ============= [PASSED] drm_format_helper_test ==============
[14:18:03] ================= drm_format (18 subtests) =================
[14:18:03] [PASSED] drm_test_format_block_width_invalid
[14:18:03] [PASSED] drm_test_format_block_width_one_plane
[14:18:03] [PASSED] drm_test_format_block_width_two_plane
[14:18:03] [PASSED] drm_test_format_block_width_three_plane
[14:18:03] [PASSED] drm_test_format_block_width_tiled
[14:18:03] [PASSED] drm_test_format_block_height_invalid
[14:18:03] [PASSED] drm_test_format_block_height_one_plane
[14:18:03] [PASSED] drm_test_format_block_height_two_plane
[14:18:03] [PASSED] drm_test_format_block_height_three_plane
[14:18:03] [PASSED] drm_test_format_block_height_tiled
[14:18:03] [PASSED] drm_test_format_min_pitch_invalid
[14:18:03] [PASSED] drm_test_format_min_pitch_one_plane_8bpp
[14:18:03] [PASSED] drm_test_format_min_pitch_one_plane_16bpp
[14:18:03] [PASSED] drm_test_format_min_pitch_one_plane_24bpp
[14:18:03] [PASSED] drm_test_format_min_pitch_one_plane_32bpp
[14:18:03] [PASSED] drm_test_format_min_pitch_two_plane
[14:18:03] [PASSED] drm_test_format_min_pitch_three_plane_8bpp
[14:18:03] [PASSED] drm_test_format_min_pitch_tiled
[14:18:03] =================== [PASSED] drm_format ====================
[14:18:03] ============== drm_framebuffer (10 subtests) ===============
[14:18:03] ========== drm_test_framebuffer_check_src_coords ==========
[14:18:03] [PASSED] Success: source fits into fb
[14:18:03] [PASSED] Fail: overflowing fb with x-axis coordinate
[14:18:03] [PASSED] Fail: overflowing fb with y-axis coordinate
[14:18:03] [PASSED] Fail: overflowing fb with source width
[14:18:03] [PASSED] Fail: overflowing fb with source height
[14:18:03] ====== [PASSED] drm_test_framebuffer_check_src_coords ======
[14:18:03] [PASSED] drm_test_framebuffer_cleanup
[14:18:03] =============== drm_test_framebuffer_create ===============
[14:18:03] [PASSED] ABGR8888 normal sizes
[14:18:03] [PASSED] ABGR8888 max sizes
[14:18:03] [PASSED] ABGR8888 pitch greater than min required
[14:18:03] [PASSED] ABGR8888 pitch less than min required
[14:18:03] [PASSED] ABGR8888 Invalid width
[14:18:03] [PASSED] ABGR8888 Invalid buffer handle
[14:18:03] [PASSED] No pixel format
[14:18:03] [PASSED] ABGR8888 Width 0
[14:18:03] [PASSED] ABGR8888 Height 0
[14:18:03] [PASSED] ABGR8888 Out of bound height * pitch combination
[14:18:03] [PASSED] ABGR8888 Large buffer offset
[14:18:03] [PASSED] ABGR8888 Buffer offset for inexistent plane
[14:18:03] [PASSED] ABGR8888 Invalid flag
[14:18:03] [PASSED] ABGR8888 Set DRM_MODE_FB_MODIFIERS without modifiers
[14:18:03] [PASSED] ABGR8888 Valid buffer modifier
[14:18:03] [PASSED] ABGR8888 Invalid buffer modifier(DRM_FORMAT_MOD_SAMSUNG_64_32_TILE)
[14:18:03] [PASSED] ABGR8888 Extra pitches without DRM_MODE_FB_MODIFIERS
[14:18:03] [PASSED] ABGR8888 Extra pitches with DRM_MODE_FB_MODIFIERS
[14:18:03] [PASSED] NV12 Normal sizes
[14:18:03] [PASSED] NV12 Max sizes
[14:18:03] [PASSED] NV12 Invalid pitch
[14:18:03] [PASSED] NV12 Invalid modifier/missing DRM_MODE_FB_MODIFIERS flag
[14:18:03] [PASSED] NV12 different modifier per-plane
[14:18:03] [PASSED] NV12 with DRM_FORMAT_MOD_SAMSUNG_64_32_TILE
[14:18:03] [PASSED] NV12 Valid modifiers without DRM_MODE_FB_MODIFIERS
[14:18:03] [PASSED] NV12 Modifier for inexistent plane
[14:18:03] [PASSED] NV12 Handle for inexistent plane
[14:18:03] [PASSED] NV12 Handle for inexistent plane without DRM_MODE_FB_MODIFIERS
[14:18:03] [PASSED] YVU420 DRM_MODE_FB_MODIFIERS set without modifier
[14:18:03] [PASSED] YVU420 Normal sizes
[14:18:03] [PASSED] YVU420 Max sizes
[14:18:03] [PASSED] YVU420 Invalid pitch
[14:18:03] [PASSED] YVU420 Different pitches
[14:18:03] [PASSED] YVU420 Different buffer offsets/pitches
[14:18:03] [PASSED] YVU420 Modifier set just for plane 0, without DRM_MODE_FB_MODIFIERS
[14:18:03] [PASSED] YVU420 Modifier set just for planes 0, 1, without DRM_MODE_FB_MODIFIERS
[14:18:03] [PASSED] YVU420 Modifier set just for plane 0, 1, with DRM_MODE_FB_MODIFIERS
[14:18:03] [PASSED] YVU420 Valid modifier
[14:18:03] [PASSED] YVU420 Different modifiers per plane
[14:18:03] [PASSED] YVU420 Modifier for inexistent plane
[14:18:03] [PASSED] YUV420_10BIT Invalid modifier(DRM_FORMAT_MOD_LINEAR)
[14:18:03] [PASSED] X0L2 Normal sizes
[14:18:03] [PASSED] X0L2 Max sizes
[14:18:03] [PASSED] X0L2 Invalid pitch
[14:18:03] [PASSED] X0L2 Pitch greater than minimum required
[14:18:03] [PASSED] X0L2 Handle for inexistent plane
[14:18:03] [PASSED] X0L2 Offset for inexistent plane, without DRM_MODE_FB_MODIFIERS set
[14:18:03] [PASSED] X0L2 Modifier without DRM_MODE_FB_MODIFIERS set
[14:18:03] [PASSED] X0L2 Valid modifier
[14:18:03] [PASSED] X0L2 Modifier for inexistent plane
[14:18:03] =========== [PASSED] drm_test_framebuffer_create ===========
[14:18:03] [PASSED] drm_test_framebuffer_free
[14:18:03] [PASSED] drm_test_framebuffer_init
[14:18:03] [PASSED] drm_test_framebuffer_init_bad_format
[14:18:03] [PASSED] drm_test_framebuffer_init_dev_mismatch
[14:18:03] [PASSED] drm_test_framebuffer_lookup
[14:18:03] [PASSED] drm_test_framebuffer_lookup_inexistent
[14:18:03] [PASSED] drm_test_framebuffer_modifiers_not_supported
[14:18:03] ================= [PASSED] drm_framebuffer =================
[14:18:03] ================ drm_gem_shmem (8 subtests) ================
[14:18:03] [PASSED] drm_gem_shmem_test_obj_create
[14:18:03] [PASSED] drm_gem_shmem_test_obj_create_private
[14:18:03] [PASSED] drm_gem_shmem_test_pin_pages
[14:18:03] [PASSED] drm_gem_shmem_test_vmap
[14:18:03] [PASSED] drm_gem_shmem_test_get_pages_sgt
[14:18:03] [PASSED] drm_gem_shmem_test_get_sg_table
[14:18:03] [PASSED] drm_gem_shmem_test_madvise
[14:18:03] [PASSED] drm_gem_shmem_test_purge
[14:18:03] ================== [PASSED] drm_gem_shmem ==================
[14:18:03] === drm_atomic_helper_connector_hdmi_check (27 subtests) ===
[14:18:03] [PASSED] drm_test_check_broadcast_rgb_auto_cea_mode
[14:18:03] [PASSED] drm_test_check_broadcast_rgb_auto_cea_mode_vic_1
[14:18:03] [PASSED] drm_test_check_broadcast_rgb_full_cea_mode
[14:18:03] [PASSED] drm_test_check_broadcast_rgb_full_cea_mode_vic_1
[14:18:03] [PASSED] drm_test_check_broadcast_rgb_limited_cea_mode
[14:18:03] [PASSED] drm_test_check_broadcast_rgb_limited_cea_mode_vic_1
[14:18:03] ====== drm_test_check_broadcast_rgb_cea_mode_yuv420 =======
[14:18:03] [PASSED] Automatic
[14:18:03] [PASSED] Full
[14:18:03] [PASSED] Limited 16:235
[14:18:03] == [PASSED] drm_test_check_broadcast_rgb_cea_mode_yuv420 ===
[14:18:03] [PASSED] drm_test_check_broadcast_rgb_crtc_mode_changed
[14:18:03] [PASSED] drm_test_check_broadcast_rgb_crtc_mode_not_changed
[14:18:03] [PASSED] drm_test_check_disable_connector
[14:18:03] [PASSED] drm_test_check_hdmi_funcs_reject_rate
[14:18:03] [PASSED] drm_test_check_max_tmds_rate_bpc_fallback_rgb
[14:18:03] [PASSED] drm_test_check_max_tmds_rate_bpc_fallback_yuv420
[14:18:03] [PASSED] drm_test_check_max_tmds_rate_bpc_fallback_ignore_yuv422
[14:18:03] [PASSED] drm_test_check_max_tmds_rate_bpc_fallback_ignore_yuv420
[14:18:03] [PASSED] drm_test_check_driver_unsupported_fallback_yuv420
[14:18:03] [PASSED] drm_test_check_output_bpc_crtc_mode_changed
[14:18:03] [PASSED] drm_test_check_output_bpc_crtc_mode_not_changed
[14:18:03] [PASSED] drm_test_check_output_bpc_dvi
[14:18:03] [PASSED] drm_test_check_output_bpc_format_vic_1
[14:18:03] [PASSED] drm_test_check_output_bpc_format_display_8bpc_only
[14:18:03] [PASSED] drm_test_check_output_bpc_format_display_rgb_only
[14:18:03] [PASSED] drm_test_check_output_bpc_format_driver_8bpc_only
[14:18:03] [PASSED] drm_test_check_output_bpc_format_driver_rgb_only
[14:18:03] [PASSED] drm_test_check_tmds_char_rate_rgb_8bpc
[14:18:03] [PASSED] drm_test_check_tmds_char_rate_rgb_10bpc
[14:18:03] [PASSED] drm_test_check_tmds_char_rate_rgb_12bpc
[14:18:03] ===== [PASSED] drm_atomic_helper_connector_hdmi_check ======
[14:18:03] === drm_atomic_helper_connector_hdmi_reset (6 subtests) ====
[14:18:03] [PASSED] drm_test_check_broadcast_rgb_value
[14:18:03] [PASSED] drm_test_check_bpc_8_value
[14:18:03] [PASSED] drm_test_check_bpc_10_value
[14:18:03] [PASSED] drm_test_check_bpc_12_value
[14:18:03] [PASSED] drm_test_check_format_value
[14:18:03] [PASSED] drm_test_check_tmds_char_value
[14:18:03] ===== [PASSED] drm_atomic_helper_connector_hdmi_reset ======
[14:18:03] = drm_atomic_helper_connector_hdmi_mode_valid (4 subtests) =
[14:18:03] [PASSED] drm_test_check_mode_valid
[14:18:03] [PASSED] drm_test_check_mode_valid_reject
[14:18:03] [PASSED] drm_test_check_mode_valid_reject_rate
[14:18:03] [PASSED] drm_test_check_mode_valid_reject_max_clock
[14:18:03] === [PASSED] drm_atomic_helper_connector_hdmi_mode_valid ===
[14:18:03] ================= drm_managed (2 subtests) =================
[14:18:03] [PASSED] drm_test_managed_release_action
[14:18:03] [PASSED] drm_test_managed_run_action
[14:18:03] =================== [PASSED] drm_managed ===================
[14:18:03] =================== drm_mm (6 subtests) ====================
[14:18:03] [PASSED] drm_test_mm_init
[14:18:03] [PASSED] drm_test_mm_debug
[14:18:03] [PASSED] drm_test_mm_align32
[14:18:03] [PASSED] drm_test_mm_align64
[14:18:03] [PASSED] drm_test_mm_lowest
[14:18:03] [PASSED] drm_test_mm_highest
[14:18:03] ===================== [PASSED] drm_mm ======================
[14:18:03] ============= drm_modes_analog_tv (5 subtests) =============
[14:18:03] [PASSED] drm_test_modes_analog_tv_mono_576i
[14:18:03] [PASSED] drm_test_modes_analog_tv_ntsc_480i
[14:18:03] [PASSED] drm_test_modes_analog_tv_ntsc_480i_inlined
[14:18:03] [PASSED] drm_test_modes_analog_tv_pal_576i
[14:18:03] [PASSED] drm_test_modes_analog_tv_pal_576i_inlined
[14:18:03] =============== [PASSED] drm_modes_analog_tv ===============
[14:18:03] ============== drm_plane_helper (2 subtests) ===============
[14:18:03] =============== drm_test_check_plane_state ================
[14:18:03] [PASSED] clipping_simple
[14:18:03] [PASSED] clipping_rotate_reflect
[14:18:03] [PASSED] positioning_simple
[14:18:03] [PASSED] upscaling
[14:18:03] [PASSED] downscaling
[14:18:03] [PASSED] rounding1
[14:18:03] [PASSED] rounding2
[14:18:03] [PASSED] rounding3
[14:18:03] [PASSED] rounding4
[14:18:03] =========== [PASSED] drm_test_check_plane_state ============
[14:18:03] =========== drm_test_check_invalid_plane_state ============
[14:18:03] [PASSED] positioning_invalid
[14:18:03] [PASSED] upscaling_invalid
[14:18:03] [PASSED] downscaling_invalid
[14:18:03] ======= [PASSED] drm_test_check_invalid_plane_state ========
[14:18:03] ================ [PASSED] drm_plane_helper =================
[14:18:03] ====== drm_connector_helper_tv_get_modes (1 subtest) =======
[14:18:03] ====== drm_test_connector_helper_tv_get_modes_check =======
[14:18:03] [PASSED] None
[14:18:03] [PASSED] PAL
[14:18:03] [PASSED] NTSC
[14:18:03] [PASSED] Both, NTSC Default
[14:18:03] [PASSED] Both, PAL Default
[14:18:03] [PASSED] Both, NTSC Default, with PAL on command-line
[14:18:03] [PASSED] Both, PAL Default, with NTSC on command-line
[14:18:03] == [PASSED] drm_test_connector_helper_tv_get_modes_check ===
[14:18:03] ======== [PASSED] drm_connector_helper_tv_get_modes ========
[14:18:03] ================== drm_rect (9 subtests) ===================
[14:18:03] [PASSED] drm_test_rect_clip_scaled_div_by_zero
[14:18:03] [PASSED] drm_test_rect_clip_scaled_not_clipped
[14:18:03] [PASSED] drm_test_rect_clip_scaled_clipped
[14:18:03] [PASSED] drm_test_rect_clip_scaled_signed_vs_unsigned
[14:18:03] ================= drm_test_rect_intersect =================
[14:18:03] [PASSED] top-left x bottom-right: 2x2+1+1 x 2x2+0+0
[14:18:03] [PASSED] top-right x bottom-left: 2x2+0+0 x 2x2+1-1
[14:18:03] [PASSED] bottom-left x top-right: 2x2+1-1 x 2x2+0+0
[14:18:03] [PASSED] bottom-right x top-left: 2x2+0+0 x 2x2+1+1
[14:18:03] [PASSED] right x left: 2x1+0+0 x 3x1+1+0
[14:18:03] [PASSED] left x right: 3x1+1+0 x 2x1+0+0
[14:18:03] [PASSED] up x bottom: 1x2+0+0 x 1x3+0-1
[14:18:03] [PASSED] bottom x up: 1x3+0-1 x 1x2+0+0
[14:18:03] [PASSED] touching corner: 1x1+0+0 x 2x2+1+1
[14:18:03] [PASSED] touching side: 1x1+0+0 x 1x1+1+0
[14:18:03] [PASSED] equal rects: 2x2+0+0 x 2x2+0+0
[14:18:03] [PASSED] inside another: 2x2+0+0 x 1x1+1+1
[14:18:03] [PASSED] far away: 1x1+0+0 x 1x1+3+6
[14:18:03] [PASSED] points intersecting: 0x0+5+10 x 0x0+5+10
[14:18:03] [PASSED] points not intersecting: 0x0+0+0 x 0x0+5+10
[14:18:03] ============= [PASSED] drm_test_rect_intersect =============
[14:18:03] ================ drm_test_rect_calc_hscale ================
[14:18:03] [PASSED] normal use
[14:18:03] [PASSED] out of max range
[14:18:03] [PASSED] out of min range
[14:18:03] [PASSED] zero dst
[14:18:03] [PASSED] negative src
[14:18:03] [PASSED] negative dst
[14:18:03] ============ [PASSED] drm_test_rect_calc_hscale ============
[14:18:03] ================ drm_test_rect_calc_vscale ================
[14:18:03] [PASSED] normal use
[14:18:03] [PASSED] out of max range
[14:18:03] [PASSED] out of min range
[14:18:03] [PASSED] zero dst
[14:18:03] [PASSED] negative src
[14:18:03] [PASSED] negative dst
[14:18:03] ============ [PASSED] drm_test_rect_calc_vscale ============
[14:18:03] ================== drm_test_rect_rotate ===================
[14:18:03] [PASSED] reflect-x
[14:18:03] [PASSED] reflect-y
[14:18:03] [PASSED] rotate-0
[14:18:03] [PASSED] rotate-90
[14:18:03] [PASSED] rotate-180
[14:18:03] [PASSED] rotate-270
[14:18:03] ============== [PASSED] drm_test_rect_rotate ===============
[14:18:03] ================ drm_test_rect_rotate_inv =================
[14:18:03] [PASSED] reflect-x
[14:18:03] [PASSED] reflect-y
[14:18:03] [PASSED] rotate-0
[14:18:03] [PASSED] rotate-90
[14:18:03] [PASSED] rotate-180
[14:18:03] [PASSED] rotate-270
[14:18:03] ============ [PASSED] drm_test_rect_rotate_inv =============
[14:18:03] ==================== [PASSED] drm_rect =====================
[14:18:03] ============ drm_sysfb_modeset_test (1 subtest) ============
[14:18:03] ============ drm_test_sysfb_build_fourcc_list =============
[14:18:03] [PASSED] no native formats
[14:18:03] [PASSED] XRGB8888 as native format
[14:18:03] [PASSED] remove duplicates
[14:18:03] [PASSED] convert alpha formats
[14:18:03] [PASSED] random formats
[14:18:03] ======== [PASSED] drm_test_sysfb_build_fourcc_list =========
[14:18:03] ============= [PASSED] drm_sysfb_modeset_test ==============
[14:18:03] ============================================================
[14:18:03] Testing complete. Ran 622 tests: passed: 622
[14:18:03] Elapsed time: 27.045s total, 1.728s configuring, 24.888s building, 0.389s running
+ /kernel/tools/testing/kunit/kunit.py run --kunitconfig /kernel/drivers/gpu/drm/ttm/tests/.kunitconfig
[14:18:04] Configuring KUnit Kernel ...
Regenerating .config ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
[14:18:05] Building KUnit Kernel ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
Building with:
$ make all compile_commands.json scripts_gdb ARCH=um O=.kunit --jobs=48
[14:18:15] Starting KUnit Kernel (1/1)...
[14:18:15] ============================================================
Running tests with:
$ .kunit/linux kunit.enable=1 mem=1G console=tty kunit_shutdown=halt
[14:18:15] ================= ttm_device (5 subtests) ==================
[14:18:15] [PASSED] ttm_device_init_basic
[14:18:15] [PASSED] ttm_device_init_multiple
[14:18:15] [PASSED] ttm_device_fini_basic
[14:18:15] [PASSED] ttm_device_init_no_vma_man
[14:18:15] ================== ttm_device_init_pools ==================
[14:18:15] [PASSED] No DMA allocations, no DMA32 required
[14:18:15] [PASSED] DMA allocations, DMA32 required
[14:18:15] [PASSED] No DMA allocations, DMA32 required
[14:18:15] [PASSED] DMA allocations, no DMA32 required
[14:18:15] ============== [PASSED] ttm_device_init_pools ==============
[14:18:15] =================== [PASSED] ttm_device ====================
[14:18:15] ================== ttm_pool (8 subtests) ===================
[14:18:15] ================== ttm_pool_alloc_basic ===================
[14:18:15] [PASSED] One page
[14:18:15] [PASSED] More than one page
[14:18:15] [PASSED] Above the allocation limit
[14:18:15] [PASSED] One page, with coherent DMA mappings enabled
[14:18:15] [PASSED] Above the allocation limit, with coherent DMA mappings enabled
[14:18:15] ============== [PASSED] ttm_pool_alloc_basic ===============
[14:18:15] ============== ttm_pool_alloc_basic_dma_addr ==============
[14:18:15] [PASSED] One page
[14:18:15] [PASSED] More than one page
[14:18:15] [PASSED] Above the allocation limit
[14:18:15] [PASSED] One page, with coherent DMA mappings enabled
[14:18:15] [PASSED] Above the allocation limit, with coherent DMA mappings enabled
[14:18:15] ========== [PASSED] ttm_pool_alloc_basic_dma_addr ==========
[14:18:15] [PASSED] ttm_pool_alloc_order_caching_match
[14:18:15] [PASSED] ttm_pool_alloc_caching_mismatch
[14:18:15] [PASSED] ttm_pool_alloc_order_mismatch
[14:18:15] [PASSED] ttm_pool_free_dma_alloc
[14:18:15] [PASSED] ttm_pool_free_no_dma_alloc
[14:18:15] [PASSED] ttm_pool_fini_basic
[14:18:15] ==================== [PASSED] ttm_pool =====================
[14:18:15] ================ ttm_resource (8 subtests) =================
[14:18:15] ================= ttm_resource_init_basic =================
[14:18:15] [PASSED] Init resource in TTM_PL_SYSTEM
[14:18:15] [PASSED] Init resource in TTM_PL_VRAM
[14:18:15] [PASSED] Init resource in a private placement
[14:18:15] [PASSED] Init resource in TTM_PL_SYSTEM, set placement flags
[14:18:15] ============= [PASSED] ttm_resource_init_basic =============
[14:18:15] [PASSED] ttm_resource_init_pinned
[14:18:15] [PASSED] ttm_resource_fini_basic
[14:18:15] [PASSED] ttm_resource_manager_init_basic
[14:18:15] [PASSED] ttm_resource_manager_usage_basic
[14:18:15] [PASSED] ttm_resource_manager_set_used_basic
[14:18:15] [PASSED] ttm_sys_man_alloc_basic
[14:18:15] [PASSED] ttm_sys_man_free_basic
[14:18:15] ================== [PASSED] ttm_resource ===================
[14:18:15] =================== ttm_tt (15 subtests) ===================
[14:18:15] ==================== ttm_tt_init_basic ====================
[14:18:15] [PASSED] Page-aligned size
[14:18:15] [PASSED] Extra pages requested
[14:18:15] ================ [PASSED] ttm_tt_init_basic ================
[14:18:15] [PASSED] ttm_tt_init_misaligned
[14:18:15] [PASSED] ttm_tt_fini_basic
[14:18:15] [PASSED] ttm_tt_fini_sg
[14:18:15] [PASSED] ttm_tt_fini_shmem
[14:18:15] [PASSED] ttm_tt_create_basic
[14:18:15] [PASSED] ttm_tt_create_invalid_bo_type
[14:18:15] [PASSED] ttm_tt_create_ttm_exists
[14:18:15] [PASSED] ttm_tt_create_failed
[14:18:15] [PASSED] ttm_tt_destroy_basic
[14:18:15] [PASSED] ttm_tt_populate_null_ttm
[14:18:15] [PASSED] ttm_tt_populate_populated_ttm
[14:18:15] [PASSED] ttm_tt_unpopulate_basic
[14:18:15] [PASSED] ttm_tt_unpopulate_empty_ttm
[14:18:15] [PASSED] ttm_tt_swapin_basic
[14:18:15] ===================== [PASSED] ttm_tt ======================
[14:18:15] =================== ttm_bo (14 subtests) ===================
[14:18:15] =========== ttm_bo_reserve_optimistic_no_ticket ===========
[14:18:15] [PASSED] Cannot be interrupted and sleeps
[14:18:15] [PASSED] Cannot be interrupted, locks straight away
[14:18:15] [PASSED] Can be interrupted, sleeps
[14:18:15] ======= [PASSED] ttm_bo_reserve_optimistic_no_ticket =======
[14:18:15] [PASSED] ttm_bo_reserve_locked_no_sleep
[14:18:15] [PASSED] ttm_bo_reserve_no_wait_ticket
[14:18:15] [PASSED] ttm_bo_reserve_double_resv
[14:18:15] [PASSED] ttm_bo_reserve_interrupted
[14:18:15] [PASSED] ttm_bo_reserve_deadlock
[14:18:15] [PASSED] ttm_bo_unreserve_basic
[14:18:15] [PASSED] ttm_bo_unreserve_pinned
[14:18:15] [PASSED] ttm_bo_unreserve_bulk
[14:18:15] [PASSED] ttm_bo_fini_basic
[14:18:15] [PASSED] ttm_bo_fini_shared_resv
[14:18:15] [PASSED] ttm_bo_pin_basic
[14:18:15] [PASSED] ttm_bo_pin_unpin_resource
[14:18:15] [PASSED] ttm_bo_multiple_pin_one_unpin
[14:18:15] ===================== [PASSED] ttm_bo ======================
[14:18:15] ============== ttm_bo_validate (21 subtests) ===============
[14:18:15] ============== ttm_bo_init_reserved_sys_man ===============
[14:18:15] [PASSED] Buffer object for userspace
[14:18:15] [PASSED] Kernel buffer object
[14:18:15] [PASSED] Shared buffer object
[14:18:15] ========== [PASSED] ttm_bo_init_reserved_sys_man ===========
[14:18:15] ============== ttm_bo_init_reserved_mock_man ==============
[14:18:15] [PASSED] Buffer object for userspace
[14:18:15] [PASSED] Kernel buffer object
[14:18:15] [PASSED] Shared buffer object
[14:18:15] ========== [PASSED] ttm_bo_init_reserved_mock_man ==========
[14:18:15] [PASSED] ttm_bo_init_reserved_resv
[14:18:15] ================== ttm_bo_validate_basic ==================
[14:18:15] [PASSED] Buffer object for userspace
[14:18:15] [PASSED] Kernel buffer object
[14:18:15] [PASSED] Shared buffer object
[14:18:15] ============== [PASSED] ttm_bo_validate_basic ==============
[14:18:15] [PASSED] ttm_bo_validate_invalid_placement
[14:18:15] ============= ttm_bo_validate_same_placement ==============
[14:18:15] [PASSED] System manager
[14:18:15] [PASSED] VRAM manager
[14:18:15] ========= [PASSED] ttm_bo_validate_same_placement ==========
[14:18:15] [PASSED] ttm_bo_validate_failed_alloc
[14:18:15] [PASSED] ttm_bo_validate_pinned
[14:18:15] [PASSED] ttm_bo_validate_busy_placement
[14:18:15] ================ ttm_bo_validate_multihop =================
[14:18:15] [PASSED] Buffer object for userspace
[14:18:15] [PASSED] Kernel buffer object
[14:18:15] [PASSED] Shared buffer object
[14:18:15] ============ [PASSED] ttm_bo_validate_multihop =============
[14:18:15] ========== ttm_bo_validate_no_placement_signaled ==========
[14:18:15] [PASSED] Buffer object in system domain, no page vector
[14:18:15] [PASSED] Buffer object in system domain with an existing page vector
[14:18:15] ====== [PASSED] ttm_bo_validate_no_placement_signaled ======
[14:18:15] ======== ttm_bo_validate_no_placement_not_signaled ========
[14:18:15] [PASSED] Buffer object for userspace
[14:18:15] [PASSED] Kernel buffer object
[14:18:15] [PASSED] Shared buffer object
[14:18:15] ==== [PASSED] ttm_bo_validate_no_placement_not_signaled ====
[14:18:15] [PASSED] ttm_bo_validate_move_fence_signaled
[14:18:15] ========= ttm_bo_validate_move_fence_not_signaled =========
[14:18:15] [PASSED] Waits for GPU
[14:18:15] [PASSED] Tries to lock straight away
[14:18:15] ===== [PASSED] ttm_bo_validate_move_fence_not_signaled =====
[14:18:15] [PASSED] ttm_bo_validate_happy_evict
[14:18:15] [PASSED] ttm_bo_validate_all_pinned_evict
[14:18:15] [PASSED] ttm_bo_validate_allowed_only_evict
[14:18:15] [PASSED] ttm_bo_validate_deleted_evict
[14:18:15] [PASSED] ttm_bo_validate_busy_domain_evict
[14:18:15] [PASSED] ttm_bo_validate_evict_gutting
[14:18:15] [PASSED] ttm_bo_validate_recrusive_evict
[14:18:15] ================= [PASSED] ttm_bo_validate =================
[14:18:15] ============================================================
[14:18:15] Testing complete. Ran 101 tests: passed: 101
[14:18:15] Elapsed time: 11.322s total, 1.755s configuring, 9.351s building, 0.186s running
+ cleanup
++ stat -c %u:%g /kernel
+ chown -R 1003:1003 /kernel

^ permalink raw reply	[flat|nested] 21+ messages in thread
* ✓ Xe.CI.BAT: success for drm/xe/migrate: Atomicize CCS copy command setup
  2025-10-17 14:12 [PATCH v7 0/3] drm/xe/migrate: Atomicize CCS copy command setup Satyanarayana K V P
                   ` (4 preceding siblings ...)
  2025-10-17 14:18 ` ✓ CI.KUnit: success " Patchwork
@ 2025-10-17 15:23 ` Patchwork
  2025-10-18 12:27 ` ✗ Xe.CI.Full: failure " Patchwork
  6 siblings, 0 replies; 21+ messages in thread
From: Patchwork @ 2025-10-17 15:23 UTC (permalink / raw)
  To: K V P, Satyanarayana; +Cc: intel-xe

[-- Attachment #1: Type: text/plain, Size: 1596 bytes --]

== Series Details ==

Series: drm/xe/migrate: Atomicize CCS copy command setup
URL   : https://patchwork.freedesktop.org/series/156128/
State : success

== Summary ==

CI Bug Log - changes from xe-3940-c809fdcf60b85e8a261eaa1b49f18b9c5731b18c_BAT -> xe-pw-156128v1_BAT
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

Participating hosts (11 -> 11)
------------------------------

  No changes in participating hosts

Known issues
------------

  Here are the changes found in xe-pw-156128v1_BAT that come from known issues:

### IGT changes ###

#### Possible fixes ####

  * igt@kms_flip@basic-flip-vs-dpms:
    - bat-adlp-7: [DMESG-WARN][1] ([Intel XE#4543]) -> [PASS][2] +1 other test pass
   [1]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3940-c809fdcf60b85e8a261eaa1b49f18b9c5731b18c/bat-adlp-7/igt@kms_flip@basic-flip-vs-dpms.html
   [2]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/bat-adlp-7/igt@kms_flip@basic-flip-vs-dpms.html

  [Intel XE#4543]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4543

Build changes
-------------

  * IGT: IGT_8591 -> IGT_8592
  * Linux: xe-3940-c809fdcf60b85e8a261eaa1b49f18b9c5731b18c -> xe-pw-156128v1

  IGT_8591: 8591
  IGT_8592: b3d809d537febc23792ab8d0eb6d13cf80d626c8 @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
  xe-3940-c809fdcf60b85e8a261eaa1b49f18b9c5731b18c: c809fdcf60b85e8a261eaa1b49f18b9c5731b18c
  xe-pw-156128v1: 156128v1

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/index.html

[-- Attachment #2: Type: text/html, Size: 2175 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread
* ✗ Xe.CI.Full: failure for drm/xe/migrate: Atomicize CCS copy command setup
  2025-10-17 14:12 [PATCH v7 0/3] drm/xe/migrate: Atomicize CCS copy command setup Satyanarayana K V P
                   ` (5 preceding siblings ...)
  2025-10-17 15:23 ` ✓ Xe.CI.BAT: " Patchwork
@ 2025-10-18 12:27 ` Patchwork
  6 siblings, 0 replies; 21+ messages in thread
From: Patchwork @ 2025-10-18 12:27 UTC (permalink / raw)
  To: K V P, Satyanarayana; +Cc: intel-xe

[-- Attachment #1: Type: text/plain, Size: 89048 bytes --]

== Series Details ==

Series: drm/xe/migrate: Atomicize CCS copy command setup
URL   : https://patchwork.freedesktop.org/series/156128/
State : failure

== Summary ==

CI Bug Log - changes from xe-3940-c809fdcf60b85e8a261eaa1b49f18b9c5731b18c_FULL -> xe-pw-156128v1_FULL
====================================================

Summary
-------

  **FAILURE**

  Serious unknown changes coming with xe-pw-156128v1_FULL absolutely need to be
  verified manually.

  If you think the reported changes have nothing to do with the changes
  introduced in xe-pw-156128v1_FULL, please notify your bug team
  (I915-ci-infra@lists.freedesktop.org) to allow them
  to document this new failure mode, which will reduce false positives in CI.

Participating hosts (4 -> 4)
------------------------------

  No changes in participating hosts

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in xe-pw-156128v1_FULL:

### IGT changes ###

#### Possible regressions ####

  * igt@kms_frontbuffer_tracking@fbc-2p-primscrn-pri-indfb-draw-render:
    - shard-dg2-set2: [PASS][1] -> [INCOMPLETE][2]
   [1]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3940-c809fdcf60b85e8a261eaa1b49f18b9c5731b18c/shard-dg2-436/igt@kms_frontbuffer_tracking@fbc-2p-primscrn-pri-indfb-draw-render.html
   [2]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-dg2-436/igt@kms_frontbuffer_tracking@fbc-2p-primscrn-pri-indfb-draw-render.html

  * igt@kms_vblank@ts-continuation-suspend:
    - shard-bmg: [PASS][3] -> [DMESG-FAIL][4] +1 other test dmesg-fail
   [3]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3940-c809fdcf60b85e8a261eaa1b49f18b9c5731b18c/shard-bmg-1/igt@kms_vblank@ts-continuation-suspend.html
   [4]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-bmg-3/igt@kms_vblank@ts-continuation-suspend.html

  * igt@xe_pm@s4-basic:
    - shard-lnl: [PASS][5] -> [FAIL][6] +1 other test fail
   [5]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3940-c809fdcf60b85e8a261eaa1b49f18b9c5731b18c/shard-lnl-7/igt@xe_pm@s4-basic.html
   [6]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-lnl-3/igt@xe_pm@s4-basic.html

  * igt@xe_pm@s4-multiple-execs:
    - shard-adlp: NOTRUN -> [FAIL][7]
   [7]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-9/igt@xe_pm@s4-multiple-execs.html
    - shard-dg2-set2: [PASS][8] -> [FAIL][9]
   [8]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3940-c809fdcf60b85e8a261eaa1b49f18b9c5731b18c/shard-dg2-435/igt@xe_pm@s4-multiple-execs.html
   [9]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-dg2-464/igt@xe_pm@s4-multiple-execs.html

  * igt@xe_pm@s4-vm-bind-prefetch:
    - shard-adlp: [PASS][10] -> [FAIL][11]
   [10]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3940-c809fdcf60b85e8a261eaa1b49f18b9c5731b18c/shard-adlp-1/igt@xe_pm@s4-vm-bind-prefetch.html
   [11]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-9/igt@xe_pm@s4-vm-bind-prefetch.html
    - shard-bmg: [PASS][12] -> [FAIL][13] +1 other test fail
   [12]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3940-c809fdcf60b85e8a261eaa1b49f18b9c5731b18c/shard-bmg-4/igt@xe_pm@s4-vm-bind-prefetch.html
   [13]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-bmg-7/igt@xe_pm@s4-vm-bind-prefetch.html
    - shard-dg2-set2: NOTRUN -> [FAIL][14]
   [14]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-dg2-466/igt@xe_pm@s4-vm-bind-prefetch.html

  * igt@xe_pm@s4-vm-bind-userptr:
    - shard-lnl: NOTRUN -> [FAIL][15] +1 other test fail
   [15]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-lnl-3/igt@xe_pm@s4-vm-bind-userptr.html

Known issues
------------

  Here are the changes found in xe-pw-156128v1_FULL that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@intel_hwmon@hwmon-read:
    - shard-adlp: NOTRUN -> [SKIP][16] ([Intel XE#1125] / [Intel XE#5574])
   [16]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-1/igt@intel_hwmon@hwmon-read.html

  * igt@intel_hwmon@hwmon-write:
    - shard-bmg: [PASS][17] -> [FAIL][18] ([Intel XE#4665])
   [17]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3940-c809fdcf60b85e8a261eaa1b49f18b9c5731b18c/shard-bmg-2/igt@intel_hwmon@hwmon-write.html
   [18]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-bmg-3/igt@intel_hwmon@hwmon-write.html

  * igt@kms_addfb_basic@addfb25-y-tiled-small-legacy:
    - shard-lnl: NOTRUN -> [SKIP][19] ([Intel XE#1466])
   [19]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-lnl-3/igt@kms_addfb_basic@addfb25-y-tiled-small-legacy.html

  * igt@kms_async_flips@async-flip-with-page-flip-events-linear-atomic@pipe-c-edp-1:
    - shard-lnl: NOTRUN -> [FAIL][20] ([Intel XE#6054]) +3 other tests fail
   [20]:
https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-lnl-8/igt@kms_async_flips@async-flip-with-page-flip-events-linear-atomic@pipe-c-edp-1.html * igt@kms_async_flips@crc-atomic@pipe-d-hdmi-a-1: - shard-adlp: NOTRUN -> [FAIL][21] ([Intel XE#3884]) +1 other test fail [21]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-9/igt@kms_async_flips@crc-atomic@pipe-d-hdmi-a-1.html * igt@kms_big_fb@4-tiled-addfb-size-overflow: - shard-adlp: NOTRUN -> [SKIP][22] ([Intel XE#610]) [22]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-2/igt@kms_big_fb@4-tiled-addfb-size-overflow.html * igt@kms_big_fb@4-tiled-max-hw-stride-32bpp-rotate-180-async-flip: - shard-adlp: NOTRUN -> [SKIP][23] ([Intel XE#1124]) +15 other tests skip [23]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-9/igt@kms_big_fb@4-tiled-max-hw-stride-32bpp-rotate-180-async-flip.html * igt@kms_big_fb@x-tiled-16bpp-rotate-270: - shard-dg2-set2: NOTRUN -> [SKIP][24] ([Intel XE#316]) +2 other tests skip [24]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-dg2-463/igt@kms_big_fb@x-tiled-16bpp-rotate-270.html - shard-lnl: NOTRUN -> [SKIP][25] ([Intel XE#1407]) +2 other tests skip [25]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-lnl-5/igt@kms_big_fb@x-tiled-16bpp-rotate-270.html * igt@kms_big_fb@x-tiled-32bpp-rotate-270: - shard-bmg: NOTRUN -> [SKIP][26] ([Intel XE#2327]) +2 other tests skip [26]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-bmg-1/igt@kms_big_fb@x-tiled-32bpp-rotate-270.html * igt@kms_big_fb@x-tiled-8bpp-rotate-90: - shard-adlp: NOTRUN -> [SKIP][27] ([Intel XE#316]) +5 other tests skip [27]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-8/igt@kms_big_fb@x-tiled-8bpp-rotate-90.html * igt@kms_big_fb@y-tiled-32bpp-rotate-0: - shard-lnl: NOTRUN -> [SKIP][28] ([Intel XE#1124]) +2 other tests skip [28]: 
https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-lnl-7/igt@kms_big_fb@y-tiled-32bpp-rotate-0.html * igt@kms_big_fb@yf-tiled-16bpp-rotate-180: - shard-bmg: NOTRUN -> [SKIP][29] ([Intel XE#1124]) +2 other tests skip [29]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-bmg-2/igt@kms_big_fb@yf-tiled-16bpp-rotate-180.html * igt@kms_big_fb@yf-tiled-addfb-size-offset-overflow: - shard-adlp: NOTRUN -> [SKIP][30] ([Intel XE#607]) [30]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-1/igt@kms_big_fb@yf-tiled-addfb-size-offset-overflow.html * igt@kms_big_fb@yf-tiled-addfb-size-overflow: - shard-dg2-set2: NOTRUN -> [SKIP][31] ([Intel XE#610]) [31]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-dg2-433/igt@kms_big_fb@yf-tiled-addfb-size-overflow.html * igt@kms_big_fb@yf-tiled-max-hw-stride-64bpp-rotate-180-hflip-async-flip: - shard-dg2-set2: NOTRUN -> [SKIP][32] ([Intel XE#1124]) +3 other tests skip [32]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-dg2-466/igt@kms_big_fb@yf-tiled-max-hw-stride-64bpp-rotate-180-hflip-async-flip.html * igt@kms_bw@connected-linear-tiling-4-displays-3840x2160p: - shard-adlp: NOTRUN -> [SKIP][33] ([Intel XE#2191]) +1 other test skip [33]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-2/igt@kms_bw@connected-linear-tiling-4-displays-3840x2160p.html * igt@kms_bw@linear-tiling-1-displays-2160x1440p: - shard-bmg: NOTRUN -> [SKIP][34] ([Intel XE#367]) [34]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-bmg-6/igt@kms_bw@linear-tiling-1-displays-2160x1440p.html * igt@kms_bw@linear-tiling-3-displays-2160x1440p: - shard-dg2-set2: NOTRUN -> [SKIP][35] ([Intel XE#367]) +1 other test skip [35]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-dg2-432/igt@kms_bw@linear-tiling-3-displays-2160x1440p.html * igt@kms_bw@linear-tiling-4-displays-2560x1440p: - shard-adlp: NOTRUN -> [SKIP][36] ([Intel XE#367]) +7 other tests skip 
[36]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-8/igt@kms_bw@linear-tiling-4-displays-2560x1440p.html * igt@kms_ccs@bad-rotation-90-4-tiled-lnl-ccs@pipe-c-dp-2: - shard-bmg: NOTRUN -> [SKIP][37] ([Intel XE#2652] / [Intel XE#787]) +11 other tests skip [37]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-bmg-4/igt@kms_ccs@bad-rotation-90-4-tiled-lnl-ccs@pipe-c-dp-2.html * igt@kms_ccs@crc-primary-basic-4-tiled-mtl-mc-ccs@pipe-c-hdmi-a-1: - shard-adlp: NOTRUN -> [SKIP][38] ([Intel XE#787]) +80 other tests skip [38]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-9/igt@kms_ccs@crc-primary-basic-4-tiled-mtl-mc-ccs@pipe-c-hdmi-a-1.html * igt@kms_ccs@crc-primary-rotation-180-4-tiled-dg2-rc-ccs-cc@pipe-d-hdmi-a-1: - shard-adlp: NOTRUN -> [SKIP][39] ([Intel XE#455] / [Intel XE#787]) +53 other tests skip [39]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-8/igt@kms_ccs@crc-primary-rotation-180-4-tiled-dg2-rc-ccs-cc@pipe-d-hdmi-a-1.html * igt@kms_ccs@crc-primary-rotation-180-4-tiled-mtl-rc-ccs-cc@pipe-d-dp-4: - shard-dg2-set2: NOTRUN -> [SKIP][40] ([Intel XE#455] / [Intel XE#787]) +9 other tests skip [40]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-dg2-435/igt@kms_ccs@crc-primary-rotation-180-4-tiled-mtl-rc-ccs-cc@pipe-d-dp-4.html * igt@kms_ccs@crc-primary-suspend-4-tiled-mtl-rc-ccs: - shard-lnl: NOTRUN -> [SKIP][41] ([Intel XE#3432]) [41]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-lnl-5/igt@kms_ccs@crc-primary-suspend-4-tiled-mtl-rc-ccs.html * igt@kms_ccs@crc-sprite-planes-basic-4-tiled-lnl-ccs: - shard-adlp: NOTRUN -> [SKIP][42] ([Intel XE#2907]) +3 other tests skip [42]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-6/igt@kms_ccs@crc-sprite-planes-basic-4-tiled-lnl-ccs.html * igt@kms_ccs@missing-ccs-buffer-4-tiled-mtl-rc-ccs-cc: - shard-bmg: NOTRUN -> [SKIP][43] ([Intel XE#2887]) +3 other tests skip [43]: 
https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-bmg-4/igt@kms_ccs@missing-ccs-buffer-4-tiled-mtl-rc-ccs-cc.html * igt@kms_ccs@missing-ccs-buffer-4-tiled-mtl-rc-ccs-cc@pipe-a-dp-4: - shard-dg2-set2: NOTRUN -> [SKIP][44] ([Intel XE#787]) +34 other tests skip [44]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-dg2-466/igt@kms_ccs@missing-ccs-buffer-4-tiled-mtl-rc-ccs-cc@pipe-a-dp-4.html * igt@kms_ccs@random-ccs-data-4-tiled-dg2-rc-ccs-cc: - shard-dg2-set2: [PASS][45] -> [INCOMPLETE][46] ([Intel XE#1727] / [Intel XE#3113] / [Intel XE#4345] / [Intel XE#6168]) [45]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3940-c809fdcf60b85e8a261eaa1b49f18b9c5731b18c/shard-dg2-435/igt@kms_ccs@random-ccs-data-4-tiled-dg2-rc-ccs-cc.html [46]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-dg2-463/igt@kms_ccs@random-ccs-data-4-tiled-dg2-rc-ccs-cc.html * igt@kms_ccs@random-ccs-data-4-tiled-dg2-rc-ccs-cc@pipe-c-dp-4: - shard-dg2-set2: [PASS][47] -> [INCOMPLETE][48] ([Intel XE#6168]) [47]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3940-c809fdcf60b85e8a261eaa1b49f18b9c5731b18c/shard-dg2-435/igt@kms_ccs@random-ccs-data-4-tiled-dg2-rc-ccs-cc@pipe-c-dp-4.html [48]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-dg2-463/igt@kms_ccs@random-ccs-data-4-tiled-dg2-rc-ccs-cc@pipe-c-dp-4.html * igt@kms_ccs@random-ccs-data-4-tiled-dg2-rc-ccs-cc@pipe-c-hdmi-a-6: - shard-dg2-set2: [PASS][49] -> [DMESG-WARN][50] ([Intel XE#1727] / [Intel XE#3113]) [49]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3940-c809fdcf60b85e8a261eaa1b49f18b9c5731b18c/shard-dg2-435/igt@kms_ccs@random-ccs-data-4-tiled-dg2-rc-ccs-cc@pipe-c-hdmi-a-6.html [50]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-dg2-463/igt@kms_ccs@random-ccs-data-4-tiled-dg2-rc-ccs-cc@pipe-c-hdmi-a-6.html * igt@kms_ccs@random-ccs-data-4-tiled-mtl-rc-ccs-cc: - shard-lnl: NOTRUN -> [SKIP][51] ([Intel XE#2887]) +3 other tests skip [51]: 
https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-lnl-4/igt@kms_ccs@random-ccs-data-4-tiled-mtl-rc-ccs-cc.html * igt@kms_cdclk@mode-transition@pipe-d-dp-4: - shard-dg2-set2: NOTRUN -> [SKIP][52] ([Intel XE#4417]) +3 other tests skip [52]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-dg2-435/igt@kms_cdclk@mode-transition@pipe-d-dp-4.html * igt@kms_chamelium_color@ctm-red-to-blue: - shard-adlp: NOTRUN -> [SKIP][53] ([Intel XE#306]) [53]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-2/igt@kms_chamelium_color@ctm-red-to-blue.html - shard-bmg: NOTRUN -> [SKIP][54] ([Intel XE#2325]) +1 other test skip [54]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-bmg-2/igt@kms_chamelium_color@ctm-red-to-blue.html - shard-dg2-set2: NOTRUN -> [SKIP][55] ([Intel XE#306]) [55]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-dg2-436/igt@kms_chamelium_color@ctm-red-to-blue.html - shard-lnl: NOTRUN -> [SKIP][56] ([Intel XE#306]) +1 other test skip [56]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-lnl-3/igt@kms_chamelium_color@ctm-red-to-blue.html * igt@kms_chamelium_edid@hdmi-mode-timings: - shard-dg2-set2: NOTRUN -> [SKIP][57] ([Intel XE#373]) +4 other tests skip [57]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-dg2-435/igt@kms_chamelium_edid@hdmi-mode-timings.html * igt@kms_chamelium_hpd@hdmi-hpd-storm-disable: - shard-adlp: NOTRUN -> [SKIP][58] ([Intel XE#373]) +13 other tests skip [58]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-9/igt@kms_chamelium_hpd@hdmi-hpd-storm-disable.html - shard-bmg: NOTRUN -> [SKIP][59] ([Intel XE#2252]) +5 other tests skip [59]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-bmg-7/igt@kms_chamelium_hpd@hdmi-hpd-storm-disable.html - shard-lnl: NOTRUN -> [SKIP][60] ([Intel XE#373]) +1 other test skip [60]: 
https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-lnl-5/igt@kms_chamelium_hpd@hdmi-hpd-storm-disable.html * igt@kms_content_protection@dp-mst-type-0: - shard-adlp: NOTRUN -> [SKIP][61] ([Intel XE#307]) +1 other test skip [61]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-6/igt@kms_content_protection@dp-mst-type-0.html * igt@kms_content_protection@lic-type-0: - shard-dg2-set2: NOTRUN -> [FAIL][62] ([Intel XE#1178]) +2 other tests fail [62]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-dg2-436/igt@kms_content_protection@lic-type-0.html * igt@kms_content_protection@lic-type-0@pipe-a-dp-4: - shard-dg2-set2: NOTRUN -> [FAIL][63] ([Intel XE#3304]) [63]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-dg2-436/igt@kms_content_protection@lic-type-0@pipe-a-dp-4.html * igt@kms_content_protection@mei-interface: - shard-bmg: NOTRUN -> [SKIP][64] ([Intel XE#2341]) +1 other test skip [64]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-bmg-8/igt@kms_content_protection@mei-interface.html * igt@kms_cursor_crc@cursor-offscreen-512x512: - shard-dg2-set2: NOTRUN -> [SKIP][65] ([Intel XE#308]) +1 other test skip [65]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-dg2-466/igt@kms_cursor_crc@cursor-offscreen-512x512.html - shard-lnl: NOTRUN -> [SKIP][66] ([Intel XE#2321]) +1 other test skip [66]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-lnl-7/igt@kms_cursor_crc@cursor-offscreen-512x512.html * igt@kms_cursor_crc@cursor-random-512x170: - shard-adlp: NOTRUN -> [SKIP][67] ([Intel XE#308]) +4 other tests skip [67]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-6/igt@kms_cursor_crc@cursor-random-512x170.html * igt@kms_cursor_crc@cursor-random-512x512: - shard-bmg: NOTRUN -> [SKIP][68] ([Intel XE#2321]) +1 other test skip [68]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-bmg-7/igt@kms_cursor_crc@cursor-random-512x512.html * 
igt@kms_cursor_crc@cursor-rapid-movement-128x42: - shard-lnl: NOTRUN -> [SKIP][69] ([Intel XE#1424]) +1 other test skip [69]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-lnl-7/igt@kms_cursor_crc@cursor-rapid-movement-128x42.html * igt@kms_cursor_crc@cursor-sliding-64x21: - shard-bmg: NOTRUN -> [SKIP][70] ([Intel XE#2320]) +1 other test skip [70]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-bmg-6/igt@kms_cursor_crc@cursor-sliding-64x21.html * igt@kms_cursor_crc@cursor-sliding-max-size: - shard-dg2-set2: NOTRUN -> [SKIP][71] ([Intel XE#455]) +3 other tests skip [71]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-dg2-436/igt@kms_cursor_crc@cursor-sliding-max-size.html * igt@kms_cursor_legacy@cursora-vs-flipb-legacy: - shard-bmg: [PASS][72] -> [SKIP][73] ([Intel XE#2291]) +4 other tests skip [72]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3940-c809fdcf60b85e8a261eaa1b49f18b9c5731b18c/shard-bmg-7/igt@kms_cursor_legacy@cursora-vs-flipb-legacy.html [73]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-bmg-6/igt@kms_cursor_legacy@cursora-vs-flipb-legacy.html * igt@kms_cursor_legacy@cursora-vs-flipb-varying-size: - shard-bmg: [PASS][74] -> [DMESG-WARN][75] ([Intel XE#5354]) [74]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3940-c809fdcf60b85e8a261eaa1b49f18b9c5731b18c/shard-bmg-7/igt@kms_cursor_legacy@cursora-vs-flipb-varying-size.html [75]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-bmg-8/igt@kms_cursor_legacy@cursora-vs-flipb-varying-size.html * igt@kms_cursor_legacy@cursorb-vs-flipa-atomic-transitions-varying-size: - shard-bmg: NOTRUN -> [SKIP][76] ([Intel XE#2291]) [76]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-bmg-6/igt@kms_cursor_legacy@cursorb-vs-flipa-atomic-transitions-varying-size.html * igt@kms_cursor_legacy@cursorb-vs-flipb-atomic-transitions-varying-size: - shard-adlp: NOTRUN -> [SKIP][77] ([Intel XE#309]) +3 other tests skip [77]: 
https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-1/igt@kms_cursor_legacy@cursorb-vs-flipb-atomic-transitions-varying-size.html * igt@kms_cursor_legacy@flip-vs-cursor-atomic: - shard-bmg: [PASS][78] -> [FAIL][79] ([Intel XE#1475]) [78]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3940-c809fdcf60b85e8a261eaa1b49f18b9c5731b18c/shard-bmg-4/igt@kms_cursor_legacy@flip-vs-cursor-atomic.html [79]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-bmg-2/igt@kms_cursor_legacy@flip-vs-cursor-atomic.html * igt@kms_cursor_legacy@flip-vs-cursor-legacy: - shard-bmg: [PASS][80] -> [FAIL][81] ([Intel XE#5299]) [80]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3940-c809fdcf60b85e8a261eaa1b49f18b9c5731b18c/shard-bmg-3/igt@kms_cursor_legacy@flip-vs-cursor-legacy.html [81]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-bmg-7/igt@kms_cursor_legacy@flip-vs-cursor-legacy.html * igt@kms_cursor_legacy@short-busy-flip-before-cursor-atomic-transitions: - shard-adlp: NOTRUN -> [SKIP][82] ([Intel XE#323]) [82]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-2/igt@kms_cursor_legacy@short-busy-flip-before-cursor-atomic-transitions.html * igt@kms_cursor_legacy@short-busy-flip-before-cursor-atomic-transitions-varying-size: - shard-lnl: NOTRUN -> [SKIP][83] ([Intel XE#323]) [83]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-lnl-8/igt@kms_cursor_legacy@short-busy-flip-before-cursor-atomic-transitions-varying-size.html * igt@kms_dither@fb-8bpc-vs-panel-6bpc@pipe-a-hdmi-a-3: - shard-bmg: NOTRUN -> [SKIP][84] ([Intel XE#1340]) [84]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-bmg-3/igt@kms_dither@fb-8bpc-vs-panel-6bpc@pipe-a-hdmi-a-3.html * igt@kms_dp_linktrain_fallback@dp-fallback: - shard-bmg: [PASS][85] -> [SKIP][86] ([Intel XE#4294]) [85]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3940-c809fdcf60b85e8a261eaa1b49f18b9c5731b18c/shard-bmg-7/igt@kms_dp_linktrain_fallback@dp-fallback.html [86]: 
https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-bmg-6/igt@kms_dp_linktrain_fallback@dp-fallback.html * igt@kms_fbc_dirty_rect@fbc-dirty-rectangle-different-formats: - shard-dg2-set2: NOTRUN -> [SKIP][87] ([Intel XE#4422]) [87]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-dg2-435/igt@kms_fbc_dirty_rect@fbc-dirty-rectangle-different-formats.html - shard-bmg: NOTRUN -> [SKIP][88] ([Intel XE#4422]) [88]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-bmg-7/igt@kms_fbc_dirty_rect@fbc-dirty-rectangle-different-formats.html * igt@kms_fbcon_fbt@fbc-suspend: - shard-adlp: [PASS][89] -> [DMESG-WARN][90] ([Intel XE#2953] / [Intel XE#4173]) +3 other tests dmesg-warn [89]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3940-c809fdcf60b85e8a261eaa1b49f18b9c5731b18c/shard-adlp-9/igt@kms_fbcon_fbt@fbc-suspend.html [90]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-1/igt@kms_fbcon_fbt@fbc-suspend.html * igt@kms_fbcon_fbt@psr-suspend: - shard-adlp: NOTRUN -> [SKIP][91] ([Intel XE#776]) [91]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-1/igt@kms_fbcon_fbt@psr-suspend.html * igt@kms_feature_discovery@display-3x: - shard-adlp: NOTRUN -> [SKIP][92] ([Intel XE#703]) [92]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-6/igt@kms_feature_discovery@display-3x.html * igt@kms_feature_discovery@display-4x: - shard-lnl: NOTRUN -> [SKIP][93] ([Intel XE#1138]) [93]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-lnl-3/igt@kms_feature_discovery@display-4x.html * igt@kms_feature_discovery@dp-mst: - shard-adlp: NOTRUN -> [SKIP][94] ([Intel XE#1137]) [94]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-8/igt@kms_feature_discovery@dp-mst.html * igt@kms_feature_discovery@psr2: - shard-adlp: NOTRUN -> [SKIP][95] ([Intel XE#1135]) [95]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-9/igt@kms_feature_discovery@psr2.html * 
igt@kms_flip@2x-flip-vs-wf_vblank-interruptible: - shard-lnl: NOTRUN -> [SKIP][96] ([Intel XE#1421]) [96]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-lnl-4/igt@kms_flip@2x-flip-vs-wf_vblank-interruptible.html * igt@kms_flip@2x-plain-flip: - shard-adlp: NOTRUN -> [SKIP][97] ([Intel XE#310]) +9 other tests skip [97]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-2/igt@kms_flip@2x-plain-flip.html * igt@kms_flip@2x-plain-flip-interruptible: - shard-bmg: [PASS][98] -> [SKIP][99] ([Intel XE#2316]) +3 other tests skip [98]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3940-c809fdcf60b85e8a261eaa1b49f18b9c5731b18c/shard-bmg-8/igt@kms_flip@2x-plain-flip-interruptible.html [99]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-bmg-6/igt@kms_flip@2x-plain-flip-interruptible.html * igt@kms_flip@blocking-wf_vblank: - shard-dg2-set2: [PASS][100] -> [INCOMPLETE][101] ([Intel XE#2049]) +3 other tests incomplete [100]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3940-c809fdcf60b85e8a261eaa1b49f18b9c5731b18c/shard-dg2-464/igt@kms_flip@blocking-wf_vblank.html [101]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-dg2-432/igt@kms_flip@blocking-wf_vblank.html * igt@kms_flip@flip-vs-expired-vblank-interruptible: - shard-adlp: NOTRUN -> [DMESG-WARN][102] ([Intel XE#4543]) +7 other tests dmesg-warn [102]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-6/igt@kms_flip@flip-vs-expired-vblank-interruptible.html * igt@kms_flip@flip-vs-rmfb-interruptible: - shard-adlp: [PASS][103] -> [DMESG-WARN][104] ([Intel XE#5208]) [103]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3940-c809fdcf60b85e8a261eaa1b49f18b9c5731b18c/shard-adlp-9/igt@kms_flip@flip-vs-rmfb-interruptible.html [104]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-9/igt@kms_flip@flip-vs-rmfb-interruptible.html * igt@kms_flip@flip-vs-suspend-interruptible: - shard-adlp: [PASS][105] -> [DMESG-WARN][106] ([Intel XE#4543]) +5 
other tests dmesg-warn [105]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3940-c809fdcf60b85e8a261eaa1b49f18b9c5731b18c/shard-adlp-1/igt@kms_flip@flip-vs-suspend-interruptible.html [106]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-8/igt@kms_flip@flip-vs-suspend-interruptible.html * igt@kms_flip_scaled_crc@flip-32bpp-4tile-to-64bpp-4tile-downscaling: - shard-lnl: NOTRUN -> [SKIP][107] ([Intel XE#1397] / [Intel XE#1745]) [107]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-lnl-3/igt@kms_flip_scaled_crc@flip-32bpp-4tile-to-64bpp-4tile-downscaling.html * igt@kms_flip_scaled_crc@flip-32bpp-4tile-to-64bpp-4tile-downscaling@pipe-a-default-mode: - shard-lnl: NOTRUN -> [SKIP][108] ([Intel XE#1397]) [108]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-lnl-3/igt@kms_flip_scaled_crc@flip-32bpp-4tile-to-64bpp-4tile-downscaling@pipe-a-default-mode.html * igt@kms_flip_scaled_crc@flip-32bpp-ytile-to-64bpp-ytile-downscaling: - shard-bmg: NOTRUN -> [SKIP][109] ([Intel XE#2293] / [Intel XE#2380]) +1 other test skip [109]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-bmg-7/igt@kms_flip_scaled_crc@flip-32bpp-ytile-to-64bpp-ytile-downscaling.html * igt@kms_flip_scaled_crc@flip-32bpp-ytile-to-64bpp-ytile-downscaling@pipe-a-valid-mode: - shard-bmg: NOTRUN -> [SKIP][110] ([Intel XE#2293]) +1 other test skip [110]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-bmg-7/igt@kms_flip_scaled_crc@flip-32bpp-ytile-to-64bpp-ytile-downscaling@pipe-a-valid-mode.html * igt@kms_flip_scaled_crc@flip-64bpp-ytile-to-32bpp-ytilegen12rcccs-upscaling@pipe-a-valid-mode: - shard-adlp: NOTRUN -> [SKIP][111] ([Intel XE#455]) +33 other tests skip [111]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-1/igt@kms_flip_scaled_crc@flip-64bpp-ytile-to-32bpp-ytilegen12rcccs-upscaling@pipe-a-valid-mode.html * igt@kms_frontbuffer_tracking@drrs-2p-primscrn-indfb-pgflip-blt: - shard-adlp: NOTRUN -> [SKIP][112] 
([Intel XE#656]) +67 other tests skip [112]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-1/igt@kms_frontbuffer_tracking@drrs-2p-primscrn-indfb-pgflip-blt.html * igt@kms_frontbuffer_tracking@drrs-rgb565-draw-render: - shard-bmg: NOTRUN -> [SKIP][113] ([Intel XE#2311]) +9 other tests skip [113]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-bmg-8/igt@kms_frontbuffer_tracking@drrs-rgb565-draw-render.html * igt@kms_frontbuffer_tracking@fbc-1p-primscrn-cur-indfb-draw-blt: - shard-bmg: NOTRUN -> [SKIP][114] ([Intel XE#5390]) +5 other tests skip [114]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-bmg-7/igt@kms_frontbuffer_tracking@fbc-1p-primscrn-cur-indfb-draw-blt.html * igt@kms_frontbuffer_tracking@fbc-1p-primscrn-pri-shrfb-draw-render: - shard-adlp: NOTRUN -> [DMESG-FAIL][115] ([Intel XE#4543]) +8 other tests dmesg-fail [115]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-8/igt@kms_frontbuffer_tracking@fbc-1p-primscrn-pri-shrfb-draw-render.html * igt@kms_frontbuffer_tracking@fbcdrrs-1p-primscrn-cur-indfb-onoff: - shard-adlp: NOTRUN -> [SKIP][116] ([Intel XE#651]) +17 other tests skip [116]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-8/igt@kms_frontbuffer_tracking@fbcdrrs-1p-primscrn-cur-indfb-onoff.html * igt@kms_frontbuffer_tracking@fbcdrrs-1p-primscrn-spr-indfb-move: - shard-lnl: NOTRUN -> [SKIP][117] ([Intel XE#651]) [117]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-lnl-2/igt@kms_frontbuffer_tracking@fbcdrrs-1p-primscrn-spr-indfb-move.html * igt@kms_frontbuffer_tracking@fbcdrrs-2p-primscrn-shrfb-plflip-blt: - shard-bmg: NOTRUN -> [SKIP][118] ([Intel XE#2312]) +2 other tests skip [118]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-bmg-6/igt@kms_frontbuffer_tracking@fbcdrrs-2p-primscrn-shrfb-plflip-blt.html * igt@kms_frontbuffer_tracking@fbcdrrs-2p-scndscrn-cur-indfb-draw-blt: - shard-dg2-set2: NOTRUN -> [SKIP][119] ([Intel 
XE#651]) +13 other tests skip [119]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-dg2-432/igt@kms_frontbuffer_tracking@fbcdrrs-2p-scndscrn-cur-indfb-draw-blt.html * igt@kms_frontbuffer_tracking@fbcpsr-1p-primscrn-cur-indfb-onoff: - shard-adlp: NOTRUN -> [SKIP][120] ([Intel XE#653]) +19 other tests skip [120]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-1/igt@kms_frontbuffer_tracking@fbcpsr-1p-primscrn-cur-indfb-onoff.html * igt@kms_frontbuffer_tracking@fbcpsr-2p-primscrn-spr-indfb-onoff: - shard-bmg: NOTRUN -> [SKIP][121] ([Intel XE#2313]) +8 other tests skip [121]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-bmg-4/igt@kms_frontbuffer_tracking@fbcpsr-2p-primscrn-spr-indfb-onoff.html * igt@kms_frontbuffer_tracking@fbcpsr-2p-scndscrn-cur-indfb-draw-render: - shard-lnl: NOTRUN -> [SKIP][122] ([Intel XE#656]) +16 other tests skip [122]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-lnl-2/igt@kms_frontbuffer_tracking@fbcpsr-2p-scndscrn-cur-indfb-draw-render.html * igt@kms_frontbuffer_tracking@fbcpsr-modesetfrombusy: - shard-dg2-set2: NOTRUN -> [SKIP][123] ([Intel XE#653]) +13 other tests skip [123]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-dg2-464/igt@kms_frontbuffer_tracking@fbcpsr-modesetfrombusy.html * igt@kms_frontbuffer_tracking@fbcpsr-tiling-4: - shard-adlp: NOTRUN -> [SKIP][124] ([Intel XE#1151]) [124]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-8/igt@kms_frontbuffer_tracking@fbcpsr-tiling-4.html * igt@kms_hdr@invalid-metadata-sizes: - shard-bmg: [PASS][125] -> [SKIP][126] ([Intel XE#1503]) [125]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3940-c809fdcf60b85e8a261eaa1b49f18b9c5731b18c/shard-bmg-5/igt@kms_hdr@invalid-metadata-sizes.html [126]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-bmg-6/igt@kms_hdr@invalid-metadata-sizes.html * igt@kms_joiner@basic-max-non-joiner: - shard-lnl: NOTRUN -> [SKIP][127] ([Intel 
XE#4298])
   [127]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-lnl-4/igt@kms_joiner@basic-max-non-joiner.html

  * igt@kms_joiner@invalid-modeset-force-ultra-joiner:
    - shard-adlp: NOTRUN -> [SKIP][128] ([Intel XE#2925])
   [128]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-1/igt@kms_joiner@invalid-modeset-force-ultra-joiner.html

  * igt@kms_plane@planar-pixel-format-settings:
    - shard-adlp: NOTRUN -> [DMESG-WARN][129] ([Intel XE#2953] / [Intel XE#4173])
   [129]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-8/igt@kms_plane@planar-pixel-format-settings.html

  * igt@kms_plane_lowres@tiling-4:
    - shard-lnl: NOTRUN -> [SKIP][130] ([Intel XE#599]) +3 other tests skip
   [130]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-lnl-2/igt@kms_plane_lowres@tiling-4.html

  * igt@kms_plane_multiple@2x-tiling-x:
    - shard-adlp: NOTRUN -> [SKIP][131] ([Intel XE#4596])
   [131]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-1/igt@kms_plane_multiple@2x-tiling-x.html

  * igt@kms_plane_multiple@2x-tiling-yf:
    - shard-bmg: NOTRUN -> [SKIP][132] ([Intel XE#5021])
   [132]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-bmg-8/igt@kms_plane_multiple@2x-tiling-yf.html
    - shard-dg2-set2: NOTRUN -> [SKIP][133] ([Intel XE#5021])
   [133]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-dg2-435/igt@kms_plane_multiple@2x-tiling-yf.html

  * igt@kms_plane_scaling@planes-downscale-factor-0-75@pipe-b:
    - shard-lnl: NOTRUN -> [SKIP][134] ([Intel XE#2763]) +7 other tests skip
   [134]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-lnl-3/igt@kms_plane_scaling@planes-downscale-factor-0-75@pipe-b.html

  * igt@kms_plane_scaling@planes-upscale-20x20-downscale-factor-0-75@pipe-a:
    - shard-bmg: NOTRUN -> [SKIP][135] ([Intel XE#2763]) +4 other tests skip
   [135]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-bmg-8/igt@kms_plane_scaling@planes-upscale-20x20-downscale-factor-0-75@pipe-a.html

  * igt@kms_pm_backlight@fade-with-suspend:
    - shard-adlp: NOTRUN -> [SKIP][136] ([Intel XE#870])
   [136]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-1/igt@kms_pm_backlight@fade-with-suspend.html

  * igt@kms_pm_dc@dc3co-vpb-simulation:
    - shard-adlp: NOTRUN -> [SKIP][137] ([Intel XE#1122])
   [137]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-2/igt@kms_pm_dc@dc3co-vpb-simulation.html

  * igt@kms_pm_dc@dc5-dpms:
    - shard-lnl: [PASS][138] -> [FAIL][139] ([Intel XE#718])
   [138]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3940-c809fdcf60b85e8a261eaa1b49f18b9c5731b18c/shard-lnl-8/igt@kms_pm_dc@dc5-dpms.html
   [139]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-lnl-5/igt@kms_pm_dc@dc5-dpms.html

  * igt@kms_pm_dc@dc5-psr:
    - shard-adlp: NOTRUN -> [SKIP][140] ([Intel XE#1129]) +1 other test skip
   [140]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-6/igt@kms_pm_dc@dc5-psr.html

  * igt@kms_pm_dc@dc5-retention-flops:
    - shard-adlp: NOTRUN -> [SKIP][141] ([Intel XE#3309])
   [141]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-9/igt@kms_pm_dc@dc5-retention-flops.html

  * igt@kms_pm_dc@dc9-dpms:
    - shard-adlp: NOTRUN -> [SKIP][142] ([Intel XE#734])
   [142]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-8/igt@kms_pm_dc@dc9-dpms.html

  * igt@kms_pm_rpm@modeset-non-lpsp-stress-no-wait:
    - shard-adlp: NOTRUN -> [SKIP][143] ([Intel XE#836]) +1 other test skip
   [143]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-1/igt@kms_pm_rpm@modeset-non-lpsp-stress-no-wait.html

  * igt@kms_psr2_sf@fbc-pr-cursor-plane-update-sf:
    - shard-lnl: NOTRUN -> [SKIP][144] ([Intel XE#1406] / [Intel XE#2893])
   [144]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-lnl-5/igt@kms_psr2_sf@fbc-pr-cursor-plane-update-sf.html

  * igt@kms_psr2_sf@fbc-pr-overlay-plane-move-continuous-sf:
    - shard-dg2-set2: NOTRUN -> [SKIP][145] ([Intel XE#1406] / [Intel XE#1489]) +3 other tests skip
   [145]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-dg2-434/igt@kms_psr2_sf@fbc-pr-overlay-plane-move-continuous-sf.html

  * igt@kms_psr2_sf@fbc-psr2-cursor-plane-move-continuous-exceed-fully-sf@pipe-b-edp-1:
    - shard-lnl: NOTRUN -> [SKIP][146] ([Intel XE#1406] / [Intel XE#4608])
   [146]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-lnl-5/igt@kms_psr2_sf@fbc-psr2-cursor-plane-move-continuous-exceed-fully-sf@pipe-b-edp-1.html

  * igt@kms_psr2_sf@psr2-cursor-plane-move-continuous-exceed-fully-sf:
    - shard-bmg: NOTRUN -> [SKIP][147] ([Intel XE#1406] / [Intel XE#1489]) +1 other test skip
   [147]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-bmg-8/igt@kms_psr2_sf@psr2-cursor-plane-move-continuous-exceed-fully-sf.html

  * igt@kms_psr2_sf@psr2-cursor-plane-move-continuous-sf:
    - shard-adlp: NOTRUN -> [SKIP][148] ([Intel XE#1406] / [Intel XE#1489]) +12 other tests skip
   [148]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-9/igt@kms_psr2_sf@psr2-cursor-plane-move-continuous-sf.html

  * igt@kms_psr2_su@page_flip-xrgb8888:
    - shard-adlp: NOTRUN -> [SKIP][149] ([Intel XE#1122] / [Intel XE#1406] / [Intel XE#5580]) +1 other test skip
   [149]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-6/igt@kms_psr2_su@page_flip-xrgb8888.html

  * igt@kms_psr@fbc-pr-sprite-plane-onoff:
    - shard-bmg: NOTRUN -> [SKIP][150] ([Intel XE#1406] / [Intel XE#2234] / [Intel XE#2850]) +4 other tests skip
   [150]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-bmg-6/igt@kms_psr@fbc-pr-sprite-plane-onoff.html

  * igt@kms_psr@fbc-psr2-sprite-plane-onoff:
    - shard-dg2-set2: NOTRUN -> [SKIP][151] ([Intel XE#1406] / [Intel XE#2850] / [Intel XE#929]) +2 other tests skip
   [151]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-dg2-463/igt@kms_psr@fbc-psr2-sprite-plane-onoff.html

  * igt@kms_psr@pr-primary-render:
    - shard-lnl: NOTRUN -> [SKIP][152] ([Intel XE#1406])
   [152]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-lnl-2/igt@kms_psr@pr-primary-render.html

  * igt@kms_psr@psr2-sprite-blt:
    - shard-adlp: NOTRUN -> [SKIP][153] ([Intel XE#1406] / [Intel XE#2850] / [Intel XE#929]) +17 other tests skip
   [153]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-9/igt@kms_psr@psr2-sprite-blt.html

  * igt@kms_psr_stress_test@flip-primary-invalidate-overlay:
    - shard-adlp: NOTRUN -> [SKIP][154] ([Intel XE#1406] / [Intel XE#2939] / [Intel XE#5585])
   [154]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-9/igt@kms_psr_stress_test@flip-primary-invalidate-overlay.html

  * igt@kms_rotation_crc@primary-rotation-270:
    - shard-lnl: NOTRUN -> [SKIP][155] ([Intel XE#3414] / [Intel XE#3904])
   [155]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-lnl-4/igt@kms_rotation_crc@primary-rotation-270.html

  * igt@kms_rotation_crc@primary-rotation-90:
    - shard-adlp: NOTRUN -> [SKIP][156] ([Intel XE#3414])
   [156]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-2/igt@kms_rotation_crc@primary-rotation-90.html

  * igt@kms_rotation_crc@primary-yf-tiled-reflect-x-180:
    - shard-adlp: NOTRUN -> [SKIP][157] ([Intel XE#1127]) +1 other test skip
   [157]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-1/igt@kms_rotation_crc@primary-yf-tiled-reflect-x-180.html

  * igt@kms_vrr@cmrr:
    - shard-adlp: NOTRUN -> [SKIP][158] ([Intel XE#2168])
   [158]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-8/igt@kms_vrr@cmrr.html

  * igt@kms_vrr@cmrr@pipe-a-edp-1:
    - shard-lnl: [PASS][159] -> [FAIL][160] ([Intel XE#4459]) +1 other test fail
   [159]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3940-c809fdcf60b85e8a261eaa1b49f18b9c5731b18c/shard-lnl-7/igt@kms_vrr@cmrr@pipe-a-edp-1.html
   [160]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-lnl-1/igt@kms_vrr@cmrr@pipe-a-edp-1.html

  * igt@kms_vrr@lobf:
    - shard-lnl: NOTRUN -> [SKIP][161] ([Intel XE#1499])
   [161]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-lnl-8/igt@kms_vrr@lobf.html

  * igt@xe_ccs@block-copy-compressed-inc-dimension:
    - shard-adlp: NOTRUN -> [SKIP][162] ([Intel XE#455] / [Intel XE#488] / [Intel XE#5607])
   [162]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-2/igt@xe_ccs@block-copy-compressed-inc-dimension.html

  * igt@xe_ccs@large-ctrl-surf-copy:
    - shard-adlp: NOTRUN -> [SKIP][163] ([Intel XE#3576] / [Intel XE#5610])
   [163]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-9/igt@xe_ccs@large-ctrl-surf-copy.html

  * igt@xe_compute@ccs-mode-basic:
    - shard-lnl: NOTRUN -> [SKIP][164] ([Intel XE#1447])
   [164]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-lnl-1/igt@xe_compute@ccs-mode-basic.html

  * igt@xe_compute@ccs-mode-compute-kernel:
    - shard-adlp: NOTRUN -> [SKIP][165] ([Intel XE#1447] / [Intel XE#5596])
   [165]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-2/igt@xe_compute@ccs-mode-compute-kernel.html

  * igt@xe_compute_preempt@compute-preempt-many:
    - shard-adlp: NOTRUN -> [SKIP][166] ([Intel XE#6360])
   [166]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-9/igt@xe_compute_preempt@compute-preempt-many.html
    - shard-dg2-set2: NOTRUN -> [SKIP][167] ([Intel XE#6360])
   [167]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-dg2-464/igt@xe_compute_preempt@compute-preempt-many.html

  * igt@xe_copy_basic@mem-set-linear-0x369:
    - shard-adlp: NOTRUN -> [SKIP][168] ([Intel XE#1126]) +1 other test skip
   [168]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-8/igt@xe_copy_basic@mem-set-linear-0x369.html

  * igt@xe_copy_basic@mem-set-linear-0xfd:
    - shard-dg2-set2: NOTRUN -> [SKIP][169] ([Intel XE#1126])
   [169]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-dg2-463/igt@xe_copy_basic@mem-set-linear-0xfd.html

  * igt@xe_eu_stall@blocking-read:
    - shard-adlp: NOTRUN -> [SKIP][170] ([Intel XE#5626]) +1 other test skip
   [170]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-1/igt@xe_eu_stall@blocking-read.html

  * igt@xe_eu_stall@invalid-event-report-count:
    - shard-dg2-set2: NOTRUN -> [SKIP][171] ([Intel XE#5626]) +1 other test skip
   [171]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-dg2-464/igt@xe_eu_stall@invalid-event-report-count.html

  * igt@xe_eudebug@basic-read-event:
    - shard-dg2-set2: NOTRUN -> [SKIP][172] ([Intel XE#4837]) +8 other tests skip
   [172]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-dg2-463/igt@xe_eudebug@basic-read-event.html

  * igt@xe_eudebug@basic-vm-access-faultable:
    - shard-lnl: NOTRUN -> [SKIP][173] ([Intel XE#4837]) +5 other tests skip
   [173]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-lnl-4/igt@xe_eudebug@basic-vm-access-faultable.html

  * igt@xe_eudebug@basic-vm-access-parameters-userptr:
    - shard-bmg: NOTRUN -> [SKIP][174] ([Intel XE#4837]) +5 other tests skip
   [174]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-bmg-2/igt@xe_eudebug@basic-vm-access-parameters-userptr.html

  * igt@xe_eudebug_online@set-breakpoint-sigint-debugger:
    - shard-adlp: NOTRUN -> [SKIP][175] ([Intel XE#4837] / [Intel XE#5565]) +21 other tests skip
   [175]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-1/igt@xe_eudebug_online@set-breakpoint-sigint-debugger.html

  * igt@xe_evict@evict-beng-mixed-threads-large:
    - shard-adlp: NOTRUN -> [SKIP][176] ([Intel XE#261]) +6 other tests skip
   [176]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-9/igt@xe_evict@evict-beng-mixed-threads-large.html

  * igt@xe_evict@evict-beng-mixed-threads-large-multi-vm:
    - shard-lnl: NOTRUN -> [SKIP][177] ([Intel XE#688]) +3 other tests skip
   [177]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-lnl-7/igt@xe_evict@evict-beng-mixed-threads-large-multi-vm.html

  * igt@xe_evict@evict-small-external-cm:
    - shard-adlp: NOTRUN -> [SKIP][178] ([Intel XE#261] / [Intel XE#5564] / [Intel XE#688]) +4 other tests skip
   [178]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-1/igt@xe_evict@evict-small-external-cm.html

  * igt@xe_evict@evict-threads-small:
    - shard-adlp: NOTRUN -> [SKIP][179] ([Intel XE#261] / [Intel XE#688])
   [179]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-9/igt@xe_evict@evict-threads-small.html

  * igt@xe_evict_ccs@evict-overcommit-parallel-instantfree-samefd:
    - shard-adlp: NOTRUN -> [SKIP][180] ([Intel XE#688]) +3 other tests skip
   [180]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-1/igt@xe_evict_ccs@evict-overcommit-parallel-instantfree-samefd.html

  * igt@xe_exec_basic@multigpu-many-execqueues-many-vm-bindexecqueue-userptr-rebind:
    - shard-lnl: NOTRUN -> [SKIP][181] ([Intel XE#1392]) +2 other tests skip
   [181]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-lnl-4/igt@xe_exec_basic@multigpu-many-execqueues-many-vm-bindexecqueue-userptr-rebind.html

  * igt@xe_exec_basic@multigpu-once-bindexecqueue-rebind:
    - shard-bmg: NOTRUN -> [SKIP][182] ([Intel XE#2322]) +3 other tests skip
   [182]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-bmg-5/igt@xe_exec_basic@multigpu-once-bindexecqueue-rebind.html
    - shard-dg2-set2: NOTRUN -> [SKIP][183] ([Intel XE#1392])
   [183]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-dg2-464/igt@xe_exec_basic@multigpu-once-bindexecqueue-rebind.html

  * igt@xe_exec_basic@multigpu-once-bindexecqueue-userptr-invalidate:
    - shard-adlp: NOTRUN -> [SKIP][184] ([Intel XE#1392] / [Intel XE#5575]) +14 other tests skip
   [184]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-9/igt@xe_exec_basic@multigpu-once-bindexecqueue-userptr-invalidate.html

  * igt@xe_exec_fault_mode@once-rebind-prefetch:
    - shard-dg2-set2: NOTRUN -> [SKIP][185] ([Intel XE#288]) +10 other tests skip
   [185]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-dg2-436/igt@xe_exec_fault_mode@once-rebind-prefetch.html

  * igt@xe_exec_fault_mode@once-userptr-invalidate:
    - shard-adlp: NOTRUN -> [SKIP][186] ([Intel XE#288] / [Intel XE#5561]) +32 other tests skip
   [186]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-1/igt@xe_exec_fault_mode@once-userptr-invalidate.html

  * igt@xe_exec_mix_modes@exec-spinner-interrupted-lr:
    - shard-adlp: NOTRUN -> [SKIP][187] ([Intel XE#2360])
   [187]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-9/igt@xe_exec_mix_modes@exec-spinner-interrupted-lr.html

  * igt@xe_exec_reset@cat-error:
    - shard-adlp: NOTRUN -> [DMESG-WARN][188] ([Intel XE#3868])
   [188]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-1/igt@xe_exec_reset@cat-error.html

  * igt@xe_exec_system_allocator@many-stride-mmap-huge-nomemset:
    - shard-lnl: NOTRUN -> [SKIP][189] ([Intel XE#4943]) +7 other tests skip
   [189]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-lnl-4/igt@xe_exec_system_allocator@many-stride-mmap-huge-nomemset.html

  * igt@xe_exec_system_allocator@once-mmap-huge-nomemset:
    - shard-adlp: NOTRUN -> [SKIP][190] ([Intel XE#4915]) +415 other tests skip
   [190]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-1/igt@xe_exec_system_allocator@once-mmap-huge-nomemset.html
    - shard-bmg: NOTRUN -> [SKIP][191] ([Intel XE#4943]) +10 other tests skip
   [191]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-bmg-3/igt@xe_exec_system_allocator@once-mmap-huge-nomemset.html

  * igt@xe_exec_system_allocator@threads-many-stride-malloc-bo-unmap:
    - shard-dg2-set2: NOTRUN -> [SKIP][192] ([Intel XE#4915]) +123 other tests skip
   [192]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-dg2-432/igt@xe_exec_system_allocator@threads-many-stride-malloc-bo-unmap.html

  * igt@xe_fault_injection@probe-fail-guc-xe_guc_mmio_send_recv:
    - shard-dg2-set2: [PASS][193] -> [DMESG-WARN][194] ([Intel XE#5893])
   [193]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3940-c809fdcf60b85e8a261eaa1b49f18b9c5731b18c/shard-dg2-434/igt@xe_fault_injection@probe-fail-guc-xe_guc_mmio_send_recv.html
   [194]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-dg2-464/igt@xe_fault_injection@probe-fail-guc-xe_guc_mmio_send_recv.html

  * igt@xe_live_ktest@xe_bo:
    - shard-dg2-set2: NOTRUN -> [FAIL][195] ([Intel XE#3099]) +2 other tests fail
   [195]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-dg2-435/igt@xe_live_ktest@xe_bo.html
    - shard-adlp: NOTRUN -> [SKIP][196] ([Intel XE#2229] / [Intel XE#455]) +1 other test skip
   [196]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-9/igt@xe_live_ktest@xe_bo.html

  * igt@xe_live_ktest@xe_bo@xe_ccs_migrate_kunit:
    - shard-adlp: NOTRUN -> [SKIP][197] ([Intel XE#2229])
   [197]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-9/igt@xe_live_ktest@xe_bo@xe_ccs_migrate_kunit.html

  * igt@xe_mmap@pci-membarrier-parallel:
    - shard-adlp: NOTRUN -> [SKIP][198] ([Intel XE#5100]) +1 other test skip
   [198]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-1/igt@xe_mmap@pci-membarrier-parallel.html

  * igt@xe_mmap@vram:
    - shard-lnl: NOTRUN -> [SKIP][199] ([Intel XE#1416])
   [199]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-lnl-2/igt@xe_mmap@vram.html

  * igt@xe_module_load@load:
    - shard-adlp: ([PASS][200], [PASS][201], [PASS][202], [PASS][203], [PASS][204], [PASS][205], [PASS][206], [PASS][207], [PASS][208], [PASS][209], [PASS][210], [PASS][211], [PASS][212], [PASS][213], [PASS][214]) -> ([PASS][215], [PASS][216], [PASS][217], [PASS][218], [PASS][219], [PASS][220], [PASS][221], [PASS][222], [PASS][223], [PASS][224], [PASS][225], [PASS][226], [PASS][227], [SKIP][228], [PASS][229], [PASS][230]) ([Intel XE#378] / [Intel XE#5612])
   [200]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3940-c809fdcf60b85e8a261eaa1b49f18b9c5731b18c/shard-adlp-8/igt@xe_module_load@load.html
   [201]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3940-c809fdcf60b85e8a261eaa1b49f18b9c5731b18c/shard-adlp-8/igt@xe_module_load@load.html
   [202]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3940-c809fdcf60b85e8a261eaa1b49f18b9c5731b18c/shard-adlp-1/igt@xe_module_load@load.html
   [203]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3940-c809fdcf60b85e8a261eaa1b49f18b9c5731b18c/shard-adlp-8/igt@xe_module_load@load.html
   [204]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3940-c809fdcf60b85e8a261eaa1b49f18b9c5731b18c/shard-adlp-9/igt@xe_module_load@load.html
   [205]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3940-c809fdcf60b85e8a261eaa1b49f18b9c5731b18c/shard-adlp-9/igt@xe_module_load@load.html
   [206]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3940-c809fdcf60b85e8a261eaa1b49f18b9c5731b18c/shard-adlp-9/igt@xe_module_load@load.html
   [207]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3940-c809fdcf60b85e8a261eaa1b49f18b9c5731b18c/shard-adlp-1/igt@xe_module_load@load.html
   [208]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3940-c809fdcf60b85e8a261eaa1b49f18b9c5731b18c/shard-adlp-2/igt@xe_module_load@load.html
   [209]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3940-c809fdcf60b85e8a261eaa1b49f18b9c5731b18c/shard-adlp-6/igt@xe_module_load@load.html
   [210]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3940-c809fdcf60b85e8a261eaa1b49f18b9c5731b18c/shard-adlp-6/igt@xe_module_load@load.html
   [211]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3940-c809fdcf60b85e8a261eaa1b49f18b9c5731b18c/shard-adlp-6/igt@xe_module_load@load.html
   [212]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3940-c809fdcf60b85e8a261eaa1b49f18b9c5731b18c/shard-adlp-2/igt@xe_module_load@load.html
   [213]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3940-c809fdcf60b85e8a261eaa1b49f18b9c5731b18c/shard-adlp-2/igt@xe_module_load@load.html
   [214]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3940-c809fdcf60b85e8a261eaa1b49f18b9c5731b18c/shard-adlp-1/igt@xe_module_load@load.html
   [215]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-8/igt@xe_module_load@load.html
   [216]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-8/igt@xe_module_load@load.html
   [217]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-9/igt@xe_module_load@load.html
   [218]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-9/igt@xe_module_load@load.html
   [219]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-9/igt@xe_module_load@load.html
   [220]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-1/igt@xe_module_load@load.html
   [221]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-6/igt@xe_module_load@load.html
   [222]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-6/igt@xe_module_load@load.html
   [223]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-6/igt@xe_module_load@load.html
   [224]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-2/igt@xe_module_load@load.html
   [225]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-8/igt@xe_module_load@load.html
   [226]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-1/igt@xe_module_load@load.html
   [227]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-1/igt@xe_module_load@load.html
   [228]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-8/igt@xe_module_load@load.html
   [229]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-2/igt@xe_module_load@load.html
   [230]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-2/igt@xe_module_load@load.html

  * igt@xe_oa@create-destroy-userspace-config:
    - shard-adlp: NOTRUN -> [SKIP][231] ([Intel XE#3573]) +14 other tests skip
   [231]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-6/igt@xe_oa@create-destroy-userspace-config.html

  * igt@xe_oa@oa-unit-exclusive-stream-sample-oa:
    - shard-dg2-set2: NOTRUN -> [SKIP][232] ([Intel XE#3573]) +2 other tests skip
   [232]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-dg2-433/igt@xe_oa@oa-unit-exclusive-stream-sample-oa.html

  * igt@xe_pat@pat-index-xehpc:
    - shard-adlp: NOTRUN -> [SKIP][233] ([Intel XE#2838] / [Intel XE#979])
   [233]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-9/igt@xe_pat@pat-index-xehpc.html

  * igt@xe_peer2peer@write:
    - shard-adlp: NOTRUN -> [SKIP][234] ([Intel XE#1061] / [Intel XE#5568])
   [234]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-9/igt@xe_peer2peer@write.html

  * igt@xe_pm@d3cold-basic:
    - shard-lnl: NOTRUN -> [SKIP][235] ([Intel XE#2284] / [Intel XE#366])
   [235]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-lnl-4/igt@xe_pm@d3cold-basic.html
    - shard-dg2-set2: NOTRUN -> [SKIP][236] ([Intel XE#2284] / [Intel XE#366])
   [236]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-dg2-435/igt@xe_pm@d3cold-basic.html

  * igt@xe_pm@d3cold-i2c:
    - shard-adlp: NOTRUN -> [SKIP][237] ([Intel XE#5694])
   [237]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-2/igt@xe_pm@d3cold-i2c.html

  * igt@xe_pm@d3hot-mmap-vram:
    - shard-adlp: NOTRUN -> [SKIP][238] ([Intel XE#1948])
   [238]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-2/igt@xe_pm@d3hot-mmap-vram.html

  * igt@xe_pm@s2idle-d3cold-basic-exec:
    - shard-adlp: NOTRUN -> [SKIP][239] ([Intel XE#2284] / [Intel XE#366]) +2 other tests skip
   [239]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-9/igt@xe_pm@s2idle-d3cold-basic-exec.html

  * igt@xe_pm@s2idle-multiple-execs:
    - shard-dg2-set2: NOTRUN -> [INCOMPLETE][240] ([Intel XE#4504])
   [240]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-dg2-464/igt@xe_pm@s2idle-multiple-execs.html

  * igt@xe_pm@s3-d3cold-basic-exec:
    - shard-bmg: NOTRUN -> [SKIP][241] ([Intel XE#2284]) +1 other test skip
   [241]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-bmg-1/igt@xe_pm@s3-d3cold-basic-exec.html

  * igt@xe_pmu@all-fn-engine-activity-load:
    - shard-dg2-set2: NOTRUN -> [SKIP][242] ([Intel XE#4650])
   [242]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-dg2-436/igt@xe_pmu@all-fn-engine-activity-load.html
    - shard-lnl: NOTRUN -> [SKIP][243] ([Intel XE#4650])
   [243]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-lnl-3/igt@xe_pmu@all-fn-engine-activity-load.html

  * igt@xe_pmu@all-fn-engine-activity-load@engine-drm_xe_engine_class_render0:
    - shard-adlp: NOTRUN -> [TIMEOUT][244] ([Intel XE#5213]) +1 other test timeout
   [244]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-2/igt@xe_pmu@all-fn-engine-activity-load@engine-drm_xe_engine_class_render0.html

  * igt@xe_pxp@pxp-stale-bo-exec-post-termination-irq:
    - shard-adlp: NOTRUN -> [SKIP][245] ([Intel XE#4733] / [Intel XE#5594]) +3 other tests skip
   [245]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-8/igt@xe_pxp@pxp-stale-bo-exec-post-termination-irq.html

  * igt@xe_pxp@regular-src-to-pxp-dest-rendercopy:
    - shard-bmg: NOTRUN -> [SKIP][246] ([Intel XE#4733])
   [246]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-bmg-2/igt@xe_pxp@regular-src-to-pxp-dest-rendercopy.html
    - shard-dg2-set2: NOTRUN -> [SKIP][247] ([Intel XE#4733]) +1 other test skip
   [247]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-dg2-464/igt@xe_pxp@regular-src-to-pxp-dest-rendercopy.html

  * igt@xe_query@multigpu-query-config:
    - shard-bmg: NOTRUN -> [SKIP][248] ([Intel XE#944])
   [248]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-bmg-7/igt@xe_query@multigpu-query-config.html
    - shard-dg2-set2: NOTRUN -> [SKIP][249] ([Intel XE#944]) +1 other test skip
   [249]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-dg2-466/igt@xe_query@multigpu-query-config.html

  * igt@xe_query@multigpu-query-cs-cycles:
    - shard-adlp: NOTRUN -> [SKIP][250] ([Intel XE#944]) +2 other tests skip
   [250]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-8/igt@xe_query@multigpu-query-cs-cycles.html

  * igt@xe_query@multigpu-query-topology:
    - shard-lnl: NOTRUN -> [SKIP][251] ([Intel XE#944]) +1 other test skip
   [251]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-lnl-2/igt@xe_query@multigpu-query-topology.html

  * igt@xe_spin_batch@spin-mem-copy:
    - shard-adlp: NOTRUN -> [SKIP][252] ([Intel XE#4821])
   [252]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-1/igt@xe_spin_batch@spin-mem-copy.html

  * igt@xe_sriov_auto_provisioning@selfconfig-basic:
    - shard-lnl: NOTRUN -> [SKIP][253] ([Intel XE#4130])
   [253]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-lnl-3/igt@xe_sriov_auto_provisioning@selfconfig-basic.html

#### Possible fixes ####

  * igt@kms_async_flips@async-flip-with-page-flip-events-linear@pipe-c-edp-1:
    - shard-lnl: [FAIL][254] ([Intel XE#5993]) -> [PASS][255] +3 other tests pass
   [254]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3940-c809fdcf60b85e8a261eaa1b49f18b9c5731b18c/shard-lnl-3/igt@kms_async_flips@async-flip-with-page-flip-events-linear@pipe-c-edp-1.html
   [255]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-lnl-7/igt@kms_async_flips@async-flip-with-page-flip-events-linear@pipe-c-edp-1.html

  * igt@kms_bw@connected-linear-tiling-2-displays-1920x1080p:
    - shard-bmg: [SKIP][256] ([Intel XE#2314] / [Intel XE#2894]) -> [PASS][257]
   [256]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3940-c809fdcf60b85e8a261eaa1b49f18b9c5731b18c/shard-bmg-6/igt@kms_bw@connected-linear-tiling-2-displays-1920x1080p.html
   [257]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-bmg-1/igt@kms_bw@connected-linear-tiling-2-displays-1920x1080p.html

  * igt@kms_cursor_legacy@cursorb-vs-flipa-varying-size:
    - shard-bmg: [DMESG-WARN][258] ([Intel XE#5354]) -> [PASS][259]
   [258]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3940-c809fdcf60b85e8a261eaa1b49f18b9c5731b18c/shard-bmg-5/igt@kms_cursor_legacy@cursorb-vs-flipa-varying-size.html
   [259]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-bmg-1/igt@kms_cursor_legacy@cursorb-vs-flipa-varying-size.html

  * igt@kms_cursor_legacy@cursorb-vs-flipb-atomic-transitions-varying-size:
    - shard-bmg: [SKIP][260] ([Intel XE#2291]) -> [PASS][261] +1 other test pass
   [260]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3940-c809fdcf60b85e8a261eaa1b49f18b9c5731b18c/shard-bmg-6/igt@kms_cursor_legacy@cursorb-vs-flipb-atomic-transitions-varying-size.html
   [261]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-bmg-4/igt@kms_cursor_legacy@cursorb-vs-flipb-atomic-transitions-varying-size.html

  * igt@kms_flip@2x-flip-vs-expired-vblank-interruptible@cd-hdmi-a6-dp4:
    - shard-dg2-set2: [FAIL][262] ([Intel XE#301] / [Intel XE#3149]) -> [PASS][263] +1 other test pass
   [262]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3940-c809fdcf60b85e8a261eaa1b49f18b9c5731b18c/shard-dg2-434/igt@kms_flip@2x-flip-vs-expired-vblank-interruptible@cd-hdmi-a6-dp4.html
   [263]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-dg2-466/igt@kms_flip@2x-flip-vs-expired-vblank-interruptible@cd-hdmi-a6-dp4.html

  * igt@kms_flip@2x-flip-vs-expired-vblank@ac-hdmi-a6-dp4:
    - shard-dg2-set2: [FAIL][264] ([Intel XE#301]) -> [PASS][265] +1 other test pass
   [264]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3940-c809fdcf60b85e8a261eaa1b49f18b9c5731b18c/shard-dg2-435/igt@kms_flip@2x-flip-vs-expired-vblank@ac-hdmi-a6-dp4.html
   [265]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-dg2-464/igt@kms_flip@2x-flip-vs-expired-vblank@ac-hdmi-a6-dp4.html

  * igt@kms_flip@2x-modeset-vs-vblank-race:
    - shard-bmg: [SKIP][266] ([Intel XE#2316]) -> [PASS][267] +7 other tests pass
   [266]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3940-c809fdcf60b85e8a261eaa1b49f18b9c5731b18c/shard-bmg-6/igt@kms_flip@2x-modeset-vs-vblank-race.html
   [267]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-bmg-5/igt@kms_flip@2x-modeset-vs-vblank-race.html

  * igt@kms_flip@2x-plain-flip-fb-recreate-interruptible:
    - shard-bmg: [FAIL][268] ([Intel XE#5416] / [Intel XE#6266]) -> [PASS][269]
   [268]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3940-c809fdcf60b85e8a261eaa1b49f18b9c5731b18c/shard-bmg-8/igt@kms_flip@2x-plain-flip-fb-recreate-interruptible.html
   [269]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-bmg-2/igt@kms_flip@2x-plain-flip-fb-recreate-interruptible.html

  * igt@kms_flip@2x-plain-flip-fb-recreate-interruptible@ab-dp2-hdmi-a3:
    - shard-bmg: [FAIL][270] ([Intel XE#5416]) -> [PASS][271]
   [270]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3940-c809fdcf60b85e8a261eaa1b49f18b9c5731b18c/shard-bmg-8/igt@kms_flip@2x-plain-flip-fb-recreate-interruptible@ab-dp2-hdmi-a3.html
   [271]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-bmg-2/igt@kms_flip@2x-plain-flip-fb-recreate-interruptible@ab-dp2-hdmi-a3.html

  * igt@kms_flip@2x-plain-flip-ts-check-interruptible:
    - shard-bmg: [FAIL][272] ([Intel XE#5408] / [Intel XE#5416]) -> [PASS][273] +1 other test pass
   [272]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3940-c809fdcf60b85e8a261eaa1b49f18b9c5731b18c/shard-bmg-1/igt@kms_flip@2x-plain-flip-ts-check-interruptible.html
   [273]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-bmg-8/igt@kms_flip@2x-plain-flip-ts-check-interruptible.html

  * igt@kms_flip@basic-plain-flip@b-hdmi-a1:
    - shard-adlp: [DMESG-WARN][274] ([Intel XE#4543]) -> [PASS][275] +3 other tests pass
   [274]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3940-c809fdcf60b85e8a261eaa1b49f18b9c5731b18c/shard-adlp-2/igt@kms_flip@basic-plain-flip@b-hdmi-a1.html
   [275]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-1/igt@kms_flip@basic-plain-flip@b-hdmi-a1.html

  * igt@kms_flip@flip-vs-suspend-interruptible:
    - shard-dg2-set2: [INCOMPLETE][276] ([Intel XE#2049] / [Intel XE#2597]) -> [PASS][277] +1 other test pass
   [276]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3940-c809fdcf60b85e8a261eaa1b49f18b9c5731b18c/shard-dg2-433/igt@kms_flip@flip-vs-suspend-interruptible.html
   [277]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-dg2-432/igt@kms_flip@flip-vs-suspend-interruptible.html

  * igt@kms_frontbuffer_tracking@fbc-modesetfrombusy:
    - shard-lnl: [DMESG-WARN][278] -> [PASS][279]
   [278]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3940-c809fdcf60b85e8a261eaa1b49f18b9c5731b18c/shard-lnl-3/igt@kms_frontbuffer_tracking@fbc-modesetfrombusy.html
   [279]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-lnl-5/igt@kms_frontbuffer_tracking@fbc-modesetfrombusy.html

  * igt@kms_hdr@static-swap:
    - shard-bmg: [SKIP][280] ([Intel XE#1503]) -> [PASS][281] +1 other test pass
   [280]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3940-c809fdcf60b85e8a261eaa1b49f18b9c5731b18c/shard-bmg-6/igt@kms_hdr@static-swap.html
   [281]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-bmg-5/igt@kms_hdr@static-swap.html

  * igt@kms_joiner@invalid-modeset-force-big-joiner:
    - shard-bmg: [SKIP][282] ([Intel XE#3012]) -> [PASS][283]
   [282]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3940-c809fdcf60b85e8a261eaa1b49f18b9c5731b18c/shard-bmg-6/igt@kms_joiner@invalid-modeset-force-big-joiner.html
   [283]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-bmg-2/igt@kms_joiner@invalid-modeset-force-big-joiner.html

  * igt@kms_plane_multiple@2x-tiling-none:
    - shard-bmg: [SKIP][284] ([Intel XE#4596]) -> [PASS][285]
   [284]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3940-c809fdcf60b85e8a261eaa1b49f18b9c5731b18c/shard-bmg-6/igt@kms_plane_multiple@2x-tiling-none.html
   [285]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-bmg-7/igt@kms_plane_multiple@2x-tiling-none.html

  * igt@kms_pm_dc@dc6-psr:
    - shard-lnl: [FAIL][286] ([Intel XE#718]) -> [PASS][287]
   [286]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3940-c809fdcf60b85e8a261eaa1b49f18b9c5731b18c/shard-lnl-1/igt@kms_pm_dc@dc6-psr.html
   [287]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-lnl-5/igt@kms_pm_dc@dc6-psr.html

  * igt@kms_setmode@basic:
    - shard-bmg: [FAIL][288] ([Intel XE#6361]) -> [PASS][289] +1 other test pass
   [288]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3940-c809fdcf60b85e8a261eaa1b49f18b9c5731b18c/shard-bmg-5/igt@kms_setmode@basic.html
   [289]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-bmg-6/igt@kms_setmode@basic.html

  * igt@kms_setmode@basic@pipe-b-edp-1:
    - shard-lnl: [FAIL][290] ([i915#15106]) -> [PASS][291] +2 other tests pass
   [290]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3940-c809fdcf60b85e8a261eaa1b49f18b9c5731b18c/shard-lnl-2/igt@kms_setmode@basic@pipe-b-edp-1.html
   [291]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-lnl-3/igt@kms_setmode@basic@pipe-b-edp-1.html

  * igt@kms_setmode@invalid-clone-single-crtc:
    - shard-bmg: [SKIP][292] ([Intel XE#1435]) -> [PASS][293]
   [292]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3940-c809fdcf60b85e8a261eaa1b49f18b9c5731b18c/shard-bmg-6/igt@kms_setmode@invalid-clone-single-crtc.html
   [293]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-bmg-3/igt@kms_setmode@invalid-clone-single-crtc.html

  * igt@xe_fault_injection@vm-create-fail-xe_vm_create_scratch:
    - shard-bmg: [ABORT][294] -> [PASS][295]
   [294]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3940-c809fdcf60b85e8a261eaa1b49f18b9c5731b18c/shard-bmg-1/igt@xe_fault_injection@vm-create-fail-xe_vm_create_scratch.html
   [295]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-bmg-8/igt@xe_fault_injection@vm-create-fail-xe_vm_create_scratch.html

  * igt@xe_pm@s4-mocs:
    - shard-adlp: [FAIL][296] -> [PASS][297]
   [296]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3940-c809fdcf60b85e8a261eaa1b49f18b9c5731b18c/shard-adlp-1/igt@xe_pm@s4-mocs.html
   [297]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-adlp-8/igt@xe_pm@s4-mocs.html
    - shard-lnl: [FAIL][298] -> [PASS][299]
   [298]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3940-c809fdcf60b85e8a261eaa1b49f18b9c5731b18c/shard-lnl-8/igt@xe_pm@s4-mocs.html
   [299]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-lnl-8/igt@xe_pm@s4-mocs.html

  * igt@xe_pm@s4-vm-bind-unbind-all:
    - shard-bmg: [FAIL][300] -> [PASS][301]
   [300]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3940-c809fdcf60b85e8a261eaa1b49f18b9c5731b18c/shard-bmg-8/igt@xe_pm@s4-vm-bind-unbind-all.html
   [301]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-bmg-7/igt@xe_pm@s4-vm-bind-unbind-all.html
    - shard-dg2-set2: [FAIL][302] -> [PASS][303]
   [302]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3940-c809fdcf60b85e8a261eaa1b49f18b9c5731b18c/shard-dg2-464/igt@xe_pm@s4-vm-bind-unbind-all.html
   [303]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-dg2-433/igt@xe_pm@s4-vm-bind-unbind-all.html

#### Warnings ####

  * igt@kms_frontbuffer_tracking@drrs-2p-primscrn-spr-indfb-draw-render:
    - shard-bmg: [SKIP][304] ([Intel XE#2311]) -> [SKIP][305] ([Intel XE#2312]) +11 other tests skip
   [304]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3940-c809fdcf60b85e8a261eaa1b49f18b9c5731b18c/shard-bmg-3/igt@kms_frontbuffer_tracking@drrs-2p-primscrn-spr-indfb-draw-render.html
   [305]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-bmg-6/igt@kms_frontbuffer_tracking@drrs-2p-primscrn-spr-indfb-draw-render.html

  * igt@kms_frontbuffer_tracking@fbc-2p-primscrn-cur-indfb-draw-mmap-wc:
    - shard-bmg: [SKIP][306] ([Intel XE#2312]) -> [SKIP][307] ([Intel XE#5390]) +6 other tests skip
   [306]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3940-c809fdcf60b85e8a261eaa1b49f18b9c5731b18c/shard-bmg-6/igt@kms_frontbuffer_tracking@fbc-2p-primscrn-cur-indfb-draw-mmap-wc.html
   [307]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-bmg-5/igt@kms_frontbuffer_tracking@fbc-2p-primscrn-cur-indfb-draw-mmap-wc.html

  * igt@kms_frontbuffer_tracking@fbc-2p-primscrn-pri-indfb-draw-mmap-wc:
    - shard-bmg: [SKIP][308] ([Intel XE#5390]) -> [SKIP][309] ([Intel XE#2312]) +3 other tests skip
   [308]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3940-c809fdcf60b85e8a261eaa1b49f18b9c5731b18c/shard-bmg-8/igt@kms_frontbuffer_tracking@fbc-2p-primscrn-pri-indfb-draw-mmap-wc.html
   [309]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-bmg-6/igt@kms_frontbuffer_tracking@fbc-2p-primscrn-pri-indfb-draw-mmap-wc.html

  * igt@kms_frontbuffer_tracking@fbcdrrs-2p-primscrn-spr-indfb-draw-render:
    - shard-bmg: [SKIP][310] ([Intel XE#2312]) -> [SKIP][311] ([Intel XE#2311]) +11 other tests skip
   [310]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3940-c809fdcf60b85e8a261eaa1b49f18b9c5731b18c/shard-bmg-6/igt@kms_frontbuffer_tracking@fbcdrrs-2p-primscrn-spr-indfb-draw-render.html
   [311]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-bmg-1/igt@kms_frontbuffer_tracking@fbcdrrs-2p-primscrn-spr-indfb-draw-render.html

  * igt@kms_frontbuffer_tracking@psr-2p-primscrn-indfb-plflip-blt:
    - shard-bmg: [SKIP][312] ([Intel XE#2312]) -> [SKIP][313] ([Intel XE#2313]) +13 other tests skip
   [312]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3940-c809fdcf60b85e8a261eaa1b49f18b9c5731b18c/shard-bmg-6/igt@kms_frontbuffer_tracking@psr-2p-primscrn-indfb-plflip-blt.html
   [313]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-bmg-8/igt@kms_frontbuffer_tracking@psr-2p-primscrn-indfb-plflip-blt.html

  * igt@kms_frontbuffer_tracking@psr-2p-primscrn-spr-indfb-fullscreen:
    - shard-bmg: [SKIP][314] ([Intel XE#2313]) -> [SKIP][315] ([Intel XE#2312]) +12 other tests skip
   [314]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3940-c809fdcf60b85e8a261eaa1b49f18b9c5731b18c/shard-bmg-3/igt@kms_frontbuffer_tracking@psr-2p-primscrn-spr-indfb-fullscreen.html
   [315]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-bmg-6/igt@kms_frontbuffer_tracking@psr-2p-primscrn-spr-indfb-fullscreen.html

  * igt@kms_tiled_display@basic-test-pattern:
    - shard-dg2-set2: [SKIP][316] ([Intel XE#362]) -> [FAIL][317] ([Intel XE#1729])
   [316]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3940-c809fdcf60b85e8a261eaa1b49f18b9c5731b18c/shard-dg2-463/igt@kms_tiled_display@basic-test-pattern.html
   [317]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-dg2-436/igt@kms_tiled_display@basic-test-pattern.html

  * igt@kms_tiled_display@basic-test-pattern-with-chamelium:
    - shard-bmg: [SKIP][318] ([Intel XE#2426]) -> [SKIP][319] ([Intel XE#2509])
   [318]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3940-c809fdcf60b85e8a261eaa1b49f18b9c5731b18c/shard-bmg-2/igt@kms_tiled_display@basic-test-pattern-with-chamelium.html
   [319]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/shard-bmg-1/igt@kms_tiled_display@basic-test-pattern-with-chamelium.html

  {name}: This element is suppressed. This means it is ignored when computing the status of the difference (SUCCESS, WARNING, or FAILURE).
  [Intel XE#1061]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1061
  [Intel XE#1122]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1122
  [Intel XE#1124]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1124
  [Intel XE#1125]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1125
  [Intel XE#1126]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1126
  [Intel XE#1127]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1127
  [Intel XE#1129]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1129
  [Intel XE#1135]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1135
  [Intel XE#1137]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1137
  [Intel XE#1138]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1138
  [Intel XE#1151]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1151
  [Intel XE#1178]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1178
  [Intel XE#1340]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1340
  [Intel XE#1392]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1392
  [Intel XE#1397]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1397
  [Intel XE#1406]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1406
  [Intel XE#1407]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1407
  [Intel XE#1416]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1416
  [Intel XE#1421]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1421
  [Intel XE#1424]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1424
  [Intel XE#1435]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1435
  [Intel XE#1447]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1447
  [Intel XE#1466]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1466
  [Intel XE#1475]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1475
  [Intel XE#1489]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1489
  [Intel XE#1499]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1499
  [Intel XE#1503]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1503
  [Intel XE#1727]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1727
  [Intel XE#1729]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1729
  [Intel XE#1745]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1745
  [Intel XE#1948]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1948
  [Intel XE#2049]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2049
  [Intel XE#2168]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2168
  [Intel XE#2191]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2191
  [Intel XE#2229]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2229
  [Intel XE#2234]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2234
  [Intel XE#2252]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2252
  [Intel XE#2284]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2284
  [Intel XE#2291]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2291
  [Intel XE#2293]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2293
  [Intel XE#2311]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2311
  [Intel XE#2312]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2312
  [Intel XE#2313]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2313
  [Intel XE#2314]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2314
  [Intel XE#2316]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2316
  [Intel XE#2320]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2320
  [Intel XE#2321]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2321
  [Intel XE#2322]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2322
  [Intel XE#2325]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2325
  [Intel XE#2327]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2327
  [Intel XE#2341]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2341
  [Intel XE#2360]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2360
  [Intel XE#2380]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2380
  [Intel XE#2426]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2426
  [Intel XE#2509]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2509
  [Intel XE#2597]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2597
  [Intel XE#261]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/261
  [Intel XE#2652]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2652
  [Intel XE#2763]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2763
  [Intel XE#2838]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2838
  [Intel XE#2850]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2850
  [Intel XE#288]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/288
  [Intel XE#2887]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2887
  [Intel XE#2893]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2893
  [Intel XE#2894]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2894
  [Intel XE#2907]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2907
  [Intel XE#2925]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2925
  [Intel XE#2939]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2939
  [Intel XE#2953]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2953
  [Intel XE#301]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/301
  [Intel XE#3012]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3012
  [Intel XE#306]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/306
  [Intel XE#307]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/307
  [Intel XE#308]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/308
  [Intel XE#309]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/309
  [Intel XE#3099]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3099
  [Intel XE#310]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/310
  [Intel XE#3113]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3113
  [Intel XE#3149]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3149
  [Intel XE#316]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/316
  [Intel XE#323]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/323
  [Intel XE#3304]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3304
  [Intel XE#3309]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3309
  [Intel XE#3414]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3414
  [Intel XE#3432]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3432
  [Intel XE#3573]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3573
  [Intel XE#3576]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3576
  [Intel XE#362]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/362
  [Intel XE#366]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/366
  [Intel XE#367]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/367
  [Intel XE#373]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/373
  [Intel XE#378]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/378
  [Intel XE#3868]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3868
  [Intel XE#3884]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3884
  [Intel XE#3904]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3904
  [Intel XE#4130]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4130
  [Intel XE#4173]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4173
  [Intel XE#4294]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4294
  [Intel XE#4298]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4298
  [Intel XE#4345]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4345
  [Intel XE#4417]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4417
  [Intel XE#4422]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4422
  [Intel XE#4459]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4459
  [Intel XE#4504]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4504
  [Intel XE#4543]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4543
  [Intel XE#455]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/455
  [Intel XE#4596]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4596
  [Intel XE#4608]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4608
  [Intel XE#4650]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4650
  [Intel XE#4665]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4665
  [Intel XE#4733]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4733
  [Intel XE#4821]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4821
  [Intel XE#4837]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4837
  [Intel XE#488]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/488
  [Intel XE#4915]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4915
  [Intel XE#4943]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4943
  [Intel XE#5007]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5007
  [Intel XE#5021]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5021
  [Intel XE#5100]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5100
  [Intel XE#5191]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5191
  [Intel XE#5208]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5208
  [Intel XE#5213]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5213
  [Intel XE#5299]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5299
  [Intel XE#5300]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5300
  [Intel XE#5354]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5354
  [Intel XE#5390]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5390
  [Intel XE#5408]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5408
  [Intel XE#5416]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5416
  [Intel XE#5561]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5561
  [Intel XE#5564]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5564
  [Intel XE#5565]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5565
  [Intel XE#5568]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5568
  [Intel XE#5574]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5574
  [Intel XE#5575]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5575
  [Intel XE#5580]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5580
  [Intel XE#5585]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5585
  [Intel XE#5594]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5594
  [Intel XE#5596]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5596
  [Intel XE#5607]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5607
  [Intel XE#5610]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5610
  [Intel XE#5612]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5612
  [Intel XE#5624]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5624
  [Intel XE#5626]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5626
  [Intel XE#5671]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5671
  [Intel XE#5694]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5694
  [Intel XE#5786]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5786
  [Intel XE#5893]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5893
  [Intel XE#599]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/599
  [Intel XE#5993]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5993
  [Intel XE#6032]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6032
  [Intel XE#6054]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6054
  [Intel XE#607]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/607
  [Intel XE#610]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/610
  [Intel XE#6168]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6168
  [Intel XE#6266]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6266
  [Intel XE#6281]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6281
  [Intel XE#6312]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6312
  [Intel XE#6318]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6318
  [Intel XE#6360]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6360
  [Intel XE#6361]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6361
  [Intel XE#6376]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6376
  [Intel XE#6377]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6377
  [Intel XE#651]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/651
  [Intel XE#653]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/653
  [Intel XE#656]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/656
  [Intel XE#688]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/688
  [Intel XE#703]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/703
  [Intel XE#718]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/718
  [Intel XE#734]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/734
  [Intel XE#776]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/776
  [Intel XE#787]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/787
  [Intel XE#836]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/836
  [Intel XE#870]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/870
  [Intel XE#929]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/929
  [Intel XE#944]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/944
  [Intel XE#979]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/979
  [i915#15106]: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/15106

Build changes
-------------

  * IGT: IGT_8591 -> IGT_8592
  * Linux: xe-3940-c809fdcf60b85e8a261eaa1b49f18b9c5731b18c -> xe-pw-156128v1

  IGT_8591: 8591
  IGT_8592: b3d809d537febc23792ab8d0eb6d13cf80d626c8 @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
  xe-3940-c809fdcf60b85e8a261eaa1b49f18b9c5731b18c: c809fdcf60b85e8a261eaa1b49f18b9c5731b18c
  xe-pw-156128v1: 156128v1

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156128v1/index.html

[-- Attachment #2: Type: text/html, Size: 102282 bytes --]
end of thread, other threads:[~2025-10-18 12:27 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2025-10-17 14:12 [PATCH v7 0/3] drm/xe/migrate: Atomicize CCS copy command setup Satyanarayana K V P
2025-10-17 14:12 ` [PATCH v7 1/3] " Satyanarayana K V P
2025-10-17 14:27 ` Ville Syrjälä
2025-10-17 15:16 ` K V P, Satyanarayana
2025-10-17 15:26 ` Ville Syrjälä
2025-10-17 16:29 ` K V P, Satyanarayana
2025-10-17 16:41 ` Rodrigo Vivi
2025-10-17 16:51 ` Ville Syrjälä
2025-10-17 18:21 ` Rodrigo Vivi
2025-10-17 22:35 ` Matthew Brost
2025-10-17 22:45 ` Matt Roper
2025-10-17 22:35 ` Matt Roper
2025-10-17 22:59 ` Matthew Brost
2025-10-17 18:11 ` Ville Syrjälä
2025-10-17 18:24 ` Rodrigo Vivi
2025-10-17 14:12 ` [PATCH v7 2/3] drm/xe/migrate: Make emit_pte() header write atomic Satyanarayana K V P
2025-10-17 14:12 ` [PATCH v7 3/3] drm/xe/vf: Clear CCS read/write buffers in atomic way Satyanarayana K V P
2025-10-17 14:17 ` ✗ CI.checkpatch: warning for drm/xe/migrate: Atomicize CCS copy command setup Patchwork
2025-10-17 14:18 ` ✓ CI.KUnit: success " Patchwork
2025-10-17 15:23 ` ✓ Xe.CI.BAT: " Patchwork
2025-10-18 12:27 ` ✗ Xe.CI.Full: failure " Patchwork