* [PATCH v8 0/3] drm/xe/migrate: Atomicize CCS copy command setup
@ 2025-10-24 13:35 Satyanarayana K V P
2025-10-24 13:35 ` [PATCH v8 1/3] drm/xe/migrate: Use AVX instructions to prevent partial writes during VF migration CCS batch buffer updates Satyanarayana K V P
` (6 more replies)
0 siblings, 7 replies; 15+ messages in thread
From: Satyanarayana K V P @ 2025-10-24 13:35 UTC (permalink / raw)
To: intel-xe; +Cc: Satyanarayana K V P
The CCS copy command is a 5-dword sequence. If the vCPU halts during
save/restore while this sequence is being programmed, partial writes may
trigger page faults when saving iGPU CCS metadata. Use the VMOVDQU
instruction to write the sequence atomically. Since VMOVDQU operates on
256-bit chunks, update EMIT_COPY_CCS_DW to emit 8 dwords instead of 5.
Update emit_flush_invalidate() to use VMOVDQU operating on 128-bit chunks.
The MI_STORE_DATA_IMM instruction header is four dwords in size. If the
vCPU halts during save/restore while this header is being programmed,
partial writes may trigger page faults when saving iGPU CCS metadata.
Update the instruction header atomically.
Clear the contents of the CCS read/write batch buffer, ensuring no page
faults or GPU hangs occur if migration happens midway.
---
V7 -> V8:
- Updated commit title and message.
V6 -> V7:
- Added description explaining why to use assembly instructions for
atomicity.
- Assert if DGFX tries to use memcpy_vmovdqu(). (Rodrigo)
- Include <asm/cpufeature.h> though checkpatch complains. With
<linux/cpufeature.h> KUnit is throwing errors.
V5 -> V6:
- Used xe_gt_assert() instead of xe_assert() (Matt B).
- Use emit_atomic() function to write MI_STORE_DATA_IMM instruction
(Matt B).
- Fixed review comments (Rodrigo)
V4 -> V5:
- Fixed review comments (Matt B)
V3 -> V4:
- Fixed review comments (Wajdeczko)
- Fix issues reported by patchworks.
V2 -> V3:
- Added support for 128 bit and 256 bit instructions with memcpy_vmovdqu
- Updated emit_flush_invalidate() to use vmovdqu instruction.
V1 -> V2:
- Use memcpy_vmovdqu only for x86 arch and for VF. Else use memcpy
(Auld, Matthew)
- Fix issues reported by patchworks.
Satyanarayana K V P (3):
drm/xe/migrate: Use AVX instructions to prevent partial writes during
VF migration CCS batch buffer updates
drm/xe/migrate: Make emit_pte() header write atomic
drm/xe/vf: Clear CCS read/write buffers in atomic way
drivers/gpu/drm/xe/xe_migrate.c | 262 ++++++++++++++++++++++++---
drivers/gpu/drm/xe/xe_migrate.h | 3 +
drivers/gpu/drm/xe/xe_sriov_vf_ccs.c | 5 +-
3 files changed, 245 insertions(+), 25 deletions(-)
--
2.51.0
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH v8 1/3] drm/xe/migrate: Use AVX instructions to prevent partial writes during VF migration CCS batch buffer updates
2025-10-24 13:35 [PATCH v8 0/3] drm/xe/migrate: Atomicize CCS copy command setup Satyanarayana K V P
@ 2025-10-24 13:35 ` Satyanarayana K V P
2025-10-24 13:57 ` Rodrigo Vivi
2025-10-24 13:35 ` [PATCH v8 2/3] drm/xe/migrate: Make emit_pte() header write atomic Satyanarayana K V P
` (5 subsequent siblings)
6 siblings, 1 reply; 15+ messages in thread
From: Satyanarayana K V P @ 2025-10-24 13:35 UTC (permalink / raw)
To: intel-xe
Cc: Satyanarayana K V P, Michal Wajdeczko, Matthew Brost,
Matthew Auld, Rodrigo Vivi, Matt Roper, Ville Syrjälä
VF KMD registers two specialized contexts with the GuC for migration
operations. The save context contains copy commands and PTEs to transfer
CCS metadata from GPU pools to system memory, and the restore context
contains copy commands and PTEs to transfer CCS metadata from system memory
back to the CCS pools. The GuC submits these contexts to HW during VF
migration.
Each context uses a large batch buffer allocated via sub-allocator,
pre-filled with MI_NOOPs and terminated with MI_BATCH_BUFFER_END. During
BO lifecycle management, segments are dynamically allocated from this
buffer and populated with PTEs and copy commands for active BOs, then reset
to MI_NOOPs when BOs are destroyed.
The CCS copy operation requires a 5-dword command sequence to be written
to the batch buffer. During VF migration save/restore operations, if the
vCPU gets preempted or halted while this command sequence is being
programmed, partial writes can occur. These partial writes create
incomplete GPU instructions in the batch buffer, which trigger page faults
when the GUC submits the batch buffer to hardware for CCS metadata
operations.
Standard memory operations like memcpy() are preemptible, meaning the CPU
scheduler can interrupt execution midway through writing the command
sequence, leaving the batch buffer in an inconsistent state with partially
written GPU instructions.
Replace standard memory operations with x86 AVX instructions: a single AVX
store cannot be preempted mid-instruction, so complete command sequences
are written atomically to the batch buffer.
Expand EMIT_COPY_CCS_DW from 5 dwords to 8 dwords to align with 256-bit
VMOVDQU operations. Update emit_flush_invalidate() to use VMOVDQU
operating on 128-bit chunks. By ensuring GPU instruction headers
(3-dword and 5-dword sequences) are written atomically, we prevent partial
updates that could compromise migration stability.
This approach guarantees that batch buffer updates are completed entirely
or not at all, eliminating the page fault scenarios during VF migration
operations regardless of vCPU scheduling behavior.
Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
---
V7 -> V8:
- Updated commit title and message.
V6 -> V7:
- Added description explaining why to use assembly instructions for
atomicity.
- Assert if DGFX tries to use memcpy_vmovdqu(). (Rodrigo)
- Include <asm/cpufeature.h> though checkpatch complains. With
<linux/cpufeature.h> KUnit is throwing errors.
V5 -> V6:
- Fixed review comments (Rodrigo)
V4 -> V5:
- Fixed review comments. (Matt B)
V3 -> V4:
- Fixed review comments. (Wajdeczko)
- Fix issues reported by patchworks.
V2 -> V3:
- Added support for 128 bit and 256 bit instructions with memcpy_vmovdqu
- Updated emit_flush_invalidate() to use vmovdqu instruction.
V1 -> V2:
- Use memcpy_vmovdqu only for x86 arch and for VF. Else use memcpy
(Auld, Matthew)
- Fix issues reported by patchworks.
---
drivers/gpu/drm/xe/xe_migrate.c | 114 ++++++++++++++++++++++++++------
1 file changed, 93 insertions(+), 21 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
index 921c9c1ea41f..005dc26a0393 100644
--- a/drivers/gpu/drm/xe/xe_migrate.c
+++ b/drivers/gpu/drm/xe/xe_migrate.c
@@ -5,6 +5,8 @@
#include "xe_migrate.h"
+#include <asm/fpu/api.h>
+#include <asm/cpufeature.h>
#include <linux/bitfield.h>
#include <linux/sizes.h>
@@ -33,6 +35,7 @@
#include "xe_res_cursor.h"
#include "xe_sa.h"
#include "xe_sched_job.h"
+#include "xe_sriov_vf_ccs.h"
#include "xe_sync.h"
#include "xe_trace_bo.h"
#include "xe_validation.h"
@@ -657,18 +660,70 @@ static void emit_pte(struct xe_migrate *m,
}
}
-#define EMIT_COPY_CCS_DW 5
+/*
+ * VF KMD registers two special LRCs with the GuC to handle save/restore
+ * operations for CCS metadata on iGPU. The GuC executes these LRCs during
+ * VF save/restore operations.
+ *
+ * Each LRC contains a batch buffer pool that GuC submits to hardware during
+ * VF state save/restore operations. Since these operations can occur
+ * asynchronously at any time, we must ensure GPU instructions in the batch
+ * buffer are written atomically to prevent corruption from incomplete writes.
+ *
+ * To guarantee atomic instruction writes, we use x86 SIMD instructions
+ * (128-bit XMM and 256-bit YMM) within kernel_fpu_begin()/kernel_fpu_end()
+ * sections. This prevents vCPU preemption during instruction generation,
+ * ensuring complete GPU commands are written to the batch buffer.
+ */
+
+static void memcpy_vmovdqu(struct xe_device *xe, void *dst, const void *src, u32 size)
+{
+ xe_assert(xe, !IS_DGFX(xe));
+ xe_assert(xe, IS_SRIOV_VF(xe));
+
+#ifdef CONFIG_X86
+ kernel_fpu_begin();
+ if (size == SZ_128) {
+ asm("vmovdqu (%0), %%xmm0\n"
+ "vmovups %%xmm0, (%1)\n"
+ :: "r" (src), "r" (dst) : "memory");
+ } else if (size == SZ_256) {
+ asm("vmovdqu (%0), %%ymm0\n"
+ "vmovups %%ymm0, (%1)\n"
+ :: "r" (src), "r" (dst) : "memory");
+ }
+ kernel_fpu_end();
+#endif
+}
+
+static void emit_atomic(struct xe_gt *gt, void *dst, const void *src, u32 size)
+{
+ u32 instr_size = size * BITS_PER_BYTE;
+
+ xe_gt_assert(gt, instr_size == SZ_128 || instr_size == SZ_256);
+
+ if (IS_VF_CCS_READY(gt_to_xe(gt))) {
+ xe_gt_assert(gt, static_cpu_has(X86_FEATURE_AVX));
+ memcpy_vmovdqu(gt_to_xe(gt), dst, src, instr_size);
+ } else {
+ memcpy(dst, src, size);
+ }
+}
+
+#define EMIT_COPY_CCS_DW 8
static void emit_copy_ccs(struct xe_gt *gt, struct xe_bb *bb,
u64 dst_ofs, bool dst_is_indirect,
u64 src_ofs, bool src_is_indirect,
u32 size)
{
+ u32 dw[EMIT_COPY_CCS_DW] = {MI_NOOP};
struct xe_device *xe = gt_to_xe(gt);
u32 *cs = bb->cs + bb->len;
u32 num_ccs_blks;
u32 num_pages;
u32 ccs_copy_size;
u32 mocs;
+ u32 i = 0;
if (GRAPHICS_VERx100(xe) >= 2000) {
num_pages = DIV_ROUND_UP(size, XE_PAGE_SIZE);
@@ -686,15 +741,23 @@ static void emit_copy_ccs(struct xe_gt *gt, struct xe_bb *bb,
mocs = FIELD_PREP(XY_CTRL_SURF_MOCS_MASK, gt->mocs.uc_index);
}
- *cs++ = XY_CTRL_SURF_COPY_BLT |
- (src_is_indirect ? 0x0 : 0x1) << SRC_ACCESS_TYPE_SHIFT |
- (dst_is_indirect ? 0x0 : 0x1) << DST_ACCESS_TYPE_SHIFT |
- ccs_copy_size;
- *cs++ = lower_32_bits(src_ofs);
- *cs++ = upper_32_bits(src_ofs) | mocs;
- *cs++ = lower_32_bits(dst_ofs);
- *cs++ = upper_32_bits(dst_ofs) | mocs;
+ dw[i++] = XY_CTRL_SURF_COPY_BLT |
+ (src_is_indirect ? 0x0 : 0x1) << SRC_ACCESS_TYPE_SHIFT |
+ (dst_is_indirect ? 0x0 : 0x1) << DST_ACCESS_TYPE_SHIFT |
+ ccs_copy_size;
+ dw[i++] = lower_32_bits(src_ofs);
+ dw[i++] = upper_32_bits(src_ofs) | mocs;
+ dw[i++] = lower_32_bits(dst_ofs);
+ dw[i++] = upper_32_bits(dst_ofs) | mocs;
+ /*
+ * The CCS copy command is a 5-dword sequence. If the vCPU halts during
+ * save/restore while this sequence is being issued, partial writes may trigger
+ * page faults when saving iGPU CCS metadata. Use the VMOVDQU instruction to
+ * write the sequence atomically.
+ */
+ emit_atomic(gt, cs, dw, sizeof(dw));
+ cs += EMIT_COPY_CCS_DW;
bb->len = cs - bb->cs;
}
@@ -1061,18 +1124,27 @@ static u64 migrate_vm_ppgtt_addr_tlb_inval(void)
return (NUM_KERNEL_PDE - 2) * XE_PAGE_SIZE;
}
-static int emit_flush_invalidate(u32 *dw, int i, u32 flags)
+/*
+ * The MI_FLUSH_DW command is a 4-dword sequence. If the vCPU halts during
+ * save/restore while this sequence is being issued, partial writes may
+ * trigger page faults when saving iGPU CCS metadata. Use
+ * emit_atomic() to write the sequence atomically.
+ */
+#define EMIT_FLUSH_INVALIDATE_DW 4
+static int emit_flush_invalidate(struct xe_exec_queue *q, u32 *cs, int i, u32 flags)
{
u64 addr = migrate_vm_ppgtt_addr_tlb_inval();
+ u32 dw[EMIT_FLUSH_INVALIDATE_DW] = {MI_NOOP}, j = 0;
+
+ dw[j++] = MI_FLUSH_DW | MI_INVALIDATE_TLB | MI_FLUSH_DW_OP_STOREDW |
+ MI_FLUSH_IMM_DW | flags;
+ dw[j++] = lower_32_bits(addr);
+ dw[j++] = upper_32_bits(addr);
+ dw[j++] = MI_NOOP;
- dw[i++] = MI_FLUSH_DW | MI_INVALIDATE_TLB | MI_FLUSH_DW_OP_STOREDW |
- MI_FLUSH_IMM_DW | flags;
- dw[i++] = lower_32_bits(addr);
- dw[i++] = upper_32_bits(addr);
- dw[i++] = MI_NOOP;
- dw[i++] = MI_NOOP;
+ emit_atomic(q->gt, &cs[i], dw, sizeof(dw));
- return i;
+ return i + j;
}
/**
@@ -1117,7 +1189,7 @@ int xe_migrate_ccs_rw_copy(struct xe_tile *tile, struct xe_exec_queue *q,
/* Calculate Batch buffer size */
batch_size = 0;
while (size) {
- batch_size += 10; /* Flush + ggtt addr + 2 NOP */
+ batch_size += EMIT_FLUSH_INVALIDATE_DW * 2; /* Flush + ggtt addr + 1 NOP */
u64 ccs_ofs, ccs_size;
u32 ccs_pt;
@@ -1158,7 +1230,7 @@ int xe_migrate_ccs_rw_copy(struct xe_tile *tile, struct xe_exec_queue *q,
* sizes here again before copy command is emitted.
*/
while (size) {
- batch_size += 10; /* Flush + ggtt addr + 2 NOP */
+ batch_size += EMIT_FLUSH_INVALIDATE_DW * 2; /* Flush + ggtt addr + 1 NOP */
u32 flush_flags = 0;
u64 ccs_ofs, ccs_size;
u32 ccs_pt;
@@ -1181,11 +1253,11 @@ int xe_migrate_ccs_rw_copy(struct xe_tile *tile, struct xe_exec_queue *q,
emit_pte(m, bb, ccs_pt, false, false, &ccs_it, ccs_size, src);
- bb->len = emit_flush_invalidate(bb->cs, bb->len, flush_flags);
+ bb->len = emit_flush_invalidate(q, bb->cs, bb->len, flush_flags);
flush_flags = xe_migrate_ccs_copy(m, bb, src_L0_ofs, src_is_pltt,
src_L0_ofs, dst_is_pltt,
src_L0, ccs_ofs, true);
- bb->len = emit_flush_invalidate(bb->cs, bb->len, flush_flags);
+ bb->len = emit_flush_invalidate(q, bb->cs, bb->len, flush_flags);
size -= src_L0;
}
--
2.51.0
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH v8 2/3] drm/xe/migrate: Make emit_pte() header write atomic
2025-10-24 13:35 [PATCH v8 0/3] drm/xe/migrate: Atomicize CCS copy command setup Satyanarayana K V P
2025-10-24 13:35 ` [PATCH v8 1/3] drm/xe/migrate: Use AVX instructions to prevent partial writes during VF migration CCS batch buffer updates Satyanarayana K V P
@ 2025-10-24 13:35 ` Satyanarayana K V P
2025-10-24 13:35 ` [PATCH v8 3/3] drm/xe/vf: Clear CCS read/write buffers in atomic way Satyanarayana K V P
` (4 subsequent siblings)
6 siblings, 0 replies; 15+ messages in thread
From: Satyanarayana K V P @ 2025-10-24 13:35 UTC (permalink / raw)
To: intel-xe; +Cc: Satyanarayana K V P, Michal Wajdeczko, Matthew Brost,
Matthew Auld
The MI_STORE_DATA_IMM instruction header is four dwords in size. If the
vCPU halts during save/restore while this header is being programmed,
partial writes may trigger page faults when saving iGPU CCS metadata.
Update the instruction header atomically.
Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
---
V7 -> V8:
- None.
V6 -> V7:
- None.
V5 -> V6:
- Use emit_atomic() function to write MI_STORE_DATA_IMM instruction
(Matt B).
V4 -> V5:
- Fixed review comments (Matt B).
V3 -> V4:
- New commit added.
V2 -> V3:
- None
V1 -> V2:
- None
---
drivers/gpu/drm/xe/xe_migrate.c | 14 +++++++++++---
1 file changed, 11 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
index 005dc26a0393..b5d36073194d 100644
--- a/drivers/gpu/drm/xe/xe_migrate.c
+++ b/drivers/gpu/drm/xe/xe_migrate.c
@@ -89,6 +89,8 @@ struct xe_migrate {
#define MAX_NUM_PTE 512
#define IDENTITY_OFFSET 256ULL
+static void emit_atomic(struct xe_gt *gt, void *dst, const void *src, u32 size);
+
/*
* Although MI_STORE_DATA_IMM's "length" field is 10-bits, 0x3FE is the largest
* legal value accepted. Since that instruction field is always stored in
@@ -596,6 +598,7 @@ static u32 pte_update_size(struct xe_migrate *m,
return cmds;
}
+#define EMIT_STORE_DATA_IMM_DW 4
static void emit_pte(struct xe_migrate *m,
struct xe_bb *bb, u32 at_pt,
bool is_vram, bool is_comp_pte,
@@ -619,11 +622,16 @@ static void emit_pte(struct xe_migrate *m,
ptes = DIV_ROUND_UP(size, XE_PAGE_SIZE);
while (ptes) {
+ u32 dw[EMIT_STORE_DATA_IMM_DW] = {MI_NOOP}, i = 0;
u32 chunk = min(MAX_PTE_PER_SDI, ptes);
- bb->cs[bb->len++] = MI_STORE_DATA_IMM | MI_SDI_NUM_QW(chunk);
- bb->cs[bb->len++] = ofs;
- bb->cs[bb->len++] = 0;
+ dw[i++] = MI_STORE_DATA_IMM | MI_SDI_NUM_QW(chunk);
+ dw[i++] = ofs;
+ dw[i++] = 0;
+
+ emit_atomic(m->q->gt, &bb->cs[bb->len], dw, sizeof(dw));
+
+ bb->len += i;
cur_ofs = ofs;
ofs += chunk * 8;
--
2.51.0
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH v8 3/3] drm/xe/vf: Clear CCS read/write buffers in atomic way
2025-10-24 13:35 [PATCH v8 0/3] drm/xe/migrate: Atomicize CCS copy command setup Satyanarayana K V P
2025-10-24 13:35 ` [PATCH v8 1/3] drm/xe/migrate: Use AVX instructions to prevent partial writes during VF migration CCS batch buffer updates Satyanarayana K V P
2025-10-24 13:35 ` [PATCH v8 2/3] drm/xe/migrate: Make emit_pte() header write atomic Satyanarayana K V P
@ 2025-10-24 13:35 ` Satyanarayana K V P
2025-10-24 14:40 ` ✗ CI.checkpatch: warning for drm/xe/migrate: Atomicize CCS copy command setup Patchwork
` (3 subsequent siblings)
6 siblings, 0 replies; 15+ messages in thread
From: Satyanarayana K V P @ 2025-10-24 13:35 UTC (permalink / raw)
To: intel-xe; +Cc: Satyanarayana K V P, Michal Wajdeczko, Matthew Brost,
Matthew Auld
Clear the contents of the CCS read/write batch buffer, ensuring no page
faults or GPU hangs occur if migration happens midway.
Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
---
V7 -> V8:
- None.
V6 -> V7:
- None.
V5 -> V6:
- Used xe_gt_assert() instead of xe_assert() (Matt B).
V4 -> V5:
- Fixed review comments (Matt B).
V3 -> V4:
- New commit added.
V2 -> V3:
- None
V1 -> V2:
- None
---
drivers/gpu/drm/xe/xe_migrate.c | 134 +++++++++++++++++++++++++++
drivers/gpu/drm/xe/xe_migrate.h | 3 +
drivers/gpu/drm/xe/xe_sriov_vf_ccs.c | 5 +-
3 files changed, 141 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
index b5d36073194d..f171ea27bf84 100644
--- a/drivers/gpu/drm/xe/xe_migrate.c
+++ b/drivers/gpu/drm/xe/xe_migrate.c
@@ -668,6 +668,43 @@ static void emit_pte(struct xe_migrate *m,
}
}
+static void emit_pte_clear(struct xe_gt *gt, struct xe_bb *bb, int start_offset,
+ int end_offset)
+{
+ u32 dw_nop[SZ_2] = {MI_NOOP};
+ int i = start_offset;
+ int len = end_offset;
+ u32 *cs = bb->cs;
+
+ /* Reverses the operations performed by emit_pte() */
+ while (i < len) {
+ u32 dwords, qwords;
+
+ xe_gt_assert(gt, (REG_FIELD_GET(REG_GENMASK(31, 23), cs[i]) == 0x20));
+
+ qwords = REG_FIELD_GET(MI_SDI_LEN_DW, cs[i]);
+ /*
+ * If Store QW is enabled, then the value of the dword length
+ * includes the header, the address and multiple QW pairs of data,
+ * which means the values will be limited to odd values starting
+ * at a value of 3 (3 representing the size of a 5 DW command,
+ * including the header, 2 DW address and 2 DW data).
+ */
+ dwords = qwords - 1;
+ /*
+ * Do not clear header first. Clear PTEs first and then clear the
+ * header to avoid page faults.
+ */
+ memset(&cs[i + 3], MI_NOOP, (dwords) * sizeof(u32));
+
+ xe_device_wmb(gt_to_xe(gt));
+ WRITE_ONCE(*(u64 *)&cs[i], *(u64 *)dw_nop);
+
+ cs[i + 2] = MI_NOOP;
+ i += (dwords + 3);
+ }
+}
+
/*
* VF KMD registers two special LRCs with the GuC to handle save/restore
* operations for CCS metadata on IGPU. GUC executes these LRCAs during
@@ -769,6 +806,18 @@ static void emit_copy_ccs(struct xe_gt *gt, struct xe_bb *bb,
bb->len = cs - bb->cs;
}
+static u32 emit_copy_ccs_clear(struct xe_gt *gt, struct xe_bb *bb, u32 offset)
+{
+ u32 dw[EMIT_COPY_CCS_DW] = {MI_NOOP};
+ u32 *cs = bb->cs + offset - EMIT_COPY_CCS_DW;
+
+ xe_gt_assert(gt, (REG_FIELD_GET(REG_GENMASK(31, 22), *cs) == 0x148));
+ emit_atomic(gt, cs, dw, sizeof(dw));
+ xe_device_wmb(gt_to_xe(gt));
+
+ return offset - EMIT_COPY_CCS_DW;
+}
+
#define EMIT_COPY_DW 10
static void emit_xy_fast_copy(struct xe_gt *gt, struct xe_bb *bb, u64 src_ofs,
u64 dst_ofs, unsigned int size,
@@ -1155,6 +1204,19 @@ static int emit_flush_invalidate(struct xe_exec_queue *q, u32 *cs, int i, u32 fl
return i + j;
}
+static u32 emit_flush_invalidate_clear(struct xe_gt *gt, struct xe_bb *bb,
+ u32 offset)
+{
+ u32 dw[EMIT_FLUSH_INVALIDATE_DW] = {MI_NOOP};
+ u32 *cs = bb->cs + offset - EMIT_FLUSH_INVALIDATE_DW;
+
+ xe_gt_assert(gt, (REG_FIELD_GET(REG_GENMASK(31, 23), *cs) == 0x26));
+
+ emit_atomic(gt, cs, dw, sizeof(dw));
+
+ return offset - EMIT_FLUSH_INVALIDATE_DW;
+}
+
/**
* xe_migrate_ccs_rw_copy() - Copy content of TTM resources.
* @tile: Tile whose migration context to be used.
@@ -1279,6 +1341,78 @@ int xe_migrate_ccs_rw_copy(struct xe_tile *tile, struct xe_exec_queue *q,
return err;
}
+static u32 ccs_rw_pte_size(struct xe_gt *gt, struct xe_bb *bb, u32 offset)
+{
+ int len = bb->len;
+ u32 *cs = bb->cs;
+ u32 i = offset;
+
+ while (i < len) {
+ u32 dwords, qwords;
+
+ xe_gt_assert(gt, (REG_FIELD_GET(REG_GENMASK(31, 23), cs[i]) == 0x20));
+
+ qwords = REG_FIELD_GET(MI_SDI_LEN_DW, cs[i]);
+ /*
+ * If Store QW is enabled, then the value of the dword length
+ * includes the header, the address and multiple QW pairs of data,
+ * which means the values will be limited to odd values starting
+ * at a value of 3 (3 representing the size of a 5 DW command,
+ * including the header, 2 DW address and 2 DW data).
+ */
+ dwords = qwords - 1;
+ i += dwords + 3;
+
+ /*
+ * Break if the next dword is for emit_flush_invalidate_clear()
+ * or emit_copy_ccs_clear()
+ */
+ if ((REG_FIELD_GET(REG_GENMASK(31, 23), cs[i]) == 0x26) ||
+ (REG_FIELD_GET(REG_GENMASK(31, 22), cs[i]) == 0x148))
+ break;
+ }
+ return i;
+}
+
+/**
+ * xe_migrate_ccs_rw_copy_clear() - Clear the CCS read/write batch buffer
+ * content.
+ * @tile: Tile whose migration context to be used.
+ * @src_bo: The buffer object @src is currently bound to.
+ * @read_write : Creates BB commands for CCS read/write.
+ *
+ * The CCS copy command has three stages: PTE setup, TLB invalidation, and CCS
+ * copy. Each stage includes a header followed by instructions. When clearing,
+ * remove the instructions first, then the header. For the TLB invalidation and
+ * CCS copy stages, ensure the writes are atomic.
+ *
+ * This reverses the operations performed by xe_migrate_ccs_rw_copy().
+ *
+ * Returns: None.
+ */
+void xe_migrate_ccs_rw_copy_clear(struct xe_tile *tile, struct xe_bo *src_bo,
+ enum xe_sriov_vf_ccs_rw_ctxs read_write)
+{
+ struct xe_bb *bb = src_bo->bb_ccs[read_write];
+ u32 bb_offset = 0, bb_offset_chunk = 0;
+ struct xe_gt *gt = tile->primary_gt;
+
+ while (bb_offset_chunk >= 0 && bb_offset_chunk < bb->len) {
+ bb_offset = ccs_rw_pte_size(gt, bb, bb_offset_chunk);
+ /*
+ * After PTE entries, we have one TLB invalidation, CCS copy
+ * command and another TLB invalidation command.
+ */
+ bb_offset_chunk = bb_offset + EMIT_FLUSH_INVALIDATE_DW +
+ EMIT_COPY_CCS_DW + EMIT_FLUSH_INVALIDATE_DW;
+
+ bb_offset = emit_flush_invalidate_clear(gt, bb, bb_offset_chunk);
+ bb_offset = emit_copy_ccs_clear(gt, bb, bb_offset);
+ bb_offset = emit_flush_invalidate_clear(gt, bb, bb_offset);
+ emit_pte_clear(gt, bb, bb_offset_chunk, bb_offset);
+ }
+}
+
/**
* xe_get_migrate_exec_queue() - Get the execution queue from migrate context.
* @migrate: Migrate context.
diff --git a/drivers/gpu/drm/xe/xe_migrate.h b/drivers/gpu/drm/xe/xe_migrate.h
index 4fad324b6253..7d3d4c5109dd 100644
--- a/drivers/gpu/drm/xe/xe_migrate.h
+++ b/drivers/gpu/drm/xe/xe_migrate.h
@@ -129,6 +129,9 @@ int xe_migrate_ccs_rw_copy(struct xe_tile *tile, struct xe_exec_queue *q,
struct xe_bo *src_bo,
enum xe_sriov_vf_ccs_rw_ctxs read_write);
+void xe_migrate_ccs_rw_copy_clear(struct xe_tile *tile, struct xe_bo *src_bo,
+ enum xe_sriov_vf_ccs_rw_ctxs read_write);
+
struct xe_lrc *xe_migrate_lrc(struct xe_migrate *migrate);
struct xe_exec_queue *xe_migrate_exec_queue(struct xe_migrate *migrate);
int xe_migrate_access_memory(struct xe_migrate *m, struct xe_bo *bo,
diff --git a/drivers/gpu/drm/xe/xe_sriov_vf_ccs.c b/drivers/gpu/drm/xe/xe_sriov_vf_ccs.c
index 797a4b866226..bda838f4d59a 100644
--- a/drivers/gpu/drm/xe/xe_sriov_vf_ccs.c
+++ b/drivers/gpu/drm/xe/xe_sriov_vf_ccs.c
@@ -429,6 +429,7 @@ int xe_sriov_vf_ccs_detach_bo(struct xe_bo *bo)
{
struct xe_device *xe = xe_bo_device(bo);
enum xe_sriov_vf_ccs_rw_ctxs ctx_id;
+ struct xe_tile *tile;
struct xe_bb *bb;
xe_assert(xe, IS_VF_CCS_READY(xe));
@@ -436,12 +437,14 @@ int xe_sriov_vf_ccs_detach_bo(struct xe_bo *bo)
if (!xe_bo_has_valid_ccs_bb(bo))
return 0;
+ tile = xe_device_get_root_tile(xe);
+
for_each_ccs_rw_ctx(ctx_id) {
bb = bo->bb_ccs[ctx_id];
if (!bb)
continue;
- memset(bb->cs, MI_NOOP, bb->len * sizeof(u32));
+ xe_migrate_ccs_rw_copy_clear(tile, bo, ctx_id);
xe_bb_free(bb, NULL);
bo->bb_ccs[ctx_id] = NULL;
}
--
2.51.0
^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH v8 1/3] drm/xe/migrate: Use AVX instructions to prevent partial writes during VF migration CCS batch buffer updates
2025-10-24 13:35 ` [PATCH v8 1/3] drm/xe/migrate: Use AVX instructions to prevent partial writes during VF migration CCS batch buffer updates Satyanarayana K V P
@ 2025-10-24 13:57 ` Rodrigo Vivi
2025-10-24 14:05 ` Ville Syrjälä
0 siblings, 1 reply; 15+ messages in thread
From: Rodrigo Vivi @ 2025-10-24 13:57 UTC (permalink / raw)
To: Satyanarayana K V P
Cc: intel-xe, Michal Wajdeczko, Matthew Brost, Matthew Auld,
Matt Roper, Ville Syrjälä
On Fri, Oct 24, 2025 at 07:05:24PM +0530, Satyanarayana K V P wrote:
Hi Satya,
First of all, thank you for the updates.
Second, the subject is way too big.
This should be enough and under 75 cols:
drm/xe: Use AVX instructions to prevent partial writes during VF pause
more below:
> VF KMD registers two specialized contexts with the GuC for migration
> operations. The save context contains copy commands and PTEs to transfer CCS
> metadata from GPU pools to system memory, and the restore context contains
> copy commands and PTEs to transfer CCS metadata from system memory back to
> the CCS pools. The GuC submits these contexts to HW during VF migration.
>
> Each context uses a large batch buffer allocated via sub-allocator,
> pre-filled with MI_NOOPs and terminated with MI_BATCH_BUFFER_END. During
> BO lifecycle management, segments are dynamically allocated from this
> buffer and populated with PTEs and copy commands for active BOs, then reset
> to MI_NOOPs when BOs are destroyed.
>
> The CCS copy operation requires a 5-dword command sequence to be written
> to the batch buffer. During VF migration save/restore operations, if the
> vCPU gets preempted or halted while this command sequence is being
> programmed, partial writes can occur. These partial writes create
> incomplete GPU instructions in the batch buffer, which trigger page faults
> when the GUC submits the batch buffer to hardware for CCS metadata
> operations.
Perhaps we could summarize the thing here and move details to the comment
near the assembly. The important part in the commit message is to have
the 'why'. Some of the details of the commands like MI_NOOP fill and all
could be in the comment near the ASM.
>
> Standard memory operations like memcpy() are preemptible, meaning the CPU
> scheduler can interrupt execution midway through writing the command
> sequence, leaving the batch buffer in an inconsistent state with partially
> written GPU instructions.
>
> Replace standard memory operations with x86 AVX instructions that provide
> atomic, non-preemptible writes as AVX instructions cannot be preempted
> during execution, ensuring complete command sequences are written
> atomically to the batch buffer.
>
> Expand EMIT_COPY_CCS_DW from 5 dwords to 8 dwords to align with 256-bit
> VMOVDQU operations. Update emit_flush_invalidate() to use VMOVDQU
> operating with 128-bit chunks. By ensuring GPU instruction headers
> (3-dword and 5-dword sequences) are written atomically, we prevent partial
> updates that could compromise migration stability.
>
> This approach guarantees that batch buffer updates are completed entirely
> or not at all, eliminating the page fault scenarios during VF migration
> operations regardless of vCPU scheduling behavior.
>
> Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
> Cc: Matthew Brost <matthew.brost@intel.com>
> Cc: Matthew Auld <matthew.auld@intel.com>
> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
> Cc: Matt Roper <matthew.d.roper@intel.com>
> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
>
> ---
> V7 -> V8:
> - Updated commit title and message.
>
> V6 -> V7:
> - Added description explaining why to use assembly instructions for
> atomicity.
> - Assert if DGFX tries to use memcpy_vmovdqu(). (Rodrigo)
> - Include <asm/cpufeature.h> though checkpatch complains. With
> <linux/cpufeature.h> KUnit is throwing errors.
>
> V5 -> V6:
> - Fixed review comments (Rodrigo)
>
> V4 -> V5:
> - Fixed review comments. (Matt B)
>
> V3 -> V4:
> - Fixed review comments. (Wajdeczko)
> - Fix issues reported by patchworks.
>
> V2 -> V3:
> - Added support for 128 bit and 256 bit instructions with memcpy_vmovdqu
> - Updated emit_flush_invalidate() to use vmovdqu instruction.
>
> V1 -> V2:
> - Use memcpy_vmovdqu only for x86 arch and for VF. Else use memcpy
> (Auld, Matthew)
> - Fix issues reported by patchworks.
> ---
> drivers/gpu/drm/xe/xe_migrate.c | 114 ++++++++++++++++++++++++++------
> 1 file changed, 93 insertions(+), 21 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
> index 921c9c1ea41f..005dc26a0393 100644
> --- a/drivers/gpu/drm/xe/xe_migrate.c
> +++ b/drivers/gpu/drm/xe/xe_migrate.c
> @@ -5,6 +5,8 @@
>
> #include "xe_migrate.h"
>
> +#include <asm/fpu/api.h>
> +#include <asm/cpufeature.h>
> #include <linux/bitfield.h>
> #include <linux/sizes.h>
>
> @@ -33,6 +35,7 @@
> #include "xe_res_cursor.h"
> #include "xe_sa.h"
> #include "xe_sched_job.h"
> +#include "xe_sriov_vf_ccs.h"
> #include "xe_sync.h"
> #include "xe_trace_bo.h"
> #include "xe_validation.h"
> @@ -657,18 +660,70 @@ static void emit_pte(struct xe_migrate *m,
> }
> }
>
> -#define EMIT_COPY_CCS_DW 5
> +/*
> + * VF KMD registers two special LRCs with the GuC to handle save/restore
> + * operations for CCS metadata on iGPU. The GuC executes these LRCs during
> + * VF save/restore operations.
> + *
> + * Each LRC contains a batch buffer pool that GuC submits to hardware during
> + * VF state save/restore operations. Since these operations can occur
> + * asynchronously at any time, we must ensure GPU instructions in the batch
> + * buffer are written atomically to prevent corruption from incomplete writes.
> + *
> + * To guarantee atomic instruction writes, we use x86 SIMD instructions
Here you still mention 'atomic' even though we already know this is not truly 'atomic'.
Leave a summarized explanation in the commit message and put the details here.
I'm sorry for being picky here, but I want to ensure that the information
around this code is clear so we don't keep having to explain this over
and over in the future.
> + * (128-bit XMM and 256-bit YMM) within kernel_fpu_begin()/kernel_fpu_end()
> + * sections. This prevents vCPU preemption during instruction generation,
> + * ensuring complete GPU commands are written to the batch buffer.
> + */
> +
> +static void memcpy_vmovdqu(struct xe_device *xe, void *dst, const void *src, u32 size)
> +{
> + xe_assert(xe, !IS_DGFX(xe));
> + xe_assert(xe, IS_SRIOV_VF(xe));
> +
> +#ifdef CONFIG_X86
> + kernel_fpu_begin();
> + if (size == SZ_128) {
> + asm("vmovdqu (%0), %%xmm0\n"
> + "vmovups %%xmm0, (%1)\n"
> + :: "r" (src), "r" (dst) : "memory");
> + } else if (size == SZ_256) {
> + asm("vmovdqu (%0), %%ymm0\n"
> + "vmovups %%ymm0, (%1)\n"
> + :: "r" (src), "r" (dst) : "memory");
> + }
> + kernel_fpu_end();
> +#endif
> +}
> +
> +static void emit_atomic(struct xe_gt *gt, void *dst, const void *src, u32 size)
> +{
> + u32 instr_size = size * BITS_PER_BYTE;
> +
> + xe_gt_assert(gt, instr_size == SZ_128 || instr_size == SZ_256);
> +
> + if (IS_VF_CCS_READY(gt_to_xe(gt))) {
> + xe_gt_assert(gt, static_cpu_has(X86_FEATURE_AVX));
> + memcpy_vmovdqu(gt_to_xe(gt), dst, src, instr_size);
> + } else {
> + memcpy(dst, src, size);
> + }
> +}
> +
> +#define EMIT_COPY_CCS_DW 8
> static void emit_copy_ccs(struct xe_gt *gt, struct xe_bb *bb,
> u64 dst_ofs, bool dst_is_indirect,
> u64 src_ofs, bool src_is_indirect,
> u32 size)
> {
> + u32 dw[EMIT_COPY_CCS_DW] = {MI_NOOP};
> struct xe_device *xe = gt_to_xe(gt);
> u32 *cs = bb->cs + bb->len;
> u32 num_ccs_blks;
> u32 num_pages;
> u32 ccs_copy_size;
> u32 mocs;
> + u32 i = 0;
>
> if (GRAPHICS_VERx100(xe) >= 2000) {
> num_pages = DIV_ROUND_UP(size, XE_PAGE_SIZE);
> @@ -686,15 +741,23 @@ static void emit_copy_ccs(struct xe_gt *gt, struct xe_bb *bb,
> mocs = FIELD_PREP(XY_CTRL_SURF_MOCS_MASK, gt->mocs.uc_index);
> }
>
> - *cs++ = XY_CTRL_SURF_COPY_BLT |
> - (src_is_indirect ? 0x0 : 0x1) << SRC_ACCESS_TYPE_SHIFT |
> - (dst_is_indirect ? 0x0 : 0x1) << DST_ACCESS_TYPE_SHIFT |
> - ccs_copy_size;
> - *cs++ = lower_32_bits(src_ofs);
> - *cs++ = upper_32_bits(src_ofs) | mocs;
> - *cs++ = lower_32_bits(dst_ofs);
> - *cs++ = upper_32_bits(dst_ofs) | mocs;
> + dw[i++] = XY_CTRL_SURF_COPY_BLT |
> + (src_is_indirect ? 0x0 : 0x1) << SRC_ACCESS_TYPE_SHIFT |
> + (dst_is_indirect ? 0x0 : 0x1) << DST_ACCESS_TYPE_SHIFT |
> + ccs_copy_size;
> + dw[i++] = lower_32_bits(src_ofs);
> + dw[i++] = upper_32_bits(src_ofs) | mocs;
> + dw[i++] = lower_32_bits(dst_ofs);
> + dw[i++] = upper_32_bits(dst_ofs) | mocs;
>
> + /*
> + * The CCS copy command is a 5-dword sequence. If the vCPU halts during
> + * save/restore while this sequence is being issued, partial writes may trigger
> + * page faults when saving iGPU CCS metadata. Use the VMOVDQU instruction to
> + * write the sequence atomically.
> + */
> + emit_atomic(gt, cs, dw, sizeof(dw));
> + cs += EMIT_COPY_CCS_DW;
> bb->len = cs - bb->cs;
> }
>
> @@ -1061,18 +1124,27 @@ static u64 migrate_vm_ppgtt_addr_tlb_inval(void)
> return (NUM_KERNEL_PDE - 2) * XE_PAGE_SIZE;
> }
>
> -static int emit_flush_invalidate(u32 *dw, int i, u32 flags)
> +/*
> + * The MI_FLUSH_DW command is a 4-dword sequence. If the vCPU halts during
> + * save/restore while this sequence is being issued, partial writes may
> + * trigger page faults when saving iGPU CCS metadata. Use
> + * emit_atomic() to write the sequence atomically.
> + */
> +#define EMIT_FLUSH_INVALIDATE_DW 4
> +static int emit_flush_invalidate(struct xe_exec_queue *q, u32 *cs, int i, u32 flags)
> {
> u64 addr = migrate_vm_ppgtt_addr_tlb_inval();
> + u32 dw[EMIT_FLUSH_INVALIDATE_DW] = {MI_NOOP}, j = 0;
> +
> + dw[j++] = MI_FLUSH_DW | MI_INVALIDATE_TLB | MI_FLUSH_DW_OP_STOREDW |
> + MI_FLUSH_IMM_DW | flags;
> + dw[j++] = lower_32_bits(addr);
> + dw[j++] = upper_32_bits(addr);
> + dw[j++] = MI_NOOP;
>
> - dw[i++] = MI_FLUSH_DW | MI_INVALIDATE_TLB | MI_FLUSH_DW_OP_STOREDW |
> - MI_FLUSH_IMM_DW | flags;
> - dw[i++] = lower_32_bits(addr);
> - dw[i++] = upper_32_bits(addr);
> - dw[i++] = MI_NOOP;
> - dw[i++] = MI_NOOP;
> + emit_atomic(q->gt, &cs[i], dw, sizeof(dw));
>
> - return i;
> + return i + j;
> }
>
> /**
> @@ -1117,7 +1189,7 @@ int xe_migrate_ccs_rw_copy(struct xe_tile *tile, struct xe_exec_queue *q,
> /* Calculate Batch buffer size */
> batch_size = 0;
> while (size) {
> - batch_size += 10; /* Flush + ggtt addr + 2 NOP */
> + batch_size += EMIT_FLUSH_INVALIDATE_DW * 2; /* Flush + ggtt addr + 1 NOP */
> u64 ccs_ofs, ccs_size;
> u32 ccs_pt;
>
> @@ -1158,7 +1230,7 @@ int xe_migrate_ccs_rw_copy(struct xe_tile *tile, struct xe_exec_queue *q,
> * sizes here again before copy command is emitted.
> */
> while (size) {
> - batch_size += 10; /* Flush + ggtt addr + 2 NOP */
> + batch_size += EMIT_FLUSH_INVALIDATE_DW * 2; /* Flush + ggtt addr + 1 NOP */
> u32 flush_flags = 0;
> u64 ccs_ofs, ccs_size;
> u32 ccs_pt;
> @@ -1181,11 +1253,11 @@ int xe_migrate_ccs_rw_copy(struct xe_tile *tile, struct xe_exec_queue *q,
>
> emit_pte(m, bb, ccs_pt, false, false, &ccs_it, ccs_size, src);
>
> - bb->len = emit_flush_invalidate(bb->cs, bb->len, flush_flags);
> + bb->len = emit_flush_invalidate(q, bb->cs, bb->len, flush_flags);
> flush_flags = xe_migrate_ccs_copy(m, bb, src_L0_ofs, src_is_pltt,
> src_L0_ofs, dst_is_pltt,
> src_L0, ccs_ofs, true);
> - bb->len = emit_flush_invalidate(bb->cs, bb->len, flush_flags);
> + bb->len = emit_flush_invalidate(q, bb->cs, bb->len, flush_flags);
>
> size -= src_L0;
> }
> --
> 2.51.0
>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v8 1/3] drm/xe/migrate: Use AVX instructions to prevent partial writes during VF migration CCS batch buffer updates
2025-10-24 13:57 ` Rodrigo Vivi
@ 2025-10-24 14:05 ` Ville Syrjälä
2025-10-24 14:25 ` K V P, Satyanarayana
0 siblings, 1 reply; 15+ messages in thread
From: Ville Syrjälä @ 2025-10-24 14:05 UTC (permalink / raw)
To: Rodrigo Vivi
Cc: Satyanarayana K V P, intel-xe, Michal Wajdeczko, Matthew Brost,
Matthew Auld, Matt Roper
On Fri, Oct 24, 2025 at 09:57:15AM -0400, Rodrigo Vivi wrote:
> On Fri, Oct 24, 2025 at 07:05:24PM +0530, Satyanarayana K V P wrote:
>
> Hi Satya,
>
> First of all, thank you for the updates.
>
> Second, the subject is way too big.
>
> This should be enough and under 75 cols:
>
> drm/xe: Use AVX instructions to prevent partial writes during VF pause
>
> more below:
>
> > VF KMD registers two specialized contexts with the GuC for migration
> > operations. The save context contains copy commands and PTEs to transfer
> > CCS metadata from GPU pools to system memory, and the restore context
> > contains copy commands and PTEs to transfer CCS metadata from system memory
> > back to CCS pools. The GuC submits these contexts to HW during VF migration.
> >
> > Each context uses a large batch buffer allocated via sub-allocator,
> > pre-filled with MI_NOOPs and terminated with MI_BATCH_BUFFER_END. During
> > BO lifecycle management, segments are dynamically allocated from this
> > buffer and populated with PTEs and copy commands for active BOs, then reset
> > to MI_NOOPs when BOs are destroyed.
> >
> > The CCS copy operation requires a 5-dword command sequence to be written
> > to the batch buffer. During VF migration save/restore operations, if the
> > vCPU gets preempted or halted while this command sequence is being
> > programmed, partial writes can occur. These partial writes create
> > incomplete GPU instructions in the batch buffer, which trigger page faults
> > when the GUC submits the batch buffer to hardware for CCS metadata
> > operations.
>
> Perhaps we could summarize the thing here and move details to the comment
> near the assembly. The important part in the commit message is to have
> the 'why'. Some of the details of the commands like MI_NOOP fill and all
> could be in the comment near the ASM.
>
> >
> > Standard memory operations like memcpy() are preemptible, meaning the CPU
> > scheduler can interrupt execution midway through writing the command
> > sequence, leaving the batch buffer in an inconsistent state with partially
> > written GPU instructions.
> >
> > Replace standard memory operations with x86 AVX instructions that provide
> > atomic, non-preemptible writes as AVX instructions cannot be preempted
> > during execution, ensuring complete command sequences are written
> > atomically to the batch buffer.
> >
> > Expand EMIT_COPY_CCS_DW from 5 dwords to 8 dwords to align with 256-bit
> > VMOVDQU operations. Update emit_flush_invalidate() to use VMOVDQU
> > operating with 128-bit chunks. By ensuring GPU instruction headers
> > (3-dword and 5-dword sequences) are written atomically, we prevent partial
> > updates that could compromise migration stability.
> >
> > This approach guarantees that batch buffer updates are completed entirely
> > or not at all, eliminating the page fault scenarios during VF migration
> > operations regardless of vCPU scheduling behavior.
> >
> > Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
> > Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
> > Cc: Matthew Brost <matthew.brost@intel.com>
> > Cc: Matthew Auld <matthew.auld@intel.com>
> > Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
> > Cc: Matt Roper <matthew.d.roper@intel.com>
> > Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> >
> > ---
> > V7 -> V8:
> > - Updated commit title and message.
> >
> > V6 -> V7:
> > - Added description explaining why to use assembly instructions for
> > atomicity.
> > - Assert if DGFX tries to use memcpy_vmovdqu(). (Rodrigo)
> > - Include <asm/cpufeature.h> though checkpatch complains. With
> > <linux/cpufeature.h> KUnit is throwing errors.
> >
> > V5 -> V6:
> > - Fixed review comments (Rodrigo)
> >
> > V4 -> V5:
> > - Fixed review comments. (Matt B)
> >
> > V3 -> V4:
> > - Fixed review comments. (Wajdeczko)
> > - Fix issues reported by patchworks.
> >
> > V2 -> V3:
> > - Added support for 128 bit and 256 bit instructions with memcpy_vmovdqu
> > - Updated emit_flush_invalidate() to use vmovdqu instruction.
> >
> > V1 -> V2:
> > - Use memcpy_vmovdqu only for x86 arch and for VF. Else use memcpy
> > (Auld, Matthew)
> > - Fix issues reported by patchworks.
> > ---
> > drivers/gpu/drm/xe/xe_migrate.c | 114 ++++++++++++++++++++++++++------
> > 1 file changed, 93 insertions(+), 21 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
> > index 921c9c1ea41f..005dc26a0393 100644
> > --- a/drivers/gpu/drm/xe/xe_migrate.c
> > +++ b/drivers/gpu/drm/xe/xe_migrate.c
> > @@ -5,6 +5,8 @@
> >
> > #include "xe_migrate.h"
> >
> > +#include <asm/fpu/api.h>
> > +#include <asm/cpufeature.h>
> > #include <linux/bitfield.h>
> > #include <linux/sizes.h>
> >
> > @@ -33,6 +35,7 @@
> > #include "xe_res_cursor.h"
> > #include "xe_sa.h"
> > #include "xe_sched_job.h"
> > +#include "xe_sriov_vf_ccs.h"
> > #include "xe_sync.h"
> > #include "xe_trace_bo.h"
> > #include "xe_validation.h"
> > @@ -657,18 +660,70 @@ static void emit_pte(struct xe_migrate *m,
> > }
> > }
> >
> > -#define EMIT_COPY_CCS_DW 5
> > +/*
> > + * VF KMD registers two special LRCs with the GuC to handle save/restore
> > + * operations for CCS metadata on IGPU. GUC executes these LRCAs during
> > + * VF save/restore operations.
> > + *
> > + * Each LRC contains a batch buffer pool that GuC submits to hardware during
> > + * VF state save/restore operations. Since these operations can occur
> > + * asynchronously at any time, we must ensure GPU instructions in the batch
> > + * buffer are written atomically to prevent corruption from incomplete writes.
> > + *
> > + * To guarantee atomic instruction writes, we use x86 SIMD instructions
>
> Here you still mention 'atomic' since we already know this is not 'atomic'.
I still don't see how this is supposed to do anything useful without
atomic writes to memory.
If the GPU is executing the same memory we're writing then nothing
short of atomic memory writes is going to actually fix it. And even
that would require careful alignment of things to guarantee that
each command is completely contained within one atomic write.
>
> Leave a summarized explanation in the commit message and put more detail here.
>
> I'm sorry for being picky here, but I want to ensure that the information
> around this code is clear so we don't keep having to explain this over
> and over in the future.
>
> > + * (128-bit XMM and 256-bit YMM) within kernel_fpu_begin()/kernel_fpu_end()
> > + * sections. This prevents vCPU preemption during instruction generation,
> > + * ensuring complete GPU commands are written to the batch buffer.
> > + */
> > +
> > +static void memcpy_vmovdqu(struct xe_device *xe, void *dst, const void *src, u32 size)
> > +{
> > + xe_assert(xe, !IS_DGFX(xe));
> > + xe_assert(xe, IS_SRIOV_VF(xe));
> > +
> > +#ifdef CONFIG_X86
> > + kernel_fpu_begin();
> > + if (size == SZ_128) {
> > + asm("vmovdqu (%0), %%xmm0\n"
> > + "vmovups %%xmm0, (%1)\n"
> > + :: "r" (src), "r" (dst) : "memory");
> > + } else if (size == SZ_256) {
> > + asm("vmovdqu (%0), %%ymm0\n"
> > + "vmovups %%ymm0, (%1)\n"
> > + :: "r" (src), "r" (dst) : "memory");
> > + }
> > + kernel_fpu_end();
> > +#endif
> > +}
> > +
> > +static void emit_atomic(struct xe_gt *gt, void *dst, const void *src, u32 size)
> > +{
> > + u32 instr_size = size * BITS_PER_BYTE;
> > +
> > + xe_gt_assert(gt, instr_size == SZ_128 || instr_size == SZ_256);
> > +
> > + if (IS_VF_CCS_READY(gt_to_xe(gt))) {
> > + xe_gt_assert(gt, static_cpu_has(X86_FEATURE_AVX));
> > + memcpy_vmovdqu(gt_to_xe(gt), dst, src, instr_size);
> > + } else {
> > + memcpy(dst, src, size);
> > + }
> > +}
> > +
> > +#define EMIT_COPY_CCS_DW 8
> > static void emit_copy_ccs(struct xe_gt *gt, struct xe_bb *bb,
> > u64 dst_ofs, bool dst_is_indirect,
> > u64 src_ofs, bool src_is_indirect,
> > u32 size)
> > {
> > + u32 dw[EMIT_COPY_CCS_DW] = {MI_NOOP};
> > struct xe_device *xe = gt_to_xe(gt);
> > u32 *cs = bb->cs + bb->len;
> > u32 num_ccs_blks;
> > u32 num_pages;
> > u32 ccs_copy_size;
> > u32 mocs;
> > + u32 i = 0;
> >
> > if (GRAPHICS_VERx100(xe) >= 2000) {
> > num_pages = DIV_ROUND_UP(size, XE_PAGE_SIZE);
> > @@ -686,15 +741,23 @@ static void emit_copy_ccs(struct xe_gt *gt, struct xe_bb *bb,
> > mocs = FIELD_PREP(XY_CTRL_SURF_MOCS_MASK, gt->mocs.uc_index);
> > }
> >
> > - *cs++ = XY_CTRL_SURF_COPY_BLT |
> > - (src_is_indirect ? 0x0 : 0x1) << SRC_ACCESS_TYPE_SHIFT |
> > - (dst_is_indirect ? 0x0 : 0x1) << DST_ACCESS_TYPE_SHIFT |
> > - ccs_copy_size;
> > - *cs++ = lower_32_bits(src_ofs);
> > - *cs++ = upper_32_bits(src_ofs) | mocs;
> > - *cs++ = lower_32_bits(dst_ofs);
> > - *cs++ = upper_32_bits(dst_ofs) | mocs;
> > + dw[i++] = XY_CTRL_SURF_COPY_BLT |
> > + (src_is_indirect ? 0x0 : 0x1) << SRC_ACCESS_TYPE_SHIFT |
> > + (dst_is_indirect ? 0x0 : 0x1) << DST_ACCESS_TYPE_SHIFT |
> > + ccs_copy_size;
> > + dw[i++] = lower_32_bits(src_ofs);
> > + dw[i++] = upper_32_bits(src_ofs) | mocs;
> > + dw[i++] = lower_32_bits(dst_ofs);
> > + dw[i++] = upper_32_bits(dst_ofs) | mocs;
> >
> > + /*
> > + * The CCS copy command is a 5-dword sequence. If the vCPU halts during
> > + * save/restore while this sequence is being issued, partial writes may trigger
> > + * page faults when saving iGPU CCS metadata. Use the VMOVDQU instruction to
> > + * write the sequence atomically.
> > + */
> > + emit_atomic(gt, cs, dw, sizeof(dw));
> > + cs += EMIT_COPY_CCS_DW;
> > bb->len = cs - bb->cs;
> > }
> >
> > @@ -1061,18 +1124,27 @@ static u64 migrate_vm_ppgtt_addr_tlb_inval(void)
> > return (NUM_KERNEL_PDE - 2) * XE_PAGE_SIZE;
> > }
> >
> > -static int emit_flush_invalidate(u32 *dw, int i, u32 flags)
> > +/*
> > + * The MI_FLUSH_DW command is a 4-dword sequence. If the vCPU halts during
> > + * save/restore while this sequence is being issued, partial writes may
> > + * trigger page faults when saving iGPU CCS metadata. Use
> > + * emit_atomic() to write the sequence atomically.
> > + */
> > +#define EMIT_FLUSH_INVALIDATE_DW 4
> > +static int emit_flush_invalidate(struct xe_exec_queue *q, u32 *cs, int i, u32 flags)
> > {
> > u64 addr = migrate_vm_ppgtt_addr_tlb_inval();
> > + u32 dw[EMIT_FLUSH_INVALIDATE_DW] = {MI_NOOP}, j = 0;
> > +
> > + dw[j++] = MI_FLUSH_DW | MI_INVALIDATE_TLB | MI_FLUSH_DW_OP_STOREDW |
> > + MI_FLUSH_IMM_DW | flags;
> > + dw[j++] = lower_32_bits(addr);
> > + dw[j++] = upper_32_bits(addr);
> > + dw[j++] = MI_NOOP;
> >
> > - dw[i++] = MI_FLUSH_DW | MI_INVALIDATE_TLB | MI_FLUSH_DW_OP_STOREDW |
> > - MI_FLUSH_IMM_DW | flags;
> > - dw[i++] = lower_32_bits(addr);
> > - dw[i++] = upper_32_bits(addr);
> > - dw[i++] = MI_NOOP;
> > - dw[i++] = MI_NOOP;
> > + emit_atomic(q->gt, &cs[i], dw, sizeof(dw));
> >
> > - return i;
> > + return i + j;
> > }
> >
> > /**
> > @@ -1117,7 +1189,7 @@ int xe_migrate_ccs_rw_copy(struct xe_tile *tile, struct xe_exec_queue *q,
> > /* Calculate Batch buffer size */
> > batch_size = 0;
> > while (size) {
> > - batch_size += 10; /* Flush + ggtt addr + 2 NOP */
> > + batch_size += EMIT_FLUSH_INVALIDATE_DW * 2; /* Flush + ggtt addr + 1 NOP */
> > u64 ccs_ofs, ccs_size;
> > u32 ccs_pt;
> >
> > @@ -1158,7 +1230,7 @@ int xe_migrate_ccs_rw_copy(struct xe_tile *tile, struct xe_exec_queue *q,
> > * sizes here again before copy command is emitted.
> > */
> > while (size) {
> > - batch_size += 10; /* Flush + ggtt addr + 2 NOP */
> > + batch_size += EMIT_FLUSH_INVALIDATE_DW * 2; /* Flush + ggtt addr + 1 NOP */
> > u32 flush_flags = 0;
> > u64 ccs_ofs, ccs_size;
> > u32 ccs_pt;
> > @@ -1181,11 +1253,11 @@ int xe_migrate_ccs_rw_copy(struct xe_tile *tile, struct xe_exec_queue *q,
> >
> > emit_pte(m, bb, ccs_pt, false, false, &ccs_it, ccs_size, src);
> >
> > - bb->len = emit_flush_invalidate(bb->cs, bb->len, flush_flags);
> > + bb->len = emit_flush_invalidate(q, bb->cs, bb->len, flush_flags);
> > flush_flags = xe_migrate_ccs_copy(m, bb, src_L0_ofs, src_is_pltt,
> > src_L0_ofs, dst_is_pltt,
> > src_L0, ccs_ofs, true);
> > - bb->len = emit_flush_invalidate(bb->cs, bb->len, flush_flags);
> > + bb->len = emit_flush_invalidate(q, bb->cs, bb->len, flush_flags);
> >
> > size -= src_L0;
> > }
> > --
> > 2.51.0
> >
--
Ville Syrjälä
Intel
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v8 1/3] drm/xe/migrate: Use AVX instructions to prevent partial writes during VF migration CCS batch buffer updates
2025-10-24 14:05 ` Ville Syrjälä
@ 2025-10-24 14:25 ` K V P, Satyanarayana
2025-10-24 15:40 ` Matthew Brost
2025-10-24 16:05 ` Matt Roper
0 siblings, 2 replies; 15+ messages in thread
From: K V P, Satyanarayana @ 2025-10-24 14:25 UTC (permalink / raw)
To: Ville Syrjälä, Rodrigo Vivi
Cc: intel-xe, Michal Wajdeczko, Matthew Brost, Matthew Auld,
Matt Roper
On 24-10-2025 19:35, Ville Syrjälä wrote:
> On Fri, Oct 24, 2025 at 09:57:15AM -0400, Rodrigo Vivi wrote:
>> On Fri, Oct 24, 2025 at 07:05:24PM +0530, Satyanarayana K V P wrote:
>>
>> Hi Satya,
>>
>> First of all, thank you for the updates.
>>
>> Second, the subject is way too big.
>>
>> This should be enough and under 75 cols:
>>
>> drm/xe: Use AVX instructions to prevent partial writes during VF pause
>>
>> more below:
>>
>>> VF KMD registers two specialized contexts with the GuC for migration
>>> operations. The save context contains copy commands and PTEs to transfer
>>> CCS metadata from GPU pools to system memory, and the restore context
>>> contains copy commands and PTEs to transfer CCS metadata from system memory
>>> back to CCS pools. The GuC submits these contexts to HW during VF migration.
>>>
>>> Each context uses a large batch buffer allocated via sub-allocator,
>>> pre-filled with MI_NOOPs and terminated with MI_BATCH_BUFFER_END. During
>>> BO lifecycle management, segments are dynamically allocated from this
>>> buffer and populated with PTEs and copy commands for active BOs, then reset
>>> to MI_NOOPs when BOs are destroyed.
>>>
>>> The CCS copy operation requires a 5-dword command sequence to be written
>>> to the batch buffer. During VF migration save/restore operations, if the
>>> vCPU gets preempted or halted while this command sequence is being
>>> programmed, partial writes can occur. These partial writes create
>>> incomplete GPU instructions in the batch buffer, which trigger page faults
>>> when the GUC submits the batch buffer to hardware for CCS metadata
>>> operations.
>>
>> Perhaps we could summarize the thing here and move details to the comment
>> near the assembly. The important part in the commit message is to have
>> the 'why'. Some of the details of the commands like MI_NOOP fill and all
>> could be in the comment near the ASM.
>>
>>>
>>> Standard memory operations like memcpy() are preemptible, meaning the CPU
>>> scheduler can interrupt execution midway through writing the command
>>> sequence, leaving the batch buffer in an inconsistent state with partially
>>> written GPU instructions.
>>>
>>> Replace standard memory operations with x86 AVX instructions that provide
>>> atomic, non-preemptible writes as AVX instructions cannot be preempted
>>> during execution, ensuring complete command sequences are written
>>> atomically to the batch buffer.
>>>
>>> Expand EMIT_COPY_CCS_DW from 5 dwords to 8 dwords to align with 256-bit
>>> VMOVDQU operations. Update emit_flush_invalidate() to use VMOVDQU
>>> operating with 128-bit chunks. By ensuring GPU instruction headers
>>> (3-dword and 5-dword sequences) are written atomically, we prevent partial
>>> updates that could compromise migration stability.
>>>
>>> This approach guarantees that batch buffer updates are completed entirely
>>> or not at all, eliminating the page fault scenarios during VF migration
>>> operations regardless of vCPU scheduling behavior.
>>>
>>> Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
>>> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
>>> Cc: Matthew Brost <matthew.brost@intel.com>
>>> Cc: Matthew Auld <matthew.auld@intel.com>
>>> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
>>> Cc: Matt Roper <matthew.d.roper@intel.com>
>>> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
>>>
>>> ---
>>> V7 -> V8:
>>> - Updated commit title and message.
>>>
>>> V6 -> V7:
>>> - Added description explaining why to use assembly instructions for
>>> atomicity.
>>> - Assert if DGFX tries to use memcpy_vmovdqu(). (Rodrigo)
>>> - Include <asm/cpufeature.h> though checkpatch complains. With
>>> <linux/cpufeature.h> KUnit is throwing errors.
>>>
>>> V5 -> V6:
>>> - Fixed review comments (Rodrigo)
>>>
>>> V4 -> V5:
>>> - Fixed review comments. (Matt B)
>>>
>>> V3 -> V4:
>>> - Fixed review comments. (Wajdeczko)
>>> - Fix issues reported by patchworks.
>>>
>>> V2 -> V3:
>>> - Added support for 128 bit and 256 bit instructions with memcpy_vmovdqu
>>> - Updated emit_flush_invalidate() to use vmovdqu instruction.
>>>
>>> V1 -> V2:
>>> - Use memcpy_vmovdqu only for x86 arch and for VF. Else use memcpy
>>> (Auld, Matthew)
>>> - Fix issues reported by patchworks.
>>> ---
>>> drivers/gpu/drm/xe/xe_migrate.c | 114 ++++++++++++++++++++++++++------
>>> 1 file changed, 93 insertions(+), 21 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
>>> index 921c9c1ea41f..005dc26a0393 100644
>>> --- a/drivers/gpu/drm/xe/xe_migrate.c
>>> +++ b/drivers/gpu/drm/xe/xe_migrate.c
>>> @@ -5,6 +5,8 @@
>>>
>>> #include "xe_migrate.h"
>>>
>>> +#include <asm/fpu/api.h>
>>> +#include <asm/cpufeature.h>
>>> #include <linux/bitfield.h>
>>> #include <linux/sizes.h>
>>>
>>> @@ -33,6 +35,7 @@
>>> #include "xe_res_cursor.h"
>>> #include "xe_sa.h"
>>> #include "xe_sched_job.h"
>>> +#include "xe_sriov_vf_ccs.h"
>>> #include "xe_sync.h"
>>> #include "xe_trace_bo.h"
>>> #include "xe_validation.h"
>>> @@ -657,18 +660,70 @@ static void emit_pte(struct xe_migrate *m,
>>> }
>>> }
>>>
>>> -#define EMIT_COPY_CCS_DW 5
>>> +/*
>>> + * VF KMD registers two special LRCs with the GuC to handle save/restore
>>> + * operations for CCS metadata on IGPU. GUC executes these LRCAs during
>>> + * VF save/restore operations.
>>> + *
>>> + * Each LRC contains a batch buffer pool that GuC submits to hardware during
>>> + * VF state save/restore operations. Since these operations can occur
>>> + * asynchronously at any time, we must ensure GPU instructions in the batch
>>> + * buffer are written atomically to prevent corruption from incomplete writes.
>>> + *
>>> + * To guarantee atomic instruction writes, we use x86 SIMD instructions
>>
>> Here you still mention 'atomic' since we already know this is not 'atomic'.
>
> I still don't see how this is supposed to do anything useful without
> atomic writes to memory.
>
> If the GPU is executing the same memory we're writing then nothing
> short of atomic memory writes is going to actually fix it. And even
> that would require careful alignment of things to guarantee that
> each command is completely contained within one atomic write.
>
The CPU and GPU operate on the same memory space but at different times
during VF migration. The critical issue occurs during the batch buffer
preparation phase when the vCPU is still active and writing GPU
instructions, while the GPU will later execute these same instructions
after the vCPU is paused.
During batch buffer updates, if the vCPU gets preempted while writing
GPU instruction sequences (such as the 5-dword CCS copy command), it
leaves partially written instructions in memory. When the GPU later
executes the batch buffer after vCPU suspension, these incomplete
instructions cause execution failures and page faults.
AVX instructions provide atomic write operations that cannot be
interrupted by the CPU scheduler. This ensures that GPU instruction
sequences are written completely before any potential vCPU preemption
occurs.
AVX instructions (VMOVDQU) guarantee that entire instruction sequences
are written in a single, non-preemptible operation. The 5-dword CCS copy
command is expanded to 8 dwords (padded with 3 MI_NOOPs) to meet AVX
256-bit alignment requirements. By the time the GPU executes the batch
buffer (after vCPU pause), all instructions are guaranteed to be
completely written.
Here we are ensuring that GPU instructions are fully formed before the
GPU attempts to execute them during the migration process.
-Satya.

>>
>> Leave a summarized explanation in the commit message and put more detail here.
>>
>> I'm sorry for being picky here, but I want to ensure that the information
>> around this code is clear so we don't keep having to explain this over
>> and over in the future.
>>
>>> + * (128-bit XMM and 256-bit YMM) within kernel_fpu_begin()/kernel_fpu_end()
>>> + * sections. This prevents vCPU preemption during instruction generation,
>>> + * ensuring complete GPU commands are written to the batch buffer.
>>> + */
>>> +
>>> +static void memcpy_vmovdqu(struct xe_device *xe, void *dst, const void *src, u32 size)
>>> +{
>>> + xe_assert(xe, !IS_DGFX(xe));
>>> + xe_assert(xe, IS_SRIOV_VF(xe));
>>> +
>>> +#ifdef CONFIG_X86
>>> + kernel_fpu_begin();
>>> + if (size == SZ_128) {
>>> + asm("vmovdqu (%0), %%xmm0\n"
>>> + "vmovups %%xmm0, (%1)\n"
>>> + :: "r" (src), "r" (dst) : "memory");
>>> + } else if (size == SZ_256) {
>>> + asm("vmovdqu (%0), %%ymm0\n"
>>> + "vmovups %%ymm0, (%1)\n"
>>> + :: "r" (src), "r" (dst) : "memory");
>>> + }
>>> + kernel_fpu_end();
>>> +#endif
>>> +}
>>> +
>>> +static void emit_atomic(struct xe_gt *gt, void *dst, const void *src, u32 size)
>>> +{
>>> + u32 instr_size = size * BITS_PER_BYTE;
>>> +
>>> + xe_gt_assert(gt, instr_size == SZ_128 || instr_size == SZ_256);
>>> +
>>> + if (IS_VF_CCS_READY(gt_to_xe(gt))) {
>>> + xe_gt_assert(gt, static_cpu_has(X86_FEATURE_AVX));
>>> + memcpy_vmovdqu(gt_to_xe(gt), dst, src, instr_size);
>>> + } else {
>>> + memcpy(dst, src, size);
>>> + }
>>> +}
>>> +
>>> +#define EMIT_COPY_CCS_DW 8
>>> static void emit_copy_ccs(struct xe_gt *gt, struct xe_bb *bb,
>>> u64 dst_ofs, bool dst_is_indirect,
>>> u64 src_ofs, bool src_is_indirect,
>>> u32 size)
>>> {
>>> + u32 dw[EMIT_COPY_CCS_DW] = {MI_NOOP};
>>> struct xe_device *xe = gt_to_xe(gt);
>>> u32 *cs = bb->cs + bb->len;
>>> u32 num_ccs_blks;
>>> u32 num_pages;
>>> u32 ccs_copy_size;
>>> u32 mocs;
>>> + u32 i = 0;
>>>
>>> if (GRAPHICS_VERx100(xe) >= 2000) {
>>> num_pages = DIV_ROUND_UP(size, XE_PAGE_SIZE);
>>> @@ -686,15 +741,23 @@ static void emit_copy_ccs(struct xe_gt *gt, struct xe_bb *bb,
>>> mocs = FIELD_PREP(XY_CTRL_SURF_MOCS_MASK, gt->mocs.uc_index);
>>> }
>>>
>>> - *cs++ = XY_CTRL_SURF_COPY_BLT |
>>> - (src_is_indirect ? 0x0 : 0x1) << SRC_ACCESS_TYPE_SHIFT |
>>> - (dst_is_indirect ? 0x0 : 0x1) << DST_ACCESS_TYPE_SHIFT |
>>> - ccs_copy_size;
>>> - *cs++ = lower_32_bits(src_ofs);
>>> - *cs++ = upper_32_bits(src_ofs) | mocs;
>>> - *cs++ = lower_32_bits(dst_ofs);
>>> - *cs++ = upper_32_bits(dst_ofs) | mocs;
>>> + dw[i++] = XY_CTRL_SURF_COPY_BLT |
>>> + (src_is_indirect ? 0x0 : 0x1) << SRC_ACCESS_TYPE_SHIFT |
>>> + (dst_is_indirect ? 0x0 : 0x1) << DST_ACCESS_TYPE_SHIFT |
>>> + ccs_copy_size;
>>> + dw[i++] = lower_32_bits(src_ofs);
>>> + dw[i++] = upper_32_bits(src_ofs) | mocs;
>>> + dw[i++] = lower_32_bits(dst_ofs);
>>> + dw[i++] = upper_32_bits(dst_ofs) | mocs;
>>>
>>> + /*
>>> + * The CCS copy command is a 5-dword sequence. If the vCPU halts during
>>> + * save/restore while this sequence is being issued, partial writes may trigger
>>> + * page faults when saving iGPU CCS metadata. Use the VMOVDQU instruction to
>>> + * write the sequence atomically.
>>> + */
>>> + emit_atomic(gt, cs, dw, sizeof(dw));
>>> + cs += EMIT_COPY_CCS_DW;
>>> bb->len = cs - bb->cs;
>>> }
>>>
>>> @@ -1061,18 +1124,27 @@ static u64 migrate_vm_ppgtt_addr_tlb_inval(void)
>>> return (NUM_KERNEL_PDE - 2) * XE_PAGE_SIZE;
>>> }
>>>
>>> -static int emit_flush_invalidate(u32 *dw, int i, u32 flags)
>>> +/*
>>> + * The MI_FLUSH_DW command is a 4-dword sequence. If the vCPU halts during
>>> + * save/restore while this sequence is being issued, partial writes may
>>> + * trigger page faults when saving iGPU CCS metadata. Use
>>> + * emit_atomic() to write the sequence atomically.
>>> + */
>>> +#define EMIT_FLUSH_INVALIDATE_DW 4
>>> +static int emit_flush_invalidate(struct xe_exec_queue *q, u32 *cs, int i, u32 flags)
>>> {
>>> u64 addr = migrate_vm_ppgtt_addr_tlb_inval();
>>> + u32 dw[EMIT_FLUSH_INVALIDATE_DW] = {MI_NOOP}, j = 0;
>>> +
>>> + dw[j++] = MI_FLUSH_DW | MI_INVALIDATE_TLB | MI_FLUSH_DW_OP_STOREDW |
>>> + MI_FLUSH_IMM_DW | flags;
>>> + dw[j++] = lower_32_bits(addr);
>>> + dw[j++] = upper_32_bits(addr);
>>> + dw[j++] = MI_NOOP;
>>>
>>> - dw[i++] = MI_FLUSH_DW | MI_INVALIDATE_TLB | MI_FLUSH_DW_OP_STOREDW |
>>> - MI_FLUSH_IMM_DW | flags;
>>> - dw[i++] = lower_32_bits(addr);
>>> - dw[i++] = upper_32_bits(addr);
>>> - dw[i++] = MI_NOOP;
>>> - dw[i++] = MI_NOOP;
>>> + emit_atomic(q->gt, &cs[i], dw, sizeof(dw));
>>>
>>> - return i;
>>> + return i + j;
>>> }
>>>
>>> /**
>>> @@ -1117,7 +1189,7 @@ int xe_migrate_ccs_rw_copy(struct xe_tile *tile, struct xe_exec_queue *q,
>>> /* Calculate Batch buffer size */
>>> batch_size = 0;
>>> while (size) {
>>> - batch_size += 10; /* Flush + ggtt addr + 2 NOP */
>>> + batch_size += EMIT_FLUSH_INVALIDATE_DW * 2; /* Flush + ggtt addr + 1 NOP */
>>> u64 ccs_ofs, ccs_size;
>>> u32 ccs_pt;
>>>
>>> @@ -1158,7 +1230,7 @@ int xe_migrate_ccs_rw_copy(struct xe_tile *tile, struct xe_exec_queue *q,
>>> * sizes here again before copy command is emitted.
>>> */
>>> while (size) {
>>> - batch_size += 10; /* Flush + ggtt addr + 2 NOP */
>>> + batch_size += EMIT_FLUSH_INVALIDATE_DW * 2; /* Flush + ggtt addr + 1 NOP */
>>> u32 flush_flags = 0;
>>> u64 ccs_ofs, ccs_size;
>>> u32 ccs_pt;
>>> @@ -1181,11 +1253,11 @@ int xe_migrate_ccs_rw_copy(struct xe_tile *tile, struct xe_exec_queue *q,
>>>
>>> emit_pte(m, bb, ccs_pt, false, false, &ccs_it, ccs_size, src);
>>>
>>> - bb->len = emit_flush_invalidate(bb->cs, bb->len, flush_flags);
>>> + bb->len = emit_flush_invalidate(q, bb->cs, bb->len, flush_flags);
>>> flush_flags = xe_migrate_ccs_copy(m, bb, src_L0_ofs, src_is_pltt,
>>> src_L0_ofs, dst_is_pltt,
>>> src_L0, ccs_ofs, true);
>>> - bb->len = emit_flush_invalidate(bb->cs, bb->len, flush_flags);
>>> + bb->len = emit_flush_invalidate(q, bb->cs, bb->len, flush_flags);
>>>
>>> size -= src_L0;
>>> }
>>> --
>>> 2.51.0
>>>
>
^ permalink raw reply [flat|nested] 15+ messages in thread
* ✗ CI.checkpatch: warning for drm/xe/migrate: Atomicize CCS copy command setup
2025-10-24 13:35 [PATCH v8 0/3] drm/xe/migrate: Atomicize CCS copy command setup Satyanarayana K V P
` (2 preceding siblings ...)
2025-10-24 13:35 ` [PATCH v8 3/3] drm/xe/vf: Clear CCS read/write buffers in atomic way Satyanarayana K V P
@ 2025-10-24 14:40 ` Patchwork
2025-10-24 14:42 ` ✓ CI.KUnit: success " Patchwork
` (2 subsequent siblings)
6 siblings, 0 replies; 15+ messages in thread
From: Patchwork @ 2025-10-24 14:40 UTC (permalink / raw)
To: K V P, Satyanarayana; +Cc: intel-xe
== Series Details ==
Series: drm/xe/migrate: Atomicize CCS copy command setup
URL : https://patchwork.freedesktop.org/series/156482/
State : warning
== Summary ==
+ KERNEL=/kernel
+ git clone https://gitlab.freedesktop.org/drm/maintainer-tools mt
Cloning into 'mt'...
warning: redirecting to https://gitlab.freedesktop.org/drm/maintainer-tools.git/
+ git -C mt rev-list -n1 origin/master
8677d3b99d5fd579c143b22605d99121e2482e8a
+ cd /kernel
+ git config --global --add safe.directory /kernel
+ git log -n1
commit 890ac51f3357618fb2b0bf70e694f639b3465eca
Author: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Date: Fri Oct 24 19:05:26 2025 +0530
drm/xe/vf: Clear CCS read/write buffers in atomic way
Clear the contents of the CCS read/write batch buffer, ensuring no page
faults / GPU hang occur if migration happens midway.
Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
+ /mt/dim checkpatch 60a51825a1d699df170dfa797b099112e3a02de4 drm-intel
62940f330386 drm/xe/migrate: Use AVX instructions to prevent partial writes during VF migration CCS batch buffer updates
-:67: WARNING:INCLUDE_LINUX: Use #include <linux/cpufeature.h> instead of <asm/cpufeature.h>
#67: FILE: drivers/gpu/drm/xe/xe_migrate.c:9:
+#include <asm/cpufeature.h>
total: 0 errors, 1 warnings, 0 checks, 181 lines checked
63cd709531cb drm/xe/migrate: Make emit_pte() header write atomic
890ac51f3357 drm/xe/vf: Clear CCS read/write buffers in atomic way
^ permalink raw reply [flat|nested] 15+ messages in thread
* ✓ CI.KUnit: success for drm/xe/migrate: Atomicize CCS copy command setup
2025-10-24 13:35 [PATCH v8 0/3] drm/xe/migrate: Atomicize CCS copy command setup Satyanarayana K V P
` (3 preceding siblings ...)
2025-10-24 14:40 ` ✗ CI.checkpatch: warning for drm/xe/migrate: Atomicize CCS copy command setup Patchwork
@ 2025-10-24 14:42 ` Patchwork
2025-10-24 15:48 ` ✓ Xe.CI.BAT: " Patchwork
2025-10-25 3:47 ` ✓ Xe.CI.Full: " Patchwork
6 siblings, 0 replies; 15+ messages in thread
From: Patchwork @ 2025-10-24 14:42 UTC (permalink / raw)
To: K V P, Satyanarayana; +Cc: intel-xe
== Series Details ==
Series: drm/xe/migrate: Atomicize CCS copy command setup
URL : https://patchwork.freedesktop.org/series/156482/
State : success
== Summary ==
+ trap cleanup EXIT
+ /kernel/tools/testing/kunit/kunit.py run --kunitconfig /kernel/drivers/gpu/drm/xe/.kunitconfig
[14:40:37] Configuring KUnit Kernel ...
Generating .config ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
[14:40:41] Building KUnit Kernel ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
Building with:
$ make all compile_commands.json scripts_gdb ARCH=um O=.kunit --jobs=25
[14:41:19] Starting KUnit Kernel (1/1)...
[14:41:19] ============================================================
Running tests with:
$ .kunit/linux kunit.enable=1 mem=1G console=tty kunit_shutdown=halt
[14:41:19] ================== guc_buf (11 subtests) ===================
[14:41:19] [PASSED] test_smallest
[14:41:19] [PASSED] test_largest
[14:41:19] [PASSED] test_granular
[14:41:19] [PASSED] test_unique
[14:41:19] [PASSED] test_overlap
[14:41:19] [PASSED] test_reusable
[14:41:19] [PASSED] test_too_big
[14:41:19] [PASSED] test_flush
[14:41:19] [PASSED] test_lookup
[14:41:19] [PASSED] test_data
[14:41:19] [PASSED] test_class
[14:41:19] ===================== [PASSED] guc_buf =====================
[14:41:19] =================== guc_dbm (7 subtests) ===================
[14:41:19] [PASSED] test_empty
[14:41:19] [PASSED] test_default
[14:41:19] ======================== test_size ========================
[14:41:19] [PASSED] 4
[14:41:19] [PASSED] 8
[14:41:19] [PASSED] 32
[14:41:19] [PASSED] 256
[14:41:19] ==================== [PASSED] test_size ====================
[14:41:19] ======================= test_reuse ========================
[14:41:19] [PASSED] 4
[14:41:19] [PASSED] 8
[14:41:19] [PASSED] 32
[14:41:19] [PASSED] 256
[14:41:19] =================== [PASSED] test_reuse ====================
[14:41:19] =================== test_range_overlap ====================
[14:41:19] [PASSED] 4
[14:41:19] [PASSED] 8
[14:41:19] [PASSED] 32
[14:41:19] [PASSED] 256
[14:41:19] =============== [PASSED] test_range_overlap ================
[14:41:19] =================== test_range_compact ====================
[14:41:19] [PASSED] 4
[14:41:19] [PASSED] 8
[14:41:19] [PASSED] 32
[14:41:19] [PASSED] 256
[14:41:19] =============== [PASSED] test_range_compact ================
[14:41:19] ==================== test_range_spare =====================
[14:41:19] [PASSED] 4
[14:41:19] [PASSED] 8
[14:41:19] [PASSED] 32
[14:41:19] [PASSED] 256
[14:41:19] ================ [PASSED] test_range_spare =================
[14:41:19] ===================== [PASSED] guc_dbm =====================
[14:41:19] =================== guc_idm (6 subtests) ===================
[14:41:19] [PASSED] bad_init
[14:41:19] [PASSED] no_init
[14:41:19] [PASSED] init_fini
[14:41:19] [PASSED] check_used
[14:41:19] [PASSED] check_quota
[14:41:19] [PASSED] check_all
[14:41:19] ===================== [PASSED] guc_idm =====================
[14:41:19] ================== no_relay (3 subtests) ===================
[14:41:19] [PASSED] xe_drops_guc2pf_if_not_ready
[14:41:19] [PASSED] xe_drops_guc2vf_if_not_ready
[14:41:19] [PASSED] xe_rejects_send_if_not_ready
[14:41:19] ==================== [PASSED] no_relay =====================
[14:41:19] ================== pf_relay (14 subtests) ==================
[14:41:19] [PASSED] pf_rejects_guc2pf_too_short
[14:41:19] [PASSED] pf_rejects_guc2pf_too_long
[14:41:19] [PASSED] pf_rejects_guc2pf_no_payload
[14:41:19] [PASSED] pf_fails_no_payload
[14:41:19] [PASSED] pf_fails_bad_origin
[14:41:19] [PASSED] pf_fails_bad_type
[14:41:19] [PASSED] pf_txn_reports_error
[14:41:19] [PASSED] pf_txn_sends_pf2guc
[14:41:19] [PASSED] pf_sends_pf2guc
[14:41:19] [SKIPPED] pf_loopback_nop
[14:41:19] [SKIPPED] pf_loopback_echo
[14:41:19] [SKIPPED] pf_loopback_fail
[14:41:19] [SKIPPED] pf_loopback_busy
[14:41:19] [SKIPPED] pf_loopback_retry
[14:41:19] ==================== [PASSED] pf_relay =====================
[14:41:19] ================== vf_relay (3 subtests) ===================
[14:41:19] [PASSED] vf_rejects_guc2vf_too_short
[14:41:19] [PASSED] vf_rejects_guc2vf_too_long
[14:41:19] [PASSED] vf_rejects_guc2vf_no_payload
[14:41:19] ==================== [PASSED] vf_relay =====================
[14:41:19] ===================== lmtt (1 subtest) =====================
[14:41:19] ======================== test_ops =========================
[14:41:19] [PASSED] 2-level
[14:41:19] [PASSED] multi-level
[14:41:19] ==================== [PASSED] test_ops =====================
[14:41:19] ====================== [PASSED] lmtt =======================
[14:41:19] ================= pf_service (11 subtests) =================
[14:41:19] [PASSED] pf_negotiate_any
[14:41:19] [PASSED] pf_negotiate_base_match
[14:41:19] [PASSED] pf_negotiate_base_newer
[14:41:19] [PASSED] pf_negotiate_base_next
[14:41:19] [SKIPPED] pf_negotiate_base_older
[14:41:19] [PASSED] pf_negotiate_base_prev
[14:41:19] [PASSED] pf_negotiate_latest_match
[14:41:19] [PASSED] pf_negotiate_latest_newer
[14:41:19] [PASSED] pf_negotiate_latest_next
[14:41:19] [SKIPPED] pf_negotiate_latest_older
[14:41:19] [SKIPPED] pf_negotiate_latest_prev
[14:41:19] =================== [PASSED] pf_service ====================
[14:41:19] ================= xe_guc_g2g (2 subtests) ==================
[14:41:19] ============== xe_live_guc_g2g_kunit_default ==============
[14:41:19] ========= [SKIPPED] xe_live_guc_g2g_kunit_default ==========
[14:41:19] ============== xe_live_guc_g2g_kunit_allmem ===============
[14:41:19] ========== [SKIPPED] xe_live_guc_g2g_kunit_allmem ==========
[14:41:19] =================== [SKIPPED] xe_guc_g2g ===================
[14:41:19] =================== xe_mocs (2 subtests) ===================
[14:41:19] ================ xe_live_mocs_kernel_kunit ================
[14:41:19] =========== [SKIPPED] xe_live_mocs_kernel_kunit ============
[14:41:19] ================ xe_live_mocs_reset_kunit =================
[14:41:19] ============ [SKIPPED] xe_live_mocs_reset_kunit ============
[14:41:19] ==================== [SKIPPED] xe_mocs =====================
[14:41:19] ================= xe_migrate (2 subtests) ==================
[14:41:19] ================= xe_migrate_sanity_kunit =================
[14:41:19] ============ [SKIPPED] xe_migrate_sanity_kunit =============
[14:41:19] ================== xe_validate_ccs_kunit ==================
[14:41:19] ============= [SKIPPED] xe_validate_ccs_kunit ==============
[14:41:19] =================== [SKIPPED] xe_migrate ===================
[14:41:19] ================== xe_dma_buf (1 subtest) ==================
[14:41:19] ==================== xe_dma_buf_kunit =====================
[14:41:19] ================ [SKIPPED] xe_dma_buf_kunit ================
[14:41:19] =================== [SKIPPED] xe_dma_buf ===================
[14:41:19] ================= xe_bo_shrink (1 subtest) =================
[14:41:19] =================== xe_bo_shrink_kunit ====================
[14:41:19] =============== [SKIPPED] xe_bo_shrink_kunit ===============
[14:41:19] ================== [SKIPPED] xe_bo_shrink ==================
[14:41:19] ==================== xe_bo (2 subtests) ====================
[14:41:19] ================== xe_ccs_migrate_kunit ===================
[14:41:19] ============== [SKIPPED] xe_ccs_migrate_kunit ==============
[14:41:19] ==================== xe_bo_evict_kunit ====================
[14:41:19] =============== [SKIPPED] xe_bo_evict_kunit ================
[14:41:19] ===================== [SKIPPED] xe_bo ======================
[14:41:19] ==================== args (11 subtests) ====================
[14:41:19] [PASSED] count_args_test
[14:41:19] [PASSED] call_args_example
[14:41:19] [PASSED] call_args_test
[14:41:19] [PASSED] drop_first_arg_example
[14:41:19] [PASSED] drop_first_arg_test
[14:41:19] [PASSED] first_arg_example
[14:41:19] [PASSED] first_arg_test
[14:41:19] [PASSED] last_arg_example
[14:41:19] [PASSED] last_arg_test
[14:41:19] [PASSED] pick_arg_example
[14:41:19] [PASSED] sep_comma_example
[14:41:19] ====================== [PASSED] args =======================
[14:41:19] =================== xe_pci (3 subtests) ====================
[14:41:19] ==================== check_graphics_ip ====================
[14:41:19] [PASSED] 12.00 Xe_LP
[14:41:19] [PASSED] 12.10 Xe_LP+
[14:41:19] [PASSED] 12.55 Xe_HPG
[14:41:19] [PASSED] 12.60 Xe_HPC
[14:41:19] [PASSED] 12.70 Xe_LPG
[14:41:19] [PASSED] 12.71 Xe_LPG
[14:41:19] [PASSED] 12.74 Xe_LPG+
[14:41:19] [PASSED] 20.01 Xe2_HPG
[14:41:19] [PASSED] 20.02 Xe2_HPG
[14:41:19] [PASSED] 20.04 Xe2_LPG
[14:41:19] [PASSED] 30.00 Xe3_LPG
[14:41:19] [PASSED] 30.01 Xe3_LPG
[14:41:19] [PASSED] 30.03 Xe3_LPG
[14:41:19] [PASSED] 30.04 Xe3_LPG
[14:41:19] [PASSED] 30.05 Xe3_LPG
[14:41:19] [PASSED] 35.11 Xe3p_XPC
[14:41:19] ================ [PASSED] check_graphics_ip ================
[14:41:19] ===================== check_media_ip ======================
[14:41:19] [PASSED] 12.00 Xe_M
[14:41:19] [PASSED] 12.55 Xe_HPM
[14:41:19] [PASSED] 13.00 Xe_LPM+
[14:41:19] [PASSED] 13.01 Xe2_HPM
[14:41:19] [PASSED] 20.00 Xe2_LPM
[14:41:19] [PASSED] 30.00 Xe3_LPM
[14:41:19] [PASSED] 30.02 Xe3_LPM
[14:41:19] [PASSED] 35.00 Xe3p_LPM
[14:41:19] [PASSED] 35.03 Xe3p_HPM
[14:41:19] ================= [PASSED] check_media_ip ==================
[14:41:19] =================== check_platform_desc ===================
[14:41:19] [PASSED] 0x9A60 (TIGERLAKE)
[14:41:19] [PASSED] 0x9A68 (TIGERLAKE)
[14:41:19] [PASSED] 0x9A70 (TIGERLAKE)
[14:41:19] [PASSED] 0x9A40 (TIGERLAKE)
[14:41:19] [PASSED] 0x9A49 (TIGERLAKE)
[14:41:19] [PASSED] 0x9A59 (TIGERLAKE)
[14:41:19] [PASSED] 0x9A78 (TIGERLAKE)
[14:41:19] [PASSED] 0x9AC0 (TIGERLAKE)
[14:41:19] [PASSED] 0x9AC9 (TIGERLAKE)
[14:41:19] [PASSED] 0x9AD9 (TIGERLAKE)
[14:41:19] [PASSED] 0x9AF8 (TIGERLAKE)
[14:41:19] [PASSED] 0x4C80 (ROCKETLAKE)
[14:41:19] [PASSED] 0x4C8A (ROCKETLAKE)
[14:41:19] [PASSED] 0x4C8B (ROCKETLAKE)
[14:41:19] [PASSED] 0x4C8C (ROCKETLAKE)
[14:41:19] [PASSED] 0x4C90 (ROCKETLAKE)
[14:41:19] [PASSED] 0x4C9A (ROCKETLAKE)
[14:41:19] [PASSED] 0x4680 (ALDERLAKE_S)
[14:41:19] [PASSED] 0x4682 (ALDERLAKE_S)
[14:41:19] [PASSED] 0x4688 (ALDERLAKE_S)
[14:41:19] [PASSED] 0x468A (ALDERLAKE_S)
[14:41:19] [PASSED] 0x468B (ALDERLAKE_S)
[14:41:19] [PASSED] 0x4690 (ALDERLAKE_S)
[14:41:19] [PASSED] 0x4692 (ALDERLAKE_S)
[14:41:19] [PASSED] 0x4693 (ALDERLAKE_S)
[14:41:19] [PASSED] 0x46A0 (ALDERLAKE_P)
[14:41:19] [PASSED] 0x46A1 (ALDERLAKE_P)
[14:41:19] [PASSED] 0x46A2 (ALDERLAKE_P)
[14:41:19] [PASSED] 0x46A3 (ALDERLAKE_P)
[14:41:19] [PASSED] 0x46A6 (ALDERLAKE_P)
[14:41:19] [PASSED] 0x46A8 (ALDERLAKE_P)
[14:41:19] [PASSED] 0x46AA (ALDERLAKE_P)
[14:41:19] [PASSED] 0x462A (ALDERLAKE_P)
[14:41:19] [PASSED] 0x4626 (ALDERLAKE_P)
[14:41:19] [PASSED] 0x4628 (ALDERLAKE_P)
[14:41:19] [PASSED] 0x46B0 (ALDERLAKE_P)
[14:41:19] [PASSED] 0x46B1 (ALDERLAKE_P)
[14:41:19] [PASSED] 0x46B2 (ALDERLAKE_P)
[14:41:19] [PASSED] 0x46B3 (ALDERLAKE_P)
[14:41:19] [PASSED] 0x46C0 (ALDERLAKE_P)
[14:41:19] [PASSED] 0x46C1 (ALDERLAKE_P)
[14:41:19] [PASSED] 0x46C2 (ALDERLAKE_P)
[14:41:19] [PASSED] 0x46C3 (ALDERLAKE_P)
[14:41:19] [PASSED] 0x46D0 (ALDERLAKE_N)
[14:41:19] [PASSED] 0x46D1 (ALDERLAKE_N)
[14:41:19] [PASSED] 0x46D2 (ALDERLAKE_N)
[14:41:19] [PASSED] 0x46D3 (ALDERLAKE_N)
[14:41:19] [PASSED] 0x46D4 (ALDERLAKE_N)
[14:41:19] [PASSED] 0xA721 (ALDERLAKE_P)
[14:41:19] [PASSED] 0xA7A1 (ALDERLAKE_P)
[14:41:19] [PASSED] 0xA7A9 (ALDERLAKE_P)
[14:41:19] [PASSED] 0xA7AC (ALDERLAKE_P)
[14:41:19] [PASSED] 0xA7AD (ALDERLAKE_P)
[14:41:19] [PASSED] 0xA720 (ALDERLAKE_P)
[14:41:19] [PASSED] 0xA7A0 (ALDERLAKE_P)
[14:41:19] [PASSED] 0xA7A8 (ALDERLAKE_P)
[14:41:19] [PASSED] 0xA7AA (ALDERLAKE_P)
[14:41:19] [PASSED] 0xA7AB (ALDERLAKE_P)
[14:41:19] [PASSED] 0xA780 (ALDERLAKE_S)
[14:41:19] [PASSED] 0xA781 (ALDERLAKE_S)
[14:41:19] [PASSED] 0xA782 (ALDERLAKE_S)
[14:41:19] [PASSED] 0xA783 (ALDERLAKE_S)
[14:41:19] [PASSED] 0xA788 (ALDERLAKE_S)
[14:41:19] [PASSED] 0xA789 (ALDERLAKE_S)
[14:41:19] [PASSED] 0xA78A (ALDERLAKE_S)
[14:41:19] [PASSED] 0xA78B (ALDERLAKE_S)
[14:41:19] [PASSED] 0x4905 (DG1)
[14:41:19] [PASSED] 0x4906 (DG1)
[14:41:19] [PASSED] 0x4907 (DG1)
[14:41:19] [PASSED] 0x4908 (DG1)
[14:41:19] [PASSED] 0x4909 (DG1)
[14:41:19] [PASSED] 0x56C0 (DG2)
[14:41:19] [PASSED] 0x56C2 (DG2)
[14:41:19] [PASSED] 0x56C1 (DG2)
[14:41:19] [PASSED] 0x7D51 (METEORLAKE)
[14:41:19] [PASSED] 0x7DD1 (METEORLAKE)
[14:41:19] [PASSED] 0x7D41 (METEORLAKE)
[14:41:19] [PASSED] 0x7D67 (METEORLAKE)
[14:41:19] [PASSED] 0xB640 (METEORLAKE)
[14:41:19] [PASSED] 0x56A0 (DG2)
[14:41:19] [PASSED] 0x56A1 (DG2)
[14:41:19] [PASSED] 0x56A2 (DG2)
[14:41:19] [PASSED] 0x56BE (DG2)
[14:41:19] [PASSED] 0x56BF (DG2)
[14:41:19] [PASSED] 0x5690 (DG2)
[14:41:19] [PASSED] 0x5691 (DG2)
[14:41:19] [PASSED] 0x5692 (DG2)
[14:41:19] [PASSED] 0x56A5 (DG2)
[14:41:19] [PASSED] 0x56A6 (DG2)
[14:41:19] [PASSED] 0x56B0 (DG2)
[14:41:19] [PASSED] 0x56B1 (DG2)
[14:41:19] [PASSED] 0x56BA (DG2)
[14:41:19] [PASSED] 0x56BB (DG2)
[14:41:19] [PASSED] 0x56BC (DG2)
[14:41:19] [PASSED] 0x56BD (DG2)
[14:41:19] [PASSED] 0x5693 (DG2)
[14:41:19] [PASSED] 0x5694 (DG2)
[14:41:19] [PASSED] 0x5695 (DG2)
[14:41:19] [PASSED] 0x56A3 (DG2)
[14:41:19] [PASSED] 0x56A4 (DG2)
[14:41:19] [PASSED] 0x56B2 (DG2)
[14:41:19] [PASSED] 0x56B3 (DG2)
[14:41:19] [PASSED] 0x5696 (DG2)
[14:41:19] [PASSED] 0x5697 (DG2)
[14:41:19] [PASSED] 0xB69 (PVC)
[14:41:19] [PASSED] 0xB6E (PVC)
[14:41:19] [PASSED] 0xBD4 (PVC)
[14:41:19] [PASSED] 0xBD5 (PVC)
[14:41:19] [PASSED] 0xBD6 (PVC)
[14:41:19] [PASSED] 0xBD7 (PVC)
[14:41:19] [PASSED] 0xBD8 (PVC)
[14:41:19] [PASSED] 0xBD9 (PVC)
[14:41:19] [PASSED] 0xBDA (PVC)
[14:41:19] [PASSED] 0xBDB (PVC)
[14:41:19] [PASSED] 0xBE0 (PVC)
[14:41:19] [PASSED] 0xBE1 (PVC)
[14:41:19] [PASSED] 0xBE5 (PVC)
[14:41:19] [PASSED] 0x7D40 (METEORLAKE)
[14:41:19] [PASSED] 0x7D45 (METEORLAKE)
[14:41:19] [PASSED] 0x7D55 (METEORLAKE)
[14:41:19] [PASSED] 0x7D60 (METEORLAKE)
[14:41:19] [PASSED] 0x7DD5 (METEORLAKE)
[14:41:19] [PASSED] 0x6420 (LUNARLAKE)
[14:41:19] [PASSED] 0x64A0 (LUNARLAKE)
[14:41:19] [PASSED] 0x64B0 (LUNARLAKE)
[14:41:19] [PASSED] 0xE202 (BATTLEMAGE)
[14:41:19] [PASSED] 0xE209 (BATTLEMAGE)
[14:41:19] [PASSED] 0xE20B (BATTLEMAGE)
[14:41:19] [PASSED] 0xE20C (BATTLEMAGE)
[14:41:19] [PASSED] 0xE20D (BATTLEMAGE)
[14:41:19] [PASSED] 0xE210 (BATTLEMAGE)
[14:41:19] [PASSED] 0xE211 (BATTLEMAGE)
[14:41:19] [PASSED] 0xE212 (BATTLEMAGE)
[14:41:19] [PASSED] 0xE216 (BATTLEMAGE)
[14:41:19] [PASSED] 0xE220 (BATTLEMAGE)
[14:41:19] [PASSED] 0xE221 (BATTLEMAGE)
[14:41:19] [PASSED] 0xE222 (BATTLEMAGE)
[14:41:19] [PASSED] 0xE223 (BATTLEMAGE)
[14:41:19] [PASSED] 0xB080 (PANTHERLAKE)
[14:41:19] [PASSED] 0xB081 (PANTHERLAKE)
[14:41:19] [PASSED] 0xB082 (PANTHERLAKE)
[14:41:19] [PASSED] 0xB083 (PANTHERLAKE)
[14:41:19] [PASSED] 0xB084 (PANTHERLAKE)
[14:41:19] [PASSED] 0xB085 (PANTHERLAKE)
[14:41:19] [PASSED] 0xB086 (PANTHERLAKE)
[14:41:19] [PASSED] 0xB087 (PANTHERLAKE)
[14:41:19] [PASSED] 0xB08F (PANTHERLAKE)
[14:41:19] [PASSED] 0xB090 (PANTHERLAKE)
[14:41:19] [PASSED] 0xB0A0 (PANTHERLAKE)
[14:41:19] [PASSED] 0xB0B0 (PANTHERLAKE)
[14:41:19] [PASSED] 0xFD80 (PANTHERLAKE)
[14:41:19] [PASSED] 0xFD81 (PANTHERLAKE)
[14:41:19] [PASSED] 0xD740 (NOVALAKE_S)
[14:41:19] [PASSED] 0xD741 (NOVALAKE_S)
[14:41:19] [PASSED] 0xD742 (NOVALAKE_S)
[14:41:19] [PASSED] 0xD743 (NOVALAKE_S)
[14:41:19] [PASSED] 0xD744 (NOVALAKE_S)
[14:41:19] [PASSED] 0xD745 (NOVALAKE_S)
[14:41:19] [PASSED] 0x674C (CRESCENTISLAND)
[14:41:19] =============== [PASSED] check_platform_desc ===============
[14:41:19] ===================== [PASSED] xe_pci ======================
[14:41:19] =================== xe_rtp (2 subtests) ====================
[14:41:19] =============== xe_rtp_process_to_sr_tests ================
[14:41:19] [PASSED] coalesce-same-reg
[14:41:19] [PASSED] no-match-no-add
[14:41:19] [PASSED] match-or
[14:41:19] [PASSED] match-or-xfail
[14:41:19] [PASSED] no-match-no-add-multiple-rules
[14:41:19] [PASSED] two-regs-two-entries
[14:41:19] [PASSED] clr-one-set-other
[14:41:19] [PASSED] set-field
[14:41:19] [PASSED] conflict-duplicate
[14:41:19] [PASSED] conflict-not-disjoint
[14:41:19] [PASSED] conflict-reg-type
[14:41:19] =========== [PASSED] xe_rtp_process_to_sr_tests ============
[14:41:19] ================== xe_rtp_process_tests ===================
[14:41:19] [PASSED] active1
[14:41:19] [PASSED] active2
[14:41:19] [PASSED] active-inactive
[14:41:19] [PASSED] inactive-active
[14:41:19] [PASSED] inactive-1st_or_active-inactive
[14:41:19] [PASSED] inactive-2nd_or_active-inactive
[14:41:19] [PASSED] inactive-last_or_active-inactive
stty: 'standard input': Inappropriate ioctl for device
[14:41:19] [PASSED] inactive-no_or_active-inactive
[14:41:19] ============== [PASSED] xe_rtp_process_tests ===============
[14:41:19] ===================== [PASSED] xe_rtp ======================
[14:41:19] ==================== xe_wa (1 subtest) =====================
[14:41:19] ======================== xe_wa_gt =========================
[14:41:19] [PASSED] TIGERLAKE B0
[14:41:19] [PASSED] DG1 A0
[14:41:19] [PASSED] DG1 B0
[14:41:19] [PASSED] ALDERLAKE_S A0
[14:41:19] [PASSED] ALDERLAKE_S B0
[14:41:19] [PASSED] ALDERLAKE_S C0
[14:41:19] [PASSED] ALDERLAKE_S D0
[14:41:19] [PASSED] ALDERLAKE_P A0
[14:41:19] [PASSED] ALDERLAKE_P B0
[14:41:19] [PASSED] ALDERLAKE_P C0
[14:41:19] [PASSED] ALDERLAKE_S RPLS D0
[14:41:19] [PASSED] ALDERLAKE_P RPLU E0
[14:41:19] [PASSED] DG2 G10 C0
[14:41:19] [PASSED] DG2 G11 B1
[14:41:19] [PASSED] DG2 G12 A1
[14:41:19] [PASSED] METEORLAKE 12.70(Xe_LPG) A0 13.00(Xe_LPM+) A0
[14:41:19] [PASSED] METEORLAKE 12.71(Xe_LPG) A0 13.00(Xe_LPM+) A0
[14:41:19] [PASSED] METEORLAKE 12.74(Xe_LPG+) A0 13.00(Xe_LPM+) A0
[14:41:19] [PASSED] LUNARLAKE 20.04(Xe2_LPG) A0 20.00(Xe2_LPM) A0
[14:41:19] [PASSED] LUNARLAKE 20.04(Xe2_LPG) B0 20.00(Xe2_LPM) A0
[14:41:19] [PASSED] BATTLEMAGE 20.01(Xe2_HPG) A0 13.01(Xe2_HPM) A1
[14:41:19] [PASSED] PANTHERLAKE 30.00(Xe3_LPG) A0 30.00(Xe3_LPM) A0
[14:41:19] ==================== [PASSED] xe_wa_gt =====================
[14:41:19] ====================== [PASSED] xe_wa ======================
[14:41:19] ============================================================
[14:41:19] Testing complete. Ran 318 tests: passed: 300, skipped: 18
[14:41:19] Elapsed time: 42.559s total, 4.321s configuring, 37.872s building, 0.335s running
+ /kernel/tools/testing/kunit/kunit.py run --kunitconfig /kernel/drivers/gpu/drm/tests/.kunitconfig
[14:41:19] Configuring KUnit Kernel ...
Regenerating .config ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
[14:41:21] Building KUnit Kernel ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
Building with:
$ make all compile_commands.json scripts_gdb ARCH=um O=.kunit --jobs=25
[14:41:51] Starting KUnit Kernel (1/1)...
[14:41:51] ============================================================
Running tests with:
$ .kunit/linux kunit.enable=1 mem=1G console=tty kunit_shutdown=halt
[14:41:51] ============ drm_test_pick_cmdline (2 subtests) ============
[14:41:51] [PASSED] drm_test_pick_cmdline_res_1920_1080_60
[14:41:51] =============== drm_test_pick_cmdline_named ===============
[14:41:51] [PASSED] NTSC
[14:41:51] [PASSED] NTSC-J
[14:41:51] [PASSED] PAL
[14:41:51] [PASSED] PAL-M
[14:41:51] =========== [PASSED] drm_test_pick_cmdline_named ===========
[14:41:51] ============== [PASSED] drm_test_pick_cmdline ==============
[14:41:51] == drm_test_atomic_get_connector_for_encoder (1 subtest) ===
[14:41:51] [PASSED] drm_test_drm_atomic_get_connector_for_encoder
[14:41:51] ==== [PASSED] drm_test_atomic_get_connector_for_encoder ====
[14:41:51] =========== drm_validate_clone_mode (2 subtests) ===========
[14:41:51] ============== drm_test_check_in_clone_mode ===============
[14:41:51] [PASSED] in_clone_mode
[14:41:51] [PASSED] not_in_clone_mode
[14:41:51] ========== [PASSED] drm_test_check_in_clone_mode ===========
[14:41:51] =============== drm_test_check_valid_clones ===============
[14:41:51] [PASSED] not_in_clone_mode
[14:41:51] [PASSED] valid_clone
[14:41:51] [PASSED] invalid_clone
[14:41:51] =========== [PASSED] drm_test_check_valid_clones ===========
[14:41:51] ============= [PASSED] drm_validate_clone_mode =============
[14:41:51] ============= drm_validate_modeset (1 subtest) =============
[14:41:51] [PASSED] drm_test_check_connector_changed_modeset
[14:41:51] ============== [PASSED] drm_validate_modeset ===============
[14:41:51] ====== drm_test_bridge_get_current_state (2 subtests) ======
[14:41:51] [PASSED] drm_test_drm_bridge_get_current_state_atomic
[14:41:51] [PASSED] drm_test_drm_bridge_get_current_state_legacy
[14:41:51] ======== [PASSED] drm_test_bridge_get_current_state ========
[14:41:51] ====== drm_test_bridge_helper_reset_crtc (3 subtests) ======
[14:41:51] [PASSED] drm_test_drm_bridge_helper_reset_crtc_atomic
[14:41:51] [PASSED] drm_test_drm_bridge_helper_reset_crtc_atomic_disabled
[14:41:51] [PASSED] drm_test_drm_bridge_helper_reset_crtc_legacy
[14:41:51] ======== [PASSED] drm_test_bridge_helper_reset_crtc ========
[14:41:51] ============== drm_bridge_alloc (2 subtests) ===============
[14:41:51] [PASSED] drm_test_drm_bridge_alloc_basic
[14:41:51] [PASSED] drm_test_drm_bridge_alloc_get_put
[14:41:51] ================ [PASSED] drm_bridge_alloc =================
[14:41:51] ================== drm_buddy (8 subtests) ==================
[14:41:51] [PASSED] drm_test_buddy_alloc_limit
[14:41:51] [PASSED] drm_test_buddy_alloc_optimistic
[14:41:51] [PASSED] drm_test_buddy_alloc_pessimistic
[14:41:51] [PASSED] drm_test_buddy_alloc_pathological
[14:41:51] [PASSED] drm_test_buddy_alloc_contiguous
[14:41:51] [PASSED] drm_test_buddy_alloc_clear
[14:41:51] [PASSED] drm_test_buddy_alloc_range_bias
[14:41:51] [PASSED] drm_test_buddy_fragmentation_performance
[14:41:51] ==================== [PASSED] drm_buddy ====================
[14:41:51] ============= drm_cmdline_parser (40 subtests) =============
[14:41:51] [PASSED] drm_test_cmdline_force_d_only
[14:41:51] [PASSED] drm_test_cmdline_force_D_only_dvi
[14:41:51] [PASSED] drm_test_cmdline_force_D_only_hdmi
[14:41:51] [PASSED] drm_test_cmdline_force_D_only_not_digital
[14:41:51] [PASSED] drm_test_cmdline_force_e_only
[14:41:51] [PASSED] drm_test_cmdline_res
[14:41:51] [PASSED] drm_test_cmdline_res_vesa
[14:41:51] [PASSED] drm_test_cmdline_res_vesa_rblank
[14:41:51] [PASSED] drm_test_cmdline_res_rblank
[14:41:51] [PASSED] drm_test_cmdline_res_bpp
[14:41:51] [PASSED] drm_test_cmdline_res_refresh
[14:41:51] [PASSED] drm_test_cmdline_res_bpp_refresh
[14:41:51] [PASSED] drm_test_cmdline_res_bpp_refresh_interlaced
[14:41:51] [PASSED] drm_test_cmdline_res_bpp_refresh_margins
[14:41:51] [PASSED] drm_test_cmdline_res_bpp_refresh_force_off
[14:41:51] [PASSED] drm_test_cmdline_res_bpp_refresh_force_on
[14:41:51] [PASSED] drm_test_cmdline_res_bpp_refresh_force_on_analog
[14:41:51] [PASSED] drm_test_cmdline_res_bpp_refresh_force_on_digital
[14:41:51] [PASSED] drm_test_cmdline_res_bpp_refresh_interlaced_margins_force_on
[14:41:51] [PASSED] drm_test_cmdline_res_margins_force_on
[14:41:51] [PASSED] drm_test_cmdline_res_vesa_margins
[14:41:51] [PASSED] drm_test_cmdline_name
[14:41:51] [PASSED] drm_test_cmdline_name_bpp
[14:41:51] [PASSED] drm_test_cmdline_name_option
[14:41:51] [PASSED] drm_test_cmdline_name_bpp_option
[14:41:51] [PASSED] drm_test_cmdline_rotate_0
[14:41:51] [PASSED] drm_test_cmdline_rotate_90
[14:41:51] [PASSED] drm_test_cmdline_rotate_180
[14:41:51] [PASSED] drm_test_cmdline_rotate_270
[14:41:51] [PASSED] drm_test_cmdline_hmirror
[14:41:51] [PASSED] drm_test_cmdline_vmirror
[14:41:51] [PASSED] drm_test_cmdline_margin_options
[14:41:51] [PASSED] drm_test_cmdline_multiple_options
[14:41:51] [PASSED] drm_test_cmdline_bpp_extra_and_option
[14:41:51] [PASSED] drm_test_cmdline_extra_and_option
[14:41:51] [PASSED] drm_test_cmdline_freestanding_options
[14:41:51] [PASSED] drm_test_cmdline_freestanding_force_e_and_options
[14:41:51] [PASSED] drm_test_cmdline_panel_orientation
[14:41:51] ================ drm_test_cmdline_invalid =================
[14:41:51] [PASSED] margin_only
[14:41:51] [PASSED] interlace_only
[14:41:51] [PASSED] res_missing_x
[14:41:51] [PASSED] res_missing_y
[14:41:51] [PASSED] res_bad_y
[14:41:51] [PASSED] res_missing_y_bpp
[14:41:51] [PASSED] res_bad_bpp
[14:41:51] [PASSED] res_bad_refresh
[14:41:51] [PASSED] res_bpp_refresh_force_on_off
[14:41:51] [PASSED] res_invalid_mode
[14:41:51] [PASSED] res_bpp_wrong_place_mode
[14:41:51] [PASSED] name_bpp_refresh
[14:41:51] [PASSED] name_refresh
[14:41:51] [PASSED] name_refresh_wrong_mode
[14:41:51] [PASSED] name_refresh_invalid_mode
[14:41:51] [PASSED] rotate_multiple
[14:41:51] [PASSED] rotate_invalid_val
[14:41:51] [PASSED] rotate_truncated
[14:41:51] [PASSED] invalid_option
[14:41:51] [PASSED] invalid_tv_option
[14:41:51] [PASSED] truncated_tv_option
[14:41:51] ============ [PASSED] drm_test_cmdline_invalid =============
[14:41:51] =============== drm_test_cmdline_tv_options ===============
[14:41:51] [PASSED] NTSC
[14:41:51] [PASSED] NTSC_443
[14:41:51] [PASSED] NTSC_J
[14:41:51] [PASSED] PAL
[14:41:51] [PASSED] PAL_M
[14:41:51] [PASSED] PAL_N
[14:41:51] [PASSED] SECAM
[14:41:51] [PASSED] MONO_525
[14:41:51] [PASSED] MONO_625
[14:41:51] =========== [PASSED] drm_test_cmdline_tv_options ===========
[14:41:51] =============== [PASSED] drm_cmdline_parser ================
[14:41:51] ========== drmm_connector_hdmi_init (20 subtests) ==========
[14:41:51] [PASSED] drm_test_connector_hdmi_init_valid
[14:41:51] [PASSED] drm_test_connector_hdmi_init_bpc_8
[14:41:51] [PASSED] drm_test_connector_hdmi_init_bpc_10
[14:41:51] [PASSED] drm_test_connector_hdmi_init_bpc_12
[14:41:51] [PASSED] drm_test_connector_hdmi_init_bpc_invalid
[14:41:51] [PASSED] drm_test_connector_hdmi_init_bpc_null
[14:41:51] [PASSED] drm_test_connector_hdmi_init_formats_empty
[14:41:51] [PASSED] drm_test_connector_hdmi_init_formats_no_rgb
[14:41:51] === drm_test_connector_hdmi_init_formats_yuv420_allowed ===
[14:41:51] [PASSED] supported_formats=0x9 yuv420_allowed=1
[14:41:51] [PASSED] supported_formats=0x9 yuv420_allowed=0
[14:41:51] [PASSED] supported_formats=0x3 yuv420_allowed=1
[14:41:51] [PASSED] supported_formats=0x3 yuv420_allowed=0
[14:41:51] === [PASSED] drm_test_connector_hdmi_init_formats_yuv420_allowed ===
[14:41:51] [PASSED] drm_test_connector_hdmi_init_null_ddc
[14:41:51] [PASSED] drm_test_connector_hdmi_init_null_product
[14:41:51] [PASSED] drm_test_connector_hdmi_init_null_vendor
[14:41:51] [PASSED] drm_test_connector_hdmi_init_product_length_exact
[14:41:51] [PASSED] drm_test_connector_hdmi_init_product_length_too_long
[14:41:51] [PASSED] drm_test_connector_hdmi_init_product_valid
[14:41:51] [PASSED] drm_test_connector_hdmi_init_vendor_length_exact
[14:41:51] [PASSED] drm_test_connector_hdmi_init_vendor_length_too_long
[14:41:51] [PASSED] drm_test_connector_hdmi_init_vendor_valid
[14:41:51] ========= drm_test_connector_hdmi_init_type_valid =========
[14:41:51] [PASSED] HDMI-A
[14:41:51] [PASSED] HDMI-B
[14:41:51] ===== [PASSED] drm_test_connector_hdmi_init_type_valid =====
[14:41:51] ======== drm_test_connector_hdmi_init_type_invalid ========
[14:41:51] [PASSED] Unknown
[14:41:51] [PASSED] VGA
[14:41:51] [PASSED] DVI-I
[14:41:51] [PASSED] DVI-D
[14:41:51] [PASSED] DVI-A
[14:41:51] [PASSED] Composite
[14:41:51] [PASSED] SVIDEO
[14:41:51] [PASSED] LVDS
[14:41:51] [PASSED] Component
[14:41:51] [PASSED] DIN
[14:41:51] [PASSED] DP
[14:41:51] [PASSED] TV
[14:41:51] [PASSED] eDP
[14:41:51] [PASSED] Virtual
[14:41:51] [PASSED] DSI
[14:41:51] [PASSED] DPI
[14:41:51] [PASSED] Writeback
[14:41:51] [PASSED] SPI
[14:41:51] [PASSED] USB
[14:41:51] ==== [PASSED] drm_test_connector_hdmi_init_type_invalid ====
[14:41:51] ============ [PASSED] drmm_connector_hdmi_init =============
[14:41:51] ============= drmm_connector_init (3 subtests) =============
[14:41:51] [PASSED] drm_test_drmm_connector_init
[14:41:51] [PASSED] drm_test_drmm_connector_init_null_ddc
[14:41:51] ========= drm_test_drmm_connector_init_type_valid =========
[14:41:51] [PASSED] Unknown
[14:41:51] [PASSED] VGA
[14:41:51] [PASSED] DVI-I
[14:41:51] [PASSED] DVI-D
[14:41:51] [PASSED] DVI-A
[14:41:51] [PASSED] Composite
[14:41:51] [PASSED] SVIDEO
[14:41:51] [PASSED] LVDS
[14:41:51] [PASSED] Component
[14:41:51] [PASSED] DIN
[14:41:51] [PASSED] DP
[14:41:51] [PASSED] HDMI-A
[14:41:51] [PASSED] HDMI-B
[14:41:51] [PASSED] TV
[14:41:51] [PASSED] eDP
[14:41:51] [PASSED] Virtual
[14:41:51] [PASSED] DSI
[14:41:51] [PASSED] DPI
[14:41:51] [PASSED] Writeback
[14:41:51] [PASSED] SPI
[14:41:51] [PASSED] USB
[14:41:51] ===== [PASSED] drm_test_drmm_connector_init_type_valid =====
[14:41:51] =============== [PASSED] drmm_connector_init ===============
[14:41:51] ========= drm_connector_dynamic_init (6 subtests) ==========
[14:41:51] [PASSED] drm_test_drm_connector_dynamic_init
[14:41:51] [PASSED] drm_test_drm_connector_dynamic_init_null_ddc
[14:41:51] [PASSED] drm_test_drm_connector_dynamic_init_not_added
[14:41:51] [PASSED] drm_test_drm_connector_dynamic_init_properties
[14:41:51] ===== drm_test_drm_connector_dynamic_init_type_valid ======
[14:41:51] [PASSED] Unknown
[14:41:51] [PASSED] VGA
[14:41:51] [PASSED] DVI-I
[14:41:51] [PASSED] DVI-D
[14:41:51] [PASSED] DVI-A
[14:41:51] [PASSED] Composite
[14:41:51] [PASSED] SVIDEO
[14:41:51] [PASSED] LVDS
[14:41:51] [PASSED] Component
[14:41:51] [PASSED] DIN
[14:41:51] [PASSED] DP
[14:41:51] [PASSED] HDMI-A
[14:41:51] [PASSED] HDMI-B
[14:41:51] [PASSED] TV
[14:41:51] [PASSED] eDP
[14:41:51] [PASSED] Virtual
[14:41:51] [PASSED] DSI
[14:41:51] [PASSED] DPI
[14:41:51] [PASSED] Writeback
[14:41:51] [PASSED] SPI
[14:41:51] [PASSED] USB
[14:41:51] = [PASSED] drm_test_drm_connector_dynamic_init_type_valid ==
[14:41:51] ======== drm_test_drm_connector_dynamic_init_name =========
[14:41:51] [PASSED] Unknown
[14:41:51] [PASSED] VGA
[14:41:51] [PASSED] DVI-I
[14:41:51] [PASSED] DVI-D
[14:41:51] [PASSED] DVI-A
[14:41:51] [PASSED] Composite
[14:41:51] [PASSED] SVIDEO
[14:41:51] [PASSED] LVDS
[14:41:51] [PASSED] Component
[14:41:51] [PASSED] DIN
[14:41:51] [PASSED] DP
[14:41:51] [PASSED] HDMI-A
[14:41:51] [PASSED] HDMI-B
[14:41:51] [PASSED] TV
[14:41:51] [PASSED] eDP
[14:41:51] [PASSED] Virtual
[14:41:51] [PASSED] DSI
[14:41:51] [PASSED] DPI
[14:41:51] [PASSED] Writeback
[14:41:51] [PASSED] SPI
[14:41:51] [PASSED] USB
[14:41:51] ==== [PASSED] drm_test_drm_connector_dynamic_init_name =====
[14:41:51] =========== [PASSED] drm_connector_dynamic_init ============
[14:41:51] ==== drm_connector_dynamic_register_early (4 subtests) =====
[14:41:51] [PASSED] drm_test_drm_connector_dynamic_register_early_on_list
[14:41:51] [PASSED] drm_test_drm_connector_dynamic_register_early_defer
[14:41:51] [PASSED] drm_test_drm_connector_dynamic_register_early_no_init
[14:41:51] [PASSED] drm_test_drm_connector_dynamic_register_early_no_mode_object
[14:41:51] ====== [PASSED] drm_connector_dynamic_register_early =======
[14:41:51] ======= drm_connector_dynamic_register (7 subtests) ========
[14:41:51] [PASSED] drm_test_drm_connector_dynamic_register_on_list
[14:41:51] [PASSED] drm_test_drm_connector_dynamic_register_no_defer
[14:41:51] [PASSED] drm_test_drm_connector_dynamic_register_no_init
[14:41:51] [PASSED] drm_test_drm_connector_dynamic_register_mode_object
[14:41:51] [PASSED] drm_test_drm_connector_dynamic_register_sysfs
[14:41:51] [PASSED] drm_test_drm_connector_dynamic_register_sysfs_name
[14:41:51] [PASSED] drm_test_drm_connector_dynamic_register_debugfs
[14:41:51] ========= [PASSED] drm_connector_dynamic_register ==========
[14:41:51] = drm_connector_attach_broadcast_rgb_property (2 subtests) =
[14:41:51] [PASSED] drm_test_drm_connector_attach_broadcast_rgb_property
[14:41:51] [PASSED] drm_test_drm_connector_attach_broadcast_rgb_property_hdmi_connector
[14:41:51] === [PASSED] drm_connector_attach_broadcast_rgb_property ===
[14:41:51] ========== drm_get_tv_mode_from_name (2 subtests) ==========
[14:41:51] ========== drm_test_get_tv_mode_from_name_valid ===========
[14:41:51] [PASSED] NTSC
[14:41:51] [PASSED] NTSC-443
[14:41:51] [PASSED] NTSC-J
[14:41:51] [PASSED] PAL
[14:41:51] [PASSED] PAL-M
[14:41:51] [PASSED] PAL-N
[14:41:51] [PASSED] SECAM
[14:41:51] [PASSED] Mono
[14:41:51] ====== [PASSED] drm_test_get_tv_mode_from_name_valid =======
[14:41:51] [PASSED] drm_test_get_tv_mode_from_name_truncated
[14:41:51] ============ [PASSED] drm_get_tv_mode_from_name ============
[14:41:51] = drm_test_connector_hdmi_compute_mode_clock (12 subtests) =
[14:41:51] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb
[14:41:51] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb_10bpc
[14:41:51] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb_10bpc_vic_1
[14:41:51] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb_12bpc
[14:41:51] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb_12bpc_vic_1
[14:41:51] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb_double
[14:41:51] = drm_test_connector_hdmi_compute_mode_clock_yuv420_valid =
[14:41:51] [PASSED] VIC 96
[14:41:51] [PASSED] VIC 97
[14:41:51] [PASSED] VIC 101
[14:41:51] [PASSED] VIC 102
[14:41:51] [PASSED] VIC 106
[14:41:51] [PASSED] VIC 107
[14:41:51] === [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv420_valid ===
[14:41:51] [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv420_10_bpc
[14:41:51] [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv420_12_bpc
[14:41:51] [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv422_8_bpc
[14:41:51] [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv422_10_bpc
[14:41:51] [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv422_12_bpc
[14:41:51] === [PASSED] drm_test_connector_hdmi_compute_mode_clock ====
[14:41:51] == drm_hdmi_connector_get_broadcast_rgb_name (2 subtests) ==
[14:41:51] === drm_test_drm_hdmi_connector_get_broadcast_rgb_name ====
[14:41:51] [PASSED] Automatic
[14:41:51] [PASSED] Full
[14:41:51] [PASSED] Limited 16:235
[14:41:51] === [PASSED] drm_test_drm_hdmi_connector_get_broadcast_rgb_name ===
[14:41:51] [PASSED] drm_test_drm_hdmi_connector_get_broadcast_rgb_name_invalid
[14:41:51] ==== [PASSED] drm_hdmi_connector_get_broadcast_rgb_name ====
[14:41:51] == drm_hdmi_connector_get_output_format_name (2 subtests) ==
[14:41:51] === drm_test_drm_hdmi_connector_get_output_format_name ====
[14:41:51] [PASSED] RGB
[14:41:51] [PASSED] YUV 4:2:0
[14:41:51] [PASSED] YUV 4:2:2
[14:41:51] [PASSED] YUV 4:4:4
[14:41:51] === [PASSED] drm_test_drm_hdmi_connector_get_output_format_name ===
[14:41:51] [PASSED] drm_test_drm_hdmi_connector_get_output_format_name_invalid
[14:41:51] ==== [PASSED] drm_hdmi_connector_get_output_format_name ====
[14:41:51] ============= drm_damage_helper (21 subtests) ==============
[14:41:51] [PASSED] drm_test_damage_iter_no_damage
[14:41:51] [PASSED] drm_test_damage_iter_no_damage_fractional_src
[14:41:51] [PASSED] drm_test_damage_iter_no_damage_src_moved
[14:41:51] [PASSED] drm_test_damage_iter_no_damage_fractional_src_moved
[14:41:51] [PASSED] drm_test_damage_iter_no_damage_not_visible
[14:41:51] [PASSED] drm_test_damage_iter_no_damage_no_crtc
[14:41:51] [PASSED] drm_test_damage_iter_no_damage_no_fb
[14:41:51] [PASSED] drm_test_damage_iter_simple_damage
[14:41:51] [PASSED] drm_test_damage_iter_single_damage
[14:41:51] [PASSED] drm_test_damage_iter_single_damage_intersect_src
[14:41:51] [PASSED] drm_test_damage_iter_single_damage_outside_src
[14:41:51] [PASSED] drm_test_damage_iter_single_damage_fractional_src
[14:41:51] [PASSED] drm_test_damage_iter_single_damage_intersect_fractional_src
[14:41:51] [PASSED] drm_test_damage_iter_single_damage_outside_fractional_src
[14:41:51] [PASSED] drm_test_damage_iter_single_damage_src_moved
[14:41:51] [PASSED] drm_test_damage_iter_single_damage_fractional_src_moved
[14:41:51] [PASSED] drm_test_damage_iter_damage
[14:41:51] [PASSED] drm_test_damage_iter_damage_one_intersect
[14:41:51] [PASSED] drm_test_damage_iter_damage_one_outside
[14:41:51] [PASSED] drm_test_damage_iter_damage_src_moved
[14:41:51] [PASSED] drm_test_damage_iter_damage_not_visible
[14:41:51] ================ [PASSED] drm_damage_helper ================
[14:41:51] ============== drm_dp_mst_helper (3 subtests) ==============
[14:41:51] ============== drm_test_dp_mst_calc_pbn_mode ==============
[14:41:51] [PASSED] Clock 154000 BPP 30 DSC disabled
[14:41:51] [PASSED] Clock 234000 BPP 30 DSC disabled
[14:41:51] [PASSED] Clock 297000 BPP 24 DSC disabled
[14:41:51] [PASSED] Clock 332880 BPP 24 DSC enabled
[14:41:51] [PASSED] Clock 324540 BPP 24 DSC enabled
[14:41:51] ========== [PASSED] drm_test_dp_mst_calc_pbn_mode ==========
[14:41:51] ============== drm_test_dp_mst_calc_pbn_div ===============
[14:41:51] [PASSED] Link rate 2000000 lane count 4
[14:41:51] [PASSED] Link rate 2000000 lane count 2
[14:41:51] [PASSED] Link rate 2000000 lane count 1
[14:41:51] [PASSED] Link rate 1350000 lane count 4
[14:41:51] [PASSED] Link rate 1350000 lane count 2
[14:41:51] [PASSED] Link rate 1350000 lane count 1
[14:41:51] [PASSED] Link rate 1000000 lane count 4
[14:41:51] [PASSED] Link rate 1000000 lane count 2
[14:41:51] [PASSED] Link rate 1000000 lane count 1
[14:41:51] [PASSED] Link rate 810000 lane count 4
[14:41:51] [PASSED] Link rate 810000 lane count 2
[14:41:51] [PASSED] Link rate 810000 lane count 1
[14:41:51] [PASSED] Link rate 540000 lane count 4
[14:41:51] [PASSED] Link rate 540000 lane count 2
[14:41:51] [PASSED] Link rate 540000 lane count 1
[14:41:51] [PASSED] Link rate 270000 lane count 4
[14:41:51] [PASSED] Link rate 270000 lane count 2
[14:41:51] [PASSED] Link rate 270000 lane count 1
[14:41:51] [PASSED] Link rate 162000 lane count 4
[14:41:51] [PASSED] Link rate 162000 lane count 2
[14:41:51] [PASSED] Link rate 162000 lane count 1
[14:41:51] ========== [PASSED] drm_test_dp_mst_calc_pbn_div ===========
[14:41:51] ========= drm_test_dp_mst_sideband_msg_req_decode =========
[14:41:51] [PASSED] DP_ENUM_PATH_RESOURCES with port number
[14:41:51] [PASSED] DP_POWER_UP_PHY with port number
[14:41:51] [PASSED] DP_POWER_DOWN_PHY with port number
[14:41:51] [PASSED] DP_ALLOCATE_PAYLOAD with SDP stream sinks
[14:41:51] [PASSED] DP_ALLOCATE_PAYLOAD with port number
[14:41:51] [PASSED] DP_ALLOCATE_PAYLOAD with VCPI
[14:41:51] [PASSED] DP_ALLOCATE_PAYLOAD with PBN
[14:41:51] [PASSED] DP_QUERY_PAYLOAD with port number
[14:41:51] [PASSED] DP_QUERY_PAYLOAD with VCPI
[14:41:51] [PASSED] DP_REMOTE_DPCD_READ with port number
[14:41:51] [PASSED] DP_REMOTE_DPCD_READ with DPCD address
[14:41:51] [PASSED] DP_REMOTE_DPCD_READ with max number of bytes
[14:41:51] [PASSED] DP_REMOTE_DPCD_WRITE with port number
[14:41:51] [PASSED] DP_REMOTE_DPCD_WRITE with DPCD address
[14:41:51] [PASSED] DP_REMOTE_DPCD_WRITE with data array
[14:41:51] [PASSED] DP_REMOTE_I2C_READ with port number
[14:41:51] [PASSED] DP_REMOTE_I2C_READ with I2C device ID
[14:41:51] [PASSED] DP_REMOTE_I2C_READ with transactions array
[14:41:51] [PASSED] DP_REMOTE_I2C_WRITE with port number
[14:41:51] [PASSED] DP_REMOTE_I2C_WRITE with I2C device ID
[14:41:51] [PASSED] DP_REMOTE_I2C_WRITE with data array
[14:41:51] [PASSED] DP_QUERY_STREAM_ENC_STATUS with stream ID
[14:41:51] [PASSED] DP_QUERY_STREAM_ENC_STATUS with client ID
[14:41:51] [PASSED] DP_QUERY_STREAM_ENC_STATUS with stream event
[14:41:51] [PASSED] DP_QUERY_STREAM_ENC_STATUS with valid stream event
[14:41:51] [PASSED] DP_QUERY_STREAM_ENC_STATUS with stream behavior
[14:41:51] [PASSED] DP_QUERY_STREAM_ENC_STATUS with a valid stream behavior
[14:41:51] ===== [PASSED] drm_test_dp_mst_sideband_msg_req_decode =====
[14:41:51] ================ [PASSED] drm_dp_mst_helper ================
[14:41:51] ================== drm_exec (7 subtests) ===================
[14:41:51] [PASSED] sanitycheck
[14:41:51] [PASSED] test_lock
[14:41:51] [PASSED] test_lock_unlock
[14:41:51] [PASSED] test_duplicates
[14:41:51] [PASSED] test_prepare
[14:41:51] [PASSED] test_prepare_array
[14:41:51] [PASSED] test_multiple_loops
[14:41:51] ==================== [PASSED] drm_exec =====================
[14:41:51] =========== drm_format_helper_test (17 subtests) ===========
[14:41:51] ============== drm_test_fb_xrgb8888_to_gray8 ==============
[14:41:51] [PASSED] single_pixel_source_buffer
[14:41:51] [PASSED] single_pixel_clip_rectangle
[14:41:51] [PASSED] well_known_colors
[14:41:51] [PASSED] destination_pitch
[14:41:51] ========== [PASSED] drm_test_fb_xrgb8888_to_gray8 ==========
[14:41:51] ============= drm_test_fb_xrgb8888_to_rgb332 ==============
[14:41:51] [PASSED] single_pixel_source_buffer
[14:41:51] [PASSED] single_pixel_clip_rectangle
[14:41:51] [PASSED] well_known_colors
[14:41:51] [PASSED] destination_pitch
[14:41:51] ========= [PASSED] drm_test_fb_xrgb8888_to_rgb332 ==========
[14:41:51] ============= drm_test_fb_xrgb8888_to_rgb565 ==============
[14:41:51] [PASSED] single_pixel_source_buffer
[14:41:51] [PASSED] single_pixel_clip_rectangle
[14:41:51] [PASSED] well_known_colors
[14:41:51] [PASSED] destination_pitch
[14:41:51] ========= [PASSED] drm_test_fb_xrgb8888_to_rgb565 ==========
[14:41:51] ============ drm_test_fb_xrgb8888_to_xrgb1555 =============
[14:41:51] [PASSED] single_pixel_source_buffer
[14:41:51] [PASSED] single_pixel_clip_rectangle
[14:41:51] [PASSED] well_known_colors
[14:41:51] [PASSED] destination_pitch
[14:41:51] ======== [PASSED] drm_test_fb_xrgb8888_to_xrgb1555 =========
[14:41:51] ============ drm_test_fb_xrgb8888_to_argb1555 =============
[14:41:51] [PASSED] single_pixel_source_buffer
[14:41:51] [PASSED] single_pixel_clip_rectangle
[14:41:51] [PASSED] well_known_colors
[14:41:51] [PASSED] destination_pitch
[14:41:51] ======== [PASSED] drm_test_fb_xrgb8888_to_argb1555 =========
[14:41:51] ============ drm_test_fb_xrgb8888_to_rgba5551 =============
[14:41:51] [PASSED] single_pixel_source_buffer
[14:41:51] [PASSED] single_pixel_clip_rectangle
[14:41:51] [PASSED] well_known_colors
[14:41:51] [PASSED] destination_pitch
[14:41:51] ======== [PASSED] drm_test_fb_xrgb8888_to_rgba5551 =========
[14:41:51] ============= drm_test_fb_xrgb8888_to_rgb888 ==============
[14:41:51] [PASSED] single_pixel_source_buffer
[14:41:51] [PASSED] single_pixel_clip_rectangle
[14:41:51] [PASSED] well_known_colors
[14:41:51] [PASSED] destination_pitch
[14:41:51] ========= [PASSED] drm_test_fb_xrgb8888_to_rgb888 ==========
[14:41:51] ============= drm_test_fb_xrgb8888_to_bgr888 ==============
[14:41:51] [PASSED] single_pixel_source_buffer
[14:41:51] [PASSED] single_pixel_clip_rectangle
[14:41:51] [PASSED] well_known_colors
[14:41:51] [PASSED] destination_pitch
[14:41:51] ========= [PASSED] drm_test_fb_xrgb8888_to_bgr888 ==========
[14:41:51] ============ drm_test_fb_xrgb8888_to_argb8888 =============
[14:41:51] [PASSED] single_pixel_source_buffer
[14:41:51] [PASSED] single_pixel_clip_rectangle
[14:41:51] [PASSED] well_known_colors
[14:41:51] [PASSED] destination_pitch
[14:41:51] ======== [PASSED] drm_test_fb_xrgb8888_to_argb8888 =========
[14:41:51] =========== drm_test_fb_xrgb8888_to_xrgb2101010 ===========
[14:41:51] [PASSED] single_pixel_source_buffer
[14:41:51] [PASSED] single_pixel_clip_rectangle
[14:41:51] [PASSED] well_known_colors
[14:41:51] [PASSED] destination_pitch
[14:41:51] ======= [PASSED] drm_test_fb_xrgb8888_to_xrgb2101010 =======
[14:41:51] =========== drm_test_fb_xrgb8888_to_argb2101010 ===========
[14:41:51] [PASSED] single_pixel_source_buffer
[14:41:51] [PASSED] single_pixel_clip_rectangle
[14:41:51] [PASSED] well_known_colors
[14:41:51] [PASSED] destination_pitch
[14:41:51] ======= [PASSED] drm_test_fb_xrgb8888_to_argb2101010 =======
[14:41:51] ============== drm_test_fb_xrgb8888_to_mono ===============
[14:41:51] [PASSED] single_pixel_source_buffer
[14:41:51] [PASSED] single_pixel_clip_rectangle
[14:41:51] [PASSED] well_known_colors
[14:41:51] [PASSED] destination_pitch
[14:41:51] ========== [PASSED] drm_test_fb_xrgb8888_to_mono ===========
[14:41:51] ==================== drm_test_fb_swab =====================
[14:41:51] [PASSED] single_pixel_source_buffer
[14:41:51] [PASSED] single_pixel_clip_rectangle
[14:41:51] [PASSED] well_known_colors
[14:41:51] [PASSED] destination_pitch
[14:41:51] ================ [PASSED] drm_test_fb_swab =================
[14:41:51] ============ drm_test_fb_xrgb8888_to_xbgr8888 =============
[14:41:51] [PASSED] single_pixel_source_buffer
[14:41:51] [PASSED] single_pixel_clip_rectangle
[14:41:51] [PASSED] well_known_colors
[14:41:51] [PASSED] destination_pitch
[14:41:51] ======== [PASSED] drm_test_fb_xrgb8888_to_xbgr8888 =========
[14:41:51] ============ drm_test_fb_xrgb8888_to_abgr8888 =============
[14:41:51] [PASSED] single_pixel_source_buffer
[14:41:51] [PASSED] single_pixel_clip_rectangle
[14:41:51] [PASSED] well_known_colors
[14:41:51] [PASSED] destination_pitch
[14:41:51] ======== [PASSED] drm_test_fb_xrgb8888_to_abgr8888 =========
[14:41:51] ================= drm_test_fb_clip_offset =================
[14:41:51] [PASSED] pass through
[14:41:51] [PASSED] horizontal offset
[14:41:51] [PASSED] vertical offset
[14:41:51] [PASSED] horizontal and vertical offset
[14:41:51] [PASSED] horizontal offset (custom pitch)
[14:41:51] [PASSED] vertical offset (custom pitch)
[14:41:51] [PASSED] horizontal and vertical offset (custom pitch)
[14:41:51] ============= [PASSED] drm_test_fb_clip_offset =============
[14:41:51] =================== drm_test_fb_memcpy ====================
[14:41:51] [PASSED] single_pixel_source_buffer: XR24 little-endian (0x34325258)
[14:41:51] [PASSED] single_pixel_source_buffer: XRA8 little-endian (0x38415258)
[14:41:51] [PASSED] single_pixel_source_buffer: YU24 little-endian (0x34325559)
[14:41:51] [PASSED] single_pixel_clip_rectangle: XB24 little-endian (0x34324258)
[14:41:51] [PASSED] single_pixel_clip_rectangle: XRA8 little-endian (0x38415258)
[14:41:51] [PASSED] single_pixel_clip_rectangle: YU24 little-endian (0x34325559)
[14:41:51] [PASSED] well_known_colors: XB24 little-endian (0x34324258)
[14:41:51] [PASSED] well_known_colors: XRA8 little-endian (0x38415258)
[14:41:51] [PASSED] well_known_colors: YU24 little-endian (0x34325559)
[14:41:51] [PASSED] destination_pitch: XB24 little-endian (0x34324258)
[14:41:51] [PASSED] destination_pitch: XRA8 little-endian (0x38415258)
[14:41:51] [PASSED] destination_pitch: YU24 little-endian (0x34325559)
[14:41:51] =============== [PASSED] drm_test_fb_memcpy ================
[14:41:51] ============= [PASSED] drm_format_helper_test ==============
[14:41:51] ================= drm_format (18 subtests) =================
[14:41:51] [PASSED] drm_test_format_block_width_invalid
[14:41:51] [PASSED] drm_test_format_block_width_one_plane
[14:41:51] [PASSED] drm_test_format_block_width_two_plane
[14:41:51] [PASSED] drm_test_format_block_width_three_plane
[14:41:51] [PASSED] drm_test_format_block_width_tiled
[14:41:51] [PASSED] drm_test_format_block_height_invalid
[14:41:51] [PASSED] drm_test_format_block_height_one_plane
[14:41:51] [PASSED] drm_test_format_block_height_two_plane
[14:41:51] [PASSED] drm_test_format_block_height_three_plane
[14:41:51] [PASSED] drm_test_format_block_height_tiled
[14:41:51] [PASSED] drm_test_format_min_pitch_invalid
[14:41:51] [PASSED] drm_test_format_min_pitch_one_plane_8bpp
[14:41:51] [PASSED] drm_test_format_min_pitch_one_plane_16bpp
[14:41:51] [PASSED] drm_test_format_min_pitch_one_plane_24bpp
[14:41:51] [PASSED] drm_test_format_min_pitch_one_plane_32bpp
[14:41:51] [PASSED] drm_test_format_min_pitch_two_plane
[14:41:51] [PASSED] drm_test_format_min_pitch_three_plane_8bpp
[14:41:51] [PASSED] drm_test_format_min_pitch_tiled
[14:41:51] =================== [PASSED] drm_format ====================
[14:41:51] ============== drm_framebuffer (10 subtests) ===============
[14:41:51] ========== drm_test_framebuffer_check_src_coords ==========
[14:41:51] [PASSED] Success: source fits into fb
[14:41:51] [PASSED] Fail: overflowing fb with x-axis coordinate
[14:41:51] [PASSED] Fail: overflowing fb with y-axis coordinate
[14:41:51] [PASSED] Fail: overflowing fb with source width
[14:41:51] [PASSED] Fail: overflowing fb with source height
[14:41:51] ====== [PASSED] drm_test_framebuffer_check_src_coords ======
[14:41:51] [PASSED] drm_test_framebuffer_cleanup
[14:41:51] =============== drm_test_framebuffer_create ===============
[14:41:51] [PASSED] ABGR8888 normal sizes
[14:41:51] [PASSED] ABGR8888 max sizes
[14:41:51] [PASSED] ABGR8888 pitch greater than min required
[14:41:51] [PASSED] ABGR8888 pitch less than min required
[14:41:51] [PASSED] ABGR8888 Invalid width
[14:41:51] [PASSED] ABGR8888 Invalid buffer handle
[14:41:51] [PASSED] No pixel format
[14:41:51] [PASSED] ABGR8888 Width 0
[14:41:51] [PASSED] ABGR8888 Height 0
[14:41:51] [PASSED] ABGR8888 Out of bound height * pitch combination
[14:41:51] [PASSED] ABGR8888 Large buffer offset
[14:41:51] [PASSED] ABGR8888 Buffer offset for inexistent plane
[14:41:51] [PASSED] ABGR8888 Invalid flag
[14:41:51] [PASSED] ABGR8888 Set DRM_MODE_FB_MODIFIERS without modifiers
[14:41:51] [PASSED] ABGR8888 Valid buffer modifier
[14:41:51] [PASSED] ABGR8888 Invalid buffer modifier(DRM_FORMAT_MOD_SAMSUNG_64_32_TILE)
[14:41:51] [PASSED] ABGR8888 Extra pitches without DRM_MODE_FB_MODIFIERS
[14:41:51] [PASSED] ABGR8888 Extra pitches with DRM_MODE_FB_MODIFIERS
[14:41:51] [PASSED] NV12 Normal sizes
[14:41:51] [PASSED] NV12 Max sizes
[14:41:51] [PASSED] NV12 Invalid pitch
[14:41:51] [PASSED] NV12 Invalid modifier/missing DRM_MODE_FB_MODIFIERS flag
[14:41:51] [PASSED] NV12 different modifier per-plane
[14:41:51] [PASSED] NV12 with DRM_FORMAT_MOD_SAMSUNG_64_32_TILE
[14:41:51] [PASSED] NV12 Valid modifiers without DRM_MODE_FB_MODIFIERS
[14:41:51] [PASSED] NV12 Modifier for inexistent plane
[14:41:51] [PASSED] NV12 Handle for inexistent plane
[14:41:51] [PASSED] NV12 Handle for inexistent plane without DRM_MODE_FB_MODIFIERS
[14:41:51] [PASSED] YVU420 DRM_MODE_FB_MODIFIERS set without modifier
[14:41:51] [PASSED] YVU420 Normal sizes
[14:41:51] [PASSED] YVU420 Max sizes
[14:41:51] [PASSED] YVU420 Invalid pitch
[14:41:51] [PASSED] YVU420 Different pitches
[14:41:51] [PASSED] YVU420 Different buffer offsets/pitches
[14:41:51] [PASSED] YVU420 Modifier set just for plane 0, without DRM_MODE_FB_MODIFIERS
[14:41:51] [PASSED] YVU420 Modifier set just for planes 0, 1, without DRM_MODE_FB_MODIFIERS
[14:41:51] [PASSED] YVU420 Modifier set just for plane 0, 1, with DRM_MODE_FB_MODIFIERS
[14:41:51] [PASSED] YVU420 Valid modifier
[14:41:51] [PASSED] YVU420 Different modifiers per plane
[14:41:51] [PASSED] YVU420 Modifier for inexistent plane
[14:41:51] [PASSED] YUV420_10BIT Invalid modifier(DRM_FORMAT_MOD_LINEAR)
[14:41:51] [PASSED] X0L2 Normal sizes
[14:41:51] [PASSED] X0L2 Max sizes
[14:41:51] [PASSED] X0L2 Invalid pitch
[14:41:51] [PASSED] X0L2 Pitch greater than minimum required
[14:41:51] [PASSED] X0L2 Handle for inexistent plane
[14:41:51] [PASSED] X0L2 Offset for inexistent plane, without DRM_MODE_FB_MODIFIERS set
[14:41:51] [PASSED] X0L2 Modifier without DRM_MODE_FB_MODIFIERS set
[14:41:51] [PASSED] X0L2 Valid modifier
[14:41:51] [PASSED] X0L2 Modifier for inexistent plane
[14:41:51] =========== [PASSED] drm_test_framebuffer_create ===========
[14:41:51] [PASSED] drm_test_framebuffer_free
[14:41:51] [PASSED] drm_test_framebuffer_init
[14:41:51] [PASSED] drm_test_framebuffer_init_bad_format
[14:41:51] [PASSED] drm_test_framebuffer_init_dev_mismatch
[14:41:51] [PASSED] drm_test_framebuffer_lookup
[14:41:51] [PASSED] drm_test_framebuffer_lookup_inexistent
[14:41:51] [PASSED] drm_test_framebuffer_modifiers_not_supported
[14:41:51] ================= [PASSED] drm_framebuffer =================
[14:41:51] ================ drm_gem_shmem (8 subtests) ================
[14:41:51] [PASSED] drm_gem_shmem_test_obj_create
[14:41:51] [PASSED] drm_gem_shmem_test_obj_create_private
[14:41:51] [PASSED] drm_gem_shmem_test_pin_pages
[14:41:51] [PASSED] drm_gem_shmem_test_vmap
[14:41:51] [PASSED] drm_gem_shmem_test_get_pages_sgt
[14:41:51] [PASSED] drm_gem_shmem_test_get_sg_table
[14:41:51] [PASSED] drm_gem_shmem_test_madvise
[14:41:51] [PASSED] drm_gem_shmem_test_purge
[14:41:51] ================== [PASSED] drm_gem_shmem ==================
[14:41:51] === drm_atomic_helper_connector_hdmi_check (27 subtests) ===
[14:41:51] [PASSED] drm_test_check_broadcast_rgb_auto_cea_mode
[14:41:51] [PASSED] drm_test_check_broadcast_rgb_auto_cea_mode_vic_1
[14:41:51] [PASSED] drm_test_check_broadcast_rgb_full_cea_mode
[14:41:51] [PASSED] drm_test_check_broadcast_rgb_full_cea_mode_vic_1
[14:41:51] [PASSED] drm_test_check_broadcast_rgb_limited_cea_mode
[14:41:51] [PASSED] drm_test_check_broadcast_rgb_limited_cea_mode_vic_1
[14:41:51] ====== drm_test_check_broadcast_rgb_cea_mode_yuv420 =======
[14:41:51] [PASSED] Automatic
[14:41:51] [PASSED] Full
[14:41:51] [PASSED] Limited 16:235
[14:41:51] == [PASSED] drm_test_check_broadcast_rgb_cea_mode_yuv420 ===
[14:41:51] [PASSED] drm_test_check_broadcast_rgb_crtc_mode_changed
[14:41:51] [PASSED] drm_test_check_broadcast_rgb_crtc_mode_not_changed
[14:41:51] [PASSED] drm_test_check_disable_connector
[14:41:51] [PASSED] drm_test_check_hdmi_funcs_reject_rate
[14:41:51] [PASSED] drm_test_check_max_tmds_rate_bpc_fallback_rgb
[14:41:51] [PASSED] drm_test_check_max_tmds_rate_bpc_fallback_yuv420
[14:41:51] [PASSED] drm_test_check_max_tmds_rate_bpc_fallback_ignore_yuv422
[14:41:51] [PASSED] drm_test_check_max_tmds_rate_bpc_fallback_ignore_yuv420
[14:41:51] [PASSED] drm_test_check_driver_unsupported_fallback_yuv420
[14:41:51] [PASSED] drm_test_check_output_bpc_crtc_mode_changed
[14:41:51] [PASSED] drm_test_check_output_bpc_crtc_mode_not_changed
[14:41:51] [PASSED] drm_test_check_output_bpc_dvi
[14:41:51] [PASSED] drm_test_check_output_bpc_format_vic_1
[14:41:51] [PASSED] drm_test_check_output_bpc_format_display_8bpc_only
[14:41:51] [PASSED] drm_test_check_output_bpc_format_display_rgb_only
[14:41:51] [PASSED] drm_test_check_output_bpc_format_driver_8bpc_only
[14:41:51] [PASSED] drm_test_check_output_bpc_format_driver_rgb_only
[14:41:51] [PASSED] drm_test_check_tmds_char_rate_rgb_8bpc
[14:41:51] [PASSED] drm_test_check_tmds_char_rate_rgb_10bpc
[14:41:51] [PASSED] drm_test_check_tmds_char_rate_rgb_12bpc
[14:41:51] ===== [PASSED] drm_atomic_helper_connector_hdmi_check ======
[14:41:51] === drm_atomic_helper_connector_hdmi_reset (6 subtests) ====
[14:41:51] [PASSED] drm_test_check_broadcast_rgb_value
[14:41:51] [PASSED] drm_test_check_bpc_8_value
[14:41:51] [PASSED] drm_test_check_bpc_10_value
[14:41:51] [PASSED] drm_test_check_bpc_12_value
[14:41:51] [PASSED] drm_test_check_format_value
[14:41:51] [PASSED] drm_test_check_tmds_char_value
[14:41:51] ===== [PASSED] drm_atomic_helper_connector_hdmi_reset ======
[14:41:51] = drm_atomic_helper_connector_hdmi_mode_valid (4 subtests) =
[14:41:51] [PASSED] drm_test_check_mode_valid
[14:41:51] [PASSED] drm_test_check_mode_valid_reject
[14:41:51] [PASSED] drm_test_check_mode_valid_reject_rate
[14:41:51] [PASSED] drm_test_check_mode_valid_reject_max_clock
[14:41:51] === [PASSED] drm_atomic_helper_connector_hdmi_mode_valid ===
[14:41:51] ================= drm_managed (2 subtests) =================
[14:41:51] [PASSED] drm_test_managed_release_action
[14:41:51] [PASSED] drm_test_managed_run_action
[14:41:51] =================== [PASSED] drm_managed ===================
[14:41:51] =================== drm_mm (6 subtests) ====================
[14:41:51] [PASSED] drm_test_mm_init
[14:41:51] [PASSED] drm_test_mm_debug
[14:41:51] [PASSED] drm_test_mm_align32
[14:41:51] [PASSED] drm_test_mm_align64
[14:41:51] [PASSED] drm_test_mm_lowest
[14:41:51] [PASSED] drm_test_mm_highest
[14:41:51] ===================== [PASSED] drm_mm ======================
[14:41:51] ============= drm_modes_analog_tv (5 subtests) =============
[14:41:51] [PASSED] drm_test_modes_analog_tv_mono_576i
[14:41:51] [PASSED] drm_test_modes_analog_tv_ntsc_480i
[14:41:51] [PASSED] drm_test_modes_analog_tv_ntsc_480i_inlined
[14:41:51] [PASSED] drm_test_modes_analog_tv_pal_576i
[14:41:51] [PASSED] drm_test_modes_analog_tv_pal_576i_inlined
[14:41:51] =============== [PASSED] drm_modes_analog_tv ===============
[14:41:51] ============== drm_plane_helper (2 subtests) ===============
[14:41:51] =============== drm_test_check_plane_state ================
[14:41:51] [PASSED] clipping_simple
[14:41:51] [PASSED] clipping_rotate_reflect
[14:41:51] [PASSED] positioning_simple
[14:41:51] [PASSED] upscaling
[14:41:51] [PASSED] downscaling
[14:41:51] [PASSED] rounding1
[14:41:51] [PASSED] rounding2
[14:41:51] [PASSED] rounding3
[14:41:51] [PASSED] rounding4
[14:41:51] =========== [PASSED] drm_test_check_plane_state ============
[14:41:51] =========== drm_test_check_invalid_plane_state ============
[14:41:51] [PASSED] positioning_invalid
[14:41:51] [PASSED] upscaling_invalid
[14:41:51] [PASSED] downscaling_invalid
[14:41:51] ======= [PASSED] drm_test_check_invalid_plane_state ========
[14:41:51] ================ [PASSED] drm_plane_helper =================
[14:41:51] ====== drm_connector_helper_tv_get_modes (1 subtest) =======
[14:41:51] ====== drm_test_connector_helper_tv_get_modes_check =======
[14:41:51] [PASSED] None
[14:41:51] [PASSED] PAL
[14:41:51] [PASSED] NTSC
[14:41:51] [PASSED] Both, NTSC Default
[14:41:51] [PASSED] Both, PAL Default
[14:41:51] [PASSED] Both, NTSC Default, with PAL on command-line
[14:41:51] [PASSED] Both, PAL Default, with NTSC on command-line
[14:41:51] == [PASSED] drm_test_connector_helper_tv_get_modes_check ===
[14:41:51] ======== [PASSED] drm_connector_helper_tv_get_modes ========
[14:41:51] ================== drm_rect (9 subtests) ===================
[14:41:51] [PASSED] drm_test_rect_clip_scaled_div_by_zero
[14:41:51] [PASSED] drm_test_rect_clip_scaled_not_clipped
[14:41:51] [PASSED] drm_test_rect_clip_scaled_clipped
[14:41:51] [PASSED] drm_test_rect_clip_scaled_signed_vs_unsigned
[14:41:51] ================= drm_test_rect_intersect =================
[14:41:51] [PASSED] top-left x bottom-right: 2x2+1+1 x 2x2+0+0
[14:41:51] [PASSED] top-right x bottom-left: 2x2+0+0 x 2x2+1-1
[14:41:51] [PASSED] bottom-left x top-right: 2x2+1-1 x 2x2+0+0
[14:41:51] [PASSED] bottom-right x top-left: 2x2+0+0 x 2x2+1+1
[14:41:51] [PASSED] right x left: 2x1+0+0 x 3x1+1+0
[14:41:51] [PASSED] left x right: 3x1+1+0 x 2x1+0+0
[14:41:51] [PASSED] up x bottom: 1x2+0+0 x 1x3+0-1
[14:41:51] [PASSED] bottom x up: 1x3+0-1 x 1x2+0+0
[14:41:51] [PASSED] touching corner: 1x1+0+0 x 2x2+1+1
[14:41:51] [PASSED] touching side: 1x1+0+0 x 1x1+1+0
[14:41:51] [PASSED] equal rects: 2x2+0+0 x 2x2+0+0
[14:41:51] [PASSED] inside another: 2x2+0+0 x 1x1+1+1
[14:41:51] [PASSED] far away: 1x1+0+0 x 1x1+3+6
[14:41:51] [PASSED] points intersecting: 0x0+5+10 x 0x0+5+10
[14:41:51] [PASSED] points not intersecting: 0x0+0+0 x 0x0+5+10
[14:41:51] ============= [PASSED] drm_test_rect_intersect =============
[14:41:51] ================ drm_test_rect_calc_hscale ================
[14:41:51] [PASSED] normal use
[14:41:51] [PASSED] out of max range
[14:41:51] [PASSED] out of min range
[14:41:51] [PASSED] zero dst
[14:41:51] [PASSED] negative src
[14:41:51] [PASSED] negative dst
[14:41:51] ============ [PASSED] drm_test_rect_calc_hscale ============
[14:41:51] ================ drm_test_rect_calc_vscale ================
[14:41:51] [PASSED] normal use
[14:41:51] [PASSED] out of max range
[14:41:51] [PASSED] out of min range
[14:41:51] [PASSED] zero dst
[14:41:51] [PASSED] negative src
[14:41:51] [PASSED] negative dst
[14:41:51] ============ [PASSED] drm_test_rect_calc_vscale ============
[14:41:51] ================== drm_test_rect_rotate ===================
[14:41:51] [PASSED] reflect-x
[14:41:51] [PASSED] reflect-y
[14:41:51] [PASSED] rotate-0
[14:41:51] [PASSED] rotate-90
[14:41:51] [PASSED] rotate-180
[14:41:51] [PASSED] rotate-270
[14:41:51] ============== [PASSED] drm_test_rect_rotate ===============
[14:41:51] ================ drm_test_rect_rotate_inv =================
[14:41:51] [PASSED] reflect-x
[14:41:51] [PASSED] reflect-y
[14:41:51] [PASSED] rotate-0
[14:41:51] [PASSED] rotate-90
[14:41:51] [PASSED] rotate-180
[14:41:51] [PASSED] rotate-270
[14:41:51] ============ [PASSED] drm_test_rect_rotate_inv =============
[14:41:51] ==================== [PASSED] drm_rect =====================
[14:41:51] ============ drm_sysfb_modeset_test (1 subtest) ============
[14:41:51] ============ drm_test_sysfb_build_fourcc_list =============
[14:41:51] [PASSED] no native formats
[14:41:51] [PASSED] XRGB8888 as native format
[14:41:51] [PASSED] remove duplicates
[14:41:51] [PASSED] convert alpha formats
[14:41:51] [PASSED] random formats
[14:41:51] ======== [PASSED] drm_test_sysfb_build_fourcc_list =========
[14:41:51] ============= [PASSED] drm_sysfb_modeset_test ==============
[14:41:51] ============================================================
[14:41:51] Testing complete. Ran 622 tests: passed: 622
[14:41:51] Elapsed time: 32.113s total, 1.657s configuring, 29.938s building, 0.475s running
+ /kernel/tools/testing/kunit/kunit.py run --kunitconfig /kernel/drivers/gpu/drm/ttm/tests/.kunitconfig
[14:41:52] Configuring KUnit Kernel ...
Regenerating .config ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
[14:41:53] Building KUnit Kernel ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
Building with:
$ make all compile_commands.json scripts_gdb ARCH=um O=.kunit --jobs=25
[14:42:02] Starting KUnit Kernel (1/1)...
[14:42:02] ============================================================
Running tests with:
$ .kunit/linux kunit.enable=1 mem=1G console=tty kunit_shutdown=halt
[14:42:03] ================= ttm_device (5 subtests) ==================
[14:42:03] [PASSED] ttm_device_init_basic
[14:42:03] [PASSED] ttm_device_init_multiple
[14:42:03] [PASSED] ttm_device_fini_basic
[14:42:03] [PASSED] ttm_device_init_no_vma_man
[14:42:03] ================== ttm_device_init_pools ==================
[14:42:03] [PASSED] No DMA allocations, no DMA32 required
[14:42:03] [PASSED] DMA allocations, DMA32 required
[14:42:03] [PASSED] No DMA allocations, DMA32 required
[14:42:03] [PASSED] DMA allocations, no DMA32 required
[14:42:03] ============== [PASSED] ttm_device_init_pools ==============
[14:42:03] =================== [PASSED] ttm_device ====================
[14:42:03] ================== ttm_pool (8 subtests) ===================
[14:42:03] ================== ttm_pool_alloc_basic ===================
[14:42:03] [PASSED] One page
[14:42:03] [PASSED] More than one page
[14:42:03] [PASSED] Above the allocation limit
[14:42:03] [PASSED] One page, with coherent DMA mappings enabled
[14:42:03] [PASSED] Above the allocation limit, with coherent DMA mappings enabled
[14:42:03] ============== [PASSED] ttm_pool_alloc_basic ===============
[14:42:03] ============== ttm_pool_alloc_basic_dma_addr ==============
[14:42:03] [PASSED] One page
[14:42:03] [PASSED] More than one page
[14:42:03] [PASSED] Above the allocation limit
[14:42:03] [PASSED] One page, with coherent DMA mappings enabled
[14:42:03] [PASSED] Above the allocation limit, with coherent DMA mappings enabled
[14:42:03] ========== [PASSED] ttm_pool_alloc_basic_dma_addr ==========
[14:42:03] [PASSED] ttm_pool_alloc_order_caching_match
[14:42:03] [PASSED] ttm_pool_alloc_caching_mismatch
[14:42:03] [PASSED] ttm_pool_alloc_order_mismatch
[14:42:03] [PASSED] ttm_pool_free_dma_alloc
[14:42:03] [PASSED] ttm_pool_free_no_dma_alloc
[14:42:03] [PASSED] ttm_pool_fini_basic
[14:42:03] ==================== [PASSED] ttm_pool =====================
[14:42:03] ================ ttm_resource (8 subtests) =================
[14:42:03] ================= ttm_resource_init_basic =================
[14:42:03] [PASSED] Init resource in TTM_PL_SYSTEM
[14:42:03] [PASSED] Init resource in TTM_PL_VRAM
[14:42:03] [PASSED] Init resource in a private placement
[14:42:03] [PASSED] Init resource in TTM_PL_SYSTEM, set placement flags
[14:42:03] ============= [PASSED] ttm_resource_init_basic =============
[14:42:03] [PASSED] ttm_resource_init_pinned
[14:42:03] [PASSED] ttm_resource_fini_basic
[14:42:03] [PASSED] ttm_resource_manager_init_basic
[14:42:03] [PASSED] ttm_resource_manager_usage_basic
[14:42:03] [PASSED] ttm_resource_manager_set_used_basic
[14:42:03] [PASSED] ttm_sys_man_alloc_basic
[14:42:03] [PASSED] ttm_sys_man_free_basic
[14:42:03] ================== [PASSED] ttm_resource ===================
[14:42:03] =================== ttm_tt (15 subtests) ===================
[14:42:03] ==================== ttm_tt_init_basic ====================
[14:42:03] [PASSED] Page-aligned size
[14:42:03] [PASSED] Extra pages requested
[14:42:03] ================ [PASSED] ttm_tt_init_basic ================
[14:42:03] [PASSED] ttm_tt_init_misaligned
[14:42:03] [PASSED] ttm_tt_fini_basic
[14:42:03] [PASSED] ttm_tt_fini_sg
[14:42:03] [PASSED] ttm_tt_fini_shmem
[14:42:03] [PASSED] ttm_tt_create_basic
[14:42:03] [PASSED] ttm_tt_create_invalid_bo_type
[14:42:03] [PASSED] ttm_tt_create_ttm_exists
[14:42:03] [PASSED] ttm_tt_create_failed
[14:42:03] [PASSED] ttm_tt_destroy_basic
[14:42:03] [PASSED] ttm_tt_populate_null_ttm
[14:42:03] [PASSED] ttm_tt_populate_populated_ttm
[14:42:03] [PASSED] ttm_tt_unpopulate_basic
[14:42:03] [PASSED] ttm_tt_unpopulate_empty_ttm
[14:42:03] [PASSED] ttm_tt_swapin_basic
[14:42:03] ===================== [PASSED] ttm_tt ======================
[14:42:03] =================== ttm_bo (14 subtests) ===================
[14:42:03] =========== ttm_bo_reserve_optimistic_no_ticket ===========
[14:42:03] [PASSED] Cannot be interrupted and sleeps
[14:42:03] [PASSED] Cannot be interrupted, locks straight away
[14:42:03] [PASSED] Can be interrupted, sleeps
[14:42:03] ======= [PASSED] ttm_bo_reserve_optimistic_no_ticket =======
[14:42:03] [PASSED] ttm_bo_reserve_locked_no_sleep
[14:42:03] [PASSED] ttm_bo_reserve_no_wait_ticket
[14:42:03] [PASSED] ttm_bo_reserve_double_resv
[14:42:03] [PASSED] ttm_bo_reserve_interrupted
[14:42:03] [PASSED] ttm_bo_reserve_deadlock
[14:42:03] [PASSED] ttm_bo_unreserve_basic
[14:42:03] [PASSED] ttm_bo_unreserve_pinned
[14:42:03] [PASSED] ttm_bo_unreserve_bulk
[14:42:03] [PASSED] ttm_bo_fini_basic
[14:42:03] [PASSED] ttm_bo_fini_shared_resv
[14:42:03] [PASSED] ttm_bo_pin_basic
[14:42:03] [PASSED] ttm_bo_pin_unpin_resource
[14:42:03] [PASSED] ttm_bo_multiple_pin_one_unpin
[14:42:03] ===================== [PASSED] ttm_bo ======================
[14:42:03] ============== ttm_bo_validate (21 subtests) ===============
[14:42:03] ============== ttm_bo_init_reserved_sys_man ===============
[14:42:03] [PASSED] Buffer object for userspace
[14:42:03] [PASSED] Kernel buffer object
[14:42:03] [PASSED] Shared buffer object
[14:42:03] ========== [PASSED] ttm_bo_init_reserved_sys_man ===========
[14:42:03] ============== ttm_bo_init_reserved_mock_man ==============
[14:42:03] [PASSED] Buffer object for userspace
[14:42:03] [PASSED] Kernel buffer object
[14:42:03] [PASSED] Shared buffer object
[14:42:03] ========== [PASSED] ttm_bo_init_reserved_mock_man ==========
[14:42:03] [PASSED] ttm_bo_init_reserved_resv
[14:42:03] ================== ttm_bo_validate_basic ==================
[14:42:03] [PASSED] Buffer object for userspace
[14:42:03] [PASSED] Kernel buffer object
[14:42:03] [PASSED] Shared buffer object
[14:42:03] ============== [PASSED] ttm_bo_validate_basic ==============
[14:42:03] [PASSED] ttm_bo_validate_invalid_placement
[14:42:03] ============= ttm_bo_validate_same_placement ==============
[14:42:03] [PASSED] System manager
[14:42:03] [PASSED] VRAM manager
[14:42:03] ========= [PASSED] ttm_bo_validate_same_placement ==========
[14:42:03] [PASSED] ttm_bo_validate_failed_alloc
[14:42:03] [PASSED] ttm_bo_validate_pinned
[14:42:03] [PASSED] ttm_bo_validate_busy_placement
[14:42:03] ================ ttm_bo_validate_multihop =================
[14:42:03] [PASSED] Buffer object for userspace
[14:42:03] [PASSED] Kernel buffer object
[14:42:03] [PASSED] Shared buffer object
[14:42:03] ============ [PASSED] ttm_bo_validate_multihop =============
[14:42:03] ========== ttm_bo_validate_no_placement_signaled ==========
[14:42:03] [PASSED] Buffer object in system domain, no page vector
[14:42:03] [PASSED] Buffer object in system domain with an existing page vector
[14:42:03] ====== [PASSED] ttm_bo_validate_no_placement_signaled ======
[14:42:03] ======== ttm_bo_validate_no_placement_not_signaled ========
[14:42:03] [PASSED] Buffer object for userspace
[14:42:03] [PASSED] Kernel buffer object
[14:42:03] [PASSED] Shared buffer object
[14:42:03] ==== [PASSED] ttm_bo_validate_no_placement_not_signaled ====
[14:42:03] [PASSED] ttm_bo_validate_move_fence_signaled
[14:42:03] ========= ttm_bo_validate_move_fence_not_signaled =========
[14:42:03] [PASSED] Waits for GPU
[14:42:03] [PASSED] Tries to lock straight away
[14:42:03] ===== [PASSED] ttm_bo_validate_move_fence_not_signaled =====
[14:42:03] [PASSED] ttm_bo_validate_happy_evict
[14:42:03] [PASSED] ttm_bo_validate_all_pinned_evict
[14:42:03] [PASSED] ttm_bo_validate_allowed_only_evict
[14:42:03] [PASSED] ttm_bo_validate_deleted_evict
[14:42:03] [PASSED] ttm_bo_validate_busy_domain_evict
[14:42:03] [PASSED] ttm_bo_validate_evict_gutting
[14:42:03] [PASSED] ttm_bo_validate_recrusive_evict
[14:42:03] ================= [PASSED] ttm_bo_validate =================
[14:42:03] ============================================================
[14:42:03] Testing complete. Ran 101 tests: passed: 101
[14:42:03] Elapsed time: 11.131s total, 1.642s configuring, 9.273s building, 0.182s running
+ cleanup
++ stat -c %u:%g /kernel
+ chown -R 1003:1003 /kernel
* Re: [PATCH v8 1/3] drm/xe/migrate: Use AVX instructions to prevent partial writes during VF migration CCS batch buffer updates
2025-10-24 14:25 ` K V P, Satyanarayana
@ 2025-10-24 15:40 ` Matthew Brost
2025-10-24 16:05 ` Matt Roper
1 sibling, 0 replies; 15+ messages in thread
From: Matthew Brost @ 2025-10-24 15:40 UTC (permalink / raw)
To: K V P, Satyanarayana
Cc: Ville Syrjälä, Rodrigo Vivi, intel-xe, Michal Wajdeczko,
Matthew Auld, Matt Roper
On Fri, Oct 24, 2025 at 07:55:32PM +0530, K V P, Satyanarayana wrote:
>
>
> On 24-10-2025 19:35, Ville Syrjälä wrote:
> > On Fri, Oct 24, 2025 at 09:57:15AM -0400, Rodrigo Vivi wrote:
> > > On Fri, Oct 24, 2025 at 07:05:24PM +0530, Satyanarayana K V P wrote:
> > >
> > > Hi Satya,
> > >
> > > First of all, thank you for the updates.
> > >
> > > Second, the subject is way too big.
> > >
> > > This should be enough and under 75 cols:
> > >
> > > drm/xe: Use AVX instructions to prevent partial writes during VF pause
> > >
> > > more below:
> > >
> > > > VF KMD registers two specialized contexts with the GUC for migration
> > > > operations. The save context contains copy commands and PTEs to transfer
> > > > CCS metadata from GPU pools to system memory, and the restore context
> > > > contains copy commands and PTEs to transfer CCS metadata from system
> > > > memory back to CCS pools. GUC submits these contexts to HW during VF migration.
> > > >
> > > > Each context uses a large batch buffer allocated via sub-allocator,
> > > > pre-filled with MI_NOOPs and terminated with MI_BATCH_BUFFER_END. During
> > > > BO lifecycle management, segments are dynamically allocated from this
> > > > buffer and populated with PTEs and copy commands for active BOs, then reset
> > > > to MI_NOOPs when BOs are destroyed.
> > > >
> > > > The CCS copy operation requires a 5-dword command sequence to be written
> > > > to the batch buffer. During VF migration save/restore operations, if the
> > > > vCPU gets preempted or halted while this command sequence is being
> > > > programmed, partial writes can occur. These partial writes create
> > > > incomplete GPU instructions in the batch buffer, which trigger page faults
> > > > when the GUC submits the batch buffer to hardware for CCS metadata
> > > > operations.
> > >
> > > Perhaps we could summarize the thing here and move details to the comment
> > > near the assembly. The important part in the commit message is to have
> > > the 'why'. Some of the details of the commands like MI_NOOP fill and all
> > > could be in the comment near the ASM.
> > >
> > > >
> > > > Standard memory operations like memcpy() are preemptible, meaning the CPU
> > > > scheduler can interrupt execution midway through writing the command
> > > > sequence, leaving the batch buffer in an inconsistent state with partially
> > > > written GPU instructions.
> > > >
> > > > Replace standard memory operations with x86 AVX instructions that provide
> > > > atomic, non-preemptible writes as AVX instructions cannot be preempted
> > > > during execution, ensuring complete command sequences are written
> > > > atomically to the batch buffer.
> > > >
> > > > Expand EMIT_COPY_CCS_DW from 5 dwords to 8 dwords to align with 256-bit
> > > > VMOVDQU operations. Update emit_flush_invalidate() to use VMOVDQU
> > > > operating with 128-bit chunks. By ensuring GPU instruction headers
> > > > (3-dword and 5-dword sequences) are written atomically, we prevent partial
> > > > updates that could compromise migration stability.
> > > >
> > > > This approach guarantees that batch buffer updates are completed entirely
> > > > or not at all, eliminating the page fault scenarios during VF migration
> > > > operations regardless of vCPU scheduling behavior.
> > > >
> > > > Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
> > > > Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
> > > > Cc: Matthew Brost <matthew.brost@intel.com>
> > > > Cc: Matthew Auld <matthew.auld@intel.com>
> > > > Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
> > > > Cc: Matt Roper <matthew.d.roper@intel.com>
> > > > Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > > >
> > > > ---
> > > > V7 -> V8:
> > > > - Updated commit title and message.
> > > >
> > > > V6 -> V7:
> > > > - Added description explaining why to use assembly instructions for
> > > > atomicity.
> > > > - Assert if DGFX tries to use memcpy_vmovdqu(). (Rodrigo)
> > > > - Include <asm/cpufeature.h> though checkpatch complains. With
> > > > <linux/cpufeature.h> KUnit is throwing errors.
> > > >
> > > > V5 -> V6:
> > > > - Fixed review comments (Rodrigo)
> > > >
> > > > V4 -> V5:
> > > > - Fixed review comments. (Matt B)
> > > >
> > > > V3 -> V4:
> > > > - Fixed review comments. (Wajdeczko)
> > > > - Fix issues reported by patchworks.
> > > >
> > > > V2 -> V3:
> > > > - Added support for 128 bit and 256 bit instructions with memcpy_vmovdqu
> > > > - Updated emit_flush_invalidate() to use vmovdqu instruction.
> > > >
> > > > V1 -> V2:
> > > > - Use memcpy_vmovdqu only for x86 arch and for VF. Else use memcpy
> > > > (Auld, Matthew)
> > > > - Fix issues reported by patchworks.
> > > > ---
> > > > drivers/gpu/drm/xe/xe_migrate.c | 114 ++++++++++++++++++++++++++------
> > > > 1 file changed, 93 insertions(+), 21 deletions(-)
> > > >
> > > > diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
> > > > index 921c9c1ea41f..005dc26a0393 100644
> > > > --- a/drivers/gpu/drm/xe/xe_migrate.c
> > > > +++ b/drivers/gpu/drm/xe/xe_migrate.c
> > > > @@ -5,6 +5,8 @@
> > > > #include "xe_migrate.h"
> > > > +#include <asm/fpu/api.h>
> > > > +#include <asm/cpufeature.h>
> > > > #include <linux/bitfield.h>
> > > > #include <linux/sizes.h>
> > > > @@ -33,6 +35,7 @@
> > > > #include "xe_res_cursor.h"
> > > > #include "xe_sa.h"
> > > > #include "xe_sched_job.h"
> > > > +#include "xe_sriov_vf_ccs.h"
> > > > #include "xe_sync.h"
> > > > #include "xe_trace_bo.h"
> > > > #include "xe_validation.h"
> > > > @@ -657,18 +660,70 @@ static void emit_pte(struct xe_migrate *m,
> > > > }
> > > > }
> > > > -#define EMIT_COPY_CCS_DW 5
> > > > +/*
> > > > + * VF KMD registers two special LRCs with the GuC to handle save/restore
> > > > + * operations for CCS metadata on IGPU. GuC executes these LRCs during
> > > > + * VF save/restore operations.
> > > > + *
> > > > + * Each LRC contains a batch buffer pool that GuC submits to hardware during
> > > > + * VF state save/restore operations. Since these operations can occur
> > > > + * asynchronously at any time, we must ensure GPU instructions in the batch
> > > > + * buffer are written atomically to prevent corruption from incomplete writes.
> > > > + *
> > > > + * To guarantee atomic instruction writes, we use x86 SIMD instructions
> > >
> > > Here you still mention 'atomic' since we already know this is not 'atomic'.
> >
> > I still don't see how is this supposed to do anything useful without
> > atomic writes to memory.
> >
> > If the GPU is executing the same memory we're writing then nothing
> > short of atomic memory writes is going to actually fix it. And even
The vCPUs are halted when the save buffer is executed, so the only concern
is whether GPU instructions are left partially programmed. Writing each
complete GPU instruction with a single CPU instruction keeps the buffer
consistent.
On the restore side, both the vCPU and GuC are active, but we have
barriers in place. Additionally, we should be able to guarantee that the
buffer isn’t modified until execution is complete (not handled in this
patch, but planned for a follow-up).
> > that would require careful alignment of things to guarantee that
> > each command is completely contained within one atomic write.
> >
> The CPU and GPU operate on the same memory space but at different times
> during VF migration. The critical issue occurs during the batch buffer
> preparation phase when the vCPU is still active and writing GPU
> instructions, while the GPU will later execute these same instructions after
> the vCPU is paused.
>
> During batch buffer updates, if the vCPU gets preempted while writing GPU
> instruction sequences (such as the 5-dword CCS copy command), it leaves
> partially written instructions in memory. When the GPU later executes the
> batch buffer after vCPU suspension, these incomplete instructions cause
> execution failures and page faults.
>
> AVX instructions provide atomic write operations that cannot be interrupted
The word atomic is causing confusion. No, AVX instructions aren't
cache-atomic if unaligned, and it's unclear if they are even when
aligned. But as Satya mentioned, it's a single instruction. Halting the
vCPU guarantees that an instruction is either fully executed or not at
all—meaning it's either entirely visible in memory or not.
Caching isn't a concern here. The GuC and hardware parsing these
instructions are cache-coherent, and there are no ordering issues like
in interfaces that require cachelines to appear in a specific sequence.
Can we drop the word atomic? It seems to be confusing everyone.
Matt
> by the CPU scheduler. This ensures that GPU instruction sequences are
> written completely before any potential vCPU preemption occurs.
>
> AVX instructions (VMOVDQU) guarantee that entire instruction sequences are
> written in a single, non-preemptible operation. The 5-dword CCS copy command
> is expanded to 8 dwords (padded with 3 MI_NOOPs) to meet AVX 256-bit
> alignment requirements. By the time the GPU executes the batch buffer (after
> vCPU pause), all instructions are guaranteed to be completely written.
>
> Here we are ensuring that GPU instructions are fully formed before the GPU
> attempts to execute them during the migration process.
>
> -Satya.>>
> > > Leave a summarized explanation in the commit message and put more here.
> > >
> > > I'm sorry for being picky here, but I want to ensure that the information
> > > around this code is clear so we don't keep having to explain this over
> > > and over in the future.
> > >
> > > > + * (128-bit XMM and 256-bit YMM) within kernel_fpu_begin()/kernel_fpu_end()
> > > > + * sections. This prevents vCPU preemption during instruction generation,
> > > > + * ensuring complete GPU commands are written to the batch buffer.
> > > > + */
> > > > +
> > > > +static void memcpy_vmovdqu(struct xe_device *xe, void *dst, const void *src, u32 size)
> > > > +{
> > > > + xe_assert(xe, !IS_DGFX(xe));
> > > > + xe_assert(xe, IS_SRIOV_VF(xe));
> > > > +
> > > > +#ifdef CONFIG_X86
> > > > + kernel_fpu_begin();
> > > > + if (size == SZ_128) {
> > > > + asm("vmovdqu (%0), %%xmm0\n"
> > > > + "vmovups %%xmm0, (%1)\n"
> > > > + :: "r" (src), "r" (dst) : "memory");
> > > > + } else if (size == SZ_256) {
> > > > + asm("vmovdqu (%0), %%ymm0\n"
> > > > + "vmovups %%ymm0, (%1)\n"
> > > > + :: "r" (src), "r" (dst) : "memory");
> > > > + }
> > > > + kernel_fpu_end();
> > > > +#endif
> > > > +}
> > > > +
> > > > +static void emit_atomic(struct xe_gt *gt, void *dst, const void *src, u32 size)
> > > > +{
> > > > + u32 instr_size = size * BITS_PER_BYTE;
> > > > +
> > > > + xe_gt_assert(gt, instr_size == SZ_128 || instr_size == SZ_256);
> > > > +
> > > > + if (IS_VF_CCS_READY(gt_to_xe(gt))) {
> > > > + xe_gt_assert(gt, static_cpu_has(X86_FEATURE_AVX));
> > > > + memcpy_vmovdqu(gt_to_xe(gt), dst, src, instr_size);
> > > > + } else {
> > > > + memcpy(dst, src, size);
> > > > + }
> > > > +}
> > > > +
> > > > +#define EMIT_COPY_CCS_DW 8
> > > > static void emit_copy_ccs(struct xe_gt *gt, struct xe_bb *bb,
> > > > u64 dst_ofs, bool dst_is_indirect,
> > > > u64 src_ofs, bool src_is_indirect,
> > > > u32 size)
> > > > {
> > > > + u32 dw[EMIT_COPY_CCS_DW] = {MI_NOOP};
> > > > struct xe_device *xe = gt_to_xe(gt);
> > > > u32 *cs = bb->cs + bb->len;
> > > > u32 num_ccs_blks;
> > > > u32 num_pages;
> > > > u32 ccs_copy_size;
> > > > u32 mocs;
> > > > + u32 i = 0;
> > > > if (GRAPHICS_VERx100(xe) >= 2000) {
> > > > num_pages = DIV_ROUND_UP(size, XE_PAGE_SIZE);
> > > > @@ -686,15 +741,23 @@ static void emit_copy_ccs(struct xe_gt *gt, struct xe_bb *bb,
> > > > mocs = FIELD_PREP(XY_CTRL_SURF_MOCS_MASK, gt->mocs.uc_index);
> > > > }
> > > > - *cs++ = XY_CTRL_SURF_COPY_BLT |
> > > > - (src_is_indirect ? 0x0 : 0x1) << SRC_ACCESS_TYPE_SHIFT |
> > > > - (dst_is_indirect ? 0x0 : 0x1) << DST_ACCESS_TYPE_SHIFT |
> > > > - ccs_copy_size;
> > > > - *cs++ = lower_32_bits(src_ofs);
> > > > - *cs++ = upper_32_bits(src_ofs) | mocs;
> > > > - *cs++ = lower_32_bits(dst_ofs);
> > > > - *cs++ = upper_32_bits(dst_ofs) | mocs;
> > > > + dw[i++] = XY_CTRL_SURF_COPY_BLT |
> > > > + (src_is_indirect ? 0x0 : 0x1) << SRC_ACCESS_TYPE_SHIFT |
> > > > + (dst_is_indirect ? 0x0 : 0x1) << DST_ACCESS_TYPE_SHIFT |
> > > > + ccs_copy_size;
> > > > + dw[i++] = lower_32_bits(src_ofs);
> > > > + dw[i++] = upper_32_bits(src_ofs) | mocs;
> > > > + dw[i++] = lower_32_bits(dst_ofs);
> > > > + dw[i++] = upper_32_bits(dst_ofs) | mocs;
> > > > + /*
> > > > + * The CCS copy command is a 5-dword sequence. If the vCPU halts during
> > > > + * save/restore while this sequence is being issued, partial writes may trigger
> > > > + * page faults when saving iGPU CCS metadata. Use the VMOVDQU instruction to
> > > > + * write the sequence atomically.
> > > > + */
> > > > + emit_atomic(gt, cs, dw, sizeof(dw));
> > > > + cs += EMIT_COPY_CCS_DW;
> > > > bb->len = cs - bb->cs;
> > > > }
> > > > @@ -1061,18 +1124,27 @@ static u64 migrate_vm_ppgtt_addr_tlb_inval(void)
> > > > return (NUM_KERNEL_PDE - 2) * XE_PAGE_SIZE;
> > > > }
> > > > -static int emit_flush_invalidate(u32 *dw, int i, u32 flags)
> > > > +/*
> > > > + * The MI_FLUSH_DW command is a 4-dword sequence. If the vCPU halts during
> > > > + * save/restore while this sequence is being issued, partial writes may
> > > > + * trigger page faults when saving iGPU CCS metadata. Use
> > > > + * emit_atomic() to write the sequence atomically.
> > > > + */
> > > > +#define EMIT_FLUSH_INVALIDATE_DW 4
> > > > +static int emit_flush_invalidate(struct xe_exec_queue *q, u32 *cs, int i, u32 flags)
> > > > {
> > > > u64 addr = migrate_vm_ppgtt_addr_tlb_inval();
> > > > + u32 dw[EMIT_FLUSH_INVALIDATE_DW] = {MI_NOOP}, j = 0;
> > > > +
> > > > + dw[j++] = MI_FLUSH_DW | MI_INVALIDATE_TLB | MI_FLUSH_DW_OP_STOREDW |
> > > > + MI_FLUSH_IMM_DW | flags;
> > > > + dw[j++] = lower_32_bits(addr);
> > > > + dw[j++] = upper_32_bits(addr);
> > > > + dw[j++] = MI_NOOP;
> > > > - dw[i++] = MI_FLUSH_DW | MI_INVALIDATE_TLB | MI_FLUSH_DW_OP_STOREDW |
> > > > - MI_FLUSH_IMM_DW | flags;
> > > > - dw[i++] = lower_32_bits(addr);
> > > > - dw[i++] = upper_32_bits(addr);
> > > > - dw[i++] = MI_NOOP;
> > > > - dw[i++] = MI_NOOP;
> > > > + emit_atomic(q->gt, &cs[i], dw, sizeof(dw));
> > > > - return i;
> > > > + return i + j;
> > > > }
> > > > /**
> > > > @@ -1117,7 +1189,7 @@ int xe_migrate_ccs_rw_copy(struct xe_tile *tile, struct xe_exec_queue *q,
> > > > /* Calculate Batch buffer size */
> > > > batch_size = 0;
> > > > while (size) {
> > > > - batch_size += 10; /* Flush + ggtt addr + 2 NOP */
> > > > + batch_size += EMIT_FLUSH_INVALIDATE_DW * 2; /* Flush + ggtt addr + 1 NOP */
> > > > u64 ccs_ofs, ccs_size;
> > > > u32 ccs_pt;
> > > > @@ -1158,7 +1230,7 @@ int xe_migrate_ccs_rw_copy(struct xe_tile *tile, struct xe_exec_queue *q,
> > > > * sizes here again before copy command is emitted.
> > > > */
> > > > while (size) {
> > > > - batch_size += 10; /* Flush + ggtt addr + 2 NOP */
> > > > + batch_size += EMIT_FLUSH_INVALIDATE_DW * 2; /* Flush + ggtt addr + 1 NOP */
> > > > u32 flush_flags = 0;
> > > > u64 ccs_ofs, ccs_size;
> > > > u32 ccs_pt;
> > > > @@ -1181,11 +1253,11 @@ int xe_migrate_ccs_rw_copy(struct xe_tile *tile, struct xe_exec_queue *q,
> > > > emit_pte(m, bb, ccs_pt, false, false, &ccs_it, ccs_size, src);
> > > > - bb->len = emit_flush_invalidate(bb->cs, bb->len, flush_flags);
> > > > + bb->len = emit_flush_invalidate(q, bb->cs, bb->len, flush_flags);
> > > > flush_flags = xe_migrate_ccs_copy(m, bb, src_L0_ofs, src_is_pltt,
> > > > src_L0_ofs, dst_is_pltt,
> > > > src_L0, ccs_ofs, true);
> > > > - bb->len = emit_flush_invalidate(bb->cs, bb->len, flush_flags);
> > > > + bb->len = emit_flush_invalidate(q, bb->cs, bb->len, flush_flags);
> > > > size -= src_L0;
> > > > }
> > > > --
> > > > 2.51.0
> > > >
> >
>
* ✓ Xe.CI.BAT: success for drm/xe/migrate: Atomicize CCS copy command setup
2025-10-24 13:35 [PATCH v8 0/3] drm/xe/migrate: Atomicize CCS copy command setup Satyanarayana K V P
` (4 preceding siblings ...)
2025-10-24 14:42 ` ✓ CI.KUnit: success " Patchwork
@ 2025-10-24 15:48 ` Patchwork
2025-10-25 3:47 ` ✓ Xe.CI.Full: " Patchwork
6 siblings, 0 replies; 15+ messages in thread
From: Patchwork @ 2025-10-24 15:48 UTC (permalink / raw)
To: K V P, Satyanarayana; +Cc: intel-xe
== Series Details ==
Series: drm/xe/migrate: Atomicize CCS copy command setup
URL : https://patchwork.freedesktop.org/series/156482/
State : success
== Summary ==
CI Bug Log - changes from xe-3981-1e54d2c469a91e00a39ff7f6b98c31d290245ecf_BAT -> xe-pw-156482v1_BAT
====================================================
Summary
-------
**SUCCESS**
No regressions found.
Participating hosts (13 -> 12)
------------------------------
Missing (1): bat-ptl-vm
Changes
-------
No changes found
Build changes
-------------
* Linux: xe-3981-1e54d2c469a91e00a39ff7f6b98c31d290245ecf -> xe-pw-156482v1
IGT_8596: 8596
xe-3981-1e54d2c469a91e00a39ff7f6b98c31d290245ecf: 1e54d2c469a91e00a39ff7f6b98c31d290245ecf
xe-pw-156482v1: 156482v1
== Logs ==
For more details see: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/index.html
* Re: [PATCH v8 1/3] drm/xe/migrate: Use AVX instructions to prevent partial writes during VF migration CCS batch buffer updates
2025-10-24 14:25 ` K V P, Satyanarayana
2025-10-24 15:40 ` Matthew Brost
@ 2025-10-24 16:05 ` Matt Roper
2025-10-24 16:10 ` Matthew Brost
1 sibling, 1 reply; 15+ messages in thread
From: Matt Roper @ 2025-10-24 16:05 UTC (permalink / raw)
To: K V P, Satyanarayana
Cc: Ville Syrjälä, Rodrigo Vivi, intel-xe, Michal Wajdeczko,
Matthew Brost, Matthew Auld
On Fri, Oct 24, 2025 at 07:55:32PM +0530, K V P, Satyanarayana wrote:
>
>
> On 24-10-2025 19:35, Ville Syrjälä wrote:
> > On Fri, Oct 24, 2025 at 09:57:15AM -0400, Rodrigo Vivi wrote:
> > > On Fri, Oct 24, 2025 at 07:05:24PM +0530, Satyanarayana K V P wrote:
> > >
> > > Hi Satya,
> > >
> > > First of all, thank you for the updates.
> > >
> > > Second, the subject is way too big.
> > >
> > > This should be enough and under 75 cols:
> > >
> > > drm/xe: Use AVX instructions to prevent partial writes during VF pause
> > >
> > > more below:
> > >
> > > > VF KMD registers two specialized contexts with the GUC for migration
> > > > operations. The save context contains copy commands and PTEs to transfer
> > > > CCS metadata from GPU pools to system memory, and the restore context
> > > > contains copy commands and PTEs to transfer CCS metadata from system
> > > > memory back to CCS pools. GUC submits these contexts to HW during VF migration.
> > > >
> > > > Each context uses a large batch buffer allocated via sub-allocator,
> > > > pre-filled with MI_NOOPs and terminated with MI_BATCH_BUFFER_END. During
> > > > BO lifecycle management, segments are dynamically allocated from this
> > > > buffer and populated with PTEs and copy commands for active BOs, then reset
> > > > to MI_NOOPs when BOs are destroyed.
> > > >
> > > > The CCS copy operation requires a 5-dword command sequence to be written
> > > > to the batch buffer. During VF migration save/restore operations, if the
> > > > vCPU gets preempted or halted while this command sequence is being
> > > > programmed, partial writes can occur. These partial writes create
> > > > incomplete GPU instructions in the batch buffer, which trigger page faults
> > > > when the GUC submits the batch buffer to hardware for CCS metadata
> > > > operations.
> > >
> > > Perhaps we could summarize the thing here and move details to the comment
> > > near the assembly. The important part in the commit message is to have
> > > the 'why'. Some of the details of the commands like MI_NOOP fill and all
> > > could be in the comment near the ASM.
> > >
> > > >
> > > > Standard memory operations like memcpy() are preemptible, meaning the CPU
> > > > scheduler can interrupt execution midway through writing the command
> > > > sequence, leaving the batch buffer in an inconsistent state with partially
> > > > written GPU instructions.
> > > >
> > > > Replace standard memory operations with x86 AVX instructions that provide
> > > > atomic, non-preemptible writes as AVX instructions cannot be preempted
> > > > during execution, ensuring complete command sequences are written
> > > > atomically to the batch buffer.
> > > >
> > > > Expand EMIT_COPY_CCS_DW from 5 dwords to 8 dwords to align with 256-bit
> > > > VMOVDQU operations. Update emit_flush_invalidate() to use VMOVDQU
> > > > operating with 128-bit chunks. By ensuring GPU instruction headers
> > > > (3-dword and 5-dword sequences) are written atomically, we prevent partial
> > > > updates that could compromise migration stability.
> > > >
> > > > This approach guarantees that batch buffer updates are completed entirely
> > > > or not at all, eliminating the page fault scenarios during VF migration
> > > > operations regardless of vCPU scheduling behavior.
> > > >
> > > > Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
> > > > Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
> > > > Cc: Matthew Brost <matthew.brost@intel.com>
> > > > Cc: Matthew Auld <matthew.auld@intel.com>
> > > > Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
> > > > Cc: Matt Roper <matthew.d.roper@intel.com>
> > > > Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > > >
> > > > ---
> > > > V7 -> V8:
> > > > - Updated commit title and message.
> > > >
> > > > V6 -> V7:
> > > > - Added description explaining why to use assembly instructions for
> > > > atomicity.
> > > > - Assert if DGFX tries to use memcpy_vmovdqu(). (Rodrigo)
> > > > - Include <asm/cpufeature.h> though checkpatch complains. With
> > > > <linux/cpufeature.h> KUnit is throwing errors.
> > > >
> > > > V5 -> V6:
> > > > - Fixed review comments (Rodrigo)
> > > >
> > > > V4 -> V5:
> > > > - Fixed review comments. (Matt B)
> > > >
> > > > V3 -> V4:
> > > > - Fixed review comments. (Wajdeczko)
> > > > - Fix issues reported by patchworks.
> > > >
> > > > V2 -> V3:
> > > > - Added support for 128 bit and 256 bit instructions with memcpy_vmovdqu
> > > > - Updated emit_flush_invalidate() to use vmovdqu instruction.
> > > >
> > > > V1 -> V2:
> > > > - Use memcpy_vmovdqu only for x86 arch and for VF. Else use memcpy
> > > > (Auld, Matthew)
> > > > - Fix issues reported by patchworks.
> > > > ---
> > > > drivers/gpu/drm/xe/xe_migrate.c | 114 ++++++++++++++++++++++++++------
> > > > 1 file changed, 93 insertions(+), 21 deletions(-)
> > > >
> > > > diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
> > > > index 921c9c1ea41f..005dc26a0393 100644
> > > > --- a/drivers/gpu/drm/xe/xe_migrate.c
> > > > +++ b/drivers/gpu/drm/xe/xe_migrate.c
> > > > @@ -5,6 +5,8 @@
> > > > #include "xe_migrate.h"
> > > > +#include <asm/fpu/api.h>
> > > > +#include <asm/cpufeature.h>
> > > > #include <linux/bitfield.h>
> > > > #include <linux/sizes.h>
> > > > @@ -33,6 +35,7 @@
> > > > #include "xe_res_cursor.h"
> > > > #include "xe_sa.h"
> > > > #include "xe_sched_job.h"
> > > > +#include "xe_sriov_vf_ccs.h"
> > > > #include "xe_sync.h"
> > > > #include "xe_trace_bo.h"
> > > > #include "xe_validation.h"
> > > > @@ -657,18 +660,70 @@ static void emit_pte(struct xe_migrate *m,
> > > > }
> > > > }
> > > > -#define EMIT_COPY_CCS_DW 5
> > > > +/*
> > > > + * VF KMD registers two special LRCs with the GuC to handle save/restore
> > > > + * operations for CCS metadata on IGPU. GuC executes these LRCs during
> > > > + * VF save/restore operations.
> > > > + *
> > > > + * Each LRC contains a batch buffer pool that GuC submits to hardware during
> > > > + * VF state save/restore operations. Since these operations can occur
> > > > + * asynchronously at any time, we must ensure GPU instructions in the batch
> > > > + * buffer are written atomically to prevent corruption from incomplete writes.
> > > > + *
> > > > + * To guarantee atomic instruction writes, we use x86 SIMD instructions
> > >
> > > Here you still mention 'atomic' since we already know this is not 'atomic'.
> >
> > I still don't see how is this supposed to do anything useful without
> > atomic writes to memory.
> >
> > If the GPU is executing the same memory we're writing then nothing
> > short of atomic memory writes is going to actually fix it. And even
> > that would require careful alignment of things to guarantee that
> > each command is completely contained within one atomic write.
> >
> The CPU and GPU operate on the same memory space but at different times
> during VF migration. The critical issue occurs during the batch buffer
> preparation phase when the vCPU is still active and writing GPU
> instructions, while the GPU will later execute these same instructions after
> the vCPU is paused.
>
> During batch buffer updates, if the vCPU gets preempted while writing GPU
> instruction sequences (such as the 5-dword CCS copy command), it leaves
> partially written instructions in memory. When the GPU later executes the
> batch buffer after vCPU suspension, these incomplete instructions cause
> execution failures and page faults.
As was discussed on the previous revision, the architecture document
already gives guidance on approaches to deal with these timing issues;
using AVX like this is not what was recommended. Can't we just
implement the shadow buffer and eliminate this controversial and
confusing assembly usage? I think relying on assembly should be the
absolute last resort and not something we jump to when we have cleaner
and more widely-supported options.
Matt
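[Editor's note: the shadow-buffer alternative Matt mentions could look roughly like the sketch below: the CPU only ever edits an inactive copy and then publishes it with one naturally aligned 32-bit store, so no wide SIMD write is needed. This is a hypothetical userspace model, not the scheme from the architecture document:]

```c
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

#define BB_DWORDS 8

/* Two copies of a batch-buffer segment; 'active' selects the one the
 * GPU-visible descriptor points at.  The CPU edits only the inactive
 * copy and flips 'active' with a single aligned atomic store, so a
 * vCPU pause can never expose a half-written command stream. */
struct shadow_bb {
	uint32_t copy[2][BB_DWORDS];
	_Atomic uint32_t active;	/* 0 or 1 */
};

static void shadow_bb_update(struct shadow_bb *bb,
			     const uint32_t *cmds, size_t n)
{
	uint32_t next = 1 - atomic_load(&bb->active);

	memcpy(bb->copy[next], cmds, n * sizeof(uint32_t));
	/* publish: one aligned 32-bit store, atomic on x86 */
	atomic_store(&bb->active, next);
}

static const uint32_t *shadow_bb_current(struct shadow_bb *bb)
{
	return bb->copy[atomic_load(&bb->active)];
}
```

[The cost is the second copy of each segment plus a level of indirection when resolving the current buffer, which is the overhead Matthew Brost weighs against the asm approach later in the thread.]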
>
> AVX instructions provide atomic write operations that cannot be interrupted
> by the CPU scheduler. This ensures that GPU instruction sequences are
> written completely before any potential vCPU preemption occurs.
>
> AVX instructions (VMOVDQU) guarantee that entire instruction sequences are
> written in a single, non-preemptible operation. The 5-dword CCS copy command
> is expanded to 8 dwords (padded with 3 MI_NOOPs) to meet AVX 256-bit
> alignment requirements. By the time the GPU executes the batch buffer (after
> vCPU pause), all instructions are guaranteed to be completely written.
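[Editor's note: the 5-to-8 dword padding described above amounts to staging the command in a NOOP-filled 32-byte array before the single wide store. A minimal sketch (the MI_NOOP encoding and names here are assumed for illustration; the real emit also fills the src/dst offsets and MOCS fields):]

```c
#include <stdint.h>
#include <string.h>

#define MI_NOOP     0u	/* assumed encoding, for illustration */
#define CCS_CMD_DW  5	/* actual XY_CTRL_SURF_COPY_BLT length */
#define CCS_EMIT_DW 8	/* padded to 32 B for one 256-bit store */

/* Copy the 5 command dwords into an 8-dword staging array whose tail
 * is pre-filled with MI_NOOPs, so a single 256-bit store can emit the
 * whole, always-valid sequence into the batch buffer. */
static void build_padded_ccs_cmd(uint32_t out[CCS_EMIT_DW],
				 const uint32_t cmd[CCS_CMD_DW])
{
	for (int i = CCS_CMD_DW; i < CCS_EMIT_DW; i++)
		out[i] = MI_NOOP;
	memcpy(out, cmd, CCS_CMD_DW * sizeof(uint32_t));
}
```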
>
> Here we are ensuring that GPU instructions are fully formed before the GPU
> attempts to execute them during the migration process.
>
> > -Satya.
> >
> > > Leave a summarized explanation in the commit message and put more here.
> > >
> > > I'm sorry for being picky here, but I want to ensure that the information
> > > around this code is clear so we don't keep having to explain this over
> > > and over in the future.
> > >
> > > > + * (128-bit XMM and 256-bit YMM) within kernel_fpu_begin()/kernel_fpu_end()
> > > > + * sections. This prevents vCPU preemption during instruction generation,
> > > > + * ensuring complete GPU commands are written to the batch buffer.
> > > > + */
> > > > +
> > > > +static void memcpy_vmovdqu(struct xe_device *xe, void *dst, const void *src, u32 size)
> > > > +{
> > > > + xe_assert(xe, !IS_DGFX(xe));
> > > > + xe_assert(xe, IS_SRIOV_VF(xe));
> > > > +
> > > > +#ifdef CONFIG_X86
> > > > + kernel_fpu_begin();
> > > > + if (size == SZ_128) {
> > > > + asm("vmovdqu (%0), %%xmm0\n"
> > > > + "vmovups %%xmm0, (%1)\n"
> > > > + :: "r" (src), "r" (dst) : "memory");
> > > > + } else if (size == SZ_256) {
> > > > + asm("vmovdqu (%0), %%ymm0\n"
> > > > + "vmovups %%ymm0, (%1)\n"
> > > > + :: "r" (src), "r" (dst) : "memory");
> > > > + }
> > > > + kernel_fpu_end();
> > > > +#endif
> > > > +}
> > > > +
> > > > +static void emit_atomic(struct xe_gt *gt, void *dst, const void *src, u32 size)
> > > > +{
> > > > + u32 instr_size = size * BITS_PER_BYTE;
> > > > +
> > > > + xe_gt_assert(gt, instr_size == SZ_128 || instr_size == SZ_256);
> > > > +
> > > > + if (IS_VF_CCS_READY(gt_to_xe(gt))) {
> > > > + xe_gt_assert(gt, static_cpu_has(X86_FEATURE_AVX));
> > > > + memcpy_vmovdqu(gt_to_xe(gt), dst, src, instr_size);
> > > > + } else {
> > > > + memcpy(dst, src, size);
> > > > + }
> > > > +}
> > > > +
> > > > +#define EMIT_COPY_CCS_DW 8
> > > > static void emit_copy_ccs(struct xe_gt *gt, struct xe_bb *bb,
> > > > u64 dst_ofs, bool dst_is_indirect,
> > > > u64 src_ofs, bool src_is_indirect,
> > > > u32 size)
> > > > {
> > > > + u32 dw[EMIT_COPY_CCS_DW] = {MI_NOOP};
> > > > struct xe_device *xe = gt_to_xe(gt);
> > > > u32 *cs = bb->cs + bb->len;
> > > > u32 num_ccs_blks;
> > > > u32 num_pages;
> > > > u32 ccs_copy_size;
> > > > u32 mocs;
> > > > + u32 i = 0;
> > > > if (GRAPHICS_VERx100(xe) >= 2000) {
> > > > num_pages = DIV_ROUND_UP(size, XE_PAGE_SIZE);
> > > > @@ -686,15 +741,23 @@ static void emit_copy_ccs(struct xe_gt *gt, struct xe_bb *bb,
> > > > mocs = FIELD_PREP(XY_CTRL_SURF_MOCS_MASK, gt->mocs.uc_index);
> > > > }
> > > > - *cs++ = XY_CTRL_SURF_COPY_BLT |
> > > > - (src_is_indirect ? 0x0 : 0x1) << SRC_ACCESS_TYPE_SHIFT |
> > > > - (dst_is_indirect ? 0x0 : 0x1) << DST_ACCESS_TYPE_SHIFT |
> > > > - ccs_copy_size;
> > > > - *cs++ = lower_32_bits(src_ofs);
> > > > - *cs++ = upper_32_bits(src_ofs) | mocs;
> > > > - *cs++ = lower_32_bits(dst_ofs);
> > > > - *cs++ = upper_32_bits(dst_ofs) | mocs;
> > > > + dw[i++] = XY_CTRL_SURF_COPY_BLT |
> > > > + (src_is_indirect ? 0x0 : 0x1) << SRC_ACCESS_TYPE_SHIFT |
> > > > + (dst_is_indirect ? 0x0 : 0x1) << DST_ACCESS_TYPE_SHIFT |
> > > > + ccs_copy_size;
> > > > + dw[i++] = lower_32_bits(src_ofs);
> > > > + dw[i++] = upper_32_bits(src_ofs) | mocs;
> > > > + dw[i++] = lower_32_bits(dst_ofs);
> > > > + dw[i++] = upper_32_bits(dst_ofs) | mocs;
> > > > + /*
> > > > + * The CCS copy command is a 5-dword sequence. If the vCPU halts during
> > > > + * save/restore while this sequence is being issued, partial writes may trigger
> > > > + * page faults when saving iGPU CCS metadata. Use the VMOVDQU instruction to
> > > > + * write the sequence atomically.
> > > > + */
> > > > + emit_atomic(gt, cs, dw, sizeof(dw));
> > > > + cs += EMIT_COPY_CCS_DW;
> > > > bb->len = cs - bb->cs;
> > > > }
> > > > @@ -1061,18 +1124,27 @@ static u64 migrate_vm_ppgtt_addr_tlb_inval(void)
> > > > return (NUM_KERNEL_PDE - 2) * XE_PAGE_SIZE;
> > > > }
> > > > -static int emit_flush_invalidate(u32 *dw, int i, u32 flags)
> > > > +/*
> > > > + * The MI_FLUSH_DW command is a 4-dword sequence. If the vCPU halts during
> > > > + * save/restore while this sequence is being issued, partial writes may
> > > > + * trigger page faults when saving iGPU CCS metadata. Use
> > > > + * emit_atomic() to write the sequence atomically.
> > > > + */
> > > > +#define EMIT_FLUSH_INVALIDATE_DW 4
> > > > +static int emit_flush_invalidate(struct xe_exec_queue *q, u32 *cs, int i, u32 flags)
> > > > {
> > > > u64 addr = migrate_vm_ppgtt_addr_tlb_inval();
> > > > + u32 dw[EMIT_FLUSH_INVALIDATE_DW] = {MI_NOOP}, j = 0;
> > > > +
> > > > + dw[j++] = MI_FLUSH_DW | MI_INVALIDATE_TLB | MI_FLUSH_DW_OP_STOREDW |
> > > > + MI_FLUSH_IMM_DW | flags;
> > > > + dw[j++] = lower_32_bits(addr);
> > > > + dw[j++] = upper_32_bits(addr);
> > > > + dw[j++] = MI_NOOP;
> > > > - dw[i++] = MI_FLUSH_DW | MI_INVALIDATE_TLB | MI_FLUSH_DW_OP_STOREDW |
> > > > - MI_FLUSH_IMM_DW | flags;
> > > > - dw[i++] = lower_32_bits(addr);
> > > > - dw[i++] = upper_32_bits(addr);
> > > > - dw[i++] = MI_NOOP;
> > > > - dw[i++] = MI_NOOP;
> > > > + emit_atomic(q->gt, &cs[i], dw, sizeof(dw));
> > > > - return i;
> > > > + return i + j;
> > > > }
> > > > /**
> > > > @@ -1117,7 +1189,7 @@ int xe_migrate_ccs_rw_copy(struct xe_tile *tile, struct xe_exec_queue *q,
> > > > /* Calculate Batch buffer size */
> > > > batch_size = 0;
> > > > while (size) {
> > > > - batch_size += 10; /* Flush + ggtt addr + 2 NOP */
> > > > + batch_size += EMIT_FLUSH_INVALIDATE_DW * 2; /* Flush + ggtt addr + 1 NOP */
> > > > u64 ccs_ofs, ccs_size;
> > > > u32 ccs_pt;
> > > > @@ -1158,7 +1230,7 @@ int xe_migrate_ccs_rw_copy(struct xe_tile *tile, struct xe_exec_queue *q,
> > > > * sizes here again before copy command is emitted.
> > > > */
> > > > while (size) {
> > > > - batch_size += 10; /* Flush + ggtt addr + 2 NOP */
> > > > + batch_size += EMIT_FLUSH_INVALIDATE_DW * 2; /* Flush + ggtt addr + 1 NOP */
> > > > u32 flush_flags = 0;
> > > > u64 ccs_ofs, ccs_size;
> > > > u32 ccs_pt;
> > > > @@ -1181,11 +1253,11 @@ int xe_migrate_ccs_rw_copy(struct xe_tile *tile, struct xe_exec_queue *q,
> > > > emit_pte(m, bb, ccs_pt, false, false, &ccs_it, ccs_size, src);
> > > > - bb->len = emit_flush_invalidate(bb->cs, bb->len, flush_flags);
> > > > + bb->len = emit_flush_invalidate(q, bb->cs, bb->len, flush_flags);
> > > > flush_flags = xe_migrate_ccs_copy(m, bb, src_L0_ofs, src_is_pltt,
> > > > src_L0_ofs, dst_is_pltt,
> > > > src_L0, ccs_ofs, true);
> > > > - bb->len = emit_flush_invalidate(bb->cs, bb->len, flush_flags);
> > > > + bb->len = emit_flush_invalidate(q, bb->cs, bb->len, flush_flags);
> > > > size -= src_L0;
> > > > }
> > > > --
> > > > 2.51.0
> > > >
> >
>
--
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation
* Re: [PATCH v8 1/3] drm/xe/migrate: Use AVX instructions to prevent partial writes during VF migration CCS batch buffer updates
2025-10-24 16:05 ` Matt Roper
@ 2025-10-24 16:10 ` Matthew Brost
2025-10-24 20:07 ` Vivi, Rodrigo
0 siblings, 1 reply; 15+ messages in thread
From: Matthew Brost @ 2025-10-24 16:10 UTC (permalink / raw)
To: Matt Roper
Cc: K V P, Satyanarayana, Ville Syrjälä, Rodrigo Vivi,
intel-xe, Michal Wajdeczko, Matthew Auld
On Fri, Oct 24, 2025 at 09:05:12AM -0700, Matt Roper wrote:
> On Fri, Oct 24, 2025 at 07:55:32PM +0530, K V P, Satyanarayana wrote:
> >
> >
> > On 24-10-2025 19:35, Ville Syrjälä wrote:
> > > On Fri, Oct 24, 2025 at 09:57:15AM -0400, Rodrigo Vivi wrote:
> > > > On Fri, Oct 24, 2025 at 07:05:24PM +0530, Satyanarayana K V P wrote:
> > > >
> > > > Hi Satya,
> > > >
> > > > First of all, thank you for the updates.
> > > >
> > > > Second, the subject is way too big.
> > > >
> > > > This should be enough and under 75 cols:
> > > >
> > > > drm/xe: Use AVX instructions to prevent partial writes during VF pause
> > > >
> > > > more below:
> > > >
> > > > > VF KMD registers two specialized contexts with the GUC for migration
> > > > > operations. The save context contains copy commands and PTEs to transfer
> > > > > CCS metadata from GPU pools to system memory, and the restore context
> > > > > contains copy commands and PTEs to transfer CCS metadata from system
> > > > > memory back to CCS pools. The GuC submits these contexts to HW during VF
> > > > > migration.
> > > > >
> > > > > Each context uses a large batch buffer allocated via sub-allocator,
> > > > > pre-filled with MI_NOOPs and terminated with MI_BATCH_BUFFER_END. During
> > > > > BO lifecycle management, segments are dynamically allocated from this
> > > > > buffer and populated with PTEs and copy commands for active BOs, then reset
> > > > > to MI_NOOPs when BOs are destroyed.
> > > > >
> > > > > The CCS copy operation requires a 5-dword command sequence to be written
> > > > > to the batch buffer. During VF migration save/restore operations, if the
> > > > > vCPU gets preempted or halted while this command sequence is being
> > > > > programmed, partial writes can occur. These partial writes create
> > > > > incomplete GPU instructions in the batch buffer, which trigger page faults
> > > > > when the GUC submits the batch buffer to hardware for CCS metadata
> > > > > operations.
> > > >
> > > > Perhaps we could summarize the thing here and move details to the comment
> > > > near the assembly. The important part in the commit message is to have
> > > > the 'why'. Some of the details of the commands like MI_NOOP fill and all
> > > > could be in the comment near the ASM.
> > > >
> > > > >
> > > > > Standard memory operations like memcpy() are preemptible, meaning the CPU
> > > > > scheduler can interrupt execution midway through writing the command
> > > > > sequence, leaving the batch buffer in an inconsistent state with partially
> > > > > written GPU instructions.
> > > > >
> > > > > Replace standard memory operations with x86 AVX instructions that provide
> > > > > atomic, non-preemptible writes as AVX instructions cannot be preempted
> > > > > during execution, ensuring complete command sequences are written
> > > > > atomically to the batch buffer.
> > > > >
> > > > > Expand EMIT_COPY_CCS_DW from 5 dwords to 8 dwords to align with 256-bit
> > > > > VMOVDQU operations. Update emit_flush_invalidate() to use VMOVDQU
> > > > > operating with 128-bit chunks. By ensuring GPU instruction headers
> > > > > (3-dword and 5-dword sequences) are written atomically, we prevent partial
> > > > > updates that could compromise migration stability.
> > > > >
> > > > > This approach guarantees that batch buffer updates are completed entirely
> > > > > or not at all, eliminating the page fault scenarios during VF migration
> > > > > operations regardless of vCPU scheduling behavior.
> > > > >
> > > > > Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
> > > > > Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
> > > > > Cc: Matthew Brost <matthew.brost@intel.com>
> > > > > Cc: Matthew Auld <matthew.auld@intel.com>
> > > > > Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
> > > > > Cc: Matt Roper <matthew.d.roper@intel.com>
> > > > > Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > > > >
> > > > > ---
> > > > > V7 -> V8:
> > > > > - Updated commit title and message.
> > > > >
> > > > > V6 -> V7:
> > > > > - Added description explaining why to use assembly instructions for
> > > > > atomicity.
> > > > > - Assert if DGFX tries to use memcpy_vmovdqu(). (Rodrigo)
> > > > > - Include <asm/cpufeature.h> though checkpatch complains. With
> > > > > <linux/cpufeature.h> KUnit is throwing errors.
> > > > >
> > > > > V5 -> V6:
> > > > > - Fixed review comments (Rodrigo)
> > > > >
> > > > > V4 -> V5:
> > > > > - Fixed review comments. (Matt B)
> > > > >
> > > > > V3 -> V4:
> > > > > - Fixed review comments. (Wajdeczko)
> > > > > - Fix issues reported by patchworks.
> > > > >
> > > > > V2 -> V3:
> > > > > - Added support for 128 bit and 256 bit instructions with memcpy_vmovdqu
> > > > > - Updated emit_flush_invalidate() to use vmovdqu instruction.
> > > > >
> > > > > V1 -> V2:
> > > > > - Use memcpy_vmovdqu only for x86 arch and for VF. Else use memcpy
> > > > > (Auld, Matthew)
> > > > > - Fix issues reported by patchworks.
> > > > > ---
> > > > > drivers/gpu/drm/xe/xe_migrate.c | 114 ++++++++++++++++++++++++++------
> > > > > 1 file changed, 93 insertions(+), 21 deletions(-)
> > > > >
> > > > > diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
> > > > > index 921c9c1ea41f..005dc26a0393 100644
> > > > > --- a/drivers/gpu/drm/xe/xe_migrate.c
> > > > > +++ b/drivers/gpu/drm/xe/xe_migrate.c
> > > > > @@ -5,6 +5,8 @@
> > > > > #include "xe_migrate.h"
> > > > > +#include <asm/fpu/api.h>
> > > > > +#include <asm/cpufeature.h>
> > > > > #include <linux/bitfield.h>
> > > > > #include <linux/sizes.h>
> > > > > @@ -33,6 +35,7 @@
> > > > > #include "xe_res_cursor.h"
> > > > > #include "xe_sa.h"
> > > > > #include "xe_sched_job.h"
> > > > > +#include "xe_sriov_vf_ccs.h"
> > > > > #include "xe_sync.h"
> > > > > #include "xe_trace_bo.h"
> > > > > #include "xe_validation.h"
> > > > > @@ -657,18 +660,70 @@ static void emit_pte(struct xe_migrate *m,
> > > > > }
> > > > > }
> > > > > -#define EMIT_COPY_CCS_DW 5
> > > > > +/*
> > > > > + * VF KMD registers two special LRCs with the GuC to handle save/restore
> > > > > + * operations for CCS metadata on IGPU. GUC executes these LRCAs during
> > > > > + * VF save/restore operations.
> > > > > + *
> > > > > + * Each LRC contains a batch buffer pool that GuC submits to hardware during
> > > > > + * VF state save/restore operations. Since these operations can occur
> > > > > + * asynchronously at any time, we must ensure GPU instructions in the batch
> > > > > + * buffer are written atomically to prevent corruption from incomplete writes.
> > > > > + *
> > > > > + * To guarantee atomic instruction writes, we use x86 SIMD instructions
> > > >
> > > > Here you still mention 'atomic' since we already know this is not 'atomic'.
> > >
> > > > I still don't see how this is supposed to do anything useful
> > > atomic writes to memory.
> > >
> > > If the GPU is executing the same memory we're writing then nothing
> > > short of atomic memory writes is going to actually fix it. And even
> > > that would require careful alignment of things to guarantee that
> > > each command is completely contained within one atomic write.
> > >
> > The CPU and GPU operate on the same memory space but at different times
> > during VF migration. The critical issue occurs during the batch buffer
> > preparation phase when the vCPU is still active and writing GPU
> > instructions, while the GPU will later execute these same instructions after
> > the vCPU is paused.
> >
> > During batch buffer updates, if the vCPU gets preempted while writing GPU
> > instruction sequences (such as the 5-dword CCS copy command), it leaves
> > partially written instructions in memory. When the GPU later executes the
> > batch buffer after vCPU suspension, these incomplete instructions cause
> > execution failures and page faults.
>
> As was discussed on the previous revision, the architecture document
> already gives guidance on approaches to deal with these timing issues;
> using AVX like this is not what was recommended. Can't we just
> implement the shadow buffer and eliminate this controversial and
> confusing assembly usage? I think relying on assembly should be the
> absolute last resort and not something we jump to when we have cleaner
> and more widely-supported options.
>
I discussed this with Rodrigo on a call; we were both fine with this
solution, but if the consensus is not to use asm, I'm not sure we can
avoid pivoting. FWIW, this solution will perform better, as there is a
non-zero cost to maintaining two buffers, but perhaps that doesn't
really matter since I wouldn't think memory allocations are a hot path.
Matt
>
> Matt
>
> >
> > AVX instructions provide atomic write operations that cannot be interrupted
> > by the CPU scheduler. This ensures that GPU instruction sequences are
> > written completely before any potential vCPU preemption occurs.
> >
> > AVX instructions (VMOVDQU) guarantee that entire instruction sequences are
> > written in a single, non-preemptible operation. The 5-dword CCS copy command
> > is expanded to 8 dwords (padded with 3 MI_NOOPs) to meet AVX 256-bit
> > alignment requirements. By the time the GPU executes the batch buffer (after
> > vCPU pause), all instructions are guaranteed to be completely written.
> >
> > Here we are ensuring that GPU instructions are fully formed before the GPU
> > attempts to execute them during the migration process.
> >
> > -Satya.
> >
> > > > Leave a summarized explanation in the commit message and put more here.
> > > >
> > > > I'm sorry for being picky here, but I want to ensure that the information
> > > > around this code is clear so we don't keep having to explain this over
> > > > and over in the future.
> > > >
> > > > > + * (128-bit XMM and 256-bit YMM) within kernel_fpu_begin()/kernel_fpu_end()
> > > > > + * sections. This prevents vCPU preemption during instruction generation,
> > > > > + * ensuring complete GPU commands are written to the batch buffer.
> > > > > + */
> > > > > +
> > > > > +static void memcpy_vmovdqu(struct xe_device *xe, void *dst, const void *src, u32 size)
> > > > > +{
> > > > > + xe_assert(xe, !IS_DGFX(xe));
> > > > > + xe_assert(xe, IS_SRIOV_VF(xe));
> > > > > +
> > > > > +#ifdef CONFIG_X86
> > > > > + kernel_fpu_begin();
> > > > > + if (size == SZ_128) {
> > > > > + asm("vmovdqu (%0), %%xmm0\n"
> > > > > + "vmovups %%xmm0, (%1)\n"
> > > > > + :: "r" (src), "r" (dst) : "memory");
> > > > > + } else if (size == SZ_256) {
> > > > > + asm("vmovdqu (%0), %%ymm0\n"
> > > > > + "vmovups %%ymm0, (%1)\n"
> > > > > + :: "r" (src), "r" (dst) : "memory");
> > > > > + }
> > > > > + kernel_fpu_end();
> > > > > +#endif
> > > > > +}
> > > > > +
> > > > > +static void emit_atomic(struct xe_gt *gt, void *dst, const void *src, u32 size)
> > > > > +{
> > > > > + u32 instr_size = size * BITS_PER_BYTE;
> > > > > +
> > > > > + xe_gt_assert(gt, instr_size == SZ_128 || instr_size == SZ_256);
> > > > > +
> > > > > + if (IS_VF_CCS_READY(gt_to_xe(gt))) {
> > > > > + xe_gt_assert(gt, static_cpu_has(X86_FEATURE_AVX));
> > > > > + memcpy_vmovdqu(gt_to_xe(gt), dst, src, instr_size);
> > > > > + } else {
> > > > > + memcpy(dst, src, size);
> > > > > + }
> > > > > +}
> > > > > +
> > > > > +#define EMIT_COPY_CCS_DW 8
> > > > > static void emit_copy_ccs(struct xe_gt *gt, struct xe_bb *bb,
> > > > > u64 dst_ofs, bool dst_is_indirect,
> > > > > u64 src_ofs, bool src_is_indirect,
> > > > > u32 size)
> > > > > {
> > > > > + u32 dw[EMIT_COPY_CCS_DW] = {MI_NOOP};
> > > > > struct xe_device *xe = gt_to_xe(gt);
> > > > > u32 *cs = bb->cs + bb->len;
> > > > > u32 num_ccs_blks;
> > > > > u32 num_pages;
> > > > > u32 ccs_copy_size;
> > > > > u32 mocs;
> > > > > + u32 i = 0;
> > > > > if (GRAPHICS_VERx100(xe) >= 2000) {
> > > > > num_pages = DIV_ROUND_UP(size, XE_PAGE_SIZE);
> > > > > @@ -686,15 +741,23 @@ static void emit_copy_ccs(struct xe_gt *gt, struct xe_bb *bb,
> > > > > mocs = FIELD_PREP(XY_CTRL_SURF_MOCS_MASK, gt->mocs.uc_index);
> > > > > }
> > > > > - *cs++ = XY_CTRL_SURF_COPY_BLT |
> > > > > - (src_is_indirect ? 0x0 : 0x1) << SRC_ACCESS_TYPE_SHIFT |
> > > > > - (dst_is_indirect ? 0x0 : 0x1) << DST_ACCESS_TYPE_SHIFT |
> > > > > - ccs_copy_size;
> > > > > - *cs++ = lower_32_bits(src_ofs);
> > > > > - *cs++ = upper_32_bits(src_ofs) | mocs;
> > > > > - *cs++ = lower_32_bits(dst_ofs);
> > > > > - *cs++ = upper_32_bits(dst_ofs) | mocs;
> > > > > + dw[i++] = XY_CTRL_SURF_COPY_BLT |
> > > > > + (src_is_indirect ? 0x0 : 0x1) << SRC_ACCESS_TYPE_SHIFT |
> > > > > + (dst_is_indirect ? 0x0 : 0x1) << DST_ACCESS_TYPE_SHIFT |
> > > > > + ccs_copy_size;
> > > > > + dw[i++] = lower_32_bits(src_ofs);
> > > > > + dw[i++] = upper_32_bits(src_ofs) | mocs;
> > > > > + dw[i++] = lower_32_bits(dst_ofs);
> > > > > + dw[i++] = upper_32_bits(dst_ofs) | mocs;
> > > > > + /*
> > > > > + * The CCS copy command is a 5-dword sequence. If the vCPU halts during
> > > > > + * save/restore while this sequence is being issued, partial writes may trigger
> > > > > + * page faults when saving iGPU CCS metadata. Use the VMOVDQU instruction to
> > > > > + * write the sequence atomically.
> > > > > + */
> > > > > + emit_atomic(gt, cs, dw, sizeof(dw));
> > > > > + cs += EMIT_COPY_CCS_DW;
> > > > > bb->len = cs - bb->cs;
> > > > > }
> > > > > @@ -1061,18 +1124,27 @@ static u64 migrate_vm_ppgtt_addr_tlb_inval(void)
> > > > > return (NUM_KERNEL_PDE - 2) * XE_PAGE_SIZE;
> > > > > }
> > > > > -static int emit_flush_invalidate(u32 *dw, int i, u32 flags)
> > > > > +/*
> > > > > + * The MI_FLUSH_DW command is a 4-dword sequence. If the vCPU halts during
> > > > > + * save/restore while this sequence is being issued, partial writes may
> > > > > + * trigger page faults when saving iGPU CCS metadata. Use
> > > > > + * emit_atomic() to write the sequence atomically.
> > > > > + */
> > > > > +#define EMIT_FLUSH_INVALIDATE_DW 4
> > > > > +static int emit_flush_invalidate(struct xe_exec_queue *q, u32 *cs, int i, u32 flags)
> > > > > {
> > > > > u64 addr = migrate_vm_ppgtt_addr_tlb_inval();
> > > > > + u32 dw[EMIT_FLUSH_INVALIDATE_DW] = {MI_NOOP}, j = 0;
> > > > > +
> > > > > + dw[j++] = MI_FLUSH_DW | MI_INVALIDATE_TLB | MI_FLUSH_DW_OP_STOREDW |
> > > > > + MI_FLUSH_IMM_DW | flags;
> > > > > + dw[j++] = lower_32_bits(addr);
> > > > > + dw[j++] = upper_32_bits(addr);
> > > > > + dw[j++] = MI_NOOP;
> > > > > - dw[i++] = MI_FLUSH_DW | MI_INVALIDATE_TLB | MI_FLUSH_DW_OP_STOREDW |
> > > > > - MI_FLUSH_IMM_DW | flags;
> > > > > - dw[i++] = lower_32_bits(addr);
> > > > > - dw[i++] = upper_32_bits(addr);
> > > > > - dw[i++] = MI_NOOP;
> > > > > - dw[i++] = MI_NOOP;
> > > > > + emit_atomic(q->gt, &cs[i], dw, sizeof(dw));
> > > > > - return i;
> > > > > + return i + j;
> > > > > }
> > > > > /**
> > > > > @@ -1117,7 +1189,7 @@ int xe_migrate_ccs_rw_copy(struct xe_tile *tile, struct xe_exec_queue *q,
> > > > > /* Calculate Batch buffer size */
> > > > > batch_size = 0;
> > > > > while (size) {
> > > > > - batch_size += 10; /* Flush + ggtt addr + 2 NOP */
> > > > > + batch_size += EMIT_FLUSH_INVALIDATE_DW * 2; /* Flush + ggtt addr + 1 NOP */
> > > > > u64 ccs_ofs, ccs_size;
> > > > > u32 ccs_pt;
> > > > > @@ -1158,7 +1230,7 @@ int xe_migrate_ccs_rw_copy(struct xe_tile *tile, struct xe_exec_queue *q,
> > > > > * sizes here again before copy command is emitted.
> > > > > */
> > > > > while (size) {
> > > > > - batch_size += 10; /* Flush + ggtt addr + 2 NOP */
> > > > > + batch_size += EMIT_FLUSH_INVALIDATE_DW * 2; /* Flush + ggtt addr + 1 NOP */
> > > > > u32 flush_flags = 0;
> > > > > u64 ccs_ofs, ccs_size;
> > > > > u32 ccs_pt;
> > > > > @@ -1181,11 +1253,11 @@ int xe_migrate_ccs_rw_copy(struct xe_tile *tile, struct xe_exec_queue *q,
> > > > > emit_pte(m, bb, ccs_pt, false, false, &ccs_it, ccs_size, src);
> > > > > - bb->len = emit_flush_invalidate(bb->cs, bb->len, flush_flags);
> > > > > + bb->len = emit_flush_invalidate(q, bb->cs, bb->len, flush_flags);
> > > > > flush_flags = xe_migrate_ccs_copy(m, bb, src_L0_ofs, src_is_pltt,
> > > > > src_L0_ofs, dst_is_pltt,
> > > > > src_L0, ccs_ofs, true);
> > > > > - bb->len = emit_flush_invalidate(bb->cs, bb->len, flush_flags);
> > > > > + bb->len = emit_flush_invalidate(q, bb->cs, bb->len, flush_flags);
> > > > > size -= src_L0;
> > > > > }
> > > > > --
> > > > > 2.51.0
> > > > >
> > >
> >
>
> --
> Matt Roper
> Graphics Software Engineer
> Linux GPU Platform Enablement
> Intel Corporation
* Re: [PATCH v8 1/3] drm/xe/migrate: Use AVX instructions to prevent partial writes during VF migration CCS batch buffer updates
2025-10-24 16:10 ` Matthew Brost
@ 2025-10-24 20:07 ` Vivi, Rodrigo
0 siblings, 0 replies; 15+ messages in thread
From: Vivi, Rodrigo @ 2025-10-24 20:07 UTC (permalink / raw)
To: Brost, Matthew, Roper, Matthew D
Cc: ville.syrjala@linux.intel.com, intel-xe@lists.freedesktop.org,
K V P, Satyanarayana, Wajdeczko, Michal, Auld, Matthew
On Fri, 2025-10-24 at 09:10 -0700, Matthew Brost wrote:
> On Fri, Oct 24, 2025 at 09:05:12AM -0700, Matt Roper wrote:
> > On Fri, Oct 24, 2025 at 07:55:32PM +0530, K V P, Satyanarayana
> > wrote:
> > >
> > >
> > > On 24-10-2025 19:35, Ville Syrjälä wrote:
> > > > On Fri, Oct 24, 2025 at 09:57:15AM -0400, Rodrigo Vivi wrote:
> > > > > On Fri, Oct 24, 2025 at 07:05:24PM +0530, Satyanarayana K V P
> > > > > wrote:
> > > > >
> > > > > Hi Satya,
> > > > >
> > > > > First of all, thank you for the updates.
> > > > >
> > > > > Second, the subject is way too big.
> > > > >
> > > > > This should be enough and under 75 cols:
> > > > >
> > > > > drm/xe: Use AVX instructions to prevent partial writes during
> > > > > VF pause
> > > > >
> > > > > more below:
> > > > >
> > > > > > VF KMD registers two specialized contexts with the GUC for
> > > > > > migration
> > > > > > operations. The save context contains copy commands and
> > > > > > PTEs to transfer CCS metadata from GPU pools to system
> > > > > > memory, and the restore context contains copy commands and
> > > > > > PTEs to transfer CCS metadata from system memory back to
> > > > > > CCS pools. The GuC submits these contexts to HW during VF
> > > > > > migration.
> > > > > >
> > > > > > Each context uses a large batch buffer allocated via sub-
> > > > > > allocator,
> > > > > > pre-filled with MI_NOOPs and terminated with
> > > > > > MI_BATCH_BUFFER_END. During
> > > > > > BO lifecycle management, segments are dynamically allocated
> > > > > > from this
> > > > > > buffer and populated with PTEs and copy commands for active
> > > > > > BOs, then reset
> > > > > > to MI_NOOPs when BOs are destroyed.
> > > > > >
> > > > > > The CCS copy operation requires a 5-dword command sequence
> > > > > > to be written
> > > > > > to the batch buffer. During VF migration save/restore
> > > > > > operations, if the
> > > > > > vCPU gets preempted or halted while this command sequence
> > > > > > is being
> > > > > > programmed, partial writes can occur. These partial writes
> > > > > > create
> > > > > > incomplete GPU instructions in the batch buffer, which
> > > > > > trigger page faults
> > > > > > when the GUC submits the batch buffer to hardware for CCS
> > > > > > metadata
> > > > > > operations.
> > > > >
> > > > > Perhaps we could summarize the thing here and move details to
> > > > > the comment
> > > > > near the assembly. The important part in the commit message
> > > > > is to have
> > > > > the 'why'. Some of the details of the commands like MI_NOOP
> > > > > fill and all
> > > > > could be in the comment near the ASM.
> > > > >
> > > > > >
> > > > > > Standard memory operations like memcpy() are preemptible,
> > > > > > meaning the CPU
> > > > > > scheduler can interrupt execution midway through writing
> > > > > > the command
> > > > > > sequence, leaving the batch buffer in an inconsistent state
> > > > > > with partially
> > > > > > written GPU instructions.
> > > > > >
> > > > > > Replace standard memory operations with x86 AVX
> > > > > > instructions that provide
> > > > > > atomic, non-preemptible writes as AVX instructions cannot
> > > > > > be preempted
> > > > > > during execution, ensuring complete command sequences are
> > > > > > written
> > > > > > atomically to the batch buffer.
> > > > > >
> > > > > > Expand EMIT_COPY_CCS_DW from 5 dwords to 8 dwords to align
> > > > > > with 256-bit
> > > > > > VMOVDQU operations. Update emit_flush_invalidate() to use
> > > > > > VMOVDQU
> > > > > > operating with 128-bit chunks. By ensuring GPU instruction
> > > > > > headers
> > > > > > (3-dword and 5-dword sequences) are written atomically, we
> > > > > > prevent partial
> > > > > > updates that could compromise migration stability.
> > > > > >
> > > > > > This approach guarantees that batch buffer updates are
> > > > > > completed entirely
> > > > > > or not at all, eliminating the page fault scenarios during
> > > > > > VF migration
> > > > > > operations regardless of vCPU scheduling behavior.
> > > > > >
> > > > > > Signed-off-by: Satyanarayana K V P
> > > > > > <satyanarayana.k.v.p@intel.com>
> > > > > > Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
> > > > > > Cc: Matthew Brost <matthew.brost@intel.com>
> > > > > > Cc: Matthew Auld <matthew.auld@intel.com>
> > > > > > Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
> > > > > > Cc: Matt Roper <matthew.d.roper@intel.com>
> > > > > > Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > > > > >
> > > > > > ---
> > > > > > V7 -> V8:
> > > > > > - Updated commit title and message.
> > > > > >
> > > > > > V6 -> V7:
> > > > > > - Added description explaining why to use assembly
> > > > > > instructions for
> > > > > > atomicity.
> > > > > > - Assert if DGFX tries to use memcpy_vmovdqu(). (Rodrigo)
> > > > > > - Include <asm/cpufeature.h> though checkpatch complains.
> > > > > > With
> > > > > > <linux/cpufeature.h> KUnit is throwing errors.
> > > > > >
> > > > > > V5 -> V6:
> > > > > > - Fixed review comments (Rodrigo)
> > > > > >
> > > > > > V4 -> V5:
> > > > > > - Fixed review comments. (Matt B)
> > > > > >
> > > > > > V3 -> V4:
> > > > > > - Fixed review comments. (Wajdeczko)
> > > > > > - Fix issues reported by patchworks.
> > > > > >
> > > > > > V2 -> V3:
> > > > > > - Added support for 128 bit and 256 bit instructions with
> > > > > > memcpy_vmovdqu
> > > > > > - Updated emit_flush_invalidate() to use vmovdqu
> > > > > > instruction.
> > > > > >
> > > > > > V1 -> V2:
> > > > > > - Use memcpy_vmovdqu only for x86 arch and for VF. Else use
> > > > > > memcpy
> > > > > > (Auld, Matthew)
> > > > > > - Fix issues reported by patchworks.
> > > > > > ---
> > > > > > drivers/gpu/drm/xe/xe_migrate.c | 114
> > > > > > ++++++++++++++++++++++++++------
> > > > > > 1 file changed, 93 insertions(+), 21 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/gpu/drm/xe/xe_migrate.c
> > > > > > b/drivers/gpu/drm/xe/xe_migrate.c
> > > > > > index 921c9c1ea41f..005dc26a0393 100644
> > > > > > --- a/drivers/gpu/drm/xe/xe_migrate.c
> > > > > > +++ b/drivers/gpu/drm/xe/xe_migrate.c
> > > > > > @@ -5,6 +5,8 @@
> > > > > > #include "xe_migrate.h"
> > > > > > +#include <asm/fpu/api.h>
> > > > > > +#include <asm/cpufeature.h>
> > > > > > #include <linux/bitfield.h>
> > > > > > #include <linux/sizes.h>
> > > > > > @@ -33,6 +35,7 @@
> > > > > > #include "xe_res_cursor.h"
> > > > > > #include "xe_sa.h"
> > > > > > #include "xe_sched_job.h"
> > > > > > +#include "xe_sriov_vf_ccs.h"
> > > > > > #include "xe_sync.h"
> > > > > > #include "xe_trace_bo.h"
> > > > > > #include "xe_validation.h"
> > > > > > @@ -657,18 +660,70 @@ static void emit_pte(struct
> > > > > > xe_migrate *m,
> > > > > > }
> > > > > > }
> > > > > > -#define EMIT_COPY_CCS_DW 5
> > > > > > +/*
> > > > > > + * VF KMD registers two special LRCs with the GuC to
> > > > > > handle save/restore
> > > > > > + * operations for CCS metadata on iGPU. GuC executes these
> > > > > > LRCAs during
> > > > > > + * VF save/restore operations.
> > > > > > + *
> > > > > > + * Each LRC contains a batch buffer pool that GuC submits
> > > > > > to hardware during
> > > > > > + * VF state save/restore operations. Since these
> > > > > > operations can occur
> > > > > > + * asynchronously at any time, we must ensure GPU
> > > > > > instructions in the batch
> > > > > > + * buffer are written atomically to prevent corruption
> > > > > > from incomplete writes.
> > > > > > + *
> > > > > > + * To guarantee atomic instruction writes, we use x86 SIMD
> > > > > > instructions
> > > > >
> > > > > Here you still mention 'atomic' since we already know this is
> > > > > not 'atomic'.
> > > >
> > > > I still don't see how is this supposed to do anything useful
> > > > without
> > > > atomic writes to memory.
> > > >
> > > > If the GPU is executing the same memory we're writing then
> > > > nothing
> > > > short of atomic memory writes is going to actually fix it. And
> > > > even
> > > > that would require careful alignment of things to guarantee
> > > > that
> > > > each command is completely contained within one atomic write.
> > > >
> > > The CPU and GPU operate on the same memory space but at different
> > > times
> > > during VF migration. The critical issue occurs during the batch
> > > buffer
> > > preparation phase when the vCPU is still active and writing GPU
> > > instructions, while the GPU will later execute these same
> > > instructions after
> > > the vCPU is paused.
> > >
> > > During batch buffer updates, if the vCPU gets preempted while
> > > writing GPU
> > > instruction sequences (such as the 5-dword CCS copy command), it
> > > leaves
> > > partially written instructions in memory. When the GPU later
> > > executes the
> > > batch buffer after vCPU suspension, these incomplete instructions
> > > cause
> > > execution failures and page faults.
> >
> > As was discussed on the previous revision, the architecture
> > document
> > already gives guidance on approaches to deal with these timing
> > issues;
> > using AVX like this is not what was recommended. Can't we just
> > implement the shadow buffer and eliminate this controversial and
> > confusing assembly usage? I think relying on assembly should be
> > the
> > absolute last resort and not something we jump to when we have
> > cleaner
> > and more widely-supported options.
> >
>
> I discussed this with Rodrigo on a call; we were both fine with this
> solution, but if the consensus is to not use asm, not sure we can
> pivot. FWIW, this solution will perform better, as there is a
> non-zero cost to maintaining two buffers, but perhaps that doesn't
> really matter since I wouldn't expect memory allocations to be a hot
> path.
I have nothing against this asm code, to be honest (at least after I
understood the flow and the code).
My only true concern was always with the questions it will keep
bringing up over and over.
And based on this thread today, clearly even the doc and commit message
in place right now are not solving that, and we continue to hear the
same questions. :(
Then, from the documentation itself:
https://www.kernel.org/doc/html/latest/process/coding-style.html#inline-assembly
"However, don’t use inline assembly gratuitously when C can do the job.
You can and should poke hardware from C when possible."
With this in mind, and given the comments from Matt and Ville, perhaps
we need to reconsider the path and take the solutions proposed by the
doc itself instead of this code.
In case this is urgent and blocking something, we could perhaps go with
this solution, which has already been validated, but with work in
parallel to replace it asap.
Thanks,
Rodrigo.
>
> Matt
>
> >
> > Matt
> >
> > >
> > > AVX instructions provide atomic write operations that cannot be
> > > interrupted
> > > by the CPU scheduler. This ensures that GPU instruction sequences
> > > are
> > > written completely before any potential vCPU preemption occurs.
> > >
> > > AVX instructions (VMOVDQU) guarantee that entire instruction
> > > sequences are
> > > written in a single, non-preemptible operation. The 5-dword CCS
> > > copy command
> > > is expanded to 8 dwords (padded with 3 MI_NOOPs) to meet AVX 256-
> > > bit
> > > alignment requirements. By the time the GPU executes the batch
> > > buffer (after
> > > vCPU pause), all instructions are guaranteed to be completely
> > > written.
> > >
> > > Here we are ensuring that GPU instructions are fully formed
> > > before the GPU
> > > attempts to execute them during the migration process.
> > >
> > > -Satya.
> > > > > Let a summarized explanation in the commit message and put
> > > > > more here.
> > > > >
> > > > > I'm sorry for being picky here, but I want to ensure that the
> > > > > information
> > > > > around this code is clear so we don't keep having to explain
> > > > > this over
> > > > > and over in the future.
> > > > >
> > > > > > + * (128-bit XMM and 256-bit YMM) within
> > > > > > kernel_fpu_begin()/kernel_fpu_end()
> > > > > > + * sections. This prevents vCPU preemption during
> > > > > > instruction generation,
> > > > > > + * ensuring complete GPU commands are written to the batch
> > > > > > buffer.
> > > > > > + */
> > > > > > +
> > > > > > +static void memcpy_vmovdqu(struct xe_device *xe, void
> > > > > > *dst, const void *src, u32 size)
> > > > > > +{
> > > > > > + xe_assert(xe, !IS_DGFX(xe));
> > > > > > + xe_assert(xe, IS_SRIOV_VF(xe));
> > > > > > +
> > > > > > +#ifdef CONFIG_X86
> > > > > > + kernel_fpu_begin();
> > > > > > + if (size == SZ_128) {
> > > > > > + asm("vmovdqu (%0), %%xmm0\n"
> > > > > > + "vmovups %%xmm0, (%1)\n"
> > > > > > + :: "r" (src), "r" (dst) : "memory");
> > > > > > + } else if (size == SZ_256) {
> > > > > > + asm("vmovdqu (%0), %%ymm0\n"
> > > > > > + "vmovups %%ymm0, (%1)\n"
> > > > > > + :: "r" (src), "r" (dst) : "memory");
> > > > > > + }
> > > > > > + kernel_fpu_end();
> > > > > > +#endif
> > > > > > +}
> > > > > > +
> > > > > > +static void emit_atomic(struct xe_gt *gt, void *dst, const
> > > > > > void *src, u32 size)
> > > > > > +{
> > > > > > + u32 instr_size = size * BITS_PER_BYTE;
> > > > > > +
> > > > > > + xe_gt_assert(gt, instr_size == SZ_128 ||
> > > > > > instr_size == SZ_256);
> > > > > > +
> > > > > > + if (IS_VF_CCS_READY(gt_to_xe(gt))) {
> > > > > > + xe_gt_assert(gt,
> > > > > > static_cpu_has(X86_FEATURE_AVX));
> > > > > > + memcpy_vmovdqu(gt_to_xe(gt), dst, src,
> > > > > > instr_size);
> > > > > > + } else {
> > > > > > + memcpy(dst, src, size);
> > > > > > + }
> > > > > > +}
> > > > > > +
> > > > > > +#define EMIT_COPY_CCS_DW 8
> > > > > > static void emit_copy_ccs(struct xe_gt *gt, struct xe_bb
> > > > > > *bb,
> > > > > > u64 dst_ofs, bool
> > > > > > dst_is_indirect,
> > > > > > u64 src_ofs, bool
> > > > > > src_is_indirect,
> > > > > > u32 size)
> > > > > > {
> > > > > > + u32 dw[EMIT_COPY_CCS_DW] = {MI_NOOP};
> > > > > > struct xe_device *xe = gt_to_xe(gt);
> > > > > > u32 *cs = bb->cs + bb->len;
> > > > > > u32 num_ccs_blks;
> > > > > > u32 num_pages;
> > > > > > u32 ccs_copy_size;
> > > > > > u32 mocs;
> > > > > > + u32 i = 0;
> > > > > > if (GRAPHICS_VERx100(xe) >= 2000) {
> > > > > > num_pages = DIV_ROUND_UP(size,
> > > > > > XE_PAGE_SIZE);
> > > > > > @@ -686,15 +741,23 @@ static void emit_copy_ccs(struct
> > > > > > xe_gt *gt, struct xe_bb *bb,
> > > > > > mocs = FIELD_PREP(XY_CTRL_SURF_MOCS_MASK,
> > > > > > gt->mocs.uc_index);
> > > > > > }
> > > > > > - *cs++ = XY_CTRL_SURF_COPY_BLT |
> > > > > > - (src_is_indirect ? 0x0 : 0x1) <<
> > > > > > SRC_ACCESS_TYPE_SHIFT |
> > > > > > - (dst_is_indirect ? 0x0 : 0x1) <<
> > > > > > DST_ACCESS_TYPE_SHIFT |
> > > > > > - ccs_copy_size;
> > > > > > - *cs++ = lower_32_bits(src_ofs);
> > > > > > - *cs++ = upper_32_bits(src_ofs) | mocs;
> > > > > > - *cs++ = lower_32_bits(dst_ofs);
> > > > > > - *cs++ = upper_32_bits(dst_ofs) | mocs;
> > > > > > + dw[i++] = XY_CTRL_SURF_COPY_BLT |
> > > > > > + (src_is_indirect ? 0x0 : 0x1) <<
> > > > > > SRC_ACCESS_TYPE_SHIFT |
> > > > > > + (dst_is_indirect ? 0x0 : 0x1) <<
> > > > > > DST_ACCESS_TYPE_SHIFT |
> > > > > > + ccs_copy_size;
> > > > > > + dw[i++] = lower_32_bits(src_ofs);
> > > > > > + dw[i++] = upper_32_bits(src_ofs) | mocs;
> > > > > > + dw[i++] = lower_32_bits(dst_ofs);
> > > > > > + dw[i++] = upper_32_bits(dst_ofs) | mocs;
> > > > > > + /*
> > > > > > + * The CCS copy command is a 5-dword sequence. If
> > > > > > the vCPU halts during
> > > > > > + * save/restore while this sequence is being
> > > > > > issued, partial writes may trigger
> > > > > > + * page faults when saving iGPU CCS metadata. Use
> > > > > > the VMOVDQU instruction to
> > > > > > + * write the sequence atomically.
> > > > > > + */
> > > > > > + emit_atomic(gt, cs, dw, sizeof(dw));
> > > > > > + cs += EMIT_COPY_CCS_DW;
> > > > > > bb->len = cs - bb->cs;
> > > > > > }
> > > > > > @@ -1061,18 +1124,27 @@ static u64
> > > > > > migrate_vm_ppgtt_addr_tlb_inval(void)
> > > > > > return (NUM_KERNEL_PDE - 2) * XE_PAGE_SIZE;
> > > > > > }
> > > > > > -static int emit_flush_invalidate(u32 *dw, int i, u32
> > > > > > flags)
> > > > > > +/*
> > > > > > + * The MI_FLUSH_DW command is a 4-dword sequence. If the
> > > > > > vCPU halts during
> > > > > > + * save/restore while this sequence is being issued,
> > > > > > partial writes may
> > > > > > + * trigger page faults when saving iGPU CCS metadata. Use
> > > > > > + * emit_atomic() to write the sequence atomically.
> > > > > > + */
> > > > > > +#define EMIT_FLUSH_INVALIDATE_DW 4
> > > > > > +static int emit_flush_invalidate(struct xe_exec_queue *q,
> > > > > > u32 *cs, int i, u32 flags)
> > > > > > {
> > > > > > u64 addr = migrate_vm_ppgtt_addr_tlb_inval();
> > > > > > + u32 dw[EMIT_FLUSH_INVALIDATE_DW] = {MI_NOOP}, j =
> > > > > > 0;
> > > > > > +
> > > > > > + dw[j++] = MI_FLUSH_DW | MI_INVALIDATE_TLB |
> > > > > > MI_FLUSH_DW_OP_STOREDW |
> > > > > > + MI_FLUSH_IMM_DW | flags;
> > > > > > + dw[j++] = lower_32_bits(addr);
> > > > > > + dw[j++] = upper_32_bits(addr);
> > > > > > + dw[j++] = MI_NOOP;
> > > > > > - dw[i++] = MI_FLUSH_DW | MI_INVALIDATE_TLB |
> > > > > > MI_FLUSH_DW_OP_STOREDW |
> > > > > > - MI_FLUSH_IMM_DW | flags;
> > > > > > - dw[i++] = lower_32_bits(addr);
> > > > > > - dw[i++] = upper_32_bits(addr);
> > > > > > - dw[i++] = MI_NOOP;
> > > > > > - dw[i++] = MI_NOOP;
> > > > > > + emit_atomic(q->gt, &cs[i], dw, sizeof(dw));
> > > > > > - return i;
> > > > > > + return i + j;
> > > > > > }
> > > > > > /**
> > > > > > @@ -1117,7 +1189,7 @@ int xe_migrate_ccs_rw_copy(struct
> > > > > > xe_tile *tile, struct xe_exec_queue *q,
> > > > > > /* Calculate Batch buffer size */
> > > > > > batch_size = 0;
> > > > > > while (size) {
> > > > > > - batch_size += 10; /* Flush + ggtt addr + 2
> > > > > > NOP */
> > > > > > + batch_size += EMIT_FLUSH_INVALIDATE_DW *
> > > > > > 2; /* Flush + ggtt addr + 1 NOP */
> > > > > > u64 ccs_ofs, ccs_size;
> > > > > > u32 ccs_pt;
> > > > > > @@ -1158,7 +1230,7 @@ int xe_migrate_ccs_rw_copy(struct
> > > > > > xe_tile *tile, struct xe_exec_queue *q,
> > > > > > * sizes here again before copy command is
> > > > > > emitted.
> > > > > > */
> > > > > > while (size) {
> > > > > > - batch_size += 10; /* Flush + ggtt addr + 2
> > > > > > NOP */
> > > > > > + batch_size += EMIT_FLUSH_INVALIDATE_DW *
> > > > > > 2; /* Flush + ggtt addr + 1 NOP */
> > > > > > u32 flush_flags = 0;
> > > > > > u64 ccs_ofs, ccs_size;
> > > > > > u32 ccs_pt;
> > > > > > @@ -1181,11 +1253,11 @@ int xe_migrate_ccs_rw_copy(struct
> > > > > > xe_tile *tile, struct xe_exec_queue *q,
> > > > > > emit_pte(m, bb, ccs_pt, false, false,
> > > > > > &ccs_it, ccs_size, src);
> > > > > > - bb->len = emit_flush_invalidate(bb->cs,
> > > > > > bb->len, flush_flags);
> > > > > > + bb->len = emit_flush_invalidate(q, bb->cs,
> > > > > > bb->len, flush_flags);
> > > > > > flush_flags = xe_migrate_ccs_copy(m, bb,
> > > > > > src_L0_ofs, src_is_pltt,
> > > > > >
> > > > > > src_L0_ofs, dst_is_pltt,
> > > > > > src_L0,
> > > > > > ccs_ofs, true);
> > > > > > - bb->len = emit_flush_invalidate(bb->cs,
> > > > > > bb->len, flush_flags);
> > > > > > + bb->len = emit_flush_invalidate(q, bb->cs,
> > > > > > bb->len, flush_flags);
> > > > > > size -= src_L0;
> > > > > > }
> > > > > > --
> > > > > > 2.51.0
> > > > > >
> > > >
> > >
> >
> > --
> > Matt Roper
> > Graphics Software Engineer
> > Linux GPU Platform Enablement
> > Intel Corporation
* ✓ Xe.CI.Full: success for drm/xe/migrate: Atomicize CCS copy command setup
2025-10-24 13:35 [PATCH v8 0/3] drm/xe/migrate: Atomicize CCS copy command setup Satyanarayana K V P
` (5 preceding siblings ...)
2025-10-24 15:48 ` ✓ Xe.CI.BAT: " Patchwork
@ 2025-10-25 3:47 ` Patchwork
6 siblings, 0 replies; 15+ messages in thread
From: Patchwork @ 2025-10-25 3:47 UTC (permalink / raw)
To: K V P, Satyanarayana; +Cc: intel-xe
== Series Details ==
Series: drm/xe/migrate: Atomicize CCS copy command setup
URL : https://patchwork.freedesktop.org/series/156482/
State : success
== Summary ==
CI Bug Log - changes from xe-3981-1e54d2c469a91e00a39ff7f6b98c31d290245ecf_FULL -> xe-pw-156482v1_FULL
====================================================
Summary
-------
**SUCCESS**
No regressions found.
Participating hosts (4 -> 4)
------------------------------
No changes in participating hosts
Known issues
------------
Here are the changes found in xe-pw-156482v1_FULL that come from known issues:
### IGT changes ###
#### Issues hit ####
* igt@kms_async_flips@test-cursor-atomic:
- shard-adlp: [PASS][1] -> [DMESG-WARN][2] ([Intel XE#2953] / [Intel XE#4173]) +5 other tests dmesg-warn
[1]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3981-1e54d2c469a91e00a39ff7f6b98c31d290245ecf/shard-adlp-8/igt@kms_async_flips@test-cursor-atomic.html
[2]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/shard-adlp-6/igt@kms_async_flips@test-cursor-atomic.html
* igt@kms_big_fb@yf-tiled-16bpp-rotate-0:
- shard-dg2-set2: NOTRUN -> [SKIP][3] ([Intel XE#1124]) +2 other tests skip
[3]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/shard-dg2-466/igt@kms_big_fb@yf-tiled-16bpp-rotate-0.html
* igt@kms_bw@connected-linear-tiling-2-displays-2560x1440p:
- shard-bmg: [PASS][4] -> [SKIP][5] ([Intel XE#2314] / [Intel XE#2894]) +1 other test skip
[4]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3981-1e54d2c469a91e00a39ff7f6b98c31d290245ecf/shard-bmg-8/igt@kms_bw@connected-linear-tiling-2-displays-2560x1440p.html
[5]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/shard-bmg-6/igt@kms_bw@connected-linear-tiling-2-displays-2560x1440p.html
* igt@kms_bw@linear-tiling-4-displays-2560x1440p:
- shard-dg2-set2: NOTRUN -> [SKIP][6] ([Intel XE#367])
[6]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/shard-dg2-466/igt@kms_bw@linear-tiling-4-displays-2560x1440p.html
* igt@kms_ccs@bad-aux-stride-4-tiled-mtl-mc-ccs@pipe-a-hdmi-a-6:
- shard-dg2-set2: NOTRUN -> [SKIP][7] ([Intel XE#787]) +27 other tests skip
[7]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/shard-dg2-432/igt@kms_ccs@bad-aux-stride-4-tiled-mtl-mc-ccs@pipe-a-hdmi-a-6.html
* igt@kms_ccs@ccs-on-another-bo-y-tiled-gen12-rc-ccs-cc@pipe-d-dp-4:
- shard-dg2-set2: NOTRUN -> [SKIP][8] ([Intel XE#455] / [Intel XE#787]) +7 other tests skip
[8]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/shard-dg2-432/igt@kms_ccs@ccs-on-another-bo-y-tiled-gen12-rc-ccs-cc@pipe-d-dp-4.html
* igt@kms_ccs@random-ccs-data-4-tiled-dg2-mc-ccs:
- shard-dg2-set2: [PASS][9] -> [INCOMPLETE][10] ([Intel XE#1727] / [Intel XE#3113] / [Intel XE#4345] / [Intel XE#6168])
[9]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3981-1e54d2c469a91e00a39ff7f6b98c31d290245ecf/shard-dg2-435/igt@kms_ccs@random-ccs-data-4-tiled-dg2-mc-ccs.html
[10]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/shard-dg2-433/igt@kms_ccs@random-ccs-data-4-tiled-dg2-mc-ccs.html
* igt@kms_ccs@random-ccs-data-4-tiled-dg2-mc-ccs@pipe-a-dp-4:
- shard-dg2-set2: [PASS][11] -> [INCOMPLETE][12] ([Intel XE#1727] / [Intel XE#3113] / [Intel XE#6014] / [Intel XE#6168] / [i915#14968])
[11]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3981-1e54d2c469a91e00a39ff7f6b98c31d290245ecf/shard-dg2-435/igt@kms_ccs@random-ccs-data-4-tiled-dg2-mc-ccs@pipe-a-dp-4.html
[12]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/shard-dg2-433/igt@kms_ccs@random-ccs-data-4-tiled-dg2-mc-ccs@pipe-a-dp-4.html
* igt@kms_ccs@random-ccs-data-4-tiled-dg2-rc-ccs:
- shard-dg2-set2: [PASS][13] -> [INCOMPLETE][14] ([Intel XE#1727] / [Intel XE#3113] / [Intel XE#4345]) +1 other test incomplete
[13]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3981-1e54d2c469a91e00a39ff7f6b98c31d290245ecf/shard-dg2-432/igt@kms_ccs@random-ccs-data-4-tiled-dg2-rc-ccs.html
[14]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/shard-dg2-434/igt@kms_ccs@random-ccs-data-4-tiled-dg2-rc-ccs.html
* igt@kms_chamelium_edid@hdmi-mode-timings:
- shard-dg2-set2: NOTRUN -> [SKIP][15] ([Intel XE#373]) +2 other tests skip
[15]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/shard-dg2-432/igt@kms_chamelium_edid@hdmi-mode-timings.html
* igt@kms_cursor_legacy@basic-flip-after-cursor-atomic:
- shard-bmg: NOTRUN -> [INCOMPLETE][16] ([Intel XE#3226])
[16]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/shard-bmg-6/igt@kms_cursor_legacy@basic-flip-after-cursor-atomic.html
* igt@kms_cursor_legacy@cursorb-vs-flipb-atomic-transitions-varying-size:
- shard-bmg: [PASS][17] -> [SKIP][18] ([Intel XE#2291]) +4 other tests skip
[17]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3981-1e54d2c469a91e00a39ff7f6b98c31d290245ecf/shard-bmg-8/igt@kms_cursor_legacy@cursorb-vs-flipb-atomic-transitions-varying-size.html
[18]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/shard-bmg-6/igt@kms_cursor_legacy@cursorb-vs-flipb-atomic-transitions-varying-size.html
* igt@kms_cursor_legacy@flip-vs-cursor-atomic:
- shard-bmg: [PASS][19] -> [FAIL][20] ([Intel XE#1475])
[19]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3981-1e54d2c469a91e00a39ff7f6b98c31d290245ecf/shard-bmg-4/igt@kms_cursor_legacy@flip-vs-cursor-atomic.html
[20]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/shard-bmg-3/igt@kms_cursor_legacy@flip-vs-cursor-atomic.html
* igt@kms_cursor_legacy@short-busy-flip-before-cursor-toggle:
- shard-dg2-set2: NOTRUN -> [SKIP][21] ([Intel XE#323])
[21]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/shard-dg2-466/igt@kms_cursor_legacy@short-busy-flip-before-cursor-toggle.html
* igt@kms_dp_link_training@non-uhbr-sst:
- shard-bmg: [PASS][22] -> [SKIP][23] ([Intel XE#4354])
[22]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3981-1e54d2c469a91e00a39ff7f6b98c31d290245ecf/shard-bmg-8/igt@kms_dp_link_training@non-uhbr-sst.html
[23]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/shard-bmg-6/igt@kms_dp_link_training@non-uhbr-sst.html
* igt@kms_dp_linktrain_fallback@dp-fallback:
- shard-bmg: [PASS][24] -> [SKIP][25] ([Intel XE#4294])
[24]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3981-1e54d2c469a91e00a39ff7f6b98c31d290245ecf/shard-bmg-8/igt@kms_dp_linktrain_fallback@dp-fallback.html
[25]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/shard-bmg-6/igt@kms_dp_linktrain_fallback@dp-fallback.html
* igt@kms_dsc@dsc-with-bpc:
- shard-dg2-set2: NOTRUN -> [SKIP][26] ([Intel XE#455]) +1 other test skip
[26]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/shard-dg2-432/igt@kms_dsc@dsc-with-bpc.html
* igt@kms_feature_discovery@psr1:
- shard-dg2-set2: NOTRUN -> [SKIP][27] ([Intel XE#1135])
[27]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/shard-dg2-432/igt@kms_feature_discovery@psr1.html
* igt@kms_flip@2x-plain-flip-fb-recreate-interruptible:
- shard-bmg: [PASS][28] -> [SKIP][29] ([Intel XE#2316]) +1 other test skip
[28]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3981-1e54d2c469a91e00a39ff7f6b98c31d290245ecf/shard-bmg-8/igt@kms_flip@2x-plain-flip-fb-recreate-interruptible.html
[29]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/shard-bmg-6/igt@kms_flip@2x-plain-flip-fb-recreate-interruptible.html
* igt@kms_flip@flip-vs-expired-vblank@b-hdmi-a1:
- shard-adlp: [PASS][30] -> [DMESG-WARN][31] ([Intel XE#4543]) +2 other tests dmesg-warn
[30]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3981-1e54d2c469a91e00a39ff7f6b98c31d290245ecf/shard-adlp-2/igt@kms_flip@flip-vs-expired-vblank@b-hdmi-a1.html
[31]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/shard-adlp-9/igt@kms_flip@flip-vs-expired-vblank@b-hdmi-a1.html
* igt@kms_flip@flip-vs-expired-vblank@c-edp1:
- shard-lnl: [PASS][32] -> [FAIL][33] ([Intel XE#301] / [Intel XE#3149]) +1 other test fail
[32]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3981-1e54d2c469a91e00a39ff7f6b98c31d290245ecf/shard-lnl-3/igt@kms_flip@flip-vs-expired-vblank@c-edp1.html
[33]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/shard-lnl-3/igt@kms_flip@flip-vs-expired-vblank@c-edp1.html
* igt@kms_flip_tiling@flip-change-tiling@pipe-d-hdmi-a-1-x-to-x:
- shard-adlp: [PASS][34] -> [DMESG-FAIL][35] ([Intel XE#4543]) +1 other test dmesg-fail
[34]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3981-1e54d2c469a91e00a39ff7f6b98c31d290245ecf/shard-adlp-2/igt@kms_flip_tiling@flip-change-tiling@pipe-d-hdmi-a-1-x-to-x.html
[35]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/shard-adlp-9/igt@kms_flip_tiling@flip-change-tiling@pipe-d-hdmi-a-1-x-to-x.html
* igt@kms_frontbuffer_tracking@fbcdrrs-2p-primscrn-pri-shrfb-draw-blt:
- shard-dg2-set2: NOTRUN -> [SKIP][36] ([Intel XE#651]) +6 other tests skip
[36]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/shard-dg2-432/igt@kms_frontbuffer_tracking@fbcdrrs-2p-primscrn-pri-shrfb-draw-blt.html
* igt@kms_frontbuffer_tracking@fbcpsr-2p-scndscrn-spr-indfb-fullscreen:
- shard-dg2-set2: NOTRUN -> [SKIP][37] ([Intel XE#653]) +7 other tests skip
[37]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/shard-dg2-466/igt@kms_frontbuffer_tracking@fbcpsr-2p-scndscrn-spr-indfb-fullscreen.html
* igt@kms_frontbuffer_tracking@plane-fbc-rte:
- shard-dg2-set2: NOTRUN -> [SKIP][38] ([Intel XE#1158])
[38]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/shard-dg2-432/igt@kms_frontbuffer_tracking@plane-fbc-rte.html
* igt@kms_hdr@static-toggle:
- shard-bmg: [PASS][39] -> [SKIP][40] ([Intel XE#1503]) +1 other test skip
[39]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3981-1e54d2c469a91e00a39ff7f6b98c31d290245ecf/shard-bmg-8/igt@kms_hdr@static-toggle.html
[40]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/shard-bmg-6/igt@kms_hdr@static-toggle.html
* igt@kms_joiner@basic-force-ultra-joiner:
- shard-dg2-set2: NOTRUN -> [SKIP][41] ([Intel XE#2925])
[41]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/shard-dg2-432/igt@kms_joiner@basic-force-ultra-joiner.html
* igt@kms_plane_multiple@2x-tiling-4:
- shard-bmg: [PASS][42] -> [SKIP][43] ([Intel XE#4596])
[42]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3981-1e54d2c469a91e00a39ff7f6b98c31d290245ecf/shard-bmg-8/igt@kms_plane_multiple@2x-tiling-4.html
[43]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/shard-bmg-6/igt@kms_plane_multiple@2x-tiling-4.html
* igt@kms_psr2_sf@pr-overlay-primary-update-sf-dmg-area:
- shard-dg2-set2: NOTRUN -> [SKIP][44] ([Intel XE#1406] / [Intel XE#1489])
[44]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/shard-dg2-432/igt@kms_psr2_sf@pr-overlay-primary-update-sf-dmg-area.html
* igt@kms_psr@fbc-psr2-sprite-render:
- shard-dg2-set2: NOTRUN -> [SKIP][45] ([Intel XE#1406] / [Intel XE#2850] / [Intel XE#929]) +1 other test skip
[45]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/shard-dg2-432/igt@kms_psr@fbc-psr2-sprite-render.html
* igt@kms_rotation_crc@primary-y-tiled-reflect-x-0:
- shard-dg2-set2: NOTRUN -> [SKIP][46] ([Intel XE#1127])
[46]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/shard-dg2-432/igt@kms_rotation_crc@primary-y-tiled-reflect-x-0.html
* igt@kms_rotation_crc@sprite-rotation-90:
- shard-dg2-set2: NOTRUN -> [SKIP][47] ([Intel XE#3414])
[47]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/shard-dg2-432/igt@kms_rotation_crc@sprite-rotation-90.html
* igt@sriov_basic@enable-vfs-autoprobe-off:
- shard-dg2-set2: NOTRUN -> [SKIP][48] ([Intel XE#1091] / [Intel XE#2849])
[48]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/shard-dg2-466/igt@sriov_basic@enable-vfs-autoprobe-off.html
* igt@xe_compute_preempt@compute-preempt-many-vram-evict:
- shard-dg2-set2: NOTRUN -> [SKIP][49] ([Intel XE#6360])
[49]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/shard-dg2-432/igt@xe_compute_preempt@compute-preempt-many-vram-evict.html
* igt@xe_copy_basic@mem-set-linear-0xfd:
- shard-dg2-set2: NOTRUN -> [SKIP][50] ([Intel XE#1126])
[50]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/shard-dg2-432/igt@xe_copy_basic@mem-set-linear-0xfd.html
* igt@xe_eudebug@discovery-race-sigint:
- shard-dg2-set2: NOTRUN -> [SKIP][51] ([Intel XE#4837]) +2 other tests skip
[51]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/shard-dg2-434/igt@xe_eudebug@discovery-race-sigint.html
* igt@xe_exec_fault_mode@many-execqueues-userptr-imm:
- shard-dg2-set2: NOTRUN -> [SKIP][52] ([Intel XE#288]) +5 other tests skip
[52]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/shard-dg2-466/igt@xe_exec_fault_mode@many-execqueues-userptr-imm.html
* igt@xe_exec_system_allocator@threads-many-execqueues-mmap-prefetch-shared:
- shard-dg2-set2: NOTRUN -> [SKIP][53] ([Intel XE#4915]) +75 other tests skip
[53]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/shard-dg2-432/igt@xe_exec_system_allocator@threads-many-execqueues-mmap-prefetch-shared.html
* igt@xe_live_ktest@xe_migrate:
- shard-dg2-set2: NOTRUN -> [FAIL][54] ([Intel XE#3099]) +2 other tests fail
[54]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/shard-dg2-434/igt@xe_live_ktest@xe_migrate.html
* igt@xe_module_load@load:
- shard-dg2-set2: ([PASS][55], [PASS][56], [PASS][57], [PASS][58], [PASS][59], [PASS][60], [PASS][61], [PASS][62], [PASS][63], [PASS][64], [PASS][65], [PASS][66], [PASS][67], [PASS][68], [PASS][69], [PASS][70], [PASS][71], [PASS][72], [PASS][73], [PASS][74], [PASS][75], [PASS][76], [PASS][77], [PASS][78], [PASS][79]) -> ([PASS][80], [PASS][81], [PASS][82], [PASS][83], [PASS][84], [PASS][85], [SKIP][86], [PASS][87], [PASS][88], [PASS][89], [PASS][90], [PASS][91], [PASS][92], [PASS][93], [PASS][94], [PASS][95], [PASS][96], [PASS][97], [PASS][98], [PASS][99], [PASS][100], [PASS][101], [PASS][102], [PASS][103], [PASS][104], [PASS][105]) ([Intel XE#378])
[55]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3981-1e54d2c469a91e00a39ff7f6b98c31d290245ecf/shard-dg2-436/igt@xe_module_load@load.html
[56]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3981-1e54d2c469a91e00a39ff7f6b98c31d290245ecf/shard-dg2-463/igt@xe_module_load@load.html
[57]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3981-1e54d2c469a91e00a39ff7f6b98c31d290245ecf/shard-dg2-436/igt@xe_module_load@load.html
[58]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3981-1e54d2c469a91e00a39ff7f6b98c31d290245ecf/shard-dg2-463/igt@xe_module_load@load.html
[59]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3981-1e54d2c469a91e00a39ff7f6b98c31d290245ecf/shard-dg2-433/igt@xe_module_load@load.html
[60]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3981-1e54d2c469a91e00a39ff7f6b98c31d290245ecf/shard-dg2-433/igt@xe_module_load@load.html
[61]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3981-1e54d2c469a91e00a39ff7f6b98c31d290245ecf/shard-dg2-466/igt@xe_module_load@load.html
[62]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3981-1e54d2c469a91e00a39ff7f6b98c31d290245ecf/shard-dg2-433/igt@xe_module_load@load.html
[63]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3981-1e54d2c469a91e00a39ff7f6b98c31d290245ecf/shard-dg2-466/igt@xe_module_load@load.html
[64]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3981-1e54d2c469a91e00a39ff7f6b98c31d290245ecf/shard-dg2-466/igt@xe_module_load@load.html
[65]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3981-1e54d2c469a91e00a39ff7f6b98c31d290245ecf/shard-dg2-464/igt@xe_module_load@load.html
[66]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3981-1e54d2c469a91e00a39ff7f6b98c31d290245ecf/shard-dg2-463/igt@xe_module_load@load.html
[67]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3981-1e54d2c469a91e00a39ff7f6b98c31d290245ecf/shard-dg2-435/igt@xe_module_load@load.html
[68]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3981-1e54d2c469a91e00a39ff7f6b98c31d290245ecf/shard-dg2-435/igt@xe_module_load@load.html
[69]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3981-1e54d2c469a91e00a39ff7f6b98c31d290245ecf/shard-dg2-435/igt@xe_module_load@load.html
[70]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3981-1e54d2c469a91e00a39ff7f6b98c31d290245ecf/shard-dg2-434/igt@xe_module_load@load.html
[71]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3981-1e54d2c469a91e00a39ff7f6b98c31d290245ecf/shard-dg2-435/igt@xe_module_load@load.html
[72]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3981-1e54d2c469a91e00a39ff7f6b98c31d290245ecf/shard-dg2-434/igt@xe_module_load@load.html
[73]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3981-1e54d2c469a91e00a39ff7f6b98c31d290245ecf/shard-dg2-432/igt@xe_module_load@load.html
[74]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3981-1e54d2c469a91e00a39ff7f6b98c31d290245ecf/shard-dg2-436/igt@xe_module_load@load.html
[75]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3981-1e54d2c469a91e00a39ff7f6b98c31d290245ecf/shard-dg2-464/igt@xe_module_load@load.html
[76]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3981-1e54d2c469a91e00a39ff7f6b98c31d290245ecf/shard-dg2-434/igt@xe_module_load@load.html
[77]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3981-1e54d2c469a91e00a39ff7f6b98c31d290245ecf/shard-dg2-432/igt@xe_module_load@load.html
[78]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3981-1e54d2c469a91e00a39ff7f6b98c31d290245ecf/shard-dg2-432/igt@xe_module_load@load.html
[79]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3981-1e54d2c469a91e00a39ff7f6b98c31d290245ecf/shard-dg2-464/igt@xe_module_load@load.html
[80]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/shard-dg2-464/igt@xe_module_load@load.html
[81]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/shard-dg2-466/igt@xe_module_load@load.html
[82]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/shard-dg2-432/igt@xe_module_load@load.html
[83]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/shard-dg2-435/igt@xe_module_load@load.html
[84]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/shard-dg2-435/igt@xe_module_load@load.html
[85]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/shard-dg2-466/igt@xe_module_load@load.html
[86]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/shard-dg2-466/igt@xe_module_load@load.html
[87]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/shard-dg2-434/igt@xe_module_load@load.html
[88]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/shard-dg2-434/igt@xe_module_load@load.html
[89]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/shard-dg2-434/igt@xe_module_load@load.html
[90]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/shard-dg2-464/igt@xe_module_load@load.html
[91]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/shard-dg2-432/igt@xe_module_load@load.html
[92]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/shard-dg2-436/igt@xe_module_load@load.html
[93]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/shard-dg2-463/igt@xe_module_load@load.html
[94]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/shard-dg2-463/igt@xe_module_load@load.html
[95]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/shard-dg2-463/igt@xe_module_load@load.html
[96]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/shard-dg2-436/igt@xe_module_load@load.html
[97]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/shard-dg2-463/igt@xe_module_load@load.html
[98]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/shard-dg2-435/igt@xe_module_load@load.html
[99]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/shard-dg2-433/igt@xe_module_load@load.html
[100]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/shard-dg2-433/igt@xe_module_load@load.html
[101]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/shard-dg2-433/igt@xe_module_load@load.html
[102]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/shard-dg2-436/igt@xe_module_load@load.html
[103]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/shard-dg2-466/igt@xe_module_load@load.html
[104]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/shard-dg2-432/igt@xe_module_load@load.html
[105]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/shard-dg2-432/igt@xe_module_load@load.html
* igt@xe_oa@non-privileged-access-vaddr:
- shard-dg2-set2: NOTRUN -> [SKIP][106] ([Intel XE#3573])
[106]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/shard-dg2-466/igt@xe_oa@non-privileged-access-vaddr.html
* igt@xe_pm:
- shard-dg2-set2: NOTRUN -> [INCOMPLETE][107] ([Intel XE#2594])
[107]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/shard-dg2-464/igt@xe_pm.html
* igt@xe_query@multigpu-query-invalid-cs-cycles:
- shard-dg2-set2: NOTRUN -> [SKIP][108] ([Intel XE#944]) +1 other test skip
[108]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/shard-dg2-432/igt@xe_query@multigpu-query-invalid-cs-cycles.html
* igt@xe_sriov_flr@flr-vf1-clear:
- shard-dg2-set2: NOTRUN -> [SKIP][109] ([Intel XE#3342])
[109]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/shard-dg2-434/igt@xe_sriov_flr@flr-vf1-clear.html
#### Possible fixes ####
* igt@kms_bw@connected-linear-tiling-2-displays-3840x2160p:
- shard-bmg: [SKIP][110] ([Intel XE#2314] / [Intel XE#2894]) -> [PASS][111]
[110]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3981-1e54d2c469a91e00a39ff7f6b98c31d290245ecf/shard-bmg-6/igt@kms_bw@connected-linear-tiling-2-displays-3840x2160p.html
[111]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/shard-bmg-4/igt@kms_bw@connected-linear-tiling-2-displays-3840x2160p.html
* igt@kms_ccs@crc-primary-suspend-4-tiled-bmg-ccs:
- shard-bmg: [INCOMPLETE][112] ([Intel XE#3862]) -> [PASS][113] +1 other test pass
[112]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3981-1e54d2c469a91e00a39ff7f6b98c31d290245ecf/shard-bmg-7/igt@kms_ccs@crc-primary-suspend-4-tiled-bmg-ccs.html
[113]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/shard-bmg-2/igt@kms_ccs@crc-primary-suspend-4-tiled-bmg-ccs.html
* igt@kms_ccs@random-ccs-data-4-tiled-dg2-rc-ccs-cc:
- shard-dg2-set2: [INCOMPLETE][114] ([Intel XE#1727] / [Intel XE#2705] / [Intel XE#3113] / [Intel XE#4212] / [Intel XE#4345] / [Intel XE#4522]) -> [PASS][115]
[114]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3981-1e54d2c469a91e00a39ff7f6b98c31d290245ecf/shard-dg2-464/igt@kms_ccs@random-ccs-data-4-tiled-dg2-rc-ccs-cc.html
[115]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/shard-dg2-432/igt@kms_ccs@random-ccs-data-4-tiled-dg2-rc-ccs-cc.html
* igt@kms_ccs@random-ccs-data-4-tiled-dg2-rc-ccs-cc@pipe-a-dp-4:
- shard-dg2-set2: [INCOMPLETE][116] ([Intel XE#1727] / [Intel XE#2705] / [Intel XE#3113] / [Intel XE#4212] / [Intel XE#4522]) -> [PASS][117]
[116]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3981-1e54d2c469a91e00a39ff7f6b98c31d290245ecf/shard-dg2-464/igt@kms_ccs@random-ccs-data-4-tiled-dg2-rc-ccs-cc@pipe-a-dp-4.html
[117]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/shard-dg2-432/igt@kms_ccs@random-ccs-data-4-tiled-dg2-rc-ccs-cc@pipe-a-dp-4.html
* igt@kms_color@ctm-red-to-blue:
- shard-dg2-set2: [DMESG-WARN][118] -> [PASS][119]
[118]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3981-1e54d2c469a91e00a39ff7f6b98c31d290245ecf/shard-dg2-464/igt@kms_color@ctm-red-to-blue.html
[119]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/shard-dg2-432/igt@kms_color@ctm-red-to-blue.html
* igt@kms_cursor_legacy@2x-flip-vs-cursor-atomic:
- shard-bmg: [SKIP][120] ([Intel XE#2291]) -> [PASS][121] +3 other tests pass
[120]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3981-1e54d2c469a91e00a39ff7f6b98c31d290245ecf/shard-bmg-6/igt@kms_cursor_legacy@2x-flip-vs-cursor-atomic.html
[121]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/shard-bmg-4/igt@kms_cursor_legacy@2x-flip-vs-cursor-atomic.html
* igt@kms_cursor_legacy@flip-vs-cursor-legacy:
- shard-bmg: [FAIL][122] ([Intel XE#5299]) -> [PASS][123]
[122]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3981-1e54d2c469a91e00a39ff7f6b98c31d290245ecf/shard-bmg-8/igt@kms_cursor_legacy@flip-vs-cursor-legacy.html
[123]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/shard-bmg-6/igt@kms_cursor_legacy@flip-vs-cursor-legacy.html
* igt@kms_dither@fb-8bpc-vs-panel-6bpc:
- shard-bmg: [SKIP][124] ([Intel XE#1340]) -> [PASS][125]
[124]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3981-1e54d2c469a91e00a39ff7f6b98c31d290245ecf/shard-bmg-6/igt@kms_dither@fb-8bpc-vs-panel-6bpc.html
[125]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/shard-bmg-4/igt@kms_dither@fb-8bpc-vs-panel-6bpc.html
* igt@kms_flip@2x-plain-flip-fb-recreate:
- shard-bmg: [SKIP][126] ([Intel XE#2316]) -> [PASS][127] +5 other tests pass
[126]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3981-1e54d2c469a91e00a39ff7f6b98c31d290245ecf/shard-bmg-6/igt@kms_flip@2x-plain-flip-fb-recreate.html
[127]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/shard-bmg-8/igt@kms_flip@2x-plain-flip-fb-recreate.html
* igt@kms_flip@flip-vs-rmfb-interruptible:
- shard-adlp: [DMESG-WARN][128] ([Intel XE#4543] / [Intel XE#5208]) -> [PASS][129]
[128]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3981-1e54d2c469a91e00a39ff7f6b98c31d290245ecf/shard-adlp-8/igt@kms_flip@flip-vs-rmfb-interruptible.html
[129]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/shard-adlp-1/igt@kms_flip@flip-vs-rmfb-interruptible.html
* igt@kms_flip@flip-vs-rmfb-interruptible@c-hdmi-a1:
- shard-adlp: [DMESG-WARN][130] ([Intel XE#4543]) -> [PASS][131] +5 other tests pass
[130]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3981-1e54d2c469a91e00a39ff7f6b98c31d290245ecf/shard-adlp-8/igt@kms_flip@flip-vs-rmfb-interruptible@c-hdmi-a1.html
[131]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/shard-adlp-1/igt@kms_flip@flip-vs-rmfb-interruptible@c-hdmi-a1.html
* igt@kms_flip@flip-vs-suspend-interruptible@a-hdmi-a1:
- shard-adlp: [DMESG-WARN][132] ([Intel XE#2953] / [Intel XE#4173]) -> [PASS][133] +2 other tests pass
[132]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3981-1e54d2c469a91e00a39ff7f6b98c31d290245ecf/shard-adlp-9/igt@kms_flip@flip-vs-suspend-interruptible@a-hdmi-a1.html
[133]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/shard-adlp-6/igt@kms_flip@flip-vs-suspend-interruptible@a-hdmi-a1.html
* igt@kms_flip_tiling@flip-change-tiling@pipe-b-hdmi-a-1-y-to-y:
- shard-adlp: [DMESG-FAIL][134] ([Intel XE#4543]) -> [PASS][135]
[134]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3981-1e54d2c469a91e00a39ff7f6b98c31d290245ecf/shard-adlp-2/igt@kms_flip_tiling@flip-change-tiling@pipe-b-hdmi-a-1-y-to-y.html
[135]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/shard-adlp-9/igt@kms_flip_tiling@flip-change-tiling@pipe-b-hdmi-a-1-y-to-y.html
* igt@kms_flip_tiling@flip-change-tiling@pipe-d-hdmi-a-1-x-to-y:
- shard-adlp: [FAIL][136] ([Intel XE#1874]) -> [PASS][137] +1 other test pass
[136]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3981-1e54d2c469a91e00a39ff7f6b98c31d290245ecf/shard-adlp-2/igt@kms_flip_tiling@flip-change-tiling@pipe-d-hdmi-a-1-x-to-y.html
[137]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/shard-adlp-9/igt@kms_flip_tiling@flip-change-tiling@pipe-d-hdmi-a-1-x-to-y.html
* igt@kms_vrr@negative-basic:
- shard-bmg: [SKIP][138] ([Intel XE#1499]) -> [PASS][139]
[138]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3981-1e54d2c469a91e00a39ff7f6b98c31d290245ecf/shard-bmg-6/igt@kms_vrr@negative-basic.html
[139]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/shard-bmg-4/igt@kms_vrr@negative-basic.html
* igt@xe_evict@evict-mixed-many-threads-small:
- shard-bmg: [INCOMPLETE][140] ([Intel XE#6321]) -> [PASS][141] +1 other test pass
[140]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3981-1e54d2c469a91e00a39ff7f6b98c31d290245ecf/shard-bmg-1/igt@xe_evict@evict-mixed-many-threads-small.html
[141]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/shard-bmg-5/igt@xe_evict@evict-mixed-many-threads-small.html
#### Warnings ####
* igt@kms_content_protection@uevent:
- shard-bmg: [FAIL][142] ([Intel XE#1188]) -> [SKIP][143] ([Intel XE#2341])
[142]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3981-1e54d2c469a91e00a39ff7f6b98c31d290245ecf/shard-bmg-8/igt@kms_content_protection@uevent.html
[143]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/shard-bmg-6/igt@kms_content_protection@uevent.html
* igt@kms_frontbuffer_tracking@drrs-2p-primscrn-indfb-pgflip-blt:
- shard-bmg: [SKIP][144] ([Intel XE#2312]) -> [SKIP][145] ([Intel XE#2311]) +10 other tests skip
[144]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3981-1e54d2c469a91e00a39ff7f6b98c31d290245ecf/shard-bmg-6/igt@kms_frontbuffer_tracking@drrs-2p-primscrn-indfb-pgflip-blt.html
[145]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/shard-bmg-4/igt@kms_frontbuffer_tracking@drrs-2p-primscrn-indfb-pgflip-blt.html
* igt@kms_frontbuffer_tracking@drrs-2p-primscrn-spr-indfb-draw-render:
- shard-bmg: [SKIP][146] ([Intel XE#2311]) -> [SKIP][147] ([Intel XE#2312]) +8 other tests skip
[146]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3981-1e54d2c469a91e00a39ff7f6b98c31d290245ecf/shard-bmg-8/igt@kms_frontbuffer_tracking@drrs-2p-primscrn-spr-indfb-draw-render.html
[147]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/shard-bmg-6/igt@kms_frontbuffer_tracking@drrs-2p-primscrn-spr-indfb-draw-render.html
* igt@kms_frontbuffer_tracking@fbc-2p-primscrn-indfb-msflip-blt:
- shard-bmg: [SKIP][148] ([Intel XE#2312]) -> [SKIP][149] ([Intel XE#5390]) +6 other tests skip
[148]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3981-1e54d2c469a91e00a39ff7f6b98c31d290245ecf/shard-bmg-6/igt@kms_frontbuffer_tracking@fbc-2p-primscrn-indfb-msflip-blt.html
[149]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/shard-bmg-8/igt@kms_frontbuffer_tracking@fbc-2p-primscrn-indfb-msflip-blt.html
* igt@kms_frontbuffer_tracking@fbc-2p-scndscrn-spr-indfb-move:
- shard-bmg: [SKIP][150] ([Intel XE#5390]) -> [SKIP][151] ([Intel XE#2312]) +1 other test skip
[150]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3981-1e54d2c469a91e00a39ff7f6b98c31d290245ecf/shard-bmg-8/igt@kms_frontbuffer_tracking@fbc-2p-scndscrn-spr-indfb-move.html
[151]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/shard-bmg-6/igt@kms_frontbuffer_tracking@fbc-2p-scndscrn-spr-indfb-move.html
* igt@kms_frontbuffer_tracking@fbcpsr-2p-scndscrn-cur-indfb-draw-blt:
- shard-bmg: [SKIP][152] ([Intel XE#2312]) -> [SKIP][153] ([Intel XE#2313]) +12 other tests skip
[152]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3981-1e54d2c469a91e00a39ff7f6b98c31d290245ecf/shard-bmg-6/igt@kms_frontbuffer_tracking@fbcpsr-2p-scndscrn-cur-indfb-draw-blt.html
[153]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/shard-bmg-3/igt@kms_frontbuffer_tracking@fbcpsr-2p-scndscrn-cur-indfb-draw-blt.html
* igt@kms_frontbuffer_tracking@fbcpsr-2p-scndscrn-shrfb-plflip-blt:
- shard-bmg: [SKIP][154] ([Intel XE#2313]) -> [SKIP][155] ([Intel XE#2312]) +8 other tests skip
[154]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3981-1e54d2c469a91e00a39ff7f6b98c31d290245ecf/shard-bmg-8/igt@kms_frontbuffer_tracking@fbcpsr-2p-scndscrn-shrfb-plflip-blt.html
[155]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/shard-bmg-6/igt@kms_frontbuffer_tracking@fbcpsr-2p-scndscrn-shrfb-plflip-blt.html
* igt@kms_plane_multiple@2x-tiling-yf:
- shard-bmg: [SKIP][156] ([Intel XE#5021]) -> [SKIP][157] ([Intel XE#4596])
[156]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3981-1e54d2c469a91e00a39ff7f6b98c31d290245ecf/shard-bmg-8/igt@kms_plane_multiple@2x-tiling-yf.html
[157]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/shard-bmg-6/igt@kms_plane_multiple@2x-tiling-yf.html
* igt@kms_tiled_display@basic-test-pattern:
- shard-bmg: [FAIL][158] ([Intel XE#1729]) -> [SKIP][159] ([Intel XE#2426])
[158]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3981-1e54d2c469a91e00a39ff7f6b98c31d290245ecf/shard-bmg-5/igt@kms_tiled_display@basic-test-pattern.html
[159]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/shard-bmg-8/igt@kms_tiled_display@basic-test-pattern.html
* igt@xe_exec_basic@multigpu-once-basic:
- shard-dg2-set2: [SKIP][160] ([Intel XE#1392]) -> [INCOMPLETE][161] ([Intel XE#4842])
[160]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-3981-1e54d2c469a91e00a39ff7f6b98c31d290245ecf/shard-dg2-436/igt@xe_exec_basic@multigpu-once-basic.html
[161]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/shard-dg2-463/igt@xe_exec_basic@multigpu-once-basic.html
[Intel XE#1091]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1091
[Intel XE#1124]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1124
[Intel XE#1126]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1126
[Intel XE#1127]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1127
[Intel XE#1135]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1135
[Intel XE#1158]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1158
[Intel XE#1188]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1188
[Intel XE#1340]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1340
[Intel XE#1392]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1392
[Intel XE#1406]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1406
[Intel XE#1475]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1475
[Intel XE#1489]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1489
[Intel XE#1499]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1499
[Intel XE#1503]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1503
[Intel XE#1727]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1727
[Intel XE#1729]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1729
[Intel XE#1874]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1874
[Intel XE#2291]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2291
[Intel XE#2311]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2311
[Intel XE#2312]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2312
[Intel XE#2313]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2313
[Intel XE#2314]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2314
[Intel XE#2316]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2316
[Intel XE#2341]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2341
[Intel XE#2426]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2426
[Intel XE#2594]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2594
[Intel XE#2705]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2705
[Intel XE#2849]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2849
[Intel XE#2850]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2850
[Intel XE#288]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/288
[Intel XE#2894]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2894
[Intel XE#2925]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2925
[Intel XE#2953]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2953
[Intel XE#301]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/301
[Intel XE#3099]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3099
[Intel XE#3113]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3113
[Intel XE#3149]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3149
[Intel XE#3226]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3226
[Intel XE#323]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/323
[Intel XE#3342]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3342
[Intel XE#3414]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3414
[Intel XE#3573]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3573
[Intel XE#367]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/367
[Intel XE#373]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/373
[Intel XE#378]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/378
[Intel XE#3862]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3862
[Intel XE#4173]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4173
[Intel XE#4212]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4212
[Intel XE#4294]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4294
[Intel XE#4345]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4345
[Intel XE#4354]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4354
[Intel XE#4522]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4522
[Intel XE#4543]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4543
[Intel XE#455]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/455
[Intel XE#4596]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4596
[Intel XE#4837]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4837
[Intel XE#4842]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4842
[Intel XE#4915]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/4915
[Intel XE#5021]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5021
[Intel XE#5208]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5208
[Intel XE#5299]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5299
[Intel XE#5390]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/5390
[Intel XE#6014]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6014
[Intel XE#6168]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6168
[Intel XE#6321]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6321
[Intel XE#6360]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/6360
[Intel XE#651]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/651
[Intel XE#653]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/653
[Intel XE#787]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/787
[Intel XE#929]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/929
[Intel XE#944]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/944
[i915#14968]: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/14968
Build changes
-------------
* Linux: xe-3981-1e54d2c469a91e00a39ff7f6b98c31d290245ecf -> xe-pw-156482v1
IGT_8596: 8596
xe-3981-1e54d2c469a91e00a39ff7f6b98c31d290245ecf: 1e54d2c469a91e00a39ff7f6b98c31d290245ecf
xe-pw-156482v1: 156482v1
== Logs ==
For more details see: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-156482v1/index.html
end of thread, other threads:[~2025-10-25 3:47 UTC | newest]
Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-10-24 13:35 [PATCH v8 0/3] drm/xe/migrate: Atomicize CCS copy command setup Satyanarayana K V P
2025-10-24 13:35 ` [PATCH v8 1/3] drm/xe/migrate: Use AVX instructions to prevent partial writes during VF migration CCS batch buffer updates Satyanarayana K V P
2025-10-24 13:57 ` Rodrigo Vivi
2025-10-24 14:05 ` Ville Syrjälä
2025-10-24 14:25 ` K V P, Satyanarayana
2025-10-24 15:40 ` Matthew Brost
2025-10-24 16:05 ` Matt Roper
2025-10-24 16:10 ` Matthew Brost
2025-10-24 20:07 ` Vivi, Rodrigo
2025-10-24 13:35 ` [PATCH v8 2/3] drm/xe/migrate: Make emit_pte() header write atomic Satyanarayana K V P
2025-10-24 13:35 ` [PATCH v8 3/3] drm/xe/vf: Clear CCS read/write buffers in atomic way Satyanarayana K V P
2025-10-24 14:40 ` ✗ CI.checkpatch: warning for drm/xe/migrate: Atomicize CCS copy command setup Patchwork
2025-10-24 14:42 ` ✓ CI.KUnit: success " Patchwork
2025-10-24 15:48 ` ✓ Xe.CI.BAT: " Patchwork
2025-10-25 3:47 ` ✓ Xe.CI.Full: " Patchwork