Intel-XE Archive on lore.kernel.org
From: "Vivi, Rodrigo" <rodrigo.vivi@intel.com>
To: "Brost, Matthew" <matthew.brost@intel.com>,
	"Roper, Matthew D" <matthew.d.roper@intel.com>
Cc: "ville.syrjala@linux.intel.com" <ville.syrjala@linux.intel.com>,
	"intel-xe@lists.freedesktop.org" <intel-xe@lists.freedesktop.org>,
	"K V P, Satyanarayana" <satyanarayana.k.v.p@intel.com>,
	"Wajdeczko, Michal" <Michal.Wajdeczko@intel.com>,
	"Auld, Matthew" <matthew.auld@intel.com>
Subject: Re: [PATCH v8 1/3] drm/xe/migrate: Use AVX instructions to prevent partial writes during VF migration CCS batch buffer updates
Date: Fri, 24 Oct 2025 20:07:44 +0000	[thread overview]
Message-ID: <8bb5aa909150177cf16317b9c71be2014f7a2caf.camel@intel.com> (raw)
In-Reply-To: <aPuk4a/S+JUS6Zy4@lstrano-desk.jf.intel.com>

On Fri, 2025-10-24 at 09:10 -0700, Matthew Brost wrote:
> On Fri, Oct 24, 2025 at 09:05:12AM -0700, Matt Roper wrote:
> > On Fri, Oct 24, 2025 at 07:55:32PM +0530, K V P, Satyanarayana
> > wrote:
> > > 
> > > 
> > > On 24-10-2025 19:35, Ville Syrjälä wrote:
> > > > On Fri, Oct 24, 2025 at 09:57:15AM -0400, Rodrigo Vivi wrote:
> > > > > On Fri, Oct 24, 2025 at 07:05:24PM +0530, Satyanarayana K V P
> > > > > wrote:
> > > > > 
> > > > > Hi Satya,
> > > > > 
> > > > > First of all, thank you for the updates.
> > > > > 
> > > > > Second, the subject is way too big.
> > > > > 
> > > > > This should be enough and under 75 cols:
> > > > > 
> > > > > drm/xe: Use AVX instructions to prevent partial writes during
> > > > > VF pause
> > > > > 
> > > > > more below:
> > > > > 
> > > > > > VF KMD registers two specialized contexts with the GuC for
> > > > > > migration operations. The save context contains copy commands and
> > > > > > PTEs to transfer CCS metadata from GPU pools to system memory, and
> > > > > > the restore context contains copy commands and PTEs to transfer
> > > > > > CCS metadata from system memory back to CCS pools. The GuC submits
> > > > > > these contexts to HW during VF migration.
> > > > > > 
> > > > > > Each context uses a large batch buffer allocated via the
> > > > > > sub-allocator, pre-filled with MI_NOOPs and terminated with
> > > > > > MI_BATCH_BUFFER_END. During BO lifecycle management, segments are
> > > > > > dynamically allocated from this buffer and populated with PTEs and
> > > > > > copy commands for active BOs, then reset to MI_NOOPs when BOs are
> > > > > > destroyed.
> > > > > > 
> > > > > > The CCS copy operation requires a 5-dword command sequence to be
> > > > > > written to the batch buffer. During VF migration save/restore
> > > > > > operations, if the vCPU gets preempted or halted while this
> > > > > > command sequence is being programmed, partial writes can occur.
> > > > > > These partial writes create incomplete GPU instructions in the
> > > > > > batch buffer, which trigger page faults when the GuC submits the
> > > > > > batch buffer to hardware for CCS metadata operations.
> > > > > 
> > > > > Perhaps we could summarize the thing here and move the details to
> > > > > the comment near the assembly. The important part in the commit
> > > > > message is to have the 'why'. Some of the details of the commands,
> > > > > like the MI_NOOP fill, could be in the comment near the ASM.
> > > > > 
> > > > > > 
> > > > > > Standard memory operations like memcpy() are preemptible, meaning
> > > > > > the CPU scheduler can interrupt execution midway through writing
> > > > > > the command sequence, leaving the batch buffer in an inconsistent
> > > > > > state with partially written GPU instructions.
> > > > > > 
> > > > > > Replace standard memory operations with x86 AVX instructions,
> > > > > > which cannot be preempted mid-instruction and therefore provide
> > > > > > atomic writes, ensuring complete command sequences are written
> > > > > > atomically to the batch buffer.
> > > > > > 
> > > > > > Expand EMIT_COPY_CCS_DW from 5 dwords to 8 dwords to align with
> > > > > > 256-bit VMOVDQU operations. Update emit_flush_invalidate() to use
> > > > > > VMOVDQU with 128-bit chunks. By ensuring GPU instruction headers
> > > > > > (3-dword and 5-dword sequences) are written atomically, we prevent
> > > > > > partial updates that could compromise migration stability.
> > > > > > 
> > > > > > This approach guarantees that batch buffer updates are completed
> > > > > > entirely or not at all, eliminating the page fault scenarios
> > > > > > during VF migration operations regardless of vCPU scheduling
> > > > > > behavior.
> > > > > > 
> > > > > > Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
> > > > > > Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
> > > > > > Cc: Matthew Brost <matthew.brost@intel.com>
> > > > > > Cc: Matthew Auld <matthew.auld@intel.com>
> > > > > > Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
> > > > > > Cc: Matt Roper <matthew.d.roper@intel.com>
> > > > > > Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > > > > > 
> > > > > > ---
> > > > > > V7 -> V8:
> > > > > > - Updated commit title and message.
> > > > > > 
> > > > > > V6 -> V7:
> > > > > > - Added description explaining why assembly instructions are used
> > > > > >   for atomicity.
> > > > > > - Assert if DGFX tries to use memcpy_vmovdqu(). (Rodrigo)
> > > > > > - Include <asm/cpufeature.h> even though checkpatch complains;
> > > > > >   with <linux/cpufeature.h> KUnit throws errors.
> > > > > > 
> > > > > > V5 -> V6:
> > > > > > - Fixed review comments (Rodrigo)
> > > > > > 
> > > > > > V4 -> V5:
> > > > > > - Fixed review comments. (Matt B)
> > > > > > 
> > > > > > V3 -> V4:
> > > > > > - Fixed review comments. (Wajdeczko)
> > > > > > - Fix issues reported by patchworks.
> > > > > > 
> > > > > > V2 -> V3:
> > > > > > - Added support for 128-bit and 256-bit instructions with
> > > > > >   memcpy_vmovdqu().
> > > > > > - Updated emit_flush_invalidate() to use the vmovdqu instruction.
> > > > > > 
> > > > > > V1 -> V2:
> > > > > > - Use memcpy_vmovdqu() only for x86 arch and for VF; else use
> > > > > >   memcpy(). (Auld, Matthew)
> > > > > > - Fix issues reported by patchworks.
> > > > > > ---
> > > > > >   drivers/gpu/drm/xe/xe_migrate.c | 114 ++++++++++++++++++++++++++------
> > > > > >   1 file changed, 93 insertions(+), 21 deletions(-)
> > > > > > 
> > > > > > diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
> > > > > > index 921c9c1ea41f..005dc26a0393 100644
> > > > > > --- a/drivers/gpu/drm/xe/xe_migrate.c
> > > > > > +++ b/drivers/gpu/drm/xe/xe_migrate.c
> > > > > > @@ -5,6 +5,8 @@
> > > > > >   #include "xe_migrate.h"
> > > > > > +#include <asm/fpu/api.h>
> > > > > > +#include <asm/cpufeature.h>
> > > > > >   #include <linux/bitfield.h>
> > > > > >   #include <linux/sizes.h>
> > > > > > @@ -33,6 +35,7 @@
> > > > > >   #include "xe_res_cursor.h"
> > > > > >   #include "xe_sa.h"
> > > > > >   #include "xe_sched_job.h"
> > > > > > +#include "xe_sriov_vf_ccs.h"
> > > > > >   #include "xe_sync.h"
> > > > > >   #include "xe_trace_bo.h"
> > > > > >   #include "xe_validation.h"
> > > > > > @@ -657,18 +660,70 @@ static void emit_pte(struct xe_migrate *m,
> > > > > >   	}
> > > > > >   }
> > > > > > -#define EMIT_COPY_CCS_DW 5
> > > > > > +/*
> > > > > > + * VF KMD registers two special LRCs with the GuC to handle save/restore
> > > > > > + * operations for CCS metadata on IGPU. GuC executes these LRCAs during
> > > > > > + * VF save/restore operations.
> > > > > > + *
> > > > > > + * Each LRC contains a batch buffer pool that GuC submits to hardware during
> > > > > > + * VF state save/restore operations. Since these operations can occur
> > > > > > + * asynchronously at any time, we must ensure GPU instructions in the batch
> > > > > > + * buffer are written atomically to prevent corruption from incomplete writes.
> > > > > > + *
> > > > > > + * To guarantee atomic instruction writes, we use x86 SIMD instructions
> > > > > 
> > > > > Here you still mention 'atomic' even though we already know this is
> > > > > not 'atomic'.
> > > > 
> > > > I still don't see how this is supposed to do anything useful without
> > > > atomic writes to memory.
> > > > 
> > > > If the GPU is executing the same memory we're writing, then nothing
> > > > short of atomic memory writes is going to actually fix it. And even
> > > > that would require careful alignment of things to guarantee that each
> > > > command is completely contained within one atomic write.
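> > > > 
> > > > To put that concretely (a sketch only, not code from this patch;
> > > > 'cs' and 'cmd_dwords' are placeholders): for a 32-byte store to even
> > > > have a chance of being single-copy atomic, every command would need
> > > > to satisfy something like:
> > > > 
> > > > 	/* the command must not straddle a 32-byte boundary,
> > > > 	 * otherwise the CPU may split the store across cachelines */
> > > > 	BUG_ON(!IS_ALIGNED((unsigned long)cs, 32));
> > > > 	BUG_ON(cmd_dwords * sizeof(u32) > 32);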
> > > > 
> > > The CPU and GPU operate on the same memory space, but at different
> > > times during VF migration. The critical issue occurs during the batch
> > > buffer preparation phase, when the vCPU is still active and writing
> > > GPU instructions, while the GPU will later execute these same
> > > instructions after the vCPU is paused.
> > > 
> > > During batch buffer updates, if the vCPU gets preempted while writing
> > > a GPU instruction sequence (such as the 5-dword CCS copy command), it
> > > leaves partially written instructions in memory. When the GPU later
> > > executes the batch buffer after vCPU suspension, these incomplete
> > > instructions cause execution failures and page faults.
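> > > 
> > > As an illustration (a sketch only, reusing the names from
> > > emit_copy_ccs() in the patch below), the torn-write window with plain
> > > stores looks like:
> > > 
> > > 	cs[0] = XY_CTRL_SURF_COPY_BLT | ccs_copy_size;	/* header lands */
> > > 	cs[1] = lower_32_bits(src_ofs);
> > > 	/* <-- vCPU paused here: the GPU would fetch a valid header
> > > 	 * followed by stale/partial operand dwords and fault */
> > > 	cs[2] = upper_32_bits(src_ofs) | mocs;
> > > 	cs[3] = lower_32_bits(dst_ofs);
> > > 	cs[4] = upper_32_bits(dst_ofs) | mocs;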
> > 
> > As was discussed on the previous revision, the architecture document
> > already gives guidance on approaches to deal with these timing issues;
> > using AVX like this is not what was recommended.  Can't we just
> > implement the shadow buffer and eliminate this controversial and
> > confusing assembly usage?  I think relying on assembly should be the
> > absolute last resort and not something we jump to when we have cleaner
> > and more widely-supported options.
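> > 
> > Roughly what I have in mind (a sketch only; the helper and descriptor
> > field names here are made up, not taken from the architecture doc):
> > 
> > 	/* GuC only executes the buffer the descriptor points at, so
> > 	 * build the new commands in the inactive copy; a torn write
> > 	 * here is harmless because the GPU never reads this buffer */
> > 	u32 *next = ctx->bb_cpu[ctx->active ^ 1];
> > 	build_ccs_batch(next);
> > 
> > 	/* then publish with a single naturally aligned 8-byte store,
> > 	 * which the CPU performs atomically */
> > 	WRITE_ONCE(ctx->desc->bb_addr, ctx->bb_ggtt[ctx->active ^ 1]);
> > 	ctx->active ^= 1;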
> > 
> 
> I discussed this with Rodrigo on a call; we were both fine with this
> solution, but if the consensus is to not use asm, I suppose we can
> pivot. FWIW, this solution will perform better, as there is a non-zero
> cost to maintaining 2 buffers, but perhaps that doesn't really matter,
> as I wouldn't think memory allocations are a hot path.

I have nothing against this asm code, to be honest (at least after I
understood the flow and the code).

My only true concern was always the questions it will keep bringing up
over and over.

And based on this thread today, clearly even the doc and commit message
in place right now are not solving that, and we are continuing to hear
the same questions over and over. :(

Then, from the documentation itself:
https://www.kernel.org/doc/html/latest/process/coding-style.html#inline-assembly

"However, don’t use inline assembly gratuitously when C can do the job.
You can and should poke hardware from C when possible."

With this in mind, and the comments from Matt and Ville, perhaps we need
to reconsider the path and take the solutions proposed by the doc itself
instead of this code.

In case this is urgent and blocking something, we could perhaps go with
this solution, which has already been validated, but with work in
parallel to replace it ASAP.

Thanks,
Rodrigo.

> 
> Matt
> 
> > 
> > Matt
> > 
> > > 
> > > AVX instructions provide atomic write operations that cannot be
> > > interrupted by the CPU scheduler. This ensures that GPU instruction
> > > sequences are written completely before any potential vCPU preemption
> > > occurs.
> > > 
> > > AVX instructions (VMOVDQU) guarantee that an entire instruction
> > > sequence is written in a single, non-preemptible operation. The
> > > 5-dword CCS copy command is expanded to 8 dwords (padded with 3
> > > MI_NOOPs) so it fills a single 256-bit store. By the time the GPU
> > > executes the batch buffer (after the vCPU pause), all instructions are
> > > guaranteed to be completely written.
> > > 
> > > Here we are ensuring that GPU instructions are fully formed before the
> > > GPU attempts to execute them during the migration process.
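> > > 
> > > Concretely (a sketch mirroring the patch; note MI_NOOP is 0, so the
> > > {MI_NOOP} initializer zero-fills the remaining slots too):
> > > 
> > > 	u32 dw[EMIT_COPY_CCS_DW] = { MI_NOOP };	/* 8 dwords == 32 bytes */
> > > 
> > > 	/* dw[0]..dw[4] hold the XY_CTRL_SURF_COPY_BLT command,
> > > 	 * dw[5]..dw[7] stay MI_NOOP padding */
> > > 	BUILD_BUG_ON(sizeof(dw) * BITS_PER_BYTE != SZ_256);
> > > 	emit_atomic(gt, cs, dw, sizeof(dw));	/* one 256-bit vmovups */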
> > > 
> > > -Satya.
> > > > > Leave a summarized explanation in the commit message and put the
> > > > > details here.
> > > > > 
> > > > > I'm sorry for being picky here, but I want to ensure that the
> > > > > information around this code is clear so we don't keep having to
> > > > > explain this over and over in the future.
> > > > > 
> > > > > > + * (128-bit XMM and 256-bit YMM) within kernel_fpu_begin()/kernel_fpu_end()
> > > > > > + * sections. This prevents vCPU preemption during instruction generation,
> > > > > > + * ensuring complete GPU commands are written to the batch buffer.
> > > > > > + */
> > > > > > +
> > > > > > +static void memcpy_vmovdqu(struct xe_device *xe, void *dst, const void *src, u32 size)
> > > > > > +{
> > > > > > +	xe_assert(xe, !IS_DGFX(xe));
> > > > > > +	xe_assert(xe, IS_SRIOV_VF(xe));
> > > > > > +
> > > > > > +#ifdef CONFIG_X86
> > > > > > +	kernel_fpu_begin();
> > > > > > +	if (size == SZ_128) {
> > > > > > +		asm("vmovdqu (%0), %%xmm0\n"
> > > > > > +		    "vmovups %%xmm0,   (%1)\n"
> > > > > > +		    :: "r" (src), "r" (dst) : "memory");
> > > > > > +	} else if (size == SZ_256) {
> > > > > > +		asm("vmovdqu (%0), %%ymm0\n"
> > > > > > +		    "vmovups %%ymm0,   (%1)\n"
> > > > > > +		    :: "r" (src), "r" (dst) : "memory");
> > > > > > +	}
> > > > > > +	kernel_fpu_end();
> > > > > > +#endif
> > > > > > +}
> > > > > > +
> > > > > > +static void emit_atomic(struct xe_gt *gt, void *dst, const void *src, u32 size)
> > > > > > +{
> > > > > > +	u32 instr_size = size * BITS_PER_BYTE;
> > > > > > +
> > > > > > +	xe_gt_assert(gt, instr_size == SZ_128 || instr_size == SZ_256);
> > > > > > +
> > > > > > +	if (IS_VF_CCS_READY(gt_to_xe(gt))) {
> > > > > > +		xe_gt_assert(gt, static_cpu_has(X86_FEATURE_AVX));
> > > > > > +		memcpy_vmovdqu(gt_to_xe(gt), dst, src, instr_size);
> > > > > > +	} else {
> > > > > > +		memcpy(dst, src, size);
> > > > > > +	}
> > > > > > +}
> > > > > > +
> > > > > > +#define EMIT_COPY_CCS_DW 8
> > > > > >   static void emit_copy_ccs(struct xe_gt *gt, struct xe_bb *bb,
> > > > > >   			  u64 dst_ofs, bool dst_is_indirect,
> > > > > >   			  u64 src_ofs, bool src_is_indirect,
> > > > > >   			  u32 size)
> > > > > >   {
> > > > > > +	u32 dw[EMIT_COPY_CCS_DW] = {MI_NOOP};
> > > > > >   	struct xe_device *xe = gt_to_xe(gt);
> > > > > >   	u32 *cs = bb->cs + bb->len;
> > > > > >   	u32 num_ccs_blks;
> > > > > >   	u32 num_pages;
> > > > > >   	u32 ccs_copy_size;
> > > > > >   	u32 mocs;
> > > > > > +	u32 i = 0;
> > > > > >   	if (GRAPHICS_VERx100(xe) >= 2000) {
> > > > > >   		num_pages = DIV_ROUND_UP(size, XE_PAGE_SIZE);
> > > > > > @@ -686,15 +741,23 @@ static void emit_copy_ccs(struct xe_gt *gt, struct xe_bb *bb,
> > > > > >   		mocs = FIELD_PREP(XY_CTRL_SURF_MOCS_MASK, gt->mocs.uc_index);
> > > > > >   	}
> > > > > > -	*cs++ = XY_CTRL_SURF_COPY_BLT |
> > > > > > -		(src_is_indirect ? 0x0 : 0x1) << SRC_ACCESS_TYPE_SHIFT |
> > > > > > -		(dst_is_indirect ? 0x0 : 0x1) << DST_ACCESS_TYPE_SHIFT |
> > > > > > -		ccs_copy_size;
> > > > > > -	*cs++ = lower_32_bits(src_ofs);
> > > > > > -	*cs++ = upper_32_bits(src_ofs) | mocs;
> > > > > > -	*cs++ = lower_32_bits(dst_ofs);
> > > > > > -	*cs++ = upper_32_bits(dst_ofs) | mocs;
> > > > > > +	dw[i++] = XY_CTRL_SURF_COPY_BLT |
> > > > > > +		  (src_is_indirect ? 0x0 : 0x1) << SRC_ACCESS_TYPE_SHIFT |
> > > > > > +		  (dst_is_indirect ? 0x0 : 0x1) << DST_ACCESS_TYPE_SHIFT |
> > > > > > +		  ccs_copy_size;
> > > > > > +	dw[i++] = lower_32_bits(src_ofs);
> > > > > > +	dw[i++] = upper_32_bits(src_ofs) | mocs;
> > > > > > +	dw[i++] = lower_32_bits(dst_ofs);
> > > > > > +	dw[i++] = upper_32_bits(dst_ofs) | mocs;
> > > > > > +	/*
> > > > > > +	 * The CCS copy command is a 5-dword sequence. If the vCPU halts during
> > > > > > +	 * save/restore while this sequence is being issued, partial writes may
> > > > > > +	 * trigger page faults when saving iGPU CCS metadata. Use the VMOVDQU
> > > > > > +	 * instruction to write the sequence atomically.
> > > > > > +	 */
> > > > > > +	emit_atomic(gt, cs, dw, sizeof(dw));
> > > > > > +	cs += EMIT_COPY_CCS_DW;
> > > > > >   	bb->len = cs - bb->cs;
> > > > > >   }
> > > > > > @@ -1061,18 +1124,27 @@ static u64 migrate_vm_ppgtt_addr_tlb_inval(void)
> > > > > >   	return (NUM_KERNEL_PDE - 2) * XE_PAGE_SIZE;
> > > > > >   }
> > > > > > -static int emit_flush_invalidate(u32 *dw, int i, u32 flags)
> > > > > > +/*
> > > > > > + * The MI_FLUSH_DW command is a 4-dword sequence. If the vCPU halts during
> > > > > > + * save/restore while this sequence is being issued, partial writes may
> > > > > > + * trigger page faults when saving iGPU CCS metadata. Use
> > > > > > + * emit_atomic() to write the sequence atomically.
> > > > > > + */
> > > > > > +#define EMIT_FLUSH_INVALIDATE_DW 4
> > > > > > +static int emit_flush_invalidate(struct xe_exec_queue *q, u32 *cs, int i, u32 flags)
> > > > > >   {
> > > > > >   	u64 addr = migrate_vm_ppgtt_addr_tlb_inval();
> > > > > > +	u32 dw[EMIT_FLUSH_INVALIDATE_DW] = {MI_NOOP}, j = 0;
> > > > > > +
> > > > > > +	dw[j++] = MI_FLUSH_DW | MI_INVALIDATE_TLB | MI_FLUSH_DW_OP_STOREDW |
> > > > > > +		  MI_FLUSH_IMM_DW | flags;
> > > > > > +	dw[j++] = lower_32_bits(addr);
> > > > > > +	dw[j++] = upper_32_bits(addr);
> > > > > > +	dw[j++] = MI_NOOP;
> > > > > > -	dw[i++] = MI_FLUSH_DW | MI_INVALIDATE_TLB | MI_FLUSH_DW_OP_STOREDW |
> > > > > > -		  MI_FLUSH_IMM_DW | flags;
> > > > > > -	dw[i++] = lower_32_bits(addr);
> > > > > > -	dw[i++] = upper_32_bits(addr);
> > > > > > -	dw[i++] = MI_NOOP;
> > > > > > -	dw[i++] = MI_NOOP;
> > > > > > +	emit_atomic(q->gt, &cs[i], dw, sizeof(dw));
> > > > > > -	return i;
> > > > > > +	return i + j;
> > > > > >   }
> > > > > >   /**
> > > > > > @@ -1117,7 +1189,7 @@ int xe_migrate_ccs_rw_copy(struct xe_tile *tile, struct xe_exec_queue *q,
> > > > > >   	/* Calculate Batch buffer size */
> > > > > >   	batch_size = 0;
> > > > > >   	while (size) {
> > > > > > -		batch_size += 10; /* Flush + ggtt addr + 2 NOP */
> > > > > > +		batch_size += EMIT_FLUSH_INVALIDATE_DW * 2; /* Flush + ggtt addr + 1 NOP */
> > > > > >   		u64 ccs_ofs, ccs_size;
> > > > > >   		u32 ccs_pt;
> > > > > > @@ -1158,7 +1230,7 @@ int xe_migrate_ccs_rw_copy(struct xe_tile *tile, struct xe_exec_queue *q,
> > > > > >   	 * sizes here again before copy command is emitted.
> > > > > >   	 */
> > > > > >   	while (size) {
> > > > > > -		batch_size += 10; /* Flush + ggtt addr + 2 NOP */
> > > > > > +		batch_size += EMIT_FLUSH_INVALIDATE_DW * 2; /* Flush + ggtt addr + 1 NOP */
> > > > > >   		u32 flush_flags = 0;
> > > > > >   		u64 ccs_ofs, ccs_size;
> > > > > >   		u32 ccs_pt;
> > > > > > @@ -1181,11 +1253,11 @@ int xe_migrate_ccs_rw_copy(struct xe_tile *tile, struct xe_exec_queue *q,
> > > > > >   		emit_pte(m, bb, ccs_pt, false, false, &ccs_it, ccs_size, src);
> > > > > > -		bb->len = emit_flush_invalidate(bb->cs, bb->len, flush_flags);
> > > > > > +		bb->len = emit_flush_invalidate(q, bb->cs, bb->len, flush_flags);
> > > > > >   		flush_flags = xe_migrate_ccs_copy(m, bb, src_L0_ofs, src_is_pltt,
> > > > > >   						  src_L0_ofs, dst_is_pltt,
> > > > > >   						  src_L0, ccs_ofs, true);
> > > > > > -		bb->len = emit_flush_invalidate(bb->cs, bb->len, flush_flags);
> > > > > > +		bb->len = emit_flush_invalidate(q, bb->cs, bb->len, flush_flags);
> > > > > >   		size -= src_L0;
> > > > > >   	}
> > > > > > -- 
> > > > > > 2.51.0
> > > > > > 
> > > > 
> > > 
> > 
> > -- 
> > Matt Roper
> > Graphics Software Engineer
> > Linux GPU Platform Enablement
> > Intel Corporation


Thread overview: 15+ messages
2025-10-24 13:35 [PATCH v8 0/3] drm/xe/migrate: Atomicize CCS copy command setup Satyanarayana K V P
2025-10-24 13:35 ` [PATCH v8 1/3] drm/xe/migrate: Use AVX instructions to prevent partial writes during VF migration CCS batch buffer updates Satyanarayana K V P
2025-10-24 13:57   ` Rodrigo Vivi
2025-10-24 14:05     ` Ville Syrjälä
2025-10-24 14:25       ` K V P, Satyanarayana
2025-10-24 15:40         ` Matthew Brost
2025-10-24 16:05         ` Matt Roper
2025-10-24 16:10           ` Matthew Brost
2025-10-24 20:07             ` Vivi, Rodrigo [this message]
2025-10-24 13:35 ` [PATCH v8 2/3] drm/xe/migrate: Make emit_pte() header write atomic Satyanarayana K V P
2025-10-24 13:35 ` [PATCH v8 3/3] drm/xe/vf: Clear CCS read/write buffers in atomic way Satyanarayana K V P
2025-10-24 14:40 ` ✗ CI.checkpatch: warning for drm/xe/migrate: Atomicize CCS copy command setup Patchwork
2025-10-24 14:42 ` ✓ CI.KUnit: success " Patchwork
2025-10-24 15:48 ` ✓ Xe.CI.BAT: " Patchwork
2025-10-25  3:47 ` ✓ Xe.CI.Full: " Patchwork
