public inbox for intel-xe@lists.freedesktop.org
* [PATCH v18 0/9] AuxCCS handling and render compression modifiers
@ 2026-03-04 13:03 Tvrtko Ursulin
  2026-03-04 13:03 ` [PATCH v18 1/9] drm/xe: Rename XE_BO_FLAG_SCANOUT to XE_BO_FLAG_FORCE_WC Tvrtko Ursulin
                   ` (8 more replies)
  0 siblings, 9 replies; 17+ messages in thread
From: Tvrtko Ursulin @ 2026-03-04 13:03 UTC (permalink / raw)
  To: intel-xe; +Cc: kernel-dev, Tvrtko Ursulin, Rodrigo Vivi

A series to add support for compressed surface scanout under xe with
Alderlake-P.

Currently the auxiliary buffer data isn't mapped into the page tables at all,
so commit cf48bddd31de ("drm/i915/display: Disable AuxCCS framebuffers if built
for Xe") had to disable the support.

On top of that, flushes, invalidations and similar are missing.

Tested with KDE Wayland, on Lenovo Carbon X1 ADL-P:

  [PLANE:32:plane 1A]: type=PRI
          uapi: [FB:242] AR30 little-endian (0x30335241),0x100000000000008,2880x1800, visible=visible, src=2880.000000x1800.000000+0.000000+0.000000, dst=2880x1800+0+0, rotation=0 (0x00000001)
          hw: [FB:242] AR30 little-endian (0x30335241),0x100000000000008,2880x1800, visible=yes, src=2880.000000x1800.000000+0.000000+0.000000, dst=2880x1800+0+0, rotation=0 (0x00000001)

The display works fine, with no artefacts and no DMAR/PIPE faults.

All IGTs pass for me locally.

v2:
 * More patches added to fix kms_flip_tiling.

v3:
 * Rebased after some cleanup patches from v2 were merged.
 * Added people to Cc as suggested by Rodrigo.
 * Adjusted last patch title. (Rodrigo)
 * Apply GGTT flushing only to iomapped system memory buffers.

v4:
 * Added patch for potentially misplaced Wa_14016712196.
 * Fixed (hopefully) MAX_JOB_SIZE_DW on Meteorlake.

v5:
 * Split out ring emission changes to smaller patches.
 * Fixed MAX_JOB_SIZE_DW even more.
 * Don't emit MI_FLUSH_DW_CCS on !BCS. This should fix Meteorlake.

v6:
 * Added AuxCCS invalidation to indirect context workarounds.
 * Also added the indirect context handling and some other workarounds. They are
   unrelated but the series depends on it.
 * Dropped DPT pin alignment reduction since BMG does not appear to like it for
   some reason.

v7:
 * Rebased on top of recent xe_fb_pin.c refactoring and also the indirect
   context workarounds series.

v8:
 * Rebased for bo->size removal.
 * Corrected PIPE_CONTROL_FLUSH_L3 to bit 30. (Jose)

v9:
 * Fixed fb remapping changes.
 * Dropped two not required patches from the series.
 * Fixed criteria for GGTT flushing.
 * Limit clflush to the compression metadata area.
 * Rebased for indirect context workarounds landing upstream.

v10:
 * Rebase for XE_GT_WA().

v11:
 * Do not use stolen for DPT on IGFX + AuxCCS.

v12:
 * Rebased for some ringbuf and LRC code changes.

v13:
 * Rebased for various upstream changes.
 * Dropped clflush and stolen avoidance patches after merging IGT MOCS 61 usage.

v14:
 * MMIO 0x4248 and MI_FLUSH_DW_CCS are MTL+. (Matt)
 * Consolidate engine feature checks. (Ville)
 * Brought back the patch to put DPT tables in system memory for 100% CI pass
   rate. It looks like MOCS 61 is not enough to avoid sporadic pipecrc
   mismatches.

v15:
 * Limited to enabling on Alderlake-P only. (Dropped all Meteorlake patches.)
 * Dropped unrelated GGTT alignment fix. (Sent standalone.)
 * Use display parent interface for probing AuxCCS driver support.

v16:
 * Use write-combine for DPT in stolen memory. (Ville)
 * Dropped clflush patches on the assumption that pre-production ADL machines
   were the reason for sporadic pipecrc failures.

v17:
 * Mechanical rebase for upstream conflicts.

v18:
 * Added a patch to rename XE_BO_FLAG_SCANOUT to XE_BO_FLAG_FORCE_WC. (Rodrigo)
 * Instead of exporting a helper function for emitting the aux invalidation
   into the ring, add it to the ring ops vfunc table. (Matthew)

Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>

Tvrtko Ursulin (9):
  drm/xe: Rename XE_BO_FLAG_SCANOUT to XE_BO_FLAG_FORCE_WC
  drm/xe: Use write-combine mapping when populating DPT
  drm/xe/xelpg: Limit AuxCCS ring buffer programming to Alderlake
  drm/xe/xelp: Quiesce memory traffic before invalidating AuxCCS
  drm/xe/xelp: Wait for AuxCCS invalidation to complete
  drm/xe: Move aux table invalidation to ring ops
  drm/xe/xelp: Add AuxCCS invalidation to the indirect context
    workarounds
  drm/xe/display: Add support for AuxCCS
  drm/xe/xelp: Expose AuxCCS frame buffer modifiers on Alderlake-P

 drivers/gpu/drm/xe/display/intel_fb_bo.c      |   6 +-
 drivers/gpu/drm/xe/display/intel_fbdev_fb.c   |  12 +-
 drivers/gpu/drm/xe/display/xe_display.c       |   8 ++
 drivers/gpu/drm/xe/display/xe_dsb_buffer.c    |   4 +-
 drivers/gpu/drm/xe/display/xe_fb_pin.c        | 116 +++++++++++++-----
 drivers/gpu/drm/xe/display/xe_initial_plane.c |   2 +-
 .../gpu/drm/xe/instructions/xe_mi_commands.h  |   6 +
 drivers/gpu/drm/xe/xe_bo.c                    |  16 ++-
 drivers/gpu/drm/xe/xe_bo.h                    |   2 +-
 drivers/gpu/drm/xe/xe_lrc.c                   |  23 ++++
 drivers/gpu/drm/xe/xe_ring_ops.c              | 106 ++++++++++++----
 drivers/gpu/drm/xe/xe_ring_ops.h              |   3 +
 drivers/gpu/drm/xe/xe_ring_ops_types.h        |   5 +-
 13 files changed, 240 insertions(+), 69 deletions(-)

-- 
2.52.0


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v18 1/9] drm/xe: Rename XE_BO_FLAG_SCANOUT to XE_BO_FLAG_FORCE_WC
  2026-03-04 13:03 [PATCH v18 0/9] AuxCCS handling and render compression modifiers Tvrtko Ursulin
@ 2026-03-04 13:03 ` Tvrtko Ursulin
  2026-03-04 13:50   ` Rodrigo Vivi
  2026-03-04 13:03 ` [PATCH v18 2/9] drm/xe: Use write-combine mapping when populating DPT Tvrtko Ursulin
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 17+ messages in thread
From: Tvrtko Ursulin @ 2026-03-04 13:03 UTC (permalink / raw)
  To: intel-xe; +Cc: kernel-dev, Tvrtko Ursulin, Rodrigo Vivi

Rename XE_BO_FLAG_SCANOUT to XE_BO_FLAG_FORCE_WC so that the usage of the
flag can legitimately be expanded to more than just the actual frame-
buffer objects.
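A minimal sketch of the point of the rename, under simplified assumptions: the userspace-facing request stays named "scanout", while the internal flag only describes the caching property it enforces. UAPI_FLAG_SCANOUT's value, the caching enum values and the gem_create_flags() helper are hypothetical stand-ins; only the BIT(10) position and the rejection of write-back caching mirror the diff below.

```c
#include <errno.h>
#include <stdint.h>

#define BIT(n)              (1u << (n))
/* Hypothetical: stands in for DRM_XE_GEM_CREATE_FLAG_SCANOUT (value assumed). */
#define UAPI_FLAG_SCANOUT   BIT(1)
/* Internal BO flag, same bit position as in xe_bo.h after the rename. */
#define XE_BO_FLAG_FORCE_WC BIT(10)

/* Assumed values for the CPU caching modes userspace can request. */
enum cpu_caching { CACHING_UNSET = 0, CACHING_WB = 1, CACHING_WC = 2 };

/*
 * Translate the userspace scanout request into the internal "force WC"
 * property and reject a write-back CPU mapping, which display cannot snoop.
 * Returns 0 and fills *bo_flags on success, -EINVAL otherwise.
 */
static int gem_create_flags(uint32_t uapi_flags, enum cpu_caching caching,
			    uint32_t *bo_flags)
{
	uint32_t flags = 0;

	if (uapi_flags & UAPI_FLAG_SCANOUT)
		flags |= XE_BO_FLAG_FORCE_WC;

	/* Internal flags are checked here, the namespace FORCE_WC lives in. */
	if ((flags & XE_BO_FLAG_FORCE_WC) && caching == CACHING_WB)
		return -EINVAL;

	*bo_flags = flags;
	return 0;
}
```

Other kernel users (DPT, DSB, fbdev) can then set XE_BO_FLAG_FORCE_WC directly without implying that the object is a framebuffer.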

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Suggested-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
 drivers/gpu/drm/xe/display/intel_fb_bo.c      |  6 +++---
 drivers/gpu/drm/xe/display/intel_fbdev_fb.c   | 12 ++++++++----
 drivers/gpu/drm/xe/display/xe_dsb_buffer.c    |  4 +++-
 drivers/gpu/drm/xe/display/xe_fb_pin.c        |  2 +-
 drivers/gpu/drm/xe/display/xe_initial_plane.c |  2 +-
 drivers/gpu/drm/xe/xe_bo.c                    | 16 +++++++++++-----
 drivers/gpu/drm/xe/xe_bo.h                    |  2 +-
 7 files changed, 28 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/xe/display/intel_fb_bo.c b/drivers/gpu/drm/xe/display/intel_fb_bo.c
index db8b1a27b4de..d2e72dc5abd9 100644
--- a/drivers/gpu/drm/xe/display/intel_fb_bo.c
+++ b/drivers/gpu/drm/xe/display/intel_fb_bo.c
@@ -45,9 +45,9 @@ int intel_fb_bo_framebuffer_init(struct drm_gem_object *obj,
 	if (ret)
 		goto err;
 
-	if (!(bo->flags & XE_BO_FLAG_SCANOUT)) {
+	if (!(bo->flags & XE_BO_FLAG_FORCE_WC)) {
 		/*
-		 * XE_BO_FLAG_SCANOUT should ideally be set at creation, or is
+		 * XE_BO_FLAG_FORCE_WC should ideally be set at creation, or is
 		 * automatically set when creating FB. We cannot change caching
 		 * mode when the bo is VM_BINDed, so we can only set
 		 * coherency with display when unbound.
@@ -57,7 +57,7 @@ int intel_fb_bo_framebuffer_init(struct drm_gem_object *obj,
 			ret = -EINVAL;
 			goto err;
 		}
-		bo->flags |= XE_BO_FLAG_SCANOUT;
+		bo->flags |= XE_BO_FLAG_FORCE_WC;
 	}
 	ttm_bo_unreserve(&bo->ttm);
 	return 0;
diff --git a/drivers/gpu/drm/xe/display/intel_fbdev_fb.c b/drivers/gpu/drm/xe/display/intel_fbdev_fb.c
index 87af5646c938..d7030e4d814c 100644
--- a/drivers/gpu/drm/xe/display/intel_fbdev_fb.c
+++ b/drivers/gpu/drm/xe/display/intel_fbdev_fb.c
@@ -56,9 +56,11 @@ struct drm_gem_object *intel_fbdev_fb_bo_create(struct drm_device *drm, int size
 	if (intel_fbdev_fb_prefer_stolen(drm, size)) {
 		obj = xe_bo_create_pin_map_novm(xe, xe_device_get_root_tile(xe),
 						size,
-						ttm_bo_type_kernel, XE_BO_FLAG_SCANOUT |
+						ttm_bo_type_kernel,
+						XE_BO_FLAG_FORCE_WC |
 						XE_BO_FLAG_STOLEN |
-						XE_BO_FLAG_GGTT, false);
+						XE_BO_FLAG_GGTT,
+						false);
 		if (!IS_ERR(obj))
 			drm_info(&xe->drm, "Allocated fbdev into stolen\n");
 		else
@@ -69,9 +71,11 @@ struct drm_gem_object *intel_fbdev_fb_bo_create(struct drm_device *drm, int size
 
 	if (IS_ERR(obj)) {
 		obj = xe_bo_create_pin_map_novm(xe, xe_device_get_root_tile(xe), size,
-						ttm_bo_type_kernel, XE_BO_FLAG_SCANOUT |
+						ttm_bo_type_kernel,
+						XE_BO_FLAG_FORCE_WC |
 						XE_BO_FLAG_VRAM_IF_DGFX(xe_device_get_root_tile(xe)) |
-						XE_BO_FLAG_GGTT, false);
+						XE_BO_FLAG_GGTT,
+						false);
 	}
 
 	if (IS_ERR(obj)) {
diff --git a/drivers/gpu/drm/xe/display/xe_dsb_buffer.c b/drivers/gpu/drm/xe/display/xe_dsb_buffer.c
index 1c67a950c6ad..a7158c73a14c 100644
--- a/drivers/gpu/drm/xe/display/xe_dsb_buffer.c
+++ b/drivers/gpu/drm/xe/display/xe_dsb_buffer.c
@@ -54,7 +54,9 @@ static struct intel_dsb_buffer *xe_dsb_buffer_create(struct drm_device *drm, siz
 					PAGE_ALIGN(size),
 					ttm_bo_type_kernel,
 					XE_BO_FLAG_VRAM_IF_DGFX(xe_device_get_root_tile(xe)) |
-					XE_BO_FLAG_SCANOUT | XE_BO_FLAG_GGTT, false);
+					XE_BO_FLAG_FORCE_WC |
+					XE_BO_FLAG_GGTT,
+					false);
 	if (IS_ERR(obj)) {
 		ret = PTR_ERR(obj);
 		goto err_pin_map;
diff --git a/drivers/gpu/drm/xe/display/xe_fb_pin.c b/drivers/gpu/drm/xe/display/xe_fb_pin.c
index dbbc61032b7f..d4a9eb550cae 100644
--- a/drivers/gpu/drm/xe/display/xe_fb_pin.c
+++ b/drivers/gpu/drm/xe/display/xe_fb_pin.c
@@ -429,7 +429,7 @@ int intel_plane_pin_fb(struct intel_plane_state *new_plane_state,
 		return 0;
 
 	/* We reject creating !SCANOUT fb's, so this is weird.. */
-	drm_WARN_ON(bo->ttm.base.dev, !(bo->flags & XE_BO_FLAG_SCANOUT));
+	drm_WARN_ON(bo->ttm.base.dev, !(bo->flags & XE_BO_FLAG_FORCE_WC));
 
 	vma = __xe_pin_fb_vma(intel_fb, &new_plane_state->view.gtt, alignment);
 
diff --git a/drivers/gpu/drm/xe/display/xe_initial_plane.c b/drivers/gpu/drm/xe/display/xe_initial_plane.c
index 65cc0b0c934b..8bcae552dddc 100644
--- a/drivers/gpu/drm/xe/display/xe_initial_plane.c
+++ b/drivers/gpu/drm/xe/display/xe_initial_plane.c
@@ -48,7 +48,7 @@ initial_plane_bo(struct xe_device *xe,
 	if (plane_config->size == 0)
 		return NULL;
 
-	flags = XE_BO_FLAG_SCANOUT | XE_BO_FLAG_GGTT;
+	flags = XE_BO_FLAG_FORCE_WC | XE_BO_FLAG_GGTT;
 
 	base = round_down(plane_config->base, page_size);
 	if (IS_DGFX(xe)) {
diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
index 8ff193600443..fe560b0a980a 100644
--- a/drivers/gpu/drm/xe/xe_bo.c
+++ b/drivers/gpu/drm/xe/xe_bo.c
@@ -515,7 +515,7 @@ static struct ttm_tt *xe_ttm_tt_create(struct ttm_buffer_object *ttm_bo,
 		 * For Xe_LPG and beyond up to NVL-P (excluding), PPGTT PTE
 		 * lookups are also non-coherent and require a CPU:WC mapping.
 		 */
-		if ((!bo->cpu_caching && bo->flags & XE_BO_FLAG_SCANOUT) ||
+		if ((!bo->cpu_caching && bo->flags & XE_BO_FLAG_FORCE_WC) ||
 		     (!xe->info.has_cached_pt && bo->flags & XE_BO_FLAG_PAGETABLE))
 			caching = ttm_write_combined;
 	}
@@ -3196,8 +3196,14 @@ int xe_gem_create_ioctl(struct drm_device *dev, void *data,
 	if (args->flags & DRM_XE_GEM_CREATE_FLAG_DEFER_BACKING)
 		bo_flags |= XE_BO_FLAG_DEFER_BACKING;
 
+	/*
+	 * Display scanout is always non-coherent with the CPU cache.
+	 *
+	 * For Xe_LPG and beyond up to NVL-P (excluding), PPGTT PTE
+	 * lookups are also non-coherent and require a CPU:WC mapping.
+	 */
 	if (args->flags & DRM_XE_GEM_CREATE_FLAG_SCANOUT)
-		bo_flags |= XE_BO_FLAG_SCANOUT;
+		bo_flags |= XE_BO_FLAG_FORCE_WC;
 
 	if (args->flags & DRM_XE_GEM_CREATE_FLAG_NO_COMPRESSION) {
 		if (XE_IOCTL_DBG(xe, GRAPHICS_VER(xe) < 20))
@@ -3209,7 +3215,7 @@ int xe_gem_create_ioctl(struct drm_device *dev, void *data,
 
 	/* CCS formats need physical placement at a 64K alignment in VRAM. */
 	if ((bo_flags & XE_BO_FLAG_VRAM_MASK) &&
-	    (bo_flags & XE_BO_FLAG_SCANOUT) &&
+	    (bo_flags & XE_BO_FLAG_FORCE_WC) &&
 	    !(xe->info.vram_flags & XE_VRAM_FLAGS_NEED64K) &&
 	    IS_ALIGNED(args->size, SZ_64K))
 		bo_flags |= XE_BO_FLAG_NEEDS_64K;
@@ -3229,7 +3235,7 @@ int xe_gem_create_ioctl(struct drm_device *dev, void *data,
 			 args->cpu_caching != DRM_XE_GEM_CPU_CACHING_WC))
 		return -EINVAL;
 
-	if (XE_IOCTL_DBG(xe, bo_flags & XE_BO_FLAG_SCANOUT &&
+	if (XE_IOCTL_DBG(xe, bo_flags & XE_BO_FLAG_FORCE_WC &&
 			 args->cpu_caching == DRM_XE_GEM_CPU_CACHING_WB))
 		return -EINVAL;
 
@@ -3642,7 +3648,7 @@ int xe_bo_dumb_create(struct drm_file *file_priv,
 	bo = xe_bo_create_user(xe, NULL, args->size,
 			       DRM_XE_GEM_CPU_CACHING_WC,
 			       XE_BO_FLAG_VRAM_IF_DGFX(xe_device_get_root_tile(xe)) |
-			       XE_BO_FLAG_SCANOUT |
+			       XE_BO_FLAG_FORCE_WC |
 			       XE_BO_FLAG_NEEDS_CPU_ACCESS, NULL);
 	if (IS_ERR(bo))
 		return PTR_ERR(bo);
diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
index c914ab719f20..af9a6669c872 100644
--- a/drivers/gpu/drm/xe/xe_bo.h
+++ b/drivers/gpu/drm/xe/xe_bo.h
@@ -35,7 +35,7 @@
 #define XE_BO_FLAG_PINNED		BIT(7)
 #define XE_BO_FLAG_NO_RESV_EVICT	BIT(8)
 #define XE_BO_FLAG_DEFER_BACKING	BIT(9)
-#define XE_BO_FLAG_SCANOUT		BIT(10)
+#define XE_BO_FLAG_FORCE_WC		BIT(10)
 #define XE_BO_FLAG_FIXED_PLACEMENT	BIT(11)
 #define XE_BO_FLAG_PAGETABLE		BIT(12)
 #define XE_BO_FLAG_NEEDS_CPU_ACCESS	BIT(13)
-- 
2.52.0



* [PATCH v18 2/9] drm/xe: Use write-combine mapping when populating DPT
  2026-03-04 13:03 [PATCH v18 0/9] AuxCCS handling and render compression modifiers Tvrtko Ursulin
  2026-03-04 13:03 ` [PATCH v18 1/9] drm/xe: Rename XE_BO_FLAG_SCANOUT to XE_BO_FLAG_FORCE_WC Tvrtko Ursulin
@ 2026-03-04 13:03 ` Tvrtko Ursulin
  2026-03-04 13:51   ` Rodrigo Vivi
  2026-03-04 13:03 ` [PATCH v18 3/9] drm/xe/xelpg: Limit AuxCCS ring buffer programming to Alderlake Tvrtko Ursulin
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 17+ messages in thread
From: Tvrtko Ursulin @ 2026-03-04 13:03 UTC (permalink / raw)
  To: intel-xe; +Cc: kernel-dev, Tvrtko Ursulin, Ville Syrjälä

The fallback DPT backing store is a buffer object in system memory, which
by default uses a write-back CPU caching policy.

If this fallback is triggered then, since there is currently no flushing,
the DPT writes made when pinning a buffer for display are not guaranteed
to be seen by the display engine.

To fix this, since both the local memory and the stolen memory DPT
placements already use write-combine, let us make the system memory option
follow suit by passing down the appropriate flag.
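To illustrate why the extra flag matters, here is a simplified model of the system-memory caching decision in xe_ttm_tt_create() from the previous patch. It assumes a platform where has_cached_pt is set, so a BO flagged only XE_BO_FLAG_PAGETABLE keeps write-back caching. pick_caching() and its enum are hypothetical stand-ins for the TTM caching modes, and other inputs to the real decision are omitted.

```c
#include <stdbool.h>
#include <stdint.h>

#define BIT(n)               (1u << (n))
#define XE_BO_FLAG_FORCE_WC  BIT(10)
#define XE_BO_FLAG_PAGETABLE BIT(12)

enum caching { CACHED, WRITE_COMBINED };

/*
 * Kernel BOs (cpu_caching == 0) flagged FORCE_WC are always mapped WC;
 * page-table BOs are mapped WC only when PTE lookups are not coherent.
 */
static enum caching pick_caching(uint32_t flags, uint16_t cpu_caching,
				 bool has_cached_pt)
{
	if ((!cpu_caching && (flags & XE_BO_FLAG_FORCE_WC)) ||
	    (!has_cached_pt && (flags & XE_BO_FLAG_PAGETABLE)))
		return WRITE_COMBINED;

	return CACHED;
}
```

With only XE_BO_FLAG_PAGETABLE, the DPT stays cached on such a platform; adding XE_BO_FLAG_FORCE_WC makes it write-combined regardless of has_cached_pt.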

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Suggested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
---
 drivers/gpu/drm/xe/display/xe_fb_pin.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/xe/display/xe_fb_pin.c b/drivers/gpu/drm/xe/display/xe_fb_pin.c
index d4a9eb550cae..df7d305c6fcd 100644
--- a/drivers/gpu/drm/xe/display/xe_fb_pin.c
+++ b/drivers/gpu/drm/xe/display/xe_fb_pin.c
@@ -122,7 +122,8 @@ static int __xe_pin_fb_vma_dpt(const struct intel_framebuffer *fb,
 						   ttm_bo_type_kernel,
 						   XE_BO_FLAG_SYSTEM |
 						   XE_BO_FLAG_GGTT |
-						   XE_BO_FLAG_PAGETABLE,
+						   XE_BO_FLAG_PAGETABLE |
+						   XE_BO_FLAG_FORCE_WC,
 						   alignment, false);
 	if (IS_ERR(dpt))
 		return PTR_ERR(dpt);
-- 
2.52.0



* [PATCH v18 3/9] drm/xe/xelpg: Limit AuxCCS ring buffer programming to Alderlake
  2026-03-04 13:03 [PATCH v18 0/9] AuxCCS handling and render compression modifiers Tvrtko Ursulin
  2026-03-04 13:03 ` [PATCH v18 1/9] drm/xe: Rename XE_BO_FLAG_SCANOUT to XE_BO_FLAG_FORCE_WC Tvrtko Ursulin
  2026-03-04 13:03 ` [PATCH v18 2/9] drm/xe: Use write-combine mapping when populating DPT Tvrtko Ursulin
@ 2026-03-04 13:03 ` Tvrtko Ursulin
  2026-03-04 13:03 ` [PATCH v18 4/9] drm/xe/xelp: Quiesce memory traffic before invalidating AuxCCS Tvrtko Ursulin
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 17+ messages in thread
From: Tvrtko Ursulin @ 2026-03-04 13:03 UTC (permalink / raw)
  To: intel-xe; +Cc: kernel-dev, Tvrtko Ursulin, Rodrigo Vivi

At the moment the driver does not support AuxCCS at all, due to the
respective modifiers being hidden from userspace.

As we are about to start enabling them, starting with Alderlake, let us
begin by limiting the ring buffer support to just that initial platform.
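A hedged model of the resulting has_aux_ccs() check, assuming GRAPHICS_VERx100() reports 1200 for graphics version 12.0 and 1270 for Meteorlake (12.70), as the new threshold implies. model_has_aux_ccs() is a hypothetical standalone function taking the relevant device properties as plain arguments.

```c
#include <stdbool.h>

/*
 * ver_x100 mirrors GRAPHICS_VERx100(), e.g. 1200 for 12.0 and 1270 for 12.70.
 */
static bool model_has_aux_ccs(unsigned int ver_x100, bool is_pvc,
			      bool has_flat_ccs)
{
	/* 12.70 (MTL) and newer, plus PVC, are excluded from AuxCCS. */
	if (ver_x100 >= 1270 || is_pvc)
		return false;

	/* Remaining older platforms without FlatCCS use AuxCCS. */
	return !has_flat_ccs;
}
```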

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
 drivers/gpu/drm/xe/xe_ring_ops.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_ring_ops.c b/drivers/gpu/drm/xe/xe_ring_ops.c
index 53d420d72164..07235a895f4b 100644
--- a/drivers/gpu/drm/xe/xe_ring_ops.c
+++ b/drivers/gpu/drm/xe/xe_ring_ops.c
@@ -305,9 +305,9 @@ static bool has_aux_ccs(struct xe_device *xe)
 	 * PVC is a special case that has no compression of either type
 	 * (FlatCCS or AuxCCS).  Also, AuxCCS is no longer used from Xe2
 	 * onward, so any future platforms with no FlatCCS will not have
-	 * AuxCCS either.
+	 * AuxCCS, and we explicitly do not want to support it on MTL.
 	 */
-	if (GRAPHICS_VER(xe) >= 20 || xe->info.platform == XE_PVC)
+	if (GRAPHICS_VERx100(xe) >= 1270 || xe->info.platform == XE_PVC)
 		return false;
 
 	return !xe->info.has_flat_ccs;
-- 
2.52.0



* [PATCH v18 4/9] drm/xe/xelp: Quiesce memory traffic before invalidating AuxCCS
  2026-03-04 13:03 [PATCH v18 0/9] AuxCCS handling and render compression modifiers Tvrtko Ursulin
                   ` (2 preceding siblings ...)
  2026-03-04 13:03 ` [PATCH v18 3/9] drm/xe/xelpg: Limit AuxCCS ring buffer programming to Alderlake Tvrtko Ursulin
@ 2026-03-04 13:03 ` Tvrtko Ursulin
  2026-03-04 13:03 ` [PATCH v18 5/9] drm/xe/xelp: Wait for AuxCCS invalidation to complete Tvrtko Ursulin
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 17+ messages in thread
From: Tvrtko Ursulin @ 2026-03-04 13:03 UTC (permalink / raw)
  To: intel-xe; +Cc: kernel-dev, Tvrtko Ursulin, Rodrigo Vivi

According to i915 commit
ad8ebf12217e ("drm/i915/gt: Ensure memory quiesced before invalidation"),
memory traffic must be quiesced before invalidating the AuxCCS
tables.

Add an extra pipe control flush to achieve that.
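The resulting emission order can be modelled roughly as below. The enum values are symbolic placeholders rather than real command dwords, and the model omits the timestamp copy and the rest of the job. It only shows that, on AuxCCS platforms, the flush now precedes the preparser-disabled window in which the Aux table is invalidated.

```c
#include <stdbool.h>
#include <stddef.h>

/* Symbolic stand-ins for the real command dwords (hypothetical). */
enum cmd {
	RENDER_CACHE_FLUSH,
	PREPARSER_OFF,
	PIPE_INVALIDATE,
	AUX_TABLE_INV,
	PREPARSER_ON,
};

/* Model of the head of __emit_job_gen12_render_compute() after this patch. */
static size_t emit_prologue(enum cmd *out, bool aux_ccs)
{
	size_t i = 0;

	/* Quiesce memory traffic before the Aux table is invalidated. */
	if (aux_ccs)
		out[i++] = RENDER_CACHE_FLUSH;

	out[i++] = PREPARSER_OFF;
	out[i++] = PIPE_INVALIDATE;
	if (aux_ccs)
		out[i++] = AUX_TABLE_INV;	/* hsdes: 1809175790 */
	out[i++] = PREPARSER_ON;

	return i;
}
```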

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
 drivers/gpu/drm/xe/xe_ring_ops.c       | 10 +++++++++-
 drivers/gpu/drm/xe/xe_ring_ops_types.h |  2 +-
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_ring_ops.c b/drivers/gpu/drm/xe/xe_ring_ops.c
index 07235a895f4b..91f52d7748ca 100644
--- a/drivers/gpu/drm/xe/xe_ring_ops.c
+++ b/drivers/gpu/drm/xe/xe_ring_ops.c
@@ -377,12 +377,20 @@ static void __emit_job_gen12_render_compute(struct xe_sched_job *job,
 	struct xe_gt *gt = job->q->gt;
 	struct xe_device *xe = gt_to_xe(gt);
 	bool lacks_render = !(gt->info.engine_mask & XE_HW_ENGINE_RCS_MASK);
+	const bool aux_ccs = has_aux_ccs(xe);
 	u32 mask_flags = 0;
 
 	*head = lrc->ring.tail;
 
 	i = emit_copy_timestamp(xe, lrc, dw, i);
 
+	/*
+	 * On AuxCCS platforms the invalidation of the Aux table requires
+	 * quiescing the memory traffic beforehand.
+	 */
+	if (aux_ccs)
+		i = emit_render_cache_flush(job, dw, i);
+
 	dw[i++] = preparser_disable(true);
 	if (lacks_render)
 		mask_flags = PIPE_CONTROL_3D_ARCH_FLAGS;
@@ -393,7 +401,7 @@ static void __emit_job_gen12_render_compute(struct xe_sched_job *job,
 	i = emit_pipe_invalidate(job->q, mask_flags, job->ring_ops_flush_tlb, dw, i);
 
 	/* hsdes: 1809175790 */
-	if (has_aux_ccs(xe))
+	if (aux_ccs)
 		i = emit_aux_table_inv(gt, CCS_AUX_INV, dw, i);
 
 	dw[i++] = preparser_disable(false);
diff --git a/drivers/gpu/drm/xe/xe_ring_ops_types.h b/drivers/gpu/drm/xe/xe_ring_ops_types.h
index d7e3e150a9a5..477dc7defd72 100644
--- a/drivers/gpu/drm/xe/xe_ring_ops_types.h
+++ b/drivers/gpu/drm/xe/xe_ring_ops_types.h
@@ -8,7 +8,7 @@
 
 struct xe_sched_job;
 
-#define MAX_JOB_SIZE_DW 58
+#define MAX_JOB_SIZE_DW 70
 #define MAX_JOB_SIZE_BYTES (MAX_JOB_SIZE_DW * 4)
 
 /**
-- 
2.52.0



* [PATCH v18 5/9] drm/xe/xelp: Wait for AuxCCS invalidation to complete
  2026-03-04 13:03 [PATCH v18 0/9] AuxCCS handling and render compression modifiers Tvrtko Ursulin
                   ` (3 preceding siblings ...)
  2026-03-04 13:03 ` [PATCH v18 4/9] drm/xe/xelp: Quiesce memory traffic before invalidating AuxCCS Tvrtko Ursulin
@ 2026-03-04 13:03 ` Tvrtko Ursulin
  2026-03-04 13:03 ` [PATCH v18 6/9] drm/xe: Move aux table invalidation to ring ops Tvrtko Ursulin
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 17+ messages in thread
From: Tvrtko Ursulin @ 2026-03-04 13:03 UTC (permalink / raw)
  To: intel-xe; +Cc: kernel-dev, Tvrtko Ursulin, Rodrigo Vivi

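On AuxCCS platforms we need to wait for AuxCCS invalidations to complete.
Do this by replacing the MI_NOOP after the invalidation write with an
MI_SEMAPHORE_WAIT which polls the invalidation register until it reads
back as zero.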
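A sketch of the polling wait, assuming the command header encodes the opcode in bits 28:23 and the dword length minus two in bits 7:0, as i915's MI_INSTR() does. MODEL_MI_HEADER() and emit_wait_reg_zero() are hypothetical names, and the register offset in the usage is a placeholder. Together with the 3-dword MI_LOAD_REGISTER_IMM, the invalidation grows from 4 dwords (LRI plus MI_NOOP) to 8, which matches MAX_JOB_SIZE_DW growing by 4.

```c
#include <stdint.h>

/* Assumed header layout: opcode in bits 28:23, (length - 2) in bits 7:0. */
#define MODEL_MI_HEADER(op, len_dw)  (((uint32_t)(op) << 23) | ((len_dw) - 2))
#define MI_SEMAPHORE_WAIT_TOKEN      MODEL_MI_HEADER(0x1c, 5)
#define MI_SEMAPHORE_REGISTER_POLL   (1u << 16)
#define MI_SEMAPHORE_POLL            (1u << 15)
#define MI_SEMAPHORE_SAD_EQ_SDD      (4u << 12)

/*
 * Emit a wait that polls MMIO register @reg_addr until its value equals
 * the semaphore data dword (zero), i.e. until the invalidation completes.
 */
static uint32_t *emit_wait_reg_zero(uint32_t *cmd, uint32_t reg_addr)
{
	*cmd++ = MI_SEMAPHORE_WAIT_TOKEN | MI_SEMAPHORE_REGISTER_POLL |
		 MI_SEMAPHORE_POLL | MI_SEMAPHORE_SAD_EQ_SDD;
	*cmd++ = 0;		/* semaphore data: wait for value == 0 */
	*cmd++ = reg_addr;	/* register offset to poll */
	*cmd++ = 0;		/* upper address bits */
	*cmd++ = 0;		/* trailing dword, left as zero */
	return cmd;
}
```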

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
 drivers/gpu/drm/xe/instructions/xe_mi_commands.h | 6 ++++++
 drivers/gpu/drm/xe/xe_ring_ops.c                 | 9 ++++++++-
 drivers/gpu/drm/xe/xe_ring_ops_types.h           | 2 +-
 3 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/xe/instructions/xe_mi_commands.h b/drivers/gpu/drm/xe/instructions/xe_mi_commands.h
index c47b290e0e9f..49d8ffd026d5 100644
--- a/drivers/gpu/drm/xe/instructions/xe_mi_commands.h
+++ b/drivers/gpu/drm/xe/instructions/xe_mi_commands.h
@@ -81,4 +81,10 @@
 #define MI_SET_APPID_SESSION_ID_MASK	REG_GENMASK(6, 0)
 #define MI_SET_APPID_SESSION_ID(x)	REG_FIELD_PREP(MI_SET_APPID_SESSION_ID_MASK, x)
 
+#define MI_SEMAPHORE_WAIT_TOKEN		(__MI_INSTR(0x1c) | XE_INSTR_NUM_DW(5)) /* XeLP+ */
+#define   MI_SEMAPHORE_REGISTER_POLL	REG_BIT(16)
+#define   MI_SEMAPHORE_POLL		REG_BIT(15)
+#define   MI_SEMAPHORE_CMP_OP_MASK	REG_GENMASK(14, 12)
+#define   MI_SEMAPHORE_SAD_EQ_SDD	REG_FIELD_PREP(MI_SEMAPHORE_CMP_OP_MASK, 4)
+
 #endif
diff --git a/drivers/gpu/drm/xe/xe_ring_ops.c b/drivers/gpu/drm/xe/xe_ring_ops.c
index 91f52d7748ca..596379e6d742 100644
--- a/drivers/gpu/drm/xe/xe_ring_ops.c
+++ b/drivers/gpu/drm/xe/xe_ring_ops.c
@@ -54,7 +54,14 @@ static int emit_aux_table_inv(struct xe_gt *gt, struct xe_reg reg,
 	dw[i++] = MI_LOAD_REGISTER_IMM | MI_LRI_NUM_REGS(1) | MI_LRI_MMIO_REMAP_EN;
 	dw[i++] = reg.addr + gt->mmio.adj_offset;
 	dw[i++] = AUX_INV;
-	dw[i++] = MI_NOOP;
+	dw[i++] = MI_SEMAPHORE_WAIT_TOKEN |
+		  MI_SEMAPHORE_REGISTER_POLL |
+		  MI_SEMAPHORE_POLL |
+		  MI_SEMAPHORE_SAD_EQ_SDD;
+	dw[i++] = 0;
+	dw[i++] = reg.addr + gt->mmio.adj_offset;
+	dw[i++] = 0;
+	dw[i++] = 0;
 
 	return i;
 }
diff --git a/drivers/gpu/drm/xe/xe_ring_ops_types.h b/drivers/gpu/drm/xe/xe_ring_ops_types.h
index 477dc7defd72..1197fc0bf2af 100644
--- a/drivers/gpu/drm/xe/xe_ring_ops_types.h
+++ b/drivers/gpu/drm/xe/xe_ring_ops_types.h
@@ -8,7 +8,7 @@
 
 struct xe_sched_job;
 
-#define MAX_JOB_SIZE_DW 70
+#define MAX_JOB_SIZE_DW 74
 #define MAX_JOB_SIZE_BYTES (MAX_JOB_SIZE_DW * 4)
 
 /**
-- 
2.52.0



* [PATCH v18 6/9] drm/xe: Move aux table invalidation to ring ops
  2026-03-04 13:03 [PATCH v18 0/9] AuxCCS handling and render compression modifiers Tvrtko Ursulin
                   ` (4 preceding siblings ...)
  2026-03-04 13:03 ` [PATCH v18 5/9] drm/xe/xelp: Wait for AuxCCS invalidation to complete Tvrtko Ursulin
@ 2026-03-04 13:03 ` Tvrtko Ursulin
  2026-03-04 16:20   ` Matthew Brost
  2026-03-04 13:03 ` [PATCH v18 7/9] drm/xe/xelp: Add AuxCCS invalidation to the indirect context workarounds Tvrtko Ursulin
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 17+ messages in thread
From: Tvrtko Ursulin @ 2026-03-04 13:03 UTC (permalink / raw)
  To: intel-xe; +Cc: kernel-dev, Tvrtko Ursulin, Matthew Brost, Rodrigo Vivi

As suggested, move the aux table invalidation from a helper function to a
ring ops vfunc, and split the vfunc tables of the video decode and video
enhance engines.

With this done, the LRC code can access the functionality via the newly
added ring ops vfunc.
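The dispatch pattern can be sketched as follows, using hypothetical minimal versions of the gt and ring ops structures and a stub hook that writes two placeholder dwords. A NULL hook means the engine class needs no invalidation, so the emit paths no longer call has_aux_ccs() themselves.

```c
#include <stddef.h>
#include <stdint.h>

struct gt;

/* Minimal model of struct xe_ring_ops with an optional hook. */
struct ring_ops {
	uint32_t *(*emit_aux_table_inv)(struct gt *gt, uint32_t *cmd);
};

struct gt {
	const struct ring_ops *ring_ops[4];	/* indexed by engine class */
};

/* Stub hook: placeholder dwords instead of the real LRI + wait sequence. */
static uint32_t *emit_inv_stub(struct gt *gt, uint32_t *cmd)
{
	(void)gt;
	*cmd++ = 0xAAAAAAAAu;
	*cmd++ = 0xBBBBBBBBu;
	return cmd;
}

/* Same shape as emit_aux_table_inv(): returns the new dword index. */
static int emit_aux_inv(struct gt *gt, int class, uint32_t *dw, int i)
{
	uint32_t *(*emit)(struct gt *, uint32_t *) =
		gt->ring_ops[class]->emit_aux_table_inv;

	if (emit)
		return (int)(emit(gt, dw + i) - dw);
	return i;
}

static const struct ring_ops plain_ops = { .emit_aux_table_inv = NULL };
static const struct ring_ops auxccs_ops = { .emit_aux_table_inv = emit_inv_stub };
```

Choosing the table once in xe_ring_ops_get() also lets other code, such as the LRC setup, test for the hook instead of repeating the platform check.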

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Suggested-by: Matthew Brost <matthew.brost@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
 drivers/gpu/drm/xe/xe_ring_ops.c       | 105 ++++++++++++++++++-------
 drivers/gpu/drm/xe/xe_ring_ops.h       |   3 +
 drivers/gpu/drm/xe/xe_ring_ops_types.h |   3 +
 3 files changed, 83 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_ring_ops.c b/drivers/gpu/drm/xe/xe_ring_ops.c
index 596379e6d742..8947b3091873 100644
--- a/drivers/gpu/drm/xe/xe_ring_ops.c
+++ b/drivers/gpu/drm/xe/xe_ring_ops.c
@@ -48,22 +48,48 @@ static u32 preparser_disable(bool state)
 	return MI_ARB_CHECK | BIT(8) | state;
 }
 
-static int emit_aux_table_inv(struct xe_gt *gt, struct xe_reg reg,
-			      u32 *dw, int i)
+static u32 *
+__emit_aux_table_inv(u32 *cmd, const struct xe_reg reg, u32 adj_offset)
 {
-	dw[i++] = MI_LOAD_REGISTER_IMM | MI_LRI_NUM_REGS(1) | MI_LRI_MMIO_REMAP_EN;
-	dw[i++] = reg.addr + gt->mmio.adj_offset;
-	dw[i++] = AUX_INV;
-	dw[i++] = MI_SEMAPHORE_WAIT_TOKEN |
-		  MI_SEMAPHORE_REGISTER_POLL |
-		  MI_SEMAPHORE_POLL |
-		  MI_SEMAPHORE_SAD_EQ_SDD;
-	dw[i++] = 0;
-	dw[i++] = reg.addr + gt->mmio.adj_offset;
-	dw[i++] = 0;
-	dw[i++] = 0;
+	*cmd++ = MI_LOAD_REGISTER_IMM | MI_LRI_NUM_REGS(1) |
+		 MI_LRI_MMIO_REMAP_EN;
+	*cmd++ = reg.addr + adj_offset;
+	*cmd++ = AUX_INV;
+	*cmd++ = MI_SEMAPHORE_WAIT_TOKEN | MI_SEMAPHORE_REGISTER_POLL |
+		 MI_SEMAPHORE_POLL | MI_SEMAPHORE_SAD_EQ_SDD;
+	*cmd++ = 0;
+	*cmd++ = reg.addr + adj_offset;
+	*cmd++ = 0;
+	*cmd++ = 0;
 
-	return i;
+	return cmd;
+}
+
+static u32 *emit_aux_table_inv_render_compute(struct xe_gt *gt, u32 *cmd)
+{
+	return __emit_aux_table_inv(cmd, CCS_AUX_INV, gt->mmio.adj_offset);
+}
+
+static u32 *emit_aux_table_inv_video_decode(struct xe_gt *gt, u32 *cmd)
+{
+	return __emit_aux_table_inv(cmd, VD0_AUX_INV, gt->mmio.adj_offset);
+}
+
+static u32 *emit_aux_table_inv_video_enhance(struct xe_gt *gt, u32 *cmd)
+{
+	return __emit_aux_table_inv(cmd, VE0_AUX_INV, gt->mmio.adj_offset);
+}
+
+static int emit_aux_table_inv(struct xe_hw_engine *hwe, u32 *dw, int i)
+{
+	struct xe_gt *gt = hwe->gt;
+	u32 *(*emit)(struct xe_gt *gt, u32 *cmd) =
+		gt->ring_ops[hwe->class]->emit_aux_table_inv;
+
+	if (emit)
+		return emit(gt, dw + i) - dw;
+	else
+		return i;
 }
 
 static int emit_user_interrupt(u32 *dw, int i)
@@ -327,7 +353,6 @@ static void __emit_job_gen12_video(struct xe_sched_job *job, struct xe_lrc *lrc,
 	u32 ppgtt_flag = get_ppgtt_flag(job);
 	struct xe_gt *gt = job->q->gt;
 	struct xe_device *xe = gt_to_xe(gt);
-	bool decode = job->q->class == XE_ENGINE_CLASS_VIDEO_DECODE;
 
 	*head = lrc->ring.tail;
 
@@ -336,12 +361,7 @@ static void __emit_job_gen12_video(struct xe_sched_job *job, struct xe_lrc *lrc,
 	dw[i++] = preparser_disable(true);
 
 	/* hsdes: 1809175790 */
-	if (has_aux_ccs(xe)) {
-		if (decode)
-			i = emit_aux_table_inv(gt, VD0_AUX_INV, dw, i);
-		else
-			i = emit_aux_table_inv(gt, VE0_AUX_INV, dw, i);
-	}
+	i = emit_aux_table_inv(job->q->hwe, dw, i);
 
 	if (job->ring_ops_flush_tlb)
 		i = emit_flush_imm_ggtt(xe_lrc_start_seqno_ggtt_addr(lrc),
@@ -384,7 +404,6 @@ static void __emit_job_gen12_render_compute(struct xe_sched_job *job,
 	struct xe_gt *gt = job->q->gt;
 	struct xe_device *xe = gt_to_xe(gt);
 	bool lacks_render = !(gt->info.engine_mask & XE_HW_ENGINE_RCS_MASK);
-	const bool aux_ccs = has_aux_ccs(xe);
 	u32 mask_flags = 0;
 
 	*head = lrc->ring.tail;
@@ -395,7 +414,7 @@ static void __emit_job_gen12_render_compute(struct xe_sched_job *job,
 	 * On AuxCCS platforms the invalidation of the Aux table requires
 	 * quiescing the memory traffic beforehand.
 	 */
-	if (aux_ccs)
+	if (has_aux_ccs(xe))
 		i = emit_render_cache_flush(job, dw, i);
 
 	dw[i++] = preparser_disable(true);
@@ -408,8 +427,7 @@ static void __emit_job_gen12_render_compute(struct xe_sched_job *job,
 	i = emit_pipe_invalidate(job->q, mask_flags, job->ring_ops_flush_tlb, dw, i);
 
 	/* hsdes: 1809175790 */
-	if (aux_ccs)
-		i = emit_aux_table_inv(gt, CCS_AUX_INV, dw, i);
+	i = emit_aux_table_inv(job->q->hwe, dw, i);
 
 	dw[i++] = preparser_disable(false);
 
@@ -534,7 +552,11 @@ static const struct xe_ring_ops ring_ops_gen12_copy = {
 	.emit_job = emit_job_gen12_copy,
 };
 
-static const struct xe_ring_ops ring_ops_gen12_video = {
+static const struct xe_ring_ops ring_ops_gen12_video_decode = {
+	.emit_job = emit_job_gen12_video,
+};
+
+static const struct xe_ring_ops ring_ops_gen12_video_enhance = {
 	.emit_job = emit_job_gen12_video,
 };
 
@@ -542,20 +564,47 @@ static const struct xe_ring_ops ring_ops_gen12_render_compute = {
 	.emit_job = emit_job_gen12_render_compute,
 };
 
+static const struct xe_ring_ops auxccs_ring_ops_gen12_video_decode = {
+	.emit_job = emit_job_gen12_video,
+	.emit_aux_table_inv = emit_aux_table_inv_video_decode,
+};
+
+static const struct xe_ring_ops auxccs_ring_ops_gen12_video_enhance = {
+	.emit_job = emit_job_gen12_video,
+	.emit_aux_table_inv = emit_aux_table_inv_video_enhance,
+};
+
+static const struct xe_ring_ops auxccs_ring_ops_gen12_render_compute = {
+	.emit_job = emit_job_gen12_render_compute,
+	.emit_aux_table_inv = emit_aux_table_inv_render_compute,
+};
+
 const struct xe_ring_ops *
 xe_ring_ops_get(struct xe_gt *gt, enum xe_engine_class class)
 {
+	struct xe_device *xe = gt_to_xe(gt);
+
 	switch (class) {
 	case XE_ENGINE_CLASS_OTHER:
 		return &ring_ops_gen12_gsc;
 	case XE_ENGINE_CLASS_COPY:
 		return &ring_ops_gen12_copy;
 	case XE_ENGINE_CLASS_VIDEO_DECODE:
+		if (has_aux_ccs(xe))
+			return &auxccs_ring_ops_gen12_video_decode;
+		else
+			return &ring_ops_gen12_video_decode;
 	case XE_ENGINE_CLASS_VIDEO_ENHANCE:
-		return &ring_ops_gen12_video;
+		if (has_aux_ccs(xe))
+			return &auxccs_ring_ops_gen12_video_enhance;
+		else
+			return &ring_ops_gen12_video_enhance;
 	case XE_ENGINE_CLASS_RENDER:
 	case XE_ENGINE_CLASS_COMPUTE:
-		return &ring_ops_gen12_render_compute;
+		if (has_aux_ccs(xe))
+			return &auxccs_ring_ops_gen12_render_compute;
+		else
+			return &ring_ops_gen12_render_compute;
 	default:
 		return NULL;
 	}
diff --git a/drivers/gpu/drm/xe/xe_ring_ops.h b/drivers/gpu/drm/xe/xe_ring_ops.h
index e942735d76a6..5a2d32f9bb25 100644
--- a/drivers/gpu/drm/xe/xe_ring_ops.h
+++ b/drivers/gpu/drm/xe/xe_ring_ops.h
@@ -10,8 +10,11 @@
 #include "xe_ring_ops_types.h"
 
 struct xe_gt;
+struct xe_hw_engine;
 
 const struct xe_ring_ops *
 xe_ring_ops_get(struct xe_gt *gt, enum xe_engine_class class);
 
+u32 *xe_emit_aux_table_inv(struct xe_hw_engine *hwe, u32 *cmd);
+
 #endif
diff --git a/drivers/gpu/drm/xe/xe_ring_ops_types.h b/drivers/gpu/drm/xe/xe_ring_ops_types.h
index 1197fc0bf2af..e25630fac17e 100644
--- a/drivers/gpu/drm/xe/xe_ring_ops_types.h
+++ b/drivers/gpu/drm/xe/xe_ring_ops_types.h
@@ -17,6 +17,9 @@ struct xe_sched_job;
 struct xe_ring_ops {
 	/** @emit_job: Write job to ring */
 	void (*emit_job)(struct xe_sched_job *job);
+
+	/** @emit_aux_table_inv: Emit aux table invalidation to the ring */
+	u32 *(*emit_aux_table_inv)(struct xe_gt *gt, u32 *cmd);
 };
 
 #endif
-- 
2.52.0



* [PATCH v18 7/9] drm/xe/xelp: Add AuxCCS invalidation to the indirect context workarounds
  2026-03-04 13:03 [PATCH v18 0/9] AuxCCS handling and render compression modifiers Tvrtko Ursulin
                   ` (5 preceding siblings ...)
  2026-03-04 13:03 ` [PATCH v18 6/9] drm/xe: Move aux table invalidation to ring ops Tvrtko Ursulin
@ 2026-03-04 13:03 ` Tvrtko Ursulin
  2026-03-04 13:51   ` Rodrigo Vivi
  2026-03-04 13:03 ` [PATCH v18 8/9] drm/xe/display: Add support for AuxCCS Tvrtko Ursulin
  2026-03-04 13:03 ` [PATCH v18 9/9] drm/xe/xelp: Expose AuxCCS frame buffer modifiers on Alderlake-P Tvrtko Ursulin
  8 siblings, 1 reply; 17+ messages in thread
From: Tvrtko Ursulin @ 2026-03-04 13:03 UTC (permalink / raw)
  To: intel-xe; +Cc: kernel-dev, Tvrtko Ursulin, Rodrigo Vivi

Following from the i915 reference implementation, we add the AuxCCS
invalidation to the indirect context workarounds page.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> # v1
---
v2:
 * Reworked to accommodate aux invalidation becoming part of ring_ops.
---
 drivers/gpu/drm/xe/xe_lrc.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_lrc.c b/drivers/gpu/drm/xe/xe_lrc.c
index fcdbd403fa3c..2a6f3157491f 100644
--- a/drivers/gpu/drm/xe/xe_lrc.c
+++ b/drivers/gpu/drm/xe/xe_lrc.c
@@ -27,6 +27,7 @@
 #include "xe_map.h"
 #include "xe_memirq.h"
 #include "xe_mmio.h"
+#include "xe_ring_ops.h"
 #include "xe_sriov.h"
 #include "xe_trace_lrc.h"
 #include "xe_vm.h"
@@ -93,6 +94,9 @@ gt_engine_needs_indirect_ctx(struct xe_gt *gt, enum xe_engine_class class)
 					       class, NULL))
 		return true;
 
+	if (gt->ring_ops[class]->emit_aux_table_inv)
+		return true;
+
 	return false;
 }
 
@@ -1216,6 +1220,23 @@ static ssize_t setup_invalidate_state_cache_wa(struct xe_lrc *lrc,
 	return cmd - batch;
 }
 
+static ssize_t setup_invalidate_auxccs_wa(struct xe_lrc *lrc,
+					  struct xe_hw_engine *hwe,
+					  u32 *batch, size_t max_len)
+{
+	struct xe_gt *gt = lrc->gt;
+	u32 *(*emit)(struct xe_gt *gt, u32 *cmd) =
+		gt->ring_ops[hwe->class]->emit_aux_table_inv;
+
+	if (!emit)
+		return 0;
+
+	if (xe_gt_WARN_ON(gt, max_len < 8))
+		return -ENOSPC;
+
+	return emit(gt, batch) - batch;
+}
+
 struct bo_setup {
 	ssize_t (*setup)(struct xe_lrc *lrc, struct xe_hw_engine *hwe,
 			 u32 *batch, size_t max_size);
@@ -1348,9 +1369,11 @@ setup_indirect_ctx(struct xe_lrc *lrc, struct xe_hw_engine *hwe)
 {
 	static const struct bo_setup rcs_funcs[] = {
 		{ .setup = setup_timestamp_wa },
+		{ .setup = setup_invalidate_auxccs_wa },
 		{ .setup = setup_configfs_mid_ctx_restore_bb },
 	};
 	static const struct bo_setup xcs_funcs[] = {
+		{ .setup = setup_invalidate_auxccs_wa },
 		{ .setup = setup_configfs_mid_ctx_restore_bb },
 	};
 	struct bo_setup_state state = {
-- 
2.52.0
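setup_invalidate_auxccs_wa() follows a three-way return contract. It returns 0 when the engine's ring ops have no aux table hook, -ENOSPC when fewer than 8 units of space remain, and otherwise the number of dwords the hook emitted. Below is a hedged standalone sketch of that contract. It includes a hypothetical runner that places steps back to back in one batch, the way a table such as xcs_funcs[] might be consumed. The real setup_indirect_ctx() body is not shown here, and treating max_len as a dword count is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h> /* ssize_t */

typedef uint32_t u32;

/* Hypothetical emitter type standing in for ring_ops->emit_aux_table_inv. */
typedef u32 *(*emit_fn)(u32 *cmd);

static u32 *emit_four_dwords(u32 *cmd)
{
	for (int i = 0; i < 4; i++)
		*cmd++ = 0; /* placeholder payload */
	return cmd;
}

/*
 * Same contract as setup_invalidate_auxccs_wa(): 0 when there is no hook,
 * -ENOSPC when fewer than 8 dwords are left, else the dwords emitted.
 */
static ssize_t setup_auxccs(emit_fn emit, u32 *batch, size_t max_len)
{
	assert(batch);

	if (!emit)
		return 0;
	if (max_len < 8)
		return -ENOSPC;

	return emit(batch) - batch;
}

/* Hypothetical runner: advance the cursor by what each step consumed. */
static ssize_t run_steps(const emit_fn *steps, size_t n, u32 *batch,
			 size_t max_len)
{
	size_t used = 0;

	for (size_t i = 0; i < n; i++) {
		ssize_t len = setup_auxccs(steps[i], batch + used,
					   max_len - used);

		if (len < 0)
			return len;
		used += (size_t)len;
	}
	return (ssize_t)used;
}
```

Note that a missing hook returns 0 before the size check, so engines without AuxCCS never trip the -ENOSPC warning.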


* [PATCH v18 8/9] drm/xe/display: Add support for AuxCCS
  2026-03-04 13:03 [PATCH v18 0/9] AuxCCS handling and render compression modifiers Tvrtko Ursulin
                   ` (6 preceding siblings ...)
  2026-03-04 13:03 ` [PATCH v18 7/9] drm/xe/xelp: Add AuxCCS invalidation to the indirect context workarounds Tvrtko Ursulin
@ 2026-03-04 13:03 ` Tvrtko Ursulin
  2026-03-04 13:03 ` [PATCH v18 9/9] drm/xe/xelp: Expose AuxCCS frame buffer modifiers on Alderlake-P Tvrtko Ursulin
  8 siblings, 0 replies; 17+ messages in thread
From: Tvrtko Ursulin @ 2026-03-04 13:03 UTC (permalink / raw)
  To: intel-xe
  Cc: kernel-dev, Tvrtko Ursulin, Juha-Pekka Heikkila, Michael J. Ruhl,
	Rodrigo Vivi

Add support for mapping the auxiliary CCS buffer into the DPT page tables.

This will allow for better power efficiency by enabling the render
compression frame buffer modifiers such as
I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS in a following patch.

We do this by refactoring the code a bit so handling for the linear
auxiliary frame buffer can be added in a tidy way. Also replace some
hardcoded constants.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Cc: Michael J. Ruhl <michael.j.ruhl@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
 drivers/gpu/drm/xe/display/xe_fb_pin.c | 111 ++++++++++++++++++-------
 1 file changed, 81 insertions(+), 30 deletions(-)

diff --git a/drivers/gpu/drm/xe/display/xe_fb_pin.c b/drivers/gpu/drm/xe/display/xe_fb_pin.c
index df7d305c6fcd..fe4b66e5c3ad 100644
--- a/drivers/gpu/drm/xe/display/xe_fb_pin.c
+++ b/drivers/gpu/drm/xe/display/xe_fb_pin.c
@@ -49,33 +49,94 @@ write_dpt_rotated(struct xe_bo *bo, struct iosys_map *map, u32 *dpt_ofs, u32 bo_
 	*dpt_ofs = ALIGN(*dpt_ofs, 4096);
 }
 
+static unsigned int
+write_dpt_padding(struct iosys_map *map, unsigned int dest, unsigned int pad)
+{
+	/* The DE ignores the PTEs for the padding tiles */
+	return dest + pad * sizeof(u64);
+}
+
+static unsigned int
+write_dpt_remapped_linear(struct xe_bo *bo, struct iosys_map *map,
+			  unsigned int dest,
+			  const struct intel_remapped_plane_info *plane)
+{
+	struct xe_device *xe = xe_bo_device(bo);
+	struct xe_ggtt *ggtt = xe_device_get_root_tile(xe)->mem.ggtt;
+	const u64 pte = xe_ggtt_encode_pte_flags(ggtt, bo,
+						 xe->pat.idx[XE_CACHE_NONE]);
+	unsigned int offset = plane->offset * XE_PAGE_SIZE;
+	unsigned int size = plane->size;
+
+	while (size--) {
+		u64 addr = xe_bo_addr(bo, offset, XE_PAGE_SIZE);
+
+		iosys_map_wr(map, dest, u64, addr | pte);
+		dest += sizeof(u64);
+		offset += XE_PAGE_SIZE;
+	}
+
+	return dest;
+}
+
+static unsigned int
+write_dpt_remapped_tiled(struct xe_bo *bo, struct iosys_map *map,
+			 unsigned int dest,
+			 const struct intel_remapped_plane_info *plane)
+{
+	struct xe_device *xe = xe_bo_device(bo);
+	struct xe_ggtt *ggtt = xe_device_get_root_tile(xe)->mem.ggtt;
+	const u64 pte = xe_ggtt_encode_pte_flags(ggtt, bo,
+						 xe->pat.idx[XE_CACHE_NONE]);
+	unsigned int offset, column, row;
+
+	for (row = 0; row < plane->height; row++) {
+		offset = (plane->offset + plane->src_stride * row) *
+			 XE_PAGE_SIZE;
+
+		for (column = 0; column < plane->width; column++) {
+			u64 addr = xe_bo_addr(bo, offset, XE_PAGE_SIZE);
+
+			iosys_map_wr(map, dest, u64, addr | pte);
+			dest += sizeof(u64);
+			offset += XE_PAGE_SIZE;
+		}
+
+		dest = write_dpt_padding(map, dest,
+					 plane->dst_stride - plane->width);
+	}
+
+	return dest;
+}
+
 static void
-write_dpt_remapped(struct xe_bo *bo, struct iosys_map *map, u32 *dpt_ofs,
-		   u32 bo_ofs, u32 width, u32 height, u32 src_stride,
-		   u32 dst_stride)
+write_dpt_remapped(struct xe_bo *bo,
+		   const struct intel_remapped_info *remap_info,
+		   struct iosys_map *map)
 {
-	struct xe_device *xe = xe_bo_device(bo);
-	struct xe_ggtt *ggtt = xe_device_get_root_tile(xe)->mem.ggtt;
-	u32 column, row;
-	u64 pte = xe_ggtt_encode_pte_flags(ggtt, bo, xe->pat.idx[XE_CACHE_NONE]);
+	unsigned int i, dest = 0;
 
-	for (row = 0; row < height; row++) {
-		u32 src_idx = src_stride * row + bo_ofs;
+	for (i = 0; i < ARRAY_SIZE(remap_info->plane); i++) {
+		const struct intel_remapped_plane_info *plane =
+						&remap_info->plane[i];
 
-		for (column = 0; column < width; column++) {
-			u64 addr = xe_bo_addr(bo, src_idx * XE_PAGE_SIZE, XE_PAGE_SIZE);
-			iosys_map_wr(map, *dpt_ofs, u64, pte | addr);
+		if (!plane->width && !plane->height && !plane->linear)
+			continue;
 
-			*dpt_ofs += 8;
-			src_idx++;
+		if (remap_info->plane_alignment) {
+			const unsigned int index = dest / sizeof(u64);
+			const unsigned int pad =
+				ALIGN(index, remap_info->plane_alignment) -
+				index;
+
+			dest = write_dpt_padding(map, dest, pad);
 		}
 
-		/* The DE ignores the PTEs for the padding tiles */
-		*dpt_ofs += (dst_stride - width) * 8;
+		if (plane->linear)
+			dest = write_dpt_remapped_linear(bo, map, dest, plane);
+		else
+			dest = write_dpt_remapped_tiled(bo, map, dest, plane);
 	}
-
-	/* Align to next page */
-	*dpt_ofs = ALIGN(*dpt_ofs, 4096);
 }
 
 static int __xe_pin_fb_vma_dpt(const struct intel_framebuffer *fb,
@@ -138,17 +199,7 @@ static int __xe_pin_fb_vma_dpt(const struct intel_framebuffer *fb,
 			iosys_map_wr(&dpt->vmap, x * 8, u64, pte | addr);
 		}
 	} else if (view->type == I915_GTT_VIEW_REMAPPED) {
-		const struct intel_remapped_info *remap_info = &view->remapped;
-		u32 i, dpt_ofs = 0;
-
-		for (i = 0; i < ARRAY_SIZE(remap_info->plane); i++)
-			write_dpt_remapped(bo, &dpt->vmap, &dpt_ofs,
-					   remap_info->plane[i].offset,
-					   remap_info->plane[i].width,
-					   remap_info->plane[i].height,
-					   remap_info->plane[i].src_stride,
-					   remap_info->plane[i].dst_stride);
-
+		write_dpt_remapped(bo, &view->remapped, &dpt->vmap);
 	} else {
 		const struct intel_rotation_info *rot_info = &view->rotated;
 		u32 i, dpt_ofs = 0;
-- 
2.52.0
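The reworked write_dpt_remapped() lays planes out one after another in the DPT. Each plane start is rounded up to plane_alignment entries. Tiled planes write width PTEs per row and then skip dst_stride - width padding entries. Linear planes, such as the auxiliary CCS plane, write size consecutive PTEs. The sketch below models only that layout in userspace. It counts PTE indices rather than byte offsets and records source page numbers instead of real PTEs. It also uses a hypothetical struct plane_info in place of the union-based struct intel_remapped_plane_info.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical, simplified model of struct intel_remapped_plane_info.
 * All sizes and offsets are in pages (one DPT entry per page).
 */
struct plane_info {
	unsigned int offset;	/* first source page of the plane */
	int linear;		/* nonzero: 'size' contiguous pages */
	unsigned int size;	/* linear planes only */
	unsigned int width, height, src_stride, dst_stride; /* tiled only */
};

/* Marker the caller pre-fills; padding entries are never written. */
#define PAD_ENTRY UINT32_MAX
#define ALIGN_UP(x, a) (((x) + (a) - 1) / (a) * (a))

static unsigned int write_linear(uint32_t *out, unsigned int idx,
				 const struct plane_info *p)
{
	for (unsigned int i = 0; i < p->size; i++)
		out[idx++] = p->offset + i;
	return idx;
}

static unsigned int write_tiled(uint32_t *out, unsigned int idx,
				const struct plane_info *p)
{
	assert(p->dst_stride >= p->width);

	for (unsigned int row = 0; row < p->height; row++) {
		unsigned int src = p->offset + p->src_stride * row;

		for (unsigned int col = 0; col < p->width; col++)
			out[idx++] = src++;

		/* The display engine ignores the padding entries. */
		idx += p->dst_stride - p->width;
	}
	return idx;
}

/* Returns the number of DPT entries the whole layout spans. */
static unsigned int layout_remapped(uint32_t *out,
				    const struct plane_info *planes,
				    size_t nplanes,
				    unsigned int plane_alignment)
{
	unsigned int idx = 0;

	for (size_t i = 0; i < nplanes; i++) {
		const struct plane_info *p = &planes[i];

		if (!p->width && !p->height && !p->linear)
			continue;

		if (plane_alignment)
			idx = ALIGN_UP(idx, plane_alignment);

		idx = p->linear ? write_linear(out, idx, p)
				: write_tiled(out, idx, p);
	}
	return idx;
}
```

For example, a 2x2 tiled plane with src_stride 3 and dst_stride 4, followed by a 3-page linear plane with plane_alignment 16, puts the tiled plane in entries 0 to 7 (entries 2, 3, 6 and 7 are padding) and the linear plane in entries 16 to 18.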


* [PATCH v18 9/9] drm/xe/xelp: Expose AuxCCS frame buffer modifiers on Alderlake-P
  2026-03-04 13:03 [PATCH v18 0/9] AuxCCS handling and render compression modifiers Tvrtko Ursulin
                   ` (7 preceding siblings ...)
  2026-03-04 13:03 ` [PATCH v18 8/9] drm/xe/display: Add support for AuxCCS Tvrtko Ursulin
@ 2026-03-04 13:03 ` Tvrtko Ursulin
  2026-03-04 13:55   ` Rodrigo Vivi
  8 siblings, 1 reply; 17+ messages in thread
From: Tvrtko Ursulin @ 2026-03-04 13:03 UTC (permalink / raw)
  To: intel-xe
  Cc: kernel-dev, Tvrtko Ursulin, Jani Nikula,
	José Roberto de Souza, Juha-Pekka Heikkila, Rodrigo Vivi

Now that we have implemented all the related missing bits we can enable
the AuxCCS compressed modifiers which were disabled in
cf48bddd31de ("drm/i915/display: Disable AuxCCS framebuffers if built for Xe").

Tested with KDE Wayland, on Lenovo Carbon X1 ADL-P:

        [PLANE:32:plane 1A]: type=PRI
                uapi: [FB:242] AR30 little-endian (0x30335241),0x100000000000008,2880x1800, visible=visible, src=28
                hw: [FB:242] AR30 little-endian (0x30335241),0x100000000000008,2880x1800, visible=yes, src=2880.000

Display is working fine - no artefacts, no DMAR/PIPE faults.

v2:
 * Adjust patch title. (Rodrigo)

v3:
 * Complete rewrite based on the display parent interface.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
References: cf48bddd31de ("drm/i915/display: Disable AuxCCS framebuffers if built for Xe")
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Cc: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> # v2
---
 drivers/gpu/drm/xe/display/xe_display.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/gpu/drm/xe/display/xe_display.c b/drivers/gpu/drm/xe/display/xe_display.c
index c8dd3faa9b97..5180de285295 100644
--- a/drivers/gpu/drm/xe/display/xe_display.c
+++ b/drivers/gpu/drm/xe/display/xe_display.c
@@ -539,6 +539,13 @@ static const struct intel_display_irq_interface xe_display_irq_interface = {
 	.synchronize = irq_synchronize,
 };
 
+static bool has_auxccs(struct drm_device *drm)
+{
+	struct xe_device *xe = to_xe_device(drm);
+
+	return xe->info.platform == XE_ALDERLAKE_P;
+}
+
 static const struct intel_display_parent_interface parent = {
 	.dsb = &xe_display_dsb_interface,
 	.hdcp = &xe_display_hdcp_interface,
@@ -548,6 +555,7 @@ static const struct intel_display_parent_interface parent = {
 	.pcode = &xe_display_pcode_interface,
 	.rpm = &xe_display_rpm_interface,
 	.stolen = &xe_display_stolen_interface,
+	.has_auxccs = has_auxccs,
 };
 
 /**
-- 
2.52.0
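The new has_auxccs callback is a plain platform check exposed through the display parent interface. How the display code consumes it is not part of this patch, so the caller below is hypothetical. This is a minimal sketch that assumes a missing callback should be treated as "no AuxCCS", with simplified stand-in types for the device and the interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the xe device and the parent interface. */
enum platform { PLATFORM_OTHER, PLATFORM_ALDERLAKE_P };

struct fake_device {
	enum platform platform;
};

struct parent_interface {
	/* Optional callback; a parent driver may leave it NULL. */
	bool (*has_auxccs)(const struct fake_device *dev);
};

/* Same check as has_auxccs() in xe_display.c: enabled on Alderlake-P only. */
static bool xe_has_auxccs(const struct fake_device *dev)
{
	return dev->platform == PLATFORM_ALDERLAKE_P;
}

/* Hypothetical display-side query: a missing callback means no AuxCCS. */
static bool display_has_auxccs(const struct parent_interface *parent,
			       const struct fake_device *dev)
{
	assert(parent && dev);
	return parent->has_auxccs && parent->has_auxccs(dev);
}
```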


* Re: [PATCH v18 1/9] drm/xe: Rename XE_BO_FLAG_SCANOUT to XE_BO_FLAG_FORCE_WC
  2026-03-04 13:03 ` [PATCH v18 1/9] drm/xe: Rename XE_BO_FLAG_SCANOUT to XE_BO_FLAG_FORCE_WC Tvrtko Ursulin
@ 2026-03-04 13:50   ` Rodrigo Vivi
  2026-03-04 14:23     ` Tvrtko Ursulin
  0 siblings, 1 reply; 17+ messages in thread
From: Rodrigo Vivi @ 2026-03-04 13:50 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-xe, kernel-dev

On Wed, Mar 04, 2026 at 01:03:06PM +0000, Tvrtko Ursulin wrote:
> Rename XE_BO_FLAG_SCANOUT to XE_BO_FLAG_FORCE_WC so that the usage of the
> flag can legitimately be expanded to more than just the actual frame-
> buffer objects.
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
> Suggested-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
> ---
>  drivers/gpu/drm/xe/display/intel_fb_bo.c      |  6 +++---
>  drivers/gpu/drm/xe/display/intel_fbdev_fb.c   | 12 ++++++++----
>  drivers/gpu/drm/xe/display/xe_dsb_buffer.c    |  4 +++-
>  drivers/gpu/drm/xe/display/xe_fb_pin.c        |  2 +-
>  drivers/gpu/drm/xe/display/xe_initial_plane.c |  2 +-
>  drivers/gpu/drm/xe/xe_bo.c                    | 16 +++++++++++-----
>  drivers/gpu/drm/xe/xe_bo.h                    |  2 +-
>  7 files changed, 28 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/display/intel_fb_bo.c b/drivers/gpu/drm/xe/display/intel_fb_bo.c
> index db8b1a27b4de..d2e72dc5abd9 100644
> --- a/drivers/gpu/drm/xe/display/intel_fb_bo.c
> +++ b/drivers/gpu/drm/xe/display/intel_fb_bo.c
> @@ -45,9 +45,9 @@ int intel_fb_bo_framebuffer_init(struct drm_gem_object *obj,
>  	if (ret)
>  		goto err;
>  
> -	if (!(bo->flags & XE_BO_FLAG_SCANOUT)) {
> +	if (!(bo->flags & XE_BO_FLAG_FORCE_WC)) {
>  		/*
> -		 * XE_BO_FLAG_SCANOUT should ideally be set at creation, or is
> +		 * XE_BO_FLAG_FORCE_WC should ideally be set at creation, or is
>  		 * automatically set when creating FB. We cannot change caching
>  		 * mode when the bo is VM_BINDed, so we can only set
>  		 * coherency with display when unbound.
> @@ -57,7 +57,7 @@ int intel_fb_bo_framebuffer_init(struct drm_gem_object *obj,
>  			ret = -EINVAL;
>  			goto err;
>  		}
> -		bo->flags |= XE_BO_FLAG_SCANOUT;
> +		bo->flags |= XE_BO_FLAG_FORCE_WC;
>  	}
>  	ttm_bo_unreserve(&bo->ttm);
>  	return 0;
> diff --git a/drivers/gpu/drm/xe/display/intel_fbdev_fb.c b/drivers/gpu/drm/xe/display/intel_fbdev_fb.c
> index 87af5646c938..d7030e4d814c 100644
> --- a/drivers/gpu/drm/xe/display/intel_fbdev_fb.c
> +++ b/drivers/gpu/drm/xe/display/intel_fbdev_fb.c
> @@ -56,9 +56,11 @@ struct drm_gem_object *intel_fbdev_fb_bo_create(struct drm_device *drm, int size
>  	if (intel_fbdev_fb_prefer_stolen(drm, size)) {
>  		obj = xe_bo_create_pin_map_novm(xe, xe_device_get_root_tile(xe),
>  						size,
> -						ttm_bo_type_kernel, XE_BO_FLAG_SCANOUT |
> +						ttm_bo_type_kernel,
> +						XE_BO_FLAG_FORCE_WC |
>  						XE_BO_FLAG_STOLEN |
> -						XE_BO_FLAG_GGTT, false);
> +						XE_BO_FLAG_GGTT,
> +						false);
>  		if (!IS_ERR(obj))
>  			drm_info(&xe->drm, "Allocated fbdev into stolen\n");
>  		else
> @@ -69,9 +71,11 @@ struct drm_gem_object *intel_fbdev_fb_bo_create(struct drm_device *drm, int size
>  
>  	if (IS_ERR(obj)) {
>  		obj = xe_bo_create_pin_map_novm(xe, xe_device_get_root_tile(xe), size,
> -						ttm_bo_type_kernel, XE_BO_FLAG_SCANOUT |
> +						ttm_bo_type_kernel,
> +						XE_BO_FLAG_FORCE_WC |
>  						XE_BO_FLAG_VRAM_IF_DGFX(xe_device_get_root_tile(xe)) |
> -						XE_BO_FLAG_GGTT, false);
> +						XE_BO_FLAG_GGTT,
> +						false);
>  	}
>  
>  	if (IS_ERR(obj)) {
> diff --git a/drivers/gpu/drm/xe/display/xe_dsb_buffer.c b/drivers/gpu/drm/xe/display/xe_dsb_buffer.c
> index 1c67a950c6ad..a7158c73a14c 100644
> --- a/drivers/gpu/drm/xe/display/xe_dsb_buffer.c
> +++ b/drivers/gpu/drm/xe/display/xe_dsb_buffer.c
> @@ -54,7 +54,9 @@ static struct intel_dsb_buffer *xe_dsb_buffer_create(struct drm_device *drm, siz
>  					PAGE_ALIGN(size),
>  					ttm_bo_type_kernel,
>  					XE_BO_FLAG_VRAM_IF_DGFX(xe_device_get_root_tile(xe)) |
> -					XE_BO_FLAG_SCANOUT | XE_BO_FLAG_GGTT, false);
> +					XE_BO_FLAG_FORCE_WC |
> +					XE_BO_FLAG_GGTT,
> +					false);
>  	if (IS_ERR(obj)) {
>  		ret = PTR_ERR(obj);
>  		goto err_pin_map;
> diff --git a/drivers/gpu/drm/xe/display/xe_fb_pin.c b/drivers/gpu/drm/xe/display/xe_fb_pin.c
> index dbbc61032b7f..d4a9eb550cae 100644
> --- a/drivers/gpu/drm/xe/display/xe_fb_pin.c
> +++ b/drivers/gpu/drm/xe/display/xe_fb_pin.c
> @@ -429,7 +429,7 @@ int intel_plane_pin_fb(struct intel_plane_state *new_plane_state,
>  		return 0;
>  
>  	/* We reject creating !SCANOUT fb's, so this is weird.. */
> -	drm_WARN_ON(bo->ttm.base.dev, !(bo->flags & XE_BO_FLAG_SCANOUT));
> +	drm_WARN_ON(bo->ttm.base.dev, !(bo->flags & XE_BO_FLAG_FORCE_WC));
>  
>  	vma = __xe_pin_fb_vma(intel_fb, &new_plane_state->view.gtt, alignment);
>  
> diff --git a/drivers/gpu/drm/xe/display/xe_initial_plane.c b/drivers/gpu/drm/xe/display/xe_initial_plane.c
> index 65cc0b0c934b..8bcae552dddc 100644
> --- a/drivers/gpu/drm/xe/display/xe_initial_plane.c
> +++ b/drivers/gpu/drm/xe/display/xe_initial_plane.c
> @@ -48,7 +48,7 @@ initial_plane_bo(struct xe_device *xe,
>  	if (plane_config->size == 0)
>  		return NULL;
>  
> -	flags = XE_BO_FLAG_SCANOUT | XE_BO_FLAG_GGTT;
> +	flags = XE_BO_FLAG_FORCE_WC | XE_BO_FLAG_GGTT;
>  
>  	base = round_down(plane_config->base, page_size);
>  	if (IS_DGFX(xe)) {
> diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
> index 8ff193600443..fe560b0a980a 100644
> --- a/drivers/gpu/drm/xe/xe_bo.c
> +++ b/drivers/gpu/drm/xe/xe_bo.c
> @@ -515,7 +515,7 @@ static struct ttm_tt *xe_ttm_tt_create(struct ttm_buffer_object *ttm_bo,
>  		 * For Xe_LPG and beyond up to NVL-P (excluding), PPGTT PTE
>  		 * lookups are also non-coherent and require a CPU:WC mapping.
>  		 */
> -		if ((!bo->cpu_caching && bo->flags & XE_BO_FLAG_SCANOUT) ||
> +		if ((!bo->cpu_caching && bo->flags & XE_BO_FLAG_FORCE_WC) ||
>  		     (!xe->info.has_cached_pt && bo->flags & XE_BO_FLAG_PAGETABLE))
>  			caching = ttm_write_combined;
>  	}
> @@ -3196,8 +3196,14 @@ int xe_gem_create_ioctl(struct drm_device *dev, void *data,
>  	if (args->flags & DRM_XE_GEM_CREATE_FLAG_DEFER_BACKING)
>  		bo_flags |= XE_BO_FLAG_DEFER_BACKING;
>  
> +	/*
> +	 * Display scanout is always non-coherent with the CPU cache.
> +	 *
> +	 * For Xe_LPG and beyond up to NVL-P (excluding), PPGTT PTE
> +	 * lookups are also non-coherent and require a CPU:WC mapping.
> +	 */

I believe this comment should now be removed from the other place.
But it doesn't hurt if you decide to keep it in both places.

Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>


>  	if (args->flags & DRM_XE_GEM_CREATE_FLAG_SCANOUT)
> -		bo_flags |= XE_BO_FLAG_SCANOUT;
> +		bo_flags |= XE_BO_FLAG_FORCE_WC;
>  
>  	if (args->flags & DRM_XE_GEM_CREATE_FLAG_NO_COMPRESSION) {
>  		if (XE_IOCTL_DBG(xe, GRAPHICS_VER(xe) < 20))
> @@ -3209,7 +3215,7 @@ int xe_gem_create_ioctl(struct drm_device *dev, void *data,
>  
>  	/* CCS formats need physical placement at a 64K alignment in VRAM. */
>  	if ((bo_flags & XE_BO_FLAG_VRAM_MASK) &&
> -	    (bo_flags & XE_BO_FLAG_SCANOUT) &&
> +	    (bo_flags & XE_BO_FLAG_FORCE_WC) &&
>  	    !(xe->info.vram_flags & XE_VRAM_FLAGS_NEED64K) &&
>  	    IS_ALIGNED(args->size, SZ_64K))
>  		bo_flags |= XE_BO_FLAG_NEEDS_64K;
> @@ -3229,7 +3235,7 @@ int xe_gem_create_ioctl(struct drm_device *dev, void *data,
>  			 args->cpu_caching != DRM_XE_GEM_CPU_CACHING_WC))
>  		return -EINVAL;
>  
> -	if (XE_IOCTL_DBG(xe, bo_flags & XE_BO_FLAG_SCANOUT &&
> +	if (XE_IOCTL_DBG(xe, bo_flags & XE_BO_FLAG_FORCE_WC &&
>  			 args->cpu_caching == DRM_XE_GEM_CPU_CACHING_WB))
>  		return -EINVAL;
>  
> @@ -3642,7 +3648,7 @@ int xe_bo_dumb_create(struct drm_file *file_priv,
>  	bo = xe_bo_create_user(xe, NULL, args->size,
>  			       DRM_XE_GEM_CPU_CACHING_WC,
>  			       XE_BO_FLAG_VRAM_IF_DGFX(xe_device_get_root_tile(xe)) |
> -			       XE_BO_FLAG_SCANOUT |
> +			       XE_BO_FLAG_FORCE_WC |
>  			       XE_BO_FLAG_NEEDS_CPU_ACCESS, NULL);
>  	if (IS_ERR(bo))
>  		return PTR_ERR(bo);
> diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
> index c914ab719f20..af9a6669c872 100644
> --- a/drivers/gpu/drm/xe/xe_bo.h
> +++ b/drivers/gpu/drm/xe/xe_bo.h
> @@ -35,7 +35,7 @@
>  #define XE_BO_FLAG_PINNED		BIT(7)
>  #define XE_BO_FLAG_NO_RESV_EVICT	BIT(8)
>  #define XE_BO_FLAG_DEFER_BACKING	BIT(9)
> -#define XE_BO_FLAG_SCANOUT		BIT(10)
> +#define XE_BO_FLAG_FORCE_WC		BIT(10)
>  #define XE_BO_FLAG_FIXED_PLACEMENT	BIT(11)
>  #define XE_BO_FLAG_PAGETABLE		BIT(12)
>  #define XE_BO_FLAG_NEEDS_CPU_ACCESS	BIT(13)
> -- 
> 2.52.0
> 
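In xe_gem_create_ioctl() the uapi DRM_XE_GEM_CREATE_FLAG_SCANOUT flag is translated into the internal XE_BO_FLAG_FORCE_WC. Later checks must test the translated bo_flags, because uapi and internal flags are separate namespaces with unrelated bit values. The following is a minimal sketch of that translation and of the write-back rejection. The uapi bit value is hypothetical, only BIT(10) for FORCE_WC is taken from the xe_bo.h hunk, and the caching enum values are illustrative.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define UAPI_CREATE_FLAG_SCANOUT	(1u << 1)	/* hypothetical value */
#define BO_FLAG_FORCE_WC		(1u << 10)	/* BIT(10) in xe_bo.h */

enum cpu_caching { CPU_CACHING_WB = 1, CPU_CACHING_WC = 2 };

static int translate_create_flags(uint32_t uapi_flags,
				  enum cpu_caching caching,
				  uint32_t *bo_flags)
{
	assert(bo_flags);
	*bo_flags = 0;

	/* Display scanout is non-coherent with the CPU cache: force WC. */
	if (uapi_flags & UAPI_CREATE_FLAG_SCANOUT)
		*bo_flags |= BO_FLAG_FORCE_WC;

	/* Check the translated internal flags, never the uapi ones. */
	if ((*bo_flags & BO_FLAG_FORCE_WC) && caching == CPU_CACHING_WB)
		return -EINVAL;

	return 0;
}
```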

* Re: [PATCH v18 2/9] drm/xe: Use write-combine mapping when populating DPT
  2026-03-04 13:03 ` [PATCH v18 2/9] drm/xe: Use write-combine mapping when populating DPT Tvrtko Ursulin
@ 2026-03-04 13:51   ` Rodrigo Vivi
  0 siblings, 0 replies; 17+ messages in thread
From: Rodrigo Vivi @ 2026-03-04 13:51 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-xe, kernel-dev, Ville Syrjälä

On Wed, Mar 04, 2026 at 01:03:07PM +0000, Tvrtko Ursulin wrote:
> The fallback case for DPT backing store is a buffer object in system
> memory, which by default uses a write-back CPU caching policy.
> 
> If this fallback gets triggered, and since there is currently no flushing,
> the DPT writes made when pinning a buffer to display are not guaranteed to
> be seen by the display engine.
> 
> To fix this, since both the local memory and the stolen memory DPT
> placements already use write-combine, let us make the system memory option
> follow suit by passing down the appropriate flag.
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
> Suggested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>

Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

> ---
>  drivers/gpu/drm/xe/display/xe_fb_pin.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/xe/display/xe_fb_pin.c b/drivers/gpu/drm/xe/display/xe_fb_pin.c
> index d4a9eb550cae..df7d305c6fcd 100644
> --- a/drivers/gpu/drm/xe/display/xe_fb_pin.c
> +++ b/drivers/gpu/drm/xe/display/xe_fb_pin.c
> @@ -122,7 +122,8 @@ static int __xe_pin_fb_vma_dpt(const struct intel_framebuffer *fb,
>  						   ttm_bo_type_kernel,
>  						   XE_BO_FLAG_SYSTEM |
>  						   XE_BO_FLAG_GGTT |
> -						   XE_BO_FLAG_PAGETABLE,
> +						   XE_BO_FLAG_PAGETABLE |
> +						   XE_BO_FLAG_FORCE_WC,
>  						   alignment, false);
>  	if (IS_ERR(dpt))
>  		return PTR_ERR(dpt);
> -- 
> 2.52.0
> 
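The check in xe_ttm_tt_create() quoted earlier in this thread shows why adding XE_BO_FLAG_FORCE_WC to the system memory DPT changes its CPU mapping. Two kinds of buffer object get a write-combined mapping: one without an explicit cpu_caching that has FORCE_WC set, and a page-table buffer on a platform without has_cached_pt. The following is a minimal model of just that condition. The bit positions match the xe_bo.h hunk. Treating every other case as cached is a simplifying assumption, since the real function has further conditions around this check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions from the xe_bo.h hunk. */
#define BO_FLAG_FORCE_WC	(1u << 10)
#define BO_FLAG_PAGETABLE	(1u << 12)

enum caching { CACHING_CACHED, CACHING_WRITE_COMBINED };

/*
 * Model of the write-combine condition in xe_ttm_tt_create(). A zero
 * cpu_caching is taken to mean "no mode requested"; returning CACHED in
 * every other case is a simplification of this sketch.
 */
static enum caching pick_caching(uint32_t flags, unsigned int cpu_caching,
				 bool has_cached_pt)
{
	if ((!cpu_caching && (flags & BO_FLAG_FORCE_WC)) ||
	    (!has_cached_pt && (flags & BO_FLAG_PAGETABLE)))
		return CACHING_WRITE_COMBINED;

	return CACHING_CACHED;
}
```

With has_cached_pt set, a DPT carrying only XE_BO_FLAG_PAGETABLE stays cached, which is the case this patch addresses. Adding FORCE_WC makes it write-combined.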

* Re: [PATCH v18 7/9] drm/xe/xelp: Add AuxCCS invalidation to the indirect context workarounds
  2026-03-04 13:03 ` [PATCH v18 7/9] drm/xe/xelp: Add AuxCCS invalidation to the indirect context workarounds Tvrtko Ursulin
@ 2026-03-04 13:51   ` Rodrigo Vivi
  0 siblings, 0 replies; 17+ messages in thread
From: Rodrigo Vivi @ 2026-03-04 13:51 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-xe, kernel-dev

On Wed, Mar 04, 2026 at 01:03:12PM +0000, Tvrtko Ursulin wrote:
> Following from the i915 reference implementation, we add the AuxCCS
> invalidation to the indirect context workarounds page.
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> # v1
> ---
> v2:
>  * Reworked to accommodate aux invalidation becoming part of ring_ops.

Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

> ---
>  drivers/gpu/drm/xe/xe_lrc.c | 23 +++++++++++++++++++++++
>  1 file changed, 23 insertions(+)
> 
> diff --git a/drivers/gpu/drm/xe/xe_lrc.c b/drivers/gpu/drm/xe/xe_lrc.c
> index fcdbd403fa3c..2a6f3157491f 100644
> --- a/drivers/gpu/drm/xe/xe_lrc.c
> +++ b/drivers/gpu/drm/xe/xe_lrc.c
> @@ -27,6 +27,7 @@
>  #include "xe_map.h"
>  #include "xe_memirq.h"
>  #include "xe_mmio.h"
> +#include "xe_ring_ops.h"
>  #include "xe_sriov.h"
>  #include "xe_trace_lrc.h"
>  #include "xe_vm.h"
> @@ -93,6 +94,9 @@ gt_engine_needs_indirect_ctx(struct xe_gt *gt, enum xe_engine_class class)
>  					       class, NULL))
>  		return true;
>  
> +	if (gt->ring_ops[class]->emit_aux_table_inv)
> +		return true;
> +
>  	return false;
>  }
>  
> @@ -1216,6 +1220,23 @@ static ssize_t setup_invalidate_state_cache_wa(struct xe_lrc *lrc,
>  	return cmd - batch;
>  }
>  
> +static ssize_t setup_invalidate_auxccs_wa(struct xe_lrc *lrc,
> +					  struct xe_hw_engine *hwe,
> +					  u32 *batch, size_t max_len)
> +{
> +	struct xe_gt *gt = lrc->gt;
> +	u32 *(*emit)(struct xe_gt *gt, u32 *cmd) =
> +		gt->ring_ops[hwe->class]->emit_aux_table_inv;
> +
> +	if (!emit)
> +		return 0;
> +
> +	if (xe_gt_WARN_ON(gt, max_len < 8))
> +		return -ENOSPC;
> +
> +	return emit(gt, batch) - batch;
> +}
> +
>  struct bo_setup {
>  	ssize_t (*setup)(struct xe_lrc *lrc, struct xe_hw_engine *hwe,
>  			 u32 *batch, size_t max_size);
> @@ -1348,9 +1369,11 @@ setup_indirect_ctx(struct xe_lrc *lrc, struct xe_hw_engine *hwe)
>  {
>  	static const struct bo_setup rcs_funcs[] = {
>  		{ .setup = setup_timestamp_wa },
> +		{ .setup = setup_invalidate_auxccs_wa },
>  		{ .setup = setup_configfs_mid_ctx_restore_bb },
>  	};
>  	static const struct bo_setup xcs_funcs[] = {
> +		{ .setup = setup_invalidate_auxccs_wa },
>  		{ .setup = setup_configfs_mid_ctx_restore_bb },
>  	};
>  	struct bo_setup_state state = {
> -- 
> 2.52.0
> 

* Re: [PATCH v18 9/9] drm/xe/xelp: Expose AuxCCS frame buffer modifiers on Alderlake-P
  2026-03-04 13:03 ` [PATCH v18 9/9] drm/xe/xelp: Expose AuxCCS frame buffer modifiers on Alderlake-P Tvrtko Ursulin
@ 2026-03-04 13:55   ` Rodrigo Vivi
  0 siblings, 0 replies; 17+ messages in thread
From: Rodrigo Vivi @ 2026-03-04 13:55 UTC (permalink / raw)
  To: Tvrtko Ursulin
  Cc: intel-xe, kernel-dev, Jani Nikula, José Roberto de Souza,
	Juha-Pekka Heikkila

On Wed, Mar 04, 2026 at 01:03:14PM +0000, Tvrtko Ursulin wrote:
> Now that we have implemented all the related missing bits we can enable
> the AuxCCS compressed modifiers which were disabled in
> cf48bddd31de ("drm/i915/display: Disable AuxCCS framebuffers if built for Xe").
> 
> Tested with KDE Wayland, on Lenovo Carbon X1 ADL-P:
> 
>         [PLANE:32:plane 1A]: type=PRI
>                 uapi: [FB:242] AR30 little-endian (0x30335241),0x100000000000008,2880x1800, visible=visible, src=28
>                 hw: [FB:242] AR30 little-endian (0x30335241),0x100000000000008,2880x1800, visible=yes, src=2880.000
> 
> Display is working fine - no artefacts, no DMAR/PIPE faults.
> 
> v2:
>  * Adjust patch title. (Rodrigo)
> 
> v3:
>  * Complete rewrite based on the display parent interface.

Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
> References: cf48bddd31de ("drm/i915/display: Disable AuxCCS framebuffers if built for Xe")
> Cc: Jani Nikula <jani.nikula@intel.com>
> Cc: José Roberto de Souza <jose.souza@intel.com>
> Cc: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> # v2
> ---
>  drivers/gpu/drm/xe/display/xe_display.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/drivers/gpu/drm/xe/display/xe_display.c b/drivers/gpu/drm/xe/display/xe_display.c
> index c8dd3faa9b97..5180de285295 100644
> --- a/drivers/gpu/drm/xe/display/xe_display.c
> +++ b/drivers/gpu/drm/xe/display/xe_display.c
> @@ -539,6 +539,13 @@ static const struct intel_display_irq_interface xe_display_irq_interface = {
>  	.synchronize = irq_synchronize,
>  };
>  
> +static bool has_auxccs(struct drm_device *drm)
> +{
> +	struct xe_device *xe = to_xe_device(drm);
> +
> +	return xe->info.platform == XE_ALDERLAKE_P;
> +}
> +
>  static const struct intel_display_parent_interface parent = {
>  	.dsb = &xe_display_dsb_interface,
>  	.hdcp = &xe_display_hdcp_interface,
> @@ -548,6 +555,7 @@ static const struct intel_display_parent_interface parent = {
>  	.pcode = &xe_display_pcode_interface,
>  	.rpm = &xe_display_rpm_interface,
>  	.stolen = &xe_display_stolen_interface,
> +	.has_auxccs = has_auxccs,
>  };
>  
>  /**
> -- 
> 2.52.0
> 

* Re: [PATCH v18 1/9] drm/xe: Rename XE_BO_FLAG_SCANOUT to XE_BO_FLAG_FORCE_WC
  2026-03-04 13:50   ` Rodrigo Vivi
@ 2026-03-04 14:23     ` Tvrtko Ursulin
  2026-03-04 14:41       ` Tvrtko Ursulin
  0 siblings, 1 reply; 17+ messages in thread
From: Tvrtko Ursulin @ 2026-03-04 14:23 UTC (permalink / raw)
  To: Rodrigo Vivi; +Cc: intel-xe, kernel-dev


On 04/03/2026 13:50, Rodrigo Vivi wrote:
> On Wed, Mar 04, 2026 at 01:03:06PM +0000, Tvrtko Ursulin wrote:
>> Rename XE_BO_FLAG_SCANOUT to XE_BO_FLAG_FORCE_WC so that the usage of the
>> flag can legitimately be expanded to more than just the actual frame-
>> buffer objects.
>>
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
>> Suggested-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
>> ---
>>   drivers/gpu/drm/xe/display/intel_fb_bo.c      |  6 +++---
>>   drivers/gpu/drm/xe/display/intel_fbdev_fb.c   | 12 ++++++++----
>>   drivers/gpu/drm/xe/display/xe_dsb_buffer.c    |  4 +++-
>>   drivers/gpu/drm/xe/display/xe_fb_pin.c        |  2 +-
>>   drivers/gpu/drm/xe/display/xe_initial_plane.c |  2 +-
>>   drivers/gpu/drm/xe/xe_bo.c                    | 16 +++++++++++-----
>>   drivers/gpu/drm/xe/xe_bo.h                    |  2 +-
>>   7 files changed, 28 insertions(+), 16 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/xe/display/intel_fb_bo.c b/drivers/gpu/drm/xe/display/intel_fb_bo.c
>> index db8b1a27b4de..d2e72dc5abd9 100644
>> --- a/drivers/gpu/drm/xe/display/intel_fb_bo.c
>> +++ b/drivers/gpu/drm/xe/display/intel_fb_bo.c
>> @@ -45,9 +45,9 @@ int intel_fb_bo_framebuffer_init(struct drm_gem_object *obj,
>>   	if (ret)
>>   		goto err;
>>   
>> -	if (!(bo->flags & XE_BO_FLAG_SCANOUT)) {
>> +	if (!(bo->flags & XE_BO_FLAG_FORCE_WC)) {
>>   		/*
>> -		 * XE_BO_FLAG_SCANOUT should ideally be set at creation, or is
>> +		 * XE_BO_FLAG_FORCE_WC should ideally be set at creation, or is
>>   		 * automatically set when creating FB. We cannot change caching
>>   		 * mode when the bo is VM_BINDed, so we can only set
>>   		 * coherency with display when unbound.
>> @@ -57,7 +57,7 @@ int intel_fb_bo_framebuffer_init(struct drm_gem_object *obj,
>>   			ret = -EINVAL;
>>   			goto err;
>>   		}
>> -		bo->flags |= XE_BO_FLAG_SCANOUT;
>> +		bo->flags |= XE_BO_FLAG_FORCE_WC;
>>   	}
>>   	ttm_bo_unreserve(&bo->ttm);
>>   	return 0;
>> diff --git a/drivers/gpu/drm/xe/display/intel_fbdev_fb.c b/drivers/gpu/drm/xe/display/intel_fbdev_fb.c
>> index 87af5646c938..d7030e4d814c 100644
>> --- a/drivers/gpu/drm/xe/display/intel_fbdev_fb.c
>> +++ b/drivers/gpu/drm/xe/display/intel_fbdev_fb.c
>> @@ -56,9 +56,11 @@ struct drm_gem_object *intel_fbdev_fb_bo_create(struct drm_device *drm, int size
>>   	if (intel_fbdev_fb_prefer_stolen(drm, size)) {
>>   		obj = xe_bo_create_pin_map_novm(xe, xe_device_get_root_tile(xe),
>>   						size,
>> -						ttm_bo_type_kernel, XE_BO_FLAG_SCANOUT |
>> +						ttm_bo_type_kernel,
>> +						XE_BO_FLAG_FORCE_WC |
>>   						XE_BO_FLAG_STOLEN |
>> -						XE_BO_FLAG_GGTT, false);
>> +						XE_BO_FLAG_GGTT,
>> +						false);
>>   		if (!IS_ERR(obj))
>>   			drm_info(&xe->drm, "Allocated fbdev into stolen\n");
>>   		else
>> @@ -69,9 +71,11 @@ struct drm_gem_object *intel_fbdev_fb_bo_create(struct drm_device *drm, int size
>>   
>>   	if (IS_ERR(obj)) {
>>   		obj = xe_bo_create_pin_map_novm(xe, xe_device_get_root_tile(xe), size,
>> -						ttm_bo_type_kernel, XE_BO_FLAG_SCANOUT |
>> +						ttm_bo_type_kernel,
>> +						XE_BO_FLAG_FORCE_WC |
>>   						XE_BO_FLAG_VRAM_IF_DGFX(xe_device_get_root_tile(xe)) |
>> -						XE_BO_FLAG_GGTT, false);
>> +						XE_BO_FLAG_GGTT,
>> +						false);
>>   	}
>>   
>>   	if (IS_ERR(obj)) {
>> diff --git a/drivers/gpu/drm/xe/display/xe_dsb_buffer.c b/drivers/gpu/drm/xe/display/xe_dsb_buffer.c
>> index 1c67a950c6ad..a7158c73a14c 100644
>> --- a/drivers/gpu/drm/xe/display/xe_dsb_buffer.c
>> +++ b/drivers/gpu/drm/xe/display/xe_dsb_buffer.c
>> @@ -54,7 +54,9 @@ static struct intel_dsb_buffer *xe_dsb_buffer_create(struct drm_device *drm, siz
>>   					PAGE_ALIGN(size),
>>   					ttm_bo_type_kernel,
>>   					XE_BO_FLAG_VRAM_IF_DGFX(xe_device_get_root_tile(xe)) |
>> -					XE_BO_FLAG_SCANOUT | XE_BO_FLAG_GGTT, false);
>> +					XE_BO_FLAG_FORCE_WC |
>> +					XE_BO_FLAG_GGTT,
>> +					false);
>>   	if (IS_ERR(obj)) {
>>   		ret = PTR_ERR(obj);
>>   		goto err_pin_map;
>> diff --git a/drivers/gpu/drm/xe/display/xe_fb_pin.c b/drivers/gpu/drm/xe/display/xe_fb_pin.c
>> index dbbc61032b7f..d4a9eb550cae 100644
>> --- a/drivers/gpu/drm/xe/display/xe_fb_pin.c
>> +++ b/drivers/gpu/drm/xe/display/xe_fb_pin.c
>> @@ -429,7 +429,7 @@ int intel_plane_pin_fb(struct intel_plane_state *new_plane_state,
>>   		return 0;
>>   
>>   	/* We reject creating !SCANOUT fb's, so this is weird.. */
>> -	drm_WARN_ON(bo->ttm.base.dev, !(bo->flags & XE_BO_FLAG_SCANOUT));
>> +	drm_WARN_ON(bo->ttm.base.dev, !(bo->flags & XE_BO_FLAG_FORCE_WC));
>>   
>>   	vma = __xe_pin_fb_vma(intel_fb, &new_plane_state->view.gtt, alignment);
>>   
>> diff --git a/drivers/gpu/drm/xe/display/xe_initial_plane.c b/drivers/gpu/drm/xe/display/xe_initial_plane.c
>> index 65cc0b0c934b..8bcae552dddc 100644
>> --- a/drivers/gpu/drm/xe/display/xe_initial_plane.c
>> +++ b/drivers/gpu/drm/xe/display/xe_initial_plane.c
>> @@ -48,7 +48,7 @@ initial_plane_bo(struct xe_device *xe,
>>   	if (plane_config->size == 0)
>>   		return NULL;
>>   
>> -	flags = XE_BO_FLAG_SCANOUT | XE_BO_FLAG_GGTT;
>> +	flags = XE_BO_FLAG_FORCE_WC | XE_BO_FLAG_GGTT;
>>   
>>   	base = round_down(plane_config->base, page_size);
>>   	if (IS_DGFX(xe)) {
>> diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
>> index 8ff193600443..fe560b0a980a 100644
>> --- a/drivers/gpu/drm/xe/xe_bo.c
>> +++ b/drivers/gpu/drm/xe/xe_bo.c
>> @@ -515,7 +515,7 @@ static struct ttm_tt *xe_ttm_tt_create(struct ttm_buffer_object *ttm_bo,
>>   		 * For Xe_LPG and beyond up to NVL-P (excluding), PPGTT PTE
>>   		 * lookups are also non-coherent and require a CPU:WC mapping.
>>   		 */
>> -		if ((!bo->cpu_caching && bo->flags & XE_BO_FLAG_SCANOUT) ||
>> +		if ((!bo->cpu_caching && bo->flags & XE_BO_FLAG_FORCE_WC) ||
>>   		     (!xe->info.has_cached_pt && bo->flags & XE_BO_FLAG_PAGETABLE))
>>   			caching = ttm_write_combined;
>>   	}
>> @@ -3196,8 +3196,14 @@ int xe_gem_create_ioctl(struct drm_device *dev, void *data,
>>   	if (args->flags & DRM_XE_GEM_CREATE_FLAG_DEFER_BACKING)
>>   		bo_flags |= XE_BO_FLAG_DEFER_BACKING;
>>   
>> +	/*
>> +	 * Display scanout is always non-coherent with the CPU cache.
>> +	 *
>> +	 * For Xe_LPG and beyond up to NVL-P (excluding), PPGTT PTE
>> +	 * lookups are also non-coherent and require a CPU:WC mapping.
>> +	 */
> 
> I believe this comment should now be removed from the other place.
> But it doesn't hurt if you decide to keep in both places.

My bad, I thought you were suggesting new comment text to add, and
while touching the flag in xe_ttm_tt_create() I apparently suffered from
tunnel vision and did not see the same comment there. I will remove that
one, but will hold off the respin until more of the series gets comments.

> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

Thank you!

Regards,

Tvrtko

> 
>>   	if (args->flags & DRM_XE_GEM_CREATE_FLAG_SCANOUT)
>> -		bo_flags |= XE_BO_FLAG_SCANOUT;
>> +		bo_flags |= XE_BO_FLAG_FORCE_WC;
>>   
>>   	if (args->flags & DRM_XE_GEM_CREATE_FLAG_NO_COMPRESSION) {
>>   		if (XE_IOCTL_DBG(xe, GRAPHICS_VER(xe) < 20))
>> @@ -3209,7 +3215,7 @@ int xe_gem_create_ioctl(struct drm_device *dev, void *data,
>>   
>>   	/* CCS formats need physical placement at a 64K alignment in VRAM. */
>>   	if ((bo_flags & XE_BO_FLAG_VRAM_MASK) &&
>> -	    (bo_flags & XE_BO_FLAG_SCANOUT) &&
>> +	    (bo_flags & XE_BO_FLAG_FORCE_WC) &&
>>   	    !(xe->info.vram_flags & XE_VRAM_FLAGS_NEED64K) &&
>>   	    IS_ALIGNED(args->size, SZ_64K))
>>   		bo_flags |= XE_BO_FLAG_NEEDS_64K;
>> @@ -3229,7 +3235,7 @@ int xe_gem_create_ioctl(struct drm_device *dev, void *data,
>>   			 args->cpu_caching != DRM_XE_GEM_CPU_CACHING_WC))
>>   		return -EINVAL;
>>   
>> -	if (XE_IOCTL_DBG(xe, bo_flags & XE_BO_FLAG_SCANOUT &&
>> +	if (XE_IOCTL_DBG(xe, bo_flags & XE_BO_FLAG_FORCE_WC &&
>>   			 args->cpu_caching == DRM_XE_GEM_CPU_CACHING_WB))
>>   		return -EINVAL;
>>   
>> @@ -3642,7 +3648,7 @@ int xe_bo_dumb_create(struct drm_file *file_priv,
>>   	bo = xe_bo_create_user(xe, NULL, args->size,
>>   			       DRM_XE_GEM_CPU_CACHING_WC,
>>   			       XE_BO_FLAG_VRAM_IF_DGFX(xe_device_get_root_tile(xe)) |
>> -			       XE_BO_FLAG_SCANOUT |
>> +			       XE_BO_FLAG_FORCE_WC |
>>   			       XE_BO_FLAG_NEEDS_CPU_ACCESS, NULL);
>>   	if (IS_ERR(bo))
>>   		return PTR_ERR(bo);
>> diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
>> index c914ab719f20..af9a6669c872 100644
>> --- a/drivers/gpu/drm/xe/xe_bo.h
>> +++ b/drivers/gpu/drm/xe/xe_bo.h
>> @@ -35,7 +35,7 @@
>>   #define XE_BO_FLAG_PINNED		BIT(7)
>>   #define XE_BO_FLAG_NO_RESV_EVICT	BIT(8)
>>   #define XE_BO_FLAG_DEFER_BACKING	BIT(9)
>> -#define XE_BO_FLAG_SCANOUT		BIT(10)
>> +#define XE_BO_FLAG_FORCE_WC		BIT(10)
>>   #define XE_BO_FLAG_FIXED_PLACEMENT	BIT(11)
>>   #define XE_BO_FLAG_PAGETABLE		BIT(12)
>>   #define XE_BO_FLAG_NEEDS_CPU_ACCESS	BIT(13)
>> -- 
>> 2.52.0
>>
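A minimal standalone model of the flag handling in the xe_gem_create_ioctl() hunks above may help show the intended data flow: the DRM_XE_GEM_CREATE_FLAG_SCANOUT UAPI bit is translated once into the internal FORCE_WC bit, and every later check (64K VRAM placement, rejection of WB CPU caching) tests the internal bo_flags value, since args->flags carries DRM_XE_GEM_CREATE_FLAG_* bits rather than XE_BO_FLAG_* bits. This is only a sketch: model_gem_create() and all UAPI_*, BO_FLAG_* and MODEL_* names and values are hypothetical stand-ins, not the real driver definitions.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; not the real UAPI or XE_BO_FLAG_* values. */
#define UAPI_CREATE_FLAG_SCANOUT (1u << 1)
#define UAPI_CPU_CACHING_WB      1
#define UAPI_CPU_CACHING_WC      2

#define BO_FLAG_VRAM             (1u << 2)
#define BO_FLAG_FORCE_WC         (1u << 10)
#define BO_FLAG_NEEDS_64K        (1u << 15)

#define MODEL_SZ_64K             0x10000ull

/*
 * Hypothetical model of the flag handling in xe_gem_create_ioctl().
 * Returns 0 and stores the internal flags in *out, or returns -EINVAL.
 */
static int model_gem_create(uint32_t uapi_flags, uint16_t cpu_caching,
			    uint64_t size, bool vram, bool platform_needs_64k,
			    uint32_t *out)
{
	uint32_t bo_flags = vram ? BO_FLAG_VRAM : 0;

	/* Translate the UAPI bit once: scanout is non-coherent with the CPU cache. */
	if (uapi_flags & UAPI_CREATE_FLAG_SCANOUT)
		bo_flags |= BO_FLAG_FORCE_WC;

	/*
	 * From here on only the internal bo_flags are tested. Platforms that
	 * already need 64K everywhere skip the per-object flag.
	 */
	if ((bo_flags & BO_FLAG_VRAM) && (bo_flags & BO_FLAG_FORCE_WC) &&
	    !platform_needs_64k && (size % MODEL_SZ_64K) == 0)
		bo_flags |= BO_FLAG_NEEDS_64K;

	/* A WC-forced object cannot be created with WB CPU caching. */
	if ((bo_flags & BO_FLAG_FORCE_WC) && cpu_caching == UAPI_CPU_CACHING_WB)
		return -EINVAL;

	*out = bo_flags;
	return 0;
}
```

Keeping the UAPI and internal flag namespaces apart like this means a bit position in one can never be mistaken for a bit in the other.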



* Re: [PATCH v18 1/9] drm/xe: Rename XE_BO_FLAG_SCANOUT to XE_BO_FLAG_FORCE_WC
  2026-03-04 14:23     ` Tvrtko Ursulin
@ 2026-03-04 14:41       ` Tvrtko Ursulin
  0 siblings, 0 replies; 17+ messages in thread
From: Tvrtko Ursulin @ 2026-03-04 14:41 UTC (permalink / raw)
  To: Rodrigo Vivi; +Cc: intel-xe, kernel-dev


On 04/03/2026 14:23, Tvrtko Ursulin wrote:
> 
> On 04/03/2026 13:50, Rodrigo Vivi wrote:
>> On Wed, Mar 04, 2026 at 01:03:06PM +0000, Tvrtko Ursulin wrote:
>>> Rename XE_BO_FLAG_SCANOUT to XE_BO_FLAG_FORCE_WC so that the usage of the
>>> flag can legitimately be expanded to more than just the actual framebuffer
>>> objects.
>>>
>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
>>> Suggested-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
>>> ---
>>>   drivers/gpu/drm/xe/display/intel_fb_bo.c      |  6 +++---
>>>   drivers/gpu/drm/xe/display/intel_fbdev_fb.c   | 12 ++++++++----
>>>   drivers/gpu/drm/xe/display/xe_dsb_buffer.c    |  4 +++-
>>>   drivers/gpu/drm/xe/display/xe_fb_pin.c        |  2 +-
>>>   drivers/gpu/drm/xe/display/xe_initial_plane.c |  2 +-
>>>   drivers/gpu/drm/xe/xe_bo.c                    | 16 +++++++++++-----
>>>   drivers/gpu/drm/xe/xe_bo.h                    |  2 +-
>>>   7 files changed, 28 insertions(+), 16 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/xe/display/intel_fb_bo.c b/drivers/gpu/drm/xe/display/intel_fb_bo.c
>>> index db8b1a27b4de..d2e72dc5abd9 100644
>>> --- a/drivers/gpu/drm/xe/display/intel_fb_bo.c
>>> +++ b/drivers/gpu/drm/xe/display/intel_fb_bo.c
>>> @@ -45,9 +45,9 @@ int intel_fb_bo_framebuffer_init(struct drm_gem_object *obj,
>>>       if (ret)
>>>           goto err;
>>> -    if (!(bo->flags & XE_BO_FLAG_SCANOUT)) {
>>> +    if (!(bo->flags & XE_BO_FLAG_FORCE_WC)) {
>>>           /*
>>> -         * XE_BO_FLAG_SCANOUT should ideally be set at creation, or is
>>> +         * XE_BO_FLAG_FORCE_WC should ideally be set at creation, or is
>>>            * automatically set when creating FB. We cannot change caching
>>>            * mode when the bo is VM_BINDed, so we can only set
>>>            * coherency with display when unbound.
>>> @@ -57,7 +57,7 @@ int intel_fb_bo_framebuffer_init(struct drm_gem_object *obj,
>>>               ret = -EINVAL;
>>>               goto err;
>>>           }
>>> -        bo->flags |= XE_BO_FLAG_SCANOUT;
>>> +        bo->flags |= XE_BO_FLAG_FORCE_WC;
>>>       }
>>>       ttm_bo_unreserve(&bo->ttm);
>>>       return 0;
>>> diff --git a/drivers/gpu/drm/xe/display/intel_fbdev_fb.c b/drivers/gpu/drm/xe/display/intel_fbdev_fb.c
>>> index 87af5646c938..d7030e4d814c 100644
>>> --- a/drivers/gpu/drm/xe/display/intel_fbdev_fb.c
>>> +++ b/drivers/gpu/drm/xe/display/intel_fbdev_fb.c
>>> @@ -56,9 +56,11 @@ struct drm_gem_object *intel_fbdev_fb_bo_create(struct drm_device *drm, int size
>>>       if (intel_fbdev_fb_prefer_stolen(drm, size)) {
>>>           obj = xe_bo_create_pin_map_novm(xe, xe_device_get_root_tile(xe),
>>>                           size,
>>> -                        ttm_bo_type_kernel, XE_BO_FLAG_SCANOUT |
>>> +                        ttm_bo_type_kernel,
>>> +                        XE_BO_FLAG_FORCE_WC |
>>>                           XE_BO_FLAG_STOLEN |
>>> -                        XE_BO_FLAG_GGTT, false);
>>> +                        XE_BO_FLAG_GGTT,
>>> +                        false);
>>>           if (!IS_ERR(obj))
>>>               drm_info(&xe->drm, "Allocated fbdev into stolen\n");
>>>           else
>>> @@ -69,9 +71,11 @@ struct drm_gem_object *intel_fbdev_fb_bo_create(struct drm_device *drm, int size
>>>       if (IS_ERR(obj)) {
>>>           obj = xe_bo_create_pin_map_novm(xe, xe_device_get_root_tile(xe), size,
>>> -                        ttm_bo_type_kernel, XE_BO_FLAG_SCANOUT |
>>> +                        ttm_bo_type_kernel,
>>> +                        XE_BO_FLAG_FORCE_WC |
>>>                           XE_BO_FLAG_VRAM_IF_DGFX(xe_device_get_root_tile(xe)) |
>>> -                        XE_BO_FLAG_GGTT, false);
>>> +                        XE_BO_FLAG_GGTT,
>>> +                        false);
>>>       }
>>>       if (IS_ERR(obj)) {
>>> diff --git a/drivers/gpu/drm/xe/display/xe_dsb_buffer.c b/drivers/gpu/drm/xe/display/xe_dsb_buffer.c
>>> index 1c67a950c6ad..a7158c73a14c 100644
>>> --- a/drivers/gpu/drm/xe/display/xe_dsb_buffer.c
>>> +++ b/drivers/gpu/drm/xe/display/xe_dsb_buffer.c
>>> @@ -54,7 +54,9 @@ static struct intel_dsb_buffer *xe_dsb_buffer_create(struct drm_device *drm, siz
>>>                       PAGE_ALIGN(size),
>>>                       ttm_bo_type_kernel,
>>>                       XE_BO_FLAG_VRAM_IF_DGFX(xe_device_get_root_tile(xe)) |
>>> -                    XE_BO_FLAG_SCANOUT | XE_BO_FLAG_GGTT, false);
>>> +                    XE_BO_FLAG_FORCE_WC |
>>> +                    XE_BO_FLAG_GGTT,
>>> +                    false);
>>>       if (IS_ERR(obj)) {
>>>           ret = PTR_ERR(obj);
>>>           goto err_pin_map;
>>> diff --git a/drivers/gpu/drm/xe/display/xe_fb_pin.c b/drivers/gpu/drm/xe/display/xe_fb_pin.c
>>> index dbbc61032b7f..d4a9eb550cae 100644
>>> --- a/drivers/gpu/drm/xe/display/xe_fb_pin.c
>>> +++ b/drivers/gpu/drm/xe/display/xe_fb_pin.c
>>> @@ -429,7 +429,7 @@ int intel_plane_pin_fb(struct intel_plane_state *new_plane_state,
>>>           return 0;
>>>       /* We reject creating !SCANOUT fb's, so this is weird.. */
>>> -    drm_WARN_ON(bo->ttm.base.dev, !(bo->flags & XE_BO_FLAG_SCANOUT));
>>> +    drm_WARN_ON(bo->ttm.base.dev, !(bo->flags & XE_BO_FLAG_FORCE_WC));
>>>       vma = __xe_pin_fb_vma(intel_fb, &new_plane_state->view.gtt, alignment);
>>> diff --git a/drivers/gpu/drm/xe/display/xe_initial_plane.c b/drivers/gpu/drm/xe/display/xe_initial_plane.c
>>> index 65cc0b0c934b..8bcae552dddc 100644
>>> --- a/drivers/gpu/drm/xe/display/xe_initial_plane.c
>>> +++ b/drivers/gpu/drm/xe/display/xe_initial_plane.c
>>> @@ -48,7 +48,7 @@ initial_plane_bo(struct xe_device *xe,
>>>       if (plane_config->size == 0)
>>>           return NULL;
>>> -    flags = XE_BO_FLAG_SCANOUT | XE_BO_FLAG_GGTT;
>>> +    flags = XE_BO_FLAG_FORCE_WC | XE_BO_FLAG_GGTT;
>>>       base = round_down(plane_config->base, page_size);
>>>       if (IS_DGFX(xe)) {
>>> diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
>>> index 8ff193600443..fe560b0a980a 100644
>>> --- a/drivers/gpu/drm/xe/xe_bo.c
>>> +++ b/drivers/gpu/drm/xe/xe_bo.c
>>> @@ -515,7 +515,7 @@ static struct ttm_tt *xe_ttm_tt_create(struct ttm_buffer_object *ttm_bo,
>>>            * For Xe_LPG and beyond up to NVL-P (excluding), PPGTT PTE
>>>            * lookups are also non-coherent and require a CPU:WC mapping.
>>>            */
>>> -        if ((!bo->cpu_caching && bo->flags & XE_BO_FLAG_SCANOUT) ||
>>> +        if ((!bo->cpu_caching && bo->flags & XE_BO_FLAG_FORCE_WC) ||
>>>                (!xe->info.has_cached_pt && bo->flags & XE_BO_FLAG_PAGETABLE))
>>>               caching = ttm_write_combined;
>>>       }
>>> @@ -3196,8 +3196,14 @@ int xe_gem_create_ioctl(struct drm_device *dev, void *data,
>>>       if (args->flags & DRM_XE_GEM_CREATE_FLAG_DEFER_BACKING)
>>>           bo_flags |= XE_BO_FLAG_DEFER_BACKING;
>>> +    /*
>>> +     * Display scanout is always non-coherent with the CPU cache.
>>> +     *
>>> +     * For Xe_LPG and beyond up to NVL-P (excluding), PPGTT PTE
>>> +     * lookups are also non-coherent and require a CPU:WC mapping.
>>> +     */
>>
>> I believe this comment should now be removed from the other place.
>> But it doesn't hurt if you decide to keep in both places.
> 
> My bad, I thought you were suggesting new comment text to add, and
> while touching the flag in xe_ttm_tt_create() I apparently suffered from
> tunnel vision and did not see the same comment there. I will remove that
> one, but will hold off the respin until more of the series gets comments.

Actually, what do you think of having it like this:

--- a/drivers/gpu/drm/xe/xe_bo.c
+++ b/drivers/gpu/drm/xe/xe_bo.c
@@ -510,12 +510,10 @@ static struct ttm_tt *xe_ttm_tt_create(struct ttm_buffer_object *ttm_bo,
                 WARN_ON((bo->flags & XE_BO_FLAG_USER) && !bo->cpu_caching);

                 /*
-                * Display scanout is always non-coherent with the CPU cache.
-                *
                  * For Xe_LPG and beyond up to NVL-P (excluding), PPGTT PTE
                  * lookups are also non-coherent and require a CPU:WC mapping.
                  */
-               if ((!bo->cpu_caching && bo->flags & XE_BO_FLAG_SCANOUT) ||
+               if ((!bo->cpu_caching && bo->flags & XE_BO_FLAG_FORCE_WC) ||
                      (!xe->info.has_cached_pt && bo->flags & XE_BO_FLAG_PAGETABLE))
                         caching = ttm_write_combined;
         }
@@ -3196,8 +3194,11 @@ int xe_gem_create_ioctl(struct drm_device *dev, void *data,
         if (args->flags & DRM_XE_GEM_CREATE_FLAG_DEFER_BACKING)
                 bo_flags |= XE_BO_FLAG_DEFER_BACKING;

+       /*
+        * Display scanout is always non-coherent with the CPU cache.
+        */
         if (args->flags & DRM_XE_GEM_CREATE_FLAG_SCANOUT)
-               bo_flags |= XE_BO_FLAG_SCANOUT;
+               bo_flags |= XE_BO_FLAG_FORCE_WC;

         if (args->flags & DRM_XE_GEM_CREATE_FLAG_NO_COMPRESSION) {
                 if (XE_IOCTL_DBG(xe, GRAPHICS_VER(xe) < 20))

The new comment in xe_gem_create_ioctl() explains the scanout UAPI
flag, while the remaining part of the comment in xe_ttm_tt_create()
explains the XE_BO_FLAG_PAGETABLE angle.
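The split proposed above can be checked against a simplified model of the integrated-GPU caching decision in xe_ttm_tt_create(). It is a sketch under stated assumptions: that cpu_caching == 0 means "not chosen by userspace" (kernel-internal objects such as the fbdev, DSB and initial-plane buffers in the hunks above), and that an explicit WC request from userspace already selects WC before the FORCE_WC / PAGETABLE check runs. The enum, flag values and model_igpu_caching() are hypothetical names, not driver code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for DRM_XE_GEM_CPU_CACHING_* and XE_BO_FLAG_*. */
#define MODEL_CPU_CACHING_WB 1
#define MODEL_CPU_CACHING_WC 2
#define BO_FLAG_FORCE_WC     (1u << 10)
#define BO_FLAG_PAGETABLE    (1u << 12)

enum model_caching { MODEL_CACHED, MODEL_WRITE_COMBINED };

/*
 * Simplified model of the integrated-GPU branch of xe_ttm_tt_create().
 * cpu_caching == 0 is assumed to mean "not chosen by userspace".
 */
static enum model_caching model_igpu_caching(uint16_t cpu_caching,
					     uint32_t bo_flags,
					     bool has_cached_pt)
{
	/* Start from the caching mode userspace asked for, if any. */
	enum model_caching caching = cpu_caching == MODEL_CPU_CACHING_WC ?
				     MODEL_WRITE_COMBINED : MODEL_CACHED;

	/*
	 * Kernel-internal objects flagged FORCE_WC and, on platforms whose
	 * PTE lookups are non-coherent (!has_cached_pt), page tables both
	 * need a WC mapping.
	 */
	if ((!cpu_caching && (bo_flags & BO_FLAG_FORCE_WC)) ||
	    (!has_cached_pt && (bo_flags & BO_FLAG_PAGETABLE)))
		caching = MODEL_WRITE_COMBINED;

	return caching;
}
```

In this model only the PAGETABLE half of the condition depends on the platform, which is the part the shortened comment in xe_ttm_tt_create() would still describe.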

Regards,

Tvrtko

> 
>> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
> 
> Thank you!
> 
> Regards,
> 
> Tvrtko
> 
>>
>>>       if (args->flags & DRM_XE_GEM_CREATE_FLAG_SCANOUT)
>>> -        bo_flags |= XE_BO_FLAG_SCANOUT;
>>> +        bo_flags |= XE_BO_FLAG_FORCE_WC;
>>>       if (args->flags & DRM_XE_GEM_CREATE_FLAG_NO_COMPRESSION) {
>>>           if (XE_IOCTL_DBG(xe, GRAPHICS_VER(xe) < 20))
>>> @@ -3209,7 +3215,7 @@ int xe_gem_create_ioctl(struct drm_device *dev, void *data,
>>>       /* CCS formats need physical placement at a 64K alignment in VRAM. */
>>>       if ((bo_flags & XE_BO_FLAG_VRAM_MASK) &&
>>> -        (bo_flags & XE_BO_FLAG_SCANOUT) &&
>>> +        (bo_flags & XE_BO_FLAG_FORCE_WC) &&
>>>           !(xe->info.vram_flags & XE_VRAM_FLAGS_NEED64K) &&
>>>           IS_ALIGNED(args->size, SZ_64K))
>>>           bo_flags |= XE_BO_FLAG_NEEDS_64K;
>>> @@ -3229,7 +3235,7 @@ int xe_gem_create_ioctl(struct drm_device *dev, void *data,
>>>                args->cpu_caching != DRM_XE_GEM_CPU_CACHING_WC))
>>>           return -EINVAL;
>>> -    if (XE_IOCTL_DBG(xe, bo_flags & XE_BO_FLAG_SCANOUT &&
>>> +    if (XE_IOCTL_DBG(xe, bo_flags & XE_BO_FLAG_FORCE_WC &&
>>>                args->cpu_caching == DRM_XE_GEM_CPU_CACHING_WB))
>>>           return -EINVAL;
>>> @@ -3642,7 +3648,7 @@ int xe_bo_dumb_create(struct drm_file *file_priv,
>>>       bo = xe_bo_create_user(xe, NULL, args->size,
>>>                      DRM_XE_GEM_CPU_CACHING_WC,
>>>                      XE_BO_FLAG_VRAM_IF_DGFX(xe_device_get_root_tile(xe)) |
>>> -                   XE_BO_FLAG_SCANOUT |
>>> +                   XE_BO_FLAG_FORCE_WC |
>>>                      XE_BO_FLAG_NEEDS_CPU_ACCESS, NULL);
>>>       if (IS_ERR(bo))
>>>           return PTR_ERR(bo);
>>> diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
>>> index c914ab719f20..af9a6669c872 100644
>>> --- a/drivers/gpu/drm/xe/xe_bo.h
>>> +++ b/drivers/gpu/drm/xe/xe_bo.h
>>> @@ -35,7 +35,7 @@
>>>   #define XE_BO_FLAG_PINNED        BIT(7)
>>>   #define XE_BO_FLAG_NO_RESV_EVICT    BIT(8)
>>>   #define XE_BO_FLAG_DEFER_BACKING    BIT(9)
>>> -#define XE_BO_FLAG_SCANOUT        BIT(10)
>>> +#define XE_BO_FLAG_FORCE_WC        BIT(10)
>>>   #define XE_BO_FLAG_FIXED_PLACEMENT    BIT(11)
>>>   #define XE_BO_FLAG_PAGETABLE        BIT(12)
>>>   #define XE_BO_FLAG_NEEDS_CPU_ACCESS    BIT(13)
>>> -- 
>>> 2.52.0
>>>
> 



* Re: [PATCH v18 6/9] drm/xe: Move aux table invalidation to ring ops
  2026-03-04 13:03 ` [PATCH v18 6/9] drm/xe: Move aux table invalidation to ring ops Tvrtko Ursulin
@ 2026-03-04 16:20   ` Matthew Brost
  0 siblings, 0 replies; 17+ messages in thread
From: Matthew Brost @ 2026-03-04 16:20 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-xe, kernel-dev, Rodrigo Vivi

On Wed, Mar 04, 2026 at 01:03:11PM +0000, Tvrtko Ursulin wrote:
> Implement the suggestion of moving the aux invalidation from a helper to a
> ring ops vfunc, together with the suggestion to split the vfunc table of
> video decode and video enhance engines.
> 
> With this done the LRC code will be able to access the functionality via
> the newly added ring ops vfunc.
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
> Suggested-by: Matthew Brost <matthew.brost@intel.com>

Overall looks much better. One comment below.

> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
> ---
>  drivers/gpu/drm/xe/xe_ring_ops.c       | 105 ++++++++++++++++++-------
>  drivers/gpu/drm/xe/xe_ring_ops.h       |   3 +
>  drivers/gpu/drm/xe/xe_ring_ops_types.h |   3 +
>  3 files changed, 83 insertions(+), 28 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_ring_ops.c b/drivers/gpu/drm/xe/xe_ring_ops.c
> index 596379e6d742..8947b3091873 100644
> --- a/drivers/gpu/drm/xe/xe_ring_ops.c
> +++ b/drivers/gpu/drm/xe/xe_ring_ops.c
> @@ -48,22 +48,48 @@ static u32 preparser_disable(bool state)
>  	return MI_ARB_CHECK | BIT(8) | state;
>  }
>  
> -static int emit_aux_table_inv(struct xe_gt *gt, struct xe_reg reg,
> -			      u32 *dw, int i)
> +static u32 *
> +__emit_aux_table_inv(u32 *cmd, const struct xe_reg reg, u32 adj_offset)
>  {
> -	dw[i++] = MI_LOAD_REGISTER_IMM | MI_LRI_NUM_REGS(1) | MI_LRI_MMIO_REMAP_EN;
> -	dw[i++] = reg.addr + gt->mmio.adj_offset;
> -	dw[i++] = AUX_INV;
> -	dw[i++] = MI_SEMAPHORE_WAIT_TOKEN |
> -		  MI_SEMAPHORE_REGISTER_POLL |
> -		  MI_SEMAPHORE_POLL |
> -		  MI_SEMAPHORE_SAD_EQ_SDD;
> -	dw[i++] = 0;
> -	dw[i++] = reg.addr + gt->mmio.adj_offset;
> -	dw[i++] = 0;
> -	dw[i++] = 0;
> +	*cmd++ = MI_LOAD_REGISTER_IMM | MI_LRI_NUM_REGS(1) |
> +		 MI_LRI_MMIO_REMAP_EN;
> +	*cmd++ = reg.addr + adj_offset;
> +	*cmd++ = AUX_INV;
> +	*cmd++ = MI_SEMAPHORE_WAIT_TOKEN | MI_SEMAPHORE_REGISTER_POLL |
> +		 MI_SEMAPHORE_POLL | MI_SEMAPHORE_SAD_EQ_SDD;
> +	*cmd++ = 0;
> +	*cmd++ = reg.addr + adj_offset;
> +	*cmd++ = 0;
> +	*cmd++ = 0;
>  
> -	return i;
> +	return cmd;
> +}
> +
> +static u32 *emit_aux_table_inv_render_compute(struct xe_gt *gt, u32 *cmd)
> +{
> +	return __emit_aux_table_inv(cmd, CCS_AUX_INV, gt->mmio.adj_offset);
> +}
> +
> +static u32 *emit_aux_table_inv_video_decode(struct xe_gt *gt, u32 *cmd)
> +{
> +	return __emit_aux_table_inv(cmd, VD0_AUX_INV, gt->mmio.adj_offset);
> +}
> +
> +static u32 *emit_aux_table_inv_video_enhance(struct xe_gt *gt, u32 *cmd)
> +{
> +	return __emit_aux_table_inv(cmd, VE0_AUX_INV, gt->mmio.adj_offset);
> +}
> +
> +static int emit_aux_table_inv(struct xe_hw_engine *hwe, u32 *dw, int i)
> +{
> +	struct xe_gt *gt = hwe->gt;
> +	u32 *(*emit)(struct xe_gt *gt, u32 *cmd) =
> +		gt->ring_ops[hwe->class]->emit_aux_table_inv;
> +
> +	if (emit)
> +		return emit(gt, dw + i) - dw;
> +	else
> +		return i;
>  }
>  
>  static int emit_user_interrupt(u32 *dw, int i)
> @@ -327,7 +353,6 @@ static void __emit_job_gen12_video(struct xe_sched_job *job, struct xe_lrc *lrc,
>  	u32 ppgtt_flag = get_ppgtt_flag(job);
>  	struct xe_gt *gt = job->q->gt;
>  	struct xe_device *xe = gt_to_xe(gt);
> -	bool decode = job->q->class == XE_ENGINE_CLASS_VIDEO_DECODE;
>  
>  	*head = lrc->ring.tail;
>  
> @@ -336,12 +361,7 @@ static void __emit_job_gen12_video(struct xe_sched_job *job, struct xe_lrc *lrc,
>  	dw[i++] = preparser_disable(true);
>  
>  	/* hsdes: 1809175790 */
> -	if (has_aux_ccs(xe)) {
> -		if (decode)
> -			i = emit_aux_table_inv(gt, VD0_AUX_INV, dw, i);
> -		else
> -			i = emit_aux_table_inv(gt, VE0_AUX_INV, dw, i);
> -	}
> +	i = emit_aux_table_inv(job->q->hwe, dw, i);
>  
>  	if (job->ring_ops_flush_tlb)
>  		i = emit_flush_imm_ggtt(xe_lrc_start_seqno_ggtt_addr(lrc),
> @@ -384,7 +404,6 @@ static void __emit_job_gen12_render_compute(struct xe_sched_job *job,
>  	struct xe_gt *gt = job->q->gt;
>  	struct xe_device *xe = gt_to_xe(gt);
>  	bool lacks_render = !(gt->info.engine_mask & XE_HW_ENGINE_RCS_MASK);
> -	const bool aux_ccs = has_aux_ccs(xe);
>  	u32 mask_flags = 0;
>  
>  	*head = lrc->ring.tail;
> @@ -395,7 +414,7 @@ static void __emit_job_gen12_render_compute(struct xe_sched_job *job,
>  	 * On AuxCCS platforms the invalidation of the Aux table requires
>  	 * quiescing the memory traffic beforehand.
>  	 */
> -	if (aux_ccs)
> +	if (has_aux_ccs(xe))
>  		i = emit_render_cache_flush(job, dw, i);
>  
>  	dw[i++] = preparser_disable(true);
> @@ -408,8 +427,7 @@ static void __emit_job_gen12_render_compute(struct xe_sched_job *job,
>  	i = emit_pipe_invalidate(job->q, mask_flags, job->ring_ops_flush_tlb, dw, i);
>  
>  	/* hsdes: 1809175790 */
> -	if (aux_ccs)
> -		i = emit_aux_table_inv(gt, CCS_AUX_INV, dw, i);
> +	i = emit_aux_table_inv(job->q->hwe, dw, i);
>  
>  	dw[i++] = preparser_disable(false);
>  
> @@ -534,7 +552,11 @@ static const struct xe_ring_ops ring_ops_gen12_copy = {
>  	.emit_job = emit_job_gen12_copy,
>  };
>  
> -static const struct xe_ring_ops ring_ops_gen12_video = {
> +static const struct xe_ring_ops ring_ops_gen12_video_decode = {
> +	.emit_job = emit_job_gen12_video,
> +};
> +
> +static const struct xe_ring_ops ring_ops_gen12_video_enhance = {
>  	.emit_job = emit_job_gen12_video,
>  };
>  
> @@ -542,20 +564,47 @@ static const struct xe_ring_ops ring_ops_gen12_render_compute = {
>  	.emit_job = emit_job_gen12_render_compute,
>  };
>  
> +static const struct xe_ring_ops auxccs_ring_ops_gen12_video_decode = {
> +	.emit_job = emit_job_gen12_video,
> +	.emit_aux_table_inv = emit_aux_table_inv_video_decode,
> +};
> +
> +static const struct xe_ring_ops auxccs_ring_ops_gen12_video_enhance = {
> +	.emit_job = emit_job_gen12_video,
> +	.emit_aux_table_inv = emit_aux_table_inv_video_enhance,
> +};
> +
> +static const struct xe_ring_ops auxccs_ring_ops_gen12_render_compute = {
> +	.emit_job = emit_job_gen12_render_compute,
> +	.emit_aux_table_inv = emit_aux_table_inv_render_compute,
> +};
> +
>  const struct xe_ring_ops *
>  xe_ring_ops_get(struct xe_gt *gt, enum xe_engine_class class)
>  {
> +	struct xe_device *xe = gt_to_xe(gt);
> +
>  	switch (class) {
>  	case XE_ENGINE_CLASS_OTHER:
>  		return &ring_ops_gen12_gsc;
>  	case XE_ENGINE_CLASS_COPY:
>  		return &ring_ops_gen12_copy;
>  	case XE_ENGINE_CLASS_VIDEO_DECODE:
> +		if (has_aux_ccs(xe))
> +			return &auxccs_ring_ops_gen12_video_decode;
> +		else
> +			return &ring_ops_gen12_video_decode;
>  	case XE_ENGINE_CLASS_VIDEO_ENHANCE:
> -		return &ring_ops_gen12_video;
> +		if (has_aux_ccs(xe))
> +			return &auxccs_ring_ops_gen12_video_enhance;
> +		else
> +			return &ring_ops_gen12_video_enhance;
>  	case XE_ENGINE_CLASS_RENDER:
>  	case XE_ENGINE_CLASS_COMPUTE:
> -		return &ring_ops_gen12_render_compute;
> +		if (has_aux_ccs(xe))
> +			return &auxccs_ring_ops_gen12_render_compute;
> +		else
> +			return &ring_ops_gen12_render_compute;
>  	default:
>  		return NULL;
>  	}
> diff --git a/drivers/gpu/drm/xe/xe_ring_ops.h b/drivers/gpu/drm/xe/xe_ring_ops.h
> index e942735d76a6..5a2d32f9bb25 100644
> --- a/drivers/gpu/drm/xe/xe_ring_ops.h
> +++ b/drivers/gpu/drm/xe/xe_ring_ops.h
> @@ -10,8 +10,11 @@
>  #include "xe_ring_ops_types.h"
>  
>  struct xe_gt;
> +struct xe_hw_engine;
>  
>  const struct xe_ring_ops *
>  xe_ring_ops_get(struct xe_gt *gt, enum xe_engine_class class);
>  
> +u32 *xe_emit_aux_table_inv(struct xe_hw_engine *hwe, u32 *cmd);
> +

I think the changes in xe_ring_ops.h are leftover from the previous
revs and can be deleted.

Matt

>  #endif
> diff --git a/drivers/gpu/drm/xe/xe_ring_ops_types.h b/drivers/gpu/drm/xe/xe_ring_ops_types.h
> index 1197fc0bf2af..e25630fac17e 100644
> --- a/drivers/gpu/drm/xe/xe_ring_ops_types.h
> +++ b/drivers/gpu/drm/xe/xe_ring_ops_types.h
> @@ -17,6 +17,9 @@ struct xe_sched_job;
>  struct xe_ring_ops {
>  	/** @emit_job: Write job to ring */
>  	void (*emit_job)(struct xe_sched_job *job);
> +
> +	/** @emit_aux_table_inv: Emit aux table invalidation to the ring */
> +	u32 *(*emit_aux_table_inv)(struct xe_gt *gt, u32 *cmd);
>  };
>  
>  #endif
> -- 
> 2.52.0
> 
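The optional-vfunc pattern introduced by this patch can be sketched in isolation: the per-class ops table carries a NULL emit_aux_table_inv on platforms without AuxCCS, so the index-based wrapper simply returns i unchanged, while the AuxCCS tables point at a pointer-style emitter whose returned end pointer is converted back to a dword index. Register offsets and opcode values below are placeholders, and all model_* names are hypothetical; only the eight-dword layout (one-register LRI of AUX_INV, then a register-poll semaphore wait on the same register) mirrors __emit_aux_table_inv().

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t u32;

/* Placeholder encodings and register offsets, not the real hardware values. */
#define MODEL_OP_LRI_1REG   0xAAAA0001u
#define MODEL_OP_SEM_POLL   0xBBBB0000u
#define MODEL_AUX_INV       0x1u
#define MODEL_VD0_AUX_INV   0x1000u

struct model_gt {
	u32 adj_offset;   /* per-GT MMIO adjustment, like gt->mmio.adj_offset */
};

struct model_ring_ops {
	/* NULL when the platform has no AuxCCS: nothing to emit. */
	u32 *(*emit_aux_table_inv)(const struct model_gt *gt, u32 *cmd);
};

/* Pointer-style emitter: writes eight dwords and returns the new end. */
static u32 *model_emit_inv(u32 *cmd, u32 reg, u32 adj_offset)
{
	*cmd++ = MODEL_OP_LRI_1REG;     /* load one register immediate */
	*cmd++ = reg + adj_offset;      /* ...into the AUX_INV register */
	*cmd++ = MODEL_AUX_INV;         /* ...with the invalidate bit set */
	*cmd++ = MODEL_OP_SEM_POLL;     /* then poll that register */
	*cmd++ = 0;                     /* until it reads back as 0 */
	*cmd++ = reg + adj_offset;      /* polled register address */
	*cmd++ = 0;                     /* remaining operands are zero */
	*cmd++ = 0;
	return cmd;
}

static u32 *model_emit_vd0(const struct model_gt *gt, u32 *cmd)
{
	return model_emit_inv(cmd, MODEL_VD0_AUX_INV, gt->adj_offset);
}

/* Index-style wrapper used by the job emitters. */
static int model_emit_aux_table_inv(const struct model_ring_ops *ops,
				    const struct model_gt *gt, u32 *dw, int i)
{
	if (!ops->emit_aux_table_inv)
		return i;

	return (int)(ops->emit_aux_table_inv(gt, dw + i) - dw);
}

static const struct model_ring_ops model_ops_video_decode = {
	.emit_aux_table_inv = NULL,
};

static const struct model_ring_ops model_auxccs_ops_video_decode = {
	.emit_aux_table_inv = model_emit_vd0,
};
```

With this layout the has_aux_ccs() check moves out of the job emission paths and into table selection, which is what xe_ring_ops_get() does in the patch.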



Thread overview: 17+ messages
2026-03-04 13:03 [PATCH v18 0/9] AuxCCS handling and render compression modifiers Tvrtko Ursulin
2026-03-04 13:03 ` [PATCH v18 1/9] drm/xe: Rename XE_BO_FLAG_SCANOUT to XE_BO_FLAG_FORCE_WC Tvrtko Ursulin
2026-03-04 13:50   ` Rodrigo Vivi
2026-03-04 14:23     ` Tvrtko Ursulin
2026-03-04 14:41       ` Tvrtko Ursulin
2026-03-04 13:03 ` [PATCH v18 2/9] drm/xe: Use write-combine mapping when populating DPT Tvrtko Ursulin
2026-03-04 13:51   ` Rodrigo Vivi
2026-03-04 13:03 ` [PATCH v18 3/9] drm/xe/xelpg: Limit AuxCCS ring buffer programming to Alderlake Tvrtko Ursulin
2026-03-04 13:03 ` [PATCH v18 4/9] drm/xe/xelp: Quiesce memory traffic before invalidating AuxCCS Tvrtko Ursulin
2026-03-04 13:03 ` [PATCH v18 5/9] drm/xe/xelp: Wait for AuxCCS invalidation to complete Tvrtko Ursulin
2026-03-04 13:03 ` [PATCH v18 6/9] drm/xe: Move aux table invalidation to ring ops Tvrtko Ursulin
2026-03-04 16:20   ` Matthew Brost
2026-03-04 13:03 ` [PATCH v18 7/9] drm/xe/xelp: Add AuxCCS invalidation to the indirect context workarounds Tvrtko Ursulin
2026-03-04 13:51   ` Rodrigo Vivi
2026-03-04 13:03 ` [PATCH v18 8/9] drm/xe/display: Add support for AuxCCS Tvrtko Ursulin
2026-03-04 13:03 ` [PATCH v18 9/9] drm/xe/xelp: Expose AuxCCS frame buffer modifiers on Alderlake-P Tvrtko Ursulin
2026-03-04 13:55   ` Rodrigo Vivi
