* [PATCH v21 0/9] AuxCCS handling and render compression modifiers
@ 2026-03-10 12:34 Tvrtko Ursulin
2026-03-10 12:34 ` [PATCH v21 1/9] drm/xe: Rename XE_BO_FLAG_SCANOUT to XE_BO_FLAG_FORCE_WC Tvrtko Ursulin
` (10 more replies)
0 siblings, 11 replies; 18+ messages in thread
From: Tvrtko Ursulin @ 2026-03-10 12:34 UTC (permalink / raw)
To: intel-xe; +Cc: kernel-dev, Tvrtko Ursulin, Rodrigo Vivi
A series to add support for compressed surface scanout under xe with
Alderlake-P.
Currently the auxiliary buffer data isn't mapped into the page tables at all so
cf48bddd31de ("drm/i915/display: Disable AuxCCS framebuffers if built for Xe")
had to disable the support.
On top of that there are missing flushes, invalidations and similar.
Tested with KDE Wayland, on Lenovo Carbon X1 ADL-P:
[PLANE:32:plane 1A]: type=PRI
uapi: [FB:242] AR30 little-endian (0x30335241),0x100000000000008,2880x1800, visible=visible, src=2880.000000x1800.000000+0.000000+0.000000, dst=2880x1800+0+0, rotation=0 (0x00000001)
hw: [FB:242] AR30 little-endian (0x30335241),0x100000000000008,2880x1800, visible=yes, src=2880.000000x1800.000000+0.000000+0.000000, dst=2880x1800+0+0, rotation=0 (0x00000001)
Display working fine - no artefacts, no DMAR/PIPE faults.
All IGTs pass for me locally.
v2:
* More patches added to fix kms_flip_tiling.
v3:
* Rebased after some cleanup patches from v2 were merged.
* Added people to Cc as suggested by Rodrigo.
* Adjusted last patch title. (Rodrigo)
* Apply GGTT flushing only to iomapped system memory buffers.
v4:
* Added patch for potentially misplaced Wa_14016712196.
* Fixed (hopefully) MAX_JOB_SIZE_DW on Meteorlake.
v5:
* Split out ring emission changes to smaller patches.
* Fixed MAX_JOB_SIZE_DW even more.
* Don't emit MI_FLUSH_DW_CCS on !BCS. This should fix Meteorlake.
v6:
* Added AuxCCS invalidation to indirect context workarounds.
* Also added the indirect context handling and some other workarounds. They are
unrelated but the series depends on them.
* Dropped DPT pin alignment reduction since BMG appears not to like it for
some reason.
v7:
* Rebased on top of recent xe_fb_pin.c refactoring and also the indirect
context workarounds series.
v8:
* Rebased for bo->size removal.
* Corrected PIPE_CONTROL_FLUSH_L3 to bit 30. (Jose)
v9:
* Fixed fb remapping changes.
* Dropped two not required patches from the series.
* Fixed criteria for GGTT flushing.
* Limit clflush to the compression metadata area.
* Rebased for indirect context workarounds landing upstream.
v10:
* Rebase for XE_GT_WA().
v11:
* Do not use stolen for DPT on IGFX + AuxCCS.
v12:
* Rebased for some ringbuf and LRC code changes.
v13:
* Rebased for various upstream changes.
* Dropped clflush and stolen avoidance patches after merging IGT MOCS 61 usage.
v14:
* MMIO 0x4248 and MI_FLUSH_DW_CCS are MTL+. (Matt)
* Consolidate engine feature checks. (Ville)
* Brought back the patch to put DPT tables in system memory for 100% CI pass
rate. It looks like MOCS 61 is not enough to avoid sporadic pipecrc
mismatches.
v15:
* Limited to enabling on Alderlake-P only. (Dropped all Meteorlake patches.)
* Dropped unrelated GGTT alignment fix. (Sent standalone.)
* Use display parent interface for probing AuxCCS driver support.
v16:
* Use write-combine for DPT in stolen memory. (Ville)
* Dropped clflush patches under the assumption that pre-production ADL machines
were the reason for sporadic pipecrc failures.
v17:
* Mechanical rebase for upstream conflicts.
v18:
* Added a patch to rename XE_BO_FLAG_SCANOUT to XE_BO_FLAG_FORCE_WC. (Rodrigo)
* Instead of exporting a helper function for emitting the aux invalidation
into the ring, add it to the ring ops vfunc table. (Matthew)
v19:
* Tweaked comments and removed some stray hunks from v17.
v20:
* Include <linux/types.h> for u32.
v21:
* Forward declare struct xe_gt to fix standalone headers test.
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Tvrtko Ursulin (9):
drm/xe: Rename XE_BO_FLAG_SCANOUT to XE_BO_FLAG_FORCE_WC
drm/xe: Use write-combine mapping when populating DPT
drm/xe/xelpg: Limit AuxCCS ring buffer programming to Alderlake
drm/xe/xelp: Quiesce memory traffic before invalidating AuxCCS
drm/xe/xelp: Wait for AuxCCS invalidation to complete
drm/xe: Move aux table invalidation to ring ops
drm/xe/xelp: Add AuxCCS invalidation to the indirect context
workarounds
drm/xe/display: Add support for AuxCCS
drm/xe/xelp: Expose AuxCCS frame buffer modifiers on Alderlake-P
drivers/gpu/drm/xe/display/intel_fb_bo.c | 6 +-
drivers/gpu/drm/xe/display/intel_fbdev_fb.c | 12 +-
drivers/gpu/drm/xe/display/xe_display.c | 8 ++
drivers/gpu/drm/xe/display/xe_dsb_buffer.c | 4 +-
drivers/gpu/drm/xe/display/xe_fb_pin.c | 116 +++++++++++++-----
drivers/gpu/drm/xe/display/xe_initial_plane.c | 2 +-
.../gpu/drm/xe/instructions/xe_mi_commands.h | 6 +
drivers/gpu/drm/xe/xe_bo.c | 15 +--
drivers/gpu/drm/xe/xe_bo.h | 2 +-
drivers/gpu/drm/xe/xe_lrc.c | 23 ++++
drivers/gpu/drm/xe/xe_ring_ops.c | 106 ++++++++++++----
drivers/gpu/drm/xe/xe_ring_ops_types.h | 8 +-
12 files changed, 237 insertions(+), 71 deletions(-)
--
2.52.0
* [PATCH v21 1/9] drm/xe: Rename XE_BO_FLAG_SCANOUT to XE_BO_FLAG_FORCE_WC
2026-03-10 12:34 [PATCH v21 0/9] AuxCCS handling and render compression modifiers Tvrtko Ursulin
@ 2026-03-10 12:34 ` Tvrtko Ursulin
2026-03-10 12:34 ` [PATCH v21 2/9] drm/xe: Use write-combine mapping when populating DPT Tvrtko Ursulin
` (9 subsequent siblings)
10 siblings, 0 replies; 18+ messages in thread
From: Tvrtko Ursulin @ 2026-03-10 12:34 UTC (permalink / raw)
To: intel-xe; +Cc: kernel-dev, Tvrtko Ursulin, Rodrigo Vivi
Rename XE_BO_FLAG_SCANOUT to XE_BO_FLAG_FORCE_WC so that the usage of the
flag can legitimately be expanded to more than just the actual framebuffer
objects.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Suggested-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
drivers/gpu/drm/xe/display/intel_fb_bo.c | 6 +++---
drivers/gpu/drm/xe/display/intel_fbdev_fb.c | 12 ++++++++----
drivers/gpu/drm/xe/display/xe_dsb_buffer.c | 4 +++-
drivers/gpu/drm/xe/display/xe_fb_pin.c | 2 +-
drivers/gpu/drm/xe/display/xe_initial_plane.c | 2 +-
drivers/gpu/drm/xe/xe_bo.c | 15 ++++++++-------
drivers/gpu/drm/xe/xe_bo.h | 2 +-
7 files changed, 25 insertions(+), 18 deletions(-)
diff --git a/drivers/gpu/drm/xe/display/intel_fb_bo.c b/drivers/gpu/drm/xe/display/intel_fb_bo.c
index db8b1a27b4de..d2e72dc5abd9 100644
--- a/drivers/gpu/drm/xe/display/intel_fb_bo.c
+++ b/drivers/gpu/drm/xe/display/intel_fb_bo.c
@@ -45,9 +45,9 @@ int intel_fb_bo_framebuffer_init(struct drm_gem_object *obj,
if (ret)
goto err;
- if (!(bo->flags & XE_BO_FLAG_SCANOUT)) {
+ if (!(bo->flags & XE_BO_FLAG_FORCE_WC)) {
/*
- * XE_BO_FLAG_SCANOUT should ideally be set at creation, or is
+ * XE_BO_FLAG_FORCE_WC should ideally be set at creation, or is
* automatically set when creating FB. We cannot change caching
* mode when the bo is VM_BINDed, so we can only set
* coherency with display when unbound.
@@ -57,7 +57,7 @@ int intel_fb_bo_framebuffer_init(struct drm_gem_object *obj,
ret = -EINVAL;
goto err;
}
- bo->flags |= XE_BO_FLAG_SCANOUT;
+ bo->flags |= XE_BO_FLAG_FORCE_WC;
}
ttm_bo_unreserve(&bo->ttm);
return 0;
diff --git a/drivers/gpu/drm/xe/display/intel_fbdev_fb.c b/drivers/gpu/drm/xe/display/intel_fbdev_fb.c
index 87af5646c938..d7030e4d814c 100644
--- a/drivers/gpu/drm/xe/display/intel_fbdev_fb.c
+++ b/drivers/gpu/drm/xe/display/intel_fbdev_fb.c
@@ -56,9 +56,11 @@ struct drm_gem_object *intel_fbdev_fb_bo_create(struct drm_device *drm, int size
if (intel_fbdev_fb_prefer_stolen(drm, size)) {
obj = xe_bo_create_pin_map_novm(xe, xe_device_get_root_tile(xe),
size,
- ttm_bo_type_kernel, XE_BO_FLAG_SCANOUT |
+ ttm_bo_type_kernel,
+ XE_BO_FLAG_FORCE_WC |
XE_BO_FLAG_STOLEN |
- XE_BO_FLAG_GGTT, false);
+ XE_BO_FLAG_GGTT,
+ false);
if (!IS_ERR(obj))
drm_info(&xe->drm, "Allocated fbdev into stolen\n");
else
@@ -69,9 +71,11 @@ struct drm_gem_object *intel_fbdev_fb_bo_create(struct drm_device *drm, int size
if (IS_ERR(obj)) {
obj = xe_bo_create_pin_map_novm(xe, xe_device_get_root_tile(xe), size,
- ttm_bo_type_kernel, XE_BO_FLAG_SCANOUT |
+ ttm_bo_type_kernel,
+ XE_BO_FLAG_FORCE_WC |
XE_BO_FLAG_VRAM_IF_DGFX(xe_device_get_root_tile(xe)) |
- XE_BO_FLAG_GGTT, false);
+ XE_BO_FLAG_GGTT,
+ false);
}
if (IS_ERR(obj)) {
diff --git a/drivers/gpu/drm/xe/display/xe_dsb_buffer.c b/drivers/gpu/drm/xe/display/xe_dsb_buffer.c
index 1c67a950c6ad..a7158c73a14c 100644
--- a/drivers/gpu/drm/xe/display/xe_dsb_buffer.c
+++ b/drivers/gpu/drm/xe/display/xe_dsb_buffer.c
@@ -54,7 +54,9 @@ static struct intel_dsb_buffer *xe_dsb_buffer_create(struct drm_device *drm, siz
PAGE_ALIGN(size),
ttm_bo_type_kernel,
XE_BO_FLAG_VRAM_IF_DGFX(xe_device_get_root_tile(xe)) |
- XE_BO_FLAG_SCANOUT | XE_BO_FLAG_GGTT, false);
+ XE_BO_FLAG_FORCE_WC |
+ XE_BO_FLAG_GGTT,
+ false);
if (IS_ERR(obj)) {
ret = PTR_ERR(obj);
goto err_pin_map;
diff --git a/drivers/gpu/drm/xe/display/xe_fb_pin.c b/drivers/gpu/drm/xe/display/xe_fb_pin.c
index dbbc61032b7f..d4a9eb550cae 100644
--- a/drivers/gpu/drm/xe/display/xe_fb_pin.c
+++ b/drivers/gpu/drm/xe/display/xe_fb_pin.c
@@ -429,7 +429,7 @@ int intel_plane_pin_fb(struct intel_plane_state *new_plane_state,
return 0;
/* We reject creating !SCANOUT fb's, so this is weird.. */
- drm_WARN_ON(bo->ttm.base.dev, !(bo->flags & XE_BO_FLAG_SCANOUT));
+ drm_WARN_ON(bo->ttm.base.dev, !(bo->flags & XE_BO_FLAG_FORCE_WC));
vma = __xe_pin_fb_vma(intel_fb, &new_plane_state->view.gtt, alignment);
diff --git a/drivers/gpu/drm/xe/display/xe_initial_plane.c b/drivers/gpu/drm/xe/display/xe_initial_plane.c
index 65cc0b0c934b..8bcae552dddc 100644
--- a/drivers/gpu/drm/xe/display/xe_initial_plane.c
+++ b/drivers/gpu/drm/xe/display/xe_initial_plane.c
@@ -48,7 +48,7 @@ initial_plane_bo(struct xe_device *xe,
if (plane_config->size == 0)
return NULL;
- flags = XE_BO_FLAG_SCANOUT | XE_BO_FLAG_GGTT;
+ flags = XE_BO_FLAG_FORCE_WC | XE_BO_FLAG_GGTT;
base = round_down(plane_config->base, page_size);
if (IS_DGFX(xe)) {
diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
index 8ff193600443..c1fe6ae5447b 100644
--- a/drivers/gpu/drm/xe/xe_bo.c
+++ b/drivers/gpu/drm/xe/xe_bo.c
@@ -510,12 +510,10 @@ static struct ttm_tt *xe_ttm_tt_create(struct ttm_buffer_object *ttm_bo,
WARN_ON((bo->flags & XE_BO_FLAG_USER) && !bo->cpu_caching);
/*
- * Display scanout is always non-coherent with the CPU cache.
- *
* For Xe_LPG and beyond up to NVL-P (excluding), PPGTT PTE
* lookups are also non-coherent and require a CPU:WC mapping.
*/
- if ((!bo->cpu_caching && bo->flags & XE_BO_FLAG_SCANOUT) ||
+ if ((!bo->cpu_caching && bo->flags & XE_BO_FLAG_FORCE_WC) ||
(!xe->info.has_cached_pt && bo->flags & XE_BO_FLAG_PAGETABLE))
caching = ttm_write_combined;
}
@@ -3196,8 +3194,11 @@ int xe_gem_create_ioctl(struct drm_device *dev, void *data,
if (args->flags & DRM_XE_GEM_CREATE_FLAG_DEFER_BACKING)
bo_flags |= XE_BO_FLAG_DEFER_BACKING;
+ /*
+ * Display scanout is always non-coherent with the CPU cache.
+ */
if (args->flags & DRM_XE_GEM_CREATE_FLAG_SCANOUT)
- bo_flags |= XE_BO_FLAG_SCANOUT;
+ bo_flags |= XE_BO_FLAG_FORCE_WC;
if (args->flags & DRM_XE_GEM_CREATE_FLAG_NO_COMPRESSION) {
if (XE_IOCTL_DBG(xe, GRAPHICS_VER(xe) < 20))
@@ -3209,7 +3210,7 @@ int xe_gem_create_ioctl(struct drm_device *dev, void *data,
/* CCS formats need physical placement at a 64K alignment in VRAM. */
if ((bo_flags & XE_BO_FLAG_VRAM_MASK) &&
- (bo_flags & XE_BO_FLAG_SCANOUT) &&
+ (bo_flags & XE_BO_FLAG_FORCE_WC) &&
!(xe->info.vram_flags & XE_VRAM_FLAGS_NEED64K) &&
IS_ALIGNED(args->size, SZ_64K))
bo_flags |= XE_BO_FLAG_NEEDS_64K;
@@ -3229,7 +3230,7 @@ int xe_gem_create_ioctl(struct drm_device *dev, void *data,
args->cpu_caching != DRM_XE_GEM_CPU_CACHING_WC))
return -EINVAL;
- if (XE_IOCTL_DBG(xe, bo_flags & XE_BO_FLAG_SCANOUT &&
+ if (XE_IOCTL_DBG(xe, bo_flags & XE_BO_FLAG_FORCE_WC &&
args->cpu_caching == DRM_XE_GEM_CPU_CACHING_WB))
return -EINVAL;
@@ -3642,7 +3643,7 @@ int xe_bo_dumb_create(struct drm_file *file_priv,
bo = xe_bo_create_user(xe, NULL, args->size,
DRM_XE_GEM_CPU_CACHING_WC,
XE_BO_FLAG_VRAM_IF_DGFX(xe_device_get_root_tile(xe)) |
- XE_BO_FLAG_SCANOUT |
+ XE_BO_FLAG_FORCE_WC |
XE_BO_FLAG_NEEDS_CPU_ACCESS, NULL);
if (IS_ERR(bo))
return PTR_ERR(bo);
diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
index c914ab719f20..af9a6669c872 100644
--- a/drivers/gpu/drm/xe/xe_bo.h
+++ b/drivers/gpu/drm/xe/xe_bo.h
@@ -35,7 +35,7 @@
#define XE_BO_FLAG_PINNED BIT(7)
#define XE_BO_FLAG_NO_RESV_EVICT BIT(8)
#define XE_BO_FLAG_DEFER_BACKING BIT(9)
-#define XE_BO_FLAG_SCANOUT BIT(10)
+#define XE_BO_FLAG_FORCE_WC BIT(10)
#define XE_BO_FLAG_FIXED_PLACEMENT BIT(11)
#define XE_BO_FLAG_PAGETABLE BIT(12)
#define XE_BO_FLAG_NEEDS_CPU_ACCESS BIT(13)
--
2.52.0
* [PATCH v21 2/9] drm/xe: Use write-combine mapping when populating DPT
2026-03-10 12:34 [PATCH v21 0/9] AuxCCS handling and render compression modifiers Tvrtko Ursulin
2026-03-10 12:34 ` [PATCH v21 1/9] drm/xe: Rename XE_BO_FLAG_SCANOUT to XE_BO_FLAG_FORCE_WC Tvrtko Ursulin
@ 2026-03-10 12:34 ` Tvrtko Ursulin
2026-03-10 12:34 ` [PATCH v21 3/9] drm/xe/xelpg: Limit AuxCCS ring buffer programming to Alderlake Tvrtko Ursulin
` (8 subsequent siblings)
10 siblings, 0 replies; 18+ messages in thread
From: Tvrtko Ursulin @ 2026-03-10 12:34 UTC (permalink / raw)
To: intel-xe
Cc: kernel-dev, Tvrtko Ursulin, Ville Syrjälä, Rodrigo Vivi
The fallback case for the DPT backing store is a buffer object in system
memory, which by default uses a write-back CPU caching policy.
If this fallback gets triggered, and since there is currently no flushing,
the DPT writes made when pinning a buffer to display are not guaranteed to
be seen by the display engine.
To fix this, since both the local memory and the stolen memory DPT
placements already use write-combine, let us make the system memory option
follow suit by passing down the appropriate flag.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Suggested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
drivers/gpu/drm/xe/display/xe_fb_pin.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/xe/display/xe_fb_pin.c b/drivers/gpu/drm/xe/display/xe_fb_pin.c
index d4a9eb550cae..df7d305c6fcd 100644
--- a/drivers/gpu/drm/xe/display/xe_fb_pin.c
+++ b/drivers/gpu/drm/xe/display/xe_fb_pin.c
@@ -122,7 +122,8 @@ static int __xe_pin_fb_vma_dpt(const struct intel_framebuffer *fb,
ttm_bo_type_kernel,
XE_BO_FLAG_SYSTEM |
XE_BO_FLAG_GGTT |
- XE_BO_FLAG_PAGETABLE,
+ XE_BO_FLAG_PAGETABLE |
+ XE_BO_FLAG_FORCE_WC,
alignment, false);
if (IS_ERR(dpt))
return PTR_ERR(dpt);
--
2.52.0
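
For readers following the caching argument in patches 1/9 and 2/9, here is a
minimal standalone sketch of the write-combine override as it appears in the
xe_ttm_tt_create() hunk. The names sketch_bo and sketch_forces_wc are
hypothetical stand-ins and not driver API; only the two bit positions and the
shape of the condition are taken from the diffs above. It models just the
override and not the rest of the caching selection.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions copied from the xe_bo.h hunk in patch 1/9. */
#define SKETCH_BO_FLAG_FORCE_WC		(UINT32_C(1) << 10)
#define SKETCH_BO_FLAG_PAGETABLE	(UINT32_C(1) << 12)

/* Hypothetical reduced buffer object: only the fields the check reads. */
struct sketch_bo {
	uint32_t flags;
	unsigned int cpu_caching;	/* 0: no mode requested by userspace */
};

/* Returns true when the condition from the hunk selects write-combine. */
bool sketch_forces_wc(const struct sketch_bo *bo, bool has_cached_pt)
{
	assert(bo);

	return (!bo->cpu_caching && (bo->flags & SKETCH_BO_FLAG_FORCE_WC)) ||
	       (!has_cached_pt && (bo->flags & SKETCH_BO_FLAG_PAGETABLE));
}
```

Under these assumptions a kernel-created DPT object (cpu_caching of 0) that
carries the FORCE_WC flag is write-combined whether or not the platform has
cached page tables, which appears to be the effect patch 2/9 is after.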
* [PATCH v21 3/9] drm/xe/xelpg: Limit AuxCCS ring buffer programming to Alderlake
2026-03-10 12:34 [PATCH v21 0/9] AuxCCS handling and render compression modifiers Tvrtko Ursulin
2026-03-10 12:34 ` [PATCH v21 1/9] drm/xe: Rename XE_BO_FLAG_SCANOUT to XE_BO_FLAG_FORCE_WC Tvrtko Ursulin
2026-03-10 12:34 ` [PATCH v21 2/9] drm/xe: Use write-combine mapping when populating DPT Tvrtko Ursulin
@ 2026-03-10 12:34 ` Tvrtko Ursulin
2026-03-10 12:34 ` [PATCH v21 4/9] drm/xe/xelp: Quiesce memory traffic before invalidating AuxCCS Tvrtko Ursulin
` (7 subsequent siblings)
10 siblings, 0 replies; 18+ messages in thread
From: Tvrtko Ursulin @ 2026-03-10 12:34 UTC (permalink / raw)
To: intel-xe; +Cc: kernel-dev, Tvrtko Ursulin, Rodrigo Vivi
At the moment the driver does not support AuxCCS at all due to the respective
modifiers being hidden from userspace.
As we are about to start enabling them, starting with Alderlake, let us
begin by limiting the ring buffer support to just that initial platform.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
drivers/gpu/drm/xe/xe_ring_ops.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_ring_ops.c b/drivers/gpu/drm/xe/xe_ring_ops.c
index 53d420d72164..07235a895f4b 100644
--- a/drivers/gpu/drm/xe/xe_ring_ops.c
+++ b/drivers/gpu/drm/xe/xe_ring_ops.c
@@ -305,9 +305,9 @@ static bool has_aux_ccs(struct xe_device *xe)
* PVC is a special case that has no compression of either type
* (FlatCCS or AuxCCS). Also, AuxCCS is no longer used from Xe2
* onward, so any future platforms with no FlatCCS will not have
- * AuxCCS either.
+ * AuxCCS, and we explicitly do not want to support it on MTL.
*/
- if (GRAPHICS_VER(xe) >= 20 || xe->info.platform == XE_PVC)
+ if (GRAPHICS_VERx100(xe) >= 1270 || xe->info.platform == XE_PVC)
return false;
return !xe->info.has_flat_ccs;
--
2.52.0
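
The version gate in the patch above can be sketched in isolation as follows.
This is an illustration only: sketch_device and sketch_has_aux_ccs are
hypothetical names, and the struct merely stands in for the few device fields
the real helper reads. The 1270 threshold, the PVC exclusion and the FlatCCS
test are taken from the diff.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the device fields the check looks at. */
struct sketch_device {
	unsigned int graphics_verx100;	/* e.g. 1270 for graphics version 12.70 */
	bool is_pvc;			/* stand-in for platform == XE_PVC */
	bool has_flat_ccs;		/* stand-in for xe->info.has_flat_ccs */
};

/* Mirrors the gate in the diff: no AuxCCS from 12.70 onward or on PVC. */
bool sketch_has_aux_ccs(const struct sketch_device *dev)
{
	assert(dev);

	if (dev->graphics_verx100 >= 1270 || dev->is_pvc)
		return false;

	/* Below 12.70, AuxCCS is assumed wherever FlatCCS is absent. */
	return !dev->has_flat_ccs;
}
```

With the old ">= 20" major version check, a 12.70 device without FlatCCS would
still have reported AuxCCS; with the x100 comparison it no longer does.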
* [PATCH v21 4/9] drm/xe/xelp: Quiesce memory traffic before invalidating AuxCCS
2026-03-10 12:34 [PATCH v21 0/9] AuxCCS handling and render compression modifiers Tvrtko Ursulin
` (2 preceding siblings ...)
2026-03-10 12:34 ` [PATCH v21 3/9] drm/xe/xelpg: Limit AuxCCS ring buffer programming to Alderlake Tvrtko Ursulin
@ 2026-03-10 12:34 ` Tvrtko Ursulin
2026-03-10 12:34 ` [PATCH v21 5/9] drm/xe/xelp: Wait for AuxCCS invalidation to complete Tvrtko Ursulin
` (6 subsequent siblings)
10 siblings, 0 replies; 18+ messages in thread
From: Tvrtko Ursulin @ 2026-03-10 12:34 UTC (permalink / raw)
To: intel-xe; +Cc: kernel-dev, Tvrtko Ursulin, Rodrigo Vivi
According to i915 commit
ad8ebf12217e ("drm/i915/gt: Ensure memory quiesced before invalidation")
quiescing of the memory traffic is required before invalidating the AuxCCS
tables.
Add an extra pipe control flush to achieve that.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
drivers/gpu/drm/xe/xe_ring_ops.c | 10 +++++++++-
drivers/gpu/drm/xe/xe_ring_ops_types.h | 2 +-
2 files changed, 10 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_ring_ops.c b/drivers/gpu/drm/xe/xe_ring_ops.c
index 07235a895f4b..91f52d7748ca 100644
--- a/drivers/gpu/drm/xe/xe_ring_ops.c
+++ b/drivers/gpu/drm/xe/xe_ring_ops.c
@@ -377,12 +377,20 @@ static void __emit_job_gen12_render_compute(struct xe_sched_job *job,
struct xe_gt *gt = job->q->gt;
struct xe_device *xe = gt_to_xe(gt);
bool lacks_render = !(gt->info.engine_mask & XE_HW_ENGINE_RCS_MASK);
+ const bool aux_ccs = has_aux_ccs(xe);
u32 mask_flags = 0;
*head = lrc->ring.tail;
i = emit_copy_timestamp(xe, lrc, dw, i);
+ /*
+ * On AuxCCS platforms the invalidation of the Aux table requires
+ * quiescing the memory traffic beforehand.
+ */
+ if (aux_ccs)
+ i = emit_render_cache_flush(job, dw, i);
+
dw[i++] = preparser_disable(true);
if (lacks_render)
mask_flags = PIPE_CONTROL_3D_ARCH_FLAGS;
@@ -393,7 +401,7 @@ static void __emit_job_gen12_render_compute(struct xe_sched_job *job,
i = emit_pipe_invalidate(job->q, mask_flags, job->ring_ops_flush_tlb, dw, i);
/* hsdes: 1809175790 */
- if (has_aux_ccs(xe))
+ if (aux_ccs)
i = emit_aux_table_inv(gt, CCS_AUX_INV, dw, i);
dw[i++] = preparser_disable(false);
diff --git a/drivers/gpu/drm/xe/xe_ring_ops_types.h b/drivers/gpu/drm/xe/xe_ring_ops_types.h
index d7e3e150a9a5..477dc7defd72 100644
--- a/drivers/gpu/drm/xe/xe_ring_ops_types.h
+++ b/drivers/gpu/drm/xe/xe_ring_ops_types.h
@@ -8,7 +8,7 @@
struct xe_sched_job;
-#define MAX_JOB_SIZE_DW 58
+#define MAX_JOB_SIZE_DW 70
#define MAX_JOB_SIZE_BYTES (MAX_JOB_SIZE_DW * 4)
/**
--
2.52.0
* [PATCH v21 5/9] drm/xe/xelp: Wait for AuxCCS invalidation to complete
2026-03-10 12:34 [PATCH v21 0/9] AuxCCS handling and render compression modifiers Tvrtko Ursulin
` (3 preceding siblings ...)
2026-03-10 12:34 ` [PATCH v21 4/9] drm/xe/xelp: Quiesce memory traffic before invalidating AuxCCS Tvrtko Ursulin
@ 2026-03-10 12:34 ` Tvrtko Ursulin
2026-03-10 12:34 ` [PATCH v21 6/9] drm/xe: Move aux table invalidation to ring ops Tvrtko Ursulin
` (5 subsequent siblings)
10 siblings, 0 replies; 18+ messages in thread
From: Tvrtko Ursulin @ 2026-03-10 12:34 UTC (permalink / raw)
To: intel-xe; +Cc: kernel-dev, Tvrtko Ursulin, Rodrigo Vivi
On AuxCCS platforms we need to wait for AuxCCS invalidations to complete.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
drivers/gpu/drm/xe/instructions/xe_mi_commands.h | 6 ++++++
drivers/gpu/drm/xe/xe_ring_ops.c | 9 ++++++++-
drivers/gpu/drm/xe/xe_ring_ops_types.h | 2 +-
3 files changed, 15 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/xe/instructions/xe_mi_commands.h b/drivers/gpu/drm/xe/instructions/xe_mi_commands.h
index c47b290e0e9f..49d8ffd026d5 100644
--- a/drivers/gpu/drm/xe/instructions/xe_mi_commands.h
+++ b/drivers/gpu/drm/xe/instructions/xe_mi_commands.h
@@ -81,4 +81,10 @@
#define MI_SET_APPID_SESSION_ID_MASK REG_GENMASK(6, 0)
#define MI_SET_APPID_SESSION_ID(x) REG_FIELD_PREP(MI_SET_APPID_SESSION_ID_MASK, x)
+#define MI_SEMAPHORE_WAIT_TOKEN (__MI_INSTR(0x1c) | XE_INSTR_NUM_DW(5)) /* XeLP+ */
+#define MI_SEMAPHORE_REGISTER_POLL REG_BIT(16)
+#define MI_SEMAPHORE_POLL REG_BIT(15)
+#define MI_SEMAPHORE_CMP_OP_MASK REG_GENMASK(14, 12)
+#define MI_SEMAPHORE_SAD_EQ_SDD REG_FIELD_PREP(MI_SEMAPHORE_CMP_OP_MASK, 4)
+
#endif
diff --git a/drivers/gpu/drm/xe/xe_ring_ops.c b/drivers/gpu/drm/xe/xe_ring_ops.c
index 91f52d7748ca..596379e6d742 100644
--- a/drivers/gpu/drm/xe/xe_ring_ops.c
+++ b/drivers/gpu/drm/xe/xe_ring_ops.c
@@ -54,7 +54,14 @@ static int emit_aux_table_inv(struct xe_gt *gt, struct xe_reg reg,
dw[i++] = MI_LOAD_REGISTER_IMM | MI_LRI_NUM_REGS(1) | MI_LRI_MMIO_REMAP_EN;
dw[i++] = reg.addr + gt->mmio.adj_offset;
dw[i++] = AUX_INV;
- dw[i++] = MI_NOOP;
+ dw[i++] = MI_SEMAPHORE_WAIT_TOKEN |
+ MI_SEMAPHORE_REGISTER_POLL |
+ MI_SEMAPHORE_POLL |
+ MI_SEMAPHORE_SAD_EQ_SDD;
+ dw[i++] = 0;
+ dw[i++] = reg.addr + gt->mmio.adj_offset;
+ dw[i++] = 0;
+ dw[i++] = 0;
return i;
}
diff --git a/drivers/gpu/drm/xe/xe_ring_ops_types.h b/drivers/gpu/drm/xe/xe_ring_ops_types.h
index 477dc7defd72..1197fc0bf2af 100644
--- a/drivers/gpu/drm/xe/xe_ring_ops_types.h
+++ b/drivers/gpu/drm/xe/xe_ring_ops_types.h
@@ -8,7 +8,7 @@
struct xe_sched_job;
-#define MAX_JOB_SIZE_DW 70
+#define MAX_JOB_SIZE_DW 74
#define MAX_JOB_SIZE_BYTES (MAX_JOB_SIZE_DW * 4)
/**
--
2.52.0
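
As a rough illustration of the sequence the patch above emits, the sketch
below builds the same eight dwords into a plain array: a register write that
sets the invalidate bit, followed by a semaphore wait that polls the same
register. The semaphore bit positions are copied from the xe_mi_commands.h
hunk. The two opcode values, the AUX_INV value and all SKETCH_ names are
assumptions made for this example and should be checked against the driver
headers. The MMIO remap bit is left out for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed encodings, for illustration only. */
#define SKETCH_MI_LRI_1REG			UINT32_C(0x11000001)
#define SKETCH_MI_SEMAPHORE_WAIT_TOKEN		UINT32_C(0x0e000003)
#define SKETCH_AUX_INV				UINT32_C(1)

/* Bit positions as in the xe_mi_commands.h hunk. */
#define SKETCH_MI_SEMAPHORE_REGISTER_POLL	(UINT32_C(1) << 16)
#define SKETCH_MI_SEMAPHORE_POLL		(UINT32_C(1) << 15)
#define SKETCH_MI_SEMAPHORE_SAD_EQ_SDD		(UINT32_C(4) << 12)

/* Appends the 8-dword invalidate-and-wait sequence at dw[i], returns new i. */
size_t sketch_emit_aux_table_inv(uint32_t *dw, size_t i, uint32_t reg_addr)
{
	assert(dw);

	/* Write the invalidate bit to the aux invalidation register. */
	dw[i++] = SKETCH_MI_LRI_1REG;
	dw[i++] = reg_addr;
	dw[i++] = SKETCH_AUX_INV;

	/* Poll that register until it compares equal to the data dword (0). */
	dw[i++] = SKETCH_MI_SEMAPHORE_WAIT_TOKEN |
		  SKETCH_MI_SEMAPHORE_REGISTER_POLL |
		  SKETCH_MI_SEMAPHORE_POLL |
		  SKETCH_MI_SEMAPHORE_SAD_EQ_SDD;
	dw[i++] = 0;		/* semaphore data dword */
	dw[i++] = reg_addr;	/* register being polled */
	dw[i++] = 0;		/* remaining dwords are zero, as in the diff */
	dw[i++] = 0;

	return i;
}
```

The net growth over the previous LRI plus MI_NOOP form is four dwords, which
matches the MAX_JOB_SIZE_DW change from 70 to 74 in the same patch.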
* [PATCH v21 6/9] drm/xe: Move aux table invalidation to ring ops
2026-03-10 12:34 [PATCH v21 0/9] AuxCCS handling and render compression modifiers Tvrtko Ursulin
` (4 preceding siblings ...)
2026-03-10 12:34 ` [PATCH v21 5/9] drm/xe/xelp: Wait for AuxCCS invalidation to complete Tvrtko Ursulin
@ 2026-03-10 12:34 ` Tvrtko Ursulin
2026-03-10 21:47 ` Matthew Brost
2026-03-10 12:34 ` [PATCH v21 7/9] drm/xe/xelp: Add AuxCCS invalidation to the indirect context workarounds Tvrtko Ursulin
` (4 subsequent siblings)
10 siblings, 1 reply; 18+ messages in thread
From: Tvrtko Ursulin @ 2026-03-10 12:34 UTC (permalink / raw)
To: intel-xe; +Cc: kernel-dev, Tvrtko Ursulin, Matthew Brost, Rodrigo Vivi
Implement the suggestion of moving the aux invalidation from a helper to a
ring ops vfunc, together with the suggestion to split the vfunc table of
video decode and video enhance engines.
With this done the LRC code will be able to access the functionality via
the newly added ring ops vfunc.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Suggested-by: Matthew Brost <matthew.brost@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
v2:
* Removed stray hunks from v1.
v3:
* Include header for u32.
v4:
* Forward declare struct xe_gt.
---
drivers/gpu/drm/xe/xe_ring_ops.c | 105 ++++++++++++++++++-------
drivers/gpu/drm/xe/xe_ring_ops_types.h | 6 ++
2 files changed, 83 insertions(+), 28 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_ring_ops.c b/drivers/gpu/drm/xe/xe_ring_ops.c
index 596379e6d742..8947b3091873 100644
--- a/drivers/gpu/drm/xe/xe_ring_ops.c
+++ b/drivers/gpu/drm/xe/xe_ring_ops.c
@@ -48,22 +48,48 @@ static u32 preparser_disable(bool state)
return MI_ARB_CHECK | BIT(8) | state;
}
-static int emit_aux_table_inv(struct xe_gt *gt, struct xe_reg reg,
- u32 *dw, int i)
+static u32 *
+__emit_aux_table_inv(u32 *cmd, const struct xe_reg reg, u32 adj_offset)
{
- dw[i++] = MI_LOAD_REGISTER_IMM | MI_LRI_NUM_REGS(1) | MI_LRI_MMIO_REMAP_EN;
- dw[i++] = reg.addr + gt->mmio.adj_offset;
- dw[i++] = AUX_INV;
- dw[i++] = MI_SEMAPHORE_WAIT_TOKEN |
- MI_SEMAPHORE_REGISTER_POLL |
- MI_SEMAPHORE_POLL |
- MI_SEMAPHORE_SAD_EQ_SDD;
- dw[i++] = 0;
- dw[i++] = reg.addr + gt->mmio.adj_offset;
- dw[i++] = 0;
- dw[i++] = 0;
+ *cmd++ = MI_LOAD_REGISTER_IMM | MI_LRI_NUM_REGS(1) |
+ MI_LRI_MMIO_REMAP_EN;
+ *cmd++ = reg.addr + adj_offset;
+ *cmd++ = AUX_INV;
+ *cmd++ = MI_SEMAPHORE_WAIT_TOKEN | MI_SEMAPHORE_REGISTER_POLL |
+ MI_SEMAPHORE_POLL | MI_SEMAPHORE_SAD_EQ_SDD;
+ *cmd++ = 0;
+ *cmd++ = reg.addr + adj_offset;
+ *cmd++ = 0;
+ *cmd++ = 0;
- return i;
+ return cmd;
+}
+
+static u32 *emit_aux_table_inv_render_compute(struct xe_gt *gt, u32 *cmd)
+{
+ return __emit_aux_table_inv(cmd, CCS_AUX_INV, gt->mmio.adj_offset);
+}
+
+static u32 *emit_aux_table_inv_video_decode(struct xe_gt *gt, u32 *cmd)
+{
+ return __emit_aux_table_inv(cmd, VD0_AUX_INV, gt->mmio.adj_offset);
+}
+
+static u32 *emit_aux_table_inv_video_enhance(struct xe_gt *gt, u32 *cmd)
+{
+ return __emit_aux_table_inv(cmd, VE0_AUX_INV, gt->mmio.adj_offset);
+}
+
+static int emit_aux_table_inv(struct xe_hw_engine *hwe, u32 *dw, int i)
+{
+ struct xe_gt *gt = hwe->gt;
+ u32 *(*emit)(struct xe_gt *gt, u32 *cmd) =
+ gt->ring_ops[hwe->class]->emit_aux_table_inv;
+
+ if (emit)
+ return emit(gt, dw + i) - dw;
+ else
+ return i;
}
static int emit_user_interrupt(u32 *dw, int i)
@@ -327,7 +353,6 @@ static void __emit_job_gen12_video(struct xe_sched_job *job, struct xe_lrc *lrc,
u32 ppgtt_flag = get_ppgtt_flag(job);
struct xe_gt *gt = job->q->gt;
struct xe_device *xe = gt_to_xe(gt);
- bool decode = job->q->class == XE_ENGINE_CLASS_VIDEO_DECODE;
*head = lrc->ring.tail;
@@ -336,12 +361,7 @@ static void __emit_job_gen12_video(struct xe_sched_job *job, struct xe_lrc *lrc,
dw[i++] = preparser_disable(true);
/* hsdes: 1809175790 */
- if (has_aux_ccs(xe)) {
- if (decode)
- i = emit_aux_table_inv(gt, VD0_AUX_INV, dw, i);
- else
- i = emit_aux_table_inv(gt, VE0_AUX_INV, dw, i);
- }
+ i = emit_aux_table_inv(job->q->hwe, dw, i);
if (job->ring_ops_flush_tlb)
i = emit_flush_imm_ggtt(xe_lrc_start_seqno_ggtt_addr(lrc),
@@ -384,7 +404,6 @@ static void __emit_job_gen12_render_compute(struct xe_sched_job *job,
struct xe_gt *gt = job->q->gt;
struct xe_device *xe = gt_to_xe(gt);
bool lacks_render = !(gt->info.engine_mask & XE_HW_ENGINE_RCS_MASK);
- const bool aux_ccs = has_aux_ccs(xe);
u32 mask_flags = 0;
*head = lrc->ring.tail;
@@ -395,7 +414,7 @@ static void __emit_job_gen12_render_compute(struct xe_sched_job *job,
* On AuxCCS platforms the invalidation of the Aux table requires
* quiescing the memory traffic beforehand.
*/
- if (aux_ccs)
+ if (has_aux_ccs(xe))
i = emit_render_cache_flush(job, dw, i);
dw[i++] = preparser_disable(true);
@@ -408,8 +427,7 @@ static void __emit_job_gen12_render_compute(struct xe_sched_job *job,
i = emit_pipe_invalidate(job->q, mask_flags, job->ring_ops_flush_tlb, dw, i);
/* hsdes: 1809175790 */
- if (aux_ccs)
- i = emit_aux_table_inv(gt, CCS_AUX_INV, dw, i);
+ i = emit_aux_table_inv(job->q->hwe, dw, i);
dw[i++] = preparser_disable(false);
@@ -534,7 +552,11 @@ static const struct xe_ring_ops ring_ops_gen12_copy = {
.emit_job = emit_job_gen12_copy,
};
-static const struct xe_ring_ops ring_ops_gen12_video = {
+static const struct xe_ring_ops ring_ops_gen12_video_decode = {
+ .emit_job = emit_job_gen12_video,
+};
+
+static const struct xe_ring_ops ring_ops_gen12_video_enhance = {
.emit_job = emit_job_gen12_video,
};
@@ -542,20 +564,47 @@ static const struct xe_ring_ops ring_ops_gen12_render_compute = {
.emit_job = emit_job_gen12_render_compute,
};
+static const struct xe_ring_ops auxccs_ring_ops_gen12_video_decode = {
+ .emit_job = emit_job_gen12_video,
+ .emit_aux_table_inv = emit_aux_table_inv_video_decode,
+};
+
+static const struct xe_ring_ops auxccs_ring_ops_gen12_video_enhance = {
+ .emit_job = emit_job_gen12_video,
+ .emit_aux_table_inv = emit_aux_table_inv_video_enhance,
+};
+
+static const struct xe_ring_ops auxccs_ring_ops_gen12_render_compute = {
+ .emit_job = emit_job_gen12_render_compute,
+ .emit_aux_table_inv = emit_aux_table_inv_render_compute,
+};
+
const struct xe_ring_ops *
xe_ring_ops_get(struct xe_gt *gt, enum xe_engine_class class)
{
+ struct xe_device *xe = gt_to_xe(gt);
+
switch (class) {
case XE_ENGINE_CLASS_OTHER:
return &ring_ops_gen12_gsc;
case XE_ENGINE_CLASS_COPY:
return &ring_ops_gen12_copy;
case XE_ENGINE_CLASS_VIDEO_DECODE:
+ if (has_aux_ccs(xe))
+ return &auxccs_ring_ops_gen12_video_decode;
+ else
+ return &ring_ops_gen12_video_decode;
case XE_ENGINE_CLASS_VIDEO_ENHANCE:
- return &ring_ops_gen12_video;
+ if (has_aux_ccs(xe))
+ return &auxccs_ring_ops_gen12_video_enhance;
+ else
+ return &ring_ops_gen12_video_enhance;
case XE_ENGINE_CLASS_RENDER:
case XE_ENGINE_CLASS_COMPUTE:
- return &ring_ops_gen12_render_compute;
+ if (has_aux_ccs(xe))
+ return &auxccs_ring_ops_gen12_render_compute;
+ else
+ return &ring_ops_gen12_render_compute;
default:
return NULL;
}
diff --git a/drivers/gpu/drm/xe/xe_ring_ops_types.h b/drivers/gpu/drm/xe/xe_ring_ops_types.h
index 1197fc0bf2af..52ff96bc4100 100644
--- a/drivers/gpu/drm/xe/xe_ring_ops_types.h
+++ b/drivers/gpu/drm/xe/xe_ring_ops_types.h
@@ -6,6 +6,9 @@
#ifndef _XE_RING_OPS_TYPES_H_
#define _XE_RING_OPS_TYPES_H_
+#include <linux/types.h>
+
+struct xe_gt;
struct xe_sched_job;
#define MAX_JOB_SIZE_DW 74
@@ -17,6 +20,9 @@ struct xe_sched_job;
struct xe_ring_ops {
/** @emit_job: Write job to ring */
void (*emit_job)(struct xe_sched_job *job);
+
+ /** @emit_aux_table_inv: Emit aux table invalidation to the ring */
+ u32 *(*emit_aux_table_inv)(struct xe_gt *gt, u32 *cmd);
};
#endif
--
2.52.0
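
The dispatch pattern introduced by the patch above, an optional hook in the
ops table plus a thin wrapper that converts between the index-based callers
and the pointer-based hook, can be sketched on its own like this. All sketch_
names are hypothetical, and the toy hook writes two placeholder dwords instead
of the real command sequence.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_gt;	/* opaque here, like the forward declaration in the diff */

/* Reduced ops table: the hook stays NULL on engines without AuxCCS. */
struct sketch_ring_ops {
	uint32_t *(*emit_aux_table_inv)(struct sketch_gt *gt, uint32_t *cmd);
};

/* Toy hook that writes two placeholder dwords and returns the new cursor. */
uint32_t *sketch_emit_two(struct sketch_gt *gt, uint32_t *cmd)
{
	(void)gt;
	*cmd++ = 0;
	*cmd++ = 0;
	return cmd;
}

/* Calls the hook if present and turns the returned pointer back into an index. */
int sketch_emit_aux_table_inv(const struct sketch_ring_ops *ops,
			      struct sketch_gt *gt, uint32_t *dw, int i)
{
	assert(ops && dw);

	if (!ops->emit_aux_table_inv)
		return i;

	return (int)(ops->emit_aux_table_inv(gt, dw + i) - dw);
}
```

One consequence of this shape is that callers no longer need their own
has_aux_ccs() test around the invalidation: a table without the hook simply
leaves the index unchanged.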
* [PATCH v21 7/9] drm/xe/xelp: Add AuxCCS invalidation to the indirect context workarounds
2026-03-10 12:34 [PATCH v21 0/9] AuxCCS handling and render compression modifiers Tvrtko Ursulin
` (5 preceding siblings ...)
2026-03-10 12:34 ` [PATCH v21 6/9] drm/xe: Move aux table invalidation to ring ops Tvrtko Ursulin
@ 2026-03-10 12:34 ` Tvrtko Ursulin
2026-03-10 12:34 ` [PATCH v21 8/9] drm/xe/display: Add support for AuxCCS Tvrtko Ursulin
` (3 subsequent siblings)
10 siblings, 0 replies; 18+ messages in thread
From: Tvrtko Ursulin @ 2026-03-10 12:34 UTC (permalink / raw)
To: intel-xe; +Cc: kernel-dev, Tvrtko Ursulin, Rodrigo Vivi
Following from the i915 reference implementation, we add the AuxCCS
invalidation to the indirect context workarounds page.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
v2:
* Reworked to accommodate aux invalidation becoming part of ring_ops.
---
drivers/gpu/drm/xe/xe_lrc.c | 23 +++++++++++++++++++++++
1 file changed, 23 insertions(+)
diff --git a/drivers/gpu/drm/xe/xe_lrc.c b/drivers/gpu/drm/xe/xe_lrc.c
index a408bb9d58e4..d8276334ecc3 100644
--- a/drivers/gpu/drm/xe/xe_lrc.c
+++ b/drivers/gpu/drm/xe/xe_lrc.c
@@ -27,6 +27,7 @@
#include "xe_map.h"
#include "xe_memirq.h"
#include "xe_mmio.h"
+#include "xe_ring_ops.h"
#include "xe_sriov.h"
#include "xe_trace_lrc.h"
#include "xe_vm.h"
@@ -93,6 +94,9 @@ gt_engine_needs_indirect_ctx(struct xe_gt *gt, enum xe_engine_class class)
class, NULL))
return true;
+ if (gt->ring_ops[class]->emit_aux_table_inv)
+ return true;
+
return false;
}
@@ -1216,6 +1220,23 @@ static ssize_t setup_invalidate_state_cache_wa(struct xe_lrc *lrc,
return cmd - batch;
}
+static ssize_t setup_invalidate_auxccs_wa(struct xe_lrc *lrc,
+ struct xe_hw_engine *hwe,
+ u32 *batch, size_t max_len)
+{
+ struct xe_gt *gt = lrc->gt;
+ u32 *(*emit)(struct xe_gt *gt, u32 *cmd) =
+ gt->ring_ops[hwe->class]->emit_aux_table_inv;
+
+ if (!emit)
+ return 0;
+
+ if (xe_gt_WARN_ON(gt, max_len < 8))
+ return -ENOSPC;
+
+ return emit(gt, batch) - batch;
+}
+
struct bo_setup {
ssize_t (*setup)(struct xe_lrc *lrc, struct xe_hw_engine *hwe,
u32 *batch, size_t max_size);
@@ -1348,9 +1369,11 @@ setup_indirect_ctx(struct xe_lrc *lrc, struct xe_hw_engine *hwe)
{
static const struct bo_setup rcs_funcs[] = {
{ .setup = setup_timestamp_wa },
+ { .setup = setup_invalidate_auxccs_wa },
{ .setup = setup_configfs_mid_ctx_restore_bb },
};
static const struct bo_setup xcs_funcs[] = {
+ { .setup = setup_invalidate_auxccs_wa },
{ .setup = setup_configfs_mid_ctx_restore_bb },
};
struct bo_setup_state state = {
--
2.52.0
* [PATCH v21 8/9] drm/xe/display: Add support for AuxCCS
2026-03-10 12:34 [PATCH v21 0/9] AuxCCS handling and render compression modifiers Tvrtko Ursulin
` (6 preceding siblings ...)
2026-03-10 12:34 ` [PATCH v21 7/9] drm/xe/xelp: Add AuxCCS invalidation to the indirect context workarounds Tvrtko Ursulin
@ 2026-03-10 12:34 ` Tvrtko Ursulin
2026-03-11 22:24 ` Rodrigo Vivi
2026-03-10 12:34 ` [PATCH v21 9/9] drm/xe/xelp: Expose AuxCCS frame buffer modifiers on Alderlake-P Tvrtko Ursulin
` (2 subsequent siblings)
10 siblings, 1 reply; 18+ messages in thread
From: Tvrtko Ursulin @ 2026-03-10 12:34 UTC (permalink / raw)
To: intel-xe
Cc: kernel-dev, Tvrtko Ursulin, Juha-Pekka Heikkila, Michael J. Ruhl,
Rodrigo Vivi
Add support for mapping the auxiliary CCS buffer into the DPT page tables.
This will allow for better power efficiency by enabling the render
compression frame buffer modifiers such as
I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS in a following patch.
We do this by refactoring the code a bit so handling for the linear
auxiliary frame buffer can be added in a tidy way. Also replace some
hardcoded constants.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Cc: Michael J. Ruhl <michael.j.ruhl@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
drivers/gpu/drm/xe/display/xe_fb_pin.c | 111 ++++++++++++++++++-------
1 file changed, 81 insertions(+), 30 deletions(-)
diff --git a/drivers/gpu/drm/xe/display/xe_fb_pin.c b/drivers/gpu/drm/xe/display/xe_fb_pin.c
index df7d305c6fcd..fe4b66e5c3ad 100644
--- a/drivers/gpu/drm/xe/display/xe_fb_pin.c
+++ b/drivers/gpu/drm/xe/display/xe_fb_pin.c
@@ -49,33 +49,94 @@ write_dpt_rotated(struct xe_bo *bo, struct iosys_map *map, u32 *dpt_ofs, u32 bo_
*dpt_ofs = ALIGN(*dpt_ofs, 4096);
}
+static unsigned int
+write_dpt_padding(struct iosys_map *map, unsigned int dest, unsigned int pad)
+{
+ /* The DE ignores the PTEs for the padding tiles */
+ return dest + pad * sizeof(u64);
+}
+
+static unsigned int
+write_dpt_remapped_linear(struct xe_bo *bo, struct iosys_map *map,
+ unsigned int dest,
+ const struct intel_remapped_plane_info *plane)
+{
+ struct xe_device *xe = xe_bo_device(bo);
+ struct xe_ggtt *ggtt = xe_device_get_root_tile(xe)->mem.ggtt;
+ const u64 pte = xe_ggtt_encode_pte_flags(ggtt, bo,
+ xe->pat.idx[XE_CACHE_NONE]);
+ unsigned int offset = plane->offset * XE_PAGE_SIZE;
+ unsigned int size = plane->size;
+
+ while (size--) {
+ u64 addr = xe_bo_addr(bo, offset, XE_PAGE_SIZE);
+
+ iosys_map_wr(map, dest, u64, addr | pte);
+ dest += sizeof(u64);
+ offset += XE_PAGE_SIZE;
+ }
+
+ return dest;
+}
+
+static unsigned int
+write_dpt_remapped_tiled(struct xe_bo *bo, struct iosys_map *map,
+ unsigned int dest,
+ const struct intel_remapped_plane_info *plane)
+{
+ struct xe_device *xe = xe_bo_device(bo);
+ struct xe_ggtt *ggtt = xe_device_get_root_tile(xe)->mem.ggtt;
+ const u64 pte = xe_ggtt_encode_pte_flags(ggtt, bo,
+ xe->pat.idx[XE_CACHE_NONE]);
+ unsigned int offset, column, row;
+
+ for (row = 0; row < plane->height; row++) {
+ offset = (plane->offset + plane->src_stride * row) *
+ XE_PAGE_SIZE;
+
+ for (column = 0; column < plane->width; column++) {
+ u64 addr = xe_bo_addr(bo, offset, XE_PAGE_SIZE);
+
+ iosys_map_wr(map, dest, u64, addr | pte);
+ dest += sizeof(u64);
+ offset += XE_PAGE_SIZE;
+ }
+
+ dest = write_dpt_padding(map, dest,
+ plane->dst_stride - plane->width);
+ }
+
+ return dest;
+}
+
static void
-write_dpt_remapped(struct xe_bo *bo, struct iosys_map *map, u32 *dpt_ofs,
- u32 bo_ofs, u32 width, u32 height, u32 src_stride,
- u32 dst_stride)
+write_dpt_remapped(struct xe_bo *bo,
+ const struct intel_remapped_info *remap_info,
+ struct iosys_map *map)
{
- struct xe_device *xe = xe_bo_device(bo);
- struct xe_ggtt *ggtt = xe_device_get_root_tile(xe)->mem.ggtt;
- u32 column, row;
- u64 pte = xe_ggtt_encode_pte_flags(ggtt, bo, xe->pat.idx[XE_CACHE_NONE]);
+ unsigned int i, dest = 0;
- for (row = 0; row < height; row++) {
- u32 src_idx = src_stride * row + bo_ofs;
+ for (i = 0; i < ARRAY_SIZE(remap_info->plane); i++) {
+ const struct intel_remapped_plane_info *plane =
+ &remap_info->plane[i];
- for (column = 0; column < width; column++) {
- u64 addr = xe_bo_addr(bo, src_idx * XE_PAGE_SIZE, XE_PAGE_SIZE);
- iosys_map_wr(map, *dpt_ofs, u64, pte | addr);
+ if (!plane->width && !plane->height && !plane->linear)
+ continue;
- *dpt_ofs += 8;
- src_idx++;
+ if (remap_info->plane_alignment) {
+ const unsigned int index = dest / sizeof(u64);
+ const unsigned int pad =
+ ALIGN(index, remap_info->plane_alignment) -
+ index;
+
+ dest = write_dpt_padding(map, dest, pad);
}
- /* The DE ignores the PTEs for the padding tiles */
- *dpt_ofs += (dst_stride - width) * 8;
+ if (plane->linear)
+ dest = write_dpt_remapped_linear(bo, map, dest, plane);
+ else
+ dest = write_dpt_remapped_tiled(bo, map, dest, plane);
}
-
- /* Align to next page */
- *dpt_ofs = ALIGN(*dpt_ofs, 4096);
}
static int __xe_pin_fb_vma_dpt(const struct intel_framebuffer *fb,
@@ -138,17 +199,7 @@ static int __xe_pin_fb_vma_dpt(const struct intel_framebuffer *fb,
iosys_map_wr(&dpt->vmap, x * 8, u64, pte | addr);
}
} else if (view->type == I915_GTT_VIEW_REMAPPED) {
- const struct intel_remapped_info *remap_info = &view->remapped;
- u32 i, dpt_ofs = 0;
-
- for (i = 0; i < ARRAY_SIZE(remap_info->plane); i++)
- write_dpt_remapped(bo, &dpt->vmap, &dpt_ofs,
- remap_info->plane[i].offset,
- remap_info->plane[i].width,
- remap_info->plane[i].height,
- remap_info->plane[i].src_stride,
- remap_info->plane[i].dst_stride);
-
+ write_dpt_remapped(bo, &view->remapped, &dpt->vmap);
} else {
const struct intel_rotation_info *rot_info = &view->rotated;
u32 i, dpt_ofs = 0;
--
2.52.0
* [PATCH v21 9/9] drm/xe/xelp: Expose AuxCCS frame buffer modifiers on Alderlake-P
2026-03-10 12:34 [PATCH v21 0/9] AuxCCS handling and render compression modifiers Tvrtko Ursulin
` (7 preceding siblings ...)
2026-03-10 12:34 ` [PATCH v21 8/9] drm/xe/display: Add support for AuxCCS Tvrtko Ursulin
@ 2026-03-10 12:34 ` Tvrtko Ursulin
2026-03-10 15:50 ` ✗ CI.checkpatch: warning for AuxCCS handling and render compression modifiers Patchwork
2026-03-10 15:50 ` ✗ CI.KUnit: failure " Patchwork
10 siblings, 0 replies; 18+ messages in thread
From: Tvrtko Ursulin @ 2026-03-10 12:34 UTC (permalink / raw)
To: intel-xe
Cc: kernel-dev, Tvrtko Ursulin, Jani Nikula,
José Roberto de Souza, Juha-Pekka Heikkila, Rodrigo Vivi
Now that we have implemented all the related missing bits we can enable
the AuxCCS compressed modifiers which were disabled in
cf48bddd31de ("drm/i915/display: Disable AuxCCS framebuffers if built for Xe").
Tested with KDE Wayland, on Lenovo Carbon X1 ADL-P:
[PLANE:32:plane 1A]: type=PRI
uapi: [FB:242] AR30 little-endian (0x30335241),0x100000000000008,2880x1800, visible=visible, src=28
hw: [FB:242] AR30 little-endian (0x30335241),0x100000000000008,2880x1800, visible=yes, src=2880.000
Display is working fine - no artefacts, no DMAR/PIPE faults.
v2:
* Adjust patch title. (Rodrigo)
v3:
* Complete rewrite based on the display parent interface.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
References: cf48bddd31de ("drm/i915/display: Disable AuxCCS framebuffers if built for Xe")
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Cc: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
drivers/gpu/drm/xe/display/xe_display.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/drivers/gpu/drm/xe/display/xe_display.c b/drivers/gpu/drm/xe/display/xe_display.c
index c8dd3faa9b97..5180de285295 100644
--- a/drivers/gpu/drm/xe/display/xe_display.c
+++ b/drivers/gpu/drm/xe/display/xe_display.c
@@ -539,6 +539,13 @@ static const struct intel_display_irq_interface xe_display_irq_interface = {
.synchronize = irq_synchronize,
};
+static bool has_auxccs(struct drm_device *drm)
+{
+ struct xe_device *xe = to_xe_device(drm);
+
+ return xe->info.platform == XE_ALDERLAKE_P;
+}
+
static const struct intel_display_parent_interface parent = {
.dsb = &xe_display_dsb_interface,
.hdcp = &xe_display_hdcp_interface,
@@ -548,6 +555,7 @@ static const struct intel_display_parent_interface parent = {
.pcode = &xe_display_pcode_interface,
.rpm = &xe_display_rpm_interface,
.stolen = &xe_display_stolen_interface,
+ .has_auxccs = has_auxccs,
};
/**
--
2.52.0
* ✗ CI.checkpatch: warning for AuxCCS handling and render compression modifiers
2026-03-10 12:34 [PATCH v21 0/9] AuxCCS handling and render compression modifiers Tvrtko Ursulin
` (8 preceding siblings ...)
2026-03-10 12:34 ` [PATCH v21 9/9] drm/xe/xelp: Expose AuxCCS frame buffer modifiers on Alderlake-P Tvrtko Ursulin
@ 2026-03-10 15:50 ` Patchwork
2026-03-10 15:50 ` ✗ CI.KUnit: failure " Patchwork
10 siblings, 0 replies; 18+ messages in thread
From: Patchwork @ 2026-03-10 15:50 UTC (permalink / raw)
To: Tvrtko Ursulin; +Cc: intel-xe
== Series Details ==
Series: AuxCCS handling and render compression modifiers
URL : https://patchwork.freedesktop.org/series/162953/
State : warning
== Summary ==
+ KERNEL=/kernel
+ git clone https://gitlab.freedesktop.org/drm/maintainer-tools mt
Cloning into 'mt'...
warning: redirecting to https://gitlab.freedesktop.org/drm/maintainer-tools.git/
+ git -C mt rev-list -n1 origin/master
1f57ba1afceae32108bd24770069f764d940a0e4
+ cd /kernel
+ git config --global --add safe.directory /kernel
+ git log -n1
commit 004317ff024e787f794562f2457d617ace1c253f
Author: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Date: Tue Mar 10 12:34:53 2026 +0000
drm/xe/xelp: Expose AuxCCS frame buffer modifiers on Alderlake-P
Now that we have implemented all the related missing bits we can enable
the AuxCCS compressed modifiers which were disabled in
cf48bddd31de ("drm/i915/display: Disable AuxCCS framebuffers if built for Xe").
Tested with KDE Wayland, on Lenovo Carbon X1 ADL-P:
[PLANE:32:plane 1A]: type=PRI
uapi: [FB:242] AR30 little-endian (0x30335241),0x100000000000008,2880x1800, visible=visible, src=28
hw: [FB:242] AR30 little-endian (0x30335241),0x100000000000008,2880x1800, visible=yes, src=2880.000
Display is working fine - no artefacts, no DMAR/PIPE faults.
v2:
* Adjust patch title. (Rodrigo)
v3:
* Complete rewrite based on the display parent interface.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
References: cf48bddd31de ("drm/i915/display: Disable AuxCCS framebuffers if built for Xe")
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Cc: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
+ /mt/dim checkpatch adf67fb1258884dab38a71526783f901069dbb81 drm-intel
a258c5cd5870 drm/xe: Rename XE_BO_FLAG_SCANOUT to XE_BO_FLAG_FORCE_WC
-:127: CHECK:PARENTHESIS_ALIGNMENT: Alignment should match open parenthesis
#127: FILE: drivers/gpu/drm/xe/xe_bo.c:517:
+ if ((!bo->cpu_caching && bo->flags & XE_BO_FLAG_FORCE_WC) ||
(!xe->info.has_cached_pt && bo->flags & XE_BO_FLAG_PAGETABLE))
total: 0 errors, 0 warnings, 1 checks, 128 lines checked
9a15063a3621 drm/xe: Use write-combine mapping when populating DPT
7bec3c410d55 drm/xe/xelpg: Limit AuxCCS ring buffer programming to Alderlake
-:26: WARNING:TYPO_SPELLING: 'explicity' may be misspelled - perhaps 'explicitly'?
#26: FILE: drivers/gpu/drm/xe/xe_ring_ops.c:308:
+ * AuxCCS, and we explicity do not want to support it on MTL.
^^^^^^^^^
total: 0 errors, 1 warnings, 0 checks, 11 lines checked
74e61c681696 drm/xe/xelp: Quiesce memory traffic before invalidating AuxCCS
d5fcdf78c28e drm/xe/xelp: Wait for AuxCCS invalidation to complete
5d4d542e9438 drm/xe: Move aux table invalidation to ring ops
2593db62fdf9 drm/xe/xelp: Add AuxCCS invalidation to the indirect context workarounds
eb228c47d2ea drm/xe/display: Add support for AuxCCS
004317ff024e drm/xe/xelp: Expose AuxCCS frame buffer modifiers on Alderlake-P
-:12: WARNING:COMMIT_LOG_LONG_LINE: Prefer a maximum 75 chars per line (possible unwrapped commit description?)
#12:
cf48bddd31de ("drm/i915/display: Disable AuxCCS framebuffers if built for Xe").
-:12: ERROR:GIT_COMMIT_ID: Please use git commit description style 'commit <12+ chars of sha1> ("<title line>")' - ie: 'commit cf48bddd31de ("drm/i915/display: Disable AuxCCS framebuffers if built for Xe")'
#12:
cf48bddd31de ("drm/i915/display: Disable AuxCCS framebuffers if built for Xe").
total: 1 errors, 1 warnings, 0 checks, 20 lines checked
* ✗ CI.KUnit: failure for AuxCCS handling and render compression modifiers
2026-03-10 12:34 [PATCH v21 0/9] AuxCCS handling and render compression modifiers Tvrtko Ursulin
` (9 preceding siblings ...)
2026-03-10 15:50 ` ✗ CI.checkpatch: warning for AuxCCS handling and render compression modifiers Patchwork
@ 2026-03-10 15:50 ` Patchwork
10 siblings, 0 replies; 18+ messages in thread
From: Patchwork @ 2026-03-10 15:50 UTC (permalink / raw)
To: Tvrtko Ursulin; +Cc: intel-xe
== Series Details ==
Series: AuxCCS handling and render compression modifiers
URL : https://patchwork.freedesktop.org/series/162953/
State : failure
== Summary ==
+ trap cleanup EXIT
+ /kernel/tools/testing/kunit/kunit.py run --kunitconfig /kernel/drivers/gpu/drm/xe/.kunitconfig
ERROR:root:../drivers/gpu/drm/xe/xe_lrc.c: In function ‘xe_lrc_ctx_init’:
../drivers/gpu/drm/xe/xe_lrc.c:1577:43: error: implicit declaration of function ‘_MASKED_BIT_ENABLE’; did you mean ‘REG_MASKED_FIELD_ENABLE’? [-Werror=implicit-function-declaration]
1577 | state_cache_perf_fix[2] = _MASKED_BIT_ENABLE(DISABLE_STATE_CACHE_PERF_FIX);
| ^~~~~~~~~~~~~~~~~~
| REG_MASKED_FIELD_ENABLE
cc1: some warnings being treated as errors
make[7]: *** [../scripts/Makefile.build:289: drivers/gpu/drm/xe/xe_lrc.o] Error 1
make[7]: *** Waiting for unfinished jobs....
make[6]: *** [../scripts/Makefile.build:548: drivers/gpu/drm/xe] Error 2
make[5]: *** [../scripts/Makefile.build:548: drivers/gpu/drm] Error 2
make[4]: *** [../scripts/Makefile.build:548: drivers/gpu] Error 2
make[3]: *** [../scripts/Makefile.build:548: drivers] Error 2
make[2]: *** [/kernel/Makefile:2101: .] Error 2
make[1]: *** [/kernel/Makefile:248: __sub-make] Error 2
make: *** [Makefile:248: __sub-make] Error 2
[15:50:17] Configuring KUnit Kernel ...
Generating .config ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
[15:50:21] Building KUnit Kernel ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
Building with:
$ make all compile_commands.json scripts_gdb ARCH=um O=.kunit --jobs=48
+ cleanup
++ stat -c %u:%g /kernel
+ chown -R 1003:1003 /kernel
* Re: [PATCH v21 6/9] drm/xe: Move aux table invalidation to ring ops
2026-03-10 12:34 ` [PATCH v21 6/9] drm/xe: Move aux table invalidation to ring ops Tvrtko Ursulin
@ 2026-03-10 21:47 ` Matthew Brost
0 siblings, 0 replies; 18+ messages in thread
From: Matthew Brost @ 2026-03-10 21:47 UTC (permalink / raw)
To: Tvrtko Ursulin; +Cc: intel-xe, kernel-dev, Rodrigo Vivi
On Tue, Mar 10, 2026 at 12:34:50PM +0000, Tvrtko Ursulin wrote:
> Implement the suggestion of moving the aux invalidation from a helper to a
> ring ops vfunc, together with the suggestion to split the vfunc table of
> video decode and video enhance engines.
>
> With this done the LRC code will be able to access the functionality via
> the newly added ring ops vfunc.
>
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
> Suggested-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
> ---
> v2:
> * Removed stray hunks from v1.
>
> v3:
> * Include header for u32.
>
> v4:
> * Forward declare struct xe_gt.
> ---
> drivers/gpu/drm/xe/xe_ring_ops.c | 105 ++++++++++++++++++-------
> drivers/gpu/drm/xe/xe_ring_ops_types.h | 6 ++
> 2 files changed, 83 insertions(+), 28 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_ring_ops.c b/drivers/gpu/drm/xe/xe_ring_ops.c
> index 596379e6d742..8947b3091873 100644
> --- a/drivers/gpu/drm/xe/xe_ring_ops.c
> +++ b/drivers/gpu/drm/xe/xe_ring_ops.c
> @@ -48,22 +48,48 @@ static u32 preparser_disable(bool state)
> return MI_ARB_CHECK | BIT(8) | state;
> }
>
> -static int emit_aux_table_inv(struct xe_gt *gt, struct xe_reg reg,
> - u32 *dw, int i)
> +static u32 *
> +__emit_aux_table_inv(u32 *cmd, const struct xe_reg reg, u32 adj_offset)
> {
> - dw[i++] = MI_LOAD_REGISTER_IMM | MI_LRI_NUM_REGS(1) | MI_LRI_MMIO_REMAP_EN;
> - dw[i++] = reg.addr + gt->mmio.adj_offset;
> - dw[i++] = AUX_INV;
> - dw[i++] = MI_SEMAPHORE_WAIT_TOKEN |
> - MI_SEMAPHORE_REGISTER_POLL |
> - MI_SEMAPHORE_POLL |
> - MI_SEMAPHORE_SAD_EQ_SDD;
> - dw[i++] = 0;
> - dw[i++] = reg.addr + gt->mmio.adj_offset;
> - dw[i++] = 0;
> - dw[i++] = 0;
> + *cmd++ = MI_LOAD_REGISTER_IMM | MI_LRI_NUM_REGS(1) |
> + MI_LRI_MMIO_REMAP_EN;
> + *cmd++ = reg.addr + adj_offset;
> + *cmd++ = AUX_INV;
> + *cmd++ = MI_SEMAPHORE_WAIT_TOKEN | MI_SEMAPHORE_REGISTER_POLL |
> + MI_SEMAPHORE_POLL | MI_SEMAPHORE_SAD_EQ_SDD;
> + *cmd++ = 0;
> + *cmd++ = reg.addr + adj_offset;
> + *cmd++ = 0;
> + *cmd++ = 0;
>
> - return i;
> + return cmd;
> +}
> +
> +static u32 *emit_aux_table_inv_render_compute(struct xe_gt *gt, u32 *cmd)
> +{
> + return __emit_aux_table_inv(cmd, CCS_AUX_INV, gt->mmio.adj_offset);
> +}
> +
> +static u32 *emit_aux_table_inv_video_decode(struct xe_gt *gt, u32 *cmd)
> +{
> + return __emit_aux_table_inv(cmd, VD0_AUX_INV, gt->mmio.adj_offset);
> +}
> +
> +static u32 *emit_aux_table_inv_video_enhance(struct xe_gt *gt, u32 *cmd)
> +{
> + return __emit_aux_table_inv(cmd, VE0_AUX_INV, gt->mmio.adj_offset);
> +}
> +
> +static int emit_aux_table_inv(struct xe_hw_engine *hwe, u32 *dw, int i)
> +{
> + struct xe_gt *gt = hwe->gt;
> + u32 *(*emit)(struct xe_gt *gt, u32 *cmd) =
> + gt->ring_ops[hwe->class]->emit_aux_table_inv;
> +
> + if (emit)
> + return emit(gt, dw + i) - dw;
> + else
> + return i;
> }
>
> static int emit_user_interrupt(u32 *dw, int i)
> @@ -327,7 +353,6 @@ static void __emit_job_gen12_video(struct xe_sched_job *job, struct xe_lrc *lrc,
> u32 ppgtt_flag = get_ppgtt_flag(job);
> struct xe_gt *gt = job->q->gt;
> struct xe_device *xe = gt_to_xe(gt);
> - bool decode = job->q->class == XE_ENGINE_CLASS_VIDEO_DECODE;
>
> *head = lrc->ring.tail;
>
> @@ -336,12 +361,7 @@ static void __emit_job_gen12_video(struct xe_sched_job *job, struct xe_lrc *lrc,
> dw[i++] = preparser_disable(true);
>
> /* hsdes: 1809175790 */
> - if (has_aux_ccs(xe)) {
> - if (decode)
> - i = emit_aux_table_inv(gt, VD0_AUX_INV, dw, i);
> - else
> - i = emit_aux_table_inv(gt, VE0_AUX_INV, dw, i);
> - }
> + i = emit_aux_table_inv(job->q->hwe, dw, i);
>
> if (job->ring_ops_flush_tlb)
> i = emit_flush_imm_ggtt(xe_lrc_start_seqno_ggtt_addr(lrc),
> @@ -384,7 +404,6 @@ static void __emit_job_gen12_render_compute(struct xe_sched_job *job,
> struct xe_gt *gt = job->q->gt;
> struct xe_device *xe = gt_to_xe(gt);
> bool lacks_render = !(gt->info.engine_mask & XE_HW_ENGINE_RCS_MASK);
> - const bool aux_ccs = has_aux_ccs(xe);
> u32 mask_flags = 0;
>
> *head = lrc->ring.tail;
> @@ -395,7 +414,7 @@ static void __emit_job_gen12_render_compute(struct xe_sched_job *job,
> * On AuxCCS platforms the invalidation of the Aux table requires
> * quiescing the memory traffic beforehand.
> */
> - if (aux_ccs)
> + if (has_aux_ccs(xe))
> i = emit_render_cache_flush(job, dw, i);
>
> dw[i++] = preparser_disable(true);
> @@ -408,8 +427,7 @@ static void __emit_job_gen12_render_compute(struct xe_sched_job *job,
> i = emit_pipe_invalidate(job->q, mask_flags, job->ring_ops_flush_tlb, dw, i);
>
> /* hsdes: 1809175790 */
> - if (aux_ccs)
> - i = emit_aux_table_inv(gt, CCS_AUX_INV, dw, i);
> + i = emit_aux_table_inv(job->q->hwe, dw, i);
>
> dw[i++] = preparser_disable(false);
>
> @@ -534,7 +552,11 @@ static const struct xe_ring_ops ring_ops_gen12_copy = {
> .emit_job = emit_job_gen12_copy,
> };
>
> -static const struct xe_ring_ops ring_ops_gen12_video = {
> +static const struct xe_ring_ops ring_ops_gen12_video_decode = {
> + .emit_job = emit_job_gen12_video,
> +};
> +
> +static const struct xe_ring_ops ring_ops_gen12_video_enhance = {
> .emit_job = emit_job_gen12_video,
> };
>
> @@ -542,20 +564,47 @@ static const struct xe_ring_ops ring_ops_gen12_render_compute = {
> .emit_job = emit_job_gen12_render_compute,
> };
>
> +static const struct xe_ring_ops auxccs_ring_ops_gen12_video_decode = {
> + .emit_job = emit_job_gen12_video,
> + .emit_aux_table_inv = emit_aux_table_inv_video_decode,
> +};
> +
> +static const struct xe_ring_ops auxccs_ring_ops_gen12_video_enhance = {
> + .emit_job = emit_job_gen12_video,
> + .emit_aux_table_inv = emit_aux_table_inv_video_enhance,
> +};
> +
> +static const struct xe_ring_ops auxccs_ring_ops_gen12_render_compute = {
> + .emit_job = emit_job_gen12_render_compute,
> + .emit_aux_table_inv = emit_aux_table_inv_render_compute,
> +};
> +
> const struct xe_ring_ops *
> xe_ring_ops_get(struct xe_gt *gt, enum xe_engine_class class)
> {
> + struct xe_device *xe = gt_to_xe(gt);
> +
> switch (class) {
> case XE_ENGINE_CLASS_OTHER:
> return &ring_ops_gen12_gsc;
> case XE_ENGINE_CLASS_COPY:
> return &ring_ops_gen12_copy;
> case XE_ENGINE_CLASS_VIDEO_DECODE:
> + if (has_aux_ccs(xe))
> + return &auxccs_ring_ops_gen12_video_decode;
> + else
> + return &ring_ops_gen12_video_decode;
> case XE_ENGINE_CLASS_VIDEO_ENHANCE:
> - return &ring_ops_gen12_video;
> + if (has_aux_ccs(xe))
> + return &auxccs_ring_ops_gen12_video_enhance;
> + else
> + return &ring_ops_gen12_video_enhance;
> case XE_ENGINE_CLASS_RENDER:
> case XE_ENGINE_CLASS_COMPUTE:
> - return &ring_ops_gen12_render_compute;
> + if (has_aux_ccs(xe))
> + return &auxccs_ring_ops_gen12_render_compute;
> + else
> + return &ring_ops_gen12_render_compute;
> default:
> return NULL;
> }
> diff --git a/drivers/gpu/drm/xe/xe_ring_ops_types.h b/drivers/gpu/drm/xe/xe_ring_ops_types.h
> index 1197fc0bf2af..52ff96bc4100 100644
> --- a/drivers/gpu/drm/xe/xe_ring_ops_types.h
> +++ b/drivers/gpu/drm/xe/xe_ring_ops_types.h
> @@ -6,6 +6,9 @@
> #ifndef _XE_RING_OPS_TYPES_H_
> #define _XE_RING_OPS_TYPES_H_
>
> +#include <linux/types.h>
> +
> +struct xe_gt;
> struct xe_sched_job;
>
> #define MAX_JOB_SIZE_DW 74
> @@ -17,6 +20,9 @@ struct xe_sched_job;
> struct xe_ring_ops {
> /** @emit_job: Write job to ring */
> void (*emit_job)(struct xe_sched_job *job);
> +
> + /** @emit_aux_table_inv: Emit aux table invalidation to the ring */
> + u32 *(*emit_aux_table_inv)(struct xe_gt *gt, u32 *cmd);
> };
>
> #endif
> --
> 2.52.0
>
* Re: [PATCH v21 8/9] drm/xe/display: Add support for AuxCCS
2026-03-10 12:34 ` [PATCH v21 8/9] drm/xe/display: Add support for AuxCCS Tvrtko Ursulin
@ 2026-03-11 22:24 ` Rodrigo Vivi
2026-03-12 11:58 ` Tvrtko Ursulin
0 siblings, 1 reply; 18+ messages in thread
From: Rodrigo Vivi @ 2026-03-11 22:24 UTC (permalink / raw)
To: Tvrtko Ursulin, Maarten Lankhorst, Thomas Hellström
Cc: intel-xe, kernel-dev, Juha-Pekka Heikkila, Michael J. Ruhl
On Tue, Mar 10, 2026 at 12:34:52PM +0000, Tvrtko Ursulin wrote:
> Add support for mapping the auxiliary CCS buffer into the DPT page tables.
>
> This will allow for better power efficiency by enabling the render
> compression frame buffer modifiers such as
> I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS in a following patch.
>
> We do this by refactoring the code a bit so handling for the linear
> auxiliary frame buffer can be added in a tidy way. Also replace some
> hardcoded constants.
>
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
> Cc: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
> Cc: Michael J. Ruhl <michael.j.ruhl@intel.com>
> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
I hope one of them can review this patch.
To me, it would be easier if we could split this patch into smaller,
easier-to-follow patches.
> ---
> drivers/gpu/drm/xe/display/xe_fb_pin.c | 111 ++++++++++++++++++-------
> 1 file changed, 81 insertions(+), 30 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/display/xe_fb_pin.c b/drivers/gpu/drm/xe/display/xe_fb_pin.c
> index df7d305c6fcd..fe4b66e5c3ad 100644
> --- a/drivers/gpu/drm/xe/display/xe_fb_pin.c
> +++ b/drivers/gpu/drm/xe/display/xe_fb_pin.c
> @@ -49,33 +49,94 @@ write_dpt_rotated(struct xe_bo *bo, struct iosys_map *map, u32 *dpt_ofs, u32 bo_
> *dpt_ofs = ALIGN(*dpt_ofs, 4096);
> }
>
> +static unsigned int
> +write_dpt_padding(struct iosys_map *map, unsigned int dest, unsigned int pad)
> +{
> + /* The DE ignores the PTEs for the padding tiles */
> + return dest + pad * sizeof(u64);
> +}
> +
> +static unsigned int
> +write_dpt_remapped_linear(struct xe_bo *bo, struct iosys_map *map,
> + unsigned int dest,
> + const struct intel_remapped_plane_info *plane)
> +{
> + struct xe_device *xe = xe_bo_device(bo);
> + struct xe_ggtt *ggtt = xe_device_get_root_tile(xe)->mem.ggtt;
> + const u64 pte = xe_ggtt_encode_pte_flags(ggtt, bo,
> + xe->pat.idx[XE_CACHE_NONE]);
> + unsigned int offset = plane->offset * XE_PAGE_SIZE;
> + unsigned int size = plane->size;
> +
> + while (size--) {
> + u64 addr = xe_bo_addr(bo, offset, XE_PAGE_SIZE);
> +
> + iosys_map_wr(map, dest, u64, addr | pte);
> + dest += sizeof(u64);
> + offset += XE_PAGE_SIZE;
> + }
> +
> + return dest;
> +}
> +
> +static unsigned int
> +write_dpt_remapped_tiled(struct xe_bo *bo, struct iosys_map *map,
> + unsigned int dest,
> + const struct intel_remapped_plane_info *plane)
> +{
> + struct xe_device *xe = xe_bo_device(bo);
> + struct xe_ggtt *ggtt = xe_device_get_root_tile(xe)->mem.ggtt;
> + const u64 pte = xe_ggtt_encode_pte_flags(ggtt, bo,
> + xe->pat.idx[XE_CACHE_NONE]);
> + unsigned int offset, column, row;
> +
> + for (row = 0; row < plane->height; row++) {
> + offset = (plane->offset + plane->src_stride * row) *
> + XE_PAGE_SIZE;
> +
> + for (column = 0; column < plane->width; column++) {
> + u64 addr = xe_bo_addr(bo, offset, XE_PAGE_SIZE);
> +
> + iosys_map_wr(map, dest, u64, addr | pte);
> + dest += sizeof(u64);
> + offset += XE_PAGE_SIZE;
> + }
> +
> + dest = write_dpt_padding(map, dest,
> + plane->dst_stride - plane->width);
> + }
> +
> + return dest;
> +}
> +
> static void
> -write_dpt_remapped(struct xe_bo *bo, struct iosys_map *map, u32 *dpt_ofs,
> - u32 bo_ofs, u32 width, u32 height, u32 src_stride,
> - u32 dst_stride)
> +write_dpt_remapped(struct xe_bo *bo,
> + const struct intel_remapped_info *remap_info,
> + struct iosys_map *map)
> {
> - struct xe_device *xe = xe_bo_device(bo);
> - struct xe_ggtt *ggtt = xe_device_get_root_tile(xe)->mem.ggtt;
> - u32 column, row;
> - u64 pte = xe_ggtt_encode_pte_flags(ggtt, bo, xe->pat.idx[XE_CACHE_NONE]);
> + unsigned int i, dest = 0;
>
> - for (row = 0; row < height; row++) {
> - u32 src_idx = src_stride * row + bo_ofs;
> + for (i = 0; i < ARRAY_SIZE(remap_info->plane); i++) {
> + const struct intel_remapped_plane_info *plane =
> + &remap_info->plane[i];
>
> - for (column = 0; column < width; column++) {
> - u64 addr = xe_bo_addr(bo, src_idx * XE_PAGE_SIZE, XE_PAGE_SIZE);
> - iosys_map_wr(map, *dpt_ofs, u64, pte | addr);
> + if (!plane->width && !plane->height && !plane->linear)
> + continue;
>
> - *dpt_ofs += 8;
> - src_idx++;
> + if (remap_info->plane_alignment) {
> + const unsigned int index = dest / sizeof(u64);
> + const unsigned int pad =
> + ALIGN(index, remap_info->plane_alignment) -
> + index;
> +
> + dest = write_dpt_padding(map, dest, pad);
> }
>
> - /* The DE ignores the PTEs for the padding tiles */
> - *dpt_ofs += (dst_stride - width) * 8;
> + if (plane->linear)
> + dest = write_dpt_remapped_linear(bo, map, dest, plane);
> + else
> + dest = write_dpt_remapped_tiled(bo, map, dest, plane);
> }
> -
> - /* Align to next page */
> - *dpt_ofs = ALIGN(*dpt_ofs, 4096);
> }
>
> static int __xe_pin_fb_vma_dpt(const struct intel_framebuffer *fb,
> @@ -138,17 +199,7 @@ static int __xe_pin_fb_vma_dpt(const struct intel_framebuffer *fb,
> iosys_map_wr(&dpt->vmap, x * 8, u64, pte | addr);
> }
> } else if (view->type == I915_GTT_VIEW_REMAPPED) {
> - const struct intel_remapped_info *remap_info = &view->remapped;
> - u32 i, dpt_ofs = 0;
> -
> - for (i = 0; i < ARRAY_SIZE(remap_info->plane); i++)
> - write_dpt_remapped(bo, &dpt->vmap, &dpt_ofs,
> - remap_info->plane[i].offset,
> - remap_info->plane[i].width,
> - remap_info->plane[i].height,
> - remap_info->plane[i].src_stride,
> - remap_info->plane[i].dst_stride);
> -
> + write_dpt_remapped(bo, &view->remapped, &dpt->vmap);
> } else {
> const struct intel_rotation_info *rot_info = &view->rotated;
> u32 i, dpt_ofs = 0;
> --
> 2.52.0
>
* Re: [PATCH v21 8/9] drm/xe/display: Add support for AuxCCS
2026-03-11 22:24 ` Rodrigo Vivi
@ 2026-03-12 11:58 ` Tvrtko Ursulin
2026-03-16 12:04 ` Shankar, Uma
0 siblings, 1 reply; 18+ messages in thread
From: Tvrtko Ursulin @ 2026-03-12 11:58 UTC (permalink / raw)
To: Rodrigo Vivi, Maarten Lankhorst, Thomas Hellström
Cc: intel-xe, kernel-dev, Juha-Pekka Heikkila, Michael J. Ruhl
On 11/03/2026 22:24, Rodrigo Vivi wrote:
> On Tue, Mar 10, 2026 at 12:34:52PM +0000, Tvrtko Ursulin wrote:
>> Add support for mapping the auxiliary CCS buffer into the DPT page tables.
>>
>> This will allow for better power efficiency by enabling the render
>> compression frame buffer modifiers such as
>> I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS in a following patch.
>>
>> We do this by refactoring the code a bit so handling for the linear
>> auxiliary frame buffer can be added in a tidy way. Also replace some
>> hardcoded constants.
>>
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
>> Cc: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
>> Cc: Michael J. Ruhl <michael.j.ruhl@intel.com>
>> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
>
> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
>
> I hope one of them can review this patch.
>
> To me, it would be easier if we could split this patch into smaller,
> easier-to-follow patches.
I didn't have any immediate nice ideas on how. Maybe it helps if I paste
how the code looks after the patch?
It is like this:
static unsigned int
write_dpt_padding(struct iosys_map *map, unsigned int dest, unsigned int pad)
{
	/* The DE ignores the PTEs for the padding tiles */
	return dest + pad * sizeof(u64);
}

static unsigned int
write_dpt_remapped_linear(struct xe_bo *bo, struct iosys_map *map,
			  unsigned int dest,
			  const struct intel_remapped_plane_info *plane)
{
	struct xe_device *xe = xe_bo_device(bo);
	struct xe_ggtt *ggtt = xe_device_get_root_tile(xe)->mem.ggtt;
	const u64 pte = xe_ggtt_encode_pte_flags(ggtt, bo,
						 xe->pat.idx[XE_CACHE_NONE]);
	unsigned int offset = plane->offset * XE_PAGE_SIZE;
	unsigned int size = plane->size;

	while (size--) {
		u64 addr = xe_bo_addr(bo, offset, XE_PAGE_SIZE);

		iosys_map_wr(map, dest, u64, addr | pte);
		dest += sizeof(u64);
		offset += XE_PAGE_SIZE;
	}

	return dest;
}

static unsigned int
write_dpt_remapped_tiled(struct xe_bo *bo, struct iosys_map *map,
			 unsigned int dest,
			 const struct intel_remapped_plane_info *plane)
{
	struct xe_device *xe = xe_bo_device(bo);
	struct xe_ggtt *ggtt = xe_device_get_root_tile(xe)->mem.ggtt;
	const u64 pte = xe_ggtt_encode_pte_flags(ggtt, bo,
						 xe->pat.idx[XE_CACHE_NONE]);
	unsigned int offset, column, row;

	for (row = 0; row < plane->height; row++) {
		offset = (plane->offset + plane->src_stride * row) *
			 XE_PAGE_SIZE;

		for (column = 0; column < plane->width; column++) {
			u64 addr = xe_bo_addr(bo, offset, XE_PAGE_SIZE);

			iosys_map_wr(map, dest, u64, addr | pte);
			dest += sizeof(u64);
			offset += XE_PAGE_SIZE;
		}

		dest = write_dpt_padding(map, dest,
					 plane->dst_stride - plane->width);
	}

	return dest;
}

static void
write_dpt_remapped(struct xe_bo *bo,
		   const struct intel_remapped_info *remap_info,
		   struct iosys_map *map)
{
	unsigned int i, dest = 0;

	for (i = 0; i < ARRAY_SIZE(remap_info->plane); i++) {
		const struct intel_remapped_plane_info *plane =
			&remap_info->plane[i];

		if (!plane->width && !plane->height && !plane->linear)
			continue;

		if (remap_info->plane_alignment) {
			const unsigned int index = dest / sizeof(u64);
			const unsigned int pad =
				ALIGN(index, remap_info->plane_alignment) -
				index;

			dest = write_dpt_padding(map, dest, pad);
		}

		if (plane->linear)
			dest = write_dpt_remapped_linear(bo, map, dest, plane);
		else
			dest = write_dpt_remapped_tiled(bo, map, dest, plane);
	}
}
Essentially it moves the plane loop into a helper and adds support for
the aux/linear plane, while trying to give the lower-level helpers
consistent function signatures.
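The alignment padding step in write_dpt_remapped() can also be looked at
in isolation. A minimal userspace sketch, assuming the plane alignment is a
power of two counted in PTEs (which is how the code above treats it);
align_up() and dpt_pad_entries() are hypothetical stand-ins for
illustration, not helpers from the patch:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's ALIGN(); 'a' must be a power of two. */
static unsigned int align_up(unsigned int x, unsigned int a)
{
	assert(a && !(a & (a - 1)));

	return (x + a - 1) & ~(a - 1);
}

/*
 * Number of padding PTEs to skip before a plane whose first PTE would
 * otherwise land at byte offset 'dest' in the DPT, so that its PTE index
 * becomes a multiple of 'alignment'. Zero alignment means no padding.
 */
static unsigned int dpt_pad_entries(unsigned int dest, unsigned int alignment)
{
	const unsigned int index = dest / sizeof(uint64_t);

	if (!alignment)
		return 0;

	return align_up(index, alignment) - index;
}
```

For example, with an alignment of 4 PTEs and dest at byte 24 (PTE index 3),
a single padding entry is skipped so the next plane starts at index 4.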
If the paste does not help I can revisit and try to somehow split it. It
has been more than a year since I originally wrote this, so some details
have evaporated from my head by now.
Regards,
Tvrtko
>> ---
>> drivers/gpu/drm/xe/display/xe_fb_pin.c | 111 ++++++++++++++++++-------
>> 1 file changed, 81 insertions(+), 30 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/xe/display/xe_fb_pin.c b/drivers/gpu/drm/xe/display/xe_fb_pin.c
>> index df7d305c6fcd..fe4b66e5c3ad 100644
>> --- a/drivers/gpu/drm/xe/display/xe_fb_pin.c
>> +++ b/drivers/gpu/drm/xe/display/xe_fb_pin.c
>> @@ -49,33 +49,94 @@ write_dpt_rotated(struct xe_bo *bo, struct iosys_map *map, u32 *dpt_ofs, u32 bo_
>> *dpt_ofs = ALIGN(*dpt_ofs, 4096);
>> }
>>
>> +static unsigned int
>> +write_dpt_padding(struct iosys_map *map, unsigned int dest, unsigned int pad)
>> +{
>> + /* The DE ignores the PTEs for the padding tiles */
>> + return dest + pad * sizeof(u64);
>> +}
>> +
>> +static unsigned int
>> +write_dpt_remapped_linear(struct xe_bo *bo, struct iosys_map *map,
>> + unsigned int dest,
>> + const struct intel_remapped_plane_info *plane)
>> +{
>> + struct xe_device *xe = xe_bo_device(bo);
>> + struct xe_ggtt *ggtt = xe_device_get_root_tile(xe)->mem.ggtt;
>> + const u64 pte = xe_ggtt_encode_pte_flags(ggtt, bo,
>> + xe->pat.idx[XE_CACHE_NONE]);
>> + unsigned int offset = plane->offset * XE_PAGE_SIZE;
>> + unsigned int size = plane->size;
>> +
>> + while (size--) {
>> + u64 addr = xe_bo_addr(bo, offset, XE_PAGE_SIZE);
>> +
>> + iosys_map_wr(map, dest, u64, addr | pte);
>> + dest += sizeof(u64);
>> + offset += XE_PAGE_SIZE;
>> + }
>> +
>> + return dest;
>> +}
>> +
>> +static unsigned int
>> +write_dpt_remapped_tiled(struct xe_bo *bo, struct iosys_map *map,
>> + unsigned int dest,
>> + const struct intel_remapped_plane_info *plane)
>> +{
>> + struct xe_device *xe = xe_bo_device(bo);
>> + struct xe_ggtt *ggtt = xe_device_get_root_tile(xe)->mem.ggtt;
>> + const u64 pte = xe_ggtt_encode_pte_flags(ggtt, bo,
>> + xe->pat.idx[XE_CACHE_NONE]);
>> + unsigned int offset, column, row;
>> +
>> + for (row = 0; row < plane->height; row++) {
>> + offset = (plane->offset + plane->src_stride * row) *
>> + XE_PAGE_SIZE;
>> +
>> + for (column = 0; column < plane->width; column++) {
>> + u64 addr = xe_bo_addr(bo, offset, XE_PAGE_SIZE);
>> +
>> + iosys_map_wr(map, dest, u64, addr | pte);
>> + dest += sizeof(u64);
>> + offset += XE_PAGE_SIZE;
>> + }
>> +
>> + dest = write_dpt_padding(map, dest,
>> + plane->dst_stride - plane->width);
>> + }
>> +
>> + return dest;
>> +}
>> +
>> static void
>> -write_dpt_remapped(struct xe_bo *bo, struct iosys_map *map, u32 *dpt_ofs,
>> - u32 bo_ofs, u32 width, u32 height, u32 src_stride,
>> - u32 dst_stride)
>> +write_dpt_remapped(struct xe_bo *bo,
>> + const struct intel_remapped_info *remap_info,
>> + struct iosys_map *map)
>> {
>> - struct xe_device *xe = xe_bo_device(bo);
>> - struct xe_ggtt *ggtt = xe_device_get_root_tile(xe)->mem.ggtt;
>> - u32 column, row;
>> - u64 pte = xe_ggtt_encode_pte_flags(ggtt, bo, xe->pat.idx[XE_CACHE_NONE]);
>> + unsigned int i, dest = 0;
>>
>> - for (row = 0; row < height; row++) {
>> - u32 src_idx = src_stride * row + bo_ofs;
>> + for (i = 0; i < ARRAY_SIZE(remap_info->plane); i++) {
>> + const struct intel_remapped_plane_info *plane =
>> + &remap_info->plane[i];
>>
>> - for (column = 0; column < width; column++) {
>> - u64 addr = xe_bo_addr(bo, src_idx * XE_PAGE_SIZE, XE_PAGE_SIZE);
>> - iosys_map_wr(map, *dpt_ofs, u64, pte | addr);
>> + if (!plane->width && !plane->height && !plane->linear)
>> + continue;
>>
>> - *dpt_ofs += 8;
>> - src_idx++;
>> + if (remap_info->plane_alignment) {
>> + const unsigned int index = dest / sizeof(u64);
>> + const unsigned int pad =
>> + ALIGN(index, remap_info->plane_alignment) -
>> + index;
>> +
>> + dest = write_dpt_padding(map, dest, pad);
>> }
>>
>> - /* The DE ignores the PTEs for the padding tiles */
>> - *dpt_ofs += (dst_stride - width) * 8;
>> + if (plane->linear)
>> + dest = write_dpt_remapped_linear(bo, map, dest, plane);
>> + else
>> + dest = write_dpt_remapped_tiled(bo, map, dest, plane);
>> }
>> -
>> - /* Align to next page */
>> - *dpt_ofs = ALIGN(*dpt_ofs, 4096);
>> }
>>
>> static int __xe_pin_fb_vma_dpt(const struct intel_framebuffer *fb,
>> @@ -138,17 +199,7 @@ static int __xe_pin_fb_vma_dpt(const struct intel_framebuffer *fb,
>> iosys_map_wr(&dpt->vmap, x * 8, u64, pte | addr);
>> }
>> } else if (view->type == I915_GTT_VIEW_REMAPPED) {
>> - const struct intel_remapped_info *remap_info = &view->remapped;
>> - u32 i, dpt_ofs = 0;
>> -
>> - for (i = 0; i < ARRAY_SIZE(remap_info->plane); i++)
>> - write_dpt_remapped(bo, &dpt->vmap, &dpt_ofs,
>> - remap_info->plane[i].offset,
>> - remap_info->plane[i].width,
>> - remap_info->plane[i].height,
>> - remap_info->plane[i].src_stride,
>> - remap_info->plane[i].dst_stride);
>> -
>> + write_dpt_remapped(bo, &view->remapped, &dpt->vmap);
>> } else {
>> const struct intel_rotation_info *rot_info = &view->rotated;
>> u32 i, dpt_ofs = 0;
>> --
>> 2.52.0
>>
^ permalink raw reply [flat|nested] 18+ messages in thread
* RE: [PATCH v21 8/9] drm/xe/display: Add support for AuxCCS
2026-03-12 11:58 ` Tvrtko Ursulin
@ 2026-03-16 12:04 ` Shankar, Uma
2026-03-16 12:43 ` Tvrtko Ursulin
0 siblings, 1 reply; 18+ messages in thread
From: Shankar, Uma @ 2026-03-16 12:04 UTC (permalink / raw)
To: Tvrtko Ursulin, Vivi, Rodrigo, Maarten Lankhorst,
Thomas Hellström
Cc: intel-xe@lists.freedesktop.org, kernel-dev@igalia.com,
Juha-Pekka Heikkila, Ruhl, Michael J
> -----Original Message-----
> From: Intel-xe <intel-xe-bounces@lists.freedesktop.org> On Behalf Of Tvrtko
> Ursulin
> Sent: Thursday, March 12, 2026 5:29 PM
> To: Vivi, Rodrigo <rodrigo.vivi@intel.com>; Maarten Lankhorst
> <maarten.lankhorst@linux.intel.com>; Thomas Hellström
> <thomas.hellstrom@linux.intel.com>
> Cc: intel-xe@lists.freedesktop.org; kernel-dev@igalia.com; Juha-Pekka Heikkila
> <juhapekka.heikkila@gmail.com>; Ruhl, Michael J <michael.j.ruhl@intel.com>
> Subject: Re: [PATCH v21 8/9] drm/xe/display: Add support for AuxCCS
>
>
> On 11/03/2026 22:24, Rodrigo Vivi wrote:
> > On Tue, Mar 10, 2026 at 12:34:52PM +0000, Tvrtko Ursulin wrote:
> >> Add support for mapping the auxiliary CCS buffer into the DPT page tables.
> >>
> >> This will allow for better power efficiency by enabling the render
> >> compression frame buffer modifiers such as
> >> I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS in a following patch.
> >>
> >> We do this by refactoring the code a bit so handling for the linear
> >> auxiliary frame buffer can be added in a tidy way. Also replace some
> >> hardcoded constants.
> >>
> >> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
> >> Cc: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
> >> Cc: Michael J. Ruhl <michael.j.ruhl@intel.com>
> >> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
> >
> > Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> >
> > I hope one of them can review this patch.
> >
> > To me, it would be easier if we could split this patch into smaller
> > easier to follow patches.
>
> I didn't have any immediate nice ideas on how. Maybe it helps if I paste how the
> code looks after the patch?
Hi Tvrtko,
Change looks good in general, some minor comments:
> It is like this:
>
> static unsigned int
> write_dpt_padding(struct iosys_map *map, unsigned int dest, unsigned int
> pad)
> {
> /* The DE ignores the PTEs for the padding tiles */
> return dest + pad * sizeof(u64);
> }
>
> static unsigned int
> write_dpt_remapped_linear(struct xe_bo *bo, struct iosys_map *map,
> unsigned int dest,
> const struct intel_remapped_plane_info *plane) {
> struct xe_device *xe = xe_bo_device(bo);
> struct xe_ggtt *ggtt = xe_device_get_root_tile(xe)->mem.ggtt;
> const u64 pte = xe_ggtt_encode_pte_flags(ggtt, bo,
> xe->pat.idx[XE_CACHE_NONE]);
> unsigned int offset = plane->offset * XE_PAGE_SIZE;
I think we should account for offsets greater than 4GB here, where there is potential for overflow.
> unsigned int size = plane->size;
>
> while (size--) {
> u64 addr = xe_bo_addr(bo, offset, XE_PAGE_SIZE);
>
> iosys_map_wr(map, dest, u64, addr | pte);
> dest += sizeof(u64);
> offset += XE_PAGE_SIZE;
> }
>
> return dest;
> }
>
> static unsigned int
> write_dpt_remapped_tiled(struct xe_bo *bo, struct iosys_map *map,
> unsigned int dest,
> const struct intel_remapped_plane_info *plane) {
> struct xe_device *xe = xe_bo_device(bo);
> struct xe_ggtt *ggtt = xe_device_get_root_tile(xe)->mem.ggtt;
> const u64 pte = xe_ggtt_encode_pte_flags(ggtt, bo,
> xe->pat.idx[XE_CACHE_NONE]);
> unsigned int offset, column, row;
>
> for (row = 0; row < plane->height; row++) {
> offset = (plane->offset + plane->src_stride * row) *
> XE_PAGE_SIZE;
Here as well.
>
> for (column = 0; column < plane->width; column++) {
> u64 addr = xe_bo_addr(bo, offset, XE_PAGE_SIZE);
>
> iosys_map_wr(map, dest, u64, addr | pte);
> dest += sizeof(u64);
> offset += XE_PAGE_SIZE;
> }
>
> dest = write_dpt_padding(map, dest,
> plane->dst_stride - plane->width);
> }
>
> return dest;
> }
>
> static void
> write_dpt_remapped(struct xe_bo *bo,
> const struct intel_remapped_info *remap_info,
> struct iosys_map *map)
> {
> unsigned int i, dest = 0;
>
> for (i = 0; i < ARRAY_SIZE(remap_info->plane); i++) {
> const struct intel_remapped_plane_info *plane =
> &remap_info->plane[i];
>
> if (!plane->width && !plane->height && !plane->linear)
> continue;
>
> if (remap_info->plane_alignment) {
> const unsigned int index = dest / sizeof(u64);
> const unsigned int pad =
> ALIGN(index, remap_info->plane_alignment) -
> index;
>
> dest = write_dpt_padding(map, dest, pad);
> }
>
> if (plane->linear)
> dest = write_dpt_remapped_linear(bo, map, dest, plane);
> else
> dest = write_dpt_remapped_tiled(bo, map, dest, plane);
> }
> }
>
> Essentially moving the plane loop to a helper and adding support for aux/linear
> plane, while attempting to have the lower level helpers with consistent function
> signatures.
>
> If the paste does not help I can revisit and try to somehow split it. It has been
> more than a year that I originally wrote this so some details have evaporated from
> my head by now.
>
> Regards,
>
> Tvrtko
>
> >> ---
> >> drivers/gpu/drm/xe/display/xe_fb_pin.c | 111 ++++++++++++++++++-------
> >> 1 file changed, 81 insertions(+), 30 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/xe/display/xe_fb_pin.c
> >> b/drivers/gpu/drm/xe/display/xe_fb_pin.c
> >> index df7d305c6fcd..fe4b66e5c3ad 100644
> >> --- a/drivers/gpu/drm/xe/display/xe_fb_pin.c
> >> +++ b/drivers/gpu/drm/xe/display/xe_fb_pin.c
> >> @@ -49,33 +49,94 @@ write_dpt_rotated(struct xe_bo *bo, struct iosys_map
> *map, u32 *dpt_ofs, u32 bo_
> >> *dpt_ofs = ALIGN(*dpt_ofs, 4096);
> >> }
> >>
> >> +static unsigned int
> >> +write_dpt_padding(struct iosys_map *map, unsigned int dest, unsigned
> >> +int pad) {
> >> + /* The DE ignores the PTEs for the padding tiles */
> >> + return dest + pad * sizeof(u64);
> >> +}
> >> +
> >> +static unsigned int
> >> +write_dpt_remapped_linear(struct xe_bo *bo, struct iosys_map *map,
> >> + unsigned int dest,
> >> + const struct intel_remapped_plane_info *plane) {
> >> + struct xe_device *xe = xe_bo_device(bo);
> >> + struct xe_ggtt *ggtt = xe_device_get_root_tile(xe)->mem.ggtt;
> >> + const u64 pte = xe_ggtt_encode_pte_flags(ggtt, bo,
> >> + xe->pat.idx[XE_CACHE_NONE]);
> >> + unsigned int offset = plane->offset * XE_PAGE_SIZE;
> >> + unsigned int size = plane->size;
> >> +
> >> + while (size--) {
> >> + u64 addr = xe_bo_addr(bo, offset, XE_PAGE_SIZE);
> >> +
> >> + iosys_map_wr(map, dest, u64, addr | pte);
> >> + dest += sizeof(u64);
> >> + offset += XE_PAGE_SIZE;
> >> + }
> >> +
> >> + return dest;
> >> +}
> >> +
> >> +static unsigned int
> >> +write_dpt_remapped_tiled(struct xe_bo *bo, struct iosys_map *map,
> >> + unsigned int dest,
> >> + const struct intel_remapped_plane_info *plane) {
> >> + struct xe_device *xe = xe_bo_device(bo);
> >> + struct xe_ggtt *ggtt = xe_device_get_root_tile(xe)->mem.ggtt;
> >> + const u64 pte = xe_ggtt_encode_pte_flags(ggtt, bo,
> >> + xe->pat.idx[XE_CACHE_NONE]);
> >> + unsigned int offset, column, row;
> >> +
> >> + for (row = 0; row < plane->height; row++) {
> >> + offset = (plane->offset + plane->src_stride * row) *
> >> + XE_PAGE_SIZE;
> >> +
> >> + for (column = 0; column < plane->width; column++) {
> >> + u64 addr = xe_bo_addr(bo, offset, XE_PAGE_SIZE);
> >> +
> >> + iosys_map_wr(map, dest, u64, addr | pte);
> >> + dest += sizeof(u64);
> >> + offset += XE_PAGE_SIZE;
> >> + }
> >> +
> >> + dest = write_dpt_padding(map, dest,
> >> + plane->dst_stride - plane->width);
> >> + }
> >> +
> >> + return dest;
> >> +}
> >> +
> >> static void
> >> -write_dpt_remapped(struct xe_bo *bo, struct iosys_map *map, u32 *dpt_ofs,
> >> - u32 bo_ofs, u32 width, u32 height, u32 src_stride,
> >> - u32 dst_stride)
> >> +write_dpt_remapped(struct xe_bo *bo,
> >> + const struct intel_remapped_info *remap_info,
> >> + struct iosys_map *map)
> >> {
> >> - struct xe_device *xe = xe_bo_device(bo);
> >> - struct xe_ggtt *ggtt = xe_device_get_root_tile(xe)->mem.ggtt;
> >> - u32 column, row;
> >> - u64 pte = xe_ggtt_encode_pte_flags(ggtt, bo, xe-
> >pat.idx[XE_CACHE_NONE]);
> >> + unsigned int i, dest = 0;
> >>
> >> - for (row = 0; row < height; row++) {
> >> - u32 src_idx = src_stride * row + bo_ofs;
> >> + for (i = 0; i < ARRAY_SIZE(remap_info->plane); i++) {
> >> + const struct intel_remapped_plane_info *plane =
> >> + &remap_info->plane[i];
> >>
> >> - for (column = 0; column < width; column++) {
> >> - u64 addr = xe_bo_addr(bo, src_idx * XE_PAGE_SIZE,
> XE_PAGE_SIZE);
> >> - iosys_map_wr(map, *dpt_ofs, u64, pte | addr);
> >> + if (!plane->width && !plane->height && !plane->linear)
Since a linear plane has no explicit width and height and we go by size instead, I think it
would be good to flip the checks a bit:
if (!plane->linear && !plane->width && !plane->height)
Other than the above, the changes look good to me.
@juhapekka.heikkila@gmail.com Could you also check and ack?
Regards,
Uma Shankar
> >> + continue;
> >>
> >> - *dpt_ofs += 8;
> >> - src_idx++;
> >> + if (remap_info->plane_alignment) {
> >> + const unsigned int index = dest / sizeof(u64);
> >> + const unsigned int pad =
> >> + ALIGN(index, remap_info->plane_alignment) -
> >> + index;
> >> +
> >> + dest = write_dpt_padding(map, dest, pad);
> >> }
> >>
> >> - /* The DE ignores the PTEs for the padding tiles */
> >> - *dpt_ofs += (dst_stride - width) * 8;
> >> + if (plane->linear)
> >> + dest = write_dpt_remapped_linear(bo, map, dest, plane);
> >> + else
> >> + dest = write_dpt_remapped_tiled(bo, map, dest, plane);
> >> }
> >> -
> >> - /* Align to next page */
> >> - *dpt_ofs = ALIGN(*dpt_ofs, 4096);
> >> }
> >>
> >> static int __xe_pin_fb_vma_dpt(const struct intel_framebuffer *fb,
> >> @@ -138,17 +199,7 @@ static int __xe_pin_fb_vma_dpt(const struct
> intel_framebuffer *fb,
> >> iosys_map_wr(&dpt->vmap, x * 8, u64, pte | addr);
> >> }
> >> } else if (view->type == I915_GTT_VIEW_REMAPPED) {
> >> - const struct intel_remapped_info *remap_info = &view-
> >remapped;
> >> - u32 i, dpt_ofs = 0;
> >> -
> >> - for (i = 0; i < ARRAY_SIZE(remap_info->plane); i++)
> >> - write_dpt_remapped(bo, &dpt->vmap, &dpt_ofs,
> >> - remap_info->plane[i].offset,
> >> - remap_info->plane[i].width,
> >> - remap_info->plane[i].height,
> >> - remap_info->plane[i].src_stride,
> >> - remap_info->plane[i].dst_stride);
> >> -
> >> + write_dpt_remapped(bo, &view->remapped, &dpt->vmap);
> >> } else {
> >> const struct intel_rotation_info *rot_info = &view->rotated;
> >> u32 i, dpt_ofs = 0;
> >> --
> >> 2.52.0
> >>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v21 8/9] drm/xe/display: Add support for AuxCCS
2026-03-16 12:04 ` Shankar, Uma
@ 2026-03-16 12:43 ` Tvrtko Ursulin
2026-03-16 15:51 ` Shankar, Uma
0 siblings, 1 reply; 18+ messages in thread
From: Tvrtko Ursulin @ 2026-03-16 12:43 UTC (permalink / raw)
To: Shankar, Uma, Vivi, Rodrigo, Maarten Lankhorst,
Thomas Hellström
Cc: intel-xe@lists.freedesktop.org, kernel-dev@igalia.com,
Juha-Pekka Heikkila, Ruhl, Michael J
Hi Uma,
On 16/03/2026 12:04, Shankar, Uma wrote:
>
>
>> -----Original Message-----
>> From: Intel-xe <intel-xe-bounces@lists.freedesktop.org> On Behalf Of Tvrtko
>> Ursulin
>> Sent: Thursday, March 12, 2026 5:29 PM
>> To: Vivi, Rodrigo <rodrigo.vivi@intel.com>; Maarten Lankhorst
>> <maarten.lankhorst@linux.intel.com>; Thomas Hellström
>> <thomas.hellstrom@linux.intel.com>
>> Cc: intel-xe@lists.freedesktop.org; kernel-dev@igalia.com; Juha-Pekka Heikkila
>> <juhapekka.heikkila@gmail.com>; Ruhl, Michael J <michael.j.ruhl@intel.com>
>> Subject: Re: [PATCH v21 8/9] drm/xe/display: Add support for AuxCCS
>>
>>
>> On 11/03/2026 22:24, Rodrigo Vivi wrote:
>>> On Tue, Mar 10, 2026 at 12:34:52PM +0000, Tvrtko Ursulin wrote:
>>>> Add support for mapping the auxiliary CCS buffer into the DPT page tables.
>>>>
>>>> This will allow for better power efficiency by enabling the render
>>>> compression frame buffer modifiers such as
>>>> I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS in a following patch.
>>>>
>>>> We do this by refactoring the code a bit so handling for the linear
>>>> auxiliary frame buffer can be added in a tidy way. Also replace some
>>>> hardcoded constants.
>>>>
>>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
>>>> Cc: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
>>>> Cc: Michael J. Ruhl <michael.j.ruhl@intel.com>
>>>> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
>>>
>>> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>>> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
>>>
>>> I hope one of them can review this patch.
>>>
>>> To me, it would be easier if we could split this patch into smaller
>>> easier to follow patches.
>>
>> I didn't have any immediate nice ideas on how. Maybe it helps if I paste how the
>> code looks after the patch?
>
> Hi Tvrtko,
> Change looks good in general, some minor comments:
>
>> It is like this:
>>
>> static unsigned int
>> write_dpt_padding(struct iosys_map *map, unsigned int dest, unsigned int
>> pad)
>> {
>> /* The DE ignores the PTEs for the padding tiles */
>> return dest + pad * sizeof(u64);
>> }
>>
>> static unsigned int
>> write_dpt_remapped_linear(struct xe_bo *bo, struct iosys_map *map,
>> unsigned int dest,
>> const struct intel_remapped_plane_info *plane) {
>> struct xe_device *xe = xe_bo_device(bo);
>> struct xe_ggtt *ggtt = xe_device_get_root_tile(xe)->mem.ggtt;
>> const u64 pte = xe_ggtt_encode_pte_flags(ggtt, bo,
>> xe->pat.idx[XE_CACHE_NONE]);
>> unsigned int offset = plane->offset * XE_PAGE_SIZE;
>
> I think we should account for offsets greater than 4GB here, where there is potential for overflow.
That would be one huge aux plane, no? :) I can do it, no problem, just
keeping in mind even the current normal view does not handle >4GB
primary planes:
	u32 x;

	for (x = 0; x < size / XE_PAGE_SIZE; x++) {
		u64 addr = xe_bo_addr(bo, x * XE_PAGE_SIZE, XE_PAGE_SIZE);
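For illustration, a minimal userspace sketch of the wrap-around being
discussed, assuming a 4KiB page. The helper names page_offset_32() and
page_offset_64() are hypothetical and not from the driver. With a 32-bit
offset the multiplication wraps once the page index reaches 2^20 (4GiB into
the BO), while widening before the multiply keeps the full value:

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_4K 4096u /* assumed to match XE_PAGE_SIZE */

/* Multiply done in 32 bits, as with an unsigned int offset: can wrap. */
static uint32_t page_offset_32(uint32_t page_index)
{
	return page_index * PAGE_SIZE_4K;
}

/* Widen first, so the multiply is done in 64 bits and cannot wrap here. */
static uint64_t page_offset_64(uint32_t page_index)
{
	return (uint64_t)page_index * PAGE_SIZE_4K;
}
```

Both helpers agree for any page index below 2^20, so the difference would
only matter for planes placed beyond 4GiB in the BO.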
>
>> unsigned int size = plane->size;
>>
>> while (size--) {
>> u64 addr = xe_bo_addr(bo, offset, XE_PAGE_SIZE);
>>
>> iosys_map_wr(map, dest, u64, addr | pte);
>> dest += sizeof(u64);
>> offset += XE_PAGE_SIZE;
>> }
>>
>> return dest;
>> }
>>
>> static unsigned int
>> write_dpt_remapped_tiled(struct xe_bo *bo, struct iosys_map *map,
>> unsigned int dest,
>> const struct intel_remapped_plane_info *plane) {
>> struct xe_device *xe = xe_bo_device(bo);
>> struct xe_ggtt *ggtt = xe_device_get_root_tile(xe)->mem.ggtt;
>> const u64 pte = xe_ggtt_encode_pte_flags(ggtt, bo,
>> xe->pat.idx[XE_CACHE_NONE]);
>> unsigned int offset, column, row;
>>
>> for (row = 0; row < plane->height; row++) {
>> offset = (plane->offset + plane->src_stride * row) *
>> XE_PAGE_SIZE;
>
> Here as well.
Same thing, the current code does not handle it. I can change it, as
long as nobody objects.
>
>>
>> for (column = 0; column < plane->width; column++) {
>> u64 addr = xe_bo_addr(bo, offset, XE_PAGE_SIZE);
>>
>> iosys_map_wr(map, dest, u64, addr | pte);
>> dest += sizeof(u64);
>> offset += XE_PAGE_SIZE;
>> }
>>
>> dest = write_dpt_padding(map, dest,
>> plane->dst_stride - plane->width);
>> }
>>
>> return dest;
>> }
>>
>> static void
>> write_dpt_remapped(struct xe_bo *bo,
>> const struct intel_remapped_info *remap_info,
>> struct iosys_map *map)
>> {
>> unsigned int i, dest = 0;
>>
>> for (i = 0; i < ARRAY_SIZE(remap_info->plane); i++) {
>> const struct intel_remapped_plane_info *plane =
>> &remap_info->plane[i];
>>
>> if (!plane->width && !plane->height && !plane->linear)
>> continue;
>>
>> if (remap_info->plane_alignment) {
>> const unsigned int index = dest / sizeof(u64);
>> const unsigned int pad =
>> ALIGN(index, remap_info->plane_alignment) -
>> index;
>>
>> dest = write_dpt_padding(map, dest, pad);
>> }
>>
>> if (plane->linear)
>> dest = write_dpt_remapped_linear(bo, map, dest, plane);
>> else
>> dest = write_dpt_remapped_tiled(bo, map, dest, plane);
>> }
>> }
>>
>> Essentially moving the plane loop to a helper and adding support for aux/linear
>> plane, while attempting to have the lower level helpers with consistent function
>> signatures.
>>
>> If the paste does not help I can revisit and try to somehow split it. It has been
>> more than a year that I originally wrote this so some details have evaporated from
>> my head by now.
>>
>> Regards,
>>
>> Tvrtko
>>
>>>> ---
>>>> drivers/gpu/drm/xe/display/xe_fb_pin.c | 111 ++++++++++++++++++-------
>>>> 1 file changed, 81 insertions(+), 30 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/xe/display/xe_fb_pin.c
>>>> b/drivers/gpu/drm/xe/display/xe_fb_pin.c
>>>> index df7d305c6fcd..fe4b66e5c3ad 100644
>>>> --- a/drivers/gpu/drm/xe/display/xe_fb_pin.c
>>>> +++ b/drivers/gpu/drm/xe/display/xe_fb_pin.c
>>>> @@ -49,33 +49,94 @@ write_dpt_rotated(struct xe_bo *bo, struct iosys_map
>> *map, u32 *dpt_ofs, u32 bo_
>>>> *dpt_ofs = ALIGN(*dpt_ofs, 4096);
>>>> }
>>>>
>>>> +static unsigned int
>>>> +write_dpt_padding(struct iosys_map *map, unsigned int dest, unsigned
>>>> +int pad) {
>>>> + /* The DE ignores the PTEs for the padding tiles */
>>>> + return dest + pad * sizeof(u64);
>>>> +}
>>>> +
>>>> +static unsigned int
>>>> +write_dpt_remapped_linear(struct xe_bo *bo, struct iosys_map *map,
>>>> + unsigned int dest,
>>>> + const struct intel_remapped_plane_info *plane) {
>>>> + struct xe_device *xe = xe_bo_device(bo);
>>>> + struct xe_ggtt *ggtt = xe_device_get_root_tile(xe)->mem.ggtt;
>>>> + const u64 pte = xe_ggtt_encode_pte_flags(ggtt, bo,
>>>> + xe->pat.idx[XE_CACHE_NONE]);
>>>> + unsigned int offset = plane->offset * XE_PAGE_SIZE;
>>>> + unsigned int size = plane->size;
>>>> +
>>>> + while (size--) {
>>>> + u64 addr = xe_bo_addr(bo, offset, XE_PAGE_SIZE);
>>>> +
>>>> + iosys_map_wr(map, dest, u64, addr | pte);
>>>> + dest += sizeof(u64);
>>>> + offset += XE_PAGE_SIZE;
>>>> + }
>>>> +
>>>> + return dest;
>>>> +}
>>>> +
>>>> +static unsigned int
>>>> +write_dpt_remapped_tiled(struct xe_bo *bo, struct iosys_map *map,
>>>> + unsigned int dest,
>>>> + const struct intel_remapped_plane_info *plane) {
>>>> + struct xe_device *xe = xe_bo_device(bo);
>>>> + struct xe_ggtt *ggtt = xe_device_get_root_tile(xe)->mem.ggtt;
>>>> + const u64 pte = xe_ggtt_encode_pte_flags(ggtt, bo,
>>>> + xe->pat.idx[XE_CACHE_NONE]);
>>>> + unsigned int offset, column, row;
>>>> +
>>>> + for (row = 0; row < plane->height; row++) {
>>>> + offset = (plane->offset + plane->src_stride * row) *
>>>> + XE_PAGE_SIZE;
>>>> +
>>>> + for (column = 0; column < plane->width; column++) {
>>>> + u64 addr = xe_bo_addr(bo, offset, XE_PAGE_SIZE);
>>>> +
>>>> + iosys_map_wr(map, dest, u64, addr | pte);
>>>> + dest += sizeof(u64);
>>>> + offset += XE_PAGE_SIZE;
>>>> + }
>>>> +
>>>> + dest = write_dpt_padding(map, dest,
>>>> + plane->dst_stride - plane->width);
>>>> + }
>>>> +
>>>> + return dest;
>>>> +}
>>>> +
>>>> static void
>>>> -write_dpt_remapped(struct xe_bo *bo, struct iosys_map *map, u32 *dpt_ofs,
>>>> - u32 bo_ofs, u32 width, u32 height, u32 src_stride,
>>>> - u32 dst_stride)
>>>> +write_dpt_remapped(struct xe_bo *bo,
>>>> + const struct intel_remapped_info *remap_info,
>>>> + struct iosys_map *map)
>>>> {
>>>> - struct xe_device *xe = xe_bo_device(bo);
>>>> - struct xe_ggtt *ggtt = xe_device_get_root_tile(xe)->mem.ggtt;
>>>> - u32 column, row;
>>>> - u64 pte = xe_ggtt_encode_pte_flags(ggtt, bo, xe-
>>> pat.idx[XE_CACHE_NONE]);
>>>> + unsigned int i, dest = 0;
>>>>
>>>> - for (row = 0; row < height; row++) {
>>>> - u32 src_idx = src_stride * row + bo_ofs;
>>>> + for (i = 0; i < ARRAY_SIZE(remap_info->plane); i++) {
>>>> + const struct intel_remapped_plane_info *plane =
>>>> + &remap_info->plane[i];
>>>>
>>>> - for (column = 0; column < width; column++) {
>>>> - u64 addr = xe_bo_addr(bo, src_idx * XE_PAGE_SIZE,
>> XE_PAGE_SIZE);
>>>> - iosys_map_wr(map, *dpt_ofs, u64, pte | addr);
>>>> + if (!plane->width && !plane->height && !plane->linear)
>
> Since linear doesn't have width and height explicitly but we go with size, I think it would be
> good to flip the checks a bit
>
> if (!plane->linear && !plane->width && !plane->height)
Done locally.
Btw between last Thursday and now I went and started splitting this
patch up. Locally this patch is now four patches:
drm/xe/display: Move remapped plane loop out of __xe_pin_fb_vma_dpt
drm/xe/display: Change write_dpt_remapped_tiled function signature
drm/xe/display: Respect remapped plane alignment
drm/xe/display: Add support for AuxCCS
I can send this out now or wait for the verdict on the >4GB planes handling.
Regards,
Tvrtko
>
> Other than above, changes look good to me.
> @juhapekka.heikkila@gmail.com Can you also check and ack once.
>
> Regards,
> Uma Shankar
>
>>>> + continue;
>>>>
>>>> - *dpt_ofs += 8;
>>>> - src_idx++;
>>>> + if (remap_info->plane_alignment) {
>>>> + const unsigned int index = dest / sizeof(u64);
>>>> + const unsigned int pad =
>>>> + ALIGN(index, remap_info->plane_alignment) -
>>>> + index;
>>>> +
>>>> + dest = write_dpt_padding(map, dest, pad);
>>>> }
>>>>
>>>> - /* The DE ignores the PTEs for the padding tiles */
>>>> - *dpt_ofs += (dst_stride - width) * 8;
>>>> + if (plane->linear)
>>>> + dest = write_dpt_remapped_linear(bo, map, dest, plane);
>>>> + else
>>>> + dest = write_dpt_remapped_tiled(bo, map, dest, plane);
>>>> }
>>>> -
>>>> - /* Align to next page */
>>>> - *dpt_ofs = ALIGN(*dpt_ofs, 4096);
>>>> }
>>>>
>>>> static int __xe_pin_fb_vma_dpt(const struct intel_framebuffer *fb,
>>>> @@ -138,17 +199,7 @@ static int __xe_pin_fb_vma_dpt(const struct
>> intel_framebuffer *fb,
>>>> iosys_map_wr(&dpt->vmap, x * 8, u64, pte | addr);
>>>> }
>>>> } else if (view->type == I915_GTT_VIEW_REMAPPED) {
>>>> - const struct intel_remapped_info *remap_info = &view-
>>> remapped;
>>>> - u32 i, dpt_ofs = 0;
>>>> -
>>>> - for (i = 0; i < ARRAY_SIZE(remap_info->plane); i++)
>>>> - write_dpt_remapped(bo, &dpt->vmap, &dpt_ofs,
>>>> - remap_info->plane[i].offset,
>>>> - remap_info->plane[i].width,
>>>> - remap_info->plane[i].height,
>>>> - remap_info->plane[i].src_stride,
>>>> - remap_info->plane[i].dst_stride);
>>>> -
>>>> + write_dpt_remapped(bo, &view->remapped, &dpt->vmap);
>>>> } else {
>>>> const struct intel_rotation_info *rot_info = &view->rotated;
>>>> u32 i, dpt_ofs = 0;
>>>> --
>>>> 2.52.0
>>>>
>
^ permalink raw reply [flat|nested] 18+ messages in thread
* RE: [PATCH v21 8/9] drm/xe/display: Add support for AuxCCS
2026-03-16 12:43 ` Tvrtko Ursulin
@ 2026-03-16 15:51 ` Shankar, Uma
0 siblings, 0 replies; 18+ messages in thread
From: Shankar, Uma @ 2026-03-16 15:51 UTC (permalink / raw)
To: Tvrtko Ursulin, Vivi, Rodrigo, Maarten Lankhorst,
Thomas Hellström
Cc: intel-xe@lists.freedesktop.org, kernel-dev@igalia.com,
Juha-Pekka Heikkila, Ruhl, Michael J
> -----Original Message-----
> From: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
> Sent: Monday, March 16, 2026 6:13 PM
> To: Shankar, Uma <uma.shankar@intel.com>; Vivi, Rodrigo
> <rodrigo.vivi@intel.com>; Maarten Lankhorst
> <maarten.lankhorst@linux.intel.com>; Thomas Hellström
> <thomas.hellstrom@linux.intel.com>
> Cc: intel-xe@lists.freedesktop.org; kernel-dev@igalia.com; Juha-Pekka Heikkila
> <juhapekka.heikkila@gmail.com>; Ruhl, Michael J <michael.j.ruhl@intel.com>
> Subject: Re: [PATCH v21 8/9] drm/xe/display: Add support for AuxCCS
>
>
> Hi Uma,
>
> On 16/03/2026 12:04, Shankar, Uma wrote:
> >
> >
> >> -----Original Message-----
> >> From: Intel-xe <intel-xe-bounces@lists.freedesktop.org> On Behalf Of
> >> Tvrtko Ursulin
> >> Sent: Thursday, March 12, 2026 5:29 PM
> >> To: Vivi, Rodrigo <rodrigo.vivi@intel.com>; Maarten Lankhorst
> >> <maarten.lankhorst@linux.intel.com>; Thomas Hellström
> >> <thomas.hellstrom@linux.intel.com>
> >> Cc: intel-xe@lists.freedesktop.org; kernel-dev@igalia.com; Juha-Pekka
> >> Heikkila <juhapekka.heikkila@gmail.com>; Ruhl, Michael J
> >> <michael.j.ruhl@intel.com>
> >> Subject: Re: [PATCH v21 8/9] drm/xe/display: Add support for AuxCCS
> >>
> >>
> >> On 11/03/2026 22:24, Rodrigo Vivi wrote:
> >>> On Tue, Mar 10, 2026 at 12:34:52PM +0000, Tvrtko Ursulin wrote:
> >>>> Add support for mapping the auxiliary CCS buffer into the DPT page tables.
> >>>>
> >>>> This will allow for better power efficiency by enabling the render
> >>>> compression frame buffer modifiers such as
> >>>> I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS in a following patch.
> >>>>
> >>>> We do this by refactoring the code a bit so handling for the linear
> >>>> auxiliary frame buffer can be added in a tidy way. Also replace
> >>>> some hardcoded constants.
> >>>>
> >>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
> >>>> Cc: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
> >>>> Cc: Michael J. Ruhl <michael.j.ruhl@intel.com>
> >>>> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
> >>>
> >>> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> >>> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> >>>
> >>> I hope one of them can review this patch.
> >>>
> >>> To me, it would be easier if we could split this patch into smaller
> >>> easier to follow patches.
> >>
> >> I didn't have any immediate nice ideas on how. Maybe it helps if I
> >> paste how the code looks after the patch?
> >
> > Hi Tvrtko,
> > Change looks good in general, some minor comments:
> >
> >> It is like this:
> >>
> >> static unsigned int
> >> write_dpt_padding(struct iosys_map *map, unsigned int dest,
> >> unsigned int pad)
> >> {
> >> /* The DE ignores the PTEs for the padding tiles */
> >> return dest + pad * sizeof(u64);
> >> }
> >>
> >> static unsigned int
> >> write_dpt_remapped_linear(struct xe_bo *bo, struct iosys_map *map,
> >> unsigned int dest,
> >> const struct intel_remapped_plane_info *plane) {
> >> struct xe_device *xe = xe_bo_device(bo);
> >> struct xe_ggtt *ggtt = xe_device_get_root_tile(xe)->mem.ggtt;
> >> const u64 pte = xe_ggtt_encode_pte_flags(ggtt, bo,
> >> xe->pat.idx[XE_CACHE_NONE]);
> >> unsigned int offset = plane->offset * XE_PAGE_SIZE;
> >
> > I think we should consider planes greater than 4GB here, where there is
> > potential for overflow.
>
> That would be one huge aux plane, no? :) I can do it, no problem, just keeping in
> mind even the current normal view does not handle >4GB primary planes:
Yeah, I checked the actual limits and in practice this should not happen. 8K @ 32bpp
will be ~127MB. Taking 16K as the largest practical real-world case, we would still be
at ~500MB.
You can ignore it. Sorry for the noise.
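For reference, the sizing arithmetic above can be sketched as follows. This is a rough illustration only, not driver code: the 7680x4320 and 15360x8640 resolutions, the 4 KiB page size standing in for XE_PAGE_SIZE, and all helper names are assumptions made for the example.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES 4096u /* stand-in for XE_PAGE_SIZE, assumed 4 KiB */

/* Bytes needed for a width x height surface at cpp bytes per pixel. */
uint64_t fb_size_bytes(uint32_t width, uint32_t height, uint32_t cpp)
{
	return (uint64_t)width * height * cpp;
}

/* Number of 4 KiB pages covering the surface, rounded up. */
uint64_t fb_size_pages(uint32_t width, uint32_t height, uint32_t cpp)
{
	return (fb_size_bytes(width, height, cpp) + PAGE_SIZE_BYTES - 1) /
	       PAGE_SIZE_BYTES;
}

/* True if "page_index * page size" no longer fits in a 32-bit unsigned int. */
bool byte_offset_overflows_u32(uint64_t page_index)
{
	return page_index * PAGE_SIZE_BYTES > UINT32_MAX;
}
```

With these assumptions, fb_size_bytes(7680, 4320, 4) is 132710400 bytes (about 127 MiB) and fb_size_bytes(15360, 8640, 4) is 530841600 bytes (about 506 MiB), both far below the 1048576 pages at which a 32-bit byte offset would wrap.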
> u32 x;
>
> for (x = 0; x < size / XE_PAGE_SIZE; x++) {
> u64 addr = xe_bo_addr(bo, x * XE_PAGE_SIZE,
> XE_PAGE_SIZE);
>
> >
> >> unsigned int size = plane->size;
> >>
> >> while (size--) {
> >> u64 addr = xe_bo_addr(bo, offset, XE_PAGE_SIZE);
> >>
> >> iosys_map_wr(map, dest, u64, addr | pte);
> >> dest += sizeof(u64);
> >> offset += XE_PAGE_SIZE;
> >> }
> >>
> >> return dest;
> >> }
> >>
> >> static unsigned int
> >> write_dpt_remapped_tiled(struct xe_bo *bo, struct iosys_map *map,
> >> unsigned int dest,
> >> const struct intel_remapped_plane_info *plane) {
> >> struct xe_device *xe = xe_bo_device(bo);
> >> struct xe_ggtt *ggtt = xe_device_get_root_tile(xe)->mem.ggtt;
> >> const u64 pte = xe_ggtt_encode_pte_flags(ggtt, bo,
> >> xe->pat.idx[XE_CACHE_NONE]);
> >> unsigned int offset, column, row;
> >>
> >> for (row = 0; row < plane->height; row++) {
> >> offset = (plane->offset + plane->src_stride * row) *
> >> XE_PAGE_SIZE;
> >
> > Here as well.
>
> Same thing, the current code does not handle it. I can change it, as long as
> someone does not object to it.
We can skip this.
> >
> >>
> >> for (column = 0; column < plane->width; column++) {
> >> u64 addr = xe_bo_addr(bo, offset, XE_PAGE_SIZE);
> >>
> >> iosys_map_wr(map, dest, u64, addr | pte);
> >> dest += sizeof(u64);
> >> offset += XE_PAGE_SIZE;
> >> }
> >>
> >> dest = write_dpt_padding(map, dest,
> >> plane->dst_stride - plane->width);
> >> }
> >>
> >> return dest;
> >> }
> >>
> >> static void
> >> write_dpt_remapped(struct xe_bo *bo,
> >> const struct intel_remapped_info *remap_info,
> >> struct iosys_map *map)
> >> {
> >> unsigned int i, dest = 0;
> >>
> >> for (i = 0; i < ARRAY_SIZE(remap_info->plane); i++) {
> >> const struct intel_remapped_plane_info *plane =
> >> &remap_info->plane[i];
> >>
> >> if (!plane->width && !plane->height && !plane->linear)
> >> continue;
> >>
> >> if (remap_info->plane_alignment) {
> >> const unsigned int index = dest / sizeof(u64);
> >> const unsigned int pad =
> >> ALIGN(index, remap_info->plane_alignment) -
> >> index;
> >>
> >> dest = write_dpt_padding(map, dest, pad);
> >> }
> >>
> >> if (plane->linear)
> >> dest = write_dpt_remapped_linear(bo, map, dest, plane);
> >> else
> >> dest = write_dpt_remapped_tiled(bo, map, dest, plane);
> >> }
> >> }
> >>
> >> Essentially moving the plane loop to a helper and adding support for
> >> aux/linear plane, while attempting to have the lower level helpers
> >> with consistent function signatures.
> >>
> >> If the paste does not help I can revisit and try to somehow split it.
> >> It has been more than a year that I originally wrote this so some
> >> details have evaporated from my head by now.
> >>
> >> Regards,
> >>
> >> Tvrtko
> >>
> >>>> ---
> >>>> drivers/gpu/drm/xe/display/xe_fb_pin.c | 111 ++++++++++++++++++-------
> >>>> 1 file changed, 81 insertions(+), 30 deletions(-)
> >>>>
> >>>> diff --git a/drivers/gpu/drm/xe/display/xe_fb_pin.c b/drivers/gpu/drm/xe/display/xe_fb_pin.c
> >>>> index df7d305c6fcd..fe4b66e5c3ad 100644
> >>>> --- a/drivers/gpu/drm/xe/display/xe_fb_pin.c
> >>>> +++ b/drivers/gpu/drm/xe/display/xe_fb_pin.c
> >>>> @@ -49,33 +49,94 @@ write_dpt_rotated(struct xe_bo *bo, struct iosys_map *map, u32 *dpt_ofs, u32 bo_
> >>>> *dpt_ofs = ALIGN(*dpt_ofs, 4096);
> >>>> }
> >>>>
> >>>> +static unsigned int
> >>>> +write_dpt_padding(struct iosys_map *map, unsigned int dest,
> >>>> +unsigned int pad) {
> >>>> + /* The DE ignores the PTEs for the padding tiles */
> >>>> + return dest + pad * sizeof(u64);
> >>>> +}
> >>>> +
> >>>> +static unsigned int
> >>>> +write_dpt_remapped_linear(struct xe_bo *bo, struct iosys_map *map,
> >>>> + unsigned int dest,
> >>>> + const struct intel_remapped_plane_info *plane) {
> >>>> + struct xe_device *xe = xe_bo_device(bo);
> >>>> + struct xe_ggtt *ggtt = xe_device_get_root_tile(xe)->mem.ggtt;
> >>>> + const u64 pte = xe_ggtt_encode_pte_flags(ggtt, bo,
> >>>> + xe->pat.idx[XE_CACHE_NONE]);
> >>>> + unsigned int offset = plane->offset * XE_PAGE_SIZE;
> >>>> + unsigned int size = plane->size;
> >>>> +
> >>>> + while (size--) {
> >>>> + u64 addr = xe_bo_addr(bo, offset, XE_PAGE_SIZE);
> >>>> +
> >>>> + iosys_map_wr(map, dest, u64, addr | pte);
> >>>> + dest += sizeof(u64);
> >>>> + offset += XE_PAGE_SIZE;
> >>>> + }
> >>>> +
> >>>> + return dest;
> >>>> +}
> >>>> +
> >>>> +static unsigned int
> >>>> +write_dpt_remapped_tiled(struct xe_bo *bo, struct iosys_map *map,
> >>>> + unsigned int dest,
> >>>> + const struct intel_remapped_plane_info *plane) {
> >>>> + struct xe_device *xe = xe_bo_device(bo);
> >>>> + struct xe_ggtt *ggtt = xe_device_get_root_tile(xe)->mem.ggtt;
> >>>> + const u64 pte = xe_ggtt_encode_pte_flags(ggtt, bo,
> >>>> + xe->pat.idx[XE_CACHE_NONE]);
> >>>> + unsigned int offset, column, row;
> >>>> +
> >>>> + for (row = 0; row < plane->height; row++) {
> >>>> + offset = (plane->offset + plane->src_stride * row) *
> >>>> + XE_PAGE_SIZE;
> >>>> +
> >>>> + for (column = 0; column < plane->width; column++) {
> >>>> + u64 addr = xe_bo_addr(bo, offset, XE_PAGE_SIZE);
> >>>> +
> >>>> + iosys_map_wr(map, dest, u64, addr | pte);
> >>>> + dest += sizeof(u64);
> >>>> + offset += XE_PAGE_SIZE;
> >>>> + }
> >>>> +
> >>>> + dest = write_dpt_padding(map, dest,
> >>>> + plane->dst_stride - plane->width);
> >>>> + }
> >>>> +
> >>>> + return dest;
> >>>> +}
> >>>> +
> >>>> static void
> >>>> -write_dpt_remapped(struct xe_bo *bo, struct iosys_map *map, u32 *dpt_ofs,
> >>>> - u32 bo_ofs, u32 width, u32 height, u32 src_stride,
> >>>> - u32 dst_stride)
> >>>> +write_dpt_remapped(struct xe_bo *bo,
> >>>> + const struct intel_remapped_info *remap_info,
> >>>> + struct iosys_map *map)
> >>>> {
> >>>> - struct xe_device *xe = xe_bo_device(bo);
> >>>> - struct xe_ggtt *ggtt = xe_device_get_root_tile(xe)->mem.ggtt;
> >>>> - u32 column, row;
> >>>> - u64 pte = xe_ggtt_encode_pte_flags(ggtt, bo, xe->pat.idx[XE_CACHE_NONE]);
> >>>> + unsigned int i, dest = 0;
> >>>>
> >>>> - for (row = 0; row < height; row++) {
> >>>> - u32 src_idx = src_stride * row + bo_ofs;
> >>>> + for (i = 0; i < ARRAY_SIZE(remap_info->plane); i++) {
> >>>> + const struct intel_remapped_plane_info *plane =
> >>>> + &remap_info->plane[i];
> >>>>
> >>>> - for (column = 0; column < width; column++) {
> >>>> - u64 addr = xe_bo_addr(bo, src_idx * XE_PAGE_SIZE, XE_PAGE_SIZE);
> >>>> - iosys_map_wr(map, *dpt_ofs, u64, pte | addr);
> >>>> + if (!plane->width && !plane->height && !plane->linear)
> >
> > Since linear doesn't have width and height explicitly but we go with
> > size, I think it would be good to flip the checks a bit
> >
> > if (!plane->linear && !plane->width && !plane->height)
>
> Done locally.
>
> Btw between last Thursday and now I went and started splitting this patch up.
> Locally this patch is now four patches:
>
> drm/xe/display: Move remapped plane loop out of __xe_pin_fb_vma_dpt
> drm/xe/display: Change write_dpt_remapped_tiled function signature
> drm/xe/display: Respect remapped plane alignment
> drm/xe/display: Add support for AuxCCS
>
> I can send this out now or wait for the verdict on the >4GB planes handling.
Sure, will check them out.
Regards,
Uma Shankar
> Regards,
>
> Tvrtko
>
> >
> > Other than above, changes look good to me.
> > @juhapekka.heikkila@gmail.com Can you also check and ack once.
> >
> > Regards,
> > Uma Shankar
> >
> >>>> + continue;
> >>>>
> >>>> - *dpt_ofs += 8;
> >>>> - src_idx++;
> >>>> + if (remap_info->plane_alignment) {
> >>>> + const unsigned int index = dest / sizeof(u64);
> >>>> + const unsigned int pad =
> >>>> + ALIGN(index, remap_info->plane_alignment) -
> >>>> + index;
> >>>> +
> >>>> + dest = write_dpt_padding(map, dest, pad);
> >>>> }
> >>>>
> >>>> - /* The DE ignores the PTEs for the padding tiles */
> >>>> - *dpt_ofs += (dst_stride - width) * 8;
> >>>> + if (plane->linear)
> >>>> + dest = write_dpt_remapped_linear(bo, map, dest, plane);
> >>>> + else
> >>>> + dest = write_dpt_remapped_tiled(bo, map, dest, plane);
> >>>> }
> >>>> -
> >>>> - /* Align to next page */
> >>>> - *dpt_ofs = ALIGN(*dpt_ofs, 4096);
> >>>> }
> >>>>
> >>>> static int __xe_pin_fb_vma_dpt(const struct intel_framebuffer *fb,
> >>>> @@ -138,17 +199,7 @@ static int __xe_pin_fb_vma_dpt(const struct intel_framebuffer *fb,
> >>>> iosys_map_wr(&dpt->vmap, x * 8, u64, pte | addr);
> >>>> }
> >>>> } else if (view->type == I915_GTT_VIEW_REMAPPED) {
> >>>> - const struct intel_remapped_info *remap_info = &view->remapped;
> >>>> - u32 i, dpt_ofs = 0;
> >>>> -
> >>>> - for (i = 0; i < ARRAY_SIZE(remap_info->plane); i++)
> >>>> - write_dpt_remapped(bo, &dpt->vmap, &dpt_ofs,
> >>>> - remap_info->plane[i].offset,
> >>>> - remap_info->plane[i].width,
> >>>> - remap_info->plane[i].height,
> >>>> - remap_info->plane[i].src_stride,
> >>>> - remap_info->plane[i].dst_stride);
> >>>> -
> >>>> + write_dpt_remapped(bo, &view->remapped, &dpt->vmap);
> >>>> } else {
> >>>> const struct intel_rotation_info *rot_info = &view->rotated;
> >>>> u32 i, dpt_ofs = 0;
> >>>> --
> >>>> 2.52.0
> >>>>
> >
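As a reading aid for the plane alignment step in the quoted write_dpt_remapped(), here is a minimal standalone sketch of the padding calculation. It assumes plane_alignment is a power of two (as the kernel's ALIGN() requires); ALIGN_UP, PTE_SIZE and both function names are hypothetical stand-ins for this example and are not taken from the patch.

```c
#include <stdint.h>

#define PTE_SIZE sizeof(uint64_t) /* one DPT entry is a 64-bit PTE */

/* Round x up to a multiple of a; a must be a power of two. */
#define ALIGN_UP(x, a) (((x) + ((a) - 1)) & ~((a) - 1))

/* Number of padding PTEs so the next plane starts on an aligned index. */
unsigned int dpt_pad_entries(unsigned int dest, unsigned int alignment)
{
	const unsigned int index = dest / PTE_SIZE;

	return ALIGN_UP(index, alignment) - index;
}

/* Skip the padding entries without writing them; returns the new byte offset. */
unsigned int dpt_skip_padding(unsigned int dest, unsigned int pad)
{
	return dest + pad * PTE_SIZE;
}
```

For example, with dest at byte offset 24 (three entries written) and an alignment of 8 entries, dpt_pad_entries() returns 5 and dpt_skip_padding() moves dest to byte offset 64, which is where the next plane's PTEs would begin.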
Thread overview: 18+ messages
2026-03-10 12:34 [PATCH v21 0/9] AuxCCS handling and render compression modifiers Tvrtko Ursulin
2026-03-10 12:34 ` [PATCH v21 1/9] drm/xe: Rename XE_BO_FLAG_SCANOUT to XE_BO_FLAG_FORCE_WC Tvrtko Ursulin
2026-03-10 12:34 ` [PATCH v21 2/9] drm/xe: Use write-combine mapping when populating DPT Tvrtko Ursulin
2026-03-10 12:34 ` [PATCH v21 3/9] drm/xe/xelpg: Limit AuxCCS ring buffer programming to Alderlake Tvrtko Ursulin
2026-03-10 12:34 ` [PATCH v21 4/9] drm/xe/xelp: Quiesce memory traffic before invalidating AuxCCS Tvrtko Ursulin
2026-03-10 12:34 ` [PATCH v21 5/9] drm/xe/xelp: Wait for AuxCCS invalidation to complete Tvrtko Ursulin
2026-03-10 12:34 ` [PATCH v21 6/9] drm/xe: Move aux table invalidation to ring ops Tvrtko Ursulin
2026-03-10 21:47 ` Matthew Brost
2026-03-10 12:34 ` [PATCH v21 7/9] drm/xe/xelp: Add AuxCCS invalidation to the indirect context workarounds Tvrtko Ursulin
2026-03-10 12:34 ` [PATCH v21 8/9] drm/xe/display: Add support for AuxCCS Tvrtko Ursulin
2026-03-11 22:24 ` Rodrigo Vivi
2026-03-12 11:58 ` Tvrtko Ursulin
2026-03-16 12:04 ` Shankar, Uma
2026-03-16 12:43 ` Tvrtko Ursulin
2026-03-16 15:51 ` Shankar, Uma
2026-03-10 12:34 ` [PATCH v21 9/9] drm/xe/xelp: Expose AuxCCS frame buffer modifiers on Alderlake-P Tvrtko Ursulin
2026-03-10 15:50 ` ✗ CI.checkpatch: warning for AuxCCS handling and render compression modifiers Patchwork
2026-03-10 15:50 ` ✗ CI.KUnit: failure " Patchwork