* [Intel-xe] [PATCH v2 0/6] PAT and cache coherency support
@ 2023-09-14 15:31 Matthew Auld
2023-09-14 15:31 ` [Intel-xe] [PATCH v2 1/6] drm/xe/uapi: Add support for cache and coherency mode Matthew Auld
From: Matthew Auld @ 2023-09-14 15:31 UTC (permalink / raw)
To: intel-xe
Branch available here (lightly tested):
https://gitlab.freedesktop.org/mwa/kernel/-/tree/xe-pat-index?ref_type=heads
Series still needs some more testing. Also note that the series directly depends
on the WIP patch here: https://patchwork.freedesktop.org/series/122708/
The goal here is to allow userspace to directly control the pat_index when mapping
memory via the ppGTT, in addition to the CPU caching mode for system memory. This
is very much needed on newer igpu platforms which allow incoherent GT access,
where the choice over the cache level and expected coherency is best left to
userspace, depending on the use case. In the future there may also be other
attributes encoded in the pat_index, so giving userspace direct control will also
be needed there.
To support this, new gem_create uAPI is added for selecting the CPU caching
mode to use for system memory, including the expected GPU coherency mode. There
are various restrictions on which CPU caching modes are compatible with the
selected coherency mode. With that in place, the actual pat_index can then be
provided as part of vm_bind. The only restriction is that the coherency mode of
the pat_index must be at least as coherent as the gem_create coherency mode.
There are also some special cases, like with userptr and dma-buf.
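As a rough sketch (outside the kernel, names hypothetical), the vm_bind restriction above boils down to a simple ordering check: the coherency mode implied by the chosen pat_index must be at least as coherent as the coh_mode fixed at gem_create time:

```c
#include <assert.h>
#include <stdbool.h>

/* Values mirror the XE_GEM_COH_* macros proposed in this series;
 * a higher value means "more coherent". */
#define XE_GEM_COH_NONE          1
#define XE_GEM_COH_AT_LEAST_1WAY 2

/* vm_bind policy sketch: the pat_index coherency must be at least
 * as coherent as the coherency mode set at object creation. */
static bool vm_bind_coh_ok(unsigned int bo_coh_mode,
			   unsigned int pat_index_coh_mode)
{
	return pat_index_coh_mode >= bo_coh_mode;
}
```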
v2:
- Loads of improvements/tweaks. The main change is to now allow
gem_create.coh_mode <= coh_mode(pat_index), rather than requiring an exact
match. This simplifies the dma-buf policy from the userspace point of view.
Also we now only consider COH_NONE and COH_AT_LEAST_1WAY.
--
2.41.0
* [Intel-xe] [PATCH v2 1/6] drm/xe/uapi: Add support for cache and coherency mode
2023-09-14 15:31 [Intel-xe] [PATCH v2 0/6] PAT and cache coherency support Matthew Auld
@ 2023-09-14 15:31 ` Matthew Auld
2023-09-14 23:47 ` Matt Roper
2023-09-21 20:07 ` Souza, Jose
2023-09-14 15:31 ` [Intel-xe] [PATCH v2 2/6] drm/xe: move pat_table into device info Matthew Auld
From: Matthew Auld @ 2023-09-14 15:31 UTC (permalink / raw)
To: intel-xe
Cc: Filip Hazubski, Lucas De Marchi, Carl Zhang, Effie Yu, Matt Roper
From: Pallavi Mishra <pallavi.mishra@intel.com>
Allow userspace to specify the CPU caching mode to use for system memory,
in addition to the coherency mode, during object creation. Modify the gem
create handler and introduce xe_bo_create_user to replace xe_bo_create. In a
later patch we will support setting the pat_index as part of vm_bind, where
the expectation is that the coherency mode extracted from the pat_index must
match the one set at object creation.
v2:
- s/smem_caching/smem_cpu_caching/ and
s/XE_GEM_CACHING/XE_GEM_CPU_CACHING/. (Matt Roper)
- Drop COH_2WAY and just use COH_NONE + COH_AT_LEAST_1WAY; KMD mostly
just cares that zeroing/swap-in can't be bypassed with the given
smem_caching mode. (Matt Roper)
- Fix broken range check for coh_mode and smem_cpu_caching and also
don't use constant value, but the already defined macros. (José)
- Prefer switch statement for smem_cpu_caching -> ttm_caching. (José)
- Add note in kernel-doc for dgpu and coherency modes for system
memory. (José)
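The argument checks this patch adds to the gem_create ioctl can be sketched as follows (system-memory case only; the scanout and dgpu special cases are omitted, and this is a standalone model rather than the actual kernel code):

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the uAPI macros added by this patch. */
#define XE_GEM_COH_NONE          1
#define XE_GEM_COH_AT_LEAST_1WAY 2

#define XE_GEM_CPU_CACHING_WB 1
#define XE_GEM_CPU_CACHING_WC 2
#define XE_GEM_CPU_CACHING_UC 3

/* Sketch of the system-memory checks: both fields must be set and
 * in range, and CPU:WB is only allowed when GPU access is at least
 * 1-way coherent, so zeroing/swap-in can't be bypassed. */
static bool smem_gem_create_args_ok(unsigned int coh_mode,
				    unsigned int smem_cpu_caching)
{
	if (!coh_mode || coh_mode > XE_GEM_COH_AT_LEAST_1WAY)
		return false;
	if (!smem_cpu_caching || smem_cpu_caching > XE_GEM_CPU_CACHING_UC)
		return false;
	if (coh_mode == XE_GEM_COH_NONE &&
	    smem_cpu_caching == XE_GEM_CPU_CACHING_WB)
		return false;
	return true;
}
```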
Signed-off-by: Pallavi Mishra <pallavi.mishra@intel.com>
Co-authored-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Cc: Filip Hazubski <filip.hazubski@intel.com>
Cc: Carl Zhang <carl.zhang@intel.com>
Cc: Effie Yu <effie.yu@intel.com>
---
drivers/gpu/drm/xe/xe_bo.c | 105 ++++++++++++++++++++++++++-----
drivers/gpu/drm/xe/xe_bo.h | 3 +-
drivers/gpu/drm/xe/xe_bo_types.h | 10 +++
drivers/gpu/drm/xe/xe_dma_buf.c | 5 +-
include/uapi/drm/xe_drm.h | 57 ++++++++++++++++-
5 files changed, 158 insertions(+), 22 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
index 27726d4f3423..f3facd788f15 100644
--- a/drivers/gpu/drm/xe/xe_bo.c
+++ b/drivers/gpu/drm/xe/xe_bo.c
@@ -325,7 +325,7 @@ static struct ttm_tt *xe_ttm_tt_create(struct ttm_buffer_object *ttm_bo,
struct xe_device *xe = xe_bo_device(bo);
struct xe_ttm_tt *tt;
unsigned long extra_pages;
- enum ttm_caching caching = ttm_cached;
+ enum ttm_caching caching;
int err;
tt = kzalloc(sizeof(*tt), GFP_KERNEL);
@@ -339,13 +339,25 @@ static struct ttm_tt *xe_ttm_tt_create(struct ttm_buffer_object *ttm_bo,
extra_pages = DIV_ROUND_UP(xe_device_ccs_bytes(xe, bo->size),
PAGE_SIZE);
+ switch (bo->smem_cpu_caching) {
+ case XE_GEM_CPU_CACHING_WC:
+ caching = ttm_write_combined;
+ break;
+ case XE_GEM_CPU_CACHING_UC:
+ caching = ttm_uncached;
+ break;
+ default:
+ caching = ttm_cached;
+ break;
+ }
+
/*
* Display scanout is always non-coherent with the CPU cache.
*
* For Xe_LPG and beyond, PPGTT PTE lookups are also non-coherent and
* require a CPU:WC mapping.
*/
- if (bo->flags & XE_BO_SCANOUT_BIT ||
+ if ((!bo->smem_cpu_caching && bo->flags & XE_BO_SCANOUT_BIT) ||
(xe->info.graphics_verx100 >= 1270 && bo->flags & XE_BO_PAGETABLE))
caching = ttm_write_combined;
@@ -1184,9 +1196,10 @@ void xe_bo_free(struct xe_bo *bo)
kfree(bo);
}
-struct xe_bo *__xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
+struct xe_bo *___xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
struct xe_tile *tile, struct dma_resv *resv,
struct ttm_lru_bulk_move *bulk, size_t size,
+ u16 smem_cpu_caching, u16 coh_mode,
enum ttm_bo_type type, u32 flags)
{
struct ttm_operation_ctx ctx = {
@@ -1224,6 +1237,8 @@ struct xe_bo *__xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
bo->tile = tile;
bo->size = size;
bo->flags = flags;
+ bo->smem_cpu_caching = smem_cpu_caching;
+ bo->coh_mode = coh_mode;
bo->ttm.base.funcs = &xe_gem_object_funcs;
bo->props.preferred_mem_class = XE_BO_PROPS_INVALID;
bo->props.preferred_gt = XE_BO_PROPS_INVALID;
@@ -1307,10 +1322,11 @@ static int __xe_bo_fixed_placement(struct xe_device *xe,
}
struct xe_bo *
-xe_bo_create_locked_range(struct xe_device *xe,
- struct xe_tile *tile, struct xe_vm *vm,
- size_t size, u64 start, u64 end,
- enum ttm_bo_type type, u32 flags)
+__xe_bo_create_locked(struct xe_device *xe,
+ struct xe_tile *tile, struct xe_vm *vm,
+ size_t size, u64 start, u64 end,
+ u16 smem_cpu_caching, u16 coh_mode,
+ enum ttm_bo_type type, u32 flags)
{
struct xe_bo *bo = NULL;
int err;
@@ -1331,10 +1347,11 @@ xe_bo_create_locked_range(struct xe_device *xe,
}
}
- bo = __xe_bo_create_locked(xe, bo, tile, vm ? &vm->resv : NULL,
+ bo = ___xe_bo_create_locked(xe, bo, tile, vm ? &vm->resv : NULL,
vm && !xe_vm_in_fault_mode(vm) &&
flags & XE_BO_CREATE_USER_BIT ?
&vm->lru_bulk_move : NULL, size,
+ smem_cpu_caching, coh_mode,
type, flags);
if (IS_ERR(bo))
return bo;
@@ -1368,11 +1385,35 @@ xe_bo_create_locked_range(struct xe_device *xe,
return ERR_PTR(err);
}
+struct xe_bo *
+xe_bo_create_locked_range(struct xe_device *xe,
+ struct xe_tile *tile, struct xe_vm *vm,
+ size_t size, u64 start, u64 end,
+ enum ttm_bo_type type, u32 flags)
+{
+ return __xe_bo_create_locked(xe, tile, vm, size, 0, ~0ULL, 0, 0, type, flags);
+}
+
struct xe_bo *xe_bo_create_locked(struct xe_device *xe, struct xe_tile *tile,
struct xe_vm *vm, size_t size,
enum ttm_bo_type type, u32 flags)
{
- return xe_bo_create_locked_range(xe, tile, vm, size, 0, ~0ULL, type, flags);
+ return __xe_bo_create_locked(xe, tile, vm, size, 0, ~0ULL, 0, 0, type, flags);
+}
+
+static struct xe_bo *xe_bo_create_user(struct xe_device *xe, struct xe_tile *tile,
+ struct xe_vm *vm, size_t size,
+ u16 smem_cpu_caching, u16 coh_mode,
+ enum ttm_bo_type type,
+ u32 flags)
+{
+ struct xe_bo *bo = __xe_bo_create_locked(xe, tile, vm, size, 0, ~0ULL,
+ smem_cpu_caching, coh_mode, type,
+ flags | XE_BO_CREATE_USER_BIT);
+ if (!IS_ERR(bo))
+ xe_bo_unlock_vm_held(bo);
+
+ return bo;
}
struct xe_bo *xe_bo_create(struct xe_device *xe, struct xe_tile *tile,
@@ -1755,11 +1796,11 @@ int xe_gem_create_ioctl(struct drm_device *dev, void *data,
struct drm_xe_gem_create *args = data;
struct xe_vm *vm = NULL;
struct xe_bo *bo;
- unsigned int bo_flags = XE_BO_CREATE_USER_BIT;
+ unsigned int bo_flags;
u32 handle;
int err;
- if (XE_IOCTL_DBG(xe, args->extensions) || XE_IOCTL_DBG(xe, args->pad) ||
+ if (XE_IOCTL_DBG(xe, args->extensions) ||
XE_IOCTL_DBG(xe, args->reserved[0] || args->reserved[1]))
return -EINVAL;
@@ -1801,6 +1842,32 @@ int xe_gem_create_ioctl(struct drm_device *dev, void *data,
bo_flags |= XE_BO_NEEDS_CPU_ACCESS;
}
+ if (XE_IOCTL_DBG(xe, args->coh_mode > XE_GEM_COH_AT_LEAST_1WAY))
+ return -EINVAL;
+
+ if (XE_IOCTL_DBG(xe, args->smem_cpu_caching > XE_GEM_CPU_CACHING_UC))
+ return -EINVAL;
+
+ if (bo_flags & XE_BO_CREATE_SYSTEM_BIT) {
+ if (XE_IOCTL_DBG(xe, !args->coh_mode))
+ return -EINVAL;
+
+ if (XE_IOCTL_DBG(xe, !args->smem_cpu_caching))
+ return -EINVAL;
+
+ if (XE_IOCTL_DBG(xe, !IS_DGFX(xe) &&
+ bo_flags & XE_BO_SCANOUT_BIT &&
+ args->smem_cpu_caching == XE_GEM_CPU_CACHING_WB))
+ return -EINVAL;
+
+ if (args->coh_mode == XE_GEM_COH_NONE) {
+ if (XE_IOCTL_DBG(xe, args->smem_cpu_caching == XE_GEM_CPU_CACHING_WB))
+ return -EINVAL;
+ }
+ } else if (XE_IOCTL_DBG(xe, args->smem_cpu_caching)) {
+ return -EINVAL;
+ }
+
if (args->vm_id) {
vm = xe_vm_lookup(xef, args->vm_id);
if (XE_IOCTL_DBG(xe, !vm))
@@ -1812,8 +1879,10 @@ int xe_gem_create_ioctl(struct drm_device *dev, void *data,
}
}
- bo = xe_bo_create(xe, NULL, vm, args->size, ttm_bo_type_device,
- bo_flags);
+ bo = xe_bo_create_user(xe, NULL, vm, args->size,
+ args->smem_cpu_caching, args->coh_mode,
+ ttm_bo_type_device,
+ bo_flags);
if (IS_ERR(bo)) {
err = PTR_ERR(bo);
goto out_vm;
@@ -2105,10 +2174,12 @@ int xe_bo_dumb_create(struct drm_file *file_priv,
args->size = ALIGN(mul_u32_u32(args->pitch, args->height),
page_size);
- bo = xe_bo_create(xe, NULL, NULL, args->size, ttm_bo_type_device,
- XE_BO_CREATE_VRAM_IF_DGFX(xe_device_get_root_tile(xe)) |
- XE_BO_CREATE_USER_BIT | XE_BO_SCANOUT_BIT |
- XE_BO_NEEDS_CPU_ACCESS);
+ bo = xe_bo_create_user(xe, NULL, NULL, args->size,
+ XE_GEM_CPU_CACHING_WC, XE_GEM_COH_NONE,
+ ttm_bo_type_device,
+ XE_BO_CREATE_VRAM_IF_DGFX(xe_device_get_root_tile(xe)) |
+ XE_BO_CREATE_USER_BIT | XE_BO_SCANOUT_BIT |
+ XE_BO_NEEDS_CPU_ACCESS);
if (IS_ERR(bo))
return PTR_ERR(bo);
diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
index 4a68d869b3b5..4a0ee81fe598 100644
--- a/drivers/gpu/drm/xe/xe_bo.h
+++ b/drivers/gpu/drm/xe/xe_bo.h
@@ -81,9 +81,10 @@ struct sg_table;
struct xe_bo *xe_bo_alloc(void);
void xe_bo_free(struct xe_bo *bo);
-struct xe_bo *__xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
+struct xe_bo *___xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
struct xe_tile *tile, struct dma_resv *resv,
struct ttm_lru_bulk_move *bulk, size_t size,
+ u16 smem_cpu_caching, u16 coh_mode,
enum ttm_bo_type type, u32 flags);
struct xe_bo *
xe_bo_create_locked_range(struct xe_device *xe,
diff --git a/drivers/gpu/drm/xe/xe_bo_types.h b/drivers/gpu/drm/xe/xe_bo_types.h
index 2ea9ad423170..9bee220a6872 100644
--- a/drivers/gpu/drm/xe/xe_bo_types.h
+++ b/drivers/gpu/drm/xe/xe_bo_types.h
@@ -68,6 +68,16 @@ struct xe_bo {
struct llist_node freed;
/** @created: Whether the bo has passed initial creation */
bool created;
+ /**
+ * @coh_mode: Coherency setting. Currently only used for userspace
+ * objects.
+ */
+ u16 coh_mode;
+ /**
+ * @smem_cpu_caching: Caching mode for smem. Currently only used for
+ * userspace objects.
+ */
+ u16 smem_cpu_caching;
};
#define intel_bo_to_drm_bo(bo) (&(bo)->ttm.base)
diff --git a/drivers/gpu/drm/xe/xe_dma_buf.c b/drivers/gpu/drm/xe/xe_dma_buf.c
index 09343b8b3e96..ac20dbc27a2b 100644
--- a/drivers/gpu/drm/xe/xe_dma_buf.c
+++ b/drivers/gpu/drm/xe/xe_dma_buf.c
@@ -200,8 +200,9 @@ xe_dma_buf_init_obj(struct drm_device *dev, struct xe_bo *storage,
int ret;
dma_resv_lock(resv, NULL);
- bo = __xe_bo_create_locked(xe, storage, NULL, resv, NULL, dma_buf->size,
- ttm_bo_type_sg, XE_BO_CREATE_SYSTEM_BIT);
+ bo = ___xe_bo_create_locked(xe, storage, NULL, resv, NULL, dma_buf->size,
+ 0, 0, /* Will require 1way or 2way for vm_bind */
+ ttm_bo_type_sg, XE_BO_CREATE_SYSTEM_BIT);
if (IS_ERR(bo)) {
ret = PTR_ERR(bo);
goto error;
diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
index 00d5cb4ef85e..737bb1d4c6f7 100644
--- a/include/uapi/drm/xe_drm.h
+++ b/include/uapi/drm/xe_drm.h
@@ -456,8 +456,61 @@ struct drm_xe_gem_create {
*/
__u32 handle;
- /** @pad: MBZ */
- __u32 pad;
+ /**
+ * @coh_mode: The coherency mode for this object. This will limit the
+ * possible @smem_caching values.
+ *
+ * Supported values:
+ *
+ * XE_GEM_COH_NONE: GPU access is assumed to be not coherent with
+ * CPU. CPU caches are not snooped.
+ *
+ * XE_GEM_COH_AT_LEAST_1WAY:
+ *
+ * CPU-GPU coherency must be at least 1WAY.
+ *
+ * If 1WAY then GPU access is coherent with CPU (CPU caches are snooped)
+ * until GPU acquires. The acquire by the GPU is not tracked by CPU
+ * caches.
+ *
+ * If 2WAY then should be fully coherent between GPU and CPU. Fully
+ * tracked by CPU caches. Both CPU and GPU caches are snooped.
+ *
+ * Note: On dgpu the GPU device never caches system memory (outside of
+ * the special system-memory-read-only cache, which is anyway flushed by
+ * KMD when nuking TLBs for a given object so should be no concern to
+ * userspace). The device should be thought of as always 1WAY coherent,
+ * with the addition that the GPU never caches system memory. At least
+ * on current dgpu HW there is no way to turn off snooping so likely the
+ * different coherency modes of the pat_index make no difference for
+ * system memory.
+ */
+#define XE_GEM_COH_NONE 1
+#define XE_GEM_COH_AT_LEAST_1WAY 2
+ __u16 coh_mode;
+
+ /**
+ * @smem_cpu_caching: The CPU caching mode to select for system memory.
+ *
+ * Supported values:
+ *
+ * XE_GEM_CPU_CACHING_WB: Allocate the pages with write-back caching.
+ * On iGPU this can't be used for scanout surfaces. The @coh_mode must
+ * be XE_GEM_COH_AT_LEAST_1WAY.
+ *
+ * XE_GEM_CPU_CACHING_WC: Allocate the pages as write-combined. This is
+ * uncached. Any @coh_mode is permitted. Scanout surfaces should likely
+ * use this.
+ *
+ * XE_GEM_CPU_CACHING_UC: Allocate the pages as uncached. Any @coh_mode
+ * is permitted. Scanout surfaces are permitted to use this.
+ *
+ * MUST be left as zero for VRAM-only objects.
+ */
+#define XE_GEM_CPU_CACHING_WB 1
+#define XE_GEM_CPU_CACHING_WC 2
+#define XE_GEM_CPU_CACHING_UC 3
+ __u16 smem_cpu_caching;
/** @reserved: Reserved */
__u64 reserved[2];
--
2.41.0
* [Intel-xe] [PATCH v2 2/6] drm/xe: move pat_table into device info
2023-09-14 15:31 [Intel-xe] [PATCH v2 0/6] PAT and cache coherency support Matthew Auld
2023-09-14 15:31 ` [Intel-xe] [PATCH v2 1/6] drm/xe/uapi: Add support for cache and coherency mode Matthew Auld
@ 2023-09-14 15:31 ` Matthew Auld
2023-09-14 23:53 ` Matt Roper
2023-09-14 15:31 ` [Intel-xe] [PATCH v2 3/6] drm/xe/pat: trim the tgl PAT table Matthew Auld
From: Matthew Auld @ 2023-09-14 15:31 UTC (permalink / raw)
To: intel-xe; +Cc: Matt Roper, Lucas De Marchi
We need to be able to know the max pat_index range for a given platform, as
well as being able to look up the pat_index for a given platform in the
upcoming vm_bind uapi, where userspace can directly provide the pat_index.
Move the platform definition of the pat_table into the device info, with the
idea of encoding more information about each pat_index in a future patch.
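A minimal sketch of what this enables (names hypothetical, values placeholders): with the table hanging off the device info, a userspace-provided pat_index can be range-checked and resolved without any platform-specific knowledge at the call site:

```c
#include <assert.h>
#include <stddef.h>

/* Trimmed-down, hypothetical view of the pat substruct this patch
 * adds to the device info. */
struct pat_info {
	const unsigned int *table;
	int n_entries;
};

/* Example table; the values are placeholders, not real encodings. */
static const unsigned int example_pat_table[] = { 0x0, 0x4, 0x8, 0xc };
static const struct pat_info example_pat = {
	.table = example_pat_table,
	.n_entries = 4,
};

/* Range-check and resolve a pat_index. Returns -1 when out of
 * range (the kernel would return an errno instead). */
static int pat_lookup(const struct pat_info *pat, int pat_index)
{
	if (pat_index < 0 || pat_index >= pat->n_entries)
		return -1;
	return (int)pat->table[pat_index];
}
```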
v2 (Lucas):
- s/ENODEV/ENOTSUPP/
- s/xe_pat_fill_info/xe_pat_init_early/
- Prefer new pat substruct in xe_info.
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Pallavi Mishra <pallavi.mishra@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
---
drivers/gpu/drm/xe/xe_device_types.h | 11 ++++++++
drivers/gpu/drm/xe/xe_pat.c | 39 ++++++++++++++++++----------
drivers/gpu/drm/xe/xe_pat.h | 1 +
drivers/gpu/drm/xe/xe_pci.c | 7 +++++
4 files changed, 44 insertions(+), 14 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
index 5037b8c180b8..6c50d0f03466 100644
--- a/drivers/gpu/drm/xe/xe_device_types.h
+++ b/drivers/gpu/drm/xe/xe_device_types.h
@@ -238,6 +238,17 @@ struct xe_device {
/** @enable_display: display enabled */
u8 enable_display:1;
+ /** @pat: Platform information related to PAT settings. */
+ struct {
+ /**
+ * @table: The PAT table encoding for every pat_index
+ * supported by the platform.
+ */
+ const u32 *table;
+ /** @n_entries: The number of entries in the @table */
+ int n_entries;
+ } pat;
+
#if IS_ENABLED(CONFIG_DRM_XE_DISPLAY)
const struct intel_display_device_info *display;
struct intel_display_runtime_info display_runtime;
diff --git a/drivers/gpu/drm/xe/xe_pat.c b/drivers/gpu/drm/xe/xe_pat.c
index 32b0c922e7fa..aa2c5eb88266 100644
--- a/drivers/gpu/drm/xe/xe_pat.c
+++ b/drivers/gpu/drm/xe/xe_pat.c
@@ -106,24 +106,17 @@ static void program_pat_mcr(struct xe_gt *gt, const u32 table[], int n_entries)
}
}
-void xe_pat_init(struct xe_gt *gt)
+int xe_pat_init_early(struct xe_device *xe)
{
- struct xe_device *xe = gt_to_xe(gt);
-
if (xe->info.platform == XE_METEORLAKE) {
- /*
- * SAMedia register offsets are adjusted by the write methods
- * and they target registers that are not MCR, while for normal
- * GT they are MCR
- */
- if (xe_gt_is_media_type(gt))
- program_pat(gt, mtl_pat_table, ARRAY_SIZE(mtl_pat_table));
- else
- program_pat_mcr(gt, mtl_pat_table, ARRAY_SIZE(mtl_pat_table));
+ xe->info.pat.table = mtl_pat_table;
+ xe->info.pat.n_entries = ARRAY_SIZE(mtl_pat_table);
} else if (xe->info.platform == XE_PVC || xe->info.platform == XE_DG2) {
- program_pat_mcr(gt, pvc_pat_table, ARRAY_SIZE(pvc_pat_table));
+ xe->info.pat.table = pvc_pat_table;
+ xe->info.pat.n_entries = ARRAY_SIZE(pvc_pat_table);
} else if (GRAPHICS_VERx100(xe) <= 1210) {
- program_pat(gt, tgl_pat_table, ARRAY_SIZE(tgl_pat_table));
+ xe->info.pat.table = tgl_pat_table;
+ xe->info.pat.n_entries = ARRAY_SIZE(tgl_pat_table);
} else {
/*
* Going forward we expect to need new PAT settings for most
@@ -135,7 +128,25 @@ void xe_pat_init(struct xe_gt *gt)
*/
drm_err(&xe->drm, "Missing PAT table for platform with graphics version %d.%02d!\n",
GRAPHICS_VER(xe), GRAPHICS_VERx100(xe) % 100);
+ return -ENOTSUPP;
}
+
+ return 0;
+}
+
+void xe_pat_init(struct xe_gt *gt)
+{
+ struct xe_device *xe = gt_to_xe(gt);
+
+ /*
+ * SAMedia register offsets are adjusted by the write methods
+ * and they target registers that are not MCR, while for normal
+ * GT they are MCR.
+ */
+ if (xe_gt_is_media_type(gt) || GRAPHICS_VERx100(xe) < 1255)
+ program_pat(gt, xe->info.pat.table, xe->info.pat.n_entries);
+ else
+ program_pat_mcr(gt, xe->info.pat.table, xe->info.pat.n_entries);
}
void xe_pte_pat_init(struct xe_device *xe)
diff --git a/drivers/gpu/drm/xe/xe_pat.h b/drivers/gpu/drm/xe/xe_pat.h
index 5e71bd98d787..2f89503233b9 100644
--- a/drivers/gpu/drm/xe/xe_pat.h
+++ b/drivers/gpu/drm/xe/xe_pat.h
@@ -28,6 +28,7 @@
struct xe_gt;
struct xe_device;
+int xe_pat_init_early(struct xe_device *xe);
void xe_pat_init(struct xe_gt *gt);
void xe_pte_pat_init(struct xe_device *xe);
unsigned int xe_pat_get_index(struct xe_device *xe, enum xe_cache_level cache);
diff --git a/drivers/gpu/drm/xe/xe_pci.c b/drivers/gpu/drm/xe/xe_pci.c
index dc233a1226bd..2806311bd6a4 100644
--- a/drivers/gpu/drm/xe/xe_pci.c
+++ b/drivers/gpu/drm/xe/xe_pci.c
@@ -22,6 +22,7 @@
#include "xe_gt.h"
#include "xe_macros.h"
#include "xe_module.h"
+#include "xe_pat.h"
#include "xe_pci_types.h"
#include "xe_pm.h"
#include "xe_step.h"
@@ -531,6 +532,7 @@ static int xe_info_init(struct xe_device *xe,
struct xe_tile *tile;
struct xe_gt *gt;
u8 id;
+ int err;
xe->info.platform = desc->platform;
xe->info.subplatform = subplatform_desc ?
@@ -579,6 +581,11 @@ static int xe_info_init(struct xe_device *xe,
xe->info.enable_display = IS_ENABLED(CONFIG_DRM_XE_DISPLAY) &&
enable_display &&
desc->has_display;
+
+ err = xe_pat_init_early(xe);
+ if (err)
+ return err;
+
/*
* All platforms have at least one primary GT. Any platform with media
* version 13 or higher has an additional dedicated media GT. And
--
2.41.0
* [Intel-xe] [PATCH v2 3/6] drm/xe/pat: trim the tgl PAT table
2023-09-14 15:31 [Intel-xe] [PATCH v2 0/6] PAT and cache coherency support Matthew Auld
2023-09-14 15:31 ` [Intel-xe] [PATCH v2 1/6] drm/xe/uapi: Add support for cache and coherency mode Matthew Auld
2023-09-14 15:31 ` [Intel-xe] [PATCH v2 2/6] drm/xe: move pat_table into device info Matthew Auld
@ 2023-09-14 15:31 ` Matthew Auld
2023-09-14 18:07 ` Matt Roper
2023-09-14 15:31 ` [Intel-xe] [PATCH v2 4/6] drm/xe/pat: annotate pat_index with coherency mode Matthew Auld
From: Matthew Auld @ 2023-09-14 15:31 UTC (permalink / raw)
To: intel-xe; +Cc: Lucas De Marchi, Matt Roper
We don't seem to use PAT indexes 4-7, even though they are defined by the
HW. In the next patch userspace will be able to directly set the pat_index
as part of vm_bind, and we don't want to allow setting 4-7. The simplest
option is to just drop them here.
Suggested-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Pallavi Mishra <pallavi.mishra@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
---
drivers/gpu/drm/xe/xe_pat.c | 4 ----
1 file changed, 4 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_pat.c b/drivers/gpu/drm/xe/xe_pat.c
index aa2c5eb88266..fb490982fd99 100644
--- a/drivers/gpu/drm/xe/xe_pat.c
+++ b/drivers/gpu/drm/xe/xe_pat.c
@@ -38,10 +38,6 @@ static const u32 tgl_pat_table[] = {
[1] = TGL_PAT_WC,
[2] = TGL_PAT_WT,
[3] = TGL_PAT_UC,
- [4] = TGL_PAT_WB,
- [5] = TGL_PAT_WB,
- [6] = TGL_PAT_WB,
- [7] = TGL_PAT_WB,
};
static const u32 pvc_pat_table[] = {
--
2.41.0
* [Intel-xe] [PATCH v2 4/6] drm/xe/pat: annotate pat_index with coherency mode
2023-09-14 15:31 [Intel-xe] [PATCH v2 0/6] PAT and cache coherency support Matthew Auld
2023-09-14 15:31 ` [Intel-xe] [PATCH v2 3/6] drm/xe/pat: trim the tgl PAT table Matthew Auld
@ 2023-09-14 15:31 ` Matthew Auld
2023-09-15 0:08 ` Matt Roper
2023-09-14 15:31 ` [Intel-xe] [PATCH v2 5/6] drm/xe/migrate: rather use pte_encode helpers Matthew Auld
From: Matthew Auld @ 2023-09-14 15:31 UTC (permalink / raw)
To: intel-xe
Cc: Filip Hazubski, Lucas De Marchi, Carl Zhang, Effie Yu, Matt Roper
Future uapi needs to give userspace the ability to select the pat_index for
a given vm_bind. However, we need to be able to extract the coherency mode
from the provided pat_index to ensure it matches the coherency mode set at
object creation. There are various security reasons why this matters. The
pat_index itself is very platform specific, so it seems reasonable to
annotate each platform's definition of the pat table. On some older
platforms there is no explicit coherency mode, so we just pick whatever
makes sense.
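The annotation can be sketched standalone like this (the entry values are placeholders loosely following the MTL table in the diff, not real register encodings):

```c
#include <assert.h>

#define XE_GEM_COH_NONE          1
#define XE_GEM_COH_AT_LEAST_1WAY 2

/* Mirrors the xe_pat_table_entry introduced by this patch: the raw
 * platform encoding plus the coherency mode it implies. */
struct pat_entry {
	unsigned int value;      /* platform-specific PAT encoding */
	unsigned short coh_mode; /* XE_GEM_COH_* the encoding implies */
};

/* Placeholder values, loosely following the MTL table shape. */
static const struct pat_entry example_table[] = {
	{ 0x0, XE_GEM_COH_NONE },          /* WB */
	{ 0x1, XE_GEM_COH_NONE },          /* WT */
	{ 0x3, XE_GEM_COH_NONE },          /* UC */
	{ 0x8, XE_GEM_COH_AT_LEAST_1WAY }, /* WB, 1-way coherent */
	{ 0xc, XE_GEM_COH_AT_LEAST_1WAY }, /* WB, 2-way coherent */
};

/* Sketch of xe_pat_index_get_coh_mode(): extract the coherency
 * mode for a given pat_index, -1 if out of range. */
static int coh_mode_of(int pat_index)
{
	int n = (int)(sizeof(example_table) / sizeof(example_table[0]));

	if (pat_index < 0 || pat_index >= n)
		return -1;
	return example_table[pat_index].coh_mode;
}
```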
v2:
- Simplify with COH_AT_LEAST_1_WAY
- Add some kernel-doc
Bspec: 45101, 44235 #xe
Bspec: 70552, 71582, 59400 #xe2
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Pallavi Mishra <pallavi.mishra@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Cc: Filip Hazubski <filip.hazubski@intel.com>
Cc: Carl Zhang <carl.zhang@intel.com>
Cc: Effie Yu <effie.yu@intel.com>
---
drivers/gpu/drm/xe/xe_device_types.h | 2 +-
drivers/gpu/drm/xe/xe_pat.c | 59 +++++++++++++++++-----------
drivers/gpu/drm/xe/xe_pat.h | 18 +++++++++
3 files changed, 54 insertions(+), 25 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
index 6c50d0f03466..959e095eb46c 100644
--- a/drivers/gpu/drm/xe/xe_device_types.h
+++ b/drivers/gpu/drm/xe/xe_device_types.h
@@ -244,7 +244,7 @@ struct xe_device {
* @table: The PAT table encoding for every pat_index
* supported by the platform.
*/
- const u32 *table;
+ const struct xe_pat_table_entry *table;
/** @n_entries: The number of entries in the @table */
int n_entries;
} pat;
diff --git a/drivers/gpu/drm/xe/xe_pat.c b/drivers/gpu/drm/xe/xe_pat.c
index fb490982fd99..f4fceb3fa086 100644
--- a/drivers/gpu/drm/xe/xe_pat.c
+++ b/drivers/gpu/drm/xe/xe_pat.c
@@ -4,6 +4,8 @@
*/
+#include <drm/xe_drm.h>
+
#include "regs/xe_reg_defs.h"
#include "xe_gt.h"
#include "xe_gt_mcr.h"
@@ -33,30 +35,30 @@
#define TGL_PAT_WC REG_FIELD_PREP(TGL_MEM_TYPE_MASK, 1)
#define TGL_PAT_UC REG_FIELD_PREP(TGL_MEM_TYPE_MASK, 0)
-static const u32 tgl_pat_table[] = {
- [0] = TGL_PAT_WB,
- [1] = TGL_PAT_WC,
- [2] = TGL_PAT_WT,
- [3] = TGL_PAT_UC,
+static const struct xe_pat_table_entry tgl_pat_table[] = {
+ [0] = { TGL_PAT_WB, XE_GEM_COH_AT_LEAST_1WAY },
+ [1] = { TGL_PAT_WC, XE_GEM_COH_NONE },
+ [2] = { TGL_PAT_WT, XE_GEM_COH_NONE },
+ [3] = { TGL_PAT_UC, XE_GEM_COH_NONE },
};
-static const u32 pvc_pat_table[] = {
- [0] = TGL_PAT_UC,
- [1] = TGL_PAT_WC,
- [2] = TGL_PAT_WT,
- [3] = TGL_PAT_WB,
- [4] = PVC_PAT_CLOS(1) | TGL_PAT_WT,
- [5] = PVC_PAT_CLOS(1) | TGL_PAT_WB,
- [6] = PVC_PAT_CLOS(2) | TGL_PAT_WT,
- [7] = PVC_PAT_CLOS(2) | TGL_PAT_WB,
+static const struct xe_pat_table_entry pvc_pat_table[] = {
+ [0] = { TGL_PAT_UC, XE_GEM_COH_NONE },
+ [1] = { TGL_PAT_WC, XE_GEM_COH_NONE },
+ [2] = { TGL_PAT_WT, XE_GEM_COH_NONE },
+ [3] = { TGL_PAT_WB, XE_GEM_COH_AT_LEAST_1WAY },
+ [4] = { PVC_PAT_CLOS(1) | TGL_PAT_WT, XE_GEM_COH_NONE },
+ [5] = { PVC_PAT_CLOS(1) | TGL_PAT_WB, XE_GEM_COH_AT_LEAST_1WAY },
+ [6] = { PVC_PAT_CLOS(2) | TGL_PAT_WT, XE_GEM_COH_NONE },
+ [7] = { PVC_PAT_CLOS(2) | TGL_PAT_WB, XE_GEM_COH_AT_LEAST_1WAY },
};
-static const u32 mtl_pat_table[] = {
- [0] = MTL_PAT_0_WB,
- [1] = MTL_PAT_1_WT,
- [2] = MTL_PAT_3_UC,
- [3] = MTL_PAT_0_WB | MTL_2_COH_1W,
- [4] = MTL_PAT_0_WB | MTL_3_COH_2W,
+static const struct xe_pat_table_entry mtl_pat_table[] = {
+ [0] = { MTL_PAT_0_WB, XE_GEM_COH_NONE },
+ [1] = { MTL_PAT_1_WT, XE_GEM_COH_NONE },
+ [2] = { MTL_PAT_3_UC, XE_GEM_COH_NONE },
+ [3] = { MTL_PAT_0_WB | MTL_2_COH_1W, XE_GEM_COH_AT_LEAST_1WAY },
+ [4] = { MTL_PAT_0_WB | MTL_3_COH_2W, XE_GEM_COH_AT_LEAST_1WAY },
};
static const u32 xelp_pte_pat_table[XE_CACHE_LAST] = {
@@ -78,27 +80,35 @@ static const u32 xelpg_pte_pat_table[XE_CACHE_LAST] = {
[XE_CACHE_WB_1_WAY] = XELPG_PAT_WB_CACHE_1_WAY,
};
+u16 xe_pat_index_get_coh_mode(struct xe_device *xe, u16 pat_index)
+{
+ WARN_ON(pat_index >= xe->info.pat.n_entries);
+ return xe->info.pat.table[pat_index].coh_mode;
+}
+
unsigned int xe_pat_get_index(struct xe_device *xe, enum xe_cache_level cache)
{
WARN_ON(cache >= XE_CACHE_LAST);
return (xe->pat_table).pte_pat_table[cache];
}
-static void program_pat(struct xe_gt *gt, const u32 table[], int n_entries)
+static void program_pat(struct xe_gt *gt, const struct xe_pat_table_entry table[],
+ int n_entries)
{
for (int i = 0; i < n_entries; i++) {
struct xe_reg reg = XE_REG(_PAT_INDEX(i));
- xe_mmio_write32(gt, reg, table[i]);
+ xe_mmio_write32(gt, reg, table[i].value);
}
}
-static void program_pat_mcr(struct xe_gt *gt, const u32 table[], int n_entries)
+static void program_pat_mcr(struct xe_gt *gt, const struct xe_pat_table_entry table[],
+ int n_entries)
{
for (int i = 0; i < n_entries; i++) {
struct xe_reg_mcr reg_mcr = XE_REG_MCR(_PAT_INDEX(i));
- xe_gt_mcr_multicast_write(gt, reg_mcr, table[i]);
+ xe_gt_mcr_multicast_write(gt, reg_mcr, table[i].value);
}
}
@@ -111,6 +121,7 @@ int xe_pat_init_early(struct xe_device *xe)
xe->info.pat.table = pvc_pat_table;
xe->info.pat.n_entries = ARRAY_SIZE(pvc_pat_table);
} else if (GRAPHICS_VERx100(xe) <= 1210) {
+ WARN_ON_ONCE(!IS_DGFX(xe) && !xe->info.has_llc);
xe->info.pat.table = tgl_pat_table;
xe->info.pat.n_entries = ARRAY_SIZE(tgl_pat_table);
} else {
diff --git a/drivers/gpu/drm/xe/xe_pat.h b/drivers/gpu/drm/xe/xe_pat.h
index 2f89503233b9..809332ff08d5 100644
--- a/drivers/gpu/drm/xe/xe_pat.h
+++ b/drivers/gpu/drm/xe/xe_pat.h
@@ -28,9 +28,27 @@
struct xe_gt;
struct xe_device;
+/**
+ * struct xe_pat_table_entry - The pat_index encoding and other meta information.
+ */
+struct xe_pat_table_entry {
+ /**
+ * @value: The platform specific value encoding the various memory
+ * attributes (this maps to some fixed pat_index). So things like
+ * caching, coherency, compression etc can be encoded here.
+ */
+ u32 value;
+ /**
+ * @coh_mode: The GPU coherency mode that @value maps to. Either
+ * XE_GEM_COH_NONE or XE_GEM_COH_AT_LEAST_1WAY.
+ */
+ u16 coh_mode;
+};
+
int xe_pat_init_early(struct xe_device *xe);
void xe_pat_init(struct xe_gt *gt);
void xe_pte_pat_init(struct xe_device *xe);
unsigned int xe_pat_get_index(struct xe_device *xe, enum xe_cache_level cache);
+u16 xe_pat_index_get_coh_mode(struct xe_device *xe, u16 pat_index);
#endif
--
2.41.0
* [Intel-xe] [PATCH v2 5/6] drm/xe/migrate: rather use pte_encode helpers
2023-09-14 15:31 [Intel-xe] [PATCH v2 0/6] PAT and cache coherency support Matthew Auld
2023-09-14 15:31 ` [Intel-xe] [PATCH v2 4/6] drm/xe/pat: annotate pat_index with coherency mode Matthew Auld
@ 2023-09-14 15:31 ` Matthew Auld
2023-09-15 22:19 ` Matt Roper
2023-09-14 15:31 ` [Intel-xe] [PATCH v2 6/6] drm/xe/uapi: support pat_index selection with vm_bind Matthew Auld
From: Matthew Auld @ 2023-09-14 15:31 UTC (permalink / raw)
To: intel-xe; +Cc: Matt Roper, Lucas De Marchi
We need to avoid using raw bits like PPAT_CACHED directly, which is no
longer going to work on newer platforms. At some point we can just use the
pat_index directly, but for now just use XE_CACHE_WB.
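The shape of the change can be sketched as: instead of OR-ing raw PTE bits like PPAT_CACHED at each call site, route everything through a single encode helper keyed on a cache level (the bit positions below are placeholders, not the real hardware layout):

```c
#include <assert.h>

enum cache_level { CACHE_NONE, CACHE_WB };

/* Placeholder bit positions, not the real PTE layout. */
#define PTE_PRESENT (1u << 0)
#define PTE_RW      (1u << 1)
#define PPAT_CACHED (1u << 7)

/* Single encode helper: call sites pass a cache level instead of
 * OR-ing platform-specific PAT bits themselves, so the PAT details
 * live in one place. */
static unsigned int pte_encode(unsigned int pte, enum cache_level cache)
{
	pte |= PTE_PRESENT | PTE_RW;
	if (cache == CACHE_WB)
		pte |= PPAT_CACHED;
	return pte;
}
```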
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Pallavi Mishra <pallavi.mishra@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
---
drivers/gpu/drm/xe/xe_migrate.c | 7 ++++---
drivers/gpu/drm/xe/xe_pt.c | 12 ++++++------
drivers/gpu/drm/xe/xe_pt.h | 2 ++
3 files changed, 12 insertions(+), 9 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
index 46f88f3a8c58..26cbc9107501 100644
--- a/drivers/gpu/drm/xe/xe_migrate.c
+++ b/drivers/gpu/drm/xe/xe_migrate.c
@@ -257,8 +257,9 @@ static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m,
level = 2;
ofs = map_ofs + XE_PAGE_SIZE * level + 256 * 8;
- flags = XE_PAGE_RW | XE_PAGE_PRESENT | PPAT_CACHED |
- XE_PPGTT_PTE_DM | XE_PDPE_PS_1G;
+
+ flags = XE_PPGTT_PTE_DM;
+ flags = __xe_pte_encode(flags, XE_CACHE_WB, vm, NULL, 2);
/*
* Use 1GB pages, it shouldn't matter the physical amount of
@@ -493,7 +494,7 @@ static void emit_pte(struct xe_migrate *m,
addr += vram_region_gpu_offset(bo->ttm.resource);
addr |= XE_PPGTT_PTE_DM;
}
- addr |= PPAT_CACHED | XE_PAGE_PRESENT | XE_PAGE_RW;
+ addr = __xe_pte_encode(addr, XE_CACHE_WB, m->q->vm, NULL, 0);
bb->cs[bb->len++] = lower_32_bits(addr);
bb->cs[bb->len++] = upper_32_bits(addr);
diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
index b0874052f5ce..a1b164cf8bce 100644
--- a/drivers/gpu/drm/xe/xe_pt.c
+++ b/drivers/gpu/drm/xe/xe_pt.c
@@ -67,8 +67,8 @@ u64 xe_pde_encode(struct xe_bo *bo, u64 bo_offset)
return pde;
}
-static u64 __pte_encode(u64 pte, enum xe_cache_level cache,
- struct xe_vm *vm, struct xe_vma *vma, u32 pt_level)
+u64 __xe_pte_encode(u64 pte, enum xe_cache_level cache,
+ struct xe_vm *vm, struct xe_vma *vma, u32 pt_level)
{
struct xe_device *xe = vm->xe;
@@ -112,7 +112,7 @@ u64 xe_pte_encode(struct xe_vm *vm, struct xe_bo *bo, u64 offset, enum xe_cache_
if (xe_bo_is_vram(bo) || xe_bo_is_stolen_devmem(bo))
pte |= XE_PPGTT_PTE_DM;
- return __pte_encode(pte, cache, vm, NULL, pt_level);
+ return __xe_pte_encode(pte, cache, vm, NULL, pt_level);
}
static u64 __xe_pt_empty_pte(struct xe_tile *tile, struct xe_vm *vm,
@@ -592,9 +592,9 @@ xe_pt_stage_bind_entry(struct xe_ptw *parent, pgoff_t offset,
XE_WARN_ON(xe_walk->va_curs_start != addr);
- pte = __pte_encode(is_null ? 0 :
- xe_res_dma(curs) + xe_walk->dma_offset,
- xe_walk->cache, xe_walk->vm, xe_walk->vma, level);
+ pte = __xe_pte_encode(is_null ? 0 :
+ xe_res_dma(curs) + xe_walk->dma_offset,
+ xe_walk->cache, xe_walk->vm, xe_walk->vma, level);
pte |= xe_walk->default_pte;
/*
diff --git a/drivers/gpu/drm/xe/xe_pt.h b/drivers/gpu/drm/xe/xe_pt.h
index 4a9143bc6628..0e66436d707d 100644
--- a/drivers/gpu/drm/xe/xe_pt.h
+++ b/drivers/gpu/drm/xe/xe_pt.h
@@ -49,5 +49,7 @@ u64 xe_pde_encode(struct xe_bo *bo, u64 bo_offset);
u64 xe_pte_encode(struct xe_vm *vm, struct xe_bo *bo, u64 offset, enum xe_cache_level cache,
u32 pt_level);
+u64 __xe_pte_encode(u64 pte, enum xe_cache_level cache,
+ struct xe_vm *vm, struct xe_vma *vma, u32 pt_level);
#endif
--
2.41.0
* [Intel-xe] [PATCH v2 6/6] drm/xe/uapi: support pat_index selection with vm_bind
2023-09-14 15:31 [Intel-xe] [PATCH v2 0/6] PAT and cache coherency support Matthew Auld
` (4 preceding siblings ...)
2023-09-14 15:31 ` [Intel-xe] [PATCH v2 5/6] drm/xe/migrate: rather use pte_encode helpers Matthew Auld
@ 2023-09-14 15:31 ` Matthew Auld
2023-09-15 22:24 ` Matt Roper
2023-09-25 21:56 ` Rodrigo Vivi
2023-09-14 18:16 ` [Intel-xe] ✗ CI.Patch_applied: failure for PAT and cache coherency support (rev2) Patchwork
` (2 subsequent siblings)
8 siblings, 2 replies; 28+ messages in thread
From: Matthew Auld @ 2023-09-14 15:31 UTC (permalink / raw)
To: intel-xe
Cc: Filip Hazubski, Lucas De Marchi, Carl Zhang, Effie Yu, Matt Roper
Allow userspace to directly control the pat_index for a given vm
binding. This should allow directly controlling the coherency, caching
and potentially other stuff in the future for the ppGTT binding.
The exact meaning behind the pat_index is very platform specific (see
BSpec or PRMs) but effectively maps to some predefined memory
attributes. From the KMD pov we only care about the coherency that is
provided by the pat_index, which falls into either NONE, 1WAY or 2WAY.
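As a rough illustration of what "maps to some predefined memory attributes" means at the PTE level, here is a minimal userspace sketch in the style of the xelp/xe2 pte_encode_pat() helpers later in this patch: the low bits of the pat_index are scattered into platform-defined PTE bits. Only the bit0 -> bit3 mapping is visible in the quoted hunks; the other destination bits below are assumptions for the sketch, not the real layout (see the Bspec):

```c
#include <assert.h>
#include <stdint.h>

#define BIT(n) (1ULL << (n))

/* Illustrative only: scatter the low pat_index bits into the PTE.
 * bit0 -> bit3 matches the quoted xelp_ppgtt_pte_encode_pat() hunk;
 * the bit1/bit2 destinations here are placeholders. */
static uint64_t pte_encode_pat(uint64_t pte, uint16_t pat_index)
{
	if (pat_index & BIT(0))
		pte |= BIT(3);
	if (pat_index & BIT(1))
		pte |= BIT(4);	/* assumed position */
	if (pat_index & BIT(2))
		pte |= BIT(7);	/* assumed position */
	return pte;
}
```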
The vm_bind coherency mode for the given pat_index needs to be at least
as coherent as the coh_mode that was set at object creation. For
platforms that lack the explicit coherency mode, we treat UC/WT/WC as
NONE and WB as AT_LEAST_1WAY.
For userptr mappings we lack a corresponding gem object, so the expected
coherency mode is instead implicit and must fall into either 1WAY or
2WAY. Trying to use NONE will be rejected by the kernel. For imported
dma-buf (from a different device) the coherency mode is also implicit
and must be either 1WAY or 2WAY, i.e. AT_LEAST_1WAY.
As part of adding pat_index support with vm_bind we also need to stop
using xe_cache_level and instead use the pat_index in various places. We
still make use of xe_cache_level, but only as a convenience for kernel
internal objects (internally it maps to some reasonable pat_index). For
now this is just a 1:1 conversion of the existing code, however for
platforms like MTL+ we might need to give more control through bo_create
or stop using WB on the CPU side if we need CPU access.
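The binding policy above can be condensed into a small userspace sketch (names and return values are stand-ins; the real checks live in vm_bind_ioctl_check_args() and xe_vm_bind_ioctl() in xe_vm.c, and bo_coh_mode == 0 below stands in for the implicit mode of userptr/imported dma-buf):

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for the uapi coherency modes. */
enum coh_mode {
	COH_NONE = 1,
	COH_AT_LEAST_1WAY = 2,
};

/* The pat_index must be at least as coherent as the mode chosen at
 * gem_create time; objects with no explicit mode (userptr, external
 * dma-buf) must not be bound with COH_NONE. */
static int check_bind_coherency(int bo_coh_mode, int pat_coh_mode)
{
	if (bo_coh_mode) {
		if (pat_coh_mode < bo_coh_mode)
			return -EINVAL;	/* less coherent than at creation */
	} else if (pat_coh_mode == COH_NONE) {
		return -EINVAL;		/* implicit mode needs >= 1-way */
	}
	return 0;
}
```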
v2:
- Undefined coh_mode(pat_index) can now be treated as programmer error. (Matt Roper)
- We now allow gem_create.coh_mode <= coh_mode(pat_index), rather than
having to match exactly. This ensures imported dma-buf can always
just use 1way (or even 2way), now that we also bundle 1way/2way into
at_least_1way. We still require 1way/2way for external dma-buf, but
the policy can now be the same for self-import, if desired.
- Use u16 for pat_index in uapi. u32 is massive overkill. (José)
- Move as much of the pat_index validation as we can into
vm_bind_ioctl_check_args. (José)
Bspec: 45101, 44235 #xe
Bspec: 70552, 71582, 59400 #xe2
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Pallavi Mishra <pallavi.mishra@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Cc: Filip Hazubski <filip.hazubski@intel.com>
Cc: Carl Zhang <carl.zhang@intel.com>
Cc: Effie Yu <effie.yu@intel.com>
---
drivers/gpu/drm/xe/tests/xe_migrate.c | 2 +-
drivers/gpu/drm/xe/xe_ggtt.c | 7 ++-
drivers/gpu/drm/xe/xe_ggtt_types.h | 2 +-
drivers/gpu/drm/xe/xe_migrate.c | 13 +++--
drivers/gpu/drm/xe/xe_pt.c | 22 ++++-----
drivers/gpu/drm/xe/xe_pt.h | 4 +-
drivers/gpu/drm/xe/xe_vm.c | 69 +++++++++++++++++++++------
drivers/gpu/drm/xe/xe_vm_types.h | 10 +++-
include/uapi/drm/xe_drm.h | 43 ++++++++++++++++-
9 files changed, 128 insertions(+), 44 deletions(-)
diff --git a/drivers/gpu/drm/xe/tests/xe_migrate.c b/drivers/gpu/drm/xe/tests/xe_migrate.c
index 6b4388bfbb31..d3bf4751a2d7 100644
--- a/drivers/gpu/drm/xe/tests/xe_migrate.c
+++ b/drivers/gpu/drm/xe/tests/xe_migrate.c
@@ -301,7 +301,7 @@ static void xe_migrate_sanity_test(struct xe_migrate *m, struct kunit *test)
/* First part of the test, are we updating our pagetable bo with a new entry? */
xe_map_wr(xe, &bo->vmap, XE_PAGE_SIZE * (NUM_KERNEL_PDE - 1), u64,
0xdeaddeadbeefbeef);
- expected = xe_pte_encode(m->q->vm, pt, 0, XE_CACHE_WB, 0);
+ expected = xe_pte_encode(m->q->vm, pt, 0, xe_pat_get_index(xe, XE_CACHE_WB), 0);
if (m->q->vm->flags & XE_VM_FLAG_64K)
expected |= XE_PTE_PS64;
if (xe_bo_is_vram(pt))
diff --git a/drivers/gpu/drm/xe/xe_ggtt.c b/drivers/gpu/drm/xe/xe_ggtt.c
index aea26afd4668..7e4da16389af 100644
--- a/drivers/gpu/drm/xe/xe_ggtt.c
+++ b/drivers/gpu/drm/xe/xe_ggtt.c
@@ -41,7 +41,8 @@ u64 xe_ggtt_pte_encode(struct xe_bo *bo, u64 bo_offset)
pte |= XE_GGTT_PTE_DM;
if ((ggtt->pat_encode).pte_encode)
- pte = (ggtt->pat_encode).pte_encode(xe, pte, XE_CACHE_WB_1_WAY);
+ pte = (ggtt->pat_encode).pte_encode(xe, pte,
+ xe_pat_get_index(xe, XE_CACHE_WB_1_WAY));
return pte;
}
@@ -102,10 +103,8 @@ static void primelockdep(struct xe_ggtt *ggtt)
}
static u64 xelpg_ggtt_pte_encode_pat(struct xe_device *xe, u64 pte_pat,
- enum xe_cache_level cache)
+ u16 pat_index)
{
- u32 pat_index = xe_pat_get_index(xe, cache);
-
pte_pat &= ~(XELPG_GGTT_PTE_PAT_MASK);
if (pat_index & BIT(0))
diff --git a/drivers/gpu/drm/xe/xe_ggtt_types.h b/drivers/gpu/drm/xe/xe_ggtt_types.h
index 7e55fac1a8a9..7981075bb228 100644
--- a/drivers/gpu/drm/xe/xe_ggtt_types.h
+++ b/drivers/gpu/drm/xe/xe_ggtt_types.h
@@ -31,7 +31,7 @@ struct xe_ggtt {
struct {
u64 (*pte_encode)(struct xe_device *xe, u64 pte_pat,
- enum xe_cache_level cache);
+ u16 pat_index);
} pat_encode;
};
diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
index 26cbc9107501..89d9e33a07e7 100644
--- a/drivers/gpu/drm/xe/xe_migrate.c
+++ b/drivers/gpu/drm/xe/xe_migrate.c
@@ -25,6 +25,7 @@
#include "xe_lrc.h"
#include "xe_map.h"
#include "xe_mocs.h"
+#include "xe_pat.h"
#include "xe_pt.h"
#include "xe_res_cursor.h"
#include "xe_sched_job.h"
@@ -162,6 +163,7 @@ static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m,
u32 num_entries = NUM_PT_SLOTS, num_level = vm->pt_root[id]->level;
u32 map_ofs, level, i;
struct xe_bo *bo, *batch = tile->mem.kernel_bb_pool->bo;
+ u16 pat_index = xe_pat_get_index(xe, XE_CACHE_WB);
u64 entry;
int ret;
@@ -196,7 +198,7 @@ static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m,
/* Map the entire BO in our level 0 pt */
for (i = 0, level = 0; i < num_entries; level++) {
- entry = xe_pte_encode(vm, bo, i * XE_PAGE_SIZE, XE_CACHE_WB, 0);
+ entry = xe_pte_encode(vm, bo, i * XE_PAGE_SIZE, pat_index, 0);
xe_map_wr(xe, &bo->vmap, map_ofs + level * 8, u64, entry);
@@ -214,7 +216,7 @@ static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m,
for (i = 0; i < batch->size;
i += vm->flags & XE_VM_FLAG_64K ? XE_64K_PAGE_SIZE :
XE_PAGE_SIZE) {
- entry = xe_pte_encode(vm, batch, i, XE_CACHE_WB, 0);
+ entry = xe_pte_encode(vm, batch, i, pat_index, 0);
xe_map_wr(xe, &bo->vmap, map_ofs + level * 8, u64,
entry);
@@ -259,7 +261,7 @@ static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m,
ofs = map_ofs + XE_PAGE_SIZE * level + 256 * 8;
flags = XE_PPGTT_PTE_DM;
- flags = __xe_pte_encode(flags, XE_CACHE_WB, vm, NULL, 2);
+ flags = __xe_pte_encode(flags, pat_index, vm, NULL, 2);
/*
* Use 1GB pages, it shouldn't matter the physical amount of
@@ -454,6 +456,7 @@ static void emit_pte(struct xe_migrate *m,
struct xe_res_cursor *cur,
u32 size, struct xe_bo *bo)
{
+ u16 pat_index = xe_pat_get_index(m->tile->xe, XE_CACHE_WB);
u32 ptes;
u64 ofs = at_pt * XE_PAGE_SIZE;
u64 cur_ofs;
@@ -494,7 +497,7 @@ static void emit_pte(struct xe_migrate *m,
addr += vram_region_gpu_offset(bo->ttm.resource);
addr |= XE_PPGTT_PTE_DM;
}
- addr = __xe_pte_encode(addr, XE_CACHE_WB, m->q->vm, NULL, 0);
+ addr = __xe_pte_encode(addr, pat_index, m->q->vm, NULL, 0);
bb->cs[bb->len++] = lower_32_bits(addr);
bb->cs[bb->len++] = upper_32_bits(addr);
@@ -1254,7 +1257,7 @@ xe_migrate_update_pgtables(struct xe_migrate *m,
xe_tile_assert(tile, pt_bo->size == SZ_4K);
- addr = xe_pte_encode(vm, pt_bo, 0, XE_CACHE_WB, 0);
+ addr = xe_pte_encode(vm, pt_bo, 0, xe_pat_get_index(xe, XE_CACHE_WB), 0);
bb->cs[bb->len++] = lower_32_bits(addr);
bb->cs[bb->len++] = upper_32_bits(addr);
}
diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
index a1b164cf8bce..7dd93cbff704 100644
--- a/drivers/gpu/drm/xe/xe_pt.c
+++ b/drivers/gpu/drm/xe/xe_pt.c
@@ -10,6 +10,7 @@
#include "xe_gt.h"
#include "xe_gt_tlb_invalidation.h"
#include "xe_migrate.h"
+#include "xe_pat.h"
#include "xe_pt_types.h"
#include "xe_pt_walk.h"
#include "xe_res_cursor.h"
@@ -67,7 +68,7 @@ u64 xe_pde_encode(struct xe_bo *bo, u64 bo_offset)
return pde;
}
-u64 __xe_pte_encode(u64 pte, enum xe_cache_level cache,
+u64 __xe_pte_encode(u64 pte, u16 pat_index,
struct xe_vm *vm, struct xe_vma *vma, u32 pt_level)
{
struct xe_device *xe = vm->xe;
@@ -85,7 +86,7 @@ u64 __xe_pte_encode(u64 pte, enum xe_cache_level cache,
else if (pt_level == 2)
pte |= XE_PDPE_PS_1G;
- pte = vm->pat_encode.pte_encode(xe, pte, cache);
+ pte = vm->pat_encode.pte_encode(xe, pte, pat_index);
/* XXX: Does hw support 1 GiB pages? */
XE_WARN_ON(pt_level > 2);
@@ -103,7 +104,7 @@ u64 __xe_pte_encode(u64 pte, enum xe_cache_level cache,
*
* Return: An encoded page-table entry. No errors.
*/
-u64 xe_pte_encode(struct xe_vm *vm, struct xe_bo *bo, u64 offset, enum xe_cache_level cache,
+u64 xe_pte_encode(struct xe_vm *vm, struct xe_bo *bo, u64 offset, u16 pat_index,
u32 pt_level)
{
u64 pte;
@@ -112,7 +113,7 @@ u64 xe_pte_encode(struct xe_vm *vm, struct xe_bo *bo, u64 offset, enum xe_cache_
if (xe_bo_is_vram(bo) || xe_bo_is_stolen_devmem(bo))
pte |= XE_PPGTT_PTE_DM;
- return __xe_pte_encode(pte, cache, vm, NULL, pt_level);
+ return __xe_pte_encode(pte, pat_index, vm, NULL, pt_level);
}
static u64 __xe_pt_empty_pte(struct xe_tile *tile, struct xe_vm *vm,
@@ -125,7 +126,7 @@ static u64 __xe_pt_empty_pte(struct xe_tile *tile, struct xe_vm *vm,
if (level == 0) {
u64 empty = xe_pte_encode(vm, vm->scratch_bo[id], 0,
- XE_CACHE_WB, 0);
+ xe_pat_get_index(vm->xe, XE_CACHE_WB), 0);
return empty;
} else {
@@ -358,8 +359,6 @@ struct xe_pt_stage_bind_walk {
struct xe_vm *vm;
/** @tile: The tile we're building for. */
struct xe_tile *tile;
- /** @cache: Desired cache level for the ptes */
- enum xe_cache_level cache;
/** @default_pte: PTE flag only template. No address is associated */
u64 default_pte;
/** @dma_offset: DMA offset to add to the PTE. */
@@ -594,7 +593,7 @@ xe_pt_stage_bind_entry(struct xe_ptw *parent, pgoff_t offset,
pte = __xe_pte_encode(is_null ? 0 :
xe_res_dma(curs) + xe_walk->dma_offset,
- xe_walk->cache, xe_walk->vm, xe_walk->vma, level);
+ xe_walk->vma->pat_index, xe_walk->vm, xe_walk->vma, level);
pte |= xe_walk->default_pte;
/*
@@ -720,13 +719,8 @@ xe_pt_stage_bind(struct xe_tile *tile, struct xe_vma *vma,
if (vma && vma->gpuva.flags & XE_VMA_ATOMIC_PTE_BIT)
xe_walk.default_pte |= XE_USM_PPGTT_PTE_AE;
xe_walk.dma_offset = vram_region_gpu_offset(bo->ttm.resource);
- xe_walk.cache = XE_CACHE_WB;
- } else {
- if (!xe_vma_has_no_bo(vma) && bo->flags & XE_BO_SCANOUT_BIT)
- xe_walk.cache = XE_CACHE_WT;
- else
- xe_walk.cache = XE_CACHE_WB;
}
+
if (!xe_vma_has_no_bo(vma) && xe_bo_is_stolen(bo))
xe_walk.dma_offset = xe_ttm_stolen_gpu_offset(xe_bo_device(bo));
diff --git a/drivers/gpu/drm/xe/xe_pt.h b/drivers/gpu/drm/xe/xe_pt.h
index 0e66436d707d..6d10823fca9b 100644
--- a/drivers/gpu/drm/xe/xe_pt.h
+++ b/drivers/gpu/drm/xe/xe_pt.h
@@ -47,9 +47,9 @@ bool xe_pt_zap_ptes(struct xe_tile *tile, struct xe_vma *vma);
u64 xe_pde_encode(struct xe_bo *bo, u64 bo_offset);
-u64 xe_pte_encode(struct xe_vm *vm, struct xe_bo *bo, u64 offset, enum xe_cache_level cache,
+u64 xe_pte_encode(struct xe_vm *vm, struct xe_bo *bo, u64 offset, u16 pat_index,
u32 pt_level);
-u64 __xe_pte_encode(u64 pte, enum xe_cache_level cache,
+u64 __xe_pte_encode(u64 pte, u16 pat_index,
struct xe_vm *vm, struct xe_vma *vma, u32 pt_level);
#endif
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index ba612a5ee2d8..98db7a298139 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -6,6 +6,7 @@
#include "xe_vm.h"
#include <linux/dma-fence-array.h>
+#include <linux/nospec.h>
#include <drm/drm_exec.h>
#include <drm/drm_print.h>
@@ -858,7 +859,8 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
u64 start, u64 end,
bool read_only,
bool is_null,
- u8 tile_mask)
+ u8 tile_mask,
+ u16 pat_index)
{
struct xe_vma *vma;
struct xe_tile *tile;
@@ -897,6 +899,8 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
vma->tile_mask |= 0x1 << id;
}
+ vma->pat_index = pat_index;
+
if (vm->xe->info.platform == XE_PVC)
vma->gpuva.flags |= XE_VMA_ATOMIC_PTE_BIT;
@@ -1195,10 +1199,8 @@ static void xe_vma_op_work_func(struct work_struct *w);
static void vm_destroy_work_func(struct work_struct *w);
static u64 xe2_ppgtt_pte_encode_pat(struct xe_device *xe, u64 pte_pat,
- enum xe_cache_level cache)
+ u16 pat_index)
{
- u32 pat_index = xe_pat_get_index(xe, cache);
-
if (pat_index & BIT(0))
pte_pat |= BIT(3);
@@ -1216,10 +1218,8 @@ static u64 xe2_ppgtt_pte_encode_pat(struct xe_device *xe, u64 pte_pat,
}
static u64 xelp_ppgtt_pte_encode_pat(struct xe_device *xe, u64 pte_pat,
- enum xe_cache_level cache)
+ u16 pat_index)
{
- u32 pat_index = xe_pat_get_index(xe, cache);
-
if (pat_index & BIT(0))
pte_pat |= BIT(3);
@@ -2300,7 +2300,7 @@ static void print_op(struct xe_device *xe, struct drm_gpuva_op *op)
static struct drm_gpuva_ops *
vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_bo *bo,
u64 bo_offset_or_userptr, u64 addr, u64 range,
- u32 operation, u8 tile_mask, u32 region)
+ u32 operation, u8 tile_mask, u32 region, u16 pat_index)
{
struct drm_gem_object *obj = bo ? &bo->ttm.base : NULL;
struct drm_gpuva_ops *ops;
@@ -2327,6 +2327,7 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_bo *bo,
struct xe_vma_op *op = gpuva_op_to_vma_op(__op);
op->tile_mask = tile_mask;
+ op->pat_index = pat_index;
op->map.immediate =
operation & XE_VM_BIND_FLAG_IMMEDIATE;
op->map.read_only =
@@ -2354,6 +2355,7 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_bo *bo,
struct xe_vma_op *op = gpuva_op_to_vma_op(__op);
op->tile_mask = tile_mask;
+ op->pat_index = pat_index;
op->prefetch.region = region;
}
break;
@@ -2396,7 +2398,8 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_bo *bo,
}
static struct xe_vma *new_vma(struct xe_vm *vm, struct drm_gpuva_op_map *op,
- u8 tile_mask, bool read_only, bool is_null)
+ u8 tile_mask, bool read_only, bool is_null,
+ u16 pat_index)
{
struct xe_bo *bo = op->gem.obj ? gem_to_xe_bo(op->gem.obj) : NULL;
struct xe_vma *vma;
@@ -2412,7 +2415,7 @@ static struct xe_vma *new_vma(struct xe_vm *vm, struct drm_gpuva_op_map *op,
vma = xe_vma_create(vm, bo, op->gem.offset,
op->va.addr, op->va.addr +
op->va.range - 1, read_only, is_null,
- tile_mask);
+ tile_mask, pat_index);
if (bo)
xe_bo_unlock(bo);
@@ -2569,7 +2572,7 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_exec_queue *q,
vma = new_vma(vm, &op->base.map,
op->tile_mask, op->map.read_only,
- op->map.is_null);
+ op->map.is_null, op->pat_index);
if (IS_ERR(vma)) {
err = PTR_ERR(vma);
goto free_fence;
@@ -2597,7 +2600,7 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_exec_queue *q,
vma = new_vma(vm, op->base.remap.prev,
op->tile_mask, read_only,
- is_null);
+ is_null, op->pat_index);
if (IS_ERR(vma)) {
err = PTR_ERR(vma);
goto free_fence;
@@ -2633,7 +2636,7 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_exec_queue *q,
vma = new_vma(vm, op->base.remap.next,
op->tile_mask, read_only,
- is_null);
+ is_null, op->pat_index);
if (IS_ERR(vma)) {
err = PTR_ERR(vma);
goto free_fence;
@@ -3146,7 +3149,23 @@ static int vm_bind_ioctl_check_args(struct xe_device *xe,
u32 obj = (*bind_ops)[i].obj;
u64 obj_offset = (*bind_ops)[i].obj_offset;
u32 region = (*bind_ops)[i].region;
+ u16 pat_index = (*bind_ops)[i].pat_index;
bool is_null = op & XE_VM_BIND_FLAG_NULL;
+ u16 coh_mode;
+
+ if (XE_IOCTL_DBG(xe, pat_index >= xe->info.pat.n_entries)) {
+ err = -EINVAL;
+ goto free_bind_ops;
+ }
+
+ pat_index = array_index_nospec(pat_index,
+ xe->info.pat.n_entries);
+ (*bind_ops)[i].pat_index = pat_index;
+ coh_mode = xe_pat_index_get_coh_mode(xe, pat_index);
+ if (XE_WARN_ON(!coh_mode || coh_mode > XE_GEM_COH_AT_LEAST_1WAY)) {
+ err = -EINVAL;
+ goto free_bind_ops;
+ }
if (i == 0) {
*async = !!(op & XE_VM_BIND_FLAG_ASYNC);
@@ -3188,6 +3207,8 @@ static int vm_bind_ioctl_check_args(struct xe_device *xe,
VM_BIND_OP(op) == XE_VM_BIND_OP_UNMAP_ALL) ||
XE_IOCTL_DBG(xe, obj &&
VM_BIND_OP(op) == XE_VM_BIND_OP_MAP_USERPTR) ||
+ XE_IOCTL_DBG(xe, coh_mode == XE_GEM_COH_NONE &&
+ VM_BIND_OP(op) == XE_VM_BIND_OP_MAP_USERPTR) ||
XE_IOCTL_DBG(xe, obj &&
VM_BIND_OP(op) == XE_VM_BIND_OP_PREFETCH) ||
XE_IOCTL_DBG(xe, region &&
@@ -3336,6 +3357,8 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
u64 addr = bind_ops[i].addr;
u32 obj = bind_ops[i].obj;
u64 obj_offset = bind_ops[i].obj_offset;
+ u16 pat_index = bind_ops[i].pat_index;
+ u16 coh_mode;
if (!obj)
continue;
@@ -3363,6 +3386,23 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
goto put_obj;
}
}
+
+ coh_mode = xe_pat_index_get_coh_mode(xe, pat_index);
+ if (bos[i]->coh_mode) {
+ if (XE_IOCTL_DBG(xe, coh_mode < bos[i]->coh_mode)) {
+ err = -EINVAL;
+ goto put_obj;
+ }
+ } else if (XE_IOCTL_DBG(xe, coh_mode == XE_GEM_COH_NONE)) {
+ /*
+ * Imported dma-buf from a different device should
+ * require 1way or 2way coherency since we don't know
+			 * how it was mapped on the CPU. Just assume it is
+ * potentially cached on CPU side.
+ */
+ err = -EINVAL;
+ goto put_obj;
+ }
}
if (args->num_syncs) {
@@ -3400,10 +3440,11 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
u64 obj_offset = bind_ops[i].obj_offset;
u8 tile_mask = bind_ops[i].tile_mask;
u32 region = bind_ops[i].region;
+ u16 pat_index = bind_ops[i].pat_index;
ops[i] = vm_bind_ioctl_ops_create(vm, bos[i], obj_offset,
addr, range, op, tile_mask,
- region);
+ region, pat_index);
if (IS_ERR(ops[i])) {
err = PTR_ERR(ops[i]);
ops[i] = NULL;
diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
index dc583f00919f..54658f400174 100644
--- a/drivers/gpu/drm/xe/xe_vm_types.h
+++ b/drivers/gpu/drm/xe/xe_vm_types.h
@@ -111,6 +111,11 @@ struct xe_vma {
*/
u8 tile_present;
+ /**
+ * @pat_index: The pat index to use when encoding the PTEs for this vma.
+ */
+ u16 pat_index;
+
struct {
struct list_head rebind_link;
} notifier;
@@ -338,8 +343,7 @@ struct xe_vm {
bool batch_invalidate_tlb;
struct {
- u64 (*pte_encode)(struct xe_device *xe, u64 pte_pat,
- enum xe_cache_level cache);
+ u64 (*pte_encode)(struct xe_device *xe, u64 pte_pat, u16 pat_index);
} pat_encode;
};
@@ -419,6 +423,8 @@ struct xe_vma_op {
struct async_op_fence *fence;
/** @tile_mask: gt mask for this operation */
u8 tile_mask;
+ /** @pat_index: The pat index to use for this operation. */
+ u16 pat_index;
/** @flags: operation flags */
enum xe_vma_op_flags flags;
diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
index 737bb1d4c6f7..75b42c1116f2 100644
--- a/include/uapi/drm/xe_drm.h
+++ b/include/uapi/drm/xe_drm.h
@@ -605,8 +605,49 @@ struct drm_xe_vm_bind_op {
*/
__u32 obj;
+ /**
+ * @pat_index: The platform defined @pat_index to use for this mapping.
+ * The index basically maps to some predefined memory attributes,
+ * including things like caching, coherency, compression etc. The exact
+ * meaning of the pat_index is platform specific and defined in the
+ * Bspec and PRMs. When the KMD sets up the binding the index here is
+ * encoded into the ppGTT PTE.
+ *
+	 * For coherency the @pat_index needs to be at least as coherent as
+	 * drm_xe_gem_create.coh_mode, i.e. coh_mode(pat_index) >=
+ * drm_xe_gem_create.coh_mode. The KMD will extract the coherency mode
+ * from the @pat_index and reject if there is a mismatch (see note below
+ * for pre-MTL platforms).
+ *
+ * Note: On pre-MTL platforms there is only a caching mode and no
+ * explicit coherency mode, but on such hardware there is always a
+	 * shared-LLC (or the device is a dgpu) so all GT memory accesses are coherent with
+ * CPU caches even with the caching mode set as uncached. It's only the
+ * display engine that is incoherent (on dgpu it must be in VRAM which
+ * is always mapped as WC on the CPU). However to keep the uapi somewhat
+ * consistent with newer platforms the KMD groups the different cache
+ * levels into the following coherency buckets on all pre-MTL platforms:
+ *
+ * ppGTT UC -> XE_GEM_COH_NONE
+ * ppGTT WC -> XE_GEM_COH_NONE
+ * ppGTT WT -> XE_GEM_COH_NONE
+ * ppGTT WB -> XE_GEM_COH_AT_LEAST_1WAY
+ *
+	 * In practice UC/WC/WT should only ever be used for scanout surfaces on
+ * such platforms (or perhaps in general for dma-buf if shared with
+ * another device) since it is only the display engine that is actually
+ * incoherent. Everything else should typically use WB given that we
+ * have a shared-LLC. On MTL+ this completely changes and the HW
+ * defines the coherency mode as part of the @pat_index, where
+ * incoherent GT access is possible.
+ *
+ * Note: For userptr and externally imported dma-buf the kernel expects
+ * either 1WAY or 2WAY for the @pat_index.
+ */
+ __u16 pat_index;
+
/** @pad: MBZ */
- __u32 pad;
+ __u16 pad;
union {
/**
--
2.41.0
* Re: [Intel-xe] [PATCH v2 3/6] drm/xe/pat: trim the tgl PAT table
2023-09-14 15:31 ` [Intel-xe] [PATCH v2 3/6] drm/xe/pat: trim the tgl PAT table Matthew Auld
@ 2023-09-14 18:07 ` Matt Roper
0 siblings, 0 replies; 28+ messages in thread
From: Matt Roper @ 2023-09-14 18:07 UTC (permalink / raw)
To: Matthew Auld; +Cc: Lucas De Marchi, intel-xe
On Thu, Sep 14, 2023 at 04:31:16PM +0100, Matthew Auld wrote:
> We don't seem to use the 4-7 pat indexes, even though they are defined
> by the HW. In the next patch userspace will be able to directly set the
> pat_index as part of vm_bind and we don't want to allow setting 4-7.
> Simplest is to just ignore them here.
>
> Suggested-by: Matt Roper <matthew.d.roper@intel.com>
> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> Cc: Pallavi Mishra <pallavi.mishra@intel.com>
> Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
> ---
> drivers/gpu/drm/xe/xe_pat.c | 4 ----
> 1 file changed, 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_pat.c b/drivers/gpu/drm/xe/xe_pat.c
> index aa2c5eb88266..fb490982fd99 100644
> --- a/drivers/gpu/drm/xe/xe_pat.c
> +++ b/drivers/gpu/drm/xe/xe_pat.c
> @@ -38,10 +38,6 @@ static const u32 tgl_pat_table[] = {
> [1] = TGL_PAT_WC,
> [2] = TGL_PAT_WT,
> [3] = TGL_PAT_UC,
> - [4] = TGL_PAT_WB,
> - [5] = TGL_PAT_WB,
> - [6] = TGL_PAT_WB,
> - [7] = TGL_PAT_WB,
> };
>
> static const u32 pvc_pat_table[] = {
> --
> 2.41.0
>
--
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation
* [Intel-xe] ✗ CI.Patch_applied: failure for PAT and cache coherency support (rev2)
2023-09-14 15:31 [Intel-xe] [PATCH v2 0/6] PAT and cache coherency support Matthew Auld
` (5 preceding siblings ...)
2023-09-14 15:31 ` [Intel-xe] [PATCH v2 6/6] drm/xe/uapi: support pat_index selection with vm_bind Matthew Auld
@ 2023-09-14 18:16 ` Patchwork
2023-09-18 15:51 ` [Intel-xe] [PATCH v2 0/6] PAT and cache coherency support Souza, Jose
2023-09-21 20:10 ` [Intel-xe] ✗ CI.Patch_applied: failure for PAT and cache coherency support (rev3) Patchwork
8 siblings, 0 replies; 28+ messages in thread
From: Patchwork @ 2023-09-14 18:16 UTC (permalink / raw)
To: Matthew Auld; +Cc: intel-xe
== Series Details ==
Series: PAT and cache coherency support (rev2)
URL : https://patchwork.freedesktop.org/series/123027/
State : failure
== Summary ==
=== Applying kernel patches on branch 'drm-xe-next' with base: ===
Base commit: aeac46cfa Revert "FIXME: drm/i915: add a lot of includes in intel_display_power.h"
=== git am output follows ===
error: patch failed: drivers/gpu/drm/xe/xe_pat.c:135
error: drivers/gpu/drm/xe/xe_pat.c: patch does not apply
error: patch failed: drivers/gpu/drm/xe/xe_pat.h:28
error: drivers/gpu/drm/xe/xe_pat.h: patch does not apply
hint: Use 'git am --show-current-patch' to see the failed patch
Applying: drm/xe/uapi: Add support for cache and coherency mode
Applying: drm/xe: move pat_table into device info
Patch failed at 0002 drm/xe: move pat_table into device info
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".
* Re: [Intel-xe] [PATCH v2 1/6] drm/xe/uapi: Add support for cache and coherency mode
2023-09-14 15:31 ` [Intel-xe] [PATCH v2 1/6] drm/xe/uapi: Add support for cache and coherency mode Matthew Auld
@ 2023-09-14 23:47 ` Matt Roper
2023-09-15 7:37 ` Matthew Auld
2023-09-21 20:07 ` Souza, Jose
1 sibling, 1 reply; 28+ messages in thread
From: Matt Roper @ 2023-09-14 23:47 UTC (permalink / raw)
To: Matthew Auld
Cc: Effie Yu, Filip Hazubski, Lucas De Marchi, intel-xe, Carl Zhang
On Thu, Sep 14, 2023 at 04:31:14PM +0100, Matthew Auld wrote:
> From: Pallavi Mishra <pallavi.mishra@intel.com>
>
> Allow userspace to specify the CPU caching mode to use for system memory
> in addition to coherency modes during object creation. Modify gem create
> handler and introduce xe_bo_create_user to replace xe_bo_create. In a
> later patch we will support setting the pat_index as part of vm_bind,
> where expectation is that the coherency mode extracted from the
> pat_index must match the one set at object creation.
>
> v2
> - s/smem_caching/smem_cpu_caching/ and
> s/XE_GEM_CACHING/XE_GEM_CPU_CACHING/. (Matt Roper)
> - Drop COH_2WAY and just use COH_NONE + COH_AT_LEAST_1WAY; KMD mostly
> just cares that zeroing/swap-in can't be bypassed with the given
> smem_caching mode. (Matt Roper)
> - Fix broken range check for coh_mode and smem_cpu_caching and also
> don't use constant value, but the already defined macros. (José)
> - Prefer switch statement for smem_cpu_caching -> ttm_caching. (José)
> - Add note in kernel-doc for dgpu and coherency modes for system
> memory. (José)
>
> Signed-off-by: Pallavi Mishra <pallavi.mishra@intel.com>
> Co-authored-by: Matthew Auld <matthew.auld@intel.com>
> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Lucas De Marchi <lucas.demarchi@intel.com>
> Cc: Matt Roper <matthew.d.roper@intel.com>
> Cc: José Roberto de Souza <jose.souza@intel.com>
> Cc: Filip Hazubski <filip.hazubski@intel.com>
> Cc: Carl Zhang <carl.zhang@intel.com>
> Cc: Effie Yu <effie.yu@intel.com>
> ---
> drivers/gpu/drm/xe/xe_bo.c | 105 ++++++++++++++++++++++++++-----
> drivers/gpu/drm/xe/xe_bo.h | 3 +-
> drivers/gpu/drm/xe/xe_bo_types.h | 10 +++
> drivers/gpu/drm/xe/xe_dma_buf.c | 5 +-
> include/uapi/drm/xe_drm.h | 57 ++++++++++++++++-
> 5 files changed, 158 insertions(+), 22 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
> index 27726d4f3423..f3facd788f15 100644
> --- a/drivers/gpu/drm/xe/xe_bo.c
> +++ b/drivers/gpu/drm/xe/xe_bo.c
> @@ -325,7 +325,7 @@ static struct ttm_tt *xe_ttm_tt_create(struct ttm_buffer_object *ttm_bo,
> struct xe_device *xe = xe_bo_device(bo);
> struct xe_ttm_tt *tt;
> unsigned long extra_pages;
> - enum ttm_caching caching = ttm_cached;
> + enum ttm_caching caching;
> int err;
>
> tt = kzalloc(sizeof(*tt), GFP_KERNEL);
> @@ -339,13 +339,25 @@ static struct ttm_tt *xe_ttm_tt_create(struct ttm_buffer_object *ttm_bo,
> extra_pages = DIV_ROUND_UP(xe_device_ccs_bytes(xe, bo->size),
> PAGE_SIZE);
>
> + switch (bo->smem_cpu_caching) {
> + case XE_GEM_CPU_CACHING_WC:
> + caching = ttm_write_combined;
> + break;
> + case XE_GEM_CPU_CACHING_UC:
> + caching = ttm_uncached;
> + break;
> + default:
> + caching = ttm_cached;
> + break;
> + }
> +
> /*
> * Display scanout is always non-coherent with the CPU cache.
> *
> * For Xe_LPG and beyond, PPGTT PTE lookups are also non-coherent and
> * require a CPU:WC mapping.
> */
> - if (bo->flags & XE_BO_SCANOUT_BIT ||
> + if ((!bo->smem_cpu_caching && bo->flags & XE_BO_SCANOUT_BIT) ||
Is this change just so that we'll honor XE_GEM_CPU_CACHING_UC rather
than promoting to WC? More questions about that farther down...
It seems farther down we're allowing CPU:WB for scanout on dgpu. Is
that safe? I thought display was non-coherent with the CPU cache no
matter what the GT-side coherency situation was. Assuming it is safe,
then the first part of the comment above this block is no longer
accurate.
> (xe->info.graphics_verx100 >= 1270 && bo->flags & XE_BO_PAGETABLE))
> caching = ttm_write_combined;
>
> @@ -1184,9 +1196,10 @@ void xe_bo_free(struct xe_bo *bo)
> kfree(bo);
> }
>
> -struct xe_bo *__xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
> +struct xe_bo *___xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
> struct xe_tile *tile, struct dma_resv *resv,
> struct ttm_lru_bulk_move *bulk, size_t size,
> + u16 smem_cpu_caching, u16 coh_mode,
> enum ttm_bo_type type, u32 flags)
> {
> struct ttm_operation_ctx ctx = {
> @@ -1224,6 +1237,8 @@ struct xe_bo *__xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
> bo->tile = tile;
> bo->size = size;
> bo->flags = flags;
> + bo->smem_cpu_caching = smem_cpu_caching;
> + bo->coh_mode = coh_mode;
> bo->ttm.base.funcs = &xe_gem_object_funcs;
> bo->props.preferred_mem_class = XE_BO_PROPS_INVALID;
> bo->props.preferred_gt = XE_BO_PROPS_INVALID;
> @@ -1307,10 +1322,11 @@ static int __xe_bo_fixed_placement(struct xe_device *xe,
> }
>
> struct xe_bo *
> -xe_bo_create_locked_range(struct xe_device *xe,
> - struct xe_tile *tile, struct xe_vm *vm,
> - size_t size, u64 start, u64 end,
> - enum ttm_bo_type type, u32 flags)
> +__xe_bo_create_locked(struct xe_device *xe,
> + struct xe_tile *tile, struct xe_vm *vm,
> + size_t size, u64 start, u64 end,
> + u16 smem_cpu_caching, u16 coh_mode,
> + enum ttm_bo_type type, u32 flags)
> {
> struct xe_bo *bo = NULL;
> int err;
> @@ -1331,10 +1347,11 @@ xe_bo_create_locked_range(struct xe_device *xe,
> }
> }
>
> - bo = __xe_bo_create_locked(xe, bo, tile, vm ? &vm->resv : NULL,
> + bo = ___xe_bo_create_locked(xe, bo, tile, vm ? &vm->resv : NULL,
> vm && !xe_vm_in_fault_mode(vm) &&
> flags & XE_BO_CREATE_USER_BIT ?
> &vm->lru_bulk_move : NULL, size,
> + smem_cpu_caching, coh_mode,
> type, flags);
> if (IS_ERR(bo))
> return bo;
> @@ -1368,11 +1385,35 @@ xe_bo_create_locked_range(struct xe_device *xe,
> return ERR_PTR(err);
> }
>
> +struct xe_bo *
> +xe_bo_create_locked_range(struct xe_device *xe,
> + struct xe_tile *tile, struct xe_vm *vm,
> + size_t size, u64 start, u64 end,
> + enum ttm_bo_type type, u32 flags)
> +{
> + return __xe_bo_create_locked(xe, tile, vm, size, 0, ~0ULL, 0, 0, type, flags);
It's a bit hard to keep track of all these wrappers, but I think this
one gets used via
initial_plane_bo -> xe_bo_create_pin_map_at -> xe_bo_create_locked_range
right? For that path, wouldn't we want XE_GEM_CPU_CACHING_WC for
smem_cpu_caching?
> +}
> +
> struct xe_bo *xe_bo_create_locked(struct xe_device *xe, struct xe_tile *tile,
> struct xe_vm *vm, size_t size,
> enum ttm_bo_type type, u32 flags)
> {
> - return xe_bo_create_locked_range(xe, tile, vm, size, 0, ~0ULL, type, flags);
> + return __xe_bo_create_locked(xe, tile, vm, size, 0, ~0ULL, 0, 0, type, flags);
> +}
> +
> +static struct xe_bo *xe_bo_create_user(struct xe_device *xe, struct xe_tile *tile,
> + struct xe_vm *vm, size_t size,
> + u16 smem_cpu_caching, u16 coh_mode,
> + enum ttm_bo_type type,
> + u32 flags)
> +{
> + struct xe_bo *bo = __xe_bo_create_locked(xe, tile, vm, size, 0, ~0ULL,
> + smem_cpu_caching, coh_mode, type,
> + flags | XE_BO_CREATE_USER_BIT);
> + if (!IS_ERR(bo))
> + xe_bo_unlock_vm_held(bo);
> +
> + return bo;
> }
>
> struct xe_bo *xe_bo_create(struct xe_device *xe, struct xe_tile *tile,
> @@ -1755,11 +1796,11 @@ int xe_gem_create_ioctl(struct drm_device *dev, void *data,
> struct drm_xe_gem_create *args = data;
> struct xe_vm *vm = NULL;
> struct xe_bo *bo;
> - unsigned int bo_flags = XE_BO_CREATE_USER_BIT;
> + unsigned int bo_flags;
> u32 handle;
> int err;
>
> - if (XE_IOCTL_DBG(xe, args->extensions) || XE_IOCTL_DBG(xe, args->pad) ||
> + if (XE_IOCTL_DBG(xe, args->extensions) ||
> XE_IOCTL_DBG(xe, args->reserved[0] || args->reserved[1]))
> return -EINVAL;
>
> @@ -1801,6 +1842,32 @@ int xe_gem_create_ioctl(struct drm_device *dev, void *data,
> bo_flags |= XE_BO_NEEDS_CPU_ACCESS;
> }
>
> + if (XE_IOCTL_DBG(xe, args->coh_mode > XE_GEM_COH_AT_LEAST_1WAY))
> + return -EINVAL;
> +
> + if (XE_IOCTL_DBG(xe, args->smem_cpu_caching > XE_GEM_CPU_CACHING_UC))
> + return -EINVAL;
> +
> + if (bo_flags & XE_BO_CREATE_SYSTEM_BIT) {
> + if (XE_IOCTL_DBG(xe, !args->coh_mode))
> + return -EINVAL;
> +
> + if (XE_IOCTL_DBG(xe, !args->smem_cpu_caching))
> + return -EINVAL;
> +
> + if (XE_IOCTL_DBG(xe, !IS_DGFX(xe) &&
> + bo_flags & XE_BO_SCANOUT_BIT &&
> + args->smem_cpu_caching == XE_GEM_CPU_CACHING_WB))
> + return -EINVAL;
This reminds me...do we have a check anywhere that rejects the
combination (dgfx && scanout && smem)? on dgpus, the scanout must
always be in vram.
> +
> + if (args->coh_mode == XE_GEM_COH_NONE) {
> + if (XE_IOCTL_DBG(xe, args->smem_cpu_caching == XE_GEM_CPU_CACHING_WB))
> + return -EINVAL;
> + }
> + } else if (XE_IOCTL_DBG(xe, args->smem_cpu_caching)) {
> + return -EINVAL;
Isn't this going to fail for dumb framebuffers? In xe_bo_dumb_create()
you always pass XE_GEM_CPU_CACHING_WC, which will be combined with a
vram placement.
BTW, is there any check for the ioctl being called with no placement
(neither system nor vram bits set)? What happens in that case?
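For reference, the validation rules this hunk enforces can be collapsed
into one predicate (a sketch with hypothetical names; placement is reduced
to a has_smem bool):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirrors of the uapi values added by this patch. */
enum { COH_NONE = 1, COH_AT_LEAST_1WAY = 2 };
enum { CPU_CACHING_WB = 1, CPU_CACHING_WC = 2, CPU_CACHING_UC = 3 };

/*
 * smem placements require both fields; WB is rejected for igpu scanout
 * and for COH_NONE; vram-only objects must leave smem_cpu_caching as zero.
 */
static bool gem_create_args_valid(unsigned int coh_mode,
				  unsigned int cpu_caching,
				  bool has_smem, bool scanout, bool dgfx)
{
	if (coh_mode > COH_AT_LEAST_1WAY || cpu_caching > CPU_CACHING_UC)
		return false;

	if (has_smem) {
		if (!coh_mode || !cpu_caching)
			return false;
		if (!dgfx && scanout && cpu_caching == CPU_CACHING_WB)
			return false;
		if (coh_mode == COH_NONE && cpu_caching == CPU_CACHING_WB)
			return false;
	} else if (cpu_caching) {
		return false;
	}

	return true;
}
```

Note the last branch is exactly the one the dumb-framebuffer question
above is about: a nonzero smem_cpu_caching combined with a vram-only
placement is rejected.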
> + }
> +
> if (args->vm_id) {
> vm = xe_vm_lookup(xef, args->vm_id);
> if (XE_IOCTL_DBG(xe, !vm))
> @@ -1812,8 +1879,10 @@ int xe_gem_create_ioctl(struct drm_device *dev, void *data,
> }
> }
>
> - bo = xe_bo_create(xe, NULL, vm, args->size, ttm_bo_type_device,
> - bo_flags);
> + bo = xe_bo_create_user(xe, NULL, vm, args->size,
> + args->smem_cpu_caching, args->coh_mode,
> + ttm_bo_type_device,
> + bo_flags);
> if (IS_ERR(bo)) {
> err = PTR_ERR(bo);
> goto out_vm;
> @@ -2105,10 +2174,12 @@ int xe_bo_dumb_create(struct drm_file *file_priv,
> args->size = ALIGN(mul_u32_u32(args->pitch, args->height),
> page_size);
>
> - bo = xe_bo_create(xe, NULL, NULL, args->size, ttm_bo_type_device,
> - XE_BO_CREATE_VRAM_IF_DGFX(xe_device_get_root_tile(xe)) |
> - XE_BO_CREATE_USER_BIT | XE_BO_SCANOUT_BIT |
> - XE_BO_NEEDS_CPU_ACCESS);
> + bo = xe_bo_create_user(xe, NULL, NULL, args->size,
> + XE_GEM_CPU_CACHING_WC, XE_GEM_COH_NONE,
> + ttm_bo_type_device,
> + XE_BO_CREATE_VRAM_IF_DGFX(xe_device_get_root_tile(xe)) |
> + XE_BO_CREATE_USER_BIT | XE_BO_SCANOUT_BIT |
> + XE_BO_NEEDS_CPU_ACCESS);
> if (IS_ERR(bo))
> return PTR_ERR(bo);
>
> diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
> index 4a68d869b3b5..4a0ee81fe598 100644
> --- a/drivers/gpu/drm/xe/xe_bo.h
> +++ b/drivers/gpu/drm/xe/xe_bo.h
> @@ -81,9 +81,10 @@ struct sg_table;
> struct xe_bo *xe_bo_alloc(void);
> void xe_bo_free(struct xe_bo *bo);
>
> -struct xe_bo *__xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
> +struct xe_bo *___xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
> struct xe_tile *tile, struct dma_resv *resv,
> struct ttm_lru_bulk_move *bulk, size_t size,
> + u16 smem_cpu_caching, u16 coh_mode,
> enum ttm_bo_type type, u32 flags);
> struct xe_bo *
> xe_bo_create_locked_range(struct xe_device *xe,
> diff --git a/drivers/gpu/drm/xe/xe_bo_types.h b/drivers/gpu/drm/xe/xe_bo_types.h
> index 2ea9ad423170..9bee220a6872 100644
> --- a/drivers/gpu/drm/xe/xe_bo_types.h
> +++ b/drivers/gpu/drm/xe/xe_bo_types.h
> @@ -68,6 +68,16 @@ struct xe_bo {
> struct llist_node freed;
> /** @created: Whether the bo has passed initial creation */
> bool created;
> + /**
> + * @coh_mode: Coherency setting. Currently only used for userspace
> + * objects.
> + */
> + u16 coh_mode;
> + /**
> + * @smem_cpu_caching: Caching mode for smem. Currently only used for
Would it be more accurate to say "CPU caching behavior requested by
userspace" since in the end the caching actually used may be something
different (especially if this value isn't filled by userspace)?
> + * userspace objects.
> + */
> + u16 smem_cpu_caching;
> };
>
> #define intel_bo_to_drm_bo(bo) (&(bo)->ttm.base)
> diff --git a/drivers/gpu/drm/xe/xe_dma_buf.c b/drivers/gpu/drm/xe/xe_dma_buf.c
> index 09343b8b3e96..ac20dbc27a2b 100644
> --- a/drivers/gpu/drm/xe/xe_dma_buf.c
> +++ b/drivers/gpu/drm/xe/xe_dma_buf.c
> @@ -200,8 +200,9 @@ xe_dma_buf_init_obj(struct drm_device *dev, struct xe_bo *storage,
> int ret;
>
> dma_resv_lock(resv, NULL);
> - bo = __xe_bo_create_locked(xe, storage, NULL, resv, NULL, dma_buf->size,
> - ttm_bo_type_sg, XE_BO_CREATE_SYSTEM_BIT);
> + bo = ___xe_bo_create_locked(xe, storage, NULL, resv, NULL, dma_buf->size,
> + 0, 0, /* Will require 1way or 2way for vm_bind */
> + ttm_bo_type_sg, XE_BO_CREATE_SYSTEM_BIT);
> if (IS_ERR(bo)) {
> ret = PTR_ERR(bo);
> goto error;
> diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
> index 00d5cb4ef85e..737bb1d4c6f7 100644
> --- a/include/uapi/drm/xe_drm.h
> +++ b/include/uapi/drm/xe_drm.h
> @@ -456,8 +456,61 @@ struct drm_xe_gem_create {
> */
> __u32 handle;
>
> - /** @pad: MBZ */
> - __u32 pad;
> + /**
> + * @coh_mode: The coherency mode for this object. This will limit the
> + * possible @smem_caching values.
> + *
> + * Supported values:
> + *
> + * XE_GEM_COH_NONE: GPU access is assumed to be not coherent with
> + * CPU. CPU caches are not snooped.
> + *
> + * XE_GEM_COH_AT_LEAST_1WAY:
> + *
> + * CPU-GPU coherency must be at least 1WAY.
> + *
> + * If 1WAY then GPU access is coherent with CPU (CPU caches are snooped)
> + * until GPU acquires. The acquire by the GPU is not tracked by CPU
> + * caches.
> + *
> + * If 2WAY then should be fully coherent between GPU and CPU. Fully
> + * tracked by CPU caches. Both CPU and GPU caches are snooped.
> + *
> + * Note: On dgpu the GPU device never caches system memory (outside of
> + * the special system-memory-read-only cache, which is anyway flushed by
> + * KMD when nuking TLBs for a given object so should be no concern to
> + * userspace). The device should be thought of as always 1WAY coherent,
> + * with the addition that the GPU never caches system memory. At least
> + * on current dgpu HW there is no way to turn off snooping so likely the
> + * different coherency modes of the pat_index make no difference for
> + * system memory.
I don't follow this last part. The distinction between non-coherent vs
1-way coherent means the GPU does or does not snoop the CPU's caches.
Whether the *GPU* caches ever contain smem data should be orthogonal?
I'm also not sure it's a good idea to state that dgpus never cache
system memory in the UAPI documentation. That's true for the platforms
we have today, but I don't think it's something that's guaranteed to be
true forever. I don't think there's a technical reason why a future
dGPU couldn't start doing that?
> + */
> +#define XE_GEM_COH_NONE 1
> +#define XE_GEM_COH_AT_LEAST_1WAY 2
> + __u16 coh_mode;
> +
> + /**
> + * @smem_cpu_caching: The CPU caching mode to select for system memory.
> + *
> + * Supported values:
> + *
> + * XE_GEM_CPU_CACHING_WB: Allocate the pages with write-back caching.
> + * On iGPU this can't be used for scanout surfaces. The @coh_mode must
> + * be XE_GEM_COH_AT_LEAST_1WAY.
As noted above, I'm not sure whether the scanout limitation is igpu-only
or not. Do you have a bspec reference that clarifies the behavior there?
> + *
> + * XE_GEM_CPU_CACHING_WC: Allocate the pages as write-combined. This is
> + * uncached. Any @coh_mode is permitted. Scanout surfaces should likely
> + * use this.
> + *
> + * XE_GEM_CPU_CACHING_UC: Allocate the pages as uncached. Any @coh_mode
> + * is permitted. Scanout surfaces are permitted to use this.
Is there any specific reason why userspace would need to request UC
rather than WC? They both effectively act like uncached access from a
coherency point of view, but WC does some batching up of writes for
efficiency.
If we don't have a solid usecase for CACHING_UC today, I'd suggest
leaving it out for now. We can always add it to the uapi in the future
if there's a need for it, but it's much harder to go the other direction
and remove it.
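As a side note, the vm_bind-time policy the cover letter describes —
gem_create.coh_mode <= coh_mode(pat_index) — comes down to a one-line
check (sketch only, hypothetical helper name):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirrors of the uapi values. */
enum { COH_NONE = 1, COH_AT_LEAST_1WAY = 2 };

/*
 * The coherency mode of the pat_index chosen at vm_bind must be at
 * least as coherent as the mode fixed at object creation.
 */
static bool pat_index_compatible(unsigned int bo_coh_mode,
				 unsigned int pat_coh_mode)
{
	return pat_coh_mode >= bo_coh_mode;
}
```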
Matt
> + *
> + * MUST be left as zero for VRAM-only objects.
> + */
> +#define XE_GEM_CPU_CACHING_WB 1
> +#define XE_GEM_CPU_CACHING_WC 2
> +#define XE_GEM_CPU_CACHING_UC 3
> + __u16 smem_cpu_caching;
>
> /** @reserved: Reserved */
> __u64 reserved[2];
> --
> 2.41.0
>
--
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Intel-xe] [PATCH v2 2/6] drm/xe: move pat_table into device info
2023-09-14 15:31 ` [Intel-xe] [PATCH v2 2/6] drm/xe: move pat_table into device info Matthew Auld
@ 2023-09-14 23:53 ` Matt Roper
0 siblings, 0 replies; 28+ messages in thread
From: Matt Roper @ 2023-09-14 23:53 UTC (permalink / raw)
To: Matthew Auld; +Cc: Lucas De Marchi, intel-xe
On Thu, Sep 14, 2023 at 04:31:15PM +0100, Matthew Auld wrote:
> We need to be able to know the max pat_index range for a given platform, as
> well as being able to look up the pat_index for a given platform in upcoming
> vm_bind uapi, where userspace can directly provide the pat_index. Move
> the platform definition of the pat_table into the device info with the
> idea of encoding more information about each pat_index in a future
> patch.
>
> v2 (Lucas):
> - s/ENODEV/ENOTSUPP/
> - s/xe_pat_fill_info/xe_pat_init_early/
> - Prefer new pat substruct in xe_info.
>
> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> Cc: Pallavi Mishra <pallavi.mishra@intel.com>
> Cc: Lucas De Marchi <lucas.demarchi@intel.com>
> Cc: Matt Roper <matthew.d.roper@intel.com>
> ---
> drivers/gpu/drm/xe/xe_device_types.h | 11 ++++++++
> drivers/gpu/drm/xe/xe_pat.c | 39 ++++++++++++++++++----------
> drivers/gpu/drm/xe/xe_pat.h | 1 +
> drivers/gpu/drm/xe/xe_pci.c | 7 +++++
> 4 files changed, 44 insertions(+), 14 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
> index 5037b8c180b8..6c50d0f03466 100644
> --- a/drivers/gpu/drm/xe/xe_device_types.h
> +++ b/drivers/gpu/drm/xe/xe_device_types.h
> @@ -238,6 +238,17 @@ struct xe_device {
> /** @enable_display: display enabled */
> u8 enable_display:1;
>
> + /** @pat: Platform information related to PAT settings. */
For the top-level comment here we may want to write out "Page Attribute
Table" once.
> + struct {
> + /**
> + * @table: The PAT table encoding for every pat_index
> + * supported by the platform.
> + */
> + const u32 *table;
We should probably have a space between these two fields for
readability.
Aside from those minor tweaks,
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
> + /** @n_entries: The number of entries in the @table */
> + int n_entries;
> + } pat;
> +
> #if IS_ENABLED(CONFIG_DRM_XE_DISPLAY)
> const struct intel_display_device_info *display;
> struct intel_display_runtime_info display_runtime;
> diff --git a/drivers/gpu/drm/xe/xe_pat.c b/drivers/gpu/drm/xe/xe_pat.c
> index 32b0c922e7fa..aa2c5eb88266 100644
> --- a/drivers/gpu/drm/xe/xe_pat.c
> +++ b/drivers/gpu/drm/xe/xe_pat.c
> @@ -106,24 +106,17 @@ static void program_pat_mcr(struct xe_gt *gt, const u32 table[], int n_entries)
> }
> }
>
> -void xe_pat_init(struct xe_gt *gt)
> +int xe_pat_init_early(struct xe_device *xe)
> {
> - struct xe_device *xe = gt_to_xe(gt);
> -
> if (xe->info.platform == XE_METEORLAKE) {
> - /*
> - * SAMedia register offsets are adjusted by the write methods
> - * and they target registers that are not MCR, while for normal
> - * GT they are MCR
> - */
> - if (xe_gt_is_media_type(gt))
> - program_pat(gt, mtl_pat_table, ARRAY_SIZE(mtl_pat_table));
> - else
> - program_pat_mcr(gt, mtl_pat_table, ARRAY_SIZE(mtl_pat_table));
> + xe->info.pat.table = mtl_pat_table;
> + xe->info.pat.n_entries = ARRAY_SIZE(mtl_pat_table);
> } else if (xe->info.platform == XE_PVC || xe->info.platform == XE_DG2) {
> - program_pat_mcr(gt, pvc_pat_table, ARRAY_SIZE(pvc_pat_table));
> + xe->info.pat.table = pvc_pat_table;
> + xe->info.pat.n_entries = ARRAY_SIZE(pvc_pat_table);
> } else if (GRAPHICS_VERx100(xe) <= 1210) {
> - program_pat(gt, tgl_pat_table, ARRAY_SIZE(tgl_pat_table));
> + xe->info.pat.table = tgl_pat_table;
> + xe->info.pat.n_entries = ARRAY_SIZE(tgl_pat_table);
> } else {
> /*
> * Going forward we expect to need new PAT settings for most
> @@ -135,7 +128,25 @@ void xe_pat_init(struct xe_gt *gt)
> */
> drm_err(&xe->drm, "Missing PAT table for platform with graphics version %d.%02d!\n",
> GRAPHICS_VER(xe), GRAPHICS_VERx100(xe) % 100);
> + return -ENOTSUPP;
> }
> +
> + return 0;
> +}
> +
> +void xe_pat_init(struct xe_gt *gt)
> +{
> + struct xe_device *xe = gt_to_xe(gt);
> +
> + /*
> + * SAMedia register offsets are adjusted by the write methods
> + * and they target registers that are not MCR, while for normal
> + * GT they are MCR.
> + */
> + if (xe_gt_is_media_type(gt) || GRAPHICS_VERx100(xe) < 1255)
> + program_pat(gt, xe->info.pat.table, xe->info.pat.n_entries);
> + else
> + program_pat_mcr(gt, xe->info.pat.table, xe->info.pat.n_entries);
> }
>
> void xe_pte_pat_init(struct xe_device *xe)
> diff --git a/drivers/gpu/drm/xe/xe_pat.h b/drivers/gpu/drm/xe/xe_pat.h
> index 5e71bd98d787..2f89503233b9 100644
> --- a/drivers/gpu/drm/xe/xe_pat.h
> +++ b/drivers/gpu/drm/xe/xe_pat.h
> @@ -28,6 +28,7 @@
> struct xe_gt;
> struct xe_device;
>
> +int xe_pat_init_early(struct xe_device *xe);
> void xe_pat_init(struct xe_gt *gt);
> void xe_pte_pat_init(struct xe_device *xe);
> unsigned int xe_pat_get_index(struct xe_device *xe, enum xe_cache_level cache);
> diff --git a/drivers/gpu/drm/xe/xe_pci.c b/drivers/gpu/drm/xe/xe_pci.c
> index dc233a1226bd..2806311bd6a4 100644
> --- a/drivers/gpu/drm/xe/xe_pci.c
> +++ b/drivers/gpu/drm/xe/xe_pci.c
> @@ -22,6 +22,7 @@
> #include "xe_gt.h"
> #include "xe_macros.h"
> #include "xe_module.h"
> +#include "xe_pat.h"
> #include "xe_pci_types.h"
> #include "xe_pm.h"
> #include "xe_step.h"
> @@ -531,6 +532,7 @@ static int xe_info_init(struct xe_device *xe,
> struct xe_tile *tile;
> struct xe_gt *gt;
> u8 id;
> + int err;
>
> xe->info.platform = desc->platform;
> xe->info.subplatform = subplatform_desc ?
> @@ -579,6 +581,11 @@ static int xe_info_init(struct xe_device *xe,
> xe->info.enable_display = IS_ENABLED(CONFIG_DRM_XE_DISPLAY) &&
> enable_display &&
> desc->has_display;
> +
> + err = xe_pat_init_early(xe);
> + if (err)
> + return err;
> +
> /*
> * All platforms have at least one primary GT. Any platform with media
> * version 13 or higher has an additional dedicated media GT. And
> --
> 2.41.0
>
--
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Intel-xe] [PATCH v2 4/6] drm/xe/pat: annotate pat_index with coherency mode
2023-09-14 15:31 ` [Intel-xe] [PATCH v2 4/6] drm/xe/pat: annotate pat_index with coherency mode Matthew Auld
@ 2023-09-15 0:08 ` Matt Roper
0 siblings, 0 replies; 28+ messages in thread
From: Matt Roper @ 2023-09-15 0:08 UTC (permalink / raw)
To: Matthew Auld
Cc: Effie Yu, Filip Hazubski, Lucas De Marchi, intel-xe, Carl Zhang
On Thu, Sep 14, 2023 at 04:31:17PM +0100, Matthew Auld wrote:
> Future uapi needs to give userspace the ability to select the pat_index
> for a given vm_bind. However we need to be able to extract the coherency
> mode from the provided pat_index to ensure it matches the coherency mode
> set at object creation. There are various security reasons for why this
> matters. However the pat_index itself is very platform specific, so it
> seems reasonable to annotate each platform's definition of the pat table.
> On some older platforms there is no explicit coherency mode, so we just
> pick whatever makes sense.
>
> v2:
> - Simplify with COH_AT_LEAST_1_WAY
> - Add some kernel-doc
>
> Bspec: 45101, 44235 #xe
> Bspec: 70552, 71582, 59400 #xe2
> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> Cc: Pallavi Mishra <pallavi.mishra@intel.com>
> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Lucas De Marchi <lucas.demarchi@intel.com>
> Cc: Matt Roper <matthew.d.roper@intel.com>
> Cc: José Roberto de Souza <jose.souza@intel.com>
> Cc: Filip Hazubski <filip.hazubski@intel.com>
> Cc: Carl Zhang <carl.zhang@intel.com>
> Cc: Effie Yu <effie.yu@intel.com>
> ---
> drivers/gpu/drm/xe/xe_device_types.h | 2 +-
> drivers/gpu/drm/xe/xe_pat.c | 59 +++++++++++++++++-----------
> drivers/gpu/drm/xe/xe_pat.h | 18 +++++++++
> 3 files changed, 54 insertions(+), 25 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
> index 6c50d0f03466..959e095eb46c 100644
> --- a/drivers/gpu/drm/xe/xe_device_types.h
> +++ b/drivers/gpu/drm/xe/xe_device_types.h
> @@ -244,7 +244,7 @@ struct xe_device {
> * @table: The PAT table encoding for every pat_index
> * supported by the platform.
> */
> - const u32 *table;
> + const struct xe_pat_table_entry *table;
> /** @n_entries: The number of entries in the @table */
> int n_entries;
> } pat;
> diff --git a/drivers/gpu/drm/xe/xe_pat.c b/drivers/gpu/drm/xe/xe_pat.c
> index fb490982fd99..f4fceb3fa086 100644
> --- a/drivers/gpu/drm/xe/xe_pat.c
> +++ b/drivers/gpu/drm/xe/xe_pat.c
> @@ -4,6 +4,8 @@
> */
>
>
> +#include <drm/xe_drm.h>
> +
> #include "regs/xe_reg_defs.h"
> #include "xe_gt.h"
> #include "xe_gt_mcr.h"
> @@ -33,30 +35,30 @@
> #define TGL_PAT_WC REG_FIELD_PREP(TGL_MEM_TYPE_MASK, 1)
> #define TGL_PAT_UC REG_FIELD_PREP(TGL_MEM_TYPE_MASK, 0)
>
> -static const u32 tgl_pat_table[] = {
> - [0] = TGL_PAT_WB,
> - [1] = TGL_PAT_WC,
> - [2] = TGL_PAT_WT,
> - [3] = TGL_PAT_UC,
> +static const struct xe_pat_table_entry tgl_pat_table[] = {
> + [0] = { TGL_PAT_WB, XE_GEM_COH_AT_LEAST_1WAY },
> + [1] = { TGL_PAT_WC, XE_GEM_COH_NONE },
> + [2] = { TGL_PAT_WT, XE_GEM_COH_NONE },
> + [3] = { TGL_PAT_UC, XE_GEM_COH_NONE },
> };
>
> -static const u32 pvc_pat_table[] = {
> - [0] = TGL_PAT_UC,
> - [1] = TGL_PAT_WC,
> - [2] = TGL_PAT_WT,
> - [3] = TGL_PAT_WB,
> - [4] = PVC_PAT_CLOS(1) | TGL_PAT_WT,
> - [5] = PVC_PAT_CLOS(1) | TGL_PAT_WB,
> - [6] = PVC_PAT_CLOS(2) | TGL_PAT_WT,
> - [7] = PVC_PAT_CLOS(2) | TGL_PAT_WB,
> +static const struct xe_pat_table_entry pvc_pat_table[] = {
> + [0] = { TGL_PAT_UC, XE_GEM_COH_NONE },
> + [1] = { TGL_PAT_WC, XE_GEM_COH_NONE },
> + [2] = { TGL_PAT_WT, XE_GEM_COH_NONE },
> + [3] = { TGL_PAT_WB, XE_GEM_COH_AT_LEAST_1WAY },
> + [4] = { PVC_PAT_CLOS(1) | TGL_PAT_WT, XE_GEM_COH_NONE },
> + [5] = { PVC_PAT_CLOS(1) | TGL_PAT_WB, XE_GEM_COH_AT_LEAST_1WAY },
> + [6] = { PVC_PAT_CLOS(2) | TGL_PAT_WT, XE_GEM_COH_NONE },
> + [7] = { PVC_PAT_CLOS(2) | TGL_PAT_WB, XE_GEM_COH_AT_LEAST_1WAY },
> };
>
> -static const u32 mtl_pat_table[] = {
> - [0] = MTL_PAT_0_WB,
> - [1] = MTL_PAT_1_WT,
> - [2] = MTL_PAT_3_UC,
> - [3] = MTL_PAT_0_WB | MTL_2_COH_1W,
> - [4] = MTL_PAT_0_WB | MTL_3_COH_2W,
> +static const struct xe_pat_table_entry mtl_pat_table[] = {
> + [0] = { MTL_PAT_0_WB, XE_GEM_COH_NONE },
> + [1] = { MTL_PAT_1_WT, XE_GEM_COH_NONE },
> + [2] = { MTL_PAT_3_UC, XE_GEM_COH_NONE },
> + [3] = { MTL_PAT_0_WB | MTL_2_COH_1W, XE_GEM_COH_AT_LEAST_1WAY },
> + [4] = { MTL_PAT_0_WB | MTL_3_COH_2W, XE_GEM_COH_AT_LEAST_1WAY },
> };
>
> static const u32 xelp_pte_pat_table[XE_CACHE_LAST] = {
> @@ -78,27 +80,35 @@ static const u32 xelpg_pte_pat_table[XE_CACHE_LAST] = {
> [XE_CACHE_WB_1_WAY] = XELPG_PAT_WB_CACHE_1_WAY,
> };
>
> +u16 xe_pat_index_get_coh_mode(struct xe_device *xe, u16 pat_index)
> +{
> + WARN_ON(pat_index >= xe->info.pat.n_entries);
> + return xe->info.pat.table[pat_index].coh_mode;
> +}
> +
> unsigned int xe_pat_get_index(struct xe_device *xe, enum xe_cache_level cache)
> {
> WARN_ON(cache >= XE_CACHE_LAST);
> return (xe->pat_table).pte_pat_table[cache];
> }
>
> -static void program_pat(struct xe_gt *gt, const u32 table[], int n_entries)
> +static void program_pat(struct xe_gt *gt, const struct xe_pat_table_entry table[],
> + int n_entries)
> {
> for (int i = 0; i < n_entries; i++) {
> struct xe_reg reg = XE_REG(_PAT_INDEX(i));
>
> - xe_mmio_write32(gt, reg, table[i]);
> + xe_mmio_write32(gt, reg, table[i].value);
> }
> }
>
> -static void program_pat_mcr(struct xe_gt *gt, const u32 table[], int n_entries)
> +static void program_pat_mcr(struct xe_gt *gt, const struct xe_pat_table_entry table[],
> + int n_entries)
> {
> for (int i = 0; i < n_entries; i++) {
> struct xe_reg_mcr reg_mcr = XE_REG_MCR(_PAT_INDEX(i));
>
> - xe_gt_mcr_multicast_write(gt, reg_mcr, table[i]);
> + xe_gt_mcr_multicast_write(gt, reg_mcr, table[i].value);
> }
> }
>
> @@ -111,6 +121,7 @@ int xe_pat_init_early(struct xe_device *xe)
> xe->info.pat.table = pvc_pat_table;
> xe->info.pat.n_entries = ARRAY_SIZE(pvc_pat_table);
> } else if (GRAPHICS_VERx100(xe) <= 1210) {
> + WARN_ON_ONCE(!IS_DGFX(xe) && !xe->info.has_llc);
> xe->info.pat.table = tgl_pat_table;
> xe->info.pat.n_entries = ARRAY_SIZE(tgl_pat_table);
> } else {
> diff --git a/drivers/gpu/drm/xe/xe_pat.h b/drivers/gpu/drm/xe/xe_pat.h
> index 2f89503233b9..809332ff08d5 100644
> --- a/drivers/gpu/drm/xe/xe_pat.h
> +++ b/drivers/gpu/drm/xe/xe_pat.h
> @@ -28,9 +28,27 @@
> struct xe_gt;
> struct xe_device;
>
> +/**
> + * struct xe_pat_table_entry - The pat_index encoding and other meta information.
> + */
> +struct xe_pat_table_entry {
> + /**
> + * @value: The platform specific value encoding the various memory
> + * attributes (this maps to some fixed pat_index). So things like
> + * caching, coherency, compression etc can be encoded here.
> + */
> + u32 value;
We probably want a blank line here to keep things readable. Otherwise,
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
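As an aside, the per-index annotation could be exercised with a toy
standalone lookup (the PAT values below are placeholders, only the
coh_mode column mirrors the mtl table shape above):

```c
#include <assert.h>

/* Hypothetical mirrors of the uapi values. */
enum { COH_NONE = 1, COH_AT_LEAST_1WAY = 2 };

struct pat_entry {
	unsigned int value;      /* platform PAT encoding (placeholder) */
	unsigned short coh_mode; /* coherency mode that value maps to */
};

/* Toy table following the mtl_pat_table annotation in this patch. */
static const struct pat_entry table[] = {
	{ 0x0, COH_NONE },          /* WB */
	{ 0x1, COH_NONE },          /* WT */
	{ 0x2, COH_NONE },          /* UC */
	{ 0x3, COH_AT_LEAST_1WAY }, /* WB | 1-way coherent */
	{ 0x4, COH_AT_LEAST_1WAY }, /* WB | 2-way coherent */
};

static unsigned short coh_mode_of(unsigned int pat_index)
{
	assert(pat_index < sizeof(table) / sizeof(table[0]));
	return table[pat_index].coh_mode;
}
```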
> + /**
> + * @coh_mode: The GPU coherency mode that @value maps to. Either
> + * XE_GEM_COH_NONE or XE_GEM_COH_AT_LEAST_1WAY.
> + */
> + u16 coh_mode;
> +};
> +
> int xe_pat_init_early(struct xe_device *xe);
> void xe_pat_init(struct xe_gt *gt);
> void xe_pte_pat_init(struct xe_device *xe);
> unsigned int xe_pat_get_index(struct xe_device *xe, enum xe_cache_level cache);
> +u16 xe_pat_index_get_coh_mode(struct xe_device *xe, u16 pat_index);
>
> #endif
> --
> 2.41.0
>
--
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Intel-xe] [PATCH v2 1/6] drm/xe/uapi: Add support for cache and coherency mode
2023-09-14 23:47 ` Matt Roper
@ 2023-09-15 7:37 ` Matthew Auld
0 siblings, 0 replies; 28+ messages in thread
From: Matthew Auld @ 2023-09-15 7:37 UTC (permalink / raw)
To: Matt Roper
Cc: Effie Yu, Filip Hazubski, Lucas De Marchi, intel-xe, Carl Zhang
On 15/09/2023 00:47, Matt Roper wrote:
> On Thu, Sep 14, 2023 at 04:31:14PM +0100, Matthew Auld wrote:
>> From: Pallavi Mishra <pallavi.mishra@intel.com>
>>
>> Allow userspace to specify the CPU caching mode to use for system memory
>> in addition to coherency modes during object creation. Modify gem create
>> handler and introduce xe_bo_create_user to replace xe_bo_create. In a
>> later patch we will support setting the pat_index as part of vm_bind,
>> where expectation is that the coherency mode extracted from the
>> pat_index must match the one set at object creation.
>>
>> v2
>> - s/smem_caching/smem_cpu_caching/ and
>> s/XE_GEM_CACHING/XE_GEM_CPU_CACHING/. (Matt Roper)
>> - Drop COH_2WAY and just use COH_NONE + COH_AT_LEAST_1WAY; KMD mostly
>> just cares that zeroing/swap-in can't be bypassed with the given
>> smem_caching mode. (Matt Roper)
>> - Fix broken range check for coh_mode and smem_cpu_caching and also
>> don't use constant value, but the already defined macros. (José)
>> - Prefer switch statement for smem_cpu_caching -> ttm_caching. (José)
>> - Add note in kernel-doc for dgpu and coherency modes for system
>> memory. (José)
>>
>> Signed-off-by: Pallavi Mishra <pallavi.mishra@intel.com>
>> Co-authored-by: Matthew Auld <matthew.auld@intel.com>
>> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
>> Cc: Lucas De Marchi <lucas.demarchi@intel.com>
>> Cc: Matt Roper <matthew.d.roper@intel.com>
>> Cc: José Roberto de Souza <jose.souza@intel.com>
>> Cc: Filip Hazubski <filip.hazubski@intel.com>
>> Cc: Carl Zhang <carl.zhang@intel.com>
>> Cc: Effie Yu <effie.yu@intel.com>
>> ---
>> drivers/gpu/drm/xe/xe_bo.c | 105 ++++++++++++++++++++++++++-----
>> drivers/gpu/drm/xe/xe_bo.h | 3 +-
>> drivers/gpu/drm/xe/xe_bo_types.h | 10 +++
>> drivers/gpu/drm/xe/xe_dma_buf.c | 5 +-
>> include/uapi/drm/xe_drm.h | 57 ++++++++++++++++-
>> 5 files changed, 158 insertions(+), 22 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
>> index 27726d4f3423..f3facd788f15 100644
>> --- a/drivers/gpu/drm/xe/xe_bo.c
>> +++ b/drivers/gpu/drm/xe/xe_bo.c
>> @@ -325,7 +325,7 @@ static struct ttm_tt *xe_ttm_tt_create(struct ttm_buffer_object *ttm_bo,
>> struct xe_device *xe = xe_bo_device(bo);
>> struct xe_ttm_tt *tt;
>> unsigned long extra_pages;
>> - enum ttm_caching caching = ttm_cached;
>> + enum ttm_caching caching;
>> int err;
>>
>> tt = kzalloc(sizeof(*tt), GFP_KERNEL);
>> @@ -339,13 +339,25 @@ static struct ttm_tt *xe_ttm_tt_create(struct ttm_buffer_object *ttm_bo,
>> extra_pages = DIV_ROUND_UP(xe_device_ccs_bytes(xe, bo->size),
>> PAGE_SIZE);
>>
>> + switch (bo->smem_cpu_caching) {
>> + case XE_GEM_CPU_CACHING_WC:
>> + caching = ttm_write_combined;
>> + break;
>> + case XE_GEM_CPU_CACHING_UC:
>> + caching = ttm_uncached;
>> + break;
>> + default:
>> + caching = ttm_cached;
>> + break;
>> + }
>> +
>> /*
>> * Display scanout is always non-coherent with the CPU cache.
>> *
>> * For Xe_LPG and beyond, PPGTT PTE lookups are also non-coherent and
>> * require a CPU:WC mapping.
>> */
>> - if (bo->flags & XE_BO_SCANOUT_BIT ||
>> + if ((!bo->smem_cpu_caching && bo->flags & XE_BO_SCANOUT_BIT) ||
>
> Is this change just so that we'll honor XE_GEM_CPU_CACHING_UC rather
> than promoting to WC? More questions about that farther down...
smem_cpu_caching is sometimes left as zero, like for kernel internal
objects and maybe some other places, so we just pick WB if not set,
unless XE_BO_SCANOUT_BIT is present. For example, the initial-fb is
allocated by KMD with XE_BO_SCANOUT_BIT.
>
> It seems farther down we're allowing CPU:WB for scanout on dgpu. Is
> that safe? I thought display was non-coherent with the CPU cache no
> matter what the GT-side coherency situation was. Assuming it is safe,
> then the first part of the comment above this block is no longer
> accurate.
Userspace sometimes likes to do VRAM + SMEM. On dgpu you can only scan
out from VRAM, such that prior to the scanout the bo will be force
migrated to VRAM, if required. So no actual scanout on dgpu can happen
from smem, and so there shouldn't be any non-coherent display access.
That was at least my thinking.
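That policy could be sketched as (hypothetical helper, not actual driver
code):

```c
#include <assert.h>
#include <stdbool.h>

/*
 * On dgfx the display engine only ever scans out of VRAM, so an smem
 * placement (whatever its CPU caching) never reaches the display; the
 * bo is force-migrated to VRAM before being pinned for scanout. On
 * igpu, scanout from system memory is possible, hence the extra WB
 * restriction at creation time.
 */
static bool scanout_reaches_display(bool dgfx, bool in_vram)
{
	return dgfx ? in_vram : true;
}
```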
>
>> (xe->info.graphics_verx100 >= 1270 && bo->flags & XE_BO_PAGETABLE))
>> caching = ttm_write_combined;
>>
>> @@ -1184,9 +1196,10 @@ void xe_bo_free(struct xe_bo *bo)
>> kfree(bo);
>> }
>>
>> -struct xe_bo *__xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
>> +struct xe_bo *___xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
>> struct xe_tile *tile, struct dma_resv *resv,
>> struct ttm_lru_bulk_move *bulk, size_t size,
>> + u16 smem_cpu_caching, u16 coh_mode,
>> enum ttm_bo_type type, u32 flags)
>> {
>> struct ttm_operation_ctx ctx = {
>> @@ -1224,6 +1237,8 @@ struct xe_bo *__xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
>> bo->tile = tile;
>> bo->size = size;
>> bo->flags = flags;
>> + bo->smem_cpu_caching = smem_cpu_caching;
>> + bo->coh_mode = coh_mode;
>> bo->ttm.base.funcs = &xe_gem_object_funcs;
>> bo->props.preferred_mem_class = XE_BO_PROPS_INVALID;
>> bo->props.preferred_gt = XE_BO_PROPS_INVALID;
>> @@ -1307,10 +1322,11 @@ static int __xe_bo_fixed_placement(struct xe_device *xe,
>> }
>>
>> struct xe_bo *
>> -xe_bo_create_locked_range(struct xe_device *xe,
>> - struct xe_tile *tile, struct xe_vm *vm,
>> - size_t size, u64 start, u64 end,
>> - enum ttm_bo_type type, u32 flags)
>> +__xe_bo_create_locked(struct xe_device *xe,
>> + struct xe_tile *tile, struct xe_vm *vm,
>> + size_t size, u64 start, u64 end,
>> + u16 smem_cpu_caching, u16 coh_mode,
>> + enum ttm_bo_type type, u32 flags)
>> {
>> struct xe_bo *bo = NULL;
>> int err;
>> @@ -1331,10 +1347,11 @@ xe_bo_create_locked_range(struct xe_device *xe,
>> }
>> }
>>
>> - bo = __xe_bo_create_locked(xe, bo, tile, vm ? &vm->resv : NULL,
>> + bo = ___xe_bo_create_locked(xe, bo, tile, vm ? &vm->resv : NULL,
>> vm && !xe_vm_in_fault_mode(vm) &&
>> flags & XE_BO_CREATE_USER_BIT ?
>> &vm->lru_bulk_move : NULL, size,
>> + smem_cpu_caching, coh_mode,
>> type, flags);
>> if (IS_ERR(bo))
>> return bo;
>> @@ -1368,11 +1385,35 @@ xe_bo_create_locked_range(struct xe_device *xe,
>> return ERR_PTR(err);
>> }
>>
>> +struct xe_bo *
>> +xe_bo_create_locked_range(struct xe_device *xe,
>> + struct xe_tile *tile, struct xe_vm *vm,
>> + size_t size, u64 start, u64 end,
>> + enum ttm_bo_type type, u32 flags)
>> +{
>> + return __xe_bo_create_locked(xe, tile, vm, size, 0, ~0ULL, 0, 0, type, flags);
>
> It's a bit hard to keep track of all these wrappers, but I think this
> one gets used via
>
> initial_plane_bo -> xe_bo_create_pin_map_at -> xe_bo_create_locked_range
>
> right? For that path, wouldn't we want XE_GEM_CPU_CACHING_WC for
> smem_cpu_caching?
Right, that's the reason for the smem_cpu_caching == 0 && SCANOUT check
in xe_ttm_tt_create. I figured the default policy for kernel objects is
always WB, unless the object is used for scanout. That at least matches
the current behaviour, I think.
>
>> +}
>> +
>> struct xe_bo *xe_bo_create_locked(struct xe_device *xe, struct xe_tile *tile,
>> struct xe_vm *vm, size_t size,
>> enum ttm_bo_type type, u32 flags)
>> {
>> - return xe_bo_create_locked_range(xe, tile, vm, size, 0, ~0ULL, type, flags);
>> + return __xe_bo_create_locked(xe, tile, vm, size, 0, ~0ULL, 0, 0, type, flags);
>> +}
>> +
>> +static struct xe_bo *xe_bo_create_user(struct xe_device *xe, struct xe_tile *tile,
>> + struct xe_vm *vm, size_t size,
>> + u16 smem_cpu_caching, u16 coh_mode,
>> + enum ttm_bo_type type,
>> + u32 flags)
>> +{
>> + struct xe_bo *bo = __xe_bo_create_locked(xe, tile, vm, size, 0, ~0ULL,
>> + smem_cpu_caching, coh_mode, type,
>> + flags | XE_BO_CREATE_USER_BIT);
>> + if (!IS_ERR(bo))
>> + xe_bo_unlock_vm_held(bo);
>> +
>> + return bo;
>> }
>>
>> struct xe_bo *xe_bo_create(struct xe_device *xe, struct xe_tile *tile,
>> @@ -1755,11 +1796,11 @@ int xe_gem_create_ioctl(struct drm_device *dev, void *data,
>> struct drm_xe_gem_create *args = data;
>> struct xe_vm *vm = NULL;
>> struct xe_bo *bo;
>> - unsigned int bo_flags = XE_BO_CREATE_USER_BIT;
>> + unsigned int bo_flags;
>> u32 handle;
>> int err;
>>
>> - if (XE_IOCTL_DBG(xe, args->extensions) || XE_IOCTL_DBG(xe, args->pad) ||
>> + if (XE_IOCTL_DBG(xe, args->extensions) ||
>> XE_IOCTL_DBG(xe, args->reserved[0] || args->reserved[1]))
>> return -EINVAL;
>>
>> @@ -1801,6 +1842,32 @@ int xe_gem_create_ioctl(struct drm_device *dev, void *data,
>> bo_flags |= XE_BO_NEEDS_CPU_ACCESS;
>> }
>>
>> + if (XE_IOCTL_DBG(xe, args->coh_mode > XE_GEM_COH_AT_LEAST_1WAY))
>> + return -EINVAL;
>> +
>> + if (XE_IOCTL_DBG(xe, args->smem_cpu_caching > XE_GEM_CPU_CACHING_UC))
>> + return -EINVAL;
>> +
>> + if (bo_flags & XE_BO_CREATE_SYSTEM_BIT) {
>> + if (XE_IOCTL_DBG(xe, !args->coh_mode))
>> + return -EINVAL;
>> +
>> + if (XE_IOCTL_DBG(xe, !args->smem_cpu_caching))
>> + return -EINVAL;
>> +
>> + if (XE_IOCTL_DBG(xe, !IS_DGFX(xe) &&
>> + bo_flags & XE_BO_SCANOUT_BIT &&
>> + args->smem_cpu_caching == XE_GEM_CPU_CACHING_WB))
>> + return -EINVAL;
>
> This reminds me...do we have a check anywhere that rejects the
> combination (dgfx && scanout && smem)? on dgpus, the scanout must
> always be in vram.
I think it is rejected in _xe_pin_fb_vma() when trying to migrate to
VRAM0. If you didn't set VRAM as one of the placements it will fail, at
least for userspace. But yeah, I guess it would be good to also add some
checks at gem_create.
>
>> +
>> + if (args->coh_mode == XE_GEM_COH_NONE) {
>> + if (XE_IOCTL_DBG(xe, args->smem_cpu_caching == XE_GEM_CPU_CACHING_WB))
>> + return -EINVAL;
>> + }
>> + } else if (XE_IOCTL_DBG(xe, args->smem_cpu_caching)) {
>> + return -EINVAL;
>
> Isn't this going to fail for dumb framebuffers? In xe_bo_dumb_create()
> you always pass XE_GEM_CPU_CACHING_WC, which will be combined with a
> vram placement.
Ok, I guess it's better to make it conditional on dgpu in
xe_bo_dumb_create().
>
>
> BTW, is there any check for the ioctl being called with no placement
> (neither system nor vram bits set)? What happens in that case?
It looks like it is checked with:
if (XE_IOCTL_DBG(xe, !(args->flags & xe->info.mem_region_mask)))
return -EINVAL;
>
>> + }
>> +
>> if (args->vm_id) {
>> vm = xe_vm_lookup(xef, args->vm_id);
>> if (XE_IOCTL_DBG(xe, !vm))
>> @@ -1812,8 +1879,10 @@ int xe_gem_create_ioctl(struct drm_device *dev, void *data,
>> }
>> }
>>
>> - bo = xe_bo_create(xe, NULL, vm, args->size, ttm_bo_type_device,
>> - bo_flags);
>> + bo = xe_bo_create_user(xe, NULL, vm, args->size,
>> + args->smem_cpu_caching, args->coh_mode,
>> + ttm_bo_type_device,
>> + bo_flags);
>> if (IS_ERR(bo)) {
>> err = PTR_ERR(bo);
>> goto out_vm;
>> @@ -2105,10 +2174,12 @@ int xe_bo_dumb_create(struct drm_file *file_priv,
>> args->size = ALIGN(mul_u32_u32(args->pitch, args->height),
>> page_size);
>>
>> - bo = xe_bo_create(xe, NULL, NULL, args->size, ttm_bo_type_device,
>> - XE_BO_CREATE_VRAM_IF_DGFX(xe_device_get_root_tile(xe)) |
>> - XE_BO_CREATE_USER_BIT | XE_BO_SCANOUT_BIT |
>> - XE_BO_NEEDS_CPU_ACCESS);
>> + bo = xe_bo_create_user(xe, NULL, NULL, args->size,
>> + XE_GEM_CPU_CACHING_WC, XE_GEM_COH_NONE,
>> + ttm_bo_type_device,
>> + XE_BO_CREATE_VRAM_IF_DGFX(xe_device_get_root_tile(xe)) |
>> + XE_BO_CREATE_USER_BIT | XE_BO_SCANOUT_BIT |
>> + XE_BO_NEEDS_CPU_ACCESS);
>> if (IS_ERR(bo))
>> return PTR_ERR(bo);
>>
>> diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
>> index 4a68d869b3b5..4a0ee81fe598 100644
>> --- a/drivers/gpu/drm/xe/xe_bo.h
>> +++ b/drivers/gpu/drm/xe/xe_bo.h
>> @@ -81,9 +81,10 @@ struct sg_table;
>> struct xe_bo *xe_bo_alloc(void);
>> void xe_bo_free(struct xe_bo *bo);
>>
>> -struct xe_bo *__xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
>> +struct xe_bo *___xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
>> struct xe_tile *tile, struct dma_resv *resv,
>> struct ttm_lru_bulk_move *bulk, size_t size,
>> + u16 smem_cpu_caching, u16 coh_mode,
>> enum ttm_bo_type type, u32 flags);
>> struct xe_bo *
>> xe_bo_create_locked_range(struct xe_device *xe,
>> diff --git a/drivers/gpu/drm/xe/xe_bo_types.h b/drivers/gpu/drm/xe/xe_bo_types.h
>> index 2ea9ad423170..9bee220a6872 100644
>> --- a/drivers/gpu/drm/xe/xe_bo_types.h
>> +++ b/drivers/gpu/drm/xe/xe_bo_types.h
>> @@ -68,6 +68,16 @@ struct xe_bo {
>> struct llist_node freed;
>> /** @created: Whether the bo has passed initial creation */
>> bool created;
>> + /**
>> + * @coh_mode: Coherency setting. Currently only used for userspace
>> + * objects.
>> + */
>> + u16 coh_mode;
>> + /**
>> + * @smem_cpu_caching: Caching mode for smem. Currently only used for
>
> Would it be more accurate to say "CPU caching behavior requested by
> userspace" since in the end the caching actually used may be something
> different (especially if this value isn't filled by userspace).
Ok, will fix.
>
>> + * userspace objects.
>> + */
>> + u16 smem_cpu_caching;
>> };
>>
>> #define intel_bo_to_drm_bo(bo) (&(bo)->ttm.base)
>> diff --git a/drivers/gpu/drm/xe/xe_dma_buf.c b/drivers/gpu/drm/xe/xe_dma_buf.c
>> index 09343b8b3e96..ac20dbc27a2b 100644
>> --- a/drivers/gpu/drm/xe/xe_dma_buf.c
>> +++ b/drivers/gpu/drm/xe/xe_dma_buf.c
>> @@ -200,8 +200,9 @@ xe_dma_buf_init_obj(struct drm_device *dev, struct xe_bo *storage,
>> int ret;
>>
>> dma_resv_lock(resv, NULL);
>> - bo = __xe_bo_create_locked(xe, storage, NULL, resv, NULL, dma_buf->size,
>> - ttm_bo_type_sg, XE_BO_CREATE_SYSTEM_BIT);
>> + bo = ___xe_bo_create_locked(xe, storage, NULL, resv, NULL, dma_buf->size,
>> + 0, 0, /* Will require 1way or 2way for vm_bind */
>> + ttm_bo_type_sg, XE_BO_CREATE_SYSTEM_BIT);
>> if (IS_ERR(bo)) {
>> ret = PTR_ERR(bo);
>> goto error;
>> diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
>> index 00d5cb4ef85e..737bb1d4c6f7 100644
>> --- a/include/uapi/drm/xe_drm.h
>> +++ b/include/uapi/drm/xe_drm.h
>> @@ -456,8 +456,61 @@ struct drm_xe_gem_create {
>> */
>> __u32 handle;
>>
>> - /** @pad: MBZ */
>> - __u32 pad;
>> + /**
>> + * @coh_mode: The coherency mode for this object. This will limit the
>> + * possible @smem_caching values.
>> + *
>> + * Supported values:
>> + *
>> + * XE_GEM_COH_NONE: GPU access is assumed to be not coherent with
>> + * CPU. CPU caches are not snooped.
>> + *
>> + * XE_GEM_COH_AT_LEAST_1WAY:
>> + *
>> + * CPU-GPU coherency must be at least 1WAY.
>> + *
>> + * If 1WAY then GPU access is coherent with CPU (CPU caches are snooped)
>> + * until GPU acquires. The acquire by the GPU is not tracked by CPU
>> + * caches.
>> + *
>> + * If 2WAY then should be fully coherent between GPU and CPU. Fully
>> + * tracked by CPU caches. Both CPU and GPU caches are snooped.
>> + *
>> + * Note: On dgpu the GPU device never caches system memory (outside of
>> + * the special system-memory-read-only cache, which is anyway flushed by
>> + * KMD when nuking TLBs for a given object so should be no concern to
>> + * userspace). The device should be thought of as always 1WAY coherent,
>> + * with the addition that the GPU never caches system memory. At least
>> + * on current dgpu HW there is no way to turn off snooping so likely the
>> + * different coherency modes of the pat_index make no difference for
>> + * system memory.
>
> I don't follow this last part. The distinction between non-coherent vs
> 1-way coherent means the GPU does or does not snoop the CPU's caches.
> Whether the *GPU* caches ever contain smem data should be orthogonal?
The spec seems to make that distinction: "Discrete GPUs do not
support caching of system memory in the device. The coherency mode that
is supported in discrete GPUs are the one-way coherency mode with no
caching of system memory inside the discrete GPU". So whether you select
none, 1way or 2way there should be no difference for system memory on
dgpu; that's at least what I was trying to get across.
>
> I'm also not sure it's a good idea to state that dgpu's never cache
> system memory in the UAPI documentation. That's true for the platforms
> we have today, but I don't think it's something that's guaranteed to be
> true forever. I don't think there's a technical reason why a future
> dGPU couldn't start doing that?
Yes, it could in theory start doing that, or even caching VRAM on the
CPU side. Should I qualify it with "current platforms, including xe2"? Or
should I just remove it? The motivation here is just to make it clearer:
"what do I pick for current dgpu, and does it matter?".
>
>> + */
>> +#define XE_GEM_COH_NONE 1
>> +#define XE_GEM_COH_AT_LEAST_1WAY 2
>> + __u16 coh_mode;
>> +
>> + /**
>> + * @smem_cpu_caching: The CPU caching mode to select for system memory.
>> + *
>> + * Supported values:
>> + *
>> + * XE_GEM_CPU_CACHING_WB: Allocate the pages with write-back caching.
>> + * On iGPU this can't be used for scanout surfaces. The @coh_mode must
>> + * be XE_GEM_COH_AT_LEAST_1WAY.
>
> As noted above, I'm not sure whether the scanout limitation is igpu-only
> or not. Do you have a bspec reference that clarifies the behavior there?
I guess answered above. We only scanout from VRAM on dgpu, but you are
free to select VRAM + SMEM.
>
>> + *
>> + * XE_GEM_CPU_CACHING_WC: Allocate the pages as write-combined. This is
>> + * uncached. Any @coh_mode is permitted. Scanout surfaces should likely
>> + * use this.
>> + *
>> + * XE_GEM_CPU_CACHING_UC: Allocate the pages as uncached. Any @coh_mode
>> + * is permitted. Scanout surfaces are permitted to use this.
>
> Is there any specific reason why userspace would need to request UC
> rather than WC? They both effectively act like uncached access from a
> coherency point of view, but WC does some batching up of writes for
> efficiency.
>
> If we don't have a solid usecase for CACHING_UC today, I'd suggest
> leaving it out for now. We can always add it to the uapi in the future
> if there's a need for it, but it's much harder to go the other direction
> and remove it.
Yup agree, will drop.
>
>
> Matt
>
>> + *
>> + * MUST be left as zero for VRAM-only objects.
>> + */
>> +#define XE_GEM_CPU_CACHING_WB 1
>> +#define XE_GEM_CPU_CACHING_WC 2
>> +#define XE_GEM_CPU_CACHING_UC 3
>> + __u16 smem_cpu_caching;
>>
>> /** @reserved: Reserved */
>> __u64 reserved[2];
>> --
>> 2.41.0
>>
>
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Intel-xe] [PATCH v2 5/6] drm/xe/migrate: rather use pte_encode helpers
2023-09-14 15:31 ` [Intel-xe] [PATCH v2 5/6] drm/xe/migrate: rather use pte_encode helpers Matthew Auld
@ 2023-09-15 22:19 ` Matt Roper
0 siblings, 0 replies; 28+ messages in thread
From: Matt Roper @ 2023-09-15 22:19 UTC (permalink / raw)
To: Matthew Auld; +Cc: Lucas De Marchi, intel-xe
On Thu, Sep 14, 2023 at 04:31:18PM +0100, Matthew Auld wrote:
> We need to avoid using stuff like PPAT_CACHED directly, which is no
> longer going to work on newer platforms. At some point we can just
> directly use the pat_index, but for now just use XE_CACHE_WB.
>
> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> Cc: Pallavi Mishra <pallavi.mishra@intel.com>
> Cc: Lucas De Marchi <lucas.demarchi@intel.com>
> Cc: Matt Roper <matthew.d.roper@intel.com>
> ---
> drivers/gpu/drm/xe/xe_migrate.c | 7 ++++---
> drivers/gpu/drm/xe/xe_pt.c | 12 ++++++------
> drivers/gpu/drm/xe/xe_pt.h | 2 ++
> 3 files changed, 12 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
> index 46f88f3a8c58..26cbc9107501 100644
> --- a/drivers/gpu/drm/xe/xe_migrate.c
> +++ b/drivers/gpu/drm/xe/xe_migrate.c
> @@ -257,8 +257,9 @@ static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m,
>
> level = 2;
> ofs = map_ofs + XE_PAGE_SIZE * level + 256 * 8;
> - flags = XE_PAGE_RW | XE_PAGE_PRESENT | PPAT_CACHED |
> - XE_PPGTT_PTE_DM | XE_PDPE_PS_1G;
> +
> + flags = XE_PPGTT_PTE_DM;
> + flags = __xe_pte_encode(flags, XE_CACHE_WB, vm, NULL, 2);
Might be best to pass 'level' as the final parameter since we already
have it sitting around as a local variable?
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
>
> /*
> * Use 1GB pages, it shouldn't matter the physical amount of
> @@ -493,7 +494,7 @@ static void emit_pte(struct xe_migrate *m,
> addr += vram_region_gpu_offset(bo->ttm.resource);
> addr |= XE_PPGTT_PTE_DM;
> }
> - addr |= PPAT_CACHED | XE_PAGE_PRESENT | XE_PAGE_RW;
> + addr = __xe_pte_encode(addr, XE_CACHE_WB, m->q->vm, NULL, 0);
> bb->cs[bb->len++] = lower_32_bits(addr);
> bb->cs[bb->len++] = upper_32_bits(addr);
>
> diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
> index b0874052f5ce..a1b164cf8bce 100644
> --- a/drivers/gpu/drm/xe/xe_pt.c
> +++ b/drivers/gpu/drm/xe/xe_pt.c
> @@ -67,8 +67,8 @@ u64 xe_pde_encode(struct xe_bo *bo, u64 bo_offset)
> return pde;
> }
>
> -static u64 __pte_encode(u64 pte, enum xe_cache_level cache,
> - struct xe_vm *vm, struct xe_vma *vma, u32 pt_level)
> +u64 __xe_pte_encode(u64 pte, enum xe_cache_level cache,
> + struct xe_vm *vm, struct xe_vma *vma, u32 pt_level)
> {
> struct xe_device *xe = vm->xe;
>
> @@ -112,7 +112,7 @@ u64 xe_pte_encode(struct xe_vm *vm, struct xe_bo *bo, u64 offset, enum xe_cache_
> if (xe_bo_is_vram(bo) || xe_bo_is_stolen_devmem(bo))
> pte |= XE_PPGTT_PTE_DM;
>
> - return __pte_encode(pte, cache, vm, NULL, pt_level);
> + return __xe_pte_encode(pte, cache, vm, NULL, pt_level);
> }
>
> static u64 __xe_pt_empty_pte(struct xe_tile *tile, struct xe_vm *vm,
> @@ -592,9 +592,9 @@ xe_pt_stage_bind_entry(struct xe_ptw *parent, pgoff_t offset,
>
> XE_WARN_ON(xe_walk->va_curs_start != addr);
>
> - pte = __pte_encode(is_null ? 0 :
> - xe_res_dma(curs) + xe_walk->dma_offset,
> - xe_walk->cache, xe_walk->vm, xe_walk->vma, level);
> + pte = __xe_pte_encode(is_null ? 0 :
> + xe_res_dma(curs) + xe_walk->dma_offset,
> + xe_walk->cache, xe_walk->vm, xe_walk->vma, level);
> pte |= xe_walk->default_pte;
>
> /*
> diff --git a/drivers/gpu/drm/xe/xe_pt.h b/drivers/gpu/drm/xe/xe_pt.h
> index 4a9143bc6628..0e66436d707d 100644
> --- a/drivers/gpu/drm/xe/xe_pt.h
> +++ b/drivers/gpu/drm/xe/xe_pt.h
> @@ -49,5 +49,7 @@ u64 xe_pde_encode(struct xe_bo *bo, u64 bo_offset);
>
> u64 xe_pte_encode(struct xe_vm *vm, struct xe_bo *bo, u64 offset, enum xe_cache_level cache,
> u32 pt_level);
> +u64 __xe_pte_encode(u64 pte, enum xe_cache_level cache,
> + struct xe_vm *vm, struct xe_vma *vma, u32 pt_level);
>
> #endif
> --
> 2.41.0
>
--
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Intel-xe] [PATCH v2 6/6] drm/xe/uapi: support pat_index selection with vm_bind
2023-09-14 15:31 ` [Intel-xe] [PATCH v2 6/6] drm/xe/uapi: support pat_index selection with vm_bind Matthew Auld
@ 2023-09-15 22:24 ` Matt Roper
2023-09-25 8:07 ` Matthew Auld
2023-09-25 21:56 ` Rodrigo Vivi
1 sibling, 1 reply; 28+ messages in thread
From: Matt Roper @ 2023-09-15 22:24 UTC (permalink / raw)
To: Matthew Auld
Cc: Effie Yu, Filip Hazubski, Lucas De Marchi, intel-xe, Carl Zhang
On Thu, Sep 14, 2023 at 04:31:19PM +0100, Matthew Auld wrote:
> Allow userspace to directly control the pat_index for a given vm
> binding. This should allow directly controlling the coherency, caching
> and potentially other stuff in the future for the ppGTT binding.
>
> The exact meaning behind the pat_index is very platform specific (see
> BSpec or PRMs) but effectively maps to some predefined memory
> attributes. From the KMD pov we only care about the coherency that is
> provided by the pat_index, which falls into either NONE, 1WAY or 2WAY.
> The vm_bind coherency mode for the given pat_index needs to be at least
> as coherent as the coh_mode that was set at object creation. For
> platforms that lack the explicit coherency mode, we treat UC/WT/WC as
> NONE and WB as AT_LEAST_1WAY.
>
> For userptr mappings we lack a corresponding gem object, so the expected
> coherency mode is instead implicit and must fall into either 1WAY or
> 2WAY. Trying to use NONE will be rejected by the kernel. For imported
> dma-buf (from a different device) the coherency mode is also implicit
> and must also be either 1WAY or 2WAY, i.e. AT_LEAST_1WAY.
>
> As part of adding pat_index support with vm_bind we also need to stop using
> xe_cache_level and instead use the pat_index in various places. We still
> make use of xe_cache_level, but only as a convenience for kernel
> internal objects (internally it maps to some reasonable pat_index). For
It feels like the internal refactoring to use pat index directly in PTE
encoding should probably be in a separate patch from the vm_bind uapi
changes that allow userspace to specify a PAT index.
Matt
> now this is just a 1:1 conversion of the existing code, however for
> platforms like MTL+ we might need to give more control through bo_create
> or stop using WB on the CPU side if we need CPU access.
>
> v2:
> - Undefined coh_mode(pat_index) can now be treated as programmer error. (Matt Roper)
> - We now allow gem_create.coh_mode <= coh_mode(pat_index), rather than
> having to match exactly. This ensures imported dma-buf can always
> just use 1way (or even 2way), now that we also bundle 1way/2way into
> at_least_1way. We still require 1way/2way for external dma-buf, but
> the policy can now be the same for self-import, if desired.
> - Use u16 for pat_index in uapi. u32 is massive overkill. (José)
> - Move as much of the pat_index validation as we can into
> vm_bind_ioctl_check_args. (José)
>
> Bspec: 45101, 44235 #xe
> Bspec: 70552, 71582, 59400 #xe2
> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> Cc: Pallavi Mishra <pallavi.mishra@intel.com>
> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Lucas De Marchi <lucas.demarchi@intel.com>
> Cc: Matt Roper <matthew.d.roper@intel.com>
> Cc: José Roberto de Souza <jose.souza@intel.com>
> Cc: Filip Hazubski <filip.hazubski@intel.com>
> Cc: Carl Zhang <carl.zhang@intel.com>
> Cc: Effie Yu <effie.yu@intel.com>
> ---
> drivers/gpu/drm/xe/tests/xe_migrate.c | 2 +-
> drivers/gpu/drm/xe/xe_ggtt.c | 7 ++-
> drivers/gpu/drm/xe/xe_ggtt_types.h | 2 +-
> drivers/gpu/drm/xe/xe_migrate.c | 13 +++--
> drivers/gpu/drm/xe/xe_pt.c | 22 ++++-----
> drivers/gpu/drm/xe/xe_pt.h | 4 +-
> drivers/gpu/drm/xe/xe_vm.c | 69 +++++++++++++++++++++------
> drivers/gpu/drm/xe/xe_vm_types.h | 10 +++-
> include/uapi/drm/xe_drm.h | 43 ++++++++++++++++-
> 9 files changed, 128 insertions(+), 44 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/tests/xe_migrate.c b/drivers/gpu/drm/xe/tests/xe_migrate.c
> index 6b4388bfbb31..d3bf4751a2d7 100644
> --- a/drivers/gpu/drm/xe/tests/xe_migrate.c
> +++ b/drivers/gpu/drm/xe/tests/xe_migrate.c
> @@ -301,7 +301,7 @@ static void xe_migrate_sanity_test(struct xe_migrate *m, struct kunit *test)
> /* First part of the test, are we updating our pagetable bo with a new entry? */
> xe_map_wr(xe, &bo->vmap, XE_PAGE_SIZE * (NUM_KERNEL_PDE - 1), u64,
> 0xdeaddeadbeefbeef);
> - expected = xe_pte_encode(m->q->vm, pt, 0, XE_CACHE_WB, 0);
> + expected = xe_pte_encode(m->q->vm, pt, 0, xe_pat_get_index(xe, XE_CACHE_WB), 0);
> if (m->q->vm->flags & XE_VM_FLAG_64K)
> expected |= XE_PTE_PS64;
> if (xe_bo_is_vram(pt))
> diff --git a/drivers/gpu/drm/xe/xe_ggtt.c b/drivers/gpu/drm/xe/xe_ggtt.c
> index aea26afd4668..7e4da16389af 100644
> --- a/drivers/gpu/drm/xe/xe_ggtt.c
> +++ b/drivers/gpu/drm/xe/xe_ggtt.c
> @@ -41,7 +41,8 @@ u64 xe_ggtt_pte_encode(struct xe_bo *bo, u64 bo_offset)
> pte |= XE_GGTT_PTE_DM;
>
> if ((ggtt->pat_encode).pte_encode)
> - pte = (ggtt->pat_encode).pte_encode(xe, pte, XE_CACHE_WB_1_WAY);
> + pte = (ggtt->pat_encode).pte_encode(xe, pte,
> + xe_pat_get_index(xe, XE_CACHE_WB_1_WAY));
>
> return pte;
> }
> @@ -102,10 +103,8 @@ static void primelockdep(struct xe_ggtt *ggtt)
> }
>
> static u64 xelpg_ggtt_pte_encode_pat(struct xe_device *xe, u64 pte_pat,
> - enum xe_cache_level cache)
> + u16 pat_index)
> {
> - u32 pat_index = xe_pat_get_index(xe, cache);
> -
> pte_pat &= ~(XELPG_GGTT_PTE_PAT_MASK);
>
> if (pat_index & BIT(0))
> diff --git a/drivers/gpu/drm/xe/xe_ggtt_types.h b/drivers/gpu/drm/xe/xe_ggtt_types.h
> index 7e55fac1a8a9..7981075bb228 100644
> --- a/drivers/gpu/drm/xe/xe_ggtt_types.h
> +++ b/drivers/gpu/drm/xe/xe_ggtt_types.h
> @@ -31,7 +31,7 @@ struct xe_ggtt {
>
> struct {
> u64 (*pte_encode)(struct xe_device *xe, u64 pte_pat,
> - enum xe_cache_level cache);
> + u16 pat_index);
> } pat_encode;
> };
>
> diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
> index 26cbc9107501..89d9e33a07e7 100644
> --- a/drivers/gpu/drm/xe/xe_migrate.c
> +++ b/drivers/gpu/drm/xe/xe_migrate.c
> @@ -25,6 +25,7 @@
> #include "xe_lrc.h"
> #include "xe_map.h"
> #include "xe_mocs.h"
> +#include "xe_pat.h"
> #include "xe_pt.h"
> #include "xe_res_cursor.h"
> #include "xe_sched_job.h"
> @@ -162,6 +163,7 @@ static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m,
> u32 num_entries = NUM_PT_SLOTS, num_level = vm->pt_root[id]->level;
> u32 map_ofs, level, i;
> struct xe_bo *bo, *batch = tile->mem.kernel_bb_pool->bo;
> + u16 pat_index = xe_pat_get_index(xe, XE_CACHE_WB);
> u64 entry;
> int ret;
>
> @@ -196,7 +198,7 @@ static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m,
>
> /* Map the entire BO in our level 0 pt */
> for (i = 0, level = 0; i < num_entries; level++) {
> - entry = xe_pte_encode(vm, bo, i * XE_PAGE_SIZE, XE_CACHE_WB, 0);
> + entry = xe_pte_encode(vm, bo, i * XE_PAGE_SIZE, pat_index, 0);
>
> xe_map_wr(xe, &bo->vmap, map_ofs + level * 8, u64, entry);
>
> @@ -214,7 +216,7 @@ static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m,
> for (i = 0; i < batch->size;
> i += vm->flags & XE_VM_FLAG_64K ? XE_64K_PAGE_SIZE :
> XE_PAGE_SIZE) {
> - entry = xe_pte_encode(vm, batch, i, XE_CACHE_WB, 0);
> + entry = xe_pte_encode(vm, batch, i, pat_index, 0);
>
> xe_map_wr(xe, &bo->vmap, map_ofs + level * 8, u64,
> entry);
> @@ -259,7 +261,7 @@ static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m,
> ofs = map_ofs + XE_PAGE_SIZE * level + 256 * 8;
>
> flags = XE_PPGTT_PTE_DM;
> - flags = __xe_pte_encode(flags, XE_CACHE_WB, vm, NULL, 2);
> + flags = __xe_pte_encode(flags, pat_index, vm, NULL, 2);
>
> /*
> * Use 1GB pages, it shouldn't matter the physical amount of
> @@ -454,6 +456,7 @@ static void emit_pte(struct xe_migrate *m,
> struct xe_res_cursor *cur,
> u32 size, struct xe_bo *bo)
> {
> + u16 pat_index = xe_pat_get_index(m->tile->xe, XE_CACHE_WB);
> u32 ptes;
> u64 ofs = at_pt * XE_PAGE_SIZE;
> u64 cur_ofs;
> @@ -494,7 +497,7 @@ static void emit_pte(struct xe_migrate *m,
> addr += vram_region_gpu_offset(bo->ttm.resource);
> addr |= XE_PPGTT_PTE_DM;
> }
> - addr = __xe_pte_encode(addr, XE_CACHE_WB, m->q->vm, NULL, 0);
> + addr = __xe_pte_encode(addr, pat_index, m->q->vm, NULL, 0);
> bb->cs[bb->len++] = lower_32_bits(addr);
> bb->cs[bb->len++] = upper_32_bits(addr);
>
> @@ -1254,7 +1257,7 @@ xe_migrate_update_pgtables(struct xe_migrate *m,
>
> xe_tile_assert(tile, pt_bo->size == SZ_4K);
>
> - addr = xe_pte_encode(vm, pt_bo, 0, XE_CACHE_WB, 0);
> + addr = xe_pte_encode(vm, pt_bo, 0, xe_pat_get_index(xe, XE_CACHE_WB), 0);
> bb->cs[bb->len++] = lower_32_bits(addr);
> bb->cs[bb->len++] = upper_32_bits(addr);
> }
> diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
> index a1b164cf8bce..7dd93cbff704 100644
> --- a/drivers/gpu/drm/xe/xe_pt.c
> +++ b/drivers/gpu/drm/xe/xe_pt.c
> @@ -10,6 +10,7 @@
> #include "xe_gt.h"
> #include "xe_gt_tlb_invalidation.h"
> #include "xe_migrate.h"
> +#include "xe_pat.h"
> #include "xe_pt_types.h"
> #include "xe_pt_walk.h"
> #include "xe_res_cursor.h"
> @@ -67,7 +68,7 @@ u64 xe_pde_encode(struct xe_bo *bo, u64 bo_offset)
> return pde;
> }
>
> -u64 __xe_pte_encode(u64 pte, enum xe_cache_level cache,
> +u64 __xe_pte_encode(u64 pte, u16 pat_index,
> struct xe_vm *vm, struct xe_vma *vma, u32 pt_level)
> {
> struct xe_device *xe = vm->xe;
> @@ -85,7 +86,7 @@ u64 __xe_pte_encode(u64 pte, enum xe_cache_level cache,
> else if (pt_level == 2)
> pte |= XE_PDPE_PS_1G;
>
> - pte = vm->pat_encode.pte_encode(xe, pte, cache);
> + pte = vm->pat_encode.pte_encode(xe, pte, pat_index);
>
> /* XXX: Does hw support 1 GiB pages? */
> XE_WARN_ON(pt_level > 2);
> @@ -103,7 +104,7 @@ u64 __xe_pte_encode(u64 pte, enum xe_cache_level cache,
> *
> * Return: An encoded page-table entry. No errors.
> */
> -u64 xe_pte_encode(struct xe_vm *vm, struct xe_bo *bo, u64 offset, enum xe_cache_level cache,
> +u64 xe_pte_encode(struct xe_vm *vm, struct xe_bo *bo, u64 offset, u16 pat_index,
> u32 pt_level)
> {
> u64 pte;
> @@ -112,7 +113,7 @@ u64 xe_pte_encode(struct xe_vm *vm, struct xe_bo *bo, u64 offset, enum xe_cache_
> if (xe_bo_is_vram(bo) || xe_bo_is_stolen_devmem(bo))
> pte |= XE_PPGTT_PTE_DM;
>
> - return __xe_pte_encode(pte, cache, vm, NULL, pt_level);
> + return __xe_pte_encode(pte, pat_index, vm, NULL, pt_level);
> }
>
> static u64 __xe_pt_empty_pte(struct xe_tile *tile, struct xe_vm *vm,
> @@ -125,7 +126,7 @@ static u64 __xe_pt_empty_pte(struct xe_tile *tile, struct xe_vm *vm,
>
> if (level == 0) {
> u64 empty = xe_pte_encode(vm, vm->scratch_bo[id], 0,
> - XE_CACHE_WB, 0);
> + xe_pat_get_index(vm->xe, XE_CACHE_WB), 0);
>
> return empty;
> } else {
> @@ -358,8 +359,6 @@ struct xe_pt_stage_bind_walk {
> struct xe_vm *vm;
> /** @tile: The tile we're building for. */
> struct xe_tile *tile;
> - /** @cache: Desired cache level for the ptes */
> - enum xe_cache_level cache;
> /** @default_pte: PTE flag only template. No address is associated */
> u64 default_pte;
> /** @dma_offset: DMA offset to add to the PTE. */
> @@ -594,7 +593,7 @@ xe_pt_stage_bind_entry(struct xe_ptw *parent, pgoff_t offset,
>
> pte = __xe_pte_encode(is_null ? 0 :
> xe_res_dma(curs) + xe_walk->dma_offset,
> - xe_walk->cache, xe_walk->vm, xe_walk->vma, level);
> + xe_walk->vma->pat_index, xe_walk->vm, xe_walk->vma, level);
> pte |= xe_walk->default_pte;
>
> /*
> @@ -720,13 +719,8 @@ xe_pt_stage_bind(struct xe_tile *tile, struct xe_vma *vma,
> if (vma && vma->gpuva.flags & XE_VMA_ATOMIC_PTE_BIT)
> xe_walk.default_pte |= XE_USM_PPGTT_PTE_AE;
> xe_walk.dma_offset = vram_region_gpu_offset(bo->ttm.resource);
> - xe_walk.cache = XE_CACHE_WB;
> - } else {
> - if (!xe_vma_has_no_bo(vma) && bo->flags & XE_BO_SCANOUT_BIT)
> - xe_walk.cache = XE_CACHE_WT;
> - else
> - xe_walk.cache = XE_CACHE_WB;
> }
> +
> if (!xe_vma_has_no_bo(vma) && xe_bo_is_stolen(bo))
> xe_walk.dma_offset = xe_ttm_stolen_gpu_offset(xe_bo_device(bo));
>
> diff --git a/drivers/gpu/drm/xe/xe_pt.h b/drivers/gpu/drm/xe/xe_pt.h
> index 0e66436d707d..6d10823fca9b 100644
> --- a/drivers/gpu/drm/xe/xe_pt.h
> +++ b/drivers/gpu/drm/xe/xe_pt.h
> @@ -47,9 +47,9 @@ bool xe_pt_zap_ptes(struct xe_tile *tile, struct xe_vma *vma);
>
> u64 xe_pde_encode(struct xe_bo *bo, u64 bo_offset);
>
> -u64 xe_pte_encode(struct xe_vm *vm, struct xe_bo *bo, u64 offset, enum xe_cache_level cache,
> +u64 xe_pte_encode(struct xe_vm *vm, struct xe_bo *bo, u64 offset, u16 pat_index,
> u32 pt_level);
> -u64 __xe_pte_encode(u64 pte, enum xe_cache_level cache,
> +u64 __xe_pte_encode(u64 pte, u16 pat_index,
> struct xe_vm *vm, struct xe_vma *vma, u32 pt_level);
>
> #endif
> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> index ba612a5ee2d8..98db7a298139 100644
> --- a/drivers/gpu/drm/xe/xe_vm.c
> +++ b/drivers/gpu/drm/xe/xe_vm.c
> @@ -6,6 +6,7 @@
> #include "xe_vm.h"
>
> #include <linux/dma-fence-array.h>
> +#include <linux/nospec.h>
>
> #include <drm/drm_exec.h>
> #include <drm/drm_print.h>
> @@ -858,7 +859,8 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
> u64 start, u64 end,
> bool read_only,
> bool is_null,
> - u8 tile_mask)
> + u8 tile_mask,
> + u16 pat_index)
> {
> struct xe_vma *vma;
> struct xe_tile *tile;
> @@ -897,6 +899,8 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
> vma->tile_mask |= 0x1 << id;
> }
>
> + vma->pat_index = pat_index;
> +
> if (vm->xe->info.platform == XE_PVC)
> vma->gpuva.flags |= XE_VMA_ATOMIC_PTE_BIT;
>
> @@ -1195,10 +1199,8 @@ static void xe_vma_op_work_func(struct work_struct *w);
> static void vm_destroy_work_func(struct work_struct *w);
>
> static u64 xe2_ppgtt_pte_encode_pat(struct xe_device *xe, u64 pte_pat,
> - enum xe_cache_level cache)
> + u16 pat_index)
> {
> - u32 pat_index = xe_pat_get_index(xe, cache);
> -
> if (pat_index & BIT(0))
> pte_pat |= BIT(3);
>
> @@ -1216,10 +1218,8 @@ static u64 xe2_ppgtt_pte_encode_pat(struct xe_device *xe, u64 pte_pat,
> }
>
> static u64 xelp_ppgtt_pte_encode_pat(struct xe_device *xe, u64 pte_pat,
> - enum xe_cache_level cache)
> + u16 pat_index)
> {
> - u32 pat_index = xe_pat_get_index(xe, cache);
> -
> if (pat_index & BIT(0))
> pte_pat |= BIT(3);
>
> @@ -2300,7 +2300,7 @@ static void print_op(struct xe_device *xe, struct drm_gpuva_op *op)
> static struct drm_gpuva_ops *
> vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_bo *bo,
> u64 bo_offset_or_userptr, u64 addr, u64 range,
> - u32 operation, u8 tile_mask, u32 region)
> + u32 operation, u8 tile_mask, u32 region, u16 pat_index)
> {
> struct drm_gem_object *obj = bo ? &bo->ttm.base : NULL;
> struct drm_gpuva_ops *ops;
> @@ -2327,6 +2327,7 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_bo *bo,
> struct xe_vma_op *op = gpuva_op_to_vma_op(__op);
>
> op->tile_mask = tile_mask;
> + op->pat_index = pat_index;
> op->map.immediate =
> operation & XE_VM_BIND_FLAG_IMMEDIATE;
> op->map.read_only =
> @@ -2354,6 +2355,7 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_bo *bo,
> struct xe_vma_op *op = gpuva_op_to_vma_op(__op);
>
> op->tile_mask = tile_mask;
> + op->pat_index = pat_index;
> op->prefetch.region = region;
> }
> break;
> @@ -2396,7 +2398,8 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_bo *bo,
> }
>
> static struct xe_vma *new_vma(struct xe_vm *vm, struct drm_gpuva_op_map *op,
> - u8 tile_mask, bool read_only, bool is_null)
> + u8 tile_mask, bool read_only, bool is_null,
> + u16 pat_index)
> {
> struct xe_bo *bo = op->gem.obj ? gem_to_xe_bo(op->gem.obj) : NULL;
> struct xe_vma *vma;
> @@ -2412,7 +2415,7 @@ static struct xe_vma *new_vma(struct xe_vm *vm, struct drm_gpuva_op_map *op,
> vma = xe_vma_create(vm, bo, op->gem.offset,
> op->va.addr, op->va.addr +
> op->va.range - 1, read_only, is_null,
> - tile_mask);
> + tile_mask, pat_index);
> if (bo)
> xe_bo_unlock(bo);
>
> @@ -2569,7 +2572,7 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_exec_queue *q,
>
> vma = new_vma(vm, &op->base.map,
> op->tile_mask, op->map.read_only,
> - op->map.is_null);
> + op->map.is_null, op->pat_index);
> if (IS_ERR(vma)) {
> err = PTR_ERR(vma);
> goto free_fence;
> @@ -2597,7 +2600,7 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_exec_queue *q,
>
> vma = new_vma(vm, op->base.remap.prev,
> op->tile_mask, read_only,
> - is_null);
> + is_null, op->pat_index);
> if (IS_ERR(vma)) {
> err = PTR_ERR(vma);
> goto free_fence;
> @@ -2633,7 +2636,7 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_exec_queue *q,
>
> vma = new_vma(vm, op->base.remap.next,
> op->tile_mask, read_only,
> - is_null);
> + is_null, op->pat_index);
> if (IS_ERR(vma)) {
> err = PTR_ERR(vma);
> goto free_fence;
> @@ -3146,7 +3149,23 @@ static int vm_bind_ioctl_check_args(struct xe_device *xe,
> u32 obj = (*bind_ops)[i].obj;
> u64 obj_offset = (*bind_ops)[i].obj_offset;
> u32 region = (*bind_ops)[i].region;
> + u16 pat_index = (*bind_ops)[i].pat_index;
> bool is_null = op & XE_VM_BIND_FLAG_NULL;
> + u16 coh_mode;
> +
> + if (XE_IOCTL_DBG(xe, pat_index >= xe->info.pat.n_entries)) {
> + err = -EINVAL;
> + goto free_bind_ops;
> + }
> +
> + pat_index = array_index_nospec(pat_index,
> + xe->info.pat.n_entries);
> + (*bind_ops)[i].pat_index = pat_index;
> + coh_mode = xe_pat_index_get_coh_mode(xe, pat_index);
> + if (XE_WARN_ON(!coh_mode || coh_mode > XE_GEM_COH_AT_LEAST_1WAY)) {
> + err = -EINVAL;
> + goto free_bind_ops;
> + }
>
> if (i == 0) {
> *async = !!(op & XE_VM_BIND_FLAG_ASYNC);
> @@ -3188,6 +3207,8 @@ static int vm_bind_ioctl_check_args(struct xe_device *xe,
> VM_BIND_OP(op) == XE_VM_BIND_OP_UNMAP_ALL) ||
> XE_IOCTL_DBG(xe, obj &&
> VM_BIND_OP(op) == XE_VM_BIND_OP_MAP_USERPTR) ||
> + XE_IOCTL_DBG(xe, coh_mode == XE_GEM_COH_NONE &&
> + VM_BIND_OP(op) == XE_VM_BIND_OP_MAP_USERPTR) ||
> XE_IOCTL_DBG(xe, obj &&
> VM_BIND_OP(op) == XE_VM_BIND_OP_PREFETCH) ||
> XE_IOCTL_DBG(xe, region &&
> @@ -3336,6 +3357,8 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
> u64 addr = bind_ops[i].addr;
> u32 obj = bind_ops[i].obj;
> u64 obj_offset = bind_ops[i].obj_offset;
> + u16 pat_index = bind_ops[i].pat_index;
> + u16 coh_mode;
>
> if (!obj)
> continue;
> @@ -3363,6 +3386,23 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
> goto put_obj;
> }
> }
> +
> + coh_mode = xe_pat_index_get_coh_mode(xe, pat_index);
> + if (bos[i]->coh_mode) {
> + if (XE_IOCTL_DBG(xe, coh_mode < bos[i]->coh_mode)) {
> + err = -EINVAL;
> + goto put_obj;
> + }
> + } else if (XE_IOCTL_DBG(xe, coh_mode == XE_GEM_COH_NONE)) {
> + /*
> + * Imported dma-buf from a different device should
> + * require 1way or 2way coherency since we don't know
> + * how it was mapped on the CPU. Just assume it is
> + * potentially cached on the CPU side.
> + */
> + err = -EINVAL;
> + goto put_obj;
> + }
> }
>
> if (args->num_syncs) {
> @@ -3400,10 +3440,11 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
> u64 obj_offset = bind_ops[i].obj_offset;
> u8 tile_mask = bind_ops[i].tile_mask;
> u32 region = bind_ops[i].region;
> + u16 pat_index = bind_ops[i].pat_index;
>
> ops[i] = vm_bind_ioctl_ops_create(vm, bos[i], obj_offset,
> addr, range, op, tile_mask,
> - region);
> + region, pat_index);
> if (IS_ERR(ops[i])) {
> err = PTR_ERR(ops[i]);
> ops[i] = NULL;
> diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
> index dc583f00919f..54658f400174 100644
> --- a/drivers/gpu/drm/xe/xe_vm_types.h
> +++ b/drivers/gpu/drm/xe/xe_vm_types.h
> @@ -111,6 +111,11 @@ struct xe_vma {
> */
> u8 tile_present;
>
> + /**
> + * @pat_index: The pat index to use when encoding the PTEs for this vma.
> + */
> + u16 pat_index;
> +
> struct {
> struct list_head rebind_link;
> } notifier;
> @@ -338,8 +343,7 @@ struct xe_vm {
> bool batch_invalidate_tlb;
>
> struct {
> - u64 (*pte_encode)(struct xe_device *xe, u64 pte_pat,
> - enum xe_cache_level cache);
> + u64 (*pte_encode)(struct xe_device *xe, u64 pte_pat, u16 pat_index);
> } pat_encode;
> };
>
> @@ -419,6 +423,8 @@ struct xe_vma_op {
> struct async_op_fence *fence;
> /** @tile_mask: gt mask for this operation */
> u8 tile_mask;
> + /** @pat_index: The pat index to use for this operation. */
> + u16 pat_index;
> /** @flags: operation flags */
> enum xe_vma_op_flags flags;
>
> diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
> index 737bb1d4c6f7..75b42c1116f2 100644
> --- a/include/uapi/drm/xe_drm.h
> +++ b/include/uapi/drm/xe_drm.h
> @@ -605,8 +605,49 @@ struct drm_xe_vm_bind_op {
> */
> __u32 obj;
>
> + /**
> + * @pat_index: The platform defined @pat_index to use for this mapping.
> + * The index basically maps to some predefined memory attributes,
> + * including things like caching, coherency, compression etc. The exact
> + * meaning of the pat_index is platform specific and defined in the
> + * Bspec and PRMs. When the KMD sets up the binding the index here is
> + * encoded into the ppGTT PTE.
> + *
> + * For coherency the @pat_index needs to be at least as coherent as
> + * drm_xe_gem_create.coh_mode, i.e. coh_mode(pat_index) >=
> + * drm_xe_gem_create.coh_mode. The KMD will extract the coherency mode
> + * from the @pat_index and reject if there is a mismatch (see note below
> + * for pre-MTL platforms).
> + *
> + * Note: On pre-MTL platforms there is only a caching mode and no
> + * explicit coherency mode, but on such hardware there is always a
> + * shared-LLC (or the device is a dgpu), so all GT memory accesses are
> + * coherent with the CPU caches even with the caching mode set as
> + * uncached. It's only the display engine that is incoherent (on dgpu
> + * the buffer must be in VRAM, which is always mapped as WC on the
> + * CPU). However, to keep the uapi somewhat consistent with newer
> + * platforms the KMD groups the different cache levels into the
> + * following coherency buckets on all pre-MTL platforms:
> + *
> + * ppGTT UC -> XE_GEM_COH_NONE
> + * ppGTT WC -> XE_GEM_COH_NONE
> + * ppGTT WT -> XE_GEM_COH_NONE
> + * ppGTT WB -> XE_GEM_COH_AT_LEAST_1WAY
> + *
> + * In practice UC/WC/WT should only ever be used for scanout surfaces on
> + * such platforms (or perhaps in general for dma-buf if shared with
> + * another device) since it is only the display engine that is actually
> + * incoherent. Everything else should typically use WB given that we
> + * have a shared-LLC. On MTL+ this completely changes and the HW
> + * defines the coherency mode as part of the @pat_index, where
> + * incoherent GT access is possible.
> + *
> + * Note: For userptr and externally imported dma-buf the kernel expects
> + * either 1WAY or 2WAY for the @pat_index.
> + */
> + __u16 pat_index;
> +
> /** @pad: MBZ */
> - __u32 pad;
> + __u16 pad;
>
> union {
> /**
> --
> 2.41.0
>
--
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation
* Re: [Intel-xe] [PATCH v2 0/6] PAT and cache coherency support
2023-09-14 15:31 [Intel-xe] [PATCH v2 0/6] PAT and cache coherency support Matthew Auld
` (6 preceding siblings ...)
2023-09-14 18:16 ` [Intel-xe] ✗ CI.Patch_applied: failure for PAT and cache coherency support (rev2) Patchwork
@ 2023-09-18 15:51 ` Souza, Jose
2023-09-21 17:19 ` Souza, Jose
2023-09-21 20:10 ` [Intel-xe] ✗ CI.Patch_applied: failure for PAT and cache coherency support (rev3) Patchwork
8 siblings, 1 reply; 28+ messages in thread
From: Souza, Jose @ 2023-09-18 15:51 UTC (permalink / raw)
To: intel-xe@lists.freedesktop.org, Auld, Matthew
[-- Attachment #1: Type: text/plain, Size: 1822 bytes --]
On Thu, 2023-09-14 at 16:31 +0100, Matthew Auld wrote:
> Branch available here (lightly tested):
> https://gitlab.freedesktop.org/mwa/kernel/-/tree/xe-pat-index?ref_type=heads
>
> Series still needs some more testing. Also note that the series directly depends
> on the WIP patch here: https://patchwork.freedesktop.org/series/122708/
>
> Goal here is to allow userspace to directly control the pat_index when mapping
> memory via the ppGTT, in addition to the CPU caching mode for system memory. This
> is very much needed on newer igpu platforms which allow incoherent GT access,
> where the choice over the cache level and expected coherency is best left to
> userspace depending on their use case. In the future there may also be other
> stuff encoded in the pat_index, so giving userspace direct control will also be
> needed there.
>
> To support this we added a new gem_create uAPI for selecting the CPU cache
> mode to use for system memory, including the expected GPU coherency mode. There
> are various restrictions here for the selected coherency mode and compatible CPU
> cache modes. With that in place the actual pat_index can now be provided as
> part of vm_bind. The only restriction is that the coherency mode of the
> pat_index must be at least as coherent as the gem_create coherency mode. There
> are also some special cases like with userptr and dma-buf.
>
> v2:
> - Loads of improvements/tweaks. Main changes are to now allow
> gem_create.coh_mode <= coh_mode(pat_index), rather than it needing to match
> exactly. This simplifies the dma-buf policy from the userspace PoV. Also we now
> only consider COH_NONE and COH_AT_LEAST_1WAY.
>
Getting constant DMAR errors in the framebuffer console after loading the Xe KMD on TGL with your branch; logs attached.
[-- Attachment #2: dmesg.txt --]
[-- Type: text/plain, Size: 156718 bytes --]
[ 0.000000] Linux version 6.5.0-rc7+zeh-xe+ (zehortigoza@josouza-mobl2) (gcc (GCC) 13.2.1 20230801, GNU ld (GNU Binutils) 2.41.0) #1105 SMP PREEMPT_DYNAMIC Mon Sep 18 08:43:01 PDT 2023
[ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-6.5.0-rc7+zeh-xe+ root=/dev/nvme0n1p3 ro mitigations=off drm.debug=0xe modprobe.blacklist=i915 modprobe.blacklist=xe
[ 0.000000] x86/split lock detection: #AC: crashing the kernel on kernel split_locks and warning on user-space split_locks
[ 0.000000] BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009efff] usable
[ 0.000000] BIOS-e820: [mem 0x000000000009f000-0x00000000000fffff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x0000000049dc1fff] usable
[ 0.000000] BIOS-e820: [mem 0x0000000049dc2000-0x0000000063510fff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000063511000-0x0000000063d71fff] ACPI NVS
[ 0.000000] BIOS-e820: [mem 0x0000000063d72000-0x0000000063ffefff] ACPI data
[ 0.000000] BIOS-e820: [mem 0x0000000063fff000-0x0000000063ffffff] usable
[ 0.000000] BIOS-e820: [mem 0x0000000064000000-0x0000000067ffffff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000069500000-0x00000000695fffff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000069e00000-0x00000000707fffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000c0000000-0x00000000cfffffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fed20000-0x00000000fed7ffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000ff000000-0x00000000ffffffff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000028f7fffff] usable
[ 0.000000] NX (Execute Disable) protection: active
[ 0.000000] efi: EFI v2.7 by Dell
[ 0.000000] efi: ACPI=0x63ffe000 ACPI 2.0=0x63ffe014 SMBIOS=0x4a468000 TPMFinalLog=0x63ce8000 ESRT=0x4a3ccd98 MEMATTR=0x44bdd018 RNG=0x63f70018 TPMEventLog=0x44bca018
[ 0.000000] random: crng init done
[ 0.000000] efi: Remove mem84: MMIO range=[0xc0000000-0xcfffffff] (256MB) from e820 map
[ 0.000000] e820: remove [mem 0xc0000000-0xcfffffff] reserved
[ 0.000000] efi: Remove mem86: MMIO range=[0xff000000-0xffffffff] (16MB) from e820 map
[ 0.000000] e820: remove [mem 0xff000000-0xffffffff] reserved
[ 0.000000] SMBIOS 3.2 present.
[ 0.000000] DMI: Dell Inc. Latitude 5420/01M3M4, BIOS 1.27.0 03/17/2023
[ 0.000000] tsc: Detected 1500.000 MHz processor
[ 0.000000] tsc: Detected 1497.600 MHz TSC
[ 0.000005] e820: update [mem 0x00000000-0x00000fff] usable ==> reserved
[ 0.000008] e820: remove [mem 0x000a0000-0x000fffff] usable
[ 0.000013] last_pfn = 0x28f800 max_arch_pfn = 0x400000000
[ 0.000015] MTRR map: 5 entries (3 fixed + 2 variable; max 23), built from 10 variable MTRRs
[ 0.000017] x86/PAT: Configuration [0-7]: WB WC UC- UC WB WP UC- WT
[ 0.000297] last_pfn = 0x64000 max_arch_pfn = 0x400000000
[ 0.000301] esrt: Reserving ESRT space from 0x000000004a3ccd98 to 0x000000004a3ccdf8.
[ 0.000308] Using GB pages for direct mapping
[ 0.000523] Secure boot disabled
[ 0.000525] ACPI: Early table checksum verification disabled
[ 0.000527] ACPI: RSDP 0x0000000063FFE014 000024 (v02 DELL )
[ 0.000531] ACPI: XSDT 0x0000000063F78188 00010C (v01 DELL Dell Inc 00000002 01000013)
[ 0.000536] ACPI: FACP 0x0000000063FF5000 000114 (v06 DELL Dell Inc 00000002 01000013)
[ 0.000540] ACPI: DSDT 0x0000000063F96000 05B86C (v02 DELL Dell Inc 00000002 01000013)
[ 0.000543] ACPI: FACS 0x0000000063D1B000 000040
[ 0.000546] ACPI: SSDT 0x0000000063FFA000 0024D0 (v02 CpuRef CpuSsdt 00003000 INTL 20191018)
[ 0.000548] ACPI: SSDT 0x0000000063FF6000 003714 (v02 DptfTb DptfTabl 00001000 INTL 20191018)
[ 0.000551] ACPI: HPET 0x0000000063FF4000 000038 (v01 DELL Dell Inc 00000002 01000013)
[ 0.000554] ACPI: APIC 0x0000000063FF3000 00012C (v04 DELL Dell Inc 00000002 01000013)
[ 0.000556] ACPI: MCFG 0x0000000063FF2000 00003C (v01 DELL Dell Inc 00000002 01000013)
[ 0.000559] ACPI: SSDT 0x0000000063F95000 000A65 (v02 DELL DellRtd3 00001000 INTL 20191018)
[ 0.000562] ACPI: NHLT 0x0000000063F94000 00002D (v00 DELL Dell Inc 00000002 01000013)
[ 0.000564] ACPI: SSDT 0x0000000063F91000 002BE5 (v02 SaSsdt SaSsdt 00003000 INTL 20191018)
[ 0.000567] ACPI: SSDT 0x0000000063F8F000 0012AA (v02 INTEL IgfxSsdt 00003000 INTL 20191018)
[ 0.000569] ACPI: SSDT 0x0000000063F83000 00B1B6 (v02 INTEL TcssSsdt 00001000 INTL 20191018)
[ 0.000572] ACPI: SSDT 0x0000000063F82000 000D58 (v02 DELL UsbCTabl 00001000 INTL 20191018)
[ 0.000575] ACPI: LPIT 0x0000000063F81000 0000CC (v01 DELL Dell Inc 00000002 01000013)
[ 0.000577] ACPI: WSMT 0x0000000063F80000 000028 (v01 DELL Dell Inc 00000002 01000013)
[ 0.000580] ACPI: SSDT 0x0000000063F7F000 000B75 (v02 DELL PtidDevc 00001000 INTL 20191018)
[ 0.000582] ACPI: SSDT 0x0000000063F7E000 00012A (v02 DELL TbtTypeC 00000000 INTL 20191018)
[ 0.000585] ACPI: DBGP 0x0000000063F7D000 000034 (v01 DELL Dell Inc 00000002 01000013)
[ 0.000588] ACPI: DBG2 0x0000000063F7C000 000054 (v00 DELL Dell Inc 00000002 01000013)
[ 0.000590] ACPI: BOOT 0x0000000063F7B000 000028 (v01 DELL CBX3 00000002 01000013)
[ 0.000593] ACPI: SSDT 0x0000000063F7A000 00060E (v02 DELL Tpm2Tabl 00001000 INTL 20191018)
[ 0.000595] ACPI: TPM2 0x0000000063F79000 00004C (v04 DELL Dell Inc 00000002 01000013)
[ 0.000598] ACPI: MSDM 0x0000000063FFD000 000055 (v03 DELL CBX3 06222004 AMI 00010013)
[ 0.000601] ACPI: DMAR 0x0000000063F77000 0000B8 (v02 INTEL Dell Inc 00000002 01000013)
[ 0.000603] ACPI: SSDT 0x0000000063F76000 000A84 (v02 DELL xh_Dell_ 00000000 INTL 20191018)
[ 0.000606] ACPI: SSDT 0x0000000063F75000 000144 (v02 Intel ADebTabl 00001000 INTL 20191018)
[ 0.000609] ACPI: ASF! 0x0000000063F74000 0000A0 (v32 DELL Dell Inc 00000002 01000013)
[ 0.000611] ACPI: PTDT 0x0000000063F73000 000D44 (v00 DELL Dell Inc 00000005 MSFT 0100000D)
[ 0.000614] ACPI: BGRT 0x0000000063F72000 000038 (v01 DELL Dell Inc 00000002 01000013)
[ 0.000616] ACPI: FPDT 0x0000000063F71000 000034 (v01 DELL Dell Inc 00000002 01000013)
[ 0.000619] ACPI: Reserving FACP table memory at [mem 0x63ff5000-0x63ff5113]
[ 0.000620] ACPI: Reserving DSDT table memory at [mem 0x63f96000-0x63ff186b]
[ 0.000621] ACPI: Reserving FACS table memory at [mem 0x63d1b000-0x63d1b03f]
[ 0.000621] ACPI: Reserving SSDT table memory at [mem 0x63ffa000-0x63ffc4cf]
[ 0.000622] ACPI: Reserving SSDT table memory at [mem 0x63ff6000-0x63ff9713]
[ 0.000623] ACPI: Reserving HPET table memory at [mem 0x63ff4000-0x63ff4037]
[ 0.000624] ACPI: Reserving APIC table memory at [mem 0x63ff3000-0x63ff312b]
[ 0.000624] ACPI: Reserving MCFG table memory at [mem 0x63ff2000-0x63ff203b]
[ 0.000625] ACPI: Reserving SSDT table memory at [mem 0x63f95000-0x63f95a64]
[ 0.000626] ACPI: Reserving NHLT table memory at [mem 0x63f94000-0x63f9402c]
[ 0.000626] ACPI: Reserving SSDT table memory at [mem 0x63f91000-0x63f93be4]
[ 0.000627] ACPI: Reserving SSDT table memory at [mem 0x63f8f000-0x63f902a9]
[ 0.000628] ACPI: Reserving SSDT table memory at [mem 0x63f83000-0x63f8e1b5]
[ 0.000628] ACPI: Reserving SSDT table memory at [mem 0x63f82000-0x63f82d57]
[ 0.000629] ACPI: Reserving LPIT table memory at [mem 0x63f81000-0x63f810cb]
[ 0.000630] ACPI: Reserving WSMT table memory at [mem 0x63f80000-0x63f80027]
[ 0.000631] ACPI: Reserving SSDT table memory at [mem 0x63f7f000-0x63f7fb74]
[ 0.000631] ACPI: Reserving SSDT table memory at [mem 0x63f7e000-0x63f7e129]
[ 0.000632] ACPI: Reserving DBGP table memory at [mem 0x63f7d000-0x63f7d033]
[ 0.000633] ACPI: Reserving DBG2 table memory at [mem 0x63f7c000-0x63f7c053]
[ 0.000633] ACPI: Reserving BOOT table memory at [mem 0x63f7b000-0x63f7b027]
[ 0.000634] ACPI: Reserving SSDT table memory at [mem 0x63f7a000-0x63f7a60d]
[ 0.000635] ACPI: Reserving TPM2 table memory at [mem 0x63f79000-0x63f7904b]
[ 0.000635] ACPI: Reserving MSDM table memory at [mem 0x63ffd000-0x63ffd054]
[ 0.000636] ACPI: Reserving DMAR table memory at [mem 0x63f77000-0x63f770b7]
[ 0.000637] ACPI: Reserving SSDT table memory at [mem 0x63f76000-0x63f76a83]
[ 0.000638] ACPI: Reserving SSDT table memory at [mem 0x63f75000-0x63f75143]
[ 0.000638] ACPI: Reserving ASF! table memory at [mem 0x63f74000-0x63f7409f]
[ 0.000639] ACPI: Reserving PTDT table memory at [mem 0x63f73000-0x63f73d43]
[ 0.000640] ACPI: Reserving BGRT table memory at [mem 0x63f72000-0x63f72037]
[ 0.000640] ACPI: Reserving FPDT table memory at [mem 0x63f71000-0x63f71033]
[ 0.000673] Zone ranges:
[ 0.000673] DMA [mem 0x0000000000001000-0x0000000000ffffff]
[ 0.000675] DMA32 [mem 0x0000000001000000-0x00000000ffffffff]
[ 0.000677] Normal [mem 0x0000000100000000-0x000000028f7fffff]
[ 0.000678] Device empty
[ 0.000679] Movable zone start for each node
[ 0.000680] Early memory node ranges
[ 0.000680] node 0: [mem 0x0000000000001000-0x000000000009efff]
[ 0.000681] node 0: [mem 0x0000000000100000-0x0000000049dc1fff]
[ 0.000682] node 0: [mem 0x0000000063fff000-0x0000000063ffffff]
[ 0.000683] node 0: [mem 0x0000000100000000-0x000000028f7fffff]
[ 0.000684] Initmem setup node 0 [mem 0x0000000000001000-0x000000028f7fffff]
[ 0.000688] On node 0, zone DMA: 1 pages in unavailable ranges
[ 0.000713] On node 0, zone DMA: 97 pages in unavailable ranges
[ 0.003370] On node 0, zone DMA32: 41533 pages in unavailable ranges
[ 0.016420] On node 0, zone Normal: 16384 pages in unavailable ranges
[ 0.016442] On node 0, zone Normal: 2048 pages in unavailable ranges
[ 0.016462] Reserving Intel graphics memory at [mem 0x6c800000-0x707fffff]
[ 0.017145] ACPI: PM-Timer IO Port: 0x1808
[ 0.017151] ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
[ 0.017152] ACPI: LAPIC_NMI (acpi_id[0x02] high edge lint[0x1])
[ 0.017153] ACPI: LAPIC_NMI (acpi_id[0x03] high edge lint[0x1])
[ 0.017153] ACPI: LAPIC_NMI (acpi_id[0x04] high edge lint[0x1])
[ 0.017154] ACPI: LAPIC_NMI (acpi_id[0x05] high edge lint[0x1])
[ 0.017155] ACPI: LAPIC_NMI (acpi_id[0x06] high edge lint[0x1])
[ 0.017155] ACPI: LAPIC_NMI (acpi_id[0x07] high edge lint[0x1])
[ 0.017156] ACPI: LAPIC_NMI (acpi_id[0x08] high edge lint[0x1])
[ 0.017156] ACPI: LAPIC_NMI (acpi_id[0x09] high edge lint[0x1])
[ 0.017157] ACPI: LAPIC_NMI (acpi_id[0x0a] high edge lint[0x1])
[ 0.017158] ACPI: LAPIC_NMI (acpi_id[0x0b] high edge lint[0x1])
[ 0.017158] ACPI: LAPIC_NMI (acpi_id[0x0c] high edge lint[0x1])
[ 0.017159] ACPI: LAPIC_NMI (acpi_id[0x0d] high edge lint[0x1])
[ 0.017160] ACPI: LAPIC_NMI (acpi_id[0x0e] high edge lint[0x1])
[ 0.017160] ACPI: LAPIC_NMI (acpi_id[0x0f] high edge lint[0x1])
[ 0.017161] ACPI: LAPIC_NMI (acpi_id[0x10] high edge lint[0x1])
[ 0.017209] IOAPIC[0]: apic_id 2, version 32, address 0xfec00000, GSI 0-119
[ 0.017212] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[ 0.017214] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
[ 0.017216] ACPI: Using ACPI (MADT) for SMP configuration information
[ 0.017218] ACPI: HPET id: 0x8086a201 base: 0xfed00000
[ 0.017224] e820: update [mem 0x44be3000-0x44c6bfff] usable ==> reserved
[ 0.017230] TSC deadline timer available
[ 0.017231] smpboot: Allowing 8 CPUs, 0 hotplug CPUs
[ 0.017241] PM: hibernation: Registered nosave memory: [mem 0x00000000-0x00000fff]
[ 0.017243] PM: hibernation: Registered nosave memory: [mem 0x0009f000-0x000fffff]
[ 0.017244] PM: hibernation: Registered nosave memory: [mem 0x44be3000-0x44c6bfff]
[ 0.017245] PM: hibernation: Registered nosave memory: [mem 0x49dc2000-0x63510fff]
[ 0.017246] PM: hibernation: Registered nosave memory: [mem 0x63511000-0x63d71fff]
[ 0.017246] PM: hibernation: Registered nosave memory: [mem 0x63d72000-0x63ffefff]
[ 0.017247] PM: hibernation: Registered nosave memory: [mem 0x64000000-0x67ffffff]
[ 0.017248] PM: hibernation: Registered nosave memory: [mem 0x68000000-0x694fffff]
[ 0.017249] PM: hibernation: Registered nosave memory: [mem 0x69500000-0x695fffff]
[ 0.017250] PM: hibernation: Registered nosave memory: [mem 0x69600000-0x69dfffff]
[ 0.017250] PM: hibernation: Registered nosave memory: [mem 0x69e00000-0x707fffff]
[ 0.017251] PM: hibernation: Registered nosave memory: [mem 0x70800000-0xfed1ffff]
[ 0.017252] PM: hibernation: Registered nosave memory: [mem 0xfed20000-0xfed7ffff]
[ 0.017252] PM: hibernation: Registered nosave memory: [mem 0xfed80000-0xffffffff]
[ 0.017254] [mem 0x70800000-0xfed1ffff] available for PCI devices
[ 0.017255] Booting paravirtualized kernel on bare hardware
[ 0.017259] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 1910969940391419 ns
[ 0.021345] setup_percpu: NR_CPUS:64 nr_cpumask_bits:8 nr_cpu_ids:8 nr_node_ids:1
[ 0.021924] percpu: Embedded 71 pages/cpu s252192 r8192 d30432 u524288
[ 0.021929] pcpu-alloc: s252192 r8192 d30432 u524288 alloc=1*2097152
[ 0.021931] pcpu-alloc: [0] 0 1 2 3 [0] 4 5 6 7
[ 0.021947] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-6.5.0-rc7+zeh-xe+ root=/dev/nvme0n1p3 ro mitigations=off drm.debug=0xe modprobe.blacklist=i915 modprobe.blacklist=xe
[ 0.021983] Unknown kernel command line parameters "BOOT_IMAGE=/boot/vmlinuz-6.5.0-rc7+zeh-xe+", will be passed to user space.
[ 0.022024] printk: log_buf_len individual max cpu contribution: 262144 bytes
[ 0.022025] printk: log_buf_len total cpu_extra contributions: 1835008 bytes
[ 0.022026] printk: log_buf_len min size: 262144 bytes
[ 0.023868] printk: log_buf_len: 2097152 bytes
[ 0.023871] printk: early log buf free: 249176(95%)
[ 0.024956] Dentry cache hash table entries: 1048576 (order: 11, 8388608 bytes, linear)
[ 0.025537] Inode-cache hash table entries: 524288 (order: 10, 4194304 bytes, linear)
[ 0.025609] Built 1 zonelists, mobility grouping on. Total pages: 1908331
[ 0.025612] mem auto-init: stack:off, heap alloc:off, heap free:off
[ 0.025613] stackdepot: allocating hash table via alloc_large_system_hash
[ 0.026172] stackdepot hash table entries: 524288 (order: 10, 4194304 bytes, linear)
[ 0.026190] software IO TLB: area num 8.
[ 0.213025] Memory: 7368732K/7755140K available (18432K kernel code, 2921K rwdata, 6384K rodata, 2664K init, 14032K bss, 386152K reserved, 0K cma-reserved)
[ 0.213028] **********************************************************
[ 0.213028] ** NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE **
[ 0.213029] ** **
[ 0.213029] ** This system shows unhashed kernel memory addresses **
[ 0.213031] ** via the console, logs, and other interfaces. This **
[ 0.213031] ** might reduce the security of your system. **
[ 0.213032] ** **
[ 0.213032] ** If you see this message and you are not debugging **
[ 0.213033] ** the kernel, report this immediately to your system **
[ 0.213033] ** administrator! **
[ 0.213034] ** **
[ 0.213034] ** NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE **
[ 0.213035] **********************************************************
[ 0.213211] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=8, Nodes=1
[ 0.213700] Dynamic Preempt: full
[ 0.213852] Running RCU self tests
[ 0.213853] Running RCU synchronous self tests
[ 0.213868] rcu: Preemptible hierarchical RCU implementation.
[ 0.213869] rcu: RCU lockdep checking is enabled.
[ 0.213870] rcu: RCU restricting CPUs from NR_CPUS=64 to nr_cpu_ids=8.
[ 0.213871] rcu: RCU callback double-/use-after-free debug is enabled.
[ 0.213872] Trampoline variant of Tasks RCU enabled.
[ 0.213873] rcu: RCU calculated value of scheduler-enlistment delay is 100 jiffies.
[ 0.213873] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=8
[ 0.213904] Running RCU synchronous self tests
[ 0.219754] NR_IRQS: 4352, nr_irqs: 2048, preallocated irqs: 16
[ 0.220126] rcu: srcu_init: Setting srcu_struct sizes based on contention.
[ 0.220453] Console: colour dummy device 80x25
[ 0.220468] printk: console [tty0] enabled
[ 0.222618] Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar
[ 0.222632] ... MAX_LOCKDEP_SUBCLASSES: 8
[ 0.222639] ... MAX_LOCK_DEPTH: 48
[ 0.222646] ... MAX_LOCKDEP_KEYS: 8192
[ 0.222653] ... CLASSHASH_SIZE: 4096
[ 0.222660] ... MAX_LOCKDEP_ENTRIES: 32768
[ 0.222667] ... MAX_LOCKDEP_CHAINS: 65536
[ 0.222675] ... CHAINHASH_SIZE: 32768
[ 0.222682] memory used by lock dependency info: 6493 kB
[ 0.222690] memory used for stack traces: 4224 kB
[ 0.222698] per task-struct memory footprint: 1920 bytes
[ 0.222713] ACPI: Core revision 20230331
[ 0.223343] hpet: HPET dysfunctional in PC10. Force disabled.
[ 0.223353] APIC: Switch to symmetric I/O mode setup
[ 0.223363] DMAR: Host address width 39
[ 0.223369] DMAR: DRHD base: 0x000000fed90000 flags: 0x0
[ 0.223407] DMAR: dmar0: reg_base_addr fed90000 ver 4:0 cap 1c0000c40660462 ecap 69e2ff0505e
[ 0.223421] DMAR: DRHD base: 0x000000fed84000 flags: 0x0
[ 0.223445] DMAR: dmar1: reg_base_addr fed84000 ver 1:0 cap d2008c40660462 ecap f050da
[ 0.223458] DMAR: DRHD base: 0x000000fed85000 flags: 0x0
[ 0.223481] DMAR: dmar2: reg_base_addr fed85000 ver 1:0 cap d2008c40660462 ecap f050da
[ 0.223494] DMAR: DRHD base: 0x000000fed91000 flags: 0x1
[ 0.223516] DMAR: dmar3: reg_base_addr fed91000 ver 1:0 cap d2008c40660462 ecap f050da
[ 0.223529] DMAR: RMRR base: 0x0000006c000000 end: 0x000000707fffff
[ 0.223545] DMAR-IR: IOAPIC id 2 under DRHD base 0xfed91000 IOMMU 3
[ 0.223556] DMAR-IR: HPET id 0 under DRHD base 0xfed91000
[ 0.223565] DMAR-IR: Queued invalidation will be enabled to support x2apic and Intr-remapping.
[ 0.226430] DMAR-IR: Enabled IRQ remapping in x2apic mode
[ 0.226441] x2apic enabled
[ 0.226479] Switched APIC routing to cluster x2apic.
[ 0.231853] clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x159647815e3, max_idle_ns: 440795269835 ns
[ 0.231886] Calibrating delay loop (skipped), value calculated using timer frequency.. 2995.20 BogoMIPS (lpj=1497600)
[ 0.231931] x86/tme: not enabled by BIOS
[ 0.231943] CPU0: Thermal monitoring enabled (TM1)
[ 0.231953] x86/cpu: User Mode Instruction Prevention (UMIP) activated
[ 0.232083] process: using mwait in idle threads
[ 0.232095] Last level iTLB entries: 4KB 0, 2MB 0, 4MB 0
[ 0.232104] Last level dTLB entries: 4KB 0, 2MB 0, 4MB 0, 1GB 0
[ 0.232117] Spectre V2 : User space: Vulnerable
[ 0.232126] Speculative Store Bypass: Vulnerable
[ 0.232135] GDS: Vulnerable: No microcode
[ 0.232151] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
[ 0.232163] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[ 0.232173] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[ 0.232183] x86/fpu: Supporting XSAVE feature 0x020: 'AVX-512 opmask'
[ 0.232192] x86/fpu: Supporting XSAVE feature 0x040: 'AVX-512 Hi256'
[ 0.232202] x86/fpu: Supporting XSAVE feature 0x080: 'AVX-512 ZMM_Hi256'
[ 0.232212] x86/fpu: Supporting XSAVE feature 0x200: 'Protection Keys User registers'
[ 0.232224] x86/fpu: xstate_offset[2]: 576, xstate_sizes[2]: 256
[ 0.232233] x86/fpu: xstate_offset[5]: 832, xstate_sizes[5]: 64
[ 0.232243] x86/fpu: xstate_offset[6]: 896, xstate_sizes[6]: 512
[ 0.232253] x86/fpu: xstate_offset[7]: 1408, xstate_sizes[7]: 1024
[ 0.232262] x86/fpu: xstate_offset[9]: 2432, xstate_sizes[9]: 8
[ 0.232272] x86/fpu: Enabled xstate features 0x2e7, context size is 2440 bytes, using 'compacted' format.
[ 0.232875] Freeing SMP alternatives memory: 44K
[ 0.232875] pid_max: default: 32768 minimum: 301
[ 0.232875] Mount-cache hash table entries: 16384 (order: 5, 131072 bytes, linear)
[ 0.232875] Mountpoint-cache hash table entries: 16384 (order: 5, 131072 bytes, linear)
[ 0.232875] Running RCU synchronous self tests
[ 0.232875] Running RCU synchronous self tests
[ 0.232875] smpboot: CPU0: 11th Gen Intel(R) Core(TM) i5-1145G7 @ 2.60GHz (family: 0x6, model: 0x8c, stepping: 0x1)
[ 0.232875] RCU Tasks: Setting shift to 3 and lim to 1 rcu_task_cb_adjust=1.
[ 0.232875] Running RCU-tasks wait API self tests
[ 0.335936] Performance Events: PEBS fmt4+-baseline, AnyThread deprecated, Icelake events, 32-deep LBR, full-width counters, Intel PMU driver.
[ 0.336012] ... version: 5
[ 0.336019] ... bit width: 48
[ 0.336026] ... generic registers: 8
[ 0.336032] ... value mask: 0000ffffffffffff
[ 0.336041] ... max period: 00007fffffffffff
[ 0.336049] ... fixed-purpose events: 4
[ 0.336056] ... event mask: 0001000f000000ff
[ 0.336228] signal: max sigframe size: 3632
[ 0.336256] Estimated ratio of average max frequency by base frequency (times 1024): 2730
[ 0.337915] rcu: Hierarchical SRCU implementation.
[ 0.337923] rcu: Max phase no-delay instances is 400.
[ 0.339421] NMI watchdog: Enabled. Permanently consumes one hw-PMU counter.
[ 0.339728] smp: Bringing up secondary CPUs ...
[ 0.340131] smpboot: x86: Booting SMP configuration:
[ 0.340146] .... node #0, CPUs: #1 #2 #3 #4 #5 #6 #7
[ 0.345167] smp: Brought up 1 node, 8 CPUs
[ 0.345167] smpboot: Max logical packages: 1
[ 0.345167] smpboot: Total of 8 processors activated (23961.60 BogoMIPS)
[ 0.347047] devtmpfs: initialized
[ 0.347158] x86/mm: Memory block size: 128MB
[ 0.352256] ACPI: PM: Registering ACPI NVS region [mem 0x63511000-0x63d71fff] (8785920 bytes)
[ 0.354718] Running RCU synchronous self tests
[ 0.354736] Running RCU synchronous self tests
[ 0.354792] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 1911260446275000 ns
[ 0.354811] futex hash table entries: 2048 (order: 6, 262144 bytes, linear)
[ 0.354969] pinctrl core: initialized pinctrl subsystem
[ 0.355583] NET: Registered PF_NETLINK/PF_ROUTE protocol family
[ 0.357118] thermal_sys: Registered thermal governor 'fair_share'
[ 0.357120] thermal_sys: Registered thermal governor 'step_wise'
[ 0.357131] thermal_sys: Registered thermal governor 'user_space'
[ 0.357186] cpuidle: using governor menu
[ 0.357263] Simple Boot Flag at 0x47 set to 0x80
[ 0.357263] acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
[ 0.357362] PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0xc0000000-0xcfffffff] (base 0xc0000000)
[ 0.357384] PCI: not using MMCONFIG
[ 0.357394] PCI: Using configuration type 1 for base access
[ 0.358474] ENERGY_PERF_BIAS: Set to 'normal', was 'performance'
[ 0.358890] kprobes: kprobe jump-optimization is enabled. All kprobes are optimized if possible.
[ 0.358979] HugeTLB: registered 1.00 GiB page size, pre-allocated 0 pages
[ 0.358979] HugeTLB: 16380 KiB vmemmap can be freed for a 1.00 GiB page
[ 0.358979] HugeTLB: registered 2.00 MiB page size, pre-allocated 0 pages
[ 0.358979] HugeTLB: 28 KiB vmemmap can be freed for a 2.00 MiB page
[ 0.360008] cryptd: max_cpu_qlen set to 1000
[ 0.376882] raid6: avx512x4 gen() 54959 MB/s
[ 0.392925] raid6: avx512x2 gen() 57068 MB/s
[ 0.409960] raid6: avx512x1 gen() 54001 MB/s
[ 0.426997] raid6: avx2x4 gen() 44920 MB/s
[ 0.444032] raid6: avx2x2 gen() 44160 MB/s
[ 0.461876] raid6: avx2x1 gen() 40450 MB/s
[ 0.461876] raid6: using algorithm avx512x2 gen() 57068 MB/s
[ 0.461893] Callback from call_rcu_tasks() invoked.
[ 0.478115] raid6: .... xor() 40356 MB/s, rmw enabled
[ 0.478126] raid6: using avx512x2 recovery algorithm
[ 0.478442] ACPI: Added _OSI(Module Device)
[ 0.478450] ACPI: Added _OSI(Processor Device)
[ 0.478458] ACPI: Added _OSI(3.0 _SCP Extensions)
[ 0.478467] ACPI: Added _OSI(Processor Aggregator Device)
[ 0.821487] ACPI: 13 ACPI AML tables successfully acquired and loaded
[ 0.909880] ACPI: Dynamic OEM Table Load:
[ 0.909918] ACPI: SSDT 0xFFFF88810171D000 0001CB (v02 PmRef Cpu0Psd 00003000 INTL 20191018)
[ 0.914572] ACPI: \_SB_.PR00: _OSC native thermal LVT Acked
[ 0.935636] ACPI: Dynamic OEM Table Load:
[ 0.935661] ACPI: SSDT 0xFFFF888101883400 000394 (v02 PmRef Cpu0Cst 00003001 INTL 20191018)
[ 0.941390] ACPI: Dynamic OEM Table Load:
[ 0.941413] ACPI: SSDT 0xFFFF888101893800 000437 (v02 PmRef Cpu0Ist 00003000 INTL 20191018)
[ 0.947429] ACPI: Dynamic OEM Table Load:
[ 0.947452] ACPI: SSDT 0xFFFF888101884000 000266 (v02 PmRef Cpu0Hwp 00003000 INTL 20191018)
[ 0.954120] ACPI: Dynamic OEM Table Load:
[ 0.954149] ACPI: SSDT 0xFFFF8881018B9000 0008E7 (v02 PmRef ApIst 00003000 INTL 20191018)
[ 0.961667] ACPI: Dynamic OEM Table Load:
[ 0.961691] ACPI: SSDT 0xFFFF888101CFA000 00048A (v02 PmRef ApHwp 00003000 INTL 20191018)
[ 0.967626] ACPI: Dynamic OEM Table Load:
[ 0.967652] ACPI: SSDT 0xFFFF888101CFB800 0004D4 (v02 PmRef ApPsd 00003000 INTL 20191018)
[ 0.973574] ACPI: Dynamic OEM Table Load:
[ 0.973597] ACPI: SSDT 0xFFFF888101CFD000 00048A (v02 PmRef ApCst 00003000 INTL 20191018)
[ 1.012177] ACPI: EC: EC started
[ 1.012193] ACPI: EC: interrupt blocked
[ 1.027723] ACPI: EC: EC_CMD/EC_SC=0x934, EC_DATA=0x930
[ 1.027737] ACPI: \_SB_.PC00.LPCB.ECDV: Boot DSDT EC used to handle transactions
[ 1.027752] ACPI: Interpreter enabled
[ 1.027863] ACPI: PM: (supports S0 S4 S5)
[ 1.027877] ACPI: Using IOAPIC for interrupt routing
[ 1.028033] PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0xc0000000-0xcfffffff] (base 0xc0000000)
[ 1.048803] PCI: MMCONFIG at [mem 0xc0000000-0xcfffffff] reserved as ACPI motherboard resource
[ 1.048832] PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug
[ 1.048847] PCI: Ignoring E820 reservations for host bridge windows
[ 1.067722] ACPI: Enabled 9 GPEs in block 00 to 7F
[ 1.103510] ACPI: \_SB_.PC00.XHCI.RHUB.HS10.BTPR: New power resource
[ 1.159978] ACPI: \_SB_.PC00.RP05.PXP_: New power resource
[ 1.202711] ACPI: \_SB_.PC00.SAT0.VOL0.V0PR: New power resource
[ 1.204266] ACPI: \_SB_.PC00.SAT0.VOL1.V1PR: New power resource
[ 1.205773] ACPI: \_SB_.PC00.SAT0.VOL2.V2PR: New power resource
[ 1.246220] ACPI: \_SB_.PC00.CNVW.WRST: New power resource
[ 1.265844] ACPI: \_SB_.PC00.TBT0: New power resource
[ 1.266157] ACPI: \_SB_.PC00.TBT1: New power resource
[ 1.266459] ACPI: \_SB_.PC00.D3C_: New power resource
[ 1.586459] ACPI: \PIN_: New power resource
[ 1.589313] ACPI: PCI Root Bridge [PC00] (domain 0000 [bus 00-fe])
[ 1.589338] acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI HPX-Type3]
[ 1.594013] acpi PNP0A08:00: _OSC: platform does not support [AER]
[ 1.602581] acpi PNP0A08:00: _OSC: OS now controls [PCIeHotplug PME PCIeCapability LTR]
[ 1.613409] PCI host bridge to bus 0000:00
[ 1.613421] pci_bus 0000:00: root bus resource [io 0x0000-0x0cf7 window]
[ 1.613436] pci_bus 0000:00: root bus resource [io 0x0d00-0xffff window]
[ 1.613450] pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff window]
[ 1.613465] pci_bus 0000:00: root bus resource [mem 0x70800000-0xbfffffff window]
[ 1.613481] pci_bus 0000:00: root bus resource [mem 0x4000000000-0x7fffffffff window]
[ 1.613498] pci_bus 0000:00: root bus resource [bus 00-fe]
[ 1.614066] pci 0000:00:00.0: [8086:9a14] type 00 class 0x060000
[ 1.614383] pci 0000:00:02.0: [8086:9a49] type 00 class 0x030000
[ 1.614404] pci 0000:00:02.0: reg 0x10: [mem 0x6052000000-0x6052ffffff 64bit]
[ 1.614423] pci 0000:00:02.0: reg 0x18: [mem 0x4000000000-0x400fffffff 64bit pref]
[ 1.614441] pci 0000:00:02.0: reg 0x20: [io 0x3000-0x303f]
[ 1.614481] pci 0000:00:02.0: BAR 2: assigned to efifb
[ 1.614492] pci 0000:00:02.0: DMAR: Skip IOMMU disabling for graphics
[ 1.614507] pci 0000:00:02.0: Video device with shadowed ROM at [mem 0x000c0000-0x000dffff]
[ 1.614562] pci 0000:00:02.0: reg 0x344: [mem 0x00000000-0x00ffffff 64bit]
[ 1.614576] pci 0000:00:02.0: VF(n) BAR0 space: [mem 0x00000000-0x06ffffff 64bit] (contains BAR0 for 7 VFs)
[ 1.614598] pci 0000:00:02.0: reg 0x34c: [mem 0x00000000-0x1fffffff 64bit pref]
[ 1.614613] pci 0000:00:02.0: VF(n) BAR2 space: [mem 0x00000000-0xdfffffff 64bit pref] (contains BAR2 for 7 VFs)
[ 1.615879] pci 0000:00:04.0: [8086:9a03] type 00 class 0x118000
[ 1.615910] pci 0000:00:04.0: reg 0x10: [mem 0x6053140000-0x605315ffff 64bit]
[ 1.617257] pci 0000:00:07.0: [8086:9a23] type 01 class 0x060400
[ 1.617469] pci 0000:00:07.0: PME# supported from D0 D3hot D3cold
[ 1.621475] pci 0000:00:07.1: [8086:9a25] type 01 class 0x060400
[ 1.621678] pci 0000:00:07.1: PME# supported from D0 D3hot D3cold
[ 1.625635] pci 0000:00:0d.0: [8086:9a13] type 00 class 0x0c0330
[ 1.625663] pci 0000:00:0d.0: reg 0x10: [mem 0x6053180000-0x605318ffff 64bit]
[ 1.625759] pci 0000:00:0d.0: PME# supported from D3hot D3cold
[ 1.627254] pci 0000:00:0d.2: [8086:9a1b] type 00 class 0x0c0340
[ 1.627280] pci 0000:00:0d.2: reg 0x10: [mem 0x6053100000-0x605313ffff 64bit]
[ 1.627301] pci 0000:00:0d.2: reg 0x18: [mem 0x60531a1000-0x60531a1fff 64bit]
[ 1.627371] pci 0000:00:0d.2: supports D1 D2
[ 1.627380] pci 0000:00:0d.2: PME# supported from D0 D1 D2 D3hot D3cold
[ 1.628771] pci 0000:00:12.0: [8086:a0fc] type 00 class 0x070000
[ 1.628808] pci 0000:00:12.0: reg 0x10: [mem 0x6053170000-0x605317ffff 64bit]
[ 1.628921] pci 0000:00:12.0: PME# supported from D0 D3hot
[ 1.630838] pci 0000:00:14.0: [8086:a0ed] type 00 class 0x0c0330
[ 1.630878] pci 0000:00:14.0: reg 0x10: [mem 0x6053160000-0x605316ffff 64bit]
[ 1.631009] pci 0000:00:14.0: PME# supported from D3hot D3cold
[ 1.632585] pci 0000:00:14.2: [8086:a0ef] type 00 class 0x050000
[ 1.632622] pci 0000:00:14.2: reg 0x10: [mem 0x6053198000-0x605319bfff 64bit]
[ 1.632653] pci 0000:00:14.2: reg 0x18: [mem 0x60531a0000-0x60531a0fff 64bit]
[ 1.632962] pci 0000:00:14.3: [8086:a0f0] type 00 class 0x028000
[ 1.633011] pci 0000:00:14.3: reg 0x10: [mem 0x6053194000-0x6053197fff 64bit]
[ 1.633246] pci 0000:00:14.3: PME# supported from D0 D3hot D3cold
[ 1.634333] pci 0000:00:15.0: [8086:a0e8] type 00 class 0x0c8000
[ 1.634418] pci 0000:00:15.0: reg 0x10: [mem 0x00000000-0x00000fff 64bit]
[ 1.636012] pci 0000:00:15.1: [8086:a0e9] type 00 class 0x0c8000
[ 1.636096] pci 0000:00:15.1: reg 0x10: [mem 0x00000000-0x00000fff 64bit]
[ 1.637572] pci 0000:00:16.0: [8086:a0e0] type 00 class 0x078000
[ 1.637611] pci 0000:00:16.0: reg 0x10: [mem 0x605319d000-0x605319dfff 64bit]
[ 1.637737] pci 0000:00:16.0: PME# supported from D3hot
[ 1.639603] pci 0000:00:16.3: [8086:a0e3] type 00 class 0x070002
[ 1.639637] pci 0000:00:16.3: reg 0x10: [io 0x3060-0x3067]
[ 1.639660] pci 0000:00:16.3: reg 0x14: [mem 0xa2321000-0xa2321fff]
[ 1.640015] pci 0000:00:1c.0: [8086:a0be] type 01 class 0x060400
[ 1.640168] pci 0000:00:1c.0: PME# supported from D0 D3hot D3cold
[ 1.643534] pci 0000:00:1d.0: [8086:a0b0] type 01 class 0x060400
[ 1.643705] pci 0000:00:1d.0: PME# supported from D0 D3hot D3cold
[ 1.647054] pci 0000:00:1f.0: [8086:a082] type 00 class 0x060100
[ 1.648428] pci 0000:00:1f.3: [8086:a0c8] type 00 class 0x040380
[ 1.648485] pci 0000:00:1f.3: reg 0x10: [mem 0x6053190000-0x6053193fff 64bit]
[ 1.648555] pci 0000:00:1f.3: reg 0x20: [mem 0x6053000000-0x60530fffff 64bit]
[ 1.648697] pci 0000:00:1f.3: PME# supported from D3hot D3cold
[ 1.651018] pci 0000:00:1f.4: [8086:a0a3] type 00 class 0x0c0500
[ 1.651059] pci 0000:00:1f.4: reg 0x10: [mem 0x605319c000-0x605319c0ff 64bit]
[ 1.651099] pci 0000:00:1f.4: reg 0x20: [io 0xefa0-0xefbf]
[ 1.652327] pci 0000:00:1f.5: [8086:a0a4] type 00 class 0x0c8000
[ 1.652363] pci 0000:00:1f.5: reg 0x10: [mem 0xfe010000-0xfe010fff]
[ 1.652686] pci 0000:00:1f.6: [8086:15fb] type 00 class 0x020000
[ 1.652754] pci 0000:00:1f.6: reg 0x10: [mem 0xa2300000-0xa231ffff]
[ 1.653061] pci 0000:00:1f.6: PME# supported from D0 D3hot D3cold
[ 1.654411] pci 0000:00:07.0: PCI bridge to [bus 01-38]
[ 1.654425] pci 0000:00:07.0: bridge window [mem 0x8c000000-0xa20fffff]
[ 1.654442] pci 0000:00:07.0: bridge window [mem 0x6000000000-0x6021ffffff 64bit pref]
[ 1.654579] pci 0000:00:07.1: PCI bridge to [bus 39-70]
[ 1.654593] pci 0000:00:07.1: bridge window [mem 0x74000000-0x8a0fffff]
[ 1.654609] pci 0000:00:07.1: bridge window [mem 0x6030000000-0x6051ffffff 64bit pref]
[ 1.654827] pci 0000:71:00.0: [10ec:525a] type 00 class 0xff0000
[ 1.654906] pci 0000:71:00.0: reg 0x14: [mem 0xa2200000-0xa2200fff]
[ 1.655250] pci 0000:71:00.0: supports D1 D2
[ 1.655259] pci 0000:71:00.0: PME# supported from D1 D2 D3hot D3cold
[ 1.655965] pci 0000:00:1c.0: PCI bridge to [bus 71]
[ 1.655982] pci 0000:00:1c.0: bridge window [mem 0xa2200000-0xa22fffff]
[ 1.656405] pci 0000:72:00.0: [8086:f1a8] type 00 class 0x010802
[ 1.656448] pci 0000:72:00.0: reg 0x10: [mem 0xa2100000-0xa2103fff 64bit]
[ 1.657371] pci 0000:00:1d.0: PCI bridge to [bus 72]
[ 1.657387] pci 0000:00:1d.0: bridge window [mem 0xa2100000-0xa21fffff]
[ 1.657432] pci_bus 0000:00: on NUMA node 0
[ 1.944235] Low-power S0 idle used by default for system suspend
[ 2.018314] ACPI: EC: interrupt unblocked
[ 2.018329] ACPI: EC: event unblocked
[ 2.018371] ACPI: EC: EC_CMD/EC_SC=0x934, EC_DATA=0x930
[ 2.018384] ACPI: EC: GPE=0x6e
[ 2.018394] ACPI: \_SB_.PC00.LPCB.ECDV: Boot DSDT EC initialization complete
[ 2.018412] ACPI: \_SB_.PC00.LPCB.ECDV: EC: Used to handle transactions and events
[ 2.018813] iommu: Default domain type: Translated
[ 2.018823] iommu: DMA domain TLB invalidation policy: lazy mode
[ 2.019390] SCSI subsystem initialized
[ 2.019429] libata version 3.00 loaded.
[ 2.019429] ACPI: bus type USB registered
[ 2.019429] usbcore: registered new interface driver usbfs
[ 2.019429] usbcore: registered new interface driver hub
[ 2.019429] usbcore: registered new device driver usb
[ 2.020107] efivars: Registered efivars operations
[ 2.020107] Advanced Linux Sound Architecture Driver Initialized.
[ 2.021391] PCI: Using ACPI for IRQ routing
[ 2.039488] PCI: pci_cache_line_size set to 64 bytes
[ 2.039832] pci 0000:00:1f.5: can't claim BAR 0 [mem 0xfe010000-0xfe010fff]: no compatible bridge window
[ 2.040020] e820: reserve RAM buffer [mem 0x0009f000-0x0009ffff]
[ 2.040035] e820: reserve RAM buffer [mem 0x44be3000-0x47ffffff]
[ 2.040039] e820: reserve RAM buffer [mem 0x49dc2000-0x4bffffff]
[ 2.040043] e820: reserve RAM buffer [mem 0x28f800000-0x28fffffff]
[ 2.040176] pci 0000:00:02.0: vgaarb: setting as boot VGA device
[ 2.040176] pci 0000:00:02.0: vgaarb: bridge control possible
[ 2.040176] pci 0000:00:02.0: vgaarb: VGA device added: decodes=io+mem,owns=io+mem,locks=none
[ 2.040176] vgaarb: loaded
[ 2.040195] clocksource: Switched to clocksource tsc-early
[ 2.041152] pnp: PnP ACPI init
[ 2.042858] system 00:00: [io 0x0680-0x069f] has been reserved
[ 2.042889] system 00:00: [io 0x164e-0x164f] has been reserved
[ 2.044011] system 00:02: [io 0x1854-0x1857] has been reserved
[ 2.056141] pnp 00:05: disabling [mem 0xc0000000-0xcfffffff] because it overlaps 0000:00:02.0 BAR 9 [mem 0x00000000-0xdfffffff 64bit pref]
[ 2.056426] system 00:05: [mem 0xfedc0000-0xfedc7fff] has been reserved
[ 2.056459] system 00:05: [mem 0xfeda0000-0xfeda0fff] has been reserved
[ 2.056491] system 00:05: [mem 0xfeda1000-0xfeda1fff] has been reserved
[ 2.056528] system 00:05: [mem 0xfed20000-0xfed7ffff] could not be reserved
[ 2.056565] system 00:05: [mem 0xfed90000-0xfed93fff] could not be reserved
[ 2.056610] system 00:05: [mem 0xfed45000-0xfed8ffff] could not be reserved
[ 2.056643] system 00:05: [mem 0xfee00000-0xfeefffff] has been reserved
[ 2.064470] system 00:06: [io 0x1800-0x18fe] could not be reserved
[ 2.064503] system 00:06: [mem 0xfe000000-0xfe01ffff] has been reserved
[ 2.064534] system 00:06: [mem 0xfe04c000-0xfe04ffff] has been reserved
[ 2.064565] system 00:06: [mem 0xfe050000-0xfe0affff] has been reserved
[ 2.064603] system 00:06: [mem 0xfe0d0000-0xfe0fffff] has been reserved
[ 2.064635] system 00:06: [mem 0xfe200000-0xfe7fffff] has been reserved
[ 2.064667] system 00:06: [mem 0xff000000-0xffffffff] has been reserved
[ 2.064698] system 00:06: [mem 0xfd000000-0xfd68ffff] has been reserved
[ 2.064731] system 00:06: [mem 0xfd6b0000-0xfd6cffff] has been reserved
[ 2.064765] system 00:06: [mem 0xfd6f0000-0xfdffffff] has been reserved
[ 2.066821] system 00:07: [io 0x2000-0x20fe] has been reserved
[ 2.100736] pnp: PnP ACPI: found 9 devices
[ 2.115888] clocksource: acpi_pm: mask: 0xffffff max_cycles: 0xffffff, max_idle_ns: 2085701024 ns
[ 2.116366] NET: Registered PF_INET protocol family
[ 2.116590] IP idents hash table entries: 131072 (order: 8, 1048576 bytes, linear)
[ 2.118756] tcp_listen_portaddr_hash hash table entries: 4096 (order: 6, 294912 bytes, linear)
[ 2.118890] Table-perturb hash table entries: 65536 (order: 6, 262144 bytes, linear)
[ 2.118934] TCP established hash table entries: 65536 (order: 7, 524288 bytes, linear)
[ 2.119136] TCP bind hash table entries: 65536 (order: 11, 9437184 bytes, vmalloc hugepage)
[ 2.122610] TCP: Hash tables configured (established 65536 bind 65536)
[ 2.122856] UDP hash table entries: 4096 (order: 7, 655360 bytes, linear)
[ 2.123200] UDP-Lite hash table entries: 4096 (order: 7, 655360 bytes, linear)
[ 2.123651] NET: Registered PF_UNIX/PF_LOCAL protocol family
[ 2.124283] RPC: Registered named UNIX socket transport module.
[ 2.124316] RPC: Registered udp transport module.
[ 2.124335] RPC: Registered tcp transport module.
[ 2.124355] RPC: Registered tcp-with-tls transport module.
[ 2.124377] RPC: Registered tcp NFSv4.1 backchannel transport module.
[ 2.124409] pci_bus 0000:00: max bus depth: 1 pci_try_num: 2
[ 2.124458] pci 0000:00:02.0: BAR 9: assigned [mem 0x4020000000-0x40ffffffff 64bit pref]
[ 2.124492] pci 0000:00:02.0: BAR 7: assigned [mem 0x4010000000-0x4016ffffff 64bit]
[ 2.124522] pci 0000:00:07.0: BAR 13: assigned [io 0x4000-0x4fff]
[ 2.124547] pci 0000:00:07.1: BAR 13: assigned [io 0x5000-0x5fff]
[ 2.124570] pci 0000:00:15.0: BAR 0: assigned [mem 0x4017000000-0x4017000fff 64bit]
[ 2.124666] pci 0000:00:15.1: BAR 0: assigned [mem 0x4017001000-0x4017001fff 64bit]
[ 2.124755] pci 0000:00:1f.5: BAR 0: assigned [mem 0x70800000-0x70800fff]
[ 2.124808] pci 0000:00:07.0: PCI bridge to [bus 01-38]
[ 2.124830] pci 0000:00:07.0: bridge window [io 0x4000-0x4fff]
[ 2.124855] pci 0000:00:07.0: bridge window [mem 0x8c000000-0xa20fffff]
[ 2.124882] pci 0000:00:07.0: bridge window [mem 0x6000000000-0x6021ffffff 64bit pref]
[ 2.124915] pci 0000:00:07.1: PCI bridge to [bus 39-70]
[ 2.124936] pci 0000:00:07.1: bridge window [io 0x5000-0x5fff]
[ 2.124962] pci 0000:00:07.1: bridge window [mem 0x74000000-0x8a0fffff]
[ 2.124988] pci 0000:00:07.1: bridge window [mem 0x6030000000-0x6051ffffff 64bit pref]
[ 2.125020] pci 0000:00:1c.0: PCI bridge to [bus 71]
[ 2.125045] pci 0000:00:1c.0: bridge window [mem 0xa2200000-0xa22fffff]
[ 2.125079] pci 0000:00:1d.0: PCI bridge to [bus 72]
[ 2.125114] pci 0000:00:1d.0: bridge window [mem 0xa2100000-0xa21fffff]
[ 2.125158] pci_bus 0000:00: resource 4 [io 0x0000-0x0cf7 window]
[ 2.125181] pci_bus 0000:00: resource 5 [io 0x0d00-0xffff window]
[ 2.125204] pci_bus 0000:00: resource 6 [mem 0x000a0000-0x000bffff window]
[ 2.125228] pci_bus 0000:00: resource 7 [mem 0x70800000-0xbfffffff window]
[ 2.125253] pci_bus 0000:00: resource 8 [mem 0x4000000000-0x7fffffffff window]
[ 2.125279] pci_bus 0000:01: resource 0 [io 0x4000-0x4fff]
[ 2.125300] pci_bus 0000:01: resource 1 [mem 0x8c000000-0xa20fffff]
[ 2.125323] pci_bus 0000:01: resource 2 [mem 0x6000000000-0x6021ffffff 64bit pref]
[ 2.125349] pci_bus 0000:39: resource 0 [io 0x5000-0x5fff]
[ 2.125370] pci_bus 0000:39: resource 1 [mem 0x74000000-0x8a0fffff]
[ 2.125392] pci_bus 0000:39: resource 2 [mem 0x6030000000-0x6051ffffff 64bit pref]
[ 2.125420] pci_bus 0000:71: resource 1 [mem 0xa2200000-0xa22fffff]
[ 2.125443] pci_bus 0000:72: resource 1 [mem 0xa2100000-0xa21fffff]
[ 2.131609] PCI: CLS 0 bytes, default 64
[ 2.131712] DMAR: No ATSR found
[ 2.131739] DMAR: No SATC found
[ 2.131756] DMAR: IOMMU feature fl1gp_support inconsistent
[ 2.131759] DMAR: IOMMU feature pgsel_inv inconsistent
[ 2.131794] DMAR: IOMMU feature nwfs inconsistent
[ 2.131816] DMAR: IOMMU feature pds inconsistent
[ 2.131838] DMAR: IOMMU feature dit inconsistent
[ 2.131860] DMAR: IOMMU feature eafs inconsistent
[ 2.131880] DMAR: IOMMU feature prs inconsistent
[ 2.131901] DMAR: IOMMU feature nest inconsistent
[ 2.131922] DMAR: IOMMU feature mts inconsistent
[ 2.131943] DMAR: IOMMU feature sc_support inconsistent
[ 2.131963] DMAR: IOMMU feature dev_iotlb_support inconsistent
[ 2.131987] DMAR: dmar2: Using Queued invalidation
[ 2.132047] DMAR: dmar1: Using Queued invalidation
[ 2.132071] DMAR: dmar0: Using Queued invalidation
[ 2.132094] DMAR: dmar3: Using Queued invalidation
[ 2.132819] pci 0000:00:07.1: Adding to iommu group 0
[ 2.134598] pci 0000:00:07.0: Adding to iommu group 1
[ 2.136198] pci 0000:00:02.0: Adding to iommu group 2
[ 2.138120] pci 0000:00:00.0: Adding to iommu group 3
[ 2.138241] pci 0000:00:04.0: Adding to iommu group 4
[ 2.138410] pci 0000:00:0d.0: Adding to iommu group 5
[ 2.138502] pci 0000:00:0d.2: Adding to iommu group 5
[ 2.138647] pci 0000:00:12.0: Adding to iommu group 6
[ 2.138815] pci 0000:00:14.0: Adding to iommu group 7
[ 2.138911] pci 0000:00:14.2: Adding to iommu group 7
[ 2.139026] pci 0000:00:14.3: Adding to iommu group 8
[ 2.139190] pci 0000:00:15.0: Adding to iommu group 9
[ 2.139286] pci 0000:00:15.1: Adding to iommu group 9
[ 2.139456] pci 0000:00:16.0: Adding to iommu group 10
[ 2.139561] pci 0000:00:16.3: Adding to iommu group 10
[ 2.139707] pci 0000:00:1c.0: Adding to iommu group 11
[ 2.139827] pci 0000:00:1d.0: Adding to iommu group 12
[ 2.140069] pci 0000:00:1f.0: Adding to iommu group 13
[ 2.140176] pci 0000:00:1f.3: Adding to iommu group 13
[ 2.140279] pci 0000:00:1f.4: Adding to iommu group 13
[ 2.140382] pci 0000:00:1f.5: Adding to iommu group 13
[ 2.140485] pci 0000:00:1f.6: Adding to iommu group 13
[ 2.140615] pci 0000:71:00.0: Adding to iommu group 14
[ 2.140740] pci 0000:72:00.0: Adding to iommu group 15
[ 2.149398] DMAR: Intel(R) Virtualization Technology for Directed I/O
[ 2.149411] PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
[ 2.149422] software IO TLB: mapped [mem 0x000000003bd7e000-0x000000003fd7e000] (64MB)
[ 2.149663] RAPL PMU: API unit is 2^-32 Joules, 3 fixed counters, 655360 ms ovfl timer
[ 2.149678] RAPL PMU: hw unit of domain pp0-core 2^-14 Joules
[ 2.149688] RAPL PMU: hw unit of domain package 2^-14 Joules
[ 2.149697] RAPL PMU: hw unit of domain psys 2^-14 Joules
[ 2.149876] resource: resource sanity check: requesting [mem 0x00000000fedc0000-0x00000000fedcdfff], which spans more than pnp 00:05 [mem 0xfedc0000-0xfedc7fff]
[ 2.149915] caller __uncore_imc_init_box+0xe0/0x150 mapping multiple BARs
[ 2.150172] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x159647815e3, max_idle_ns: 440795269835 ns
[ 2.150213] clocksource: Switched to clocksource tsc
[ 2.155217] workingset: timestamp_bits=46 max_order=21 bucket_order=0
[ 2.158783] ntfs: driver 2.1.32 [Flags: R/O].
[ 2.159483] cryptomgr_test (83) used greatest stack depth: 14424 bytes left
[ 2.163877] NET: Registered PF_ALG protocol family
[ 2.163932] xor: measuring software checksum speed
[ 2.164227] prefetch64-sse : 35847 MB/sec
[ 2.164544] generic_sse : 31987 MB/sec
[ 2.164552] xor: using function: prefetch64-sse (35847 MB/sec)
[ 2.164620] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 250)
[ 2.164638] io scheduler mq-deadline registered
[ 2.164647] io scheduler kyber registered
[ 2.165721] cryptomgr_test (85) used greatest stack depth: 14168 bytes left
[ 2.167456] pcieport 0000:00:07.0: PME: Signaling with IRQ 124
[ 2.167603] pcieport 0000:00:07.0: pciehp: Slot #0 AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug+ Surprise+ Interlock- NoCompl+ IbPresDis- LLActRep+
[ 2.169507] pcieport 0000:00:07.1: PME: Signaling with IRQ 125
[ 2.169627] pcieport 0000:00:07.1: pciehp: Slot #0 AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug+ Surprise+ Interlock- NoCompl+ IbPresDis- LLActRep+
[ 2.171389] pcieport 0000:00:1c.0: PME: Signaling with IRQ 126
[ 2.172828] pcieport 0000:00:1d.0: PME: Signaling with IRQ 127
[ 2.173685] kworker/u16:1 (89) used greatest stack depth: 13600 bytes left
[ 2.173736] uvesafb: failed to execute /sbin/v86d
[ 2.173748] uvesafb: make sure that the v86d helper is installed and executable
[ 2.173765] uvesafb: Getting VBE info block failed (eax=0x4f00, err=-2)
[ 2.173780] uvesafb: vbe_init() failed with -22
[ 2.173792] uvesafb: probe of uvesafb.0 failed with error -22
[ 2.173872] efifb: probing for efifb
[ 2.173901] efifb: framebuffer at 0x4000000000, using 8100k, total 8100k
[ 2.173914] efifb: mode is 1920x1080x32, linelength=7680, pages=1
[ 2.173926] efifb: scrolling: redraw
[ 2.173933] efifb: Truecolor: size=8:8:8:8, shift=24:16:8:0
[ 2.177740] Console: switching to colour frame buffer device 240x67
[ 2.181153] fb0: EFI VGA frame buffer device
[ 2.182936] Monitor-Mwait will be used to enter C-1 state
[ 2.182947] Monitor-Mwait will be used to enter C-2 state
[ 2.182952] Monitor-Mwait will be used to enter C-3 state
[ 2.182958] ACPI: \_SB_.PR00: Found 3 idle states
[ 2.189341] ACPI: AC: AC Adapter [AC] (on-line)
[ 2.189648] input: Lid Switch as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0C0D:00/input/input0
[ 2.190703] ACPI: button: Lid Switch [LID0]
[ 2.190863] input: Power Button as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0C0C:00/input/input1
[ 2.191673] ACPI: button: Power Button [PBTN]
[ 2.191822] input: Sleep Button as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0C0E:00/input/input2
[ 2.191947] ACPI: button: Sleep Button [SBTN]
[ 2.210694] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
[ 2.213269] serial 0000:00:12.0: enabling device (0000 -> 0002)
[ 2.220612] serial 0000:00:16.3: enabling device (0000 -> 0003)
[ 2.228133] 0000:00:16.3: ttyS0 at I/O 0x3060 (irq = 19, base_baud = 115200) is a 16550A
[ 2.230356] hpet_acpi_add: no address or irqs in _CRS
[ 2.230642] Non-volatile memory driver v1.3
[ 2.230788] Linux agpgart interface v0.103
[ 2.230983] ACPI: bus type drm_connector registered
[ 2.233776] loop: module loaded
[ 2.247759] intel-lpss 0000:00:15.0: enabling device (0000 -> 0002)
[ 2.403314] ACPI: battery: Slot [BAT0] (battery present)
[ 2.452724] intel-lpss 0000:00:15.1: enabling device (0000 -> 0002)
[ 2.503709] nvme 0000:72:00.0: platform quirk: setting simple suspend
[ 2.504252] nvme nvme0: pci function 0000:72:00.0
[ 2.504625] tun: Universal TUN/TAP device driver, 1.6
[ 2.505535] VFIO - User Level meta-driver version: 0.3
[ 2.505812] xhci_hcd 0000:00:0d.0: xHCI Host Controller
[ 2.505893] xhci_hcd 0000:00:0d.0: new USB bus registered, assigned bus number 1
[ 2.507557] xhci_hcd 0000:00:0d.0: hcc params 0x20007fc1 hci version 0x120 quirks 0x0000000200009810
[ 2.508347] xhci_hcd 0000:00:0d.0: xHCI Host Controller
[ 2.508378] xhci_hcd 0000:00:0d.0: new USB bus registered, assigned bus number 2
[ 2.508412] xhci_hcd 0000:00:0d.0: Host supports USB 3.1 Enhanced SuperSpeed
[ 2.508649] usb usb1: New USB device found, idVendor=1d6b, idProduct=0002, bcdDevice= 6.05
[ 2.508688] usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[ 2.508725] usb usb1: Product: xHCI Host Controller
[ 2.508745] usb usb1: Manufacturer: Linux 6.5.0-rc7+zeh-xe+ xhci-hcd
[ 2.508770] usb usb1: SerialNumber: 0000:00:0d.0
[ 2.509487] hub 1-0:1.0: USB hub found
[ 2.510936] hub 1-0:1.0: 1 port detected
[ 2.513010] usb usb2: New USB device found, idVendor=1d6b, idProduct=0003, bcdDevice= 6.05
[ 2.515567] usb usb2: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[ 2.517194] usb usb2: Product: xHCI Host Controller
[ 2.517857] nvme nvme0: 8/0/0 default/read/poll queues
[ 2.518824] usb usb2: Manufacturer: Linux 6.5.0-rc7+zeh-xe+ xhci-hcd
[ 2.521733] usb usb2: SerialNumber: 0000:00:0d.0
[ 2.523363] hub 2-0:1.0: USB hub found
[ 2.524570] hub 2-0:1.0: 4 ports detected
[ 2.527633] nvme0n1: p1 p2 p3
[ 2.530151] xhci_hcd 0000:00:14.0: xHCI Host Controller
[ 2.531279] xhci_hcd 0000:00:14.0: new USB bus registered, assigned bus number 3
[ 2.534377] xhci_hcd 0000:00:14.0: hcc params 0x20007fc1 hci version 0x120 quirks 0x0000000200009810
[ 2.536340] xhci_hcd 0000:00:14.0: xHCI Host Controller
[ 2.537477] xhci_hcd 0000:00:14.0: new USB bus registered, assigned bus number 4
[ 2.538599] xhci_hcd 0000:00:14.0: Host supports USB 3.1 Enhanced SuperSpeed
[ 2.539833] usb usb3: New USB device found, idVendor=1d6b, idProduct=0002, bcdDevice= 6.05
[ 2.541034] usb usb3: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[ 2.542127] usb usb3: Product: xHCI Host Controller
[ 2.543219] usb usb3: Manufacturer: Linux 6.5.0-rc7+zeh-xe+ xhci-hcd
[ 2.544361] usb usb3: SerialNumber: 0000:00:14.0
[ 2.546196] hub 3-0:1.0: USB hub found
[ 2.547361] hub 3-0:1.0: 12 ports detected
[ 2.554924] usb usb4: New USB device found, idVendor=1d6b, idProduct=0003, bcdDevice= 6.05
[ 2.556034] usb usb4: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[ 2.557144] usb usb4: Product: xHCI Host Controller
[ 2.558275] usb usb4: Manufacturer: Linux 6.5.0-rc7+zeh-xe+ xhci-hcd
[ 2.559401] usb usb4: SerialNumber: 0000:00:14.0
[ 2.560943] hub 4-0:1.0: USB hub found
[ 2.562120] hub 4-0:1.0: 4 ports detected
[ 2.564690] usb: port power management may be unreliable
[ 2.566202] usbcore: registered new interface driver usb-storage
[ 2.567585] i8042: PNP: PS/2 Controller [PNP0303:PS2K,PNP0f13:PS2M] at 0x60,0x64 irq 1,12
[ 2.569281] i8042: Warning: Keylock active
[ 2.573016] serio: i8042 KBD port at 0x60,0x64 irq 1
[ 2.574471] serio: i8042 AUX port at 0x60,0x64 irq 12
[ 2.576416] mousedev: PS/2 mouse device common for all mice
[ 2.578256] rtc_cmos 00:01: RTC can wake from S4
[ 2.580832] rtc_cmos 00:01: registered as rtc0
[ 2.582185] rtc_cmos 00:01: setting system clock to 2023-09-18T15:45:22 UTC (1695051922)
[ 2.583496] rtc_cmos 00:01: alarms up to one month, y3k, 242 bytes nvram
[ 2.584777] input: AT Translated Set 2 keyboard as /devices/platform/i8042/serio0/input/input3
[ 2.586070] IR JVC protocol handler initialized
[ 2.587193] IR MCE Keyboard/mouse protocol handler initialized
[ 2.588332] IR NEC protocol handler initialized
[ 2.589451] IR RC5(x/sz) protocol handler initialized
[ 2.590520] IR RC6 protocol handler initialized
[ 2.591618] IR SANYO protocol handler initialized
[ 2.592612] IR Sharp protocol handler initialized
[ 2.593665] IR Sony protocol handler initialized
[ 2.594687] IR XMP protocol handler initialized
[ 2.596706] softdog: initialized. soft_noboot=0 soft_margin=60 sec soft_panic=0 (nowayout=0)
[ 2.597766] softdog: soft_reboot_cmd=<not set> soft_active_on_boot=0
[ 2.598803] device-mapper: uevent: version 1.0.3
[ 2.600138] device-mapper: ioctl: 4.48.0-ioctl (2023-03-01) initialised: dm-devel@redhat.com
[ 2.601227] intel_pstate: Intel P-state driver initializing
[ 2.603794] intel_pstate: HWP enabled
[ 2.604914] sdhci: Secure Digital Host Controller Interface driver
[ 2.605957] sdhci: Copyright(c) Pierre Ossman
[ 2.608617] pstore: Registered efi_pstore as persistent store backend
[ 2.609698] hid: raw HID events driver (C) Jiri Kosina
[ 2.611073] usbcore: registered new interface driver usbhid
[ 2.612155] usbhid: USB HID core driver
[ 2.613834] intel_pmc_core INT33A1:00: initialized
[ 2.615061] intel_rapl_msr: PL4 support detected.
[ 2.616312] intel_rapl_common: Found RAPL domain package
[ 2.617402] intel_rapl_common: Found RAPL domain core
[ 2.618472] intel_rapl_common: Found RAPL domain uncore
[ 2.619539] intel_rapl_common: Found RAPL domain psys
[ 2.622794] Initializing XFRM netlink socket
[ 2.623931] NET: Registered PF_INET6 protocol family
[ 2.626165] Segment Routing with IPv6
[ 2.627153] In-situ OAM (IOAM) with IPv6
[ 2.628117] mip6: Mobile IPv6
[ 2.629047] NET: Registered PF_PACKET protocol family
[ 2.629954] NET: Registered PF_KEY protocol family
[ 2.633666] microcode: Microcode Update Driver: v2.2.
[ 2.633673] IPI shorthand broadcast: enabled
[ 2.635667] AVX2 version of gcm_enc/dec engaged.
[ 2.636917] AES CTR mode by8 optimization enabled
[ 2.648530] sched_clock: Marking stable (2636001794, 11573225)->(2693971511, -46396492)
[ 2.650015] registered taskstats version 1
[ 2.666240] Btrfs loaded, zoned=no, fsverity=no
[ 2.667836] pstore: Using crash dump compression: deflate
[ 2.671470] cryptomgr_test (108) used greatest stack depth: 13096 bytes left
[ 2.691506] clk: Disabling unused clocks
[ 2.692428] ALSA device list:
[ 2.693359] No soundcards found.
[ 2.694476] md: Skipping autodetection of RAID arrays. (raid=autodetect will force)
[ 2.741773] EXT4-fs (nvme0n1p3): mounted filesystem d33cb5b8-6786-41bb-8fe9-3d143d334780 ro with ordered data mode. Quota mode: disabled.
[ 2.742830] VFS: Mounted root (ext4 filesystem) readonly on device 259:3.
[ 2.744470] devtmpfs: mounted
[ 2.746669] Freeing unused kernel image (initmem) memory: 2664K
[ 2.747652] Write protecting the kernel read-only data: 26624k
[ 2.749342] Freeing unused kernel image (rodata/data gap) memory: 1808K
[ 2.750320] Run /sbin/init as init process
[ 2.751262] with arguments:
[ 2.751264] /sbin/init
[ 2.751265] with environment:
[ 2.751266] HOME=/
[ 2.751266] TERM=linux
[ 2.751267] BOOT_IMAGE=/boot/vmlinuz-6.5.0-rc7+zeh-xe+
[ 2.867894] systemd[1]: systemd 249.11-0ubuntu3.9 running in system mode (+PAM +AUDIT +SELINUX +APPARMOR +IMA +SMACK +SECCOMP +GCRYPT +GNUTLS +OPENSSL +ACL +BLKID +CURL +ELFUTILS +FIDO2 +IDN2 -IDN +IPTC +KMOD +LIBCRYPTSETUP +LIBFDISK +PCRE2 -PWQUALITY -P11KIT -QRENCODE +BZIP2 +LZ4 +XZ +ZLIB +ZSTD -XKBCOMMON +UTMP +SYSVINIT default-hierarchy=unified)
[ 2.870796] systemd[1]: Detected architecture x86-64.
[ 2.877947] systemd[1]: Hostname set to <stark01>.
[ 2.905963] memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL, pid=1 'systemd'
[ 2.914429] snapd-env-gener (205) used greatest stack depth: 12920 bytes left
[ 2.938947] cat (211) used greatest stack depth: 12872 bytes left
[ 2.940805] friendly-recove (207) used greatest stack depth: 12472 bytes left
[ 2.968908] systemd-bless-b (212) used greatest stack depth: 12344 bytes left
[ 2.991295] block nvme0n1: the capability attribute has been deprecated.
[ 3.059706] usb 3-2: new high-speed USB device number 2 using xhci_hcd
[ 3.135032] systemd[1]: Queued start job for default target Multi-User System.
[ 3.152106] systemd[1]: Created slice Slice /system/modprobe.
[ 3.155680] systemd[1]: Created slice Slice /system/systemd-fsck.
[ 3.159123] systemd[1]: Created slice User and Session Slice.
[ 3.162120] systemd[1]: Started Forward Password Requests to Wall Directory Watch.
[ 3.165040] systemd[1]: Condition check resulted in Arbitrary Executable File Formats File System Automount Point being skipped.
[ 3.166266] systemd[1]: Reached target Remote File Systems.
[ 3.169111] systemd[1]: Reached target Slice Units.
[ 3.171950] systemd[1]: Reached target Mounting snaps.
[ 3.174756] systemd[1]: Reached target Mounted snaps.
[ 3.177616] systemd[1]: Reached target System Time Set.
[ 3.180495] systemd[1]: Reached target Local Verity Protected Volumes.
[ 3.184232] systemd[1]: Listening on Syslog Socket.
[ 3.187537] usb 3-2: New USB device found, idVendor=214b, idProduct=7250, bcdDevice= 1.00
[ 3.187593] systemd[1]: Listening on fsck to fsckd communication Socket.
[ 3.188871] usb 3-2: New USB device strings: Mfr=0, Product=1, SerialNumber=0
[ 3.191419] usb 3-2: Product: USB2.0 HUB
[ 3.194031] hub 3-2:1.0: USB hub found
[ 3.194236] hub 3-2:1.0: 4 ports detected
[ 3.197512] systemd[1]: Listening on initctl Compatibility Named Pipe.
[ 3.205739] systemd[1]: Condition check resulted in Journal Audit Socket being skipped.
[ 3.207224] systemd[1]: Listening on Journal Socket (/dev/log).
[ 3.210496] systemd[1]: Listening on Journal Socket.
[ 3.214701] systemd[1]: Listening on udev Control Socket.
[ 3.218002] systemd[1]: Listening on udev Kernel Socket.
[ 3.234112] systemd[1]: Mounting Huge Pages File System...
[ 3.239842] systemd[1]: Mounting POSIX Message Queue File System...
[ 3.245875] systemd[1]: Mounting Kernel Debug File System...
[ 3.252191] systemd[1]: Mounting Kernel Trace File System...
[ 3.255637] systemd[1]: systemd-journald.service: unit configures an IP firewall, but the local system does not support BPF/cgroup firewalling.
[ 3.256926] systemd[1]: (This warning is only shown for the first unit using IP firewalling.)
[ 3.261176] systemd[1]: Starting Journal Service...
[ 3.267793] systemd[1]: Starting Set the console keyboard layout...
[ 3.274832] systemd[1]: Starting Create List of Static Device Nodes...
[ 3.282492] systemd[1]: Starting Load Kernel Module chromeos_pstore...
[ 3.290405] systemd[1]: Starting Load Kernel Module configfs...
[ 3.299220] systemd[1]: Starting Load Kernel Module drm...
[ 3.308043] systemd[1]: Starting Load Kernel Module efi_pstore...
[ 3.316901] systemd[1]: Starting Load Kernel Module fuse...
[ 3.316963] usb 3-6: new high-speed USB device number 3 using xhci_hcd
[ 3.330680] systemd[1]: Starting Load Kernel Module pstore_blk...
[ 3.337347] fuse: init (API version 7.38)
[ 3.343220] systemd[1]: Starting Load Kernel Module pstore_zone...
[ 3.354567] systemd[1]: Starting Load Kernel Module ramoops...
[ 3.366204] systemd[1]: Starting File System Check on Root Device...
[ 3.378185] systemd[1]: Starting Load Kernel Modules...
[ 3.386441] systemd[1]: Starting Coldplug All udev Devices...
[ 3.398093] systemd[1]: Mounted Huge Pages File System.
[ 3.401900] systemd[1]: Mounted POSIX Message Queue File System.
[ 3.405790] systemd[1]: Mounted Kernel Debug File System.
[ 3.409696] systemd[1]: Mounted Kernel Trace File System.
[ 3.415127] systemd[1]: Finished Set the console keyboard layout.
[ 3.426034] systemd[1]: Finished Create List of Static Device Nodes.
[ 3.430844] systemd[1]: modprobe@chromeos_pstore.service: Deactivated successfully.
[ 3.434005] systemd[1]: Finished Load Kernel Module chromeos_pstore.
[ 3.438605] systemd[1]: modprobe@configfs.service: Deactivated successfully.
[ 3.441357] systemd[1]: Finished Load Kernel Module configfs.
[ 3.445731] systemd[1]: modprobe@drm.service: Deactivated successfully.
[ 3.448157] systemd[1]: Finished Load Kernel Module drm.
[ 3.452820] systemd[1]: modprobe@efi_pstore.service: Deactivated successfully.
[ 3.455250] systemd[1]: Finished Load Kernel Module efi_pstore.
[ 3.459786] systemd[1]: modprobe@fuse.service: Deactivated successfully.
[ 3.462334] systemd[1]: Finished Load Kernel Module fuse.
[ 3.465847] systemd[1]: Started Journal Service.
[ 3.486119] usb 3-6: New USB device found, idVendor=1bcf, idProduct=28cf, bcdDevice=15.31
[ 3.486126] usb 3-6: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 3.486128] usb 3-6: Product: Integrated_Webcam_FHD
[ 3.486129] usb 3-6: Manufacturer: CN0XH90J8LG0017OAHZFA00
[ 3.486130] usb 3-6: SerialNumber: 01.00.00
[ 3.519610] usb 3-2.2: new low-speed USB device number 4 using xhci_hcd
[ 3.587417] EXT4-fs (nvme0n1p3): re-mounted d33cb5b8-6786-41bb-8fe9-3d143d334780 r/w. Quota mode: disabled.
[ 3.614195] usb 3-2.2: New USB device found, idVendor=045e, idProduct=0797, bcdDevice= 2.00
[ 3.614200] usb 3-2.2: New USB device strings: Mfr=0, Product=2, SerialNumber=0
[ 3.614202] usb 3-2.2: Product: USB Optical Mouse
[ 3.624120] input: USB Optical Mouse as /devices/pci0000:00/0000:00:14.0/usb3/3-2/3-2.2/3-2.2:1.0/0003:045E:0797.0001/input/input5
[ 3.625290] hid-generic 0003:045E:0797.0001: input,hidraw0: USB HID v1.11 Mouse [USB Optical Mouse] on usb-0000:00:14.0-2.2/input0
[ 3.644857] systemd-journald[235]: Received client request to flush runtime journal.
[ 3.726622] usb 3-8: new high-speed USB device number 5 using xhci_hcd
[ 3.748846] loop0: detected capacity change from 0 to 8
[ 3.756655] loop1: detected capacity change from 0 to 126896
[ 3.764794] loop0: detected capacity change from 0 to 820832
[ 3.772983] loop0: detected capacity change from 0 to 187776
[ 3.785463] loop0: detected capacity change from 0 to 93928
[ 3.792184] loop1: detected capacity change from 0 to 96176
[ 3.806018] loop0: detected capacity change from 0 to 568
[ 3.858638] usb 3-8: New USB device found, idVendor=0a5c, idProduct=5843, bcdDevice= 1.02
[ 3.858645] usb 3-8: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 3.858648] usb 3-8: Product: 58200
[ 3.858651] usb 3-8: Manufacturer: Broadcom Corp
[ 3.858653] usb 3-8: SerialNumber: 0123456789ABCD
[ 3.921620] usb 3-2.3: new high-speed USB device number 6 using xhci_hcd
[ 4.174607] usb 3-10: new full-speed USB device number 7 using xhci_hcd
[ 4.243218] usb 3-2.3: New USB device found, idVendor=0b95, idProduct=1790, bcdDevice= 2.00
[ 4.243229] usb 3-2.3: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 4.243234] usb 3-2.3: Product: AX88179A
[ 4.243238] usb 3-2.3: Manufacturer: ASIX
[ 4.243242] usb 3-2.3: SerialNumber: 00645E92
[ 4.305885] usb 3-10: New USB device found, idVendor=8087, idProduct=0026, bcdDevice= 0.02
[ 4.305892] usb 3-10: New USB device strings: Mfr=0, Product=0, SerialNumber=0
[ 4.311657] wmi_bus wmi_bus-PNP0C14:02: WQBC data block query control method not found
[ 4.352963] input: DELL0A20:00 0488:101A Mouse as /devices/pci0000:00/0000:00:15.1/i2c_designware.1/i2c-1/i2c-DELL0A20:00/0018:0488:101A.0002/input/input6
[ 4.376646] input: DELL0A20:00 0488:101A Touchpad as /devices/pci0000:00/0000:00:15.1/i2c_designware.1/i2c-1/i2c-DELL0A20:00/0018:0488:101A.0002/input/input7
[ 4.426134] hid-generic 0018:0488:101A.0002: input,hidraw1: I2C HID v1.00 Mouse [DELL0A20:00 0488:101A] on i2c-DELL0A20:00
[ 4.427196] pps_core: LinuxPPS API ver. 1 registered
[ 4.427202] pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti <giometti@linux.it>
[ 4.449182] PTP clock support registered
[ 4.451040] mei_me 0000:00:16.0: enabling device (0000 -> 0002)
[ 4.466489] i801_smbus 0000:00:1f.4: enabling device (0000 -> 0003)
[ 4.469423] i801_smbus 0000:00:1f.4: SPD Write Disable is set
[ 4.469547] i801_smbus 0000:00:1f.4: SMBus using PCI interrupt
[ 4.677163] i2c i2c-2: 1/2 memory slots populated (from DMI)
[ 4.744275] Adding 8000508k swap on /dev/nvme0n1p2. Priority:-2 extents:1 across:8000508k SS
[ 4.768472] e1000e: Intel(R) PRO/1000 Network Driver
[ 4.768480] e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
[ 4.771735] e1000e 0000:00:1f.6: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
[ 4.807534] snd_hda_intel 0000:00:1f.3: DSP detected with PCI class/subclass/prog-if info 0x040380
[ 4.807816] snd_hda_intel 0000:00:1f.3: enabling device (0000 -> 0002)
[ 4.936453] e1000e 0000:00:1f.6 0000:00:1f.6 (uninitialized): registered PHC clock
[ 5.125047] e1000e 0000:00:1f.6 eth0: (PCI Express:2.5GT/s:Width x1) a0:29:19:08:9b:02
[ 5.125144] e1000e 0000:00:1f.6 eth0: Intel(R) PRO/1000 Network Connection
[ 5.125266] e1000e 0000:00:1f.6 eth0: MAC: 14, PHY: 12, PBA No: FFFFFF-0FF
[ 5.234784] e1000e 0000:00:1f.6 enp0s31f6: renamed from eth0
[ 5.292847] snd_hda_codec_realtek hdaudioC0D0: autoconfig for ALC3204: line_outs=1 (0x14/0x0/0x0/0x0/0x0) type:speaker
[ 5.292857] snd_hda_codec_realtek hdaudioC0D0: speaker_outs=0 (0x0/0x0/0x0/0x0/0x0)
[ 5.292862] snd_hda_codec_realtek hdaudioC0D0: hp_outs=1 (0x21/0x0/0x0/0x0/0x0)
[ 5.292866] snd_hda_codec_realtek hdaudioC0D0: mono: mono_out=0x0
[ 5.292870] snd_hda_codec_realtek hdaudioC0D0: inputs:
[ 5.292874] snd_hda_codec_realtek hdaudioC0D0: Headset Mic=0x19
[ 5.292878] snd_hda_codec_realtek hdaudioC0D0: Headphone Mic=0x1a
[ 5.292882] snd_hda_codec_realtek hdaudioC0D0: Internal Mic=0x12
[ 5.309251] usbcore: registered new interface driver cdc_ether
[ 5.355470] modprobe (394) used greatest stack depth: 11912 bytes left
[ 5.364993] Bluetooth: Core ver 2.22
[ 5.365185] NET: Registered PF_BLUETOOTH protocol family
[ 5.365189] Bluetooth: HCI device and connection manager initialized
[ 5.365291] Bluetooth: HCI socket layer initialized
[ 5.365304] Bluetooth: L2CAP socket layer initialized
[ 5.365363] Bluetooth: SCO socket layer initialized
[ 5.366743] snd_hda_codec_hdmi hdaudioC0D2: No i915 binding for Intel HDMI/DP codec
[ 5.372069] hdaudio hdaudioC0D2: Unable to configure, disabling
[ 5.381508] input: HDA Intel PCH Headphone Mic as /devices/pci0000:00/0000:00:1f.3/sound/card0/input9
[ 5.392328] usbcore: registered new interface driver btusb
[ 5.394242] Bluetooth: hci0: Bootloader revision 0.4 build 0 week 30 2018
[ 5.395748] Bluetooth: hci0: Device revision is 2
[ 5.395755] Bluetooth: hci0: Secure boot is enabled
[ 5.395759] Bluetooth: hci0: OTP lock is enabled
[ 5.395763] Bluetooth: hci0: API lock is enabled
[ 5.395767] Bluetooth: hci0: Debug lock is disabled
[ 5.395770] Bluetooth: hci0: Minimum firmware build 1 week 10 2014
[ 5.408964] Bluetooth: hci0: Found device firmware: intel/ibt-19-0-4.sfi
[ 5.409045] Bluetooth: hci0: Boot Address: 0x24800
[ 5.409049] Bluetooth: hci0: Firmware Version: 126-5.22
[ 5.466807] cdc_ncm 3-2.3:2.0: MAC-Address: f8:e4:3b:64:5e:92
[ 5.466813] cdc_ncm 3-2.3:2.0: setting rx_max = 16384
[ 5.478381] cdc_ncm 3-2.3:2.0: setting tx_max = 16384
[ 5.500869] cdc_ncm 3-2.3:2.0 eth0: register 'cdc_ncm' at usb-0000:00:14.0-2.3, CDC NCM (NO ZLP), f8:e4:3b:64:5e:92
[ 5.504158] usbcore: registered new interface driver cdc_ncm
[ 5.548588] cdc_ncm 3-2.3:2.0 enxf8e43b645e92: renamed from eth0
[ 5.880193] loop0: detected capacity change from 0 to 8
[ 7.490726] Bluetooth: hci0: Waiting for firmware download to complete
[ 7.491742] Bluetooth: hci0: Firmware loaded in 2033953 usecs
[ 7.492180] Bluetooth: hci0: Waiting for device to boot
[ 7.509858] Bluetooth: hci0: Device booted in 17373 usecs
[ 7.515812] Bluetooth: hci0: Found Intel DDC parameters: intel/ibt-19-0-4.ddc
[ 7.517851] Bluetooth: hci0: Applying Intel DDC parameters completed
[ 7.518849] Bluetooth: hci0: Firmware revision 0.4 build 126 week 5 2022
[ 7.588923] Bluetooth: MGMT ver 1.22
[ 30.462962] systemd-journald[235]: Failed to set ACL on /var/log/journal/a26005d73e4e4e9dad3f94ec0e385727/user-1000.journal, ignoring: Operation not supported
[ 38.599155] **********************************************************
[ 38.599156] ** NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE **
[ 38.599157] ** **
[ 38.599158] ** trace_printk() being used. Allocating extra memory. **
[ 38.599159] ** **
[ 38.599160] ** This means that this is a DEBUG kernel and it is **
[ 38.599161] ** unsafe for production use. **
[ 38.599163] ** **
[ 38.599164] ** If you see this message and you are not debugging **
[ 38.599165] ** the kernel, report this immediately to your vendor! **
[ 38.599166] ** **
[ 38.599167] ** NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE **
[ 38.599168] **********************************************************
[ 38.613741] Console: switching to colour dummy device 80x25
[ 38.615420] xe 0000:00:02.0: vgaarb: deactivate vga console
[ 38.617878] xe 0000:00:02.0: [drm:xe_pci_probe [xe]] XE_TIGERLAKE 9a49:0001 dgfx:0 gfx:Xe_LP (12.00) media:Xe_M (12.00) display:yes dma_m_s:39 tc:1
[ 38.617964] xe 0000:00:02.0: [drm:xe_pci_probe [xe]] Stepping = (G:B0, M:B0, D:D0, B:**)
[ 38.618029] xe 0000:00:02.0: [drm:intel_pch_type [xe]] Found Tiger Lake LP PCH
[ 38.618097] xe 0000:00:02.0: [drm:intel_power_domains_init [xe]] Allowed DC state mask 4000000a
[ 38.618222] xe 0000:00:02.0: [drm:intel_opregion_setup [xe]] graphic opregion physical addr: 0x63d05018
[ 38.618303] xe 0000:00:02.0: [drm:intel_opregion_setup [xe]] ACPI OpRegion version 2.1.0
[ 38.618364] xe 0000:00:02.0: [drm:intel_opregion_setup [xe]] Public ACPI methods supported
[ 38.618425] xe 0000:00:02.0: [drm:intel_opregion_setup [xe]] SWSCI Mailbox #2 present for opregion v2.x
[ 38.618482] xe 0000:00:02.0: [drm:intel_opregion_setup [xe]] SWSCI supported
[ 38.625925] xe 0000:00:02.0: [drm:intel_opregion_setup [xe]] SWSCI GBDA callbacks 00000cb3, SBCB callbacks 00300583
[ 38.625999] xe 0000:00:02.0: [drm:intel_opregion_setup [xe]] ASLE supported
[ 38.626052] xe 0000:00:02.0: [drm:intel_opregion_setup [xe]] ASLE extension supported
[ 38.626127] xe 0000:00:02.0: [drm:intel_opregion_setup [xe]] Found valid VBT in ACPI OpRegion (RVDA)
[ 38.626208] xe 0000:00:02.0: [drm:intel_dram_detect [xe]] DRAM channels: 1
[ 38.626277] xe 0000:00:02.0: [drm:xe_display_init_noirq [xe]] Watermark level 0 adjustment needed: no
[ 38.626541] xe 0000:00:02.0: [drm:icl_get_qgv_points.constprop.0 [xe]] QGV 0: DCLK=2134 tRP=15 tRDPRE=8 tRAS=35 tRCD=15 tRC=50
[ 38.626657] xe 0000:00:02.0: [drm:icl_get_qgv_points.constprop.0 [xe]] QGV 1: DCLK=2134 tRP=15 tRDPRE=8 tRAS=35 tRCD=15 tRC=50
[ 38.626744] xe 0000:00:02.0: [drm:icl_get_qgv_points.constprop.0 [xe]] QGV 2: DCLK=3201 tRP=22 tRDPRE=12 tRAS=52 tRCD=22 tRC=74
[ 38.626827] xe 0000:00:02.0: [drm:icl_get_qgv_points.constprop.0 [xe]] QGV 3: DCLK=2668 tRP=19 tRDPRE=10 tRAS=43 tRCD=19 tRC=62
[ 38.626879] xe 0000:00:02.0: [drm:tgl_get_bw_info.isra.0 [xe]] BW0 / QGV 0: num_planes=0 deratedbw=6224 peakbw: 17072
[ 38.626929] xe 0000:00:02.0: [drm:tgl_get_bw_info.isra.0 [xe]] BW0 / QGV 1: num_planes=0 deratedbw=6224 peakbw: 17072
[ 38.626979] xe 0000:00:02.0: [drm:tgl_get_bw_info.isra.0 [xe]] BW0 / QGV 2: num_planes=0 deratedbw=8380 peakbw: 25608
[ 38.627028] xe 0000:00:02.0: [drm:tgl_get_bw_info.isra.0 [xe]] BW0 / QGV 3: num_planes=0 deratedbw=7318 peakbw: 21344
[ 38.627077] xe 0000:00:02.0: [drm:tgl_get_bw_info.isra.0 [xe]] BW1 / QGV 0: num_planes=1 deratedbw=6876 peakbw: 17072
[ 38.627125] xe 0000:00:02.0: [drm:tgl_get_bw_info.isra.0 [xe]] BW1 / QGV 1: num_planes=1 deratedbw=6876 peakbw: 17072
[ 38.627172] xe 0000:00:02.0: [drm:tgl_get_bw_info.isra.0 [xe]] BW1 / QGV 2: num_planes=1 deratedbw=9704 peakbw: 25608
[ 38.627219] xe 0000:00:02.0: [drm:tgl_get_bw_info.isra.0 [xe]] BW1 / QGV 3: num_planes=1 deratedbw=8307 peakbw: 21344
[ 38.627266] xe 0000:00:02.0: [drm:tgl_get_bw_info.isra.0 [xe]] BW2 / QGV 0: num_planes=0 deratedbw=7257 peakbw: 17072
[ 38.627313] xe 0000:00:02.0: [drm:tgl_get_bw_info.isra.0 [xe]] BW2 / QGV 1: num_planes=0 deratedbw=7257 peakbw: 17072
[ 38.627363] xe 0000:00:02.0: [drm:tgl_get_bw_info.isra.0 [xe]] BW2 / QGV 2: num_planes=0 deratedbw=10536 peakbw: 25608
[ 38.627412] xe 0000:00:02.0: [drm:tgl_get_bw_info.isra.0 [xe]] BW2 / QGV 3: num_planes=0 deratedbw=8909 peakbw: 21344
[ 38.627458] xe 0000:00:02.0: [drm:tgl_get_bw_info.isra.0 [xe]] BW3 / QGV 0: num_planes=0 deratedbw=7464 peakbw: 17072
[ 38.627503] xe 0000:00:02.0: [drm:tgl_get_bw_info.isra.0 [xe]] BW3 / QGV 1: num_planes=0 deratedbw=7464 peakbw: 17072
[ 38.627548] xe 0000:00:02.0: [drm:tgl_get_bw_info.isra.0 [xe]] BW3 / QGV 2: num_planes=0 deratedbw=11007 peakbw: 25608
[ 38.627605] xe 0000:00:02.0: [drm:tgl_get_bw_info.isra.0 [xe]] BW3 / QGV 3: num_planes=0 deratedbw=9243 peakbw: 21344
[ 38.627650] xe 0000:00:02.0: [drm:tgl_get_bw_info.isra.0 [xe]] BW4 / QGV 0: num_planes=0 deratedbw=7571 peakbw: 17072
[ 38.627694] xe 0000:00:02.0: [drm:tgl_get_bw_info.isra.0 [xe]] BW4 / QGV 1: num_planes=0 deratedbw=7571 peakbw: 17072
[ 38.627738] xe 0000:00:02.0: [drm:tgl_get_bw_info.isra.0 [xe]] BW4 / QGV 2: num_planes=0 deratedbw=11259 peakbw: 25608
[ 38.627781] xe 0000:00:02.0: [drm:tgl_get_bw_info.isra.0 [xe]] BW4 / QGV 3: num_planes=0 deratedbw=9421 peakbw: 21344
[ 38.627823] xe 0000:00:02.0: [drm:tgl_get_bw_info.isra.0 [xe]] BW5 / QGV 0: num_planes=0 deratedbw=7626 peakbw: 17072
[ 38.627866] xe 0000:00:02.0: [drm:tgl_get_bw_info.isra.0 [xe]] BW5 / QGV 1: num_planes=0 deratedbw=7626 peakbw: 17072
[ 38.627908] xe 0000:00:02.0: [drm:tgl_get_bw_info.isra.0 [xe]] BW5 / QGV 2: num_planes=0 deratedbw=11390 peakbw: 25608
[ 38.627951] xe 0000:00:02.0: [drm:tgl_get_bw_info.isra.0 [xe]] BW5 / QGV 3: num_planes=0 deratedbw=9512 peakbw: 21344
[ 38.628383] xe 0000:00:02.0: [drm:intel_bios_init [xe]] Set default to SSC at 120000 kHz
[ 38.628430] xe 0000:00:02.0: [drm:intel_bios_init [xe]] VBT signature "$VBT TIGERLAKE ", BDB version 237
[ 38.628479] xe 0000:00:02.0: [drm:intel_bios_init [xe]] Found BDB block 1 (size 5, min size 7)
[ 38.628527] xe 0000:00:02.0: [drm:intel_bios_init [xe]] Found BDB block 2 (size 356, min size 5)
[ 38.628572] xe 0000:00:02.0: [drm:intel_bios_init [xe]] Found BDB block 9 (size 100, min size 100)
[ 38.628640] xe 0000:00:02.0: [drm:intel_bios_init [xe]] Found BDB block 12 (size 19, min size 19)
[ 38.628692] xe 0000:00:02.0: [drm:intel_bios_init [xe]] Found BDB block 27 (size 780, min size 812)
[ 38.628742] xe 0000:00:02.0: [drm:intel_bios_init [xe]] Found BDB block 40 (size 30, min size 34)
[ 38.628787] xe 0000:00:02.0: [drm:intel_bios_init [xe]] Generating LFP data table pointers
[ 38.628841] xe 0000:00:02.0: [drm:intel_bios_init [xe]] Found BDB block 41 (size 148, min size 148)
[ 38.628889] xe 0000:00:02.0: [drm:intel_bios_init [xe]] Found BDB block 42 (size 1364, min size 1366)
[ 38.628934] xe 0000:00:02.0: [drm:intel_bios_init [xe]] Found BDB block 43 (size 273, min size 305)
[ 38.628979] xe 0000:00:02.0: [drm:intel_bios_init [xe]] Found BDB block 44 (size 58, min size 78)
[ 38.629025] xe 0000:00:02.0: [drm:intel_bios_init [xe]] Found BDB block 52 (size 822, min size 822)
[ 38.629070] xe 0000:00:02.0: [drm:intel_bios_init [xe]] Found BDB block 56 (size 210, min size 210)
[ 38.629112] xe 0000:00:02.0: [drm:intel_bios_init [xe]] BDB_GENERAL_FEATURES int_tv_support 0 int_crt_support 0 lvds_use_ssc 0 lvds_ssc_freq 120000 display_clock_mode 1 fdi_rx_polarity_inverted 0
[ 38.629155] xe 0000:00:02.0: [drm:intel_bios_init [xe]] crt_ddc_bus_pin: 2
[ 38.629196] xe 0000:00:02.0: [drm:intel_bios_init [xe]] Found VBT child device with type 0x1806
[ 38.629241] xe 0000:00:02.0: [drm:intel_bios_init [xe]] Found VBT child device with type 0x60d2
[ 38.629289] xe 0000:00:02.0: [drm:intel_bios_init [xe]] Found VBT child device with type 0x60d6
[ 38.629338] xe 0000:00:02.0: [drm:intel_bios_init [xe]] Found VBT child device with type 0x60d6
[ 38.629387] xe 0000:00:02.0: [drm:intel_bios_init [xe]] Skipping SDVO device mapping
[ 38.629432] xe 0000:00:02.0: [drm:intel_bios_init [xe]] Port A VBT info: CRT:0 DVI:0 HDMI:0 DP:1 eDP:1 DSI:0 DP++:0 LSPCON:0 USB-Type-C:0 TBT:0 DSC:0
[ 38.629477] xe 0000:00:02.0: [drm:intel_bios_init [xe]] Port A VBT HDMI level shift: 0
[ 38.629521] xe 0000:00:02.0: [drm:intel_bios_init [xe]] Port B VBT info: CRT:0 DVI:1 HDMI:1 DP:0 eDP:0 DSI:0 DP++:0 LSPCON:0 USB-Type-C:0 TBT:0 DSC:0
[ 38.629565] xe 0000:00:02.0: [drm:intel_bios_init [xe]] Port B VBT HDMI level shift: 0
[ 38.629641] xe 0000:00:02.0: [drm:intel_bios_init [xe]] Port D VBT info: CRT:0 DVI:1 HDMI:1 DP:1 eDP:0 DSI:0 DP++:1 LSPCON:0 USB-Type-C:1 TBT:1 DSC:0
[ 38.629720] xe 0000:00:02.0: [drm:intel_bios_init [xe]] Port D VBT HDMI level shift: 0
[ 38.629779] xe 0000:00:02.0: [drm:intel_bios_init [xe]] Port E VBT info: CRT:0 DVI:1 HDMI:1 DP:1 eDP:0 DSI:0 DP++:1 LSPCON:0 USB-Type-C:1 TBT:1 DSC:0
[ 38.629827] xe 0000:00:02.0: [drm:intel_bios_init [xe]] Port E VBT HDMI level shift: 0
[ 38.629873] xe 0000:00:02.0: [drm:intel_power_domains_init [xe]] Allowed DC state mask 4000000a
[ 38.629951] xe 0000:00:02.0: [drm:gen9_set_dc_state.part.0 [xe]] Setting DC state from 00 to 00
[ 38.630047] xe 0000:00:02.0: [drm:check_phy_reg [xe]] Combo PHY A reg 001628a0 state mismatch: current 300331dc mask e0000000 expected a0000000
[ 38.630108] xe 0000:00:02.0: [drm:check_phy_reg [xe]] Combo PHY A reg 00162804 state mismatch: current 1c300004 mask 00300000 expected 00000000
[ 38.630163] xe 0000:00:02.0: [drm:icl_verify_procmon_ref_values [xe]] Combo PHY A Voltage/Process Info : 0.85V dot0 (low-voltage)
[ 38.630253] xe 0000:00:02.0: [drm:check_phy_reg [xe]] Combo PHY B reg 0006c8a0 state mismatch: current 3003501c mask e0000000 expected a0000000
[ 38.630308] xe 0000:00:02.0: [drm:check_phy_reg [xe]] Combo PHY B reg 0006c804 state mismatch: current 1c300004 mask 00300000 expected 00000000
[ 38.630361] xe 0000:00:02.0: [drm:icl_verify_procmon_ref_values [xe]] Combo PHY B Voltage/Process Info : 0.85V dot0 (low-voltage)
[ 38.630478] xe 0000:00:02.0: [drm:intel_power_well_enable [xe]] enabling PW_1
[ 38.630564] xe 0000:00:02.0: [drm:intel_cdclk_init_hw [xe]] Current CDCLK 172800 kHz, VCO 345600 kHz, ref 38400 kHz, bypass 19200 kHz, voltage level 0
[ 38.630645] xe 0000:00:02.0: [drm:gen9_dbuf_slices_update [xe]] Updating dbuf slices to 0x3
[ 38.630739] xe 0000:00:02.0: [drm:intel_power_well_enable [xe]] enabling always-on
[ 38.630801] xe 0000:00:02.0: [drm:intel_power_well_enable [xe]] enabling DC_off
[ 38.630860] xe 0000:00:02.0: [drm:gen9_set_dc_state.part.0 [xe]] Setting DC state from 00 to 00
[ 38.630934] xe 0000:00:02.0: [drm:icl_verify_procmon_ref_values [xe]] Combo PHY A Voltage/Process Info : 0.85V dot0 (low-voltage)
[ 38.631001] xe 0000:00:02.0: [drm:icl_combo_phys_init [xe]] Combo PHY A already enabled, won't reprogram it.
[ 38.631064] xe 0000:00:02.0: [drm:icl_verify_procmon_ref_values [xe]] Combo PHY B Voltage/Process Info : 0.85V dot0 (low-voltage)
[ 38.631124] xe 0000:00:02.0: [drm:icl_combo_phys_init [xe]] Combo PHY B already enabled, won't reprogram it.
[ 38.631175] xe 0000:00:02.0: [drm:intel_power_well_enable [xe]] enabling PW_2
[ 38.631240] xe 0000:00:02.0: [drm:intel_power_well_enable [xe]] enabling PW_3
[ 38.631305] xe 0000:00:02.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=io+mem
[ 38.631343] xe 0000:00:02.0: [drm:intel_power_well_enable [xe]] enabling PW_4
[ 38.631401] xe 0000:00:02.0: [drm:intel_power_well_enable [xe]] enabling PW_5
[ 38.631721] xe 0000:00:02.0: [drm:intel_power_well_sync_hw [xe]] TC cold unblock succeeded
[ 38.631800] xe 0000:00:02.0: [drm:intel_dmc_init [xe]] Loading i915/tgl_dmc_ver2_12.bin
[ 38.633112] xe 0000:00:02.0: [drm:intel_fbc_init [xe]] Sanitized enable_fbc value: 1
[ 38.633987] xe 0000:00:02.0: [drm] Finished loading DMC firmware i915/tgl_dmc_ver2_12.bin (v2.12)
[ 38.634291] GT topology dss mask (geometry): 00000000,00000000,0000001f
[ 38.634297] GT topology dss mask (compute): 00000000,00000000,00000000
[ 38.634298] GT topology EU mask per DSS: 0000ffff
[ 38.635113] xe 0000:00:02.0: [drm:xe_ttm_stolen_mgr_init [xe]] Initialized stolen memory support with 67108864 bytes
[ 38.635233] xe 0000:00:02.0: [drm:skl_wm_init [xe]] SAGV supported: yes, original SAGV block time: 11 us
[ 38.635358] xe 0000:00:02.0: [drm:intel_print_wm_latency [xe]] Gen9 Plane WM0 latency 3 (3.0 usec)
[ 38.635421] xe 0000:00:02.0: [drm:intel_print_wm_latency [xe]] Gen9 Plane WM1 latency 54 (54.0 usec)
[ 38.635478] xe 0000:00:02.0: [drm:intel_print_wm_latency [xe]] Gen9 Plane WM2 latency 54 (54.0 usec)
[ 38.635533] xe 0000:00:02.0: [drm:intel_print_wm_latency [xe]] Gen9 Plane WM3 latency 54 (54.0 usec)
[ 38.635600] xe 0000:00:02.0: [drm:intel_print_wm_latency [xe]] Gen9 Plane WM4 latency 54 (54.0 usec)
[ 38.635681] xe 0000:00:02.0: [drm:intel_print_wm_latency [xe]] Gen9 Plane WM5 latency 73 (73.0 usec)
[ 38.635732] xe 0000:00:02.0: [drm:intel_print_wm_latency [xe]] Gen9 Plane WM6 latency 110 (110.0 usec)
[ 38.635782] xe 0000:00:02.0: [drm:intel_print_wm_latency [xe]] Gen9 Plane WM7 latency 115 (115.0 usec)
[ 38.637411] xe 0000:00:02.0: [drm:intel_display_driver_probe_nogem [xe]] 4 display pipes available.
[ 38.641159] xe 0000:00:02.0: [drm:intel_cdclk_dump_config [xe]] Current CDCLK 172800 kHz, VCO 345600 kHz, ref 38400 kHz, bypass 19200 kHz, voltage level 0
[ 38.641392] xe 0000:00:02.0: [drm:i915_hdcp_component_bind [xe]] I915 HDCP comp bind
[ 38.641505] mei_hdcp 0000:00:16.0-b638ab7e-94e2-4ea2-a552-d1c54b627f04: bound 0000:00:02.0 (ops i915_hdcp_ops [xe])
[ 38.641593] xe 0000:00:02.0: [drm:intel_update_max_cdclk [xe]] Max CD clock rate: 652800 kHz
[ 38.641689] xe 0000:00:02.0: [drm:intel_display_driver_probe_nogem [xe]] Max dotclock rate: 1305600 kHz
[ 38.641797] xe 0000:00:02.0: [drm:intel_dp_aux_ch [xe]] [ENCODER:307:DDI A/PHY A] Using AUX CH A (VBT)
[ 38.641921] xe 0000:00:02.0: [drm:intel_dp_init_connector [xe]] Adding eDP connector on [ENCODER:307:DDI A/PHY A]
[ 38.646594] xe 0000:00:02.0: [drm:intel_opregion_get_panel_type [xe]] Ignoring OpRegion panel type (0)
[ 38.646763] xe 0000:00:02.0: [drm:intel_bios_init_panel [xe]] Panel type (VBT): 14
[ 38.646917] xe 0000:00:02.0: [drm:intel_bios_init_panel [xe]] Selected panel type (VBT): 14
[ 38.647052] xe 0000:00:02.0: [drm:intel_bios_init_panel [xe]] DRRS supported mode is seamless
[ 38.647198] xe 0000:00:02.0: [drm:intel_bios_init_panel [xe]] Found panel mode in BIOS VBT legacy lfp table: "1920x1080": 60 148500 1920 2008 2053 2200 1080 1083 1089 1125 0x8 0xa
[ 38.647319] xe 0000:00:02.0: [drm:intel_bios_init_panel [xe]] VBT initial LVDS value 300
[ 38.647402] xe 0000:00:02.0: [drm:dump_pnp_id [xe]] Panel PNPID mfg: MS_ (0x7f36), prod: 3, serial: 15, week: 0, year: 2002
[ 38.647486] xe 0000:00:02.0: [drm:intel_bios_init_panel [xe]] Panel name: LFP_PanelName
[ 38.647562] xe 0000:00:02.0: [drm:intel_bios_init_panel [xe]] Seamless DRRS min refresh rate: 0 Hz
[ 38.647667] xe 0000:00:02.0: [drm:intel_bios_init_panel [xe]] VBT backlight PWM modulation frequency 200 Hz, active high, min brightness 15, level 255, controller 0
[ 38.647847] xe 0000:00:02.0: [drm:intel_pps_init [xe]] [ENCODER:307:DDI A/PHY A] initial power sequencer: PPS 0
[ 38.648002] xe 0000:00:02.0: [drm:pps_init_delays [xe]] bios t1_t3 1 t8 1 t9 1 t10 500 t11_t12 6000
[ 38.648095] xe 0000:00:02.0: [drm:pps_init_delays [xe]] vbt t1_t3 2000 t8 1500 t9 2000 t10 500 t11_t12 6000
[ 38.648178] xe 0000:00:02.0: [drm:pps_init_delays [xe]] spec t1_t3 2100 t8 500 t9 500 t10 5000 t11_t12 6100
[ 38.648257] xe 0000:00:02.0: [drm:pps_init_delays [xe]] panel power up delay 200, power down delay 50, power cycle delay 600
[ 38.648336] xe 0000:00:02.0: [drm:pps_init_delays [xe]] backlight on delay 150, off delay 200
[ 38.648499] xe 0000:00:02.0: [drm:pps_init_registers [xe]] panel power sequencer register settings: PP_ON 0x7d00001, PP_OFF 0x1f40001, PP_DIV 0x60
[ 38.648699] xe 0000:00:02.0: [drm:intel_power_well_enable [xe]] enabling AUX_A
[ 38.648869] xe 0000:00:02.0: [drm:intel_pps_vdd_on_unlocked [xe]] [ENCODER:307:DDI A/PHY A] PPS 0 turning VDD on
[ 38.648988] xe 0000:00:02.0: [drm:intel_pps_vdd_on_unlocked [xe]] [ENCODER:307:DDI A/PHY A] PPS 0 PP_STATUS: 0x80000008 PP_CONTROL: 0x0000006f
[ 38.649821] xe 0000:00:02.0: [drm:drm_dp_read_dpcd_caps [drm_display_helper]] AUX A/DDI A/PHY A: DPCD: 11 0a 82 41 00 00 01 00 02 02 06 00 00 0b 00
[ 38.650408] xe 0000:00:02.0: [drm:drm_dp_read_desc [drm_display_helper]] AUX A/DDI A/PHY A: DP sink: OUI 00-00-00 dev-ID HW-rev 0.0 SW-rev 0.0 quirks 0x0000
[ 38.650803] xe 0000:00:02.0: [drm:intel_dp_init_connector [xe]] eDP DPCD: 01 12 07
[ 38.656798] xe 0000:00:02.0: [drm:update_display_info.part.0] [CONNECTOR:308:eDP-1] Assigning EDID-1.4 digital sink color depth as 6 bpc.
[ 38.656810] xe 0000:00:02.0: [drm:update_display_info.part.0] [CONNECTOR:308:eDP-1] ELD monitor
[ 38.656814] xe 0000:00:02.0: [drm:update_display_info.part.0] [CONNECTOR:308:eDP-1] ELD size 20, SAD count 0
[ 38.656863] xe 0000:00:02.0: [drm:intel_panel_add_edid_fixed_modes [xe]] [CONNECTOR:308:eDP-1] using preferred EDID fixed mode: "1920x1080": 60 146500 1920 1968 2000 2180 1080 1083 1089 1120 0x48 0x9
[ 38.656946] xe 0000:00:02.0: [drm:intel_panel_add_edid_fixed_modes [xe]] [CONNECTOR:308:eDP-1] using alternate EDID fixed mode: "1920x1080": 48 117200 1920 1968 2000 2180 1080 1083 1089 1120 0x40 0x9
[ 38.657013] xe 0000:00:02.0: [drm:intel_dp_wait_source_oui [xe]] [CONNECTOR:308:eDP-1] Performing OUI wait (30 ms)
[ 38.657489] xe 0000:00:02.0: [drm:intel_panel_init [xe]] [CONNECTOR:308:eDP-1] DRRS type: none
[ 38.657616] xe 0000:00:02.0: [drm:cnp_setup_backlight [xe]] [CONNECTOR:308:eDP-1] Using native PCH PWM for backlight control (controller=0)
[ 38.657744] xe 0000:00:02.0: [drm:intel_backlight_setup [xe]] [CONNECTOR:308:eDP-1] backlight initialized, enabled, brightness 96000/96000
[ 38.657900] xe 0000:00:02.0: [drm:pps_init_delays [xe]] bios t1_t3 1 t8 1 t9 1 t10 500 t11_t12 6000
[ 38.657975] xe 0000:00:02.0: [drm:pps_init_delays [xe]] vbt t1_t3 2000 t8 1500 t9 2000 t10 500 t11_t12 6000
[ 38.658040] xe 0000:00:02.0: [drm:pps_init_delays [xe]] spec t1_t3 2100 t8 500 t9 500 t10 5000 t11_t12 6100
[ 38.658102] xe 0000:00:02.0: [drm:pps_init_delays [xe]] panel power up delay 200, power down delay 50, power cycle delay 600
[ 38.658160] xe 0000:00:02.0: [drm:pps_init_delays [xe]] backlight on delay 150, off delay 200
[ 38.658270] xe 0000:00:02.0: [drm:pps_init_registers [xe]] panel power sequencer register settings: PP_ON 0x7d00001, PP_OFF 0x1f40001, PP_DIV 0x60
[ 38.658528] xe 0000:00:02.0: [drm:intel_hdmi_init_connector [xe]] Adding HDMI connector on [ENCODER:316:DDI B/PHY B]
[ 38.658615] xe 0000:00:02.0: [drm:intel_hdmi_init_connector [xe]] [ENCODER:316:DDI B/PHY B] Using DDC pin 0x2 (VBT)
[ 38.658875] xe 0000:00:02.0: [drm:intel_dp_aux_ch [xe]] [ENCODER:325:DDI TC1/PHY TC1] Using AUX CH D (VBT)
[ 38.658950] xe 0000:00:02.0: [drm:intel_ddi_init [xe]] VBT says port D is non-legacy TC and has HDMI (with DP: yes), assume it's non-legacy
[ 38.659073] xe 0000:00:02.0: [drm:intel_power_well_enable [xe]] enabling TC_cold_off
[ 38.659171] xe 0000:00:02.0: [drm:intel_power_well_enable [xe]] TC cold block succeeded
[ 38.659336] xe 0000:00:02.0: [drm:tc_phy_get_current_mode [xe]] Port D/TC#1: PHY mode: tbt-alt (ready: no, owned: no, HPD: disconnected)
[ 38.659448] xe 0000:00:02.0: [drm:intel_dp_init_connector [xe]] Adding DP connector on [ENCODER:325:DDI TC1/PHY TC1]
[ 38.659786] xe 0000:00:02.0: [drm:intel_hdmi_init_connector [xe]] Adding HDMI connector on [ENCODER:325:DDI TC1/PHY TC1]
[ 38.659859] xe 0000:00:02.0: [drm:intel_hdmi_init_connector [xe]] [ENCODER:325:DDI TC1/PHY TC1] Using DDC pin 0x9 (platform default)
[ 38.659985] xe 0000:00:02.0: [drm:intel_dp_aux_ch [xe]] [ENCODER:338:DDI TC2/PHY TC2] Using AUX CH E (VBT)
[ 38.660056] xe 0000:00:02.0: [drm:intel_ddi_init [xe]] VBT says port E is non-legacy TC and has HDMI (with DP: yes), assume it's non-legacy
[ 38.660159] xe 0000:00:02.0: [drm:tc_phy_get_current_mode [xe]] Port E/TC#2: PHY mode: tbt-alt (ready: no, owned: no, HPD: disconnected)
[ 38.660254] xe 0000:00:02.0: [drm:intel_dp_init_connector [xe]] Adding DP connector on [ENCODER:338:DDI TC2/PHY TC2]
[ 38.660467] xe 0000:00:02.0: [drm:intel_hdmi_init_connector [xe]] Adding HDMI connector on [ENCODER:338:DDI TC2/PHY TC2]
[ 38.660534] xe 0000:00:02.0: [drm:intel_hdmi_init_connector [xe]] [ENCODER:338:DDI TC2/PHY TC2] Using DDC pin 0xa (platform default)
[ 38.660851] xe 0000:00:02.0: [drm:intel_modeset_setup_hw_state [xe]] [CRTC:98:pipe A] hw state readout: enabled
[ 38.660937] xe 0000:00:02.0: [drm:intel_modeset_setup_hw_state [xe]] [CRTC:167:pipe B] hw state readout: disabled
[ 38.661013] xe 0000:00:02.0: [drm:intel_modeset_setup_hw_state [xe]] [CRTC:236:pipe C] hw state readout: disabled
[ 38.661088] xe 0000:00:02.0: [drm:intel_modeset_setup_hw_state [xe]] [CRTC:305:pipe D] hw state readout: disabled
[ 38.661151] xe 0000:00:02.0: [drm:intel_modeset_setup_hw_state [xe]] [PLANE:31:plane 1A] hw state readout: enabled, pipe A
[ 38.661210] xe 0000:00:02.0: [drm:intel_modeset_setup_hw_state [xe]] [PLANE:40:plane 2A] hw state readout: disabled, pipe A
[ 38.661268] xe 0000:00:02.0: [drm:intel_modeset_setup_hw_state [xe]] [PLANE:49:plane 3A] hw state readout: disabled, pipe A
[ 38.661324] xe 0000:00:02.0: [drm:intel_modeset_setup_hw_state [xe]] [PLANE:58:plane 4A] hw state readout: disabled, pipe A
[ 38.661377] xe 0000:00:02.0: [drm:intel_modeset_setup_hw_state [xe]] [PLANE:67:plane 5A] hw state readout: disabled, pipe A
[ 38.661430] xe 0000:00:02.0: [drm:intel_modeset_setup_hw_state [xe]] [PLANE:76:plane 6A] hw state readout: disabled, pipe A
[ 38.661483] xe 0000:00:02.0: [drm:intel_modeset_setup_hw_state [xe]] [PLANE:85:plane 7A] hw state readout: disabled, pipe A
[ 38.661534] xe 0000:00:02.0: [drm:intel_modeset_setup_hw_state [xe]] [PLANE:94:cursor A] hw state readout: disabled, pipe A
[ 38.661594] xe 0000:00:02.0: [drm:intel_modeset_setup_hw_state [xe]] [PLANE:100:plane 1B] hw state readout: disabled, pipe B
[ 38.661652] xe 0000:00:02.0: [drm:intel_modeset_setup_hw_state [xe]] [PLANE:109:plane 2B] hw state readout: disabled, pipe B
[ 38.661739] xe 0000:00:02.0: [drm:intel_modeset_setup_hw_state [xe]] [PLANE:118:plane 3B] hw state readout: disabled, pipe B
[ 38.661798] xe 0000:00:02.0: [drm:intel_modeset_setup_hw_state [xe]] [PLANE:127:plane 4B] hw state readout: disabled, pipe B
[ 38.661846] xe 0000:00:02.0: [drm:intel_modeset_setup_hw_state [xe]] [PLANE:136:plane 5B] hw state readout: disabled, pipe B
[ 38.661894] xe 0000:00:02.0: [drm:intel_modeset_setup_hw_state [xe]] [PLANE:145:plane 6B] hw state readout: disabled, pipe B
[ 38.661942] xe 0000:00:02.0: [drm:intel_modeset_setup_hw_state [xe]] [PLANE:154:plane 7B] hw state readout: disabled, pipe B
[ 38.661991] xe 0000:00:02.0: [drm:intel_modeset_setup_hw_state [xe]] [PLANE:163:cursor B] hw state readout: disabled, pipe B
[ 38.662038] xe 0000:00:02.0: [drm:intel_modeset_setup_hw_state [xe]] [PLANE:169:plane 1C] hw state readout: disabled, pipe C
[ 38.662091] xe 0000:00:02.0: [drm:intel_modeset_setup_hw_state [xe]] [PLANE:178:plane 2C] hw state readout: disabled, pipe C
[ 38.662147] xe 0000:00:02.0: [drm:intel_modeset_setup_hw_state [xe]] [PLANE:187:plane 3C] hw state readout: disabled, pipe C
[ 38.662199] xe 0000:00:02.0: [drm:intel_modeset_setup_hw_state [xe]] [PLANE:196:plane 4C] hw state readout: disabled, pipe C
[ 38.662251] xe 0000:00:02.0: [drm:intel_modeset_setup_hw_state [xe]] [PLANE:205:plane 5C] hw state readout: disabled, pipe C
[ 38.662301] xe 0000:00:02.0: [drm:intel_modeset_setup_hw_state [xe]] [PLANE:214:plane 6C] hw state readout: disabled, pipe C
[ 38.662362] xe 0000:00:02.0: [drm:intel_modeset_setup_hw_state [xe]] [PLANE:223:plane 7C] hw state readout: disabled, pipe C
[ 38.662421] xe 0000:00:02.0: [drm:intel_modeset_setup_hw_state [xe]] [PLANE:232:cursor C] hw state readout: disabled, pipe C
[ 38.662479] xe 0000:00:02.0: [drm:intel_modeset_setup_hw_state [xe]] [PLANE:238:plane 1D] hw state readout: disabled, pipe D
[ 38.662537] xe 0000:00:02.0: [drm:intel_modeset_setup_hw_state [xe]] [PLANE:247:plane 2D] hw state readout: disabled, pipe D
[ 38.662606] xe 0000:00:02.0: [drm:intel_modeset_setup_hw_state [xe]] [PLANE:256:plane 3D] hw state readout: disabled, pipe D
[ 38.662665] xe 0000:00:02.0: [drm:intel_modeset_setup_hw_state [xe]] [PLANE:265:plane 4D] hw state readout: disabled, pipe D
[ 38.662728] xe 0000:00:02.0: [drm:intel_modeset_setup_hw_state [xe]] [PLANE:274:plane 5D] hw state readout: disabled, pipe D
[ 38.662826] xe 0000:00:02.0: [drm:intel_modeset_setup_hw_state [xe]] [PLANE:283:plane 6D] hw state readout: disabled, pipe D
[ 38.662902] xe 0000:00:02.0: [drm:intel_modeset_setup_hw_state [xe]] [PLANE:292:plane 7D] hw state readout: disabled, pipe D
[ 38.662961] xe 0000:00:02.0: [drm:intel_modeset_setup_hw_state [xe]] [PLANE:301:cursor D] hw state readout: disabled, pipe D
[ 38.663035] xe 0000:00:02.0: [drm:intel_ddi_get_config [xe]] [ENCODER:307:DDI A/PHY A] Fec status: 0
[ 38.663133] xe 0000:00:02.0: [drm:intel_modeset_setup_hw_state [xe]] [ENCODER:307:DDI A/PHY A] hw state readout: enabled, pipe A
[ 38.663213] xe 0000:00:02.0: [drm:intel_modeset_setup_hw_state [xe]] [ENCODER:316:DDI B/PHY B] hw state readout: disabled, pipe A
[ 38.663295] xe 0000:00:02.0: [drm:intel_tc_port_sanitize_mode [xe]] Port D/TC#1: sanitize mode (disconnected)
[ 38.663367] xe 0000:00:02.0: [drm:intel_modeset_setup_hw_state [xe]] [ENCODER:325:DDI TC1/PHY TC1] hw state readout: disabled, pipe A
[ 38.663430] xe 0000:00:02.0: [drm:intel_modeset_setup_hw_state [xe]] [ENCODER:327:DP-MST A] hw state readout: disabled, pipe A
[ 38.663487] xe 0000:00:02.0: [drm:intel_modeset_setup_hw_state [xe]] [ENCODER:328:DP-MST B] hw state readout: disabled, pipe B
[ 38.663543] xe 0000:00:02.0: [drm:intel_modeset_setup_hw_state [xe]] [ENCODER:329:DP-MST C] hw state readout: disabled, pipe C
[ 38.663616] xe 0000:00:02.0: [drm:intel_modeset_setup_hw_state [xe]] [ENCODER:330:DP-MST D] hw state readout: disabled, pipe D
[ 38.663674] xe 0000:00:02.0: [drm:intel_power_well_disable [xe]] disabling TC_cold_off
[ 38.663781] xe 0000:00:02.0: [drm:__intel_display_power_put_domain [xe]] TC cold unblock succeeded
[ 38.663879] xe 0000:00:02.0: [drm:intel_tc_port_sanitize_mode [xe]] Port E/TC#2: sanitize mode (disconnected)
[ 38.663952] xe 0000:00:02.0: [drm:intel_modeset_setup_hw_state [xe]] [ENCODER:338:DDI TC2/PHY TC2] hw state readout: disabled, pipe A
[ 38.664023] xe 0000:00:02.0: [drm:intel_modeset_setup_hw_state [xe]] [ENCODER:340:DP-MST A] hw state readout: disabled, pipe A
[ 38.664089] xe 0000:00:02.0: [drm:intel_modeset_setup_hw_state [xe]] [ENCODER:341:DP-MST B] hw state readout: disabled, pipe B
[ 38.664150] xe 0000:00:02.0: [drm:intel_modeset_setup_hw_state [xe]] [ENCODER:342:DP-MST C] hw state readout: disabled, pipe C
[ 38.664211] xe 0000:00:02.0: [drm:intel_modeset_setup_hw_state [xe]] [ENCODER:343:DP-MST D] hw state readout: disabled, pipe D
[ 38.664273] xe 0000:00:02.0: [drm:intel_reference_shared_dpll_crtc [xe]] [CRTC:98:pipe A] reserving DPLL 0
[ 38.664346] xe 0000:00:02.0: [drm:intel_dpll_readout_hw_state [xe]] DPLL 0 hw state readout: pipe_mask 0x1, on 1
[ 38.664418] xe 0000:00:02.0: [drm:intel_dpll_readout_hw_state [xe]] DPLL 1 hw state readout: pipe_mask 0x0, on 0
[ 38.664482] xe 0000:00:02.0: [drm:intel_dpll_readout_hw_state [xe]] TBT PLL hw state readout: pipe_mask 0x0, on 0
[ 38.664543] xe 0000:00:02.0: [drm:intel_dpll_readout_hw_state [xe]] TC PLL 1 hw state readout: pipe_mask 0x0, on 0
[ 38.664604] xe 0000:00:02.0: [drm:intel_dpll_readout_hw_state [xe]] TC PLL 2 hw state readout: pipe_mask 0x0, on 0
[ 38.664657] xe 0000:00:02.0: [drm:intel_dpll_readout_hw_state [xe]] TC PLL 3 hw state readout: pipe_mask 0x0, on 0
[ 38.664709] xe 0000:00:02.0: [drm:intel_dpll_readout_hw_state [xe]] TC PLL 4 hw state readout: pipe_mask 0x0, on 0
[ 38.664789] xe 0000:00:02.0: [drm:intel_dpll_readout_hw_state [xe]] TC PLL 5 hw state readout: pipe_mask 0x0, on 0
[ 38.664860] xe 0000:00:02.0: [drm:intel_dpll_readout_hw_state [xe]] TC PLL 6 hw state readout: pipe_mask 0x0, on 0
[ 38.664954] xe 0000:00:02.0: [drm:intel_modeset_setup_hw_state [xe]] [CONNECTOR:308:eDP-1] hw state readout: enabled
[ 38.665033] xe 0000:00:02.0: [drm:intel_modeset_setup_hw_state [xe]] [CONNECTOR:317:HDMI-A-1] hw state readout: disabled
[ 38.665096] xe 0000:00:02.0: [drm:intel_modeset_setup_hw_state [xe]] [CONNECTOR:326:DP-1] hw state readout: disabled
[ 38.665157] xe 0000:00:02.0: [drm:intel_modeset_setup_hw_state [xe]] [CONNECTOR:335:HDMI-A-2] hw state readout: disabled
[ 38.665213] xe 0000:00:02.0: [drm:intel_modeset_setup_hw_state [xe]] [CONNECTOR:339:DP-2] hw state readout: disabled
[ 38.665269] xe 0000:00:02.0: [drm:intel_modeset_setup_hw_state [xe]] [CONNECTOR:347:HDMI-A-3] hw state readout: disabled
[ 38.665398] xe 0000:00:02.0: [drm:intel_modeset_setup_hw_state [xe]] [PLANE:31:plane 1A] min_cdclk 73250 kHz
[ 38.665459] xe 0000:00:02.0: [drm:intel_modeset_setup_hw_state [xe]] [PLANE:40:plane 2A] min_cdclk 0 kHz
[ 38.665513] xe 0000:00:02.0: [drm:intel_modeset_setup_hw_state [xe]] [PLANE:49:plane 3A] min_cdclk 0 kHz
[ 38.665566] xe 0000:00:02.0: [drm:intel_modeset_setup_hw_state [xe]] [PLANE:58:plane 4A] min_cdclk 0 kHz
[ 38.665638] xe 0000:00:02.0: [drm:intel_modeset_setup_hw_state [xe]] [PLANE:67:plane 5A] min_cdclk 0 kHz
[ 38.665692] xe 0000:00:02.0: [drm:intel_modeset_setup_hw_state [xe]] [PLANE:76:plane 6A] min_cdclk 0 kHz
[ 38.665752] xe 0000:00:02.0: [drm:intel_modeset_setup_hw_state [xe]] [PLANE:85:plane 7A] min_cdclk 0 kHz
[ 38.665839] xe 0000:00:02.0: [drm:intel_modeset_setup_hw_state [xe]] [PLANE:94:cursor A] min_cdclk 0 kHz
[ 38.665926] xe 0000:00:02.0: [drm:intel_bw_crtc_update [xe]] pipe A data rate 586000 num active planes 1
[ 38.666005] xe 0000:00:02.0: [drm:intel_modeset_setup_hw_state [xe]] [PLANE:100:plane 1B] min_cdclk 0 kHz
[ 38.666063] xe 0000:00:02.0: [drm:intel_modeset_setup_hw_state [xe]] [PLANE:109:plane 2B] min_cdclk 0 kHz
[ 38.666119] xe 0000:00:02.0: [drm:intel_modeset_setup_hw_state [xe]] [PLANE:118:plane 3B] min_cdclk 0 kHz
[ 38.666173] xe 0000:00:02.0: [drm:intel_modeset_setup_hw_state [xe]] [PLANE:127:plane 4B] min_cdclk 0 kHz
[ 38.666226] xe 0000:00:02.0: [drm:intel_modeset_setup_hw_state [xe]] [PLANE:136:plane 5B] min_cdclk 0 kHz
[ 38.666280] xe 0000:00:02.0: [drm:intel_modeset_setup_hw_state [xe]] [PLANE:145:plane 6B] min_cdclk 0 kHz
[ 38.666337] xe 0000:00:02.0: [drm:intel_modeset_setup_hw_state [xe]] [PLANE:154:plane 7B] min_cdclk 0 kHz
[ 38.666395] xe 0000:00:02.0: [drm:intel_modeset_setup_hw_state [xe]] [PLANE:163:cursor B] min_cdclk 0 kHz
[ 38.666453] xe 0000:00:02.0: [drm:intel_bw_crtc_update [xe]] pipe B data rate 0 num active planes 0
[ 38.666529] xe 0000:00:02.0: [drm:intel_modeset_setup_hw_state [xe]] [PLANE:169:plane 1C] min_cdclk 0 kHz
[ 38.666599] xe 0000:00:02.0: [drm:intel_modeset_setup_hw_state [xe]] [PLANE:178:plane 2C] min_cdclk 0 kHz
[ 38.666656] xe 0000:00:02.0: [drm:intel_modeset_setup_hw_state [xe]] [PLANE:187:plane 3C] min_cdclk 0 kHz
[ 38.666712] xe 0000:00:02.0: [drm:intel_modeset_setup_hw_state [xe]] [PLANE:196:plane 4C] min_cdclk 0 kHz
[ 38.666798] xe 0000:00:02.0: [drm:intel_modeset_setup_hw_state [xe]] [PLANE:205:plane 5C] min_cdclk 0 kHz
[ 38.666875] xe 0000:00:02.0: [drm:intel_modeset_setup_hw_state [xe]] [PLANE:214:plane 6C] min_cdclk 0 kHz
[ 38.666934] xe 0000:00:02.0: [drm:intel_modeset_setup_hw_state [xe]] [PLANE:223:plane 7C] min_cdclk 0 kHz
[ 38.666993] xe 0000:00:02.0: [drm:intel_modeset_setup_hw_state [xe]] [PLANE:232:cursor C] min_cdclk 0 kHz
[ 38.667050] xe 0000:00:02.0: [drm:intel_bw_crtc_update [xe]] pipe C data rate 0 num active planes 0
[ 38.667129] xe 0000:00:02.0: [drm:intel_modeset_setup_hw_state [xe]] [PLANE:238:plane 1D] min_cdclk 0 kHz
[ 38.667189] xe 0000:00:02.0: [drm:intel_modeset_setup_hw_state [xe]] [PLANE:247:plane 2D] min_cdclk 0 kHz
[ 38.667244] xe 0000:00:02.0: [drm:intel_modeset_setup_hw_state [xe]] [PLANE:256:plane 3D] min_cdclk 0 kHz
[ 38.667299] xe 0000:00:02.0: [drm:intel_modeset_setup_hw_state [xe]] [PLANE:265:plane 4D] min_cdclk 0 kHz
[ 38.667349] xe 0000:00:02.0: [drm:intel_modeset_setup_hw_state [xe]] [PLANE:274:plane 5D] min_cdclk 0 kHz
[ 38.667399] xe 0000:00:02.0: [drm:intel_modeset_setup_hw_state [xe]] [PLANE:283:plane 6D] min_cdclk 0 kHz
[ 38.667447] xe 0000:00:02.0: [drm:intel_modeset_setup_hw_state [xe]] [PLANE:292:plane 7D] min_cdclk 0 kHz
[ 38.667500] xe 0000:00:02.0: [drm:intel_modeset_setup_hw_state [xe]] [PLANE:301:cursor D] min_cdclk 0 kHz
[ 38.667555] xe 0000:00:02.0: [drm:intel_bw_crtc_update [xe]] pipe D data rate 0 num active planes 0
[ 38.667633] xe 0000:00:02.0: [drm:intel_power_well_enable [xe]] enabling DDI_IO_A
[ 38.694653] xe 0000:00:02.0: [drm:intel_crtc_state_dump [xe]] [CRTC:98:pipe A] enable: yes [setup_hw_state]
[ 38.694741] xe 0000:00:02.0: [drm:intel_crtc_state_dump [xe]] active: yes, output_types: EDP (0x100), output format: RGB, sink format: RGB
[ 38.694796] xe 0000:00:02.0: [drm:intel_crtc_state_dump [xe]] cpu_transcoder: A, pipe bpp: 18, dithering: 0
[ 38.694844] xe 0000:00:02.0: [drm:intel_crtc_state_dump [xe]] MST master transcoder: <invalid>
[ 38.694898] xe 0000:00:02.0: [drm:intel_crtc_state_dump [xe]] port sync: master transcoder: <invalid>, slave transcoder bitmask = 0x0
[ 38.694943] xe 0000:00:02.0: [drm:intel_crtc_state_dump [xe]] bigjoiner: no, pipes: 0x0
[ 38.694987] xe 0000:00:02.0: [drm:intel_crtc_state_dump [xe]] splitter: disabled, link count 0, overlap 0
[ 38.695031] xe 0000:00:02.0: [drm:intel_crtc_state_dump [xe]] dp m_n: lanes: 2; data_m: 5120546, data_n: 8388608, link_m: 284474, link_n: 524288, tu: 64
[ 38.695073] xe 0000:00:02.0: [drm:intel_crtc_state_dump [xe]] dp m2_n2: lanes: 2; data_m: 0, data_n: 0, link_m: 0, link_n: 0, tu: 0
[ 38.695118] xe 0000:00:02.0: [drm:intel_crtc_state_dump [xe]] framestart delay: 1, MSA timing delay: 0
[ 38.695164] xe 0000:00:02.0: [drm:intel_crtc_state_dump [xe]] audio: 0, infoframes: 0, infoframes enabled: 0x0
[ 38.695212] xe 0000:00:02.0: [drm:intel_crtc_state_dump [xe]] vrr: no, vmin: 0, vmax: 0, pipeline full: 0, guardband: 0 flipline: 0, vmin vblank: -1, vmax vblank: -2
[ 38.695259] xe 0000:00:02.0: [drm:intel_crtc_state_dump [xe]] requested mode: "1920x1080": 60 146500 1920 1968 2000 2180 1080 1083 1089 1120 0x40 0x9
[ 38.695304] xe 0000:00:02.0: [drm:intel_crtc_state_dump [xe]] adjusted mode: "1920x1080": 60 146500 1920 1968 2000 2180 1080 1083 1089 1120 0x40 0x9
[ 38.695346] xe 0000:00:02.0: [drm:intel_dump_crtc_timings [xe]] crtc timings: clock=146500, hd=1920 hb=1920-2180 hs=1968-2000 ht=2180, vd=1080 vb=1080-1120 vs=1083-1089 vt=1120, flags=0x9
[ 38.695390] xe 0000:00:02.0: [drm:intel_crtc_state_dump [xe]] pipe mode: "1920x1080": 60 146500 1920 1968 2000 2180 1080 1083 1089 1120 0x40 0x9
[ 38.695432] xe 0000:00:02.0: [drm:intel_dump_crtc_timings [xe]] crtc timings: clock=146500, hd=1920 hb=1920-2180 hs=1968-2000 ht=2180, vd=1080 vb=1080-1120 vs=1083-1089 vt=1120, flags=0x9
[ 38.695475] xe 0000:00:02.0: [drm:intel_crtc_state_dump [xe]] port clock: 270000, pipe src: 1920x1080+0+0, pixel rate 146500
[ 38.695516] xe 0000:00:02.0: [drm:intel_crtc_state_dump [xe]] linetime: 120, ips linetime: 0
[ 38.695557] xe 0000:00:02.0: [drm:intel_crtc_state_dump [xe]] num_scalers: 2, scaler_users: 0x0, scaler_id: -1, scaling_filter: 0
[ 38.695611] xe 0000:00:02.0: [drm:intel_crtc_state_dump [xe]] pch pfit: 0x0+0+0, disabled, force thru: no
[ 38.695659] xe 0000:00:02.0: [drm:intel_crtc_state_dump [xe]] ips: 0, double wide: 0, drrs: 0
[ 38.695704] xe 0000:00:02.0: [drm:icl_dump_hw_state [xe]] dpll_hw_state: cfgcr0: 0xe001a5, cfgcr1: 0x88, div0: 0x0, mg_refclkin_ctl: 0x0, hg_clktop2_coreclkctl1: 0x0, mg_clktop2_hsclkctl: 0x0, mg_pll_div0: 0x0, mg_pll_div2: 0x0, mg_pll_lf: 0x0, mg_pll_frac_lock: 0x0, mg_pll_ssc: 0x0, mg_pll_bias: 0x0, mg_pll_tdc_coldst_bias: 0x0
[ 38.695786] xe 0000:00:02.0: [drm:intel_crtc_state_dump [xe]] csc_mode: 0x20000000 gamma_mode: 0x20000000 gamma_enable: 0 csc_enable: 0
[ 38.695862] xe 0000:00:02.0: [drm:intel_crtc_state_dump [xe]] pre csc lut: 0 entries, post csc lut: 0 entries
[ 38.695927] xe 0000:00:02.0: [drm:ilk_dump_csc [xe]] output csc: pre offsets: 0x0000 0x0000 0x0000
[ 38.695974] xe 0000:00:02.0: [drm:ilk_dump_csc [xe]] output csc: coefficients: 0x0000 0x0000 0x0000
[ 38.696019] xe 0000:00:02.0: [drm:ilk_dump_csc [xe]] output csc: coefficients: 0x0000 0x0000 0x0000
[ 38.696062] xe 0000:00:02.0: [drm:ilk_dump_csc [xe]] output csc: coefficients: 0x0000 0x0000 0x0000
[ 38.696105] xe 0000:00:02.0: [drm:ilk_dump_csc [xe]] output csc: post offsets: 0x0000 0x0000 0x0000
[ 38.696146] xe 0000:00:02.0: [drm:ilk_dump_csc [xe]] pipe csc: pre offsets: 0x0000 0x0000 0x0000
[ 38.696187] xe 0000:00:02.0: [drm:ilk_dump_csc [xe]] pipe csc: coefficients: 0x0000 0x0000 0x0000
[ 38.696230] xe 0000:00:02.0: [drm:ilk_dump_csc [xe]] pipe csc: coefficients: 0x0000 0x0000 0x0000
[ 38.696276] xe 0000:00:02.0: [drm:ilk_dump_csc [xe]] pipe csc: coefficients: 0x0000 0x0000 0x0000
[ 38.696320] xe 0000:00:02.0: [drm:ilk_dump_csc [xe]] pipe csc: post offsets: 0x0000 0x0000 0x0000
[ 38.696362] xe 0000:00:02.0: [drm:intel_crtc_state_dump [xe]] [CRTC:167:pipe B] enable: no [setup_hw_state]
[ 38.696407] xe 0000:00:02.0: [drm:intel_crtc_state_dump [xe]] [CRTC:236:pipe C] enable: no [setup_hw_state]
[ 38.696449] xe 0000:00:02.0: [drm:intel_crtc_state_dump [xe]] [CRTC:305:pipe D] enable: no [setup_hw_state]
[ 38.696514] xe 0000:00:02.0: [drm:skl_wm_get_hw_state_and_sanitize [xe]] [CRTC:98:pipe A] dbuf slices 0x1, ddb (0 - 682), active pipes 0x1, mbus joined: no
[ 38.696582] xe 0000:00:02.0: [drm:skl_wm_get_hw_state_and_sanitize [xe]] [CRTC:167:pipe B] dbuf slices 0x0, ddb (0 - 0), active pipes 0x1, mbus joined: no
[ 38.696644] xe 0000:00:02.0: [drm:skl_wm_get_hw_state_and_sanitize [xe]] [CRTC:236:pipe C] dbuf slices 0x0, ddb (0 - 0), active pipes 0x1, mbus joined: no
[ 38.696704] xe 0000:00:02.0: [drm:skl_wm_get_hw_state_and_sanitize [xe]] [CRTC:305:pipe D] dbuf slices 0x0, ddb (0 - 0), active pipes 0x1, mbus joined: no
[ 38.696863] xe 0000:00:02.0: [drm:skl_get_initial_plane_config [xe]] pipe A/plane 1A with fb: size=1920x1080@32, offset=0, pitch 7680, size 0x7e9000
[ 38.703123] DMAR: DRHD: handling fault status reg 3
[ 38.703190] DMAR: [DMA Read NO_PASID] Request device [00:02.0] fault addr 0x70600000 [fault reason 0x0c] non-zero reserved fields in PTE
[ 38.703750] DMAR: DRHD: handling fault status reg 3
[ 38.703769] DMAR: [DMA Read NO_PASID] Request device [00:02.0] fault addr 0x70637000 [fault reason 0x06] PTE Read access is not set
[ 38.704454] DMAR: DRHD: handling fault status reg 3
[ 38.704479] DMAR: [DMA Read NO_PASID] Request device [00:02.0] fault addr 0x70690000 [fault reason 0x06] PTE Read access is not set
[ 38.705161] DMAR: DRHD: handling fault status reg 3
[ 38.809649] xe 0000:00:02.0: [drm] Using GuC firmware from i915/tgl_guc_70.bin version 70.5.1
[ 38.810332] xe 0000:00:02.0: [drm:xe_guc_init [xe]] GuC param[ 0] = 0x00b2ffd3
[ 38.810416] xe 0000:00:02.0: [drm:xe_guc_init [xe]] GuC param[ 1] = 0x00044000
[ 38.810466] xe 0000:00:02.0: [drm:xe_guc_init [xe]] GuC param[ 2] = 0x00000004
[ 38.810512] xe 0000:00:02.0: [drm:xe_guc_init [xe]] GuC param[ 3] = 0x00000003
[ 38.810556] xe 0000:00:02.0: [drm:xe_guc_init [xe]] GuC param[ 4] = 0x0000168c
[ 38.810613] xe 0000:00:02.0: [drm:xe_guc_init [xe]] GuC param[ 5] = 0x9a490001
[ 38.810662] xe 0000:00:02.0: [drm:xe_guc_init [xe]] GuC param[ 6] = 0x00000000
[ 38.810708] xe 0000:00:02.0: [drm:xe_guc_init [xe]] GuC param[ 7] = 0x00000000
[ 38.810768] xe 0000:00:02.0: [drm:xe_guc_init [xe]] GuC param[ 8] = 0x00000000
[ 38.810829] xe 0000:00:02.0: [drm:xe_guc_init [xe]] GuC param[ 9] = 0x00000000
[ 38.810871] xe 0000:00:02.0: [drm:xe_guc_init [xe]] GuC param[10] = 0x00000000
[ 38.810912] xe 0000:00:02.0: [drm:xe_guc_init [xe]] GuC param[11] = 0x00000000
[ 38.810952] xe 0000:00:02.0: [drm:xe_guc_init [xe]] GuC param[12] = 0x00000000
[ 38.810992] xe 0000:00:02.0: [drm:xe_guc_init [xe]] GuC param[13] = 0x00000000
[ 38.814699] xe 0000:00:02.0: [drm] Using HuC firmware from i915/tgl_huc.bin version 7.9.3
[ 38.814968] xe 0000:00:02.0: [drm:xe_wopcm_init [xe]] WOPCM: 2048K
[ 38.815048] xe 0000:00:02.0: [drm:xe_wopcm_init [xe]] Calculated GuC WOPCM [592K, 1420K)
[ 38.822672] xe 0000:00:02.0: [drm:__xe_guc_upload [xe]] GuC successfully loaded
[ 38.823305] xe 0000:00:02.0: [drm:xe_guc_ct_enable [xe]] GuC CT communication channel enabled
[ 38.824163] xe 0000:00:02.0: [drm:xe_reg_sr_apply_mmio [xe]] GT0: Applying GT save-restore MMIOs
[ 38.824227] xe 0000:00:02.0: [drm:xe_reg_sr_apply_mmio [xe]] GT0: REG[0x9424] = 0xfffffffc
[ 38.824311] xe 0000:00:02.0: [drm:xe_reg_sr_apply_mmio [xe]] GT0: REG[0x9550] = 0x000003ff
[ 38.824378] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] flag:0x1
[ 38.824426] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] entries:64
[ 38.824472] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 0 0x4000 0x37
[ 38.824518] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 1 0x4004 0x37
[ 38.824564] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 2 0x4008 0x37
[ 38.824617] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 3 0x400c 0x5
[ 38.824665] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 4 0x4010 0x5
[ 38.824711] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 5 0x4014 0x37
[ 38.824779] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 6 0x4018 0x17
[ 38.824838] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 7 0x401c 0x17
[ 38.824879] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 8 0x4020 0x27
[ 38.824919] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 9 0x4024 0x27
[ 38.824959] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 10 0x4028 0x77
[ 38.824998] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 11 0x402c 0x77
[ 38.825037] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 12 0x4030 0x57
[ 38.825076] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 13 0x4034 0x57
[ 38.825115] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 14 0x4038 0x67
[ 38.825154] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 15 0x403c 0x67
[ 38.825192] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 16 0x4040 0x37
[ 38.825230] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 17 0x4044 0x37
[ 38.825268] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 18 0x4048 0x60037
[ 38.825306] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 19 0x404c 0x737
[ 38.825343] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 20 0x4050 0x337
[ 38.825384] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 21 0x4054 0x137
[ 38.825427] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 22 0x4058 0x3b7
[ 38.825468] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 23 0x405c 0x7b7
[ 38.825509] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 24 0x4060 0x37
[ 38.825550] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 25 0x4064 0x37
[ 38.825597] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 26 0x4068 0x37
[ 38.825641] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 27 0x406c 0x37
[ 38.825684] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 28 0x4070 0x37
[ 38.825746] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 29 0x4074 0x37
[ 38.825815] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 30 0x4078 0x37
[ 38.825854] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 31 0x407c 0x37
[ 38.825892] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 32 0x4080 0x37
[ 38.825929] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 33 0x4084 0x37
[ 38.825967] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 34 0x4088 0x37
[ 38.826005] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 35 0x408c 0x37
[ 38.826042] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 36 0x4090 0x37
[ 38.826080] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 37 0x4094 0x37
[ 38.826117] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 38 0x4098 0x37
[ 38.826157] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 39 0x409c 0x37
[ 38.826199] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 40 0x40a0 0x37
[ 38.826241] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 41 0x40a4 0x37
[ 38.826281] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 42 0x40a8 0x37
[ 38.826321] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 43 0x40ac 0x37
[ 38.826361] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 44 0x40b0 0x37
[ 38.826400] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 45 0x40b4 0x37
[ 38.826438] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 46 0x40b8 0x37
[ 38.826477] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 47 0x40bc 0x37
[ 38.826516] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 48 0x40c0 0x37
[ 38.826553] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 49 0x40c4 0x5
[ 38.826597] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 50 0x40c8 0x37
[ 38.826639] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 51 0x40cc 0x5
[ 38.826681] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 52 0x40d0 0x37
[ 38.826734] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 53 0x40d4 0x37
[ 38.826807] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 54 0x40d8 0x37
[ 38.826873] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 55 0x40dc 0x37
[ 38.826916] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 56 0x40e0 0x37
[ 38.826958] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 57 0x40e4 0x37
[ 38.826998] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 58 0x40e8 0x37
[ 38.827037] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 59 0x40ec 0x37
[ 38.827078] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 60 0x40f0 0x37
[ 38.827117] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 61 0x40f4 0x5
[ 38.827155] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 62 0x40f8 0x37
[ 38.827194] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 63 0x40fc 0x37
[ 38.827231] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] entries:64
[ 38.827269] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 0 0xb020 0x300030
[ 38.827307] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 1 0xb024 0x100030
[ 38.827346] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 2 0xb028 0x100030
[ 38.827385] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 3 0xb02c 0x300010
[ 38.827423] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 4 0xb030 0x300010
[ 38.827460] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 5 0xb034 0x300010
[ 38.827498] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 6 0xb038 0x300010
[ 38.827536] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 7 0xb03c 0x300010
[ 38.827579] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 8 0xb040 0x300030
[ 38.827626] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 9 0xb044 0x300030
[ 38.827672] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 10 0xb048 0x300030
[ 38.827718] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 11 0xb04c 0x300030
[ 38.827789] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 12 0xb050 0x300030
[ 38.827855] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 13 0xb054 0x300030
[ 38.827895] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 14 0xb058 0x300030
[ 38.827935] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 15 0xb05c 0x300030
[ 38.827973] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 16 0xb060 0x300030
[ 38.828010] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 17 0xb064 0x300030
[ 38.828048] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 18 0xb068 0x300030
[ 38.828087] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 19 0xb06c 0x300030
[ 38.828124] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 20 0xb070 0x300030
[ 38.828162] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 21 0xb074 0x300030
[ 38.828200] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 22 0xb078 0x300030
[ 38.828237] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 23 0xb07c 0x300030
[ 38.828275] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 24 0xb080 0x300030
[ 38.828312] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 25 0xb084 0x100010
[ 38.828351] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 26 0xb088 0x300030
[ 38.828389] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 27 0xb08c 0x300030
[ 38.828427] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 28 0xb090 0x300030
[ 38.828465] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 29 0xb094 0x300030
[ 38.828503] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 30 0xb098 0x300010
[ 38.828540] xe 0000:00:02.0: [drm:xe_mocs_init [xe]] 31 0xb09c 0x100010
[ 38.828584] xe 0000:00:02.0: [drm:xe_reg_sr_apply_mmio [xe]] GT0: Applying rcs0 save-restore MMIOs
[ 38.828636] xe 0000:00:02.0: [drm:xe_reg_sr_apply_mmio [xe]] GT0: REG[0x2050] = 0x10801080
[ 38.828689] xe 0000:00:02.0: [drm:xe_reg_sr_apply_mmio [xe]] GT0: REG[0x20a0] = 0x24a80000
[ 38.828768] xe 0000:00:02.0: [drm:xe_reg_sr_apply_mmio [xe]] GT0: REG[0x20c4] = 0x3f7e0306
[ 38.828839] xe 0000:00:02.0: [drm:xe_reg_sr_apply_mmio [xe]] GT0: REG[0x20e0] = 0x40004000
[ 38.828889] xe 0000:00:02.0: [drm:xe_reg_sr_apply_mmio [xe]] GT0: REG[0x20ec] = 0x00020002
[ 38.828943] xe 0000:00:02.0: [drm:xe_reg_sr_apply_mmio [xe]] GT0: REG[0xe18c] = 0x80008000
[ 38.828992] xe 0000:00:02.0: [drm:xe_reg_sr_apply_mmio [xe]] GT0: REG[0xe48c] = 0x02000200
[ 38.829042] xe 0000:00:02.0: [drm:xe_reg_sr_apply_mmio [xe]] GT0: REG[0xe4f4] = 0x41004100
[ 38.829093] xe 0000:00:02.0: [drm:xe_reg_sr_apply_whitelist [xe]] Whitelisting rcs0 registers
[ 38.829140] xe REG[0x2340-0x235f]: allow read access
[ 38.829148] xe REG[0x7010-0x7017]: allow rw access
[ 38.829150] xe REG[0x7018-0x701f]: allow rw access
[ 38.829300] xe 0000:00:02.0: [drm:xe_reg_sr_apply_mmio [xe]] GT0: Applying bcs0 save-restore MMIOs
[ 38.829353] xe 0000:00:02.0: [drm:xe_reg_sr_apply_mmio [xe]] GT0: REG[0x220c4] = 0x3f7e0306
[ 38.829405] xe 0000:00:02.0: [drm:xe_reg_sr_apply_whitelist [xe]] Whitelisting bcs0 registers
[ 38.829454] xe REG[0x223a8-0x223af]: allow read access
[ 38.829556] xe 0000:00:02.0: [drm:xe_reg_sr_apply_mmio [xe]] GT0: Applying vcs0 save-restore MMIOs
[ 38.829621] xe 0000:00:02.0: [drm:xe_reg_sr_apply_mmio [xe]] GT0: REG[0x1c00c4] = 0x3f7e0306
[ 38.829703] xe 0000:00:02.0: [drm:xe_reg_sr_apply_whitelist [xe]] Whitelisting vcs0 registers
[ 38.829767] xe REG[0x1c03a8-0x1c03af]: allow read access
[ 38.829864] xe 0000:00:02.0: [drm:xe_reg_sr_apply_mmio [xe]] GT0: Applying vcs2 save-restore MMIOs
[ 38.829912] xe 0000:00:02.0: [drm:xe_reg_sr_apply_mmio [xe]] GT0: REG[0x1d00c4] = 0x3f7e0306
[ 38.829967] xe 0000:00:02.0: [drm:xe_reg_sr_apply_whitelist [xe]] Whitelisting vcs2 registers
[ 38.830012] xe REG[0x1d03a8-0x1d03af]: allow read access
[ 38.830110] xe 0000:00:02.0: [drm:xe_reg_sr_apply_mmio [xe]] GT0: Applying vecs0 save-restore MMIOs
[ 38.830155] xe 0000:00:02.0: [drm:xe_reg_sr_apply_mmio [xe]] GT0: REG[0x1c80c4] = 0x3f7e0306
[ 38.830206] xe 0000:00:02.0: [drm:xe_reg_sr_apply_whitelist [xe]] Whitelisting vecs0 registers
[ 38.830250] xe REG[0x1c83a8-0x1c83af]: allow read access
[ 38.840667] xe 0000:00:02.0: [drm:__xe_guc_upload [xe]] GuC successfully loaded
[ 38.841235] xe 0000:00:02.0: [drm:xe_guc_ct_enable [xe]] GuC CT communication channel enabled
[ 38.841556] xe 0000:00:02.0: [drm:xe_gt_record_default_lrcs [xe]] GT0: LRC WA rcs0 save-restore batch
[ 38.841628] xe 0000:00:02.0: [drm:xe_gt_record_default_lrcs [xe]] GT0: REG[0x2580] = 0x00060002
[ 38.841692] xe 0000:00:02.0: [drm:xe_gt_record_default_lrcs [xe]] GT0: REG[0x6604] = 0xe0ee6fcf
[ 38.841753] xe 0000:00:02.0: [drm:xe_gt_record_default_lrcs [xe]] GT0: REG[0x7018] = 0x20002000
[ 38.841826] xe 0000:00:02.0: [drm:xe_gt_record_default_lrcs [xe]] GT0: REG[0x7300] = 0x00400040
[ 38.841869] xe 0000:00:02.0: [drm:xe_gt_record_default_lrcs [xe]] GT0: REG[0x7304] = 0x02000200
[ 38.844129] xe 0000:00:02.0: [drm:xe_gt_record_default_lrcs [xe]] GT0: LRC WA bcs0 save-restore batch
[ 38.844203] xe 0000:00:02.0: [drm:xe_gt_record_default_lrcs [xe]] GT0: REG[0x2580] = 0x00060002
[ 38.844273] xe 0000:00:02.0: [drm:xe_gt_record_default_lrcs [xe]] GT0: REG[0x6604] = 0xe0ee6fcf
[ 38.844339] xe 0000:00:02.0: [drm:xe_gt_record_default_lrcs [xe]] GT0: REG[0x7018] = 0x20002000
[ 38.844400] xe 0000:00:02.0: [drm:xe_gt_record_default_lrcs [xe]] GT0: REG[0x7300] = 0x00400040
[ 38.844451] xe 0000:00:02.0: [drm:xe_gt_record_default_lrcs [xe]] GT0: REG[0x7304] = 0x02000200
[ 38.844507] xe 0000:00:02.0: [drm:xe_gt_record_default_lrcs [xe]] GT0: REG[0x22204] = 0x00000606
[ 38.845979] xe 0000:00:02.0: [drm:xe_gt_record_default_lrcs [xe]] GT0: LRC WA vcs0 save-restore batch
[ 38.846054] xe 0000:00:02.0: [drm:xe_gt_record_default_lrcs [xe]] GT0: REG[0x2580] = 0x00060002
[ 38.846126] xe 0000:00:02.0: [drm:xe_gt_record_default_lrcs [xe]] GT0: REG[0x6604] = 0xe0ee6fcf
[ 38.846192] xe 0000:00:02.0: [drm:xe_gt_record_default_lrcs [xe]] GT0: REG[0x7018] = 0x20002000
[ 38.846252] xe 0000:00:02.0: [drm:xe_gt_record_default_lrcs [xe]] GT0: REG[0x7300] = 0x00400040
[ 38.846307] xe 0000:00:02.0: [drm:xe_gt_record_default_lrcs [xe]] GT0: REG[0x7304] = 0x02000200
[ 38.847736] xe 0000:00:02.0: [drm:xe_gt_record_default_lrcs [xe]] GT0: LRC WA vecs0 save-restore batch
[ 38.847787] xe 0000:00:02.0: [drm:xe_gt_record_default_lrcs [xe]] GT0: REG[0x2580] = 0x00060002
[ 38.847843] xe 0000:00:02.0: [drm:xe_gt_record_default_lrcs [xe]] GT0: REG[0x6604] = 0xe0ee6fcf
[ 38.847896] xe 0000:00:02.0: [drm:xe_gt_record_default_lrcs [xe]] GT0: REG[0x7018] = 0x20002000
[ 38.847945] xe 0000:00:02.0: [drm:xe_gt_record_default_lrcs [xe]] GT0: REG[0x7300] = 0x00400040
[ 38.847991] xe 0000:00:02.0: [drm:xe_gt_record_default_lrcs [xe]] GT0: REG[0x7304] = 0x02000200
[ 38.856613] xe 0000:00:02.0: [drm:xe_huc_auth [xe]] HuC authenticated
[ 38.856983] xe 0000:00:02.0: [drm:skl_compute_wm [xe]] [CRTC:98:pipe A] dbuf slices 0x1 -> 0x3, ddb (0 - 682) -> (0 - 2048), active pipes 0x1 -> 0x1
[ 38.857123] xe 0000:00:02.0: [drm:skl_compute_wm [xe]] [PLANE:31:plane 1A] ddb ( 0 - 682) -> ( 0 - 2016), size 682 -> 2016
[ 38.857215] xe 0000:00:02.0: [drm:skl_compute_wm [xe]] [PLANE:94:cursor A] ddb ( 0 - 0) -> (2016 - 2048), size 0 -> 32
[ 38.857301] xe 0000:00:02.0: [drm:skl_compute_wm [xe]] [PLANE:31:plane 1A] level *wm0, wm1, wm2, wm3, wm4, wm5, wm6, wm7, twm,*swm, stwm -> *wm0,*wm1,*wm2,*wm3,*wm4,*wm5,*wm6,*wm7, twm,*swm, stwm
[ 38.857383] xe 0000:00:02.0: [drm:skl_compute_wm [xe]] [PLANE:31:plane 1A] lines 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 -> 1, 4, 4, 4, 4, 5, 8, 8, 0, 2, 0
[ 38.857465] xe 0000:00:02.0: [drm:skl_compute_wm [xe]] [PLANE:31:plane 1A] blocks 17, 17, 7, 7, 7, 7, 7, 7, 7, 17, 7 -> 16, 65, 65, 65, 65, 81, 129, 129, 0, 19, 0
[ 38.857523] xe 0000:00:02.0: [drm:skl_compute_wm [xe]] [PLANE:31:plane 1A] min_ddb 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 -> 19, 73, 73, 73, 73, 91, 143, 143, 0, 22, 0
[ 38.857630] xe 0000:00:02.0: [drm:intel_bw_calc_min_cdclk [xe]] new bandwidth min cdclk (11446 kHz) > old min cdclk (0 kHz)
[ 38.858228] xe 0000:00:02.0: [drm:intel_modeset_verify_disabled [xe]] [ENCODER:307:DDI A/PHY A]
[ 38.858302] xe 0000:00:02.0: [drm:intel_modeset_verify_disabled [xe]] [ENCODER:316:DDI B/PHY B]
[ 38.858375] xe 0000:00:02.0: [drm:intel_modeset_verify_disabled [xe]] [ENCODER:325:DDI TC1/PHY TC1]
[ 38.858442] xe 0000:00:02.0: [drm:intel_modeset_verify_disabled [xe]] [ENCODER:327:DP-MST A]
[ 38.858508] xe 0000:00:02.0: [drm:intel_modeset_verify_disabled [xe]] [ENCODER:328:DP-MST B]
[ 38.858584] xe 0000:00:02.0: [drm:intel_modeset_verify_disabled [xe]] [ENCODER:329:DP-MST C]
[ 38.858664] xe 0000:00:02.0: [drm:intel_modeset_verify_disabled [xe]] [ENCODER:330:DP-MST D]
[ 38.858744] xe 0000:00:02.0: [drm:intel_modeset_verify_disabled [xe]] [ENCODER:338:DDI TC2/PHY TC2]
[ 38.858804] xe 0000:00:02.0: [drm:intel_modeset_verify_disabled [xe]] [ENCODER:340:DP-MST A]
[ 38.858859] xe 0000:00:02.0: [drm:intel_modeset_verify_disabled [xe]] [ENCODER:341:DP-MST B]
[ 38.858914] xe 0000:00:02.0: [drm:intel_modeset_verify_disabled [xe]] [ENCODER:342:DP-MST C]
[ 38.858965] xe 0000:00:02.0: [drm:intel_modeset_verify_disabled [xe]] [ENCODER:343:DP-MST D]
[ 38.859015] xe 0000:00:02.0: [drm:verify_single_dpll_state [xe]] DPLL 0
[ 38.859091] xe 0000:00:02.0: [drm:verify_single_dpll_state [xe]] DPLL 1
[ 38.859152] xe 0000:00:02.0: [drm:verify_single_dpll_state [xe]] TBT PLL
[ 38.859210] xe 0000:00:02.0: [drm:verify_single_dpll_state [xe]] TC PLL 1
[ 38.859267] xe 0000:00:02.0: [drm:verify_single_dpll_state [xe]] TC PLL 2
[ 38.859322] xe 0000:00:02.0: [drm:verify_single_dpll_state [xe]] TC PLL 3
[ 38.859378] xe 0000:00:02.0: [drm:verify_single_dpll_state [xe]] TC PLL 4
[ 38.859433] xe 0000:00:02.0: [drm:verify_single_dpll_state [xe]] TC PLL 5
[ 38.859488] xe 0000:00:02.0: [drm:verify_single_dpll_state [xe]] TC PLL 6
[ 38.859812] xe 0000:00:02.0: [drm:intel_fbc_update [xe]] reserved 17694720 bytes of contiguous stolen space for FBC, limit: 1
[ 38.859890] xe 0000:00:02.0: [drm:intel_fbc_update [xe]] Enabling FBC on [PLANE:31:plane 1A]
[ 38.874432] xe 0000:00:02.0: [drm:intel_fbdev_init [xe]] found possible fb from [PLANE:31:plane 1A]
[ 38.874508] xe 0000:00:02.0: [drm:intel_fbdev_init [xe]] [CRTC:167:pipe B] not active, skipping
[ 38.874564] xe 0000:00:02.0: [drm:intel_fbdev_init [xe]] [CRTC:236:pipe C] not active, skipping
[ 38.874628] xe 0000:00:02.0: [drm:intel_fbdev_init [xe]] [CRTC:305:pipe D] not active, skipping
[ 38.874684] xe 0000:00:02.0: [drm:intel_fbdev_init [xe]] checking [PLANE:31:plane 1A] for BIOS fb
[ 38.874744] xe 0000:00:02.0: [drm:intel_fbdev_init [xe]] [CRTC:98:pipe A] area: 1920x1080, bpp: 32, size: 8294400
[ 38.874819] xe 0000:00:02.0: [drm:intel_fbdev_init [xe]] fb big enough [PLANE:31:plane 1A] (8294400 >= 8294400)
[ 38.874867] xe 0000:00:02.0: [drm:intel_fbdev_init [xe]] [CRTC:167:pipe B] not active, skipping
[ 38.874914] xe 0000:00:02.0: [drm:intel_fbdev_init [xe]] [CRTC:236:pipe C] not active, skipping
[ 38.874963] xe 0000:00:02.0: [drm:intel_fbdev_init [xe]] [CRTC:305:pipe D] not active, skipping
[ 38.875011] xe 0000:00:02.0: [drm:intel_fbdev_init [xe]] using BIOS fb for initial console
[ 38.878481] xe 0000:00:02.0: [drm:intel_backlight_device_register [xe]] [CONNECTOR:308:eDP-1] backlight device intel_backlight registered
[ 38.878721] xe 0000:00:02.0: [drm:intel_dp_connector_register [xe]] registering AUX A/DDI A/PHY A bus for card0-eDP-1
[ 38.879569] xe 0000:00:02.0: [drm:drm_sysfs_connector_hotplug_event] [CONNECTOR:308:eDP-1] generating connector hotplug event
[ 38.879988] xe 0000:00:02.0: [drm:drm_sysfs_connector_hotplug_event] [CONNECTOR:317:HDMI-A-1] generating connector hotplug event
[ 38.880303] xe 0000:00:02.0: [drm:intel_dp_connector_register [xe]] registering AUX USBC1/DDI TC1/PHY TC1 bus for card0-DP-1
[ 38.881083] xe 0000:00:02.0: [drm:drm_sysfs_connector_hotplug_event] [CONNECTOR:326:DP-1] generating connector hotplug event
[ 38.881463] xe 0000:00:02.0: [drm:drm_sysfs_connector_hotplug_event] [CONNECTOR:335:HDMI-A-2] generating connector hotplug event
[ 38.881895] xe 0000:00:02.0: [drm:intel_dp_connector_register [xe]] registering AUX USBC2/DDI TC2/PHY TC2 bus for card0-DP-2
[ 38.882481] xe 0000:00:02.0: [drm:drm_sysfs_connector_hotplug_event] [CONNECTOR:339:DP-2] generating connector hotplug event
[ 38.882871] xe 0000:00:02.0: [drm:drm_sysfs_connector_hotplug_event] [CONNECTOR:347:HDMI-A-3] generating connector hotplug event
[ 38.882909] [drm] Initialized xe 1.1.0 20201103 for 0000:00:02.0 on minor 0
[ 38.882937] xe 0000:00:02.0: [drm:intel_opregion_resume [xe]] 6 outputs detected
[ 38.904868] ACPI: video: Video Device [GFX0] (multi-head: yes rom: no post: no)
[ 38.909405] input: Video Bus as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/LNXVIDEO:00/input/input10
[ 38.910899] xe 0000:00:02.0: [drm:intel_audio_init [xe]] use AUD_FREQ_CNTRL of 0x810 (init value 0x810)
[ 38.911722] [drm:drm_client_modeset_probe]
[ 38.912003] [drm:drm_helper_probe_single_connector_modes] [CONNECTOR:308:eDP-1]
[ 38.912041] [drm:intel_dsm_detect.isra.0 [xe]] no _DSM method for intel device
[ 38.912029] xe 0000:00:02.0: [drm:intel_dp_detect [xe]] [CONNECTOR:308:eDP-1]
[ 38.912173] xe 0000:00:02.0: [drm:intel_power_well_disable [xe]] disabling PW_5
[ 38.912207] xe 0000:00:02.0: [drm:intel_dp_detect [xe]] [ENCODER:307:DDI A/PHY A] MST support: port: no, sink: no, modparam: yes
[ 38.912303] xe 0000:00:02.0: [drm:intel_power_well_disable [xe]] disabling PW_4
[ 38.912343] xe 0000:00:02.0: [drm:intel_dp_print_rates [xe]] source rates: 162000, 216000, 270000, 324000, 432000, 540000, 648000, 810000
[ 38.912403] xe 0000:00:02.0: [drm:intel_power_well_disable [xe]] disabling PW_3
[ 38.912493] xe 0000:00:02.0: [drm:intel_power_well_disable [xe]] disabling PW_2
[ 38.912483] xe 0000:00:02.0: [drm:intel_dp_print_rates [xe]] sink rates: 162000, 270000
[ 38.912633] xe 0000:00:02.0: [drm:intel_dp_print_rates [xe]] common rates: 162000, 270000
[ 38.912866] xe 0000:00:02.0: [drm:update_display_info.part.0] [CONNECTOR:308:eDP-1] Assigning EDID-1.4 digital sink color depth as 6 bpc.
[ 38.912878] xe 0000:00:02.0: [drm:update_display_info.part.0] [CONNECTOR:308:eDP-1] ELD monitor
[ 38.912886] xe 0000:00:02.0: [drm:update_display_info.part.0] [CONNECTOR:308:eDP-1] ELD size 20, SAD count 0
[ 38.912936] xe 0000:00:02.0: [drm:intel_dp_set_edid [xe]] [CONNECTOR:308:eDP-1] VRR capable: no
[ 38.913060] xe 0000:00:02.0: [drm:intel_dp_set_edid [xe]] [CONNECTOR:308:eDP-1] DFP max bpc 0, max dotclock 0, TMDS clock 0-0, PCON Max FRL BW 0Gbps
[ 38.913959] xe 0000:00:02.0: [drm:intel_dp_set_edid [xe]] PCON ENCODER DSC DPCD: 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 38.914117] xe 0000:00:02.0: [drm:intel_dp_set_edid [xe]] [CONNECTOR:308:eDP-1] RGB->YcbCr conversion? no, YCbCr 4:2:0 allowed? yes, YCbCr 4:4:4->4:2:0 conversion? no
[ 38.914796] [drm:drm_helper_probe_single_connector_modes] [CONNECTOR:308:eDP-1] status updated from unknown to connected
[ 38.915153] [drm:drm_helper_probe_single_connector_modes] [CONNECTOR:308:eDP-1] probed modes :
[ 38.915163] [drm:drm_mode_debug_printmodeline] Modeline "1920x1080": 60 146500 1920 1968 2000 2180 1080 1083 1089 1120 0x48 0x9
[ 38.915173] [drm:drm_mode_debug_printmodeline] Modeline "1920x1080": 48 117200 1920 1968 2000 2180 1080 1083 1089 1120 0x40 0x9
[ 38.915181] [drm:drm_helper_probe_single_connector_modes] [CONNECTOR:317:HDMI-A-1]
[ 38.915190] xe 0000:00:02.0: [drm:intel_hdmi_detect [xe]] [CONNECTOR:317:HDMI-A-1]
[ 38.915375] [drm:drm_helper_probe_single_connector_modes] [CONNECTOR:317:HDMI-A-1] status updated from unknown to disconnected
[ 38.915391] [drm:drm_helper_probe_single_connector_modes] [CONNECTOR:317:HDMI-A-1] disconnected
[ 38.915399] [drm:drm_helper_probe_single_connector_modes] [CONNECTOR:326:DP-1]
[ 38.915408] xe 0000:00:02.0: [drm:intel_dp_detect [xe]] [CONNECTOR:326:DP-1]
[ 38.915592] xe 0000:00:02.0: [drm:intel_power_well_enable [xe]] enabling TC_cold_off
[ 38.915764] xe 0000:00:02.0: [drm:intel_power_well_enable [xe]] TC cold block succeeded
[ 38.916040] [drm:drm_helper_probe_single_connector_modes] [CONNECTOR:326:DP-1] status updated from unknown to disconnected
[ 38.916057] [drm:drm_helper_probe_single_connector_modes] [CONNECTOR:326:DP-1] disconnected
[ 38.916066] [drm:drm_helper_probe_single_connector_modes] [CONNECTOR:335:HDMI-A-2]
[ 38.916076] xe 0000:00:02.0: [drm:intel_hdmi_detect [xe]] [CONNECTOR:335:HDMI-A-2]
[ 38.916254] [drm:drm_helper_probe_single_connector_modes] [CONNECTOR:335:HDMI-A-2] status updated from unknown to disconnected
[ 38.916267] [drm:drm_helper_probe_single_connector_modes] [CONNECTOR:335:HDMI-A-2] disconnected
[ 38.916275] [drm:drm_helper_probe_single_connector_modes] [CONNECTOR:339:DP-2]
[ 38.916284] xe 0000:00:02.0: [drm:intel_dp_detect [xe]] [CONNECTOR:339:DP-2]
[ 38.916462] [drm:drm_helper_probe_single_connector_modes] [CONNECTOR:339:DP-2] status updated from unknown to disconnected
[ 38.916476] [drm:drm_helper_probe_single_connector_modes] [CONNECTOR:339:DP-2] disconnected
[ 38.916484] [drm:drm_helper_probe_single_connector_modes] [CONNECTOR:347:HDMI-A-3]
[ 38.916494] xe 0000:00:02.0: [drm:intel_hdmi_detect [xe]] [CONNECTOR:347:HDMI-A-3]
[ 38.916680] [drm:drm_helper_probe_single_connector_modes] [CONNECTOR:347:HDMI-A-3] status updated from unknown to disconnected
[ 38.916692] [drm:drm_helper_probe_single_connector_modes] [CONNECTOR:347:HDMI-A-3] disconnected
[ 38.916698] [drm:drm_client_modeset_probe] connector 308 enabled? yes
[ 38.916706] [drm:drm_client_modeset_probe] connector 317 enabled? no
[ 38.916712] [drm:drm_client_modeset_probe] connector 326 enabled? no
[ 38.916716] [drm:drm_client_modeset_probe] connector 335 enabled? no
[ 38.916722] [drm:drm_client_modeset_probe] connector 339 enabled? no
[ 38.916726] [drm:drm_client_modeset_probe] connector 347 enabled? no
[ 38.916884] [drm:drm_client_firmware_config.isra.0] Not using firmware configuration
[ 38.916916] [drm:drm_client_modeset_probe] looking for cmdline mode on connector 308
[ 38.916922] [drm:drm_client_modeset_probe] looking for preferred mode on connector 308 0
[ 38.916928] [drm:drm_client_modeset_probe] found mode 1920x1080
[ 38.916933] [drm:drm_client_modeset_probe] picking CRTCs for 16384x16384 config
[ 38.916953] [drm:drm_client_modeset_probe] desired mode 1920x1080 set on crtc 98 (0,0)
[ 38.917046] xe 0000:00:02.0: [drm:__drm_fb_helper_initial_config_and_unlock] test CRTC 0 primary plane
[ 38.917129] xe 0000:00:02.0: [drm:intelfb_create [xe]] re-using BIOS fb
[ 38.918235] xe 0000:00:02.0: [drm:intelfb_create [xe]] allocated 1920x1080 fb: 0x00300000
[ 38.919504] fbcon: xedrmfb (fb0) is primary device
[ 38.921823] xe 0000:00:02.0: [drm:intel_atomic_check [xe]] [CONNECTOR:308:eDP-1] Limiting display bpp to 18 (EDID bpp 18, max requested bpp 36, max platform bpp 36)
[ 38.922028] xe 0000:00:02.0: [drm:intel_dp_compute_link_config [xe]] DP link computation with max lane count 2 max rate 270000 max bpp 18 pixel clock 146500KHz
[ 38.922181] xe 0000:00:02.0: [drm:intel_dp_compute_link_config [xe]] DP lane count 2 clock 270000 bpp 18
[ 38.922317] xe 0000:00:02.0: [drm:intel_dp_compute_link_config [xe]] DP link rate required 329625 available 540000
[ 38.922439] xe 0000:00:02.0: [drm:intel_dp_compute_config [xe]] [CONNECTOR:308:eDP-1] SDP split enable: no
[ 38.922558] xe 0000:00:02.0: [drm:intel_atomic_check [xe]] [CRTC:98:pipe A] hw max bpp: 18, pipe bpp: 18, dithering: 1
[ 38.922717] xe 0000:00:02.0: [drm:intel_ddi_compute_config_late [xe]] [ENCODER:307:DDI A/PHY A] [CRTC:98:pipe A]
[ 38.922942] xe 0000:00:02.0: [drm:skl_compute_wm [xe]] [PLANE:31:plane 1A] level *wm0,*wm1,*wm2,*wm3,*wm4,*wm5,*wm6,*wm7, twm,*swm, stwm -> *wm0,*wm1,*wm2,*wm3,*wm4,*wm5,*wm6,*wm7,*twm,*swm,*stwm
[ 38.923075] xe 0000:00:02.0: [drm:skl_compute_wm [xe]] [PLANE:31:plane 1A] lines 1, 4, 4, 4, 4, 5, 8, 8, 0, 2, 0 -> 1, 4, 4, 4, 4, 5, 8, 8, 0, 2, 0
[ 38.923214] xe 0000:00:02.0: [drm:skl_compute_wm [xe]] [PLANE:31:plane 1A] blocks 16, 65, 65, 65, 65, 81, 129, 129, 0, 19, 0 -> 16, 65, 65, 65, 65, 81, 129, 129, 30, 19, 33
[ 38.923344] xe 0000:00:02.0: [drm:skl_compute_wm [xe]] [PLANE:31:plane 1A] min_ddb 19, 73, 73, 73, 73, 91, 143, 143, 0, 22, 0 -> 19, 73, 73, 73, 73, 91, 143, 143, 31, 22, 34
[ 38.923453] xe 0000:00:02.0: [drm:skl_compute_wm [xe]] [PLANE:40:plane 2A] level wm0, wm1, wm2, wm3, wm4, wm5, wm6, wm7, twm, swm, stwm -> wm0, wm1, wm2, wm3, wm4, wm5, wm6, wm7, twm, swm, stwm
[ 38.923559] xe 0000:00:02.0: [drm:skl_compute_wm [xe]] [PLANE:40:plane 2A] lines 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 -> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
[ 38.923691] xe 0000:00:02.0: [drm:skl_compute_wm [xe]] [PLANE:40:plane 2A] blocks 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7 -> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
[ 38.923797] xe 0000:00:02.0: [drm:skl_compute_wm [xe]] [PLANE:40:plane 2A] min_ddb 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 -> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
[ 38.923900] xe 0000:00:02.0: [drm:skl_compute_wm [xe]] [PLANE:49:plane 3A] level wm0, wm1, wm2, wm3, wm4, wm5, wm6, wm7, twm, swm, stwm -> wm0, wm1, wm2, wm3, wm4, wm5, wm6, wm7, twm, swm, stwm
[ 38.924009] xe 0000:00:02.0: [drm:skl_compute_wm [xe]] [PLANE:49:plane 3A] lines 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 -> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
[ 38.924110] xe 0000:00:02.0: [drm:skl_compute_wm [xe]] [PLANE:49:plane 3A] blocks 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7 -> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
[ 38.924217] xe 0000:00:02.0: [drm:skl_compute_wm [xe]] [PLANE:49:plane 3A] min_ddb 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 -> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
[ 38.924326] xe 0000:00:02.0: [drm:skl_compute_wm [xe]] [PLANE:58:plane 4A] level wm0, wm1, wm2, wm3, wm4, wm5, wm6, wm7, twm, swm, stwm -> wm0, wm1, wm2, wm3, wm4, wm5, wm6, wm7, twm, swm, stwm
[ 38.924430] xe 0000:00:02.0: [drm:skl_compute_wm [xe]] [PLANE:58:plane 4A] lines 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 -> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
[ 38.924536] xe 0000:00:02.0: [drm:skl_compute_wm [xe]] [PLANE:58:plane 4A] blocks 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7 -> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
[ 38.924652] xe 0000:00:02.0: [drm:skl_compute_wm [xe]] [PLANE:58:plane 4A] min_ddb 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 -> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
[ 38.924756] xe 0000:00:02.0: [drm:skl_compute_wm [xe]] [PLANE:67:plane 5A] level wm0, wm1, wm2, wm3, wm4, wm5, wm6, wm7, twm, swm, stwm -> wm0, wm1, wm2, wm3, wm4, wm5, wm6, wm7, twm, swm, stwm
[ 38.924858] xe 0000:00:02.0: [drm:skl_compute_wm [xe]] [PLANE:67:plane 5A] lines 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 -> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
[ 38.924960] xe 0000:00:02.0: [drm:skl_compute_wm [xe]] [PLANE:67:plane 5A] blocks 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7 -> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
[ 38.925060] xe 0000:00:02.0: [drm:skl_compute_wm [xe]] [PLANE:67:plane 5A] min_ddb 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 -> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
[ 38.925161] xe 0000:00:02.0: [drm:skl_compute_wm [xe]] [PLANE:76:plane 6A] level wm0, wm1, wm2, wm3, wm4, wm5, wm6, wm7, twm, swm, stwm -> wm0, wm1, wm2, wm3, wm4, wm5, wm6, wm7, twm, swm, stwm
[ 38.925261] xe 0000:00:02.0: [drm:skl_compute_wm [xe]] [PLANE:76:plane 6A] lines 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 -> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
[ 38.925363] xe 0000:00:02.0: [drm:skl_compute_wm [xe]] [PLANE:76:plane 6A] blocks 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7 -> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
[ 38.925465] xe 0000:00:02.0: [drm:skl_compute_wm [xe]] [PLANE:76:plane 6A] min_ddb 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 -> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
[ 38.925570] xe 0000:00:02.0: [drm:skl_compute_wm [xe]] [PLANE:85:plane 7A] level wm0, wm1, wm2, wm3, wm4, wm5, wm6, wm7, twm, swm, stwm -> wm0, wm1, wm2, wm3, wm4, wm5, wm6, wm7, twm, swm, stwm
[ 38.925696] xe 0000:00:02.0: [drm:skl_compute_wm [xe]] [PLANE:85:plane 7A] lines 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 -> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
[ 38.925807] xe 0000:00:02.0: [drm:skl_compute_wm [xe]] [PLANE:85:plane 7A] blocks 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7 -> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
[ 38.925919] xe 0000:00:02.0: [drm:skl_compute_wm [xe]] [PLANE:85:plane 7A] min_ddb 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 -> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
[ 38.926027] xe 0000:00:02.0: [drm:skl_compute_wm [xe]] [PLANE:94:cursor A] level wm0, wm1, wm2, wm3, wm4, wm5, wm6, wm7, twm, swm, stwm -> wm0, wm1, wm2, wm3, wm4, wm5, wm6, wm7, twm, swm, stwm
[ 38.926133] xe 0000:00:02.0: [drm:skl_compute_wm [xe]] [PLANE:94:cursor A] lines 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 -> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
[ 38.926241] xe 0000:00:02.0: [drm:skl_compute_wm [xe]] [PLANE:94:cursor A] blocks 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7 -> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
[ 38.926347] xe 0000:00:02.0: [drm:skl_compute_wm [xe]] [PLANE:94:cursor A] min_ddb 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 -> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
[ 38.926474] xe 0000:00:02.0: [drm:intel_crtc_state_dump [xe]] [CRTC:98:pipe A] enable: yes [fastset]
[ 38.926631] xe 0000:00:02.0: [drm:intel_crtc_state_dump [xe]] active: yes, output_types: EDP (0x100), output format: RGB, sink format: RGB
[ 38.926756] xe 0000:00:02.0: [drm:intel_crtc_state_dump [xe]] cpu_transcoder: A, pipe bpp: 18, dithering: 1
[ 38.926869] xe 0000:00:02.0: [drm:intel_crtc_state_dump [xe]] MST master transcoder: <invalid>
[ 38.926978] xe 0000:00:02.0: [drm:intel_crtc_state_dump [xe]] port sync: master transcoder: <invalid>, slave transcoder bitmask = 0x0
[ 38.927084] xe 0000:00:02.0: [drm:intel_crtc_state_dump [xe]] bigjoiner: no, pipes: 0x0
[ 38.927185] xe 0000:00:02.0: [drm:intel_crtc_state_dump [xe]] splitter: disabled, link count 0, overlap 0
[ 38.927290] xe 0000:00:02.0: [drm:intel_crtc_state_dump [xe]] dp m_n: lanes: 2; data_m: 5120546, data_n: 8388608, link_m: 284474, link_n: 524288, tu: 64
[ 38.927415] xe 0000:00:02.0: [drm:intel_crtc_state_dump [xe]] dp m2_n2: lanes: 2; data_m: 0, data_n: 0, link_m: 0, link_n: 0, tu: 0
[ 38.927518] xe 0000:00:02.0: [drm:intel_crtc_state_dump [xe]] framestart delay: 1, MSA timing delay: 0
[ 38.927642] xe 0000:00:02.0: [drm:intel_crtc_state_dump [xe]] audio: 0, infoframes: 0, infoframes enabled: 0x0
[ 38.927755] xe 0000:00:02.0: [drm:intel_crtc_state_dump [xe]] vrr: no, vmin: 0, vmax: 0, pipeline full: 0, guardband: 0 flipline: 0, vmin vblank: -1, vmax vblank: -2
[ 38.927859] xe 0000:00:02.0: [drm:intel_crtc_state_dump [xe]] requested mode: "1920x1080": 60 146500 1920 1968 2000 2180 1080 1083 1089 1120 0x48 0x9
[ 38.927964] xe 0000:00:02.0: [drm:intel_crtc_state_dump [xe]] adjusted mode: "1920x1080": 60 146500 1920 1968 2000 2180 1080 1083 1089 1120 0x48 0x9
[ 38.928074] xe 0000:00:02.0: [drm:intel_dump_crtc_timings [xe]] crtc timings: clock=146500, hd=1920 hb=1920-2180 hs=1968-2000 ht=2180, vd=1080 vb=1080-1120 vs=1083-1089 vt=1120, flags=0x9
[ 38.928184] xe 0000:00:02.0: [drm:intel_crtc_state_dump [xe]] pipe mode: "1920x1080": 60 146500 1920 1968 2000 2180 1080 1083 1089 1120 0x40 0x9
[ 38.928263] xe 0000:00:02.0: [drm:intel_dump_crtc_timings [xe]] crtc timings: clock=146500, hd=1920 hb=1920-2180 hs=1968-2000 ht=2180, vd=1080 vb=1080-1120 vs=1083-1089 vt=1120, flags=0x9
[ 38.928335] xe 0000:00:02.0: [drm:intel_crtc_state_dump [xe]] port clock: 270000, pipe src: 1920x1080+0+0, pixel rate 146500
[ 38.928406] xe 0000:00:02.0: [drm:intel_crtc_state_dump [xe]] linetime: 120, ips linetime: 0
[ 38.928477] xe 0000:00:02.0: [drm:intel_crtc_state_dump [xe]] num_scalers: 2, scaler_users: 0x0, scaler_id: -1, scaling_filter: 0
[ 38.928548] xe 0000:00:02.0: [drm:intel_crtc_state_dump [xe]] pch pfit: 0x0+0+0, disabled, force thru: no
[ 38.928640] xe 0000:00:02.0: [drm:intel_crtc_state_dump [xe]] ips: 0, double wide: 0, drrs: 0
[ 38.928732] xe 0000:00:02.0: [drm:icl_dump_hw_state [xe]] dpll_hw_state: cfgcr0: 0xe001a5, cfgcr1: 0x88, div0: 0x0, mg_refclkin_ctl: 0x0, hg_clktop2_coreclkctl1: 0x0, mg_clktop2_hsclkctl: 0x0, mg_pll_div0: 0x0, mg_pll_div2: 0x0, mg_pll_lf: 0x0, mg_pll_frac_lock: 0x0, mg_pll_ssc: 0x0, mg_pll_bias: 0x0, mg_pll_tdc_coldst_bias: 0x0
[ 38.928880] xe 0000:00:02.0: [drm:intel_crtc_state_dump [xe]] csc_mode: 0x0 gamma_mode: 0x0 gamma_enable: 0 csc_enable: 0
[ 38.928974] xe 0000:00:02.0: [drm:intel_crtc_state_dump [xe]] pre csc lut: 0 entries, post csc lut: 0 entries
[ 38.929056] xe 0000:00:02.0: [drm:ilk_dump_csc [xe]] output csc: pre offsets: 0x0000 0x0000 0x0000
[ 38.929134] xe 0000:00:02.0: [drm:ilk_dump_csc [xe]] output csc: coefficients: 0x0000 0x0000 0x0000
[ 38.929211] xe 0000:00:02.0: [drm:ilk_dump_csc [xe]] output csc: coefficients: 0x0000 0x0000 0x0000
[ 38.929286] xe 0000:00:02.0: [drm:ilk_dump_csc [xe]] output csc: coefficients: 0x0000 0x0000 0x0000
[ 38.929361] xe 0000:00:02.0: [drm:ilk_dump_csc [xe]] output csc: post offsets: 0x0000 0x0000 0x0000
[ 38.929435] xe 0000:00:02.0: [drm:ilk_dump_csc [xe]] pipe csc: pre offsets: 0x0000 0x0000 0x0000
[ 38.929508] xe 0000:00:02.0: [drm:ilk_dump_csc [xe]] pipe csc: coefficients: 0x0000 0x0000 0x0000
[ 38.929600] xe 0000:00:02.0: [drm:ilk_dump_csc [xe]] pipe csc: coefficients: 0x0000 0x0000 0x0000
[ 38.929702] xe 0000:00:02.0: [drm:ilk_dump_csc [xe]] pipe csc: coefficients: 0x0000 0x0000 0x0000
[ 38.929780] xe 0000:00:02.0: [drm:ilk_dump_csc [xe]] pipe csc: post offsets: 0x0000 0x0000 0x0000
[ 38.929852] xe 0000:00:02.0: [drm:intel_crtc_state_dump [xe]] [PLANE:31:plane 1A] fb: [FB:351] 1920x1080 format = XR24 little-endian (0x34325258) modifier = 0x0, visible: yes
[ 38.929926] xe 0000:00:02.0: [drm:intel_crtc_state_dump [xe]] rotation: 0x1, scaler: -1, scaling_filter: 0
[ 38.929996] xe 0000:00:02.0: [drm:intel_crtc_state_dump [xe]] src: 1920.000000x1080.000000+0.000000+0.000000 dst: 1920x1080+0+0
[ 38.930066] xe 0000:00:02.0: [drm:intel_crtc_state_dump [xe]] [PLANE:40:plane 2A] fb: [NOFB], visible: no
[ 38.930133] xe 0000:00:02.0: [drm:intel_crtc_state_dump [xe]] [PLANE:49:plane 3A] fb: [NOFB], visible: no
[ 38.930201] xe 0000:00:02.0: [drm:intel_crtc_state_dump [xe]] [PLANE:58:plane 4A] fb: [NOFB], visible: no
[ 38.930267] xe 0000:00:02.0: [drm:intel_crtc_state_dump [xe]] [PLANE:67:plane 5A] fb: [NOFB], visible: no
[ 38.930334] xe 0000:00:02.0: [drm:intel_crtc_state_dump [xe]] [PLANE:76:plane 6A] fb: [NOFB], visible: no
[ 38.930400] xe 0000:00:02.0: [drm:intel_crtc_state_dump [xe]] [PLANE:85:plane 7A] fb: [NOFB], visible: no
[ 38.930474] xe 0000:00:02.0: [drm:intel_crtc_state_dump [xe]] [PLANE:94:cursor A] fb: [NOFB], visible: no
[ 38.941625] xe 0000:00:02.0: [drm:verify_connector_state [xe]] [CONNECTOR:308:eDP-1]
[ 38.941836] xe 0000:00:02.0: [drm:intel_modeset_verify_crtc [xe]] [CRTC:98:pipe A]
[ 38.942044] xe 0000:00:02.0: [drm:intel_ddi_get_config [xe]] [ENCODER:307:DDI A/PHY A] Fec status: 0
[ 38.942204] xe 0000:00:02.0: [drm:verify_single_dpll_state [xe]] DPLL 0
[ 38.943663] Console: switching to colour frame buffer device 240x67
[ 38.974748] xe 0000:00:02.0: [drm:intel_backlight_device_update_status [xe]] updating intel_backlight, brightness=96000/96000
[ 38.974926] xe 0000:00:02.0: [drm:intel_panel_actually_set_backlight [xe]] [CONNECTOR:308:eDP-1] set backlight level = 96000
[ 38.982733] xe 0000:00:02.0: [drm] fb0: xedrmfb frame buffer device
[ 38.996975] modprobe (872) used greatest stack depth: 11552 bytes left
[ 39.008306] xe 0000:00:02.0: [drm:drm_fb_helper_hotplug_event]
[ 39.008325] [drm:drm_client_modeset_probe]
[ 39.008499] [drm:drm_helper_probe_single_connector_modes] [CONNECTOR:308:eDP-1]
[ 39.008514] xe 0000:00:02.0: [drm:intel_dp_detect [xe]] [CONNECTOR:308:eDP-1]
[ 39.008645] xe 0000:00:02.0: [drm:intel_dp_detect [xe]] [ENCODER:307:DDI A/PHY A] MST support: port: no, sink: no, modparam: yes
[ 39.008715] xe 0000:00:02.0: [drm:intel_dp_print_rates [xe]] source rates: 162000, 216000, 270000, 324000, 432000, 540000, 648000, 810000
[ 39.008796] xe 0000:00:02.0: [drm:intel_dp_print_rates [xe]] sink rates: 162000, 270000
[ 39.008874] xe 0000:00:02.0: [drm:intel_dp_print_rates [xe]] common rates: 162000, 270000
[ 39.009034] xe 0000:00:02.0: [drm:update_display_info.part.0] [CONNECTOR:308:eDP-1] Assigning EDID-1.4 digital sink color depth as 6 bpc.
[ 39.009041] xe 0000:00:02.0: [drm:update_display_info.part.0] [CONNECTOR:308:eDP-1] ELD monitor
[ 39.009045] xe 0000:00:02.0: [drm:update_display_info.part.0] [CONNECTOR:308:eDP-1] ELD size 20, SAD count 0
[ 39.009068] xe 0000:00:02.0: [drm:intel_dp_set_edid [xe]] [CONNECTOR:308:eDP-1] VRR capable: no
[ 39.009149] xe 0000:00:02.0: [drm:intel_dp_set_edid [xe]] [CONNECTOR:308:eDP-1] DFP max bpc 0, max dotclock 0, TMDS clock 0-0, PCON Max FRL BW 0Gbps
[ 39.010075] xe 0000:00:02.0: [drm:intel_dp_set_edid [xe]] PCON ENCODER DSC DPCD: 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 39.010237] xe 0000:00:02.0: [drm:intel_dp_set_edid [xe]] [CONNECTOR:308:eDP-1] RGB->YcbCr conversion? no, YCbCr 4:2:0 allowed? yes, YCbCr 4:4:4->4:2:0 conversion? no
[ 39.010969] [drm:drm_helper_probe_single_connector_modes] [CONNECTOR:308:eDP-1] probed modes :
[ 39.010985] [drm:drm_mode_debug_printmodeline] Modeline "1920x1080": 60 146500 1920 1968 2000 2180 1080 1083 1089 1120 0x48 0x9
[ 39.010993] [drm:drm_mode_debug_printmodeline] Modeline "1920x1080": 48 117200 1920 1968 2000 2180 1080 1083 1089 1120 0x40 0x9
[ 39.011000] [drm:drm_helper_probe_single_connector_modes] [CONNECTOR:317:HDMI-A-1]
[ 39.011007] xe 0000:00:02.0: [drm:intel_hdmi_detect [xe]] [CONNECTOR:317:HDMI-A-1]
[ 39.011136] [drm:drm_helper_probe_single_connector_modes] [CONNECTOR:317:HDMI-A-1] disconnected
[ 39.011146] [drm:drm_helper_probe_single_connector_modes] [CONNECTOR:326:DP-1]
[ 39.011154] xe 0000:00:02.0: [drm:intel_dp_detect [xe]] [CONNECTOR:326:DP-1]
[ 39.011289] [drm:drm_helper_probe_single_connector_modes] [CONNECTOR:326:DP-1] disconnected
[ 39.011294] [drm:drm_helper_probe_single_connector_modes] [CONNECTOR:335:HDMI-A-2]
[ 39.011299] xe 0000:00:02.0: [drm:intel_hdmi_detect [xe]] [CONNECTOR:335:HDMI-A-2]
[ 39.011419] [drm:drm_helper_probe_single_connector_modes] [CONNECTOR:335:HDMI-A-2] disconnected
[ 39.011423] [drm:drm_helper_probe_single_connector_modes] [CONNECTOR:339:DP-2]
[ 39.011429] xe 0000:00:02.0: [drm:intel_dp_detect [xe]] [CONNECTOR:339:DP-2]
[ 39.011552] [drm:drm_helper_probe_single_connector_modes] [CONNECTOR:339:DP-2] disconnected
[ 39.011557] [drm:drm_helper_probe_single_connector_modes] [CONNECTOR:347:HDMI-A-3]
[ 39.011562] xe 0000:00:02.0: [drm:intel_hdmi_detect [xe]] [CONNECTOR:347:HDMI-A-3]
[ 39.011697] [drm:drm_helper_probe_single_connector_modes] [CONNECTOR:347:HDMI-A-3] disconnected
[ 39.011702] [drm:drm_client_modeset_probe] connector 308 enabled? yes
[ 39.011706] [drm:drm_client_modeset_probe] connector 317 enabled? no
[ 39.011709] [drm:drm_client_modeset_probe] connector 326 enabled? no
[ 39.011713] [drm:drm_client_modeset_probe] connector 335 enabled? no
[ 39.011716] [drm:drm_client_modeset_probe] connector 339 enabled? no
[ 39.011719] [drm:drm_client_modeset_probe] connector 347 enabled? no
[ 39.024224] [drm:drm_client_firmware_config.isra.0] Not using firmware configuration
[ 39.024249] [drm:drm_client_modeset_probe] looking for cmdline mode on connector 308
[ 39.024252] [drm:drm_client_modeset_probe] looking for preferred mode on connector 308 0
[ 39.024254] [drm:drm_client_modeset_probe] found mode 1920x1080
[ 39.024256] [drm:drm_client_modeset_probe] picking CRTCs for 1920x1080 config
[ 39.024293] [drm:drm_client_modeset_probe] desired mode 1920x1080 set on crtc 98 (0,0)
[ 39.127634] xe 0000:00:02.0: [drm:intel_power_well_disable [xe]] disabling TC_cold_off
[ 39.127964] xe 0000:00:02.0: [drm:__intel_display_power_put_domain [xe]] TC cold unblock succeeded
[ 42.055940] xe 0000:00:02.0: [drm:intel_pps_vdd_off_sync_unlocked [xe]] [ENCODER:307:DDI A/PHY A] PPS 0 turning VDD off
[ 42.056057] xe 0000:00:02.0: [drm:intel_pps_vdd_off_sync_unlocked [xe]] [ENCODER:307:DDI A/PHY A] PPS 0 PP_STATUS: 0x80000008 PP_CONTROL: 0x00000067
[ 43.703578] dmar_fault: 972454 callbacks suppressed
[ 43.703580] DMAR: DRHD: handling fault status reg 3
[ 43.703923] DMAR: [DMA Read NO_PASID] Request device [00:02.0] fault addr 0x7072a000 [fault reason 0x0c] non-zero reserved fields in PTE
[ 43.703992] DMAR: DRHD: handling fault status reg 3
[ 43.704024] DMAR: [DMA Read NO_PASID] Request device [00:02.0] fault addr 0x7075e000 [fault reason 0x0c] non-zero reserved fields in PTE
[ 43.704103] DMAR: DRHD: handling fault status reg 3
[ 43.704136] DMAR: [DMA Read NO_PASID] Request device [00:02.0] fault addr 0x7076d000 [fault reason 0x0c] non-zero reserved fields in PTE
[ 43.704216] DMAR: DRHD: handling fault status reg 3
[ 48.704577] dmar_fault: 1001858 callbacks suppressed
[ 48.704579] DMAR: DRHD: handling fault status reg 3
[ 48.704615] DMAR: [DMA Read NO_PASID] Request device [00:02.0] fault addr 0x707b8000 [fault reason 0x0c] non-zero reserved fields in PTE
[ 48.704676] DMAR: DRHD: handling fault status reg 3
[ 48.704709] DMAR: [DMA Read NO_PASID] Request device [00:02.0] fault addr 0x707c4000 [fault reason 0x0c] non-zero reserved fields in PTE
[ 48.704793] DMAR: DRHD: handling fault status reg 3
[ 48.704831] DMAR: [DMA Read NO_PASID] Request device [00:02.0] fault addr 0x707d3000 [fault reason 0x0c] non-zero reserved fields in PTE
[ 48.704922] DMAR: DRHD: handling fault status reg 3
[ 53.717645] dmar_fault: 1009398 callbacks suppressed
[ 53.717661] DMAR: DRHD: handling fault status reg 3
[ 53.717728] DMAR: [DMA Read NO_PASID] Request device [00:02.0] fault addr 0x70600000 [fault reason 0x0c] non-zero reserved fields in PTE
[ 53.717790] DMAR: DRHD: handling fault status reg 3
[ 53.717823] DMAR: [DMA Read NO_PASID] Request device [00:02.0] fault addr 0x70613000 [fault reason 0x0c] non-zero reserved fields in PTE
[ 53.717899] DMAR: DRHD: handling fault status reg 3
[ 53.717935] DMAR: [DMA Read NO_PASID] Request device [00:02.0] fault addr 0x70622000 [fault reason 0x0c] non-zero reserved fields in PTE
[ 53.718019] DMAR: DRHD: handling fault status reg 3
[ 58.718576] dmar_fault: 991300 callbacks suppressed
[ 58.718578] DMAR: DRHD: handling fault status reg 3
[ 58.718628] DMAR: [DMA Read NO_PASID] Request device [00:02.0] fault addr 0x70687000 [fault reason 0x0c] non-zero reserved fields in PTE
[ 58.718695] DMAR: DRHD: handling fault status reg 3
[ 58.718728] DMAR: [DMA Read NO_PASID] Request device [00:02.0] fault addr 0x70694000 [fault reason 0x0c] non-zero reserved fields in PTE
[ 58.718809] DMAR: DRHD: handling fault status reg 3
[ 58.718844] DMAR: [DMA Read NO_PASID] Request device [00:02.0] fault addr 0x706a4000 [fault reason 0x0c] non-zero reserved fields in PTE
[ 58.719231] DMAR: DRHD: handling fault status reg 3
[ 63.719577] dmar_fault: 1006360 callbacks suppressed
[ 63.719579] DMAR: DRHD: handling fault status reg 3
[ 63.719613] DMAR: [DMA Read NO_PASID] Request device [00:02.0] fault addr 0x70714000 [fault reason 0x0c] non-zero reserved fields in PTE
[ 63.719672] DMAR: DRHD: handling fault status reg 3
[ 63.719706] DMAR: [DMA Read NO_PASID] Request device [00:02.0] fault addr 0x70720000 [fault reason 0x0c] non-zero reserved fields in PTE
[ 63.719783] DMAR: DRHD: handling fault status reg 3
[ 63.719819] DMAR: [DMA Read NO_PASID] Request device [00:02.0] fault addr 0x7072e000 [fault reason 0x0c] non-zero reserved fields in PTE
[ 63.719890] DMAR: DRHD: handling fault status reg 3
* Re: [Intel-xe] [PATCH v2 0/6] PAT and cache coherency support
2023-09-18 15:51 ` [Intel-xe] [PATCH v2 0/6] PAT and cache coherency support Souza, Jose
@ 2023-09-21 17:19 ` Souza, Jose
2023-09-25 13:12 ` Matthew Auld
0 siblings, 1 reply; 28+ messages in thread
From: Souza, Jose @ 2023-09-21 17:19 UTC (permalink / raw)
To: intel-xe@lists.freedesktop.org, Auld, Matthew
On Mon, 2023-09-18 at 15:51 +0000, Souza, Jose wrote:
> On Thu, 2023-09-14 at 16:31 +0100, Matthew Auld wrote:
> > Branch available here (lightly tested):
> > https://gitlab.freedesktop.org/mwa/kernel/-/tree/xe-pat-index?ref_type=heads
> >
> > Series still needs some more testing. Also note that the series directly depends
> > on the WIP patch here: https://patchwork.freedesktop.org/series/122708/
> >
> > Goal here is to allow userspace to directly control the pat_index when mapping
> > memory via the ppGTT, in addition to the CPU caching mode for system memory. This
> > is very much needed on newer igpu platforms which allow incoherent GT access,
> > where the choice over the cache level and expected coherency is best left to
> > userspace depending on their use case. In the future there may also be other
> > stuff encoded in the pat_index, so giving userspace direct control will also be
> > needed there.
> >
> > To support this we added new gem_create uAPI for selecting the CPU cache
> > mode to use for system memory, including the expected GPU coherency mode. There
> > are various restrictions here for the selected coherency mode and compatible CPU
> > cache modes. With that in place the actual pat_index can now be provided as
> > part of vm_bind. The only restriction is that the coherency mode of the
> > pat_index must be at least as coherent as the gem_create coherency mode. There
> > are also some special cases like with userptr and dma-buf.
> >
> > v2:
> > - Loads of improvements/tweaks. Main changes are to now allow
> > gem_create.coh_mode <= coh_mode(pat_index), rather than it needing to match
> > exactly. This simplifies the dma-buf policy from userspace pov. Also we now
> > only consider COH_NONE and COH_AT_LEAST_1WAY.
> >
>
>
> Getting constant DMAR errors in the framebuffer console after loading the Xe KMD on TGL with your branch; logs attached.
>
>
Another issue report: when starting Xorg I'm getting this KMD crash with your branch:
[ 2376.624393] xe 0000:00:02.0: [drm:intel_hdmi_detect [xe]] [CONNECTOR:347:HDMI-A-3]
[ 2376.624465] [drm:drm_helper_probe_single_connector_modes] [CONNECTOR:347:HDMI-A-3] disconnected
[ 2376.726753] xe 0000:00:02.0: [drm:intel_power_well_disable [xe]] disabling TC_cold_off
[ 2376.727183] xe 0000:00:02.0: [drm:__intel_display_power_put_domain [xe]] TC cold unblock succeeded
[ 2378.896672] dmar_fault: 915847 callbacks suppressed
[ 2378.896675] DMAR: DRHD: handling fault status reg 3
[ 2378.896684] DMAR: [DMA Read NO_PASID] Request device [00:02.0] fault addr 0x70600000 [fault reason 0x0c] non-zero reserved fields in PTE
[ 2378.896711] DMAR: DRHD: handling fault status reg 3
[ 2378.896715] DMAR: [DMA Read NO_PASID] Request device [00:02.0] fault addr 0x70603000 [fault reason 0x0c] non-zero reserved fields in PTE
[ 2378.896722] DMAR: DRHD: handling fault status reg 3
[ 2378.896726] DMAR: [DMA Read NO_PASID] Request device [00:02.0] fault addr 0x70607000 [fault reason 0x0c] non-zero reserved fields in PTE
[ 2378.896737] DMAR: DRHD: handling fault status reg 3
[ 2379.479148] xe 0000:00:02.0: [drm:drm_mode_addfb2] [FB:353]
[ 2379.480368] xe 0000:00:02.0: [drm:skl_compute_wm [xe]] [PLANE:31:plane 1A] level *wm0,*wm1,*wm2,*wm3,*wm4,*wm5,*wm6,*wm7,*twm,*swm,*stwm -> *wm0,*wm1,*wm2,*wm3,*wm4,*wm5,*wm6,*wm7,*twm,*swm,*stwm
[ 2379.480464] xe 0000:00:02.0: [drm:skl_compute_wm [xe]] [PLANE:31:plane 1A] lines 1, 4, 4, 4, 4, 5, 8, 8, 0, 2, 0 -> 4, 4, 4, 4, 4, 5, 8, 8, 0, 4, 0
[ 2379.480535] xe 0000:00:02.0: [drm:skl_compute_wm [xe]] [PLANE:31:plane 1A] blocks 16, 65, 65, 65, 65, 81, 129, 129, 30, 19, 33 -> 62, 62, 62, 62, 62, 78, 123, 123, 137, 62, 137
[ 2379.480604] xe 0000:00:02.0: [drm:skl_compute_wm [xe]] [PLANE:31:plane 1A] min_ddb 19, 73, 73, 73, 73, 91, 143, 143, 31, 22, 34 -> 123, 123, 123, 123, 123, 184, 184, 184, 138, 123, 138
[ 2379.481280] BUG: kernel NULL pointer dereference, address: 0000000000000068
[ 2379.481286] #PF: supervisor read access in kernel mode
[ 2379.481289] #PF: error_code(0x0000) - not-present page
[ 2379.481291] PGD 0 P4D 0
[ 2379.481296] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 2379.481300] CPU: 7 PID: 24658 Comm: gnome-shell Not tainted 6.5.0-rc7+zeh-xe+ #1108
[ 2379.481304] Hardware name: Dell Inc. Latitude 5420/01M3M4, BIOS 1.27.0 03/17/2023
[ 2379.481306] RIP: 0010:xe_ggtt_pte_encode+0x1c/0x90 [xe]
[ 2379.481382] Code: 90 90 90 90 90 90 90 90 90 90 90 90 90 90 41 55 ba 00 10 00 00 41 54 55 53 48 8b 87 d0 02 00 00 48 89 fb 4c 8b a7 20 02 00 00 <4c> 8b 68 68 e8 bb 4e ff ff 48 89 df 48 89 c5 e8 20 24 ff ff 84 c0
[ 2379.481385] RSP: 0018:ffffc90001b0bb20 EFLAGS: 00010206
[ 2379.481390] RAX: 0000000000000000 RBX: ffff8881071fe800 RCX: 0000000000000000
[ 2379.481394] RDX: 0000000000001000 RSI: 0000000000000000 RDI: ffff8881071fe800
[ 2379.481396] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000001
[ 2379.481397] R10: 0000000000000001 R11: 0000000000000659 R12: ffff8881133f0f78
[ 2379.481399] R13: 0000000000001000 R14: 0000000000809000 R15: ffff888134feb850
[ 2379.481400] FS: 00007f47ff7335c0(0000) GS:ffff888287b80000(0000) knlGS:0000000000000000
[ 2379.481402] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2379.481404] CR2: 0000000000000068 CR3: 0000000183252001 CR4: 0000000000770ee0
[ 2379.481406] PKRU: 55555554
[ 2379.481407] Call Trace:
[ 2379.481409] <TASK>
[ 2379.481411] ? __die+0x1a/0x60
[ 2379.481415] ? page_fault_oops+0x158/0x450
[ 2379.481419] ? drm_atomic_commit+0x8e/0xc0
[ 2379.481423] ? drm_mode_atomic_ioctl+0x96a/0xbd0
[ 2379.481426] ? drm_ioctl+0x212/0x470
[ 2379.481428] ? do_user_addr_fault+0x61/0x7c0
[ 2379.481432] ? exc_page_fault+0x6a/0x1b0
[ 2379.481436] ? asm_exc_page_fault+0x22/0x30
[ 2379.481440] ? xe_ggtt_pte_encode+0x1c/0x90 [xe]
[ 2379.481492] __xe_pin_fb_vma+0x396/0x840 [xe]
[ 2379.481570] intel_plane_pin_fb+0x34/0x90 [xe]
[ 2379.481647] intel_prepare_plane_fb+0x2c/0x70 [xe]
[ 2379.481753] drm_atomic_helper_prepare_planes+0x6b/0x210
[ 2379.481764] intel_atomic_commit+0x4d/0x360 [xe]
[ 2379.481885] drm_atomic_commit+0x8e/0xc0
[ 2379.481889] ? __pfx___drm_printfn_info+0x10/0x10
[ 2379.481894] drm_mode_atomic_ioctl+0x96a/0xbd0
[ 2379.481902] ? __pfx_drm_mode_atomic_ioctl+0x10/0x10
[ 2379.481906] drm_ioctl_kernel+0xc0/0x170
[ 2379.481909] drm_ioctl+0x212/0x470
[ 2379.481912] ? __pfx_drm_mode_atomic_ioctl+0x10/0x10
[ 2379.481918] __x64_sys_ioctl+0x8d/0xb0
[ 2379.481924] do_syscall_64+0x38/0x90
[ 2379.481928] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 2379.481932] RIP: 0033:0x7f4802b1aaff
[ 2379.481935] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <41> 89 c0 3d 00 f0 ff ff 77 1f 48 8b 44 24 18 64 48 2b 04 25 28 00
[ 2379.481939] RSP: 002b:00007ffc8bafb730 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 2379.481943] RAX: ffffffffffffffda RBX: 00007ffc8bafb7d0 RCX: 00007f4802b1aaff
[ 2379.481946] RDX: 00007ffc8bafb7d0 RSI: 00000000c03864bc RDI: 0000000000000009
[ 2379.481948] RBP: 00000000c03864bc R08: 0000000000000026 R09: 0000000000000026
[ 2379.481950] R10: 0000000000000001 R11: 0000000000000246 R12: 000055fe14331f40
[ 2379.481953] R13: 0000000000000009 R14: 000055fe1430f4c0 R15: 000055fe1430d6f0
[ 2379.481958] </TASK>
[ 2379.481959] Modules linked in: xe drm_ttm_helper drm_exec gpu_sched drm_suballoc_helper i2c_algo_bit drm_buddy ttm drm_display_helper btusb btrtl
btbcm btintel bluetooth snd_hda_codec_hdmi cdc_ncm cdc_ether usbnet mii ecdh_generic ecc snd_ctl_led mei_pxp mei_hdcp snd_hda_codec_realtek
snd_hda_codec_generic ledtrig_audio wmi_bmof x86_pkg_temp_thermal snd_hda_intel coretemp crct10dif_pclmul snd_intel_dspcfg crc32_pclmul snd_hda_codec
ghash_clmulni_intel snd_hwdep snd_hda_core e1000e kvm_intel video ptp snd_pcm i2c_i801 mei_me pps_core i2c_smbus mei wmi pinctrl_tigerlake fuse
[ 2379.482015] CR2: 0000000000000068
[ 2379.482018] ---[ end trace 0000000000000000 ]---
[ 2379.661641] xe 0000:00:02.0: [drm:intel_pps_vdd_off_sync_unlocked [xe]] [ENCODER:307:DDI A/PHY A] PPS 0 turning VDD off
[ 2379.661861] xe 0000:00:02.0: [drm:intel_pps_vdd_off_sync_unlocked [xe]] [ENCODER:307:DDI A/PHY A] PPS 0 PP_STATUS: 0x80000008 PP_CONTROL: 0x00000067
[ 2379.873152] RIP: 0010:xe_ggtt_pte_encode+0x1c/0x90 [xe]
[ 2379.873325] Code: 90 90 90 90 90 90 90 90 90 90 90 90 90 90 41 55 ba 00 10 00 00 41 54 55 53 48 8b 87 d0 02 00 00 48 89 fb 4c 8b a7 20 02 00 00 <4c> 8b 68 68 e8 bb 4e ff ff 48 89 df 48 89 c5 e8 20 24 ff ff 84 c0
[ 2379.873328] RSP: 0018:ffffc90001b0bb20 EFLAGS: 00010206
[ 2379.873330] RAX: 0000000000000000 RBX: ffff8881071fe800 RCX: 0000000000000000
[ 2379.873332] RDX: 0000000000001000 RSI: 0000000000000000 RDI: ffff8881071fe800
[ 2379.873333] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000001
[ 2379.873334] R10: 0000000000000001 R11: 0000000000000659 R12: ffff8881133f0f78
[ 2379.873335] R13: 0000000000001000 R14: 0000000000809000 R15: ffff888134feb850
[ 2379.873336] FS: 00007f47ff7335c0(0000) GS:ffff888287b80000(0000) knlGS:0000000000000000
[ 2379.873338] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2379.873339] CR2: 0000000000000068 CR3: 0000000183252001 CR4: 0000000000770ee0
[ 2379.873340] PKRU: 55555554
[ 2379.873342] note: gnome-shell[24658] exited with irqs disabled
[ 2383.896731] dmar_fault: 1159924 callbacks suppressed
[ 2383.896733] DMAR: DRHD: handling fault status reg 3
[ 2383.896739] DMAR: [DMA Read NO_PASID] Request device [00:02.0] fault addr 0x70617000 [fault reason 0x0c] non-zero reserved fields in PTE
[ 2383.896749] DMAR: DRHD: handling fault status reg 3
[ 2383.896751] DMAR: [DMA Read NO_PASID] Request device [00:02.0] fault addr 0x70619000 [fault reason 0x0c] non-zero reserved fields in PTE
[ 2383.896757] DMAR: DRHD: handling fault status reg 3
[ 2383.896759] DMAR: [DMA Read NO_PASID] Request device [00:02.0] fault addr 0x7061b000 [fault reason 0x0c] non-zero reserved fields in PTE
[ 2383.896762] DMAR: DRHD: handling fault status reg 2
[ 2388.897730] dmar_fault: 1298750 callbacks suppressed
[ 2388.897733] DMAR: DRHD: handling fault status reg 3
[ 2388.897738] DMAR: [DMA Read NO_PASID] Request device [00:02.0] fault addr 0x706a5000 [fault reason 0x0c] non-zero reserved fields in PTE
[ 2388.897747] DMAR: DRHD: handling fault status reg 3
[ 2388.897748] DMAR: [DMA Read NO_PASID] Request device [00:02.0] fault addr 0x706a6000 [fault reason 0x0c] non-zero reserved fields in PTE
[ 2388.897752] DMAR: DRHD: handling fault status reg 3
[ 2388.897754] DMAR: [DMA Read NO_PASID] Request device [00:02.0] fault addr 0x706a8000 [fault reason 0x0c] non-zero reserved fields in PTE
[ 2388.897757] DMAR: DRHD: handling fault status reg 3
[ 2393.898732] dmar_fault: 1164851 callbacks suppressed
This might help debug:
(gdb) list *(xe_ggtt_pte_encode+0x1c)
0x101fc is in xe_ggtt_pte_encode (drivers/gpu/drm/xe/xe_ggtt.c:34).
29 #define GUC_GGTT_TOP 0xFEE00000
30
31 u64 xe_ggtt_pte_encode(struct xe_bo *bo, u64 bo_offset)
32 {
33 struct xe_device *xe = xe_bo_device(bo);
34 struct xe_ggtt *ggtt = (bo->tile)->mem.ggtt;
35 u64 pte;
* Re: [Intel-xe] [PATCH v2 1/6] drm/xe/uapi: Add support for cache and coherency mode
2023-09-14 15:31 ` [Intel-xe] [PATCH v2 1/6] drm/xe/uapi: Add support for cache and coherency mode Matthew Auld
2023-09-14 23:47 ` Matt Roper
@ 2023-09-21 20:07 ` Souza, Jose
2023-09-25 8:06 ` Matthew Auld
1 sibling, 1 reply; 28+ messages in thread
From: Souza, Jose @ 2023-09-21 20:07 UTC (permalink / raw)
To: intel-xe@lists.freedesktop.org, Auld, Matthew
On Thu, 2023-09-14 at 16:31 +0100, Matthew Auld wrote:
> From: Pallavi Mishra <pallavi.mishra@intel.com>
>
> Allow userspace to specify the CPU caching mode to use for system memory
> in addition to coherency modes during object creation. Modify gem create
> handler and introduce xe_bo_create_user to replace xe_bo_create. In a
> later patch we will support setting the pat_index as part of vm_bind,
> where expectation is that the coherency mode extracted from the
> pat_index must match the one set at object creation.
>
> v2
> - s/smem_caching/smem_cpu_caching/ and
> s/XE_GEM_CACHING/XE_GEM_CPU_CACHING/. (Matt Roper)
> - Drop COH_2WAY and just use COH_NONE + COH_AT_LEAST_1WAY; KMD mostly
> just cares that zeroing/swap-in can't be bypassed with the given
> smem_caching mode. (Matt Roper)
> - Fix broken range check for coh_mode and smem_cpu_caching and also
> don't use constant value, but the already defined macros. (José)
> - Prefer switch statement for smem_cpu_caching -> ttm_caching. (José)
> - Add note in kernel-doc for dgpu and coherency modes for system
> memory. (José)
>
> Signed-off-by: Pallavi Mishra <pallavi.mishra@intel.com>
> Co-authored-by: Matthew Auld <matthew.auld@intel.com>
> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Lucas De Marchi <lucas.demarchi@intel.com>
> Cc: Matt Roper <matthew.d.roper@intel.com>
> Cc: José Roberto de Souza <jose.souza@intel.com>
> Cc: Filip Hazubski <filip.hazubski@intel.com>
> Cc: Carl Zhang <carl.zhang@intel.com>
> Cc: Effie Yu <effie.yu@intel.com>
> ---
> drivers/gpu/drm/xe/xe_bo.c | 105 ++++++++++++++++++++++++++-----
> drivers/gpu/drm/xe/xe_bo.h | 3 +-
> drivers/gpu/drm/xe/xe_bo_types.h | 10 +++
> drivers/gpu/drm/xe/xe_dma_buf.c | 5 +-
> include/uapi/drm/xe_drm.h | 57 ++++++++++++++++-
> 5 files changed, 158 insertions(+), 22 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
> index 27726d4f3423..f3facd788f15 100644
> --- a/drivers/gpu/drm/xe/xe_bo.c
> +++ b/drivers/gpu/drm/xe/xe_bo.c
> @@ -325,7 +325,7 @@ static struct ttm_tt *xe_ttm_tt_create(struct ttm_buffer_object *ttm_bo,
> struct xe_device *xe = xe_bo_device(bo);
> struct xe_ttm_tt *tt;
> unsigned long extra_pages;
> - enum ttm_caching caching = ttm_cached;
> + enum ttm_caching caching;
> int err;
>
> tt = kzalloc(sizeof(*tt), GFP_KERNEL);
> @@ -339,13 +339,25 @@ static struct ttm_tt *xe_ttm_tt_create(struct ttm_buffer_object *ttm_bo,
> extra_pages = DIV_ROUND_UP(xe_device_ccs_bytes(xe, bo->size),
> PAGE_SIZE);
>
> + switch (bo->smem_cpu_caching) {
> + case XE_GEM_CPU_CACHING_WC:
> + caching = ttm_write_combined;
> + break;
> + case XE_GEM_CPU_CACHING_UC:
> + caching = ttm_uncached;
> + break;
> + default:
> + caching = ttm_cached;
> + break;
> + }
> +
> /*
> * Display scanout is always non-coherent with the CPU cache.
> *
> * For Xe_LPG and beyond, PPGTT PTE lookups are also non-coherent and
> * require a CPU:WC mapping.
> */
> - if (bo->flags & XE_BO_SCANOUT_BIT ||
> + if ((!bo->smem_cpu_caching && bo->flags & XE_BO_SCANOUT_BIT) ||
> (xe->info.graphics_verx100 >= 1270 && bo->flags & XE_BO_PAGETABLE))
> caching = ttm_write_combined;
>
> @@ -1184,9 +1196,10 @@ void xe_bo_free(struct xe_bo *bo)
> kfree(bo);
> }
>
> -struct xe_bo *__xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
> +struct xe_bo *___xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
> struct xe_tile *tile, struct dma_resv *resv,
> struct ttm_lru_bulk_move *bulk, size_t size,
> + u16 smem_cpu_caching, u16 coh_mode,
> enum ttm_bo_type type, u32 flags)
> {
> struct ttm_operation_ctx ctx = {
> @@ -1224,6 +1237,8 @@ struct xe_bo *__xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
> bo->tile = tile;
> bo->size = size;
> bo->flags = flags;
> + bo->smem_cpu_caching = smem_cpu_caching;
> + bo->coh_mode = coh_mode;
> bo->ttm.base.funcs = &xe_gem_object_funcs;
> bo->props.preferred_mem_class = XE_BO_PROPS_INVALID;
> bo->props.preferred_gt = XE_BO_PROPS_INVALID;
> @@ -1307,10 +1322,11 @@ static int __xe_bo_fixed_placement(struct xe_device *xe,
> }
>
> struct xe_bo *
> -xe_bo_create_locked_range(struct xe_device *xe,
> - struct xe_tile *tile, struct xe_vm *vm,
> - size_t size, u64 start, u64 end,
> - enum ttm_bo_type type, u32 flags)
> +__xe_bo_create_locked(struct xe_device *xe,
> + struct xe_tile *tile, struct xe_vm *vm,
> + size_t size, u64 start, u64 end,
> + u16 smem_cpu_caching, u16 coh_mode,
> + enum ttm_bo_type type, u32 flags)
> {
> struct xe_bo *bo = NULL;
> int err;
> @@ -1331,10 +1347,11 @@ xe_bo_create_locked_range(struct xe_device *xe,
> }
> }
>
> - bo = __xe_bo_create_locked(xe, bo, tile, vm ? &vm->resv : NULL,
> + bo = ___xe_bo_create_locked(xe, bo, tile, vm ? &vm->resv : NULL,
> vm && !xe_vm_in_fault_mode(vm) &&
> flags & XE_BO_CREATE_USER_BIT ?
> &vm->lru_bulk_move : NULL, size,
> + smem_cpu_caching, coh_mode,
> type, flags);
> if (IS_ERR(bo))
> return bo;
> @@ -1368,11 +1385,35 @@ xe_bo_create_locked_range(struct xe_device *xe,
> return ERR_PTR(err);
> }
>
> +struct xe_bo *
> +xe_bo_create_locked_range(struct xe_device *xe,
> + struct xe_tile *tile, struct xe_vm *vm,
> + size_t size, u64 start, u64 end,
> + enum ttm_bo_type type, u32 flags)
> +{
> + return __xe_bo_create_locked(xe, tile, vm, size, 0, ~0ULL, 0, 0, type, flags);
> +}
> +
> struct xe_bo *xe_bo_create_locked(struct xe_device *xe, struct xe_tile *tile,
> struct xe_vm *vm, size_t size,
> enum ttm_bo_type type, u32 flags)
> {
> - return xe_bo_create_locked_range(xe, tile, vm, size, 0, ~0ULL, type, flags);
> + return __xe_bo_create_locked(xe, tile, vm, size, 0, ~0ULL, 0, 0, type, flags);
> +}
> +
> +static struct xe_bo *xe_bo_create_user(struct xe_device *xe, struct xe_tile *tile,
> + struct xe_vm *vm, size_t size,
> + u16 smem_cpu_caching, u16 coh_mode,
> + enum ttm_bo_type type,
> + u32 flags)
> +{
> + struct xe_bo *bo = __xe_bo_create_locked(xe, tile, vm, size, 0, ~0ULL,
> + smem_cpu_caching, coh_mode, type,
> + flags | XE_BO_CREATE_USER_BIT);
> + if (!IS_ERR(bo))
> + xe_bo_unlock_vm_held(bo);
> +
> + return bo;
> }
>
> struct xe_bo *xe_bo_create(struct xe_device *xe, struct xe_tile *tile,
> @@ -1755,11 +1796,11 @@ int xe_gem_create_ioctl(struct drm_device *dev, void *data,
> struct drm_xe_gem_create *args = data;
> struct xe_vm *vm = NULL;
> struct xe_bo *bo;
> - unsigned int bo_flags = XE_BO_CREATE_USER_BIT;
> + unsigned int bo_flags;
> u32 handle;
> int err;
>
> - if (XE_IOCTL_DBG(xe, args->extensions) || XE_IOCTL_DBG(xe, args->pad) ||
> + if (XE_IOCTL_DBG(xe, args->extensions) ||
> XE_IOCTL_DBG(xe, args->reserved[0] || args->reserved[1]))
> return -EINVAL;
>
> @@ -1801,6 +1842,32 @@ int xe_gem_create_ioctl(struct drm_device *dev, void *data,
> bo_flags |= XE_BO_NEEDS_CPU_ACCESS;
> }
>
> + if (XE_IOCTL_DBG(xe, args->coh_mode > XE_GEM_COH_AT_LEAST_1WAY))
> + return -EINVAL;
> +
> + if (XE_IOCTL_DBG(xe, args->smem_cpu_caching > XE_GEM_CPU_CACHING_UC))
> + return -EINVAL;
> +
> + if (bo_flags & XE_BO_CREATE_SYSTEM_BIT) {
> + if (XE_IOCTL_DBG(xe, !args->coh_mode))
> + return -EINVAL;
> +
> + if (XE_IOCTL_DBG(xe, !args->smem_cpu_caching))
> + return -EINVAL;
> +
> + if (XE_IOCTL_DBG(xe, !IS_DGFX(xe) &&
> + bo_flags & XE_BO_SCANOUT_BIT &&
> + args->smem_cpu_caching == XE_GEM_CPU_CACHING_WB))
> + return -EINVAL;
> +
> + if (args->coh_mode == XE_GEM_COH_NONE) {
> + if (XE_IOCTL_DBG(xe, args->smem_cpu_caching == XE_GEM_CPU_CACHING_WB))
> + return -EINVAL;
> + }
> + } else if (XE_IOCTL_DBG(xe, args->smem_cpu_caching)) {
This should be XE_IOCTL_DBG(xe, !args->smem_cpu_caching).
The uAPI doesn't say anything about allowing smem_cpu_caching or coh_mode == 0; I made the change below to be able to run tests without display on DG2:
diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
index f3facd788f152..e0e4fefcd2060 100644
--- a/drivers/gpu/drm/xe/xe_bo.c
+++ b/drivers/gpu/drm/xe/xe_bo.c
@@ -1796,7 +1796,7 @@ int xe_gem_create_ioctl(struct drm_device *dev, void *data,
struct drm_xe_gem_create *args = data;
struct xe_vm *vm = NULL;
struct xe_bo *bo;
- unsigned int bo_flags;
+ unsigned int bo_flags = 0;
u32 handle;
int err;
@@ -1842,19 +1842,15 @@ int xe_gem_create_ioctl(struct drm_device *dev, void *data,
bo_flags |= XE_BO_NEEDS_CPU_ACCESS;
}
- if (XE_IOCTL_DBG(xe, args->coh_mode > XE_GEM_COH_AT_LEAST_1WAY))
+ if (XE_IOCTL_DBG(xe, args->coh_mode > XE_GEM_COH_AT_LEAST_1WAY) ||
+ XE_IOCTL_DBG(xe, !args->coh_mode))
return -EINVAL;
- if (XE_IOCTL_DBG(xe, args->smem_cpu_caching > XE_GEM_CPU_CACHING_UC))
+ if (XE_IOCTL_DBG(xe, args->smem_cpu_caching > XE_GEM_CPU_CACHING_UC) ||
+ XE_IOCTL_DBG(xe, !args->smem_cpu_caching))
return -EINVAL;
if (bo_flags & XE_BO_CREATE_SYSTEM_BIT) {
- if (XE_IOCTL_DBG(xe, !args->coh_mode))
- return -EINVAL;
-
- if (XE_IOCTL_DBG(xe, !args->smem_cpu_caching))
- return -EINVAL;
-
if (XE_IOCTL_DBG(xe, !IS_DGFX(xe) &&
bo_flags & XE_BO_SCANOUT_BIT &&
args->smem_cpu_caching == XE_GEM_CPU_CACHING_WB))
@@ -1864,8 +1860,6 @@ int xe_gem_create_ioctl(struct drm_device *dev, void *data,
if (XE_IOCTL_DBG(xe, args->smem_cpu_caching == XE_GEM_CPU_CACHING_WB))
return -EINVAL;
}
- } else if (XE_IOCTL_DBG(xe, args->smem_cpu_caching)) {
- return -EINVAL;
}
if (args->vm_id) {
> + return -EINVAL;
> + }
> +
> if (args->vm_id) {
> vm = xe_vm_lookup(xef, args->vm_id);
> if (XE_IOCTL_DBG(xe, !vm))
> @@ -1812,8 +1879,10 @@ int xe_gem_create_ioctl(struct drm_device *dev, void *data,
> }
> }
>
> - bo = xe_bo_create(xe, NULL, vm, args->size, ttm_bo_type_device,
> - bo_flags);
> + bo = xe_bo_create_user(xe, NULL, vm, args->size,
> + args->smem_cpu_caching, args->coh_mode,
> + ttm_bo_type_device,
> + bo_flags);
> if (IS_ERR(bo)) {
> err = PTR_ERR(bo);
> goto out_vm;
> @@ -2105,10 +2174,12 @@ int xe_bo_dumb_create(struct drm_file *file_priv,
> args->size = ALIGN(mul_u32_u32(args->pitch, args->height),
> page_size);
>
> - bo = xe_bo_create(xe, NULL, NULL, args->size, ttm_bo_type_device,
> - XE_BO_CREATE_VRAM_IF_DGFX(xe_device_get_root_tile(xe)) |
> - XE_BO_CREATE_USER_BIT | XE_BO_SCANOUT_BIT |
> - XE_BO_NEEDS_CPU_ACCESS);
> + bo = xe_bo_create_user(xe, NULL, NULL, args->size,
> + XE_GEM_CPU_CACHING_WC, XE_GEM_COH_NONE,
> + ttm_bo_type_device,
> + XE_BO_CREATE_VRAM_IF_DGFX(xe_device_get_root_tile(xe)) |
> + XE_BO_CREATE_USER_BIT | XE_BO_SCANOUT_BIT |
> + XE_BO_NEEDS_CPU_ACCESS);
> if (IS_ERR(bo))
> return PTR_ERR(bo);
>
> diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
> index 4a68d869b3b5..4a0ee81fe598 100644
> --- a/drivers/gpu/drm/xe/xe_bo.h
> +++ b/drivers/gpu/drm/xe/xe_bo.h
> @@ -81,9 +81,10 @@ struct sg_table;
> struct xe_bo *xe_bo_alloc(void);
> void xe_bo_free(struct xe_bo *bo);
>
> -struct xe_bo *__xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
> +struct xe_bo *___xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
> struct xe_tile *tile, struct dma_resv *resv,
> struct ttm_lru_bulk_move *bulk, size_t size,
> + u16 smem_cpu_caching, u16 coh_mode,
> enum ttm_bo_type type, u32 flags);
> struct xe_bo *
> xe_bo_create_locked_range(struct xe_device *xe,
> diff --git a/drivers/gpu/drm/xe/xe_bo_types.h b/drivers/gpu/drm/xe/xe_bo_types.h
> index 2ea9ad423170..9bee220a6872 100644
> --- a/drivers/gpu/drm/xe/xe_bo_types.h
> +++ b/drivers/gpu/drm/xe/xe_bo_types.h
> @@ -68,6 +68,16 @@ struct xe_bo {
> struct llist_node freed;
> /** @created: Whether the bo has passed initial creation */
> bool created;
> + /**
> + * @coh_mode: Coherency setting. Currently only used for userspace
> + * objects.
> + */
> + u16 coh_mode;
> + /**
> + * @smem_cpu_caching: Caching mode for smem. Currently only used for
> + * userspace objects.
> + */
> + u16 smem_cpu_caching;
> };
>
> #define intel_bo_to_drm_bo(bo) (&(bo)->ttm.base)
> diff --git a/drivers/gpu/drm/xe/xe_dma_buf.c b/drivers/gpu/drm/xe/xe_dma_buf.c
> index 09343b8b3e96..ac20dbc27a2b 100644
> --- a/drivers/gpu/drm/xe/xe_dma_buf.c
> +++ b/drivers/gpu/drm/xe/xe_dma_buf.c
> @@ -200,8 +200,9 @@ xe_dma_buf_init_obj(struct drm_device *dev, struct xe_bo *storage,
> int ret;
>
> dma_resv_lock(resv, NULL);
> - bo = __xe_bo_create_locked(xe, storage, NULL, resv, NULL, dma_buf->size,
> - ttm_bo_type_sg, XE_BO_CREATE_SYSTEM_BIT);
> + bo = ___xe_bo_create_locked(xe, storage, NULL, resv, NULL, dma_buf->size,
> + 0, 0, /* Will require 1way or 2way for vm_bind */
> + ttm_bo_type_sg, XE_BO_CREATE_SYSTEM_BIT);
> if (IS_ERR(bo)) {
> ret = PTR_ERR(bo);
> goto error;
> diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
> index 00d5cb4ef85e..737bb1d4c6f7 100644
> --- a/include/uapi/drm/xe_drm.h
> +++ b/include/uapi/drm/xe_drm.h
> @@ -456,8 +456,61 @@ struct drm_xe_gem_create {
> */
> __u32 handle;
>
> - /** @pad: MBZ */
> - __u32 pad;
> + /**
> + * @coh_mode: The coherency mode for this object. This will limit the
> + * possible @smem_caching values.
> + *
> + * Supported values:
> + *
> + * XE_GEM_COH_NONE: GPU access is assumed to be not coherent with
> + * CPU. CPU caches are not snooped.
> + *
> + * XE_GEM_COH_AT_LEAST_1WAY:
> + *
> + * CPU-GPU coherency must be at least 1WAY.
> + *
> + * If 1WAY then GPU access is coherent with CPU (CPU caches are snooped)
> + * until GPU acquires. The acquire by the GPU is not tracked by CPU
> + * caches.
> + *
> + * If 2WAY then should be fully coherent between GPU and CPU. Fully
> + * tracked by CPU caches. Both CPU and GPU caches are snooped.
> + *
> + * Note: On dgpu the GPU device never caches system memory (outside of
> + * the special system-memory-read-only cache, which is anyway flushed by
> + * KMD when nuking TLBs for a given object so should be no concern to
> + * userspace). The device should be thought of as always 1WAY coherent,
> + * with the addition that the GPU never caches system memory. At least
> + * on current dgpu HW there is no way to turn off snooping so likely the
> + * different coherency modes of the pat_index make no difference for
> + * system memory.
> + */
> +#define XE_GEM_COH_NONE 1
> +#define XE_GEM_COH_AT_LEAST_1WAY 2
> + __u16 coh_mode;
> +
> + /**
> + * @smem_cpu_caching: The CPU caching mode to select for system memory.
> + *
> + * Supported values:
> + *
> + * XE_GEM_CPU_CACHING_WB: Allocate the pages with write-back caching.
> + * On iGPU this can't be used for scanout surfaces. The @coh_mode must
> + * be XE_GEM_COH_AT_LEAST_1WAY.
> + *
> + * XE_GEM_CPU_CACHING_WC: Allocate the pages as write-combined. This is
> + * uncached. Any @coh_mode is permitted. Scanout surfaces should likely
> + * use this.
> + *
> + * XE_GEM_CPU_CACHING_UC: Allocate the pages as uncached. Any @coh_mode
> + * is permitted. Scanout surfaces are permitted to use this.
> + *
> + * MUST be left as zero for VRAM-only objects.
> + */
> +#define XE_GEM_CPU_CACHING_WB 1
> +#define XE_GEM_CPU_CACHING_WC 2
> +#define XE_GEM_CPU_CACHING_UC 3
> + __u16 smem_cpu_caching;
>
> /** @reserved: Reserved */
> __u64 reserved[2];
* [Intel-xe] ✗ CI.Patch_applied: failure for PAT and cache coherency support (rev3)
2023-09-14 15:31 [Intel-xe] [PATCH v2 0/6] PAT and cache coherency support Matthew Auld
` (7 preceding siblings ...)
2023-09-18 15:51 ` [Intel-xe] [PATCH v2 0/6] PAT and cache coherency support Souza, Jose
@ 2023-09-21 20:10 ` Patchwork
8 siblings, 0 replies; 28+ messages in thread
From: Patchwork @ 2023-09-21 20:10 UTC (permalink / raw)
To: Souza, Jose; +Cc: intel-xe
== Series Details ==
Series: PAT and cache coherency support (rev3)
URL : https://patchwork.freedesktop.org/series/123027/
State : failure
== Summary ==
=== Applying kernel patches on branch 'drm-xe-next' with base: ===
Base commit: 7e0d40e7a fixup! drm/xe/display: Implement display support
=== git am output follows ===
error: patch failed: drivers/gpu/drm/xe/xe_bo.c:1796
error: drivers/gpu/drm/xe/xe_bo.c: patch does not apply
hint: Use 'git am --show-current-patch' to see the failed patch
Applying: drm/xe/uapi: Add support for cache and coherency mode
Patch failed at 0001 drm/xe/uapi: Add support for cache and coherency mode
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".
* Re: [Intel-xe] [PATCH v2 1/6] drm/xe/uapi: Add support for cache and coherency mode
2023-09-21 20:07 ` Souza, Jose
@ 2023-09-25 8:06 ` Matthew Auld
2023-09-25 18:26 ` Souza, Jose
0 siblings, 1 reply; 28+ messages in thread
From: Matthew Auld @ 2023-09-25 8:06 UTC (permalink / raw)
To: Souza, Jose, intel-xe@lists.freedesktop.org
On 21/09/2023 21:07, Souza, Jose wrote:
> On Thu, 2023-09-14 at 16:31 +0100, Matthew Auld wrote:
>> From: Pallavi Mishra <pallavi.mishra@intel.com>
>>
>> Allow userspace to specify the CPU caching mode to use for system memory
>> in addition to coherency modes during object creation. Modify gem create
>> handler and introduce xe_bo_create_user to replace xe_bo_create. In a
>> later patch we will support setting the pat_index as part of vm_bind,
>> where expectation is that the coherency mode extracted from the
>> pat_index must match the one set at object creation.
>>
>> v2
>> - s/smem_caching/smem_cpu_caching/ and
>> s/XE_GEM_CACHING/XE_GEM_CPU_CACHING/. (Matt Roper)
>> - Drop COH_2WAY and just use COH_NONE + COH_AT_LEAST_1WAY; KMD mostly
>> just cares that zeroing/swap-in can't be bypassed with the given
>> smem_caching mode. (Matt Roper)
>> - Fix broken range check for coh_mode and smem_cpu_caching and also
>> don't use constant value, but the already defined macros. (José)
>> - Prefer switch statement for smem_cpu_caching -> ttm_caching. (José)
>> - Add note in kernel-doc for dgpu and coherency modes for system
>> memory. (José)
>>
>> Signed-off-by: Pallavi Mishra <pallavi.mishra@intel.com>
>> Co-authored-by: Matthew Auld <matthew.auld@intel.com>
>> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
>> Cc: Lucas De Marchi <lucas.demarchi@intel.com>
>> Cc: Matt Roper <matthew.d.roper@intel.com>
>> Cc: José Roberto de Souza <jose.souza@intel.com>
>> Cc: Filip Hazubski <filip.hazubski@intel.com>
>> Cc: Carl Zhang <carl.zhang@intel.com>
>> Cc: Effie Yu <effie.yu@intel.com>
>> ---
>> drivers/gpu/drm/xe/xe_bo.c | 105 ++++++++++++++++++++++++++-----
>> drivers/gpu/drm/xe/xe_bo.h | 3 +-
>> drivers/gpu/drm/xe/xe_bo_types.h | 10 +++
>> drivers/gpu/drm/xe/xe_dma_buf.c | 5 +-
>> include/uapi/drm/xe_drm.h | 57 ++++++++++++++++-
>> 5 files changed, 158 insertions(+), 22 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
>> index 27726d4f3423..f3facd788f15 100644
>> --- a/drivers/gpu/drm/xe/xe_bo.c
>> +++ b/drivers/gpu/drm/xe/xe_bo.c
>> @@ -325,7 +325,7 @@ static struct ttm_tt *xe_ttm_tt_create(struct ttm_buffer_object *ttm_bo,
>> struct xe_device *xe = xe_bo_device(bo);
>> struct xe_ttm_tt *tt;
>> unsigned long extra_pages;
>> - enum ttm_caching caching = ttm_cached;
>> + enum ttm_caching caching;
>> int err;
>>
>> tt = kzalloc(sizeof(*tt), GFP_KERNEL);
>> @@ -339,13 +339,25 @@ static struct ttm_tt *xe_ttm_tt_create(struct ttm_buffer_object *ttm_bo,
>> extra_pages = DIV_ROUND_UP(xe_device_ccs_bytes(xe, bo->size),
>> PAGE_SIZE);
>>
>> + switch (bo->smem_cpu_caching) {
>> + case XE_GEM_CPU_CACHING_WC:
>> + caching = ttm_write_combined;
>> + break;
>> + case XE_GEM_CPU_CACHING_UC:
>> + caching = ttm_uncached;
>> + break;
>> + default:
>> + caching = ttm_cached;
>> + break;
>> + }
>> +
>> /*
>> * Display scanout is always non-coherent with the CPU cache.
>> *
>> * For Xe_LPG and beyond, PPGTT PTE lookups are also non-coherent and
>> * require a CPU:WC mapping.
>> */
>> - if (bo->flags & XE_BO_SCANOUT_BIT ||
>> + if ((!bo->smem_cpu_caching && bo->flags & XE_BO_SCANOUT_BIT) ||
>> (xe->info.graphics_verx100 >= 1270 && bo->flags & XE_BO_PAGETABLE))
>> caching = ttm_write_combined;
>>
>> @@ -1184,9 +1196,10 @@ void xe_bo_free(struct xe_bo *bo)
>> kfree(bo);
>> }
>>
>> -struct xe_bo *__xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
>> +struct xe_bo *___xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
>> struct xe_tile *tile, struct dma_resv *resv,
>> struct ttm_lru_bulk_move *bulk, size_t size,
>> + u16 smem_cpu_caching, u16 coh_mode,
>> enum ttm_bo_type type, u32 flags)
>> {
>> struct ttm_operation_ctx ctx = {
>> @@ -1224,6 +1237,8 @@ struct xe_bo *__xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
>> bo->tile = tile;
>> bo->size = size;
>> bo->flags = flags;
>> + bo->smem_cpu_caching = smem_cpu_caching;
>> + bo->coh_mode = coh_mode;
>> bo->ttm.base.funcs = &xe_gem_object_funcs;
>> bo->props.preferred_mem_class = XE_BO_PROPS_INVALID;
>> bo->props.preferred_gt = XE_BO_PROPS_INVALID;
>> @@ -1307,10 +1322,11 @@ static int __xe_bo_fixed_placement(struct xe_device *xe,
>> }
>>
>> struct xe_bo *
>> -xe_bo_create_locked_range(struct xe_device *xe,
>> - struct xe_tile *tile, struct xe_vm *vm,
>> - size_t size, u64 start, u64 end,
>> - enum ttm_bo_type type, u32 flags)
>> +__xe_bo_create_locked(struct xe_device *xe,
>> + struct xe_tile *tile, struct xe_vm *vm,
>> + size_t size, u64 start, u64 end,
>> + u16 smem_cpu_caching, u16 coh_mode,
>> + enum ttm_bo_type type, u32 flags)
>> {
>> struct xe_bo *bo = NULL;
>> int err;
>> @@ -1331,10 +1347,11 @@ xe_bo_create_locked_range(struct xe_device *xe,
>> }
>> }
>>
>> - bo = __xe_bo_create_locked(xe, bo, tile, vm ? &vm->resv : NULL,
>> + bo = ___xe_bo_create_locked(xe, bo, tile, vm ? &vm->resv : NULL,
>> vm && !xe_vm_in_fault_mode(vm) &&
>> flags & XE_BO_CREATE_USER_BIT ?
>> &vm->lru_bulk_move : NULL, size,
>> + smem_cpu_caching, coh_mode,
>> type, flags);
>> if (IS_ERR(bo))
>> return bo;
>> @@ -1368,11 +1385,35 @@ xe_bo_create_locked_range(struct xe_device *xe,
>> return ERR_PTR(err);
>> }
>>
>> +struct xe_bo *
>> +xe_bo_create_locked_range(struct xe_device *xe,
>> + struct xe_tile *tile, struct xe_vm *vm,
>> + size_t size, u64 start, u64 end,
>> + enum ttm_bo_type type, u32 flags)
>> +{
>> + return __xe_bo_create_locked(xe, tile, vm, size, 0, ~0ULL, 0, 0, type, flags);
>> +}
>> +
>> struct xe_bo *xe_bo_create_locked(struct xe_device *xe, struct xe_tile *tile,
>> struct xe_vm *vm, size_t size,
>> enum ttm_bo_type type, u32 flags)
>> {
>> - return xe_bo_create_locked_range(xe, tile, vm, size, 0, ~0ULL, type, flags);
>> + return __xe_bo_create_locked(xe, tile, vm, size, 0, ~0ULL, 0, 0, type, flags);
>> +}
>> +
>> +static struct xe_bo *xe_bo_create_user(struct xe_device *xe, struct xe_tile *tile,
>> + struct xe_vm *vm, size_t size,
>> + u16 smem_cpu_caching, u16 coh_mode,
>> + enum ttm_bo_type type,
>> + u32 flags)
>> +{
>> + struct xe_bo *bo = __xe_bo_create_locked(xe, tile, vm, size, 0, ~0ULL,
>> + smem_cpu_caching, coh_mode, type,
>> + flags | XE_BO_CREATE_USER_BIT);
>> + if (!IS_ERR(bo))
>> + xe_bo_unlock_vm_held(bo);
>> +
>> + return bo;
>> }
>>
>> struct xe_bo *xe_bo_create(struct xe_device *xe, struct xe_tile *tile,
>> @@ -1755,11 +1796,11 @@ int xe_gem_create_ioctl(struct drm_device *dev, void *data,
>> struct drm_xe_gem_create *args = data;
>> struct xe_vm *vm = NULL;
>> struct xe_bo *bo;
>> - unsigned int bo_flags = XE_BO_CREATE_USER_BIT;
>> + unsigned int bo_flags;
>> u32 handle;
>> int err;
>>
>> - if (XE_IOCTL_DBG(xe, args->extensions) || XE_IOCTL_DBG(xe, args->pad) ||
>> + if (XE_IOCTL_DBG(xe, args->extensions) ||
>> XE_IOCTL_DBG(xe, args->reserved[0] || args->reserved[1]))
>> return -EINVAL;
>>
>> @@ -1801,6 +1842,32 @@ int xe_gem_create_ioctl(struct drm_device *dev, void *data,
>> bo_flags |= XE_BO_NEEDS_CPU_ACCESS;
>> }
>>
>> + if (XE_IOCTL_DBG(xe, args->coh_mode > XE_GEM_COH_AT_LEAST_1WAY))
>> + return -EINVAL;
>> +
>> + if (XE_IOCTL_DBG(xe, args->smem_cpu_caching > XE_GEM_CPU_CACHING_UC))
>> + return -EINVAL;
>> +
>> + if (bo_flags & XE_BO_CREATE_SYSTEM_BIT) {
>> + if (XE_IOCTL_DBG(xe, !args->coh_mode))
>> + return -EINVAL;
>> +
>> + if (XE_IOCTL_DBG(xe, !args->smem_cpu_caching))
>> + return -EINVAL;
>> +
>> + if (XE_IOCTL_DBG(xe, !IS_DGFX(xe) &&
>> + bo_flags & XE_BO_SCANOUT_BIT &&
>> + args->smem_cpu_caching == XE_GEM_CPU_CACHING_WB))
>> + return -EINVAL;
>> +
>> + if (args->coh_mode == XE_GEM_COH_NONE) {
>> + if (XE_IOCTL_DBG(xe, args->smem_cpu_caching == XE_GEM_CPU_CACHING_WB))
>> + return -EINVAL;
>> + }
>> + } else if (XE_IOCTL_DBG(xe, args->smem_cpu_caching)) {
>
> should be XE_IOCTL_DBG(xe, !args->smem_cpu_caching).
>
> The uAPI doesn't say anything about allowing smem_cpu_caching or coh_mode == 0; I did this to be able to run tests without display on DG2:
The above check is for VRAM-only objects. For smem_cpu_caching the
kernel-doc says: "MUST be left as zero for VRAM-only objects."
Internally the KMD uses WC for CPU mappings of VRAM, which is out of
userspace's control.
coh_mode == 0 is not meant to be allowed, but it looks like I missed the
check here for the VRAM-only case. Will fix.
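For reference, the intended validation (including the missing coh_mode check for the VRAM-only case) could be sketched in a standalone, testable form roughly like this. This is a simplified stand-in with illustrative plain-C types, not the actual kernel code:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins for the uAPI values in this series. */
#define XE_GEM_COH_NONE          1
#define XE_GEM_COH_AT_LEAST_1WAY 2
#define XE_GEM_CPU_CACHING_WB    1
#define XE_GEM_CPU_CACHING_WC    2
#define XE_GEM_CPU_CACHING_UC    3

/*
 * Returns 0 on success, -22 (-EINVAL) on an invalid combination.
 * is_system: the object can be placed in system memory.
 * Mirrors the gem_create checks discussed above, with the missing
 * "coh_mode must be non-zero" rule applied to VRAM-only objects too.
 */
int check_create_args(bool is_system, bool is_dgfx, bool is_scanout,
                      int coh_mode, int smem_cpu_caching)
{
    /* coh_mode is mandatory for all objects. */
    if (coh_mode < XE_GEM_COH_NONE || coh_mode > XE_GEM_COH_AT_LEAST_1WAY)
        return -22;

    if (is_system) {
        if (!smem_cpu_caching || smem_cpu_caching > XE_GEM_CPU_CACHING_UC)
            return -22;
        /* On iGPU, scanout surfaces cannot be WB cached on the CPU. */
        if (!is_dgfx && is_scanout &&
            smem_cpu_caching == XE_GEM_CPU_CACHING_WB)
            return -22;
        /* Non-coherent objects cannot use WB CPU caching. */
        if (coh_mode == XE_GEM_COH_NONE &&
            smem_cpu_caching == XE_GEM_CPU_CACHING_WB)
            return -22;
    } else if (smem_cpu_caching) {
        /* smem_cpu_caching MUST be left as zero for VRAM-only objects. */
        return -22;
    }
    return 0;
}
```

The VRAM-only branch keeps the original `args->smem_cpu_caching` check (not the negated form), matching the kernel-doc quoted above.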
>
>
> diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
> index f3facd788f152..e0e4fefcd2060 100644
> --- a/drivers/gpu/drm/xe/xe_bo.c
> +++ b/drivers/gpu/drm/xe/xe_bo.c
> @@ -1796,7 +1796,7 @@ int xe_gem_create_ioctl(struct drm_device *dev, void *data,
> struct drm_xe_gem_create *args = data;
> struct xe_vm *vm = NULL;
> struct xe_bo *bo;
> - unsigned int bo_flags;
> + unsigned int bo_flags = 0;
> u32 handle;
> int err;
>
> @@ -1842,19 +1842,15 @@ int xe_gem_create_ioctl(struct drm_device *dev, void *data,
> bo_flags |= XE_BO_NEEDS_CPU_ACCESS;
> }
>
> - if (XE_IOCTL_DBG(xe, args->coh_mode > XE_GEM_COH_AT_LEAST_1WAY))
> + if (XE_IOCTL_DBG(xe, args->coh_mode > XE_GEM_COH_AT_LEAST_1WAY) ||
> + XE_IOCTL_DBG(xe, !args->coh_mode))
> return -EINVAL;
>
> - if (XE_IOCTL_DBG(xe, args->smem_cpu_caching > XE_GEM_CPU_CACHING_UC))
> + if (XE_IOCTL_DBG(xe, args->smem_cpu_caching > XE_GEM_CPU_CACHING_UC) ||
> + XE_IOCTL_DBG(xe, !args->smem_cpu_caching))
> return -EINVAL;
>
> if (bo_flags & XE_BO_CREATE_SYSTEM_BIT) {
> - if (XE_IOCTL_DBG(xe, !args->coh_mode))
> - return -EINVAL;
> -
> - if (XE_IOCTL_DBG(xe, !args->smem_cpu_caching))
> - return -EINVAL;
> -
> if (XE_IOCTL_DBG(xe, !IS_DGFX(xe) &&
> bo_flags & XE_BO_SCANOUT_BIT &&
> args->smem_cpu_caching == XE_GEM_CPU_CACHING_WB))
> @@ -1864,8 +1860,6 @@ int xe_gem_create_ioctl(struct drm_device *dev, void *data,
> if (XE_IOCTL_DBG(xe, args->smem_cpu_caching == XE_GEM_CPU_CACHING_WB))
> return -EINVAL;
> }
> - } else if (XE_IOCTL_DBG(xe, args->smem_cpu_caching)) {
> - return -EINVAL;
> }
>
> if (args->vm_id) {
>
>
>> + return -EINVAL;
>> + }
>> +
>> if (args->vm_id) {
>> vm = xe_vm_lookup(xef, args->vm_id);
>> if (XE_IOCTL_DBG(xe, !vm))
>> @@ -1812,8 +1879,10 @@ int xe_gem_create_ioctl(struct drm_device *dev, void *data,
>> }
>> }
>>
>> - bo = xe_bo_create(xe, NULL, vm, args->size, ttm_bo_type_device,
>> - bo_flags);
>> + bo = xe_bo_create_user(xe, NULL, vm, args->size,
>> + args->smem_cpu_caching, args->coh_mode,
>> + ttm_bo_type_device,
>> + bo_flags);
>> if (IS_ERR(bo)) {
>> err = PTR_ERR(bo);
>> goto out_vm;
>> @@ -2105,10 +2174,12 @@ int xe_bo_dumb_create(struct drm_file *file_priv,
>> args->size = ALIGN(mul_u32_u32(args->pitch, args->height),
>> page_size);
>>
>> - bo = xe_bo_create(xe, NULL, NULL, args->size, ttm_bo_type_device,
>> - XE_BO_CREATE_VRAM_IF_DGFX(xe_device_get_root_tile(xe)) |
>> - XE_BO_CREATE_USER_BIT | XE_BO_SCANOUT_BIT |
>> - XE_BO_NEEDS_CPU_ACCESS);
>> + bo = xe_bo_create_user(xe, NULL, NULL, args->size,
>> + XE_GEM_CPU_CACHING_WC, XE_GEM_COH_NONE,
>> + ttm_bo_type_device,
>> + XE_BO_CREATE_VRAM_IF_DGFX(xe_device_get_root_tile(xe)) |
>> + XE_BO_CREATE_USER_BIT | XE_BO_SCANOUT_BIT |
>> + XE_BO_NEEDS_CPU_ACCESS);
>> if (IS_ERR(bo))
>> return PTR_ERR(bo);
>>
>> diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
>> index 4a68d869b3b5..4a0ee81fe598 100644
>> --- a/drivers/gpu/drm/xe/xe_bo.h
>> +++ b/drivers/gpu/drm/xe/xe_bo.h
>> @@ -81,9 +81,10 @@ struct sg_table;
>> struct xe_bo *xe_bo_alloc(void);
>> void xe_bo_free(struct xe_bo *bo);
>>
>> -struct xe_bo *__xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
>> +struct xe_bo *___xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
>> struct xe_tile *tile, struct dma_resv *resv,
>> struct ttm_lru_bulk_move *bulk, size_t size,
>> + u16 smem_cpu_caching, u16 coh_mode,
>> enum ttm_bo_type type, u32 flags);
>> struct xe_bo *
>> xe_bo_create_locked_range(struct xe_device *xe,
>> diff --git a/drivers/gpu/drm/xe/xe_bo_types.h b/drivers/gpu/drm/xe/xe_bo_types.h
>> index 2ea9ad423170..9bee220a6872 100644
>> --- a/drivers/gpu/drm/xe/xe_bo_types.h
>> +++ b/drivers/gpu/drm/xe/xe_bo_types.h
>> @@ -68,6 +68,16 @@ struct xe_bo {
>> struct llist_node freed;
>> /** @created: Whether the bo has passed initial creation */
>> bool created;
>> + /**
>> + * @coh_mode: Coherency setting. Currently only used for userspace
>> + * objects.
>> + */
>> + u16 coh_mode;
>> + /**
>> + * @smem_cpu_caching: Caching mode for smem. Currently only used for
>> + * userspace objects.
>> + */
>> + u16 smem_cpu_caching;
>> };
>>
>> #define intel_bo_to_drm_bo(bo) (&(bo)->ttm.base)
>> diff --git a/drivers/gpu/drm/xe/xe_dma_buf.c b/drivers/gpu/drm/xe/xe_dma_buf.c
>> index 09343b8b3e96..ac20dbc27a2b 100644
>> --- a/drivers/gpu/drm/xe/xe_dma_buf.c
>> +++ b/drivers/gpu/drm/xe/xe_dma_buf.c
>> @@ -200,8 +200,9 @@ xe_dma_buf_init_obj(struct drm_device *dev, struct xe_bo *storage,
>> int ret;
>>
>> dma_resv_lock(resv, NULL);
>> - bo = __xe_bo_create_locked(xe, storage, NULL, resv, NULL, dma_buf->size,
>> - ttm_bo_type_sg, XE_BO_CREATE_SYSTEM_BIT);
>> + bo = ___xe_bo_create_locked(xe, storage, NULL, resv, NULL, dma_buf->size,
>> + 0, 0, /* Will require 1way or 2way for vm_bind */
>> + ttm_bo_type_sg, XE_BO_CREATE_SYSTEM_BIT);
>> if (IS_ERR(bo)) {
>> ret = PTR_ERR(bo);
>> goto error;
>> diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
>> index 00d5cb4ef85e..737bb1d4c6f7 100644
>> --- a/include/uapi/drm/xe_drm.h
>> +++ b/include/uapi/drm/xe_drm.h
>> @@ -456,8 +456,61 @@ struct drm_xe_gem_create {
>> */
>> __u32 handle;
>>
>> - /** @pad: MBZ */
>> - __u32 pad;
>> + /**
>> + * @coh_mode: The coherency mode for this object. This will limit the
>> + * possible @smem_caching values.
>> + *
>> + * Supported values:
>> + *
>> + * XE_GEM_COH_NONE: GPU access is assumed to be not coherent with
>> + * CPU. CPU caches are not snooped.
>> + *
>> + * XE_GEM_COH_AT_LEAST_1WAY:
>> + *
>> + * CPU-GPU coherency must be at least 1WAY.
>> + *
>> + * If 1WAY then GPU access is coherent with CPU (CPU caches are snooped)
>> + * until GPU acquires. The acquire by the GPU is not tracked by CPU
>> + * caches.
>> + *
>> + * If 2WAY then should be fully coherent between GPU and CPU. Fully
>> + * tracked by CPU caches. Both CPU and GPU caches are snooped.
>> + *
>> + * Note: On dgpu the GPU device never caches system memory (outside of
>> + * the special system-memory-read-only cache, which is anyway flushed by
>> + * KMD when nuking TLBs for a given object, so it should be of no concern to
>> + * userspace). The device should be thought of as always 1WAY coherent,
>> + * with the addition that the GPU never caches system memory. At least
>> + * on current dgpu HW there is no way to turn off snooping so likely the
>> + * different coherency modes of the pat_index make no difference for
>> + * system memory.
>> + */
>> +#define XE_GEM_COH_NONE 1
>> +#define XE_GEM_COH_AT_LEAST_1WAY 2
>> + __u16 coh_mode;
>> +
>> + /**
>> + * @smem_cpu_caching: The CPU caching mode to select for system memory.
>> + *
>> + * Supported values:
>> + *
>> + * XE_GEM_CPU_CACHING_WB: Allocate the pages with write-back caching.
>> + * On iGPU this can't be used for scanout surfaces. The @coh_mode must
>> + * be XE_GEM_COH_AT_LEAST_1WAY.
>> + *
>> + * XE_GEM_CPU_CACHING_WC: Allocate the pages as write-combined. This is
>> + * uncached. Any @coh_mode is permitted. Scanout surfaces should likely
>> + * use this.
>> + *
>> + * XE_GEM_CPU_CACHING_UC: Allocate the pages as uncached. Any @coh_mode
>> + * is permitted. Scanout surfaces are permitted to use this.
>> + *
>> + * MUST be left as zero for VRAM-only objects.
>> + */
>> +#define XE_GEM_CPU_CACHING_WB 1
>> +#define XE_GEM_CPU_CACHING_WC 2
>> +#define XE_GEM_CPU_CACHING_UC 3
>> + __u16 smem_cpu_caching;
>>
>> /** @reserved: Reserved */
>> __u64 reserved[2];
>
* Re: [Intel-xe] [PATCH v2 6/6] drm/xe/uapi: support pat_index selection with vm_bind
2023-09-15 22:24 ` Matt Roper
@ 2023-09-25 8:07 ` Matthew Auld
0 siblings, 0 replies; 28+ messages in thread
From: Matthew Auld @ 2023-09-25 8:07 UTC (permalink / raw)
To: Matt Roper
Cc: Effie Yu, Filip Hazubski, Lucas De Marchi, intel-xe, Carl Zhang
On 15/09/2023 23:24, Matt Roper wrote:
> On Thu, Sep 14, 2023 at 04:31:19PM +0100, Matthew Auld wrote:
>> Allow userspace to directly control the pat_index for a given vm
>> binding. This should allow directly controlling the coherency, caching
>> and potentially other stuff in the future for the ppGTT binding.
>>
>> The exact meaning behind the pat_index is very platform specific (see
>> BSpec or PRMs) but effectively maps to some predefined memory
>> attributes. From the KMD pov we only care about the coherency that is
>> provided by the pat_index, which falls into either NONE, 1WAY or 2WAY.
>> The vm_bind coherency mode for the given pat_index needs to be at least
>> as coherent as the coh_mode that was set at object creation. For
>> platforms that lack the explicit coherency mode, we treat UC/WT/WC as
>> NONE and WB as AT_LEAST_1WAY.
>>
>> For userptr mappings we lack a corresponding gem object, so the expected
>> coherency mode is instead implicit and must fall into either 1WAY or
>> 2WAY. Trying to use NONE will be rejected by the kernel. For imported
>> dma-buf (from a different device) the coherency mode is also implicit
>> and must also be either 1WAY or 2WAY, i.e. AT_LEAST_1WAY.
>>
>> As part of adding pat_index support with vm_bind we also need to stop using
>> xe_cache_level and instead use the pat_index in various places. We still
>> make use of xe_cache_level, but only as a convenience for kernel
>> internal objects (internally it maps to some reasonable pat_index). For
>
> It feels like the internal refactoring to use pat index directly in PTE
> encoding should probably be in a separate patch from the vm_bind uapi
> changes that allow userspace to specify a PAT index.
Yeah, agreed. Will fix.
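As a side note, the pre-MTL bucketing of legacy cache levels into coherency modes described in the commit message above (UC/WC/WT treated as NONE, WB as AT_LEAST_1WAY) amounts to something like the following sketch. The names here are illustrative, not the actual kernel helpers:

```c
#include <assert.h>

/* Illustrative values mirroring the uAPI in this series. */
#define XE_GEM_COH_NONE          1
#define XE_GEM_COH_AT_LEAST_1WAY 2

/* Legacy cache levels on pre-MTL platforms (illustrative enum). */
enum legacy_cache_level { CACHE_UC, CACHE_WC, CACHE_WT, CACHE_WB };

/*
 * Pre-MTL hardware has no explicit coherency mode in the PAT, so the
 * KMD groups the cache levels into coherency buckets: only WB is
 * treated as at-least-1way coherent; UC/WC/WT all map to COH_NONE.
 */
int legacy_coh_mode(enum legacy_cache_level level)
{
    return level == CACHE_WB ? XE_GEM_COH_AT_LEAST_1WAY : XE_GEM_COH_NONE;
}

/*
 * vm_bind rule from this series: the coherency implied by the chosen
 * pat_index must be at least the object's gem_create coh_mode.
 */
int bind_allowed(int obj_coh_mode, enum legacy_cache_level level)
{
    return legacy_coh_mode(level) >= obj_coh_mode;
}
```

So on pre-MTL platforms an object created with AT_LEAST_1WAY can only be bound WB, while a COH_NONE object can be bound with any cache level.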
>
>
> Matt
>
>> now this is just a 1:1 conversion of the existing code, however for
>> platforms like MTL+ we might need to give more control through bo_create
>> or stop using WB on the CPU side if we need CPU access.
>>
>> v2:
>> - Undefined coh_mode(pat_index) can now be treated as programmer error. (Matt Roper)
>> - We now allow gem_create.coh_mode <= coh_mode(pat_index), rather than
>> having to match exactly. This ensures imported dma-buf can always
>> just use 1way (or even 2way), now that we also bundle 1way/2way into
>> at_least_1way. We still require 1way/2way for external dma-buf, but
>> the policy can now be the same for self-import, if desired.
>> - Use u16 for pat_index in uapi. u32 is massive overkill. (José)
>> - Move as much of the pat_index validation as we can into
>> vm_bind_ioctl_check_args. (José)
>>
>> Bspec: 45101, 44235 #xe
>> Bspec: 70552, 71582, 59400 #xe2
>> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>> Cc: Pallavi Mishra <pallavi.mishra@intel.com>
>> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
>> Cc: Lucas De Marchi <lucas.demarchi@intel.com>
>> Cc: Matt Roper <matthew.d.roper@intel.com>
>> Cc: José Roberto de Souza <jose.souza@intel.com>
>> Cc: Filip Hazubski <filip.hazubski@intel.com>
>> Cc: Carl Zhang <carl.zhang@intel.com>
>> Cc: Effie Yu <effie.yu@intel.com>
>> ---
>> drivers/gpu/drm/xe/tests/xe_migrate.c | 2 +-
>> drivers/gpu/drm/xe/xe_ggtt.c | 7 ++-
>> drivers/gpu/drm/xe/xe_ggtt_types.h | 2 +-
>> drivers/gpu/drm/xe/xe_migrate.c | 13 +++--
>> drivers/gpu/drm/xe/xe_pt.c | 22 ++++-----
>> drivers/gpu/drm/xe/xe_pt.h | 4 +-
>> drivers/gpu/drm/xe/xe_vm.c | 69 +++++++++++++++++++++------
>> drivers/gpu/drm/xe/xe_vm_types.h | 10 +++-
>> include/uapi/drm/xe_drm.h | 43 ++++++++++++++++-
>> 9 files changed, 128 insertions(+), 44 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/xe/tests/xe_migrate.c b/drivers/gpu/drm/xe/tests/xe_migrate.c
>> index 6b4388bfbb31..d3bf4751a2d7 100644
>> --- a/drivers/gpu/drm/xe/tests/xe_migrate.c
>> +++ b/drivers/gpu/drm/xe/tests/xe_migrate.c
>> @@ -301,7 +301,7 @@ static void xe_migrate_sanity_test(struct xe_migrate *m, struct kunit *test)
>> /* First part of the test, are we updating our pagetable bo with a new entry? */
>> xe_map_wr(xe, &bo->vmap, XE_PAGE_SIZE * (NUM_KERNEL_PDE - 1), u64,
>> 0xdeaddeadbeefbeef);
>> - expected = xe_pte_encode(m->q->vm, pt, 0, XE_CACHE_WB, 0);
>> + expected = xe_pte_encode(m->q->vm, pt, 0, xe_pat_get_index(xe, XE_CACHE_WB), 0);
>> if (m->q->vm->flags & XE_VM_FLAG_64K)
>> expected |= XE_PTE_PS64;
>> if (xe_bo_is_vram(pt))
>> diff --git a/drivers/gpu/drm/xe/xe_ggtt.c b/drivers/gpu/drm/xe/xe_ggtt.c
>> index aea26afd4668..7e4da16389af 100644
>> --- a/drivers/gpu/drm/xe/xe_ggtt.c
>> +++ b/drivers/gpu/drm/xe/xe_ggtt.c
>> @@ -41,7 +41,8 @@ u64 xe_ggtt_pte_encode(struct xe_bo *bo, u64 bo_offset)
>> pte |= XE_GGTT_PTE_DM;
>>
>> if ((ggtt->pat_encode).pte_encode)
>> - pte = (ggtt->pat_encode).pte_encode(xe, pte, XE_CACHE_WB_1_WAY);
>> + pte = (ggtt->pat_encode).pte_encode(xe, pte,
>> + xe_pat_get_index(xe, XE_CACHE_WB_1_WAY));
>>
>> return pte;
>> }
>> @@ -102,10 +103,8 @@ static void primelockdep(struct xe_ggtt *ggtt)
>> }
>>
>> static u64 xelpg_ggtt_pte_encode_pat(struct xe_device *xe, u64 pte_pat,
>> - enum xe_cache_level cache)
>> + u16 pat_index)
>> {
>> - u32 pat_index = xe_pat_get_index(xe, cache);
>> -
>> pte_pat &= ~(XELPG_GGTT_PTE_PAT_MASK);
>>
>> if (pat_index & BIT(0))
>> diff --git a/drivers/gpu/drm/xe/xe_ggtt_types.h b/drivers/gpu/drm/xe/xe_ggtt_types.h
>> index 7e55fac1a8a9..7981075bb228 100644
>> --- a/drivers/gpu/drm/xe/xe_ggtt_types.h
>> +++ b/drivers/gpu/drm/xe/xe_ggtt_types.h
>> @@ -31,7 +31,7 @@ struct xe_ggtt {
>>
>> struct {
>> u64 (*pte_encode)(struct xe_device *xe, u64 pte_pat,
>> - enum xe_cache_level cache);
>> + u16 pat_index);
>> } pat_encode;
>> };
>>
>> diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
>> index 26cbc9107501..89d9e33a07e7 100644
>> --- a/drivers/gpu/drm/xe/xe_migrate.c
>> +++ b/drivers/gpu/drm/xe/xe_migrate.c
>> @@ -25,6 +25,7 @@
>> #include "xe_lrc.h"
>> #include "xe_map.h"
>> #include "xe_mocs.h"
>> +#include "xe_pat.h"
>> #include "xe_pt.h"
>> #include "xe_res_cursor.h"
>> #include "xe_sched_job.h"
>> @@ -162,6 +163,7 @@ static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m,
>> u32 num_entries = NUM_PT_SLOTS, num_level = vm->pt_root[id]->level;
>> u32 map_ofs, level, i;
>> struct xe_bo *bo, *batch = tile->mem.kernel_bb_pool->bo;
>> + u16 pat_index = xe_pat_get_index(xe, XE_CACHE_WB);
>> u64 entry;
>> int ret;
>>
>> @@ -196,7 +198,7 @@ static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m,
>>
>> /* Map the entire BO in our level 0 pt */
>> for (i = 0, level = 0; i < num_entries; level++) {
>> - entry = xe_pte_encode(vm, bo, i * XE_PAGE_SIZE, XE_CACHE_WB, 0);
>> + entry = xe_pte_encode(vm, bo, i * XE_PAGE_SIZE, pat_index, 0);
>>
>> xe_map_wr(xe, &bo->vmap, map_ofs + level * 8, u64, entry);
>>
>> @@ -214,7 +216,7 @@ static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m,
>> for (i = 0; i < batch->size;
>> i += vm->flags & XE_VM_FLAG_64K ? XE_64K_PAGE_SIZE :
>> XE_PAGE_SIZE) {
>> - entry = xe_pte_encode(vm, batch, i, XE_CACHE_WB, 0);
>> + entry = xe_pte_encode(vm, batch, i, pat_index, 0);
>>
>> xe_map_wr(xe, &bo->vmap, map_ofs + level * 8, u64,
>> entry);
>> @@ -259,7 +261,7 @@ static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m,
>> ofs = map_ofs + XE_PAGE_SIZE * level + 256 * 8;
>>
>> flags = XE_PPGTT_PTE_DM;
>> - flags = __xe_pte_encode(flags, XE_CACHE_WB, vm, NULL, 2);
>> + flags = __xe_pte_encode(flags, pat_index, vm, NULL, 2);
>>
>> /*
>> * Use 1GB pages, it shouldn't matter the physical amount of
>> @@ -454,6 +456,7 @@ static void emit_pte(struct xe_migrate *m,
>> struct xe_res_cursor *cur,
>> u32 size, struct xe_bo *bo)
>> {
>> + u16 pat_index = xe_pat_get_index(m->tile->xe, XE_CACHE_WB);
>> u32 ptes;
>> u64 ofs = at_pt * XE_PAGE_SIZE;
>> u64 cur_ofs;
>> @@ -494,7 +497,7 @@ static void emit_pte(struct xe_migrate *m,
>> addr += vram_region_gpu_offset(bo->ttm.resource);
>> addr |= XE_PPGTT_PTE_DM;
>> }
>> - addr = __xe_pte_encode(addr, XE_CACHE_WB, m->q->vm, NULL, 0);
>> + addr = __xe_pte_encode(addr, pat_index, m->q->vm, NULL, 0);
>> bb->cs[bb->len++] = lower_32_bits(addr);
>> bb->cs[bb->len++] = upper_32_bits(addr);
>>
>> @@ -1254,7 +1257,7 @@ xe_migrate_update_pgtables(struct xe_migrate *m,
>>
>> xe_tile_assert(tile, pt_bo->size == SZ_4K);
>>
>> - addr = xe_pte_encode(vm, pt_bo, 0, XE_CACHE_WB, 0);
>> + addr = xe_pte_encode(vm, pt_bo, 0, xe_pat_get_index(xe, XE_CACHE_WB), 0);
>> bb->cs[bb->len++] = lower_32_bits(addr);
>> bb->cs[bb->len++] = upper_32_bits(addr);
>> }
>> diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
>> index a1b164cf8bce..7dd93cbff704 100644
>> --- a/drivers/gpu/drm/xe/xe_pt.c
>> +++ b/drivers/gpu/drm/xe/xe_pt.c
>> @@ -10,6 +10,7 @@
>> #include "xe_gt.h"
>> #include "xe_gt_tlb_invalidation.h"
>> #include "xe_migrate.h"
>> +#include "xe_pat.h"
>> #include "xe_pt_types.h"
>> #include "xe_pt_walk.h"
>> #include "xe_res_cursor.h"
>> @@ -67,7 +68,7 @@ u64 xe_pde_encode(struct xe_bo *bo, u64 bo_offset)
>> return pde;
>> }
>>
>> -u64 __xe_pte_encode(u64 pte, enum xe_cache_level cache,
>> +u64 __xe_pte_encode(u64 pte, u16 pat_index,
>> struct xe_vm *vm, struct xe_vma *vma, u32 pt_level)
>> {
>> struct xe_device *xe = vm->xe;
>> @@ -85,7 +86,7 @@ u64 __xe_pte_encode(u64 pte, enum xe_cache_level cache,
>> else if (pt_level == 2)
>> pte |= XE_PDPE_PS_1G;
>>
>> - pte = vm->pat_encode.pte_encode(xe, pte, cache);
>> + pte = vm->pat_encode.pte_encode(xe, pte, pat_index);
>>
>> /* XXX: Does hw support 1 GiB pages? */
>> XE_WARN_ON(pt_level > 2);
>> @@ -103,7 +104,7 @@ u64 __xe_pte_encode(u64 pte, enum xe_cache_level cache,
>> *
>> * Return: An encoded page-table entry. No errors.
>> */
>> -u64 xe_pte_encode(struct xe_vm *vm, struct xe_bo *bo, u64 offset, enum xe_cache_level cache,
>> +u64 xe_pte_encode(struct xe_vm *vm, struct xe_bo *bo, u64 offset, u16 pat_index,
>> u32 pt_level)
>> {
>> u64 pte;
>> @@ -112,7 +113,7 @@ u64 xe_pte_encode(struct xe_vm *vm, struct xe_bo *bo, u64 offset, enum xe_cache_
>> if (xe_bo_is_vram(bo) || xe_bo_is_stolen_devmem(bo))
>> pte |= XE_PPGTT_PTE_DM;
>>
>> - return __xe_pte_encode(pte, cache, vm, NULL, pt_level);
>> + return __xe_pte_encode(pte, pat_index, vm, NULL, pt_level);
>> }
>>
>> static u64 __xe_pt_empty_pte(struct xe_tile *tile, struct xe_vm *vm,
>> @@ -125,7 +126,7 @@ static u64 __xe_pt_empty_pte(struct xe_tile *tile, struct xe_vm *vm,
>>
>> if (level == 0) {
>> u64 empty = xe_pte_encode(vm, vm->scratch_bo[id], 0,
>> - XE_CACHE_WB, 0);
>> + xe_pat_get_index(vm->xe, XE_CACHE_WB), 0);
>>
>> return empty;
>> } else {
>> @@ -358,8 +359,6 @@ struct xe_pt_stage_bind_walk {
>> struct xe_vm *vm;
>> /** @tile: The tile we're building for. */
>> struct xe_tile *tile;
>> - /** @cache: Desired cache level for the ptes */
>> - enum xe_cache_level cache;
>> /** @default_pte: PTE flag only template. No address is associated */
>> u64 default_pte;
>> /** @dma_offset: DMA offset to add to the PTE. */
>> @@ -594,7 +593,7 @@ xe_pt_stage_bind_entry(struct xe_ptw *parent, pgoff_t offset,
>>
>> pte = __xe_pte_encode(is_null ? 0 :
>> xe_res_dma(curs) + xe_walk->dma_offset,
>> - xe_walk->cache, xe_walk->vm, xe_walk->vma, level);
>> + xe_walk->vma->pat_index, xe_walk->vm, xe_walk->vma, level);
>> pte |= xe_walk->default_pte;
>>
>> /*
>> @@ -720,13 +719,8 @@ xe_pt_stage_bind(struct xe_tile *tile, struct xe_vma *vma,
>> if (vma && vma->gpuva.flags & XE_VMA_ATOMIC_PTE_BIT)
>> xe_walk.default_pte |= XE_USM_PPGTT_PTE_AE;
>> xe_walk.dma_offset = vram_region_gpu_offset(bo->ttm.resource);
>> - xe_walk.cache = XE_CACHE_WB;
>> - } else {
>> - if (!xe_vma_has_no_bo(vma) && bo->flags & XE_BO_SCANOUT_BIT)
>> - xe_walk.cache = XE_CACHE_WT;
>> - else
>> - xe_walk.cache = XE_CACHE_WB;
>> }
>> +
>> if (!xe_vma_has_no_bo(vma) && xe_bo_is_stolen(bo))
>> xe_walk.dma_offset = xe_ttm_stolen_gpu_offset(xe_bo_device(bo));
>>
>> diff --git a/drivers/gpu/drm/xe/xe_pt.h b/drivers/gpu/drm/xe/xe_pt.h
>> index 0e66436d707d..6d10823fca9b 100644
>> --- a/drivers/gpu/drm/xe/xe_pt.h
>> +++ b/drivers/gpu/drm/xe/xe_pt.h
>> @@ -47,9 +47,9 @@ bool xe_pt_zap_ptes(struct xe_tile *tile, struct xe_vma *vma);
>>
>> u64 xe_pde_encode(struct xe_bo *bo, u64 bo_offset);
>>
>> -u64 xe_pte_encode(struct xe_vm *vm, struct xe_bo *bo, u64 offset, enum xe_cache_level cache,
>> +u64 xe_pte_encode(struct xe_vm *vm, struct xe_bo *bo, u64 offset, u16 pat_index,
>> u32 pt_level);
>> -u64 __xe_pte_encode(u64 pte, enum xe_cache_level cache,
>> +u64 __xe_pte_encode(u64 pte, u16 pat_index,
>> struct xe_vm *vm, struct xe_vma *vma, u32 pt_level);
>>
>> #endif
>> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
>> index ba612a5ee2d8..98db7a298139 100644
>> --- a/drivers/gpu/drm/xe/xe_vm.c
>> +++ b/drivers/gpu/drm/xe/xe_vm.c
>> @@ -6,6 +6,7 @@
>> #include "xe_vm.h"
>>
>> #include <linux/dma-fence-array.h>
>> +#include <linux/nospec.h>
>>
>> #include <drm/drm_exec.h>
>> #include <drm/drm_print.h>
>> @@ -858,7 +859,8 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
>> u64 start, u64 end,
>> bool read_only,
>> bool is_null,
>> - u8 tile_mask)
>> + u8 tile_mask,
>> + u16 pat_index)
>> {
>> struct xe_vma *vma;
>> struct xe_tile *tile;
>> @@ -897,6 +899,8 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
>> vma->tile_mask |= 0x1 << id;
>> }
>>
>> + vma->pat_index = pat_index;
>> +
>> if (vm->xe->info.platform == XE_PVC)
>> vma->gpuva.flags |= XE_VMA_ATOMIC_PTE_BIT;
>>
>> @@ -1195,10 +1199,8 @@ static void xe_vma_op_work_func(struct work_struct *w);
>> static void vm_destroy_work_func(struct work_struct *w);
>>
>> static u64 xe2_ppgtt_pte_encode_pat(struct xe_device *xe, u64 pte_pat,
>> - enum xe_cache_level cache)
>> + u16 pat_index)
>> {
>> - u32 pat_index = xe_pat_get_index(xe, cache);
>> -
>> if (pat_index & BIT(0))
>> pte_pat |= BIT(3);
>>
>> @@ -1216,10 +1218,8 @@ static u64 xe2_ppgtt_pte_encode_pat(struct xe_device *xe, u64 pte_pat,
>> }
>>
>> static u64 xelp_ppgtt_pte_encode_pat(struct xe_device *xe, u64 pte_pat,
>> - enum xe_cache_level cache)
>> + u16 pat_index)
>> {
>> - u32 pat_index = xe_pat_get_index(xe, cache);
>> -
>> if (pat_index & BIT(0))
>> pte_pat |= BIT(3);
>>
>> @@ -2300,7 +2300,7 @@ static void print_op(struct xe_device *xe, struct drm_gpuva_op *op)
>> static struct drm_gpuva_ops *
>> vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_bo *bo,
>> u64 bo_offset_or_userptr, u64 addr, u64 range,
>> - u32 operation, u8 tile_mask, u32 region)
>> + u32 operation, u8 tile_mask, u32 region, u16 pat_index)
>> {
>> struct drm_gem_object *obj = bo ? &bo->ttm.base : NULL;
>> struct drm_gpuva_ops *ops;
>> @@ -2327,6 +2327,7 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_bo *bo,
>> struct xe_vma_op *op = gpuva_op_to_vma_op(__op);
>>
>> op->tile_mask = tile_mask;
>> + op->pat_index = pat_index;
>> op->map.immediate =
>> operation & XE_VM_BIND_FLAG_IMMEDIATE;
>> op->map.read_only =
>> @@ -2354,6 +2355,7 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_bo *bo,
>> struct xe_vma_op *op = gpuva_op_to_vma_op(__op);
>>
>> op->tile_mask = tile_mask;
>> + op->pat_index = pat_index;
>> op->prefetch.region = region;
>> }
>> break;
>> @@ -2396,7 +2398,8 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_bo *bo,
>> }
>>
>> static struct xe_vma *new_vma(struct xe_vm *vm, struct drm_gpuva_op_map *op,
>> - u8 tile_mask, bool read_only, bool is_null)
>> + u8 tile_mask, bool read_only, bool is_null,
>> + u16 pat_index)
>> {
>> struct xe_bo *bo = op->gem.obj ? gem_to_xe_bo(op->gem.obj) : NULL;
>> struct xe_vma *vma;
>> @@ -2412,7 +2415,7 @@ static struct xe_vma *new_vma(struct xe_vm *vm, struct drm_gpuva_op_map *op,
>> vma = xe_vma_create(vm, bo, op->gem.offset,
>> op->va.addr, op->va.addr +
>> op->va.range - 1, read_only, is_null,
>> - tile_mask);
>> + tile_mask, pat_index);
>> if (bo)
>> xe_bo_unlock(bo);
>>
>> @@ -2569,7 +2572,7 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_exec_queue *q,
>>
>> vma = new_vma(vm, &op->base.map,
>> op->tile_mask, op->map.read_only,
>> - op->map.is_null);
>> + op->map.is_null, op->pat_index);
>> if (IS_ERR(vma)) {
>> err = PTR_ERR(vma);
>> goto free_fence;
>> @@ -2597,7 +2600,7 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_exec_queue *q,
>>
>> vma = new_vma(vm, op->base.remap.prev,
>> op->tile_mask, read_only,
>> - is_null);
>> + is_null, op->pat_index);
>> if (IS_ERR(vma)) {
>> err = PTR_ERR(vma);
>> goto free_fence;
>> @@ -2633,7 +2636,7 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_exec_queue *q,
>>
>> vma = new_vma(vm, op->base.remap.next,
>> op->tile_mask, read_only,
>> - is_null);
>> + is_null, op->pat_index);
>> if (IS_ERR(vma)) {
>> err = PTR_ERR(vma);
>> goto free_fence;
>> @@ -3146,7 +3149,23 @@ static int vm_bind_ioctl_check_args(struct xe_device *xe,
>> u32 obj = (*bind_ops)[i].obj;
>> u64 obj_offset = (*bind_ops)[i].obj_offset;
>> u32 region = (*bind_ops)[i].region;
>> + u16 pat_index = (*bind_ops)[i].pat_index;
>> bool is_null = op & XE_VM_BIND_FLAG_NULL;
>> + u16 coh_mode;
>> +
>> + if (XE_IOCTL_DBG(xe, pat_index >= xe->info.pat.n_entries)) {
>> + err = -EINVAL;
>> + goto free_bind_ops;
>> + }
>> +
>> + pat_index = array_index_nospec(pat_index,
>> + xe->info.pat.n_entries);
>> + (*bind_ops)[i].pat_index = pat_index;
>> + coh_mode = xe_pat_index_get_coh_mode(xe, pat_index);
>> + if (XE_WARN_ON(!coh_mode || coh_mode > XE_GEM_COH_AT_LEAST_1WAY)) {
>> + err = -EINVAL;
>> + goto free_bind_ops;
>> + }
>>
>> if (i == 0) {
>> *async = !!(op & XE_VM_BIND_FLAG_ASYNC);
>> @@ -3188,6 +3207,8 @@ static int vm_bind_ioctl_check_args(struct xe_device *xe,
>> VM_BIND_OP(op) == XE_VM_BIND_OP_UNMAP_ALL) ||
>> XE_IOCTL_DBG(xe, obj &&
>> VM_BIND_OP(op) == XE_VM_BIND_OP_MAP_USERPTR) ||
>> + XE_IOCTL_DBG(xe, coh_mode == XE_GEM_COH_NONE &&
>> + VM_BIND_OP(op) == XE_VM_BIND_OP_MAP_USERPTR) ||
>> XE_IOCTL_DBG(xe, obj &&
>> VM_BIND_OP(op) == XE_VM_BIND_OP_PREFETCH) ||
>> XE_IOCTL_DBG(xe, region &&
>> @@ -3336,6 +3357,8 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
>> u64 addr = bind_ops[i].addr;
>> u32 obj = bind_ops[i].obj;
>> u64 obj_offset = bind_ops[i].obj_offset;
>> + u16 pat_index = bind_ops[i].pat_index;
>> + u16 coh_mode;
>>
>> if (!obj)
>> continue;
>> @@ -3363,6 +3386,23 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
>> goto put_obj;
>> }
>> }
>> +
>> + coh_mode = xe_pat_index_get_coh_mode(xe, pat_index);
>> + if (bos[i]->coh_mode) {
>> + if (XE_IOCTL_DBG(xe, coh_mode < bos[i]->coh_mode)) {
>> + err = -EINVAL;
>> + goto put_obj;
>> + }
>> + } else if (XE_IOCTL_DBG(xe, coh_mode == XE_GEM_COH_NONE)) {
>> + /*
>> + * Imported dma-buf from a different device should
>> + * require 1way or 2way coherency since we don't know
>> + * how it was mapped on the CPU. Just assume it is
>> + * potentially cached on the CPU side.
>> + */
>> + err = -EINVAL;
>> + goto put_obj;
>> + }
>> }
>>
>> if (args->num_syncs) {
>> @@ -3400,10 +3440,11 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
>> u64 obj_offset = bind_ops[i].obj_offset;
>> u8 tile_mask = bind_ops[i].tile_mask;
>> u32 region = bind_ops[i].region;
>> + u16 pat_index = bind_ops[i].pat_index;
>>
>> ops[i] = vm_bind_ioctl_ops_create(vm, bos[i], obj_offset,
>> addr, range, op, tile_mask,
>> - region);
>> + region, pat_index);
>> if (IS_ERR(ops[i])) {
>> err = PTR_ERR(ops[i]);
>> ops[i] = NULL;
>> diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
>> index dc583f00919f..54658f400174 100644
>> --- a/drivers/gpu/drm/xe/xe_vm_types.h
>> +++ b/drivers/gpu/drm/xe/xe_vm_types.h
>> @@ -111,6 +111,11 @@ struct xe_vma {
>> */
>> u8 tile_present;
>>
>> + /**
>> + * @pat_index: The pat index to use when encoding the PTEs for this vma.
>> + */
>> + u16 pat_index;
>> +
>> struct {
>> struct list_head rebind_link;
>> } notifier;
>> @@ -338,8 +343,7 @@ struct xe_vm {
>> bool batch_invalidate_tlb;
>>
>> struct {
>> - u64 (*pte_encode)(struct xe_device *xe, u64 pte_pat,
>> - enum xe_cache_level cache);
>> + u64 (*pte_encode)(struct xe_device *xe, u64 pte_pat, u16 pat_index);
>> } pat_encode;
>> };
>>
>> @@ -419,6 +423,8 @@ struct xe_vma_op {
>> struct async_op_fence *fence;
>> /** @tile_mask: gt mask for this operation */
>> u8 tile_mask;
>> + /** @pat_index: The pat index to use for this operation. */
>> + u16 pat_index;
>> /** @flags: operation flags */
>> enum xe_vma_op_flags flags;
>>
>> diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
>> index 737bb1d4c6f7..75b42c1116f2 100644
>> --- a/include/uapi/drm/xe_drm.h
>> +++ b/include/uapi/drm/xe_drm.h
>> @@ -605,8 +605,49 @@ struct drm_xe_vm_bind_op {
>> */
>> __u32 obj;
>>
>> + /**
>> + * @pat_index: The platform defined @pat_index to use for this mapping.
>> + * The index basically maps to some predefined memory attributes,
>> + * including things like caching, coherency, compression etc. The exact
>> + * meaning of the pat_index is platform specific and defined in the
>> + * Bspec and PRMs. When the KMD sets up the binding the index here is
>> + * encoded into the ppGTT PTE.
>> + *
>> + * For coherency the @pat_index needs to be at least as coherent as
>> + * drm_xe_gem_create.coh_mode, i.e. coh_mode(pat_index) >=
>> + * drm_xe_gem_create.coh_mode. The KMD will extract the coherency mode
>> + * from the @pat_index and reject if there is a mismatch (see note below
>> + * for pre-MTL platforms).
>> + *
>> + * Note: On pre-MTL platforms there is only a caching mode and no
>> + * explicit coherency mode, but on such hardware there is always a
>> + * shared-LLC (or it is a dgpu) so all GT memory accesses are coherent with
>> + * CPU caches even with the caching mode set as uncached. It's only the
>> + * display engine that is incoherent (on dgpu it must be in VRAM which
>> + * is always mapped as WC on the CPU). However to keep the uapi somewhat
>> + * consistent with newer platforms the KMD groups the different cache
>> + * levels into the following coherency buckets on all pre-MTL platforms:
>> + *
>> + * ppGTT UC -> XE_GEM_COH_NONE
>> + * ppGTT WC -> XE_GEM_COH_NONE
>> + * ppGTT WT -> XE_GEM_COH_NONE
>> + * ppGTT WB -> XE_GEM_COH_AT_LEAST_1WAY
>> + *
>> + * In practice UC/WC/WT should only ever be used for scanout surfaces on
>> + * such platforms (or perhaps in general for dma-buf if shared with
>> + * another device) since it is only the display engine that is actually
>> + * incoherent. Everything else should typically use WB given that we
>> + * have a shared-LLC. On MTL+ this completely changes and the HW
>> + * defines the coherency mode as part of the @pat_index, where
>> + * incoherent GT access is possible.
>> + *
>> + * Note: For userptr and externally imported dma-buf the kernel expects
>> + * either 1WAY or 2WAY for the @pat_index.
>> + */
>> + __u16 pat_index;
>> +
>> /** @pad: MBZ */
>> - __u32 pad;
>> + __u16 pad;
>>
>> union {
>> /**
>> --
>> 2.41.0
>>
>
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Intel-xe] [PATCH v2 0/6] PAT and cache coherency support
2023-09-21 17:19 ` Souza, Jose
@ 2023-09-25 13:12 ` Matthew Auld
0 siblings, 0 replies; 28+ messages in thread
From: Matthew Auld @ 2023-09-25 13:12 UTC (permalink / raw)
To: Souza, Jose, intel-xe@lists.freedesktop.org
On 21/09/2023 18:19, Souza, Jose wrote:
> On Mon, 2023-09-18 at 15:51 +0000, Souza, Jose wrote:
>> On Thu, 2023-09-14 at 16:31 +0100, Matthew Auld wrote:
>>> Branch available here (lightly tested):
>>> https://gitlab.freedesktop.org/mwa/kernel/-/tree/xe-pat-index?ref_type=heads
>>>
>>> Series still needs some more testing. Also note that the series directly depends
>>> on the WIP patch here: https://patchwork.freedesktop.org/series/122708/
>>>
>>> Goal here is to allow userspace to directly control the pat_index when mapping
>>> memory via the ppGTT, in addition to the CPU caching mode for system memory. This
>>> is very much needed on newer igpu platforms which allow incoherent GT access,
>>> where the choice over the cache level and expected coherency is best left to
>>> userspace depending on their usecase. In the future there may also be other
>>> stuff encoded in the pat_index, so giving userspace direct control will also be
>>> needed there.
>>>
>>> To support this we added new gem_create uAPI for selecting the CPU cache
>>> mode to use for system memory, including the expected GPU coherency mode. There
>>> are various restrictions here for the selected coherency mode and compatible CPU
>>> cache modes. With that in place the actual pat_index can now be provided as
>>> part of vm_bind. The only restriction is that the coherency mode of the
>>> pat_index must be at least as coherent as the gem_create coherency mode. There
>>> are also some special cases like with userptr and dma-buf.
>>>
>>> v2:
>>> - Loads of improvements/tweaks. Main changes are to now allow
>>> gem_create.coh_mode <= coh_mode(pat_index), rather than it needing to match
>>> exactly. This simplifies the dma-buf policy from userspace pov. Also we now
>>> only consider COH_NONE and COH_AT_LEAST_1WAY.
>>>
>>
>>
>> Getting constant DMAR errors after loading Xe KMD on TGL with your branch in framebuffer console, logs attached.
>>
>>
>
> Another issue report, when starting Xorg I'm getting this KMD crash with your branch:
Thanks for the reports, Jose. Hopefully both issues are now fixed. I just
pushed an updated branch.
>
> [ 2376.624393] xe 0000:00:02.0: [drm:intel_hdmi_detect [xe]] [CONNECTOR:347:HDMI-A-3]
> [ 2376.624465] [drm:drm_helper_probe_single_connector_modes] [CONNECTOR:347:HDMI-A-3] disconnected
> [ 2376.726753] xe 0000:00:02.0: [drm:intel_power_well_disable [xe]] disabling TC_cold_off
> [ 2376.727183] xe 0000:00:02.0: [drm:__intel_display_power_put_domain [xe]] TC cold unblock succeeded
> [ 2378.896672] dmar_fault: 915847 callbacks suppressed
> [ 2378.896675] DMAR: DRHD: handling fault status reg 3
> [ 2378.896684] DMAR: [DMA Read NO_PASID] Request device [00:02.0] fault addr 0x70600000 [fault reason 0x0c] non-zero reserved fields in PTE
> [ 2378.896711] DMAR: DRHD: handling fault status reg 3
> [ 2378.896715] DMAR: [DMA Read NO_PASID] Request device [00:02.0] fault addr 0x70603000 [fault reason 0x0c] non-zero reserved fields in PTE
> [ 2378.896722] DMAR: DRHD: handling fault status reg 3
> [ 2378.896726] DMAR: [DMA Read NO_PASID] Request device [00:02.0] fault addr 0x70607000 [fault reason 0x0c] non-zero reserved fields in PTE
> [ 2378.896737] DMAR: DRHD: handling fault status reg 3
> [ 2379.479148] xe 0000:00:02.0: [drm:drm_mode_addfb2] [FB:353]
> [ 2379.480368] xe 0000:00:02.0: [drm:skl_compute_wm [xe]] [PLANE:31:plane 1A] level *wm0,*wm1,*wm2,*wm3,*wm4,*wm5,*wm6,*wm7,*twm,*swm,*stwm ->
> *wm0,*wm1,*wm2,*wm3,*wm4,*wm5,*wm6,*wm7,*twm,*swm,*stwm
> [ 2379.480464] xe 0000:00:02.0: [drm:skl_compute_wm [xe]] [PLANE:31:plane 1A] lines 1, 4, 4, 4, 4, 5, 8, 8, 0, 2, 0 -> 4,
> 4, 4, 4, 4, 5, 8, 8, 0, 4, 0
> [ 2379.480535] xe 0000:00:02.0: [drm:skl_compute_wm [xe]] [PLANE:31:plane 1A] blocks 16, 65, 65, 65, 65, 81, 129, 129, 30, 19, 33 -> 62,
> 62, 62, 62, 62, 78, 123, 123, 137, 62, 137
> [ 2379.480604] xe 0000:00:02.0: [drm:skl_compute_wm [xe]] [PLANE:31:plane 1A] min_ddb 19, 73, 73, 73, 73, 91, 143, 143, 31, 22, 34 -> 123,
> 123, 123, 123, 123, 184, 184, 184, 138, 123, 138
> [ 2379.481280] BUG: kernel NULL pointer dereference, address: 0000000000000068
> [ 2379.481286] #PF: supervisor read access in kernel mode
> [ 2379.481289] #PF: error_code(0x0000) - not-present page
> [ 2379.481291] PGD 0 P4D 0
> [ 2379.481296] Oops: 0000 [#1] PREEMPT SMP NOPTI
> [ 2379.481300] CPU: 7 PID: 24658 Comm: gnome-shell Not tainted 6.5.0-rc7+zeh-xe+ #1108
> [ 2379.481304] Hardware name: Dell Inc. Latitude 5420/01M3M4, BIOS 1.27.0 03/17/2023
> [ 2379.481306] RIP: 0010:xe_ggtt_pte_encode+0x1c/0x90 [xe]
> [ 2379.481382] Code: 90 90 90 90 90 90 90 90 90 90 90 90 90 90 41 55 ba 00 10 00 00 41 54 55 53 48 8b 87 d0 02 00 00 48 89 fb 4c 8b a7 20 02 00 00
> <4c> 8b 68 68 e8 bb 4e ff ff 48 89 df 48 89 c5 e8 20 24 ff ff 84 c0
> [ 2379.481385] RSP: 0018:ffffc90001b0bb20 EFLAGS: 00010206
> [ 2379.481390] RAX: 0000000000000000 RBX: ffff8881071fe800 RCX: 0000000000000000
> [ 2379.481394] RDX: 0000000000001000 RSI: 0000000000000000 RDI: ffff8881071fe800
> [ 2379.481396] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000001
> [ 2379.481397] R10: 0000000000000001 R11: 0000000000000659 R12: ffff8881133f0f78
> [ 2379.481399] R13: 0000000000001000 R14: 0000000000809000 R15: ffff888134feb850
> [ 2379.481400] FS: 00007f47ff7335c0(0000) GS:ffff888287b80000(0000) knlGS:0000000000000000
> [ 2379.481402] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 2379.481404] CR2: 0000000000000068 CR3: 0000000183252001 CR4: 0000000000770ee0
> [ 2379.481406] PKRU: 55555554
> [ 2379.481407] Call Trace:
> [ 2379.481409] <TASK>
> [ 2379.481411] ? __die+0x1a/0x60
> [ 2379.481415] ? page_fault_oops+0x158/0x450
> [ 2379.481419] ? drm_atomic_commit+0x8e/0xc0
> [ 2379.481423] ? drm_mode_atomic_ioctl+0x96a/0xbd0
> [ 2379.481426] ? drm_ioctl+0x212/0x470
> [ 2379.481428] ? do_user_addr_fault+0x61/0x7c0
> [ 2379.481432] ? exc_page_fault+0x6a/0x1b0
> [ 2379.481436] ? asm_exc_page_fault+0x22/0x30
> [ 2379.481440] ? xe_ggtt_pte_encode+0x1c/0x90 [xe]
> [ 2379.481492] __xe_pin_fb_vma+0x396/0x840 [xe]
> [ 2379.481570] intel_plane_pin_fb+0x34/0x90 [xe]
> [ 2379.481647] intel_prepare_plane_fb+0x2c/0x70 [xe]
> [ 2379.481753] drm_atomic_helper_prepare_planes+0x6b/0x210
> [ 2379.481764] intel_atomic_commit+0x4d/0x360 [xe]
> [ 2379.481885] drm_atomic_commit+0x8e/0xc0
> [ 2379.481889] ? __pfx___drm_printfn_info+0x10/0x10
> [ 2379.481894] drm_mode_atomic_ioctl+0x96a/0xbd0
> [ 2379.481902] ? __pfx_drm_mode_atomic_ioctl+0x10/0x10
> [ 2379.481906] drm_ioctl_kernel+0xc0/0x170
> [ 2379.481909] drm_ioctl+0x212/0x470
> [ 2379.481912] ? __pfx_drm_mode_atomic_ioctl+0x10/0x10
> [ 2379.481918] __x64_sys_ioctl+0x8d/0xb0
> [ 2379.481924] do_syscall_64+0x38/0x90
> [ 2379.481928] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
> [ 2379.481932] RIP: 0033:0x7f4802b1aaff
> [ 2379.481935] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05
> <41> 89 c0 3d 00 f0 ff ff 77 1f 48 8b 44 24 18 64 48 2b 04 25 28 00
> [ 2379.481939] RSP: 002b:00007ffc8bafb730 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> [ 2379.481943] RAX: ffffffffffffffda RBX: 00007ffc8bafb7d0 RCX: 00007f4802b1aaff
> [ 2379.481946] RDX: 00007ffc8bafb7d0 RSI: 00000000c03864bc RDI: 0000000000000009
> [ 2379.481948] RBP: 00000000c03864bc R08: 0000000000000026 R09: 0000000000000026
> [ 2379.481950] R10: 0000000000000001 R11: 0000000000000246 R12: 000055fe14331f40
> [ 2379.481953] R13: 0000000000000009 R14: 000055fe1430f4c0 R15: 000055fe1430d6f0
> [ 2379.481958] </TASK>
> [ 2379.481959] Modules linked in: xe drm_ttm_helper drm_exec gpu_sched drm_suballoc_helper i2c_algo_bit drm_buddy ttm drm_display_helper btusb btrtl
> btbcm btintel bluetooth snd_hda_codec_hdmi cdc_ncm cdc_ether usbnet mii ecdh_generic ecc snd_ctl_led mei_pxp mei_hdcp snd_hda_codec_realtek
> snd_hda_codec_generic ledtrig_audio wmi_bmof x86_pkg_temp_thermal snd_hda_intel coretemp crct10dif_pclmul snd_intel_dspcfg crc32_pclmul snd_hda_codec
> ghash_clmulni_intel snd_hwdep snd_hda_core e1000e kvm_intel video ptp snd_pcm i2c_i801 mei_me pps_core i2c_smbus mei wmi pinctrl_tigerlake fuse
> [ 2379.482015] CR2: 0000000000000068
> [ 2379.482018] ---[ end trace 0000000000000000 ]---
> [ 2379.661641] xe 0000:00:02.0: [drm:intel_pps_vdd_off_sync_unlocked [xe]] [ENCODER:307:DDI A/PHY A] PPS 0 turning VDD off
> [ 2379.661861] xe 0000:00:02.0: [drm:intel_pps_vdd_off_sync_unlocked [xe]] [ENCODER:307:DDI A/PHY A] PPS 0 PP_STATUS: 0x80000008 PP_CONTROL:
> 0x00000067
> [ 2379.873152] RIP: 0010:xe_ggtt_pte_encode+0x1c/0x90 [xe]
> [ 2379.873325] Code: 90 90 90 90 90 90 90 90 90 90 90 90 90 90 41 55 ba 00 10 00 00 41 54 55 53 48 8b 87 d0 02 00 00 48 89 fb 4c 8b a7 20 02 00 00
> <4c> 8b 68 68 e8 bb 4e ff ff 48 89 df 48 89 c5 e8 20 24 ff ff 84 c0
> [ 2379.873328] RSP: 0018:ffffc90001b0bb20 EFLAGS: 00010206
> [ 2379.873330] RAX: 0000000000000000 RBX: ffff8881071fe800 RCX: 0000000000000000
> [ 2379.873332] RDX: 0000000000001000 RSI: 0000000000000000 RDI: ffff8881071fe800
> [ 2379.873333] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000001
> [ 2379.873334] R10: 0000000000000001 R11: 0000000000000659 R12: ffff8881133f0f78
> [ 2379.873335] R13: 0000000000001000 R14: 0000000000809000 R15: ffff888134feb850
> [ 2379.873336] FS: 00007f47ff7335c0(0000) GS:ffff888287b80000(0000) knlGS:0000000000000000
> [ 2379.873338] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 2379.873339] CR2: 0000000000000068 CR3: 0000000183252001 CR4: 0000000000770ee0
> [ 2379.873340] PKRU: 55555554
> [ 2379.873342] note: gnome-shell[24658] exited with irqs disabled
> [ 2383.896731] dmar_fault: 1159924 callbacks suppressed
> [ 2383.896733] DMAR: DRHD: handling fault status reg 3
> [ 2383.896739] DMAR: [DMA Read NO_PASID] Request device [00:02.0] fault addr 0x70617000 [fault reason 0x0c] non-zero reserved fields in PTE
> [ 2383.896749] DMAR: DRHD: handling fault status reg 3
> [ 2383.896751] DMAR: [DMA Read NO_PASID] Request device [00:02.0] fault addr 0x70619000 [fault reason 0x0c] non-zero reserved fields in PTE
> [ 2383.896757] DMAR: DRHD: handling fault status reg 3
> [ 2383.896759] DMAR: [DMA Read NO_PASID] Request device [00:02.0] fault addr 0x7061b000 [fault reason 0x0c] non-zero reserved fields in PTE
> [ 2383.896762] DMAR: DRHD: handling fault status reg 2
> [ 2388.897730] dmar_fault: 1298750 callbacks suppressed
> [ 2388.897733] DMAR: DRHD: handling fault status reg 3
> [ 2388.897738] DMAR: [DMA Read NO_PASID] Request device [00:02.0] fault addr 0x706a5000 [fault reason 0x0c] non-zero reserved fields in PTE
> [ 2388.897747] DMAR: DRHD: handling fault status reg 3
> [ 2388.897748] DMAR: [DMA Read NO_PASID] Request device [00:02.0] fault addr 0x706a6000 [fault reason 0x0c] non-zero reserved fields in PTE
> [ 2388.897752] DMAR: DRHD: handling fault status reg 3
> [ 2388.897754] DMAR: [DMA Read NO_PASID] Request device [00:02.0] fault addr 0x706a8000 [fault reason 0x0c] non-zero reserved fields in PTE
> [ 2388.897757] DMAR: DRHD: handling fault status reg 3
> [ 2393.898732] dmar_fault: 1164851 callbacks suppressed
>
>
>
> This might help debug:
> (gdb) list *(xe_ggtt_pte_encode+0x1c)
> 0x101fc is in xe_ggtt_pte_encode (drivers/gpu/drm/xe/xe_ggtt.c:34).
> 29 #define GUC_GGTT_TOP 0xFEE00000
> 30
> 31 u64 xe_ggtt_pte_encode(struct xe_bo *bo, u64 bo_offset)
> 32 {
> 33 struct xe_device *xe = xe_bo_device(bo);
> 34 struct xe_ggtt *ggtt = (bo->tile)->mem.ggtt;
> 35 u64 pte;
>
>
>
>
>
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Intel-xe] [PATCH v2 1/6] drm/xe/uapi: Add support for cache and coherency mode
2023-09-25 8:06 ` Matthew Auld
@ 2023-09-25 18:26 ` Souza, Jose
2023-09-26 8:07 ` Matthew Auld
0 siblings, 1 reply; 28+ messages in thread
From: Souza, Jose @ 2023-09-25 18:26 UTC (permalink / raw)
To: intel-xe@lists.freedesktop.org, Auld, Matthew
On Mon, 2023-09-25 at 09:06 +0100, Matthew Auld wrote:
> On 21/09/2023 21:07, Souza, Jose wrote:
> > On Thu, 2023-09-14 at 16:31 +0100, Matthew Auld wrote:
> > > From: Pallavi Mishra <pallavi.mishra@intel.com>
> > >
> > > Allow userspace to specify the CPU caching mode to use for system memory
> > > in addition to coherency modes during object creation. Modify gem create
> > > handler and introduce xe_bo_create_user to replace xe_bo_create. In a
> > > later patch we will support setting the pat_index as part of vm_bind,
> > > where expectation is that the coherency mode extracted from the
> > > pat_index must match the one set at object creation.
> > >
> > > v2
> > > - s/smem_caching/smem_cpu_caching/ and
> > > s/XE_GEM_CACHING/XE_GEM_CPU_CACHING/. (Matt Roper)
> > > - Drop COH_2WAY and just use COH_NONE + COH_AT_LEAST_1WAY; KMD mostly
> > > just cares that zeroing/swap-in can't be bypassed with the given
> > > smem_caching mode. (Matt Roper)
> > > - Fix broken range check for coh_mode and smem_cpu_caching and also
> > > don't use constant value, but the already defined macros. (José)
> > > - Prefer switch statement for smem_cpu_caching -> ttm_caching. (José)
> > > - Add note in kernel-doc for dgpu and coherency modes for system
> > > memory. (José)
> > >
> > > Signed-off-by: Pallavi Mishra <pallavi.mishra@intel.com>
> > > Co-authored-by: Matthew Auld <matthew.auld@intel.com>
> > > Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> > > Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > > Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> > > Cc: Lucas De Marchi <lucas.demarchi@intel.com>
> > > Cc: Matt Roper <matthew.d.roper@intel.com>
> > > Cc: José Roberto de Souza <jose.souza@intel.com>
> > > Cc: Filip Hazubski <filip.hazubski@intel.com>
> > > Cc: Carl Zhang <carl.zhang@intel.com>
> > > Cc: Effie Yu <effie.yu@intel.com>
> > > ---
> > > drivers/gpu/drm/xe/xe_bo.c | 105 ++++++++++++++++++++++++++-----
> > > drivers/gpu/drm/xe/xe_bo.h | 3 +-
> > > drivers/gpu/drm/xe/xe_bo_types.h | 10 +++
> > > drivers/gpu/drm/xe/xe_dma_buf.c | 5 +-
> > > include/uapi/drm/xe_drm.h | 57 ++++++++++++++++-
> > > 5 files changed, 158 insertions(+), 22 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
> > > index 27726d4f3423..f3facd788f15 100644
> > > --- a/drivers/gpu/drm/xe/xe_bo.c
> > > +++ b/drivers/gpu/drm/xe/xe_bo.c
> > > @@ -325,7 +325,7 @@ static struct ttm_tt *xe_ttm_tt_create(struct ttm_buffer_object *ttm_bo,
> > > struct xe_device *xe = xe_bo_device(bo);
> > > struct xe_ttm_tt *tt;
> > > unsigned long extra_pages;
> > > - enum ttm_caching caching = ttm_cached;
> > > + enum ttm_caching caching;
> > > int err;
> > >
> > > tt = kzalloc(sizeof(*tt), GFP_KERNEL);
> > > @@ -339,13 +339,25 @@ static struct ttm_tt *xe_ttm_tt_create(struct ttm_buffer_object *ttm_bo,
> > > extra_pages = DIV_ROUND_UP(xe_device_ccs_bytes(xe, bo->size),
> > > PAGE_SIZE);
> > >
> > > + switch (bo->smem_cpu_caching) {
> > > + case XE_GEM_CPU_CACHING_WC:
> > > + caching = ttm_write_combined;
> > > + break;
> > > + case XE_GEM_CPU_CACHING_UC:
> > > + caching = ttm_uncached;
> > > + break;
> > > + default:
> > > + caching = ttm_cached;
> > > + break;
> > > + }
> > > +
> > > /*
> > > * Display scanout is always non-coherent with the CPU cache.
> > > *
> > > * For Xe_LPG and beyond, PPGTT PTE lookups are also non-coherent and
> > > * require a CPU:WC mapping.
> > > */
> > > - if (bo->flags & XE_BO_SCANOUT_BIT ||
> > > + if ((!bo->smem_cpu_caching && bo->flags & XE_BO_SCANOUT_BIT) ||
> > > (xe->info.graphics_verx100 >= 1270 && bo->flags & XE_BO_PAGETABLE))
> > > caching = ttm_write_combined;
> > >
> > > @@ -1184,9 +1196,10 @@ void xe_bo_free(struct xe_bo *bo)
> > > kfree(bo);
> > > }
> > >
> > > -struct xe_bo *__xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
> > > +struct xe_bo *___xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
> > > struct xe_tile *tile, struct dma_resv *resv,
> > > struct ttm_lru_bulk_move *bulk, size_t size,
> > > + u16 smem_cpu_caching, u16 coh_mode,
> > > enum ttm_bo_type type, u32 flags)
> > > {
> > > struct ttm_operation_ctx ctx = {
> > > @@ -1224,6 +1237,8 @@ struct xe_bo *__xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
> > > bo->tile = tile;
> > > bo->size = size;
> > > bo->flags = flags;
> > > + bo->smem_cpu_caching = smem_cpu_caching;
> > > + bo->coh_mode = coh_mode;
> > > bo->ttm.base.funcs = &xe_gem_object_funcs;
> > > bo->props.preferred_mem_class = XE_BO_PROPS_INVALID;
> > > bo->props.preferred_gt = XE_BO_PROPS_INVALID;
> > > @@ -1307,10 +1322,11 @@ static int __xe_bo_fixed_placement(struct xe_device *xe,
> > > }
> > >
> > > struct xe_bo *
> > > -xe_bo_create_locked_range(struct xe_device *xe,
> > > - struct xe_tile *tile, struct xe_vm *vm,
> > > - size_t size, u64 start, u64 end,
> > > - enum ttm_bo_type type, u32 flags)
> > > +__xe_bo_create_locked(struct xe_device *xe,
> > > + struct xe_tile *tile, struct xe_vm *vm,
> > > + size_t size, u64 start, u64 end,
> > > + u16 smem_cpu_caching, u16 coh_mode,
> > > + enum ttm_bo_type type, u32 flags)
> > > {
> > > struct xe_bo *bo = NULL;
> > > int err;
> > > @@ -1331,10 +1347,11 @@ xe_bo_create_locked_range(struct xe_device *xe,
> > > }
> > > }
> > >
> > > - bo = __xe_bo_create_locked(xe, bo, tile, vm ? &vm->resv : NULL,
> > > + bo = ___xe_bo_create_locked(xe, bo, tile, vm ? &vm->resv : NULL,
> > > vm && !xe_vm_in_fault_mode(vm) &&
> > > flags & XE_BO_CREATE_USER_BIT ?
> > > &vm->lru_bulk_move : NULL, size,
> > > + smem_cpu_caching, coh_mode,
> > > type, flags);
> > > if (IS_ERR(bo))
> > > return bo;
> > > @@ -1368,11 +1385,35 @@ xe_bo_create_locked_range(struct xe_device *xe,
> > > return ERR_PTR(err);
> > > }
> > >
> > > +struct xe_bo *
> > > +xe_bo_create_locked_range(struct xe_device *xe,
> > > + struct xe_tile *tile, struct xe_vm *vm,
> > > + size_t size, u64 start, u64 end,
> > > + enum ttm_bo_type type, u32 flags)
> > > +{
> > > + return __xe_bo_create_locked(xe, tile, vm, size, 0, ~0ULL, 0, 0, type, flags);
> > > +}
> > > +
> > > struct xe_bo *xe_bo_create_locked(struct xe_device *xe, struct xe_tile *tile,
> > > struct xe_vm *vm, size_t size,
> > > enum ttm_bo_type type, u32 flags)
> > > {
> > > - return xe_bo_create_locked_range(xe, tile, vm, size, 0, ~0ULL, type, flags);
> > > + return __xe_bo_create_locked(xe, tile, vm, size, 0, ~0ULL, 0, 0, type, flags);
> > > +}
> > > +
> > > +static struct xe_bo *xe_bo_create_user(struct xe_device *xe, struct xe_tile *tile,
> > > + struct xe_vm *vm, size_t size,
> > > + u16 smem_cpu_caching, u16 coh_mode,
> > > + enum ttm_bo_type type,
> > > + u32 flags)
> > > +{
> > > + struct xe_bo *bo = __xe_bo_create_locked(xe, tile, vm, size, 0, ~0ULL,
> > > + smem_cpu_caching, coh_mode, type,
> > > + flags | XE_BO_CREATE_USER_BIT);
> > > + if (!IS_ERR(bo))
> > > + xe_bo_unlock_vm_held(bo);
> > > +
> > > + return bo;
> > > }
> > >
> > > struct xe_bo *xe_bo_create(struct xe_device *xe, struct xe_tile *tile,
> > > @@ -1755,11 +1796,11 @@ int xe_gem_create_ioctl(struct drm_device *dev, void *data,
> > > struct drm_xe_gem_create *args = data;
> > > struct xe_vm *vm = NULL;
> > > struct xe_bo *bo;
> > > - unsigned int bo_flags = XE_BO_CREATE_USER_BIT;
> > > + unsigned int bo_flags;
> > > u32 handle;
> > > int err;
> > >
> > > - if (XE_IOCTL_DBG(xe, args->extensions) || XE_IOCTL_DBG(xe, args->pad) ||
> > > + if (XE_IOCTL_DBG(xe, args->extensions) ||
> > > XE_IOCTL_DBG(xe, args->reserved[0] || args->reserved[1]))
> > > return -EINVAL;
> > >
> > > @@ -1801,6 +1842,32 @@ int xe_gem_create_ioctl(struct drm_device *dev, void *data,
> > > bo_flags |= XE_BO_NEEDS_CPU_ACCESS;
> > > }
> > >
> > > + if (XE_IOCTL_DBG(xe, args->coh_mode > XE_GEM_COH_AT_LEAST_1WAY))
> > > + return -EINVAL;
> > > +
> > > + if (XE_IOCTL_DBG(xe, args->smem_cpu_caching > XE_GEM_CPU_CACHING_UC))
> > > + return -EINVAL;
> > > +
> > > + if (bo_flags & XE_BO_CREATE_SYSTEM_BIT) {
> > > + if (XE_IOCTL_DBG(xe, !args->coh_mode))
> > > + return -EINVAL;
> > > +
> > > + if (XE_IOCTL_DBG(xe, !args->smem_cpu_caching))
> > > + return -EINVAL;
> > > +
> > > + if (XE_IOCTL_DBG(xe, !IS_DGFX(xe) &&
> > > + bo_flags & XE_BO_SCANOUT_BIT &&
> > > + args->smem_cpu_caching == XE_GEM_CPU_CACHING_WB))
> > > + return -EINVAL;
> > > +
> > > + if (args->coh_mode == XE_GEM_COH_NONE) {
> > > + if (XE_IOCTL_DBG(xe, args->smem_cpu_caching == XE_GEM_CPU_CACHING_WB))
> > > + return -EINVAL;
> > > + }
> > > + } else if (XE_IOCTL_DBG(xe, args->smem_cpu_caching)) {
> >
> > should be XE_IOCTL_DBG(xe, !args->smem_cpu_caching).
> >
> > uAPI don't say anything about allow smem_cpu_caching or coh_mode == 0, did this to be able to run tests without display in DG2:
>
> The above check is for VRAM-only objects. For smem_cpu_caching the
> kernel-doc says: "MUST be left as zero for VRAM-only objects."
> Internally the KMD uses WC for CPU mapping VRAM which is out of the
> control of userspace.
In my opinion this should be != 0 and should match the PAT index that will be set at VM bind.
I have not read much about it, but I believe CXL GPU memory would be mapped as WB to take advantage of the CXL caching protocols.
So I believe the kernel-doc restriction should be removed and replaced with a runtime check: if (smem_cpu_caching != WC && is_dgfx()) return -EINVAL;
A related question: can a bo placed in lmem + smem have WB caching on DG2? How would that work? So far Mesa has been handling that case as WC
as well.
>
> coh_mode == 0 is not meant to be allowed, but looks like I missed the
> check here for VRAM-only. Will fix.
>
> >
> >
> > diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
> > index f3facd788f152..e0e4fefcd2060 100644
> > --- a/drivers/gpu/drm/xe/xe_bo.c
> > +++ b/drivers/gpu/drm/xe/xe_bo.c
> > @@ -1796,7 +1796,7 @@ int xe_gem_create_ioctl(struct drm_device *dev, void *data,
> > struct drm_xe_gem_create *args = data;
> > struct xe_vm *vm = NULL;
> > struct xe_bo *bo;
> > - unsigned int bo_flags;
> > + unsigned int bo_flags = 0;
> > u32 handle;
> > int err;
> >
> > @@ -1842,19 +1842,15 @@ int xe_gem_create_ioctl(struct drm_device *dev, void *data,
> > bo_flags |= XE_BO_NEEDS_CPU_ACCESS;
> > }
> >
> > - if (XE_IOCTL_DBG(xe, args->coh_mode > XE_GEM_COH_AT_LEAST_1WAY))
> > + if (XE_IOCTL_DBG(xe, args->coh_mode > XE_GEM_COH_AT_LEAST_1WAY) ||
> > + XE_IOCTL_DBG(xe, !args->coh_mode))
> > return -EINVAL;
> >
> > - if (XE_IOCTL_DBG(xe, args->smem_cpu_caching > XE_GEM_CPU_CACHING_UC))
> > + if (XE_IOCTL_DBG(xe, args->smem_cpu_caching > XE_GEM_CPU_CACHING_UC) ||
> > + XE_IOCTL_DBG(xe, !args->smem_cpu_caching))
> > return -EINVAL;
> >
> > if (bo_flags & XE_BO_CREATE_SYSTEM_BIT) {
> > - if (XE_IOCTL_DBG(xe, !args->coh_mode))
> > - return -EINVAL;
> > -
> > - if (XE_IOCTL_DBG(xe, !args->smem_cpu_caching))
> > - return -EINVAL;
> > -
> > if (XE_IOCTL_DBG(xe, !IS_DGFX(xe) &&
> > bo_flags & XE_BO_SCANOUT_BIT &&
> > args->smem_cpu_caching == XE_GEM_CPU_CACHING_WB))
> > @@ -1864,8 +1860,6 @@ int xe_gem_create_ioctl(struct drm_device *dev, void *data,
> > if (XE_IOCTL_DBG(xe, args->smem_cpu_caching == XE_GEM_CPU_CACHING_WB))
> > return -EINVAL;
> > }
> > - } else if (XE_IOCTL_DBG(xe, args->smem_cpu_caching)) {
> > - return -EINVAL;
> > }
> >
> > if (args->vm_id) {
> >
> >
> > > + return -EINVAL;
> > > + }
> > > +
> > > if (args->vm_id) {
> > > vm = xe_vm_lookup(xef, args->vm_id);
> > > if (XE_IOCTL_DBG(xe, !vm))
> > > @@ -1812,8 +1879,10 @@ int xe_gem_create_ioctl(struct drm_device *dev, void *data,
> > > }
> > > }
> > >
> > > - bo = xe_bo_create(xe, NULL, vm, args->size, ttm_bo_type_device,
> > > - bo_flags);
> > > + bo = xe_bo_create_user(xe, NULL, vm, args->size,
> > > + args->smem_cpu_caching, args->coh_mode,
> > > + ttm_bo_type_device,
> > > + bo_flags);
> > > if (IS_ERR(bo)) {
> > > err = PTR_ERR(bo);
> > > goto out_vm;
> > > @@ -2105,10 +2174,12 @@ int xe_bo_dumb_create(struct drm_file *file_priv,
> > > args->size = ALIGN(mul_u32_u32(args->pitch, args->height),
> > > page_size);
> > >
> > > - bo = xe_bo_create(xe, NULL, NULL, args->size, ttm_bo_type_device,
> > > - XE_BO_CREATE_VRAM_IF_DGFX(xe_device_get_root_tile(xe)) |
> > > - XE_BO_CREATE_USER_BIT | XE_BO_SCANOUT_BIT |
> > > - XE_BO_NEEDS_CPU_ACCESS);
> > > + bo = xe_bo_create_user(xe, NULL, NULL, args->size,
> > > + XE_GEM_CPU_CACHING_WC, XE_GEM_COH_NONE,
> > > + ttm_bo_type_device,
> > > + XE_BO_CREATE_VRAM_IF_DGFX(xe_device_get_root_tile(xe)) |
> > > + XE_BO_CREATE_USER_BIT | XE_BO_SCANOUT_BIT |
> > > + XE_BO_NEEDS_CPU_ACCESS);
> > > if (IS_ERR(bo))
> > > return PTR_ERR(bo);
> > >
> > > diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
> > > index 4a68d869b3b5..4a0ee81fe598 100644
> > > --- a/drivers/gpu/drm/xe/xe_bo.h
> > > +++ b/drivers/gpu/drm/xe/xe_bo.h
> > > @@ -81,9 +81,10 @@ struct sg_table;
> > > struct xe_bo *xe_bo_alloc(void);
> > > void xe_bo_free(struct xe_bo *bo);
> > >
> > > -struct xe_bo *__xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
> > > +struct xe_bo *___xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
> > > struct xe_tile *tile, struct dma_resv *resv,
> > > struct ttm_lru_bulk_move *bulk, size_t size,
> > > + u16 smem_cpu_caching, u16 coh_mode,
> > > enum ttm_bo_type type, u32 flags);
> > > struct xe_bo *
> > > xe_bo_create_locked_range(struct xe_device *xe,
> > > diff --git a/drivers/gpu/drm/xe/xe_bo_types.h b/drivers/gpu/drm/xe/xe_bo_types.h
> > > index 2ea9ad423170..9bee220a6872 100644
> > > --- a/drivers/gpu/drm/xe/xe_bo_types.h
> > > +++ b/drivers/gpu/drm/xe/xe_bo_types.h
> > > @@ -68,6 +68,16 @@ struct xe_bo {
> > > struct llist_node freed;
> > > /** @created: Whether the bo has passed initial creation */
> > > bool created;
> > > + /**
> > > + * @coh_mode: Coherency setting. Currently only used for userspace
> > > + * objects.
> > > + */
> > > + u16 coh_mode;
> > > + /**
> > > + * @smem_cpu_caching: Caching mode for smem. Currently only used for
> > > + * userspace objects.
> > > + */
> > > + u16 smem_cpu_caching;
> > > };
> > >
> > > #define intel_bo_to_drm_bo(bo) (&(bo)->ttm.base)
> > > diff --git a/drivers/gpu/drm/xe/xe_dma_buf.c b/drivers/gpu/drm/xe/xe_dma_buf.c
> > > index 09343b8b3e96..ac20dbc27a2b 100644
> > > --- a/drivers/gpu/drm/xe/xe_dma_buf.c
> > > +++ b/drivers/gpu/drm/xe/xe_dma_buf.c
> > > @@ -200,8 +200,9 @@ xe_dma_buf_init_obj(struct drm_device *dev, struct xe_bo *storage,
> > > int ret;
> > >
> > > dma_resv_lock(resv, NULL);
> > > - bo = __xe_bo_create_locked(xe, storage, NULL, resv, NULL, dma_buf->size,
> > > - ttm_bo_type_sg, XE_BO_CREATE_SYSTEM_BIT);
> > > + bo = ___xe_bo_create_locked(xe, storage, NULL, resv, NULL, dma_buf->size,
> > > + 0, 0, /* Will require 1way or 2way for vm_bind */
> > > + ttm_bo_type_sg, XE_BO_CREATE_SYSTEM_BIT);
> > > if (IS_ERR(bo)) {
> > > ret = PTR_ERR(bo);
> > > goto error;
> > > diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
> > > index 00d5cb4ef85e..737bb1d4c6f7 100644
> > > --- a/include/uapi/drm/xe_drm.h
> > > +++ b/include/uapi/drm/xe_drm.h
> > > @@ -456,8 +456,61 @@ struct drm_xe_gem_create {
> > > */
> > > __u32 handle;
> > >
> > > - /** @pad: MBZ */
> > > - __u32 pad;
> > > + /**
> > > + * @coh_mode: The coherency mode for this object. This will limit the
> > > + * possible @smem_caching values.
> > > + *
> > > + * Supported values:
> > > + *
> > > + * XE_GEM_COH_NONE: GPU access is assumed to be not coherent with
> > > + * CPU. CPU caches are not snooped.
> > > + *
> > > + * XE_GEM_COH_AT_LEAST_1WAY:
> > > + *
> > > + * CPU-GPU coherency must be at least 1WAY.
> > > + *
> > > + * If 1WAY then GPU access is coherent with CPU (CPU caches are snooped)
> > > + * until GPU acquires. The acquire by the GPU is not tracked by CPU
> > > + * caches.
> > > + *
> > > + * If 2WAY then should be fully coherent between GPU and CPU. Fully
> > > + * tracked by CPU caches. Both CPU and GPU caches are snooped.
> > > + *
> > > + * Note: On dgpu the GPU device never caches system memory (outside of
> > > + * the special system-memory-read-only cache, which is anyway flushed by
> > > + * KMD when nuking TLBs for a given object so should be no concern to
> > > + * userspace). The device should be thought of as always 1WAY coherent,
> > > + * with the addition that the GPU never caches system memory. At least
> > > + * on current dgpu HW there is no way to turn off snooping so likely the
> > > + * different coherency modes of the pat_index make no difference for
> > > + * system memory.
> > > + */
> > > +#define XE_GEM_COH_NONE 1
> > > +#define XE_GEM_COH_AT_LEAST_1WAY 2
> > > + __u16 coh_mode;
> > > +
> > > + /**
> > > + * @smem_cpu_caching: The CPU caching mode to select for system memory.
> > > + *
> > > + * Supported values:
> > > + *
> > > + * XE_GEM_CPU_CACHING_WB: Allocate the pages with write-back caching.
> > > + * On iGPU this can't be used for scanout surfaces. The @coh_mode must
> > > + * be XE_GEM_COH_AT_LEAST_1WAY.
> > > + *
> > > + * XE_GEM_CPU_CACHING_WC: Allocate the pages as write-combined. This is
> > > + * uncached. Any @coh_mode is permitted. Scanout surfaces should likely
> > > + * use this.
> > > + *
> > > + * XE_GEM_CPU_CACHING_UC: Allocate the pages as uncached. Any @coh_mode
> > > + * is permitted. Scanout surfaces are permitted to use this.
> > > + *
> > > + * MUST be left as zero for VRAM-only objects.
> > > + */
> > > +#define XE_GEM_CPU_CACHING_WB 1
> > > +#define XE_GEM_CPU_CACHING_WC 2
> > > +#define XE_GEM_CPU_CACHING_UC 3
> > > + __u16 smem_cpu_caching;
> > >
> > > /** @reserved: Reserved */
> > > __u64 reserved[2];
> >
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Intel-xe] [PATCH v2 6/6] drm/xe/uapi: support pat_index selection with vm_bind
2023-09-14 15:31 ` [Intel-xe] [PATCH v2 6/6] drm/xe/uapi: support pat_index selection with vm_bind Matthew Auld
2023-09-15 22:24 ` Matt Roper
@ 2023-09-25 21:56 ` Rodrigo Vivi
2023-09-26 8:17 ` Matthew Auld
1 sibling, 1 reply; 28+ messages in thread
From: Rodrigo Vivi @ 2023-09-25 21:56 UTC (permalink / raw)
To: Matthew Auld
Cc: Filip Hazubski, Lucas De Marchi, Carl Zhang, Effie Yu, Matt Roper,
intel-xe
On Thu, Sep 14, 2023 at 04:31:19PM +0100, Matthew Auld wrote:
> Allow userspace to directly control the pat_index for a given vm
> binding. This should allow directly controlling the coherency, caching
> and potentially other stuff in the future for the ppGTT binding.
>
> The exact meaning behind the pat_index is very platform specific (see
> BSpec or PRMs) but effectively maps to some predefined memory
> attributes. From the KMD pov we only care about the coherency that is
> provided by the pat_index, which falls into either NONE, 1WAY or 2WAY.
> The vm_bind coherency mode for the given pat_index needs to be at least
> as coherent as the coh_mode that was set at object creation. For
> platforms that lack the explicit coherency mode, we treat UC/WT/WC as
> NONE and WB as AT_LEAST_1WAY.
>
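The pre-MTL bucketing described above can be sketched as follows. This is illustrative only: the real mapping lives in the per-platform PAT tables, and the enum and function names here are assumptions, not actual xe helpers.

```c
#include <assert.h>

enum cache_level { CACHE_UC, CACHE_WC, CACHE_WT, CACHE_WB };
enum coh_mode { COH_NONE = 1, COH_AT_LEAST_1WAY = 2 };

/* Hypothetical helper: pre-MTL platforms lack an explicit coherency
 * mode, so the KMD buckets the cache level instead. */
static enum coh_mode pre_mtl_coh_mode(enum cache_level level)
{
	/* Only WB is treated as coherent; UC/WC/WT all bucket as NONE. */
	return level == CACHE_WB ? COH_AT_LEAST_1WAY : COH_NONE;
}
```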
> For userptr mappings we lack a corresponding gem object, so the expected
> coherency mode is instead implicit and must fall into either 1WAY or
> 2WAY. Trying to use NONE will be rejected by the kernel. For imported
> dma-buf (from a different device) the coherency mode is also implicit
> and must also be either 1WAY or 2WAY, i.e. AT_LEAST_1WAY.
>
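The policy in the two paragraphs above can be condensed into one check. This is a sketch with assumed names; the real checks live spread across vm_bind_ioctl_check_args() and xe_vm_bind_ioctl() in the patch below.

```c
#include <assert.h>

enum coh_mode { COH_NONE = 1, COH_AT_LEAST_1WAY = 2 };
#define EINVAL 22

/* Hypothetical condensation of the vm_bind coherency policy.
 * bo_coh_mode == 0 stands for a userptr or externally imported
 * dma-buf mapping, where no gem_create coh_mode exists. */
static int check_bind_coherency(int bo_coh_mode, int pat_coh_mode)
{
	if (bo_coh_mode)
		/* pat_index must be at least as coherent as the object. */
		return pat_coh_mode >= bo_coh_mode ? 0 : -EINVAL;
	/* Implicit coherency: must be 1WAY or 2WAY, never NONE. */
	return pat_coh_mode == COH_NONE ? -EINVAL : 0;
}
```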
> As part of adding pat_index support with vm_bind we also need to stop using
> xe_cache_level and instead use the pat_index in various places. We still
> make use of xe_cache_level, but only as a convenience for kernel
> internal objects (internally it maps to some reasonable pat_index). For
> now this is just a 1:1 conversion of the existing code, however for
> platforms like MTL+ we might need to give more control through bo_create
> or stop using WB on the CPU side if we need CPU access.
>
> v2:
> - Undefined coh_mode(pat_index) can now be treated as programmer error. (Matt Roper)
> - We now allow gem_create.coh_mode <= coh_mode(pat_index), rather than
> having to match exactly. This ensures imported dma-buf can always
> just use 1way (or even 2way), now that we also bundle 1way/2way into
> at_least_1way. We still require 1way/2way for external dma-buf, but
> the policy can now be the same for self-import, if desired.
> - Use u16 for pat_index in uapi. u32 is massive overkill. (José)
> - Move as much of the pat_index validation as we can into
> vm_bind_ioctl_check_args. (José)
>
> Bspec: 45101, 44235 #xe
> Bspec: 70552, 71582, 59400 #xe2
> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> Cc: Pallavi Mishra <pallavi.mishra@intel.com>
> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Lucas De Marchi <lucas.demarchi@intel.com>
> Cc: Matt Roper <matthew.d.roper@intel.com>
> Cc: José Roberto de Souza <jose.souza@intel.com>
> Cc: Filip Hazubski <filip.hazubski@intel.com>
> Cc: Carl Zhang <carl.zhang@intel.com>
> Cc: Effie Yu <effie.yu@intel.com>
> ---
> drivers/gpu/drm/xe/tests/xe_migrate.c | 2 +-
> drivers/gpu/drm/xe/xe_ggtt.c | 7 ++-
> drivers/gpu/drm/xe/xe_ggtt_types.h | 2 +-
> drivers/gpu/drm/xe/xe_migrate.c | 13 +++--
> drivers/gpu/drm/xe/xe_pt.c | 22 ++++-----
> drivers/gpu/drm/xe/xe_pt.h | 4 +-
> drivers/gpu/drm/xe/xe_vm.c | 69 +++++++++++++++++++++------
> drivers/gpu/drm/xe/xe_vm_types.h | 10 +++-
> include/uapi/drm/xe_drm.h | 43 ++++++++++++++++-
> 9 files changed, 128 insertions(+), 44 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/tests/xe_migrate.c b/drivers/gpu/drm/xe/tests/xe_migrate.c
> index 6b4388bfbb31..d3bf4751a2d7 100644
> --- a/drivers/gpu/drm/xe/tests/xe_migrate.c
> +++ b/drivers/gpu/drm/xe/tests/xe_migrate.c
> @@ -301,7 +301,7 @@ static void xe_migrate_sanity_test(struct xe_migrate *m, struct kunit *test)
> /* First part of the test, are we updating our pagetable bo with a new entry? */
> xe_map_wr(xe, &bo->vmap, XE_PAGE_SIZE * (NUM_KERNEL_PDE - 1), u64,
> 0xdeaddeadbeefbeef);
> - expected = xe_pte_encode(m->q->vm, pt, 0, XE_CACHE_WB, 0);
> + expected = xe_pte_encode(m->q->vm, pt, 0, xe_pat_get_index(xe, XE_CACHE_WB), 0);
> if (m->q->vm->flags & XE_VM_FLAG_64K)
> expected |= XE_PTE_PS64;
> if (xe_bo_is_vram(pt))
> diff --git a/drivers/gpu/drm/xe/xe_ggtt.c b/drivers/gpu/drm/xe/xe_ggtt.c
> index aea26afd4668..7e4da16389af 100644
> --- a/drivers/gpu/drm/xe/xe_ggtt.c
> +++ b/drivers/gpu/drm/xe/xe_ggtt.c
> @@ -41,7 +41,8 @@ u64 xe_ggtt_pte_encode(struct xe_bo *bo, u64 bo_offset)
> pte |= XE_GGTT_PTE_DM;
>
> if ((ggtt->pat_encode).pte_encode)
> - pte = (ggtt->pat_encode).pte_encode(xe, pte, XE_CACHE_WB_1_WAY);
> + pte = (ggtt->pat_encode).pte_encode(xe, pte,
> + xe_pat_get_index(xe, XE_CACHE_WB_1_WAY));
>
> return pte;
> }
> @@ -102,10 +103,8 @@ static void primelockdep(struct xe_ggtt *ggtt)
> }
>
> static u64 xelpg_ggtt_pte_encode_pat(struct xe_device *xe, u64 pte_pat,
> - enum xe_cache_level cache)
> + u16 pat_index)
> {
> - u32 pat_index = xe_pat_get_index(xe, cache);
> -
> pte_pat &= ~(XELPG_GGTT_PTE_PAT_MASK);
>
> if (pat_index & BIT(0))
> diff --git a/drivers/gpu/drm/xe/xe_ggtt_types.h b/drivers/gpu/drm/xe/xe_ggtt_types.h
> index 7e55fac1a8a9..7981075bb228 100644
> --- a/drivers/gpu/drm/xe/xe_ggtt_types.h
> +++ b/drivers/gpu/drm/xe/xe_ggtt_types.h
> @@ -31,7 +31,7 @@ struct xe_ggtt {
>
> struct {
> u64 (*pte_encode)(struct xe_device *xe, u64 pte_pat,
> - enum xe_cache_level cache);
> + u16 pat_index);
> } pat_encode;
> };
>
> diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
> index 26cbc9107501..89d9e33a07e7 100644
> --- a/drivers/gpu/drm/xe/xe_migrate.c
> +++ b/drivers/gpu/drm/xe/xe_migrate.c
> @@ -25,6 +25,7 @@
> #include "xe_lrc.h"
> #include "xe_map.h"
> #include "xe_mocs.h"
> +#include "xe_pat.h"
> #include "xe_pt.h"
> #include "xe_res_cursor.h"
> #include "xe_sched_job.h"
> @@ -162,6 +163,7 @@ static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m,
> u32 num_entries = NUM_PT_SLOTS, num_level = vm->pt_root[id]->level;
> u32 map_ofs, level, i;
> struct xe_bo *bo, *batch = tile->mem.kernel_bb_pool->bo;
> + u16 pat_index = xe_pat_get_index(xe, XE_CACHE_WB);
> u64 entry;
> int ret;
>
> @@ -196,7 +198,7 @@ static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m,
>
> /* Map the entire BO in our level 0 pt */
> for (i = 0, level = 0; i < num_entries; level++) {
> - entry = xe_pte_encode(vm, bo, i * XE_PAGE_SIZE, XE_CACHE_WB, 0);
> + entry = xe_pte_encode(vm, bo, i * XE_PAGE_SIZE, pat_index, 0);
>
> xe_map_wr(xe, &bo->vmap, map_ofs + level * 8, u64, entry);
>
> @@ -214,7 +216,7 @@ static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m,
> for (i = 0; i < batch->size;
> i += vm->flags & XE_VM_FLAG_64K ? XE_64K_PAGE_SIZE :
> XE_PAGE_SIZE) {
> - entry = xe_pte_encode(vm, batch, i, XE_CACHE_WB, 0);
> + entry = xe_pte_encode(vm, batch, i, pat_index, 0);
>
> xe_map_wr(xe, &bo->vmap, map_ofs + level * 8, u64,
> entry);
> @@ -259,7 +261,7 @@ static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m,
> ofs = map_ofs + XE_PAGE_SIZE * level + 256 * 8;
>
> flags = XE_PPGTT_PTE_DM;
> - flags = __xe_pte_encode(flags, XE_CACHE_WB, vm, NULL, 2);
> + flags = __xe_pte_encode(flags, pat_index, vm, NULL, 2);
>
> /*
> * Use 1GB pages, it shouldn't matter the physical amount of
> @@ -454,6 +456,7 @@ static void emit_pte(struct xe_migrate *m,
> struct xe_res_cursor *cur,
> u32 size, struct xe_bo *bo)
> {
> + u16 pat_index = xe_pat_get_index(m->tile->xe, XE_CACHE_WB);
> u32 ptes;
> u64 ofs = at_pt * XE_PAGE_SIZE;
> u64 cur_ofs;
> @@ -494,7 +497,7 @@ static void emit_pte(struct xe_migrate *m,
> addr += vram_region_gpu_offset(bo->ttm.resource);
> addr |= XE_PPGTT_PTE_DM;
> }
> - addr = __xe_pte_encode(addr, XE_CACHE_WB, m->q->vm, NULL, 0);
> + addr = __xe_pte_encode(addr, pat_index, m->q->vm, NULL, 0);
> bb->cs[bb->len++] = lower_32_bits(addr);
> bb->cs[bb->len++] = upper_32_bits(addr);
>
> @@ -1254,7 +1257,7 @@ xe_migrate_update_pgtables(struct xe_migrate *m,
>
> xe_tile_assert(tile, pt_bo->size == SZ_4K);
>
> - addr = xe_pte_encode(vm, pt_bo, 0, XE_CACHE_WB, 0);
> + addr = xe_pte_encode(vm, pt_bo, 0, xe_pat_get_index(xe, XE_CACHE_WB), 0);
> bb->cs[bb->len++] = lower_32_bits(addr);
> bb->cs[bb->len++] = upper_32_bits(addr);
> }
> diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
> index a1b164cf8bce..7dd93cbff704 100644
> --- a/drivers/gpu/drm/xe/xe_pt.c
> +++ b/drivers/gpu/drm/xe/xe_pt.c
> @@ -10,6 +10,7 @@
> #include "xe_gt.h"
> #include "xe_gt_tlb_invalidation.h"
> #include "xe_migrate.h"
> +#include "xe_pat.h"
> #include "xe_pt_types.h"
> #include "xe_pt_walk.h"
> #include "xe_res_cursor.h"
> @@ -67,7 +68,7 @@ u64 xe_pde_encode(struct xe_bo *bo, u64 bo_offset)
> return pde;
> }
>
> -u64 __xe_pte_encode(u64 pte, enum xe_cache_level cache,
> +u64 __xe_pte_encode(u64 pte, u16 pat_index,
> struct xe_vm *vm, struct xe_vma *vma, u32 pt_level)
> {
> struct xe_device *xe = vm->xe;
> @@ -85,7 +86,7 @@ u64 __xe_pte_encode(u64 pte, enum xe_cache_level cache,
> else if (pt_level == 2)
> pte |= XE_PDPE_PS_1G;
>
> - pte = vm->pat_encode.pte_encode(xe, pte, cache);
> + pte = vm->pat_encode.pte_encode(xe, pte, pat_index);
>
> /* XXX: Does hw support 1 GiB pages? */
> XE_WARN_ON(pt_level > 2);
> @@ -103,7 +104,7 @@ u64 __xe_pte_encode(u64 pte, enum xe_cache_level cache,
> *
> * Return: An encoded page-table entry. No errors.
> */
> -u64 xe_pte_encode(struct xe_vm *vm, struct xe_bo *bo, u64 offset, enum xe_cache_level cache,
> +u64 xe_pte_encode(struct xe_vm *vm, struct xe_bo *bo, u64 offset, u16 pat_index,
> u32 pt_level)
> {
> u64 pte;
> @@ -112,7 +113,7 @@ u64 xe_pte_encode(struct xe_vm *vm, struct xe_bo *bo, u64 offset, enum xe_cache_
> if (xe_bo_is_vram(bo) || xe_bo_is_stolen_devmem(bo))
> pte |= XE_PPGTT_PTE_DM;
>
> - return __xe_pte_encode(pte, cache, vm, NULL, pt_level);
> + return __xe_pte_encode(pte, pat_index, vm, NULL, pt_level);
> }
>
> static u64 __xe_pt_empty_pte(struct xe_tile *tile, struct xe_vm *vm,
> @@ -125,7 +126,7 @@ static u64 __xe_pt_empty_pte(struct xe_tile *tile, struct xe_vm *vm,
>
> if (level == 0) {
> u64 empty = xe_pte_encode(vm, vm->scratch_bo[id], 0,
> - XE_CACHE_WB, 0);
> + xe_pat_get_index(vm->xe, XE_CACHE_WB), 0);
>
> return empty;
> } else {
> @@ -358,8 +359,6 @@ struct xe_pt_stage_bind_walk {
> struct xe_vm *vm;
> /** @tile: The tile we're building for. */
> struct xe_tile *tile;
> - /** @cache: Desired cache level for the ptes */
> - enum xe_cache_level cache;
> /** @default_pte: PTE flag only template. No address is associated */
> u64 default_pte;
> /** @dma_offset: DMA offset to add to the PTE. */
> @@ -594,7 +593,7 @@ xe_pt_stage_bind_entry(struct xe_ptw *parent, pgoff_t offset,
>
> pte = __xe_pte_encode(is_null ? 0 :
> xe_res_dma(curs) + xe_walk->dma_offset,
> - xe_walk->cache, xe_walk->vm, xe_walk->vma, level);
> + xe_walk->vma->pat_index, xe_walk->vm, xe_walk->vma, level);
> pte |= xe_walk->default_pte;
>
> /*
> @@ -720,13 +719,8 @@ xe_pt_stage_bind(struct xe_tile *tile, struct xe_vma *vma,
> if (vma && vma->gpuva.flags & XE_VMA_ATOMIC_PTE_BIT)
> xe_walk.default_pte |= XE_USM_PPGTT_PTE_AE;
> xe_walk.dma_offset = vram_region_gpu_offset(bo->ttm.resource);
> - xe_walk.cache = XE_CACHE_WB;
> - } else {
> - if (!xe_vma_has_no_bo(vma) && bo->flags & XE_BO_SCANOUT_BIT)
> - xe_walk.cache = XE_CACHE_WT;
> - else
> - xe_walk.cache = XE_CACHE_WB;
> }
> +
> if (!xe_vma_has_no_bo(vma) && xe_bo_is_stolen(bo))
> xe_walk.dma_offset = xe_ttm_stolen_gpu_offset(xe_bo_device(bo));
>
> diff --git a/drivers/gpu/drm/xe/xe_pt.h b/drivers/gpu/drm/xe/xe_pt.h
> index 0e66436d707d..6d10823fca9b 100644
> --- a/drivers/gpu/drm/xe/xe_pt.h
> +++ b/drivers/gpu/drm/xe/xe_pt.h
> @@ -47,9 +47,9 @@ bool xe_pt_zap_ptes(struct xe_tile *tile, struct xe_vma *vma);
>
> u64 xe_pde_encode(struct xe_bo *bo, u64 bo_offset);
>
> -u64 xe_pte_encode(struct xe_vm *vm, struct xe_bo *bo, u64 offset, enum xe_cache_level cache,
> +u64 xe_pte_encode(struct xe_vm *vm, struct xe_bo *bo, u64 offset, u16 pat_index,
> u32 pt_level);
> -u64 __xe_pte_encode(u64 pte, enum xe_cache_level cache,
> +u64 __xe_pte_encode(u64 pte, u16 pat_index,
> struct xe_vm *vm, struct xe_vma *vma, u32 pt_level);
>
> #endif
> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> index ba612a5ee2d8..98db7a298139 100644
> --- a/drivers/gpu/drm/xe/xe_vm.c
> +++ b/drivers/gpu/drm/xe/xe_vm.c
> @@ -6,6 +6,7 @@
> #include "xe_vm.h"
>
> #include <linux/dma-fence-array.h>
> +#include <linux/nospec.h>
>
> #include <drm/drm_exec.h>
> #include <drm/drm_print.h>
> @@ -858,7 +859,8 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
> u64 start, u64 end,
> bool read_only,
> bool is_null,
> - u8 tile_mask)
> + u8 tile_mask,
> + u16 pat_index)
> {
> struct xe_vma *vma;
> struct xe_tile *tile;
> @@ -897,6 +899,8 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
> vma->tile_mask |= 0x1 << id;
> }
>
> + vma->pat_index = pat_index;
> +
> if (vm->xe->info.platform == XE_PVC)
> vma->gpuva.flags |= XE_VMA_ATOMIC_PTE_BIT;
>
> @@ -1195,10 +1199,8 @@ static void xe_vma_op_work_func(struct work_struct *w);
> static void vm_destroy_work_func(struct work_struct *w);
>
> static u64 xe2_ppgtt_pte_encode_pat(struct xe_device *xe, u64 pte_pat,
> - enum xe_cache_level cache)
> + u16 pat_index)
> {
> - u32 pat_index = xe_pat_get_index(xe, cache);
> -
> if (pat_index & BIT(0))
> pte_pat |= BIT(3);
>
> @@ -1216,10 +1218,8 @@ static u64 xe2_ppgtt_pte_encode_pat(struct xe_device *xe, u64 pte_pat,
> }
>
> static u64 xelp_ppgtt_pte_encode_pat(struct xe_device *xe, u64 pte_pat,
> - enum xe_cache_level cache)
> + u16 pat_index)
> {
> - u32 pat_index = xe_pat_get_index(xe, cache);
> -
> if (pat_index & BIT(0))
> pte_pat |= BIT(3);
>
> @@ -2300,7 +2300,7 @@ static void print_op(struct xe_device *xe, struct drm_gpuva_op *op)
> static struct drm_gpuva_ops *
> vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_bo *bo,
> u64 bo_offset_or_userptr, u64 addr, u64 range,
> - u32 operation, u8 tile_mask, u32 region)
> + u32 operation, u8 tile_mask, u32 region, u16 pat_index)
> {
> struct drm_gem_object *obj = bo ? &bo->ttm.base : NULL;
> struct drm_gpuva_ops *ops;
> @@ -2327,6 +2327,7 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_bo *bo,
> struct xe_vma_op *op = gpuva_op_to_vma_op(__op);
>
> op->tile_mask = tile_mask;
> + op->pat_index = pat_index;
> op->map.immediate =
> operation & XE_VM_BIND_FLAG_IMMEDIATE;
> op->map.read_only =
> @@ -2354,6 +2355,7 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_bo *bo,
> struct xe_vma_op *op = gpuva_op_to_vma_op(__op);
>
> op->tile_mask = tile_mask;
> + op->pat_index = pat_index;
> op->prefetch.region = region;
> }
> break;
> @@ -2396,7 +2398,8 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_bo *bo,
> }
>
> static struct xe_vma *new_vma(struct xe_vm *vm, struct drm_gpuva_op_map *op,
> - u8 tile_mask, bool read_only, bool is_null)
> + u8 tile_mask, bool read_only, bool is_null,
> + u16 pat_index)
> {
> struct xe_bo *bo = op->gem.obj ? gem_to_xe_bo(op->gem.obj) : NULL;
> struct xe_vma *vma;
> @@ -2412,7 +2415,7 @@ static struct xe_vma *new_vma(struct xe_vm *vm, struct drm_gpuva_op_map *op,
> vma = xe_vma_create(vm, bo, op->gem.offset,
> op->va.addr, op->va.addr +
> op->va.range - 1, read_only, is_null,
> - tile_mask);
> + tile_mask, pat_index);
> if (bo)
> xe_bo_unlock(bo);
>
> @@ -2569,7 +2572,7 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_exec_queue *q,
>
> vma = new_vma(vm, &op->base.map,
> op->tile_mask, op->map.read_only,
> - op->map.is_null);
> + op->map.is_null, op->pat_index);
> if (IS_ERR(vma)) {
> err = PTR_ERR(vma);
> goto free_fence;
> @@ -2597,7 +2600,7 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_exec_queue *q,
>
> vma = new_vma(vm, op->base.remap.prev,
> op->tile_mask, read_only,
> - is_null);
> + is_null, op->pat_index);
> if (IS_ERR(vma)) {
> err = PTR_ERR(vma);
> goto free_fence;
> @@ -2633,7 +2636,7 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_exec_queue *q,
>
> vma = new_vma(vm, op->base.remap.next,
> op->tile_mask, read_only,
> - is_null);
> + is_null, op->pat_index);
> if (IS_ERR(vma)) {
> err = PTR_ERR(vma);
> goto free_fence;
> @@ -3146,7 +3149,23 @@ static int vm_bind_ioctl_check_args(struct xe_device *xe,
> u32 obj = (*bind_ops)[i].obj;
> u64 obj_offset = (*bind_ops)[i].obj_offset;
> u32 region = (*bind_ops)[i].region;
> + u16 pat_index = (*bind_ops)[i].pat_index;
> bool is_null = op & XE_VM_BIND_FLAG_NULL;
> + u16 coh_mode;
> +
> + if (XE_IOCTL_DBG(xe, pat_index >= xe->info.pat.n_entries)) {
> + err = -EINVAL;
> + goto free_bind_ops;
> + }
> +
> + pat_index = array_index_nospec(pat_index,
> + xe->info.pat.n_entries);
> + (*bind_ops)[i].pat_index = pat_index;
> + coh_mode = xe_pat_index_get_coh_mode(xe, pat_index);
> + if (XE_WARN_ON(!coh_mode || coh_mode > XE_GEM_COH_AT_LEAST_1WAY)) {
> + err = -EINVAL;
> + goto free_bind_ops;
> + }
>
> if (i == 0) {
> *async = !!(op & XE_VM_BIND_FLAG_ASYNC);
> @@ -3188,6 +3207,8 @@ static int vm_bind_ioctl_check_args(struct xe_device *xe,
> VM_BIND_OP(op) == XE_VM_BIND_OP_UNMAP_ALL) ||
> XE_IOCTL_DBG(xe, obj &&
> VM_BIND_OP(op) == XE_VM_BIND_OP_MAP_USERPTR) ||
> + XE_IOCTL_DBG(xe, coh_mode == XE_GEM_COH_NONE &&
> + VM_BIND_OP(op) == XE_VM_BIND_OP_MAP_USERPTR) ||
> XE_IOCTL_DBG(xe, obj &&
> VM_BIND_OP(op) == XE_VM_BIND_OP_PREFETCH) ||
> XE_IOCTL_DBG(xe, region &&
> @@ -3336,6 +3357,8 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
> u64 addr = bind_ops[i].addr;
> u32 obj = bind_ops[i].obj;
> u64 obj_offset = bind_ops[i].obj_offset;
> + u16 pat_index = bind_ops[i].pat_index;
> + u16 coh_mode;
>
> if (!obj)
> continue;
> @@ -3363,6 +3386,23 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
> goto put_obj;
> }
> }
> +
> + coh_mode = xe_pat_index_get_coh_mode(xe, pat_index);
> + if (bos[i]->coh_mode) {
> + if (XE_IOCTL_DBG(xe, coh_mode < bos[i]->coh_mode)) {
> + err = -EINVAL;
> + goto put_obj;
> + }
> + } else if (XE_IOCTL_DBG(xe, coh_mode == XE_GEM_COH_NONE)) {
> + /*
> + * Imported dma-buf from a different device should
> + * require 1way or 2way coherency since we don't know
> + * how it was mapped on the CPU. Just assume it is
> + * potentially cached on CPU side.
> + */
> + err = -EINVAL;
> + goto put_obj;
> + }
> }
>
> if (args->num_syncs) {
> @@ -3400,10 +3440,11 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
> u64 obj_offset = bind_ops[i].obj_offset;
> u8 tile_mask = bind_ops[i].tile_mask;
> u32 region = bind_ops[i].region;
> + u16 pat_index = bind_ops[i].pat_index;
>
> ops[i] = vm_bind_ioctl_ops_create(vm, bos[i], obj_offset,
> addr, range, op, tile_mask,
> - region);
> + region, pat_index);
> if (IS_ERR(ops[i])) {
> err = PTR_ERR(ops[i]);
> ops[i] = NULL;
> diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
> index dc583f00919f..54658f400174 100644
> --- a/drivers/gpu/drm/xe/xe_vm_types.h
> +++ b/drivers/gpu/drm/xe/xe_vm_types.h
> @@ -111,6 +111,11 @@ struct xe_vma {
> */
> u8 tile_present;
>
> + /**
> + * @pat_index: The pat index to use when encoding the PTEs for this vma.
> + */
> + u16 pat_index;
> +
> struct {
> struct list_head rebind_link;
> } notifier;
> @@ -338,8 +343,7 @@ struct xe_vm {
> bool batch_invalidate_tlb;
>
> struct {
> - u64 (*pte_encode)(struct xe_device *xe, u64 pte_pat,
> - enum xe_cache_level cache);
> + u64 (*pte_encode)(struct xe_device *xe, u64 pte_pat, u16 pat_index);
> } pat_encode;
> };
>
> @@ -419,6 +423,8 @@ struct xe_vma_op {
> struct async_op_fence *fence;
> /** @tile_mask: gt mask for this operation */
> u8 tile_mask;
> + /** @pat_index: The pat index to use for this operation. */
> + u16 pat_index;
> /** @flags: operation flags */
> enum xe_vma_op_flags flags;
>
> diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
> index 737bb1d4c6f7..75b42c1116f2 100644
> --- a/include/uapi/drm/xe_drm.h
> +++ b/include/uapi/drm/xe_drm.h
> @@ -605,8 +605,49 @@ struct drm_xe_vm_bind_op {
> */
> __u32 obj;
>
> + /**
> + * @pat_index: The platform defined @pat_index to use for this mapping.
> + * The index basically maps to some predefined memory attributes,
> + * including things like caching, coherency, compression etc. The exact
> + * meaning of the pat_index is platform specific and defined in the
> + * Bspec and PRMs. When the KMD sets up the binding the index here is
> + * encoded into the ppGTT PTE.
> + *
> + * For coherency the @pat_index needs to be at least as coherent as
> + * drm_xe_gem_create.coh_mode, i.e. coh_mode(pat_index) >=
> + * drm_xe_gem_create.coh_mode. The KMD will extract the coherency mode
> + * from the @pat_index and reject if there is a mismatch (see note below
> + * for pre-MTL platforms).
> + *
> + * Note: On pre-MTL platforms there is only a caching mode and no
> + * explicit coherency mode, but on such hardware there is always a
> + * shared-LLC (or it is a dgpu), so all GT memory accesses are coherent with
> + * CPU caches even with the caching mode set as uncached. It's only the
> + * display engine that is incoherent (on dgpu it must be in VRAM which
> + * is always mapped as WC on the CPU). However to keep the uapi somewhat
> + * consistent with newer platforms the KMD groups the different cache
> + * levels into the following coherency buckets on all pre-MTL platforms:
> + *
> + * ppGTT UC -> XE_GEM_COH_NONE
> + * ppGTT WC -> XE_GEM_COH_NONE
> + * ppGTT WT -> XE_GEM_COH_NONE
> + * ppGTT WB -> XE_GEM_COH_AT_LEAST_1WAY
> + *
> + * In practice UC/WC/WT should only ever be used for scanout surfaces on
> + * such platforms (or perhaps in general for dma-buf if shared with
> + * another device) since it is only the display engine that is actually
> + * incoherent. Everything else should typically use WB given that we
> + * have a shared-LLC. On MTL+ this completely changes and the HW
> + * defines the coherency mode as part of the @pat_index, where
> + * incoherent GT access is possible.
With this in mind I noticed that in i915 the scanout is just a PAT parameter in
the uapi, while in Xe we have a buffer flag:
XE_GEM_CREATE_FLAG_SCANOUT
Should we continue with this flag, or should we do the same pat-param approach
that i915 is doing?
> + *
> + * Note: For userptr and externally imported dma-buf the kernel expects
> + * either 1WAY or 2WAY for the @pat_index.
> + */
> + __u16 pat_index;
> +
> /** @pad: MBZ */
> - __u32 pad;
> + __u16 pad;
>
> union {
> /**
> --
> 2.41.0
>
* Re: [Intel-xe] [PATCH v2 1/6] drm/xe/uapi: Add support for cache and coherency mode
2023-09-25 18:26 ` Souza, Jose
@ 2023-09-26 8:07 ` Matthew Auld
2023-09-26 15:59 ` Souza, Jose
0 siblings, 1 reply; 28+ messages in thread
From: Matthew Auld @ 2023-09-26 8:07 UTC (permalink / raw)
To: Souza, Jose, intel-xe@lists.freedesktop.org
On 25/09/2023 19:26, Souza, Jose wrote:
> On Mon, 2023-09-25 at 09:06 +0100, Matthew Auld wrote:
>> On 21/09/2023 21:07, Souza, Jose wrote:
>>> On Thu, 2023-09-14 at 16:31 +0100, Matthew Auld wrote:
>>>> From: Pallavi Mishra <pallavi.mishra@intel.com>
>>>>
>>>> Allow userspace to specify the CPU caching mode to use for system memory
>>>> in addition to coherency modes during object creation. Modify gem create
>>>> handler and introduce xe_bo_create_user to replace xe_bo_create. In a
>>>> later patch we will support setting the pat_index as part of vm_bind,
>>>> where expectation is that the coherency mode extracted from the
>>>> pat_index must match the one set at object creation.
>>>>
>>>> v2
>>>> - s/smem_caching/smem_cpu_caching/ and
>>>> s/XE_GEM_CACHING/XE_GEM_CPU_CACHING/. (Matt Roper)
>>>> - Drop COH_2WAY and just use COH_NONE + COH_AT_LEAST_1WAY; KMD mostly
>>>> just cares that zeroing/swap-in can't be bypassed with the given
>>>> smem_caching mode. (Matt Roper)
>>>> - Fix broken range check for coh_mode and smem_cpu_caching and also
>>>> don't use constant value, but the already defined macros. (José)
>>>> - Prefer switch statement for smem_cpu_caching -> ttm_caching. (José)
>>>> - Add note in kernel-doc for dgpu and coherency modes for system
>>>> memory. (José)
>>>>
>>>> Signed-off-by: Pallavi Mishra <pallavi.mishra@intel.com>
>>>> Co-authored-by: Matthew Auld <matthew.auld@intel.com>
>>>> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>>>> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>>>> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
>>>> Cc: Lucas De Marchi <lucas.demarchi@intel.com>
>>>> Cc: Matt Roper <matthew.d.roper@intel.com>
>>>> Cc: José Roberto de Souza <jose.souza@intel.com>
>>>> Cc: Filip Hazubski <filip.hazubski@intel.com>
>>>> Cc: Carl Zhang <carl.zhang@intel.com>
>>>> Cc: Effie Yu <effie.yu@intel.com>
>>>> ---
>>>> drivers/gpu/drm/xe/xe_bo.c | 105 ++++++++++++++++++++++++++-----
>>>> drivers/gpu/drm/xe/xe_bo.h | 3 +-
>>>> drivers/gpu/drm/xe/xe_bo_types.h | 10 +++
>>>> drivers/gpu/drm/xe/xe_dma_buf.c | 5 +-
>>>> include/uapi/drm/xe_drm.h | 57 ++++++++++++++++-
>>>> 5 files changed, 158 insertions(+), 22 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
>>>> index 27726d4f3423..f3facd788f15 100644
>>>> --- a/drivers/gpu/drm/xe/xe_bo.c
>>>> +++ b/drivers/gpu/drm/xe/xe_bo.c
>>>> @@ -325,7 +325,7 @@ static struct ttm_tt *xe_ttm_tt_create(struct ttm_buffer_object *ttm_bo,
>>>> struct xe_device *xe = xe_bo_device(bo);
>>>> struct xe_ttm_tt *tt;
>>>> unsigned long extra_pages;
>>>> - enum ttm_caching caching = ttm_cached;
>>>> + enum ttm_caching caching;
>>>> int err;
>>>>
>>>> tt = kzalloc(sizeof(*tt), GFP_KERNEL);
>>>> @@ -339,13 +339,25 @@ static struct ttm_tt *xe_ttm_tt_create(struct ttm_buffer_object *ttm_bo,
>>>> extra_pages = DIV_ROUND_UP(xe_device_ccs_bytes(xe, bo->size),
>>>> PAGE_SIZE);
>>>>
>>>> + switch (bo->smem_cpu_caching) {
>>>> + case XE_GEM_CPU_CACHING_WC:
>>>> + caching = ttm_write_combined;
>>>> + break;
>>>> + case XE_GEM_CPU_CACHING_UC:
>>>> + caching = ttm_uncached;
>>>> + break;
>>>> + default:
>>>> + caching = ttm_cached;
>>>> + break;
>>>> + }
>>>> +
>>>> /*
>>>> * Display scanout is always non-coherent with the CPU cache.
>>>> *
>>>> * For Xe_LPG and beyond, PPGTT PTE lookups are also non-coherent and
>>>> * require a CPU:WC mapping.
>>>> */
>>>> - if (bo->flags & XE_BO_SCANOUT_BIT ||
>>>> + if ((!bo->smem_cpu_caching && bo->flags & XE_BO_SCANOUT_BIT) ||
>>>> (xe->info.graphics_verx100 >= 1270 && bo->flags & XE_BO_PAGETABLE))
>>>> caching = ttm_write_combined;
>>>>
>>>> @@ -1184,9 +1196,10 @@ void xe_bo_free(struct xe_bo *bo)
>>>> kfree(bo);
>>>> }
>>>>
>>>> -struct xe_bo *__xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
>>>> +struct xe_bo *___xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
>>>> struct xe_tile *tile, struct dma_resv *resv,
>>>> struct ttm_lru_bulk_move *bulk, size_t size,
>>>> + u16 smem_cpu_caching, u16 coh_mode,
>>>> enum ttm_bo_type type, u32 flags)
>>>> {
>>>> struct ttm_operation_ctx ctx = {
>>>> @@ -1224,6 +1237,8 @@ struct xe_bo *__xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
>>>> bo->tile = tile;
>>>> bo->size = size;
>>>> bo->flags = flags;
>>>> + bo->smem_cpu_caching = smem_cpu_caching;
>>>> + bo->coh_mode = coh_mode;
>>>> bo->ttm.base.funcs = &xe_gem_object_funcs;
>>>> bo->props.preferred_mem_class = XE_BO_PROPS_INVALID;
>>>> bo->props.preferred_gt = XE_BO_PROPS_INVALID;
>>>> @@ -1307,10 +1322,11 @@ static int __xe_bo_fixed_placement(struct xe_device *xe,
>>>> }
>>>>
>>>> struct xe_bo *
>>>> -xe_bo_create_locked_range(struct xe_device *xe,
>>>> - struct xe_tile *tile, struct xe_vm *vm,
>>>> - size_t size, u64 start, u64 end,
>>>> - enum ttm_bo_type type, u32 flags)
>>>> +__xe_bo_create_locked(struct xe_device *xe,
>>>> + struct xe_tile *tile, struct xe_vm *vm,
>>>> + size_t size, u64 start, u64 end,
>>>> + u16 smem_cpu_caching, u16 coh_mode,
>>>> + enum ttm_bo_type type, u32 flags)
>>>> {
>>>> struct xe_bo *bo = NULL;
>>>> int err;
>>>> @@ -1331,10 +1347,11 @@ xe_bo_create_locked_range(struct xe_device *xe,
>>>> }
>>>> }
>>>>
>>>> - bo = __xe_bo_create_locked(xe, bo, tile, vm ? &vm->resv : NULL,
>>>> + bo = ___xe_bo_create_locked(xe, bo, tile, vm ? &vm->resv : NULL,
>>>> vm && !xe_vm_in_fault_mode(vm) &&
>>>> flags & XE_BO_CREATE_USER_BIT ?
>>>> &vm->lru_bulk_move : NULL, size,
>>>> + smem_cpu_caching, coh_mode,
>>>> type, flags);
>>>> if (IS_ERR(bo))
>>>> return bo;
>>>> @@ -1368,11 +1385,35 @@ xe_bo_create_locked_range(struct xe_device *xe,
>>>> return ERR_PTR(err);
>>>> }
>>>>
>>>> +struct xe_bo *
>>>> +xe_bo_create_locked_range(struct xe_device *xe,
>>>> + struct xe_tile *tile, struct xe_vm *vm,
>>>> + size_t size, u64 start, u64 end,
>>>> + enum ttm_bo_type type, u32 flags)
>>>> +{
>>>> + return __xe_bo_create_locked(xe, tile, vm, size, 0, ~0ULL, 0, 0, type, flags);
>>>> +}
>>>> +
>>>> struct xe_bo *xe_bo_create_locked(struct xe_device *xe, struct xe_tile *tile,
>>>> struct xe_vm *vm, size_t size,
>>>> enum ttm_bo_type type, u32 flags)
>>>> {
>>>> - return xe_bo_create_locked_range(xe, tile, vm, size, 0, ~0ULL, type, flags);
>>>> + return __xe_bo_create_locked(xe, tile, vm, size, 0, ~0ULL, 0, 0, type, flags);
>>>> +}
>>>> +
>>>> +static struct xe_bo *xe_bo_create_user(struct xe_device *xe, struct xe_tile *tile,
>>>> + struct xe_vm *vm, size_t size,
>>>> + u16 smem_cpu_caching, u16 coh_mode,
>>>> + enum ttm_bo_type type,
>>>> + u32 flags)
>>>> +{
>>>> + struct xe_bo *bo = __xe_bo_create_locked(xe, tile, vm, size, 0, ~0ULL,
>>>> + smem_cpu_caching, coh_mode, type,
>>>> + flags | XE_BO_CREATE_USER_BIT);
>>>> + if (!IS_ERR(bo))
>>>> + xe_bo_unlock_vm_held(bo);
>>>> +
>>>> + return bo;
>>>> }
>>>>
>>>> struct xe_bo *xe_bo_create(struct xe_device *xe, struct xe_tile *tile,
>>>> @@ -1755,11 +1796,11 @@ int xe_gem_create_ioctl(struct drm_device *dev, void *data,
>>>> struct drm_xe_gem_create *args = data;
>>>> struct xe_vm *vm = NULL;
>>>> struct xe_bo *bo;
>>>> - unsigned int bo_flags = XE_BO_CREATE_USER_BIT;
>>>> + unsigned int bo_flags;
>>>> u32 handle;
>>>> int err;
>>>>
>>>> - if (XE_IOCTL_DBG(xe, args->extensions) || XE_IOCTL_DBG(xe, args->pad) ||
>>>> + if (XE_IOCTL_DBG(xe, args->extensions) ||
>>>> XE_IOCTL_DBG(xe, args->reserved[0] || args->reserved[1]))
>>>> return -EINVAL;
>>>>
>>>> @@ -1801,6 +1842,32 @@ int xe_gem_create_ioctl(struct drm_device *dev, void *data,
>>>> bo_flags |= XE_BO_NEEDS_CPU_ACCESS;
>>>> }
>>>>
>>>> + if (XE_IOCTL_DBG(xe, args->coh_mode > XE_GEM_COH_AT_LEAST_1WAY))
>>>> + return -EINVAL;
>>>> +
>>>> + if (XE_IOCTL_DBG(xe, args->smem_cpu_caching > XE_GEM_CPU_CACHING_UC))
>>>> + return -EINVAL;
>>>> +
>>>> + if (bo_flags & XE_BO_CREATE_SYSTEM_BIT) {
>>>> + if (XE_IOCTL_DBG(xe, !args->coh_mode))
>>>> + return -EINVAL;
>>>> +
>>>> + if (XE_IOCTL_DBG(xe, !args->smem_cpu_caching))
>>>> + return -EINVAL;
>>>> +
>>>> + if (XE_IOCTL_DBG(xe, !IS_DGFX(xe) &&
>>>> + bo_flags & XE_BO_SCANOUT_BIT &&
>>>> + args->smem_cpu_caching == XE_GEM_CPU_CACHING_WB))
>>>> + return -EINVAL;
>>>> +
>>>> + if (args->coh_mode == XE_GEM_COH_NONE) {
>>>> + if (XE_IOCTL_DBG(xe, args->smem_cpu_caching == XE_GEM_CPU_CACHING_WB))
>>>> + return -EINVAL;
>>>> + }
>>>> + } else if (XE_IOCTL_DBG(xe, args->smem_cpu_caching)) {
>>>
>>> should be XE_IOCTL_DBG(xe, !args->smem_cpu_caching).
>>>
>>> The uAPI doesn't say anything about allowing smem_cpu_caching or coh_mode == 0; I did this to be able to run tests without display on DG2:
>>
>> The above check is for VRAM-only objects. For smem_cpu_caching the
>> kernel-doc says: "MUST be left as zero for VRAM-only objects."
>> Internally the KMD uses WC for CPU mapping VRAM which is out of the
>> control of userspace.
>
> In my opinion this should be != 0 and match the PAT index that will be set in VM bind.
This is just talking about VRAM-only objects. If such an object is evicted
to system memory, userspace can't touch the pages from the GPU without the
KMD first migrating it back to VRAM. So the pat_index mostly applies to
the VRAM placement in that case, and that is implicitly always WC.
However, the smem_cpu_caching might be interesting for controlling the
mmap caching used while evicted, i.e. a VRAM-only object is evicted to
system memory and then accessed by the CPU from userspace. I didn't think
userspace would really care, so I figured we should just reject/ignore
smem_cpu_caching for VRAM-only objects.
I can remove the !smem_cpu_caching requirement for VRAM-only and update
the kernel-doc to say that this controls the evicted-to-smem caching?
> I have not read much, but I believe CXL GPU memory would be mapped as WB to take advantage of CXL caching protocols.
Right, but for that maybe you would just add something like
vram_cpu_caching, assuming that it still uses this type of interface?
>
> So I believe the kernel-doc restrictions should be removed and a run-time check added along the lines of: if (smem_cpu_caching != WC && is_dgfx()) return -EINVAL;
>
> A related question: can a bo placed in lmem + smem have WB caching on DG2? How would that work? So far Mesa has been handling that case as WC
> as well.
Yeah, you can have WB for smem, and then WC for vram.
AFAIK on dgpu without this series, you get WB for system memory (you
can't turn off snooping on dgpu so might as well use WB I guess). If
it's currently placed in VRAM you get WC.
With this series you can also select WC for lmem + smem, if that is
preferred. But I think for smem-only you might want to use WB on dgpu,
on current platforms.
>
>>
>> coh_mode == 0 is not meant to be allowed, but looks like I missed the
>> check here for VRAM-only. Will fix.
>>
>>>
>>>
>>> diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
>>> index f3facd788f152..e0e4fefcd2060 100644
>>> --- a/drivers/gpu/drm/xe/xe_bo.c
>>> +++ b/drivers/gpu/drm/xe/xe_bo.c
>>> @@ -1796,7 +1796,7 @@ int xe_gem_create_ioctl(struct drm_device *dev, void *data,
>>> struct drm_xe_gem_create *args = data;
>>> struct xe_vm *vm = NULL;
>>> struct xe_bo *bo;
>>> - unsigned int bo_flags;
>>> + unsigned int bo_flags = 0;
>>> u32 handle;
>>> int err;
>>>
>>> @@ -1842,19 +1842,15 @@ int xe_gem_create_ioctl(struct drm_device *dev, void *data,
>>> bo_flags |= XE_BO_NEEDS_CPU_ACCESS;
>>> }
>>>
>>> - if (XE_IOCTL_DBG(xe, args->coh_mode > XE_GEM_COH_AT_LEAST_1WAY))
>>> + if (XE_IOCTL_DBG(xe, args->coh_mode > XE_GEM_COH_AT_LEAST_1WAY) ||
>>> + XE_IOCTL_DBG(xe, !args->coh_mode))
>>> return -EINVAL;
>>>
>>> - if (XE_IOCTL_DBG(xe, args->smem_cpu_caching > XE_GEM_CPU_CACHING_UC))
>>> + if (XE_IOCTL_DBG(xe, args->smem_cpu_caching > XE_GEM_CPU_CACHING_UC) ||
>>> + XE_IOCTL_DBG(xe, !args->smem_cpu_caching))
>>> return -EINVAL;
>>>
>>> if (bo_flags & XE_BO_CREATE_SYSTEM_BIT) {
>>> - if (XE_IOCTL_DBG(xe, !args->coh_mode))
>>> - return -EINVAL;
>>> -
>>> - if (XE_IOCTL_DBG(xe, !args->smem_cpu_caching))
>>> - return -EINVAL;
>>> -
>>> if (XE_IOCTL_DBG(xe, !IS_DGFX(xe) &&
>>> bo_flags & XE_BO_SCANOUT_BIT &&
>>> args->smem_cpu_caching == XE_GEM_CPU_CACHING_WB))
>>> @@ -1864,8 +1860,6 @@ int xe_gem_create_ioctl(struct drm_device *dev, void *data,
>>> if (XE_IOCTL_DBG(xe, args->smem_cpu_caching == XE_GEM_CPU_CACHING_WB))
>>> return -EINVAL;
>>> }
>>> - } else if (XE_IOCTL_DBG(xe, args->smem_cpu_caching)) {
>>> - return -EINVAL;
>>> }
>>>
>>> if (args->vm_id) {
>>>
>>>
>>>> + return -EINVAL;
>>>> + }
>>>> +
>>>> if (args->vm_id) {
>>>> vm = xe_vm_lookup(xef, args->vm_id);
>>>> if (XE_IOCTL_DBG(xe, !vm))
>>>> @@ -1812,8 +1879,10 @@ int xe_gem_create_ioctl(struct drm_device *dev, void *data,
>>>> }
>>>> }
>>>>
>>>> - bo = xe_bo_create(xe, NULL, vm, args->size, ttm_bo_type_device,
>>>> - bo_flags);
>>>> + bo = xe_bo_create_user(xe, NULL, vm, args->size,
>>>> + args->smem_cpu_caching, args->coh_mode,
>>>> + ttm_bo_type_device,
>>>> + bo_flags);
>>>> if (IS_ERR(bo)) {
>>>> err = PTR_ERR(bo);
>>>> goto out_vm;
>>>> @@ -2105,10 +2174,12 @@ int xe_bo_dumb_create(struct drm_file *file_priv,
>>>> args->size = ALIGN(mul_u32_u32(args->pitch, args->height),
>>>> page_size);
>>>>
>>>> - bo = xe_bo_create(xe, NULL, NULL, args->size, ttm_bo_type_device,
>>>> - XE_BO_CREATE_VRAM_IF_DGFX(xe_device_get_root_tile(xe)) |
>>>> - XE_BO_CREATE_USER_BIT | XE_BO_SCANOUT_BIT |
>>>> - XE_BO_NEEDS_CPU_ACCESS);
>>>> + bo = xe_bo_create_user(xe, NULL, NULL, args->size,
>>>> + XE_GEM_CPU_CACHING_WC, XE_GEM_COH_NONE,
>>>> + ttm_bo_type_device,
>>>> + XE_BO_CREATE_VRAM_IF_DGFX(xe_device_get_root_tile(xe)) |
>>>> + XE_BO_CREATE_USER_BIT | XE_BO_SCANOUT_BIT |
>>>> + XE_BO_NEEDS_CPU_ACCESS);
>>>> if (IS_ERR(bo))
>>>> return PTR_ERR(bo);
>>>>
>>>> diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
>>>> index 4a68d869b3b5..4a0ee81fe598 100644
>>>> --- a/drivers/gpu/drm/xe/xe_bo.h
>>>> +++ b/drivers/gpu/drm/xe/xe_bo.h
>>>> @@ -81,9 +81,10 @@ struct sg_table;
>>>> struct xe_bo *xe_bo_alloc(void);
>>>> void xe_bo_free(struct xe_bo *bo);
>>>>
>>>> -struct xe_bo *__xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
>>>> +struct xe_bo *___xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
>>>> struct xe_tile *tile, struct dma_resv *resv,
>>>> struct ttm_lru_bulk_move *bulk, size_t size,
>>>> + u16 smem_cpu_caching, u16 coh_mode,
>>>> enum ttm_bo_type type, u32 flags);
>>>> struct xe_bo *
>>>> xe_bo_create_locked_range(struct xe_device *xe,
>>>> diff --git a/drivers/gpu/drm/xe/xe_bo_types.h b/drivers/gpu/drm/xe/xe_bo_types.h
>>>> index 2ea9ad423170..9bee220a6872 100644
>>>> --- a/drivers/gpu/drm/xe/xe_bo_types.h
>>>> +++ b/drivers/gpu/drm/xe/xe_bo_types.h
>>>> @@ -68,6 +68,16 @@ struct xe_bo {
>>>> struct llist_node freed;
>>>> /** @created: Whether the bo has passed initial creation */
>>>> bool created;
>>>> + /**
>>>> + * @coh_mode: Coherency setting. Currently only used for userspace
>>>> + * objects.
>>>> + */
>>>> + u16 coh_mode;
>>>> + /**
>>>> + * @smem_cpu_caching: Caching mode for smem. Currently only used for
>>>> + * userspace objects.
>>>> + */
>>>> + u16 smem_cpu_caching;
>>>> };
>>>>
>>>> #define intel_bo_to_drm_bo(bo) (&(bo)->ttm.base)
>>>> diff --git a/drivers/gpu/drm/xe/xe_dma_buf.c b/drivers/gpu/drm/xe/xe_dma_buf.c
>>>> index 09343b8b3e96..ac20dbc27a2b 100644
>>>> --- a/drivers/gpu/drm/xe/xe_dma_buf.c
>>>> +++ b/drivers/gpu/drm/xe/xe_dma_buf.c
>>>> @@ -200,8 +200,9 @@ xe_dma_buf_init_obj(struct drm_device *dev, struct xe_bo *storage,
>>>> int ret;
>>>>
>>>> dma_resv_lock(resv, NULL);
>>>> - bo = __xe_bo_create_locked(xe, storage, NULL, resv, NULL, dma_buf->size,
>>>> - ttm_bo_type_sg, XE_BO_CREATE_SYSTEM_BIT);
>>>> + bo = ___xe_bo_create_locked(xe, storage, NULL, resv, NULL, dma_buf->size,
>>>> + 0, 0, /* Will require 1way or 2way for vm_bind */
>>>> + ttm_bo_type_sg, XE_BO_CREATE_SYSTEM_BIT);
>>>> if (IS_ERR(bo)) {
>>>> ret = PTR_ERR(bo);
>>>> goto error;
>>>> diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
>>>> index 00d5cb4ef85e..737bb1d4c6f7 100644
>>>> --- a/include/uapi/drm/xe_drm.h
>>>> +++ b/include/uapi/drm/xe_drm.h
>>>> @@ -456,8 +456,61 @@ struct drm_xe_gem_create {
>>>> */
>>>> __u32 handle;
>>>>
>>>> - /** @pad: MBZ */
>>>> - __u32 pad;
>>>> + /**
>>>> + * @coh_mode: The coherency mode for this object. This will limit the
>>>> + * possible @smem_caching values.
>>>> + *
>>>> + * Supported values:
>>>> + *
>>>> + * XE_GEM_COH_NONE: GPU access is assumed to be not coherent with
>>>> + * CPU. CPU caches are not snooped.
>>>> + *
>>>> + * XE_GEM_COH_AT_LEAST_1WAY:
>>>> + *
>>>> + * CPU-GPU coherency must be at least 1WAY.
>>>> + *
>>>> + * If 1WAY then GPU access is coherent with CPU (CPU caches are snooped)
>>>> + * until GPU acquires. The acquire by the GPU is not tracked by CPU
>>>> + * caches.
>>>> + *
>>>> + * If 2WAY then should be fully coherent between GPU and CPU. Fully
>>>> + * tracked by CPU caches. Both CPU and GPU caches are snooped.
>>>> + *
>>>> + * Note: On dgpu the GPU device never caches system memory (outside of
>>>> + * the special system-memory-read-only cache, which is anyway flushed by
>>>> + * KMD when nuking TLBs for a given object so should be no concern to
>>>> + * userspace). The device should be thought of as always 1WAY coherent,
>>>> + * with the addition that the GPU never caches system memory. At least
>>>> + * on current dgpu HW there is no way to turn off snooping so likely the
>>>> + * different coherency modes of the pat_index make no difference for
>>>> + * system memory.
>>>> + */
>>>> +#define XE_GEM_COH_NONE 1
>>>> +#define XE_GEM_COH_AT_LEAST_1WAY 2
>>>> + __u16 coh_mode;
>>>> +
>>>> + /**
>>>> + * @smem_cpu_caching: The CPU caching mode to select for system memory.
>>>> + *
>>>> + * Supported values:
>>>> + *
>>>> + * XE_GEM_CPU_CACHING_WB: Allocate the pages with write-back caching.
>>>> + * On iGPU this can't be used for scanout surfaces. The @coh_mode must
>>>> + * be XE_GEM_COH_AT_LEAST_1WAY.
>>>> + *
>>>> + * XE_GEM_CPU_CACHING_WC: Allocate the pages as write-combined. This is
>>>> + * uncached. Any @coh_mode is permitted. Scanout surfaces should likely
>>>> + * use this.
>>>> + *
>>>> + * XE_GEM_CPU_CACHING_UC: Allocate the pages as uncached. Any @coh_mode
>>>> + * is permitted. Scanout surfaces are permitted to use this.
>>>> + *
>>>> + * MUST be left as zero for VRAM-only objects.
>>>> + */
>>>> +#define XE_GEM_CPU_CACHING_WB 1
>>>> +#define XE_GEM_CPU_CACHING_WC 2
>>>> +#define XE_GEM_CPU_CACHING_UC 3
>>>> + __u16 smem_cpu_caching;
>>>>
>>>> /** @reserved: Reserved */
>>>> __u64 reserved[2];
>>>
>
* Re: [Intel-xe] [PATCH v2 6/6] drm/xe/uapi: support pat_index selection with vm_bind
2023-09-25 21:56 ` Rodrigo Vivi
@ 2023-09-26 8:17 ` Matthew Auld
2023-09-27 19:30 ` Rodrigo Vivi
0 siblings, 1 reply; 28+ messages in thread
From: Matthew Auld @ 2023-09-26 8:17 UTC (permalink / raw)
To: Rodrigo Vivi
Cc: Filip Hazubski, Lucas De Marchi, Carl Zhang, Effie Yu, Matt Roper,
intel-xe
On 25/09/2023 22:56, Rodrigo Vivi wrote:
> On Thu, Sep 14, 2023 at 04:31:19PM +0100, Matthew Auld wrote:
>> Allow userspace to directly control the pat_index for a given vm
>> binding. This should allow directly controlling the coherency, caching
>> and potentially other stuff in the future for the ppGTT binding.
>>
>> The exact meaning behind the pat_index is very platform specific (see
>> BSpec or PRMs) but effectively maps to some predefined memory
>> attributes. From the KMD pov we only care about the coherency that is
>> provided by the pat_index, which falls into either NONE, 1WAY or 2WAY.
>> The vm_bind coherency mode for the given pat_index needs to be at least
>> as coherent as the coh_mode that was set at object creation. For
>> platforms that lack the explicit coherency mode, we treat UC/WT/WC as
>> NONE and WB as AT_LEAST_1WAY.
>>
>> For userptr mappings we lack a corresponding gem object, so the expected
>> coherency mode is instead implicit and must fall into either 1WAY or
>> 2WAY. Trying to use NONE will be rejected by the kernel. For imported
>> dma-buf (from a different device) the coherency mode is also implicit
>> and must also be either 1WAY or 2WAY, i.e. AT_LEAST_1WAY.
>>
>> As part of adding pat_index support with vm_bind we also need stop using
>> xe_cache_level and instead use the pat_index in various places. We still
>> make use of xe_cache_level, but only as a convenience for kernel
>> internal objects (internally it maps to some reasonable pat_index). For
>> now this is just a 1:1 conversion of the existing code, however for
>> platforms like MTL+ we might need to give more control through bo_create
>> or stop using WB on the CPU side if we need CPU access.
>>
>> v2:
>> - Undefined coh_mode(pat_index) can now be treated as programmer error. (Matt Roper)
>> - We now allow gem_create.coh_mode <= coh_mode(pat_index), rather than
>> having to match exactly. This ensures imported dma-buf can always
>> just use 1way (or even 2way), now that we also bundle 1way/2way into
>> at_least_1way. We still require 1way/2way for external dma-buf, but
>> the policy can now be the same for self-import, if desired.
>> - Use u16 for pat_index in uapi. u32 is massive overkill. (José)
>> - Move as much of the pat_index validation as we can into
>> vm_bind_ioctl_check_args. (José)
>>
>> Bspec: 45101, 44235 #xe
>> Bspec: 70552, 71582, 59400 #xe2
>> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>> Cc: Pallavi Mishra <pallavi.mishra@intel.com>
>> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
>> Cc: Lucas De Marchi <lucas.demarchi@intel.com>
>> Cc: Matt Roper <matthew.d.roper@intel.com>
>> Cc: José Roberto de Souza <jose.souza@intel.com>
>> Cc: Filip Hazubski <filip.hazubski@intel.com>
>> Cc: Carl Zhang <carl.zhang@intel.com>
>> Cc: Effie Yu <effie.yu@intel.com>
>> ---
>> drivers/gpu/drm/xe/tests/xe_migrate.c | 2 +-
>> drivers/gpu/drm/xe/xe_ggtt.c | 7 ++-
>> drivers/gpu/drm/xe/xe_ggtt_types.h | 2 +-
>> drivers/gpu/drm/xe/xe_migrate.c | 13 +++--
>> drivers/gpu/drm/xe/xe_pt.c | 22 ++++-----
>> drivers/gpu/drm/xe/xe_pt.h | 4 +-
>> drivers/gpu/drm/xe/xe_vm.c | 69 +++++++++++++++++++++------
>> drivers/gpu/drm/xe/xe_vm_types.h | 10 +++-
>> include/uapi/drm/xe_drm.h | 43 ++++++++++++++++-
>> 9 files changed, 128 insertions(+), 44 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/xe/tests/xe_migrate.c b/drivers/gpu/drm/xe/tests/xe_migrate.c
>> index 6b4388bfbb31..d3bf4751a2d7 100644
>> --- a/drivers/gpu/drm/xe/tests/xe_migrate.c
>> +++ b/drivers/gpu/drm/xe/tests/xe_migrate.c
>> @@ -301,7 +301,7 @@ static void xe_migrate_sanity_test(struct xe_migrate *m, struct kunit *test)
>> /* First part of the test, are we updating our pagetable bo with a new entry? */
>> xe_map_wr(xe, &bo->vmap, XE_PAGE_SIZE * (NUM_KERNEL_PDE - 1), u64,
>> 0xdeaddeadbeefbeef);
>> - expected = xe_pte_encode(m->q->vm, pt, 0, XE_CACHE_WB, 0);
>> + expected = xe_pte_encode(m->q->vm, pt, 0, xe_pat_get_index(xe, XE_CACHE_WB), 0);
>> if (m->q->vm->flags & XE_VM_FLAG_64K)
>> expected |= XE_PTE_PS64;
>> if (xe_bo_is_vram(pt))
>> diff --git a/drivers/gpu/drm/xe/xe_ggtt.c b/drivers/gpu/drm/xe/xe_ggtt.c
>> index aea26afd4668..7e4da16389af 100644
>> --- a/drivers/gpu/drm/xe/xe_ggtt.c
>> +++ b/drivers/gpu/drm/xe/xe_ggtt.c
>> @@ -41,7 +41,8 @@ u64 xe_ggtt_pte_encode(struct xe_bo *bo, u64 bo_offset)
>> pte |= XE_GGTT_PTE_DM;
>>
>> if ((ggtt->pat_encode).pte_encode)
>> - pte = (ggtt->pat_encode).pte_encode(xe, pte, XE_CACHE_WB_1_WAY);
>> + pte = (ggtt->pat_encode).pte_encode(xe, pte,
>> + xe_pat_get_index(xe, XE_CACHE_WB_1_WAY));
>>
>> return pte;
>> }
>> @@ -102,10 +103,8 @@ static void primelockdep(struct xe_ggtt *ggtt)
>> }
>>
>> static u64 xelpg_ggtt_pte_encode_pat(struct xe_device *xe, u64 pte_pat,
>> - enum xe_cache_level cache)
>> + u16 pat_index)
>> {
>> - u32 pat_index = xe_pat_get_index(xe, cache);
>> -
>> pte_pat &= ~(XELPG_GGTT_PTE_PAT_MASK);
>>
>> if (pat_index & BIT(0))
>> diff --git a/drivers/gpu/drm/xe/xe_ggtt_types.h b/drivers/gpu/drm/xe/xe_ggtt_types.h
>> index 7e55fac1a8a9..7981075bb228 100644
>> --- a/drivers/gpu/drm/xe/xe_ggtt_types.h
>> +++ b/drivers/gpu/drm/xe/xe_ggtt_types.h
>> @@ -31,7 +31,7 @@ struct xe_ggtt {
>>
>> struct {
>> u64 (*pte_encode)(struct xe_device *xe, u64 pte_pat,
>> - enum xe_cache_level cache);
>> + u16 pat_index);
>> } pat_encode;
>> };
>>
>> diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
>> index 26cbc9107501..89d9e33a07e7 100644
>> --- a/drivers/gpu/drm/xe/xe_migrate.c
>> +++ b/drivers/gpu/drm/xe/xe_migrate.c
>> @@ -25,6 +25,7 @@
>> #include "xe_lrc.h"
>> #include "xe_map.h"
>> #include "xe_mocs.h"
>> +#include "xe_pat.h"
>> #include "xe_pt.h"
>> #include "xe_res_cursor.h"
>> #include "xe_sched_job.h"
>> @@ -162,6 +163,7 @@ static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m,
>> u32 num_entries = NUM_PT_SLOTS, num_level = vm->pt_root[id]->level;
>> u32 map_ofs, level, i;
>> struct xe_bo *bo, *batch = tile->mem.kernel_bb_pool->bo;
>> + u16 pat_index = xe_pat_get_index(xe, XE_CACHE_WB);
>> u64 entry;
>> int ret;
>>
>> @@ -196,7 +198,7 @@ static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m,
>>
>> /* Map the entire BO in our level 0 pt */
>> for (i = 0, level = 0; i < num_entries; level++) {
>> - entry = xe_pte_encode(vm, bo, i * XE_PAGE_SIZE, XE_CACHE_WB, 0);
>> + entry = xe_pte_encode(vm, bo, i * XE_PAGE_SIZE, pat_index, 0);
>>
>> xe_map_wr(xe, &bo->vmap, map_ofs + level * 8, u64, entry);
>>
>> @@ -214,7 +216,7 @@ static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m,
>> for (i = 0; i < batch->size;
>> i += vm->flags & XE_VM_FLAG_64K ? XE_64K_PAGE_SIZE :
>> XE_PAGE_SIZE) {
>> - entry = xe_pte_encode(vm, batch, i, XE_CACHE_WB, 0);
>> + entry = xe_pte_encode(vm, batch, i, pat_index, 0);
>>
>> xe_map_wr(xe, &bo->vmap, map_ofs + level * 8, u64,
>> entry);
>> @@ -259,7 +261,7 @@ static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m,
>> ofs = map_ofs + XE_PAGE_SIZE * level + 256 * 8;
>>
>> flags = XE_PPGTT_PTE_DM;
>> - flags = __xe_pte_encode(flags, XE_CACHE_WB, vm, NULL, 2);
>> + flags = __xe_pte_encode(flags, pat_index, vm, NULL, 2);
>>
>> /*
>> * Use 1GB pages, it shouldn't matter the physical amount of
>> @@ -454,6 +456,7 @@ static void emit_pte(struct xe_migrate *m,
>> struct xe_res_cursor *cur,
>> u32 size, struct xe_bo *bo)
>> {
>> + u16 pat_index = xe_pat_get_index(m->tile->xe, XE_CACHE_WB);
>> u32 ptes;
>> u64 ofs = at_pt * XE_PAGE_SIZE;
>> u64 cur_ofs;
>> @@ -494,7 +497,7 @@ static void emit_pte(struct xe_migrate *m,
>> addr += vram_region_gpu_offset(bo->ttm.resource);
>> addr |= XE_PPGTT_PTE_DM;
>> }
>> - addr = __xe_pte_encode(addr, XE_CACHE_WB, m->q->vm, NULL, 0);
>> + addr = __xe_pte_encode(addr, pat_index, m->q->vm, NULL, 0);
>> bb->cs[bb->len++] = lower_32_bits(addr);
>> bb->cs[bb->len++] = upper_32_bits(addr);
>>
>> @@ -1254,7 +1257,7 @@ xe_migrate_update_pgtables(struct xe_migrate *m,
>>
>> xe_tile_assert(tile, pt_bo->size == SZ_4K);
>>
>> - addr = xe_pte_encode(vm, pt_bo, 0, XE_CACHE_WB, 0);
>> + addr = xe_pte_encode(vm, pt_bo, 0, xe_pat_get_index(xe, XE_CACHE_WB), 0);
>> bb->cs[bb->len++] = lower_32_bits(addr);
>> bb->cs[bb->len++] = upper_32_bits(addr);
>> }
>> diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
>> index a1b164cf8bce..7dd93cbff704 100644
>> --- a/drivers/gpu/drm/xe/xe_pt.c
>> +++ b/drivers/gpu/drm/xe/xe_pt.c
>> @@ -10,6 +10,7 @@
>> #include "xe_gt.h"
>> #include "xe_gt_tlb_invalidation.h"
>> #include "xe_migrate.h"
>> +#include "xe_pat.h"
>> #include "xe_pt_types.h"
>> #include "xe_pt_walk.h"
>> #include "xe_res_cursor.h"
>> @@ -67,7 +68,7 @@ u64 xe_pde_encode(struct xe_bo *bo, u64 bo_offset)
>> return pde;
>> }
>>
>> -u64 __xe_pte_encode(u64 pte, enum xe_cache_level cache,
>> +u64 __xe_pte_encode(u64 pte, u16 pat_index,
>> struct xe_vm *vm, struct xe_vma *vma, u32 pt_level)
>> {
>> struct xe_device *xe = vm->xe;
>> @@ -85,7 +86,7 @@ u64 __xe_pte_encode(u64 pte, enum xe_cache_level cache,
>> else if (pt_level == 2)
>> pte |= XE_PDPE_PS_1G;
>>
>> - pte = vm->pat_encode.pte_encode(xe, pte, cache);
>> + pte = vm->pat_encode.pte_encode(xe, pte, pat_index);
>>
>> /* XXX: Does hw support 1 GiB pages? */
>> XE_WARN_ON(pt_level > 2);
>> @@ -103,7 +104,7 @@ u64 __xe_pte_encode(u64 pte, enum xe_cache_level cache,
>> *
>> * Return: An encoded page-table entry. No errors.
>> */
>> -u64 xe_pte_encode(struct xe_vm *vm, struct xe_bo *bo, u64 offset, enum xe_cache_level cache,
>> +u64 xe_pte_encode(struct xe_vm *vm, struct xe_bo *bo, u64 offset, u16 pat_index,
>> u32 pt_level)
>> {
>> u64 pte;
>> @@ -112,7 +113,7 @@ u64 xe_pte_encode(struct xe_vm *vm, struct xe_bo *bo, u64 offset, enum xe_cache_
>> if (xe_bo_is_vram(bo) || xe_bo_is_stolen_devmem(bo))
>> pte |= XE_PPGTT_PTE_DM;
>>
>> - return __xe_pte_encode(pte, cache, vm, NULL, pt_level);
>> + return __xe_pte_encode(pte, pat_index, vm, NULL, pt_level);
>> }
>>
>> static u64 __xe_pt_empty_pte(struct xe_tile *tile, struct xe_vm *vm,
>> @@ -125,7 +126,7 @@ static u64 __xe_pt_empty_pte(struct xe_tile *tile, struct xe_vm *vm,
>>
>> if (level == 0) {
>> u64 empty = xe_pte_encode(vm, vm->scratch_bo[id], 0,
>> - XE_CACHE_WB, 0);
>> + xe_pat_get_index(vm->xe, XE_CACHE_WB), 0);
>>
>> return empty;
>> } else {
>> @@ -358,8 +359,6 @@ struct xe_pt_stage_bind_walk {
>> struct xe_vm *vm;
>> /** @tile: The tile we're building for. */
>> struct xe_tile *tile;
>> - /** @cache: Desired cache level for the ptes */
>> - enum xe_cache_level cache;
>> /** @default_pte: PTE flag only template. No address is associated */
>> u64 default_pte;
>> /** @dma_offset: DMA offset to add to the PTE. */
>> @@ -594,7 +593,7 @@ xe_pt_stage_bind_entry(struct xe_ptw *parent, pgoff_t offset,
>>
>> pte = __xe_pte_encode(is_null ? 0 :
>> xe_res_dma(curs) + xe_walk->dma_offset,
>> - xe_walk->cache, xe_walk->vm, xe_walk->vma, level);
>> + xe_walk->vma->pat_index, xe_walk->vm, xe_walk->vma, level);
>> pte |= xe_walk->default_pte;
>>
>> /*
>> @@ -720,13 +719,8 @@ xe_pt_stage_bind(struct xe_tile *tile, struct xe_vma *vma,
>> if (vma && vma->gpuva.flags & XE_VMA_ATOMIC_PTE_BIT)
>> xe_walk.default_pte |= XE_USM_PPGTT_PTE_AE;
>> xe_walk.dma_offset = vram_region_gpu_offset(bo->ttm.resource);
>> - xe_walk.cache = XE_CACHE_WB;
>> - } else {
>> - if (!xe_vma_has_no_bo(vma) && bo->flags & XE_BO_SCANOUT_BIT)
>> - xe_walk.cache = XE_CACHE_WT;
>> - else
>> - xe_walk.cache = XE_CACHE_WB;
>> }
>> +
>> if (!xe_vma_has_no_bo(vma) && xe_bo_is_stolen(bo))
>> xe_walk.dma_offset = xe_ttm_stolen_gpu_offset(xe_bo_device(bo));
>>
>> diff --git a/drivers/gpu/drm/xe/xe_pt.h b/drivers/gpu/drm/xe/xe_pt.h
>> index 0e66436d707d..6d10823fca9b 100644
>> --- a/drivers/gpu/drm/xe/xe_pt.h
>> +++ b/drivers/gpu/drm/xe/xe_pt.h
>> @@ -47,9 +47,9 @@ bool xe_pt_zap_ptes(struct xe_tile *tile, struct xe_vma *vma);
>>
>> u64 xe_pde_encode(struct xe_bo *bo, u64 bo_offset);
>>
>> -u64 xe_pte_encode(struct xe_vm *vm, struct xe_bo *bo, u64 offset, enum xe_cache_level cache,
>> +u64 xe_pte_encode(struct xe_vm *vm, struct xe_bo *bo, u64 offset, u16 pat_index,
>> u32 pt_level);
>> -u64 __xe_pte_encode(u64 pte, enum xe_cache_level cache,
>> +u64 __xe_pte_encode(u64 pte, u16 pat_index,
>> struct xe_vm *vm, struct xe_vma *vma, u32 pt_level);
>>
>> #endif
>> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
>> index ba612a5ee2d8..98db7a298139 100644
>> --- a/drivers/gpu/drm/xe/xe_vm.c
>> +++ b/drivers/gpu/drm/xe/xe_vm.c
>> @@ -6,6 +6,7 @@
>> #include "xe_vm.h"
>>
>> #include <linux/dma-fence-array.h>
>> +#include <linux/nospec.h>
>>
>> #include <drm/drm_exec.h>
>> #include <drm/drm_print.h>
>> @@ -858,7 +859,8 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
>> u64 start, u64 end,
>> bool read_only,
>> bool is_null,
>> - u8 tile_mask)
>> + u8 tile_mask,
>> + u16 pat_index)
>> {
>> struct xe_vma *vma;
>> struct xe_tile *tile;
>> @@ -897,6 +899,8 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
>> vma->tile_mask |= 0x1 << id;
>> }
>>
>> + vma->pat_index = pat_index;
>> +
>> if (vm->xe->info.platform == XE_PVC)
>> vma->gpuva.flags |= XE_VMA_ATOMIC_PTE_BIT;
>>
>> @@ -1195,10 +1199,8 @@ static void xe_vma_op_work_func(struct work_struct *w);
>> static void vm_destroy_work_func(struct work_struct *w);
>>
>> static u64 xe2_ppgtt_pte_encode_pat(struct xe_device *xe, u64 pte_pat,
>> - enum xe_cache_level cache)
>> + u16 pat_index)
>> {
>> - u32 pat_index = xe_pat_get_index(xe, cache);
>> -
>> if (pat_index & BIT(0))
>> pte_pat |= BIT(3);
>>
>> @@ -1216,10 +1218,8 @@ static u64 xe2_ppgtt_pte_encode_pat(struct xe_device *xe, u64 pte_pat,
>> }
>>
>> static u64 xelp_ppgtt_pte_encode_pat(struct xe_device *xe, u64 pte_pat,
>> - enum xe_cache_level cache)
>> + u16 pat_index)
>> {
>> - u32 pat_index = xe_pat_get_index(xe, cache);
>> -
>> if (pat_index & BIT(0))
>> pte_pat |= BIT(3);
>>
>> @@ -2300,7 +2300,7 @@ static void print_op(struct xe_device *xe, struct drm_gpuva_op *op)
>> static struct drm_gpuva_ops *
>> vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_bo *bo,
>> u64 bo_offset_or_userptr, u64 addr, u64 range,
>> - u32 operation, u8 tile_mask, u32 region)
>> + u32 operation, u8 tile_mask, u32 region, u16 pat_index)
>> {
>> struct drm_gem_object *obj = bo ? &bo->ttm.base : NULL;
>> struct drm_gpuva_ops *ops;
>> @@ -2327,6 +2327,7 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_bo *bo,
>> struct xe_vma_op *op = gpuva_op_to_vma_op(__op);
>>
>> op->tile_mask = tile_mask;
>> + op->pat_index = pat_index;
>> op->map.immediate =
>> operation & XE_VM_BIND_FLAG_IMMEDIATE;
>> op->map.read_only =
>> @@ -2354,6 +2355,7 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_bo *bo,
>> struct xe_vma_op *op = gpuva_op_to_vma_op(__op);
>>
>> op->tile_mask = tile_mask;
>> + op->pat_index = pat_index;
>> op->prefetch.region = region;
>> }
>> break;
>> @@ -2396,7 +2398,8 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_bo *bo,
>> }
>>
>> static struct xe_vma *new_vma(struct xe_vm *vm, struct drm_gpuva_op_map *op,
>> - u8 tile_mask, bool read_only, bool is_null)
>> + u8 tile_mask, bool read_only, bool is_null,
>> + u16 pat_index)
>> {
>> struct xe_bo *bo = op->gem.obj ? gem_to_xe_bo(op->gem.obj) : NULL;
>> struct xe_vma *vma;
>> @@ -2412,7 +2415,7 @@ static struct xe_vma *new_vma(struct xe_vm *vm, struct drm_gpuva_op_map *op,
>> vma = xe_vma_create(vm, bo, op->gem.offset,
>> op->va.addr, op->va.addr +
>> op->va.range - 1, read_only, is_null,
>> - tile_mask);
>> + tile_mask, pat_index);
>> if (bo)
>> xe_bo_unlock(bo);
>>
>> @@ -2569,7 +2572,7 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_exec_queue *q,
>>
>> vma = new_vma(vm, &op->base.map,
>> op->tile_mask, op->map.read_only,
>> - op->map.is_null);
>> + op->map.is_null, op->pat_index);
>> if (IS_ERR(vma)) {
>> err = PTR_ERR(vma);
>> goto free_fence;
>> @@ -2597,7 +2600,7 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_exec_queue *q,
>>
>> vma = new_vma(vm, op->base.remap.prev,
>> op->tile_mask, read_only,
>> - is_null);
>> + is_null, op->pat_index);
>> if (IS_ERR(vma)) {
>> err = PTR_ERR(vma);
>> goto free_fence;
>> @@ -2633,7 +2636,7 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_exec_queue *q,
>>
>> vma = new_vma(vm, op->base.remap.next,
>> op->tile_mask, read_only,
>> - is_null);
>> + is_null, op->pat_index);
>> if (IS_ERR(vma)) {
>> err = PTR_ERR(vma);
>> goto free_fence;
>> @@ -3146,7 +3149,23 @@ static int vm_bind_ioctl_check_args(struct xe_device *xe,
>> u32 obj = (*bind_ops)[i].obj;
>> u64 obj_offset = (*bind_ops)[i].obj_offset;
>> u32 region = (*bind_ops)[i].region;
>> + u16 pat_index = (*bind_ops)[i].pat_index;
>> bool is_null = op & XE_VM_BIND_FLAG_NULL;
>> + u16 coh_mode;
>> +
>> + if (XE_IOCTL_DBG(xe, pat_index >= xe->info.pat.n_entries)) {
>> + err = -EINVAL;
>> + goto free_bind_ops;
>> + }
>> +
>> + pat_index = array_index_nospec(pat_index,
>> + xe->info.pat.n_entries);
>> + (*bind_ops)[i].pat_index = pat_index;
>> + coh_mode = xe_pat_index_get_coh_mode(xe, pat_index);
>> + if (XE_WARN_ON(!coh_mode || coh_mode > XE_GEM_COH_AT_LEAST_1WAY)) {
>> + err = -EINVAL;
>> + goto free_bind_ops;
>> + }
>>
>> if (i == 0) {
>> *async = !!(op & XE_VM_BIND_FLAG_ASYNC);
>> @@ -3188,6 +3207,8 @@ static int vm_bind_ioctl_check_args(struct xe_device *xe,
>> VM_BIND_OP(op) == XE_VM_BIND_OP_UNMAP_ALL) ||
>> XE_IOCTL_DBG(xe, obj &&
>> VM_BIND_OP(op) == XE_VM_BIND_OP_MAP_USERPTR) ||
>> + XE_IOCTL_DBG(xe, coh_mode == XE_GEM_COH_NONE &&
>> + VM_BIND_OP(op) == XE_VM_BIND_OP_MAP_USERPTR) ||
>> XE_IOCTL_DBG(xe, obj &&
>> VM_BIND_OP(op) == XE_VM_BIND_OP_PREFETCH) ||
>> XE_IOCTL_DBG(xe, region &&
>> @@ -3336,6 +3357,8 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
>> u64 addr = bind_ops[i].addr;
>> u32 obj = bind_ops[i].obj;
>> u64 obj_offset = bind_ops[i].obj_offset;
>> + u16 pat_index = bind_ops[i].pat_index;
>> + u16 coh_mode;
>>
>> if (!obj)
>> continue;
>> @@ -3363,6 +3386,23 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
>> goto put_obj;
>> }
>> }
>> +
>> + coh_mode = xe_pat_index_get_coh_mode(xe, pat_index);
>> + if (bos[i]->coh_mode) {
>> + if (XE_IOCTL_DBG(xe, coh_mode < bos[i]->coh_mode)) {
>> + err = -EINVAL;
>> + goto put_obj;
>> + }
>> + } else if (XE_IOCTL_DBG(xe, coh_mode == XE_GEM_COH_NONE)) {
>> + /*
>> + * Imported dma-buf from a different device should
>> + * require 1way or 2way coherency since we don't know
>> + * how it was mapped on the CPU. Just assume it is
>> + * potentially cached on CPU side.
>> + */
>> + err = -EINVAL;
>> + goto put_obj;
>> + }
>> }
>>
>> if (args->num_syncs) {
>> @@ -3400,10 +3440,11 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
>> u64 obj_offset = bind_ops[i].obj_offset;
>> u8 tile_mask = bind_ops[i].tile_mask;
>> u32 region = bind_ops[i].region;
>> + u16 pat_index = bind_ops[i].pat_index;
>>
>> ops[i] = vm_bind_ioctl_ops_create(vm, bos[i], obj_offset,
>> addr, range, op, tile_mask,
>> - region);
>> + region, pat_index);
>> if (IS_ERR(ops[i])) {
>> err = PTR_ERR(ops[i]);
>> ops[i] = NULL;
>> diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
>> index dc583f00919f..54658f400174 100644
>> --- a/drivers/gpu/drm/xe/xe_vm_types.h
>> +++ b/drivers/gpu/drm/xe/xe_vm_types.h
>> @@ -111,6 +111,11 @@ struct xe_vma {
>> */
>> u8 tile_present;
>>
>> + /**
>> + * @pat_index: The pat index to use when encoding the PTEs for this vma.
>> + */
>> + u16 pat_index;
>> +
>> struct {
>> struct list_head rebind_link;
>> } notifier;
>> @@ -338,8 +343,7 @@ struct xe_vm {
>> bool batch_invalidate_tlb;
>>
>> struct {
>> - u64 (*pte_encode)(struct xe_device *xe, u64 pte_pat,
>> - enum xe_cache_level cache);
>> + u64 (*pte_encode)(struct xe_device *xe, u64 pte_pat, u16 pat_index);
>> } pat_encode;
>> };
>>
>> @@ -419,6 +423,8 @@ struct xe_vma_op {
>> struct async_op_fence *fence;
>> /** @tile_mask: gt mask for this operation */
>> u8 tile_mask;
>> + /** @pat_index: The pat index to use for this operation. */
>> + u16 pat_index;
>> /** @flags: operation flags */
>> enum xe_vma_op_flags flags;
>>
>> diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
>> index 737bb1d4c6f7..75b42c1116f2 100644
>> --- a/include/uapi/drm/xe_drm.h
>> +++ b/include/uapi/drm/xe_drm.h
>> @@ -605,8 +605,49 @@ struct drm_xe_vm_bind_op {
>> */
>> __u32 obj;
>>
>> + /**
>> + * @pat_index: The platform defined @pat_index to use for this mapping.
>> + * The index basically maps to some predefined memory attributes,
>> + * including things like caching, coherency, compression etc. The exact
>> + * meaning of the pat_index is platform specific and defined in the
>> + * Bspec and PRMs. When the KMD sets up the binding the index here is
>> + * encoded into the ppGTT PTE.
>> + *
>> + * For coherency the @pat_index needs to be at least as coherent as
>> + * drm_xe_gem_create.coh_mode. i.e coh_mode(pat_index) >=
>> + * drm_xe_gem_create.coh_mode. The KMD will extract the coherency mode
>> + * from the @pat_index and reject if there is a mismatch (see note below
>> + * for pre-MTL platforms).
>> + *
>> + * Note: On pre-MTL platforms there is only a caching mode and no
>> + * explicit coherency mode, but on such hardware there is always a
>> + * shared-LLC (or is dgpu) so all GT memory accesses are coherent with
>> + * CPU caches even with the caching mode set as uncached. It's only the
>> + * display engine that is incoherent (on dgpu it must be in VRAM which
>> + * is always mapped as WC on the CPU). However to keep the uapi somewhat
>> + * consistent with newer platforms the KMD groups the different cache
>> + * levels into the following coherency buckets on all pre-MTL platforms:
>> + *
>> + * ppGTT UC -> XE_GEM_COH_NONE
>> + * ppGTT WC -> XE_GEM_COH_NONE
>> + * ppGTT WT -> XE_GEM_COH_NONE
>> + * ppGTT WB -> XE_GEM_COH_AT_LEAST_1WAY
>> + *
>> + * In practice UC/WC/WT should only ever be used for scanout surfaces on
>> + * such platforms (or perhaps in general for dma-buf if shared with
>> + * another device) since it is only the display engine that is actually
>> + * incoherent. Everything else should typically use WB given that we
>> + * have a shared-LLC. On MTL+ this completely changes and the HW
>> + * defines the coherency mode as part of the @pat_index, where
>> + * incoherent GT access is possible.
>
> with this in mind I noticed that on i915 the scanout is just a pat param to
> the uapi, while in Xe we have a buffer flag:
> XE_GEM_CREATE_FLAG_SCANOUT
>
> should we continue with this flag, or should we do the same pat.param that
> i915 is doing?
Can you point to where the scanout pat.param thing is? But yeah, with
the explicit pat_index and smem_cpu_caching userspace can already select
the correct thing for display surfaces. I think dropping the flag is
possible.
>
>> + *
>> + * Note: For userptr and externally imported dma-buf the kernel expects
>> + * either 1WAY or 2WAY for the @pat_index.
>> + */
>> + __u16 pat_index;
>> +
>> /** @pad: MBZ */
>> - __u32 pad;
>> + __u16 pad;
>>
>> union {
>> /**
>> --
>> 2.41.0
>>
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Intel-xe] [PATCH v2 1/6] drm/xe/uapi: Add support for cache and coherency mode
2023-09-26 8:07 ` Matthew Auld
@ 2023-09-26 15:59 ` Souza, Jose
0 siblings, 0 replies; 28+ messages in thread
From: Souza, Jose @ 2023-09-26 15:59 UTC (permalink / raw)
To: intel-xe@lists.freedesktop.org, Auld, Matthew
On Tue, 2023-09-26 at 09:07 +0100, Matthew Auld wrote:
> On 25/09/2023 19:26, Souza, Jose wrote:
> > On Mon, 2023-09-25 at 09:06 +0100, Matthew Auld wrote:
> > > On 21/09/2023 21:07, Souza, Jose wrote:
> > > > On Thu, 2023-09-14 at 16:31 +0100, Matthew Auld wrote:
> > > > > From: Pallavi Mishra <pallavi.mishra@intel.com>
> > > > >
> > > > > Allow userspace to specify the CPU caching mode to use for system memory
> > > > > in addition to coherency modes during object creation. Modify gem create
> > > > > handler and introduce xe_bo_create_user to replace xe_bo_create. In a
> > > > > later patch we will support setting the pat_index as part of vm_bind,
> > > > > where expectation is that the coherency mode extracted from the
> > > > > pat_index must match the one set at object creation.
> > > > >
> > > > > v2
> > > > > - s/smem_caching/smem_cpu_caching/ and
> > > > > s/XE_GEM_CACHING/XE_GEM_CPU_CACHING/. (Matt Roper)
> > > > > - Drop COH_2WAY and just use COH_NONE + COH_AT_LEAST_1WAY; KMD mostly
> > > > > just cares that zeroing/swap-in can't be bypassed with the given
> > > > > smem_cpu_caching mode. (Matt Roper)
> > > > > - Fix broken range check for coh_mode and smem_cpu_caching and also
> > > > > don't use constant value, but the already defined macros. (José)
> > > > > - Prefer switch statement for smem_cpu_caching -> ttm_caching. (José)
> > > > > - Add note in kernel-doc for dgpu and coherency modes for system
> > > > > memory. (José)
> > > > >
> > > > > Signed-off-by: Pallavi Mishra <pallavi.mishra@intel.com>
> > > > > Co-authored-by: Matthew Auld <matthew.auld@intel.com>
> > > > > Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> > > > > Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > > > > Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> > > > > Cc: Lucas De Marchi <lucas.demarchi@intel.com>
> > > > > Cc: Matt Roper <matthew.d.roper@intel.com>
> > > > > Cc: José Roberto de Souza <jose.souza@intel.com>
> > > > > Cc: Filip Hazubski <filip.hazubski@intel.com>
> > > > > Cc: Carl Zhang <carl.zhang@intel.com>
> > > > > Cc: Effie Yu <effie.yu@intel.com>
> > > > > ---
> > > > > drivers/gpu/drm/xe/xe_bo.c | 105 ++++++++++++++++++++++++++-----
> > > > > drivers/gpu/drm/xe/xe_bo.h | 3 +-
> > > > > drivers/gpu/drm/xe/xe_bo_types.h | 10 +++
> > > > > drivers/gpu/drm/xe/xe_dma_buf.c | 5 +-
> > > > > include/uapi/drm/xe_drm.h | 57 ++++++++++++++++-
> > > > > 5 files changed, 158 insertions(+), 22 deletions(-)
> > > > >
> > > > > diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
> > > > > index 27726d4f3423..f3facd788f15 100644
> > > > > --- a/drivers/gpu/drm/xe/xe_bo.c
> > > > > +++ b/drivers/gpu/drm/xe/xe_bo.c
> > > > > @@ -325,7 +325,7 @@ static struct ttm_tt *xe_ttm_tt_create(struct ttm_buffer_object *ttm_bo,
> > > > > struct xe_device *xe = xe_bo_device(bo);
> > > > > struct xe_ttm_tt *tt;
> > > > > unsigned long extra_pages;
> > > > > - enum ttm_caching caching = ttm_cached;
> > > > > + enum ttm_caching caching;
> > > > > int err;
> > > > >
> > > > > tt = kzalloc(sizeof(*tt), GFP_KERNEL);
> > > > > @@ -339,13 +339,25 @@ static struct ttm_tt *xe_ttm_tt_create(struct ttm_buffer_object *ttm_bo,
> > > > > extra_pages = DIV_ROUND_UP(xe_device_ccs_bytes(xe, bo->size),
> > > > > PAGE_SIZE);
> > > > >
> > > > > + switch (bo->smem_cpu_caching) {
> > > > > + case XE_GEM_CPU_CACHING_WC:
> > > > > + caching = ttm_write_combined;
> > > > > + break;
> > > > > + case XE_GEM_CPU_CACHING_UC:
> > > > > + caching = ttm_uncached;
> > > > > + break;
> > > > > + default:
> > > > > + caching = ttm_cached;
> > > > > + break;
> > > > > + }
> > > > > +
> > > > > /*
> > > > > * Display scanout is always non-coherent with the CPU cache.
> > > > > *
> > > > > * For Xe_LPG and beyond, PPGTT PTE lookups are also non-coherent and
> > > > > * require a CPU:WC mapping.
> > > > > */
> > > > > - if (bo->flags & XE_BO_SCANOUT_BIT ||
> > > > > + if ((!bo->smem_cpu_caching && bo->flags & XE_BO_SCANOUT_BIT) ||
> > > > > (xe->info.graphics_verx100 >= 1270 && bo->flags & XE_BO_PAGETABLE))
> > > > > caching = ttm_write_combined;
> > > > >
> > > > > @@ -1184,9 +1196,10 @@ void xe_bo_free(struct xe_bo *bo)
> > > > > kfree(bo);
> > > > > }
> > > > >
> > > > > -struct xe_bo *__xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
> > > > > +struct xe_bo *___xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
> > > > > struct xe_tile *tile, struct dma_resv *resv,
> > > > > struct ttm_lru_bulk_move *bulk, size_t size,
> > > > > + u16 smem_cpu_caching, u16 coh_mode,
> > > > > enum ttm_bo_type type, u32 flags)
> > > > > {
> > > > > struct ttm_operation_ctx ctx = {
> > > > > @@ -1224,6 +1237,8 @@ struct xe_bo *__xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
> > > > > bo->tile = tile;
> > > > > bo->size = size;
> > > > > bo->flags = flags;
> > > > > + bo->smem_cpu_caching = smem_cpu_caching;
> > > > > + bo->coh_mode = coh_mode;
> > > > > bo->ttm.base.funcs = &xe_gem_object_funcs;
> > > > > bo->props.preferred_mem_class = XE_BO_PROPS_INVALID;
> > > > > bo->props.preferred_gt = XE_BO_PROPS_INVALID;
> > > > > @@ -1307,10 +1322,11 @@ static int __xe_bo_fixed_placement(struct xe_device *xe,
> > > > > }
> > > > >
> > > > > struct xe_bo *
> > > > > -xe_bo_create_locked_range(struct xe_device *xe,
> > > > > - struct xe_tile *tile, struct xe_vm *vm,
> > > > > - size_t size, u64 start, u64 end,
> > > > > - enum ttm_bo_type type, u32 flags)
> > > > > +__xe_bo_create_locked(struct xe_device *xe,
> > > > > + struct xe_tile *tile, struct xe_vm *vm,
> > > > > + size_t size, u64 start, u64 end,
> > > > > + u16 smem_cpu_caching, u16 coh_mode,
> > > > > + enum ttm_bo_type type, u32 flags)
> > > > > {
> > > > > struct xe_bo *bo = NULL;
> > > > > int err;
> > > > > @@ -1331,10 +1347,11 @@ xe_bo_create_locked_range(struct xe_device *xe,
> > > > > }
> > > > > }
> > > > >
> > > > > - bo = __xe_bo_create_locked(xe, bo, tile, vm ? &vm->resv : NULL,
> > > > > + bo = ___xe_bo_create_locked(xe, bo, tile, vm ? &vm->resv : NULL,
> > > > > vm && !xe_vm_in_fault_mode(vm) &&
> > > > > flags & XE_BO_CREATE_USER_BIT ?
> > > > > &vm->lru_bulk_move : NULL, size,
> > > > > + smem_cpu_caching, coh_mode,
> > > > > type, flags);
> > > > > if (IS_ERR(bo))
> > > > > return bo;
> > > > > @@ -1368,11 +1385,35 @@ xe_bo_create_locked_range(struct xe_device *xe,
> > > > > return ERR_PTR(err);
> > > > > }
> > > > >
> > > > > +struct xe_bo *
> > > > > +xe_bo_create_locked_range(struct xe_device *xe,
> > > > > + struct xe_tile *tile, struct xe_vm *vm,
> > > > > + size_t size, u64 start, u64 end,
> > > > > + enum ttm_bo_type type, u32 flags)
> > > > > +{
> > > > > + return __xe_bo_create_locked(xe, tile, vm, size, start, end, 0, 0, type, flags);
> > > > > +}
> > > > > +
> > > > > struct xe_bo *xe_bo_create_locked(struct xe_device *xe, struct xe_tile *tile,
> > > > > struct xe_vm *vm, size_t size,
> > > > > enum ttm_bo_type type, u32 flags)
> > > > > {
> > > > > - return xe_bo_create_locked_range(xe, tile, vm, size, 0, ~0ULL, type, flags);
> > > > > + return __xe_bo_create_locked(xe, tile, vm, size, 0, ~0ULL, 0, 0, type, flags);
> > > > > +}
> > > > > +
> > > > > +static struct xe_bo *xe_bo_create_user(struct xe_device *xe, struct xe_tile *tile,
> > > > > + struct xe_vm *vm, size_t size,
> > > > > + u16 smem_cpu_caching, u16 coh_mode,
> > > > > + enum ttm_bo_type type,
> > > > > + u32 flags)
> > > > > +{
> > > > > + struct xe_bo *bo = __xe_bo_create_locked(xe, tile, vm, size, 0, ~0ULL,
> > > > > + smem_cpu_caching, coh_mode, type,
> > > > > + flags | XE_BO_CREATE_USER_BIT);
> > > > > + if (!IS_ERR(bo))
> > > > > + xe_bo_unlock_vm_held(bo);
> > > > > +
> > > > > + return bo;
> > > > > }
> > > > >
> > > > > struct xe_bo *xe_bo_create(struct xe_device *xe, struct xe_tile *tile,
> > > > > @@ -1755,11 +1796,11 @@ int xe_gem_create_ioctl(struct drm_device *dev, void *data,
> > > > > struct drm_xe_gem_create *args = data;
> > > > > struct xe_vm *vm = NULL;
> > > > > struct xe_bo *bo;
> > > > > - unsigned int bo_flags = XE_BO_CREATE_USER_BIT;
> > > > > + unsigned int bo_flags;
> > > > > u32 handle;
> > > > > int err;
> > > > >
> > > > > - if (XE_IOCTL_DBG(xe, args->extensions) || XE_IOCTL_DBG(xe, args->pad) ||
> > > > > + if (XE_IOCTL_DBG(xe, args->extensions) ||
> > > > > XE_IOCTL_DBG(xe, args->reserved[0] || args->reserved[1]))
> > > > > return -EINVAL;
> > > > >
> > > > > @@ -1801,6 +1842,32 @@ int xe_gem_create_ioctl(struct drm_device *dev, void *data,
> > > > > bo_flags |= XE_BO_NEEDS_CPU_ACCESS;
> > > > > }
> > > > >
> > > > > + if (XE_IOCTL_DBG(xe, args->coh_mode > XE_GEM_COH_AT_LEAST_1WAY))
> > > > > + return -EINVAL;
> > > > > +
> > > > > + if (XE_IOCTL_DBG(xe, args->smem_cpu_caching > XE_GEM_CPU_CACHING_UC))
> > > > > + return -EINVAL;
> > > > > +
> > > > > + if (bo_flags & XE_BO_CREATE_SYSTEM_BIT) {
> > > > > + if (XE_IOCTL_DBG(xe, !args->coh_mode))
> > > > > + return -EINVAL;
> > > > > +
> > > > > + if (XE_IOCTL_DBG(xe, !args->smem_cpu_caching))
> > > > > + return -EINVAL;
> > > > > +
> > > > > + if (XE_IOCTL_DBG(xe, !IS_DGFX(xe) &&
> > > > > + bo_flags & XE_BO_SCANOUT_BIT &&
> > > > > + args->smem_cpu_caching == XE_GEM_CPU_CACHING_WB))
> > > > > + return -EINVAL;
> > > > > +
> > > > > + if (args->coh_mode == XE_GEM_COH_NONE) {
> > > > > + if (XE_IOCTL_DBG(xe, args->smem_cpu_caching == XE_GEM_CPU_CACHING_WB))
> > > > > + return -EINVAL;
> > > > > + }
> > > > > + } else if (XE_IOCTL_DBG(xe, args->smem_cpu_caching)) {
> > > >
> > > > should be XE_IOCTL_DBG(xe, !args->smem_cpu_caching).
> > > >
> > > > The uAPI doesn't say anything about allowing smem_cpu_caching or coh_mode == 0; I did this to be able to run tests without display on DG2:
> > >
> > > The above check is for VRAM-only objects. For smem_cpu_caching the
> > > kernel-doc says: "MUST be left as zero for VRAM-only objects."
> > > Internally the KMD uses WC for CPU mapping VRAM which is out of the
> > > control of userspace.
> >
> > In my opinion this should be != 0 and match the PAT index that will be set at VM bind.
>
> This is just talking about VRAM-only objects. If it was evicted to
> system memory userspace can't touch the pages from the GPU, without the
> KMD first migrating it back to VRAM. So the pat_index mostly applies to
> the VRAM placement in such a case, and that is implicitly always WC.
>
> However the smem_cpu_caching might be interesting for controlling the
> evicted mmap caching that is used. i.e VRAM-only object is evicted to
> system memory and accessed by the CPU from userspace. I didn't think
> userspace would really care, so figured just reject/ignore
> smem_cpu_caching for VRAM-only objects.
>
> I can remove the !smem_cpu_caching requirement for VRAM-only and update
> the kernel-doc to say that this controls the evicted-to-smem caching?
Ohh! Thanks for the explanation, now I get why you wanted it to be 0 for lmem-only.
>
> > I have not read much about it, but I believe CXL GPU memory would be mapped as WB to take advantage of the CXL caching protocols.
>
> Right, but for that maybe you would just add something like
> vram_cpu_caching, assuming that it still uses this type of interface?
But why would we need both a smem_cpu_caching and a vram_cpu_caching? Can't we have just a generic one?
With just cpu_caching, for lmem and lmem + smem placements the UMD would set WC on current platforms, and WB or WC for smem.
The UMD would then know that the mmap mode matches what was set at gem_create.
>
> >
> > So I believe the kernel-doc restrictions should be removed, and a runtime check added: if (smem_cpu_caching != WC && is_dgfx()) return -EINVAL;
> >
> > A related question: can a bo placed in lmem + smem have WB caching on DG2? How would that work? So far Mesa has been handling that case as WC
> > as well.
>
> Yeah, you can have WB for smem, and then WC for vram.
>
> AFAIK on dgpu without this series, you get WB for system memory (you
> can't turn off snooping on dgpu so might as well use WB I guess). If
> it's currently placed in VRAM you get WC.
>
> With this series you can also select WC for lmem + smem, if that is
> preferred. But I think for smem-only you might want to use WB on dgpu,
> on current platforms.
Having one caching mode for lmem + smem is the way to go, as the UMD has no clue where the placement is at any given time.
>
> >
> > >
> > > coh_mode == 0 is not meant to be allowed, but looks like I missed the
> > > check here for VRAM-only. Will fix.
> > >
> > > >
> > > >
> > > > diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
> > > > index f3facd788f152..e0e4fefcd2060 100644
> > > > --- a/drivers/gpu/drm/xe/xe_bo.c
> > > > +++ b/drivers/gpu/drm/xe/xe_bo.c
> > > > @@ -1796,7 +1796,7 @@ int xe_gem_create_ioctl(struct drm_device *dev, void *data,
> > > > struct drm_xe_gem_create *args = data;
> > > > struct xe_vm *vm = NULL;
> > > > struct xe_bo *bo;
> > > > - unsigned int bo_flags;
> > > > + unsigned int bo_flags = 0;
> > > > u32 handle;
> > > > int err;
> > > >
> > > > @@ -1842,19 +1842,15 @@ int xe_gem_create_ioctl(struct drm_device *dev, void *data,
> > > > bo_flags |= XE_BO_NEEDS_CPU_ACCESS;
> > > > }
> > > >
> > > > - if (XE_IOCTL_DBG(xe, args->coh_mode > XE_GEM_COH_AT_LEAST_1WAY))
> > > > + if (XE_IOCTL_DBG(xe, args->coh_mode > XE_GEM_COH_AT_LEAST_1WAY) ||
> > > > + XE_IOCTL_DBG(xe, !args->coh_mode))
> > > > return -EINVAL;
> > > >
> > > > - if (XE_IOCTL_DBG(xe, args->smem_cpu_caching > XE_GEM_CPU_CACHING_UC))
> > > > + if (XE_IOCTL_DBG(xe, args->smem_cpu_caching > XE_GEM_CPU_CACHING_UC) ||
> > > > + XE_IOCTL_DBG(xe, !args->smem_cpu_caching))
> > > > return -EINVAL;
> > > >
> > > > if (bo_flags & XE_BO_CREATE_SYSTEM_BIT) {
> > > > - if (XE_IOCTL_DBG(xe, !args->coh_mode))
> > > > - return -EINVAL;
> > > > -
> > > > - if (XE_IOCTL_DBG(xe, !args->smem_cpu_caching))
> > > > - return -EINVAL;
> > > > -
> > > > if (XE_IOCTL_DBG(xe, !IS_DGFX(xe) &&
> > > > bo_flags & XE_BO_SCANOUT_BIT &&
> > > > args->smem_cpu_caching == XE_GEM_CPU_CACHING_WB))
> > > > @@ -1864,8 +1860,6 @@ int xe_gem_create_ioctl(struct drm_device *dev, void *data,
> > > > if (XE_IOCTL_DBG(xe, args->smem_cpu_caching == XE_GEM_CPU_CACHING_WB))
> > > > return -EINVAL;
> > > > }
> > > > - } else if (XE_IOCTL_DBG(xe, args->smem_cpu_caching)) {
> > > > - return -EINVAL;
> > > > }
> > > >
> > > > if (args->vm_id) {
> > > >
> > > >
> > > > > + return -EINVAL;
> > > > > + }
> > > > > +
> > > > > if (args->vm_id) {
> > > > > vm = xe_vm_lookup(xef, args->vm_id);
> > > > > if (XE_IOCTL_DBG(xe, !vm))
> > > > > @@ -1812,8 +1879,10 @@ int xe_gem_create_ioctl(struct drm_device *dev, void *data,
> > > > > }
> > > > > }
> > > > >
> > > > > - bo = xe_bo_create(xe, NULL, vm, args->size, ttm_bo_type_device,
> > > > > - bo_flags);
> > > > > + bo = xe_bo_create_user(xe, NULL, vm, args->size,
> > > > > + args->smem_cpu_caching, args->coh_mode,
> > > > > + ttm_bo_type_device,
> > > > > + bo_flags);
> > > > > if (IS_ERR(bo)) {
> > > > > err = PTR_ERR(bo);
> > > > > goto out_vm;
> > > > > @@ -2105,10 +2174,12 @@ int xe_bo_dumb_create(struct drm_file *file_priv,
> > > > > args->size = ALIGN(mul_u32_u32(args->pitch, args->height),
> > > > > page_size);
> > > > >
> > > > > - bo = xe_bo_create(xe, NULL, NULL, args->size, ttm_bo_type_device,
> > > > > - XE_BO_CREATE_VRAM_IF_DGFX(xe_device_get_root_tile(xe)) |
> > > > > - XE_BO_CREATE_USER_BIT | XE_BO_SCANOUT_BIT |
> > > > > - XE_BO_NEEDS_CPU_ACCESS);
> > > > > + bo = xe_bo_create_user(xe, NULL, NULL, args->size,
> > > > > + XE_GEM_CPU_CACHING_WC, XE_GEM_COH_NONE,
> > > > > + ttm_bo_type_device,
> > > > > + XE_BO_CREATE_VRAM_IF_DGFX(xe_device_get_root_tile(xe)) |
> > > > > + XE_BO_CREATE_USER_BIT | XE_BO_SCANOUT_BIT |
> > > > > + XE_BO_NEEDS_CPU_ACCESS);
> > > > > if (IS_ERR(bo))
> > > > > return PTR_ERR(bo);
> > > > >
> > > > > diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
> > > > > index 4a68d869b3b5..4a0ee81fe598 100644
> > > > > --- a/drivers/gpu/drm/xe/xe_bo.h
> > > > > +++ b/drivers/gpu/drm/xe/xe_bo.h
> > > > > @@ -81,9 +81,10 @@ struct sg_table;
> > > > > struct xe_bo *xe_bo_alloc(void);
> > > > > void xe_bo_free(struct xe_bo *bo);
> > > > >
> > > > > -struct xe_bo *__xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
> > > > > +struct xe_bo *___xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
> > > > > struct xe_tile *tile, struct dma_resv *resv,
> > > > > struct ttm_lru_bulk_move *bulk, size_t size,
> > > > > + u16 smem_cpu_caching, u16 coh_mode,
> > > > > enum ttm_bo_type type, u32 flags);
> > > > > struct xe_bo *
> > > > > xe_bo_create_locked_range(struct xe_device *xe,
> > > > > diff --git a/drivers/gpu/drm/xe/xe_bo_types.h b/drivers/gpu/drm/xe/xe_bo_types.h
> > > > > index 2ea9ad423170..9bee220a6872 100644
> > > > > --- a/drivers/gpu/drm/xe/xe_bo_types.h
> > > > > +++ b/drivers/gpu/drm/xe/xe_bo_types.h
> > > > > @@ -68,6 +68,16 @@ struct xe_bo {
> > > > > struct llist_node freed;
> > > > > /** @created: Whether the bo has passed initial creation */
> > > > > bool created;
> > > > > + /**
> > > > > + * @coh_mode: Coherency setting. Currently only used for userspace
> > > > > + * objects.
> > > > > + */
> > > > > + u16 coh_mode;
> > > > > + /**
> > > > > + * @smem_cpu_caching: Caching mode for smem. Currently only used for
> > > > > + * userspace objects.
> > > > > + */
> > > > > + u16 smem_cpu_caching;
> > > > > };
> > > > >
> > > > > #define intel_bo_to_drm_bo(bo) (&(bo)->ttm.base)
> > > > > diff --git a/drivers/gpu/drm/xe/xe_dma_buf.c b/drivers/gpu/drm/xe/xe_dma_buf.c
> > > > > index 09343b8b3e96..ac20dbc27a2b 100644
> > > > > --- a/drivers/gpu/drm/xe/xe_dma_buf.c
> > > > > +++ b/drivers/gpu/drm/xe/xe_dma_buf.c
> > > > > @@ -200,8 +200,9 @@ xe_dma_buf_init_obj(struct drm_device *dev, struct xe_bo *storage,
> > > > > int ret;
> > > > >
> > > > > dma_resv_lock(resv, NULL);
> > > > > - bo = __xe_bo_create_locked(xe, storage, NULL, resv, NULL, dma_buf->size,
> > > > > - ttm_bo_type_sg, XE_BO_CREATE_SYSTEM_BIT);
> > > > > + bo = ___xe_bo_create_locked(xe, storage, NULL, resv, NULL, dma_buf->size,
> > > > > + 0, 0, /* Will require 1way or 2way for vm_bind */
> > > > > + ttm_bo_type_sg, XE_BO_CREATE_SYSTEM_BIT);
> > > > > if (IS_ERR(bo)) {
> > > > > ret = PTR_ERR(bo);
> > > > > goto error;
> > > > > diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
> > > > > index 00d5cb4ef85e..737bb1d4c6f7 100644
> > > > > --- a/include/uapi/drm/xe_drm.h
> > > > > +++ b/include/uapi/drm/xe_drm.h
> > > > > @@ -456,8 +456,61 @@ struct drm_xe_gem_create {
> > > > > */
> > > > > __u32 handle;
> > > > >
> > > > > - /** @pad: MBZ */
> > > > > - __u32 pad;
> > > > > + /**
> > > > > + * @coh_mode: The coherency mode for this object. This will limit the
> > > > > + * possible @smem_cpu_caching values.
> > > > > + *
> > > > > + * Supported values:
> > > > > + *
> > > > > + * XE_GEM_COH_NONE: GPU access is assumed to be not coherent with
> > > > > + * CPU. CPU caches are not snooped.
> > > > > + *
> > > > > + * XE_GEM_COH_AT_LEAST_1WAY:
> > > > > + *
> > > > > + * CPU-GPU coherency must be at least 1WAY.
> > > > > + *
> > > > > + * If 1WAY then GPU access is coherent with CPU (CPU caches are snooped)
> > > > > + * until GPU acquires. The acquire by the GPU is not tracked by CPU
> > > > > + * caches.
> > > > > + *
> > > > > + * If 2WAY then should be fully coherent between GPU and CPU. Fully
> > > > > + * tracked by CPU caches. Both CPU and GPU caches are snooped.
> > > > > + *
> > > > > + * Note: On dgpu the GPU device never caches system memory (outside of
> > > > > + * the special system-memory-read-only cache, which is anyway flushed by
> > > > > + * KMD when nuking TLBs for a given object so should be no concern to
> > > > > + * userspace). The device should be thought of as always 1WAY coherent,
> > > > > + * with the addition that the GPU never caches system memory. At least
> > > > > + * on current dgpu HW there is no way to turn off snooping so likely the
> > > > > + * different coherency modes of the pat_index make no difference for
> > > > > + * system memory.
> > > > > + */
> > > > > +#define XE_GEM_COH_NONE 1
> > > > > +#define XE_GEM_COH_AT_LEAST_1WAY 2
> > > > > + __u16 coh_mode;
> > > > > +
> > > > > + /**
> > > > > + * @smem_cpu_caching: The CPU caching mode to select for system memory.
> > > > > + *
> > > > > + * Supported values:
> > > > > + *
> > > > > + * XE_GEM_CPU_CACHING_WB: Allocate the pages with write-back caching.
> > > > > + * On iGPU this can't be used for scanout surfaces. The @coh_mode must
> > > > > + * be XE_GEM_COH_AT_LEAST_1WAY.
> > > > > + *
> > > > > + * XE_GEM_CPU_CACHING_WC: Allocate the pages as write-combined. This is
> > > > > + * uncached. Any @coh_mode is permitted. Scanout surfaces should likely
> > > > > + * use this.
> > > > > + *
> > > > > + * XE_GEM_CPU_CACHING_UC: Allocate the pages as uncached. Any @coh_mode
> > > > > + * is permitted. Scanout surfaces are permitted to use this.
> > > > > + *
> > > > > + * MUST be left as zero for VRAM-only objects.
> > > > > + */
> > > > > +#define XE_GEM_CPU_CACHING_WB 1
> > > > > +#define XE_GEM_CPU_CACHING_WC 2
> > > > > +#define XE_GEM_CPU_CACHING_UC 3
> > > > > + __u16 smem_cpu_caching;
> > > > >
> > > > > /** @reserved: Reserved */
> > > > > __u64 reserved[2];
> > > >
> >
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Intel-xe] [PATCH v2 6/6] drm/xe/uapi: support pat_index selection with vm_bind
2023-09-26 8:17 ` Matthew Auld
@ 2023-09-27 19:30 ` Rodrigo Vivi
0 siblings, 0 replies; 28+ messages in thread
From: Rodrigo Vivi @ 2023-09-27 19:30 UTC (permalink / raw)
To: Matthew Auld
Cc: Filip Hazubski, Lucas De Marchi, Carl Zhang, Effie Yu, Matt Roper,
intel-xe
On Tue, Sep 26, 2023 at 09:17:39AM +0100, Matthew Auld wrote:
> On 25/09/2023 22:56, Rodrigo Vivi wrote:
> > On Thu, Sep 14, 2023 at 04:31:19PM +0100, Matthew Auld wrote:
> > > Allow userspace to directly control the pat_index for a given vm
> > > binding. This should allow directly controlling the coherency, caching
> > > and potentially other stuff in the future for the ppGTT binding.
> > >
> > > The exact meaning behind the pat_index is very platform specific (see
> > > BSpec or PRMs) but effectively maps to some predefined memory
> > > attributes. From the KMD pov we only care about the coherency that is
> > > provided by the pat_index, which falls into either NONE, 1WAY or 2WAY.
> > > The vm_bind coherency mode for the given pat_index needs to be at least
> > > as coherent as the coh_mode that was set at object creation. For
> > > platforms that lack the explicit coherency mode, we treat UC/WT/WC as
> > > NONE and WB as AT_LEAST_1WAY.
> > >
> > > For userptr mappings we lack a corresponding gem object, so the expected
> > > coherency mode is instead implicit and must fall into either 1WAY or
> > > 2WAY. Trying to use NONE will be rejected by the kernel. For imported
> > > dma-buf (from a different device) the coherency mode is also implicit
> > > and must also be either 1WAY or 2WAY, i.e. AT_LEAST_1WAY.
> > >
> > > As part of adding pat_index support with vm_bind we also need to stop using
> > > xe_cache_level and instead use the pat_index in various places. We still
> > > make use of xe_cache_level, but only as a convenience for kernel
> > > internal objects (internally it maps to some reasonable pat_index). For
> > > now this is just a 1:1 conversion of the existing code, however for
> > > platforms like MTL+ we might need to give more control through bo_create
> > > or stop using WB on the CPU side if we need CPU access.
> > >
> > > v2:
> > > - Undefined coh_mode(pat_index) can now be treated as programmer error. (Matt Roper)
> > > - We now allow gem_create.coh_mode <= coh_mode(pat_index), rather than
> > > having to match exactly. This ensures imported dma-buf can always
> > > just use 1way (or even 2way), now that we also bundle 1way/2way into
> > > at_least_1way. We still require 1way/2way for external dma-buf, but
> > > the policy can now be the same for self-import, if desired.
> > > - Use u16 for pat_index in uapi. u32 is massive overkill. (José)
> > > - Move as much of the pat_index validation as we can into
> > > vm_bind_ioctl_check_args. (José)
> > >
> > > Bspec: 45101, 44235 #xe
> > > Bspec: 70552, 71582, 59400 #xe2
> > > Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> > > Cc: Pallavi Mishra <pallavi.mishra@intel.com>
> > > Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > > Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> > > Cc: Lucas De Marchi <lucas.demarchi@intel.com>
> > > Cc: Matt Roper <matthew.d.roper@intel.com>
> > > Cc: José Roberto de Souza <jose.souza@intel.com>
> > > Cc: Filip Hazubski <filip.hazubski@intel.com>
> > > Cc: Carl Zhang <carl.zhang@intel.com>
> > > Cc: Effie Yu <effie.yu@intel.com>
> > > ---
> > > drivers/gpu/drm/xe/tests/xe_migrate.c | 2 +-
> > > drivers/gpu/drm/xe/xe_ggtt.c | 7 ++-
> > > drivers/gpu/drm/xe/xe_ggtt_types.h | 2 +-
> > > drivers/gpu/drm/xe/xe_migrate.c | 13 +++--
> > > drivers/gpu/drm/xe/xe_pt.c | 22 ++++-----
> > > drivers/gpu/drm/xe/xe_pt.h | 4 +-
> > > drivers/gpu/drm/xe/xe_vm.c | 69 +++++++++++++++++++++------
> > > drivers/gpu/drm/xe/xe_vm_types.h | 10 +++-
> > > include/uapi/drm/xe_drm.h | 43 ++++++++++++++++-
> > > 9 files changed, 128 insertions(+), 44 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/xe/tests/xe_migrate.c b/drivers/gpu/drm/xe/tests/xe_migrate.c
> > > index 6b4388bfbb31..d3bf4751a2d7 100644
> > > --- a/drivers/gpu/drm/xe/tests/xe_migrate.c
> > > +++ b/drivers/gpu/drm/xe/tests/xe_migrate.c
> > > @@ -301,7 +301,7 @@ static void xe_migrate_sanity_test(struct xe_migrate *m, struct kunit *test)
> > > /* First part of the test, are we updating our pagetable bo with a new entry? */
> > > xe_map_wr(xe, &bo->vmap, XE_PAGE_SIZE * (NUM_KERNEL_PDE - 1), u64,
> > > 0xdeaddeadbeefbeef);
> > > - expected = xe_pte_encode(m->q->vm, pt, 0, XE_CACHE_WB, 0);
> > > + expected = xe_pte_encode(m->q->vm, pt, 0, xe_pat_get_index(xe, XE_CACHE_WB), 0);
> > > if (m->q->vm->flags & XE_VM_FLAG_64K)
> > > expected |= XE_PTE_PS64;
> > > if (xe_bo_is_vram(pt))
> > > diff --git a/drivers/gpu/drm/xe/xe_ggtt.c b/drivers/gpu/drm/xe/xe_ggtt.c
> > > index aea26afd4668..7e4da16389af 100644
> > > --- a/drivers/gpu/drm/xe/xe_ggtt.c
> > > +++ b/drivers/gpu/drm/xe/xe_ggtt.c
> > > @@ -41,7 +41,8 @@ u64 xe_ggtt_pte_encode(struct xe_bo *bo, u64 bo_offset)
> > > pte |= XE_GGTT_PTE_DM;
> > > if ((ggtt->pat_encode).pte_encode)
> > > - pte = (ggtt->pat_encode).pte_encode(xe, pte, XE_CACHE_WB_1_WAY);
> > > + pte = (ggtt->pat_encode).pte_encode(xe, pte,
> > > + xe_pat_get_index(xe, XE_CACHE_WB_1_WAY));
> > > return pte;
> > > }
> > > @@ -102,10 +103,8 @@ static void primelockdep(struct xe_ggtt *ggtt)
> > > }
> > > static u64 xelpg_ggtt_pte_encode_pat(struct xe_device *xe, u64 pte_pat,
> > > - enum xe_cache_level cache)
> > > + u16 pat_index)
> > > {
> > > - u32 pat_index = xe_pat_get_index(xe, cache);
> > > -
> > > pte_pat &= ~(XELPG_GGTT_PTE_PAT_MASK);
> > > if (pat_index & BIT(0))
> > > diff --git a/drivers/gpu/drm/xe/xe_ggtt_types.h b/drivers/gpu/drm/xe/xe_ggtt_types.h
> > > index 7e55fac1a8a9..7981075bb228 100644
> > > --- a/drivers/gpu/drm/xe/xe_ggtt_types.h
> > > +++ b/drivers/gpu/drm/xe/xe_ggtt_types.h
> > > @@ -31,7 +31,7 @@ struct xe_ggtt {
> > > struct {
> > > u64 (*pte_encode)(struct xe_device *xe, u64 pte_pat,
> > > - enum xe_cache_level cache);
> > > + u16 pat_index);
> > > } pat_encode;
> > > };
> > > diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
> > > index 26cbc9107501..89d9e33a07e7 100644
> > > --- a/drivers/gpu/drm/xe/xe_migrate.c
> > > +++ b/drivers/gpu/drm/xe/xe_migrate.c
> > > @@ -25,6 +25,7 @@
> > > #include "xe_lrc.h"
> > > #include "xe_map.h"
> > > #include "xe_mocs.h"
> > > +#include "xe_pat.h"
> > > #include "xe_pt.h"
> > > #include "xe_res_cursor.h"
> > > #include "xe_sched_job.h"
> > > @@ -162,6 +163,7 @@ static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m,
> > > u32 num_entries = NUM_PT_SLOTS, num_level = vm->pt_root[id]->level;
> > > u32 map_ofs, level, i;
> > > struct xe_bo *bo, *batch = tile->mem.kernel_bb_pool->bo;
> > > + u16 pat_index = xe_pat_get_index(xe, XE_CACHE_WB);
> > > u64 entry;
> > > int ret;
> > > @@ -196,7 +198,7 @@ static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m,
> > > /* Map the entire BO in our level 0 pt */
> > > for (i = 0, level = 0; i < num_entries; level++) {
> > > - entry = xe_pte_encode(vm, bo, i * XE_PAGE_SIZE, XE_CACHE_WB, 0);
> > > + entry = xe_pte_encode(vm, bo, i * XE_PAGE_SIZE, pat_index, 0);
> > > xe_map_wr(xe, &bo->vmap, map_ofs + level * 8, u64, entry);
> > > @@ -214,7 +216,7 @@ static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m,
> > > for (i = 0; i < batch->size;
> > > i += vm->flags & XE_VM_FLAG_64K ? XE_64K_PAGE_SIZE :
> > > XE_PAGE_SIZE) {
> > > - entry = xe_pte_encode(vm, batch, i, XE_CACHE_WB, 0);
> > > + entry = xe_pte_encode(vm, batch, i, pat_index, 0);
> > > xe_map_wr(xe, &bo->vmap, map_ofs + level * 8, u64,
> > > entry);
> > > @@ -259,7 +261,7 @@ static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m,
> > > ofs = map_ofs + XE_PAGE_SIZE * level + 256 * 8;
> > > flags = XE_PPGTT_PTE_DM;
> > > - flags = __xe_pte_encode(flags, XE_CACHE_WB, vm, NULL, 2);
> > > + flags = __xe_pte_encode(flags, pat_index, vm, NULL, 2);
> > > /*
> > > * Use 1GB pages, it shouldn't matter the physical amount of
> > > @@ -454,6 +456,7 @@ static void emit_pte(struct xe_migrate *m,
> > > struct xe_res_cursor *cur,
> > > u32 size, struct xe_bo *bo)
> > > {
> > > + u16 pat_index = xe_pat_get_index(m->tile->xe, XE_CACHE_WB);
> > > u32 ptes;
> > > u64 ofs = at_pt * XE_PAGE_SIZE;
> > > u64 cur_ofs;
> > > @@ -494,7 +497,7 @@ static void emit_pte(struct xe_migrate *m,
> > > addr += vram_region_gpu_offset(bo->ttm.resource);
> > > addr |= XE_PPGTT_PTE_DM;
> > > }
> > > - addr = __xe_pte_encode(addr, XE_CACHE_WB, m->q->vm, NULL, 0);
> > > + addr = __xe_pte_encode(addr, pat_index, m->q->vm, NULL, 0);
> > > bb->cs[bb->len++] = lower_32_bits(addr);
> > > bb->cs[bb->len++] = upper_32_bits(addr);
> > > @@ -1254,7 +1257,7 @@ xe_migrate_update_pgtables(struct xe_migrate *m,
> > > xe_tile_assert(tile, pt_bo->size == SZ_4K);
> > > - addr = xe_pte_encode(vm, pt_bo, 0, XE_CACHE_WB, 0);
> > > + addr = xe_pte_encode(vm, pt_bo, 0, xe_pat_get_index(xe, XE_CACHE_WB), 0);
> > > bb->cs[bb->len++] = lower_32_bits(addr);
> > > bb->cs[bb->len++] = upper_32_bits(addr);
> > > }
> > > diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
> > > index a1b164cf8bce..7dd93cbff704 100644
> > > --- a/drivers/gpu/drm/xe/xe_pt.c
> > > +++ b/drivers/gpu/drm/xe/xe_pt.c
> > > @@ -10,6 +10,7 @@
> > > #include "xe_gt.h"
> > > #include "xe_gt_tlb_invalidation.h"
> > > #include "xe_migrate.h"
> > > +#include "xe_pat.h"
> > > #include "xe_pt_types.h"
> > > #include "xe_pt_walk.h"
> > > #include "xe_res_cursor.h"
> > > @@ -67,7 +68,7 @@ u64 xe_pde_encode(struct xe_bo *bo, u64 bo_offset)
> > > return pde;
> > > }
> > > -u64 __xe_pte_encode(u64 pte, enum xe_cache_level cache,
> > > +u64 __xe_pte_encode(u64 pte, u16 pat_index,
> > > struct xe_vm *vm, struct xe_vma *vma, u32 pt_level)
> > > {
> > > struct xe_device *xe = vm->xe;
> > > @@ -85,7 +86,7 @@ u64 __xe_pte_encode(u64 pte, enum xe_cache_level cache,
> > > else if (pt_level == 2)
> > > pte |= XE_PDPE_PS_1G;
> > > - pte = vm->pat_encode.pte_encode(xe, pte, cache);
> > > + pte = vm->pat_encode.pte_encode(xe, pte, pat_index);
> > > /* XXX: Does hw support 1 GiB pages? */
> > > XE_WARN_ON(pt_level > 2);
> > > @@ -103,7 +104,7 @@ u64 __xe_pte_encode(u64 pte, enum xe_cache_level cache,
> > > *
> > > * Return: An encoded page-table entry. No errors.
> > > */
> > > -u64 xe_pte_encode(struct xe_vm *vm, struct xe_bo *bo, u64 offset, enum xe_cache_level cache,
> > > +u64 xe_pte_encode(struct xe_vm *vm, struct xe_bo *bo, u64 offset, u16 pat_index,
> > > u32 pt_level)
> > > {
> > > u64 pte;
> > > @@ -112,7 +113,7 @@ u64 xe_pte_encode(struct xe_vm *vm, struct xe_bo *bo, u64 offset, enum xe_cache_
> > > if (xe_bo_is_vram(bo) || xe_bo_is_stolen_devmem(bo))
> > > pte |= XE_PPGTT_PTE_DM;
> > > - return __xe_pte_encode(pte, cache, vm, NULL, pt_level);
> > > + return __xe_pte_encode(pte, pat_index, vm, NULL, pt_level);
> > > }
> > > static u64 __xe_pt_empty_pte(struct xe_tile *tile, struct xe_vm *vm,
> > > @@ -125,7 +126,7 @@ static u64 __xe_pt_empty_pte(struct xe_tile *tile, struct xe_vm *vm,
> > > if (level == 0) {
> > > u64 empty = xe_pte_encode(vm, vm->scratch_bo[id], 0,
> > > - XE_CACHE_WB, 0);
> > > + xe_pat_get_index(vm->xe, XE_CACHE_WB), 0);
> > > return empty;
> > > } else {
> > > @@ -358,8 +359,6 @@ struct xe_pt_stage_bind_walk {
> > > struct xe_vm *vm;
> > > /** @tile: The tile we're building for. */
> > > struct xe_tile *tile;
> > > - /** @cache: Desired cache level for the ptes */
> > > - enum xe_cache_level cache;
> > > /** @default_pte: PTE flag only template. No address is associated */
> > > u64 default_pte;
> > > /** @dma_offset: DMA offset to add to the PTE. */
> > > @@ -594,7 +593,7 @@ xe_pt_stage_bind_entry(struct xe_ptw *parent, pgoff_t offset,
> > > pte = __xe_pte_encode(is_null ? 0 :
> > > xe_res_dma(curs) + xe_walk->dma_offset,
> > > - xe_walk->cache, xe_walk->vm, xe_walk->vma, level);
> > > + xe_walk->vma->pat_index, xe_walk->vm, xe_walk->vma, level);
> > > pte |= xe_walk->default_pte;
> > > /*
> > > @@ -720,13 +719,8 @@ xe_pt_stage_bind(struct xe_tile *tile, struct xe_vma *vma,
> > > if (vma && vma->gpuva.flags & XE_VMA_ATOMIC_PTE_BIT)
> > > xe_walk.default_pte |= XE_USM_PPGTT_PTE_AE;
> > > xe_walk.dma_offset = vram_region_gpu_offset(bo->ttm.resource);
> > > - xe_walk.cache = XE_CACHE_WB;
> > > - } else {
> > > - if (!xe_vma_has_no_bo(vma) && bo->flags & XE_BO_SCANOUT_BIT)
> > > - xe_walk.cache = XE_CACHE_WT;
> > > - else
> > > - xe_walk.cache = XE_CACHE_WB;
> > > }
> > > +
> > > if (!xe_vma_has_no_bo(vma) && xe_bo_is_stolen(bo))
> > > xe_walk.dma_offset = xe_ttm_stolen_gpu_offset(xe_bo_device(bo));
> > > diff --git a/drivers/gpu/drm/xe/xe_pt.h b/drivers/gpu/drm/xe/xe_pt.h
> > > index 0e66436d707d..6d10823fca9b 100644
> > > --- a/drivers/gpu/drm/xe/xe_pt.h
> > > +++ b/drivers/gpu/drm/xe/xe_pt.h
> > > @@ -47,9 +47,9 @@ bool xe_pt_zap_ptes(struct xe_tile *tile, struct xe_vma *vma);
> > > u64 xe_pde_encode(struct xe_bo *bo, u64 bo_offset);
> > > -u64 xe_pte_encode(struct xe_vm *vm, struct xe_bo *bo, u64 offset, enum xe_cache_level cache,
> > > +u64 xe_pte_encode(struct xe_vm *vm, struct xe_bo *bo, u64 offset, u16 pat_index,
> > > u32 pt_level);
> > > -u64 __xe_pte_encode(u64 pte, enum xe_cache_level cache,
> > > +u64 __xe_pte_encode(u64 pte, u16 pat_index,
> > > struct xe_vm *vm, struct xe_vma *vma, u32 pt_level);
> > > #endif
> > > diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> > > index ba612a5ee2d8..98db7a298139 100644
> > > --- a/drivers/gpu/drm/xe/xe_vm.c
> > > +++ b/drivers/gpu/drm/xe/xe_vm.c
> > > @@ -6,6 +6,7 @@
> > > #include "xe_vm.h"
> > > #include <linux/dma-fence-array.h>
> > > +#include <linux/nospec.h>
> > > #include <drm/drm_exec.h>
> > > #include <drm/drm_print.h>
> > > @@ -858,7 +859,8 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
> > > u64 start, u64 end,
> > > bool read_only,
> > > bool is_null,
> > > - u8 tile_mask)
> > > + u8 tile_mask,
> > > + u16 pat_index)
> > > {
> > > struct xe_vma *vma;
> > > struct xe_tile *tile;
> > > @@ -897,6 +899,8 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
> > > vma->tile_mask |= 0x1 << id;
> > > }
> > > + vma->pat_index = pat_index;
> > > +
> > > if (vm->xe->info.platform == XE_PVC)
> > > vma->gpuva.flags |= XE_VMA_ATOMIC_PTE_BIT;
> > > @@ -1195,10 +1199,8 @@ static void xe_vma_op_work_func(struct work_struct *w);
> > > static void vm_destroy_work_func(struct work_struct *w);
> > > static u64 xe2_ppgtt_pte_encode_pat(struct xe_device *xe, u64 pte_pat,
> > > - enum xe_cache_level cache)
> > > + u16 pat_index)
> > > {
> > > - u32 pat_index = xe_pat_get_index(xe, cache);
> > > -
> > > if (pat_index & BIT(0))
> > > pte_pat |= BIT(3);
> > > @@ -1216,10 +1218,8 @@ static u64 xe2_ppgtt_pte_encode_pat(struct xe_device *xe, u64 pte_pat,
> > > }
> > > static u64 xelp_ppgtt_pte_encode_pat(struct xe_device *xe, u64 pte_pat,
> > > - enum xe_cache_level cache)
> > > + u16 pat_index)
> > > {
> > > - u32 pat_index = xe_pat_get_index(xe, cache);
> > > -
> > > if (pat_index & BIT(0))
> > > pte_pat |= BIT(3);
> > > @@ -2300,7 +2300,7 @@ static void print_op(struct xe_device *xe, struct drm_gpuva_op *op)
> > > static struct drm_gpuva_ops *
> > > vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_bo *bo,
> > > u64 bo_offset_or_userptr, u64 addr, u64 range,
> > > - u32 operation, u8 tile_mask, u32 region)
> > > + u32 operation, u8 tile_mask, u32 region, u16 pat_index)
> > > {
> > > struct drm_gem_object *obj = bo ? &bo->ttm.base : NULL;
> > > struct drm_gpuva_ops *ops;
> > > @@ -2327,6 +2327,7 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_bo *bo,
> > > struct xe_vma_op *op = gpuva_op_to_vma_op(__op);
> > > op->tile_mask = tile_mask;
> > > + op->pat_index = pat_index;
> > > op->map.immediate =
> > > operation & XE_VM_BIND_FLAG_IMMEDIATE;
> > > op->map.read_only =
> > > @@ -2354,6 +2355,7 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_bo *bo,
> > > struct xe_vma_op *op = gpuva_op_to_vma_op(__op);
> > > op->tile_mask = tile_mask;
> > > + op->pat_index = pat_index;
> > > op->prefetch.region = region;
> > > }
> > > break;
> > > @@ -2396,7 +2398,8 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_bo *bo,
> > > }
> > > static struct xe_vma *new_vma(struct xe_vm *vm, struct drm_gpuva_op_map *op,
> > > - u8 tile_mask, bool read_only, bool is_null)
> > > + u8 tile_mask, bool read_only, bool is_null,
> > > + u16 pat_index)
> > > {
> > > struct xe_bo *bo = op->gem.obj ? gem_to_xe_bo(op->gem.obj) : NULL;
> > > struct xe_vma *vma;
> > > @@ -2412,7 +2415,7 @@ static struct xe_vma *new_vma(struct xe_vm *vm, struct drm_gpuva_op_map *op,
> > > vma = xe_vma_create(vm, bo, op->gem.offset,
> > > op->va.addr, op->va.addr +
> > > op->va.range - 1, read_only, is_null,
> > > - tile_mask);
> > > + tile_mask, pat_index);
> > > if (bo)
> > > xe_bo_unlock(bo);
> > > @@ -2569,7 +2572,7 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_exec_queue *q,
> > > vma = new_vma(vm, &op->base.map,
> > > op->tile_mask, op->map.read_only,
> > > - op->map.is_null);
> > > + op->map.is_null, op->pat_index);
> > > if (IS_ERR(vma)) {
> > > err = PTR_ERR(vma);
> > > goto free_fence;
> > > @@ -2597,7 +2600,7 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_exec_queue *q,
> > > vma = new_vma(vm, op->base.remap.prev,
> > > op->tile_mask, read_only,
> > > - is_null);
> > > + is_null, op->pat_index);
> > > if (IS_ERR(vma)) {
> > > err = PTR_ERR(vma);
> > > goto free_fence;
> > > @@ -2633,7 +2636,7 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_exec_queue *q,
> > > vma = new_vma(vm, op->base.remap.next,
> > > op->tile_mask, read_only,
> > > - is_null);
> > > + is_null, op->pat_index);
> > > if (IS_ERR(vma)) {
> > > err = PTR_ERR(vma);
> > > goto free_fence;
> > > @@ -3146,7 +3149,23 @@ static int vm_bind_ioctl_check_args(struct xe_device *xe,
> > > u32 obj = (*bind_ops)[i].obj;
> > > u64 obj_offset = (*bind_ops)[i].obj_offset;
> > > u32 region = (*bind_ops)[i].region;
> > > + u16 pat_index = (*bind_ops)[i].pat_index;
> > > bool is_null = op & XE_VM_BIND_FLAG_NULL;
> > > + u16 coh_mode;
> > > +
> > > + if (XE_IOCTL_DBG(xe, pat_index >= xe->info.pat.n_entries)) {
> > > + err = -EINVAL;
> > > + goto free_bind_ops;
> > > + }
> > > +
> > > + pat_index = array_index_nospec(pat_index,
> > > + xe->info.pat.n_entries);
> > > + (*bind_ops)[i].pat_index = pat_index;
> > > + coh_mode = xe_pat_index_get_coh_mode(xe, pat_index);
> > > + if (XE_WARN_ON(!coh_mode || coh_mode > XE_GEM_COH_AT_LEAST_1WAY)) {
> > > + err = -EINVAL;
> > > + goto free_bind_ops;
> > > + }
> > > if (i == 0) {
> > > *async = !!(op & XE_VM_BIND_FLAG_ASYNC);
> > > @@ -3188,6 +3207,8 @@ static int vm_bind_ioctl_check_args(struct xe_device *xe,
> > > VM_BIND_OP(op) == XE_VM_BIND_OP_UNMAP_ALL) ||
> > > XE_IOCTL_DBG(xe, obj &&
> > > VM_BIND_OP(op) == XE_VM_BIND_OP_MAP_USERPTR) ||
> > > + XE_IOCTL_DBG(xe, coh_mode == XE_GEM_COH_NONE &&
> > > + VM_BIND_OP(op) == XE_VM_BIND_OP_MAP_USERPTR) ||
> > > XE_IOCTL_DBG(xe, obj &&
> > > VM_BIND_OP(op) == XE_VM_BIND_OP_PREFETCH) ||
> > > XE_IOCTL_DBG(xe, region &&
> > > @@ -3336,6 +3357,8 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
> > > u64 addr = bind_ops[i].addr;
> > > u32 obj = bind_ops[i].obj;
> > > u64 obj_offset = bind_ops[i].obj_offset;
> > > + u16 pat_index = bind_ops[i].pat_index;
> > > + u16 coh_mode;
> > > if (!obj)
> > > continue;
> > > @@ -3363,6 +3386,23 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
> > > goto put_obj;
> > > }
> > > }
> > > +
> > > + coh_mode = xe_pat_index_get_coh_mode(xe, pat_index);
> > > + if (bos[i]->coh_mode) {
> > > + if (XE_IOCTL_DBG(xe, coh_mode < bos[i]->coh_mode)) {
> > > + err = -EINVAL;
> > > + goto put_obj;
> > > + }
> > > + } else if (XE_IOCTL_DBG(xe, coh_mode == XE_GEM_COH_NONE)) {
> > > + /*
> > > + * Imported dma-buf from a different device should
> > > + * require 1way or 2way coherency since we don't know
> > > + * how it was mapped on the CPU. Just assume it is
> > > + * potentially cached on the CPU side.
> > > + */
> > > + err = -EINVAL;
> > > + goto put_obj;
> > > + }
> > > }
> > > if (args->num_syncs) {
> > > @@ -3400,10 +3440,11 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
> > > u64 obj_offset = bind_ops[i].obj_offset;
> > > u8 tile_mask = bind_ops[i].tile_mask;
> > > u32 region = bind_ops[i].region;
> > > + u16 pat_index = bind_ops[i].pat_index;
> > > ops[i] = vm_bind_ioctl_ops_create(vm, bos[i], obj_offset,
> > > addr, range, op, tile_mask,
> > > - region);
> > > + region, pat_index);
> > > if (IS_ERR(ops[i])) {
> > > err = PTR_ERR(ops[i]);
> > > ops[i] = NULL;
> > > diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
> > > index dc583f00919f..54658f400174 100644
> > > --- a/drivers/gpu/drm/xe/xe_vm_types.h
> > > +++ b/drivers/gpu/drm/xe/xe_vm_types.h
> > > @@ -111,6 +111,11 @@ struct xe_vma {
> > > */
> > > u8 tile_present;
> > > + /**
> > > + * @pat_index: The pat index to use when encoding the PTEs for this vma.
> > > + */
> > > + u16 pat_index;
> > > +
> > > struct {
> > > struct list_head rebind_link;
> > > } notifier;
> > > @@ -338,8 +343,7 @@ struct xe_vm {
> > > bool batch_invalidate_tlb;
> > > struct {
> > > - u64 (*pte_encode)(struct xe_device *xe, u64 pte_pat,
> > > - enum xe_cache_level cache);
> > > + u64 (*pte_encode)(struct xe_device *xe, u64 pte_pat, u16 pat_index);
> > > } pat_encode;
> > > };
> > > @@ -419,6 +423,8 @@ struct xe_vma_op {
> > > struct async_op_fence *fence;
> > > /** @tile_mask: gt mask for this operation */
> > > u8 tile_mask;
> > > + /** @pat_index: The pat index to use for this operation. */
> > > + u16 pat_index;
> > > /** @flags: operation flags */
> > > enum xe_vma_op_flags flags;
> > > diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
> > > index 737bb1d4c6f7..75b42c1116f2 100644
> > > --- a/include/uapi/drm/xe_drm.h
> > > +++ b/include/uapi/drm/xe_drm.h
> > > @@ -605,8 +605,49 @@ struct drm_xe_vm_bind_op {
> > > */
> > > __u32 obj;
> > > + /**
> > > + * @pat_index: The platform defined @pat_index to use for this mapping.
> > > + * The index basically maps to some predefined memory attributes,
> > > + * including things like caching, coherency, compression etc. The exact
> > > + * meaning of the pat_index is platform specific and defined in the
> > > + * Bspec and PRMs. When the KMD sets up the binding the index here is
> > > + * encoded into the ppGTT PTE.
> > > + *
> > > + * For coherency the @pat_index needs to be at least as coherent as
> > > + * drm_xe_gem_create.coh_mode, i.e. coh_mode(pat_index) >=
> > > + * drm_xe_gem_create.coh_mode. The KMD will extract the coherency mode
> > > + * from the @pat_index and reject if there is a mismatch (see note below
> > > + * for pre-MTL platforms).
> > > + *
> > > + * Note: On pre-MTL platforms there is only a caching mode and no
> > > + * explicit coherency mode, but on such hardware there is always a
> > > + * shared-LLC (or the device is a dgpu) so all GT memory accesses are
> > > + * coherent with CPU caches even with the caching mode set as uncached.
> > > + * It's only the display engine that is incoherent (on dgpu the surface
> > > + * must be in VRAM, which is always mapped as WC on the CPU). However,
> > > + * to keep the uapi somewhat consistent with newer platforms the KMD
> > > + * groups the different cache levels into the following coherency
> > > + * buckets on all pre-MTL platforms:
> > > + *
> > > + * ppGTT UC -> XE_GEM_COH_NONE
> > > + * ppGTT WC -> XE_GEM_COH_NONE
> > > + * ppGTT WT -> XE_GEM_COH_NONE
> > > + * ppGTT WB -> XE_GEM_COH_AT_LEAST_1WAY
> > > + *
> > > + * In practice UC/WC/WT should only ever be used for scanout surfaces on
> > > + * such platforms (or perhaps in general for dma-buf if shared with
> > > + * another device) since it is only the display engine that is actually
> > > + * incoherent. Everything else should typically use WB given that we
> > > + * have a shared-LLC. On MTL+ this completely changes and the HW
> > > + * defines the coherency mode as part of the @pat_index, where
> > > + * incoherent GT access is possible.
> >
> > With this in mind I noticed that on i915 the scanout is just a PAT param
> > in the uapi, while in Xe we have a buffer flag:
> > XE_GEM_CREATE_FLAG_SCANOUT
> >
> > Should we continue with this flag, or should we do the same PAT param
> > that i915 is doing?
>
> Can you point to where the scanout pat.param thing is? But yeah, with the
> explicit pat_index and smem_cpu_caching userspace can already select the
> correct thing for display surfaces. I think dropping the flag is possible.
Nevermind. The direction is to go with the flag at bo creation time and
then limit the pat possibilities on vm_bind based on the creation flags.
So, please ignore my thoughts on removing it.
>
> >
> > > + *
> > > + * Note: For userptr and externally imported dma-buf the kernel expects
> > > + * either 1WAY or 2WAY for the @pat_index.
> > > + */
> > > + __u16 pat_index;
> > > +
> > > /** @pad: MBZ */
> > > - __u32 pad;
> > > + __u16 pad;
> > > union {
> > > /**
> > > --
> > > 2.41.0
> > >
Thread overview: 28+ messages
2023-09-14 15:31 [Intel-xe] [PATCH v2 0/6] PAT and cache coherency support Matthew Auld
2023-09-14 15:31 ` [Intel-xe] [PATCH v2 1/6] drm/xe/uapi: Add support for cache and coherency mode Matthew Auld
2023-09-14 23:47 ` Matt Roper
2023-09-15 7:37 ` Matthew Auld
2023-09-21 20:07 ` Souza, Jose
2023-09-25 8:06 ` Matthew Auld
2023-09-25 18:26 ` Souza, Jose
2023-09-26 8:07 ` Matthew Auld
2023-09-26 15:59 ` Souza, Jose
2023-09-14 15:31 ` [Intel-xe] [PATCH v2 2/6] drm/xe: move pat_table into device info Matthew Auld
2023-09-14 23:53 ` Matt Roper
2023-09-14 15:31 ` [Intel-xe] [PATCH v2 3/6] drm/xe/pat: trim the tgl PAT table Matthew Auld
2023-09-14 18:07 ` Matt Roper
2023-09-14 15:31 ` [Intel-xe] [PATCH v2 4/6] drm/xe/pat: annotate pat_index with coherency mode Matthew Auld
2023-09-15 0:08 ` Matt Roper
2023-09-14 15:31 ` [Intel-xe] [PATCH v2 5/6] drm/xe/migrate: rather use pte_encode helpers Matthew Auld
2023-09-15 22:19 ` Matt Roper
2023-09-14 15:31 ` [Intel-xe] [PATCH v2 6/6] drm/xe/uapi: support pat_index selection with vm_bind Matthew Auld
2023-09-15 22:24 ` Matt Roper
2023-09-25 8:07 ` Matthew Auld
2023-09-25 21:56 ` Rodrigo Vivi
2023-09-26 8:17 ` Matthew Auld
2023-09-27 19:30 ` Rodrigo Vivi
2023-09-14 18:16 ` [Intel-xe] ✗ CI.Patch_applied: failure for PAT and cache coherency support (rev2) Patchwork
2023-09-18 15:51 ` [Intel-xe] [PATCH v2 0/6] PAT and cache coherency support Souza, Jose
2023-09-21 17:19 ` Souza, Jose
2023-09-25 13:12 ` Matthew Auld
2023-09-21 20:10 ` [Intel-xe] ✗ CI.Patch_applied: failure for PAT and cache coherency support (rev3) Patchwork