Intel-XE Archive on lore.kernel.org
* [PATCH v6 0/8] Fix non-contiguous VRAM BO access in Xe
@ 2024-10-31 18:10 Matthew Brost
  2024-10-31 18:10 ` [PATCH v6 1/8] drm/xe: Add xe_bo_vm_access Matthew Brost
                   ` (15 more replies)
  0 siblings, 16 replies; 56+ messages in thread
From: Matthew Brost @ 2024-10-31 18:10 UTC (permalink / raw)
  To: intel-xe; +Cc: matthew.auld

Fully reviewed and resending for final CI.

Dropping the non-visible patch for now, as it is a bit larger, is not
strictly required to unblock EU debug, and can be sent independently in a
follow-up.

Matt

Matthew Brost (8):
  drm/xe: Add xe_bo_vm_access
  drm/ttm: Add ttm_bo_access
  drm/xe: Add xe_ttm_access_memory
  drm/xe: Take PM ref in delayed snapshot capture worker
  drm/xe/display: Update intel_bo_read_from_page to use ttm_bo_access
  drm/xe: Use ttm_bo_access in xe_vm_snapshot_capture_delayed
  drm/xe: Set XE_BO_FLAG_PINNED in migrate selftest BOs
  drm/xe: Only allow contiguous BOs to use xe_bo_vmap

 drivers/gpu/drm/ttm/ttm_bo_util.c     | 86 +++++++++++++++++++++++
 drivers/gpu/drm/ttm/ttm_bo_vm.c       | 65 +-----------------
 drivers/gpu/drm/xe/display/intel_bo.c | 25 +------
 drivers/gpu/drm/xe/tests/xe_migrate.c | 13 ++--
 drivers/gpu/drm/xe/xe_bo.c            | 99 +++++++++++++++++++++++----
 drivers/gpu/drm/xe/xe_devcoredump.c   |  6 ++
 drivers/gpu/drm/xe/xe_vm.c            | 17 ++---
 include/drm/ttm/ttm_bo.h              |  2 +
 8 files changed, 197 insertions(+), 116 deletions(-)

-- 
2.34.1



* [PATCH v6 1/8] drm/xe: Add xe_bo_vm_access
  2024-10-31 18:10 [PATCH v6 0/8] Fix non-contiguous VRAM BO access in Xe Matthew Brost
@ 2024-10-31 18:10 ` Matthew Brost
  2024-10-31 18:10 ` [PATCH v6 2/8] drm/ttm: Add ttm_bo_access Matthew Brost
                   ` (14 subsequent siblings)
  15 siblings, 0 replies; 56+ messages in thread
From: Matthew Brost @ 2024-10-31 18:10 UTC (permalink / raw)
  To: intel-xe; +Cc: matthew.auld

Add xe_bo_vm_access, which is a wrapper around ttm_bo_vm_access that takes
RPM refs for device access.

Suggested-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
---
 drivers/gpu/drm/xe/xe_bo.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
index 5b232f2951b1..0261a8b29351 100644
--- a/drivers/gpu/drm/xe/xe_bo.c
+++ b/drivers/gpu/drm/xe/xe_bo.c
@@ -1236,11 +1236,26 @@ static vm_fault_t xe_gem_fault(struct vm_fault *vmf)
 	return ret;
 }
 
+static int xe_bo_vm_access(struct vm_area_struct *vma, unsigned long addr,
+			   void *buf, int len, int write)
+{
+	struct ttm_buffer_object *ttm_bo = vma->vm_private_data;
+	struct xe_bo *bo = ttm_to_xe_bo(ttm_bo);
+	struct xe_device *xe = xe_bo_device(bo);
+	int ret;
+
+	xe_pm_runtime_get(xe);
+	ret = ttm_bo_vm_access(vma, addr, buf, len, write);
+	xe_pm_runtime_put(xe);
+
+	return ret;
+}
+
 static const struct vm_operations_struct xe_gem_vm_ops = {
 	.fault = xe_gem_fault,
 	.open = ttm_bo_vm_open,
 	.close = ttm_bo_vm_close,
-	.access = ttm_bo_vm_access
+	.access = xe_bo_vm_access,
 };
 
 static const struct drm_gem_object_funcs xe_gem_object_funcs = {
-- 
2.34.1
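
The get/put bracketing in this patch can be sketched in plain userspace C.
This is only an illustration under stated assumptions: struct sketch_dev,
sketch_checked_access and sketch_access_with_pm are hypothetical names, and a
simple counter stands in for the real runtime PM reference.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical device: pm_refs models the runtime PM reference count. */
struct sketch_dev {
	int pm_refs;
};

typedef int (*sketch_access_fn)(struct sketch_dev *dev, void *buf, int len);

/* Example accessor: fails with -EIO if nobody holds a PM reference. */
static int sketch_checked_access(struct sketch_dev *dev, void *buf, int len)
{
	(void)buf;
	if (dev->pm_refs < 1)
		return -EIO;
	return len;
}

/* Wrapper: hold a reference for exactly the duration of the access. */
static int sketch_access_with_pm(struct sketch_dev *dev, sketch_access_fn fn,
				 void *buf, int len)
{
	int ret;

	dev->pm_refs++;			/* stands in for xe_pm_runtime_get() */
	ret = fn(dev, buf, len);
	dev->pm_refs--;			/* stands in for xe_pm_runtime_put() */
	assert(dev->pm_refs >= 0);

	return ret;
}
```

Calling sketch_checked_access directly on an idle device fails, while going
through sketch_access_with_pm succeeds and leaves the count balanced.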



* [PATCH v6 2/8] drm/ttm: Add ttm_bo_access
  2024-10-31 18:10 [PATCH v6 0/8] Fix non-contiguous VRAM BO access in Xe Matthew Brost
  2024-10-31 18:10 ` [PATCH v6 1/8] drm/xe: Add xe_bo_vm_access Matthew Brost
@ 2024-10-31 18:10 ` Matthew Brost
  2024-10-31 23:43   ` Matthew Brost
  2024-10-31 18:10 ` [PATCH v6 3/8] drm/xe: Add xe_ttm_access_memory Matthew Brost
                   ` (13 subsequent siblings)
  15 siblings, 1 reply; 56+ messages in thread
From: Matthew Brost @ 2024-10-31 18:10 UTC (permalink / raw)
  To: intel-xe; +Cc: matthew.auld

Non-contiguous VRAM cannot easily be mapped in TTM nor can non-visible
VRAM easily be accessed. Add ttm_bo_access, which is similar to
ttm_bo_vm_access, to access such memory.

v4:
 - Fix checkpatch warnings (CI)
v5:
 - Fix checkpatch warnings (CI)
v6:
 - Fix kernel doc (Auld)

Reported-by: Christoph Manszewski <christoph.manszewski@intel.com>
Suggested-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Tested-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
---
 drivers/gpu/drm/ttm/ttm_bo_util.c | 86 +++++++++++++++++++++++++++++++
 drivers/gpu/drm/ttm/ttm_bo_vm.c   | 65 +----------------------
 include/drm/ttm/ttm_bo.h          |  2 +
 3 files changed, 89 insertions(+), 64 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c
index d939925efa81..77e760ea7193 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_util.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
@@ -919,3 +919,89 @@ s64 ttm_lru_walk_for_evict(struct ttm_lru_walk *walk, struct ttm_device *bdev,
 
 	return progress;
 }
+
+static int ttm_bo_access_kmap(struct ttm_buffer_object *bo,
+			      unsigned long offset,
+			      void *buf, int len, int write)
+{
+	unsigned long page = offset >> PAGE_SHIFT;
+	unsigned long bytes_left = len;
+	int ret;
+
+	/* Copy a page at a time, that way no extra virtual address
+	 * mapping is needed
+	 */
+	offset -= page << PAGE_SHIFT;
+	do {
+		unsigned long bytes = min(bytes_left, PAGE_SIZE - offset);
+		struct ttm_bo_kmap_obj map;
+		void *ptr;
+		bool is_iomem;
+
+		ret = ttm_bo_kmap(bo, page, 1, &map);
+		if (ret)
+			return ret;
+
+		ptr = (void *)ttm_kmap_obj_virtual(&map, &is_iomem) + offset;
+		WARN_ON_ONCE(is_iomem);
+		if (write)
+			memcpy(ptr, buf, bytes);
+		else
+			memcpy(buf, ptr, bytes);
+		ttm_bo_kunmap(&map);
+
+		page++;
+		buf += bytes;
+		bytes_left -= bytes;
+		offset = 0;
+	} while (bytes_left);
+
+	return len;
+}
+
+/**
+ * ttm_bo_access - Helper to access a buffer object
+ *
+ * @bo: ttm buffer object
+ * @offset: access offset into buffer object
+ * @buf: pointer to caller memory to read into or write from
+ * @len: length of access
+ * @write: write access
+ *
+ * Utility function to access a buffer object. Useful when buffer object cannot
+ * be easily mapped (non-contiguous, non-visible, etc...).
+ *
+ * Returns:
+ * @len if successful, negative error code on failure.
+ */
+int ttm_bo_access(struct ttm_buffer_object *bo, unsigned long offset,
+		  void *buf, int len, int write)
+{
+	int ret;
+
+	if (len < 1 || (offset + len) > bo->base.size)
+		return -EIO;
+
+	ret = ttm_bo_reserve(bo, true, false, NULL);
+	if (ret)
+		return ret;
+
+	switch (bo->resource->mem_type) {
+	case TTM_PL_SYSTEM:
+		fallthrough;
+	case TTM_PL_TT:
+		ret = ttm_bo_access_kmap(bo, offset, buf, len, write);
+		break;
+	default:
+		if (bo->bdev->funcs->access_memory)
+			ret = bo->bdev->funcs->access_memory
+				(bo, offset, buf, len, write);
+		else
+			ret = -EIO;
+	}
+
+	ttm_bo_unreserve(bo);
+
+	return ret;
+}
+EXPORT_SYMBOL(ttm_bo_access);
diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c
index 2c699ed1963a..20b1e5f78684 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
@@ -366,45 +366,6 @@ void ttm_bo_vm_close(struct vm_area_struct *vma)
 }
 EXPORT_SYMBOL(ttm_bo_vm_close);
 
-static int ttm_bo_vm_access_kmap(struct ttm_buffer_object *bo,
-				 unsigned long offset,
-				 uint8_t *buf, int len, int write)
-{
-	unsigned long page = offset >> PAGE_SHIFT;
-	unsigned long bytes_left = len;
-	int ret;
-
-	/* Copy a page at a time, that way no extra virtual address
-	 * mapping is needed
-	 */
-	offset -= page << PAGE_SHIFT;
-	do {
-		unsigned long bytes = min(bytes_left, PAGE_SIZE - offset);
-		struct ttm_bo_kmap_obj map;
-		void *ptr;
-		bool is_iomem;
-
-		ret = ttm_bo_kmap(bo, page, 1, &map);
-		if (ret)
-			return ret;
-
-		ptr = (uint8_t *)ttm_kmap_obj_virtual(&map, &is_iomem) + offset;
-		WARN_ON_ONCE(is_iomem);
-		if (write)
-			memcpy(ptr, buf, bytes);
-		else
-			memcpy(buf, ptr, bytes);
-		ttm_bo_kunmap(&map);
-
-		page++;
-		buf += bytes;
-		bytes_left -= bytes;
-		offset = 0;
-	} while (bytes_left);
-
-	return len;
-}
-
 int ttm_bo_vm_access(struct vm_area_struct *vma, unsigned long addr,
 		     void *buf, int len, int write)
 {
@@ -412,32 +373,8 @@ int ttm_bo_vm_access(struct vm_area_struct *vma, unsigned long addr,
 	unsigned long offset = (addr) - vma->vm_start +
 		((vma->vm_pgoff - drm_vma_node_start(&bo->base.vma_node))
 		 << PAGE_SHIFT);
-	int ret;
-
-	if (len < 1 || (offset + len) > bo->base.size)
-		return -EIO;
 
-	ret = ttm_bo_reserve(bo, true, false, NULL);
-	if (ret)
-		return ret;
-
-	switch (bo->resource->mem_type) {
-	case TTM_PL_SYSTEM:
-		fallthrough;
-	case TTM_PL_TT:
-		ret = ttm_bo_vm_access_kmap(bo, offset, buf, len, write);
-		break;
-	default:
-		if (bo->bdev->funcs->access_memory)
-			ret = bo->bdev->funcs->access_memory(
-				bo, offset, buf, len, write);
-		else
-			ret = -EIO;
-	}
-
-	ttm_bo_unreserve(bo);
-
-	return ret;
+	return ttm_bo_access(bo, offset, buf, len, write);
 }
 EXPORT_SYMBOL(ttm_bo_vm_access);
 
diff --git a/include/drm/ttm/ttm_bo.h b/include/drm/ttm/ttm_bo.h
index 5804408815be..8ea11cd8df39 100644
--- a/include/drm/ttm/ttm_bo.h
+++ b/include/drm/ttm/ttm_bo.h
@@ -421,6 +421,8 @@ void ttm_bo_unpin(struct ttm_buffer_object *bo);
 int ttm_bo_evict_first(struct ttm_device *bdev,
 		       struct ttm_resource_manager *man,
 		       struct ttm_operation_ctx *ctx);
+int ttm_bo_access(struct ttm_buffer_object *bo, unsigned long offset,
+		  void *buf, int len, int write);
 vm_fault_t ttm_bo_vm_reserve(struct ttm_buffer_object *bo,
 			     struct vm_fault *vmf);
 vm_fault_t ttm_bo_vm_fault_reserved(struct vm_fault *vmf,
-- 
2.34.1
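
The page-at-a-time copy in ttm_bo_access_kmap can be modelled in userspace
roughly as follows. This is a sketch, not the TTM code: struct sketch_bo is a
hypothetical stand-in whose pages are separate allocations, a 4 KiB page size
is assumed, and plain memcpy replaces the kmap/kunmap step.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (1UL << SKETCH_PAGE_SHIFT)

/* Hypothetical buffer object: backing pages need not be adjacent. */
struct sketch_bo {
	unsigned char **pages;	/* one pointer per backing page */
	size_t size;		/* total size in bytes */
};

/* Returns len on success, -EIO if the range is empty or out of bounds. */
static int sketch_bo_access(struct sketch_bo *bo, unsigned long offset,
			    void *buf, int len, int write)
{
	unsigned long page, bytes_left;
	unsigned char *cbuf = buf;

	if (len < 1 || offset > bo->size || (size_t)len > bo->size - offset)
		return -EIO;

	page = offset >> SKETCH_PAGE_SHIFT;
	offset &= SKETCH_PAGE_SIZE - 1;	/* offset within the first page */
	bytes_left = (unsigned long)len;

	do {
		unsigned long room = SKETCH_PAGE_SIZE - offset;
		unsigned long bytes = bytes_left < room ? bytes_left : room;
		unsigned char *ptr;

		assert(page < bo->size >> SKETCH_PAGE_SHIFT);
		ptr = bo->pages[page] + offset;

		if (write)
			memcpy(ptr, cbuf, bytes);
		else
			memcpy(cbuf, ptr, bytes);

		page++;
		cbuf += bytes;
		bytes_left -= bytes;
		offset = 0;	/* later pages are copied from their start */
	} while (bytes_left);

	return len;
}
```

Because only one page is touched per iteration, the caller never needs a
single linear mapping of the whole object, which is the property the commit
message relies on.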



* [PATCH v6 3/8] drm/xe: Add xe_ttm_access_memory
  2024-10-31 18:10 [PATCH v6 0/8] Fix non-contiguous VRAM BO access in Xe Matthew Brost
  2024-10-31 18:10 ` [PATCH v6 1/8] drm/xe: Add xe_bo_vm_access Matthew Brost
  2024-10-31 18:10 ` [PATCH v6 2/8] drm/ttm: Add ttm_bo_access Matthew Brost
@ 2024-10-31 18:10 ` Matthew Brost
  2024-10-31 18:10 ` [PATCH v6 4/8] drm/xe: Take PM ref in delayed snapshot capture worker Matthew Brost
                   ` (12 subsequent siblings)
  15 siblings, 0 replies; 56+ messages in thread
From: Matthew Brost @ 2024-10-31 18:10 UTC (permalink / raw)
  To: intel-xe; +Cc: matthew.auld

Non-contiguous VRAM cannot easily be mapped in TTM nor can non-visible
VRAM easily be accessed. Add xe_ttm_access_memory which hooks into
ttm_bo_access to access such memory.

v4:
 - Assert memory access rather than taking RPM ref (Thomas / Auld)
 - Fix warning on xe_res_cursor.h for non-zero offset (Mika)

Reported-by: Christoph Manszewski <christoph.manszewski@intel.com>
Suggested-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
---
 drivers/gpu/drm/xe/xe_bo.c | 59 ++++++++++++++++++++++++++++++++++++--
 1 file changed, 56 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
index 0261a8b29351..0cb014aba699 100644
--- a/drivers/gpu/drm/xe/xe_bo.c
+++ b/drivers/gpu/drm/xe/xe_bo.c
@@ -442,6 +442,14 @@ static void xe_ttm_tt_destroy(struct ttm_device *ttm_dev, struct ttm_tt *tt)
 	kfree(tt);
 }
 
+static bool xe_ttm_resource_visible(struct ttm_resource *mem)
+{
+	struct xe_ttm_vram_mgr_resource *vres =
+		to_xe_ttm_vram_mgr_resource(mem);
+
+	return vres->used_visible_size == mem->size;
+}
+
 static int xe_ttm_io_mem_reserve(struct ttm_device *bdev,
 				 struct ttm_resource *mem)
 {
@@ -453,11 +461,9 @@ static int xe_ttm_io_mem_reserve(struct ttm_device *bdev,
 		return 0;
 	case XE_PL_VRAM0:
 	case XE_PL_VRAM1: {
-		struct xe_ttm_vram_mgr_resource *vres =
-			to_xe_ttm_vram_mgr_resource(mem);
 		struct xe_mem_region *vram = res_to_mem_region(mem);
 
-		if (vres->used_visible_size < mem->size)
+		if (!xe_ttm_resource_visible(mem))
 			return -EINVAL;
 
 		mem->bus.offset = mem->start << PAGE_SHIFT;
@@ -1111,6 +1117,52 @@ static void xe_ttm_bo_swap_notify(struct ttm_buffer_object *ttm_bo)
 	}
 }
 
+static int xe_ttm_access_memory(struct ttm_buffer_object *ttm_bo,
+				unsigned long offset, void *buf, int len,
+				int write)
+{
+	struct xe_bo *bo = ttm_to_xe_bo(ttm_bo);
+	struct xe_device *xe = ttm_to_xe_device(ttm_bo->bdev);
+	struct iosys_map vmap;
+	struct xe_res_cursor cursor;
+	struct xe_mem_region *vram;
+	int bytes_left = len;
+
+	xe_bo_assert_held(bo);
+	xe_device_assert_mem_access(xe);
+
+	if (!mem_type_is_vram(ttm_bo->resource->mem_type))
+		return -EIO;
+
+	/* FIXME: Use GPU for non-visible VRAM */
+	if (!xe_ttm_resource_visible(ttm_bo->resource))
+		return -EIO;
+
+	vram = res_to_mem_region(ttm_bo->resource);
+	xe_res_first(ttm_bo->resource, offset & PAGE_MASK,
+		     bo->size - (offset & PAGE_MASK), &cursor);
+
+	do {
+		unsigned long page_offset = (offset & ~PAGE_MASK);
+		int byte_count = min((int)(PAGE_SIZE - page_offset), bytes_left);
+
+		iosys_map_set_vaddr_iomem(&vmap, (u8 __iomem *)vram->mapping +
+					  cursor.start);
+		if (write)
+			xe_map_memcpy_to(xe, &vmap, page_offset, buf, byte_count);
+		else
+			xe_map_memcpy_from(xe, buf, &vmap, page_offset, byte_count);
+
+		buf += byte_count;
+		offset += byte_count;
+		bytes_left -= byte_count;
+		if (bytes_left)
+			xe_res_next(&cursor, PAGE_SIZE);
+	} while (bytes_left);
+
+	return len;
+}
+
 const struct ttm_device_funcs xe_ttm_funcs = {
 	.ttm_tt_create = xe_ttm_tt_create,
 	.ttm_tt_populate = xe_ttm_tt_populate,
@@ -1120,6 +1172,7 @@ const struct ttm_device_funcs xe_ttm_funcs = {
 	.move = xe_bo_move,
 	.io_mem_reserve = xe_ttm_io_mem_reserve,
 	.io_mem_pfn = xe_ttm_io_mem_pfn,
+	.access_memory = xe_ttm_access_memory,
 	.release_notify = xe_ttm_bo_release_notify,
 	.eviction_valuable = ttm_bo_eviction_valuable,
 	.delete_mem_notify = xe_ttm_bo_delete_mem_notify,
-- 
2.34.1
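
The offset arithmetic in xe_ttm_access_memory (offset & PAGE_MASK for the
page-aligned base, offset & ~PAGE_MASK for the in-page offset) can be checked
in isolation. A minimal sketch, assuming a 4 KiB page; the SKETCH_* macros and
helper names are hypothetical and not part of the patch.

```c
#include <assert.h>

#define SKETCH_PAGE_SIZE 4096UL
#define SKETCH_PAGE_MASK (~(SKETCH_PAGE_SIZE - 1))

/* Page-aligned base of an offset, as passed to the resource cursor. */
static unsigned long sketch_page_base(unsigned long offset)
{
	return offset & SKETCH_PAGE_MASK;
}

/* Offset within the page, used as the start of each copy. */
static unsigned long sketch_page_offset(unsigned long offset)
{
	return offset & ~SKETCH_PAGE_MASK;
}

/* Bytes to copy in this step: the rest of the page or what is left. */
static int sketch_chunk_len(unsigned long offset, int bytes_left)
{
	int room = (int)(SKETCH_PAGE_SIZE - sketch_page_offset(offset));

	assert(bytes_left > 0);
	return room < bytes_left ? room : bytes_left;
}
```

For an offset of 4100 the base is 4096 and the in-page offset is 4, so the
first chunk is at most 4092 bytes; every later chunk starts page-aligned.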



* [PATCH v6 4/8] drm/xe: Take PM ref in delayed snapshot capture worker
  2024-10-31 18:10 [PATCH v6 0/8] Fix non-contiguous VRAM BO access in Xe Matthew Brost
                   ` (2 preceding siblings ...)
  2024-10-31 18:10 ` [PATCH v6 3/8] drm/xe: Add xe_ttm_access_memory Matthew Brost
@ 2024-10-31 18:10 ` Matthew Brost
  2024-10-31 18:10 ` [PATCH v6 5/8] drm/xe/display: Update intel_bo_read_from_page to use ttm_bo_access Matthew Brost
                   ` (11 subsequent siblings)
  15 siblings, 0 replies; 56+ messages in thread
From: Matthew Brost @ 2024-10-31 18:10 UTC (permalink / raw)
  To: intel-xe; +Cc: matthew.auld

The delayed snapshot capture worker can access the GPU or VRAM, both of
which require a PM reference. Take a reference in this worker.

Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Fixes: 4f04d07c0a94 ("drm/xe: Faster devcoredump")
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
---
 drivers/gpu/drm/xe/xe_devcoredump.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_devcoredump.c b/drivers/gpu/drm/xe/xe_devcoredump.c
index d2679c5d976b..0b0cd6aa1d9f 100644
--- a/drivers/gpu/drm/xe/xe_devcoredump.c
+++ b/drivers/gpu/drm/xe/xe_devcoredump.c
@@ -23,6 +23,7 @@
 #include "xe_guc_submit.h"
 #include "xe_hw_engine.h"
 #include "xe_module.h"
+#include "xe_pm.h"
 #include "xe_sched_job.h"
 #include "xe_vm.h"
 
@@ -158,8 +159,11 @@ static void xe_devcoredump_deferred_snap_work(struct work_struct *work)
 {
 	struct xe_devcoredump_snapshot *ss = container_of(work, typeof(*ss), work);
 	struct xe_devcoredump *coredump = container_of(ss, typeof(*coredump), snapshot);
+	struct xe_device *xe = coredump_to_xe(coredump);
 	unsigned int fw_ref;
 
+	xe_pm_runtime_get(xe);
+
 	/* keep going if fw fails as we still want to save the memory and SW data */
 	fw_ref = xe_force_wake_get(gt_to_fw(ss->gt), XE_FORCEWAKE_ALL);
 	if (!xe_force_wake_ref_has_domain(fw_ref, XE_FORCEWAKE_ALL))
@@ -168,6 +172,8 @@ static void xe_devcoredump_deferred_snap_work(struct work_struct *work)
 	xe_guc_exec_queue_snapshot_capture_delayed(ss->ge);
 	xe_force_wake_put(gt_to_fw(ss->gt), fw_ref);
 
+	xe_pm_runtime_put(xe);
+
 	/* Calculate devcoredump size */
 	ss->read.size = __xe_devcoredump_read(NULL, INT_MAX, coredump);
 
-- 
2.34.1



* [PATCH v6 5/8] drm/xe/display: Update intel_bo_read_from_page to use ttm_bo_access
  2024-10-31 18:10 [PATCH v6 0/8] Fix non-contiguous VRAM BO access in Xe Matthew Brost
                   ` (3 preceding siblings ...)
  2024-10-31 18:10 ` [PATCH v6 4/8] drm/xe: Take PM ref in delayed snapshot capture worker Matthew Brost
@ 2024-10-31 18:10 ` Matthew Brost
  2024-10-31 18:10 ` [PATCH v6 6/8] drm/xe: Use ttm_bo_access in xe_vm_snapshot_capture_delayed Matthew Brost
                   ` (10 subsequent siblings)
  15 siblings, 0 replies; 56+ messages in thread
From: Matthew Brost @ 2024-10-31 18:10 UTC (permalink / raw)
  To: intel-xe; +Cc: matthew.auld

Don't open-code the vmap of a BO; use the ttm_bo_access helper, which is
safe for non-contiguous and non-visible BOs.

Suggested-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
---
 drivers/gpu/drm/xe/display/intel_bo.c | 25 +------------------------
 1 file changed, 1 insertion(+), 24 deletions(-)

diff --git a/drivers/gpu/drm/xe/display/intel_bo.c b/drivers/gpu/drm/xe/display/intel_bo.c
index 9f54fad0f1c0..43141964f6f2 100644
--- a/drivers/gpu/drm/xe/display/intel_bo.c
+++ b/drivers/gpu/drm/xe/display/intel_bo.c
@@ -40,31 +40,8 @@ int intel_bo_fb_mmap(struct drm_gem_object *obj, struct vm_area_struct *vma)
 int intel_bo_read_from_page(struct drm_gem_object *obj, u64 offset, void *dst, int size)
 {
 	struct xe_bo *bo = gem_to_xe_bo(obj);
-	struct ttm_bo_kmap_obj map;
-	void *src;
-	bool is_iomem;
-	int ret;
 
-	ret = xe_bo_lock(bo, true);
-	if (ret)
-		return ret;
-
-	ret = ttm_bo_kmap(&bo->ttm, offset >> PAGE_SHIFT, 1, &map);
-	if (ret)
-		goto out_unlock;
-
-	offset &= ~PAGE_MASK;
-	src = ttm_kmap_obj_virtual(&map, &is_iomem);
-	src += offset;
-	if (is_iomem)
-		memcpy_fromio(dst, (void __iomem *)src, size);
-	else
-		memcpy(dst, src, size);
-
-	ttm_bo_kunmap(&map);
-out_unlock:
-	xe_bo_unlock(bo);
-	return ret;
+	return ttm_bo_access(&bo->ttm, offset, dst, size, 0);
 }
 
 struct intel_frontbuffer *intel_bo_get_frontbuffer(struct drm_gem_object *obj)
-- 
2.34.1



* [PATCH v6 6/8] drm/xe: Use ttm_bo_access in xe_vm_snapshot_capture_delayed
  2024-10-31 18:10 [PATCH v6 0/8] Fix non-contiguous VRAM BO access in Xe Matthew Brost
                   ` (4 preceding siblings ...)
  2024-10-31 18:10 ` [PATCH v6 5/8] drm/xe/display: Update intel_bo_read_from_page to use ttm_bo_access Matthew Brost
@ 2024-10-31 18:10 ` Matthew Brost
  2024-10-31 18:10 ` [PATCH v6 7/8] drm/xe: Set XE_BO_FLAG_PINNED in migrate selftest BOs Matthew Brost
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 56+ messages in thread
From: Matthew Brost @ 2024-10-31 18:10 UTC (permalink / raw)
  To: intel-xe; +Cc: matthew.auld

Mapping a non-contiguous BO in VRAM doesn't work; use ttm_bo_access
instead.

v2:
 - Fix error handling

Fixes: 0eb2a18a8fad ("drm/xe: Implement VM snapshot support for BO's and userptr")
Suggested-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
---
 drivers/gpu/drm/xe/xe_vm.c | 17 ++++++-----------
 1 file changed, 6 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index c99380271de6..c8782da3a5c3 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -3303,7 +3303,6 @@ void xe_vm_snapshot_capture_delayed(struct xe_vm_snapshot *snap)
 
 	for (int i = 0; i < snap->num_snaps; i++) {
 		struct xe_bo *bo = snap->snap[i].bo;
-		struct iosys_map src;
 		int err;
 
 		if (IS_ERR(snap->snap[i].data))
@@ -3316,16 +3315,12 @@ void xe_vm_snapshot_capture_delayed(struct xe_vm_snapshot *snap)
 		}
 
 		if (bo) {
-			xe_bo_lock(bo, false);
-			err = ttm_bo_vmap(&bo->ttm, &src);
-			if (!err) {
-				xe_map_memcpy_from(xe_bo_device(bo),
-						   snap->snap[i].data,
-						   &src, snap->snap[i].bo_ofs,
-						   snap->snap[i].len);
-				ttm_bo_vunmap(&bo->ttm, &src);
-			}
-			xe_bo_unlock(bo);
+			err = ttm_bo_access(&bo->ttm, snap->snap[i].bo_ofs,
+					    snap->snap[i].data, snap->snap[i].len, 0);
+			if (!(err < 0) && err != snap->snap[i].len)
+				err = -EIO;
+			else if (!(err < 0))
+				err = 0;
 		} else {
 			void __user *userptr = (void __user *)(size_t)snap->snap[i].bo_ofs;
 
-- 
2.34.1
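
The error handling added here converts the "length on success, negative error
code on failure" convention of ttm_bo_access into a plain 0 / negative-errno
result. A minimal sketch of that conversion, with a hypothetical helper name
that does not exist in the driver:

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical helper: collapse "bytes accessed or negative errno"
 * into 0 on full success and a negative errno otherwise.
 */
static int sketch_normalize_access_result(int ret, int expected_len)
{
	assert(expected_len > 0);

	if (ret < 0)
		return ret;	/* propagate the original error */
	if (ret != expected_len)
		return -EIO;	/* a short access counts as an I/O error */
	return 0;
}
```

A full-length access maps to 0, a short one to -EIO, and an existing error
such as -ENOMEM passes through unchanged.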



* [PATCH v6 7/8] drm/xe: Set XE_BO_FLAG_PINNED in migrate selftest BOs
  2024-10-31 18:10 [PATCH v6 0/8] Fix non-contiguous VRAM BO access in Xe Matthew Brost
                   ` (5 preceding siblings ...)
  2024-10-31 18:10 ` [PATCH v6 6/8] drm/xe: Use ttm_bo_access in xe_vm_snapshot_capture_delayed Matthew Brost
@ 2024-10-31 18:10 ` Matthew Brost
  2024-10-31 18:10 ` [PATCH v6 8/8] drm/xe: Only allow contiguous BOs to use xe_bo_vmap Matthew Brost
                   ` (8 subsequent siblings)
  15 siblings, 0 replies; 56+ messages in thread
From: Matthew Brost @ 2024-10-31 18:10 UTC (permalink / raw)
  To: intel-xe; +Cc: matthew.auld

We only allow contiguous BOs to be vmapped. Set XE_BO_FLAG_PINNED on
BOs in the migrate selftest, as this forces contiguous BOs and the selftest
uses vmaps.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
---
 drivers/gpu/drm/xe/tests/xe_migrate.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/xe/tests/xe_migrate.c b/drivers/gpu/drm/xe/tests/xe_migrate.c
index 1a192a2a941b..4cef3b20bd17 100644
--- a/drivers/gpu/drm/xe/tests/xe_migrate.c
+++ b/drivers/gpu/drm/xe/tests/xe_migrate.c
@@ -83,7 +83,8 @@ static void test_copy(struct xe_migrate *m, struct xe_bo *bo,
 						   bo->size,
 						   ttm_bo_type_kernel,
 						   region |
-						   XE_BO_FLAG_NEEDS_CPU_ACCESS);
+						   XE_BO_FLAG_NEEDS_CPU_ACCESS |
+						   XE_BO_FLAG_PINNED);
 	if (IS_ERR(remote)) {
 		KUNIT_FAIL(test, "Failed to allocate remote bo for %s: %pe\n",
 			   str, remote);
@@ -642,7 +643,9 @@ static void validate_ccs_test_run_tile(struct xe_device *xe, struct xe_tile *til
 
 	sys_bo = xe_bo_create_user(xe, NULL, NULL, SZ_4M,
 				   DRM_XE_GEM_CPU_CACHING_WC,
-				   XE_BO_FLAG_SYSTEM | XE_BO_FLAG_NEEDS_CPU_ACCESS);
+				   XE_BO_FLAG_SYSTEM |
+				   XE_BO_FLAG_NEEDS_CPU_ACCESS |
+				   XE_BO_FLAG_PINNED);
 
 	if (IS_ERR(sys_bo)) {
 		KUNIT_FAIL(test, "xe_bo_create() failed with err=%ld\n",
@@ -666,7 +669,8 @@ static void validate_ccs_test_run_tile(struct xe_device *xe, struct xe_tile *til
 
 	ccs_bo = xe_bo_create_user(xe, NULL, NULL, SZ_4M,
 				   DRM_XE_GEM_CPU_CACHING_WC,
-				   bo_flags | XE_BO_FLAG_NEEDS_CPU_ACCESS);
+				   bo_flags | XE_BO_FLAG_NEEDS_CPU_ACCESS |
+				   XE_BO_FLAG_PINNED);
 
 	if (IS_ERR(ccs_bo)) {
 		KUNIT_FAIL(test, "xe_bo_create() failed with err=%ld\n",
@@ -690,7 +694,8 @@ static void validate_ccs_test_run_tile(struct xe_device *xe, struct xe_tile *til
 
 	vram_bo = xe_bo_create_user(xe, NULL, NULL, SZ_4M,
 				    DRM_XE_GEM_CPU_CACHING_WC,
-				    bo_flags | XE_BO_FLAG_NEEDS_CPU_ACCESS);
+				    bo_flags | XE_BO_FLAG_NEEDS_CPU_ACCESS |
+				    XE_BO_FLAG_PINNED);
 	if (IS_ERR(vram_bo)) {
 		KUNIT_FAIL(test, "xe_bo_create() failed with err=%ld\n",
 			   PTR_ERR(vram_bo));
-- 
2.34.1



* [PATCH v6 8/8] drm/xe: Only allow contiguous BOs to use xe_bo_vmap
  2024-10-31 18:10 [PATCH v6 0/8] Fix non-contiguous VRAM BO access in Xe Matthew Brost
                   ` (6 preceding siblings ...)
  2024-10-31 18:10 ` [PATCH v6 7/8] drm/xe: Set XE_BO_FLAG_PINNED in migrate selftest BOs Matthew Brost
@ 2024-10-31 18:10 ` Matthew Brost
  2024-10-31 18:15 ` ✓ CI.Patch_applied: success for Fix non-contiguous VRAM BO access in Xe (rev6) Patchwork
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 56+ messages in thread
From: Matthew Brost @ 2024-10-31 18:10 UTC (permalink / raw)
  To: intel-xe; +Cc: matthew.auld

xe_bo_vmap only works on contiguous BOs; disallow xe_bo_vmap on a BO
unless we are certain the BO is contiguous.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
---
 drivers/gpu/drm/xe/xe_bo.c | 23 ++++++++++++++---------
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
index 0cb014aba699..e1cbe1d5acef 100644
--- a/drivers/gpu/drm/xe/xe_bo.c
+++ b/drivers/gpu/drm/xe/xe_bo.c
@@ -162,6 +162,15 @@ static void try_add_system(struct xe_device *xe, struct xe_bo *bo,
 	}
 }
 
+static bool force_contiguous(u32 bo_flags)
+{
+	/*
+	 * For eviction / restore on suspend / resume objects pinned in VRAM
+	 * must be contiguous, also only contiguous BOs support xe_bo_vmap.
+	 */
+	return bo_flags & (XE_BO_FLAG_PINNED | XE_BO_FLAG_GGTT);
+}
+
 static void add_vram(struct xe_device *xe, struct xe_bo *bo,
 		     struct ttm_place *places, u32 bo_flags, u32 mem_type, u32 *c)
 {
@@ -175,12 +184,7 @@ static void add_vram(struct xe_device *xe, struct xe_bo *bo,
 	xe_assert(xe, vram && vram->usable_size);
 	io_size = vram->io_size;
 
-	/*
-	 * For eviction / restore on suspend / resume objects
-	 * pinned in VRAM must be contiguous
-	 */
-	if (bo_flags & (XE_BO_FLAG_PINNED |
-			XE_BO_FLAG_GGTT))
+	if (force_contiguous(bo_flags))
 		place.flags |= TTM_PL_FLAG_CONTIGUOUS;
 
 	if (io_size < vram->usable_size) {
@@ -212,8 +216,7 @@ static void try_add_stolen(struct xe_device *xe, struct xe_bo *bo,
 
 		bo->placements[*c] = (struct ttm_place) {
 			.mem_type = XE_PL_STOLEN,
-			.flags = bo_flags & (XE_BO_FLAG_PINNED |
-					     XE_BO_FLAG_GGTT) ?
+			.flags = force_contiguous(bo_flags) ?
 				TTM_PL_FLAG_CONTIGUOUS : 0,
 		};
 		*c += 1;
@@ -2026,13 +2029,15 @@ dma_addr_t xe_bo_addr(struct xe_bo *bo, u64 offset, size_t page_size)
 
 int xe_bo_vmap(struct xe_bo *bo)
 {
+	struct xe_device *xe = ttm_to_xe_device(bo->ttm.bdev);
 	void *virtual;
 	bool is_iomem;
 	int ret;
 
 	xe_bo_assert_held(bo);
 
-	if (!(bo->flags & XE_BO_FLAG_NEEDS_CPU_ACCESS))
+	if (drm_WARN_ON(&xe->drm, !(bo->flags & XE_BO_FLAG_NEEDS_CPU_ACCESS) ||
+			!force_contiguous(bo->flags)))
 		return -EINVAL;
 
 	if (!iosys_map_is_null(&bo->vmap))
-- 
2.34.1
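
The flag test behind the new xe_bo_vmap check can be sketched as below. The
bit values are made up for illustration (the real XE_BO_FLAG_* definitions
live in the xe driver headers), and sketch_vmap_allowed is a hypothetical
name for the combined condition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values, not the driver's actual definitions. */
#define SKETCH_BO_FLAG_NEEDS_CPU_ACCESS (1u << 0)
#define SKETCH_BO_FLAG_PINNED           (1u << 1)
#define SKETCH_BO_FLAG_GGTT             (1u << 2)

/* Pinned or GGTT-mapped objects are placed contiguously. */
static bool sketch_force_contiguous(uint32_t flags)
{
	return (flags & (SKETCH_BO_FLAG_PINNED | SKETCH_BO_FLAG_GGTT)) != 0;
}

/* A vmap is allowed only with CPU access and a contiguous placement. */
static bool sketch_vmap_allowed(uint32_t flags)
{
	assert((flags & ~7u) == 0);	/* only the three sketch flags exist */

	return (flags & SKETCH_BO_FLAG_NEEDS_CPU_ACCESS) &&
	       sketch_force_contiguous(flags);
}
```

This mirrors why the previous patch adds XE_BO_FLAG_PINNED to the selftest
BOs: with only the CPU-access flag set, the check above rejects the vmap.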



* ✓ CI.Patch_applied: success for Fix non-contiguous VRAM BO access in Xe (rev6)
  2024-10-31 18:10 [PATCH v6 0/8] Fix non-contiguous VRAM BO access in Xe Matthew Brost
                   ` (7 preceding siblings ...)
  2024-10-31 18:10 ` [PATCH v6 8/8] drm/xe: Only allow contiguous BOs to use xe_bo_vmap Matthew Brost
@ 2024-10-31 18:15 ` Patchwork
  2024-10-31 18:15 ` ✗ CI.checkpatch: warning " Patchwork
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 56+ messages in thread
From: Patchwork @ 2024-10-31 18:15 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-xe

== Series Details ==

Series: Fix non-contiguous VRAM BO access in Xe (rev6)
URL   : https://patchwork.freedesktop.org/series/140200/
State : success

== Summary ==

=== Applying kernel patches on branch 'drm-tip' with base: ===
Base commit: b8c3c871a2df drm-tip: 2024y-10m-31d-16h-11m-56s UTC integration manifest
=== git am output follows ===
Applying: drm/xe: Add xe_bo_vm_access
Applying: drm/ttm: Add ttm_bo_access
Applying: drm/xe: Add xe_ttm_access_memory
Applying: drm/xe: Take PM ref in delayed snapshot capture worker
Applying: drm/xe/display: Update intel_bo_read_from_page to use ttm_bo_access
Applying: drm/xe: Use ttm_bo_access in xe_vm_snapshot_capture_delayed
Applying: drm/xe: Set XE_BO_FLAG_PINNED in migrate selftest BOs
Applying: drm/xe: Only allow contiguous BOs to use xe_bo_vmap




* ✗ CI.checkpatch: warning for Fix non-contiguous VRAM BO access in Xe (rev6)
  2024-10-31 18:10 [PATCH v6 0/8] Fix non-contiguous VRAM BO access in Xe Matthew Brost
                   ` (8 preceding siblings ...)
  2024-10-31 18:15 ` ✓ CI.Patch_applied: success for Fix non-contiguous VRAM BO access in Xe (rev6) Patchwork
@ 2024-10-31 18:15 ` Patchwork
  2024-10-31 18:17 ` ✓ CI.KUnit: success " Patchwork
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 56+ messages in thread
From: Patchwork @ 2024-10-31 18:15 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-xe

== Series Details ==

Series: Fix non-contiguous VRAM BO access in Xe (rev6)
URL   : https://patchwork.freedesktop.org/series/140200/
State : warning

== Summary ==

+ KERNEL=/kernel
+ git clone https://gitlab.freedesktop.org/drm/maintainer-tools mt
Cloning into 'mt'...
warning: redirecting to https://gitlab.freedesktop.org/drm/maintainer-tools.git/
+ git -C mt rev-list -n1 origin/master
30ab6715fc09baee6cc14cb3c89ad8858688d474
+ cd /kernel
+ git config --global --add safe.directory /kernel
+ git log -n1
commit 0bdafbaf9eb11ca51dbfcb68ee22571648f0e9e7
Author: Matthew Brost <matthew.brost@intel.com>
Date:   Thu Oct 31 11:10:48 2024 -0700

    drm/xe: Only allow contiguous BOs to use xe_bo_vmap
    
    xe_bo_vmap only works on contiguous BOs, disallow xe_bo_vmap on BO
    unless we are certain the BO is contiguous.
    
    Signed-off-by: Matthew Brost <matthew.brost@intel.com>
    Reviewed-by: Matthew Auld <matthew.auld@intel.com>
+ /mt/dim checkpatch b8c3c871a2df70e3201eb70505981d39e449384d drm-intel
e7cede92443e drm/xe: Add xe_bo_vm_access
6ca1bf9074b5 drm/ttm: Add ttm_bo_access
-:20: WARNING:BAD_REPORTED_BY_LINK: Reported-by: should be immediately followed by Closes: with a URL to the report
#20: 
Reported-by: Christoph Manszewski <christoph.manszewski@intel.com>
Suggested-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>

total: 0 errors, 1 warnings, 0 checks, 175 lines checked
5afb869b3ee6 drm/xe: Add xe_ttm_access_memory
-:17: WARNING:BAD_REPORTED_BY_LINK: Reported-by: should be immediately followed by Closes: with a URL to the report
#17: 
Reported-by: Christoph Manszewski <christoph.manszewski@intel.com>
Suggested-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>

total: 0 errors, 1 warnings, 0 checks, 85 lines checked
12c585fdf94f drm/xe: Take PM ref in delayed snapshot capture worker
38437d2caa56 drm/xe/display: Update intel_bo_read_from_page to use ttm_bo_access
1c79076d86fd drm/xe: Use ttm_bo_access in xe_vm_snapshot_capture_delayed
4eafa8106bc1 drm/xe: Set XE_BO_FLAG_PINNED in migrate selftest BOs
0bdafbaf9eb1 drm/xe: Only allow contiguous BOs to use xe_bo_vmap




* ✓ CI.KUnit: success for Fix non-contiguous VRAM BO access in Xe (rev6)
  2024-10-31 18:10 [PATCH v6 0/8] Fix non-contiguous VRAM BO access in Xe Matthew Brost
                   ` (9 preceding siblings ...)
  2024-10-31 18:15 ` ✗ CI.checkpatch: warning " Patchwork
@ 2024-10-31 18:17 ` Patchwork
  2024-10-31 18:28 ` ✓ CI.Build: " Patchwork
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 56+ messages in thread
From: Patchwork @ 2024-10-31 18:17 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-xe

== Series Details ==

Series: Fix non-contiguous VRAM BO access in Xe (rev6)
URL   : https://patchwork.freedesktop.org/series/140200/
State : success

== Summary ==

+ trap cleanup EXIT
+ /kernel/tools/testing/kunit/kunit.py run --kunitconfig /kernel/drivers/gpu/drm/xe/.kunitconfig
[18:15:57] Configuring KUnit Kernel ...
Generating .config ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
[18:16:02] Building KUnit Kernel ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
Building with:
$ make all compile_commands.json ARCH=um O=.kunit --jobs=48
../lib/iomap.c:156:5: warning: no previous prototype for ‘ioread64_lo_hi’ [-Wmissing-prototypes]
  156 | u64 ioread64_lo_hi(const void __iomem *addr)
      |     ^~~~~~~~~~~~~~
../lib/iomap.c:163:5: warning: no previous prototype for ‘ioread64_hi_lo’ [-Wmissing-prototypes]
  163 | u64 ioread64_hi_lo(const void __iomem *addr)
      |     ^~~~~~~~~~~~~~
../lib/iomap.c:170:5: warning: no previous prototype for ‘ioread64be_lo_hi’ [-Wmissing-prototypes]
  170 | u64 ioread64be_lo_hi(const void __iomem *addr)
      |     ^~~~~~~~~~~~~~~~
../lib/iomap.c:178:5: warning: no previous prototype for ‘ioread64be_hi_lo’ [-Wmissing-prototypes]
  178 | u64 ioread64be_hi_lo(const void __iomem *addr)
      |     ^~~~~~~~~~~~~~~~
../lib/iomap.c:264:6: warning: no previous prototype for ‘iowrite64_lo_hi’ [-Wmissing-prototypes]
  264 | void iowrite64_lo_hi(u64 val, void __iomem *addr)
      |      ^~~~~~~~~~~~~~~
../lib/iomap.c:272:6: warning: no previous prototype for ‘iowrite64_hi_lo’ [-Wmissing-prototypes]
  272 | void iowrite64_hi_lo(u64 val, void __iomem *addr)
      |      ^~~~~~~~~~~~~~~
../lib/iomap.c:280:6: warning: no previous prototype for ‘iowrite64be_lo_hi’ [-Wmissing-prototypes]
  280 | void iowrite64be_lo_hi(u64 val, void __iomem *addr)
      |      ^~~~~~~~~~~~~~~~~
../lib/iomap.c:288:6: warning: no previous prototype for ‘iowrite64be_hi_lo’ [-Wmissing-prototypes]
  288 | void iowrite64be_hi_lo(u64 val, void __iomem *addr)
      |      ^~~~~~~~~~~~~~~~~

[18:16:30] Starting KUnit Kernel (1/1)...
[18:16:30] ============================================================
Running tests with:
$ .kunit/linux kunit.enable=1 mem=1G console=tty kunit_shutdown=halt
[18:16:30] =================== guc_dbm (7 subtests) ===================
[18:16:30] [PASSED] test_empty
[18:16:30] [PASSED] test_default
[18:16:30] ======================== test_size  ========================
[18:16:30] [PASSED] 4
[18:16:30] [PASSED] 8
[18:16:30] [PASSED] 32
[18:16:30] [PASSED] 256
[18:16:30] ==================== [PASSED] test_size ====================
[18:16:30] ======================= test_reuse  ========================
[18:16:30] [PASSED] 4
[18:16:30] [PASSED] 8
[18:16:30] [PASSED] 32
[18:16:30] [PASSED] 256
[18:16:30] =================== [PASSED] test_reuse ====================
[18:16:30] =================== test_range_overlap  ====================
[18:16:30] [PASSED] 4
[18:16:30] [PASSED] 8
[18:16:30] [PASSED] 32
[18:16:30] [PASSED] 256
[18:16:30] =============== [PASSED] test_range_overlap ================
[18:16:30] =================== test_range_compact  ====================
[18:16:30] [PASSED] 4
[18:16:30] [PASSED] 8
[18:16:30] [PASSED] 32
[18:16:30] [PASSED] 256
[18:16:30] =============== [PASSED] test_range_compact ================
[18:16:30] ==================== test_range_spare  =====================
[18:16:30] [PASSED] 4
[18:16:30] [PASSED] 8
[18:16:30] [PASSED] 32
[18:16:30] [PASSED] 256
[18:16:30] ================ [PASSED] test_range_spare =================
[18:16:30] ===================== [PASSED] guc_dbm =====================
[18:16:30] =================== guc_idm (6 subtests) ===================
[18:16:30] [PASSED] bad_init
[18:16:30] [PASSED] no_init
[18:16:30] [PASSED] init_fini
[18:16:30] [PASSED] check_used
[18:16:30] [PASSED] check_quota
[18:16:30] [PASSED] check_all
[18:16:30] ===================== [PASSED] guc_idm =====================
[18:16:30] ================== no_relay (3 subtests) ===================
[18:16:30] [PASSED] xe_drops_guc2pf_if_not_ready
[18:16:30] [PASSED] xe_drops_guc2vf_if_not_ready
[18:16:30] [PASSED] xe_rejects_send_if_not_ready
[18:16:30] ==================== [PASSED] no_relay =====================
[18:16:30] ================== pf_relay (14 subtests) ==================
[18:16:30] [PASSED] pf_rejects_guc2pf_too_short
[18:16:30] [PASSED] pf_rejects_guc2pf_too_long
[18:16:30] [PASSED] pf_rejects_guc2pf_no_payload
[18:16:30] [PASSED] pf_fails_no_payload
[18:16:30] [PASSED] pf_fails_bad_origin
[18:16:30] [PASSED] pf_fails_bad_type
[18:16:30] [PASSED] pf_txn_reports_error
[18:16:30] [PASSED] pf_txn_sends_pf2guc
[18:16:30] [PASSED] pf_sends_pf2guc
[18:16:30] [SKIPPED] pf_loopback_nop
[18:16:30] [SKIPPED] pf_loopback_echo
[18:16:30] [SKIPPED] pf_loopback_fail
[18:16:30] [SKIPPED] pf_loopback_busy
[18:16:30] [SKIPPED] pf_loopback_retry
[18:16:30] ==================== [PASSED] pf_relay =====================
[18:16:30] ================== vf_relay (3 subtests) ===================
[18:16:30] [PASSED] vf_rejects_guc2vf_too_short
[18:16:30] [PASSED] vf_rejects_guc2vf_too_long
[18:16:30] [PASSED] vf_rejects_guc2vf_no_payload
[18:16:30] ==================== [PASSED] vf_relay =====================
[18:16:30] ================= pf_service (11 subtests) =================
[18:16:30] [PASSED] pf_negotiate_any
[18:16:30] [PASSED] pf_negotiate_base_match
[18:16:30] [PASSED] pf_negotiate_base_newer
[18:16:30] [PASSED] pf_negotiate_base_next
[18:16:30] [SKIPPED] pf_negotiate_base_older
[18:16:30] [PASSED] pf_negotiate_base_prev
[18:16:30] [PASSED] pf_negotiate_latest_match
[18:16:30] [PASSED] pf_negotiate_latest_newer
[18:16:30] [PASSED] pf_negotiate_latest_next
[18:16:30] [SKIPPED] pf_negotiate_latest_older
[18:16:30] [SKIPPED] pf_negotiate_latest_prev
[18:16:30] =================== [PASSED] pf_service ====================
[18:16:30] ===================== lmtt (1 subtest) =====================
[18:16:30] ======================== test_ops  =========================
[18:16:30] [PASSED] 2-level
[18:16:30] [PASSED] multi-level
[18:16:30] ==================== [PASSED] test_ops =====================
[18:16:30] ====================== [PASSED] lmtt =======================
[18:16:30] =================== xe_mocs (2 subtests) ===================
[18:16:30] ================ xe_live_mocs_kernel_kunit  ================
[18:16:30] =========== [SKIPPED] xe_live_mocs_kernel_kunit ============
[18:16:30] ================ xe_live_mocs_reset_kunit  =================
[18:16:30] ============ [SKIPPED] xe_live_mocs_reset_kunit ============
[18:16:30] ==================== [SKIPPED] xe_mocs =====================
[18:16:30] ================= xe_migrate (2 subtests) ==================
[18:16:30] ================= xe_migrate_sanity_kunit  =================
[18:16:30] ============ [SKIPPED] xe_migrate_sanity_kunit =============
[18:16:30] ================== xe_validate_ccs_kunit  ==================
[18:16:30] ============= [SKIPPED] xe_validate_ccs_kunit ==============
[18:16:30] =================== [SKIPPED] xe_migrate ===================
[18:16:30] ================== xe_dma_buf (1 subtest) ==================
[18:16:30] ==================== xe_dma_buf_kunit  =====================
[18:16:30] ================ [SKIPPED] xe_dma_buf_kunit ================
[18:16:30] =================== [SKIPPED] xe_dma_buf ===================
[18:16:30] ==================== xe_bo (3 subtests) ====================
[18:16:30] ================== xe_ccs_migrate_kunit  ===================
[18:16:30] ============== [SKIPPED] xe_ccs_migrate_kunit ==============
[18:16:30] ==================== xe_bo_evict_kunit  ====================
[18:16:30] =============== [SKIPPED] xe_bo_evict_kunit ================
[18:16:30] =================== xe_bo_shrink_kunit  ====================
[18:16:30] =============== [SKIPPED] xe_bo_shrink_kunit ===============
[18:16:30] ===================== [SKIPPED] xe_bo ======================
[18:16:30] ==================== args (11 subtests) ====================
[18:16:30] [PASSED] count_args_test
[18:16:30] [PASSED] call_args_example
[18:16:30] [PASSED] call_args_test
[18:16:30] [PASSED] drop_first_arg_example
[18:16:30] [PASSED] drop_first_arg_test
[18:16:30] [PASSED] first_arg_example
[18:16:30] [PASSED] first_arg_test
[18:16:30] [PASSED] last_arg_example
[18:16:30] [PASSED] last_arg_test
[18:16:30] [PASSED] pick_arg_example
[18:16:30] [PASSED] sep_comma_example
stty: 'standard input': Inappropriate ioctl for device

[18:16:30] ====================== [PASSED] args =======================
[18:16:30] =================== xe_pci (2 subtests) ====================
[18:16:30] [PASSED] xe_gmdid_graphics_ip
[18:16:30] [PASSED] xe_gmdid_media_ip
[18:16:30] ===================== [PASSED] xe_pci ======================
[18:16:30] =================== xe_rtp (2 subtests) ====================
[18:16:30] =============== xe_rtp_process_to_sr_tests  ================
[18:16:30] [PASSED] coalesce-same-reg
[18:16:30] [PASSED] no-match-no-add
[18:16:30] [PASSED] match-or
[18:16:30] [PASSED] match-or-xfail
[18:16:30] [PASSED] no-match-no-add-multiple-rules
[18:16:30] [PASSED] two-regs-two-entries
[18:16:30] [PASSED] clr-one-set-other
[18:16:30] [PASSED] set-field
[18:16:30] [PASSED] conflict-duplicate
[18:16:30] [PASSED] conflict-not-disjoint
[18:16:30] [PASSED] conflict-reg-type
[18:16:30] =========== [PASSED] xe_rtp_process_to_sr_tests ============
[18:16:30] ================== xe_rtp_process_tests  ===================
[18:16:30] [PASSED] active1
[18:16:30] [PASSED] active2
[18:16:30] [PASSED] active-inactive
[18:16:30] [PASSED] inactive-active
[18:16:30] [PASSED] inactive-1st_or_active-inactive
[18:16:30] [PASSED] inactive-2nd_or_active-inactive
[18:16:30] [PASSED] inactive-last_or_active-inactive
[18:16:30] [PASSED] inactive-no_or_active-inactive
[18:16:30] ============== [PASSED] xe_rtp_process_tests ===============
[18:16:30] ===================== [PASSED] xe_rtp ======================
[18:16:30] ==================== xe_wa (1 subtest) =====================
[18:16:30] ======================== xe_wa_gt  =========================
[18:16:30] [PASSED] TIGERLAKE (B0)
[18:16:30] [PASSED] DG1 (A0)
[18:16:30] [PASSED] DG1 (B0)
[18:16:30] [PASSED] ALDERLAKE_S (A0)
[18:16:30] [PASSED] ALDERLAKE_S (B0)
[18:16:30] [PASSED] ALDERLAKE_S (C0)
[18:16:30] [PASSED] ALDERLAKE_S (D0)
[18:16:30] [PASSED] ALDERLAKE_P (A0)
[18:16:30] [PASSED] ALDERLAKE_P (B0)
[18:16:30] [PASSED] ALDERLAKE_P (C0)
[18:16:30] [PASSED] ALDERLAKE_S_RPLS (D0)
[18:16:30] [PASSED] ALDERLAKE_P_RPLU (E0)
[18:16:30] [PASSED] DG2_G10 (C0)
[18:16:30] [PASSED] DG2_G11 (B1)
[18:16:30] [PASSED] DG2_G12 (A1)
[18:16:30] [PASSED] METEORLAKE (g:A0, m:A0)
[18:16:30] [PASSED] METEORLAKE (g:A0, m:A0)
[18:16:30] [PASSED] METEORLAKE (g:A0, m:A0)
[18:16:30] [PASSED] LUNARLAKE (g:A0, m:A0)
[18:16:30] [PASSED] LUNARLAKE (g:B0, m:A0)
[18:16:30] [PASSED] BATTLEMAGE (g:A0, m:A1)
[18:16:30] ==================== [PASSED] xe_wa_gt =====================
[18:16:30] ====================== [PASSED] xe_wa ======================
[18:16:30] ============================================================
[18:16:30] Testing complete. Ran 122 tests: passed: 106, skipped: 16
[18:16:30] Elapsed time: 32.696s total, 4.420s configuring, 28.010s building, 0.245s running

+ /kernel/tools/testing/kunit/kunit.py run --kunitconfig /kernel/drivers/gpu/drm/tests/.kunitconfig
[18:16:30] Configuring KUnit Kernel ...
Regenerating .config ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
[18:16:32] Building KUnit Kernel ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
Building with:
$ make all compile_commands.json ARCH=um O=.kunit --jobs=48
../lib/iomap.c:156:5: warning: no previous prototype for ‘ioread64_lo_hi’ [-Wmissing-prototypes]
  156 | u64 ioread64_lo_hi(const void __iomem *addr)
      |     ^~~~~~~~~~~~~~
../lib/iomap.c:163:5: warning: no previous prototype for ‘ioread64_hi_lo’ [-Wmissing-prototypes]
  163 | u64 ioread64_hi_lo(const void __iomem *addr)
      |     ^~~~~~~~~~~~~~
../lib/iomap.c:170:5: warning: no previous prototype for ‘ioread64be_lo_hi’ [-Wmissing-prototypes]
  170 | u64 ioread64be_lo_hi(const void __iomem *addr)
      |     ^~~~~~~~~~~~~~~~
../lib/iomap.c:178:5: warning: no previous prototype for ‘ioread64be_hi_lo’ [-Wmissing-prototypes]
  178 | u64 ioread64be_hi_lo(const void __iomem *addr)
      |     ^~~~~~~~~~~~~~~~
../lib/iomap.c:264:6: warning: no previous prototype for ‘iowrite64_lo_hi’ [-Wmissing-prototypes]
  264 | void iowrite64_lo_hi(u64 val, void __iomem *addr)
      |      ^~~~~~~~~~~~~~~
../lib/iomap.c:272:6: warning: no previous prototype for ‘iowrite64_hi_lo’ [-Wmissing-prototypes]
  272 | void iowrite64_hi_lo(u64 val, void __iomem *addr)
      |      ^~~~~~~~~~~~~~~
../lib/iomap.c:280:6: warning: no previous prototype for ‘iowrite64be_lo_hi’ [-Wmissing-prototypes]
  280 | void iowrite64be_lo_hi(u64 val, void __iomem *addr)
      |      ^~~~~~~~~~~~~~~~~
../lib/iomap.c:288:6: warning: no previous prototype for ‘iowrite64be_hi_lo’ [-Wmissing-prototypes]
  288 | void iowrite64be_hi_lo(u64 val, void __iomem *addr)
      |      ^~~~~~~~~~~~~~~~~

[18:16:54] Starting KUnit Kernel (1/1)...
[18:16:54] ============================================================
Running tests with:
$ .kunit/linux kunit.enable=1 mem=1G console=tty kunit_shutdown=halt
[18:16:55] ================== drm_buddy (7 subtests) ==================
[18:16:55] [PASSED] drm_test_buddy_alloc_limit
[18:16:55] [PASSED] drm_test_buddy_alloc_optimistic
[18:16:55] [PASSED] drm_test_buddy_alloc_pessimistic
[18:16:55] [PASSED] drm_test_buddy_alloc_pathological
[18:16:55] [PASSED] drm_test_buddy_alloc_contiguous
[18:16:55] [PASSED] drm_test_buddy_alloc_clear
[18:16:55] [PASSED] drm_test_buddy_alloc_range_bias
[18:16:55] ==================== [PASSED] drm_buddy ====================
[18:16:55] ============= drm_cmdline_parser (40 subtests) =============
[18:16:55] [PASSED] drm_test_cmdline_force_d_only
[18:16:55] [PASSED] drm_test_cmdline_force_D_only_dvi
[18:16:55] [PASSED] drm_test_cmdline_force_D_only_hdmi
[18:16:55] [PASSED] drm_test_cmdline_force_D_only_not_digital
[18:16:55] [PASSED] drm_test_cmdline_force_e_only
[18:16:55] [PASSED] drm_test_cmdline_res
[18:16:55] [PASSED] drm_test_cmdline_res_vesa
[18:16:55] [PASSED] drm_test_cmdline_res_vesa_rblank
[18:16:55] [PASSED] drm_test_cmdline_res_rblank
[18:16:55] [PASSED] drm_test_cmdline_res_bpp
[18:16:55] [PASSED] drm_test_cmdline_res_refresh
[18:16:55] [PASSED] drm_test_cmdline_res_bpp_refresh
[18:16:55] [PASSED] drm_test_cmdline_res_bpp_refresh_interlaced
[18:16:55] [PASSED] drm_test_cmdline_res_bpp_refresh_margins
[18:16:55] [PASSED] drm_test_cmdline_res_bpp_refresh_force_off
[18:16:55] [PASSED] drm_test_cmdline_res_bpp_refresh_force_on
[18:16:55] [PASSED] drm_test_cmdline_res_bpp_refresh_force_on_analog
[18:16:55] [PASSED] drm_test_cmdline_res_bpp_refresh_force_on_digital
[18:16:55] [PASSED] drm_test_cmdline_res_bpp_refresh_interlaced_margins_force_on
[18:16:55] [PASSED] drm_test_cmdline_res_margins_force_on
[18:16:55] [PASSED] drm_test_cmdline_res_vesa_margins
[18:16:55] [PASSED] drm_test_cmdline_name
[18:16:55] [PASSED] drm_test_cmdline_name_bpp
[18:16:55] [PASSED] drm_test_cmdline_name_option
[18:16:55] [PASSED] drm_test_cmdline_name_bpp_option
[18:16:55] [PASSED] drm_test_cmdline_rotate_0
[18:16:55] [PASSED] drm_test_cmdline_rotate_90
[18:16:55] [PASSED] drm_test_cmdline_rotate_180
[18:16:55] [PASSED] drm_test_cmdline_rotate_270
[18:16:55] [PASSED] drm_test_cmdline_hmirror
[18:16:55] [PASSED] drm_test_cmdline_vmirror
[18:16:55] [PASSED] drm_test_cmdline_margin_options
[18:16:55] [PASSED] drm_test_cmdline_multiple_options
[18:16:55] [PASSED] drm_test_cmdline_bpp_extra_and_option
[18:16:55] [PASSED] drm_test_cmdline_extra_and_option
[18:16:55] [PASSED] drm_test_cmdline_freestanding_options
[18:16:55] [PASSED] drm_test_cmdline_freestanding_force_e_and_options
[18:16:55] [PASSED] drm_test_cmdline_panel_orientation
[18:16:55] ================ drm_test_cmdline_invalid  =================
[18:16:55] [PASSED] margin_only
[18:16:55] [PASSED] interlace_only
[18:16:55] [PASSED] res_missing_x
[18:16:55] [PASSED] res_missing_y
[18:16:55] [PASSED] res_bad_y
[18:16:55] [PASSED] res_missing_y_bpp
[18:16:55] [PASSED] res_bad_bpp
[18:16:55] [PASSED] res_bad_refresh
[18:16:55] [PASSED] res_bpp_refresh_force_on_off
[18:16:55] [PASSED] res_invalid_mode
[18:16:55] [PASSED] res_bpp_wrong_place_mode
[18:16:55] [PASSED] name_bpp_refresh
[18:16:55] [PASSED] name_refresh
[18:16:55] [PASSED] name_refresh_wrong_mode
[18:16:55] [PASSED] name_refresh_invalid_mode
[18:16:55] [PASSED] rotate_multiple
[18:16:55] [PASSED] rotate_invalid_val
[18:16:55] [PASSED] rotate_truncated
[18:16:55] [PASSED] invalid_option
[18:16:55] [PASSED] invalid_tv_option
[18:16:55] [PASSED] truncated_tv_option
[18:16:55] ============ [PASSED] drm_test_cmdline_invalid =============
[18:16:55] =============== drm_test_cmdline_tv_options  ===============
[18:16:55] [PASSED] NTSC
[18:16:55] [PASSED] NTSC_443
[18:16:55] [PASSED] NTSC_J
[18:16:55] [PASSED] PAL
[18:16:55] [PASSED] PAL_M
[18:16:55] [PASSED] PAL_N
[18:16:55] [PASSED] SECAM
[18:16:55] [PASSED] MONO_525
[18:16:55] [PASSED] MONO_625
[18:16:55] =========== [PASSED] drm_test_cmdline_tv_options ===========
[18:16:55] =============== [PASSED] drm_cmdline_parser ================
[18:16:55] ========== drmm_connector_hdmi_init (19 subtests) ==========
[18:16:55] [PASSED] drm_test_connector_hdmi_init_valid
[18:16:55] [PASSED] drm_test_connector_hdmi_init_bpc_8
[18:16:55] [PASSED] drm_test_connector_hdmi_init_bpc_10
[18:16:55] [PASSED] drm_test_connector_hdmi_init_bpc_12
[18:16:55] [PASSED] drm_test_connector_hdmi_init_bpc_invalid
[18:16:55] [PASSED] drm_test_connector_hdmi_init_bpc_null
[18:16:55] [PASSED] drm_test_connector_hdmi_init_formats_empty
[18:16:55] [PASSED] drm_test_connector_hdmi_init_formats_no_rgb
[18:16:55] [PASSED] drm_test_connector_hdmi_init_null_ddc
[18:16:55] [PASSED] drm_test_connector_hdmi_init_null_product
[18:16:55] [PASSED] drm_test_connector_hdmi_init_null_vendor
[18:16:55] [PASSED] drm_test_connector_hdmi_init_product_length_exact
[18:16:55] [PASSED] drm_test_connector_hdmi_init_product_length_too_long
[18:16:55] [PASSED] drm_test_connector_hdmi_init_product_valid
[18:16:55] [PASSED] drm_test_connector_hdmi_init_vendor_length_exact
[18:16:55] [PASSED] drm_test_connector_hdmi_init_vendor_length_too_long
[18:16:55] [PASSED] drm_test_connector_hdmi_init_vendor_valid
[18:16:55] ========= drm_test_connector_hdmi_init_type_valid  =========
[18:16:55] [PASSED] HDMI-A
[18:16:55] [PASSED] HDMI-B
[18:16:55] ===== [PASSED] drm_test_connector_hdmi_init_type_valid =====
[18:16:55] ======== drm_test_connector_hdmi_init_type_invalid  ========
[18:16:55] [PASSED] Unknown
[18:16:55] [PASSED] VGA
[18:16:55] [PASSED] DVI-I
[18:16:55] [PASSED] DVI-D
[18:16:55] [PASSED] DVI-A
[18:16:55] [PASSED] Composite
[18:16:55] [PASSED] SVIDEO
[18:16:55] [PASSED] LVDS
[18:16:55] [PASSED] Component
[18:16:55] [PASSED] DIN
[18:16:55] [PASSED] DP
[18:16:55] [PASSED] TV
[18:16:55] [PASSED] eDP
[18:16:55] [PASSED] Virtual
[18:16:55] [PASSED] DSI
[18:16:55] [PASSED] DPI
[18:16:55] [PASSED] Writeback
[18:16:55] [PASSED] SPI
[18:16:55] [PASSED] USB
[18:16:55] ==== [PASSED] drm_test_connector_hdmi_init_type_invalid ====
[18:16:55] ============ [PASSED] drmm_connector_hdmi_init =============
[18:16:55] ============= drmm_connector_init (3 subtests) =============
[18:16:55] [PASSED] drm_test_drmm_connector_init
[18:16:55] [PASSED] drm_test_drmm_connector_init_null_ddc
[18:16:55] ========= drm_test_drmm_connector_init_type_valid  =========
[18:16:55] [PASSED] Unknown
[18:16:55] [PASSED] VGA
[18:16:55] [PASSED] DVI-I
[18:16:55] [PASSED] DVI-D
[18:16:55] [PASSED] DVI-A
[18:16:55] [PASSED] Composite
[18:16:55] [PASSED] SVIDEO
[18:16:55] [PASSED] LVDS
[18:16:55] [PASSED] Component
[18:16:55] [PASSED] DIN
[18:16:55] [PASSED] DP
[18:16:55] [PASSED] HDMI-A
[18:16:55] [PASSED] HDMI-B
[18:16:55] [PASSED] TV
[18:16:55] [PASSED] eDP
[18:16:55] [PASSED] Virtual
[18:16:55] [PASSED] DSI
[18:16:55] [PASSED] DPI
[18:16:55] [PASSED] Writeback
[18:16:55] [PASSED] SPI
[18:16:55] [PASSED] USB
[18:16:55] ===== [PASSED] drm_test_drmm_connector_init_type_valid =====
[18:16:55] =============== [PASSED] drmm_connector_init ===============
[18:16:55] = drm_connector_attach_broadcast_rgb_property (2 subtests) =
[18:16:55] [PASSED] drm_test_drm_connector_attach_broadcast_rgb_property
[18:16:55] [PASSED] drm_test_drm_connector_attach_broadcast_rgb_property_hdmi_connector
[18:16:55] === [PASSED] drm_connector_attach_broadcast_rgb_property ===
[18:16:55] ========== drm_get_tv_mode_from_name (2 subtests) ==========
[18:16:55] ========== drm_test_get_tv_mode_from_name_valid  ===========
[18:16:55] [PASSED] NTSC
[18:16:55] [PASSED] NTSC-443
[18:16:55] [PASSED] NTSC-J
[18:16:55] [PASSED] PAL
[18:16:55] [PASSED] PAL-M
[18:16:55] [PASSED] PAL-N
[18:16:55] [PASSED] SECAM
[18:16:55] [PASSED] Mono
[18:16:55] ====== [PASSED] drm_test_get_tv_mode_from_name_valid =======
[18:16:55] [PASSED] drm_test_get_tv_mode_from_name_truncated
[18:16:55] ============ [PASSED] drm_get_tv_mode_from_name ============
[18:16:55] = drm_test_connector_hdmi_compute_mode_clock (12 subtests) =
[18:16:55] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb
[18:16:55] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb_10bpc
[18:16:55] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb_10bpc_vic_1
[18:16:55] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb_12bpc
[18:16:55] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb_12bpc_vic_1
[18:16:55] [PASSED] drm_test_drm_hdmi_compute_mode_clock_rgb_double
[18:16:55] = drm_test_connector_hdmi_compute_mode_clock_yuv420_valid  =
[18:16:55] [PASSED] VIC 96
[18:16:55] [PASSED] VIC 97
[18:16:55] [PASSED] VIC 101
[18:16:55] [PASSED] VIC 102
[18:16:55] [PASSED] VIC 106
[18:16:55] [PASSED] VIC 107
[18:16:55] === [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv420_valid ===
[18:16:55] [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv420_10_bpc
[18:16:55] [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv420_12_bpc
[18:16:55] [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv422_8_bpc
[18:16:55] [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv422_10_bpc
[18:16:55] [PASSED] drm_test_connector_hdmi_compute_mode_clock_yuv422_12_bpc
[18:16:55] === [PASSED] drm_test_connector_hdmi_compute_mode_clock ====
[18:16:55] == drm_hdmi_connector_get_broadcast_rgb_name (2 subtests) ==
[18:16:55] === drm_test_drm_hdmi_connector_get_broadcast_rgb_name  ====
[18:16:55] [PASSED] Automatic
[18:16:55] [PASSED] Full
[18:16:55] [PASSED] Limited 16:235
[18:16:55] === [PASSED] drm_test_drm_hdmi_connector_get_broadcast_rgb_name ===
[18:16:55] [PASSED] drm_test_drm_hdmi_connector_get_broadcast_rgb_name_invalid
[18:16:55] ==== [PASSED] drm_hdmi_connector_get_broadcast_rgb_name ====
[18:16:55] == drm_hdmi_connector_get_output_format_name (2 subtests) ==
[18:16:55] === drm_test_drm_hdmi_connector_get_output_format_name  ====
[18:16:55] [PASSED] RGB
[18:16:55] [PASSED] YUV 4:2:0
[18:16:55] [PASSED] YUV 4:2:2
[18:16:55] [PASSED] YUV 4:4:4
[18:16:55] === [PASSED] drm_test_drm_hdmi_connector_get_output_format_name ===
[18:16:55] [PASSED] drm_test_drm_hdmi_connector_get_output_format_name_invalid
[18:16:55] ==== [PASSED] drm_hdmi_connector_get_output_format_name ====
[18:16:55] ============= drm_damage_helper (21 subtests) ==============
[18:16:55] [PASSED] drm_test_damage_iter_no_damage
[18:16:55] [PASSED] drm_test_damage_iter_no_damage_fractional_src
[18:16:55] [PASSED] drm_test_damage_iter_no_damage_src_moved
[18:16:55] [PASSED] drm_test_damage_iter_no_damage_fractional_src_moved
[18:16:55] [PASSED] drm_test_damage_iter_no_damage_not_visible
[18:16:55] [PASSED] drm_test_damage_iter_no_damage_no_crtc
[18:16:55] [PASSED] drm_test_damage_iter_no_damage_no_fb
[18:16:55] [PASSED] drm_test_damage_iter_simple_damage
[18:16:55] [PASSED] drm_test_damage_iter_single_damage
[18:16:55] [PASSED] drm_test_damage_iter_single_damage_intersect_src
[18:16:55] [PASSED] drm_test_damage_iter_single_damage_outside_src
[18:16:55] [PASSED] drm_test_damage_iter_single_damage_fractional_src
[18:16:55] [PASSED] drm_test_damage_iter_single_damage_intersect_fractional_src
[18:16:55] [PASSED] drm_test_damage_iter_single_damage_outside_fractional_src
[18:16:55] [PASSED] drm_test_damage_iter_single_damage_src_moved
[18:16:55] [PASSED] drm_test_damage_iter_single_damage_fractional_src_moved
[18:16:55] [PASSED] drm_test_damage_iter_damage
[18:16:55] [PASSED] drm_test_damage_iter_damage_one_intersect
[18:16:55] [PASSED] drm_test_damage_iter_damage_one_outside
[18:16:55] [PASSED] drm_test_damage_iter_damage_src_moved
[18:16:55] [PASSED] drm_test_damage_iter_damage_not_visible
[18:16:55] ================ [PASSED] drm_damage_helper ================
[18:16:55] ============== drm_dp_mst_helper (3 subtests) ==============
[18:16:55] ============== drm_test_dp_mst_calc_pbn_mode  ==============
[18:16:55] [PASSED] Clock 154000 BPP 30 DSC disabled
[18:16:55] [PASSED] Clock 234000 BPP 30 DSC disabled
[18:16:55] [PASSED] Clock 297000 BPP 24 DSC disabled
[18:16:55] [PASSED] Clock 332880 BPP 24 DSC enabled
[18:16:55] [PASSED] Clock 324540 BPP 24 DSC enabled
[18:16:55] ========== [PASSED] drm_test_dp_mst_calc_pbn_mode ==========
[18:16:55] ============== drm_test_dp_mst_calc_pbn_div  ===============
[18:16:55] [PASSED] Link rate 2000000 lane count 4
[18:16:55] [PASSED] Link rate 2000000 lane count 2
[18:16:55] [PASSED] Link rate 2000000 lane count 1
[18:16:55] [PASSED] Link rate 1350000 lane count 4
[18:16:55] [PASSED] Link rate 1350000 lane count 2
[18:16:55] [PASSED] Link rate 1350000 lane count 1
[18:16:55] [PASSED] Link rate 1000000 lane count 4
[18:16:55] [PASSED] Link rate 1000000 lane count 2
[18:16:55] [PASSED] Link rate 1000000 lane count 1
[18:16:55] [PASSED] Link rate 810000 lane count 4
[18:16:55] [PASSED] Link rate 810000 lane count 2
[18:16:55] [PASSED] Link rate 810000 lane count 1
[18:16:55] [PASSED] Link rate 540000 lane count 4
[18:16:55] [PASSED] Link rate 540000 lane count 2
[18:16:55] [PASSED] Link rate 540000 lane count 1
[18:16:55] [PASSED] Link rate 270000 lane count 4
[18:16:55] [PASSED] Link rate 270000 lane count 2
[18:16:55] [PASSED] Link rate 270000 lane count 1
[18:16:55] [PASSED] Link rate 162000 lane count 4
[18:16:55] [PASSED] Link rate 162000 lane count 2
[18:16:55] [PASSED] Link rate 162000 lane count 1
[18:16:55] ========== [PASSED] drm_test_dp_mst_calc_pbn_div ===========
[18:16:55] ========= drm_test_dp_mst_sideband_msg_req_decode  =========
[18:16:55] [PASSED] DP_ENUM_PATH_RESOURCES with port number
[18:16:55] [PASSED] DP_POWER_UP_PHY with port number
[18:16:55] [PASSED] DP_POWER_DOWN_PHY with port number
[18:16:55] [PASSED] DP_ALLOCATE_PAYLOAD with SDP stream sinks
[18:16:55] [PASSED] DP_ALLOCATE_PAYLOAD with port number
[18:16:55] [PASSED] DP_ALLOCATE_PAYLOAD with VCPI
[18:16:55] [PASSED] DP_ALLOCATE_PAYLOAD with PBN
[18:16:55] [PASSED] DP_QUERY_PAYLOAD with port number
[18:16:55] [PASSED] DP_QUERY_PAYLOAD with VCPI
[18:16:55] [PASSED] DP_REMOTE_DPCD_READ with port number
[18:16:55] [PASSED] DP_REMOTE_DPCD_READ with DPCD address
[18:16:55] [PASSED] DP_REMOTE_DPCD_READ with max number of bytes
[18:16:55] [PASSED] DP_REMOTE_DPCD_WRITE with port number
[18:16:55] [PASSED] DP_REMOTE_DPCD_WRITE with DPCD address
[18:16:55] [PASSED] DP_REMOTE_DPCD_WRITE with data array
[18:16:55] [PASSED] DP_REMOTE_I2C_READ with port number
[18:16:55] [PASSED] DP_REMOTE_I2C_READ with I2C device ID
[18:16:55] [PASSED] DP_REMOTE_I2C_READ with transactions array
[18:16:55] [PASSED] DP_REMOTE_I2C_WRITE with port number
[18:16:55] [PASSED] DP_REMOTE_I2C_WRITE with I2C device ID
[18:16:55] [PASSED] DP_REMOTE_I2C_WRITE with data array
[18:16:55] [PASSED] DP_QUERY_STREAM_ENC_STATUS with stream ID
[18:16:55] [PASSED] DP_QUERY_STREAM_ENC_STATUS with client ID
[18:16:55] [PASSED] DP_QUERY_STREAM_ENC_STATUS with stream event
[18:16:55] [PASSED] DP_QUERY_STREAM_ENC_STATUS with valid stream event
[18:16:55] [PASSED] DP_QUERY_STREAM_ENC_STATUS with stream behavior
[18:16:55] [PASSED] DP_QUERY_STREAM_ENC_STATUS with a valid stream behavior
[18:16:55] ===== [PASSED] drm_test_dp_mst_sideband_msg_req_decode =====
[18:16:55] ================ [PASSED] drm_dp_mst_helper ================
[18:16:55] ================== drm_exec (7 subtests) ===================
[18:16:55] [PASSED] sanitycheck
[18:16:55] [PASSED] test_lock
[18:16:55] [PASSED] test_lock_unlock
[18:16:55] [PASSED] test_duplicates
[18:16:55] [PASSED] test_prepare
[18:16:55] [PASSED] test_prepare_array
[18:16:55] [PASSED] test_multiple_loops
[18:16:55] ==================== [PASSED] drm_exec =====================
[18:16:55] =========== drm_format_helper_test (17 subtests) ===========
[18:16:55] ============== drm_test_fb_xrgb8888_to_gray8  ==============
[18:16:55] [PASSED] single_pixel_source_buffer
[18:16:55] [PASSED] single_pixel_clip_rectangle
[18:16:55] [PASSED] well_known_colors
[18:16:55] [PASSED] destination_pitch
[18:16:55] ========== [PASSED] drm_test_fb_xrgb8888_to_gray8 ==========
[18:16:55] ============= drm_test_fb_xrgb8888_to_rgb332  ==============
[18:16:55] [PASSED] single_pixel_source_buffer
[18:16:55] [PASSED] single_pixel_clip_rectangle
[18:16:55] [PASSED] well_known_colors
[18:16:55] [PASSED] destination_pitch
[18:16:55] ========= [PASSED] drm_test_fb_xrgb8888_to_rgb332 ==========
[18:16:55] ============= drm_test_fb_xrgb8888_to_rgb565  ==============
[18:16:55] [PASSED] single_pixel_source_buffer
[18:16:55] [PASSED] single_pixel_clip_rectangle
[18:16:55] [PASSED] well_known_colors
[18:16:55] [PASSED] destination_pitch
[18:16:55] ========= [PASSED] drm_test_fb_xrgb8888_to_rgb565 ==========
[18:16:55] ============ drm_test_fb_xrgb8888_to_xrgb1555  =============
[18:16:55] [PASSED] single_pixel_source_buffer
[18:16:55] [PASSED] single_pixel_clip_rectangle
[18:16:55] [PASSED] well_known_colors
[18:16:55] [PASSED] destination_pitch
[18:16:55] ======== [PASSED] drm_test_fb_xrgb8888_to_xrgb1555 =========
[18:16:55] ============ drm_test_fb_xrgb8888_to_argb1555  =============
[18:16:55] [PASSED] single_pixel_source_buffer
[18:16:55] [PASSED] single_pixel_clip_rectangle
[18:16:55] [PASSED] well_known_colors
[18:16:55] [PASSED] destination_pitch
[18:16:55] ======== [PASSED] drm_test_fb_xrgb8888_to_argb1555 =========
[18:16:55] ============ drm_test_fb_xrgb8888_to_rgba5551  =============
[18:16:55] [PASSED] single_pixel_source_buffer
[18:16:55] [PASSED] single_pixel_clip_rectangle
[18:16:55] [PASSED] well_known_colors
[18:16:55] [PASSED] destination_pitch
[18:16:55] ======== [PASSED] drm_test_fb_xrgb8888_to_rgba5551 =========
[18:16:55] ============= drm_test_fb_xrgb8888_to_rgb888  ==============
[18:16:55] [PASSED] single_pixel_source_buffer
[18:16:55] [PASSED] single_pixel_clip_rectangle
[18:16:55] [PASSED] well_known_colors
[18:16:55] [PASSED] destination_pitch
[18:16:55] ========= [PASSED] drm_test_fb_xrgb8888_to_rgb888 ==========
[18:16:55] ============ drm_test_fb_xrgb8888_to_argb8888  =============
[18:16:55] [PASSED] single_pixel_source_buffer
[18:16:55] [PASSED] single_pixel_clip_rectangle
[18:16:55] [PASSED] well_known_colors
[18:16:55] [PASSED] destination_pitch
[18:16:55] ======== [PASSED] drm_test_fb_xrgb8888_to_argb8888 =========
[18:16:55] =========== drm_test_fb_xrgb8888_to_xrgb2101010  ===========
[18:16:55] [PASSED] single_pixel_source_buffer
[18:16:55] [PASSED] single_pixel_clip_rectangle
[18:16:55] [PASSED] well_known_colors
[18:16:55] [PASSED] destination_pitch
[18:16:55] ======= [PASSED] drm_test_fb_xrgb8888_to_xrgb2101010 =======
[18:16:55] =========== drm_test_fb_xrgb8888_to_argb2101010  ===========
[18:16:55] [PASSED] single_pixel_source_buffer
[18:16:55] [PASSED] single_pixel_clip_rectangle
[18:16:55] [PASSED] well_known_colors
[18:16:55] [PASSED] destination_pitch
[18:16:55] ======= [PASSED] drm_test_fb_xrgb8888_to_argb2101010 =======
[18:16:55] ============== drm_test_fb_xrgb8888_to_mono  ===============
[18:16:55] [PASSED] single_pixel_source_buffer
[18:16:55] [PASSED] single_pixel_clip_rectangle
[18:16:55] [PASSED] well_known_colors
[18:16:55] [PASSED] destination_pitch
[18:16:55] ========== [PASSED] drm_test_fb_xrgb8888_to_mono ===========
[18:16:55] ==================== drm_test_fb_swab  =====================
[18:16:55] [PASSED] single_pixel_source_buffer
[18:16:55] [PASSED] single_pixel_clip_rectangle
[18:16:55] [PASSED] well_known_colors
[18:16:55] [PASSED] destination_pitch
[18:16:55] ================ [PASSED] drm_test_fb_swab =================
[18:16:55] ============ drm_test_fb_xrgb8888_to_xbgr8888  =============
[18:16:55] [PASSED] single_pixel_source_buffer
[18:16:55] [PASSED] single_pixel_clip_rectangle
[18:16:55] [PASSED] well_known_colors
[18:16:55] [PASSED] destination_pitch
[18:16:55] ======== [PASSED] drm_test_fb_xrgb8888_to_xbgr8888 =========
[18:16:55] ============ drm_test_fb_xrgb8888_to_abgr8888  =============
[18:16:55] [PASSED] single_pixel_source_buffer
[18:16:55] [PASSED] single_pixel_clip_rectangle
[18:16:55] [PASSED] well_known_colors
[18:16:55] [PASSED] destination_pitch
[18:16:55] ======== [PASSED] drm_test_fb_xrgb8888_to_abgr8888 =========
[18:16:55] ================= drm_test_fb_clip_offset  =================
[18:16:55] [PASSED] pass through
[18:16:55] [PASSED] horizontal offset
[18:16:55] [PASSED] vertical offset
[18:16:55] [PASSED] horizontal and vertical offset
[18:16:55] [PASSED] horizontal offset (custom pitch)
[18:16:55] [PASSED] vertical offset (custom pitch)
[18:16:55] [PASSED] horizontal and vertical offset (custom pitch)
[18:16:55] ============= [PASSED] drm_test_fb_clip_offset =============
[18:16:55] ============== drm_test_fb_build_fourcc_list  ==============
[18:16:55] [PASSED] no native formats
[18:16:55] [PASSED] XRGB8888 as native format
[18:16:55] [PASSED] remove duplicates
[18:16:55] [PASSED] convert alpha formats
[18:16:55] [PASSED] random formats
[18:16:55] ========== [PASSED] drm_test_fb_build_fourcc_list ==========
[18:16:55] =================== drm_test_fb_memcpy  ====================
[18:16:55] [PASSED] single_pixel_source_buffer: XR24 little-endian (0x34325258)
[18:16:55] [PASSED] single_pixel_source_buffer: XRA8 little-endian (0x38415258)
[18:16:55] [PASSED] single_pixel_source_buffer: YU24 little-endian (0x34325559)
[18:16:55] [PASSED] single_pixel_clip_rectangle: XB24 little-endian (0x34324258)
[18:16:55] [PASSED] single_pixel_clip_rectangle: XRA8 little-endian (0x38415258)
[18:16:55] [PASSED] single_pixel_clip_rectangle: YU24 little-endian (0x34325559)
[18:16:55] [PASSED] well_known_colors: XB24 little-endian (0x34324258)
[18:16:55] [PASSED] well_known_colors: XRA8 little-endian (0x38415258)
[18:16:55] [PASSED] well_known_colors: YU24 little-endian (0x34325559)
[18:16:55] [PASSED] destination_pitch: XB24 little-endian (0x34324258)
[18:16:55] [PASSED] destination_pitch: XRA8 little-endian (0x38415258)
[18:16:55] [PASSED] destination_pitch: YU24 little-endian (0x34325559)
[18:16:55] =============== [PASSED] drm_test_fb_memcpy ================
[18:16:55] ============= [PASSED] drm_format_helper_test ==============
[18:16:55] ================= drm_format (18 subtests) =================
[18:16:55] [PASSED] drm_test_format_block_width_invalid
[18:16:55] [PASSED] drm_test_format_block_width_one_plane
[18:16:55] [PASSED] drm_test_format_block_width_two_plane
[18:16:55] [PASSED] drm_test_format_block_width_three_plane
[18:16:55] [PASSED] drm_test_format_block_width_tiled
[18:16:55] [PASSED] drm_test_format_block_height_invalid
[18:16:55] [PASSED] drm_test_format_block_height_one_plane
[18:16:55] [PASSED] drm_test_format_block_height_two_plane
[18:16:55] [PASSED] drm_test_format_block_height_three_plane
[18:16:55] [PASSED] drm_test_format_block_height_tiled
[18:16:55] [PASSED] drm_test_format_min_pitch_invalid
[18:16:55] [PASSED] drm_test_format_min_pitch_one_plane_8bpp
[18:16:55] [PASSED] drm_test_format_min_pitch_one_plane_16bpp
[18:16:55] [PASSED] drm_test_format_min_pitch_one_plane_24bpp
[18:16:55] [PASSED] drm_test_format_min_pitch_one_plane_32bpp
[18:16:55] [PASSED] drm_test_format_min_pitch_two_plane
[18:16:55] [PASSED] drm_test_format_min_pitch_three_plane_8bpp
[18:16:55] [PASSED] drm_test_format_min_pitch_tiled
[18:16:55] =================== [PASSED] drm_format ====================
[18:16:55] ============== drm_framebuffer (10 subtests) ===============
[18:16:55] ========== drm_test_framebuffer_check_src_coords  ==========
[18:16:55] [PASSED] Success: source fits into fb
[18:16:55] [PASSED] Fail: overflowing fb with x-axis coordinate
[18:16:55] [PASSED] Fail: overflowing fb with y-axis coordinate
[18:16:55] [PASSED] Fail: overflowing fb with source width
[18:16:55] [PASSED] Fail: overflowing fb with source height
[18:16:55] ====== [PASSED] drm_test_framebuffer_check_src_coords ======
[18:16:55] [PASSED] drm_test_framebuffer_cleanup
[18:16:55] =============== drm_test_framebuffer_create  ===============
[18:16:55] [PASSED] ABGR8888 normal sizes
[18:16:55] [PASSED] ABGR8888 max sizes
[18:16:55] [PASSED] ABGR8888 pitch greater than min required
[18:16:55] [PASSED] ABGR8888 pitch less than min required
[18:16:55] [PASSED] ABGR8888 Invalid width
[18:16:55] [PASSED] ABGR8888 Invalid buffer handle
[18:16:55] [PASSED] No pixel format
[18:16:55] [PASSED] ABGR8888 Width 0
[18:16:55] [PASSED] ABGR8888 Height 0
[18:16:55] [PASSED] ABGR8888 Out of bound height * pitch combination
[18:16:55] [PASSED] ABGR8888 Large buffer offset
[18:16:55] [PASSED] ABGR8888 Buffer offset for inexistent plane
[18:16:55] [PASSED] ABGR8888 Invalid flag
[18:16:55] [PASSED] ABGR8888 Set DRM_MODE_FB_MODIFIERS without modifiers
[18:16:55] [PASSED] ABGR8888 Valid buffer modifier
[18:16:55] [PASSED] ABGR8888 Invalid buffer modifier(DRM_FORMAT_MOD_SAMSUNG_64_32_TILE)
[18:16:55] [PASSED] ABGR8888 Extra pitches without DRM_MODE_FB_MODIFIERS
[18:16:55] [PASSED] ABGR8888 Extra pitches with DRM_MODE_FB_MODIFIERS
[18:16:55] [PASSED] NV12 Normal sizes
[18:16:55] [PASSED] NV12 Max sizes
[18:16:55] [PASSED] NV12 Invalid pitch
[18:16:55] [PASSED] NV12 Invalid modifier/missing DRM_MODE_FB_MODIFIERS flag
[18:16:55] [PASSED] NV12 different  modifier per-plane
[18:16:55] [PASSED] NV12 with DRM_FORMAT_MOD_SAMSUNG_64_32_TILE
[18:16:55] [PASSED] NV12 Valid modifiers without DRM_MODE_FB_MODIFIERS
[18:16:55] [PASSED] NV12 Modifier for inexistent plane
[18:16:55] [PASSED] NV12 Handle for inexistent plane
[18:16:55] [PASSED] NV12 Handle for inexistent plane without DRM_MODE_FB_MODIFIERS
[18:16:55] [PASSED] YVU420 DRM_MODE_FB_MODIFIERS set without modifier
[18:16:55] [PASSED] YVU420 Normal sizes
[18:16:55] [PASSED] YVU420 Max sizes
[18:16:55] [PASSED] YVU420 Invalid pitch
[18:16:55] [PASSED] YVU420 Different pitches
[18:16:55] [PASSED] YVU420 Different buffer offsets/pitches
[18:16:55] [PASSED] YVU420 Modifier set just for plane 0, without DRM_MODE_FB_MODIFIERS
[18:16:55] [PASSED] YVU420 Modifier set just for planes 0, 1, without DRM_MODE_FB_MODIFIERS
[18:16:55] [PASSED] YVU420 Modifier set just for plane 0, 1, with DRM_MODE_FB_MODIFIERS
[18:16:55] [PASSED] YVU420 Valid modifier
[18:16:55] [PASSED] YVU420 Different modifiers per plane
[18:16:55] [PASSED] YVU420 Modifier for inexistent plane
[18:16:55] [PASSED] YUV420_10BIT Invalid modifier(DRM_FORMAT_MOD_LINEAR)
[18:16:55] [PASSED] X0L2 Normal sizes
[18:16:55] [PASSED] X0L2 Max sizes
[18:16:55] [PASSED] X0L2 Invalid pitch
[18:16:55] [PASSED] X0L2 Pitch greater than minimum required
[18:16:55] [PASSED] X0L2 Handle for inexistent plane
[18:16:55] [PASSED] X0L2 Offset for inexistent plane, without DRM_MODE_FB_MODIFIERS set
[18:16:55] [PASSED] X0L2 Modifier without DRM_MODE_FB_MODIFIERS set
[18:16:55] [PASSED] X0L2 Valid modifier
[18:16:55] [PASSED] X0L2 Modifier for inexistent plane
[18:16:55] =========== [PASSED] drm_test_framebuffer_create ===========
[18:16:55] [PASSED] drm_test_framebuffer_free
[18:16:55] [PASSED] drm_test_framebuffer_init
[18:16:55] [PASSED] drm_test_framebuffer_init_bad_format
[18:16:55] [PASSED] drm_test_framebuffer_init_dev_mismatch
[18:16:55] [PASSED] drm_test_framebuffer_lookup
[18:16:55] [PASSED] drm_test_framebuffer_lookup_inexistent
[18:16:55] [PASSED] drm_test_framebuffer_modifiers_not_supported
[18:16:55] ================= [PASSED] drm_framebuffer =================
[18:16:55] ================ drm_gem_shmem (8 subtests) ================
[18:16:55] [PASSED] drm_gem_shmem_test_obj_create
[18:16:55] [PASSED] drm_gem_shmem_test_obj_create_private
[18:16:55] [PASSED] drm_gem_shmem_test_pin_pages
[18:16:55] [PASSED] drm_gem_shmem_test_vmap
[18:16:55] [PASSED] drm_gem_shmem_test_get_pages_sgt
[18:16:55] [PASSED] drm_gem_shmem_test_get_sg_table
[18:16:55] [PASSED] drm_gem_shmem_test_madvise
[18:16:55] [PASSED] drm_gem_shmem_test_purge
[18:16:55] ================== [PASSED] drm_gem_shmem ==================
[18:16:55] === drm_atomic_helper_connector_hdmi_check (22 subtests) ===
[18:16:55] [PASSED] drm_test_check_broadcast_rgb_auto_cea_mode
[18:16:55] [PASSED] drm_test_check_broadcast_rgb_auto_cea_mode_vic_1
[18:16:55] [PASSED] drm_test_check_broadcast_rgb_full_cea_mode
[18:16:55] [PASSED] drm_test_check_broadcast_rgb_full_cea_mode_vic_1
[18:16:55] [PASSED] drm_test_check_broadcast_rgb_limited_cea_mode
[18:16:55] [PASSED] drm_test_check_broadcast_rgb_limited_cea_mode_vic_1
[18:16:55] [PASSED] drm_test_check_broadcast_rgb_crtc_mode_changed
[18:16:55] [PASSED] drm_test_check_broadcast_rgb_crtc_mode_not_changed
[18:16:55] [PASSED] drm_test_check_hdmi_funcs_reject_rate
[18:16:55] [PASSED] drm_test_check_max_tmds_rate_bpc_fallback
[18:16:55] [PASSED] drm_test_check_max_tmds_rate_format_fallback
[18:16:55] [PASSED] drm_test_check_output_bpc_crtc_mode_changed
[18:16:55] [PASSED] drm_test_check_output_bpc_crtc_mode_not_changed
[18:16:55] [PASSED] drm_test_check_output_bpc_dvi
[18:16:55] [PASSED] drm_test_check_output_bpc_format_vic_1
[18:16:55] [PASSED] drm_test_check_output_bpc_format_display_8bpc_only
[18:16:55] [PASSED] drm_test_check_output_bpc_format_display_rgb_only
[18:16:55] [PASSED] drm_test_check_output_bpc_format_driver_8bpc_only
[18:16:55] [PASSED] drm_test_check_output_bpc_format_driver_rgb_only
[18:16:55] [PASSED] drm_test_check_tmds_char_rate_rgb_8bpc
[18:16:55] [PASSED] drm_test_check_tmds_char_rate_rgb_10bpc
[18:16:55] [PASSED] drm_test_check_tmds_char_rate_rgb_12bpc
[18:16:55] ===== [PASSED] drm_atomic_helper_connector_hdmi_check ======
[18:16:55] === drm_atomic_helper_connector_hdmi_reset (6 subtests) ====
[18:16:55] [PASSED] drm_test_check_broadcast_rgb_value
[18:16:55] [PASSED] drm_test_check_bpc_8_value
[18:16:55] [PASSED] drm_test_check_bpc_10_value
[18:16:55] [PASSED] drm_test_check_bpc_12_value
[18:16:55] [PASSED] drm_test_check_format_value
[18:16:55] [PASSED] drm_test_check_tmds_char_value
[18:16:55] ===== [PASSED] drm_atomic_helper_connector_hdmi_reset ======
[18:16:55] ================= drm_managed (2 subtests) =================
[18:16:55] [PASSED] drm_test_managed_release_action
[18:16:55] [PASSED] drm_test_managed_run_action
[18:16:55] =================== [PASSED] drm_managed ===================
[18:16:55] =================== drm_mm (6 subtests) ====================
[18:16:55] [PASSED] drm_test_mm_init
[18:16:55] [PASSED] drm_test_mm_debug
[18:16:55] [PASSED] drm_test_mm_align32
[18:16:55] [PASSED] drm_test_mm_align64
[18:16:55] [PASSED] drm_test_mm_lowest
[18:16:55] [PASSED] drm_test_mm_highest
[18:16:55] ===================== [PASSED] drm_mm ======================
[18:16:55] ============= drm_modes_analog_tv (5 subtests) =============
[18:16:55] [PASSED] drm_test_modes_analog_tv_mono_576i
[18:16:55] [PASSED] drm_test_modes_analog_tv_ntsc_480i
[18:16:55] [PASSED] drm_test_modes_analog_tv_ntsc_480i_inlined
[18:16:55] [PASSED] drm_test_modes_analog_tv_pal_576i
[18:16:55] [PASSED] drm_test_modes_analog_tv_pal_576i_inlined
[18:16:55] =============== [PASSED] drm_modes_analog_tv ===============
[18:16:55] ============== drm_plane_helper (2 subtests) ===============
[18:16:55] =============== drm_test_check_plane_state  ================
[18:16:55] [PASSED] clipping_simple
[18:16:55] [PASSED] clipping_rotate_reflect
[18:16:55] [PASSED] positioning_simple
[18:16:55] [PASSED] upscaling
[18:16:55] [PASSED] downscaling
[18:16:55] [PASSED] rounding1
[18:16:55] [PASSED] rounding2
[18:16:55] [PASSED] rounding3
[18:16:55] [PASSED] rounding4
[18:16:55] =========== [PASSED] drm_test_check_plane_state ============
[18:16:55] =========== drm_test_check_invalid_plane_state  ============
[18:16:55] [PASSED] positioning_invalid
[18:16:55] [PASSED] upscaling_invalid
[18:16:55] [PASSED] downscaling_invalid
[18:16:55] ======= [PASSED] drm_test_check_invalid_plane_state ========
[18:16:55] ================ [PASSED] drm_plane_helper =================
[18:16:55] ====== drm_connector_helper_tv_get_modes (1 subtest) =======
[18:16:55] ====== drm_test_connector_helper_tv_get_modes_check  =======
[18:16:55] [PASSED] None
[18:16:55] [PASSED] PAL
[18:16:55] [PASSED] NTSC
[18:16:55] [PASSED] Both, NTSC Default
[18:16:55] [PASSED] Both, PAL Default
[18:16:55] [PASSED] Both, NTSC Default, with PAL on command-line
[18:16:55] [PASSED] Both, PAL Default, with NTSC on command-line
[18:16:55] == [PASSED] drm_test_connector_helper_tv_get_modes_check ===
[18:16:55] ======== [PASSED] drm_connector_helper_tv_get_modes ========
[18:16:55] ================== drm_rect (9 subtests) ===================
[18:16:55] [PASSED] drm_test_rect_clip_scaled_div_by_zero
[18:16:55] [PASSED] drm_test_rect_clip_scaled_not_clipped
[18:16:55] [PASSED] drm_test_rect_clip_scaled_clipped
[18:16:55] [PASSED] drm_test_rect_clip_scaled_signed_vs_unsigned
[18:16:55] ================= drm_test_rect_intersect  =================
[18:16:55] [PASSED] top-left x bottom-right: 2x2+1+1 x 2x2+0+0
[18:16:55] [PASSED] top-right x bottom-left: 2x2+0+0 x 2x2+1-1
[18:16:55] [PASSED] bottom-left x top-right: 2x2+1-1 x 2x2+0+0
[18:16:55] [PASSED] bottom-right x top-left: 2x2+0+0 x 2x2+1+1
[18:16:55] [PASSED] right x left: 2x1+0+0 x 3x1+1+0
[18:16:55] [PASSED] left x right: 3x1+1+0 x 2x1+0+0
[18:16:55] [PASSED] up x bottom: 1x2+0+0 x 1x3+0-1
[18:16:55] [PASSED] bottom x up: 1x3+0-1 x 1x2+0+0
[18:16:55] [PASSED] touching corner: 1x1+0+0 x 2x2+1+1
[18:16:55] [PASSED] touching side: 1x1+0+0 x 1x1+1+0
[18:16:55] [PASSED] equal rects: 2x2+0+0 x 2x2+0+0
[18:16:55] [PASSED] inside another: 2x2+0+0 x 1x1+1+1
[18:16:55] [PASSED] far away: 1x1+0+0 x 1x1+3+6
[18:16:55] [PASSED] points intersecting: 0x0+5+10 x 0x0+5+10
[18:16:55] [PASSED] points not intersecting: 0x0+0+0 x 0x0+5+10
[18:16:55] ============= [PASSED] drm_test_rect_intersect =============
[18:16:55] ================ drm_test_rect_calc_hscale  ================
[18:16:55] [PASSED] normal use
[18:16:55] [PASSED] out of max range
[18:16:55] [PASSED] out of min range
[18:16:55] [PASSED] zero dst
[18:16:55] [PASSED] negative src
[18:16:55] [PASSED] negative dst
[18:16:55] ============ [PASSED] drm_test_rect_calc_hscale ============
[18:16:55] ================ drm_test_rect_calc_vscale  ================
[18:16:55] [PASSED] normal use
[18:16:55] [PASSED] out of max range
[18:16:55] [PASSED] out of min range
[18:16:55] [PASSED] zero dst
[18:16:55] [PASSED] negative src
[18:16:55] [PASSED] negative dst
[18:16:55] ============ [PASSED] drm_test_rect_calc_vscale ============
[18:16:55] ================== drm_test_rect_rotate  ===================
[18:16:55] [PASSED] reflect-x
[18:16:55] [PASSED] reflect-y
[18:16:55] [PASSED] rotate-0
[18:16:55] [PASSED] rotate-90
[18:16:55] [PASSED] rotate-180
[18:16:55] [PASSED] rotate-270
[18:16:55] ============== [PASSED] drm_test_rect_rotate ===============
[18:16:55] ================ drm_test_rect_rotate_inv  =================
[18:16:55] [PASSED] reflect-x
[18:16:55] [PASSED] reflect-y
[18:16:55] [PASSED] rotate-0
[18:16:55] [PASSED] rotate-90
[18:16:55] [PASSED] rotate-180
[18:16:55] [PASSED] rotate-270
[18:16:55] ============ [PASSED] drm_test_rect_rotate_inv =============
[18:16:55] ==================== [PASSED] drm_rect =====================
[18:16:55] ============================================================
[18:16:55] Testing complete. Ran 526 tests: passed: 526
[18:16:55] Elapsed time: 24.550s total, 1.652s configuring, 22.732s building, 0.156s running

+ /kernel/tools/testing/kunit/kunit.py run --kunitconfig /kernel/drivers/gpu/drm/ttm/tests/.kunitconfig
[18:16:55] Configuring KUnit Kernel ...
Regenerating .config ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
[18:16:56] Building KUnit Kernel ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
Building with:
$ make all compile_commands.json ARCH=um O=.kunit --jobs=48
[18:17:04] Starting KUnit Kernel (1/1)...
[18:17:04] ============================================================
Running tests with:
$ .kunit/linux kunit.enable=1 mem=1G console=tty kunit_shutdown=halt
[18:17:04] ================= ttm_device (5 subtests) ==================
[18:17:04] [PASSED] ttm_device_init_basic
[18:17:04] [PASSED] ttm_device_init_multiple
[18:17:04] [PASSED] ttm_device_fini_basic
[18:17:04] [PASSED] ttm_device_init_no_vma_man
[18:17:04] ================== ttm_device_init_pools  ==================
[18:17:04] [PASSED] No DMA allocations, no DMA32 required
[18:17:04] [PASSED] DMA allocations, DMA32 required
[18:17:04] [PASSED] No DMA allocations, DMA32 required
[18:17:04] [PASSED] DMA allocations, no DMA32 required
[18:17:04] ============== [PASSED] ttm_device_init_pools ==============
[18:17:04] =================== [PASSED] ttm_device ====================
[18:17:04] ================== ttm_pool (8 subtests) ===================
[18:17:04] ================== ttm_pool_alloc_basic  ===================
[18:17:04] [PASSED] One page
[18:17:04] [PASSED] More than one page
[18:17:04] [PASSED] Above the allocation limit
[18:17:04] [PASSED] One page, with coherent DMA mappings enabled
[18:17:04] [PASSED] Above the allocation limit, with coherent DMA mappings enabled
[18:17:04] ============== [PASSED] ttm_pool_alloc_basic ===============
[18:17:04] ============== ttm_pool_alloc_basic_dma_addr  ==============
[18:17:04] [PASSED] One page
[18:17:04] [PASSED] More than one page
[18:17:04] [PASSED] Above the allocation limit
[18:17:04] [PASSED] One page, with coherent DMA mappings enabled
[18:17:04] [PASSED] Above the allocation limit, with coherent DMA mappings enabled
[18:17:04] ========== [PASSED] ttm_pool_alloc_basic_dma_addr ==========
[18:17:04] [PASSED] ttm_pool_alloc_order_caching_match
[18:17:04] [PASSED] ttm_pool_alloc_caching_mismatch
[18:17:04] [PASSED] ttm_pool_alloc_order_mismatch
[18:17:04] [PASSED] ttm_pool_free_dma_alloc
[18:17:04] [PASSED] ttm_pool_free_no_dma_alloc
[18:17:04] [PASSED] ttm_pool_fini_basic
[18:17:04] ==================== [PASSED] ttm_pool =====================
[18:17:04] ================ ttm_resource (8 subtests) =================
[18:17:04] ================= ttm_resource_init_basic  =================
[18:17:04] [PASSED] Init resource in TTM_PL_SYSTEM
[18:17:04] [PASSED] Init resource in TTM_PL_VRAM
[18:17:04] [PASSED] Init resource in a private placement
[18:17:04] [PASSED] Init resource in TTM_PL_SYSTEM, set placement flags
[18:17:04] ============= [PASSED] ttm_resource_init_basic =============
[18:17:04] [PASSED] ttm_resource_init_pinned
[18:17:04] [PASSED] ttm_resource_fini_basic
[18:17:04] [PASSED] ttm_resource_manager_init_basic
[18:17:04] [PASSED] ttm_resource_manager_usage_basic
[18:17:04] [PASSED] ttm_resource_manager_set_used_basic
[18:17:04] [PASSED] ttm_sys_man_alloc_basic
[18:17:04] [PASSED] ttm_sys_man_free_basic
[18:17:04] ================== [PASSED] ttm_resource ===================
[18:17:04] =================== ttm_tt (15 subtests) ===================
[18:17:04] ==================== ttm_tt_init_basic  ====================
[18:17:04] [PASSED] Page-aligned size
[18:17:04] [PASSED] Extra pages requested
[18:17:04] ================ [PASSED] ttm_tt_init_basic ================
[18:17:04] [PASSED] ttm_tt_init_misaligned
[18:17:04] [PASSED] ttm_tt_fini_basic
[18:17:04] [PASSED] ttm_tt_fini_sg
[18:17:04] [PASSED] ttm_tt_fini_shmem
[18:17:04] [PASSED] ttm_tt_create_basic
[18:17:04] [PASSED] ttm_tt_create_invalid_bo_type
[18:17:04] [PASSED] ttm_tt_create_ttm_exists
[18:17:04] [PASSED] ttm_tt_create_failed
[18:17:04] [PASSED] ttm_tt_destroy_basic
[18:17:04] [PASSED] ttm_tt_populate_null_ttm
[18:17:04] [PASSED] ttm_tt_populate_populated_ttm
[18:17:04] [PASSED] ttm_tt_unpopulate_basic
[18:17:04] [PASSED] ttm_tt_unpopulate_empty_ttm
[18:17:04] [PASSED] ttm_tt_swapin_basic
[18:17:04] ===================== [PASSED] ttm_tt ======================
[18:17:04] =================== ttm_bo (14 subtests) ===================
[18:17:04] =========== ttm_bo_reserve_optimistic_no_ticket  ===========
[18:17:04] [PASSED] Cannot be interrupted and sleeps
[18:17:04] [PASSED] Cannot be interrupted, locks straight away
[18:17:04] [PASSED] Can be interrupted, sleeps
[18:17:04] ======= [PASSED] ttm_bo_reserve_optimistic_no_ticket =======
[18:17:04] [PASSED] ttm_bo_reserve_locked_no_sleep
[18:17:04] [PASSED] ttm_bo_reserve_no_wait_ticket
[18:17:04] [PASSED] ttm_bo_reserve_double_resv
[18:17:04] [PASSED] ttm_bo_reserve_interrupted
[18:17:04] [PASSED] ttm_bo_reserve_deadlock
[18:17:04] [PASSED] ttm_bo_unreserve_basic
[18:17:04] [PASSED] ttm_bo_unreserve_pinned
[18:17:04] [PASSED] ttm_bo_unreserve_bulk
[18:17:04] [PASSED] ttm_bo_put_basic
[18:17:04] [PASSED] ttm_bo_put_shared_resv
[18:17:04] [PASSED] ttm_bo_pin_basic
[18:17:04] [PASSED] ttm_bo_pin_unpin_resource
[18:17:04] [PASSED] ttm_bo_multiple_pin_one_unpin
[18:17:04] ===================== [PASSED] ttm_bo ======================
[18:17:04] ============== ttm_bo_validate (22 subtests) ===============
[18:17:04] ============== ttm_bo_init_reserved_sys_man  ===============
[18:17:04] [PASSED] Buffer object for userspace
[18:17:04] [PASSED] Kernel buffer object
[18:17:04] [PASSED] Shared buffer object
[18:17:04] ========== [PASSED] ttm_bo_init_reserved_sys_man ===========
[18:17:04] ============== ttm_bo_init_reserved_mock_man  ==============
[18:17:04] [PASSED] Buffer object for userspace
[18:17:04] [PASSED] Kernel buffer object
[18:17:04] [PASSED] Shared buffer object
[18:17:04] ========== [PASSED] ttm_bo_init_reserved_mock_man ==========
[18:17:04] [PASSED] ttm_bo_init_reserved_resv
[18:17:04] ================== ttm_bo_validate_basic  ==================
[18:17:04] [PASSED] Buffer object for userspace
[18:17:04] [PASSED] Kernel buffer object
[18:17:04] [PASSED] Shared buffer object
[18:17:04] ============== [PASSED] ttm_bo_validate_basic ==============
[18:17:04] [PASSED] ttm_bo_validate_invalid_placement
[18:17:04] ============= ttm_bo_validate_same_placement  ==============
[18:17:04] [PASSED] System manager
[18:17:04] [PASSED] VRAM manager
[18:17:04] ========= [PASSED] ttm_bo_validate_same_placement ==========
[18:17:04] [PASSED] ttm_bo_validate_failed_alloc
[18:17:04] [PASSED] ttm_bo_validate_pinned
[18:17:04] [PASSED] ttm_bo_validate_busy_placement
[18:17:04] ================ ttm_bo_validate_multihop  =================
[18:17:04] [PASSED] Buffer object for userspace
[18:17:04] [PASSED] Kernel buffer object
[18:17:04] [PASSED] Shared buffer object
[18:17:04] ============ [PASSED] ttm_bo_validate_multihop =============
[18:17:04] ========== ttm_bo_validate_no_placement_signaled  ==========
[18:17:04] [PASSED] Buffer object in system domain, no page vector
[18:17:04] [PASSED] Buffer object in system domain with an existing page vector
[18:17:04] ====== [PASSED] ttm_bo_validate_no_placement_signaled ======
[18:17:04] ======== ttm_bo_validate_no_placement_not_signaled  ========
[18:17:04] [PASSED] Buffer object for userspace
[18:17:04] [PASSED] Kernel buffer object
[18:17:04] [PASSED] Shared buffer object
[18:17:04] ==== [PASSED] ttm_bo_validate_no_placement_not_signaled ====
[18:17:04] [PASSED] ttm_bo_validate_move_fence_signaled
[18:17:04] ========= ttm_bo_validate_move_fence_not_signaled  =========
[18:17:04] [PASSED] Waits for GPU
[18:17:04] [PASSED] Tries to lock straight away
[18:17:05] ===== [PASSED] ttm_bo_validate_move_fence_not_signaled =====
[18:17:05] [PASSED] ttm_bo_validate_swapout
[18:17:05] [PASSED] ttm_bo_validate_happy_evict
[18:17:05] [PASSED] ttm_bo_validate_all_pinned_evict
[18:17:05] [PASSED] ttm_bo_validate_allowed_only_evict
[18:17:05] [PASSED] ttm_bo_validate_deleted_evict
[18:17:05] [PASSED] ttm_bo_validate_busy_domain_evict
[18:17:05] [PASSED] ttm_bo_validate_evict_gutting
[18:17:05] [PASSED] ttm_bo_validate_recrusive_evict
[18:17:05] ================= [PASSED] ttm_bo_validate =================
[18:17:05] ============================================================
[18:17:05] Testing complete. Ran 102 tests: passed: 102
[18:17:05] Elapsed time: 9.993s total, 1.686s configuring, 7.639s building, 0.545s running

+ cleanup
++ stat -c %u:%g /kernel
+ chown -R 1003:1003 /kernel




* ✓ CI.Build: success for Fix non-contiguous VRAM BO access in Xe (rev6)
  2024-10-31 18:10 [PATCH v6 0/8] Fix non-contiguous VRAM BO access in Xe Matthew Brost
                   ` (10 preceding siblings ...)
  2024-10-31 18:17 ` ✓ CI.KUnit: success " Patchwork
@ 2024-10-31 18:28 ` Patchwork
  2024-10-31 18:31 ` ✓ CI.Hooks: " Patchwork
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 56+ messages in thread
From: Patchwork @ 2024-10-31 18:28 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-xe

== Series Details ==

Series: Fix non-contiguous VRAM BO access in Xe (rev6)
URL   : https://patchwork.freedesktop.org/series/140200/
State : success

== Summary ==

lib/modules/6.12.0-rc5-xe/kernel/sound/core/snd-hwdep.ko
lib/modules/6.12.0-rc5-xe/kernel/sound/core/snd.ko
lib/modules/6.12.0-rc5-xe/kernel/sound/core/snd-pcm.ko
lib/modules/6.12.0-rc5-xe/kernel/sound/core/snd-compress.ko
lib/modules/6.12.0-rc5-xe/kernel/sound/core/snd-timer.ko
lib/modules/6.12.0-rc5-xe/kernel/sound/soundcore.ko
lib/modules/6.12.0-rc5-xe/kernel/sound/soc/
lib/modules/6.12.0-rc5-xe/kernel/sound/soc/intel/
lib/modules/6.12.0-rc5-xe/kernel/sound/soc/intel/atom/
lib/modules/6.12.0-rc5-xe/kernel/sound/soc/intel/atom/snd-soc-sst-atom-hifi2-platform.ko
lib/modules/6.12.0-rc5-xe/kernel/sound/soc/intel/atom/sst/
lib/modules/6.12.0-rc5-xe/kernel/sound/soc/intel/atom/sst/snd-intel-sst-acpi.ko
lib/modules/6.12.0-rc5-xe/kernel/sound/soc/intel/atom/sst/snd-intel-sst-core.ko
lib/modules/6.12.0-rc5-xe/kernel/sound/soc/intel/common/
lib/modules/6.12.0-rc5-xe/kernel/sound/soc/intel/common/snd-soc-acpi-intel-match.ko
lib/modules/6.12.0-rc5-xe/kernel/sound/soc/amd/
lib/modules/6.12.0-rc5-xe/kernel/sound/soc/amd/acp/
lib/modules/6.12.0-rc5-xe/kernel/sound/soc/amd/acp/snd-soc-acpi-amd-match.ko
lib/modules/6.12.0-rc5-xe/kernel/sound/soc/amd/snd-acp-config.ko
lib/modules/6.12.0-rc5-xe/kernel/sound/soc/sof/
lib/modules/6.12.0-rc5-xe/kernel/sound/soc/sof/intel/
lib/modules/6.12.0-rc5-xe/kernel/sound/soc/sof/intel/snd-sof-pci-intel-tgl.ko
lib/modules/6.12.0-rc5-xe/kernel/sound/soc/sof/intel/snd-sof-intel-hda-mlink.ko
lib/modules/6.12.0-rc5-xe/kernel/sound/soc/sof/intel/snd-sof-pci-intel-ptl.ko
lib/modules/6.12.0-rc5-xe/kernel/sound/soc/sof/intel/snd-sof-pci-intel-cnl.ko
lib/modules/6.12.0-rc5-xe/kernel/sound/soc/sof/intel/snd-sof-pci-intel-lnl.ko
lib/modules/6.12.0-rc5-xe/kernel/sound/soc/sof/intel/snd-sof-intel-hda-common.ko
lib/modules/6.12.0-rc5-xe/kernel/sound/soc/sof/intel/snd-sof-intel-hda-generic.ko
lib/modules/6.12.0-rc5-xe/kernel/sound/soc/sof/intel/snd-sof-intel-hda.ko
lib/modules/6.12.0-rc5-xe/kernel/sound/soc/sof/intel/snd-sof-pci-intel-mtl.ko
lib/modules/6.12.0-rc5-xe/kernel/sound/soc/sof/amd/
lib/modules/6.12.0-rc5-xe/kernel/sound/soc/sof/amd/snd-sof-amd-renoir.ko
lib/modules/6.12.0-rc5-xe/kernel/sound/soc/sof/amd/snd-sof-amd-acp.ko
lib/modules/6.12.0-rc5-xe/kernel/sound/soc/sof/snd-sof-utils.ko
lib/modules/6.12.0-rc5-xe/kernel/sound/soc/sof/snd-sof-pci.ko
lib/modules/6.12.0-rc5-xe/kernel/sound/soc/sof/snd-sof.ko
lib/modules/6.12.0-rc5-xe/kernel/sound/soc/sof/snd-sof-probes.ko
lib/modules/6.12.0-rc5-xe/kernel/sound/soc/sof/xtensa/
lib/modules/6.12.0-rc5-xe/kernel/sound/soc/sof/xtensa/snd-sof-xtensa-dsp.ko
lib/modules/6.12.0-rc5-xe/kernel/sound/soc/snd-soc-core.ko
lib/modules/6.12.0-rc5-xe/kernel/sound/soc/snd-soc-acpi.ko
lib/modules/6.12.0-rc5-xe/kernel/sound/soc/codecs/
lib/modules/6.12.0-rc5-xe/kernel/sound/soc/codecs/snd-soc-hdac-hda.ko
lib/modules/6.12.0-rc5-xe/kernel/sound/hda/
lib/modules/6.12.0-rc5-xe/kernel/sound/hda/snd-intel-sdw-acpi.ko
lib/modules/6.12.0-rc5-xe/kernel/sound/hda/ext/
lib/modules/6.12.0-rc5-xe/kernel/sound/hda/ext/snd-hda-ext-core.ko
lib/modules/6.12.0-rc5-xe/kernel/sound/hda/snd-intel-dspcfg.ko
lib/modules/6.12.0-rc5-xe/kernel/sound/hda/snd-hda-core.ko
lib/modules/6.12.0-rc5-xe/kernel/arch/
lib/modules/6.12.0-rc5-xe/kernel/arch/x86/
lib/modules/6.12.0-rc5-xe/kernel/arch/x86/kernel/
lib/modules/6.12.0-rc5-xe/kernel/arch/x86/kernel/msr.ko
lib/modules/6.12.0-rc5-xe/kernel/arch/x86/kernel/cpuid.ko
lib/modules/6.12.0-rc5-xe/kernel/arch/x86/crypto/
lib/modules/6.12.0-rc5-xe/kernel/arch/x86/crypto/sha512-ssse3.ko
lib/modules/6.12.0-rc5-xe/kernel/arch/x86/crypto/crct10dif-pclmul.ko
lib/modules/6.12.0-rc5-xe/kernel/arch/x86/crypto/ghash-clmulni-intel.ko
lib/modules/6.12.0-rc5-xe/kernel/arch/x86/crypto/sha1-ssse3.ko
lib/modules/6.12.0-rc5-xe/kernel/arch/x86/crypto/crc32-pclmul.ko
lib/modules/6.12.0-rc5-xe/kernel/arch/x86/crypto/sha256-ssse3.ko
lib/modules/6.12.0-rc5-xe/kernel/arch/x86/crypto/aesni-intel.ko
lib/modules/6.12.0-rc5-xe/kernel/arch/x86/crypto/polyval-clmulni.ko
lib/modules/6.12.0-rc5-xe/kernel/arch/x86/events/
lib/modules/6.12.0-rc5-xe/kernel/arch/x86/events/intel/
lib/modules/6.12.0-rc5-xe/kernel/arch/x86/events/intel/intel-cstate.ko
lib/modules/6.12.0-rc5-xe/kernel/arch/x86/events/rapl.ko
lib/modules/6.12.0-rc5-xe/kernel/arch/x86/kvm/
lib/modules/6.12.0-rc5-xe/kernel/arch/x86/kvm/kvm.ko
lib/modules/6.12.0-rc5-xe/kernel/arch/x86/kvm/kvm-intel.ko
lib/modules/6.12.0-rc5-xe/kernel/crypto/
lib/modules/6.12.0-rc5-xe/kernel/crypto/crypto_simd.ko
lib/modules/6.12.0-rc5-xe/kernel/crypto/cmac.ko
lib/modules/6.12.0-rc5-xe/kernel/crypto/ccm.ko
lib/modules/6.12.0-rc5-xe/kernel/crypto/cryptd.ko
lib/modules/6.12.0-rc5-xe/kernel/crypto/polyval-generic.ko
lib/modules/6.12.0-rc5-xe/kernel/crypto/async_tx/
lib/modules/6.12.0-rc5-xe/kernel/crypto/async_tx/async_xor.ko
lib/modules/6.12.0-rc5-xe/kernel/crypto/async_tx/async_tx.ko
lib/modules/6.12.0-rc5-xe/kernel/crypto/async_tx/async_memcpy.ko
lib/modules/6.12.0-rc5-xe/kernel/crypto/async_tx/async_pq.ko
lib/modules/6.12.0-rc5-xe/kernel/crypto/async_tx/async_raid6_recov.ko
lib/modules/6.12.0-rc5-xe/build
lib/modules/6.12.0-rc5-xe/modules.alias.bin
lib/modules/6.12.0-rc5-xe/modules.builtin
lib/modules/6.12.0-rc5-xe/modules.softdep
lib/modules/6.12.0-rc5-xe/modules.alias
lib/modules/6.12.0-rc5-xe/modules.order
lib/modules/6.12.0-rc5-xe/modules.symbols
lib/modules/6.12.0-rc5-xe/modules.dep.bin
+ mv kernel-nodebug.tar.gz ..
+ cd ..
+ rm -rf archive
++ date +%s
+ echo -e '\e[0Ksection_end:1730399315:package_x86_64_nodebug\r\e[0K'
+ sync
+ cleanup
++ stat -c %u:%g /kernel
+ chown -R 1003:1003 /kernel




* ✓ CI.Hooks: success for Fix non-contiguous VRAM BO access in Xe (rev6)
  2024-10-31 18:10 [PATCH v6 0/8] Fix non-contiguous VRAM BO access in Xe Matthew Brost
                   ` (11 preceding siblings ...)
  2024-10-31 18:28 ` ✓ CI.Build: " Patchwork
@ 2024-10-31 18:31 ` Patchwork
  2024-10-31 18:32 ` ✗ CI.checksparse: warning " Patchwork
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 56+ messages in thread
From: Patchwork @ 2024-10-31 18:31 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-xe

== Series Details ==

Series: Fix non-contiguous VRAM BO access in Xe (rev6)
URL   : https://patchwork.freedesktop.org/series/140200/
State : success

== Summary ==

run-parts: executing /workspace/ci/hooks/00-showenv
+ grep -Ei '(^|\W)CI_'
+ export
declare -x CI_KERNEL_BUILD_DIR="/workspace/kernel/build64-default"
declare -x CI_KERNEL_SRC_DIR="/workspace/kernel"
declare -x CI_TOOLS_SRC_DIR="/workspace/ci"
declare -x CI_WORKSPACE_DIR="/workspace"
run-parts: executing /workspace/ci/hooks/10-build-W1
+ SRC_DIR=/workspace/kernel
+ RESTORE_DISPLAY_CONFIG=0
+ '[' -n /workspace/kernel/build64-default ']'
+ BUILD_DIR=/workspace/kernel/build64-default
+ cd /workspace/kernel
++ nproc
+ make -j48 O=/workspace/kernel/build64-default modules_prepare
make[1]: Entering directory '/workspace/kernel/build64-default'
  GEN     Makefile
  UPD     include/config/kernel.release
mkdir -p /workspace/kernel/build64-default/tools/objtool && make O=/workspace/kernel/build64-default subdir=tools/objtool --no-print-directory -C objtool 
  UPD     include/generated/utsrelease.h
  CALL    ../scripts/checksyscalls.sh
  INSTALL libsubcmd_headers
  CC      /workspace/kernel/build64-default/tools/objtool/libsubcmd/exec-cmd.o
  CC      /workspace/kernel/build64-default/tools/objtool/libsubcmd/help.o
  CC      /workspace/kernel/build64-default/tools/objtool/libsubcmd/parse-options.o
  CC      /workspace/kernel/build64-default/tools/objtool/libsubcmd/pager.o
  CC      /workspace/kernel/build64-default/tools/objtool/libsubcmd/run-command.o
  CC      /workspace/kernel/build64-default/tools/objtool/libsubcmd/sigchain.o
  CC      /workspace/kernel/build64-default/tools/objtool/libsubcmd/subcmd-config.o
  LD      /workspace/kernel/build64-default/tools/objtool/libsubcmd/libsubcmd-in.o
  AR      /workspace/kernel/build64-default/tools/objtool/libsubcmd/libsubcmd.a
  CC      /workspace/kernel/build64-default/tools/objtool/weak.o
  CC      /workspace/kernel/build64-default/tools/objtool/check.o
  CC      /workspace/kernel/build64-default/tools/objtool/special.o
  CC      /workspace/kernel/build64-default/tools/objtool/builtin-check.o
  CC      /workspace/kernel/build64-default/tools/objtool/elf.o
  CC      /workspace/kernel/build64-default/tools/objtool/objtool.o
  CC      /workspace/kernel/build64-default/tools/objtool/orc_gen.o
  CC      /workspace/kernel/build64-default/tools/objtool/orc_dump.o
  CC      /workspace/kernel/build64-default/tools/objtool/libstring.o
  CC      /workspace/kernel/build64-default/tools/objtool/arch/x86/special.o
  CC      /workspace/kernel/build64-default/tools/objtool/libctype.o
  CC      /workspace/kernel/build64-default/tools/objtool/str_error_r.o
  CC      /workspace/kernel/build64-default/tools/objtool/arch/x86/decode.o
  CC      /workspace/kernel/build64-default/tools/objtool/librbtree.o
  CC      /workspace/kernel/build64-default/tools/objtool/arch/x86/orc.o
  LD      /workspace/kernel/build64-default/tools/objtool/arch/x86/objtool-in.o
  LD      /workspace/kernel/build64-default/tools/objtool/objtool-in.o
  LINK    /workspace/kernel/build64-default/tools/objtool/objtool
make[1]: Leaving directory '/workspace/kernel/build64-default'
++ nproc
+ make -j48 O=/workspace/kernel/build64-default W=1 drivers/gpu/drm/xe
make[1]: Entering directory '/workspace/kernel/build64-default'
make[2]: Nothing to be done for 'drivers/gpu/drm/xe'.
make[1]: Leaving directory '/workspace/kernel/build64-default'
run-parts: executing /workspace/ci/hooks/11-build-32b
+++ realpath /workspace/ci/hooks/11-build-32b
++ dirname /workspace/ci/hooks/11-build-32b
+ THIS_SCRIPT_DIR=/workspace/ci/hooks
+ SRC_DIR=/workspace/kernel
+ TOOLS_SRC_DIR=/workspace/ci
+ '[' -n /workspace/kernel/build64-default ']'
+ BUILD_DIR=/workspace/kernel/build64-default
+ BUILD_DIR=/workspace/kernel/build64-default/build32
+ cd /workspace/kernel
+ mkdir -p /workspace/kernel/build64-default/build32
++ nproc
+ make -j48 ARCH=i386 O=/workspace/kernel/build64-default/build32 defconfig
make[1]: Entering directory '/workspace/kernel/build64-default/build32'
  GEN     Makefile
  HOSTCC  scripts/basic/fixdep
  HOSTCC  scripts/kconfig/conf.o
  HOSTCC  scripts/kconfig/confdata.o
  HOSTCC  scripts/kconfig/expr.o
  LEX     scripts/kconfig/lexer.lex.c
  YACC    scripts/kconfig/parser.tab.[ch]
  HOSTCC  scripts/kconfig/menu.o
  HOSTCC  scripts/kconfig/preprocess.o
  HOSTCC  scripts/kconfig/symbol.o
  HOSTCC  scripts/kconfig/util.o
  HOSTCC  scripts/kconfig/lexer.lex.o
  HOSTCC  scripts/kconfig/parser.tab.o
  HOSTLD  scripts/kconfig/conf
*** Default configuration is based on 'i386_defconfig'
#
# configuration written to .config
#
make[1]: Leaving directory '/workspace/kernel/build64-default/build32'
+ cd /workspace/kernel/build64-default/build32
+ /workspace/kernel/scripts/kconfig/merge_config.sh .config /workspace/ci/kernel/10-xe.fragment
Using .config as base
Merging /workspace/ci/kernel/10-xe.fragment
Value of CONFIG_DRM_XE is redefined by fragment /workspace/ci/kernel/10-xe.fragment:
Previous value: # CONFIG_DRM_XE is not set
New value: CONFIG_DRM_XE=m

Value of CONFIG_SND_DEBUG is redefined by fragment /workspace/ci/kernel/10-xe.fragment:
Previous value: # CONFIG_SND_DEBUG is not set
New value: CONFIG_SND_DEBUG=y

Value of CONFIG_SND_HDA_INTEL is redefined by fragment /workspace/ci/kernel/10-xe.fragment:
Previous value: CONFIG_SND_HDA_INTEL=y
New value: CONFIG_SND_HDA_INTEL=m

Value of CONFIG_SND_HDA_CODEC_HDMI is redefined by fragment /workspace/ci/kernel/10-xe.fragment:
Previous value: # CONFIG_SND_HDA_CODEC_HDMI is not set
New value: CONFIG_SND_HDA_CODEC_HDMI=m

  GEN     Makefile

WARNING: unmet direct dependencies detected for FB_IOMEM_HELPERS
  Depends on [n]: HAS_IOMEM [=y] && FB_CORE [=n]
  Selected by [m]:
  - DRM_XE_DISPLAY [=y] && HAS_IOMEM [=y] && DRM [=y] && DRM_XE [=m] && DRM_XE [=m]=m [=m]
#
# configuration written to .config
#
Value requested for CONFIG_HAVE_UID16 not in final .config
Requested value:  CONFIG_HAVE_UID16=y
Actual value:     

Value requested for CONFIG_UID16 not in final .config
Requested value:  CONFIG_UID16=y
Actual value:     

Value requested for CONFIG_X86_32 not in final .config
Requested value:  CONFIG_X86_32=y
Actual value:     

Value requested for CONFIG_OUTPUT_FORMAT not in final .config
Requested value:  CONFIG_OUTPUT_FORMAT="elf32-i386"
Actual value:     CONFIG_OUTPUT_FORMAT="elf64-x86-64"

Value requested for CONFIG_ARCH_MMAP_RND_BITS_MIN not in final .config
Requested value:  CONFIG_ARCH_MMAP_RND_BITS_MIN=8
Actual value:     CONFIG_ARCH_MMAP_RND_BITS_MIN=28

Value requested for CONFIG_ARCH_MMAP_RND_BITS_MAX not in final .config
Requested value:  CONFIG_ARCH_MMAP_RND_BITS_MAX=16
Actual value:     CONFIG_ARCH_MMAP_RND_BITS_MAX=32

Value requested for CONFIG_PGTABLE_LEVELS not in final .config
Requested value:  CONFIG_PGTABLE_LEVELS=2
Actual value:     CONFIG_PGTABLE_LEVELS=5

Value requested for CONFIG_X86_BIGSMP not in final .config
Requested value:  # CONFIG_X86_BIGSMP is not set
Actual value:     

Value requested for CONFIG_X86_INTEL_QUARK not in final .config
Requested value:  # CONFIG_X86_INTEL_QUARK is not set
Actual value:     

Value requested for CONFIG_X86_RDC321X not in final .config
Requested value:  # CONFIG_X86_RDC321X is not set
Actual value:     

Value requested for CONFIG_X86_32_NON_STANDARD not in final .config
Requested value:  # CONFIG_X86_32_NON_STANDARD is not set
Actual value:     

Value requested for CONFIG_X86_32_IRIS not in final .config
Requested value:  # CONFIG_X86_32_IRIS is not set
Actual value:     

Value requested for CONFIG_M486SX not in final .config
Requested value:  # CONFIG_M486SX is not set
Actual value:     

Value requested for CONFIG_M486 not in final .config
Requested value:  # CONFIG_M486 is not set
Actual value:     

Value requested for CONFIG_M586 not in final .config
Requested value:  # CONFIG_M586 is not set
Actual value:     

Value requested for CONFIG_M586TSC not in final .config
Requested value:  # CONFIG_M586TSC is not set
Actual value:     

Value requested for CONFIG_M586MMX not in final .config
Requested value:  # CONFIG_M586MMX is not set
Actual value:     

Value requested for CONFIG_M686 not in final .config
Requested value:  CONFIG_M686=y
Actual value:     

Value requested for CONFIG_MPENTIUMII not in final .config
Requested value:  # CONFIG_MPENTIUMII is not set
Actual value:     

Value requested for CONFIG_MPENTIUMIII not in final .config
Requested value:  # CONFIG_MPENTIUMIII is not set
Actual value:     

Value requested for CONFIG_MPENTIUMM not in final .config
Requested value:  # CONFIG_MPENTIUMM is not set
Actual value:     

Value requested for CONFIG_MPENTIUM4 not in final .config
Requested value:  # CONFIG_MPENTIUM4 is not set
Actual value:     

Value requested for CONFIG_MK6 not in final .config
Requested value:  # CONFIG_MK6 is not set
Actual value:     

Value requested for CONFIG_MK7 not in final .config
Requested value:  # CONFIG_MK7 is not set
Actual value:     

Value requested for CONFIG_MCRUSOE not in final .config
Requested value:  # CONFIG_MCRUSOE is not set
Actual value:     

Value requested for CONFIG_MEFFICEON not in final .config
Requested value:  # CONFIG_MEFFICEON is not set
Actual value:     

Value requested for CONFIG_MWINCHIPC6 not in final .config
Requested value:  # CONFIG_MWINCHIPC6 is not set
Actual value:     

Value requested for CONFIG_MWINCHIP3D not in final .config
Requested value:  # CONFIG_MWINCHIP3D is not set
Actual value:     

Value requested for CONFIG_MELAN not in final .config
Requested value:  # CONFIG_MELAN is not set
Actual value:     

Value requested for CONFIG_MGEODEGX1 not in final .config
Requested value:  # CONFIG_MGEODEGX1 is not set
Actual value:     

Value requested for CONFIG_MGEODE_LX not in final .config
Requested value:  # CONFIG_MGEODE_LX is not set
Actual value:     

Value requested for CONFIG_MCYRIXIII not in final .config
Requested value:  # CONFIG_MCYRIXIII is not set
Actual value:     

Value requested for CONFIG_MVIAC3_2 not in final .config
Requested value:  # CONFIG_MVIAC3_2 is not set
Actual value:     

Value requested for CONFIG_MVIAC7 not in final .config
Requested value:  # CONFIG_MVIAC7 is not set
Actual value:     

Value requested for CONFIG_X86_GENERIC not in final .config
Requested value:  # CONFIG_X86_GENERIC is not set
Actual value:     

Value requested for CONFIG_X86_INTERNODE_CACHE_SHIFT not in final .config
Requested value:  CONFIG_X86_INTERNODE_CACHE_SHIFT=5
Actual value:     CONFIG_X86_INTERNODE_CACHE_SHIFT=6

Value requested for CONFIG_X86_L1_CACHE_SHIFT not in final .config
Requested value:  CONFIG_X86_L1_CACHE_SHIFT=5
Actual value:     CONFIG_X86_L1_CACHE_SHIFT=6

Value requested for CONFIG_X86_USE_PPRO_CHECKSUM not in final .config
Requested value:  CONFIG_X86_USE_PPRO_CHECKSUM=y
Actual value:     

Value requested for CONFIG_X86_MINIMUM_CPU_FAMILY not in final .config
Requested value:  CONFIG_X86_MINIMUM_CPU_FAMILY=6
Actual value:     CONFIG_X86_MINIMUM_CPU_FAMILY=64

Value requested for CONFIG_CPU_SUP_TRANSMETA_32 not in final .config
Requested value:  CONFIG_CPU_SUP_TRANSMETA_32=y
Actual value:     

Value requested for CONFIG_CPU_SUP_VORTEX_32 not in final .config
Requested value:  CONFIG_CPU_SUP_VORTEX_32=y
Actual value:     

Value requested for CONFIG_HPET_TIMER not in final .config
Requested value:  # CONFIG_HPET_TIMER is not set
Actual value:     CONFIG_HPET_TIMER=y

Value requested for CONFIG_NR_CPUS_RANGE_END not in final .config
Requested value:  CONFIG_NR_CPUS_RANGE_END=8
Actual value:     CONFIG_NR_CPUS_RANGE_END=512

Value requested for CONFIG_NR_CPUS_DEFAULT not in final .config
Requested value:  CONFIG_NR_CPUS_DEFAULT=8
Actual value:     CONFIG_NR_CPUS_DEFAULT=64

Value requested for CONFIG_X86_ANCIENT_MCE not in final .config
Requested value:  # CONFIG_X86_ANCIENT_MCE is not set
Actual value:     

Value requested for CONFIG_X86_LEGACY_VM86 not in final .config
Requested value:  # CONFIG_X86_LEGACY_VM86 is not set
Actual value:     

Value requested for CONFIG_X86_ESPFIX32 not in final .config
Requested value:  CONFIG_X86_ESPFIX32=y
Actual value:     

Value requested for CONFIG_TOSHIBA not in final .config
Requested value:  # CONFIG_TOSHIBA is not set
Actual value:     

Value requested for CONFIG_X86_REBOOTFIXUPS not in final .config
Requested value:  # CONFIG_X86_REBOOTFIXUPS is not set
Actual value:     

Value requested for CONFIG_MICROCODE_INITRD32 not in final .config
Requested value:  CONFIG_MICROCODE_INITRD32=y
Actual value:     

Value requested for CONFIG_NOHIGHMEM not in final .config
Requested value:  # CONFIG_NOHIGHMEM is not set
Actual value:     

Value requested for CONFIG_HIGHMEM4G not in final .config
Requested value:  CONFIG_HIGHMEM4G=y
Actual value:     

Value requested for CONFIG_HIGHMEM64G not in final .config
Requested value:  # CONFIG_HIGHMEM64G is not set
Actual value:     

Value requested for CONFIG_VMSPLIT_3G not in final .config
Requested value:  CONFIG_VMSPLIT_3G=y
Actual value:     

Value requested for CONFIG_VMSPLIT_3G_OPT not in final .config
Requested value:  # CONFIG_VMSPLIT_3G_OPT is not set
Actual value:     

Value requested for CONFIG_VMSPLIT_2G not in final .config
Requested value:  # CONFIG_VMSPLIT_2G is not set
Actual value:     

Value requested for CONFIG_VMSPLIT_2G_OPT not in final .config
Requested value:  # CONFIG_VMSPLIT_2G_OPT is not set
Actual value:     

Value requested for CONFIG_VMSPLIT_1G not in final .config
Requested value:  # CONFIG_VMSPLIT_1G is not set
Actual value:     

Value requested for CONFIG_PAGE_OFFSET not in final .config
Requested value:  CONFIG_PAGE_OFFSET=0xC0000000
Actual value:     

Value requested for CONFIG_HIGHMEM not in final .config
Requested value:  CONFIG_HIGHMEM=y
Actual value:     

Value requested for CONFIG_X86_PAE not in final .config
Requested value:  # CONFIG_X86_PAE is not set
Actual value:     

Value requested for CONFIG_ARCH_FLATMEM_ENABLE not in final .config
Requested value:  CONFIG_ARCH_FLATMEM_ENABLE=y
Actual value:     

Value requested for CONFIG_ARCH_SELECT_MEMORY_MODEL not in final .config
Requested value:  CONFIG_ARCH_SELECT_MEMORY_MODEL=y
Actual value:     

Value requested for CONFIG_ILLEGAL_POINTER_VALUE not in final .config
Requested value:  CONFIG_ILLEGAL_POINTER_VALUE=0
Actual value:     CONFIG_ILLEGAL_POINTER_VALUE=0xdead000000000000

Value requested for CONFIG_HIGHPTE not in final .config
Requested value:  # CONFIG_HIGHPTE is not set
Actual value:     

Value requested for CONFIG_COMPAT_VDSO not in final .config
Requested value:  # CONFIG_COMPAT_VDSO is not set
Actual value:     

Value requested for CONFIG_FUNCTION_PADDING_CFI not in final .config
Requested value:  CONFIG_FUNCTION_PADDING_CFI=0
Actual value:     CONFIG_FUNCTION_PADDING_CFI=11

Value requested for CONFIG_FUNCTION_PADDING_BYTES not in final .config
Requested value:  CONFIG_FUNCTION_PADDING_BYTES=4
Actual value:     CONFIG_FUNCTION_PADDING_BYTES=16

Value requested for CONFIG_APM not in final .config
Requested value:  # CONFIG_APM is not set
Actual value:     

Value requested for CONFIG_X86_POWERNOW_K6 not in final .config
Requested value:  # CONFIG_X86_POWERNOW_K6 is not set
Actual value:     

Value requested for CONFIG_X86_POWERNOW_K7 not in final .config
Requested value:  # CONFIG_X86_POWERNOW_K7 is not set
Actual value:     

Value requested for CONFIG_X86_GX_SUSPMOD not in final .config
Requested value:  # CONFIG_X86_GX_SUSPMOD is not set
Actual value:     

Value requested for CONFIG_X86_SPEEDSTEP_ICH not in final .config
Requested value:  # CONFIG_X86_SPEEDSTEP_ICH is not set
Actual value:     

Value requested for CONFIG_X86_SPEEDSTEP_SMI not in final .config
Requested value:  # CONFIG_X86_SPEEDSTEP_SMI is not set
Actual value:     

Value requested for CONFIG_X86_CPUFREQ_NFORCE2 not in final .config
Requested value:  # CONFIG_X86_CPUFREQ_NFORCE2 is not set
Actual value:     

Value requested for CONFIG_X86_LONGRUN not in final .config
Requested value:  # CONFIG_X86_LONGRUN is not set
Actual value:     

Value requested for CONFIG_X86_LONGHAUL not in final .config
Requested value:  # CONFIG_X86_LONGHAUL is not set
Actual value:     

Value requested for CONFIG_X86_E_POWERSAVER not in final .config
Requested value:  # CONFIG_X86_E_POWERSAVER is not set
Actual value:     

Value requested for CONFIG_PCI_GOBIOS not in final .config
Requested value:  # CONFIG_PCI_GOBIOS is not set
Actual value:     

Value requested for CONFIG_PCI_GOMMCONFIG not in final .config
Requested value:  # CONFIG_PCI_GOMMCONFIG is not set
Actual value:     

Value requested for CONFIG_PCI_GODIRECT not in final .config
Requested value:  # CONFIG_PCI_GODIRECT is not set
Actual value:     

Value requested for CONFIG_PCI_GOANY not in final .config
Requested value:  CONFIG_PCI_GOANY=y
Actual value:     

Value requested for CONFIG_PCI_BIOS not in final .config
Requested value:  CONFIG_PCI_BIOS=y
Actual value:     

Value requested for CONFIG_ISA not in final .config
Requested value:  # CONFIG_ISA is not set
Actual value:     

Value requested for CONFIG_SCx200 not in final .config
Requested value:  # CONFIG_SCx200 is not set
Actual value:     

Value requested for CONFIG_OLPC not in final .config
Requested value:  # CONFIG_OLPC is not set
Actual value:     

Value requested for CONFIG_ALIX not in final .config
Requested value:  # CONFIG_ALIX is not set
Actual value:     

Value requested for CONFIG_NET5501 not in final .config
Requested value:  # CONFIG_NET5501 is not set
Actual value:     

Value requested for CONFIG_GEOS not in final .config
Requested value:  # CONFIG_GEOS is not set
Actual value:     

Value requested for CONFIG_COMPAT_32 not in final .config
Requested value:  CONFIG_COMPAT_32=y
Actual value:     

Value requested for CONFIG_HAVE_ATOMIC_IOMAP not in final .config
Requested value:  CONFIG_HAVE_ATOMIC_IOMAP=y
Actual value:     

Value requested for CONFIG_ARCH_32BIT_OFF_T not in final .config
Requested value:  CONFIG_ARCH_32BIT_OFF_T=y
Actual value:     

Value requested for CONFIG_ARCH_WANT_IPC_PARSE_VERSION not in final .config
Requested value:  CONFIG_ARCH_WANT_IPC_PARSE_VERSION=y
Actual value:     

Value requested for CONFIG_MODULES_USE_ELF_REL not in final .config
Requested value:  CONFIG_MODULES_USE_ELF_REL=y
Actual value:     

Value requested for CONFIG_ARCH_MMAP_RND_BITS not in final .config
Requested value:  CONFIG_ARCH_MMAP_RND_BITS=8
Actual value:     CONFIG_ARCH_MMAP_RND_BITS=28

Value requested for CONFIG_CLONE_BACKWARDS not in final .config
Requested value:  CONFIG_CLONE_BACKWARDS=y
Actual value:     

Value requested for CONFIG_OLD_SIGSUSPEND3 not in final .config
Requested value:  CONFIG_OLD_SIGSUSPEND3=y
Actual value:     

Value requested for CONFIG_OLD_SIGACTION not in final .config
Requested value:  CONFIG_OLD_SIGACTION=y
Actual value:     

Value requested for CONFIG_ARCH_SPLIT_ARG64 not in final .config
Requested value:  CONFIG_ARCH_SPLIT_ARG64=y
Actual value:     

Value requested for CONFIG_FUNCTION_ALIGNMENT not in final .config
Requested value:  CONFIG_FUNCTION_ALIGNMENT=4
Actual value:     CONFIG_FUNCTION_ALIGNMENT=16

Value requested for CONFIG_SELECT_MEMORY_MODEL not in final .config
Requested value:  CONFIG_SELECT_MEMORY_MODEL=y
Actual value:     

Value requested for CONFIG_FLATMEM_MANUAL not in final .config
Requested value:  CONFIG_FLATMEM_MANUAL=y
Actual value:     

Value requested for CONFIG_SPARSEMEM_MANUAL not in final .config
Requested value:  # CONFIG_SPARSEMEM_MANUAL is not set
Actual value:     

Value requested for CONFIG_FLATMEM not in final .config
Requested value:  CONFIG_FLATMEM=y
Actual value:     

Value requested for CONFIG_SPARSEMEM_STATIC not in final .config
Requested value:  CONFIG_SPARSEMEM_STATIC=y
Actual value:     

Value requested for CONFIG_BOUNCE not in final .config
Requested value:  CONFIG_BOUNCE=y
Actual value:     

Value requested for CONFIG_KMAP_LOCAL not in final .config
Requested value:  CONFIG_KMAP_LOCAL=y
Actual value:     

Value requested for CONFIG_HOTPLUG_PCI_COMPAQ not in final .config
Requested value:  # CONFIG_HOTPLUG_PCI_COMPAQ is not set
Actual value:     

Value requested for CONFIG_HOTPLUG_PCI_IBM not in final .config
Requested value:  # CONFIG_HOTPLUG_PCI_IBM is not set
Actual value:     

Value requested for CONFIG_EFI_CAPSULE_QUIRK_QUARK_CSH not in final .config
Requested value:  CONFIG_EFI_CAPSULE_QUIRK_QUARK_CSH=y
Actual value:     

Value requested for CONFIG_PCH_PHUB not in final .config
Requested value:  # CONFIG_PCH_PHUB is not set
Actual value:     

Value requested for CONFIG_SCSI_NSP32 not in final .config
Requested value:  # CONFIG_SCSI_NSP32 is not set
Actual value:     

Value requested for CONFIG_PATA_CS5520 not in final .config
Requested value:  # CONFIG_PATA_CS5520 is not set
Actual value:     

Value requested for CONFIG_PATA_CS5530 not in final .config
Requested value:  # CONFIG_PATA_CS5530 is not set
Actual value:     

Value requested for CONFIG_PATA_CS5535 not in final .config
Requested value:  # CONFIG_PATA_CS5535 is not set
Actual value:     

Value requested for CONFIG_PATA_CS5536 not in final .config
Requested value:  # CONFIG_PATA_CS5536 is not set
Actual value:     

Value requested for CONFIG_PATA_SC1200 not in final .config
Requested value:  # CONFIG_PATA_SC1200 is not set
Actual value:     

Value requested for CONFIG_PCH_GBE not in final .config
Requested value:  # CONFIG_PCH_GBE is not set
Actual value:     

Value requested for CONFIG_INPUT_WISTRON_BTNS not in final .config
Requested value:  # CONFIG_INPUT_WISTRON_BTNS is not set
Actual value:     

Value requested for CONFIG_SERIAL_TIMBERDALE not in final .config
Requested value:  # CONFIG_SERIAL_TIMBERDALE is not set
Actual value:     

Value requested for CONFIG_SERIAL_PCH_UART not in final .config
Requested value:  # CONFIG_SERIAL_PCH_UART is not set
Actual value:     

Value requested for CONFIG_HW_RANDOM_GEODE not in final .config
Requested value:  CONFIG_HW_RANDOM_GEODE=y
Actual value:     

Value requested for CONFIG_SONYPI not in final .config
Requested value:  # CONFIG_SONYPI is not set
Actual value:     

Value requested for CONFIG_PC8736x_GPIO not in final .config
Requested value:  # CONFIG_PC8736x_GPIO is not set
Actual value:     

Value requested for CONFIG_NSC_GPIO not in final .config
Requested value:  # CONFIG_NSC_GPIO is not set
Actual value:     

Value requested for CONFIG_I2C_EG20T not in final .config
Requested value:  # CONFIG_I2C_EG20T is not set
Actual value:     

Value requested for CONFIG_SCx200_ACB not in final .config
Requested value:  # CONFIG_SCx200_ACB is not set
Actual value:     

Value requested for CONFIG_PTP_1588_CLOCK_PCH not in final .config
Requested value:  # CONFIG_PTP_1588_CLOCK_PCH is not set
Actual value:     

Value requested for CONFIG_SBC8360_WDT not in final .config
Requested value:  # CONFIG_SBC8360_WDT is not set
Actual value:     

Value requested for CONFIG_SBC7240_WDT not in final .config
Requested value:  # CONFIG_SBC7240_WDT is not set
Actual value:     

Value requested for CONFIG_MFD_CS5535 not in final .config
Requested value:  # CONFIG_MFD_CS5535 is not set
Actual value:     

Value requested for CONFIG_AGP_ALI not in final .config
Requested value:  # CONFIG_AGP_ALI is not set
Actual value:     

Value requested for CONFIG_AGP_ATI not in final .config
Requested value:  # CONFIG_AGP_ATI is not set
Actual value:     

Value requested for CONFIG_AGP_AMD not in final .config
Requested value:  # CONFIG_AGP_AMD is not set
Actual value:     

Value requested for CONFIG_AGP_NVIDIA not in final .config
Requested value:  # CONFIG_AGP_NVIDIA is not set
Actual value:     

Value requested for CONFIG_AGP_SWORKS not in final .config
Requested value:  # CONFIG_AGP_SWORKS is not set
Actual value:     

Value requested for CONFIG_AGP_EFFICEON not in final .config
Requested value:  # CONFIG_AGP_EFFICEON is not set
Actual value:     

Value requested for CONFIG_SND_PCM not in final .config
Requested value:  CONFIG_SND_PCM=y
Actual value:     CONFIG_SND_PCM=m

Value requested for CONFIG_SND_HWDEP not in final .config
Requested value:  CONFIG_SND_HWDEP=y
Actual value:     CONFIG_SND_HWDEP=m

Value requested for CONFIG_SND_DYNAMIC_MINORS not in final .config
Requested value:  # CONFIG_SND_DYNAMIC_MINORS is not set
Actual value:     CONFIG_SND_DYNAMIC_MINORS=y

Value requested for CONFIG_SND_CS5530 not in final .config
Requested value:  # CONFIG_SND_CS5530 is not set
Actual value:     

Value requested for CONFIG_SND_CS5535AUDIO not in final .config
Requested value:  # CONFIG_SND_CS5535AUDIO is not set
Actual value:     

Value requested for CONFIG_SND_SIS7019 not in final .config
Requested value:  # CONFIG_SND_SIS7019 is not set
Actual value:     

Value requested for CONFIG_SND_HDA not in final .config
Requested value:  CONFIG_SND_HDA=y
Actual value:     CONFIG_SND_HDA=m

Value requested for CONFIG_SND_HDA_CORE not in final .config
Requested value:  CONFIG_SND_HDA_CORE=y
Actual value:     CONFIG_SND_HDA_CORE=m

Value requested for CONFIG_SND_INTEL_DSP_CONFIG not in final .config
Requested value:  CONFIG_SND_INTEL_DSP_CONFIG=y
Actual value:     CONFIG_SND_INTEL_DSP_CONFIG=m

Value requested for CONFIG_SND_INTEL_SOUNDWIRE_ACPI not in final .config
Requested value:  CONFIG_SND_INTEL_SOUNDWIRE_ACPI=y
Actual value:     CONFIG_SND_INTEL_SOUNDWIRE_ACPI=m

Value requested for CONFIG_LEDS_OT200 not in final .config
Requested value:  # CONFIG_LEDS_OT200 is not set
Actual value:     

Value requested for CONFIG_PCH_DMA not in final .config
Requested value:  # CONFIG_PCH_DMA is not set
Actual value:     

Value requested for CONFIG_CLKSRC_I8253 not in final .config
Requested value:  CONFIG_CLKSRC_I8253=y
Actual value:     

Value requested for CONFIG_MAILBOX not in final .config
Requested value:  # CONFIG_MAILBOX is not set
Actual value:     CONFIG_MAILBOX=y

Value requested for CONFIG_CRYPTO_SERPENT_SSE2_586 not in final .config
Requested value:  # CONFIG_CRYPTO_SERPENT_SSE2_586 is not set
Actual value:     

Value requested for CONFIG_CRYPTO_TWOFISH_586 not in final .config
Requested value:  # CONFIG_CRYPTO_TWOFISH_586 is not set
Actual value:     

Value requested for CONFIG_CRYPTO_DEV_GEODE not in final .config
Requested value:  # CONFIG_CRYPTO_DEV_GEODE is not set
Actual value:     

Value requested for CONFIG_CRYPTO_DEV_HIFN_795X not in final .config
Requested value:  # CONFIG_CRYPTO_DEV_HIFN_795X is not set
Actual value:     

Value requested for CONFIG_CRYPTO_LIB_POLY1305_RSIZE not in final .config
Requested value:  CONFIG_CRYPTO_LIB_POLY1305_RSIZE=1
Actual value:     CONFIG_CRYPTO_LIB_POLY1305_RSIZE=11

Value requested for CONFIG_AUDIT_GENERIC not in final .config
Requested value:  CONFIG_AUDIT_GENERIC=y
Actual value:     

Value requested for CONFIG_GENERIC_VDSO_32 not in final .config
Requested value:  CONFIG_GENERIC_VDSO_32=y
Actual value:     

Value requested for CONFIG_DEBUG_KMAP_LOCAL not in final .config
Requested value:  # CONFIG_DEBUG_KMAP_LOCAL is not set
Actual value:     

Value requested for CONFIG_DEBUG_HIGHMEM not in final .config
Requested value:  # CONFIG_DEBUG_HIGHMEM is not set
Actual value:     

Value requested for CONFIG_HAVE_DEBUG_STACKOVERFLOW not in final .config
Requested value:  CONFIG_HAVE_DEBUG_STACKOVERFLOW=y
Actual value:     

Value requested for CONFIG_DEBUG_STACKOVERFLOW not in final .config
Requested value:  # CONFIG_DEBUG_STACKOVERFLOW is not set
Actual value:     

Value requested for CONFIG_HAVE_FUNCTION_GRAPH_TRACER not in final .config
Requested value:  CONFIG_HAVE_FUNCTION_GRAPH_TRACER=y
Actual value:     

Value requested for CONFIG_HAVE_FUNCTION_GRAPH_RETVAL not in final .config
Requested value:  CONFIG_HAVE_FUNCTION_GRAPH_RETVAL=y
Actual value:     

Value requested for CONFIG_DRM_KUNIT_TEST not in final .config
Requested value:  CONFIG_DRM_KUNIT_TEST=m
Actual value:     

Value requested for CONFIG_DRM_XE_WERROR not in final .config
Requested value:  CONFIG_DRM_XE_WERROR=y
Actual value:     

Value requested for CONFIG_DRM_XE_DEBUG not in final .config
Requested value:  CONFIG_DRM_XE_DEBUG=y
Actual value:     

Value requested for CONFIG_DRM_XE_DEBUG_MEM not in final .config
Requested value:  CONFIG_DRM_XE_DEBUG_MEM=y
Actual value:     

Value requested for CONFIG_DRM_XE_KUNIT_TEST not in final .config
Requested value:  CONFIG_DRM_XE_KUNIT_TEST=m
Actual value:     

++ nproc
+ make -j48 ARCH=i386 olddefconfig
  GEN     Makefile

WARNING: unmet direct dependencies detected for FB_IOMEM_HELPERS
  Depends on [n]: HAS_IOMEM [=y] && FB_CORE [=n]
  Selected by [m]:
  - DRM_XE_DISPLAY [=y] && HAS_IOMEM [=y] && DRM [=y] && DRM_XE [=m] && DRM_XE [=m]=m [=m]
#
# configuration written to .config
#
++ nproc
+ make -j48 ARCH=i386
  SYNC    include/config/auto.conf.cmd
  GEN     Makefile

WARNING: unmet direct dependencies detected for FB_IOMEM_HELPERS
  Depends on [n]: HAS_IOMEM [=y] && FB_CORE [=n]
  Selected by [m]:
  - DRM_XE_DISPLAY [=y] && HAS_IOMEM [=y] && DRM [=y] && DRM_XE [=m] && DRM_XE [=m]=m [=m]

WARNING: unmet direct dependencies detected for FB_IOMEM_HELPERS
  Depends on [n]: HAS_IOMEM [=y] && FB_CORE [=n]
  Selected by [m]:
  - DRM_XE_DISPLAY [=y] && HAS_IOMEM [=y] && DRM [=y] && DRM_XE [=m] && DRM_XE [=m]=m [=m]

WARNING: unmet direct dependencies detected for FB_IOMEM_HELPERS
  Depends on [n]: HAS_IOMEM [=y] && FB_CORE [=n]
  Selected by [m]:
  - DRM_XE_DISPLAY [=y] && HAS_IOMEM [=y] && DRM [=y] && DRM_XE [=m] && DRM_XE [=m]=m [=m]
  GEN     Makefile
  WRAP    arch/x86/include/generated/uapi/asm/bpf_perf_event.h
  WRAP    arch/x86/include/generated/uapi/asm/errno.h
  WRAP    arch/x86/include/generated/uapi/asm/fcntl.h
  WRAP    arch/x86/include/generated/uapi/asm/ioctl.h
  WRAP    arch/x86/include/generated/uapi/asm/ipcbuf.h
  WRAP    arch/x86/include/generated/uapi/asm/ioctls.h
  WRAP    arch/x86/include/generated/uapi/asm/param.h
  UPD     include/generated/uapi/linux/version.h
  SYSHDR  arch/x86/include/generated/uapi/asm/unistd_32.h
  SYSHDR  arch/x86/include/generated/uapi/asm/unistd_64.h
  WRAP    arch/x86/include/generated/uapi/asm/poll.h
  SYSHDR  arch/x86/include/generated/uapi/asm/unistd_x32.h
  WRAP    arch/x86/include/generated/uapi/asm/resource.h
  SYSTBL  arch/x86/include/generated/asm/syscalls_32.h
  WRAP    arch/x86/include/generated/uapi/asm/socket.h
  WRAP    arch/x86/include/generated/uapi/asm/sockios.h
  WRAP    arch/x86/include/generated/uapi/asm/termbits.h
  WRAP    arch/x86/include/generated/uapi/asm/termios.h
  WRAP    arch/x86/include/generated/uapi/asm/types.h
  UPD     include/generated/compile.h
  HOSTCC  arch/x86/tools/relocs_32.o
  HOSTCC  arch/x86/tools/relocs_64.o
  HOSTCC  arch/x86/tools/relocs_common.o
  WRAP    arch/x86/include/generated/asm/early_ioremap.h
  WRAP    arch/x86/include/generated/asm/mcs_spinlock.h
  WRAP    arch/x86/include/generated/asm/mmzone.h
  WRAP    arch/x86/include/generated/asm/irq_regs.h
  WRAP    arch/x86/include/generated/asm/kmap_size.h
  WRAP    arch/x86/include/generated/asm/local64.h
  WRAP    arch/x86/include/generated/asm/mmiowb.h
  WRAP    arch/x86/include/generated/asm/module.lds.h
  WRAP    arch/x86/include/generated/asm/rwonce.h
  HOSTCC  scripts/kallsyms
  HOSTCC  scripts/sorttable
  HOSTCC  scripts/asn1_compiler
  HOSTCC  scripts/selinux/genheaders/genheaders
  HOSTCC  scripts/selinux/mdp/mdp
  HOSTLD  arch/x86/tools/relocs
  UPD     include/config/kernel.release
  UPD     include/generated/utsrelease.h
  CC      scripts/mod/empty.o
  HOSTCC  scripts/mod/mk_elfconfig
  CC      scripts/mod/devicetable-offsets.s
  UPD     scripts/mod/devicetable-offsets.h
  MKELF   scripts/mod/elfconfig.h
  HOSTCC  scripts/mod/modpost.o
  HOSTCC  scripts/mod/file2alias.o
  HOSTCC  scripts/mod/sumversion.o
  HOSTCC  scripts/mod/symsearch.o
  HOSTLD  scripts/mod/modpost
  CC      kernel/bounds.s
  CHKSHA1 /workspace/kernel/include/linux/atomic/atomic-arch-fallback.h
  CHKSHA1 /workspace/kernel/include/linux/atomic/atomic-instrumented.h
  CHKSHA1 /workspace/kernel/include/linux/atomic/atomic-long.h
  UPD     include/generated/timeconst.h
  UPD     include/generated/bounds.h
  CC      arch/x86/kernel/asm-offsets.s
  UPD     include/generated/asm-offsets.h
  CALL    /workspace/kernel/scripts/checksyscalls.sh
  LDS     scripts/module.lds
  HOSTCC  usr/gen_init_cpio
  CC      init/main.o
  CC      init/do_mounts.o
  CC      certs/system_keyring.o
  CC      init/do_mounts_initrd.o
  UPD     init/utsversion-tmp.h
  CC      ipc/util.o
  CC      init/initramfs.o
  CC      ipc/msgutil.o
  CC      init/calibrate.o
  CC      security/commoncap.o
  CC      mm/filemap.o
  AS      arch/x86/lib/atomic64_cx8_32.o
  CC      security/lsm_syscalls.o
  CC      ipc/msg.o
  CC      io_uring/io_uring.o
  CC      init/init_task.o
  CC      mm/mempool.o
  CC      block/bdev.o
  CC      security/min_addr.o
  AS      arch/x86/lib/checksum_32.o
  CC      mm/oom_kill.o
  CC      arch/x86/lib/cmdline.o
  CC      io_uring/opdef.o
  CC      ipc/sem.o
  CC      arch/x86/video/video-common.o
  CC      security/keys/gc.o
  GEN     security/selinux/flask.h security/selinux/av_permissions.h
  CC      arch/x86/realmode/init.o
  AR      arch/x86/net/built-in.a
  CC      security/integrity/iint.o
  CC      arch/x86/pci/i386.o
  CC      block/partitions/core.o
  CC      arch/x86/power/cpu.o
  AR      arch/x86/crypto/built-in.a
  AR      virt/lib/built-in.a
  CC      security/selinux/avc.o
  AR      arch/x86/platform/atom/built-in.a
  CC      lib/math/div64.o
  AR      arch/x86/virt/svm/built-in.a
  CC      net/core/sock.o
  CC      arch/x86/mm/pat/set_memory.o
  AR      drivers/cache/built-in.a
  CC      fs/notify/dnotify/dnotify.o
  CC      arch/x86/events/amd/core.o
  AR      virt/built-in.a
  CC      arch/x86/kernel/fpu/init.o
  CC      arch/x86/events/amd/lbr.o
  CC      sound/core/seq/seq.o
  CC      arch/x86/events/intel/core.o
  AR      arch/x86/virt/vmx/built-in.a
  AR      arch/x86/platform/ce4100/built-in.a
  CC      lib/math/gcd.o
  AR      drivers/irqchip/built-in.a
  AR      arch/x86/virt/built-in.a
  CC      arch/x86/entry/vdso/vma.o
  CC      ipc/shm.o
  AR      sound/i2c/other/built-in.a
  CC      arch/x86/events/amd/ibs.o
  AR      sound/i2c/built-in.a
  CC      arch/x86/platform/efi/memmap.o
  AR      drivers/bus/mhi/built-in.a
  AR      drivers/bus/built-in.a
  CC      arch/x86/events/amd/uncore.o
  CC      kernel/sched/core.o
  AR      drivers/pwm/built-in.a
  CC      crypto/asymmetric_keys/asymmetric_type.o
  AR      drivers/leds/trigger/built-in.a
  AR      drivers/leds/blink/built-in.a
  AR      drivers/leds/simple/built-in.a
  HOSTCC  certs/extract-cert
  AS      arch/x86/lib/cmpxchg8b_emu.o
  CC      drivers/leds/led-core.o
  CC      arch/x86/lib/cpu.o
  CC      lib/math/lcm.o
  CC      lib/math/int_log.o
  CC      lib/math/int_pow.o
  AR      sound/drivers/opl3/built-in.a
  AR      sound/drivers/opl4/built-in.a
  AR      sound/drivers/mpu401/built-in.a
  GEN     usr/initramfs_data.cpio
  AR      sound/drivers/vx/built-in.a
  COPY    usr/initramfs_inc_data
  AS      usr/initramfs_data.o
  AR      sound/drivers/pcsp/built-in.a
  CC      arch/x86/pci/init.o
  AR      sound/drivers/built-in.a
  AR      usr/built-in.a
  CC      lib/math/int_sqrt.o
  CC      arch/x86/kernel/fpu/bugs.o
  CC      arch/x86/pci/pcbios.o
  CC      net/ethernet/eth.o
  CERT    certs/x509_certificate_list
  CERT    certs/signing_key.x509
  AS      certs/system_certificates.o
  AR      certs/built-in.a
  AR      net/802/built-in.a
  CC      lib/math/reciprocal_div.o
  CC      kernel/locking/mutex.o
  CC      arch/x86/kernel/fpu/core.o
  CC      arch/x86/lib/delay.o
  CC      lib/math/rational.o
  CC      sound/core/seq/seq_lock.o
  AS      arch/x86/realmode/rm/header.o
  AR      arch/x86/video/built-in.a
  AS      arch/x86/realmode/rm/trampoline_32.o
  AS      arch/x86/realmode/rm/stack.o
  AS      arch/x86/realmode/rm/reboot.o
  CC      fs/nfs_common/nfsacl.o
  CC      arch/x86/events/intel/bts.o
  AS      arch/x86/realmode/rm/wakeup_asm.o
  CC      arch/x86/realmode/rm/wakemain.o
  CC      fs/iomap/trace.o
  CC      sound/core/seq/seq_clientmgr.o
  CC      drivers/leds/led-class.o
  CC      arch/x86/kernel/fpu/regset.o
  CC      block/partitions/msdos.o
  CC      block/partitions/efi.o
  CC      kernel/power/qos.o
  CC      security/integrity/integrity_audit.o
  CC      arch/x86/realmode/rm/video-mode.o
  CC      security/keys/key.o
  CC      crypto/asymmetric_keys/restrict.o
  CC      arch/x86/entry/vdso/extable.o
  AR      fs/notify/dnotify/built-in.a
  CC      fs/notify/inotify/inotify_fsnotify.o
  CC      arch/x86/platform/efi/quirks.o
  AS      arch/x86/lib/getuser.o
  CC      arch/x86/power/hibernate_32.o
  CC      security/keys/keyring.o
  GEN     arch/x86/lib/inat-tables.c
  CC      security/keys/keyctl.o
  CC      arch/x86/lib/insn-eval.o
  AS      arch/x86/realmode/rm/copy.o
  CC      arch/x86/events/zhaoxin/core.o
  AS      arch/x86/realmode/rm/bioscall.o
  AR      lib/math/built-in.a
  CC      arch/x86/realmode/rm/regs.o
  CC      arch/x86/mm/pat/memtype.o
  CC      lib/crypto/mpi/generic_mpih-lshift.o
  CC      arch/x86/realmode/rm/video-vga.o
  CC      arch/x86/kernel/cpu/mce/core.o
  CC      arch/x86/kernel/acpi/boot.o
  CC      kernel/printk/printk.o
  CC      arch/x86/pci/mmconfig_32.o
  CC      arch/x86/realmode/rm/video-vesa.o
  CC      kernel/irq/irqdesc.o
  CC      arch/x86/realmode/rm/video-bios.o
  CC      kernel/rcu/update.o
  CC      fs/notify/inotify/inotify_user.o
  CC      arch/x86/events/core.o
  CC      init/version.o
  AR      arch/x86/events/amd/built-in.a
  PASYMS  arch/x86/realmode/rm/pasyms.h
  AS      arch/x86/entry/entry.o
  AR      arch/x86/entry/vsyscall/built-in.a
  CC      kernel/rcu/sync.o
  CC      drivers/leds/led-triggers.o
  CC      crypto/asymmetric_keys/signature.o
  LDS     arch/x86/realmode/rm/realmode.lds
  CC      security/selinux/hooks.o
  LD      arch/x86/realmode/rm/realmode.elf
  CC      arch/x86/pci/direct.o
  RELOCS  arch/x86/realmode/rm/realmode.relocs
  OBJCOPY arch/x86/realmode/rm/realmode.bin
  AS      arch/x86/realmode/rmpiggy.o
  AR      arch/x86/realmode/built-in.a
  CC      arch/x86/kernel/fpu/signal.o
  CC      io_uring/kbuf.o
  CC      fs/nfs_common/grace.o
  CC      fs/nfs_common/common.o
  CC      drivers/pci/msi/pcidev_msi.o
  CC      security/security.o
  AR      init/built-in.a
  AR      kernel/livepatch/built-in.a
  AR      security/integrity/built-in.a
  CC      lib/crypto/mpi/generic_mpih-mul1.o
  AS      arch/x86/power/hibernate_asm_32.o
  CC      arch/x86/power/hibernate.o
  CC      arch/x86/kernel/cpu/mce/severity.o
  CC      io_uring/rsrc.o
  CC      arch/x86/events/probe.o
  LDS     arch/x86/entry/vdso/vdso32/vdso32.lds
  CC      arch/x86/mm/pat/memtype_interval.o
  AS      arch/x86/entry/vdso/vdso32/note.o
  AS      arch/x86/entry/vdso/vdso32/system_call.o
  CC      kernel/printk/printk_safe.o
  CC      arch/x86/mm/init.o
  CC      ipc/syscall.o
  AS      arch/x86/entry/vdso/vdso32/sigreturn.o
  AR      block/partitions/built-in.a
  CC      arch/x86/entry/vdso/vdso32/vclock_gettime.o
  CC      block/fops.o
  CC      kernel/locking/semaphore.o
  CC      security/lsm_audit.o
  CC      kernel/irq/handle.o
  CC      arch/x86/lib/insn.o
  CC      drivers/pci/msi/api.o
  CC      arch/x86/platform/efi/efi.o
  CC      crypto/asymmetric_keys/public_key.o
  AR      arch/x86/events/zhaoxin/built-in.a
  ASN.1   crypto/asymmetric_keys/x509.asn1.[ch]
  CC      net/core/request_sock.o
  CC      kernel/dma/mapping.o
  AR      net/ethernet/built-in.a
  AR      fs/notify/fanotify/built-in.a
  CC      drivers/pci/msi/msi.o
  CC      arch/x86/platform/efi/efi_32.o
  CC      arch/x86/events/intel/ds.o
  CC      fs/iomap/iter.o
  CC      drivers/video/console/dummycon.o
  CC      arch/x86/pci/mmconfig-shared.o
  CC      kernel/power/main.o
  CC      sound/core/seq/seq_memory.o
  CC      arch/x86/pci/fixup.o
  AR      drivers/idle/built-in.a
  CC      arch/x86/lib/kaslr.o
  CC      fs/iomap/buffered-io.o
  AR      drivers/leds/built-in.a
  CC      lib/crypto/mpi/generic_mpih-mul2.o
  CC      arch/x86/lib/memcpy_32.o
  CC      security/keys/permission.o
  CC      kernel/irq/manage.o
  AR      fs/notify/inotify/built-in.a
  AR      fs/nfs_common/built-in.a
  CC      kernel/power/console.o
  CC      fs/notify/fsnotify.o
  CC      arch/x86/entry/vdso/vdso32/vgetcpu.o
  CC      arch/x86/kernel/acpi/sleep.o
  CC      io_uring/notif.o
  AR      arch/x86/power/built-in.a
  CC      drivers/pci/msi/irqdomain.o
  CC      lib/zlib_inflate/inffast.o
  CC      arch/x86/kernel/fpu/xstate.o
  AR      arch/x86/mm/pat/built-in.a
  CC      mm/fadvise.o
  CC      mm/maccess.o
  AS      arch/x86/lib/memmove_32.o
  CC      arch/x86/lib/misc.o
  HOSTCC  arch/x86/entry/vdso/vdso2c
  CC      arch/x86/lib/pc-conf-reg.o
  CC      ipc/ipc_sysctl.o
  CC      kernel/locking/rwsem.o
  CC      lib/crypto/memneq.o
  CC      lib/zlib_inflate/inflate.o
  ASN.1   crypto/asymmetric_keys/x509_akid.asn1.[ch]
  CC      crypto/asymmetric_keys/x509_loader.o
  AS      arch/x86/lib/putuser.o
  CC      lib/crypto/utils.o
  CC      drivers/video/console/vgacon.o
  AS      arch/x86/lib/retpoline.o
  CC      mm/page-writeback.o
  CC      arch/x86/lib/string_32.o
  CC      arch/x86/mm/init_32.o
  CC      arch/x86/lib/strstr_32.o
  CC      lib/zlib_inflate/infutil.o
  CC      lib/crypto/mpi/generic_mpih-mul3.o
  CC      arch/x86/lib/usercopy.o
  CC      crypto/asymmetric_keys/x509_public_key.o
  CC      lib/zlib_inflate/inftrees.o
  CC      block/bio.o
  AS      arch/x86/platform/efi/efi_stub_32.o
  CC      arch/x86/mm/fault.o
  CC      arch/x86/entry/vdso/vdso32-setup.o
  CC      arch/x86/platform/efi/runtime-map.o
  CC      arch/x86/events/intel/knc.o
  CC      security/keys/process_keys.o
  CC      arch/x86/kernel/cpu/mce/genpool.o
  CC      kernel/sched/fair.o
  CC      arch/x86/events/utils.o
  CC      sound/core/seq/seq_queue.o
  CC      sound/core/seq/seq_fifo.o
  CC      ipc/mqueue.o
  CC      kernel/rcu/srcutree.o
  AS      arch/x86/kernel/acpi/wakeup_32.o
  CC      arch/x86/kernel/acpi/cstate.o
  CC      lib/zlib_deflate/deflate.o
  CC      sound/core/sound.o
  CC      lib/lzo/lzo1x_compress.o
  CC      arch/x86/pci/acpi.o
  CC      arch/x86/lib/usercopy_32.o
  ASN.1   crypto/asymmetric_keys/pkcs7.asn1.[ch]
  CC      arch/x86/events/intel/lbr.o
  CC      fs/notify/notification.o
  CC      kernel/locking/percpu-rwsem.o
  AR      drivers/pci/msi/built-in.a
  CC      kernel/power/process.o
  CC      drivers/video/backlight/backlight.o
  CC      drivers/pci/pcie/portdrv.o
  VDSO    arch/x86/entry/vdso/vdso32.so.dbg
  CC      lib/zlib_inflate/inflate_syms.o
  OBJCOPY arch/x86/entry/vdso/vdso32.so
  VDSO2C  arch/x86/entry/vdso/vdso-image-32.c
  CC      net/core/skbuff.o
  CC      arch/x86/entry/vdso/vdso-image-32.o
  CC      net/core/datagram.o
  CC      sound/core/seq/seq_prioq.o
  CC      io_uring/tctx.o
  CC      security/selinux/selinuxfs.o
  CC      lib/crypto/mpi/generic_mpih-rshift.o
  CC      crypto/asymmetric_keys/pkcs7_trust.o
  CC      arch/x86/lib/msr-smp.o
  CC      kernel/printk/nbcon.o
  AR      arch/x86/entry/vdso/built-in.a
  AS      arch/x86/entry/entry_32.o
  AR      sound/isa/ad1816a/built-in.a
  CC      arch/x86/entry/syscall_32.o
  AR      sound/isa/ad1848/built-in.a
  AR      sound/isa/cs423x/built-in.a
  AR      lib/zlib_inflate/built-in.a
  CC      arch/x86/events/intel/p4.o
  AR      sound/isa/es1688/built-in.a
  AR      sound/isa/galaxy/built-in.a
  AR      sound/isa/gus/built-in.a
  AR      sound/isa/msnd/built-in.a
  CC      lib/lzo/lzo1x_decompress_safe.o
  AR      arch/x86/kernel/fpu/built-in.a
  AR      sound/isa/opti9xx/built-in.a
  CC      security/device_cgroup.o
  AR      sound/isa/sb/built-in.a
  CC      mm/folio-compat.o
  AR      sound/isa/wavefront/built-in.a
  AR      arch/x86/kernel/acpi/built-in.a
  CC      arch/x86/kernel/cpu/mce/intel.o
  AR      drivers/pci/pwrctl/built-in.a
  CC      crypto/api.o
  AR      sound/isa/wss/built-in.a
  AR      arch/x86/platform/efi/built-in.a
  CC      net/core/stream.o
  AR      sound/isa/built-in.a
  CC      kernel/irq/spurious.o
  AR      arch/x86/platform/geode/built-in.a
  CC      arch/x86/events/intel/p6.o
  AR      arch/x86/platform/iris/built-in.a
  CC      kernel/entry/common.o
  CC      arch/x86/platform/intel/iosf_mbi.o
  CC      arch/x86/lib/cache-smp.o
  CC      arch/x86/lib/msr.o
  CC      kernel/rcu/tree.o
  CC      crypto/asymmetric_keys/pkcs7_verify.o
  AR      drivers/video/console/built-in.a
  CC      kernel/locking/spinlock.o
  CC      arch/x86/kernel/cpu/mtrr/mtrr.o
  CC      fs/notify/group.o
  CC      arch/x86/kernel/cpu/mtrr/if.o
  CC      kernel/dma/direct.o
  CC      security/keys/request_key.o
  CC      security/selinux/netlink.o
  CC      arch/x86/pci/legacy.o
  CC      kernel/locking/osq_lock.o
  CC      sound/core/seq/seq_timer.o
  CC      lib/zlib_deflate/deftree.o
  CC      lib/crypto/mpi/generic_mpih-sub1.o
  CC      drivers/pci/pcie/rcec.o
  CC      lib/crypto/chacha.o
  AR      drivers/video/backlight/built-in.a
  CC      drivers/video/aperture.o
  AR      drivers/video/fbdev/core/built-in.a
  AR      drivers/video/fbdev/omap/built-in.a
  AR      drivers/video/fbdev/omap2/omapfb/dss/built-in.a
  AR      lib/lzo/built-in.a
  CC      net/core/scm.o
  AR      drivers/video/fbdev/omap2/omapfb/displays/built-in.a
  CC      kernel/dma/ops_helpers.o
  AR      drivers/video/fbdev/omap2/omapfb/built-in.a
  AR      drivers/video/fbdev/omap2/built-in.a
  AR      drivers/video/fbdev/built-in.a
  CC      fs/quota/dquot.o
  CC      kernel/locking/qspinlock.o
  CC      kernel/module/main.o
  CC      arch/x86/mm/ioremap.o
  CC      crypto/asymmetric_keys/x509.asn1.o
  CC      fs/iomap/direct-io.o
  CC      crypto/asymmetric_keys/x509_akid.asn1.o
  CC      crypto/asymmetric_keys/x509_cert_parser.o
  CC      kernel/irq/resend.o
  CC      kernel/time/time.o
  CC      arch/x86/kernel/cpu/mce/amd.o
  CC      kernel/printk/printk_ringbuffer.o
  CC      io_uring/filetable.o
  CC      kernel/power/suspend.o
  CC      security/keys/request_key_auth.o
  CC      lib/crypto/mpi/generic_mpih-add1.o
  CC      arch/x86/entry/common.o
  AR      arch/x86/platform/intel/built-in.a
  CC      crypto/cipher.o
  AR      arch/x86/platform/intel-mid/built-in.a
  AR      arch/x86/platform/intel-quark/built-in.a
  CC      arch/x86/pci/irq.o
  CC      drivers/video/cmdline.o
  AR      arch/x86/platform/olpc/built-in.a
  CC      drivers/video/nomodeset.o
  CC      lib/zlib_deflate/deflate_syms.o
  AR      arch/x86/platform/scx200/built-in.a
  CC      kernel/module/strict_rwx.o
  AR      arch/x86/platform/ts5500/built-in.a
  AR      arch/x86/platform/uv/built-in.a
  AS      arch/x86/lib/msr-reg.o
  AR      arch/x86/platform/built-in.a
  CC      fs/notify/mark.o
  CC      kernel/dma/remap.o
  CC      kernel/power/hibernate.o
  CC      arch/x86/lib/msr-reg-export.o
  CC      arch/x86/kernel/cpu/mtrr/generic.o
  CC      block/elevator.o
  CC      arch/x86/events/intel/pt.o
  CC      ipc/namespace.o
  CC      kernel/locking/rtmutex_api.o
  CC      sound/core/seq/seq_system.o
  CC      mm/readahead.o
  CC      drivers/pci/pcie/aspm.o
  CC      ipc/mq_sysctl.o
  CC      kernel/irq/chip.o
  CC      block/blk-core.o
  CC      lib/lz4/lz4_decompress.o
  AS      arch/x86/lib/hweight.o
  CC      arch/x86/events/intel/uncore.o
  CC      arch/x86/lib/iomem.o
  CC      kernel/entry/syscall_user_dispatch.o
  CC      security/selinux/nlmsgtab.o
  CC      block/blk-sysfs.o
  CC      crypto/asymmetric_keys/pkcs7.asn1.o
  CC      crypto/asymmetric_keys/pkcs7_parser.o
  CC      kernel/power/snapshot.o
  AR      lib/zlib_deflate/built-in.a
  CC      net/sched/sch_generic.o
  CC      kernel/time/timer.o
  CC      arch/x86/kernel/cpu/mtrr/cleanup.o
  CC      lib/crypto/mpi/mpicoder.o
  CC      security/keys/user_defined.o
  CC      arch/x86/mm/extable.o
  CC      kernel/printk/sysctl.o
  CC      drivers/video/hdmi.o
  CC      io_uring/rw.o
  CC      fs/iomap/fiemap.o
  CC      mm/swap.o
  AR      kernel/dma/built-in.a
  CC      arch/x86/kernel/apic/apic.o
  CC      arch/x86/kernel/kprobes/core.o
  CC      arch/x86/lib/atomic64_32.o
  AR      ipc/built-in.a
  CC      arch/x86/kernel/kprobes/opt.o
  CC      kernel/locking/qrwlock.o
  CC      sound/core/seq/seq_ports.o
  CC      security/keys/proc.o
  CC      io_uring/net.o
  AS      arch/x86/entry/thunk.o
  CC      arch/x86/lib/inat.o
  AR      arch/x86/entry/built-in.a
  CC      arch/x86/pci/common.o
  CC      kernel/module/kmod.o
  CC      sound/core/seq/seq_info.o
  AR      arch/x86/lib/built-in.a
  LDS     arch/x86/kernel/vmlinux.lds
  CC      lib/crypto/mpi/mpi-add.o
  AR      kernel/printk/built-in.a
  AR      arch/x86/lib/lib.a
  AR      crypto/asymmetric_keys/built-in.a
  AR      kernel/entry/built-in.a
  AR      drivers/char/ipmi/built-in.a
  CC      crypto/compress.o
  CC      arch/x86/mm/mmap.o
  CC      drivers/acpi/acpica/dsargs.o
  CC      lib/zstd/zstd_decompress_module.o
  CC      fs/notify/fdinfo.o
  CC      sound/core/init.o
  CC      security/selinux/netif.o
  CC      net/sched/sch_mq.o
  CC      fs/proc/task_mmu.o
  CC      arch/x86/kernel/cpu/mtrr/amd.o
  CC      drivers/pci/hotplug/pci_hotplug_core.o
  CC      kernel/irq/dummychip.o
  AR      kernel/locking/built-in.a
  CC      arch/x86/events/intel/uncore_nhmex.o
  AR      sound/pci/ac97/built-in.a
  AR      sound/pci/ali5451/built-in.a
  AR      sound/pci/asihpi/built-in.a
  AR      sound/pci/au88x0/built-in.a
  CC      kernel/module/tree_lookup.o
  AR      sound/pci/aw2/built-in.a
  AR      sound/pci/ctxfi/built-in.a
  AR      sound/pci/ca0106/built-in.a
  AR      sound/pci/cs46xx/built-in.a
  CC      mm/truncate.o
  CC      fs/iomap/seek.o
  AR      sound/pci/cs5535audio/built-in.a
  CC      arch/x86/mm/pgtable.o
  AR      sound/pci/lola/built-in.a
  AR      sound/pci/lx6464es/built-in.a
  CC      lib/crypto/aes.o
  AR      sound/pci/echoaudio/built-in.a
  AR      sound/pci/emu10k1/built-in.a
  CC      lib/zstd/decompress/huf_decompress.o
  CC      drivers/acpi/acpica/dscontrol.o
  AR      sound/pci/hda/built-in.a
  CC [M]  sound/pci/hda/hda_bind.o
  AR      drivers/video/built-in.a
  CC [M]  sound/pci/hda/hda_codec.o
  CC      fs/kernfs/mount.o
  CC      security/keys/sysctl.o
  CC      arch/x86/kernel/cpu/mce/threshold.o
  CC      fs/sysfs/file.o
  CC      lib/zstd/decompress/zstd_ddict.o
  CC      drivers/pci/pcie/pme.o
  CC      net/sched/sch_frag.o
  CC      sound/core/seq/seq_dummy.o
  CC      crypto/algapi.o
  CC      kernel/rcu/rcu_segcblist.o
  AR      lib/lz4/built-in.a
  CC      kernel/power/swap.o
  CC      lib/crypto/mpi/mpi-bit.o
  CC      kernel/power/user.o
  CC      net/netlink/af_netlink.o
  AR      fs/notify/built-in.a
  CC      security/selinux/netnode.o
  CC      arch/x86/pci/early.o
  CC      kernel/irq/devres.o
  CC      kernel/irq/autoprobe.o
  CC      fs/quota/quota_v2.o
  CC      arch/x86/kernel/cpu/mtrr/cyrix.o
  CC      drivers/acpi/acpica/dsdebug.o
  AR      arch/x86/kernel/kprobes/built-in.a
  CC      sound/core/memory.o
  CC      arch/x86/kernel/cpu/mtrr/centaur.o
  CC      net/core/gen_stats.o
  CC      net/sched/sch_api.o
  CC      kernel/module/kallsyms.o
  CC      fs/iomap/swapfile.o
  CC      security/keys/keyctl_pkey.o
  CC      lib/xz/xz_dec_syms.o
  CC      mm/vmscan.o
  CC      drivers/pci/hotplug/acpi_pcihp.o
  CC      lib/crypto/mpi/mpi-cmp.o
  CC      lib/crypto/arc4.o
  CC      lib/xz/xz_dec_stream.o
  AR      sound/core/seq/built-in.a
  CC      arch/x86/mm/physaddr.o
  CC      lib/crypto/mpi/mpi-sub-ui.o
  CC      drivers/acpi/acpica/dsfield.o
  CC      lib/crypto/mpi/mpi-div.o
  CC      kernel/irq/irqdomain.o
  CC      kernel/futex/core.o
  CC      fs/kernfs/inode.o
  CC      arch/x86/kernel/cpu/microcode/core.o
  AR      drivers/pci/pcie/built-in.a
  CC      fs/sysfs/dir.o
  CC      fs/sysfs/symlink.o
  CC      fs/proc/inode.o
  AR      net/bpf/built-in.a
  AR      sound/ppc/built-in.a
  CC      kernel/cgroup/cgroup.o
  CC      block/blk-flush.o
  CC      arch/x86/pci/bus_numa.o
  CC      kernel/time/hrtimer.o
  CC      arch/x86/pci/amd_bus.o
  CC      arch/x86/kernel/cpu/mtrr/legacy.o
  CC      kernel/trace/trace_clock.o
  CC      arch/x86/events/intel/uncore_snb.o
  CC      fs/quota/quota_tree.o
  CC      net/netlink/genetlink.o
  CC      kernel/futex/syscalls.o
  CC      sound/core/control.o
  AR      drivers/pci/controller/dwc/built-in.a
  AR      drivers/pci/controller/mobiveil/built-in.a
  AR      drivers/pci/controller/plda/built-in.a
  AR      drivers/pci/controller/built-in.a
  CC      arch/x86/kernel/cpu/cacheinfo.o
  AR      sound/pci/ice1712/built-in.a
  CC      kernel/sched/build_policy.o
  CC      drivers/acpi/acpica/dsinit.o
  CC      arch/x86/kernel/apic/apic_common.o
  AR      security/keys/built-in.a
  CC      lib/xz/xz_dec_lzma2.o
  CC      arch/x86/events/intel/uncore_snbep.o
  AR      arch/x86/kernel/cpu/mce/built-in.a
  CC      kernel/cgroup/rstat.o
  CC      io_uring/poll.o
  CC      arch/x86/mm/tlb.o
  CC      lib/zstd/decompress/zstd_decompress.o
  CC      kernel/irq/proc.o
  CC      arch/x86/events/intel/uncore_discovery.o
  CC      fs/proc/root.o
  AR      fs/iomap/built-in.a
  AR      arch/x86/kernel/cpu/mtrr/built-in.a
  CC      net/netlink/policy.o
  CC      kernel/module/procfs.o
  CC      kernel/module/sysfs.o
  CC      kernel/trace/ring_buffer.o
  AR      sound/arm/built-in.a
  CC      crypto/scatterwalk.o
  CC      lib/crypto/mpi/mpi-mod.o
  AR      drivers/pci/hotplug/built-in.a
  CC      net/core/gen_estimator.o
  AR      drivers/pci/switch/built-in.a
  CC      drivers/pci/access.o
  CC      security/selinux/netport.o
  CC      kernel/power/poweroff.o
  CC      arch/x86/kernel/cpu/microcode/intel.o
  CC      fs/sysfs/mount.o
  CC      drivers/acpi/acpica/dsmethod.o
  CC      arch/x86/kernel/cpu/microcode/amd.o
  AS      arch/x86/kernel/head_32.o
  CC      fs/kernfs/dir.o
  CC      drivers/pci/bus.o
  CC      kernel/sched/build_utility.o
  CC      arch/x86/kernel/apic/apic_noop.o
  CC      lib/dim/dim.o
  CC      crypto/proc.o
  AR      arch/x86/pci/built-in.a
  CC      arch/x86/kernel/head32.o
  AR      kernel/power/built-in.a
  CC      block/blk-settings.o
  CC      block/blk-ioc.o
  CC      block/blk-map.o
  CC      fs/quota/quota.o
  AR      kernel/rcu/built-in.a
  CC      security/selinux/status.o
  CC [M]  sound/pci/hda/hda_jack.o
  CC      lib/crypto/mpi/mpi-mul.o
  CC      arch/x86/mm/cpu_entry_area.o
  CC      kernel/futex/pi.o
  CC      lib/xz/xz_dec_bcj.o
  CC      sound/core/misc.o
  CC      kernel/irq/migration.o
  CC      kernel/irq/cpuhotplug.o
  CC      drivers/acpi/acpica/dsmthdat.o
  CC      lib/dim/net_dim.o
  CC      arch/x86/kernel/apic/ipi.o
  CC      fs/proc/base.o
  CC      arch/x86/kernel/cpu/scattered.o
  CC      lib/zstd/decompress/zstd_decompress_block.o
  AR      kernel/module/built-in.a
  CC      io_uring/eventfd.o
  CC      lib/crypto/mpi/mpih-cmp.o
  CC      arch/x86/events/rapl.o
  CC      crypto/aead.o
  CC      fs/kernfs/file.o
  CC      drivers/pci/probe.o
  CC      kernel/time/timekeeping.o
  CC      fs/quota/kqid.o
  CC      fs/sysfs/group.o
  AR      drivers/acpi/pmic/built-in.a
  CC      fs/proc/generic.o
  CC      drivers/acpi/dptf/int340x_thermal.o
  CC      arch/x86/events/msr.o
  CC      net/core/net_namespace.o
  CC      block/blk-merge.o
  CC      kernel/futex/requeue.o
  AR      arch/x86/kernel/cpu/microcode/built-in.a
  CC      drivers/acpi/acpica/dsobject.o
  CC      io_uring/uring_cmd.o
  CC      kernel/time/ntp.o
  AR      lib/xz/built-in.a
  CC      arch/x86/events/intel/cstate.o
  CC      net/ethtool/ioctl.o
  CC      fs/kernfs/symlink.o
  CC      lib/crypto/gf128mul.o
  CC      fs/quota/netlink.o
  CC      net/sched/sch_blackhole.o
  CC      arch/x86/kernel/cpu/topology_common.o
  CC      arch/x86/mm/maccess.o
  CC      arch/x86/mm/pgprot.o
  CC      lib/crypto/blake2s.o
  CC      sound/core/device.o
  CC      io_uring/openclose.o
  AR      net/netlink/built-in.a
  CC      net/sched/cls_api.o
  CC      arch/x86/kernel/apic/vector.o
  CC      lib/crypto/mpi/mpih-div.o
  CC      kernel/irq/pm.o
  CC      lib/crypto/mpi/mpih-mul.o
  CC      lib/zstd/zstd_common_module.o
  AR      drivers/acpi/dptf/built-in.a
  CC      drivers/acpi/acpica/dsopcode.o
  CC      arch/x86/kernel/cpu/topology_ext.o
  CC      kernel/bpf/core.o
  AR      sound/pci/korg1212/built-in.a
  CC      fs/proc/array.o
  CC      net/ethtool/common.o
  CC [M]  sound/pci/hda/hda_auto_parser.o
  CC      security/selinux/ss/ebitmap.o
  CC      crypto/geniv.o
  AR      fs/sysfs/built-in.a
  CC      lib/crypto/blake2s-generic.o
  CC      lib/dim/rdma_dim.o
  CC      kernel/futex/waitwake.o
  CC      drivers/pci/host-bridge.o
  CC      lib/crypto/mpi/mpi-pow.o
  CC      lib/fonts/fonts.o
  CC      net/ethtool/netlink.o
  CC      sound/core/info.o
  CC      fs/proc/fd.o
  CC      kernel/time/clocksource.o
  CC      crypto/lskcipher.o
  CC      io_uring/sqpoll.o
  CC      arch/x86/mm/pgtable_32.o
  CC      lib/fonts/font_8x16.o
  CC      net/sched/act_api.o
  AR      fs/kernfs/built-in.a
  CC      lib/crypto/sha1.o
  CC      drivers/acpi/acpica/dspkginit.o
  CC      crypto/skcipher.o
  CC      arch/x86/kernel/cpu/topology_amd.o
  AR      fs/quota/built-in.a
  CC      net/core/secure_seq.o
  CC      net/netfilter/core.o
  CC      lib/zstd/common/debug.o
  AR      lib/dim/built-in.a
  CC      drivers/acpi/x86/apple.o
  CC      net/ipv4/netfilter/nf_defrag_ipv4.o
  AR      arch/x86/events/intel/built-in.a
  CC      mm/shrinker.o
  AR      arch/x86/events/built-in.a
  CC      drivers/pci/remove.o
  CC      net/ipv4/route.o
  CC      kernel/trace/trace.o
  AR      lib/fonts/built-in.a
  CC      io_uring/xattr.o
  CC      kernel/irq/msi.o
  CC      arch/x86/kernel/cpu/common.o
  CC      net/sched/sch_fifo.o
  CC      net/ethtool/bitset.o
  CC      security/selinux/ss/hashtab.o
  AR      sound/pci/mixart/built-in.a
  CC      drivers/acpi/acpica/dsutils.o
  CC      drivers/pnp/pnpacpi/core.o
  CC      fs/proc/proc_tty.o
  CC      net/xfrm/xfrm_policy.o
  CC      kernel/cgroup/namespace.o
  AR      kernel/futex/built-in.a
  CC      fs/proc/cmdline.o
  CC      drivers/acpi/acpica/dswexec.o
  CC      arch/x86/mm/iomap_32.o
  AR      sound/pci/nm256/built-in.a
  CC      drivers/acpi/acpica/dswload.o
  CC      lib/crypto/mpi/mpiutil.o
  CC      lib/argv_split.o
  AR      drivers/amba/built-in.a
  CC      net/sched/cls_cgroup.o
  CC      block/blk-timeout.o
  CC      drivers/acpi/x86/cmos_rtc.o
  CC [M]  sound/pci/hda/hda_sysfs.o
  CC      sound/core/isadma.o
  CC      net/sched/ematch.o
  CC      drivers/pci/pci.o
  CC      fs/devpts/inode.o
  CC      crypto/seqiv.o
  CC      kernel/time/jiffies.o
  CC      security/selinux/ss/symtab.o
  CC      crypto/echainiv.o
  CC      net/unix/af_unix.o
  CC      kernel/trace/trace_output.o
  CC      sound/core/vmaster.o
  CC      lib/crypto/sha256.o
  CC      arch/x86/mm/hugetlbpage.o
  CC      drivers/acpi/acpica/dswload2.o
  CC      arch/x86/kernel/apic/init.o
  CC      lib/zstd/common/entropy_common.o
  CC      security/selinux/ss/sidtab.o
  CC      drivers/pnp/pnpacpi/rsparser.o
  CC      crypto/ahash.o
  CC      net/xfrm/xfrm_state.o
  CC      mm/shmem.o
  CC      kernel/events/core.o
  CC      fs/proc/consoles.o
  AR      sound/pci/oxygen/built-in.a
  CC      net/unix/garbage.o
  CC      io_uring/nop.o
  CC      kernel/irq/affinity.o
  AR      lib/crypto/mpi/built-in.a
  CC      drivers/acpi/tables.o
  CC      lib/zstd/common/error_private.o
  CC      kernel/time/timer_list.o
  CC      net/core/flow_dissector.o
  CC      lib/zstd/common/fse_decompress.o
  CC      net/ipv4/netfilter/nf_reject_ipv4.o
  CC      drivers/acpi/x86/lpss.o
  CC      net/xfrm/xfrm_hash.o
  CC      block/blk-lib.o
  CC      net/ipv6/netfilter/ip6_tables.o
  CC [M]  sound/pci/hda/hda_controller.o
  CC      net/ipv6/af_inet6.o
  CC      kernel/cgroup/cgroup-v1.o
  CC      drivers/acpi/x86/s2idle.o
  CC      arch/x86/kernel/apic/hw_nmi.o
  CC      sound/core/ctljack.o
  CC      net/ipv4/inetpeer.o
  CC      net/netfilter/nf_log.o
  CC      drivers/acpi/acpica/dswscope.o
  AR      lib/crypto/built-in.a
  CC      block/blk-mq.o
  AR      fs/devpts/built-in.a
  CC      kernel/trace/trace_seq.o
  CC      security/selinux/ss/avtab.o
  CC      kernel/irq/matrix.o
  CC [M]  sound/pci/hda/hda_proc.o
  CC      fs/proc/cpuinfo.o
  CC      arch/x86/mm/dump_pagetables.o
  CC      net/ethtool/strset.o
  CC      lib/zstd/common/zstd_common.o
  CC      arch/x86/kernel/cpu/rdrand.o
  AR      sound/sh/built-in.a
  CC      kernel/events/ring_buffer.o
  CC      kernel/events/callchain.o
  CC      net/unix/sysctl_net_unix.o
  CC      drivers/acpi/acpica/dswstate.o
  AR      lib/zstd/built-in.a
  CC      lib/bug.o
  CC      sound/core/jack.o
  CC      fs/netfs/buffered_read.o
  CC      arch/x86/kernel/cpu/match.o
  CC      kernel/time/timeconv.o
  AR      drivers/pnp/pnpacpi/built-in.a
  CC      io_uring/fs.o
  CC      drivers/pnp/core.o
  AR      net/sched/built-in.a
  CC      arch/x86/kernel/ebda.o
  CC      fs/netfs/buffered_write.o
  CC      crypto/shash.o
  CC      arch/x86/kernel/apic/io_apic.o
  CC      fs/proc/devices.o
  CC      kernel/trace/trace_stat.o
  CC      drivers/pnp/card.o
  CC      drivers/acpi/x86/utils.o
  CC      kernel/time/timecounter.o
  CC      net/core/sysctl_net_core.o
  AR      kernel/bpf/built-in.a
  CC      drivers/pci/pci-driver.o
  CC      fs/ext4/balloc.o
  CC      arch/x86/kernel/cpu/bugs.o
  CC      drivers/acpi/acpica/evevent.o
  CC      drivers/acpi/acpica/evgpe.o
  CC      net/ipv4/protocol.o
  CC      security/selinux/ss/policydb.o
  CC      kernel/time/alarmtimer.o
  AR      kernel/sched/built-in.a
  CC      kernel/fork.o
  CC      arch/x86/mm/highmem_32.o
  CC      lib/buildid.o
  CC      net/ipv4/netfilter/ip_tables.o
  CC      fs/proc/interrupts.o
  CC      mm/util.o
  CC      sound/core/timer.o
  CC      net/netfilter/nf_queue.o
  CC      drivers/pnp/driver.o
  CC      net/ipv6/netfilter/ip6table_filter.o
  CC      drivers/acpi/acpica/evgpeblk.o
  CC      drivers/acpi/x86/blacklist.o
  CC      net/xfrm/xfrm_input.o
  CC      net/ethtool/linkinfo.o
  CC      io_uring/splice.o
  CC      fs/netfs/direct_read.o
  CC      lib/clz_tab.o
  CC      fs/jbd2/transaction.o
  CC      kernel/cgroup/freezer.o
  CC [M]  sound/pci/hda/hda_hwdep.o
  CC      net/xfrm/xfrm_output.o
  CC      fs/ramfs/inode.o
  CC      fs/ext4/bitmap.o
  AR      kernel/irq/built-in.a
  CC      kernel/time/posix-timers.o
  CC      net/netfilter/nf_sockopt.o
  CC      security/selinux/ss/services.o
  CC      block/blk-mq-tag.o
  CC      crypto/akcipher.o
  AR      drivers/clk/actions/built-in.a
  CC      fs/proc/loadavg.o
  AR      drivers/clk/analogbits/built-in.a
  CC      security/selinux/ss/conditional.o
  AR      drivers/clk/bcm/built-in.a
  AR      drivers/clk/imgtec/built-in.a
  AR      drivers/clk/imx/built-in.a
  AR      drivers/clk/ingenic/built-in.a
  AR      drivers/clk/mediatek/built-in.a
  AR      drivers/clk/microchip/built-in.a
  AR      drivers/clk/mstar/built-in.a
  AR      drivers/clk/mvebu/built-in.a
  CC      drivers/acpi/acpica/evgpeinit.o
  AR      arch/x86/mm/built-in.a
  AR      drivers/clk/ralink/built-in.a
  AR      drivers/clk/renesas/built-in.a
  AR      drivers/clk/socfpga/built-in.a
  AR      net/unix/built-in.a
  CC      net/core/dev.o
  CC      fs/ext4/block_validity.o
  CC      net/ipv4/netfilter/iptable_filter.o
  AR      drivers/clk/sophgo/built-in.a
  AR      drivers/acpi/x86/built-in.a
  AR      drivers/clk/starfive/built-in.a
  AR      drivers/clk/sprd/built-in.a
  CC      io_uring/sync.o
  CC      net/netfilter/utils.o
  AR      drivers/clk/sunxi-ng/built-in.a
  AR      drivers/clk/ti/built-in.a
  CC      arch/x86/kernel/apic/msi.o
  AR      drivers/clk/versatile/built-in.a
  CC      lib/cmdline.o
  AR      drivers/clk/xilinx/built-in.a
  AR      drivers/clk/built-in.a
  CC      fs/netfs/direct_write.o
  CC      drivers/pnp/resource.o
  CC      drivers/pnp/manager.o
  CC      lib/cpumask.o
  CC [M]  sound/pci/hda/patch_hdmi.o
  CC      net/ipv6/anycast.o
  CC      kernel/exec_domain.o
  CC      fs/hugetlbfs/inode.o
  CC      net/ipv6/ip6_output.o
  CC      drivers/acpi/acpica/evgpeutil.o
  CC      drivers/pci/search.o
  CC      fs/ramfs/file-mmu.o
  CC      mm/mmzone.o
  CC      drivers/pci/rom.o
  CC      fs/proc/meminfo.o
  CC [M]  sound/pci/hda/hda_eld.o
  CC      net/ethtool/linkmodes.o
  AR      sound/pci/pcxhr/built-in.a
  CC      net/netfilter/nfnetlink.o
  CC      fs/fat/cache.o
  CC      kernel/cgroup/legacy_freezer.o
  CC      kernel/trace/trace_printk.o
  CC      kernel/trace/pid_list.o
  CC      arch/x86/kernel/apic/probe_32.o
  CC      net/ipv6/netfilter/ip6table_mangle.o
  CC      crypto/sig.o
  AR      sound/synth/emux/built-in.a
  AR      sound/synth/built-in.a
  CC      sound/core/hrtimer.o
  CC      arch/x86/kernel/cpu/aperfmperf.o
  CC      drivers/acpi/acpica/evglock.o
  CC      block/blk-stat.o
  CC      lib/ctype.o
  CC      net/core/dev_addr_lists.o
  CC      lib/dec_and_lock.o
  CC      mm/vmstat.o
  CC      fs/ext4/dir.o
  CC      net/ipv4/ip_input.o
  CC      kernel/events/hw_breakpoint.o
  CC      io_uring/msg_ring.o
  CC      net/netfilter/nfnetlink_log.o
  CC      kernel/time/posix-cpu-timers.o
  CC      kernel/cgroup/pids.o
  CC      lib/decompress.o
  CC      arch/x86/kernel/platform-quirks.o
  CC      net/ipv4/netfilter/iptable_mangle.o
  AR      fs/ramfs/built-in.a
  CC      lib/decompress_bunzip2.o
  CC      net/ipv4/ip_fragment.o
  CC      lib/decompress_inflate.o
  CC      fs/jbd2/commit.o
  AR      arch/x86/kernel/apic/built-in.a
  CC      fs/fat/dir.o
  CC      fs/netfs/iterator.o
  CC      drivers/pci/setup-res.o
  CC      drivers/acpi/acpica/evhandler.o
  CC      drivers/pnp/support.o
  CC      fs/proc/stat.o
  CC      net/packet/af_packet.o
  CC      net/xfrm/xfrm_sysctl.o
  CC      sound/core/seq_device.o
  CC      kernel/panic.o
  CC      drivers/pci/irq.o
  CC      drivers/pci/vpd.o
  CC      net/ipv6/netfilter/nf_defrag_ipv6_hooks.o
  CC      net/core/dst.o
  CC      net/ethtool/rss.o
  CC      arch/x86/kernel/cpu/cpuid-deps.o
  CC      crypto/kpp.o
  CC      block/blk-mq-sysfs.o
  AR      sound/pci/riptide/built-in.a
  CC      arch/x86/kernel/cpu/umwait.o
  CC      kernel/trace/trace_sched_switch.o
  CC      kernel/events/uprobes.o
  CC      drivers/acpi/acpica/evmisc.o
  CC      fs/ext4/ext4_jbd2.o
  CC      lib/decompress_unlz4.o
  CC      kernel/cgroup/rdma.o
  CC      drivers/pci/setup-bus.o
  CC      mm/backing-dev.o
  AR      fs/hugetlbfs/built-in.a
  CC      io_uring/advise.o
  AR      sound/pci/rme9652/built-in.a
  CC      kernel/trace/trace_nop.o
  CC      fs/ext4/extents.o
  CC      drivers/pnp/interface.o
  CC      fs/fat/fatent.o
  CC      fs/fat/file.o
  CC      fs/ext4/extents_status.o
  MKCAP   arch/x86/kernel/cpu/capflags.c
  CC [M]  sound/core/hwdep.o
  CC      fs/proc/uptime.o
  CC      kernel/time/posix-clock.o
  CC      block/blk-mq-cpumap.o
  CC      block/blk-mq-sched.o
  CC      net/ipv6/netfilter/nf_conntrack_reasm.o
  CC      drivers/acpi/acpica/evregion.o
  CC      net/ipv6/netfilter/nf_reject_ipv6.o
  CC      block/ioctl.o
  CC      net/core/netevent.o
  CC      net/ipv4/netfilter/ipt_REJECT.o
  CC      lib/decompress_unlzma.o
  CC      mm/mm_init.o
  CC      arch/x86/kernel/process_32.o
  CC      fs/netfs/locking.o
  CC      kernel/cgroup/cpuset.o
  CC      drivers/acpi/acpica/evrgnini.o
  CC      drivers/acpi/acpica/evsci.o
  CC      net/xfrm/xfrm_replay.o
  CC      security/selinux/ss/mls.o
  ASN.1   crypto/rsapubkey.asn1.[ch]
  ASN.1   crypto/rsaprivkey.asn1.[ch]
  CC      crypto/rsa.o
  CC      fs/fat/inode.o
  CC      kernel/cpu.o
  CC [M]  sound/pci/hda/hda_intel.o
  CC      drivers/pnp/quirks.o
  CC      fs/isofs/namei.o
  CC      net/netfilter/nf_conntrack_core.o
  AR      net/dsa/built-in.a
  CC      fs/proc/util.o
  CC      fs/jbd2/recovery.o
  CC      fs/ext4/file.o
  CC      lib/decompress_unlzo.o
  CC      net/ethtool/linkstate.o
  CC      io_uring/epoll.o
  CC      fs/isofs/inode.o
  CC [M]  sound/core/pcm.o
  AR      sound/pci/trident/built-in.a
  CC [M]  net/ipv4/netfilter/iptable_nat.o
  CC      kernel/trace/blktrace.o
  CC      fs/nfs/client.o
  CC      fs/netfs/main.o
  CC      kernel/time/itimer.o
  CC      drivers/acpi/acpica/evxface.o
  CC      kernel/time/clockevents.o
  CC      kernel/time/tick-common.o
  CC      kernel/time/tick-broadcast.o
  CC      fs/netfs/misc.o
  CC      fs/isofs/dir.o
  CC      fs/proc/version.o
  CC      crypto/rsa_helper.o
  CC      net/ipv4/ip_forward.o
  CC      net/netfilter/nf_conntrack_standalone.o
  CC      lib/decompress_unxz.o
  CC      drivers/acpi/osi.o
  CC      block/genhd.o
  CC      net/xfrm/xfrm_device.o
  CC      drivers/pnp/system.o
  CC      crypto/rsa-pkcs1pad.o
  AR      sound/pci/ymfpci/built-in.a
  CC      crypto/acompress.o
  CC      lib/decompress_unzstd.o
  CC      kernel/time/tick-broadcast-hrtimer.o
  CC      drivers/acpi/acpica/evxfevnt.o
  CC      drivers/pci/vc.o
  CC      fs/nfs/dir.o
  CC      kernel/cgroup/misc.o
  CC      net/ipv6/netfilter/ip6t_ipv6header.o
  CC      io_uring/statx.o
  CC      mm/percpu.o
  CC      fs/jbd2/checkpoint.o
  CC      security/selinux/ss/context.o
  LD [M]  sound/pci/hda/snd-hda-codec.o
  CC      fs/fat/misc.o
  CC [M]  sound/core/pcm_native.o
  CC      kernel/time/tick-oneshot.o
  CC      fs/proc/softirqs.o
  CC      net/ipv6/netfilter/ip6t_REJECT.o
  CC      fs/jbd2/revoke.o
  AR      drivers/pnp/built-in.a
  CC      fs/netfs/objects.o
  CC      fs/exportfs/expfs.o
  CC      net/ethtool/debug.o
  CC      fs/isofs/util.o
  CC      drivers/acpi/acpica/evxfgpe.o
  CC      fs/nfs/file.o
  CC      fs/netfs/read_collect.o
  AR      net/ipv4/netfilter/built-in.a
  CC      net/core/neighbour.o
  CC      kernel/exit.o
  CC      drivers/dma/dw/core.o
  AR      drivers/soc/apple/built-in.a
  AR      drivers/soc/aspeed/built-in.a
  CC      lib/dump_stack.o
  AR      drivers/soc/bcm/built-in.a
  AR      drivers/soc/fsl/built-in.a
  AR      drivers/soc/fujitsu/built-in.a
  AR      drivers/soc/hisilicon/built-in.a
  AR      drivers/soc/imx/built-in.a
  AR      drivers/soc/ixp4xx/built-in.a
  AR      drivers/soc/loongson/built-in.a
  AR      drivers/soc/mediatek/built-in.a
  AR      drivers/soc/microchip/built-in.a
  AR      drivers/soc/nuvoton/built-in.a
  AR      drivers/soc/pxa/built-in.a
  LD [M]  sound/pci/hda/snd-hda-codec-hdmi.o
  LD [M]  sound/pci/hda/snd-hda-intel.o
  AR      drivers/soc/amlogic/built-in.a
  AR      sound/pci/vx222/built-in.a
  AR      sound/pci/built-in.a
  AR      drivers/soc/qcom/built-in.a
  CC      net/ethtool/wol.o
  AR      drivers/soc/renesas/built-in.a
  CC      mm/slab_common.o
  CC      net/xfrm/xfrm_nat_keepalive.o
  AR      drivers/soc/rockchip/built-in.a
  AR      drivers/soc/sunxi/built-in.a
  AR      drivers/soc/ti/built-in.a
  CC      arch/x86/kernel/cpu/powerflags.o
  CC      fs/ext4/fsmap.o
  AR      drivers/soc/versatile/built-in.a
  AR      drivers/soc/xilinx/built-in.a
  AR      drivers/soc/built-in.a
  CC      kernel/cgroup/debug.o
  CC      crypto/scompress.o
  CC [M]  sound/core/pcm_lib.o
  CC      kernel/time/tick-sched.o
  CC      io_uring/timeout.o
  CC      drivers/pci/mmap.o
  CC      drivers/acpi/acpica/evxfregn.o
  CC      fs/proc/namespaces.o
  CC      block/ioprio.o
  CC      net/ethtool/features.o
  CC      kernel/trace/trace_events.o
  AR      fs/exportfs/built-in.a
  CC      net/core/rtnetlink.o
  CC      fs/fat/nfs.o
  CC      fs/isofs/rock.o
  CC      net/core/utils.o
  CC      net/sunrpc/auth_gss/auth_gss.o
  CC      fs/proc/self.o
  CC      fs/proc/thread_self.o
  CC      net/ipv4/ip_options.o
  CC      net/sunrpc/clnt.o
  AR      kernel/events/built-in.a
  CC      drivers/dma/hsu/hsu.o
  CC      fs/lockd/clntlock.o
  CC      lib/earlycpio.o
  AR      drivers/dma/idxd/built-in.a
  AR      net/packet/built-in.a
  CC      net/core/link_watch.o
  CC      drivers/pci/devres.o
  CC      security/selinux/netlabel.o
  CC      net/sunrpc/xprt.o
  CC      drivers/acpi/acpica/exconcat.o
  CC      lib/extable.o
  CC      fs/jbd2/journal.o
  CC      mm/compaction.o
  CC      arch/x86/kernel/cpu/topology.o
  CC      drivers/acpi/osl.o
  CC      drivers/pci/proc.o
  AR      net/ipv6/netfilter/built-in.a
  CC      net/ipv6/ip6_input.o
  CC      net/netfilter/nf_conntrack_expect.o
  AR      kernel/cgroup/built-in.a
  CC      arch/x86/kernel/cpu/proc.o
  CC      kernel/trace/trace_export.o
  CC      drivers/dma/dw/dw.o
  CC      fs/isofs/export.o
  CC      fs/netfs/read_pgpriv2.o
  CC      fs/netfs/read_retry.o
  CC      crypto/algboss.o
  CC      drivers/acpi/acpica/exconfig.o
  CC      fs/proc/proc_sysctl.o
  CC      lib/flex_proportions.o
  CC      block/badblocks.o
  CC      fs/fat/namei_vfat.o
  CC      net/xfrm/xfrm_algo.o
  CC      kernel/time/timer_migration.o
  CC      io_uring/fdinfo.o
  CC      net/ethtool/privflags.o
  AR      drivers/dma/hsu/built-in.a
  CC      drivers/acpi/acpica/exconvrt.o
  CC      kernel/softirq.o
  CC      arch/x86/kernel/signal.o
  CC      fs/ext4/fsync.o
  CC      lib/idr.o
  CC      fs/netfs/write_collect.o
  CC [M]  sound/core/pcm_misc.o
  CC      kernel/resource.o
  CC      drivers/dma/dw/idma32.o
  AR      sound/usb/misc/built-in.a
  AR      sound/usb/usx2y/built-in.a
  AR      sound/usb/caiaq/built-in.a
  CC      fs/isofs/joliet.o
  AR      sound/usb/6fire/built-in.a
  AR      sound/usb/hiface/built-in.a
  AR      sound/usb/bcd2000/built-in.a
  AR      sound/usb/built-in.a
  CC      drivers/pci/pci-sysfs.o
  CC      net/ethtool/rings.o
  CC      fs/isofs/compress.o
  CC      net/ipv4/ip_output.o
  CC      fs/lockd/clntproc.o
  CC      net/netfilter/nf_conntrack_helper.o
  AR      net/wireless/tests/built-in.a
  CC      net/wireless/core.o
  CC      kernel/sysctl.o
  CC      drivers/acpi/acpica/excreate.o
  AR      security/selinux/built-in.a
  CC      net/wireless/sysfs.o
  AR      security/built-in.a
  CC      fs/fat/namei_msdos.o
  CC      drivers/acpi/utils.o
  CC      lib/irq_regs.o
  CC      crypto/testmgr.o
  CC      lib/is_single_threaded.o
  CC      net/sunrpc/socklib.o
  CC      fs/proc/proc_net.o
  CC      block/blk-rq-qos.o
  CC      arch/x86/kernel/cpu/feat_ctl.o
  CC      kernel/trace/trace_event_perf.o
  AR      drivers/dma/amd/built-in.a
  CC      fs/ext4/hash.o
  CC      drivers/acpi/acpica/exdebug.o
  CC      fs/nls/nls_base.o
  CC      io_uring/cancel.o
  AR      fs/unicode/built-in.a
  CC      net/core/filter.o
  CC      fs/proc/kcore.o
  CC      net/sunrpc/xprtsock.o
  CC      net/sunrpc/auth_gss/gss_generic_token.o
  CC      crypto/cmac.o
  CC [M]  sound/core/pcm_memory.o
  CC      drivers/dma/dw/acpi.o
  CC      lib/klist.o
  CC      net/xfrm/xfrm_user.o
  CC      drivers/virtio/virtio.o
  CC      net/ipv6/addrconf.o
  CC      arch/x86/kernel/cpu/intel.o
  CC      drivers/virtio/virtio_ring.o
  CC      drivers/acpi/acpica/exdump.o
  AR      fs/isofs/built-in.a
  CC      net/sunrpc/sched.o
  CC      fs/proc/vmcore.o
  CC      fs/autofs/init.o
  AR      sound/firewire/built-in.a
  CC      net/wireless/radiotap.o
  CC [M]  sound/core/memalloc.o
  CC      kernel/time/vsyscall.o
  CC      fs/nfs/getroot.o
  AR      sound/sparc/built-in.a
  CC      fs/nfs/inode.o
  CC      fs/nls/nls_cp437.o
  CC      lib/kobject.o
  CC      net/ethtool/channels.o
  CC      net/ethtool/coalesce.o
  CC      crypto/hmac.o
  CC      fs/netfs/write_issue.o
  AR      fs/fat/built-in.a
  CC [M]  sound/core/pcm_timer.o
  CC      block/disk-events.o
  CC      drivers/pci/slot.o
  CC      block/blk-ia-ranges.o
  CC      drivers/acpi/acpica/exfield.o
  CC      block/early-lookup.o
  CC      io_uring/waitid.o
  AR      drivers/dma/dw/built-in.a
  AR      drivers/dma/mediatek/built-in.a
  CC      kernel/trace/trace_events_filter.o
  AR      drivers/dma/qcom/built-in.a
  AR      drivers/dma/stm32/built-in.a
  AR      drivers/dma/ti/built-in.a
  AR      drivers/dma/xilinx/built-in.a
  CC      net/wireless/util.o
  CC      drivers/dma/dmaengine.o
  CC      net/sunrpc/auth_gss/gss_mech_switch.o
  CC      drivers/virtio/virtio_anchor.o
  CC      fs/nls/nls_ascii.o
  CC      kernel/time/timekeeping_debug.o
  CC      net/netfilter/nf_conntrack_proto.o
  CC      fs/lockd/clntxdr.o
  CC      fs/ext4/ialloc.o
  LD [M]  sound/core/snd-hwdep.o
  CC      drivers/acpi/reboot.o
  AR      fs/jbd2/built-in.a
  CC      fs/nfs/super.o
  CC      net/ipv6/addrlabel.o
  CC      lib/kobject_uevent.o
  CC      mm/show_mem.o
  CC      drivers/dma/virt-dma.o
  CC      fs/autofs/inode.o
  CC      drivers/acpi/acpica/exfldio.o
  CC      net/ethtool/pause.o
  CC      crypto/crypto_null.o
  CC      fs/nls/nls_iso8859-1.o
  CC      kernel/capability.o
  CC      drivers/pci/pci-acpi.o
  AR      sound/core/built-in.a
  LD [M]  sound/core/snd-pcm.o
  AR      sound/spi/built-in.a
  AR      sound/parisc/built-in.a
  AR      sound/pcmcia/vx/built-in.a
  CC      fs/nfs/io.o
  AR      sound/pcmcia/pdaudiocf/built-in.a
  AR      sound/pcmcia/built-in.a
  CC      drivers/acpi/nvs.o
  CC      net/sunrpc/auth_gss/svcauth_gss.o
  CC      arch/x86/kernel/signal_32.o
  AR      sound/mips/built-in.a
  CC      fs/autofs/root.o
  CC      arch/x86/kernel/cpu/tsx.o
  CC      fs/proc/kmsg.o
  AR      sound/soc/built-in.a
  AR      sound/atmel/built-in.a
  CC      net/wireless/reg.o
  AR      sound/hda/built-in.a
  CC [M]  sound/hda/hda_bus_type.o
  CC      block/bounce.o
  CC      net/sunrpc/auth_gss/gss_rpc_upcall.o
  CC      kernel/time/namespace.o
  CC      fs/ext4/indirect.o
  CC      mm/interval_tree.o
  CC      fs/nls/nls_utf8.o
  CC      drivers/acpi/acpica/exmisc.o
  CC      net/ethtool/eee.o
  CC      net/ipv4/ip_sockglue.o
  CC      io_uring/register.o
  CC      lib/logic_pio.o
  CC      mm/list_lru.o
  AR      fs/netfs/built-in.a
  CC      net/sunrpc/auth.o
  CC [M]  sound/hda/hdac_bus.o
  CC      crypto/md5.o
  CC      arch/x86/kernel/cpu/intel_epb.o
  CC      fs/proc/page.o
  CC      drivers/tty/vt/vt_ioctl.o
  AR      fs/nls/built-in.a
  CC      arch/x86/kernel/traps.o
  CC      fs/lockd/host.o
  CC      drivers/acpi/acpica/exmutex.o
  CC      drivers/acpi/acpica/exnames.o
  CC      drivers/virtio/virtio_pci_modern_dev.o
  CC      drivers/virtio/virtio_pci_legacy_dev.o
  CC      drivers/dma/acpi-dma.o
  CC      drivers/acpi/acpica/exoparg1.o
  CC      io_uring/truncate.o
  CC      kernel/trace/trace_events_trigger.o
  CC      arch/x86/kernel/idt.o
  CC      kernel/ptrace.o
  CC      fs/ext4/inline.o
  AR      kernel/time/built-in.a
  CC      net/netfilter/nf_conntrack_proto_generic.o
  CC      net/sunrpc/auth_null.o
  CC      fs/nfs/direct.o
  CC      block/bsg.o
  CC      arch/x86/kernel/cpu/amd.o
  CC      fs/autofs/symlink.o
  CC      drivers/pci/iomap.o
  CC      block/blk-cgroup.o
  CC      crypto/sha256_generic.o
  CC      lib/maple_tree.o
  CC      kernel/user.o
  CC      arch/x86/kernel/irq.o
  CC      net/core/sock_diag.o
  CC      arch/x86/kernel/irq_32.o
  CC      kernel/trace/trace_eprobe.o
  AR      sound/x86/built-in.a
  CC      net/ipv6/route.o
  AR      net/xfrm/built-in.a
  CC      drivers/acpi/acpica/exoparg2.o
  CC      fs/lockd/svc.o
  CC      mm/workingset.o
  CC      net/ipv4/inet_hashtables.o
  CC [M]  sound/hda/hdac_device.o
  CC      net/ethtool/tsinfo.o
  AR      fs/proc/built-in.a
  CC      kernel/signal.o
  CC      net/wireless/scan.o
  AR      drivers/dma/built-in.a
  CC      io_uring/memmap.o
  CC      arch/x86/kernel/cpu/hygon.o
  CC      drivers/virtio/virtio_pci_modern.o
  CC      crypto/sha512_generic.o
  CC      drivers/virtio/virtio_pci_common.o
  CC      drivers/pci/quirks.o
  AR      sound/xen/built-in.a
  CC      drivers/tty/vt/vc_screen.o
  CC      fs/ext4/inode.o
  CC      drivers/acpi/acpica/exoparg3.o
  CC      block/blk-ioprio.o
  CC      drivers/tty/hvc/hvc_console.o
  CC      fs/autofs/waitq.o
  CC      drivers/tty/serial/8250/8250_core.o
  CC      kernel/trace/trace_kprobe.o
  CC      arch/x86/kernel/cpu/centaur.o
  CC      drivers/acpi/acpica/exoparg6.o
  CC      net/ipv4/inet_timewait_sock.o
  CC      lib/memcat_p.o
  CC      net/ipv4/inet_connection_sock.o
  CC      lib/nmi_backtrace.o
  CC      net/netfilter/nf_conntrack_proto_tcp.o
  CC      net/ethtool/cabletest.o
  CC      mm/debug.o
  CC      fs/9p/vfs_super.o
  CC      fs/ext4/ioctl.o
  CC      net/sunrpc/auth_gss/gss_rpc_xdr.o
  CC      net/core/dev_ioctl.o
  CC      fs/ext4/mballoc.o
  CC      io_uring/io-wq.o
  CC      drivers/acpi/wakeup.o
  CC      arch/x86/kernel/cpu/transmeta.o
  CC      drivers/acpi/acpica/exprep.o
  CC      crypto/sha3_generic.o
  CC      drivers/pci/pci-label.o
  CC      drivers/tty/serial/8250/8250_platform.o
  CC [M]  sound/hda/hdac_sysfs.o
  CC      drivers/tty/serial/8250/8250_pnp.o
  CC      drivers/tty/serial/serial_core.o
  CC      arch/x86/kernel/cpu/zhaoxin.o
  CC      fs/lockd/svclock.o
  CC      fs/autofs/expire.o
  CC      fs/nfs/pagelist.o
  CC      net/ipv6/ip6_fib.o
  CC      fs/nfs/read.o
  CC      drivers/virtio/virtio_pci_legacy.o
  CC      block/blk-iolatency.o
  AR      drivers/tty/hvc/built-in.a
  CC      fs/9p/vfs_inode.o
  CC      drivers/tty/vt/selection.o
  CC      fs/nfs/symlink.o
  CC      drivers/acpi/acpica/exregion.o
  CC      fs/9p/vfs_inode_dotl.o
  CC      fs/9p/vfs_addr.o
  CC      crypto/ecb.o
  CC      fs/nfs/unlink.o
  CC      fs/nfs/write.o
  CC      arch/x86/kernel/cpu/vortex.o
  CC      mm/gup.o
  CC      arch/x86/kernel/cpu/perfctr-watchdog.o
  CC      fs/nfs/namespace.o
  CC      arch/x86/kernel/cpu/vmware.o
  CC      fs/ext4/migrate.o
  CC      net/sunrpc/auth_gss/trace.o
  CC      drivers/acpi/acpica/exresnte.o
  AR      sound/virtio/built-in.a
  CC      net/sunrpc/auth_gss/gss_krb5_mech.o
  CC      drivers/tty/serial/8250/8250_rsa.o
  CC      lib/objpool.o
  CC [M]  sound/hda/hdac_regmap.o
  CC      net/ethtool/tunnels.o
  CC      drivers/virtio/virtio_pci_admin_legacy_io.o
  CC      fs/autofs/dev-ioctl.o
  AR      net/mac80211/tests/built-in.a
  CC      net/mac80211/main.o
  CC      crypto/cbc.o
  CC      io_uring/futex.o
  CC      net/netlabel/netlabel_user.o
  CC      drivers/tty/vt/keyboard.o
  CC      drivers/acpi/acpica/exresolv.o
  CC      fs/lockd/svcshare.o
  CC      net/wireless/nl80211.o
  CC      drivers/pci/vgaarb.o
  CC      net/netfilter/nf_conntrack_proto_udp.o
  CC      arch/x86/kernel/cpu/hypervisor.o
  CC      drivers/tty/serial/serial_base_bus.o
  CC      io_uring/napi.o
  CC      kernel/trace/error_report-traces.o
  CC      drivers/tty/serial/8250/8250_port.o
  CC      net/netlabel/netlabel_kapi.o
  CC      crypto/ctr.o
  CC      arch/x86/kernel/dumpstack_32.o
  AR      drivers/tty/ipwireless/built-in.a
  CC      drivers/char/hw_random/core.o
  CC      net/netlabel/netlabel_domainhash.o
  CC      drivers/acpi/sleep.o
  CC      fs/9p/vfs_file.o
  CC [M]  sound/hda/hdac_controller.o
  CC      drivers/acpi/acpica/exresop.o
  CC      block/blk-iocost.o
  CC      drivers/virtio/virtio_input.o
  CC      arch/x86/kernel/cpu/mshyperv.o
  CC      net/ipv4/tcp.o
  CC      net/sunrpc/auth_gss/gss_krb5_seal.o
  CC      net/mac80211/status.o
  AR      fs/autofs/built-in.a
  CC      net/ethtool/fec.o
  CC      fs/lockd/svcproc.o
  CC      lib/plist.o
  CC      net/ethtool/eeprom.o
  CC      net/ipv6/ipv6_sockglue.o
  CC      crypto/gcm.o
  CC      drivers/char/hw_random/intel-rng.o
  CC      drivers/acpi/acpica/exserial.o
  AR      drivers/iommu/amd/built-in.a
  AR      drivers/iommu/intel/built-in.a
  AR      drivers/iommu/arm/arm-smmu/built-in.a
  AR      drivers/iommu/iommufd/built-in.a
  CC [M]  sound/hda/hdac_stream.o
  AR      drivers/iommu/arm/arm-smmu-v3/built-in.a
  AR      drivers/iommu/arm/built-in.a
  CC      net/sunrpc/auth_tls.o
  CC      drivers/iommu/iommu.o
  CC      drivers/char/agp/backend.o
  CC      kernel/trace/power-traces.o
  CC      arch/x86/kernel/cpu/debugfs.o
  CC      drivers/virtio/virtio_dma_buf.o
  CC [M]  sound/hda/array.o
  CC      net/core/tso.o
  CC      fs/9p/vfs_dir.o
  CC      net/core/sock_reuseport.o
  CC      drivers/char/agp/generic.o
  AR      drivers/pci/built-in.a
  CC      drivers/iommu/iommu-traces.o
  CC      kernel/trace/rpm-traces.o
  AR      fs/hostfs/built-in.a
  CC      net/rfkill/core.o
  CC      drivers/acpi/acpica/exstore.o
  CC      drivers/tty/vt/vt.o
  CC      net/netfilter/nf_conntrack_proto_icmp.o
  CC      net/mac80211/driver-ops.o
  CC      fs/nfs/mount_clnt.o
  CC      net/sunrpc/auth_unix.o
  CC      mm/mmap_lock.o
  CC      drivers/char/hw_random/amd-rng.o
  CC      arch/x86/kernel/time.o
  CC      net/wireless/mlme.o
  CC      drivers/tty/tty_io.o
  CC      net/netlabel/netlabel_addrlist.o
  CC      arch/x86/kernel/cpu/capflags.o
  AR      arch/x86/kernel/cpu/built-in.a
  CC      lib/radix-tree.o
  CC      drivers/tty/serial/8250/8250_dma.o
  AR      io_uring/built-in.a
  CC      fs/ext4/mmp.o
  CC      fs/debugfs/inode.o
  AR      drivers/virtio/built-in.a
  CC      drivers/acpi/device_sysfs.o
  CC      drivers/acpi/acpica/exstoren.o
  CC      net/ethtool/stats.o
  CC      net/core/fib_notifier.o
  CC      net/sunrpc/auth_gss/gss_krb5_unseal.o
  CC      crypto/ccm.o
  CC      fs/9p/vfs_dentry.o
  CC      fs/lockd/svcsubs.o
  CC      drivers/acpi/acpica/exstorob.o
  CC [M]  sound/hda/hdmi_chmap.o
  CC      arch/x86/kernel/ioport.o
  CC      fs/lockd/mon.o
  CC      fs/debugfs/file.o
  CC      net/9p/mod.o
  CC      mm/highmem.o
  CC      net/netfilter/nf_conntrack_extend.o
  CC      drivers/tty/serial/8250/8250_dwlib.o
  CC      drivers/char/hw_random/geode-rng.o
  CC      net/rfkill/input.o
  CC      drivers/acpi/acpica/exsystem.o
  CC      fs/ext4/move_extent.o
  CC      net/ipv4/tcp_input.o
  CC      drivers/char/agp/isoch.o
  CC      net/ipv6/ndisc.o
  CC      mm/memory.o
  CC      drivers/iommu/iommu-sysfs.o
  CC      drivers/tty/serial/serial_ctrl.o
  CC      drivers/char/hw_random/via-rng.o
  CC      fs/9p/v9fs.o
  CC      net/ethtool/phc_vclocks.o
  CC      net/core/xdp.o
  CC      lib/ratelimit.o
  CC      fs/lockd/trace.o
  CC      fs/nfs/nfstrace.o
  CC      net/ipv4/tcp_output.o
  CC      drivers/char/agp/amd64-agp.o
  CC      drivers/acpi/acpica/extrace.o
  CC      drivers/char/mem.o
  CC      kernel/trace/trace_dynevent.o
  CC      arch/x86/kernel/dumpstack.o
  CC      net/dns_resolver/dns_key.o
  CC      net/9p/client.o
  CC      net/9p/error.o
  CC      net/sunrpc/svc.o
  CC      net/netlabel/netlabel_mgmt.o
  CC      crypto/aes_generic.o
  CC      net/sunrpc/auth_gss/gss_krb5_wrap.o
  CC [M]  sound/hda/trace.o
  AR      net/rfkill/built-in.a
  CC      net/ipv6/udp.o
  CC      drivers/tty/serial/8250/8250_pcilib.o
  CC      drivers/tty/n_tty.o
  AR      drivers/char/hw_random/built-in.a
  CC      drivers/iommu/dma-iommu.o
  COPY    drivers/tty/vt/defkeymap.c
  CC      drivers/char/agp/intel-agp.o
  CC      lib/rbtree.o
  CC      block/mq-deadline.o
  CC      net/9p/protocol.o
  CC      fs/9p/fid.o
  CC      sound/sound_core.o
  CC      drivers/acpi/acpica/exutils.o
  CC      fs/9p/xattr.o
  CC      net/handshake/alert.o
  CC      block/kyber-iosched.o
  CC      net/wireless/ibss.o
  CC      lib/seq_buf.o
  CC      net/devres.o
  CC      net/netlabel/netlabel_unlabeled.o
  CC      drivers/tty/vt/consolemap.o
  CC      net/netfilter/nf_conntrack_acct.o
  AR      fs/debugfs/built-in.a
  CC      net/sunrpc/auth_gss/gss_krb5_crypto.o
  CC      drivers/acpi/device_pm.o
  CC      fs/lockd/xdr.o
  CC      net/dns_resolver/dns_query.o
  CC      fs/ext4/namei.o
  CC      crypto/crc32c_generic.o
  CC      drivers/acpi/acpica/hwacpi.o
  CC      net/ethtool/mm.o
  CC      net/ipv6/udplite.o
  CC      arch/x86/kernel/nmi.o
  CC      drivers/char/random.o
  CC      kernel/trace/trace_probe.o
  CC      drivers/tty/serial/8250/8250_early.o
  CC      fs/nfs/export.o
  CC      crypto/authenc.o
  CC      drivers/char/agp/intel-gtt.o
  CC      drivers/tty/serial/serial_port.o
  AR      fs/9p/built-in.a
  CC      net/wireless/sme.o
  CC      net/9p/trans_common.o
  CC      net/netlabel/netlabel_cipso_v4.o
  CC      lib/siphash.o
  CC      drivers/acpi/acpica/hwesleep.o
  CC      drivers/acpi/proc.o
  CC      net/sunrpc/auth_gss/gss_krb5_keys.o
  CC [M]  sound/hda/hdac_component.o
  CC      kernel/trace/trace_uprobe.o
  CC      drivers/iommu/iova.o
  CC      net/core/flow_offload.o
  AR      net/dns_resolver/built-in.a
  CC      kernel/trace/rethook.o
  CC      mm/mincore.o
  HOSTCC  drivers/tty/vt/conmakehash
  CC      drivers/acpi/acpica/hwgpe.o
  CC      lib/string.o
  CC      drivers/tty/serial/8250/8250_exar.o
  CC      net/ipv4/tcp_timer.o
  CC      net/netfilter/nf_conntrack_seqadj.o
  CC      net/core/gro.o
  AR      drivers/gpu/host1x/built-in.a
  CC      net/sunrpc/svcsock.o
  AR      drivers/gpu/vga/built-in.a
  CC      drivers/acpi/bus.o
  CC      drivers/tty/vt/defkeymap.o
  CC      drivers/char/misc.o
  CC      net/handshake/genl.o
  CC      net/mac80211/sta_info.o
  CC      lib/timerqueue.o
  CC      lib/union_find.o
  CC      fs/lockd/clnt4xdr.o
  AR      drivers/gpu/drm/tests/built-in.a
  CC      net/ethtool/module.o
  AR      drivers/gpu/drm/arm/built-in.a
  CC      fs/lockd/xdr4.o
  CC      net/wireless/chan.o
  CC      arch/x86/kernel/ldt.o
  CC      net/9p/trans_fd.o
  CC      drivers/gpu/drm/display/drm_display_helper_mod.o
  CC      net/mac80211/wep.o
  CC      fs/tracefs/inode.o
  CC [M]  sound/hda/hdac_i915.o
  CC      net/socket.o
  CONMK   drivers/tty/vt/consolemap_deftbl.c
  CC      drivers/tty/vt/consolemap_deftbl.o
  CC      crypto/authencesn.o
  AR      drivers/tty/vt/built-in.a
  CC      net/sunrpc/svcauth.o
  CC      net/sunrpc/svcauth_unix.o
  CC      drivers/acpi/acpica/hwregs.o
  CC      lib/vsprintf.o
  CC      block/blk-mq-pci.o
  AR      drivers/char/agp/built-in.a
  CC      net/sunrpc/addr.o
  CC [M]  fs/efivarfs/inode.o
  CC      drivers/gpu/drm/display/drm_dp_dual_mode_helper.o
  CC      net/sunrpc/rpcb_clnt.o
  AR      drivers/iommu/built-in.a
  CC      net/sunrpc/timer.o
  AR      net/sunrpc/auth_gss/built-in.a
  CC      lib/win_minmax.o
  CC      net/netlabel/netlabel_calipso.o
  CC      net/wireless/ethtool.o
  CC      drivers/char/virtio_console.o
  CC      drivers/tty/tty_ioctl.o
  CC      drivers/tty/serial/8250/8250_lpss.o
  CC      drivers/acpi/acpica/hwsleep.o
  CC      net/ipv6/raw.o
  CC      net/handshake/netlink.o
  CC      fs/nfs/sysfs.o
  CC [M]  sound/hda/intel-dsp-config.o
  CC      arch/x86/kernel/setup.o
  CC      crypto/lzo.o
  CC      sound/last.o
  CC      block/blk-mq-virtio.o
  CC [M]  fs/efivarfs/file.o
  CC      net/sunrpc/xdr.o
  CC      fs/tracefs/event_inode.o
  CC      drivers/acpi/acpica/hwvalid.o
  CC      net/ipv4/tcp_ipv4.o
  CC      net/wireless/mesh.o
  CC      block/blk-mq-debugfs.o
  CC      net/netfilter/nf_conntrack_proto_icmpv6.o
  CC      net/ethtool/cmis_fw_update.o
  CC      fs/open.o
  CC      drivers/tty/tty_ldisc.o
  CC      fs/lockd/svc4proc.o
  AR      kernel/trace/built-in.a
  CC      kernel/sys.o
  CC      drivers/gpu/drm/display/drm_dp_helper.o
  CC      net/core/netdev-genl.o
  CC      net/core/netdev-genl-gen.o
  CC      drivers/tty/tty_buffer.o
  CC      net/9p/trans_virtio.o
  CC      mm/mlock.o
  CC      drivers/tty/serial/8250/8250_mid.o
  CC      drivers/acpi/acpica/hwxface.o
  CC      fs/nfs/fs_context.o
  CC      net/wireless/ap.o
  CC      net/netfilter/nf_conntrack_netlink.o
  CC      crypto/lzo-rle.o
  CC [M]  sound/hda/intel-nhlt.o
  CC [M]  fs/efivarfs/super.o
  CC      lib/xarray.o
  CC      drivers/acpi/glue.o
  AR      net/netlabel/built-in.a
  CC      fs/ext4/page-io.o
  CC      drivers/acpi/acpica/hwxfsleep.o
  CC      arch/x86/kernel/x86_init.o
  CC      drivers/connector/cn_queue.o
  CC      net/ethtool/cmis_cdb.o
  CC      net/handshake/request.o
  CC      drivers/char/hpet.o
  CC [M]  fs/efivarfs/vars.o
  AR      fs/tracefs/built-in.a
  CC      fs/ext4/readpage.o
  CC      block/blk-pm.o
  CC [M]  sound/hda/intel-sdw-acpi.o
  CC      crypto/rng.o
  CC      drivers/tty/serial/8250/8250_pci.o
  CC      fs/nfs/nfsroot.o
  CC      drivers/tty/serial/earlycon.o
  CC      drivers/gpu/drm/ttm/ttm_tt.o
  CC      drivers/acpi/acpica/hwpci.o
  CC      fs/lockd/procfs.o
  CC      net/handshake/tlshd.o
  CC      net/ipv4/tcp_minisocks.o
  CC      net/sunrpc/sunrpc_syms.o
  CC      mm/mmap.o
  CC      net/handshake/trace.o
  CC      drivers/gpu/drm/ttm/ttm_bo.o
  CC      drivers/connector/connector.o
  CC      net/ipv6/icmp.o
  LD [M]  sound/hda/snd-hda-core.o
  CC      net/ipv4/tcp_cong.o
  CC      drivers/base/power/sysfs.o
  LD [M]  sound/hda/snd-intel-dspcfg.o
  LD [M]  sound/hda/snd-intel-sdw-acpi.o
  AR      sound/built-in.a
  AR      net/9p/built-in.a
  CC      drivers/connector/cn_proc.o
  CC      net/sunrpc/cache.o
  CC      drivers/base/firmware_loader/builtin/main.o
  CC      drivers/acpi/acpica/nsaccess.o
  CC      block/holder.o
  CC      drivers/block/loop.o
  CC      arch/x86/kernel/i8259.o
  LD [M]  fs/efivarfs/efivarfs.o
  CC      net/netfilter/nf_conntrack_ftp.o
  CC      crypto/drbg.o
  CC      drivers/tty/tty_port.o
  CC      net/core/gso.o
  CC      net/ethtool/pse-pd.o
  CC      drivers/char/nvram.o
  CC      drivers/gpu/drm/ttm/ttm_bo_util.o
  CC      drivers/base/regmap/regmap.o
  CC      fs/ext4/resize.o
  CC      lib/lockref.o
  CC      mm/mmu_gather.o
  CC      drivers/gpu/drm/ttm/ttm_bo_vm.o
  CC      drivers/block/virtio_blk.o
  AR      drivers/base/firmware_loader/builtin/built-in.a
  CC      drivers/tty/tty_mutex.o
  CC      drivers/base/firmware_loader/main.o
  CC      kernel/umh.o
  AR      fs/lockd/built-in.a
  CC      kernel/workqueue.o
  CC      drivers/acpi/acpica/nsalloc.o
  CC      drivers/tty/serial/8250/8250_pericom.o
  CC      drivers/gpu/drm/display/drm_dp_mst_topology.o
  CC      drivers/base/power/generic_ops.o
  CC      lib/bcd.o
  CC      arch/x86/kernel/irqinit.o
  CC      lib/sort.o
  CC      net/sysctl_net.o
  CC      net/core/net-sysfs.o
  CC      net/wireless/trace.o
  CC      lib/parser.o
  CC      arch/x86/kernel/jump_label.o
  AR      block/built-in.a
  CC      net/mac80211/aead_api.o
  CC      kernel/pid.o
  CC      mm/mprotect.o
  CC      drivers/base/regmap/regcache.o
  CC      drivers/acpi/acpica/nsarguments.o
  CC      net/mac80211/wpa.o
  CC      crypto/jitterentropy.o
  CC      kernel/task_work.o
  CC      net/netfilter/nf_conntrack_irc.o
  CC      drivers/acpi/scan.o
  CC      net/ipv4/tcp_metrics.o
  CC      drivers/base/power/common.o
  CC      drivers/acpi/acpica/nsconvert.o
  AR      drivers/char/built-in.a
  CC      crypto/jitterentropy-kcapi.o
  AR      drivers/misc/eeprom/built-in.a
  AR      drivers/misc/cb710/built-in.a
  AR      drivers/misc/ti-st/built-in.a
  AR      drivers/misc/lis3lv02d/built-in.a
  AR      drivers/misc/cardreader/built-in.a
  AR      drivers/mfd/built-in.a
  CC      net/netfilter/nf_conntrack_sip.o
  AR      drivers/misc/keba/built-in.a
  AR      drivers/misc/built-in.a
  CC      lib/debug_locks.o
  AR      drivers/connector/built-in.a
  CC      drivers/base/regmap/regcache-rbtree.o
  CC      net/mac80211/scan.o
  CC      fs/nfs/sysctl.o
  CC      drivers/gpu/drm/ttm/ttm_module.o
  AR      drivers/tty/serial/8250/built-in.a
  AR      drivers/tty/serial/built-in.a
  CC      drivers/tty/tty_ldsem.o
  CC      drivers/tty/tty_baudrate.o
  CC      drivers/base/power/qos.o
  CC      drivers/gpu/drm/ttm/ttm_execbuf_util.o
  AR      net/handshake/built-in.a
  CC      net/ethtool/plca.o
  CC      net/wireless/ocb.o
  AR      drivers/nfc/built-in.a
  CC      drivers/acpi/mipi-disco-img.o
  CC      net/wireless/pmsr.o
  CC      net/netfilter/nf_nat_core.o
  CC      lib/random32.o
  AR      drivers/base/firmware_loader/built-in.a
  CC      drivers/gpu/drm/i915/i915_config.o
  CC      drivers/acpi/acpica/nsdump.o
  CC      fs/read_write.o
  CC      crypto/ghash-generic.o
  CC      mm/mremap.o
  CC      arch/x86/kernel/irq_work.o
  CC      net/core/hotdata.o
  CC      net/ethtool/phy.o
  CC      drivers/gpu/drm/display/drm_dsc_helper.o
  AR      drivers/block/built-in.a
  CC      drivers/gpu/drm/i915/i915_driver.o
  CC      net/ipv6/mcast.o
  CC      lib/bust_spinlocks.o
  AR      drivers/dax/hmem/built-in.a
  AR      drivers/dax/built-in.a
  CC      crypto/hash_info.o
  CC      drivers/gpu/drm/i915/i915_drm_client.o
  CC      drivers/gpu/drm/i915/i915_getparam.o
  CC      drivers/gpu/drm/i915/i915_ioctl.o
  CC      net/netfilter/nf_nat_proto.o
  CC      drivers/gpu/drm/display/drm_hdcp_helper.o
  AR      drivers/base/test/built-in.a
  CC      drivers/base/regmap/regcache-flat.o
  AR      drivers/gpu/drm/renesas/rcar-du/built-in.a
  AR      drivers/gpu/drm/renesas/rz-du/built-in.a
  CC      mm/msync.o
  AR      drivers/gpu/drm/renesas/built-in.a
  CC      drivers/acpi/acpica/nseval.o
  GEN     net/wireless/shipped-certs.c
  CC      fs/file_table.o
  CC      drivers/tty/tty_jobctrl.o
  CC      kernel/extable.o
  CC      crypto/rsapubkey.asn1.o
  CC      crypto/rsaprivkey.asn1.o
  CC      drivers/dma-buf/dma-buf.o
  AR      crypto/built-in.a
  AR      drivers/cxl/core/built-in.a
  AR      drivers/cxl/built-in.a
  CC      drivers/tty/n_null.o
  CC      net/core/netdev_rx_queue.o
  CC      drivers/gpu/drm/ttm/ttm_range_manager.o
  CC      lib/kasprintf.o
  CC      fs/nfs/nfs3super.o
  CC      arch/x86/kernel/probe_roms.o
  CC      fs/ext4/super.o
  CC      net/netfilter/nf_nat_helper.o
  CC      drivers/acpi/acpica/nsinit.o
  CC      drivers/base/component.o
  CC      net/mac80211/offchannel.o
  CC      net/sunrpc/rpc_pipe.o
  CC      net/core/net-procfs.o
  CC      lib/bitmap.o
  CC      drivers/gpu/drm/ttm/ttm_resource.o
  CC      fs/nfs/nfs3client.o
  AR      drivers/gpu/drm/omapdrm/built-in.a
  CC      drivers/base/power/runtime.o
  CC      net/ipv4/tcp_fastopen.o
  CC      lib/scatterlist.o
  CC      drivers/acpi/acpica/nsload.o
  CC      kernel/params.o
  CC      fs/super.o
  CC      fs/char_dev.o
  CC      drivers/base/core.o
  CC      drivers/tty/pty.o
  CC      drivers/base/regmap/regcache-maple.o
  CC      drivers/base/bus.o
  AR      net/ethtool/built-in.a
  CC      drivers/gpu/drm/ttm/ttm_pool.o
  CC      net/ipv6/reassembly.o
  CC      drivers/dma-buf/dma-fence.o
  CC      drivers/base/regmap/regmap-debugfs.o
  CC      mm/page_vma_mapped.o
  CC      fs/stat.o
  CC      kernel/kthread.o
  CC      arch/x86/kernel/sys_ia32.o
  CC      lib/list_sort.o
  CC      drivers/base/power/wakeirq.o
  CC      drivers/acpi/acpica/nsnames.o
  AR      drivers/gpu/drm/tilcdc/built-in.a
  CC      net/ipv4/tcp_rate.o
  CC      drivers/gpu/drm/virtio/virtgpu_drv.o
  CC      mm/pagewalk.o
  CC      net/ipv6/tcp_ipv6.o
  CC      net/core/netpoll.o
  CC      drivers/tty/tty_audit.o
  CC      fs/nfs/nfs3proc.o
  CC      drivers/acpi/acpica/nsobject.o
  CC      drivers/gpu/drm/ttm/ttm_device.o
  CC      net/sunrpc/sysfs.o
  CC      drivers/gpu/drm/i915/i915_irq.o
  CC      net/sunrpc/svc_xprt.o
  CC      drivers/gpu/drm/ttm/ttm_sys_manager.o
  CC      drivers/gpu/drm/display/drm_hdmi_helper.o
  CC      net/ipv4/tcp_recovery.o
  CC      fs/nfs/nfs3xdr.o
  CC      net/wireless/shipped-certs.o
  CC      drivers/tty/sysrq.o
  AR      drivers/base/regmap/built-in.a
  CC      net/core/fib_rules.o
  CC      drivers/dma-buf/dma-fence-array.o
  AR      drivers/gpu/drm/imx/built-in.a
  CC      net/ipv4/tcp_ulp.o
  AR      drivers/gpu/drm/i2c/built-in.a
  CC      net/netfilter/nf_nat_masquerade.o
  CC      drivers/base/power/main.o
  CC      arch/x86/kernel/ksysfs.o
  CC      drivers/acpi/acpica/nsparse.o
  CC      kernel/sys_ni.o
  CC      drivers/dma-buf/dma-fence-chain.o
  CC      drivers/dma-buf/dma-fence-unwrap.o
  CC      drivers/gpu/drm/ttm/ttm_agp_backend.o
  CC      fs/ext4/symlink.o
  CC      net/ipv6/ping.o
  CC      drivers/gpu/drm/virtio/virtgpu_kms.o
  CC      lib/uuid.o
  CC      net/mac80211/ht.o
  CC      mm/pgtable-generic.o
  CC      fs/nfs/nfs3acl.o
  CC      arch/x86/kernel/bootflag.o
  CC      net/ipv4/tcp_offload.o
  CC      drivers/acpi/resource.o
  CC      lib/iov_iter.o
  AR      drivers/gpu/drm/panel/built-in.a
  CC      drivers/acpi/acpica/nspredef.o
  AR      drivers/gpu/drm/bridge/analogix/built-in.a
  AR      drivers/gpu/drm/bridge/cadence/built-in.a
  AR      drivers/gpu/drm/bridge/imx/built-in.a
  CC      mm/rmap.o
  CC      drivers/acpi/acpica/nsprepkg.o
  AR      drivers/gpu/drm/bridge/synopsys/built-in.a
  AR      drivers/gpu/drm/bridge/built-in.a
  CC      net/ipv4/tcp_plb.o
  AR      drivers/gpu/drm/hisilicon/built-in.a
  CC      fs/ext4/sysfs.o
  CC      net/netfilter/nf_nat_ftp.o
  CC      kernel/nsproxy.o
  CC      drivers/gpu/drm/display/drm_scdc_helper.o
  CC      net/sunrpc/xprtmultipath.o
  CC      arch/x86/kernel/e820.o
  CC      net/core/net-traces.o
  CC      drivers/dma-buf/dma-resv.o
  CC      drivers/dma-buf/sync_file.o
  CC      drivers/gpu/drm/i915/i915_mitigations.o
  AR      drivers/gpu/drm/ttm/built-in.a
  CC      drivers/macintosh/mac_hid.o
  CC      net/ipv6/exthdrs.o
  CC      net/core/selftests.o
  CC      mm/vmalloc.o
  CC      net/sunrpc/stats.o
  CC      net/netfilter/nf_nat_irc.o
  CC      kernel/notifier.o
  CC      fs/exec.o
  CC      drivers/acpi/acpica/nsrepair.o
  AR      drivers/tty/built-in.a
  CC      drivers/acpi/acpica/nsrepair2.o
  CC      mm/vma.o
  CC      net/ipv6/datagram.o
  CC      drivers/gpu/drm/i915/i915_module.o
  AR      drivers/gpu/drm/mxsfb/built-in.a
  CC      drivers/base/power/wakeup.o
  CC      drivers/base/dd.o
  CC      net/sunrpc/sysctl.o
  CC      drivers/gpu/drm/virtio/virtgpu_gem.o
  CC      net/mac80211/agg-tx.o
  CC      net/ipv6/ip6_flowlabel.o
  CC      net/netfilter/nf_nat_sip.o
  CC      fs/pipe.o
  CC      lib/clz_ctz.o
  CC      kernel/ksysfs.o
  AR      drivers/scsi/pcmcia/built-in.a
  AR      drivers/macintosh/built-in.a
  CC      drivers/scsi/scsi.o
  CC      drivers/scsi/hosts.o
  CC      kernel/cred.o
  CC      kernel/reboot.o
  CC      drivers/acpi/acpica/nssearch.o
  AR      drivers/gpu/drm/display/built-in.a
  CC      arch/x86/kernel/pci-dma.o
  CC      drivers/base/power/wakeup_stats.o
  AR      drivers/dma-buf/built-in.a
  CC      drivers/base/syscore.o
  CC      drivers/acpi/acpi_processor.o
  AR      drivers/gpu/drm/tiny/built-in.a
  CC      lib/bsearch.o
  CC      net/core/ptp_classifier.o
  CC      net/core/netprio_cgroup.o
  CC      lib/find_bit.o
  CC      mm/process_vm_access.o
  CC      arch/x86/kernel/quirks.o
  AR      drivers/nvme/common/built-in.a
  AR      drivers/nvme/host/built-in.a
  AR      drivers/nvme/target/built-in.a
  AR      drivers/nvme/built-in.a
  CC      drivers/base/power/trace.o
  CC      drivers/acpi/acpica/nsutils.o
  CC      fs/namei.o
  CC      net/ipv6/inet6_connection_sock.o
  CC      net/ipv4/datagram.o
  CC      drivers/gpu/drm/virtio/virtgpu_vram.o
  CC      drivers/scsi/scsi_ioctl.o
  CC      drivers/gpu/drm/virtio/virtgpu_display.o
  CC      net/mac80211/agg-rx.o
  CC      drivers/gpu/drm/virtio/virtgpu_vq.o
  CC      drivers/gpu/drm/virtio/virtgpu_fence.o
  CC      lib/llist.o
  CC      net/mac80211/vht.o
  CC      net/core/netclassid_cgroup.o
  CC      fs/nfs/nfs4proc.o
  CC      net/netfilter/x_tables.o
  CC      drivers/gpu/drm/i915/i915_params.o
  CC      arch/x86/kernel/kdebugfs.o
  CC      drivers/ata/libata-core.o
  AR      drivers/gpu/drm/xlnx/built-in.a
  CC      drivers/net/phy/mdio-boardinfo.o
  AR      drivers/net/phy/qcom/built-in.a
  CC      net/ipv4/raw.o
  CC      drivers/acpi/acpica/nswalk.o
  CC      drivers/gpu/drm/virtio/virtgpu_object.o
  CC      fs/fcntl.o
  CC      fs/ioctl.o
  AR      drivers/net/pse-pd/built-in.a
  CC      drivers/acpi/processor_core.o
  CC      fs/ext4/xattr.o
  CC      lib/lwq.o
  CC      kernel/async.o
  CC      kernel/range.o
  CC      drivers/net/phy/stubs.o
  CC      drivers/base/driver.o
  AR      drivers/base/power/built-in.a
  CC      drivers/scsi/scsicam.o
  CC      drivers/base/class.o
  CC      drivers/base/platform.o
  CC      arch/x86/kernel/alternative.o
  CC      arch/x86/kernel/i8253.o
  CC      arch/x86/kernel/hw_breakpoint.o
  CC      fs/nfs/nfs4xdr.o
  AR      drivers/gpu/drm/gud/built-in.a
  AR      drivers/gpu/drm/solomon/built-in.a
  CC [M]  drivers/gpu/drm/scheduler/sched_main.o
  CC      lib/memweight.o
  CC      mm/page_alloc.o
  CC      drivers/base/cpu.o
  CC      drivers/acpi/acpica/nsxfeval.o
  HOSTCC  drivers/gpu/drm/xe/xe_gen_wa_oob
  CC      drivers/acpi/processor_pdc.o
  CC      net/netfilter/xt_tcpudp.o
  CC      lib/kfifo.o
  CC      fs/ext4/xattr_hurd.o
  CC      drivers/gpu/drm/drm_atomic.o
  CC      drivers/base/firmware.o
  GEN     xe_wa_oob.c xe_wa_oob.h
  CC [M]  drivers/gpu/drm/xe/xe_bb.o
  CC      drivers/gpu/drm/i915/i915_pci.o
  CC      drivers/ata/libata-scsi.o
  CC      kernel/smpboot.o
  CC      drivers/base/init.o
  CC      drivers/gpu/drm/i915/i915_scatterlist.o
  CC      net/ipv6/udp_offload.o
  CC      net/ipv4/udp.o
  CC      drivers/net/mdio/acpi_mdio.o
  CC      drivers/firewire/init_ohci1394_dma.o
  CC      kernel/ucount.o
  CC      net/netfilter/xt_CONNSECMARK.o
  CC      arch/x86/kernel/tsc.o
  CC      net/mac80211/he.o
  CC [M]  drivers/gpu/drm/scheduler/sched_fence.o
  CC      drivers/acpi/acpica/nsxfname.o
  CC      drivers/gpu/drm/virtio/virtgpu_debugfs.o
  CC      drivers/cdrom/cdrom.o
  AR      net/sunrpc/built-in.a
  CC      drivers/scsi/scsi_error.o
  CC [M]  drivers/gpu/drm/scheduler/sched_entity.o
  CC      drivers/net/phy/mdio_devres.o
  AR      drivers/auxdisplay/built-in.a
  CC      drivers/acpi/ec.o
  CC      net/ipv4/udplite.o
  CC      drivers/net/mdio/fwnode_mdio.o
  CC      kernel/regset.o
  CC      drivers/scsi/scsi_lib.o
  CC      drivers/net/phy/phy.o
  CC      fs/ext4/xattr_trusted.o
  CC      drivers/scsi/constants.o
  CC      fs/readdir.o
  CC      lib/percpu-refcount.o
  CC [M]  drivers/gpu/drm/xe/xe_bo.o
  CC      drivers/base/map.o
  CC      drivers/acpi/acpica/nsxfobj.o
  CC      drivers/base/devres.o
  AR      drivers/firewire/built-in.a
  CC      drivers/base/attribute_container.o
  CC      drivers/base/transport_class.o
  CC      drivers/base/topology.o
  CC      drivers/ata/libata-eh.o
  CC      kernel/ksyms_common.o
  CC      drivers/gpu/drm/i915/i915_suspend.o
  CC      drivers/gpu/drm/virtio/virtgpu_plane.o
  CC      fs/select.o
  CC      net/ipv4/udp_offload.o
  CC      drivers/net/phy/phy-c45.o
  CC      net/netfilter/xt_NFLOG.o
  CC      drivers/acpi/acpica/psargs.o
  LD [M]  drivers/gpu/drm/scheduler/gpu-sched.o
  CC      drivers/acpi/dock.o
  CC      drivers/ata/libata-transport.o
  CC      drivers/gpu/drm/virtio/virtgpu_ioctl.o
  CC      fs/dcache.o
  CC      mm/init-mm.o
  CC      kernel/groups.o
  CC      net/ipv6/seg6.o
  CC      drivers/gpu/drm/i915/i915_switcheroo.o
  CC      arch/x86/kernel/tsc_msr.o
  CC      net/netfilter/xt_SECMARK.o
  CC      drivers/ata/libata-trace.o
  CC      drivers/gpu/drm/drm_atomic_uapi.o
  CC      lib/rhashtable.o
  AR      drivers/net/pcs/built-in.a
  CC [M]  drivers/gpu/drm/xe/xe_bo_evict.o
  CC      drivers/net/phy/phy-core.o
  AR      drivers/net/mdio/built-in.a
  CC      drivers/gpu/drm/i915/i915_sysfs.o
  CC      drivers/pcmcia/cs.o
  CC      drivers/gpu/drm/drm_auth.o
  CC      net/ipv6/fib6_notifier.o
  CC      drivers/acpi/acpica/psloop.o
  CC      drivers/base/container.o
  CC      drivers/base/property.o
  CC      drivers/scsi/scsi_lib_dma.o
  CC      drivers/pcmcia/socket_sysfs.o
  CC      arch/x86/kernel/io_delay.o
  CC      net/core/dst_cache.o
  CC      drivers/ata/libata-sata.o
  CC      fs/nfs/nfs4state.o
  CC [M]  drivers/gpu/drm/xe/xe_devcoredump.o
  CC      net/mac80211/s1g.o
  CC      lib/base64.o
  AR      drivers/net/ethernet/3com/built-in.a
  CC      drivers/net/ethernet/8390/ne2k-pci.o
  CC      drivers/gpu/drm/virtio/virtgpu_prime.o
  CC      mm/memblock.o
  CC [M]  drivers/gpu/drm/xe/xe_device.o
  AR      drivers/net/wireless/admtek/built-in.a
  AR      drivers/net/wireless/ath/built-in.a
  CC      drivers/acpi/acpica/psobject.o
  AR      drivers/net/wireless/atmel/built-in.a
  AR      drivers/net/wireless/broadcom/built-in.a
  AR      drivers/net/wireless/intel/built-in.a
  CC      arch/x86/kernel/rtc.o
  CC      kernel/kcmp.o
  AR      drivers/net/wireless/intersil/built-in.a
  CC      mm/slub.o
  AR      drivers/net/wireless/marvell/built-in.a
  AR      drivers/net/wireless/mediatek/built-in.a
  AR      drivers/net/wireless/microchip/built-in.a
  AR      drivers/net/usb/built-in.a
  AR      drivers/net/wireless/purelifi/built-in.a
  AR      drivers/cdrom/built-in.a
  CC      net/ipv6/rpl.o
  AR      drivers/net/wireless/quantenna/built-in.a
  CC      arch/x86/kernel/resource.o
  AR      drivers/net/wireless/ralink/built-in.a
  AR      drivers/net/wireless/realtek/built-in.a
  AR      drivers/net/wireless/rsi/built-in.a
  AR      drivers/net/wireless/silabs/built-in.a
  CC      fs/nfs/nfs4renewd.o
  AR      drivers/net/wireless/st/built-in.a
  AR      drivers/net/wireless/ti/built-in.a
  AS      arch/x86/kernel/irqflags.o
  AR      drivers/net/wireless/zydas/built-in.a
  AR      drivers/net/wireless/virtual/built-in.a
  CC      arch/x86/kernel/static_call.o
  AR      drivers/net/wireless/built-in.a
  CC      net/netfilter/xt_TCPMSS.o
  CC      drivers/net/ethernet/8390/8390.o
  CC      mm/madvise.o
  CC      drivers/gpu/drm/virtio/virtgpu_trace_points.o
  CC      arch/x86/kernel/process.o
  CC      drivers/gpu/drm/drm_blend.o
  CC      drivers/scsi/scsi_scan.o
  CC      kernel/freezer.o
  CC [M]  drivers/gpu/drm/xe/xe_device_sysfs.o
  CC      net/mac80211/ibss.o
  CC      net/ipv6/ioam6.o
  CC [M]  drivers/gpu/drm/xe/xe_dma_buf.o
  CC      drivers/pcmcia/cardbus.o
  CC      lib/once.o
  CC      drivers/acpi/acpica/psopcode.o
  CC      drivers/acpi/pci_root.o
  CC      drivers/usb/common/common.o
  CC      drivers/gpu/drm/i915/i915_utils.o
  CC      drivers/net/phy/phy_device.o
  CC      drivers/usb/core/usb.o
  CC      drivers/usb/core/hub.o
  CC      drivers/usb/core/hcd.o
  CC      drivers/usb/common/debug.o
  CC      drivers/acpi/acpica/psopinfo.o
  CC      drivers/base/cacheinfo.o
  AR      drivers/usb/phy/built-in.a
  CC      drivers/gpu/drm/i915/intel_clock_gating.o
  CC      arch/x86/kernel/ptrace.o
  CC      drivers/acpi/acpica/psparse.o
  CC      net/core/gro_cells.o
  CC      lib/refcount.o
  CC      mm/page_io.o
  AR      drivers/net/ethernet/adaptec/built-in.a
  CC      drivers/gpu/drm/drm_bridge.o
  CC      net/netfilter/xt_conntrack.o
  CC [M]  drivers/gpu/drm/xe/xe_drm_client.o
  CC      arch/x86/kernel/tls.o
  CC      net/ipv6/sysctl_net_ipv6.o
  CC      fs/ext4/xattr_user.o
  CC      net/ipv4/arp.o
  CC      fs/nfs/nfs4super.o
  CC      lib/rcuref.o
  CC      drivers/pcmcia/ds.o
  CC      mm/swap_state.o
  CC      lib/usercopy.o
  CC      kernel/profile.o
  CC      drivers/gpu/drm/virtio/virtgpu_submit.o
  GEN     drivers/scsi/scsi_devinfo_tbl.c
  CC      drivers/gpu/drm/drm_cache.o
  CC      drivers/net/mii.o
  CC      drivers/ata/libata-sff.o
  CC      drivers/net/phy/linkmode.o
  AR      drivers/usb/common/built-in.a
  CC      drivers/base/swnode.o
  CC      drivers/acpi/acpica/psscope.o
  CC      drivers/input/serio/serio.o
  CC      drivers/usb/core/urb.o
  AR      drivers/net/ethernet/8390/built-in.a
  CC      drivers/input/serio/i8042.o
  AR      drivers/net/ethernet/agere/built-in.a
  AR      drivers/net/ethernet/alacritech/built-in.a
  AR      drivers/net/ethernet/alteon/built-in.a
  AR      drivers/net/ethernet/amazon/built-in.a
  AR      drivers/net/ethernet/amd/built-in.a
  CC      arch/x86/kernel/step.o
  CC      lib/errseq.o
  AR      drivers/net/ethernet/aquantia/built-in.a
  AR      drivers/net/ethernet/arc/built-in.a
  AR      drivers/net/ethernet/asix/built-in.a
  AR      drivers/net/ethernet/atheros/built-in.a
  CC      net/ipv6/xfrm6_policy.o
  CC      drivers/net/phy/phy_link_topology.o
  AR      drivers/net/ethernet/cadence/built-in.a
  CC      drivers/net/ethernet/broadcom/bnx2.o
  CC      net/ipv4/icmp.o
  CC      kernel/stacktrace.o
  CC      lib/bucket_locks.o
  CC      drivers/scsi/scsi_devinfo.o
  CC      drivers/acpi/acpica/pstree.o
  CC      drivers/net/ethernet/broadcom/tg3.o
  CC      drivers/input/serio/serport.o
  CC      fs/ext4/fast_commit.o
  CC      drivers/net/loopback.o
  CC      drivers/usb/mon/mon_main.o
  CC      drivers/usb/host/pci-quirks.o
  CC      drivers/usb/class/usblp.o
  CC      fs/inode.o
  CC      drivers/gpu/drm/i915/intel_device_info.o
  CC      net/core/failover.o
  AR      drivers/net/ethernet/brocade/built-in.a
  CC      drivers/input/keyboard/atkbd.o
  CC      drivers/rtc/lib.o
  CC      drivers/i2c/algos/i2c-algo-bit.o
  CC      drivers/i2c/busses/i2c-i801.o
  AR      drivers/i2c/muxes/built-in.a
  AR      drivers/net/ethernet/cavium/common/built-in.a
  CC      drivers/acpi/acpica/psutils.o
  AR      drivers/net/ethernet/chelsio/built-in.a
  CC      lib/generic-radix-tree.o
  AR      drivers/net/ethernet/cavium/thunder/built-in.a
  CC      drivers/usb/core/message.o
  AR      drivers/net/ethernet/cavium/liquidio/built-in.a
  AR      drivers/net/ethernet/cavium/octeon/built-in.a
  AR      drivers/net/ethernet/cavium/built-in.a
  AR      drivers/gpu/drm/virtio/built-in.a
  CC      net/mac80211/iface.o
  CC      drivers/ata/libata-pmp.o
  CC      mm/swapfile.o
  CC      drivers/gpu/drm/i915/intel_memory_region.o
  CC      net/netfilter/xt_policy.o
  CC      arch/x86/kernel/i8237.o
  CC [M]  drivers/gpu/drm/xe/xe_exec.o
  CC      drivers/pcmcia/pcmcia_resource.o
  CC      drivers/input/serio/libps2.o
  CC      net/mac80211/link.o
  CC      drivers/base/auxiliary.o
  CC      kernel/dma.o
  CC      drivers/input/mouse/psmouse-base.o
  CC      fs/nfs/nfs4file.o
  CC      drivers/gpu/drm/i915/intel_pcode.o
  AR      drivers/net/ethernet/cisco/built-in.a
  CC      drivers/pcmcia/cistpl.o
  CC [M]  drivers/gpu/drm/xe/xe_execlist.o
  CC      drivers/input/mouse/synaptics.o
  CC      drivers/acpi/acpica/pswalk.o
  CC      drivers/usb/mon/mon_stat.o
  CC      arch/x86/kernel/stacktrace.o
  CC      drivers/scsi/scsi_sysctl.o
  CC      drivers/acpi/pci_link.o
  CC      drivers/net/phy/mdio_bus.o
  CC      lib/bitmap-str.o
  CC      drivers/rtc/class.o
  AR      drivers/net/ethernet/cortina/built-in.a
  CC      drivers/net/netconsole.o
  CC      net/ipv6/xfrm6_state.o
  CC      drivers/base/devtmpfs.o
  CC      drivers/input/mouse/focaltech.o
  CC      kernel/smp.o
  CC      drivers/usb/host/ehci-hcd.o
  AR      drivers/usb/class/built-in.a
  CC      drivers/acpi/acpica/psxface.o
  CC      net/netfilter/xt_state.o
  CC      net/ipv4/devinet.o
  AR      drivers/input/joystick/built-in.a
  AR      drivers/i2c/algos/built-in.a
  CC      drivers/pcmcia/pcmcia_cis.o
  CC      drivers/acpi/pci_irq.o
  AR      drivers/input/serio/built-in.a
  CC      drivers/gpu/drm/i915/intel_region_ttm.o
  AR      net/core/built-in.a
  CC      drivers/rtc/interface.o
  CC      drivers/base/module.o
  CC      drivers/usb/mon/mon_text.o
  AR      drivers/input/keyboard/built-in.a
  CC      drivers/acpi/acpica/rsaddr.o
  CC      net/mac80211/rate.o
  AR      net/wireless/built-in.a
  CC      drivers/i2c/i2c-boardinfo.o
  CC      arch/x86/kernel/reboot.o
  CC      drivers/scsi/scsi_proc.o
  CC      drivers/gpu/drm/drm_color_mgmt.o
  CC      mm/swap_slots.o
  CC      net/ipv6/xfrm6_input.o
  CC      lib/string_helpers.o
  AR      drivers/i2c/busses/built-in.a
  CC      drivers/usb/storage/scsiglue.o
  CC      drivers/input/mouse/alps.o
  CC      fs/attr.o
  CC      fs/nfs/delegation.o
  CC [M]  drivers/gpu/drm/xe/xe_exec_queue.o
  CC      drivers/ata/libata-acpi.o
  CC      drivers/ata/libata-pata-timings.o
  CC      drivers/acpi/acpica/rscalc.o
  CC      drivers/usb/core/driver.o
  CC      drivers/input/mouse/byd.o
  CC      drivers/i2c/i2c-core-base.o
  CC      arch/x86/kernel/msr.o
  CC      drivers/pcmcia/rsrc_mgr.o
  CC      lib/hexdump.o
  AR      drivers/i3c/built-in.a
  CC [M]  drivers/gpu/drm/xe/xe_force_wake.o
  CC      drivers/usb/storage/protocol.o
  CC      drivers/net/phy/mdio_device.o
  CC      drivers/base/auxiliary_sysfs.o
  CC      drivers/base/devcoredump.o
  CC      fs/nfs/nfs4idmap.o
  CC      fs/bad_inode.o
  CC      drivers/input/mouse/logips2pp.o
  CC      drivers/gpu/drm/i915/intel_runtime_pm.o
  CC      drivers/gpu/drm/i915/intel_sbi.o
  AR      drivers/usb/misc/built-in.a
  CC      fs/nfs/callback.o
  CC      kernel/uid16.o
  CC [M]  net/netfilter/nf_log_syslog.o
  CC      drivers/acpi/acpica/rscreate.o
  CC      drivers/gpu/drm/drm_connector.o
  CC      drivers/usb/mon/mon_bin.o
  CC      net/ipv4/af_inet.o
  CC      drivers/net/virtio_net.o
  AR      drivers/net/ethernet/dec/tulip/built-in.a
  CC      drivers/scsi/scsi_debugfs.o
  CC [M]  drivers/gpu/drm/xe/xe_ggtt.o
  AR      drivers/net/ethernet/dec/built-in.a
  CC      fs/nfs/callback_xdr.o
  CC      mm/dmapool.o
  CC [M]  drivers/gpu/drm/xe/xe_gpu_scheduler.o
  CC      drivers/net/phy/swphy.o
  CC      kernel/kallsyms.o
  CC      fs/file.o
  CC      drivers/pcmcia/rsrc_nonstatic.o
  CC      lib/kstrtox.o
  CC      arch/x86/kernel/cpuid.o
  CC      drivers/gpu/drm/i915/intel_step.o
  CC      lib/iomap.o
  CC      drivers/ata/ahci.o
  CC      drivers/usb/storage/transport.o
  CC      drivers/acpi/acpica/rsdumpinfo.o
  CC      drivers/base/platform-msi.o
  CC      fs/ext4/orphan.o
  CC [M]  net/netfilter/xt_mark.o
  CC      drivers/ata/libahci.o
  CC      drivers/rtc/nvmem.o
  AR      drivers/media/i2c/built-in.a
  AR      drivers/media/tuners/built-in.a
  CC      fs/filesystems.o
  CC      drivers/pcmcia/yenta_socket.o
  AR      drivers/media/rc/keymaps/built-in.a
  AR      drivers/media/rc/built-in.a
  CC      drivers/gpu/drm/drm_crtc.o
  CC      drivers/gpu/drm/drm_displayid.o
  AR      drivers/media/common/b2c2/built-in.a
  AR      drivers/media/common/saa7146/built-in.a
  AR      drivers/media/common/siano/built-in.a
  AR      drivers/media/common/v4l2-tpg/built-in.a
  AR      drivers/media/common/videobuf2/built-in.a
  AR      drivers/media/common/built-in.a
  CC      drivers/usb/core/config.o
  CC      net/ipv6/xfrm6_output.o
  AR      drivers/media/platform/allegro-dvt/built-in.a
  AR      drivers/media/platform/amlogic/meson-ge2d/built-in.a
  AR      drivers/media/platform/amlogic/built-in.a
  AR      drivers/media/platform/amphion/built-in.a
  CC      drivers/acpi/acpica/rsinfo.o
  AR      drivers/media/platform/aspeed/built-in.a
  AR      drivers/media/platform/atmel/built-in.a
  CC      drivers/ata/ata_piix.o
  AR      drivers/media/platform/broadcom/built-in.a
  AR      drivers/media/platform/cadence/built-in.a
  AR      drivers/media/platform/chips-media/coda/built-in.a
  CC      drivers/scsi/scsi_trace.o
  AR      drivers/media/platform/chips-media/wave5/built-in.a
  AR      drivers/media/platform/chips-media/built-in.a
  CC      mm/hugetlb.o
  CC      mm/mmu_notifier.o
  CC      net/mac80211/michael.o
  AR      drivers/media/platform/imagination/built-in.a
  AR      drivers/media/platform/intel/built-in.a
  AR      drivers/media/platform/marvell/built-in.a
  CC      drivers/base/physical_location.o
  CC      arch/x86/kernel/early-quirks.o
  AR      drivers/media/platform/mediatek/jpeg/built-in.a
  AR      drivers/media/platform/mediatek/mdp/built-in.a
  AR      drivers/media/platform/mediatek/vcodec/common/built-in.a
  CC      drivers/acpi/acpica/rsio.o
  AR      drivers/media/platform/mediatek/vcodec/encoder/built-in.a
  AR      drivers/media/platform/mediatek/vcodec/decoder/built-in.a
  AR      drivers/media/platform/mediatek/vcodec/built-in.a
  AR      drivers/media/platform/mediatek/vpu/built-in.a
  AR      drivers/media/platform/mediatek/mdp3/built-in.a
  AR      drivers/media/platform/mediatek/built-in.a
  AR      drivers/media/platform/microchip/built-in.a
  CC      drivers/net/phy/fixed_phy.o
  CC      arch/x86/kernel/smp.o
  AR      drivers/media/platform/nuvoton/built-in.a
  AR      drivers/media/platform/nvidia/tegra-vde/built-in.a
  AR      drivers/media/platform/nxp/dw100/built-in.a
  AR      drivers/media/platform/nvidia/built-in.a
  AR      drivers/media/platform/nxp/imx-jpeg/built-in.a
  CC      drivers/gpu/drm/drm_drv.o
  CC      lib/iomap_copy.o
  AR      drivers/usb/mon/built-in.a
  AR      drivers/net/ethernet/dlink/built-in.a
  AR      drivers/media/platform/nxp/imx8-isi/built-in.a
  CC      drivers/acpi/acpica/rsirq.o
  CC      drivers/scsi/scsi_logging.o
  AR      drivers/media/platform/nxp/built-in.a
  AR      drivers/net/ethernet/emulex/built-in.a
  CC      arch/x86/kernel/smpboot.o
  AR      drivers/media/platform/qcom/camss/built-in.a
  CC      drivers/input/mouse/lifebook.o
  CC      drivers/rtc/dev.o
  CC      net/ipv4/igmp.o
  AR      drivers/media/platform/qcom/venus/built-in.a
  CC      net/ipv4/fib_frontend.o
  AR      drivers/media/platform/qcom/built-in.a
  CC [M]  drivers/gpu/drm/xe/xe_gsc.o
  AR      drivers/media/platform/raspberrypi/pisp_be/built-in.a
  AR      drivers/media/platform/raspberrypi/built-in.a
  AR      drivers/media/platform/renesas/rcar-vin/built-in.a
  CC      fs/ext4/acl.o
  AR      drivers/media/platform/renesas/rzg2l-cru/built-in.a
  AR      drivers/media/platform/renesas/vsp1/built-in.a
  AR      drivers/media/platform/renesas/built-in.a
  CC      lib/devres.o
  AR      drivers/media/platform/rockchip/rga/built-in.a
  AR      drivers/media/platform/rockchip/rkisp1/built-in.a
  AR      drivers/media/platform/rockchip/built-in.a
  CC      drivers/ata/pata_amd.o
  AR      drivers/media/platform/samsung/exynos-gsc/built-in.a
  AR      drivers/media/platform/samsung/exynos4-is/built-in.a
  AR      drivers/media/platform/samsung/s3c-camif/built-in.a
  AR      drivers/media/platform/samsung/s5p-g2d/built-in.a
  AR      drivers/media/platform/samsung/s5p-jpeg/built-in.a
  CC      drivers/gpu/drm/i915/intel_uncore.o
  CC      kernel/acct.o
  AR      drivers/media/platform/samsung/s5p-mfc/built-in.a
  AR      drivers/media/platform/samsung/built-in.a
  CC      drivers/base/trace.o
  CC      drivers/usb/storage/usb.o
  AR      drivers/media/platform/st/sti/bdisp/built-in.a
  CC      drivers/i2c/i2c-core-smbus.o
  AR      drivers/media/platform/st/sti/c8sectpfe/built-in.a
  CC      drivers/i2c/i2c-core-acpi.o
  AR      drivers/media/platform/st/sti/delta/built-in.a
  AR      drivers/media/platform/st/sti/hva/built-in.a
  AR      drivers/input/tablet/built-in.a
  AR      drivers/media/platform/st/stm32/built-in.a
  AR      drivers/media/platform/sunxi/sun4i-csi/built-in.a
  AR      drivers/media/platform/st/built-in.a
  CC      drivers/acpi/acpica/rslist.o
  AR      drivers/media/platform/sunxi/sun6i-csi/built-in.a
  AR      drivers/pps/clients/built-in.a
  CC      kernel/vmcore_info.o
  AR      drivers/media/platform/sunxi/sun6i-mipi-csi2/built-in.a
  AR      drivers/pps/generators/built-in.a
  AR      drivers/media/platform/sunxi/sun8i-a83t-mipi-csi2/built-in.a
  CC      drivers/pps/pps.o
  AR      drivers/media/platform/sunxi/sun8i-di/built-in.a
  AR      drivers/media/platform/sunxi/sun8i-rotate/built-in.a
  AR      drivers/media/platform/sunxi/built-in.a
  AR      drivers/media/platform/ti/am437x/built-in.a
  AR      drivers/media/platform/ti/cal/built-in.a
  CC      lib/check_signature.o
  AR      drivers/media/platform/ti/vpe/built-in.a
  AR      drivers/media/platform/ti/davinci/built-in.a
  CC      mm/migrate.o
  AR      drivers/media/platform/ti/j721e-csi2rx/built-in.a
  AR      drivers/media/platform/ti/omap/built-in.a
  CC      drivers/gpu/drm/i915/intel_wakeref.o
  AR      drivers/media/platform/ti/omap3isp/built-in.a
  CC [M]  net/netfilter/xt_nat.o
  AR      drivers/media/platform/ti/built-in.a
  CC      drivers/pps/kapi.o
  AR      drivers/media/platform/verisilicon/built-in.a
  AR      drivers/media/platform/via/built-in.a
  CC      net/ipv6/xfrm6_protocol.o
  CC      net/ipv6/netfilter.o
  AR      drivers/media/platform/xilinx/built-in.a
  AR      drivers/media/platform/built-in.a
  CC      drivers/input/mouse/trackpoint.o
  AR      drivers/media/pci/ttpci/built-in.a
  CC      drivers/input/mouse/cypress_ps2.o
  AR      drivers/media/pci/b2c2/built-in.a
  CC      fs/ext4/xattr_security.o
  AR      drivers/media/pci/pluto2/built-in.a
  AR      drivers/media/pci/dm1105/built-in.a
  CC      lib/interval_tree.o
  AR      drivers/media/pci/pt1/built-in.a
  CC      drivers/rtc/proc.o
  AR      drivers/media/pci/pt3/built-in.a
  AR      drivers/media/pci/mantis/built-in.a
  CC      arch/x86/kernel/tsc_sync.o
  CC      drivers/ptp/ptp_clock.o
  AR      drivers/media/pci/ngene/built-in.a
  CC      drivers/usb/core/file.o
  AR      drivers/media/pci/ddbridge/built-in.a
  AR      drivers/media/pci/saa7146/built-in.a
  AR      drivers/media/pci/smipcie/built-in.a
  AR      drivers/media/pci/netup_unidvb/built-in.a
  CC      drivers/acpi/acpica/rsmemory.o
  AR      drivers/media/usb/b2c2/built-in.a
  AR      drivers/media/usb/dvb-usb/built-in.a
  AR      drivers/media/pci/intel/ipu3/built-in.a
  AR      drivers/media/usb/dvb-usb-v2/built-in.a
  AR      drivers/media/pci/intel/ivsc/built-in.a
  AR      drivers/net/ethernet/engleder/built-in.a
  CC      fs/nfs/callback_proc.o
  AR      drivers/media/pci/intel/built-in.a
  CC      drivers/acpi/acpi_apd.o
  AR      drivers/media/usb/s2255/built-in.a
  AR      drivers/media/pci/built-in.a
  AR      drivers/media/usb/siano/built-in.a
  CC      fs/namespace.o
  AR      drivers/media/usb/ttusb-budget/built-in.a
  AR      drivers/pcmcia/built-in.a
  CC      fs/seq_file.o
  AR      drivers/media/usb/ttusb-dec/built-in.a
  AR      drivers/media/usb/built-in.a
  CC      drivers/scsi/scsi_pm.o
  CC      drivers/usb/host/ehci-pci.o
  AR      drivers/media/mmc/siano/built-in.a
  AR      drivers/media/mmc/built-in.a
  AR      drivers/media/firewire/built-in.a
  AR      drivers/media/spi/built-in.a
  CC [M]  net/netfilter/xt_LOG.o
  CC      drivers/gpu/drm/i915/vlv_sideband.o
  AR      drivers/media/test-drivers/built-in.a
  CC      lib/assoc_array.o
  AR      drivers/media/built-in.a
  CC      lib/bitrev.o
  CC      net/mac80211/tkip.o
  CC      drivers/input/mouse/psmouse-smbus.o
  CC      drivers/usb/core/buffer.o
  CC      drivers/net/phy/realtek.o
  CC      drivers/ata/pata_oldpiix.o
  CC [M]  net/netfilter/xt_MASQUERADE.o
  CC      drivers/usb/early/ehci-dbgp.o
  CC      drivers/acpi/acpica/rsmisc.o
  AR      drivers/base/built-in.a
  CC [M]  drivers/gpu/drm/xe/xe_gsc_debugfs.o
  CC      lib/crc-ccitt.o
  CC      drivers/pps/sysfs.o
  CC      fs/nfs/nfs4namespace.o
  CC      drivers/i2c/i2c-smbus.o
  CC      mm/page_counter.o
  CC      lib/crc16.o
  CC      drivers/gpu/drm/drm_dumb_buffers.o
  CC [M]  net/netfilter/xt_addrtype.o
  CC      drivers/rtc/sysfs.o
  CC      kernel/elfcorehdr.o
  CC      net/ipv4/fib_semantics.o
  CC      arch/x86/kernel/setup_percpu.o
  CC      drivers/scsi/scsi_bsg.o
  CC      drivers/usb/storage/initializers.o
  AR      drivers/input/touchscreen/built-in.a
  CC      drivers/gpu/drm/i915/vlv_suspend.o
  CC      drivers/power/supply/power_supply_core.o
  CC      net/ipv4/fib_trie.o
  CC      fs/xattr.o
  AR      fs/ext4/built-in.a
  CC      drivers/ata/pata_sch.o
  CC      drivers/usb/host/ohci-hcd.o
  AR      drivers/pps/built-in.a
  CC [M]  drivers/gpu/drm/xe/xe_gsc_proxy.o
  CC      drivers/acpi/acpi_platform.o
  CC      fs/libfs.o
  CC      drivers/acpi/acpica/rsserial.o
  AR      drivers/input/misc/built-in.a
  CC      drivers/rtc/rtc-mc146818-lib.o
  CC      drivers/usb/storage/sierra_ms.o
  CC      drivers/net/net_failover.o
  CC      drivers/usb/core/sysfs.o
  CC      drivers/ptp/ptp_chardev.o
  CC      arch/x86/kernel/mpparse.o
  HOSTCC  lib/gen_crc32table
  CC      drivers/hwmon/hwmon.o
  AR      drivers/input/mouse/built-in.a
  CC      drivers/input/input.o
  AR      drivers/thermal/broadcom/built-in.a
  CC      fs/nfs/nfs4getroot.o
  AR      drivers/net/ethernet/ezchip/built-in.a
  AR      drivers/thermal/renesas/built-in.a
  CC      lib/xxhash.o
  CC      mm/hugetlb_cgroup.o
  AR      drivers/thermal/samsung/built-in.a
  CC      kernel/crash_reserve.o
  CC      drivers/thermal/intel/intel_tcc.o
  CC      mm/early_ioremap.o
  CC      net/ipv6/proc.o
  AR      drivers/i2c/built-in.a
  CC      drivers/scsi/scsi_common.o
  CC      drivers/input/input-compat.o
  CC      fs/fs-writeback.o
  CC      drivers/acpi/acpica/rsutils.o
  AR      drivers/usb/early/built-in.a
  CC      drivers/rtc/rtc-cmos.o
  CC      fs/nfs/nfs4client.o
  CC      drivers/thermal/intel/therm_throt.o
  CC      drivers/gpu/drm/drm_edid.o
  CC      net/ipv4/fib_notifier.o
  CC      drivers/input/input-mt.o
  CC      net/ipv6/syncookies.o
  CC      drivers/ata/pata_mpiix.o
  AR      drivers/net/phy/built-in.a
  CC      fs/pnode.o
  CC      drivers/ptp/ptp_sysfs.o
  CC      drivers/power/supply/power_supply_sysfs.o
  CC [M]  drivers/gpu/drm/xe/xe_gsc_submit.o
  AR      drivers/watchdog/built-in.a
  CC      net/mac80211/aes_cmac.o
  CC      lib/genalloc.o
  CC [M]  drivers/thermal/intel/x86_pkg_temp_thermal.o
  CC      drivers/acpi/acpica/rsxface.o
  CC      net/mac80211/aes_gmac.o
  CC      drivers/scsi/scsi_transport_spi.o
  CC      drivers/usb/storage/option_ms.o
  CC      fs/splice.o
  AR      drivers/net/ethernet/fujitsu/built-in.a
  CC      kernel/kexec_core.o
  CC      net/ipv6/calipso.o
  CC      drivers/ptp/ptp_vclock.o
  AR      net/netfilter/built-in.a
  CC      drivers/gpu/drm/i915/soc/intel_dram.o
  CC      mm/secretmem.o
  CC      drivers/md/md.o
  CC      drivers/usb/core/endpoint.o
  CC      drivers/power/supply/power_supply_leds.o
  CC      net/mac80211/fils_aead.o
  CC      drivers/usb/storage/usual-tables.o
  CC      arch/x86/kernel/trace_clock.o
  CC      drivers/ata/ata_generic.o
  CC      net/ipv6/ah6.o
  CC      fs/nfs/nfs4session.o
  CC      drivers/acpi/acpica/tbdata.o
  CC      drivers/gpu/drm/drm_eld.o
  CC      mm/hmm.o
  CC      arch/x86/kernel/trace.o
  CC      net/ipv6/esp6.o
  CC      drivers/input/input-poller.o
  CC      lib/percpu_counter.o
  CC      drivers/gpu/drm/i915/soc/intel_gmch.o
  CC      drivers/scsi/virtio_scsi.o
  AR      drivers/rtc/built-in.a
  CC      net/ipv4/inet_fragment.o
  CC      drivers/power/supply/power_supply_hwmon.o
  CC      drivers/input/ff-core.o
  CC [M]  drivers/gpu/drm/xe/xe_gt.o
  CC      drivers/gpu/drm/i915/soc/intel_pch.o
  CC      drivers/gpu/drm/drm_encoder.o
  AR      drivers/thermal/intel/built-in.a
  AR      drivers/thermal/st/built-in.a
  CC      drivers/gpu/drm/i915/soc/intel_rom.o
  CC      drivers/usb/host/ohci-pci.o
  AR      drivers/thermal/qcom/built-in.a
  AR      drivers/hwmon/built-in.a
  CC      mm/memfd.o
  CC      drivers/ptp/ptp_kvm_x86.o
  AR      drivers/thermal/tegra/built-in.a
  AR      drivers/thermal/mediatek/built-in.a
  CC      drivers/thermal/thermal_core.o
  AR      drivers/net/ethernet/fungible/built-in.a
  CC      drivers/thermal/thermal_sysfs.o
  CC      drivers/thermal/thermal_trip.o
  AR      drivers/usb/storage/built-in.a
  CC      drivers/scsi/sd.o
  CC      fs/nfs/dns_resolve.o
  CC      drivers/acpi/acpi_pnp.o
  CC      drivers/md/md-bitmap.o
  CC      drivers/usb/core/devio.o
  CC      fs/sync.o
  CC      drivers/acpi/acpica/tbfadt.o
  CC      drivers/ptp/ptp_kvm_common.o
  CC      net/ipv4/ping.o
  CC [M]  drivers/gpu/drm/xe/xe_gt_ccs_mode.o
  CC      lib/audit.o
  CC      kernel/crash_core.o
  AR      drivers/power/supply/built-in.a
  AR      drivers/ata/built-in.a
  CC      drivers/input/touchscreen.o
  AR      drivers/power/built-in.a
  CC      drivers/acpi/power.o
  CC      mm/ptdump.o
  CC      drivers/gpu/drm/i915/i915_memcpy.o
  CC      fs/nfs/nfs4trace.o
  CC      mm/execmem.o
  CC      arch/x86/kernel/rethook.o
  CC      drivers/acpi/event.o
  CC      drivers/usb/core/notify.o
  CC      drivers/md/md-autodetect.o
  CC      fs/nfs/nfs4sysctl.o
  CC      drivers/acpi/acpica/tbfind.o
  CC      net/ipv6/sit.o
  CC      lib/syscall.o
  CC      drivers/usb/host/uhci-hcd.o
  CC      drivers/scsi/sr.o
  CC      kernel/kexec.o
  CC      net/mac80211/cfg.o
  CC      arch/x86/kernel/vmcore_info_32.o
  CC      fs/utimes.o
  CC      net/ipv4/ip_tunnel_core.o
  CC      drivers/acpi/evged.o
  CC      drivers/cpufreq/cpufreq.o
  CC      drivers/md/dm.o
  CC      net/ipv6/addrconf_core.o
  CC [M]  drivers/gpu/drm/xe/xe_gt_clock.o
  CC      arch/x86/kernel/machine_kexec_32.o
  CC      drivers/cpufreq/freq_table.o
  CC      drivers/gpu/drm/drm_file.o
  CC      drivers/input/ff-memless.o
  CC      drivers/acpi/acpica/tbinstal.o
  CC      drivers/acpi/sysfs.o
  CC      drivers/thermal/thermal_helpers.o
  AR      mm/built-in.a
  CC      drivers/md/dm-table.o
  CC      drivers/gpu/drm/i915/i915_mm.o
  AR      drivers/ptp/built-in.a
  CC      kernel/utsname.o
  CC      net/ipv6/exthdrs_core.o
  CC      drivers/usb/core/generic.o
  CC      drivers/gpu/drm/i915/i915_sw_fence.o
  CC      drivers/acpi/property.o
  CC      drivers/md/dm-target.o
  CC      drivers/thermal/thermal_hwmon.o
  CC      drivers/cpufreq/cpufreq_performance.o
  CC      drivers/input/sparse-keymap.o
  CC      net/ipv6/ip6_checksum.o
  CC      lib/errname.o
  CC      drivers/usb/host/xhci.o
  CC      lib/nlattr.o
  CC      drivers/cpufreq/cpufreq_userspace.o
  CC      net/mac80211/ethtool.o
  CC [M]  drivers/gpu/drm/xe/xe_gt_freq.o
  CC      drivers/acpi/acpica/tbprint.o
  CC      drivers/input/vivaldi-fmap.o
  CC      drivers/gpu/drm/i915/i915_sw_fence_work.o
  CC      drivers/usb/core/quirks.o
  CC      drivers/usb/host/xhci-mem.o
  CC      drivers/gpu/drm/drm_fourcc.o
  CC      drivers/md/dm-linear.o
  CC      kernel/pid_namespace.o
  CC      drivers/cpuidle/governors/menu.o
  CC      lib/cpu_rmap.o
  AS      arch/x86/kernel/relocate_kernel_32.o
  CC      drivers/thermal/gov_step_wise.o
  CC      drivers/acpi/acpica/tbutils.o
  AR      drivers/net/ethernet/google/built-in.a
  CC      drivers/thermal/gov_user_space.o
  CC      drivers/usb/core/devices.o
  CC      drivers/md/dm-stripe.o
  CC      arch/x86/kernel/crash_dump_32.o
  CC      net/mac80211/rx.o
  AR      drivers/mmc/built-in.a
  CC      net/ipv4/gre_offload.o
  CC      drivers/usb/host/xhci-ext-caps.o
  CC      kernel/stop_machine.o
  CC      kernel/audit.o
  CC      lib/dynamic_queue_limits.o
  CC      net/mac80211/spectmgmt.o
  CC      drivers/input/input-leds.o
  AR      drivers/net/ethernet/huawei/built-in.a
  CC      drivers/usb/host/xhci-ring.o
  CC      drivers/cpuidle/governors/haltpoll.o
  AR      drivers/ufs/built-in.a
  CC      drivers/md/dm-ioctl.o
  CC      drivers/acpi/debugfs.o
  CC      drivers/md/dm-io.o
  CC      drivers/gpu/drm/i915/i915_syncmap.o
  CC      drivers/acpi/acpi_lpat.o
  CC      drivers/scsi/sr_ioctl.o
  CC      drivers/acpi/acpica/tbxface.o
  CC      arch/x86/kernel/crash.o
  CC      drivers/acpi/acpica/tbxfload.o
  CC      drivers/input/evdev.o
  CC      drivers/cpufreq/cpufreq_ondemand.o
  CC      drivers/net/ethernet/intel/e1000/e1000_main.o
  CC      kernel/auditfilter.o
  AR      drivers/thermal/built-in.a
  CC      drivers/net/ethernet/intel/e1000e/82571.o
  CC [M]  drivers/gpu/drm/xe/xe_gt_idle.o
  CC      drivers/cpufreq/cpufreq_governor.o
  CC      kernel/auditsc.o
  CC      drivers/acpi/acpi_pcc.o
  CC      drivers/gpu/drm/i915/i915_user_extensions.o
  CC      drivers/acpi/acpica/tbxfroot.o
  CC      drivers/cpuidle/cpuidle.o
  CC      drivers/usb/host/xhci-hub.o
  CC      drivers/net/ethernet/intel/e100.o
  CC      drivers/gpu/drm/drm_framebuffer.o
  CC      drivers/net/ethernet/intel/e1000e/ich8lan.o
  CC      drivers/cpufreq/cpufreq_governor_attr_set.o
  CC      net/ipv6/ip6_icmp.o
  AR      drivers/net/ethernet/i825xx/built-in.a
  CC      drivers/acpi/acpica/utaddress.o
  CC      drivers/usb/host/xhci-dbg.o
  CC      drivers/cpuidle/driver.o
  CC      fs/d_path.o
  CC [M]  drivers/gpu/drm/xe/xe_gt_mcr.o
  CC      lib/glob.o
  CC      drivers/net/ethernet/intel/e1000/e1000_hw.o
  CC      drivers/usb/host/xhci-trace.o
  CC      drivers/usb/core/phy.o
  AR      drivers/net/ethernet/broadcom/built-in.a
  AR      drivers/firmware/arm_ffa/built-in.a
  AR      drivers/firmware/arm_scmi/built-in.a
  AR      drivers/firmware/broadcom/built-in.a
  CC      drivers/usb/host/xhci-debugfs.o
  AR      drivers/firmware/cirrus/built-in.a
  AR      drivers/firmware/meson/built-in.a
  AR      drivers/firmware/microchip/built-in.a
  CC      drivers/gpu/drm/i915/i915_debugfs.o
  CC      drivers/firmware/efi/efi-bgrt.o
  AR      drivers/crypto/stm32/built-in.a
  AR      drivers/crypto/xilinx/built-in.a
  CC      drivers/firmware/efi/libstub/efi-stub-helper.o
  AR      drivers/crypto/hisilicon/built-in.a
  AR      drivers/crypto/intel/keembay/built-in.a
  CC      drivers/clocksource/acpi_pm.o
  AR      drivers/crypto/intel/ixp4xx/built-in.a
  AR      drivers/crypto/intel/built-in.a
  AR      drivers/crypto/starfive/built-in.a
  CC      arch/x86/kernel/module.o
  AR      drivers/crypto/built-in.a
  CC      drivers/usb/core/port.o
  AR      drivers/net/ethernet/microsoft/built-in.a
  CC      net/ipv6/output_core.o
  CC      drivers/firmware/efi/libstub/gop.o
  CC      drivers/usb/host/xhci-pci.o
  CC      drivers/net/ethernet/intel/e1000/e1000_ethtool.o
  CC      drivers/acpi/acpica/utalloc.o
  CC      drivers/scsi/sr_vendor.o
  CC      lib/strncpy_from_user.o
  CC      drivers/gpu/drm/i915/i915_debugfs_params.o
  CC      net/ipv4/metrics.o
  AR      drivers/cpuidle/governors/built-in.a
  CC      drivers/cpufreq/acpi-cpufreq.o
  CC      drivers/net/ethernet/intel/e1000e/80003es2lan.o
  CC      drivers/firmware/efi/efi.o
  CC      fs/stack.o
  CC      arch/x86/kernel/doublefault_32.o
  CC      drivers/usb/core/hcd-pci.o
  AR      drivers/input/built-in.a
  CC      drivers/clocksource/i8253.o
  CC      drivers/acpi/acpica/utascii.o
  CC      drivers/gpu/drm/drm_gem.o
  CC      drivers/firmware/efi/libstub/secureboot.o
  CC      drivers/acpi/ac.o
  CC [M]  drivers/gpu/drm/xe/xe_gt_pagefault.o
  CC      lib/strnlen_user.o
  AR      drivers/net/ethernet/litex/built-in.a
  CC      drivers/cpuidle/governor.o
  CC [M]  drivers/gpu/drm/xe/xe_gt_sysfs.o
  CC      lib/net_utils.o
  CC      drivers/scsi/sg.o
  CC      net/mac80211/tx.o
  AR      drivers/clocksource/built-in.a
  CC      drivers/firmware/efi/vars.o
  CC      drivers/acpi/acpica/utbuffer.o
  CC      drivers/firmware/efi/libstub/tpm.o
  CC      drivers/net/ethernet/intel/e1000e/mac.o
  CC      net/ipv6/protocol.o
  CC      arch/x86/kernel/early_printk.o
  CC      net/ipv4/netlink.o
  CC      drivers/usb/core/usb-acpi.o
  CC [M]  drivers/gpu/drm/xe/xe_gt_throttle.o
  CC      drivers/gpu/drm/drm_ioctl.o
  CC      net/ipv6/ip6_offload.o
  CC      net/ipv4/nexthop.o
  CC      drivers/cpuidle/sysfs.o
  AR      fs/nfs/built-in.a
  CC      fs/fs_struct.o
  CC      drivers/cpufreq/amd-pstate.o
  CC      drivers/cpuidle/poll_state.o
  CC      drivers/gpu/drm/drm_lease.o
  CC      arch/x86/kernel/hpet.o
  CC      drivers/acpi/acpica/utcksum.o
  CC [M]  drivers/gpu/drm/xe/xe_gt_tlb_invalidation.o
  CC      drivers/hid/hid-core.o
  CC      drivers/hid/usbhid/hid-core.o
  CC      lib/sg_pool.o
  CC [M]  drivers/gpu/drm/xe/xe_gt_topology.o
  AR      drivers/platform/x86/amd/built-in.a
  AR      drivers/platform/x86/intel/built-in.a
  CC      drivers/platform/x86/wmi.o
  CC      kernel/audit_watch.o
  CC      drivers/mailbox/mailbox.o
  CC      drivers/cpufreq/amd-pstate-trace.o
  CC      drivers/gpu/drm/i915/i915_pmu.o
  AR      drivers/net/ethernet/marvell/octeon_ep/built-in.a
  CC      drivers/acpi/button.o
  AR      drivers/net/ethernet/marvell/octeon_ep_vf/built-in.a
  CC      drivers/acpi/acpica/utcopy.o
  AR      drivers/net/ethernet/marvell/octeontx2/built-in.a
  CC      drivers/acpi/acpica/utexcep.o
  CC      fs/statfs.o
  AR      drivers/net/ethernet/marvell/prestera/built-in.a
  CC      drivers/net/ethernet/marvell/sky2.o
  CC      drivers/acpi/acpica/utdebug.o
  CC      kernel/audit_fsnotify.o
  CC      drivers/net/ethernet/intel/e1000/e1000_param.o
  CC      lib/stackdepot.o
  AR      drivers/perf/built-in.a
  CC      drivers/acpi/fan_core.o
  CC      drivers/firmware/efi/libstub/file.o
  CC      fs/fs_pin.o
  CC      drivers/firmware/efi/reboot.o
  AR      drivers/usb/core/built-in.a
  CC      drivers/cpuidle/cpuidle-haltpoll.o
  CC      net/ipv6/tcpv6_offload.o
  CC      drivers/gpu/drm/i915/gt/gen2_engine_cs.o
  AR      drivers/platform/surface/built-in.a
  CC      drivers/cpufreq/intel_pstate.o
  CC      net/ipv4/udp_tunnel_stub.o
  AR      drivers/net/ethernet/mellanox/built-in.a
  AR      drivers/net/ethernet/meta/built-in.a
  CC      lib/asn1_decoder.o
  CC      drivers/md/dm-kcopyd.o
  CC      drivers/hid/usbhid/hiddev.o
  CC      drivers/md/dm-sysfs.o
  CC      drivers/acpi/fan_attr.o
  AR      drivers/firmware/imx/built-in.a
  GEN     lib/oid_registry_data.c
  CC      drivers/firmware/efi/memattr.o
  CC      drivers/acpi/acpica/utdecode.o
  CC      drivers/gpu/drm/i915/gt/gen6_engine_cs.o
  CC      drivers/md/dm-stats.o
  CC      drivers/mailbox/pcc.o
  AR      drivers/cpuidle/built-in.a
  AR      drivers/hwtracing/intel_th/built-in.a
  CC      drivers/acpi/acpica/utdelete.o
  CC      drivers/platform/x86/wmi-bmof.o
  CC      fs/nsfs.o
  AR      drivers/usb/host/built-in.a
  AR      drivers/usb/built-in.a
  CC      drivers/hid/hid-input.o
  CC      kernel/audit_tree.o
  CC      drivers/net/ethernet/intel/e1000e/manage.o
  CC      arch/x86/kernel/amd_nb.o
  CC      drivers/hid/usbhid/hid-pidff.o
  CC      drivers/hid/hid-quirks.o
  CC [M]  drivers/gpu/drm/xe/xe_guc.o
  CC      drivers/acpi/acpica/uterror.o
  CC      drivers/firmware/efi/libstub/mem.o
  CC      drivers/gpu/drm/i915/gt/gen6_ppgtt.o
  CC      drivers/hid/hid-debug.o
  CC      arch/x86/kernel/kvm.o
  AR      drivers/android/built-in.a
  CC      net/ipv6/exthdrs_offload.o
  CC      drivers/firmware/efi/libstub/random.o
  CC      lib/ucs2_string.o
  AR      drivers/firmware/psci/built-in.a
  CC      arch/x86/kernel/kvmclock.o
  AR      drivers/firmware/qcom/built-in.a
  CC      drivers/scsi/scsi_sysfs.o
  CC      drivers/acpi/acpica/uteval.o
  CC      drivers/md/dm-rq.o
  CC      lib/sbitmap.o
  CC      drivers/net/ethernet/intel/e1000e/nvm.o
  CC      drivers/firmware/efi/tpm.o
  CC      drivers/net/ethernet/intel/e1000e/phy.o
  AR      drivers/nvmem/layouts/built-in.a
  CC      drivers/nvmem/core.o
  AR      drivers/net/ethernet/micrel/built-in.a
  CC      fs/fs_types.o
  CC      kernel/kprobes.o
  CC      arch/x86/kernel/paravirt.o
  CC      drivers/hid/hidraw.o
  CC      net/mac80211/key.o
  CC      net/ipv6/inet6_hashtables.o
  AR      drivers/mailbox/built-in.a
  CC      drivers/net/ethernet/intel/e1000e/param.o
  CC      drivers/platform/x86/eeepc-laptop.o
  AR      drivers/net/ethernet/intel/e1000/built-in.a
  AR      drivers/net/ethernet/microchip/built-in.a
  AR      drivers/firmware/smccc/built-in.a
  AR      drivers/firmware/tegra/built-in.a
  CC      fs/fs_context.o
  CC      net/mac80211/util.o
  CC      drivers/platform/x86/p2sb.o
  CC      drivers/acpi/acpica/utglobal.o
  CC      kernel/seccomp.o
  CC      drivers/hid/hid-generic.o
  CC      drivers/gpu/drm/drm_managed.o
  CC      drivers/gpu/drm/i915/gt/gen7_renderclear.o
  CC      drivers/acpi/acpica/uthex.o
  CC      drivers/acpi/fan_hwmon.o
  CC      kernel/relay.o
  CC [M]  drivers/gpu/drm/xe/xe_guc_ads.o
  CC      net/ipv6/mcast_snoop.o
  CC      drivers/md/dm-io-rewind.o
  CC      drivers/firmware/efi/memmap.o
  CC      drivers/md/dm-builtin.o
  CC      drivers/firmware/efi/libstub/randomalloc.o
  CC      drivers/firmware/efi/capsule.o
  CC      drivers/acpi/acpi_video.o
  CC      drivers/firmware/efi/esrt.o
  CC      drivers/net/ethernet/intel/e1000e/ethtool.o
  CC      drivers/acpi/acpica/utids.o
  CC      lib/group_cpus.o
  CC      net/mac80211/parse.o
  CC      drivers/net/ethernet/intel/e1000e/netdev.o
  CC      drivers/md/dm-raid1.o
  CC      arch/x86/kernel/pvclock.o
  AR      drivers/net/ethernet/mscc/built-in.a
  CC      fs/fs_parser.o
  CC      drivers/gpu/drm/i915/gt/gen8_engine_cs.o
  CC [M]  drivers/gpu/drm/xe/xe_guc_capture.o
  AR      drivers/firmware/xilinx/built-in.a
  CC      drivers/firmware/efi/libstub/pci.o
  CC      drivers/firmware/efi/runtime-wrappers.o
  AR      drivers/hid/usbhid/built-in.a
  CC      net/ipv4/ip_tunnel.o
  AR      drivers/net/ethernet/myricom/built-in.a
  CC      kernel/utsname_sysctl.o
  CC      drivers/gpu/drm/i915/gt/gen8_ppgtt.o
  CC      arch/x86/kernel/pcspeaker.o
  AR      drivers/net/ethernet/natsemi/built-in.a
  CC      drivers/acpi/video_detect.o
  CC      fs/fsopen.o
  CC      drivers/acpi/acpica/utinit.o
  AR      drivers/scsi/built-in.a
  CC      drivers/firmware/efi/capsule-loader.o
  CC      drivers/firmware/dmi_scan.o
  CC      drivers/acpi/processor_driver.o
  CC [M]  drivers/gpu/drm/xe/xe_guc_ct.o
  AR      drivers/nvmem/built-in.a
  CC      drivers/firmware/efi/libstub/skip_spaces.o
  CC      lib/fw_table.o
  CC      drivers/firmware/efi/libstub/lib-cmdline.o
  CC      fs/init.o
  CC      drivers/gpu/drm/drm_mm.o
  CC      drivers/acpi/processor_thermal.o
  CC      net/mac80211/wme.o
  CC      drivers/firmware/efi/earlycon.o
  AR      drivers/platform/x86/built-in.a
  AR      drivers/platform/built-in.a
  CC [M]  drivers/gpu/drm/xe/xe_guc_db_mgr.o
  CC      drivers/net/ethernet/intel/e1000e/ptp.o
  CC      drivers/firmware/dmi-id.o
  CC [M]  drivers/gpu/drm/xe/xe_guc_hwconfig.o
  CC      drivers/acpi/acpica/utlock.o
  CC      drivers/md/dm-log.o
  CC      arch/x86/kernel/check.o
  CC      drivers/acpi/acpica/utmath.o
  CC      drivers/hid/hid-a4tech.o
  CC      drivers/firmware/efi/libstub/lib-ctype.o
  CC      drivers/firmware/efi/libstub/alignedmem.o
  CC [M]  drivers/gpu/drm/xe/xe_guc_id_mgr.o
  CC      drivers/md/dm-region-hash.o
  AR      drivers/cpufreq/built-in.a
  CC      drivers/firmware/memmap.o
  AR      net/ipv6/built-in.a
  AR      drivers/net/ethernet/neterion/built-in.a
  CC      fs/kernel_read_file.o
  CC      drivers/gpu/drm/i915/gt/intel_breadcrumbs.o
  CC [M]  drivers/gpu/drm/xe/xe_guc_klv_helpers.o
  CC      drivers/acpi/acpica/utmisc.o
  CC      drivers/acpi/processor_idle.o
  CC      arch/x86/kernel/uprobes.o
  AR      lib/lib.a
  CC      drivers/hid/hid-apple.o
  CC      drivers/md/dm-zero.o
  CC      drivers/gpu/drm/drm_mode_config.o
  CC      drivers/hid/hid-belkin.o
  GEN     lib/crc32table.h
  CC      lib/oid_registry.o
  CC      net/ipv4/sysctl_net_ipv4.o
  CC      net/mac80211/chan.o
  CC      kernel/delayacct.o
  CC      fs/mnt_idmapping.o
  CC      drivers/gpu/drm/drm_mode_object.o
  CC      fs/remap_range.o
  AR      drivers/net/ethernet/marvell/built-in.a
  CC      drivers/gpu/drm/i915/gt/intel_context.o
  CC      drivers/gpu/drm/drm_modes.o
  CC      net/mac80211/trace.o
  CC      drivers/gpu/drm/i915/gt/intel_context_sseu.o
  CC      drivers/acpi/acpica/utmutex.o
  CC      kernel/taskstats.o
  CC      drivers/firmware/efi/libstub/relocate.o
  CC      drivers/firmware/efi/libstub/printk.o
  CC      net/ipv4/proc.o
  AR      drivers/firmware/efi/built-in.a
  CC      drivers/hid/hid-cherry.o
  CC      drivers/firmware/efi/libstub/vsprintf.o
  CC      drivers/gpu/drm/drm_modeset_lock.o
  CC      drivers/hid/hid-chicony.o
  CC      drivers/acpi/processor_throttling.o
  CC      lib/crc32.o
  CC      drivers/acpi/acpica/utnonansi.o
  CC      net/ipv4/fib_rules.o
  CC      drivers/firmware/efi/libstub/x86-stub.o
  AR      drivers/net/ethernet/netronome/built-in.a
  CC      drivers/gpu/drm/i915/gt/intel_engine_cs.o
  CC      arch/x86/kernel/perf_regs.o
  CC      kernel/tsacct.o
  CC [M]  drivers/gpu/drm/xe/xe_guc_log.o
  CC      fs/pidfs.o
  CC      drivers/acpi/processor_perflib.o
  AR      drivers/md/built-in.a
  CC      drivers/acpi/acpica/utobject.o
  CC      drivers/firmware/efi/libstub/smbios.o
  CC      drivers/hid/hid-cypress.o
  CC      drivers/gpu/drm/i915/gt/intel_engine_heartbeat.o
  AR      drivers/net/ethernet/ni/built-in.a
  CC      drivers/gpu/drm/drm_plane.o
  STUBCPY drivers/firmware/efi/libstub/alignedmem.stub.o
  CC      drivers/net/ethernet/nvidia/forcedeth.o
  AR      drivers/net/ethernet/oki-semi/built-in.a
  CC      drivers/gpu/drm/drm_prime.o
  CC      kernel/tracepoint.o
  CC      drivers/gpu/drm/drm_print.o
  CC      fs/buffer.o
  STUBCPY drivers/firmware/efi/libstub/efi-stub-helper.stub.o
  CC      drivers/gpu/drm/drm_property.o
  STUBCPY drivers/firmware/efi/libstub/file.stub.o
  CC      net/ipv4/ipmr.o
  CC      drivers/hid/hid-ezkey.o
  CC      kernel/irq_work.o
  CC      arch/x86/kernel/tracepoint.o
  CC      drivers/acpi/acpica/utosi.o
  AR      lib/built-in.a
  CC      net/mac80211/mlme.o
  CC      fs/mpage.o
  AR      drivers/net/ethernet/packetengines/built-in.a
  CC      net/mac80211/tdls.o
  CC      drivers/acpi/container.o
  CC      drivers/gpu/drm/i915/gt/intel_engine_pm.o
  CC      kernel/static_call.o
  CC      arch/x86/kernel/itmt.o
  STUBCPY drivers/firmware/efi/libstub/gop.stub.o
  CC      drivers/hid/hid-gyration.o
  CC [M]  drivers/gpu/drm/xe/xe_guc_pc.o
  CC      net/ipv4/ipmr_base.o
  AR      drivers/net/ethernet/qlogic/built-in.a
  CC      drivers/gpu/drm/drm_rect.o
  AR      drivers/net/ethernet/qualcomm/emac/built-in.a
  CC      drivers/gpu/drm/drm_syncobj.o
  AR      drivers/net/ethernet/qualcomm/built-in.a
  CC      drivers/acpi/thermal_lib.o
  CC      drivers/gpu/drm/i915/gt/intel_engine_user.o
  CC      kernel/padata.o
  CC      drivers/hid/hid-ite.o
  CC      fs/proc_namespace.o
  CC      net/ipv4/syncookies.o
  CC      drivers/acpi/acpica/utownerid.o
  STUBCPY drivers/firmware/efi/libstub/lib-cmdline.stub.o
  CC      kernel/jump_label.o
  STUBCPY drivers/firmware/efi/libstub/lib-ctype.stub.o
  STUBCPY drivers/firmware/efi/libstub/mem.stub.o
  STUBCPY drivers/firmware/efi/libstub/pci.stub.o
  STUBCPY drivers/firmware/efi/libstub/printk.stub.o
  CC      drivers/gpu/drm/i915/gt/intel_execlists_submission.o
  CC      arch/x86/kernel/umip.o
  STUBCPY drivers/firmware/efi/libstub/random.stub.o
  STUBCPY drivers/firmware/efi/libstub/randomalloc.stub.o
  CC      drivers/net/ethernet/realtek/8139too.o
  STUBCPY drivers/firmware/efi/libstub/relocate.stub.o
  STUBCPY drivers/firmware/efi/libstub/secureboot.stub.o
  STUBCPY drivers/firmware/efi/libstub/skip_spaces.stub.o
  CC      arch/x86/kernel/unwind_frame.o
  STUBCPY drivers/firmware/efi/libstub/smbios.stub.o
  STUBCPY drivers/firmware/efi/libstub/tpm.stub.o
  CC      drivers/net/ethernet/realtek/r8169_main.o
  STUBCPY drivers/firmware/efi/libstub/vsprintf.stub.o
  CC [M]  drivers/gpu/drm/xe/xe_guc_submit.o
  CC      drivers/net/ethernet/realtek/r8169_firmware.o
  CC [M]  drivers/gpu/drm/xe/xe_heci_gsc.o
  STUBCPY drivers/firmware/efi/libstub/x86-stub.stub.o
  AR      drivers/firmware/efi/libstub/lib.a
  CC      fs/direct-io.o
  CC      drivers/acpi/acpica/utpredef.o
  AR      drivers/firmware/built-in.a
  CC      drivers/acpi/thermal.o
  CC      net/ipv4/tunnel4.o
  CC      fs/eventpoll.o
  CC [M]  drivers/gpu/drm/xe/xe_hw_engine.o
  CC      drivers/net/ethernet/realtek/r8169_phy_config.o
  AR      drivers/net/ethernet/renesas/built-in.a
  CC      kernel/context_tracking.o
  CC      drivers/hid/hid-kensington.o
  CC      drivers/gpu/drm/drm_sysfs.o
  CC      net/mac80211/ocb.o
  CC      drivers/gpu/drm/i915/gt/intel_ggtt.o
  CC      drivers/gpu/drm/drm_trace_points.o
  CC      fs/anon_inodes.o
  CC      net/ipv4/ipconfig.o
  CC [M]  drivers/gpu/drm/xe/xe_hw_engine_class_sysfs.o
  CC      drivers/acpi/acpica/utresdecode.o
  CC      drivers/hid/hid-lg.o
  AR      drivers/net/ethernet/rdc/built-in.a
  CC      kernel/iomem.o
  CC      drivers/acpi/nhlt.o
  CC      net/mac80211/airtime.o
  CC      drivers/gpu/drm/drm_vblank.o
  CC      drivers/gpu/drm/drm_vblank_work.o
  CC      kernel/rseq.o
  CC      net/ipv4/netfilter.o
  CC [M]  drivers/gpu/drm/xe/xe_hw_engine_group.o
  CC      drivers/gpu/drm/drm_vma_manager.o
  CC      drivers/acpi/acpica/utresrc.o
  AR      arch/x86/kernel/built-in.a
  CC      net/ipv4/tcp_cubic.o
  AR      arch/x86/built-in.a
  CC      net/ipv4/tcp_sigpool.o
  AR      drivers/net/ethernet/rocker/built-in.a
  CC      drivers/acpi/acpi_memhotplug.o
  AR      drivers/net/ethernet/samsung/built-in.a
  CC      net/mac80211/eht.o
  AR      drivers/net/ethernet/seeq/built-in.a
  CC      drivers/gpu/drm/i915/gt/intel_ggtt_fencing.o
  CC      drivers/gpu/drm/drm_writeback.o
  CC      fs/signalfd.o
  CC      drivers/acpi/ioapic.o
  CC      net/ipv4/cipso_ipv4.o
  CC      drivers/hid/hid-lgff.o
  CC      net/mac80211/led.o
  CC      drivers/acpi/acpica/utstate.o
  CC [M]  drivers/gpu/drm/xe/xe_hw_fence.o
  CC      drivers/gpu/drm/i915/gt/intel_gt.o
  CC      drivers/gpu/drm/drm_panel.o
  CC      drivers/hid/hid-lg4ff.o
  CC      drivers/acpi/battery.o
  CC [M]  drivers/gpu/drm/xe/xe_huc.o
  CC      drivers/gpu/drm/i915/gt/intel_gt_buffer_pool.o
  CC      drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.o
  AR      drivers/net/ethernet/silan/built-in.a
  CC      drivers/acpi/acpica/utstring.o
  CC      net/mac80211/pm.o
  CC      drivers/acpi/acpica/utstrsuppt.o
  CC      drivers/gpu/drm/drm_pci.o
  CC      fs/timerfd.o
  CC      drivers/gpu/drm/i915/gt/intel_gt_clock_utils.o
  CC      net/ipv4/xfrm4_policy.o
  AR      drivers/net/ethernet/sis/built-in.a
  CC [M]  drivers/gpu/drm/xe/xe_irq.o
  CC      drivers/acpi/bgrt.o
  CC      drivers/acpi/acpica/utstrtoul64.o
  AR      kernel/built-in.a
  CC      net/ipv4/xfrm4_state.o
  CC      drivers/hid/hid-lg-g15.o
  CC      drivers/gpu/drm/drm_debugfs.o
  CC      drivers/gpu/drm/i915/gt/intel_gt_debugfs.o
  CC [M]  drivers/gpu/drm/xe/xe_lrc.o
  CC      drivers/acpi/spcr.o
  AR      drivers/net/ethernet/sfc/built-in.a
  CC      net/mac80211/rc80211_minstrel_ht.o
  AR      drivers/net/ethernet/smsc/built-in.a
  CC      fs/eventfd.o
  CC      drivers/acpi/acpica/utxface.o
  CC [M]  drivers/gpu/drm/xe/xe_migrate.o
  CC      drivers/gpu/drm/drm_debugfs_crc.o
  CC      net/ipv4/xfrm4_input.o
  AR      drivers/net/ethernet/socionext/built-in.a
  CC      net/mac80211/wbrf.o
  CC      drivers/gpu/drm/i915/gt/intel_gt_engines_debugfs.o
  CC      drivers/hid/hid-microsoft.o
  CC      drivers/gpu/drm/i915/gt/intel_gt_irq.o
  CC      drivers/gpu/drm/drm_panel_orientation_quirks.o
  CC      drivers/gpu/drm/i915/gt/intel_gt_mcr.o
  CC [M]  drivers/gpu/drm/xe/xe_mmio.o
  CC      net/ipv4/xfrm4_output.o
  AR      drivers/net/ethernet/intel/e1000e/built-in.a
  AR      drivers/net/ethernet/intel/built-in.a
  CC      fs/aio.o
  CC      drivers/acpi/acpica/utxfinit.o
  CC      drivers/hid/hid-monterey.o
  CC      drivers/gpu/drm/i915/gt/intel_gt_pm.o
  AR      drivers/net/ethernet/stmicro/built-in.a
  CC      drivers/acpi/acpica/utxferror.o
  AR      drivers/net/ethernet/sun/built-in.a
  AR      drivers/net/ethernet/tehuti/built-in.a
  CC      fs/locks.o
  CC      drivers/hid/hid-ntrig.o
  AR      drivers/net/ethernet/ti/built-in.a
  CC      drivers/gpu/drm/drm_buddy.o
  AR      drivers/net/ethernet/vertexcom/built-in.a
  AR      drivers/net/ethernet/via/built-in.a
  AR      drivers/net/ethernet/wangxun/built-in.a
  CC      drivers/acpi/acpica/utxfmutex.o
  CC      drivers/gpu/drm/drm_gem_shmem_helper.o
  AR      drivers/net/ethernet/nvidia/built-in.a
  CC      drivers/hid/hid-pl.o
  CC      drivers/gpu/drm/drm_atomic_helper.o
  CC [M]  drivers/gpu/drm/xe/xe_mocs.o
  CC      fs/binfmt_misc.o
  CC      drivers/gpu/drm/i915/gt/intel_gt_pm_debugfs.o
  CC [M]  drivers/gpu/drm/xe/xe_module.o
  AR      drivers/net/ethernet/wiznet/built-in.a
  CC      drivers/hid/hid-petalynx.o
  CC      drivers/gpu/drm/drm_atomic_state_helper.o
  CC      fs/binfmt_script.o
  CC      net/ipv4/xfrm4_protocol.o
  CC      drivers/gpu/drm/drm_crtc_helper.o
  AR      drivers/net/ethernet/xilinx/built-in.a
  CC [M]  drivers/gpu/drm/xe/xe_oa.o
  CC      drivers/hid/hid-redragon.o
  CC      fs/binfmt_elf.o
  CC      drivers/gpu/drm/i915/gt/intel_gt_pm_irq.o
  AR      drivers/acpi/acpica/built-in.a
  AR      drivers/net/ethernet/xircom/built-in.a
  CC      drivers/gpu/drm/i915/gt/intel_gt_requests.o
  AR      drivers/net/ethernet/synopsys/built-in.a
  CC [M]  drivers/gpu/drm/xe/xe_observation.o
  AR      drivers/net/ethernet/pensando/built-in.a
  AR      drivers/acpi/built-in.a
  CC      fs/mbcache.o
  CC      fs/posix_acl.o
  CC      drivers/gpu/drm/drm_damage_helper.o
  CC      drivers/gpu/drm/i915/gt/intel_gt_sysfs.o
  CC [M]  drivers/gpu/drm/xe/xe_pat.o
  CC      drivers/hid/hid-samsung.o
  CC      drivers/gpu/drm/i915/gt/intel_gt_sysfs_pm.o
  CC      drivers/hid/hid-sony.o
  CC      drivers/hid/hid-sunplus.o
  CC      fs/coredump.o
  CC      drivers/gpu/drm/i915/gt/intel_gtt.o
  CC      drivers/hid/hid-topseed.o
  CC      drivers/gpu/drm/drm_encoder_slave.o
  CC      drivers/gpu/drm/i915/gt/intel_llc.o
  CC [M]  drivers/gpu/drm/xe/xe_pci.o
  AR      drivers/net/ethernet/realtek/built-in.a
  CC [M]  drivers/gpu/drm/xe/xe_pcode.o
  AR      drivers/net/ethernet/built-in.a
  CC      fs/drop_caches.o
  CC      drivers/gpu/drm/i915/gt/intel_lrc.o
  CC      drivers/gpu/drm/drm_flip_work.o
  AR      drivers/net/built-in.a
  CC      fs/sysctls.o
  CC [M]  drivers/gpu/drm/xe/xe_pm.o
  CC [M]  drivers/gpu/drm/xe/xe_preempt_fence.o
  CC      fs/fhandle.o
  CC [M]  drivers/gpu/drm/xe/xe_pt.o
  CC      drivers/gpu/drm/drm_format_helper.o
  CC [M]  drivers/gpu/drm/xe/xe_pt_walk.o
  CC      drivers/gpu/drm/i915/gt/intel_migrate.o
  CC      drivers/gpu/drm/drm_gem_atomic_helper.o
  CC [M]  drivers/gpu/drm/xe/xe_query.o
  CC      drivers/gpu/drm/drm_gem_framebuffer_helper.o
  CC      drivers/gpu/drm/i915/gt/intel_mocs.o
  CC      drivers/gpu/drm/drm_kms_helper_common.o
  CC      drivers/gpu/drm/i915/gt/intel_ppgtt.o
  CC [M]  drivers/gpu/drm/xe/xe_range_fence.o
  CC      drivers/gpu/drm/i915/gt/intel_rc6.o
  CC      drivers/gpu/drm/drm_modeset_helper.o
  CC [M]  drivers/gpu/drm/xe/xe_reg_sr.o
  AR      net/ipv4/built-in.a
  CC      drivers/gpu/drm/drm_plane_helper.o
  CC      drivers/gpu/drm/i915/gt/intel_region_lmem.o
  CC [M]  drivers/gpu/drm/xe/xe_reg_whitelist.o
  CC      drivers/gpu/drm/i915/gt/intel_renderstate.o
  CC      drivers/gpu/drm/drm_probe_helper.o
  CC [M]  drivers/gpu/drm/xe/xe_rtp.o
  CC [M]  drivers/gpu/drm/xe/xe_ring_ops.o
  CC [M]  drivers/gpu/drm/xe/xe_sa.o
  CC      drivers/gpu/drm/drm_self_refresh_helper.o
  CC      drivers/gpu/drm/drm_simple_kms_helper.o
  CC      drivers/gpu/drm/i915/gt/intel_reset.o
  CC [M]  drivers/gpu/drm/xe/xe_sched_job.o
  CC      drivers/gpu/drm/i915/gt/intel_ring.o
  CC      drivers/gpu/drm/bridge/panel.o
  CC [M]  drivers/gpu/drm/xe/xe_step.o
  CC      drivers/gpu/drm/i915/gt/intel_ring_submission.o
  CC [M]  drivers/gpu/drm/xe/xe_sync.o
  CC      drivers/gpu/drm/drm_mipi_dsi.o
  CC [M]  drivers/gpu/drm/xe/xe_tile.o
  CC      drivers/gpu/drm/i915/gt/intel_rps.o
  CC [M]  drivers/gpu/drm/xe/xe_tile_sysfs.o
  CC [M]  drivers/gpu/drm/drm_exec.o
  CC      drivers/gpu/drm/i915/gt/intel_sa_media.o
  CC [M]  drivers/gpu/drm/xe/xe_trace.o
  CC [M]  drivers/gpu/drm/drm_gpuvm.o
  CC      drivers/gpu/drm/i915/gt/intel_sseu.o
  AR      drivers/hid/built-in.a
  CC [M]  drivers/gpu/drm/xe/xe_trace_bo.o
  CC      drivers/gpu/drm/i915/gt/intel_sseu_debugfs.o
  CC [M]  drivers/gpu/drm/drm_suballoc.o
  CC [M]  drivers/gpu/drm/xe/xe_trace_guc.o
  CC      drivers/gpu/drm/i915/gt/intel_timeline.o
  CC [M]  drivers/gpu/drm/drm_gem_ttm_helper.o
  CC      drivers/gpu/drm/i915/gt/intel_tlb.o
  CC [M]  drivers/gpu/drm/xe/xe_ttm_sys_mgr.o
  CC [M]  drivers/gpu/drm/xe/xe_ttm_stolen_mgr.o
  CC      drivers/gpu/drm/i915/gt/intel_wopcm.o
  CC [M]  drivers/gpu/drm/xe/xe_ttm_vram_mgr.o
  CC [M]  drivers/gpu/drm/xe/xe_tuning.o
  CC      drivers/gpu/drm/i915/gt/intel_workarounds.o
  CC      drivers/gpu/drm/i915/gt/shmem_utils.o
  CC [M]  drivers/gpu/drm/xe/xe_uc.o
  CC [M]  drivers/gpu/drm/xe/xe_uc_fw.o
  CC [M]  drivers/gpu/drm/xe/xe_vm.o
  CC      drivers/gpu/drm/i915/gt/sysfs_engines.o
  AR      fs/built-in.a
  CC [M]  drivers/gpu/drm/xe/xe_vram.o
  CC      drivers/gpu/drm/i915/gt/intel_ggtt_gmch.o
  CC      drivers/gpu/drm/i915/gt/gen6_renderstate.o
  CC [M]  drivers/gpu/drm/xe/xe_vram_freq.o
  CC [M]  drivers/gpu/drm/xe/xe_wait_user_fence.o
  CC      drivers/gpu/drm/i915/gt/gen7_renderstate.o
  CC [M]  drivers/gpu/drm/xe/xe_wa.o
  CC      drivers/gpu/drm/i915/gt/gen8_renderstate.o
  CC [M]  drivers/gpu/drm/xe/xe_wopcm.o
  CC      drivers/gpu/drm/i915/gt/gen9_renderstate.o
  CC [M]  drivers/gpu/drm/xe/xe_hmm.o
  LD [M]  drivers/gpu/drm/drm_suballoc_helper.o
  CC      drivers/gpu/drm/i915/gem/i915_gem_busy.o
  CC [M]  drivers/gpu/drm/xe/xe_hwmon.o
  CC      drivers/gpu/drm/i915/gem/i915_gem_clflush.o
  CC [M]  drivers/gpu/drm/xe/xe_gt_sriov_vf.o
  CC      drivers/gpu/drm/i915/gem/i915_gem_context.o
  CC      drivers/gpu/drm/i915/gem/i915_gem_create.o
  CC [M]  drivers/gpu/drm/xe/xe_guc_relay.o
  CC      drivers/gpu/drm/i915/gem/i915_gem_dmabuf.o
  CC      drivers/gpu/drm/i915/gem/i915_gem_domain.o
  LD [M]  drivers/gpu/drm/drm_ttm_helper.o
  CC [M]  drivers/gpu/drm/xe/xe_memirq.o
  CC [M]  drivers/gpu/drm/xe/xe_sriov.o
  CC      drivers/gpu/drm/i915/gem/i915_gem_execbuffer.o
  CC      drivers/gpu/drm/i915/gem/i915_gem_internal.o
  CC      drivers/gpu/drm/i915/gem/i915_gem_lmem.o
  CC [M]  drivers/gpu/drm/xe/display/ext/i915_irq.o
  CC [M]  drivers/gpu/drm/xe/display/ext/i915_utils.o
  CC [M]  drivers/gpu/drm/xe/display/intel_bo.o
  CC      drivers/gpu/drm/i915/gem/i915_gem_mman.o
  CC [M]  drivers/gpu/drm/xe/display/intel_fb_bo.o
  CC [M]  drivers/gpu/drm/xe/display/intel_fbdev_fb.o
  CC [M]  drivers/gpu/drm/xe/display/xe_display.o
  CC      drivers/gpu/drm/i915/gem/i915_gem_object.o
  CC [M]  drivers/gpu/drm/xe/display/xe_display_misc.o
  CC      drivers/gpu/drm/i915/gem/i915_gem_pages.o
  CC [M]  drivers/gpu/drm/xe/display/xe_display_rps.o
  CC      drivers/gpu/drm/i915/gem/i915_gem_phys.o
  CC      drivers/gpu/drm/i915/gem/i915_gem_pm.o
  CC [M]  drivers/gpu/drm/xe/display/xe_display_wa.o
  CC      drivers/gpu/drm/i915/gem/i915_gem_region.o
  CC [M]  drivers/gpu/drm/xe/display/xe_dsb_buffer.o
  CC [M]  drivers/gpu/drm/xe/display/xe_fb_pin.o
  CC [M]  drivers/gpu/drm/xe/display/xe_hdcp_gsc.o
  CC      drivers/gpu/drm/i915/gem/i915_gem_shmem.o
  CC      drivers/gpu/drm/i915/gem/i915_gem_shrinker.o
  CC [M]  drivers/gpu/drm/xe/display/xe_plane_initial.o
  CC      drivers/gpu/drm/i915/gem/i915_gem_stolen.o
  CC      drivers/gpu/drm/i915/gem/i915_gem_throttle.o
  CC [M]  drivers/gpu/drm/xe/display/xe_tdf.o
  CC [M]  drivers/gpu/drm/xe/i915-soc/intel_dram.o
  CC [M]  drivers/gpu/drm/xe/i915-soc/intel_pch.o
  CC      drivers/gpu/drm/i915/gem/i915_gem_tiling.o
  CC      drivers/gpu/drm/i915/gem/i915_gem_ttm.o
  CC      drivers/gpu/drm/i915/gem/i915_gem_ttm_move.o
  CC [M]  drivers/gpu/drm/xe/i915-soc/intel_rom.o
  CC [M]  drivers/gpu/drm/xe/i915-display/icl_dsi.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_alpm.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_atomic.o
  CC      drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_atomic_plane.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_audio.o
  CC      drivers/gpu/drm/i915/gem/i915_gem_userptr.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_backlight.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_bios.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_bw.o
  CC      drivers/gpu/drm/i915/gem/i915_gem_wait.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_cdclk.o
  CC      drivers/gpu/drm/i915/gem/i915_gemfs.o
  CC      drivers/gpu/drm/i915/i915_active.o
  CC      drivers/gpu/drm/i915/i915_cmd_parser.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_color.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_combo_phy.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_connector.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_crtc.o
  CC      drivers/gpu/drm/i915/i915_deps.o
  CC      drivers/gpu/drm/i915/i915_gem.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_crtc_state_dump.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_cursor.o
  CC      drivers/gpu/drm/i915/i915_gem_evict.o
  CC      drivers/gpu/drm/i915/i915_gem_gtt.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_cx0_phy.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_ddi.o
  CC      drivers/gpu/drm/i915/i915_gem_ww.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_ddi_buf_trans.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_display.o
  CC      drivers/gpu/drm/i915/i915_query.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_display_device.o
  CC      drivers/gpu/drm/i915/i915_request.o
  CC      drivers/gpu/drm/i915/i915_scheduler.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_display_driver.o
  CC      drivers/gpu/drm/i915/i915_trace_points.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_display_irq.o
  CC      drivers/gpu/drm/i915/i915_ttm_buddy_manager.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_display_params.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_display_power.o
  CC      drivers/gpu/drm/i915/i915_vma.o
  CC      drivers/gpu/drm/i915/i915_vma_resource.o
  CC      drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_display_power_map.o
  CC      drivers/gpu/drm/i915/gt/uc/intel_gsc_proxy.o
  CC      drivers/gpu/drm/i915/gt/uc/intel_gsc_uc.o
  CC      drivers/gpu/drm/i915/gt/uc/intel_gsc_uc_debugfs.o
  CC      drivers/gpu/drm/i915/gt/uc/intel_gsc_uc_heci_cmd_submit.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_display_power_well.o
  CC      drivers/gpu/drm/i915/gt/uc/intel_guc.o
  CC      drivers/gpu/drm/i915/gt/uc/intel_guc_ads.o
  CC      drivers/gpu/drm/i915/gt/uc/intel_guc_capture.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_display_trace.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_display_wa.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_dkl_phy.o
  AR      net/mac80211/built-in.a
  AR      net/built-in.a
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_dmc.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_dp.o
  CC      drivers/gpu/drm/i915/gt/uc/intel_guc_ct.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_dp_aux.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_dp_aux_backlight.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_dp_hdcp.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_dp_link_training.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_dp_mst.o
  CC      drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_dp_test.o
  CC      drivers/gpu/drm/i915/gt/uc/intel_guc_fw.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_dpll.o
  CC      drivers/gpu/drm/i915/gt/uc/intel_guc_hwconfig.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_dpll_mgr.o
  CC      drivers/gpu/drm/i915/gt/uc/intel_guc_log.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_dpt_common.o
  CC      drivers/gpu/drm/i915/gt/uc/intel_guc_log_debugfs.o
  CC      drivers/gpu/drm/i915/gt/uc/intel_guc_rc.o
  CC      drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_drrs.o
  CC      drivers/gpu/drm/i915/gt/uc/intel_guc_submission.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_dsb.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_dsi.o
  CC      drivers/gpu/drm/i915/gt/uc/intel_huc.o
  CC      drivers/gpu/drm/i915/gt/uc/intel_huc_debugfs.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_dsi_dcs_backlight.o
  CC      drivers/gpu/drm/i915/gt/uc/intel_huc_fw.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_dsi_vbt.o
  CC      drivers/gpu/drm/i915/gt/uc/intel_uc.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_encoder.o
  CC      drivers/gpu/drm/i915/gt/uc/intel_uc_debugfs.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_fb.o
  CC      drivers/gpu/drm/i915/gt/uc/intel_uc_fw.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_fbc.o
  CC      drivers/gpu/drm/i915/gt/intel_gsc.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_fdi.o
  CC      drivers/gpu/drm/i915/i915_hwmon.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_fifo_underrun.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_frontbuffer.o
  CC      drivers/gpu/drm/i915/display/hsw_ips.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_global_state.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_gmbus.o
  CC      drivers/gpu/drm/i915/display/i9xx_plane.o
  CC      drivers/gpu/drm/i915/display/i9xx_wm.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_hdcp.o
  CC      drivers/gpu/drm/i915/display/intel_alpm.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_hdcp_gsc_message.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_hdmi.o
  CC      drivers/gpu/drm/i915/display/intel_atomic.o
  CC      drivers/gpu/drm/i915/display/intel_atomic_plane.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_hotplug.o
  CC      drivers/gpu/drm/i915/display/intel_audio.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_hotplug_irq.o
  CC      drivers/gpu/drm/i915/display/intel_bios.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_hti.o
  CC      drivers/gpu/drm/i915/display/intel_bo.o
  CC      drivers/gpu/drm/i915/display/intel_bw.o
  CC      drivers/gpu/drm/i915/display/intel_cdclk.o
  CC      drivers/gpu/drm/i915/display/intel_color.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_link_bw.o
  CC      drivers/gpu/drm/i915/display/intel_combo_phy.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_lspcon.o
  CC      drivers/gpu/drm/i915/display/intel_connector.o
  CC      drivers/gpu/drm/i915/display/intel_crtc.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_modeset_lock.o
  CC      drivers/gpu/drm/i915/display/intel_crtc_state_dump.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_modeset_setup.o
  CC      drivers/gpu/drm/i915/display/intel_cursor.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_modeset_verify.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_panel.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_pfit.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_pmdemand.o
  CC      drivers/gpu/drm/i915/display/intel_display.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_pps.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_psr.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_qp_tables.o
  CC      drivers/gpu/drm/i915/display/intel_display_driver.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_quirks.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_snps_phy.o
  CC      drivers/gpu/drm/i915/display/intel_display_irq.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_tc.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_vblank.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_vdsc.o
  CC      drivers/gpu/drm/i915/display/intel_display_params.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_vga.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_vrr.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_dmc_wl.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_wm.o
  CC      drivers/gpu/drm/i915/display/intel_display_power.o
  CC [M]  drivers/gpu/drm/xe/i915-display/skl_scaler.o
  CC      drivers/gpu/drm/i915/display/intel_display_power_map.o
  CC      drivers/gpu/drm/i915/display/intel_display_power_well.o
  CC      drivers/gpu/drm/i915/display/intel_display_reset.o
  CC [M]  drivers/gpu/drm/xe/i915-display/skl_universal_plane.o
  CC      drivers/gpu/drm/i915/display/intel_display_rps.o
  CC [M]  drivers/gpu/drm/xe/i915-display/skl_watermark.o
  CC      drivers/gpu/drm/i915/display/intel_display_snapshot.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_acpi.o
  CC      drivers/gpu/drm/i915/display/intel_display_wa.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_opregion.o
  CC [M]  drivers/gpu/drm/xe/xe_debugfs.o
  CC [M]  drivers/gpu/drm/xe/xe_gt_debugfs.o
  CC      drivers/gpu/drm/i915/display/intel_dmc.o
  CC      drivers/gpu/drm/i915/display/intel_dmc_wl.o
  CC [M]  drivers/gpu/drm/xe/xe_gt_sriov_vf_debugfs.o
  CC      drivers/gpu/drm/i915/display/intel_dpio_phy.o
  CC      drivers/gpu/drm/i915/display/intel_dpll.o
  CC      drivers/gpu/drm/i915/display/intel_dpll_mgr.o
  CC [M]  drivers/gpu/drm/xe/xe_gt_stats.o
  CC      drivers/gpu/drm/i915/display/intel_dpt.o
  CC      drivers/gpu/drm/i915/display/intel_dpt_common.o
  CC [M]  drivers/gpu/drm/xe/xe_guc_debugfs.o
  CC [M]  drivers/gpu/drm/xe/xe_huc_debugfs.o
  CC [M]  drivers/gpu/drm/xe/xe_uc_debugfs.o
  CC      drivers/gpu/drm/i915/display/intel_drrs.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_display_debugfs.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_display_debugfs_params.o
  CC [M]  drivers/gpu/drm/xe/i915-display/intel_pipe_crc.o
  CC      drivers/gpu/drm/i915/display/intel_dsb.o
  CC      drivers/gpu/drm/i915/display/intel_dsb_buffer.o
  CC      drivers/gpu/drm/i915/display/intel_fb.o
  CC      drivers/gpu/drm/i915/display/intel_fb_bo.o
  CC      drivers/gpu/drm/i915/display/intel_fb_pin.o
  CC      drivers/gpu/drm/i915/display/intel_fbc.o
  CC      drivers/gpu/drm/i915/display/intel_fdi.o
  CC      drivers/gpu/drm/i915/display/intel_fifo_underrun.o
  CC      drivers/gpu/drm/i915/display/intel_frontbuffer.o
  CC      drivers/gpu/drm/i915/display/intel_global_state.o
  CC      drivers/gpu/drm/i915/display/intel_hdcp.o
  CC      drivers/gpu/drm/i915/display/intel_hdcp_gsc.o
  CC      drivers/gpu/drm/i915/display/intel_hdcp_gsc_message.o
  CC      drivers/gpu/drm/i915/display/intel_hotplug.o
  CC      drivers/gpu/drm/i915/display/intel_hotplug_irq.o
  CC      drivers/gpu/drm/i915/display/intel_hti.o
  CC      drivers/gpu/drm/i915/display/intel_link_bw.o
  CC      drivers/gpu/drm/i915/display/intel_load_detect.o
  CC      drivers/gpu/drm/i915/display/intel_lpe_audio.o
  CC      drivers/gpu/drm/i915/display/intel_modeset_lock.o
  CC      drivers/gpu/drm/i915/display/intel_modeset_setup.o
  CC      drivers/gpu/drm/i915/display/intel_modeset_verify.o
  CC      drivers/gpu/drm/i915/display/intel_overlay.o
  CC      drivers/gpu/drm/i915/display/intel_pch_display.o
  CC      drivers/gpu/drm/i915/display/intel_pch_refclk.o
  CC      drivers/gpu/drm/i915/display/intel_plane_initial.o
  CC      drivers/gpu/drm/i915/display/intel_pmdemand.o
  CC      drivers/gpu/drm/i915/display/intel_psr.o
  CC      drivers/gpu/drm/i915/display/intel_quirks.o
  CC      drivers/gpu/drm/i915/display/intel_sprite.o
  CC      drivers/gpu/drm/i915/display/intel_sprite_uapi.o
  CC      drivers/gpu/drm/i915/display/intel_tc.o
  CC      drivers/gpu/drm/i915/display/intel_vblank.o
  CC      drivers/gpu/drm/i915/display/intel_vga.o
  CC      drivers/gpu/drm/i915/display/intel_wm.o
  CC      drivers/gpu/drm/i915/display/skl_scaler.o
  CC      drivers/gpu/drm/i915/display/skl_universal_plane.o
  CC      drivers/gpu/drm/i915/display/skl_watermark.o
  CC      drivers/gpu/drm/i915/display/intel_acpi.o
  CC      drivers/gpu/drm/i915/display/intel_opregion.o
  CC      drivers/gpu/drm/i915/display/intel_display_debugfs.o
  CC      drivers/gpu/drm/i915/display/intel_display_debugfs_params.o
  CC      drivers/gpu/drm/i915/display/intel_pipe_crc.o
  CC      drivers/gpu/drm/i915/display/dvo_ch7017.o
  CC      drivers/gpu/drm/i915/display/dvo_ch7xxx.o
  CC      drivers/gpu/drm/i915/display/dvo_ivch.o
  CC      drivers/gpu/drm/i915/display/dvo_ns2501.o
  CC      drivers/gpu/drm/i915/display/dvo_sil164.o
  CC      drivers/gpu/drm/i915/display/dvo_tfp410.o
  CC      drivers/gpu/drm/i915/display/g4x_dp.o
  CC      drivers/gpu/drm/i915/display/g4x_hdmi.o
  CC      drivers/gpu/drm/i915/display/icl_dsi.o
  CC      drivers/gpu/drm/i915/display/intel_backlight.o
  CC      drivers/gpu/drm/i915/display/intel_crt.o
  CC      drivers/gpu/drm/i915/display/intel_cx0_phy.o
  CC      drivers/gpu/drm/i915/display/intel_ddi.o
  CC      drivers/gpu/drm/i915/display/intel_ddi_buf_trans.o
  CC      drivers/gpu/drm/i915/display/intel_display_device.o
  CC      drivers/gpu/drm/i915/display/intel_display_trace.o
  CC      drivers/gpu/drm/i915/display/intel_dkl_phy.o
  CC      drivers/gpu/drm/i915/display/intel_dp.o
  CC      drivers/gpu/drm/i915/display/intel_dp_aux.o
  CC      drivers/gpu/drm/i915/display/intel_dp_aux_backlight.o
  CC      drivers/gpu/drm/i915/display/intel_dp_hdcp.o
  LD [M]  drivers/gpu/drm/xe/xe.o
  CC      drivers/gpu/drm/i915/display/intel_dp_link_training.o
  CC      drivers/gpu/drm/i915/display/intel_dp_mst.o
  CC      drivers/gpu/drm/i915/display/intel_dp_test.o
  CC      drivers/gpu/drm/i915/display/intel_dsi.o
  CC      drivers/gpu/drm/i915/display/intel_dsi_dcs_backlight.o
  CC      drivers/gpu/drm/i915/display/intel_dsi_vbt.o
  CC      drivers/gpu/drm/i915/display/intel_dvo.o
  CC      drivers/gpu/drm/i915/display/intel_encoder.o
  CC      drivers/gpu/drm/i915/display/intel_gmbus.o
  CC      drivers/gpu/drm/i915/display/intel_hdmi.o
  CC      drivers/gpu/drm/i915/display/intel_lspcon.o
  CC      drivers/gpu/drm/i915/display/intel_lvds.o
  CC      drivers/gpu/drm/i915/display/intel_panel.o
  CC      drivers/gpu/drm/i915/display/intel_pfit.o
  CC      drivers/gpu/drm/i915/display/intel_pps.o
  CC      drivers/gpu/drm/i915/display/intel_qp_tables.o
  CC      drivers/gpu/drm/i915/display/intel_sdvo.o
  CC      drivers/gpu/drm/i915/display/intel_snps_phy.o
  CC      drivers/gpu/drm/i915/display/intel_tv.o
  CC      drivers/gpu/drm/i915/display/intel_vdsc.o
  CC      drivers/gpu/drm/i915/display/intel_vrr.o
  CC      drivers/gpu/drm/i915/display/vlv_dsi.o
  CC      drivers/gpu/drm/i915/display/vlv_dsi_pll.o
  CC      drivers/gpu/drm/i915/i915_perf.o
  CC      drivers/gpu/drm/i915/pxp/intel_pxp.o
  CC      drivers/gpu/drm/i915/pxp/intel_pxp_huc.o
  CC      drivers/gpu/drm/i915/pxp/intel_pxp_tee.o
  CC      drivers/gpu/drm/i915/i915_gpu_error.o
  CC      drivers/gpu/drm/i915/i915_vgpu.o
  AR      drivers/gpu/drm/i915/built-in.a
  AR      drivers/gpu/drm/built-in.a
  AR      drivers/gpu/built-in.a
  AR      drivers/built-in.a
  AR      built-in.a
  AR      vmlinux.a
  LD      vmlinux.o
  OBJCOPY modules.builtin.modinfo
  GEN     modules.builtin
  MODPOST Module.symvers
  CC      .vmlinux.export.o
  CC [M]  fs/efivarfs/efivarfs.mod.o
  CC [M]  .module-common.o
  CC [M]  drivers/gpu/drm/drm_exec.mod.o
  CC [M]  drivers/gpu/drm/drm_gpuvm.mod.o
  CC [M]  drivers/gpu/drm/drm_suballoc_helper.mod.o
  CC [M]  drivers/gpu/drm/drm_ttm_helper.mod.o
  CC [M]  drivers/gpu/drm/scheduler/gpu-sched.mod.o
  CC [M]  drivers/gpu/drm/xe/xe.mod.o
  CC [M]  drivers/thermal/intel/x86_pkg_temp_thermal.mod.o
  CC [M]  sound/core/snd-hwdep.mod.o
  CC [M]  sound/core/snd-pcm.mod.o
  CC [M]  sound/pci/hda/snd-hda-codec.mod.o
  CC [M]  sound/pci/hda/snd-hda-codec-hdmi.mod.o
  CC [M]  sound/pci/hda/snd-hda-intel.mod.o
  CC [M]  sound/hda/snd-hda-core.mod.o
  CC [M]  sound/hda/snd-intel-dspcfg.mod.o
  CC [M]  sound/hda/snd-intel-sdw-acpi.mod.o
  CC [M]  net/netfilter/nf_log_syslog.mod.o
  CC [M]  net/netfilter/xt_mark.mod.o
  CC [M]  net/netfilter/xt_nat.mod.o
  CC [M]  net/netfilter/xt_LOG.mod.o
  CC [M]  net/netfilter/xt_MASQUERADE.mod.o
  CC [M]  net/netfilter/xt_addrtype.mod.o
  CC [M]  net/ipv4/netfilter/iptable_nat.mod.o
  LD [M]  drivers/gpu/drm/drm_exec.ko
  LD [M]  drivers/gpu/drm/drm_gpuvm.ko
  LD [M]  drivers/gpu/drm/drm_ttm_helper.ko
  LD [M]  sound/core/snd-hwdep.ko
  LD [M]  sound/pci/hda/snd-hda-codec-hdmi.ko
  LD [M]  drivers/gpu/drm/drm_suballoc_helper.ko
  LD [M]  sound/hda/snd-intel-sdw-acpi.ko
  LD [M]  sound/core/snd-pcm.ko
  LD [M]  fs/efivarfs/efivarfs.ko
  LD [M]  net/netfilter/nf_log_syslog.ko
  LD [M]  sound/pci/hda/snd-hda-codec.ko
  LD [M]  net/netfilter/xt_mark.ko
  LD [M]  sound/hda/snd-intel-dspcfg.ko
  LD [M]  sound/hda/snd-hda-core.ko
  LD [M]  net/ipv4/netfilter/iptable_nat.ko
  LD [M]  sound/pci/hda/snd-hda-intel.ko
  LD [M]  net/netfilter/xt_MASQUERADE.ko
  LD [M]  net/netfilter/xt_addrtype.ko
  LD [M]  net/netfilter/xt_nat.ko
  LD [M]  drivers/gpu/drm/xe/xe.ko
  LD [M]  drivers/gpu/drm/scheduler/gpu-sched.ko
  LD [M]  drivers/thermal/intel/x86_pkg_temp_thermal.ko
  LD [M]  net/netfilter/xt_LOG.ko
  UPD     include/generated/utsversion.h
  CC      init/version-timestamp.o
  KSYMS   .tmp_vmlinux0.kallsyms.S
  AS      .tmp_vmlinux0.kallsyms.o
  LD      .tmp_vmlinux1
  NM      .tmp_vmlinux1.syms
  KSYMS   .tmp_vmlinux1.kallsyms.S
  AS      .tmp_vmlinux1.kallsyms.o
  LD      .tmp_vmlinux2
  NM      .tmp_vmlinux2.syms
  KSYMS   .tmp_vmlinux2.kallsyms.S
  AS      .tmp_vmlinux2.kallsyms.o
  LD      vmlinux
  NM      System.map
  SORTTAB vmlinux
  RELOCS  arch/x86/boot/compressed/vmlinux.relocs
  RSTRIP  vmlinux
  CC      arch/x86/boot/a20.o
  AS      arch/x86/boot/bioscall.o
  CC      arch/x86/boot/cmdline.o
  AS      arch/x86/boot/copy.o
  HOSTCC  arch/x86/boot/mkcpustr
  CC      arch/x86/boot/cpuflags.o
  CC      arch/x86/boot/cpucheck.o
  CC      arch/x86/boot/early_serial_console.o
  CC      arch/x86/boot/edd.o
  CC      arch/x86/boot/main.o
  CC      arch/x86/boot/memory.o
  CC      arch/x86/boot/pm.o
  AS      arch/x86/boot/pmjump.o
  CC      arch/x86/boot/printf.o
  CC      arch/x86/boot/regs.o
  CC      arch/x86/boot/string.o
  CC      arch/x86/boot/tty.o
  CC      arch/x86/boot/video.o
  CC      arch/x86/boot/video-mode.o
  CC      arch/x86/boot/version.o
  CC      arch/x86/boot/video-vga.o
  CC      arch/x86/boot/video-vesa.o
  CC      arch/x86/boot/video-bios.o
  HOSTCC  arch/x86/boot/tools/build
  LDS     arch/x86/boot/compressed/vmlinux.lds
  AS      arch/x86/boot/compressed/kernel_info.o
  AS      arch/x86/boot/compressed/head_32.o
  VOFFSET arch/x86/boot/compressed/../voffset.h
  CPUSTR  arch/x86/boot/cpustr.h
  CC      arch/x86/boot/compressed/string.o
  CC      arch/x86/boot/compressed/cmdline.o
  CC      arch/x86/boot/compressed/error.o
  CC      arch/x86/boot/cpu.o
  OBJCOPY arch/x86/boot/compressed/vmlinux.bin
  HOSTCC  arch/x86/boot/compressed/mkpiggy
  CC      arch/x86/boot/compressed/cpuflags.o
  CC      arch/x86/boot/compressed/early_serial_console.o
  CC      arch/x86/boot/compressed/kaslr.o
  CC      arch/x86/boot/compressed/acpi.o
  CC      arch/x86/boot/compressed/efi.o
  GZIP    arch/x86/boot/compressed/vmlinux.bin.gz
  CC      arch/x86/boot/compressed/misc.o
  MKPIGGY arch/x86/boot/compressed/piggy.S
  AS      arch/x86/boot/compressed/piggy.o
  LD      arch/x86/boot/compressed/vmlinux
  ZOFFSET arch/x86/boot/zoffset.h
  OBJCOPY arch/x86/boot/vmlinux.bin
  AS      arch/x86/boot/header.o
  LD      arch/x86/boot/setup.elf
  OBJCOPY arch/x86/boot/setup.bin
  BUILD   arch/x86/boot/bzImage
Kernel: arch/x86/boot/bzImage is ready  (#1)
run-parts: executing /workspace/ci/hooks/20-kernel-doc
+ SRC_DIR=/workspace/kernel
+ cd /workspace/kernel
+ find drivers/gpu/drm/xe/ -name '*.[ch]' -not -path 'drivers/gpu/drm/xe/display/*'
+ xargs ./scripts/kernel-doc -Werror -none include/uapi/drm/xe_drm.h
date: invalid date ‘+%s’
All hooks done



^ permalink raw reply	[flat|nested] 56+ messages in thread

* ✗ CI.checksparse: warning for Fix non-contiguous VRAM BO access in Xe (rev6)
  2024-10-31 18:10 [PATCH v6 0/8] Fix non-contiguous VRAM BO access in Xe Matthew Brost
                   ` (12 preceding siblings ...)
  2024-10-31 18:31 ` ✓ CI.Hooks: " Patchwork
@ 2024-10-31 18:32 ` Patchwork
  2024-10-31 18:57 ` ✓ CI.BAT: success " Patchwork
  2024-10-31 21:27 ` ✗ CI.FULL: failure " Patchwork
  15 siblings, 0 replies; 56+ messages in thread
From: Patchwork @ 2024-10-31 18:32 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-xe

== Series Details ==

Series: Fix non-contiguous VRAM BO access in Xe (rev6)
URL   : https://patchwork.freedesktop.org/series/140200/
State : warning

== Summary ==

+ trap cleanup EXIT
+ KERNEL=/kernel
+ MT=/root/linux/maintainer-tools
+ git clone https://gitlab.freedesktop.org/drm/maintainer-tools /root/linux/maintainer-tools
Cloning into '/root/linux/maintainer-tools'...
warning: redirecting to https://gitlab.freedesktop.org/drm/maintainer-tools.git/
+ make -C /root/linux/maintainer-tools
make: Entering directory '/root/linux/maintainer-tools'
cc -O2 -g -Wextra -o remap-log remap-log.c
make: Leaving directory '/root/linux/maintainer-tools'
+ cd /kernel
+ git config --global --add safe.directory /kernel
+ /root/linux/maintainer-tools/dim sparse --fast b8c3c871a2df70e3201eb70505981d39e449384d
/root/linux/maintainer-tools/dim: line 2068: sparse: command not found
Sparse version: 
Fast mode used, each commit won't be checked separately.
+ cleanup
++ stat -c %u:%g /kernel
+ chown -R 1003:1003 /kernel



^ permalink raw reply	[flat|nested] 56+ messages in thread

* ✓ CI.BAT: success for Fix non-contiguous VRAM BO access in Xe (rev6)
  2024-10-31 18:10 [PATCH v6 0/8] Fix non-contiguous VRAM BO access in Xe Matthew Brost
                   ` (13 preceding siblings ...)
  2024-10-31 18:32 ` ✗ CI.checksparse: warning " Patchwork
@ 2024-10-31 18:57 ` Patchwork
  2024-10-31 21:27 ` ✗ CI.FULL: failure " Patchwork
  15 siblings, 0 replies; 56+ messages in thread
From: Patchwork @ 2024-10-31 18:57 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-xe

== Series Details ==

Series: Fix non-contiguous VRAM BO access in Xe (rev6)
URL   : https://patchwork.freedesktop.org/series/140200/
State : success

== Summary ==

CI Bug Log - changes from xe-2152-9c4962db90300e1d8daca8ed54195bc31d0ee932_BAT -> xe-pw-140200v6_BAT
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  

Participating hosts (9 -> 9)
------------------------------

  No changes in participating hosts

Known issues
------------

  Here are the changes found in xe-pw-140200v6_BAT that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@core_hotunplug@unbind-rebind:
    - bat-adlp-7:         [PASS][1] -> [DMESG-WARN][2] ([Intel XE#2871]) +2 other tests dmesg-warn
   [1]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-2152-9c4962db90300e1d8daca8ed54195bc31d0ee932/bat-adlp-7/igt@core_hotunplug@unbind-rebind.html
   [2]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/bat-adlp-7/igt@core_hotunplug@unbind-rebind.html

  * igt@kms_psr@psr-cursor-plane-move:
    - bat-adlp-7:         [PASS][3] -> [SKIP][4] ([Intel XE#455]) +3 other tests skip
   [3]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-2152-9c4962db90300e1d8daca8ed54195bc31d0ee932/bat-adlp-7/igt@kms_psr@psr-cursor-plane-move.html
   [4]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/bat-adlp-7/igt@kms_psr@psr-cursor-plane-move.html

  * igt@xe_live_ktest@xe_bo@xe_bo_shrink_kunit:
    - bat-adlp-7:         [PASS][5] -> [INCOMPLETE][6] ([Intel XE#2874]) +1 other test incomplete
   [5]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-2152-9c4962db90300e1d8daca8ed54195bc31d0ee932/bat-adlp-7/igt@xe_live_ktest@xe_bo@xe_bo_shrink_kunit.html
   [6]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/bat-adlp-7/igt@xe_live_ktest@xe_bo@xe_bo_shrink_kunit.html

  
#### Possible fixes ####

  * igt@kms_frontbuffer_tracking@basic:
    - bat-adlp-7:         [FAIL][7] ([Intel XE#1861]) -> [PASS][8]
   [7]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-2152-9c4962db90300e1d8daca8ed54195bc31d0ee932/bat-adlp-7/igt@kms_frontbuffer_tracking@basic.html
   [8]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/bat-adlp-7/igt@kms_frontbuffer_tracking@basic.html

  * igt@xe_exec_compute_mode@twice-bindexecqueue-userptr-invalidate:
    - bat-lnl-1:          [DMESG-WARN][9] ([Intel XE#2687]) -> [PASS][10]
   [9]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-2152-9c4962db90300e1d8daca8ed54195bc31d0ee932/bat-lnl-1/igt@xe_exec_compute_mode@twice-bindexecqueue-userptr-invalidate.html
   [10]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/bat-lnl-1/igt@xe_exec_compute_mode@twice-bindexecqueue-userptr-invalidate.html

  
  [Intel XE#1861]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1861
  [Intel XE#2687]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2687
  [Intel XE#2871]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2871
  [Intel XE#2874]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2874
  [Intel XE#455]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/455


Build changes
-------------

  * Linux: xe-2152-9c4962db90300e1d8daca8ed54195bc31d0ee932 -> xe-pw-140200v6

  IGT_8091: 8091
  xe-2152-9c4962db90300e1d8daca8ed54195bc31d0ee932: 9c4962db90300e1d8daca8ed54195bc31d0ee932
  xe-pw-140200v6: 140200v6

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/index.html

^ permalink raw reply	[flat|nested] 56+ messages in thread

* ✗ CI.FULL: failure for Fix non-contiguous VRAM BO access in Xe (rev6)
  2024-10-31 18:10 [PATCH v6 0/8] Fix non-contiguous VRAM BO access in Xe Matthew Brost
                   ` (14 preceding siblings ...)
  2024-10-31 18:57 ` ✓ CI.BAT: success " Patchwork
@ 2024-10-31 21:27 ` Patchwork
  15 siblings, 0 replies; 56+ messages in thread
From: Patchwork @ 2024-10-31 21:27 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-xe

== Series Details ==

Series: Fix non-contiguous VRAM BO access in Xe (rev6)
URL   : https://patchwork.freedesktop.org/series/140200/
State : failure

== Summary ==

CI Bug Log - changes from xe-2152-9c4962db90300e1d8daca8ed54195bc31d0ee932_full -> xe-pw-140200v6_full
====================================================

Summary
-------

  **FAILURE**

  Serious unknown changes coming with xe-pw-140200v6_full absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in xe-pw-140200v6_full, please notify your bug team
  (I915-ci-infra@lists.freedesktop.org) to allow them to document this new
  failure mode, which will reduce false positives in CI.

  

Participating hosts (4 -> 4)
------------------------------

  No changes in participating hosts

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in xe-pw-140200v6_full:

### IGT changes ###

#### Possible regressions ####

  * igt@kms_ccs@crc-primary-basic-4-tiled-dg2-rc-ccs-cc@pipe-a-hdmi-a-6:
    - shard-dg2-set2:     [PASS][1] -> [DMESG-WARN][2] +29 other tests dmesg-warn
   [1]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-2152-9c4962db90300e1d8daca8ed54195bc31d0ee932/shard-dg2-433/igt@kms_ccs@crc-primary-basic-4-tiled-dg2-rc-ccs-cc@pipe-a-hdmi-a-6.html
   [2]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-dg2-464/igt@kms_ccs@crc-primary-basic-4-tiled-dg2-rc-ccs-cc@pipe-a-hdmi-a-6.html

  * igt@kms_cursor_legacy@torture-bo@pipe-b:
    - shard-bmg:          NOTRUN -> [INCOMPLETE][3]
   [3]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-bmg-4/igt@kms_cursor_legacy@torture-bo@pipe-b.html

  * igt@xe_oa@syncs-userptr-wait-cfg:
    - shard-dg2-set2:     NOTRUN -> [SKIP][4]
   [4]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-dg2-435/igt@xe_oa@syncs-userptr-wait-cfg.html

  
#### Warnings ####

  * igt@kms_ccs@crc-sprite-planes-basic-4-tiled-dg2-rc-ccs-cc@pipe-c-hdmi-a-6:
    - shard-dg2-set2:     [FAIL][5] ([Intel XE#616]) -> [DMESG-WARN][6] +8 other tests dmesg-warn
   [5]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-2152-9c4962db90300e1d8daca8ed54195bc31d0ee932/shard-dg2-463/igt@kms_ccs@crc-sprite-planes-basic-4-tiled-dg2-rc-ccs-cc@pipe-c-hdmi-a-6.html
   [6]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-dg2-436/igt@kms_ccs@crc-sprite-planes-basic-4-tiled-dg2-rc-ccs-cc@pipe-c-hdmi-a-6.html

  * igt@kms_frontbuffer_tracking@fbc-1p-offscren-pri-shrfb-draw-mmap-wc:
    - shard-bmg:          [FAIL][7] ([Intel XE#2333]) -> [INCOMPLETE][8]
   [7]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-2152-9c4962db90300e1d8daca8ed54195bc31d0ee932/shard-bmg-6/igt@kms_frontbuffer_tracking@fbc-1p-offscren-pri-shrfb-draw-mmap-wc.html
   [8]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-bmg-6/igt@kms_frontbuffer_tracking@fbc-1p-offscren-pri-shrfb-draw-mmap-wc.html

  
Known issues
------------

  Here are the changes found in xe-pw-140200v6_full that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@kms_big_fb@4-tiled-8bpp-rotate-90:
    - shard-dg2-set2:     NOTRUN -> [SKIP][9] ([Intel XE#316]) +3 other tests skip
   [9]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-dg2-466/igt@kms_big_fb@4-tiled-8bpp-rotate-90.html

  * igt@kms_big_fb@4-tiled-max-hw-stride-64bpp-rotate-0-hflip-async-flip:
    - shard-adlp:         NOTRUN -> [SKIP][10] ([Intel XE#1124]) +4 other tests skip
   [10]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-adlp-2/igt@kms_big_fb@4-tiled-max-hw-stride-64bpp-rotate-0-hflip-async-flip.html

  * igt@kms_big_fb@x-tiled-16bpp-rotate-90:
    - shard-adlp:         NOTRUN -> [SKIP][11] ([Intel XE#316])
   [11]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-adlp-2/igt@kms_big_fb@x-tiled-16bpp-rotate-90.html

  * igt@kms_big_fb@y-tiled-32bpp-rotate-180:
    - shard-adlp:         [PASS][12] -> [DMESG-WARN][13] ([Intel XE#3086])
   [12]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-2152-9c4962db90300e1d8daca8ed54195bc31d0ee932/shard-adlp-1/igt@kms_big_fb@y-tiled-32bpp-rotate-180.html
   [13]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-adlp-4/igt@kms_big_fb@y-tiled-32bpp-rotate-180.html

  * igt@kms_big_fb@y-tiled-max-hw-stride-64bpp-rotate-180-hflip-async-flip:
    - shard-bmg:          NOTRUN -> [SKIP][14] ([Intel XE#1124]) +1 other test skip
   [14]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-bmg-5/igt@kms_big_fb@y-tiled-max-hw-stride-64bpp-rotate-180-hflip-async-flip.html

  * igt@kms_big_fb@yf-tiled-16bpp-rotate-180:
    - shard-lnl:          NOTRUN -> [SKIP][15] ([Intel XE#1124])
   [15]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-lnl-6/igt@kms_big_fb@yf-tiled-16bpp-rotate-180.html

  * igt@kms_big_fb@yf-tiled-max-hw-stride-32bpp-rotate-0-hflip-async-flip:
    - shard-dg2-set2:     NOTRUN -> [SKIP][16] ([Intel XE#1124]) +6 other tests skip
   [16]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-dg2-466/igt@kms_big_fb@yf-tiled-max-hw-stride-32bpp-rotate-0-hflip-async-flip.html

  * igt@kms_bw@connected-linear-tiling-4-displays-2160x1440p:
    - shard-dg2-set2:     NOTRUN -> [SKIP][17] ([Intel XE#2191])
   [17]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-dg2-435/igt@kms_bw@connected-linear-tiling-4-displays-2160x1440p.html

  * igt@kms_bw@linear-tiling-2-displays-2560x1440p:
    - shard-adlp:         NOTRUN -> [SKIP][18] ([Intel XE#367]) +2 other tests skip
   [18]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-adlp-2/igt@kms_bw@linear-tiling-2-displays-2560x1440p.html

  * igt@kms_bw@linear-tiling-2-displays-3840x2160p:
    - shard-dg2-set2:     NOTRUN -> [SKIP][19] ([Intel XE#367]) +1 other test skip
   [19]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-dg2-466/igt@kms_bw@linear-tiling-2-displays-3840x2160p.html
    - shard-bmg:          NOTRUN -> [SKIP][20] ([Intel XE#367])
   [20]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-bmg-5/igt@kms_bw@linear-tiling-2-displays-3840x2160p.html

  * igt@kms_ccs@bad-aux-stride-4-tiled-mtl-mc-ccs@pipe-a-hdmi-a-6:
    - shard-dg2-set2:     NOTRUN -> [SKIP][21] ([Intel XE#787]) +34 other tests skip
   [21]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-dg2-435/igt@kms_ccs@bad-aux-stride-4-tiled-mtl-mc-ccs@pipe-a-hdmi-a-6.html

  * igt@kms_ccs@bad-aux-stride-y-tiled-gen12-rc-ccs-cc@pipe-d-dp-4:
    - shard-dg2-set2:     NOTRUN -> [SKIP][22] ([Intel XE#455] / [Intel XE#787]) +9 other tests skip
   [22]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-dg2-466/igt@kms_ccs@bad-aux-stride-y-tiled-gen12-rc-ccs-cc@pipe-d-dp-4.html

  * igt@kms_ccs@ccs-on-another-bo-y-tiled-gen12-rc-ccs:
    - shard-bmg:          NOTRUN -> [SKIP][23] ([Intel XE#2887])
   [23]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-bmg-5/igt@kms_ccs@ccs-on-another-bo-y-tiled-gen12-rc-ccs.html

  * igt@kms_ccs@crc-primary-rotation-180-4-tiled-bmg-ccs:
    - shard-adlp:         NOTRUN -> [SKIP][24] ([Intel XE#2907])
   [24]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-adlp-2/igt@kms_ccs@crc-primary-rotation-180-4-tiled-bmg-ccs.html

  * igt@kms_ccs@crc-sprite-planes-basic-4-tiled-bmg-ccs:
    - shard-dg2-set2:     NOTRUN -> [SKIP][25] ([Intel XE#2907])
   [25]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-dg2-466/igt@kms_ccs@crc-sprite-planes-basic-4-tiled-bmg-ccs.html

  * igt@kms_ccs@missing-ccs-buffer-y-tiled-ccs:
    - shard-adlp:         NOTRUN -> [SKIP][26] ([Intel XE#455] / [Intel XE#787]) +3 other tests skip
   [26]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-adlp-2/igt@kms_ccs@missing-ccs-buffer-y-tiled-ccs.html

  * igt@kms_ccs@missing-ccs-buffer-y-tiled-ccs@pipe-c-hdmi-a-1:
    - shard-adlp:         NOTRUN -> [SKIP][27] ([Intel XE#787]) +5 other tests skip
   [27]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-adlp-2/igt@kms_ccs@missing-ccs-buffer-y-tiled-ccs@pipe-c-hdmi-a-1.html

  * igt@kms_ccs@random-ccs-data-4-tiled-dg2-rc-ccs:
    - shard-dg2-set2:     [PASS][28] -> [INCOMPLETE][29] ([Intel XE#1195] / [Intel XE#1727])
   [28]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-2152-9c4962db90300e1d8daca8ed54195bc31d0ee932/shard-dg2-435/igt@kms_ccs@random-ccs-data-4-tiled-dg2-rc-ccs.html
   [29]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-dg2-433/igt@kms_ccs@random-ccs-data-4-tiled-dg2-rc-ccs.html

  * igt@kms_ccs@random-ccs-data-4-tiled-dg2-rc-ccs-cc@pipe-a-dp-4:
    - shard-dg2-set2:     [PASS][30] -> [DMESG-WARN][31] ([Intel XE#3113])
   [30]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-2152-9c4962db90300e1d8daca8ed54195bc31d0ee932/shard-dg2-466/igt@kms_ccs@random-ccs-data-4-tiled-dg2-rc-ccs-cc@pipe-a-dp-4.html
   [31]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-dg2-464/igt@kms_ccs@random-ccs-data-4-tiled-dg2-rc-ccs-cc@pipe-a-dp-4.html

  * igt@kms_ccs@random-ccs-data-4-tiled-dg2-rc-ccs-cc@pipe-b-hdmi-a-6:
    - shard-dg2-set2:     [PASS][32] -> [INCOMPLETE][33] ([Intel XE#1195] / [Intel XE#3124])
   [32]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-2152-9c4962db90300e1d8daca8ed54195bc31d0ee932/shard-dg2-466/igt@kms_ccs@random-ccs-data-4-tiled-dg2-rc-ccs-cc@pipe-b-hdmi-a-6.html
   [33]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-dg2-464/igt@kms_ccs@random-ccs-data-4-tiled-dg2-rc-ccs-cc@pipe-b-hdmi-a-6.html

  * igt@kms_ccs@random-ccs-data-4-tiled-dg2-rc-ccs@pipe-c-dp-4:
    - shard-dg2-set2:     [PASS][34] -> [INCOMPLETE][35] ([Intel XE#1195])
   [34]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-2152-9c4962db90300e1d8daca8ed54195bc31d0ee932/shard-dg2-435/igt@kms_ccs@random-ccs-data-4-tiled-dg2-rc-ccs@pipe-c-dp-4.html
   [35]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-dg2-433/igt@kms_ccs@random-ccs-data-4-tiled-dg2-rc-ccs@pipe-c-dp-4.html

  * igt@kms_chamelium_audio@hdmi-audio-edid:
    - shard-dg2-set2:     NOTRUN -> [SKIP][36] ([Intel XE#373]) +5 other tests skip
   [36]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-dg2-435/igt@kms_chamelium_audio@hdmi-audio-edid.html

  * igt@kms_chamelium_color@gamma:
    - shard-dg2-set2:     NOTRUN -> [SKIP][37] ([Intel XE#306])
   [37]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-dg2-466/igt@kms_chamelium_color@gamma.html

  * igt@kms_chamelium_frames@dp-crc-single:
    - shard-bmg:          NOTRUN -> [SKIP][38] ([Intel XE#2252])
   [38]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-bmg-4/igt@kms_chamelium_frames@dp-crc-single.html

  * igt@kms_chamelium_hpd@dp-hpd-storm-disable:
    - shard-adlp:         NOTRUN -> [SKIP][39] ([Intel XE#373])
   [39]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-adlp-2/igt@kms_chamelium_hpd@dp-hpd-storm-disable.html

  * igt@kms_content_protection@dp-mst-type-0:
    - shard-dg2-set2:     NOTRUN -> [SKIP][40] ([Intel XE#307]) +2 other tests skip
   [40]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-dg2-436/igt@kms_content_protection@dp-mst-type-0.html

  * igt@kms_content_protection@legacy:
    - shard-dg2-set2:     NOTRUN -> [INCOMPLETE][41] ([Intel XE#1195] / [Intel XE#2715]) +1 other test incomplete
   [41]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-dg2-463/igt@kms_content_protection@legacy.html

  * igt@kms_cursor_crc@cursor-offscreen-512x512:
    - shard-dg2-set2:     NOTRUN -> [SKIP][42] ([Intel XE#308]) +1 other test skip
   [42]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-dg2-435/igt@kms_cursor_crc@cursor-offscreen-512x512.html

  * igt@kms_cursor_crc@cursor-random-32x32:
    - shard-adlp:         NOTRUN -> [SKIP][43] ([Intel XE#455]) +7 other tests skip
   [43]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-adlp-2/igt@kms_cursor_crc@cursor-random-32x32.html

  * igt@kms_cursor_legacy@cursora-vs-flipb-varying-size:
    - shard-lnl:          NOTRUN -> [SKIP][44] ([Intel XE#309])
   [44]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-lnl-6/igt@kms_cursor_legacy@cursora-vs-flipb-varying-size.html

  * igt@kms_cursor_legacy@cursorb-vs-flipa-atomic-transitions-varying-size:
    - shard-bmg:          [PASS][45] -> [DMESG-WARN][46] ([Intel XE#877])
   [45]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-2152-9c4962db90300e1d8daca8ed54195bc31d0ee932/shard-bmg-7/igt@kms_cursor_legacy@cursorb-vs-flipa-atomic-transitions-varying-size.html
   [46]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-bmg-8/igt@kms_cursor_legacy@cursorb-vs-flipa-atomic-transitions-varying-size.html

  * igt@kms_cursor_legacy@cursorb-vs-flipa-varying-size:
    - shard-adlp:         NOTRUN -> [SKIP][47] ([Intel XE#309])
   [47]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-adlp-2/igt@kms_cursor_legacy@cursorb-vs-flipa-varying-size.html

  * igt@kms_cursor_legacy@short-flip-after-cursor-atomic-transitions:
    - shard-adlp:         [PASS][48] -> [DMESG-WARN][49] ([Intel XE#2953] / [Intel XE#3086])
   [48]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-2152-9c4962db90300e1d8daca8ed54195bc31d0ee932/shard-adlp-6/igt@kms_cursor_legacy@short-flip-after-cursor-atomic-transitions.html
   [49]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-adlp-8/igt@kms_cursor_legacy@short-flip-after-cursor-atomic-transitions.html

  * igt@kms_cursor_legacy@torture-bo:
    - shard-bmg:          NOTRUN -> [INCOMPLETE][50] ([Intel XE#3226])
   [50]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-bmg-4/igt@kms_cursor_legacy@torture-bo.html
    - shard-dg2-set2:     [PASS][51] -> [DMESG-WARN][52] ([Intel XE#2932]) +1 other test dmesg-warn
   [51]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-2152-9c4962db90300e1d8daca8ed54195bc31d0ee932/shard-dg2-436/igt@kms_cursor_legacy@torture-bo.html
   [52]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-dg2-435/igt@kms_cursor_legacy@torture-bo.html

  * igt@kms_dirtyfb@psr-dirtyfb-ioctl:
    - shard-bmg:          NOTRUN -> [SKIP][53] ([Intel XE#1508])
   [53]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-bmg-5/igt@kms_dirtyfb@psr-dirtyfb-ioctl.html

  * igt@kms_dither@fb-8bpc-vs-panel-6bpc@pipe-a-hdmi-a-6:
    - shard-dg2-set2:     NOTRUN -> [SKIP][54] ([i915#3804])
   [54]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-dg2-463/igt@kms_dither@fb-8bpc-vs-panel-6bpc@pipe-a-hdmi-a-6.html

  * igt@kms_feature_discovery@psr1:
    - shard-dg2-set2:     NOTRUN -> [SKIP][55] ([Intel XE#1135])
   [55]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-dg2-435/igt@kms_feature_discovery@psr1.html

  * igt@kms_flip@2x-blocking-absolute-wf_vblank-interruptible:
    - shard-adlp:         NOTRUN -> [SKIP][56] ([Intel XE#310])
   [56]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-adlp-2/igt@kms_flip@2x-blocking-absolute-wf_vblank-interruptible.html

  * igt@kms_flip@2x-blocking-wf_vblank:
    - shard-bmg:          [PASS][57] -> [FAIL][58] ([Intel XE#2882]) +1 other test fail
   [57]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-2152-9c4962db90300e1d8daca8ed54195bc31d0ee932/shard-bmg-7/igt@kms_flip@2x-blocking-wf_vblank.html
   [58]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-bmg-5/igt@kms_flip@2x-blocking-wf_vblank.html

  * igt@kms_flip@2x-flip-vs-expired-vblank@ab-dp2-hdmi-a3:
    - shard-bmg:          [PASS][59] -> [FAIL][60] ([Intel XE#301]) +3 other tests fail
   [59]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-2152-9c4962db90300e1d8daca8ed54195bc31d0ee932/shard-bmg-5/igt@kms_flip@2x-flip-vs-expired-vblank@ab-dp2-hdmi-a3.html
   [60]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-bmg-4/igt@kms_flip@2x-flip-vs-expired-vblank@ab-dp2-hdmi-a3.html

  * igt@kms_flip@flip-vs-absolute-wf_vblank:
    - shard-lnl:          [PASS][61] -> [FAIL][62] ([Intel XE#886]) +7 other tests fail
   [61]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-2152-9c4962db90300e1d8daca8ed54195bc31d0ee932/shard-lnl-4/igt@kms_flip@flip-vs-absolute-wf_vblank.html
   [62]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-lnl-5/igt@kms_flip@flip-vs-absolute-wf_vblank.html

  * igt@kms_flip@flip-vs-absolute-wf_vblank@a-hdmi-a1:
    - shard-adlp:         [PASS][63] -> [FAIL][64] ([Intel XE#2882]) +2 other tests fail
   [63]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-2152-9c4962db90300e1d8daca8ed54195bc31d0ee932/shard-adlp-2/igt@kms_flip@flip-vs-absolute-wf_vblank@a-hdmi-a1.html
   [64]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-adlp-2/igt@kms_flip@flip-vs-absolute-wf_vblank@a-hdmi-a1.html

  * igt@kms_flip@flip-vs-expired-vblank@b-hdmi-a6:
    - shard-dg2-set2:     [PASS][65] -> [FAIL][66] ([Intel XE#301]) +6 other tests fail
   [65]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-2152-9c4962db90300e1d8daca8ed54195bc31d0ee932/shard-dg2-434/igt@kms_flip@flip-vs-expired-vblank@b-hdmi-a6.html
   [66]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-dg2-433/igt@kms_flip@flip-vs-expired-vblank@b-hdmi-a6.html

  * igt@kms_flip_scaled_crc@flip-32bpp-ytile-to-32bpp-ytilegen12rcccs-downscaling:
    - shard-bmg:          NOTRUN -> [SKIP][67] ([Intel XE#2293] / [Intel XE#2380])
   [67]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-bmg-4/igt@kms_flip_scaled_crc@flip-32bpp-ytile-to-32bpp-ytilegen12rcccs-downscaling.html

  * igt@kms_flip_scaled_crc@flip-32bpp-ytile-to-32bpp-ytilegen12rcccs-downscaling@pipe-a-valid-mode:
    - shard-bmg:          NOTRUN -> [SKIP][68] ([Intel XE#2293])
   [68]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-bmg-4/igt@kms_flip_scaled_crc@flip-32bpp-ytile-to-32bpp-ytilegen12rcccs-downscaling@pipe-a-valid-mode.html

  * igt@kms_frontbuffer_tracking@drrs-1p-offscren-pri-shrfb-draw-render:
    - shard-adlp:         NOTRUN -> [SKIP][69] ([Intel XE#651]) +2 other tests skip
   [69]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-adlp-2/igt@kms_frontbuffer_tracking@drrs-1p-offscren-pri-shrfb-draw-render.html

  * igt@kms_frontbuffer_tracking@drrs-1p-primscrn-indfb-msflip-blt:
    - shard-dg2-set2:     NOTRUN -> [SKIP][70] ([Intel XE#651]) +16 other tests skip
   [70]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-dg2-435/igt@kms_frontbuffer_tracking@drrs-1p-primscrn-indfb-msflip-blt.html

  * igt@kms_frontbuffer_tracking@drrs-2p-scndscrn-cur-indfb-draw-blt:
    - shard-bmg:          NOTRUN -> [SKIP][71] ([Intel XE#2311])
   [71]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-bmg-5/igt@kms_frontbuffer_tracking@drrs-2p-scndscrn-cur-indfb-draw-blt.html

  * igt@kms_frontbuffer_tracking@fbc-1p-primscrn-shrfb-pgflip-blt:
    - shard-bmg:          NOTRUN -> [FAIL][72] ([Intel XE#2333])
   [72]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-bmg-5/igt@kms_frontbuffer_tracking@fbc-1p-primscrn-shrfb-pgflip-blt.html

  * igt@kms_frontbuffer_tracking@fbcpsr-1p-primscrn-pri-shrfb-draw-mmap-wc:
    - shard-adlp:         NOTRUN -> [SKIP][73] ([Intel XE#653]) +4 other tests skip
   [73]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-adlp-2/igt@kms_frontbuffer_tracking@fbcpsr-1p-primscrn-pri-shrfb-draw-mmap-wc.html

  * igt@kms_frontbuffer_tracking@fbcpsr-2p-pri-indfb-multidraw:
    - shard-adlp:         NOTRUN -> [SKIP][74] ([Intel XE#656]) +9 other tests skip
   [74]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-adlp-4/igt@kms_frontbuffer_tracking@fbcpsr-2p-pri-indfb-multidraw.html

  * igt@kms_frontbuffer_tracking@psr-1p-primscrn-pri-shrfb-draw-blt:
    - shard-dg2-set2:     NOTRUN -> [SKIP][75] ([Intel XE#653]) +18 other tests skip
   [75]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-dg2-435/igt@kms_frontbuffer_tracking@psr-1p-primscrn-pri-shrfb-draw-blt.html

  * igt@kms_frontbuffer_tracking@psr-2p-primscrn-pri-indfb-draw-blt:
    - shard-lnl:          NOTRUN -> [SKIP][76] ([Intel XE#656]) +1 other test skip
   [76]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-lnl-6/igt@kms_frontbuffer_tracking@psr-2p-primscrn-pri-indfb-draw-blt.html

  * igt@kms_frontbuffer_tracking@psr-rgb565-draw-mmap-wc:
    - shard-bmg:          NOTRUN -> [SKIP][77] ([Intel XE#2313]) +3 other tests skip
   [77]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-bmg-5/igt@kms_frontbuffer_tracking@psr-rgb565-draw-mmap-wc.html

  * igt@kms_lease@page-flip-implicit-plane:
    - shard-lnl:          [PASS][78] -> [FAIL][79] ([Intel XE#2205]) +1 other test fail
   [78]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-2152-9c4962db90300e1d8daca8ed54195bc31d0ee932/shard-lnl-5/igt@kms_lease@page-flip-implicit-plane.html
   [79]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-lnl-6/igt@kms_lease@page-flip-implicit-plane.html

  * igt@kms_panel_fitting@atomic-fastset:
    - shard-dg2-set2:     NOTRUN -> [SKIP][80] ([Intel XE#455]) +9 other tests skip
   [80]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-dg2-435/igt@kms_panel_fitting@atomic-fastset.html

  * igt@kms_plane@pixel-format-source-clamping:
    - shard-dg2-set2:     [PASS][81] -> [DMESG-WARN][82] ([Intel XE#2566])
   [81]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-2152-9c4962db90300e1d8daca8ed54195bc31d0ee932/shard-dg2-466/igt@kms_plane@pixel-format-source-clamping.html
   [82]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-dg2-464/igt@kms_plane@pixel-format-source-clamping.html

  * igt@kms_pm_dc@dc3co-vpb-simulation:
    - shard-dg2-set2:     NOTRUN -> [SKIP][83] ([Intel XE#1122])
   [83]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-dg2-436/igt@kms_pm_dc@dc3co-vpb-simulation.html

  * igt@kms_pm_dc@dc6-psr:
    - shard-bmg:          NOTRUN -> [SKIP][84] ([Intel XE#2392])
   [84]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-bmg-5/igt@kms_pm_dc@dc6-psr.html
    - shard-adlp:         NOTRUN -> [SKIP][85] ([Intel XE#1129])
   [85]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-adlp-4/igt@kms_pm_dc@dc6-psr.html
    - shard-dg2-set2:     NOTRUN -> [SKIP][86] ([Intel XE#1129])
   [86]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-dg2-466/igt@kms_pm_dc@dc6-psr.html

  * igt@kms_pm_rpm@basic-pci-d3-state:
    - shard-dg2-set2:     NOTRUN -> [FAIL][87] ([Intel XE#3129])
   [87]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-dg2-466/igt@kms_pm_rpm@basic-pci-d3-state.html

  * igt@kms_pm_rpm@drm-resources-equal:
    - shard-lnl:          [PASS][88] -> [DMESG-WARN][89] ([Intel XE#2042])
   [88]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-2152-9c4962db90300e1d8daca8ed54195bc31d0ee932/shard-lnl-6/igt@kms_pm_rpm@drm-resources-equal.html
   [89]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-lnl-2/igt@kms_pm_rpm@drm-resources-equal.html

  * igt@kms_pm_rpm@system-suspend-modeset:
    - shard-dg2-set2:     [PASS][90] -> [ABORT][91] ([Intel XE#2625])
   [90]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-2152-9c4962db90300e1d8daca8ed54195bc31d0ee932/shard-dg2-464/igt@kms_pm_rpm@system-suspend-modeset.html
   [91]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-dg2-432/igt@kms_pm_rpm@system-suspend-modeset.html

  * igt@kms_psr2_sf@fbc-pr-plane-move-sf-dmg-area:
    - shard-adlp:         NOTRUN -> [SKIP][92] ([Intel XE#1489])
   [92]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-adlp-2/igt@kms_psr2_sf@fbc-pr-plane-move-sf-dmg-area.html

  * igt@kms_psr2_sf@psr2-cursor-plane-move-continuous-exceed-fully-sf:
    - shard-bmg:          NOTRUN -> [SKIP][93] ([Intel XE#1489])
   [93]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-bmg-4/igt@kms_psr2_sf@psr2-cursor-plane-move-continuous-exceed-fully-sf.html

  * igt@kms_psr2_sf@psr2-overlay-primary-update-sf-dmg-area:
    - shard-dg2-set2:     NOTRUN -> [SKIP][94] ([Intel XE#1489]) +3 other tests skip
   [94]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-dg2-436/igt@kms_psr2_sf@psr2-overlay-primary-update-sf-dmg-area.html

  * igt@kms_psr@fbc-psr2-sprite-render:
    - shard-adlp:         NOTRUN -> [SKIP][95] ([Intel XE#2850] / [Intel XE#929]) +4 other tests skip
   [95]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-adlp-4/igt@kms_psr@fbc-psr2-sprite-render.html
    - shard-bmg:          NOTRUN -> [SKIP][96] ([Intel XE#2234] / [Intel XE#2850]) +1 other test skip
   [96]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-bmg-5/igt@kms_psr@fbc-psr2-sprite-render.html

  * igt@kms_psr@psr-primary-render:
    - shard-dg2-set2:     NOTRUN -> [SKIP][97] ([Intel XE#2850] / [Intel XE#929]) +7 other tests skip
   [97]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-dg2-466/igt@kms_psr@psr-primary-render.html

  * igt@kms_rotation_crc@primary-y-tiled-reflect-x-180:
    - shard-dg2-set2:     NOTRUN -> [SKIP][98] ([Intel XE#1127])
   [98]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-dg2-435/igt@kms_rotation_crc@primary-y-tiled-reflect-x-180.html

  * igt@kms_tiled_display@basic-test-pattern:
    - shard-adlp:         NOTRUN -> [SKIP][99] ([Intel XE#362])
   [99]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-adlp-4/igt@kms_tiled_display@basic-test-pattern.html
    - shard-bmg:          NOTRUN -> [SKIP][100] ([Intel XE#2426])
   [100]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-bmg-5/igt@kms_tiled_display@basic-test-pattern.html
    - shard-dg2-set2:     NOTRUN -> [FAIL][101] ([Intel XE#1729])
   [101]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-dg2-466/igt@kms_tiled_display@basic-test-pattern.html

  * igt@kms_universal_plane@cursor-fb-leak@pipe-c-edp-1:
    - shard-lnl:          [PASS][102] -> [FAIL][103] ([Intel XE#899])
   [102]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-2152-9c4962db90300e1d8daca8ed54195bc31d0ee932/shard-lnl-2/igt@kms_universal_plane@cursor-fb-leak@pipe-c-edp-1.html
   [103]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-lnl-1/igt@kms_universal_plane@cursor-fb-leak@pipe-c-edp-1.html

  * igt@xe_copy_basic@mem-copy-linear-0xfd:
    - shard-dg2-set2:     NOTRUN -> [SKIP][104] ([Intel XE#1123])
   [104]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-dg2-466/igt@xe_copy_basic@mem-copy-linear-0xfd.html

  * igt@xe_eudebug@basic-vm-bind-discovery:
    - shard-adlp:         NOTRUN -> [SKIP][105] ([Intel XE#2905])
   [105]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-adlp-4/igt@xe_eudebug@basic-vm-bind-discovery.html
    - shard-bmg:          NOTRUN -> [SKIP][106] ([Intel XE#2905])
   [106]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-bmg-5/igt@xe_eudebug@basic-vm-bind-discovery.html

  * igt@xe_eudebug_online@preempt-breakpoint:
    - shard-dg2-set2:     NOTRUN -> [SKIP][107] ([Intel XE#2905]) +7 other tests skip
   [107]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-dg2-435/igt@xe_eudebug_online@preempt-breakpoint.html

  * igt@xe_evict@evict-beng-large-multi-vm-cm:
    - shard-dg2-set2:     NOTRUN -> [FAIL][108] ([Intel XE#1600])
   [108]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-dg2-435/igt@xe_evict@evict-beng-large-multi-vm-cm.html

  * igt@xe_evict@evict-beng-mixed-threads-large:
    - shard-bmg:          [PASS][109] -> [TIMEOUT][110] ([Intel XE#1473])
   [109]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-2152-9c4962db90300e1d8daca8ed54195bc31d0ee932/shard-bmg-7/igt@xe_evict@evict-beng-mixed-threads-large.html
   [110]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-bmg-5/igt@xe_evict@evict-beng-mixed-threads-large.html

  * igt@xe_evict@evict-cm-threads-small:
    - shard-adlp:         NOTRUN -> [SKIP][111] ([Intel XE#261] / [Intel XE#688]) +1 other test skip
   [111]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-adlp-2/igt@xe_evict@evict-cm-threads-small.html

  * igt@xe_evict@evict-mixed-many-threads-small:
    - shard-bmg:          [PASS][112] -> [TIMEOUT][113] ([Intel XE#1473] / [Intel XE#2472])
   [112]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-2152-9c4962db90300e1d8daca8ed54195bc31d0ee932/shard-bmg-2/igt@xe_evict@evict-mixed-many-threads-small.html
   [113]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-bmg-6/igt@xe_evict@evict-mixed-many-threads-small.html

  * igt@xe_evict_ccs@evict-overcommit-parallel-nofree-samefd:
    - shard-adlp:         NOTRUN -> [SKIP][114] ([Intel XE#688])
   [114]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-adlp-2/igt@xe_evict_ccs@evict-overcommit-parallel-nofree-samefd.html

  * igt@xe_exec_basic@multigpu-many-execqueues-many-vm-null-defer-bind:
    - shard-adlp:         NOTRUN -> [SKIP][115] ([Intel XE#1392]) +2 other tests skip
   [115]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-adlp-2/igt@xe_exec_basic@multigpu-many-execqueues-many-vm-null-defer-bind.html

  * igt@xe_exec_basic@multigpu-once-userptr-rebind:
    - shard-bmg:          NOTRUN -> [SKIP][116] ([Intel XE#2322])
   [116]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-bmg-5/igt@xe_exec_basic@multigpu-once-userptr-rebind.html

  * igt@xe_exec_fault_mode@many-execqueues-userptr-invalidate-race:
    - shard-bmg:          [PASS][117] -> [FAIL][118] ([Intel XE#3160]) +1 other test fail
   [117]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-2152-9c4962db90300e1d8daca8ed54195bc31d0ee932/shard-bmg-8/igt@xe_exec_fault_mode@many-execqueues-userptr-invalidate-race.html
   [118]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-bmg-2/igt@xe_exec_fault_mode@many-execqueues-userptr-invalidate-race.html

  * igt@xe_exec_fault_mode@many-execqueues-userptr-invalidate-race-imm:
    - shard-lnl:          [PASS][119] -> [FAIL][120] ([Intel XE#3320])
   [119]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-2152-9c4962db90300e1d8daca8ed54195bc31d0ee932/shard-lnl-3/igt@xe_exec_fault_mode@many-execqueues-userptr-invalidate-race-imm.html
   [120]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-lnl-8/igt@xe_exec_fault_mode@many-execqueues-userptr-invalidate-race-imm.html

  * igt@xe_exec_fault_mode@many-execqueues-userptr-invalidate-race-prefetch:
    - shard-lnl:          [PASS][121] -> [FAIL][122] ([Intel XE#3160]) +2 other tests fail
   [121]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-2152-9c4962db90300e1d8daca8ed54195bc31d0ee932/shard-lnl-5/igt@xe_exec_fault_mode@many-execqueues-userptr-invalidate-race-prefetch.html
   [122]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-lnl-6/igt@xe_exec_fault_mode@many-execqueues-userptr-invalidate-race-prefetch.html

  * igt@xe_exec_fault_mode@twice-bindexecqueue-userptr-prefetch:
    - shard-adlp:         NOTRUN -> [SKIP][123] ([Intel XE#288]) +7 other tests skip
   [123]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-adlp-4/igt@xe_exec_fault_mode@twice-bindexecqueue-userptr-prefetch.html

  * igt@xe_exec_fault_mode@twice-invalid-fault:
    - shard-dg2-set2:     NOTRUN -> [SKIP][124] ([Intel XE#288]) +18 other tests skip
   [124]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-dg2-436/igt@xe_exec_fault_mode@twice-invalid-fault.html

  * igt@xe_oa@mmio-triggered-reports@ccs-0:
    - shard-bmg:          NOTRUN -> [FAIL][125] ([Intel XE#2249])
   [125]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-bmg-7/igt@xe_oa@mmio-triggered-reports@ccs-0.html

  * igt@xe_oa@non-privileged-access-vaddr:
    - shard-dg2-set2:     NOTRUN -> [SKIP][126] ([Intel XE#2541])
   [126]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-dg2-435/igt@xe_oa@non-privileged-access-vaddr.html

  * igt@xe_oa@non-system-wide-paranoid:
    - shard-adlp:         NOTRUN -> [SKIP][127] ([Intel XE#2541]) +1 other test skip
   [127]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-adlp-2/igt@xe_oa@non-system-wide-paranoid.html

  * igt@xe_oa@oa-regs-whitelisted@ccs-0:
    - shard-bmg:          NOTRUN -> [FAIL][128] ([Intel XE#2514])
   [128]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-bmg-6/igt@xe_oa@oa-regs-whitelisted@ccs-0.html
    - shard-lnl:          NOTRUN -> [FAIL][129] ([Intel XE#2514])
   [129]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-lnl-7/igt@xe_oa@oa-regs-whitelisted@ccs-0.html

  * igt@xe_pm@d3hot-basic-exec:
    - shard-lnl:          [PASS][130] -> [DMESG-WARN][131] ([Intel XE#3184])
   [130]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-2152-9c4962db90300e1d8daca8ed54195bc31d0ee932/shard-lnl-3/igt@xe_pm@d3hot-basic-exec.html
   [131]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-lnl-8/igt@xe_pm@d3hot-basic-exec.html

  * igt@xe_pm@d3hot-mmap-vram:
    - shard-adlp:         NOTRUN -> [SKIP][132] ([Intel XE#1948])
   [132]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-adlp-2/igt@xe_pm@d3hot-mmap-vram.html

  * igt@xe_pm@s2idle-multiple-execs:
    - shard-dg2-set2:     [PASS][133] -> [ABORT][134] ([Intel XE#1358])
   [133]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-2152-9c4962db90300e1d8daca8ed54195bc31d0ee932/shard-dg2-433/igt@xe_pm@s2idle-multiple-execs.html
   [134]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-dg2-432/igt@xe_pm@s2idle-multiple-execs.html

  * igt@xe_pm@s4-mocs:
    - shard-adlp:         [PASS][135] -> [ABORT][136] ([Intel XE#1794])
   [135]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-2152-9c4962db90300e1d8daca8ed54195bc31d0ee932/shard-adlp-8/igt@xe_pm@s4-mocs.html
   [136]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-adlp-9/igt@xe_pm@s4-mocs.html

  * igt@xe_query@multigpu-query-invalid-extension:
    - shard-dg2-set2:     NOTRUN -> [SKIP][137] ([Intel XE#944]) +2 other tests skip
   [137]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-dg2-435/igt@xe_query@multigpu-query-invalid-extension.html

  * igt@xe_query@multigpu-query-mem-usage:
    - shard-adlp:         NOTRUN -> [SKIP][138] ([Intel XE#944])
   [138]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-adlp-2/igt@xe_query@multigpu-query-mem-usage.html

  * igt@xe_wedged@basic-wedged:
    - shard-dg2-set2:     NOTRUN -> [DMESG-WARN][139] ([Intel XE#2919])
   [139]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-dg2-435/igt@xe_wedged@basic-wedged.html

  
#### Possible fixes ####

  * igt@kms_atomic_transition@plane-all-modeset-transition-fencing:
    - shard-adlp:         [FAIL][140] ([Intel XE#1426]) -> [PASS][141] +1 other test pass
   [140]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-2152-9c4962db90300e1d8daca8ed54195bc31d0ee932/shard-adlp-6/igt@kms_atomic_transition@plane-all-modeset-transition-fencing.html
   [141]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-adlp-8/igt@kms_atomic_transition@plane-all-modeset-transition-fencing.html

  * igt@kms_big_fb@linear-64bpp-rotate-0:
    - shard-dg2-set2:     [DMESG-WARN][142] ([Intel XE#877]) -> [PASS][143]
   [142]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-2152-9c4962db90300e1d8daca8ed54195bc31d0ee932/shard-dg2-432/igt@kms_big_fb@linear-64bpp-rotate-0.html
   [143]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-dg2-463/igt@kms_big_fb@linear-64bpp-rotate-0.html

  * igt@kms_big_fb@linear-max-hw-stride-64bpp-rotate-0:
    - shard-adlp:         [DMESG-WARN][144] ([Intel XE#3086]) -> [PASS][145]
   [144]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-2152-9c4962db90300e1d8daca8ed54195bc31d0ee932/shard-adlp-6/igt@kms_big_fb@linear-max-hw-stride-64bpp-rotate-0.html
   [145]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-adlp-3/igt@kms_big_fb@linear-max-hw-stride-64bpp-rotate-0.html

  * igt@kms_cursor_edge_walk@128x128-top-bottom:
    - shard-lnl:          [FAIL][146] ([Intel XE#2577]) -> [PASS][147] +1 other test pass
   [146]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-2152-9c4962db90300e1d8daca8ed54195bc31d0ee932/shard-lnl-8/igt@kms_cursor_edge_walk@128x128-top-bottom.html
   [147]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-lnl-4/igt@kms_cursor_edge_walk@128x128-top-bottom.html

  * igt@kms_flip@2x-flip-vs-dpms-off-vs-modeset@bd-dp2-hdmi-a3:
    - shard-bmg:          [INCOMPLETE][148] -> [PASS][149] +1 other test pass
   [148]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-2152-9c4962db90300e1d8daca8ed54195bc31d0ee932/shard-bmg-5/igt@kms_flip@2x-flip-vs-dpms-off-vs-modeset@bd-dp2-hdmi-a3.html
   [149]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-bmg-4/igt@kms_flip@2x-flip-vs-dpms-off-vs-modeset@bd-dp2-hdmi-a3.html

  * igt@kms_flip@2x-flip-vs-suspend:
    - shard-dg2-set2:     [ABORT][150] ([Intel XE#2625]) -> [PASS][151]
   [150]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-2152-9c4962db90300e1d8daca8ed54195bc31d0ee932/shard-dg2-432/igt@kms_flip@2x-flip-vs-suspend.html
   [151]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-dg2-436/igt@kms_flip@2x-flip-vs-suspend.html

  * igt@kms_flip@2x-flip-vs-suspend@cd-hdmi-a6-dp4:
    - shard-dg2-set2:     [ABORT][152] -> [PASS][153]
   [152]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-2152-9c4962db90300e1d8daca8ed54195bc31d0ee932/shard-dg2-432/igt@kms_flip@2x-flip-vs-suspend@cd-hdmi-a6-dp4.html
   [153]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-dg2-436/igt@kms_flip@2x-flip-vs-suspend@cd-hdmi-a6-dp4.html

  * igt@kms_flip@2x-plain-flip-fb-recreate-interruptible@ac-dp2-hdmi-a3:
    - shard-bmg:          [FAIL][154] ([Intel XE#2882]) -> [PASS][155]
   [154]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-2152-9c4962db90300e1d8daca8ed54195bc31d0ee932/shard-bmg-2/igt@kms_flip@2x-plain-flip-fb-recreate-interruptible@ac-dp2-hdmi-a3.html
   [155]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-bmg-2/igt@kms_flip@2x-plain-flip-fb-recreate-interruptible@ac-dp2-hdmi-a3.html

  * igt@kms_flip@flip-vs-absolute-wf_vblank-interruptible:
    - shard-dg2-set2:     [INCOMPLETE][156] ([Intel XE#1195] / [Intel XE#2049]) -> [PASS][157]
   [156]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-2152-9c4962db90300e1d8daca8ed54195bc31d0ee932/shard-dg2-436/igt@kms_flip@flip-vs-absolute-wf_vblank-interruptible.html
   [157]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-dg2-435/igt@kms_flip@flip-vs-absolute-wf_vblank-interruptible.html

  * igt@kms_flip@flip-vs-absolute-wf_vblank-interruptible@b-hdmi-a6:
    - shard-dg2-set2:     [INCOMPLETE][158] ([Intel XE#1195]) -> [PASS][159]
   [158]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-2152-9c4962db90300e1d8daca8ed54195bc31d0ee932/shard-dg2-436/igt@kms_flip@flip-vs-absolute-wf_vblank-interruptible@b-hdmi-a6.html
   [159]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-dg2-435/igt@kms_flip@flip-vs-absolute-wf_vblank-interruptible@b-hdmi-a6.html

  * igt@kms_flip@flip-vs-expired-vblank-interruptible:
    - shard-bmg:          [FAIL][160] ([Intel XE#301]) -> [PASS][161] +4 other tests pass
   [160]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-2152-9c4962db90300e1d8daca8ed54195bc31d0ee932/shard-bmg-2/igt@kms_flip@flip-vs-expired-vblank-interruptible.html
   [161]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-bmg-6/igt@kms_flip@flip-vs-expired-vblank-interruptible.html

  * igt@kms_flip@flip-vs-suspend:
    - shard-bmg:          [INCOMPLETE][162] ([Intel XE#2597]) -> [PASS][163]
   [162]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-2152-9c4962db90300e1d8daca8ed54195bc31d0ee932/shard-bmg-4/igt@kms_flip@flip-vs-suspend.html
   [163]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-bmg-5/igt@kms_flip@flip-vs-suspend.html

  * igt@kms_flip@flip-vs-suspend-interruptible:
    - shard-adlp:         [DMESG-WARN][164] ([Intel XE#2953] / [Intel XE#3086]) -> [PASS][165] +1 other test pass
   [164]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-2152-9c4962db90300e1d8daca8ed54195bc31d0ee932/shard-adlp-6/igt@kms_flip@flip-vs-suspend-interruptible.html
   [165]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-adlp-3/igt@kms_flip@flip-vs-suspend-interruptible.html

  * igt@kms_flip@flip-vs-suspend@d-hdmi-a3:
    - shard-bmg:          [INCOMPLETE][166] ([Intel XE#2597] / [Intel XE#2635]) -> [PASS][167]
   [166]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-2152-9c4962db90300e1d8daca8ed54195bc31d0ee932/shard-bmg-4/igt@kms_flip@flip-vs-suspend@d-hdmi-a3.html
   [167]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-bmg-5/igt@kms_flip@flip-vs-suspend@d-hdmi-a3.html

  * igt@kms_pm_rpm@universal-planes:
    - shard-lnl:          [DMESG-WARN][168] ([Intel XE#2042]) -> [PASS][169]
   [168]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-2152-9c4962db90300e1d8daca8ed54195bc31d0ee932/shard-lnl-7/igt@kms_pm_rpm@universal-planes.html
   [169]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-lnl-3/igt@kms_pm_rpm@universal-planes.html

  * igt@kms_pm_rpm@universal-planes@plane-68:
    - shard-lnl:          [DMESG-WARN][170] ([Intel XE#3184]) -> [PASS][171]
   [170]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-2152-9c4962db90300e1d8daca8ed54195bc31d0ee932/shard-lnl-7/igt@kms_pm_rpm@universal-planes@plane-68.html
   [171]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-lnl-3/igt@kms_pm_rpm@universal-planes@plane-68.html

  * igt@kms_universal_plane@cursor-fb-leak@pipe-a-edp-1:
    - shard-lnl:          [FAIL][172] ([Intel XE#899]) -> [PASS][173] +1 other test pass
   [172]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-2152-9c4962db90300e1d8daca8ed54195bc31d0ee932/shard-lnl-2/igt@kms_universal_plane@cursor-fb-leak@pipe-a-edp-1.html
   [173]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-lnl-1/igt@kms_universal_plane@cursor-fb-leak@pipe-a-edp-1.html

  * igt@xe_evict@evict-mixed-many-threads-small:
    - shard-dg2-set2:     [TIMEOUT][174] ([Intel XE#1473]) -> [PASS][175]
   [174]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-2152-9c4962db90300e1d8daca8ed54195bc31d0ee932/shard-dg2-466/igt@xe_evict@evict-mixed-many-threads-small.html
   [175]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-dg2-464/igt@xe_evict@evict-mixed-many-threads-small.html

  * igt@xe_exec_fault_mode@many-execqueues-bindexecqueue-userptr-invalidate-race-imm:
    - shard-lnl:          [FAIL][176] ([Intel XE#3160]) -> [PASS][177]
   [176]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-2152-9c4962db90300e1d8daca8ed54195bc31d0ee932/shard-lnl-7/igt@xe_exec_fault_mode@many-execqueues-bindexecqueue-userptr-invalidate-race-imm.html
   [177]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-lnl-3/igt@xe_exec_fault_mode@many-execqueues-bindexecqueue-userptr-invalidate-race-imm.html

  * igt@xe_exec_fault_mode@many-execqueues-invalid-userptr-fault:
    - shard-lnl:          [FAIL][178] -> [PASS][179] +1 other test pass
   [178]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-2152-9c4962db90300e1d8daca8ed54195bc31d0ee932/shard-lnl-8/igt@xe_exec_fault_mode@many-execqueues-invalid-userptr-fault.html
   [179]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-lnl-4/igt@xe_exec_fault_mode@many-execqueues-invalid-userptr-fault.html

  * igt@xe_exec_fault_mode@many-userptr-invalidate-race-prefetch:
    - shard-bmg:          [FAIL][180] ([Intel XE#3160]) -> [PASS][181]
   [180]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-2152-9c4962db90300e1d8daca8ed54195bc31d0ee932/shard-bmg-8/igt@xe_exec_fault_mode@many-userptr-invalidate-race-prefetch.html
   [181]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-bmg-4/igt@xe_exec_fault_mode@many-userptr-invalidate-race-prefetch.html

  * igt@xe_pm@s2idle-vm-bind-unbind-all:
    - shard-dg2-set2:     [ABORT][182] ([Intel XE#1694] / [Intel XE#1794]) -> [PASS][183]
   [182]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-2152-9c4962db90300e1d8daca8ed54195bc31d0ee932/shard-dg2-432/igt@xe_pm@s2idle-vm-bind-unbind-all.html
   [183]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-dg2-466/igt@xe_pm@s2idle-vm-bind-unbind-all.html

  * igt@xe_pm@s3-exec-after:
    - shard-dg2-set2:     [ABORT][184] ([Intel XE#1358]) -> [PASS][185]
   [184]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-2152-9c4962db90300e1d8daca8ed54195bc31d0ee932/shard-dg2-432/igt@xe_pm@s3-exec-after.html
   [185]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-dg2-463/igt@xe_pm@s3-exec-after.html

  * igt@xe_pm@s4-basic:
    - shard-lnl:          [ABORT][186] ([Intel XE#1358] / [Intel XE#1607]) -> [PASS][187]
   [186]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-2152-9c4962db90300e1d8daca8ed54195bc31d0ee932/shard-lnl-2/igt@xe_pm@s4-basic.html
   [187]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-lnl-6/igt@xe_pm@s4-basic.html

  * igt@xe_pm@s4-exec-after:
    - shard-adlp:         [ABORT][188] ([Intel XE#1358] / [Intel XE#1607]) -> [PASS][189] +1 other test pass
   [188]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-2152-9c4962db90300e1d8daca8ed54195bc31d0ee932/shard-adlp-9/igt@xe_pm@s4-exec-after.html
   [189]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-adlp-4/igt@xe_pm@s4-exec-after.html

  
#### Warnings ####

  * igt@kms_big_fb@x-tiled-max-hw-stride-64bpp-rotate-0-async-flip:
    - shard-adlp:         [FAIL][190] ([Intel XE#1231] / [Intel XE#1242]) -> [DMESG-FAIL][191] ([Intel XE#3194])
   [190]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-2152-9c4962db90300e1d8daca8ed54195bc31d0ee932/shard-adlp-4/igt@kms_big_fb@x-tiled-max-hw-stride-64bpp-rotate-0-async-flip.html
   [191]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-adlp-3/igt@kms_big_fb@x-tiled-max-hw-stride-64bpp-rotate-0-async-flip.html

  * igt@kms_plane@pixel-format:
    - shard-adlp:         [INCOMPLETE][192] ([Intel XE#1035]) -> [INCOMPLETE][193] ([Intel XE#1035] / [Intel XE#1195]) +1 other test incomplete
   [192]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-2152-9c4962db90300e1d8daca8ed54195bc31d0ee932/shard-adlp-3/igt@kms_plane@pixel-format.html
   [193]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-adlp-6/igt@kms_plane@pixel-format.html

  * igt@xe_evict@evict-mixed-many-threads-large:
    - shard-dg2-set2:     [FAIL][194] ([Intel XE#1000]) -> [TIMEOUT][195] ([Intel XE#1473])
   [194]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-2152-9c4962db90300e1d8daca8ed54195bc31d0ee932/shard-dg2-435/igt@xe_evict@evict-mixed-many-threads-large.html
   [195]: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/shard-dg2-436/igt@xe_evict@evict-mixed-many-threads-large.html

  
  [Intel XE#1000]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1000
  [Intel XE#1035]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1035
  [Intel XE#1122]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1122
  [Intel XE#1123]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1123
  [Intel XE#1124]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1124
  [Intel XE#1127]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1127
  [Intel XE#1129]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1129
  [Intel XE#1135]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1135
  [Intel XE#1195]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1195
  [Intel XE#1231]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1231
  [Intel XE#1242]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1242
  [Intel XE#1358]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1358
  [Intel XE#1392]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1392
  [Intel XE#1426]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1426
  [Intel XE#1473]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1473
  [Intel XE#1489]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1489
  [Intel XE#1508]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1508
  [Intel XE#1600]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1600
  [Intel XE#1607]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1607
  [Intel XE#1694]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1694
  [Intel XE#1727]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1727
  [Intel XE#1729]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1729
  [Intel XE#1794]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1794
  [Intel XE#1948]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/1948
  [Intel XE#2042]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2042
  [Intel XE#2049]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2049
  [Intel XE#2191]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2191
  [Intel XE#2205]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2205
  [Intel XE#2234]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2234
  [Intel XE#2249]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2249
  [Intel XE#2252]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2252
  [Intel XE#2293]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2293
  [Intel XE#2311]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2311
  [Intel XE#2313]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2313
  [Intel XE#2322]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2322
  [Intel XE#2333]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2333
  [Intel XE#2380]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2380
  [Intel XE#2392]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2392
  [Intel XE#2426]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2426
  [Intel XE#2472]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2472
  [Intel XE#2514]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2514
  [Intel XE#2541]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2541
  [Intel XE#2566]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2566
  [Intel XE#2577]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2577
  [Intel XE#2597]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2597
  [Intel XE#261]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/261
  [Intel XE#2625]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2625
  [Intel XE#2635]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2635
  [Intel XE#2715]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2715
  [Intel XE#2850]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2850
  [Intel XE#288]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/288
  [Intel XE#2882]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2882
  [Intel XE#2887]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2887
  [Intel XE#2905]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2905
  [Intel XE#2907]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2907
  [Intel XE#2919]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2919
  [Intel XE#2932]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2932
  [Intel XE#2953]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2953
  [Intel XE#301]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/301
  [Intel XE#306]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/306
  [Intel XE#307]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/307
  [Intel XE#308]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/308
  [Intel XE#3086]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3086
  [Intel XE#309]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/309
  [Intel XE#310]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/310
  [Intel XE#3113]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3113
  [Intel XE#3124]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3124
  [Intel XE#3129]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3129
  [Intel XE#316]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/316
  [Intel XE#3160]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3160
  [Intel XE#3184]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3184
  [Intel XE#3194]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3194
  [Intel XE#3226]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3226
  [Intel XE#3320]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/3320
  [Intel XE#362]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/362
  [Intel XE#367]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/367
  [Intel XE#373]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/373
  [Intel XE#455]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/455
  [Intel XE#616]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/616
  [Intel XE#651]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/651
  [Intel XE#653]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/653
  [Intel XE#656]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/656
  [Intel XE#688]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/688
  [Intel XE#787]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/787
  [Intel XE#877]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/877
  [Intel XE#886]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/886
  [Intel XE#899]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/899
  [Intel XE#929]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/929
  [Intel XE#944]: https://gitlab.freedesktop.org/drm/xe/kernel/issues/944
  [i915#3804]: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/3804


Build changes
-------------

  * Linux: xe-2152-9c4962db90300e1d8daca8ed54195bc31d0ee932 -> xe-pw-140200v6

  IGT_8091: 8091
  xe-2152-9c4962db90300e1d8daca8ed54195bc31d0ee932: 9c4962db90300e1d8daca8ed54195bc31d0ee932
  xe-pw-140200v6: 140200v6

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140200v6/index.html

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 2/8] drm/ttm: Add ttm_bo_access
  2024-10-31 18:10 ` [PATCH v6 2/8] drm/ttm: Add ttm_bo_access Matthew Brost
@ 2024-10-31 23:43   ` Matthew Brost
  2024-11-04 17:34     ` Rodrigo Vivi
  2024-11-04 19:47     ` Christian König
  0 siblings, 2 replies; 56+ messages in thread
From: Matthew Brost @ 2024-10-31 23:43 UTC (permalink / raw)
  To: intel-xe, ckoenig.leichtzumerken, dri-devel; +Cc: matthew.auld

On Thu, Oct 31, 2024 at 11:10:42AM -0700, Matthew Brost wrote:
> Non-contiguous VRAM cannot easily be mapped in TTM nor can non-visible
> VRAM easily be accessed. Add ttm_bo_access, which is similar to
> ttm_bo_vm_access, to access such memory.
> 
> v4:
>  - Fix checkpatch warnings (CI)
> v5:
>  - Fix checkpatch warnings (CI)
> v6:
>  - Fix kernel doc (Auld)
> 

Christian - Do you mind if I merge this patch along with the rest of the
series to drm-xe-next?

Matt

> Reported-by: Christoph Manszewski <christoph.manszewski@intel.com>
> Suggested-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> Tested-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Reviewed-by: Matthew Auld <matthew.auld@intel.com>
> ---
>  drivers/gpu/drm/ttm/ttm_bo_util.c | 86 +++++++++++++++++++++++++++++++
>  drivers/gpu/drm/ttm/ttm_bo_vm.c   | 65 +----------------------
>  include/drm/ttm/ttm_bo.h          |  2 +
>  3 files changed, 89 insertions(+), 64 deletions(-)
> 
> diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c
> index d939925efa81..77e760ea7193 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo_util.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
> @@ -919,3 +919,89 @@ s64 ttm_lru_walk_for_evict(struct ttm_lru_walk *walk, struct ttm_device *bdev,
>  
>  	return progress;
>  }
> +
> +static int ttm_bo_access_kmap(struct ttm_buffer_object *bo,
> +			      unsigned long offset,
> +			      void *buf, int len, int write)
> +{
> +	unsigned long page = offset >> PAGE_SHIFT;
> +	unsigned long bytes_left = len;
> +	int ret;
> +
> +	/* Copy a page at a time, that way no extra virtual address
> +	 * mapping is needed
> +	 */
> +	offset -= page << PAGE_SHIFT;
> +	do {
> +		unsigned long bytes = min(bytes_left, PAGE_SIZE - offset);
> +		struct ttm_bo_kmap_obj map;
> +		void *ptr;
> +		bool is_iomem;
> +
> +		ret = ttm_bo_kmap(bo, page, 1, &map);
> +		if (ret)
> +			return ret;
> +
> +		ptr = (void *)ttm_kmap_obj_virtual(&map, &is_iomem) + offset;
> +		WARN_ON_ONCE(is_iomem);
> +		if (write)
> +			memcpy(ptr, buf, bytes);
> +		else
> +			memcpy(buf, ptr, bytes);
> +		ttm_bo_kunmap(&map);
> +
> +		page++;
> +		buf += bytes;
> +		bytes_left -= bytes;
> +		offset = 0;
> +	} while (bytes_left);
> +
> +	return len;
> +}
> +
> +/**
> + * ttm_bo_access - Helper to access a buffer object
> + *
> + * @bo: ttm buffer object
> + * @offset: access offset into buffer object
> + * @buf: pointer to caller memory to read into or write from
> + * @len: length of access
> + * @write: write access
> + *
> + * Utility function to access a buffer object. Useful when buffer object cannot
> + * be easily mapped (non-contiguous, non-visible, etc...).
> + *
> + * Returns:
> + * @len if successful, negative error code on failure.
> + */
> +int ttm_bo_access(struct ttm_buffer_object *bo, unsigned long offset,
> +		  void *buf, int len, int write)
> +{
> +	int ret;
> +
> +	if (len < 1 || (offset + len) > bo->base.size)
> +		return -EIO;
> +
> +	ret = ttm_bo_reserve(bo, true, false, NULL);
> +	if (ret)
> +		return ret;
> +
> +	switch (bo->resource->mem_type) {
> +	case TTM_PL_SYSTEM:
> +		fallthrough;
> +	case TTM_PL_TT:
> +		ret = ttm_bo_access_kmap(bo, offset, buf, len, write);
> +		break;
> +	default:
> +		if (bo->bdev->funcs->access_memory)
> +			ret = bo->bdev->funcs->access_memory
> +				(bo, offset, buf, len, write);
> +		else
> +			ret = -EIO;
> +	}
> +
> +	ttm_bo_unreserve(bo);
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL(ttm_bo_access);
> diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c
> index 2c699ed1963a..20b1e5f78684 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
> @@ -366,45 +366,6 @@ void ttm_bo_vm_close(struct vm_area_struct *vma)
>  }
>  EXPORT_SYMBOL(ttm_bo_vm_close);
>  
> -static int ttm_bo_vm_access_kmap(struct ttm_buffer_object *bo,
> -				 unsigned long offset,
> -				 uint8_t *buf, int len, int write)
> -{
> -	unsigned long page = offset >> PAGE_SHIFT;
> -	unsigned long bytes_left = len;
> -	int ret;
> -
> -	/* Copy a page at a time, that way no extra virtual address
> -	 * mapping is needed
> -	 */
> -	offset -= page << PAGE_SHIFT;
> -	do {
> -		unsigned long bytes = min(bytes_left, PAGE_SIZE - offset);
> -		struct ttm_bo_kmap_obj map;
> -		void *ptr;
> -		bool is_iomem;
> -
> -		ret = ttm_bo_kmap(bo, page, 1, &map);
> -		if (ret)
> -			return ret;
> -
> -		ptr = (uint8_t *)ttm_kmap_obj_virtual(&map, &is_iomem) + offset;
> -		WARN_ON_ONCE(is_iomem);
> -		if (write)
> -			memcpy(ptr, buf, bytes);
> -		else
> -			memcpy(buf, ptr, bytes);
> -		ttm_bo_kunmap(&map);
> -
> -		page++;
> -		buf += bytes;
> -		bytes_left -= bytes;
> -		offset = 0;
> -	} while (bytes_left);
> -
> -	return len;
> -}
> -
>  int ttm_bo_vm_access(struct vm_area_struct *vma, unsigned long addr,
>  		     void *buf, int len, int write)
>  {
> @@ -412,32 +373,8 @@ int ttm_bo_vm_access(struct vm_area_struct *vma, unsigned long addr,
>  	unsigned long offset = (addr) - vma->vm_start +
>  		((vma->vm_pgoff - drm_vma_node_start(&bo->base.vma_node))
>  		 << PAGE_SHIFT);
> -	int ret;
> -
> -	if (len < 1 || (offset + len) > bo->base.size)
> -		return -EIO;
>  
> -	ret = ttm_bo_reserve(bo, true, false, NULL);
> -	if (ret)
> -		return ret;
> -
> -	switch (bo->resource->mem_type) {
> -	case TTM_PL_SYSTEM:
> -		fallthrough;
> -	case TTM_PL_TT:
> -		ret = ttm_bo_vm_access_kmap(bo, offset, buf, len, write);
> -		break;
> -	default:
> -		if (bo->bdev->funcs->access_memory)
> -			ret = bo->bdev->funcs->access_memory(
> -				bo, offset, buf, len, write);
> -		else
> -			ret = -EIO;
> -	}
> -
> -	ttm_bo_unreserve(bo);
> -
> -	return ret;
> +	return ttm_bo_access(bo, offset, buf, len, write);
>  }
>  EXPORT_SYMBOL(ttm_bo_vm_access);
>  
> diff --git a/include/drm/ttm/ttm_bo.h b/include/drm/ttm/ttm_bo.h
> index 5804408815be..8ea11cd8df39 100644
> --- a/include/drm/ttm/ttm_bo.h
> +++ b/include/drm/ttm/ttm_bo.h
> @@ -421,6 +421,8 @@ void ttm_bo_unpin(struct ttm_buffer_object *bo);
>  int ttm_bo_evict_first(struct ttm_device *bdev,
>  		       struct ttm_resource_manager *man,
>  		       struct ttm_operation_ctx *ctx);
> +int ttm_bo_access(struct ttm_buffer_object *bo, unsigned long offset,
> +		  void *buf, int len, int write);
>  vm_fault_t ttm_bo_vm_reserve(struct ttm_buffer_object *bo,
>  			     struct vm_fault *vmf);
>  vm_fault_t ttm_bo_vm_fault_reserved(struct vm_fault *vmf,
> -- 
> 2.34.1
> 
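
For anyone following the thread without a kernel tree at hand, the
page-at-a-time copy in ttm_bo_access_kmap() above can be sketched in user
space. This is only an illustration under stated assumptions, not the TTM
code: struct sketch_bo, sketch_bo_access(), sketch_roundtrip() and the
4 KiB SKETCH_PAGE_SIZE are hypothetical names made up here, the backing
pages are plain arrays instead of ttm_bo_kmap() mappings, and -1 stands in
for -EIO.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SHIFT 12                       /* assumed 4 KiB pages */
#define SKETCH_PAGE_SIZE  (1UL << SKETCH_PAGE_SHIFT)

/* Hypothetical stand-in for a buffer object whose pages are not adjacent. */
struct sketch_bo {
	unsigned char **pages;  /* one pointer per backing page */
	size_t size;            /* total size in bytes */
};

/*
 * Same shape as the loop in the patch: split the access at page
 * boundaries so only one page has to be reachable at a time.
 * Returns len on success, -1 (where the patch returns -EIO) on a bad range.
 */
static int sketch_bo_access(struct sketch_bo *bo, unsigned long offset,
			    void *buf, int len, int write)
{
	unsigned long page = offset >> SKETCH_PAGE_SHIFT;
	unsigned long bytes_left;
	unsigned char *p = buf;

	assert(bo && buf);
	if (len < 1 || offset + (unsigned long)len > bo->size)
		return -1;

	bytes_left = (unsigned long)len;
	offset -= page << SKETCH_PAGE_SHIFT;  /* offset within the first page */
	do {
		unsigned long bytes = SKETCH_PAGE_SIZE - offset;
		unsigned char *ptr;

		if (bytes > bytes_left)
			bytes = bytes_left;

		ptr = bo->pages[page] + offset;
		if (write)
			memcpy(ptr, p, bytes);
		else
			memcpy(p, ptr, bytes);

		page++;
		p += bytes;
		bytes_left -= bytes;
		offset = 0;  /* later pages are copied from their start */
	} while (bytes_left);

	return len;
}

/* Usage: write across a page boundary, read back, compare. 0 on success. */
static int sketch_roundtrip(void)
{
	static unsigned char first[SKETCH_PAGE_SIZE];
	static unsigned char second[SKETCH_PAGE_SIZE];
	unsigned char *pages[2] = { second, first };  /* deliberately out of order */
	struct sketch_bo bo = { pages, 2 * SKETCH_PAGE_SIZE };
	char in[] = "crosses a page boundary";
	char out[sizeof(in)] = { 0 };
	unsigned long off = SKETCH_PAGE_SIZE - 7;
	int n = (int)sizeof(in);

	if (sketch_bo_access(&bo, off, in, n, 1) != n)
		return 1;
	if (sketch_bo_access(&bo, off, out, n, 0) != n)
		return 2;
	if (memcmp(in, out, sizeof(in)) != 0)
		return 3;
	if (sketch_bo_access(&bo, bo.size, out, 1, 0) >= 0)
		return 4;  /* an access past the end must be rejected */
	return 0;
}
```

The two backing pages are listed out of address order on purpose: because
each chunk is copied through its own page pointer, the access never relies
on the pages being contiguous, which is the property the series depends on.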

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 2/8] drm/ttm: Add ttm_bo_access
  2024-10-31 23:43   ` Matthew Brost
@ 2024-11-04 17:34     ` Rodrigo Vivi
  2024-11-04 19:28       ` Christian König
  2024-11-04 19:47     ` Christian König
  1 sibling, 1 reply; 56+ messages in thread
From: Rodrigo Vivi @ 2024-11-04 17:34 UTC (permalink / raw)
  To: Matthew Brost, Christian Koenig, Huang Rui
  Cc: intel-xe, ckoenig.leichtzumerken, dri-devel, matthew.auld

On Thu, Oct 31, 2024 at 04:43:19PM -0700, Matthew Brost wrote:
> On Thu, Oct 31, 2024 at 11:10:42AM -0700, Matthew Brost wrote:
> > Non-contiguous VRAM cannot easily be mapped in TTM nor can non-visible
> > VRAM easily be accessed. Add ttm_bo_access, which is similar to
> > ttm_bo_vm_access, to access such memory.
> > 
> > v4:
> >  - Fix checkpatch warnings (CI)
> > v5:
> >  - Fix checkpatch warnings (CI)
> > v6:
> >  - Fix kernel doc (Auld)
> > 
> 
> Christian - Do you mind if I merge this patch along with the rest of the
> series to drm-xe-next?

Ray, Christian,

could we get an ack on merging this patch to drm-xe-next?

> 
> Matt
> 
> > Reported-by: Christoph Manszewski <christoph.manszewski@intel.com>
> > Suggested-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > Tested-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> > Reviewed-by: Matthew Auld <matthew.auld@intel.com>
> > ---
> >  drivers/gpu/drm/ttm/ttm_bo_util.c | 86 +++++++++++++++++++++++++++++++
> >  drivers/gpu/drm/ttm/ttm_bo_vm.c   | 65 +----------------------
> >  include/drm/ttm/ttm_bo.h          |  2 +
> >  3 files changed, 89 insertions(+), 64 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c
> > index d939925efa81..77e760ea7193 100644
> > --- a/drivers/gpu/drm/ttm/ttm_bo_util.c
> > +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
> > @@ -919,3 +919,89 @@ s64 ttm_lru_walk_for_evict(struct ttm_lru_walk *walk, struct ttm_device *bdev,
> >  
> >  	return progress;
> >  }
> > +
> > +static int ttm_bo_access_kmap(struct ttm_buffer_object *bo,
> > +			      unsigned long offset,
> > +			      void *buf, int len, int write)
> > +{
> > +	unsigned long page = offset >> PAGE_SHIFT;
> > +	unsigned long bytes_left = len;
> > +	int ret;
> > +
> > +	/* Copy a page at a time, that way no extra virtual address
> > +	 * mapping is needed
> > +	 */
> > +	offset -= page << PAGE_SHIFT;
> > +	do {
> > +		unsigned long bytes = min(bytes_left, PAGE_SIZE - offset);
> > +		struct ttm_bo_kmap_obj map;
> > +		void *ptr;
> > +		bool is_iomem;
> > +
> > +		ret = ttm_bo_kmap(bo, page, 1, &map);
> > +		if (ret)
> > +			return ret;
> > +
> > +		ptr = (void *)ttm_kmap_obj_virtual(&map, &is_iomem) + offset;
> > +		WARN_ON_ONCE(is_iomem);
> > +		if (write)
> > +			memcpy(ptr, buf, bytes);
> > +		else
> > +			memcpy(buf, ptr, bytes);
> > +		ttm_bo_kunmap(&map);
> > +
> > +		page++;
> > +		buf += bytes;
> > +		bytes_left -= bytes;
> > +		offset = 0;
> > +	} while (bytes_left);
> > +
> > +	return len;
> > +}
> > +
> > +/**
> > + * ttm_bo_access - Helper to access a buffer object
> > + *
> > + * @bo: ttm buffer object
> > + * @offset: access offset into buffer object
> > + * @buf: pointer to caller memory to read into or write from
> > + * @len: length of access
> > + * @write: write access
> > + *
> > + * Utility function to access a buffer object. Useful when buffer object cannot
> > + * be easily mapped (non-contiguous, non-visible, etc...).
> > + *
> > + * Returns:
> > + * @len if successful, negative error code on failure.
> > + */
> > +int ttm_bo_access(struct ttm_buffer_object *bo, unsigned long offset,
> > +		  void *buf, int len, int write)
> > +{
> > +	int ret;
> > +
> > +	if (len < 1 || (offset + len) > bo->base.size)
> > +		return -EIO;
> > +
> > +	ret = ttm_bo_reserve(bo, true, false, NULL);
> > +	if (ret)
> > +		return ret;
> > +
> > +	switch (bo->resource->mem_type) {
> > +	case TTM_PL_SYSTEM:
> > +		fallthrough;
> > +	case TTM_PL_TT:
> > +		ret = ttm_bo_access_kmap(bo, offset, buf, len, write);
> > +		break;
> > +	default:
> > +		if (bo->bdev->funcs->access_memory)
> > +			ret = bo->bdev->funcs->access_memory
> > +				(bo, offset, buf, len, write);
> > +		else
> > +			ret = -EIO;
> > +	}
> > +
> > +	ttm_bo_unreserve(bo);
> > +
> > +	return ret;
> > +}
> > +EXPORT_SYMBOL(ttm_bo_access);
> > diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c
> > index 2c699ed1963a..20b1e5f78684 100644
> > --- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
> > +++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
> > @@ -366,45 +366,6 @@ void ttm_bo_vm_close(struct vm_area_struct *vma)
> >  }
> >  EXPORT_SYMBOL(ttm_bo_vm_close);
> >  
> > -static int ttm_bo_vm_access_kmap(struct ttm_buffer_object *bo,
> > -				 unsigned long offset,
> > -				 uint8_t *buf, int len, int write)
> > -{
> > -	unsigned long page = offset >> PAGE_SHIFT;
> > -	unsigned long bytes_left = len;
> > -	int ret;
> > -
> > -	/* Copy a page at a time, that way no extra virtual address
> > -	 * mapping is needed
> > -	 */
> > -	offset -= page << PAGE_SHIFT;
> > -	do {
> > -		unsigned long bytes = min(bytes_left, PAGE_SIZE - offset);
> > -		struct ttm_bo_kmap_obj map;
> > -		void *ptr;
> > -		bool is_iomem;
> > -
> > -		ret = ttm_bo_kmap(bo, page, 1, &map);
> > -		if (ret)
> > -			return ret;
> > -
> > -		ptr = (uint8_t *)ttm_kmap_obj_virtual(&map, &is_iomem) + offset;
> > -		WARN_ON_ONCE(is_iomem);
> > -		if (write)
> > -			memcpy(ptr, buf, bytes);
> > -		else
> > -			memcpy(buf, ptr, bytes);
> > -		ttm_bo_kunmap(&map);
> > -
> > -		page++;
> > -		buf += bytes;
> > -		bytes_left -= bytes;
> > -		offset = 0;
> > -	} while (bytes_left);
> > -
> > -	return len;
> > -}
> > -
> >  int ttm_bo_vm_access(struct vm_area_struct *vma, unsigned long addr,
> >  		     void *buf, int len, int write)
> >  {
> > @@ -412,32 +373,8 @@ int ttm_bo_vm_access(struct vm_area_struct *vma, unsigned long addr,
> >  	unsigned long offset = (addr) - vma->vm_start +
> >  		((vma->vm_pgoff - drm_vma_node_start(&bo->base.vma_node))
> >  		 << PAGE_SHIFT);
> > -	int ret;
> > -
> > -	if (len < 1 || (offset + len) > bo->base.size)
> > -		return -EIO;
> >  
> > -	ret = ttm_bo_reserve(bo, true, false, NULL);
> > -	if (ret)
> > -		return ret;
> > -
> > -	switch (bo->resource->mem_type) {
> > -	case TTM_PL_SYSTEM:
> > -		fallthrough;
> > -	case TTM_PL_TT:
> > -		ret = ttm_bo_vm_access_kmap(bo, offset, buf, len, write);
> > -		break;
> > -	default:
> > -		if (bo->bdev->funcs->access_memory)
> > -			ret = bo->bdev->funcs->access_memory(
> > -				bo, offset, buf, len, write);
> > -		else
> > -			ret = -EIO;
> > -	}
> > -
> > -	ttm_bo_unreserve(bo);
> > -
> > -	return ret;
> > +	return ttm_bo_access(bo, offset, buf, len, write);
> >  }
> >  EXPORT_SYMBOL(ttm_bo_vm_access);
> >  
> > diff --git a/include/drm/ttm/ttm_bo.h b/include/drm/ttm/ttm_bo.h
> > index 5804408815be..8ea11cd8df39 100644
> > --- a/include/drm/ttm/ttm_bo.h
> > +++ b/include/drm/ttm/ttm_bo.h
> > @@ -421,6 +421,8 @@ void ttm_bo_unpin(struct ttm_buffer_object *bo);
> >  int ttm_bo_evict_first(struct ttm_device *bdev,
> >  		       struct ttm_resource_manager *man,
> >  		       struct ttm_operation_ctx *ctx);
> > +int ttm_bo_access(struct ttm_buffer_object *bo, unsigned long offset,
> > +		  void *buf, int len, int write);
> >  vm_fault_t ttm_bo_vm_reserve(struct ttm_buffer_object *bo,
> >  			     struct vm_fault *vmf);
> >  vm_fault_t ttm_bo_vm_fault_reserved(struct vm_fault *vmf,
> > -- 
> > 2.34.1
> > 


* Re: [PATCH v6 2/8] drm/ttm: Add ttm_bo_access
  2024-11-04 17:34     ` Rodrigo Vivi
@ 2024-11-04 19:28       ` Christian König
  2024-11-04 21:49         ` Matthew Brost
  0 siblings, 1 reply; 56+ messages in thread
From: Christian König @ 2024-11-04 19:28 UTC (permalink / raw)
  To: Rodrigo Vivi, Matthew Brost, Christian Koenig, Huang Rui
  Cc: intel-xe, dri-devel, matthew.auld

Am 04.11.24 um 18:34 schrieb Rodrigo Vivi:
> On Thu, Oct 31, 2024 at 04:43:19PM -0700, Matthew Brost wrote:
>> On Thu, Oct 31, 2024 at 11:10:42AM -0700, Matthew Brost wrote:
>>> Non-contiguous VRAM cannot easily be mapped in TTM nor can non-visible
>>> VRAM easily be accessed. Add ttm_bo_access, which is similar to
>>> ttm_bo_vm_access, to access such memory.
>>>
>>> v4:
>>>   - Fix checkpatch warnings (CI)
>>> v5:
>>>   - Fix checkpatch warnings (CI)
>>> v6:
>>>   - Fix kernel doc (Auld)
>>>
>> Christian - Do you mind if I merge this patch along with the rest of
>> the series to drm-xe-next?
> Ray, Christian,
>
> ack on getting this patch to drm-xe-next?

No, we actually spent quite some time removing the single page mapping
functionality for BOs.

You need a really good justification to bring that back.

Regards,
Christian.

>
>> Matt
>>
>>> Reported-by: Christoph Manszewski <christoph.manszewski@intel.com>
>>> Suggested-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>>> Tested-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
>>> Reviewed-by: Matthew Auld <matthew.auld@intel.com>
>>> ---
>>>   drivers/gpu/drm/ttm/ttm_bo_util.c | 86 +++++++++++++++++++++++++++++++
>>>   drivers/gpu/drm/ttm/ttm_bo_vm.c   | 65 +----------------------
>>>   include/drm/ttm/ttm_bo.h          |  2 +
>>>   3 files changed, 89 insertions(+), 64 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c
>>> index d939925efa81..77e760ea7193 100644
>>> --- a/drivers/gpu/drm/ttm/ttm_bo_util.c
>>> +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
>>> @@ -919,3 +919,89 @@ s64 ttm_lru_walk_for_evict(struct ttm_lru_walk *walk, struct ttm_device *bdev,
>>>   
>>>   	return progress;
>>>   }
>>> +
>>> +static int ttm_bo_access_kmap(struct ttm_buffer_object *bo,
>>> +			      unsigned long offset,
>>> +			      void *buf, int len, int write)
>>> +{
>>> +	unsigned long page = offset >> PAGE_SHIFT;
>>> +	unsigned long bytes_left = len;
>>> +	int ret;
>>> +
>>> +	/* Copy a page at a time, that way no extra virtual address
>>> +	 * mapping is needed
>>> +	 */
>>> +	offset -= page << PAGE_SHIFT;
>>> +	do {
>>> +		unsigned long bytes = min(bytes_left, PAGE_SIZE - offset);
>>> +		struct ttm_bo_kmap_obj map;
>>> +		void *ptr;
>>> +		bool is_iomem;
>>> +
>>> +		ret = ttm_bo_kmap(bo, page, 1, &map);
>>> +		if (ret)
>>> +			return ret;
>>> +
>>> +		ptr = (void *)ttm_kmap_obj_virtual(&map, &is_iomem) + offset;
>>> +		WARN_ON_ONCE(is_iomem);
>>> +		if (write)
>>> +			memcpy(ptr, buf, bytes);
>>> +		else
>>> +			memcpy(buf, ptr, bytes);
>>> +		ttm_bo_kunmap(&map);
>>> +
>>> +		page++;
>>> +		buf += bytes;
>>> +		bytes_left -= bytes;
>>> +		offset = 0;
>>> +	} while (bytes_left);
>>> +
>>> +	return len;
>>> +}
>>> +
>>> +/**
>>> + * ttm_bo_access - Helper to access a buffer object
>>> + *
>>> + * @bo: ttm buffer object
>>> + * @offset: access offset into buffer object
>>> + * @buf: pointer to caller memory to read into or write from
>>> + * @len: length of access
>>> + * @write: write access
>>> + *
>>> + * Utility function to access a buffer object. Useful when buffer object cannot
>>> + * be easily mapped (non-contiguous, non-visible, etc...).
>>> + *
>>> + * Returns:
>>> + * @len if successful, negative error code on failure.
>>> + */
>>> +int ttm_bo_access(struct ttm_buffer_object *bo, unsigned long offset,
>>> +		  void *buf, int len, int write)
>>> +{
>>> +	int ret;
>>> +
>>> +	if (len < 1 || (offset + len) > bo->base.size)
>>> +		return -EIO;
>>> +
>>> +	ret = ttm_bo_reserve(bo, true, false, NULL);
>>> +	if (ret)
>>> +		return ret;
>>> +
>>> +	switch (bo->resource->mem_type) {
>>> +	case TTM_PL_SYSTEM:
>>> +		fallthrough;
>>> +	case TTM_PL_TT:
>>> +		ret = ttm_bo_access_kmap(bo, offset, buf, len, write);
>>> +		break;
>>> +	default:
>>> +		if (bo->bdev->funcs->access_memory)
>>> +			ret = bo->bdev->funcs->access_memory
>>> +				(bo, offset, buf, len, write);
>>> +		else
>>> +			ret = -EIO;
>>> +	}
>>> +
>>> +	ttm_bo_unreserve(bo);
>>> +
>>> +	return ret;
>>> +}
>>> +EXPORT_SYMBOL(ttm_bo_access);
>>> diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c
>>> index 2c699ed1963a..20b1e5f78684 100644
>>> --- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
>>> +++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
>>> @@ -366,45 +366,6 @@ void ttm_bo_vm_close(struct vm_area_struct *vma)
>>>   }
>>>   EXPORT_SYMBOL(ttm_bo_vm_close);
>>>   
>>> -static int ttm_bo_vm_access_kmap(struct ttm_buffer_object *bo,
>>> -				 unsigned long offset,
>>> -				 uint8_t *buf, int len, int write)
>>> -{
>>> -	unsigned long page = offset >> PAGE_SHIFT;
>>> -	unsigned long bytes_left = len;
>>> -	int ret;
>>> -
>>> -	/* Copy a page at a time, that way no extra virtual address
>>> -	 * mapping is needed
>>> -	 */
>>> -	offset -= page << PAGE_SHIFT;
>>> -	do {
>>> -		unsigned long bytes = min(bytes_left, PAGE_SIZE - offset);
>>> -		struct ttm_bo_kmap_obj map;
>>> -		void *ptr;
>>> -		bool is_iomem;
>>> -
>>> -		ret = ttm_bo_kmap(bo, page, 1, &map);
>>> -		if (ret)
>>> -			return ret;
>>> -
>>> -		ptr = (uint8_t *)ttm_kmap_obj_virtual(&map, &is_iomem) + offset;
>>> -		WARN_ON_ONCE(is_iomem);
>>> -		if (write)
>>> -			memcpy(ptr, buf, bytes);
>>> -		else
>>> -			memcpy(buf, ptr, bytes);
>>> -		ttm_bo_kunmap(&map);
>>> -
>>> -		page++;
>>> -		buf += bytes;
>>> -		bytes_left -= bytes;
>>> -		offset = 0;
>>> -	} while (bytes_left);
>>> -
>>> -	return len;
>>> -}
>>> -
>>>   int ttm_bo_vm_access(struct vm_area_struct *vma, unsigned long addr,
>>>   		     void *buf, int len, int write)
>>>   {
>>> @@ -412,32 +373,8 @@ int ttm_bo_vm_access(struct vm_area_struct *vma, unsigned long addr,
>>>   	unsigned long offset = (addr) - vma->vm_start +
>>>   		((vma->vm_pgoff - drm_vma_node_start(&bo->base.vma_node))
>>>   		 << PAGE_SHIFT);
>>> -	int ret;
>>> -
>>> -	if (len < 1 || (offset + len) > bo->base.size)
>>> -		return -EIO;
>>>   
>>> -	ret = ttm_bo_reserve(bo, true, false, NULL);
>>> -	if (ret)
>>> -		return ret;
>>> -
>>> -	switch (bo->resource->mem_type) {
>>> -	case TTM_PL_SYSTEM:
>>> -		fallthrough;
>>> -	case TTM_PL_TT:
>>> -		ret = ttm_bo_vm_access_kmap(bo, offset, buf, len, write);
>>> -		break;
>>> -	default:
>>> -		if (bo->bdev->funcs->access_memory)
>>> -			ret = bo->bdev->funcs->access_memory(
>>> -				bo, offset, buf, len, write);
>>> -		else
>>> -			ret = -EIO;
>>> -	}
>>> -
>>> -	ttm_bo_unreserve(bo);
>>> -
>>> -	return ret;
>>> +	return ttm_bo_access(bo, offset, buf, len, write);
>>>   }
>>>   EXPORT_SYMBOL(ttm_bo_vm_access);
>>>   
>>> diff --git a/include/drm/ttm/ttm_bo.h b/include/drm/ttm/ttm_bo.h
>>> index 5804408815be..8ea11cd8df39 100644
>>> --- a/include/drm/ttm/ttm_bo.h
>>> +++ b/include/drm/ttm/ttm_bo.h
>>> @@ -421,6 +421,8 @@ void ttm_bo_unpin(struct ttm_buffer_object *bo);
>>>   int ttm_bo_evict_first(struct ttm_device *bdev,
>>>   		       struct ttm_resource_manager *man,
>>>   		       struct ttm_operation_ctx *ctx);
>>> +int ttm_bo_access(struct ttm_buffer_object *bo, unsigned long offset,
>>> +		  void *buf, int len, int write);
>>>   vm_fault_t ttm_bo_vm_reserve(struct ttm_buffer_object *bo,
>>>   			     struct vm_fault *vmf);
>>>   vm_fault_t ttm_bo_vm_fault_reserved(struct vm_fault *vmf,
>>> -- 
>>> 2.34.1
>>>



* Re: [PATCH v6 2/8] drm/ttm: Add ttm_bo_access
  2024-10-31 23:43   ` Matthew Brost
  2024-11-04 17:34     ` Rodrigo Vivi
@ 2024-11-04 19:47     ` Christian König
  2024-11-04 21:30       ` Matthew Brost
  1 sibling, 1 reply; 56+ messages in thread
From: Christian König @ 2024-11-04 19:47 UTC (permalink / raw)
  To: Matthew Brost, intel-xe, dri-devel; +Cc: matthew.auld

Am 01.11.24 um 00:43 schrieb Matthew Brost:
> On Thu, Oct 31, 2024 at 11:10:42AM -0700, Matthew Brost wrote:
>> Non-contiguous VRAM cannot easily be mapped in TTM nor can non-visible
>> VRAM easily be accessed. Add ttm_bo_access, which is similar to
>> ttm_bo_vm_access, to access such memory.
>>
>> v4:
>>   - Fix checkpatch warnings (CI)
>> v5:
>>   - Fix checkpatch warnings (CI)
>> v6:
>>   - Fix kernel doc (Auld)
>>
> Christian - Do you mind if I merge this patch along with the rest of
> the series to drm-xe-next?

I don't see the original patch anywhere in my inbox, please make sure to 
CC me while sending things out.

Apart from that I absolutely don't see any justification for this patch.
You move stuff into ttm_bo_util.c which doesn't even remotely belong there.

Regards,
Christian.

>
> Matt
>
>> Reported-by: Christoph Manszewski <christoph.manszewski@intel.com>
>> Suggested-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>> Tested-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
>> Reviewed-by: Matthew Auld <matthew.auld@intel.com>
>> ---
>>   drivers/gpu/drm/ttm/ttm_bo_util.c | 86 +++++++++++++++++++++++++++++++
>>   drivers/gpu/drm/ttm/ttm_bo_vm.c   | 65 +----------------------
>>   include/drm/ttm/ttm_bo.h          |  2 +
>>   3 files changed, 89 insertions(+), 64 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c
>> index d939925efa81..77e760ea7193 100644
>> --- a/drivers/gpu/drm/ttm/ttm_bo_util.c
>> +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
>> @@ -919,3 +919,89 @@ s64 ttm_lru_walk_for_evict(struct ttm_lru_walk *walk, struct ttm_device *bdev,
>>   
>>   	return progress;
>>   }
>> +
>> +static int ttm_bo_access_kmap(struct ttm_buffer_object *bo,
>> +			      unsigned long offset,
>> +			      void *buf, int len, int write)
>> +{
>> +	unsigned long page = offset >> PAGE_SHIFT;
>> +	unsigned long bytes_left = len;
>> +	int ret;
>> +
>> +	/* Copy a page at a time, that way no extra virtual address
>> +	 * mapping is needed
>> +	 */
>> +	offset -= page << PAGE_SHIFT;
>> +	do {
>> +		unsigned long bytes = min(bytes_left, PAGE_SIZE - offset);
>> +		struct ttm_bo_kmap_obj map;
>> +		void *ptr;
>> +		bool is_iomem;
>> +
>> +		ret = ttm_bo_kmap(bo, page, 1, &map);
>> +		if (ret)
>> +			return ret;
>> +
>> +		ptr = (void *)ttm_kmap_obj_virtual(&map, &is_iomem) + offset;
>> +		WARN_ON_ONCE(is_iomem);
>> +		if (write)
>> +			memcpy(ptr, buf, bytes);
>> +		else
>> +			memcpy(buf, ptr, bytes);
>> +		ttm_bo_kunmap(&map);
>> +
>> +		page++;
>> +		buf += bytes;
>> +		bytes_left -= bytes;
>> +		offset = 0;
>> +	} while (bytes_left);
>> +
>> +	return len;
>> +}
>> +
>> +/**
>> + * ttm_bo_access - Helper to access a buffer object
>> + *
>> + * @bo: ttm buffer object
>> + * @offset: access offset into buffer object
>> + * @buf: pointer to caller memory to read into or write from
>> + * @len: length of access
>> + * @write: write access
>> + *
>> + * Utility function to access a buffer object. Useful when buffer object cannot
>> + * be easily mapped (non-contiguous, non-visible, etc...).
>> + *
>> + * Returns:
>> + * @len if successful, negative error code on failure.
>> + */
>> +int ttm_bo_access(struct ttm_buffer_object *bo, unsigned long offset,
>> +		  void *buf, int len, int write)
>> +{
>> +	int ret;
>> +
>> +	if (len < 1 || (offset + len) > bo->base.size)
>> +		return -EIO;
>> +
>> +	ret = ttm_bo_reserve(bo, true, false, NULL);
>> +	if (ret)
>> +		return ret;
>> +
>> +	switch (bo->resource->mem_type) {
>> +	case TTM_PL_SYSTEM:
>> +		fallthrough;
>> +	case TTM_PL_TT:
>> +		ret = ttm_bo_access_kmap(bo, offset, buf, len, write);
>> +		break;
>> +	default:
>> +		if (bo->bdev->funcs->access_memory)
>> +			ret = bo->bdev->funcs->access_memory
>> +				(bo, offset, buf, len, write);
>> +		else
>> +			ret = -EIO;
>> +	}
>> +
>> +	ttm_bo_unreserve(bo);
>> +
>> +	return ret;
>> +}
>> +EXPORT_SYMBOL(ttm_bo_access);
>> diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c
>> index 2c699ed1963a..20b1e5f78684 100644
>> --- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
>> +++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
>> @@ -366,45 +366,6 @@ void ttm_bo_vm_close(struct vm_area_struct *vma)
>>   }
>>   EXPORT_SYMBOL(ttm_bo_vm_close);
>>   
>> -static int ttm_bo_vm_access_kmap(struct ttm_buffer_object *bo,
>> -				 unsigned long offset,
>> -				 uint8_t *buf, int len, int write)
>> -{
>> -	unsigned long page = offset >> PAGE_SHIFT;
>> -	unsigned long bytes_left = len;
>> -	int ret;
>> -
>> -	/* Copy a page at a time, that way no extra virtual address
>> -	 * mapping is needed
>> -	 */
>> -	offset -= page << PAGE_SHIFT;
>> -	do {
>> -		unsigned long bytes = min(bytes_left, PAGE_SIZE - offset);
>> -		struct ttm_bo_kmap_obj map;
>> -		void *ptr;
>> -		bool is_iomem;
>> -
>> -		ret = ttm_bo_kmap(bo, page, 1, &map);
>> -		if (ret)
>> -			return ret;
>> -
>> -		ptr = (uint8_t *)ttm_kmap_obj_virtual(&map, &is_iomem) + offset;
>> -		WARN_ON_ONCE(is_iomem);
>> -		if (write)
>> -			memcpy(ptr, buf, bytes);
>> -		else
>> -			memcpy(buf, ptr, bytes);
>> -		ttm_bo_kunmap(&map);
>> -
>> -		page++;
>> -		buf += bytes;
>> -		bytes_left -= bytes;
>> -		offset = 0;
>> -	} while (bytes_left);
>> -
>> -	return len;
>> -}
>> -
>>   int ttm_bo_vm_access(struct vm_area_struct *vma, unsigned long addr,
>>   		     void *buf, int len, int write)
>>   {
>> @@ -412,32 +373,8 @@ int ttm_bo_vm_access(struct vm_area_struct *vma, unsigned long addr,
>>   	unsigned long offset = (addr) - vma->vm_start +
>>   		((vma->vm_pgoff - drm_vma_node_start(&bo->base.vma_node))
>>   		 << PAGE_SHIFT);
>> -	int ret;
>> -
>> -	if (len < 1 || (offset + len) > bo->base.size)
>> -		return -EIO;
>>   
>> -	ret = ttm_bo_reserve(bo, true, false, NULL);
>> -	if (ret)
>> -		return ret;
>> -
>> -	switch (bo->resource->mem_type) {
>> -	case TTM_PL_SYSTEM:
>> -		fallthrough;
>> -	case TTM_PL_TT:
>> -		ret = ttm_bo_vm_access_kmap(bo, offset, buf, len, write);
>> -		break;
>> -	default:
>> -		if (bo->bdev->funcs->access_memory)
>> -			ret = bo->bdev->funcs->access_memory(
>> -				bo, offset, buf, len, write);
>> -		else
>> -			ret = -EIO;
>> -	}
>> -
>> -	ttm_bo_unreserve(bo);
>> -
>> -	return ret;
>> +	return ttm_bo_access(bo, offset, buf, len, write);
>>   }
>>   EXPORT_SYMBOL(ttm_bo_vm_access);
>>   
>> diff --git a/include/drm/ttm/ttm_bo.h b/include/drm/ttm/ttm_bo.h
>> index 5804408815be..8ea11cd8df39 100644
>> --- a/include/drm/ttm/ttm_bo.h
>> +++ b/include/drm/ttm/ttm_bo.h
>> @@ -421,6 +421,8 @@ void ttm_bo_unpin(struct ttm_buffer_object *bo);
>>   int ttm_bo_evict_first(struct ttm_device *bdev,
>>   		       struct ttm_resource_manager *man,
>>   		       struct ttm_operation_ctx *ctx);
>> +int ttm_bo_access(struct ttm_buffer_object *bo, unsigned long offset,
>> +		  void *buf, int len, int write);
>>   vm_fault_t ttm_bo_vm_reserve(struct ttm_buffer_object *bo,
>>   			     struct vm_fault *vmf);
>>   vm_fault_t ttm_bo_vm_fault_reserved(struct vm_fault *vmf,
>> -- 
>> 2.34.1
>>



* Re: [PATCH v6 2/8] drm/ttm: Add ttm_bo_access
  2024-11-04 19:47     ` Christian König
@ 2024-11-04 21:30       ` Matthew Brost
  2024-11-04 22:26         ` Rodrigo Vivi
  0 siblings, 1 reply; 56+ messages in thread
From: Matthew Brost @ 2024-11-04 21:30 UTC (permalink / raw)
  To: Christian König; +Cc: intel-xe, dri-devel, matthew.auld

On Mon, Nov 04, 2024 at 08:47:01PM +0100, Christian König wrote:
> Am 01.11.24 um 00:43 schrieb Matthew Brost:
> > On Thu, Oct 31, 2024 at 11:10:42AM -0700, Matthew Brost wrote:
> > > Non-contiguous VRAM cannot easily be mapped in TTM nor can non-visible
> > > VRAM easily be accessed. Add ttm_bo_access, which is similar to
> > > ttm_bo_vm_access, to access such memory.
> > > 
> > > v4:
> > >   - Fix checkpatch warnings (CI)
> > > v5:
> > >   - Fix checkpatch warnings (CI)
> > > v6:
> > >   - Fix kernel doc (Auld)
> > > 
> > Christian - Do you mind if I merge this patch along with the rest of
> > the series to drm-xe-next?
> 
> I don't see the original patch anywhere in my inbox, please make sure to CC
> me while sending things out.
> 

I think I had you on an earlier revision but used the wrong alias to send out
this latest one. I will be sure to include you on future patches.

Would you like to continue the discussion here, or should I send out a fresh
revision with you included and an updated commit message?

> Apart from that I absolutely don't see any justification for this patch. You
> move stuff into ttm_bo_util.c which doesn't even remotely belong there.
> 

The justification is that EuDebugger requires essentially the same functionality
as ptrace -> vm_access. This patch simply adds a helper to achieve this. There
is no functional change to the existing code.

Regarding the statement about ttm_bo_util.c, that seems quite aggressive. It is
a TTM BO helper function, so it could logically belong in either ttm_bo.c or
ttm_bo_util.c. A BO helper definitely shouldn't call into ttm_bo_vm.c, nor
should it reside there. Perhaps I chose the wrong ttm_bo* file? I apologize for
that. It would be helpful to know why you think this is the wrong place so I can
better understand your expectations for TTM.

Matt
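
The helper/wrapper split described above can be sketched in user-space C. This is only an illustration under stated assumptions: every demo_* name is hypothetical and none of them is a TTM API, the memory types are reduced to two, and locking (the reserve/unreserve pair in the patch) is left out. It shows a generic access helper that validates the range and dispatches on memory type, plus a VMA-style wrapper that only translates an address into an offset.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical memory placements; the real code switches on TTM_PL_* values. */
enum demo_mem_type { DEMO_SYSTEM, DEMO_VRAM };

/* Hypothetical stand-in for a buffer object. */
struct demo_bo {
	enum demo_mem_type mem_type;
	size_t size;
	unsigned char *sysmem;	/* backing store when in system memory */
	/* Optional driver hook for memory the core cannot map itself. */
	int (*access_memory)(struct demo_bo *bo, size_t offset,
			     void *buf, int len, int write);
};

/* Generic helper: validate the range, then dispatch on placement. */
static int demo_bo_access(struct demo_bo *bo, size_t offset,
			  void *buf, int len, int write)
{
	if (len < 1 || offset > bo->size || (size_t)len > bo->size - offset)
		return -EIO;

	switch (bo->mem_type) {
	case DEMO_SYSTEM:
		if (write)
			memcpy(bo->sysmem + offset, buf, (size_t)len);
		else
			memcpy(buf, bo->sysmem + offset, (size_t)len);
		return len;
	default:
		if (bo->access_memory)
			return bo->access_memory(bo, offset, buf, len, write);
		return -EIO;
	}
}

/* Hypothetical mapping descriptor: where the object starts in an address space. */
struct demo_vma {
	size_t start;
	struct demo_bo *bo;
};

/* VMA-style wrapper: translate the address, then reuse the generic helper. */
static int demo_vm_access(struct demo_vma *vma, size_t addr,
			  void *buf, int len, int write)
{
	return demo_bo_access(vma->bo, addr - vma->start, buf, len, write);
}
```

With this shape, a caller that has no VMA (a debugger interface, for instance) can call the generic helper directly, while the mmap path keeps its existing entry point.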

> Regards,
> Christian.
> 
> > 
> > Matt
> > 
> > > Reported-by: Christoph Manszewski <christoph.manszewski@intel.com>
> > > Suggested-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > > Tested-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> > > Reviewed-by: Matthew Auld <matthew.auld@intel.com>
> > > ---
> > >   drivers/gpu/drm/ttm/ttm_bo_util.c | 86 +++++++++++++++++++++++++++++++
> > >   drivers/gpu/drm/ttm/ttm_bo_vm.c   | 65 +----------------------
> > >   include/drm/ttm/ttm_bo.h          |  2 +
> > >   3 files changed, 89 insertions(+), 64 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c
> > > index d939925efa81..77e760ea7193 100644
> > > --- a/drivers/gpu/drm/ttm/ttm_bo_util.c
> > > +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
> > > @@ -919,3 +919,89 @@ s64 ttm_lru_walk_for_evict(struct ttm_lru_walk *walk, struct ttm_device *bdev,
> > >   	return progress;
> > >   }
> > > +
> > > +static int ttm_bo_access_kmap(struct ttm_buffer_object *bo,
> > > +			      unsigned long offset,
> > > +			      void *buf, int len, int write)
> > > +{
> > > +	unsigned long page = offset >> PAGE_SHIFT;
> > > +	unsigned long bytes_left = len;
> > > +	int ret;
> > > +
> > > +	/* Copy a page at a time, that way no extra virtual address
> > > +	 * mapping is needed
> > > +	 */
> > > +	offset -= page << PAGE_SHIFT;
> > > +	do {
> > > +		unsigned long bytes = min(bytes_left, PAGE_SIZE - offset);
> > > +		struct ttm_bo_kmap_obj map;
> > > +		void *ptr;
> > > +		bool is_iomem;
> > > +
> > > +		ret = ttm_bo_kmap(bo, page, 1, &map);
> > > +		if (ret)
> > > +			return ret;
> > > +
> > > +		ptr = (void *)ttm_kmap_obj_virtual(&map, &is_iomem) + offset;
> > > +		WARN_ON_ONCE(is_iomem);
> > > +		if (write)
> > > +			memcpy(ptr, buf, bytes);
> > > +		else
> > > +			memcpy(buf, ptr, bytes);
> > > +		ttm_bo_kunmap(&map);
> > > +
> > > +		page++;
> > > +		buf += bytes;
> > > +		bytes_left -= bytes;
> > > +		offset = 0;
> > > +	} while (bytes_left);
> > > +
> > > +	return len;
> > > +}
> > > +
> > > +/**
> > > + * ttm_bo_access - Helper to access a buffer object
> > > + *
> > > + * @bo: ttm buffer object
> > > + * @offset: access offset into buffer object
> > > + * @buf: pointer to caller memory to read into or write from
> > > + * @len: length of access
> > > + * @write: write access
> > > + *
> > > + * Utility function to access a buffer object. Useful when buffer object cannot
> > > + * be easily mapped (non-contiguous, non-visible, etc...).
> > > + *
> > > + * Returns:
> > > + * @len if successful, negative error code on failure.
> > > + */
> > > +int ttm_bo_access(struct ttm_buffer_object *bo, unsigned long offset,
> > > +		  void *buf, int len, int write)
> > > +{
> > > +	int ret;
> > > +
> > > +	if (len < 1 || (offset + len) > bo->base.size)
> > > +		return -EIO;
> > > +
> > > +	ret = ttm_bo_reserve(bo, true, false, NULL);
> > > +	if (ret)
> > > +		return ret;
> > > +
> > > +	switch (bo->resource->mem_type) {
> > > +	case TTM_PL_SYSTEM:
> > > +		fallthrough;
> > > +	case TTM_PL_TT:
> > > +		ret = ttm_bo_access_kmap(bo, offset, buf, len, write);
> > > +		break;
> > > +	default:
> > > +		if (bo->bdev->funcs->access_memory)
> > > +			ret = bo->bdev->funcs->access_memory
> > > +				(bo, offset, buf, len, write);
> > > +		else
> > > +			ret = -EIO;
> > > +	}
> > > +
> > > +	ttm_bo_unreserve(bo);
> > > +
> > > +	return ret;
> > > +}
> > > +EXPORT_SYMBOL(ttm_bo_access);
> > > diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c
> > > index 2c699ed1963a..20b1e5f78684 100644
> > > --- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
> > > +++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
> > > @@ -366,45 +366,6 @@ void ttm_bo_vm_close(struct vm_area_struct *vma)
> > >   }
> > >   EXPORT_SYMBOL(ttm_bo_vm_close);
> > > -static int ttm_bo_vm_access_kmap(struct ttm_buffer_object *bo,
> > > -				 unsigned long offset,
> > > -				 uint8_t *buf, int len, int write)
> > > -{
> > > -	unsigned long page = offset >> PAGE_SHIFT;
> > > -	unsigned long bytes_left = len;
> > > -	int ret;
> > > -
> > > -	/* Copy a page at a time, that way no extra virtual address
> > > -	 * mapping is needed
> > > -	 */
> > > -	offset -= page << PAGE_SHIFT;
> > > -	do {
> > > -		unsigned long bytes = min(bytes_left, PAGE_SIZE - offset);
> > > -		struct ttm_bo_kmap_obj map;
> > > -		void *ptr;
> > > -		bool is_iomem;
> > > -
> > > -		ret = ttm_bo_kmap(bo, page, 1, &map);
> > > -		if (ret)
> > > -			return ret;
> > > -
> > > -		ptr = (uint8_t *)ttm_kmap_obj_virtual(&map, &is_iomem) + offset;
> > > -		WARN_ON_ONCE(is_iomem);
> > > -		if (write)
> > > -			memcpy(ptr, buf, bytes);
> > > -		else
> > > -			memcpy(buf, ptr, bytes);
> > > -		ttm_bo_kunmap(&map);
> > > -
> > > -		page++;
> > > -		buf += bytes;
> > > -		bytes_left -= bytes;
> > > -		offset = 0;
> > > -	} while (bytes_left);
> > > -
> > > -	return len;
> > > -}
> > > -
> > >   int ttm_bo_vm_access(struct vm_area_struct *vma, unsigned long addr,
> > >   		     void *buf, int len, int write)
> > >   {
> > > @@ -412,32 +373,8 @@ int ttm_bo_vm_access(struct vm_area_struct *vma, unsigned long addr,
> > >   	unsigned long offset = (addr) - vma->vm_start +
> > >   		((vma->vm_pgoff - drm_vma_node_start(&bo->base.vma_node))
> > >   		 << PAGE_SHIFT);
> > > -	int ret;
> > > -
> > > -	if (len < 1 || (offset + len) > bo->base.size)
> > > -		return -EIO;
> > > -	ret = ttm_bo_reserve(bo, true, false, NULL);
> > > -	if (ret)
> > > -		return ret;
> > > -
> > > -	switch (bo->resource->mem_type) {
> > > -	case TTM_PL_SYSTEM:
> > > -		fallthrough;
> > > -	case TTM_PL_TT:
> > > -		ret = ttm_bo_vm_access_kmap(bo, offset, buf, len, write);
> > > -		break;
> > > -	default:
> > > -		if (bo->bdev->funcs->access_memory)
> > > -			ret = bo->bdev->funcs->access_memory(
> > > -				bo, offset, buf, len, write);
> > > -		else
> > > -			ret = -EIO;
> > > -	}
> > > -
> > > -	ttm_bo_unreserve(bo);
> > > -
> > > -	return ret;
> > > +	return ttm_bo_access(bo, offset, buf, len, write);
> > >   }
> > >   EXPORT_SYMBOL(ttm_bo_vm_access);
> > > diff --git a/include/drm/ttm/ttm_bo.h b/include/drm/ttm/ttm_bo.h
> > > index 5804408815be..8ea11cd8df39 100644
> > > --- a/include/drm/ttm/ttm_bo.h
> > > +++ b/include/drm/ttm/ttm_bo.h
> > > @@ -421,6 +421,8 @@ void ttm_bo_unpin(struct ttm_buffer_object *bo);
> > >   int ttm_bo_evict_first(struct ttm_device *bdev,
> > >   		       struct ttm_resource_manager *man,
> > >   		       struct ttm_operation_ctx *ctx);
> > > +int ttm_bo_access(struct ttm_buffer_object *bo, unsigned long offset,
> > > +		  void *buf, int len, int write);
> > >   vm_fault_t ttm_bo_vm_reserve(struct ttm_buffer_object *bo,
> > >   			     struct vm_fault *vmf);
> > >   vm_fault_t ttm_bo_vm_fault_reserved(struct vm_fault *vmf,
> > > -- 
> > > 2.34.1
> > > 
> 


* Re: [PATCH v6 2/8] drm/ttm: Add ttm_bo_access
  2024-11-04 19:28       ` Christian König
@ 2024-11-04 21:49         ` Matthew Brost
  2024-11-05  7:41           ` Christian König
  0 siblings, 1 reply; 56+ messages in thread
From: Matthew Brost @ 2024-11-04 21:49 UTC (permalink / raw)
  To: Christian König
  Cc: Rodrigo Vivi, Christian Koenig, Huang Rui, intel-xe, dri-devel,
	matthew.auld

On Mon, Nov 04, 2024 at 08:28:34PM +0100, Christian König wrote:
> Am 04.11.24 um 18:34 schrieb Rodrigo Vivi:
> > On Thu, Oct 31, 2024 at 04:43:19PM -0700, Matthew Brost wrote:
> > > On Thu, Oct 31, 2024 at 11:10:42AM -0700, Matthew Brost wrote:
> > > > Non-contiguous VRAM cannot easily be mapped in TTM nor can non-visible
> > > > VRAM easily be accessed. Add ttm_bo_access, which is similar to
> > > > ttm_bo_vm_access, to access such memory.
> > > > 
> > > > v4:
> > > >   - Fix checkpatch warnings (CI)
> > > > v5:
> > > >   - Fix checkpatch warnings (CI)
> > > > v6:
> > > >   - Fix kernel doc (Auld)
> > > > 
> > > Christian - Do you mind if I merge this patch along with the rest of
> > > the series to drm-xe-next?
> > Ray, Christian,
> > 
> > ack on getting this patch to drm-xe-next?
> 
> No, we actually spent quite some time removing the single page mapping
> functionality for BOs.
> 

I don't understand this statement. This patch just adds a TTM BO helper
for access - it doesn't change anything with respect to single page mapping.

> You need a really good justification to bring that back.
> 

The use case is that EuDebugger requires essentially the same
functionality as ptrace -> vm_access.

TTM mapping non-contiguous VRAM doesn't work unless I'm blind. User BOs
which the EuDebugger accesses can be non-contiguous, hence the new
helper.

Matt
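
To illustrate why a copy-based helper copes with non-contiguous backing where a single linear mapping does not, here is a small user-space sketch. It is an assumption-laden illustration, not the kernel code: paged_bo, paged_bo_access and DEMO_PAGE_SIZE are hypothetical names, the page size is shrunk to 16 bytes, and each "page" is an ordinary array that may live anywhere in memory. The loop touches one page at a time, so the pages never need to be adjacent.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_PAGE_SIZE 16u	/* tiny page size, for illustration only */

/* Hypothetical object backed by pages scattered in memory. */
struct paged_bo {
	unsigned char **pages;	/* one pointer per page, in object order */
	size_t num_pages;
};

/*
 * Copy len bytes between buf and the object, one page at a time.
 * Returns len on success, or -1 if the range is empty or out of bounds.
 */
static int paged_bo_access(struct paged_bo *bo, size_t offset,
			   void *buf, int len, int write)
{
	size_t size = bo->num_pages * DEMO_PAGE_SIZE;
	size_t page = offset / DEMO_PAGE_SIZE;
	size_t in_page = offset % DEMO_PAGE_SIZE;
	unsigned char *cur = buf;
	size_t bytes_left;

	if (len < 1 || offset > size || (size_t)len > size - offset)
		return -1;

	bytes_left = (size_t)len;
	do {
		/* Never copy past the end of the current page. */
		size_t room = DEMO_PAGE_SIZE - in_page;
		size_t bytes = bytes_left < room ? bytes_left : room;
		unsigned char *ptr = bo->pages[page] + in_page;

		if (write)
			memcpy(ptr, cur, bytes);
		else
			memcpy(cur, ptr, bytes);

		page++;
		cur += bytes;
		bytes_left -= bytes;
		in_page = 0;	/* later pages are accessed from their start */
	} while (bytes_left);

	return len;
}
```

An access that straddles a page boundary is split into two copies, each landing in whichever buffer backs that page, regardless of where the buffers sit relative to each other.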

> Regards,
> Christian.
> 
> > 
> > > Matt
> > > 
> > > > Reported-by: Christoph Manszewski <christoph.manszewski@intel.com>
> > > > Suggested-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > > > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > > > Tested-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> > > > Reviewed-by: Matthew Auld <matthew.auld@intel.com>
> > > > ---
> > > >   drivers/gpu/drm/ttm/ttm_bo_util.c | 86 +++++++++++++++++++++++++++++++
> > > >   drivers/gpu/drm/ttm/ttm_bo_vm.c   | 65 +----------------------
> > > >   include/drm/ttm/ttm_bo.h          |  2 +
> > > >   3 files changed, 89 insertions(+), 64 deletions(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c
> > > > index d939925efa81..77e760ea7193 100644
> > > > --- a/drivers/gpu/drm/ttm/ttm_bo_util.c
> > > > +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
> > > > @@ -919,3 +919,89 @@ s64 ttm_lru_walk_for_evict(struct ttm_lru_walk *walk, struct ttm_device *bdev,
> > > >   	return progress;
> > > >   }
> > > > +
> > > > +static int ttm_bo_access_kmap(struct ttm_buffer_object *bo,
> > > > +			      unsigned long offset,
> > > > +			      void *buf, int len, int write)
> > > > +{
> > > > +	unsigned long page = offset >> PAGE_SHIFT;
> > > > +	unsigned long bytes_left = len;
> > > > +	int ret;
> > > > +
> > > > +	/* Copy a page at a time, that way no extra virtual address
> > > > +	 * mapping is needed
> > > > +	 */
> > > > +	offset -= page << PAGE_SHIFT;
> > > > +	do {
> > > > +		unsigned long bytes = min(bytes_left, PAGE_SIZE - offset);
> > > > +		struct ttm_bo_kmap_obj map;
> > > > +		void *ptr;
> > > > +		bool is_iomem;
> > > > +
> > > > +		ret = ttm_bo_kmap(bo, page, 1, &map);
> > > > +		if (ret)
> > > > +			return ret;
> > > > +
> > > > +		ptr = (void *)ttm_kmap_obj_virtual(&map, &is_iomem) + offset;
> > > > +		WARN_ON_ONCE(is_iomem);
> > > > +		if (write)
> > > > +			memcpy(ptr, buf, bytes);
> > > > +		else
> > > > +			memcpy(buf, ptr, bytes);
> > > > +		ttm_bo_kunmap(&map);
> > > > +
> > > > +		page++;
> > > > +		buf += bytes;
> > > > +		bytes_left -= bytes;
> > > > +		offset = 0;
> > > > +	} while (bytes_left);
> > > > +
> > > > +	return len;
> > > > +}
> > > > +
> > > > +/**
> > > > + * ttm_bo_access - Helper to access a buffer object
> > > > + *
> > > > + * @bo: ttm buffer object
> > > > + * @offset: access offset into buffer object
> > > > + * @buf: pointer to caller memory to read into or write from
> > > > + * @len: length of access
> > > > + * @write: write access
> > > > + *
> > > > + * Utility function to access a buffer object. Useful when buffer object cannot
> > > > + * be easily mapped (non-contiguous, non-visible, etc...).
> > > > + *
> > > > + * Returns:
> > > > + * @len if successful, negative error code on failure.
> > > > + */
> > > > +int ttm_bo_access(struct ttm_buffer_object *bo, unsigned long offset,
> > > > +		  void *buf, int len, int write)
> > > > +{
> > > > +	int ret;
> > > > +
> > > > +	if (len < 1 || (offset + len) > bo->base.size)
> > > > +		return -EIO;
> > > > +
> > > > +	ret = ttm_bo_reserve(bo, true, false, NULL);
> > > > +	if (ret)
> > > > +		return ret;
> > > > +
> > > > +	switch (bo->resource->mem_type) {
> > > > +	case TTM_PL_SYSTEM:
> > > > +		fallthrough;
> > > > +	case TTM_PL_TT:
> > > > +		ret = ttm_bo_access_kmap(bo, offset, buf, len, write);
> > > > +		break;
> > > > +	default:
> > > > +		if (bo->bdev->funcs->access_memory)
> > > > +			ret = bo->bdev->funcs->access_memory
> > > > +				(bo, offset, buf, len, write);
> > > > +		else
> > > > +			ret = -EIO;
> > > > +	}
> > > > +
> > > > +	ttm_bo_unreserve(bo);
> > > > +
> > > > +	return ret;
> > > > +}
> > > > +EXPORT_SYMBOL(ttm_bo_access);
> > > > diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c
> > > > index 2c699ed1963a..20b1e5f78684 100644
> > > > --- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
> > > > +++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
> > > > @@ -366,45 +366,6 @@ void ttm_bo_vm_close(struct vm_area_struct *vma)
> > > >   }
> > > >   EXPORT_SYMBOL(ttm_bo_vm_close);
> > > > -static int ttm_bo_vm_access_kmap(struct ttm_buffer_object *bo,
> > > > -				 unsigned long offset,
> > > > -				 uint8_t *buf, int len, int write)
> > > > -{
> > > > -	unsigned long page = offset >> PAGE_SHIFT;
> > > > -	unsigned long bytes_left = len;
> > > > -	int ret;
> > > > -
> > > > -	/* Copy a page at a time, that way no extra virtual address
> > > > -	 * mapping is needed
> > > > -	 */
> > > > -	offset -= page << PAGE_SHIFT;
> > > > -	do {
> > > > -		unsigned long bytes = min(bytes_left, PAGE_SIZE - offset);
> > > > -		struct ttm_bo_kmap_obj map;
> > > > -		void *ptr;
> > > > -		bool is_iomem;
> > > > -
> > > > -		ret = ttm_bo_kmap(bo, page, 1, &map);
> > > > -		if (ret)
> > > > -			return ret;
> > > > -
> > > > -		ptr = (uint8_t *)ttm_kmap_obj_virtual(&map, &is_iomem) + offset;
> > > > -		WARN_ON_ONCE(is_iomem);
> > > > -		if (write)
> > > > -			memcpy(ptr, buf, bytes);
> > > > -		else
> > > > -			memcpy(buf, ptr, bytes);
> > > > -		ttm_bo_kunmap(&map);
> > > > -
> > > > -		page++;
> > > > -		buf += bytes;
> > > > -		bytes_left -= bytes;
> > > > -		offset = 0;
> > > > -	} while (bytes_left);
> > > > -
> > > > -	return len;
> > > > -}
> > > > -
> > > >   int ttm_bo_vm_access(struct vm_area_struct *vma, unsigned long addr,
> > > >   		     void *buf, int len, int write)
> > > >   {
> > > > @@ -412,32 +373,8 @@ int ttm_bo_vm_access(struct vm_area_struct *vma, unsigned long addr,
> > > >   	unsigned long offset = (addr) - vma->vm_start +
> > > >   		((vma->vm_pgoff - drm_vma_node_start(&bo->base.vma_node))
> > > >   		 << PAGE_SHIFT);
> > > > -	int ret;
> > > > -
> > > > -	if (len < 1 || (offset + len) > bo->base.size)
> > > > -		return -EIO;
> > > > -	ret = ttm_bo_reserve(bo, true, false, NULL);
> > > > -	if (ret)
> > > > -		return ret;
> > > > -
> > > > -	switch (bo->resource->mem_type) {
> > > > -	case TTM_PL_SYSTEM:
> > > > -		fallthrough;
> > > > -	case TTM_PL_TT:
> > > > -		ret = ttm_bo_vm_access_kmap(bo, offset, buf, len, write);
> > > > -		break;
> > > > -	default:
> > > > -		if (bo->bdev->funcs->access_memory)
> > > > -			ret = bo->bdev->funcs->access_memory(
> > > > -				bo, offset, buf, len, write);
> > > > -		else
> > > > -			ret = -EIO;
> > > > -	}
> > > > -
> > > > -	ttm_bo_unreserve(bo);
> > > > -
> > > > -	return ret;
> > > > +	return ttm_bo_access(bo, offset, buf, len, write);
> > > >   }
> > > >   EXPORT_SYMBOL(ttm_bo_vm_access);
> > > > diff --git a/include/drm/ttm/ttm_bo.h b/include/drm/ttm/ttm_bo.h
> > > > index 5804408815be..8ea11cd8df39 100644
> > > > --- a/include/drm/ttm/ttm_bo.h
> > > > +++ b/include/drm/ttm/ttm_bo.h
> > > > @@ -421,6 +421,8 @@ void ttm_bo_unpin(struct ttm_buffer_object *bo);
> > > >   int ttm_bo_evict_first(struct ttm_device *bdev,
> > > >   		       struct ttm_resource_manager *man,
> > > >   		       struct ttm_operation_ctx *ctx);
> > > > +int ttm_bo_access(struct ttm_buffer_object *bo, unsigned long offset,
> > > > +		  void *buf, int len, int write);
> > > >   vm_fault_t ttm_bo_vm_reserve(struct ttm_buffer_object *bo,
> > > >   			     struct vm_fault *vmf);
> > > >   vm_fault_t ttm_bo_vm_fault_reserved(struct vm_fault *vmf,
> > > > -- 
> > > > 2.34.1
> > > > 
> 

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 2/8] drm/ttm: Add ttm_bo_access
  2024-11-04 21:30       ` Matthew Brost
@ 2024-11-04 22:26         ` Rodrigo Vivi
  0 siblings, 0 replies; 56+ messages in thread
From: Rodrigo Vivi @ 2024-11-04 22:26 UTC (permalink / raw)
  To: Matthew Brost; +Cc: Christian König, intel-xe, dri-devel, matthew.auld

On Mon, Nov 04, 2024 at 01:30:41PM -0800, Matthew Brost wrote:
> On Mon, Nov 04, 2024 at 08:47:01PM +0100, Christian König wrote:
> > Am 01.11.24 um 00:43 schrieb Matthew Brost:
> > > On Thu, Oct 31, 2024 at 11:10:42AM -0700, Matthew Brost wrote:
> > > > Non-contiguous VRAM cannot easily be mapped in TTM nor can non-visible
> > > > VRAM easily be accessed. Add ttm_bo_access, which is similar to
> > > > ttm_bo_vm_access, to access such memory.
> > > > 
> > > > v4:
> > > >   - Fix checkpatch warnings (CI)
> > > > v5:
> > > >   - Fix checkpatch warnings (CI)
> > > > v6:
> > > >   - Fix kernel doc (Auld)
> > > > 
> > > Christian - Do you mind if I merge patch along with the rest of the
> > > series to drm-xe-next?
> > 
> > I don't see the original patch anywhere in my inbox, please make sure to CC
> > me while sending things out.
> > 
> 
> I think I had you on an earlier revision but used the wrong alias to send out
> this latest one. I will be sure to include you on future patches.
> 
> Would you like to continue the discussion here, or should I send out a fresh
> revision with you included and an updated commit message?

Please send a refreshed version with an updated commit message, and also
CC dri-devel.

> 
> > Apart from that I absolutely don't see any justification for this patch. You
> > move stuff into ttm_bo_util.c which not even remotely belongs in there.
> > 
> 
> The justification is that EuDebugger requires essentially the same functionality
> as ptrace -> vm_access. This patch simply adds a helper to achieve this. There
> is no functional change to the existing code.
> 
> Regarding the statement about ttm_bo_util.c, that seems quite aggressive. It is
> a TTM BO helper function, so it could logically belong in either ttm_bo.c or
> ttm_bo_util.c. A BO helper definitely shouldn't call into ttm_bo_vm.c, nor
> should it reside there. Perhaps I chose the wrong ttm_bo* file? I apologize for
> that. It would be helpful to know why you think this is the wrong place so I can
> better understand your expectations for TTM.

I also believe that ttm_bo_util.c is a good place for that, but
perhaps someone else has better ideas or suggestions there.

> 
> Matt
> 
> > Regards,
> > Christian.
> > 
> > > 
> > > Matt
> > > 
> > > > Reported-by: Christoph Manszewski <christoph.manszewski@intel.com>
> > > > Suggested-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > > > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > > > Tested-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> > > > Reviewed-by: Matthew Auld <matthew.auld@intel.com>
> > > > ---
> > > >   drivers/gpu/drm/ttm/ttm_bo_util.c | 86 +++++++++++++++++++++++++++++++
> > > >   drivers/gpu/drm/ttm/ttm_bo_vm.c   | 65 +----------------------
> > > >   include/drm/ttm/ttm_bo.h          |  2 +
> > > >   3 files changed, 89 insertions(+), 64 deletions(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c
> > > > index d939925efa81..77e760ea7193 100644
> > > > --- a/drivers/gpu/drm/ttm/ttm_bo_util.c
> > > > +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
> > > > @@ -919,3 +919,89 @@ s64 ttm_lru_walk_for_evict(struct ttm_lru_walk *walk, struct ttm_device *bdev,
> > > >   	return progress;
> > > >   }
> > > > +
> > > > +static int ttm_bo_access_kmap(struct ttm_buffer_object *bo,
> > > > +			      unsigned long offset,
> > > > +			      void *buf, int len, int write)
> > > > +{
> > > > +	unsigned long page = offset >> PAGE_SHIFT;
> > > > +	unsigned long bytes_left = len;
> > > > +	int ret;
> > > > +
> > > > +	/* Copy a page at a time, that way no extra virtual address
> > > > +	 * mapping is needed
> > > > +	 */
> > > > +	offset -= page << PAGE_SHIFT;
> > > > +	do {
> > > > +		unsigned long bytes = min(bytes_left, PAGE_SIZE - offset);
> > > > +		struct ttm_bo_kmap_obj map;
> > > > +		void *ptr;
> > > > +		bool is_iomem;
> > > > +
> > > > +		ret = ttm_bo_kmap(bo, page, 1, &map);
> > > > +		if (ret)
> > > > +			return ret;
> > > > +
> > > > +		ptr = (void *)ttm_kmap_obj_virtual(&map, &is_iomem) + offset;
> > > > +		WARN_ON_ONCE(is_iomem);
> > > > +		if (write)
> > > > +			memcpy(ptr, buf, bytes);
> > > > +		else
> > > > +			memcpy(buf, ptr, bytes);
> > > > +		ttm_bo_kunmap(&map);
> > > > +
> > > > +		page++;
> > > > +		buf += bytes;
> > > > +		bytes_left -= bytes;
> > > > +		offset = 0;
> > > > +	} while (bytes_left);
> > > > +
> > > > +	return len;
> > > > +}
> > > > +
> > > > +/**
> > > > + * ttm_bo_access - Helper to access a buffer object
> > > > + *
> > > > + * @bo: ttm buffer object
> > > > + * @offset: access offset into buffer object
> > > > + * @buf: pointer to caller memory to read into or write from
> > > > + * @len: length of access
> > > > + * @write: write access
> > > > + *
> > > > + * Utility function to access a buffer object. Useful when buffer object cannot
> > > > + * be easily mapped (non-contiguous, non-visible, etc...).
> > > > + *
> > > > + * Returns:
> > > > + * @len if successful, negative error code on failure.
> > > > + */
> > > > +int ttm_bo_access(struct ttm_buffer_object *bo, unsigned long offset,
> > > > +		  void *buf, int len, int write)
> > > > +{
> > > > +	int ret;
> > > > +
> > > > +	if (len < 1 || (offset + len) > bo->base.size)
> > > > +		return -EIO;
> > > > +
> > > > +	ret = ttm_bo_reserve(bo, true, false, NULL);
> > > > +	if (ret)
> > > > +		return ret;
> > > > +
> > > > +	switch (bo->resource->mem_type) {
> > > > +	case TTM_PL_SYSTEM:
> > > > +		fallthrough;
> > > > +	case TTM_PL_TT:
> > > > +		ret = ttm_bo_access_kmap(bo, offset, buf, len, write);
> > > > +		break;
> > > > +	default:
> > > > +		if (bo->bdev->funcs->access_memory)
> > > > +			ret = bo->bdev->funcs->access_memory
> > > > +				(bo, offset, buf, len, write);
> > > > +		else
> > > > +			ret = -EIO;
> > > > +	}
> > > > +
> > > > +	ttm_bo_unreserve(bo);
> > > > +
> > > > +	return ret;
> > > > +}
> > > > +EXPORT_SYMBOL(ttm_bo_access);
> > > > diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c
> > > > index 2c699ed1963a..20b1e5f78684 100644
> > > > --- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
> > > > +++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
> > > > @@ -366,45 +366,6 @@ void ttm_bo_vm_close(struct vm_area_struct *vma)
> > > >   }
> > > >   EXPORT_SYMBOL(ttm_bo_vm_close);
> > > > -static int ttm_bo_vm_access_kmap(struct ttm_buffer_object *bo,
> > > > -				 unsigned long offset,
> > > > -				 uint8_t *buf, int len, int write)
> > > > -{
> > > > -	unsigned long page = offset >> PAGE_SHIFT;
> > > > -	unsigned long bytes_left = len;
> > > > -	int ret;
> > > > -
> > > > -	/* Copy a page at a time, that way no extra virtual address
> > > > -	 * mapping is needed
> > > > -	 */
> > > > -	offset -= page << PAGE_SHIFT;
> > > > -	do {
> > > > -		unsigned long bytes = min(bytes_left, PAGE_SIZE - offset);
> > > > -		struct ttm_bo_kmap_obj map;
> > > > -		void *ptr;
> > > > -		bool is_iomem;
> > > > -
> > > > -		ret = ttm_bo_kmap(bo, page, 1, &map);
> > > > -		if (ret)
> > > > -			return ret;
> > > > -
> > > > -		ptr = (uint8_t *)ttm_kmap_obj_virtual(&map, &is_iomem) + offset;
> > > > -		WARN_ON_ONCE(is_iomem);
> > > > -		if (write)
> > > > -			memcpy(ptr, buf, bytes);
> > > > -		else
> > > > -			memcpy(buf, ptr, bytes);
> > > > -		ttm_bo_kunmap(&map);
> > > > -
> > > > -		page++;
> > > > -		buf += bytes;
> > > > -		bytes_left -= bytes;
> > > > -		offset = 0;
> > > > -	} while (bytes_left);
> > > > -
> > > > -	return len;
> > > > -}
> > > > -
> > > >   int ttm_bo_vm_access(struct vm_area_struct *vma, unsigned long addr,
> > > >   		     void *buf, int len, int write)
> > > >   {
> > > > @@ -412,32 +373,8 @@ int ttm_bo_vm_access(struct vm_area_struct *vma, unsigned long addr,
> > > >   	unsigned long offset = (addr) - vma->vm_start +
> > > >   		((vma->vm_pgoff - drm_vma_node_start(&bo->base.vma_node))
> > > >   		 << PAGE_SHIFT);
> > > > -	int ret;
> > > > -
> > > > -	if (len < 1 || (offset + len) > bo->base.size)
> > > > -		return -EIO;
> > > > -	ret = ttm_bo_reserve(bo, true, false, NULL);
> > > > -	if (ret)
> > > > -		return ret;
> > > > -
> > > > -	switch (bo->resource->mem_type) {
> > > > -	case TTM_PL_SYSTEM:
> > > > -		fallthrough;
> > > > -	case TTM_PL_TT:
> > > > -		ret = ttm_bo_vm_access_kmap(bo, offset, buf, len, write);
> > > > -		break;
> > > > -	default:
> > > > -		if (bo->bdev->funcs->access_memory)
> > > > -			ret = bo->bdev->funcs->access_memory(
> > > > -				bo, offset, buf, len, write);
> > > > -		else
> > > > -			ret = -EIO;
> > > > -	}
> > > > -
> > > > -	ttm_bo_unreserve(bo);
> > > > -
> > > > -	return ret;
> > > > +	return ttm_bo_access(bo, offset, buf, len, write);
> > > >   }
> > > >   EXPORT_SYMBOL(ttm_bo_vm_access);
> > > > diff --git a/include/drm/ttm/ttm_bo.h b/include/drm/ttm/ttm_bo.h
> > > > index 5804408815be..8ea11cd8df39 100644
> > > > --- a/include/drm/ttm/ttm_bo.h
> > > > +++ b/include/drm/ttm/ttm_bo.h
> > > > @@ -421,6 +421,8 @@ void ttm_bo_unpin(struct ttm_buffer_object *bo);
> > > >   int ttm_bo_evict_first(struct ttm_device *bdev,
> > > >   		       struct ttm_resource_manager *man,
> > > >   		       struct ttm_operation_ctx *ctx);
> > > > +int ttm_bo_access(struct ttm_buffer_object *bo, unsigned long offset,
> > > > +		  void *buf, int len, int write);
> > > >   vm_fault_t ttm_bo_vm_reserve(struct ttm_buffer_object *bo,
> > > >   			     struct vm_fault *vmf);
> > > >   vm_fault_t ttm_bo_vm_fault_reserved(struct vm_fault *vmf,
> > > > -- 
> > > > 2.34.1
> > > > 
> > 

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 2/8] drm/ttm: Add ttm_bo_access
  2024-11-04 21:49         ` Matthew Brost
@ 2024-11-05  7:41           ` Christian König
  2024-11-05 18:35             ` Matthew Brost
  0 siblings, 1 reply; 56+ messages in thread
From: Christian König @ 2024-11-05  7:41 UTC (permalink / raw)
  To: Matthew Brost, Christian König
  Cc: Rodrigo Vivi, Huang Rui, intel-xe, dri-devel, matthew.auld

Am 04.11.24 um 22:49 schrieb Matthew Brost:
> On Mon, Nov 04, 2024 at 08:28:34PM +0100, Christian König wrote:
>> Am 04.11.24 um 18:34 schrieb Rodrigo Vivi:
>>> On Thu, Oct 31, 2024 at 04:43:19PM -0700, Matthew Brost wrote:
>>>> On Thu, Oct 31, 2024 at 11:10:42AM -0700, Matthew Brost wrote:
>>>>> Non-contiguous VRAM cannot easily be mapped in TTM nor can non-visible
>>>>> VRAM easily be accessed. Add ttm_bo_access, which is similar to
>>>>> ttm_bo_vm_access, to access such memory.
>>>>>
>>>>> v4:
>>>>>    - Fix checkpatch warnings (CI)
>>>>> v5:
>>>>>    - Fix checkpatch warnings (CI)
>>>>> v6:
>>>>>    - Fix kernel doc (Auld)
>>>>>
>>>> Christian - Do you mind if I merge patch along with the rest of the
>>>> series to drm-xe-next?
>>> Ray, Christian,
>>>
>>> ack on getting this patch to drm-xe-next?
>> No, we actually spend quite some time removing the single page mapping
>> functionality for BOs.
>>
> I don't understand this statement. This patch just adds a TTM BO helper
> for access - it doesn't change anything wrt to single page mapping.

Well, we spent quite some time removing single-page mappings from device
drivers.

The only remaining use case of ttm_bo_kmap() with just one page is the 
ttm_bo_vm_access_kmap() function and I was really hoping to make that 
one TTM internal at some point.
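
For readers following the thread, the page-at-a-time copy under discussion can be sketched in plain userspace C. This is only an illustration under stated assumptions: struct sketch_bo, sketch_bo_access() and the SKETCH_PAGE_* macros are hypothetical stand-ins, an array of page pointers replaces ttm_bo_kmap()/ttm_bo_kunmap(), and the generic -1 error is not what the kernel code returns.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (1UL << SKETCH_PAGE_SHIFT)

/* Hypothetical stand-in for a BO whose backing pages are not contiguous. */
struct sketch_bo {
	unsigned char **pages;	/* one pointer per backing page */
	size_t num_pages;
};

/*
 * Copy len bytes between buf and the BO, touching one page per iteration so
 * that no mapping of the whole object is ever needed.
 * Returns len on success, -1 on a bad range (illustrative error value).
 */
static int sketch_bo_access(struct sketch_bo *bo, unsigned long offset,
			    void *buf, int len, int write)
{
	unsigned long page = offset >> SKETCH_PAGE_SHIFT;
	unsigned long bytes_left;
	unsigned char *cur = buf;

	if (len < 1 ||
	    offset + (unsigned long)len > bo->num_pages * SKETCH_PAGE_SIZE)
		return -1;

	bytes_left = len;
	offset -= page << SKETCH_PAGE_SHIFT;	/* offset within the first page */
	do {
		unsigned long bytes = SKETCH_PAGE_SIZE - offset;
		unsigned char *ptr = bo->pages[page] + offset;

		if (bytes > bytes_left)
			bytes = bytes_left;
		if (write)
			memcpy(ptr, cur, bytes);
		else
			memcpy(cur, ptr, bytes);

		page++;
		cur += bytes;
		bytes_left -= bytes;
		offset = 0;	/* later pages are copied from their start */
	} while (bytes_left);

	return len;
}
```

An access that straddles a page boundary is split into two copies, which is why the loop resets the in-page offset to zero after the first iteration.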

>> You need a really good justification to bring that back.
>>
> The use case is EuDebugger requires essentially the same functionality
> as ptrace -> vm_access.

Then why don't you use ptrace in the first place?

> TTM mapping non-contiguous VRAM doesn't work unless I'm blind. User BOs
> which the EuDebugger accesses can be non-contiguous, hence the new
> helper.

Then why don't you handle that inside the driver in the first place 
instead of going through a TTM midlayer?

Regards,
Christian.

>
> Matt
>
>> Regards,
>> Christian.
>>
>>>> Matt
>>>>
>>>>> Reported-by: Christoph Manszewski<christoph.manszewski@intel.com>
>>>>> Suggested-by: Thomas Hellström<thomas.hellstrom@linux.intel.com>
>>>>> Signed-off-by: Matthew Brost<matthew.brost@intel.com>
>>>>> Tested-by: Mika Kuoppala<mika.kuoppala@linux.intel.com>
>>>>> Reviewed-by: Matthew Auld<matthew.auld@intel.com>
>>>>> ---
>>>>>    drivers/gpu/drm/ttm/ttm_bo_util.c | 86 +++++++++++++++++++++++++++++++
>>>>>    drivers/gpu/drm/ttm/ttm_bo_vm.c   | 65 +----------------------
>>>>>    include/drm/ttm/ttm_bo.h          |  2 +
>>>>>    3 files changed, 89 insertions(+), 64 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c
>>>>> index d939925efa81..77e760ea7193 100644
>>>>> --- a/drivers/gpu/drm/ttm/ttm_bo_util.c
>>>>> +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
>>>>> @@ -919,3 +919,89 @@ s64 ttm_lru_walk_for_evict(struct ttm_lru_walk *walk, struct ttm_device *bdev,
>>>>>    	return progress;
>>>>>    }
>>>>> +
>>>>> +static int ttm_bo_access_kmap(struct ttm_buffer_object *bo,
>>>>> +			      unsigned long offset,
>>>>> +			      void *buf, int len, int write)
>>>>> +{
>>>>> +	unsigned long page = offset >> PAGE_SHIFT;
>>>>> +	unsigned long bytes_left = len;
>>>>> +	int ret;
>>>>> +
>>>>> +	/* Copy a page at a time, that way no extra virtual address
>>>>> +	 * mapping is needed
>>>>> +	 */
>>>>> +	offset -= page << PAGE_SHIFT;
>>>>> +	do {
>>>>> +		unsigned long bytes = min(bytes_left, PAGE_SIZE - offset);
>>>>> +		struct ttm_bo_kmap_obj map;
>>>>> +		void *ptr;
>>>>> +		bool is_iomem;
>>>>> +
>>>>> +		ret = ttm_bo_kmap(bo, page, 1, &map);
>>>>> +		if (ret)
>>>>> +			return ret;
>>>>> +
>>>>> +		ptr = (void *)ttm_kmap_obj_virtual(&map, &is_iomem) + offset;
>>>>> +		WARN_ON_ONCE(is_iomem);
>>>>> +		if (write)
>>>>> +			memcpy(ptr, buf, bytes);
>>>>> +		else
>>>>> +			memcpy(buf, ptr, bytes);
>>>>> +		ttm_bo_kunmap(&map);
>>>>> +
>>>>> +		page++;
>>>>> +		buf += bytes;
>>>>> +		bytes_left -= bytes;
>>>>> +		offset = 0;
>>>>> +	} while (bytes_left);
>>>>> +
>>>>> +	return len;
>>>>> +}
>>>>> +
>>>>> +/**
>>>>> + * ttm_bo_access - Helper to access a buffer object
>>>>> + *
>>>>> + * @bo: ttm buffer object
>>>>> + * @offset: access offset into buffer object
>>>>> + * @buf: pointer to caller memory to read into or write from
>>>>> + * @len: length of access
>>>>> + * @write: write access
>>>>> + *
>>>>> + * Utility function to access a buffer object. Useful when buffer object cannot
>>>>> + * be easily mapped (non-contiguous, non-visible, etc...).
>>>>> + *
>>>>> + * Returns:
>>>>> + * @len if successful, negative error code on failure.
>>>>> + */
>>>>> +int ttm_bo_access(struct ttm_buffer_object *bo, unsigned long offset,
>>>>> +		  void *buf, int len, int write)
>>>>> +{
>>>>> +	int ret;
>>>>> +
>>>>> +	if (len < 1 || (offset + len) > bo->base.size)
>>>>> +		return -EIO;
>>>>> +
>>>>> +	ret = ttm_bo_reserve(bo, true, false, NULL);
>>>>> +	if (ret)
>>>>> +		return ret;
>>>>> +
>>>>> +	switch (bo->resource->mem_type) {
>>>>> +	case TTM_PL_SYSTEM:
>>>>> +		fallthrough;
>>>>> +	case TTM_PL_TT:
>>>>> +		ret = ttm_bo_access_kmap(bo, offset, buf, len, write);
>>>>> +		break;
>>>>> +	default:
>>>>> +		if (bo->bdev->funcs->access_memory)
>>>>> +			ret = bo->bdev->funcs->access_memory
>>>>> +				(bo, offset, buf, len, write);
>>>>> +		else
>>>>> +			ret = -EIO;
>>>>> +	}
>>>>> +
>>>>> +	ttm_bo_unreserve(bo);
>>>>> +
>>>>> +	return ret;
>>>>> +}
>>>>> +EXPORT_SYMBOL(ttm_bo_access);
>>>>> diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c
>>>>> index 2c699ed1963a..20b1e5f78684 100644
>>>>> --- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
>>>>> +++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
>>>>> @@ -366,45 +366,6 @@ void ttm_bo_vm_close(struct vm_area_struct *vma)
>>>>>    }
>>>>>    EXPORT_SYMBOL(ttm_bo_vm_close);
>>>>> -static int ttm_bo_vm_access_kmap(struct ttm_buffer_object *bo,
>>>>> -				 unsigned long offset,
>>>>> -				 uint8_t *buf, int len, int write)
>>>>> -{
>>>>> -	unsigned long page = offset >> PAGE_SHIFT;
>>>>> -	unsigned long bytes_left = len;
>>>>> -	int ret;
>>>>> -
>>>>> -	/* Copy a page at a time, that way no extra virtual address
>>>>> -	 * mapping is needed
>>>>> -	 */
>>>>> -	offset -= page << PAGE_SHIFT;
>>>>> -	do {
>>>>> -		unsigned long bytes = min(bytes_left, PAGE_SIZE - offset);
>>>>> -		struct ttm_bo_kmap_obj map;
>>>>> -		void *ptr;
>>>>> -		bool is_iomem;
>>>>> -
>>>>> -		ret = ttm_bo_kmap(bo, page, 1, &map);
>>>>> -		if (ret)
>>>>> -			return ret;
>>>>> -
>>>>> -		ptr = (uint8_t *)ttm_kmap_obj_virtual(&map, &is_iomem) + offset;
>>>>> -		WARN_ON_ONCE(is_iomem);
>>>>> -		if (write)
>>>>> -			memcpy(ptr, buf, bytes);
>>>>> -		else
>>>>> -			memcpy(buf, ptr, bytes);
>>>>> -		ttm_bo_kunmap(&map);
>>>>> -
>>>>> -		page++;
>>>>> -		buf += bytes;
>>>>> -		bytes_left -= bytes;
>>>>> -		offset = 0;
>>>>> -	} while (bytes_left);
>>>>> -
>>>>> -	return len;
>>>>> -}
>>>>> -
>>>>>    int ttm_bo_vm_access(struct vm_area_struct *vma, unsigned long addr,
>>>>>    		     void *buf, int len, int write)
>>>>>    {
>>>>> @@ -412,32 +373,8 @@ int ttm_bo_vm_access(struct vm_area_struct *vma, unsigned long addr,
>>>>>    	unsigned long offset = (addr) - vma->vm_start +
>>>>>    		((vma->vm_pgoff - drm_vma_node_start(&bo->base.vma_node))
>>>>>    		 << PAGE_SHIFT);
>>>>> -	int ret;
>>>>> -
>>>>> -	if (len < 1 || (offset + len) > bo->base.size)
>>>>> -		return -EIO;
>>>>> -	ret = ttm_bo_reserve(bo, true, false, NULL);
>>>>> -	if (ret)
>>>>> -		return ret;
>>>>> -
>>>>> -	switch (bo->resource->mem_type) {
>>>>> -	case TTM_PL_SYSTEM:
>>>>> -		fallthrough;
>>>>> -	case TTM_PL_TT:
>>>>> -		ret = ttm_bo_vm_access_kmap(bo, offset, buf, len, write);
>>>>> -		break;
>>>>> -	default:
>>>>> -		if (bo->bdev->funcs->access_memory)
>>>>> -			ret = bo->bdev->funcs->access_memory(
>>>>> -				bo, offset, buf, len, write);
>>>>> -		else
>>>>> -			ret = -EIO;
>>>>> -	}
>>>>> -
>>>>> -	ttm_bo_unreserve(bo);
>>>>> -
>>>>> -	return ret;
>>>>> +	return ttm_bo_access(bo, offset, buf, len, write);
>>>>>    }
>>>>>    EXPORT_SYMBOL(ttm_bo_vm_access);
>>>>> diff --git a/include/drm/ttm/ttm_bo.h b/include/drm/ttm/ttm_bo.h
>>>>> index 5804408815be..8ea11cd8df39 100644
>>>>> --- a/include/drm/ttm/ttm_bo.h
>>>>> +++ b/include/drm/ttm/ttm_bo.h
>>>>> @@ -421,6 +421,8 @@ void ttm_bo_unpin(struct ttm_buffer_object *bo);
>>>>>    int ttm_bo_evict_first(struct ttm_device *bdev,
>>>>>    		       struct ttm_resource_manager *man,
>>>>>    		       struct ttm_operation_ctx *ctx);
>>>>> +int ttm_bo_access(struct ttm_buffer_object *bo, unsigned long offset,
>>>>> +		  void *buf, int len, int write);
>>>>>    vm_fault_t ttm_bo_vm_reserve(struct ttm_buffer_object *bo,
>>>>>    			     struct vm_fault *vmf);
>>>>>    vm_fault_t ttm_bo_vm_fault_reserved(struct vm_fault *vmf,
>>>>> -- 
>>>>> 2.34.1
>>>>>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 2/8] drm/ttm: Add ttm_bo_access
  2024-11-05  7:41           ` Christian König
@ 2024-11-05 18:35             ` Matthew Brost
  2024-11-06  9:48               ` Christian König
  0 siblings, 1 reply; 56+ messages in thread
From: Matthew Brost @ 2024-11-05 18:35 UTC (permalink / raw)
  To: Christian König
  Cc: Christian König, Rodrigo Vivi, Huang Rui, intel-xe,
	dri-devel, matthew.auld

On Tue, Nov 05, 2024 at 08:41:07AM +0100, Christian König wrote:
> Am 04.11.24 um 22:49 schrieb Matthew Brost:
> > On Mon, Nov 04, 2024 at 08:28:34PM +0100, Christian König wrote:
> > > Am 04.11.24 um 18:34 schrieb Rodrigo Vivi:
> > > > On Thu, Oct 31, 2024 at 04:43:19PM -0700, Matthew Brost wrote:
> > > > > On Thu, Oct 31, 2024 at 11:10:42AM -0700, Matthew Brost wrote:
> > > > > > Non-contiguous VRAM cannot easily be mapped in TTM nor can non-visible
> > > > > > VRAM easily be accessed. Add ttm_bo_access, which is similar to
> > > > > > ttm_bo_vm_access, to access such memory.
> > > > > > 
> > > > > > v4:
> > > > > >    - Fix checkpatch warnings (CI)
> > > > > > v5:
> > > > > >    - Fix checkpatch warnings (CI)
> > > > > > v6:
> > > > > >    - Fix kernel doc (Auld)
> > > > > > 
> > > > > Christian - Do you mind if I merge patch along with the rest of the
> > > > > series to drm-xe-next?
> > > > Ray, Christian,
> > > > 
> > > > ack on getting this patch to drm-xe-next?
> > > No, we actually spend quite some time removing the single page mapping
> > > functionality for BOs.
> > > 
> > I don't understand this statement. This patch just adds a TTM BO helper
> > for access - it doesn't change anything wrt to single page mapping.
> 
> Well we spend quite some time removing single page mappings from device
> drivers.
> 
> The only remaining use case of ttm_bo_kmap() with just one page is the
> ttm_bo_vm_access_kmap() function and I was really hoping to make that one
> TTM internal at some point.
> 

This is still static, right? I suppose this exposes this to the outside
world though in another place. I assume there is a reason we can't use
vmap in ttm_bo_vm_access?

> > > You need a really good justification to bring that back.
> > > 
> > The use case is EuDebugger requires essentially the same functionality
> > as ptrace -> vm_access.
> 
> Then why don't you use ptrace in the first place?
> 

I think the debugger speaks in GPU address space and thus needs to access
memory via the GPU VM -> BO, userptrs.

> > TTM mapping non-contiguous VRAM doesn't work unless I'm blind. User BOs
> > which the EuDebugger accesses can be non-contiguous, hence the new
> > helper.
> 
> Then why don't you handle that inside the driver in the first place instead
> of going through a TTM midlayer?
> 

Well common code always seems like a good idea to me. Can do this if you
insist though.
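
To make the "common code" option concrete, the dispatch that the helper performs can be sketched in userspace C: system-like placements take a generic CPU copy path, and anything else is handed to an optional driver hook, with -EIO when no hook is set. This is a rough illustration only. Every sketch_* name is hypothetical, the flat backing array stands in for real mappings, and the reservation locking done by ttm_bo_reserve()/ttm_bo_unreserve() is left out.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

enum sketch_mem_type { SKETCH_PL_SYSTEM, SKETCH_PL_TT, SKETCH_PL_VRAM };

struct sketch_bo;

/* Hypothetical driver hook, modelled on the access_memory callback. */
typedef int (*sketch_access_fn)(struct sketch_bo *bo, unsigned long offset,
				void *buf, int len, int write);

struct sketch_bo {
	enum sketch_mem_type mem_type;
	size_t size;
	unsigned char *backing;		/* flat stand-in for the BO contents */
	sketch_access_fn access_memory;	/* may be NULL */
};

/* Generic path for CPU-accessible placements. */
static int sketch_access_cpu(struct sketch_bo *bo, unsigned long offset,
			     void *buf, int len, int write)
{
	if (write)
		memcpy(bo->backing + offset, buf, len);
	else
		memcpy(buf, bo->backing + offset, len);
	return len;
}

/* Example hook; a real driver would walk its own VRAM layout here. */
static int sketch_driver_access(struct sketch_bo *bo, unsigned long offset,
				void *buf, int len, int write)
{
	return sketch_access_cpu(bo, offset, buf, len, write);
}

/* Returns len on success, -EIO on a bad range or an unsupported placement. */
static int sketch_bo_access(struct sketch_bo *bo, unsigned long offset,
			    void *buf, int len, int write)
{
	if (len < 1 || offset + (unsigned long)len > bo->size)
		return -EIO;

	switch (bo->mem_type) {
	case SKETCH_PL_SYSTEM:
	case SKETCH_PL_TT:
		return sketch_access_cpu(bo, offset, buf, len, write);
	default:
		if (bo->access_memory)
			return bo->access_memory(bo, offset, buf, len, write);
		return -EIO;
	}
}
```

Handling the access purely inside the driver would amount to calling the hook directly and skipping the common switch.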

What if I change my new helper ttm_bo_access to be based on vmap for
SYSTEM / TT? Honestly, though, that seems wasteful too for a temporary
access mapping.

Given that, I strongly prefer the code as is.

Matt

> Regards,
> Christian.
> 
> > 
> > Matt
> > 
> > > Regards,
> > > Christian.
> > > 
> > > > > Matt
> > > > > 
> > > > > > Reported-by: Christoph Manszewski<christoph.manszewski@intel.com>
> > > > > > Suggested-by: Thomas Hellström<thomas.hellstrom@linux.intel.com>
> > > > > > Signed-off-by: Matthew Brost<matthew.brost@intel.com>
> > > > > > Tested-by: Mika Kuoppala<mika.kuoppala@linux.intel.com>
> > > > > > Reviewed-by: Matthew Auld<matthew.auld@intel.com>
> > > > > > ---
> > > > > >    drivers/gpu/drm/ttm/ttm_bo_util.c | 86 +++++++++++++++++++++++++++++++
> > > > > >    drivers/gpu/drm/ttm/ttm_bo_vm.c   | 65 +----------------------
> > > > > >    include/drm/ttm/ttm_bo.h          |  2 +
> > > > > >    3 files changed, 89 insertions(+), 64 deletions(-)
> > > > > > 
> > > > > > diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c
> > > > > > index d939925efa81..77e760ea7193 100644
> > > > > > --- a/drivers/gpu/drm/ttm/ttm_bo_util.c
> > > > > > +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
> > > > > > @@ -919,3 +919,89 @@ s64 ttm_lru_walk_for_evict(struct ttm_lru_walk *walk, struct ttm_device *bdev,
> > > > > >    	return progress;
> > > > > >    }
> > > > > > +
> > > > > > +static int ttm_bo_access_kmap(struct ttm_buffer_object *bo,
> > > > > > +			      unsigned long offset,
> > > > > > +			      void *buf, int len, int write)
> > > > > > +{
> > > > > > +	unsigned long page = offset >> PAGE_SHIFT;
> > > > > > +	unsigned long bytes_left = len;
> > > > > > +	int ret;
> > > > > > +
> > > > > > +	/* Copy a page at a time, that way no extra virtual address
> > > > > > +	 * mapping is needed
> > > > > > +	 */
> > > > > > +	offset -= page << PAGE_SHIFT;
> > > > > > +	do {
> > > > > > +		unsigned long bytes = min(bytes_left, PAGE_SIZE - offset);
> > > > > > +		struct ttm_bo_kmap_obj map;
> > > > > > +		void *ptr;
> > > > > > +		bool is_iomem;
> > > > > > +
> > > > > > +		ret = ttm_bo_kmap(bo, page, 1, &map);
> > > > > > +		if (ret)
> > > > > > +			return ret;
> > > > > > +
> > > > > > +		ptr = (void *)ttm_kmap_obj_virtual(&map, &is_iomem) + offset;
> > > > > > +		WARN_ON_ONCE(is_iomem);
> > > > > > +		if (write)
> > > > > > +			memcpy(ptr, buf, bytes);
> > > > > > +		else
> > > > > > +			memcpy(buf, ptr, bytes);
> > > > > > +		ttm_bo_kunmap(&map);
> > > > > > +
> > > > > > +		page++;
> > > > > > +		buf += bytes;
> > > > > > +		bytes_left -= bytes;
> > > > > > +		offset = 0;
> > > > > > +	} while (bytes_left);
> > > > > > +
> > > > > > +	return len;
> > > > > > +}
> > > > > > +
> > > > > > +/**
> > > > > > + * ttm_bo_access - Helper to access a buffer object
> > > > > > + *
> > > > > > + * @bo: ttm buffer object
> > > > > > + * @offset: access offset into buffer object
> > > > > > + * @buf: pointer to caller memory to read into or write from
> > > > > > + * @len: length of access
> > > > > > + * @write: write access
> > > > > > + *
> > > > > > + * Utility function to access a buffer object. Useful when buffer object cannot
> > > > > > + * be easily mapped (non-contiguous, non-visible, etc...).
> > > > > > + *
> > > > > > + * Returns:
> > > > > > + * @len if successful, negative error code on failure.
> > > > > > + */
> > > > > > +int ttm_bo_access(struct ttm_buffer_object *bo, unsigned long offset,
> > > > > > +		  void *buf, int len, int write)
> > > > > > +{
> > > > > > +	int ret;
> > > > > > +
> > > > > > +	if (len < 1 || (offset + len) > bo->base.size)
> > > > > > +		return -EIO;
> > > > > > +
> > > > > > +	ret = ttm_bo_reserve(bo, true, false, NULL);
> > > > > > +	if (ret)
> > > > > > +		return ret;
> > > > > > +
> > > > > > +	switch (bo->resource->mem_type) {
> > > > > > +	case TTM_PL_SYSTEM:
> > > > > > +		fallthrough;
> > > > > > +	case TTM_PL_TT:
> > > > > > +		ret = ttm_bo_access_kmap(bo, offset, buf, len, write);
> > > > > > +		break;
> > > > > > +	default:
> > > > > > +		if (bo->bdev->funcs->access_memory)
> > > > > > +			ret = bo->bdev->funcs->access_memory
> > > > > > +				(bo, offset, buf, len, write);
> > > > > > +		else
> > > > > > +			ret = -EIO;
> > > > > > +	}
> > > > > > +
> > > > > > +	ttm_bo_unreserve(bo);
> > > > > > +
> > > > > > +	return ret;
> > > > > > +}
> > > > > > +EXPORT_SYMBOL(ttm_bo_access);
> > > > > > diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c
> > > > > > index 2c699ed1963a..20b1e5f78684 100644
> > > > > > --- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
> > > > > > +++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
> > > > > > @@ -366,45 +366,6 @@ void ttm_bo_vm_close(struct vm_area_struct *vma)
> > > > > >    }
> > > > > >    EXPORT_SYMBOL(ttm_bo_vm_close);
> > > > > > -static int ttm_bo_vm_access_kmap(struct ttm_buffer_object *bo,
> > > > > > -				 unsigned long offset,
> > > > > > -				 uint8_t *buf, int len, int write)
> > > > > > -{
> > > > > > -	unsigned long page = offset >> PAGE_SHIFT;
> > > > > > -	unsigned long bytes_left = len;
> > > > > > -	int ret;
> > > > > > -
> > > > > > -	/* Copy a page at a time, that way no extra virtual address
> > > > > > -	 * mapping is needed
> > > > > > -	 */
> > > > > > -	offset -= page << PAGE_SHIFT;
> > > > > > -	do {
> > > > > > -		unsigned long bytes = min(bytes_left, PAGE_SIZE - offset);
> > > > > > -		struct ttm_bo_kmap_obj map;
> > > > > > -		void *ptr;
> > > > > > -		bool is_iomem;
> > > > > > -
> > > > > > -		ret = ttm_bo_kmap(bo, page, 1, &map);
> > > > > > -		if (ret)
> > > > > > -			return ret;
> > > > > > -
> > > > > > -		ptr = (uint8_t *)ttm_kmap_obj_virtual(&map, &is_iomem) + offset;
> > > > > > -		WARN_ON_ONCE(is_iomem);
> > > > > > -		if (write)
> > > > > > -			memcpy(ptr, buf, bytes);
> > > > > > -		else
> > > > > > -			memcpy(buf, ptr, bytes);
> > > > > > -		ttm_bo_kunmap(&map);
> > > > > > -
> > > > > > -		page++;
> > > > > > -		buf += bytes;
> > > > > > -		bytes_left -= bytes;
> > > > > > -		offset = 0;
> > > > > > -	} while (bytes_left);
> > > > > > -
> > > > > > -	return len;
> > > > > > -}
> > > > > > -
> > > > > >    int ttm_bo_vm_access(struct vm_area_struct *vma, unsigned long addr,
> > > > > >    		     void *buf, int len, int write)
> > > > > >    {
> > > > > > @@ -412,32 +373,8 @@ int ttm_bo_vm_access(struct vm_area_struct *vma, unsigned long addr,
> > > > > >    	unsigned long offset = (addr) - vma->vm_start +
> > > > > >    		((vma->vm_pgoff - drm_vma_node_start(&bo->base.vma_node))
> > > > > >    		 << PAGE_SHIFT);
> > > > > > -	int ret;
> > > > > > -
> > > > > > -	if (len < 1 || (offset + len) > bo->base.size)
> > > > > > -		return -EIO;
> > > > > > -	ret = ttm_bo_reserve(bo, true, false, NULL);
> > > > > > -	if (ret)
> > > > > > -		return ret;
> > > > > > -
> > > > > > -	switch (bo->resource->mem_type) {
> > > > > > -	case TTM_PL_SYSTEM:
> > > > > > -		fallthrough;
> > > > > > -	case TTM_PL_TT:
> > > > > > -		ret = ttm_bo_vm_access_kmap(bo, offset, buf, len, write);
> > > > > > -		break;
> > > > > > -	default:
> > > > > > -		if (bo->bdev->funcs->access_memory)
> > > > > > -			ret = bo->bdev->funcs->access_memory(
> > > > > > -				bo, offset, buf, len, write);
> > > > > > -		else
> > > > > > -			ret = -EIO;
> > > > > > -	}
> > > > > > -
> > > > > > -	ttm_bo_unreserve(bo);
> > > > > > -
> > > > > > -	return ret;
> > > > > > +	return ttm_bo_access(bo, offset, buf, len, write);
> > > > > >    }
> > > > > >    EXPORT_SYMBOL(ttm_bo_vm_access);
> > > > > > diff --git a/include/drm/ttm/ttm_bo.h b/include/drm/ttm/ttm_bo.h
> > > > > > index 5804408815be..8ea11cd8df39 100644
> > > > > > --- a/include/drm/ttm/ttm_bo.h
> > > > > > +++ b/include/drm/ttm/ttm_bo.h
> > > > > > @@ -421,6 +421,8 @@ void ttm_bo_unpin(struct ttm_buffer_object *bo);
> > > > > >    int ttm_bo_evict_first(struct ttm_device *bdev,
> > > > > >    		       struct ttm_resource_manager *man,
> > > > > >    		       struct ttm_operation_ctx *ctx);
> > > > > > +int ttm_bo_access(struct ttm_buffer_object *bo, unsigned long offset,
> > > > > > +		  void *buf, int len, int write);
> > > > > >    vm_fault_t ttm_bo_vm_reserve(struct ttm_buffer_object *bo,
> > > > > >    			     struct vm_fault *vmf);
> > > > > >    vm_fault_t ttm_bo_vm_fault_reserved(struct vm_fault *vmf,
> > > > > > -- 
> > > > > > 2.34.1
> > > > > > 

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 2/8] drm/ttm: Add ttm_bo_access
  2024-11-05 18:35             ` Matthew Brost
@ 2024-11-06  9:48               ` Christian König
  2024-11-06 15:25                 ` Matthew Brost
  0 siblings, 1 reply; 56+ messages in thread
From: Christian König @ 2024-11-06  9:48 UTC (permalink / raw)
  To: Matthew Brost
  Cc: Christian König, Rodrigo Vivi, Huang Rui, intel-xe,
	dri-devel, matthew.auld

Am 05.11.24 um 19:35 schrieb Matthew Brost:
> [SNIP]
>> Well we spend quite some time removing single page mappings from device
>> drivers.
>>
>> The only remaining use case of ttm_bo_kmap() with just one page is the
>> ttm_bo_vm_access_kmap() function and I was really hoping to make that one
>> TTM internal at some point.
>>
> This is still static, right? I suppose this exposes this to the outside
> world though in another place. I asume there is a reason we can't use
> vmap in ttm_bo_vm_access?

Well no, the point is we don't want to.

There is a huge push from upstream to avoid using kmap/vmap if possible.

>>>> You need a really good justification to bring that back.
>>>>
>>> The use case is EuDebugger requires essentially the same functionality
>>> as ptrace -> vm_access.
>> Then why don't you use ptrace in the first place?
>>
> I think the debugger speaks in GPU address space thus needs to access
> via the GPU VM -> BO, userptrs.

Exactly that is strictly forbidden. You can't access userptrs through this.

That's one of the major reasons why upstream has pushed back on using 
kmap so massively.

Can you fully describe your use case? In other words what exactly is 
your debugger trying to do?

>>> TTM mapping non-contiguous VRAM doesn't work unless I'm blind. User BOs
>>> which the EuDebugger accesses can be non-contiguous, hence the new
>>> helper.
>> Then why don't you handle that inside the driver in the first place instead
>> of going through a TTM midlayer?
>>
> Well common code always seems like a good idea to me. Can do this if you
> insist though.
>
> What if I change my new helper ttm_bo_access to be based on vmap for
> SYSTEM / TT but honestly that seems wasteful too for a temporary
> access mapping.

Well, I think we need to take a step back. The major question is what
your use case is, and whether that use case is valid or raises security
concerns.

For example, userptrs are imported anonymous pages the GPU has a DMA
mapping for. Re-mapping them into a user address space for debugging or
even accessing them through the ptrace interface is strictly forbidden.

We already had people trying to do exactly that and it did not end well
at all.

Regards,
Christian.

>
> With this, I strongly prefer the code as is.
>
> Matt
>
>> Regards,
>> Christian.
>>
>>> Matt
>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>>>> Matt
>>>>>>
>>>>>>> Reported-by: Christoph Manszewski<christoph.manszewski@intel.com>
>>>>>>> Suggested-by: Thomas Hellström<thomas.hellstrom@linux.intel.com>
>>>>>>> Signed-off-by: Matthew Brost<matthew.brost@intel.com>
>>>>>>> Tested-by: Mika Kuoppala<mika.kuoppala@linux.intel.com>
>>>>>>> Reviewed-by: Matthew Auld<matthew.auld@intel.com>
>>>>>>> ---
>>>>>>>     drivers/gpu/drm/ttm/ttm_bo_util.c | 86 +++++++++++++++++++++++++++++++
>>>>>>>     drivers/gpu/drm/ttm/ttm_bo_vm.c   | 65 +----------------------
>>>>>>>     include/drm/ttm/ttm_bo.h          |  2 +
>>>>>>>     3 files changed, 89 insertions(+), 64 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c
>>>>>>> index d939925efa81..77e760ea7193 100644
>>>>>>> --- a/drivers/gpu/drm/ttm/ttm_bo_util.c
>>>>>>> +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
>>>>>>> @@ -919,3 +919,89 @@ s64 ttm_lru_walk_for_evict(struct ttm_lru_walk *walk, struct ttm_device *bdev,
>>>>>>>     	return progress;
>>>>>>>     }
>>>>>>> +
>>>>>>> +static int ttm_bo_access_kmap(struct ttm_buffer_object *bo,
>>>>>>> +			      unsigned long offset,
>>>>>>> +			      void *buf, int len, int write)
>>>>>>> +{
>>>>>>> +	unsigned long page = offset >> PAGE_SHIFT;
>>>>>>> +	unsigned long bytes_left = len;
>>>>>>> +	int ret;
>>>>>>> +
>>>>>>> +	/* Copy a page at a time, that way no extra virtual address
>>>>>>> +	 * mapping is needed
>>>>>>> +	 */
>>>>>>> +	offset -= page << PAGE_SHIFT;
>>>>>>> +	do {
>>>>>>> +		unsigned long bytes = min(bytes_left, PAGE_SIZE - offset);
>>>>>>> +		struct ttm_bo_kmap_obj map;
>>>>>>> +		void *ptr;
>>>>>>> +		bool is_iomem;
>>>>>>> +
>>>>>>> +		ret = ttm_bo_kmap(bo, page, 1, &map);
>>>>>>> +		if (ret)
>>>>>>> +			return ret;
>>>>>>> +
>>>>>>> +		ptr = (void *)ttm_kmap_obj_virtual(&map, &is_iomem) + offset;
>>>>>>> +		WARN_ON_ONCE(is_iomem);
>>>>>>> +		if (write)
>>>>>>> +			memcpy(ptr, buf, bytes);
>>>>>>> +		else
>>>>>>> +			memcpy(buf, ptr, bytes);
>>>>>>> +		ttm_bo_kunmap(&map);
>>>>>>> +
>>>>>>> +		page++;
>>>>>>> +		buf += bytes;
>>>>>>> +		bytes_left -= bytes;
>>>>>>> +		offset = 0;
>>>>>>> +	} while (bytes_left);
>>>>>>> +
>>>>>>> +	return len;
>>>>>>> +}
>>>>>>> +
>>>>>>> +/**
>>>>>>> + * ttm_bo_access - Helper to access a buffer object
>>>>>>> + *
>>>>>>> + * @bo: ttm buffer object
>>>>>>> + * @offset: access offset into buffer object
>>>>>>> + * @buf: pointer to caller memory to read into or write from
>>>>>>> + * @len: length of access
>>>>>>> + * @write: write access
>>>>>>> + *
>>>>>>> + * Utility function to access a buffer object. Useful when buffer object cannot
>>>>>>> + * be easily mapped (non-contiguous, non-visible, etc...).
>>>>>>> + *
>>>>>>> + * Returns:
>>>>>>> + * @len if successful, negative error code on failure.
>>>>>>> + */
>>>>>>> +int ttm_bo_access(struct ttm_buffer_object *bo, unsigned long offset,
>>>>>>> +		  void *buf, int len, int write)
>>>>>>> +{
>>>>>>> +	int ret;
>>>>>>> +
>>>>>>> +	if (len < 1 || (offset + len) > bo->base.size)
>>>>>>> +		return -EIO;
>>>>>>> +
>>>>>>> +	ret = ttm_bo_reserve(bo, true, false, NULL);
>>>>>>> +	if (ret)
>>>>>>> +		return ret;
>>>>>>> +
>>>>>>> +	switch (bo->resource->mem_type) {
>>>>>>> +	case TTM_PL_SYSTEM:
>>>>>>> +		fallthrough;
>>>>>>> +	case TTM_PL_TT:
>>>>>>> +		ret = ttm_bo_access_kmap(bo, offset, buf, len, write);
>>>>>>> +		break;
>>>>>>> +	default:
>>>>>>> +		if (bo->bdev->funcs->access_memory)
>>>>>>> +			ret = bo->bdev->funcs->access_memory
>>>>>>> +				(bo, offset, buf, len, write);
>>>>>>> +		else
>>>>>>> +			ret = -EIO;
>>>>>>> +	}
>>>>>>> +
>>>>>>> +	ttm_bo_unreserve(bo);
>>>>>>> +
>>>>>>> +	return ret;
>>>>>>> +}
>>>>>>> +EXPORT_SYMBOL(ttm_bo_access);
>>>>>>> diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c
>>>>>>> index 2c699ed1963a..20b1e5f78684 100644
>>>>>>> --- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
>>>>>>> +++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
>>>>>>> @@ -366,45 +366,6 @@ void ttm_bo_vm_close(struct vm_area_struct *vma)
>>>>>>>     }
>>>>>>>     EXPORT_SYMBOL(ttm_bo_vm_close);
>>>>>>> -static int ttm_bo_vm_access_kmap(struct ttm_buffer_object *bo,
>>>>>>> -				 unsigned long offset,
>>>>>>> -				 uint8_t *buf, int len, int write)
>>>>>>> -{
>>>>>>> -	unsigned long page = offset >> PAGE_SHIFT;
>>>>>>> -	unsigned long bytes_left = len;
>>>>>>> -	int ret;
>>>>>>> -
>>>>>>> -	/* Copy a page at a time, that way no extra virtual address
>>>>>>> -	 * mapping is needed
>>>>>>> -	 */
>>>>>>> -	offset -= page << PAGE_SHIFT;
>>>>>>> -	do {
>>>>>>> -		unsigned long bytes = min(bytes_left, PAGE_SIZE - offset);
>>>>>>> -		struct ttm_bo_kmap_obj map;
>>>>>>> -		void *ptr;
>>>>>>> -		bool is_iomem;
>>>>>>> -
>>>>>>> -		ret = ttm_bo_kmap(bo, page, 1, &map);
>>>>>>> -		if (ret)
>>>>>>> -			return ret;
>>>>>>> -
>>>>>>> -		ptr = (uint8_t *)ttm_kmap_obj_virtual(&map, &is_iomem) + offset;
>>>>>>> -		WARN_ON_ONCE(is_iomem);
>>>>>>> -		if (write)
>>>>>>> -			memcpy(ptr, buf, bytes);
>>>>>>> -		else
>>>>>>> -			memcpy(buf, ptr, bytes);
>>>>>>> -		ttm_bo_kunmap(&map);
>>>>>>> -
>>>>>>> -		page++;
>>>>>>> -		buf += bytes;
>>>>>>> -		bytes_left -= bytes;
>>>>>>> -		offset = 0;
>>>>>>> -	} while (bytes_left);
>>>>>>> -
>>>>>>> -	return len;
>>>>>>> -}
>>>>>>> -
>>>>>>>     int ttm_bo_vm_access(struct vm_area_struct *vma, unsigned long addr,
>>>>>>>     		     void *buf, int len, int write)
>>>>>>>     {
>>>>>>> @@ -412,32 +373,8 @@ int ttm_bo_vm_access(struct vm_area_struct *vma, unsigned long addr,
>>>>>>>     	unsigned long offset = (addr) - vma->vm_start +
>>>>>>>     		((vma->vm_pgoff - drm_vma_node_start(&bo->base.vma_node))
>>>>>>>     		 << PAGE_SHIFT);
>>>>>>> -	int ret;
>>>>>>> -
>>>>>>> -	if (len < 1 || (offset + len) > bo->base.size)
>>>>>>> -		return -EIO;
>>>>>>> -	ret = ttm_bo_reserve(bo, true, false, NULL);
>>>>>>> -	if (ret)
>>>>>>> -		return ret;
>>>>>>> -
>>>>>>> -	switch (bo->resource->mem_type) {
>>>>>>> -	case TTM_PL_SYSTEM:
>>>>>>> -		fallthrough;
>>>>>>> -	case TTM_PL_TT:
>>>>>>> -		ret = ttm_bo_vm_access_kmap(bo, offset, buf, len, write);
>>>>>>> -		break;
>>>>>>> -	default:
>>>>>>> -		if (bo->bdev->funcs->access_memory)
>>>>>>> -			ret = bo->bdev->funcs->access_memory(
>>>>>>> -				bo, offset, buf, len, write);
>>>>>>> -		else
>>>>>>> -			ret = -EIO;
>>>>>>> -	}
>>>>>>> -
>>>>>>> -	ttm_bo_unreserve(bo);
>>>>>>> -
>>>>>>> -	return ret;
>>>>>>> +	return ttm_bo_access(bo, offset, buf, len, write);
>>>>>>>     }
>>>>>>>     EXPORT_SYMBOL(ttm_bo_vm_access);
>>>>>>> diff --git a/include/drm/ttm/ttm_bo.h b/include/drm/ttm/ttm_bo.h
>>>>>>> index 5804408815be..8ea11cd8df39 100644
>>>>>>> --- a/include/drm/ttm/ttm_bo.h
>>>>>>> +++ b/include/drm/ttm/ttm_bo.h
>>>>>>> @@ -421,6 +421,8 @@ void ttm_bo_unpin(struct ttm_buffer_object *bo);
>>>>>>>     int ttm_bo_evict_first(struct ttm_device *bdev,
>>>>>>>     		       struct ttm_resource_manager *man,
>>>>>>>     		       struct ttm_operation_ctx *ctx);
>>>>>>> +int ttm_bo_access(struct ttm_buffer_object *bo, unsigned long offset,
>>>>>>> +		  void *buf, int len, int write);
>>>>>>>     vm_fault_t ttm_bo_vm_reserve(struct ttm_buffer_object *bo,
>>>>>>>     			     struct vm_fault *vmf);
>>>>>>>     vm_fault_t ttm_bo_vm_fault_reserved(struct vm_fault *vmf,
>>>>>>> -- 
>>>>>>> 2.34.1
>>>>>>>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 2/8] drm/ttm: Add ttm_bo_access
  2024-11-06  9:48               ` Christian König
@ 2024-11-06 15:25                 ` Matthew Brost
  2024-11-06 15:44                   ` Christian König
  0 siblings, 1 reply; 56+ messages in thread
From: Matthew Brost @ 2024-11-06 15:25 UTC (permalink / raw)
  To: Christian König
  Cc: Christian König, Rodrigo Vivi, Huang Rui, intel-xe,
	dri-devel, matthew.auld

On Wed, Nov 06, 2024 at 10:48:45AM +0100, Christian König wrote:
> Am 05.11.24 um 19:35 schrieb Matthew Brost:
> > [SNIP]
> > > Well we spend quite some time removing single page mappings from device
> > > drivers.
> > > 
> > > The only remaining use case of ttm_bo_kmap() with just one page is the
> > > ttm_bo_vm_access_kmap() function and I was really hoping to make that one
> > > TTM internal at some point.
> > > 
> > This is still static, right? I suppose this exposes this to the outside
> > world though in another place. I asume there is a reason we can't use
> > vmap in ttm_bo_vm_access?
> 
> Well no, the point is we don't want to.
> 
> There is a huge push from upstream to avoid using kmap/vmap if possible.
> 
> > > > > You need a really good justification to bring that back.
> > > > > 
> > > > The use case is EuDebugger requires essentially the same functionality
> > > > as ptrace -> vm_access.
> > > Then why don't you use ptrace in the first place?
> > > 
> > I think the debugger speaks in GPU address space thus needs to access
> > via the GPU VM -> BO, userptrs.
> 
> Exactly that is strictly forbidden. You can't access userptrs through this.
> 

My mistake for mentioning userptr—I clearly caused confusion. This patch
itself has nothing to do with userptr; it is accessing a BO. In Xe, a
userptr doesn't have a BO, unlike in AMDGPU, where you have BOs for
userptrs.

The above use case was an example of modifying a GPU program with
breakpoints, speaking in GPU address space rather than CPU address
space. Hence, we cannot use ptrace. Userptr is a possible example, but
that access path in the code is different and, again, has nothing to do
with BO access in this patch.

> That's one of the major reasons why upstream has pushed back on using kmap
> so massively.

Userptr access is not part of this patch—it will be a separate code
path, so this seemingly does not apply.

> 
> Can you fully describe your use case? In other words what exactly is your
> debugger trying to do?

See above; I hope I've made this clearer.

Also, I'm not really an expert on Eudebug, as I haven't been involved in
the development aside from reviewing its interaction with the core of
Xe. Any further explanation would likely require me to loop in a
colleague.

> 
> > > > TTM mapping non-contiguous VRAM doesn't work unless I'm blind. User BOs
> > > > which the EuDebugger accesses can be non-contiguous, hence the new
> > > > helper.
> > > Then why don't you handle that inside the driver in the first place instead
> > > of going through a TTM midlayer?
> > > 
> > Well common code always seems like a good idea to me. Can do this if you
> > insist though.
> > 
> > What if I change my new helper ttm_bo_access to be based on vmap for
> > SYSTEM / TT but honestly that seems wasteful too for a temporary
> > access mapping.
> 
> Well, I think we need to take a step back. The major question is what is
> your use case and is that use case valid or causes security concerns.
> 
> For example userptrs are imported anonymous pages the GPU has a DMA mapping
> for. Re-mapping them into an user address space for debugging or even
> accessing them through the ptrace interface is strictly forbidden.
> 
> We already had people trying to do exactly that and it ended not well at
> all.
> 

Again, if we can focus on what this patch is doing—accessing a BO, not a
userptr—I think that will help progress here.

To bring things together: "There is a huge push from upstream to avoid
using kmap/vmap if possible." How would you suggest accessing a BO then?
kmap/vmap are used everywhere in the DRM subsystem to access BOs, so I’m
failing to see the problem with adding a simple helper based on existing
code.
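
For readers following the thread, here is a minimal userspace sketch of
the page-at-a-time copy that the quoted ttm_bo_access_kmap() performs. It
is a model only: struct model_bo, model_bo_access() and the MODEL_PAGE_*
macros are hypothetical names, the "BO" is just an array of separately
allocated pages, and the kernel's ttm_bo_kmap()/ttm_bo_kunmap() calls and
BO reservation are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for PAGE_SHIFT / PAGE_SIZE; not kernel macros. */
#define MODEL_PAGE_SHIFT 12
#define MODEL_PAGE_SIZE  (1UL << MODEL_PAGE_SHIFT)

/* A buffer object modeled as an array of non-contiguous pages. */
struct model_bo {
	unsigned char **pages;	/* pages[i] points at MODEL_PAGE_SIZE bytes */
	size_t num_pages;
};

/*
 * Copy len bytes between buf and the BO, starting at offset, touching one
 * page per iteration so that no contiguous mapping of the BO is needed.
 * Returns len on success, -1 if the request is empty or out of range.
 */
static int model_bo_access(struct model_bo *bo, unsigned long offset,
			   void *buf, int len, int write)
{
	unsigned long page = offset >> MODEL_PAGE_SHIFT;
	unsigned long bytes_left;
	unsigned char *cursor = buf;

	assert(bo && buf);

	if (len < 1 ||
	    offset + (unsigned long)len > bo->num_pages * MODEL_PAGE_SIZE)
		return -1;

	bytes_left = (unsigned long)len;
	/* Reduce offset to the offset within the first page. */
	offset -= page << MODEL_PAGE_SHIFT;
	do {
		unsigned long room = MODEL_PAGE_SIZE - offset;
		unsigned long bytes = bytes_left < room ? bytes_left : room;
		unsigned char *ptr = bo->pages[page] + offset;

		if (write)
			memcpy(ptr, cursor, bytes);
		else
			memcpy(cursor, ptr, bytes);

		page++;
		cursor += bytes;
		bytes_left -= bytes;
		offset = 0;	/* later pages are copied from their start */
	} while (bytes_left);

	return len;
}
```

As in the quoted patch, only the first iteration can start at a non-zero
offset within a page; every later page is copied from its beginning.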

Matt

> Regards,
> Christian.
> 
> > 
> > With this, I strongly prefer the code as is.
> > 
> > Matt
> > 
> > > Regards,
> > > Christian.
> > > 
> > > > Matt
> > > > 
> > > > > Regards,
> > > > > Christian.
> > > > > 
> > > > > > > Matt
> > > > > > > 
> > > > > > > > Reported-by: Christoph Manszewski<christoph.manszewski@intel.com>
> > > > > > > > Suggested-by: Thomas Hellström<thomas.hellstrom@linux.intel.com>
> > > > > > > > Signed-off-by: Matthew Brost<matthew.brost@intel.com>
> > > > > > > > Tested-by: Mika Kuoppala<mika.kuoppala@linux.intel.com>
> > > > > > > > Reviewed-by: Matthew Auld<matthew.auld@intel.com>
> > > > > > > > ---
> > > > > > > >     drivers/gpu/drm/ttm/ttm_bo_util.c | 86 +++++++++++++++++++++++++++++++
> > > > > > > >     drivers/gpu/drm/ttm/ttm_bo_vm.c   | 65 +----------------------
> > > > > > > >     include/drm/ttm/ttm_bo.h          |  2 +
> > > > > > > >     3 files changed, 89 insertions(+), 64 deletions(-)
> > > > > > > > 
> > > > > > > > diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c
> > > > > > > > index d939925efa81..77e760ea7193 100644
> > > > > > > > --- a/drivers/gpu/drm/ttm/ttm_bo_util.c
> > > > > > > > +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
> > > > > > > > @@ -919,3 +919,89 @@ s64 ttm_lru_walk_for_evict(struct ttm_lru_walk *walk, struct ttm_device *bdev,
> > > > > > > >     	return progress;
> > > > > > > >     }
> > > > > > > > +
> > > > > > > > +static int ttm_bo_access_kmap(struct ttm_buffer_object *bo,
> > > > > > > > +			      unsigned long offset,
> > > > > > > > +			      void *buf, int len, int write)
> > > > > > > > +{
> > > > > > > > +	unsigned long page = offset >> PAGE_SHIFT;
> > > > > > > > +	unsigned long bytes_left = len;
> > > > > > > > +	int ret;
> > > > > > > > +
> > > > > > > > +	/* Copy a page at a time, that way no extra virtual address
> > > > > > > > +	 * mapping is needed
> > > > > > > > +	 */
> > > > > > > > +	offset -= page << PAGE_SHIFT;
> > > > > > > > +	do {
> > > > > > > > +		unsigned long bytes = min(bytes_left, PAGE_SIZE - offset);
> > > > > > > > +		struct ttm_bo_kmap_obj map;
> > > > > > > > +		void *ptr;
> > > > > > > > +		bool is_iomem;
> > > > > > > > +
> > > > > > > > +		ret = ttm_bo_kmap(bo, page, 1, &map);
> > > > > > > > +		if (ret)
> > > > > > > > +			return ret;
> > > > > > > > +
> > > > > > > > +		ptr = (void *)ttm_kmap_obj_virtual(&map, &is_iomem) + offset;
> > > > > > > > +		WARN_ON_ONCE(is_iomem);
> > > > > > > > +		if (write)
> > > > > > > > +			memcpy(ptr, buf, bytes);
> > > > > > > > +		else
> > > > > > > > +			memcpy(buf, ptr, bytes);
> > > > > > > > +		ttm_bo_kunmap(&map);
> > > > > > > > +
> > > > > > > > +		page++;
> > > > > > > > +		buf += bytes;
> > > > > > > > +		bytes_left -= bytes;
> > > > > > > > +		offset = 0;
> > > > > > > > +	} while (bytes_left);
> > > > > > > > +
> > > > > > > > +	return len;
> > > > > > > > +}
> > > > > > > > +
> > > > > > > > +/**
> > > > > > > > + * ttm_bo_access - Helper to access a buffer object
> > > > > > > > + *
> > > > > > > > + * @bo: ttm buffer object
> > > > > > > > + * @offset: access offset into buffer object
> > > > > > > > + * @buf: pointer to caller memory to read into or write from
> > > > > > > > + * @len: length of access
> > > > > > > > + * @write: write access
> > > > > > > > + *
> > > > > > > > + * Utility function to access a buffer object. Useful when buffer object cannot
> > > > > > > > + * be easily mapped (non-contiguous, non-visible, etc...).
> > > > > > > > + *
> > > > > > > > + * Returns:
> > > > > > > > + * @len if successful, negative error code on failure.
> > > > > > > > + */
> > > > > > > > +int ttm_bo_access(struct ttm_buffer_object *bo, unsigned long offset,
> > > > > > > > +		  void *buf, int len, int write)
> > > > > > > > +{
> > > > > > > > +	int ret;
> > > > > > > > +
> > > > > > > > +	if (len < 1 || (offset + len) > bo->base.size)
> > > > > > > > +		return -EIO;
> > > > > > > > +
> > > > > > > > +	ret = ttm_bo_reserve(bo, true, false, NULL);
> > > > > > > > +	if (ret)
> > > > > > > > +		return ret;
> > > > > > > > +
> > > > > > > > +	switch (bo->resource->mem_type) {
> > > > > > > > +	case TTM_PL_SYSTEM:
> > > > > > > > +		fallthrough;
> > > > > > > > +	case TTM_PL_TT:
> > > > > > > > +		ret = ttm_bo_access_kmap(bo, offset, buf, len, write);
> > > > > > > > +		break;
> > > > > > > > +	default:
> > > > > > > > +		if (bo->bdev->funcs->access_memory)
> > > > > > > > +			ret = bo->bdev->funcs->access_memory
> > > > > > > > +				(bo, offset, buf, len, write);
> > > > > > > > +		else
> > > > > > > > +			ret = -EIO;
> > > > > > > > +	}
> > > > > > > > +
> > > > > > > > +	ttm_bo_unreserve(bo);
> > > > > > > > +
> > > > > > > > +	return ret;
> > > > > > > > +}
> > > > > > > > +EXPORT_SYMBOL(ttm_bo_access);
> > > > > > > > diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c
> > > > > > > > index 2c699ed1963a..20b1e5f78684 100644
> > > > > > > > --- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
> > > > > > > > +++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
> > > > > > > > @@ -366,45 +366,6 @@ void ttm_bo_vm_close(struct vm_area_struct *vma)
> > > > > > > >     }
> > > > > > > >     EXPORT_SYMBOL(ttm_bo_vm_close);
> > > > > > > > -static int ttm_bo_vm_access_kmap(struct ttm_buffer_object *bo,
> > > > > > > > -				 unsigned long offset,
> > > > > > > > -				 uint8_t *buf, int len, int write)
> > > > > > > > -{
> > > > > > > > -	unsigned long page = offset >> PAGE_SHIFT;
> > > > > > > > -	unsigned long bytes_left = len;
> > > > > > > > -	int ret;
> > > > > > > > -
> > > > > > > > -	/* Copy a page at a time, that way no extra virtual address
> > > > > > > > -	 * mapping is needed
> > > > > > > > -	 */
> > > > > > > > -	offset -= page << PAGE_SHIFT;
> > > > > > > > -	do {
> > > > > > > > -		unsigned long bytes = min(bytes_left, PAGE_SIZE - offset);
> > > > > > > > -		struct ttm_bo_kmap_obj map;
> > > > > > > > -		void *ptr;
> > > > > > > > -		bool is_iomem;
> > > > > > > > -
> > > > > > > > -		ret = ttm_bo_kmap(bo, page, 1, &map);
> > > > > > > > -		if (ret)
> > > > > > > > -			return ret;
> > > > > > > > -
> > > > > > > > -		ptr = (uint8_t *)ttm_kmap_obj_virtual(&map, &is_iomem) + offset;
> > > > > > > > -		WARN_ON_ONCE(is_iomem);
> > > > > > > > -		if (write)
> > > > > > > > -			memcpy(ptr, buf, bytes);
> > > > > > > > -		else
> > > > > > > > -			memcpy(buf, ptr, bytes);
> > > > > > > > -		ttm_bo_kunmap(&map);
> > > > > > > > -
> > > > > > > > -		page++;
> > > > > > > > -		buf += bytes;
> > > > > > > > -		bytes_left -= bytes;
> > > > > > > > -		offset = 0;
> > > > > > > > -	} while (bytes_left);
> > > > > > > > -
> > > > > > > > -	return len;
> > > > > > > > -}
> > > > > > > > -
> > > > > > > >     int ttm_bo_vm_access(struct vm_area_struct *vma, unsigned long addr,
> > > > > > > >     		     void *buf, int len, int write)
> > > > > > > >     {
> > > > > > > > @@ -412,32 +373,8 @@ int ttm_bo_vm_access(struct vm_area_struct *vma, unsigned long addr,
> > > > > > > >     	unsigned long offset = (addr) - vma->vm_start +
> > > > > > > >     		((vma->vm_pgoff - drm_vma_node_start(&bo->base.vma_node))
> > > > > > > >     		 << PAGE_SHIFT);
> > > > > > > > -	int ret;
> > > > > > > > -
> > > > > > > > -	if (len < 1 || (offset + len) > bo->base.size)
> > > > > > > > -		return -EIO;
> > > > > > > > -	ret = ttm_bo_reserve(bo, true, false, NULL);
> > > > > > > > -	if (ret)
> > > > > > > > -		return ret;
> > > > > > > > -
> > > > > > > > -	switch (bo->resource->mem_type) {
> > > > > > > > -	case TTM_PL_SYSTEM:
> > > > > > > > -		fallthrough;
> > > > > > > > -	case TTM_PL_TT:
> > > > > > > > -		ret = ttm_bo_vm_access_kmap(bo, offset, buf, len, write);
> > > > > > > > -		break;
> > > > > > > > -	default:
> > > > > > > > -		if (bo->bdev->funcs->access_memory)
> > > > > > > > -			ret = bo->bdev->funcs->access_memory(
> > > > > > > > -				bo, offset, buf, len, write);
> > > > > > > > -		else
> > > > > > > > -			ret = -EIO;
> > > > > > > > -	}
> > > > > > > > -
> > > > > > > > -	ttm_bo_unreserve(bo);
> > > > > > > > -
> > > > > > > > -	return ret;
> > > > > > > > +	return ttm_bo_access(bo, offset, buf, len, write);
> > > > > > > >     }
> > > > > > > >     EXPORT_SYMBOL(ttm_bo_vm_access);
> > > > > > > > diff --git a/include/drm/ttm/ttm_bo.h b/include/drm/ttm/ttm_bo.h
> > > > > > > > index 5804408815be..8ea11cd8df39 100644
> > > > > > > > --- a/include/drm/ttm/ttm_bo.h
> > > > > > > > +++ b/include/drm/ttm/ttm_bo.h
> > > > > > > > @@ -421,6 +421,8 @@ void ttm_bo_unpin(struct ttm_buffer_object *bo);
> > > > > > > >     int ttm_bo_evict_first(struct ttm_device *bdev,
> > > > > > > >     		       struct ttm_resource_manager *man,
> > > > > > > >     		       struct ttm_operation_ctx *ctx);
> > > > > > > > +int ttm_bo_access(struct ttm_buffer_object *bo, unsigned long offset,
> > > > > > > > +		  void *buf, int len, int write);
> > > > > > > >     vm_fault_t ttm_bo_vm_reserve(struct ttm_buffer_object *bo,
> > > > > > > >     			     struct vm_fault *vmf);
> > > > > > > >     vm_fault_t ttm_bo_vm_fault_reserved(struct vm_fault *vmf,
> > > > > > > > -- 
> > > > > > > > 2.34.1
> > > > > > > > 

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 2/8] drm/ttm: Add ttm_bo_access
  2024-11-06 15:25                 ` Matthew Brost
@ 2024-11-06 15:44                   ` Christian König
  2024-11-06 17:00                     ` Matthew Brost
  0 siblings, 1 reply; 56+ messages in thread
From: Christian König @ 2024-11-06 15:44 UTC (permalink / raw)
  To: Matthew Brost
  Cc: Christian König, Rodrigo Vivi, Huang Rui, intel-xe,
	dri-devel, matthew.auld

Am 06.11.24 um 16:25 schrieb Matthew Brost:
> [SNIP]
>> Can you fully describe your use case? In other words what exactly is your
>> debugger trying to do?
> See above; I hope I've made this clearer.

It at least sounds a little bit better.

> Also, I'm not really an expert on Eudebug, as I haven't been involved in
> the development aside from reviewing its interaction with the core of
> Xe. Any further explanation would likely require me to loop in a
> colleague.

I think that could help since I don't have a clear picture of your use case.


>> Well, I think we need to take a step back. The major question is what is
>> your use case and is that use case valid or causes security concerns.
>>
>> For example userptrs are imported anonymous pages the GPU has a DMA mapping
>> for. Re-mapping them into an user address space for debugging or even
>> accessing them through the ptrace interface is strictly forbidden.
>>
>> We already had people trying to do exactly that and it ended not well at
>> all.
>>
> Again, if we can focus on what this patch is doing—accessing a BO, not a
> userptr—I think that will help progress here.
>
> To bring things together: "There is a huge push from upstream to avoid
> using kmap/vmap if possible." How would you suggest accessing a BO then?

Well that's the whole point: You should *not* access the BO on behalf
of userspace in a peek/poke like interface.

> kmap/vmap are used everywhere in the DRM subsystem to access BOs, so I’m
> failing to see the problem with adding a simple helper based on existing
> code.

What's possible and often done is to do kmap/vmap if you need to
implement a CPU copy, for scanout for example, or for copying/validating
command buffers. But that usually requires accessing the whole BO and
has separate security checks.

When you want to access only a few bytes of a BO, that sounds very much
like a peek/poke style interface, and we have already rejected that more
than once. There even used to be standardized GEM IOCTLs for that which
have been removed by now.

If you need to access BOs which are placed in memory that is not CPU
accessible, then implement the access callback for ptrace; see
amdgpu_ttm_access_memory for an example of how to do this.
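
To make the callback idea concrete, here is a rough userspace model of the
dispatch in the quoted patch: system and TT placements are copied
directly, anything else is handed to a driver-provided access_memory hook,
and a missing hook yields -EIO. All model_* names are hypothetical, the
flat backing array stands in for real placements, and locking
(ttm_bo_reserve/ttm_bo_unreserve) is omitted. This is a sketch, not the
TTM or amdgpu implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical placement types, loosely mirroring TTM_PL_SYSTEM / TTM_PL_TT. */
enum model_mem_type { MODEL_PL_SYSTEM, MODEL_PL_TT, MODEL_PL_VRAM };

struct model_bo;

/* Hypothetical per-device hooks; only the access callback is modeled. */
struct model_funcs {
	int (*access_memory)(struct model_bo *bo, unsigned long offset,
			     void *buf, int len, int write);
};

struct model_bo {
	enum model_mem_type mem_type;
	size_t size;
	unsigned char *backing;		/* flat storage, for the model only */
	const struct model_funcs *funcs;
};

/* Plain copy, standing in for the CPU-mappable (kmap) path. */
static int model_copy(struct model_bo *bo, unsigned long offset,
		      void *buf, int len, int write)
{
	if (write)
		memcpy(bo->backing + offset, buf, (size_t)len);
	else
		memcpy(buf, bo->backing + offset, (size_t)len);
	return len;
}

/* Example driver hook; a real one would reach VRAM through the device. */
static int model_driver_access(struct model_bo *bo, unsigned long offset,
			       void *buf, int len, int write)
{
	return model_copy(bo, offset, buf, len, write);
}

/* Range check, then dispatch on placement as the quoted helper does. */
static int model_bo_access(struct model_bo *bo, unsigned long offset,
			   void *buf, int len, int write)
{
	assert(bo && buf);

	if (len < 1 || offset + (unsigned long)len > bo->size)
		return -EIO;

	switch (bo->mem_type) {
	case MODEL_PL_SYSTEM:
	case MODEL_PL_TT:
		return model_copy(bo, offset, buf, len, write);
	default:
		if (bo->funcs && bo->funcs->access_memory)
			return bo->funcs->access_memory(bo, offset, buf,
							len, write);
		return -EIO;
	}
}
```

The point of the hook is that the driver, not the common code, decides
how memory that the CPU cannot map directly is reached.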

Regards,
Christian.

>
> Matt
>
>> Regards,
>> Christian.
>>
>>> With this, I strongly prefer the code as is.
>>>
>>> Matt
>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>>> Matt
>>>>>
>>>>>> Regards,
>>>>>> Christian.
>>>>>>
>>>>>>>> Matt
>>>>>>>>
>>>>>>>>> Reported-by: Christoph Manszewski<christoph.manszewski@intel.com>
>>>>>>>>> Suggested-by: Thomas Hellström<thomas.hellstrom@linux.intel.com>
>>>>>>>>> Signed-off-by: Matthew Brost<matthew.brost@intel.com>
>>>>>>>>> Tested-by: Mika Kuoppala<mika.kuoppala@linux.intel.com>
>>>>>>>>> Reviewed-by: Matthew Auld<matthew.auld@intel.com>
>>>>>>>>> ---
>>>>>>>>>      drivers/gpu/drm/ttm/ttm_bo_util.c | 86 +++++++++++++++++++++++++++++++
>>>>>>>>>      drivers/gpu/drm/ttm/ttm_bo_vm.c   | 65 +----------------------
>>>>>>>>>      include/drm/ttm/ttm_bo.h          |  2 +
>>>>>>>>>      3 files changed, 89 insertions(+), 64 deletions(-)
>>>>>>>>>
>>>>>>>>> diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c
>>>>>>>>> index d939925efa81..77e760ea7193 100644
>>>>>>>>> --- a/drivers/gpu/drm/ttm/ttm_bo_util.c
>>>>>>>>> +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
>>>>>>>>> @@ -919,3 +919,89 @@ s64 ttm_lru_walk_for_evict(struct ttm_lru_walk *walk, struct ttm_device *bdev,
>>>>>>>>>      	return progress;
>>>>>>>>>      }
>>>>>>>>> +
>>>>>>>>> +static int ttm_bo_access_kmap(struct ttm_buffer_object *bo,
>>>>>>>>> +			      unsigned long offset,
>>>>>>>>> +			      void *buf, int len, int write)
>>>>>>>>> +{
>>>>>>>>> +	unsigned long page = offset >> PAGE_SHIFT;
>>>>>>>>> +	unsigned long bytes_left = len;
>>>>>>>>> +	int ret;
>>>>>>>>> +
>>>>>>>>> +	/* Copy a page at a time, that way no extra virtual address
>>>>>>>>> +	 * mapping is needed
>>>>>>>>> +	 */
>>>>>>>>> +	offset -= page << PAGE_SHIFT;
>>>>>>>>> +	do {
>>>>>>>>> +		unsigned long bytes = min(bytes_left, PAGE_SIZE - offset);
>>>>>>>>> +		struct ttm_bo_kmap_obj map;
>>>>>>>>> +		void *ptr;
>>>>>>>>> +		bool is_iomem;
>>>>>>>>> +
>>>>>>>>> +		ret = ttm_bo_kmap(bo, page, 1, &map);
>>>>>>>>> +		if (ret)
>>>>>>>>> +			return ret;
>>>>>>>>> +
>>>>>>>>> +		ptr = (void *)ttm_kmap_obj_virtual(&map, &is_iomem) + offset;
>>>>>>>>> +		WARN_ON_ONCE(is_iomem);
>>>>>>>>> +		if (write)
>>>>>>>>> +			memcpy(ptr, buf, bytes);
>>>>>>>>> +		else
>>>>>>>>> +			memcpy(buf, ptr, bytes);
>>>>>>>>> +		ttm_bo_kunmap(&map);
>>>>>>>>> +
>>>>>>>>> +		page++;
>>>>>>>>> +		buf += bytes;
>>>>>>>>> +		bytes_left -= bytes;
>>>>>>>>> +		offset = 0;
>>>>>>>>> +	} while (bytes_left);
>>>>>>>>> +
>>>>>>>>> +	return len;
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> +/**
>>>>>>>>> + * ttm_bo_access - Helper to access a buffer object
>>>>>>>>> + *
>>>>>>>>> + * @bo: ttm buffer object
>>>>>>>>> + * @offset: access offset into buffer object
>>>>>>>>> + * @buf: pointer to caller memory to read into or write from
>>>>>>>>> + * @len: length of access
>>>>>>>>> + * @write: write access
>>>>>>>>> + *
>>>>>>>>> + * Utility function to access a buffer object. Useful when buffer object cannot
>>>>>>>>> + * be easily mapped (non-contiguous, non-visible, etc...).
>>>>>>>>> + *
>>>>>>>>> + * Returns:
>>>>>>>>> + * @len if successful, negative error code on failure.
>>>>>>>>> + */
>>>>>>>>> +int ttm_bo_access(struct ttm_buffer_object *bo, unsigned long offset,
>>>>>>>>> +		  void *buf, int len, int write)
>>>>>>>>> +{
>>>>>>>>> +	int ret;
>>>>>>>>> +
>>>>>>>>> +	if (len < 1 || (offset + len) > bo->base.size)
>>>>>>>>> +		return -EIO;
>>>>>>>>> +
>>>>>>>>> +	ret = ttm_bo_reserve(bo, true, false, NULL);
>>>>>>>>> +	if (ret)
>>>>>>>>> +		return ret;
>>>>>>>>> +
>>>>>>>>> +	switch (bo->resource->mem_type) {
>>>>>>>>> +	case TTM_PL_SYSTEM:
>>>>>>>>> +		fallthrough;
>>>>>>>>> +	case TTM_PL_TT:
>>>>>>>>> +		ret = ttm_bo_access_kmap(bo, offset, buf, len, write);
>>>>>>>>> +		break;
>>>>>>>>> +	default:
>>>>>>>>> +		if (bo->bdev->funcs->access_memory)
>>>>>>>>> +			ret = bo->bdev->funcs->access_memory
>>>>>>>>> +				(bo, offset, buf, len, write);
>>>>>>>>> +		else
>>>>>>>>> +			ret = -EIO;
>>>>>>>>> +	}
>>>>>>>>> +
>>>>>>>>> +	ttm_bo_unreserve(bo);
>>>>>>>>> +
>>>>>>>>> +	return ret;
>>>>>>>>> +}
>>>>>>>>> +EXPORT_SYMBOL(ttm_bo_access);
>>>>>>>>> diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c
>>>>>>>>> index 2c699ed1963a..20b1e5f78684 100644
>>>>>>>>> --- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
>>>>>>>>> +++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
>>>>>>>>> @@ -366,45 +366,6 @@ void ttm_bo_vm_close(struct vm_area_struct *vma)
>>>>>>>>>      }
>>>>>>>>>      EXPORT_SYMBOL(ttm_bo_vm_close);
>>>>>>>>> -static int ttm_bo_vm_access_kmap(struct ttm_buffer_object *bo,
>>>>>>>>> -				 unsigned long offset,
>>>>>>>>> -				 uint8_t *buf, int len, int write)
>>>>>>>>> -{
>>>>>>>>> -	unsigned long page = offset >> PAGE_SHIFT;
>>>>>>>>> -	unsigned long bytes_left = len;
>>>>>>>>> -	int ret;
>>>>>>>>> -
>>>>>>>>> -	/* Copy a page at a time, that way no extra virtual address
>>>>>>>>> -	 * mapping is needed
>>>>>>>>> -	 */
>>>>>>>>> -	offset -= page << PAGE_SHIFT;
>>>>>>>>> -	do {
>>>>>>>>> -		unsigned long bytes = min(bytes_left, PAGE_SIZE - offset);
>>>>>>>>> -		struct ttm_bo_kmap_obj map;
>>>>>>>>> -		void *ptr;
>>>>>>>>> -		bool is_iomem;
>>>>>>>>> -
>>>>>>>>> -		ret = ttm_bo_kmap(bo, page, 1, &map);
>>>>>>>>> -		if (ret)
>>>>>>>>> -			return ret;
>>>>>>>>> -
>>>>>>>>> -		ptr = (uint8_t *)ttm_kmap_obj_virtual(&map, &is_iomem) + offset;
>>>>>>>>> -		WARN_ON_ONCE(is_iomem);
>>>>>>>>> -		if (write)
>>>>>>>>> -			memcpy(ptr, buf, bytes);
>>>>>>>>> -		else
>>>>>>>>> -			memcpy(buf, ptr, bytes);
>>>>>>>>> -		ttm_bo_kunmap(&map);
>>>>>>>>> -
>>>>>>>>> -		page++;
>>>>>>>>> -		buf += bytes;
>>>>>>>>> -		bytes_left -= bytes;
>>>>>>>>> -		offset = 0;
>>>>>>>>> -	} while (bytes_left);
>>>>>>>>> -
>>>>>>>>> -	return len;
>>>>>>>>> -}
>>>>>>>>> -
>>>>>>>>>      int ttm_bo_vm_access(struct vm_area_struct *vma, unsigned long addr,
>>>>>>>>>      		     void *buf, int len, int write)
>>>>>>>>>      {
>>>>>>>>> @@ -412,32 +373,8 @@ int ttm_bo_vm_access(struct vm_area_struct *vma, unsigned long addr,
>>>>>>>>>      	unsigned long offset = (addr) - vma->vm_start +
>>>>>>>>>      		((vma->vm_pgoff - drm_vma_node_start(&bo->base.vma_node))
>>>>>>>>>      		 << PAGE_SHIFT);
>>>>>>>>> -	int ret;
>>>>>>>>> -
>>>>>>>>> -	if (len < 1 || (offset + len) > bo->base.size)
>>>>>>>>> -		return -EIO;
>>>>>>>>> -	ret = ttm_bo_reserve(bo, true, false, NULL);
>>>>>>>>> -	if (ret)
>>>>>>>>> -		return ret;
>>>>>>>>> -
>>>>>>>>> -	switch (bo->resource->mem_type) {
>>>>>>>>> -	case TTM_PL_SYSTEM:
>>>>>>>>> -		fallthrough;
>>>>>>>>> -	case TTM_PL_TT:
>>>>>>>>> -		ret = ttm_bo_vm_access_kmap(bo, offset, buf, len, write);
>>>>>>>>> -		break;
>>>>>>>>> -	default:
>>>>>>>>> -		if (bo->bdev->funcs->access_memory)
>>>>>>>>> -			ret = bo->bdev->funcs->access_memory(
>>>>>>>>> -				bo, offset, buf, len, write);
>>>>>>>>> -		else
>>>>>>>>> -			ret = -EIO;
>>>>>>>>> -	}
>>>>>>>>> -
>>>>>>>>> -	ttm_bo_unreserve(bo);
>>>>>>>>> -
>>>>>>>>> -	return ret;
>>>>>>>>> +	return ttm_bo_access(bo, offset, buf, len, write);
>>>>>>>>>      }
>>>>>>>>>      EXPORT_SYMBOL(ttm_bo_vm_access);
>>>>>>>>> diff --git a/include/drm/ttm/ttm_bo.h b/include/drm/ttm/ttm_bo.h
>>>>>>>>> index 5804408815be..8ea11cd8df39 100644
>>>>>>>>> --- a/include/drm/ttm/ttm_bo.h
>>>>>>>>> +++ b/include/drm/ttm/ttm_bo.h
>>>>>>>>> @@ -421,6 +421,8 @@ void ttm_bo_unpin(struct ttm_buffer_object *bo);
>>>>>>>>>      int ttm_bo_evict_first(struct ttm_device *bdev,
>>>>>>>>>      		       struct ttm_resource_manager *man,
>>>>>>>>>      		       struct ttm_operation_ctx *ctx);
>>>>>>>>> +int ttm_bo_access(struct ttm_buffer_object *bo, unsigned long offset,
>>>>>>>>> +		  void *buf, int len, int write);
>>>>>>>>>      vm_fault_t ttm_bo_vm_reserve(struct ttm_buffer_object *bo,
>>>>>>>>>      			     struct vm_fault *vmf);
>>>>>>>>>      vm_fault_t ttm_bo_vm_fault_reserved(struct vm_fault *vmf,
>>>>>>>>> -- 
>>>>>>>>> 2.34.1
>>>>>>>>>

* Re: [PATCH v6 2/8] drm/ttm: Add ttm_bo_access
  2024-11-06 15:44                   ` Christian König
@ 2024-11-06 17:00                     ` Matthew Brost
  2024-11-07  9:44                       ` Christian König
  0 siblings, 1 reply; 56+ messages in thread
From: Matthew Brost @ 2024-11-06 17:00 UTC (permalink / raw)
  To: Christian König
  Cc: Christian König, Rodrigo Vivi, Huang Rui, intel-xe,
	dri-devel, matthew.auld

On Wed, Nov 06, 2024 at 04:44:15PM +0100, Christian König wrote:
> Am 06.11.24 um 16:25 schrieb Matthew Brost:
> > [SNIP]
> > > Can you fully describe your use case? In other words what exactly is your
> > > debugger trying to do?
> > See above; I hope I've made this clearer.
> 
> It at least sounds a little bit better.
> 
> > Also, I'm not really an expert on Eudebug, as I haven't been involved in
> > the development aside from reviewing its interaction with the core of
> > Xe. Any further explanation would likely require me to loop in a
> > colleague.
> 
> I think that could help since I don't have a clear picture of your use case.
> 
> 
> > > Well, I think we need to take a step back. The major question is what is
> > > your use case and is that use case valid or causes security concerns.
> > > 
> > > For example userptrs are imported anonymous pages the GPU has a DMA mapping
> > > for. Re-mapping them into an user address space for debugging or even
> > > accessing them through the ptrace interface is strictly forbidden.
> > > 
> > > We already had people trying to do exactly that and it ended not well at
> > > all.
> > > 
> > Again, if we can focus on what this patch is doing—accessing a BO, not a
> > userptr—I think that will help progress here.
> > 
> > To bring things together: "There is a huge push from upstream to avoid
> > using kmap/vmap if possible." How would you suggest accessing a BO then?
> 
> Well that's the whole point: You should *not* access the BO on behalf of
> userspace in a peek/poke like interface.
> 

This is not a generic interface that anyone can freely access. The same
permissions used by ptrace are checked when opening such an interface.
See [1] [2].

[1] https://patchwork.freedesktop.org/patch/617470/?series=136572&rev=2
[2] https://patchwork.freedesktop.org/patch/617471/?series=136572&rev=2

> > kmap/vmap are used everywhere in the DRM subsystem to access BOs, so I’m
> > failing to see the problem with adding a simple helper based on existing
> > code.
> 
> What's possible and often done is to do kmap/vmap if you need to implement a
> CPU copy for scanout for example or for copying/validating command buffers.
> But that usually requires accessing the whole BO and has separate security
> checks.
> 
> When you want to access only a few bytes of a BO that sounds massively like
> a peek/poke like interface and we have already rejected that more than once.
> There even used to be standardized GEM IOCTLs for that which have been
> removed by now.
> 
> If you need to access BOs which are placed in not CPU accessible memory then
> implement the access callback for ptrace, see amdgpu_ttm_access_memory for
> an example how to do this.
> 

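Ptrace access goes through vm_operations_struct.access → ttm_bo_vm_access.

This series moves the body of ttm_bo_vm_access into a new helper,
ttm_bo_access, with no functional changes; ttm_bo_vm_access now only
computes the offset and calls the helper.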

The above function accesses a BO via kmap if it is in SYSTEM / TT,
which is existing code.
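
To make the page-at-a-time access concrete, here is a minimal user-space
sketch of the same idea. It is only a model, not the TTM code: struct
model_bo, model_bo_access and the MODEL_PAGE_* constants are hypothetical
names invented for illustration, the "pages" are plain heap or stack
buffers, and the bounds check is written in an overflow-safe form rather
than copied from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_SHIFT 12
#define MODEL_PAGE_SIZE  (1UL << MODEL_PAGE_SHIFT)

/* Hypothetical stand-in for a BO whose backing pages are not contiguous. */
struct model_bo {
	unsigned char **pages;	/* one pointer per backing page */
	size_t size;		/* total size in bytes */
};

/*
 * Copy len bytes between buf and the object, one page at a time.
 * Returns len on success, -EIO if the range falls outside the object.
 */
static int model_bo_access(struct model_bo *bo, unsigned long offset,
			   void *buf, int len, int write)
{
	unsigned long page = offset >> MODEL_PAGE_SHIFT;
	unsigned long bytes_left;
	unsigned char *cbuf = buf;

	assert(bo && buf);

	if (len < 1 || offset > bo->size || (size_t)len > bo->size - offset)
		return -EIO;

	bytes_left = len;
	offset -= page << MODEL_PAGE_SHIFT;	/* offset inside the first page */
	do {
		unsigned long bytes = MODEL_PAGE_SIZE - offset;
		unsigned char *ptr = bo->pages[page] + offset;

		if (bytes > bytes_left)
			bytes = bytes_left;

		if (write)
			memcpy(ptr, cbuf, bytes);
		else
			memcpy(cbuf, ptr, bytes);

		page++;
		cbuf += bytes;
		bytes_left -= bytes;
		offset = 0;	/* later pages are accessed from their start */
	} while (bytes_left);

	return len;
}
```

An access that straddles a page boundary is split into two copies, so the
backing pages never need to be virtually contiguous.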

This function is only exposed to user space via ptrace permissions.

In this series, we implement a function [3] similar to
amdgpu_ttm_access_memory for the TTM vfunc access_memory. What is
missing is non-visible CPU memory access, similar to
amdgpu_ttm_access_memory_sdma. This will be addressed in a follow-up and
was omitted in this series given its complexity.

So, this looks more or less identical to AMD's ptrace implementation,
but in GPU address space. Again, I fail to see what the problem is here.
What am I missing?

Matt

[3] https://patchwork.freedesktop.org/patch/622520/?series=140200&rev=6

> Regards,
> Christian.
> 
> > 
> > Matt
> > 
> > > Regards,
> > > Christian.
> > > 
> > > > With this, I strongly prefer the code as is.
> > > > 
> > > > Matt
> > > > 
> > > > > Regards,
> > > > > Christian.
> > > > > 
> > > > > > Matt
> > > > > > 
> > > > > > > Regards,
> > > > > > > Christian.
> > > > > > > 
> > > > > > > > > Matt
> > > > > > > > > 
> > > > > > > > > > Reported-by: Christoph Manszewski<christoph.manszewski@intel.com>
> > > > > > > > > > Suggested-by: Thomas Hellström<thomas.hellstrom@linux.intel.com>
> > > > > > > > > > Signed-off-by: Matthew Brost<matthew.brost@intel.com>
> > > > > > > > > > Tested-by: Mika Kuoppala<mika.kuoppala@linux.intel.com>
> > > > > > > > > > Reviewed-by: Matthew Auld<matthew.auld@intel.com>
> > > > > > > > > > ---
> > > > > > > > > >      drivers/gpu/drm/ttm/ttm_bo_util.c | 86 +++++++++++++++++++++++++++++++
> > > > > > > > > >      drivers/gpu/drm/ttm/ttm_bo_vm.c   | 65 +----------------------
> > > > > > > > > >      include/drm/ttm/ttm_bo.h          |  2 +
> > > > > > > > > >      3 files changed, 89 insertions(+), 64 deletions(-)
> > > > > > > > > > 
> > > > > > > > > > diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c
> > > > > > > > > > index d939925efa81..77e760ea7193 100644
> > > > > > > > > > --- a/drivers/gpu/drm/ttm/ttm_bo_util.c
> > > > > > > > > > +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
> > > > > > > > > > @@ -919,3 +919,89 @@ s64 ttm_lru_walk_for_evict(struct ttm_lru_walk *walk, struct ttm_device *bdev,
> > > > > > > > > >      	return progress;
> > > > > > > > > >      }
> > > > > > > > > > +
> > > > > > > > > > +static int ttm_bo_access_kmap(struct ttm_buffer_object *bo,
> > > > > > > > > > +			      unsigned long offset,
> > > > > > > > > > +			      void *buf, int len, int write)
> > > > > > > > > > +{
> > > > > > > > > > +	unsigned long page = offset >> PAGE_SHIFT;
> > > > > > > > > > +	unsigned long bytes_left = len;
> > > > > > > > > > +	int ret;
> > > > > > > > > > +
> > > > > > > > > > +	/* Copy a page at a time, that way no extra virtual address
> > > > > > > > > > +	 * mapping is needed
> > > > > > > > > > +	 */
> > > > > > > > > > +	offset -= page << PAGE_SHIFT;
> > > > > > > > > > +	do {
> > > > > > > > > > +		unsigned long bytes = min(bytes_left, PAGE_SIZE - offset);
> > > > > > > > > > +		struct ttm_bo_kmap_obj map;
> > > > > > > > > > +		void *ptr;
> > > > > > > > > > +		bool is_iomem;
> > > > > > > > > > +
> > > > > > > > > > +		ret = ttm_bo_kmap(bo, page, 1, &map);
> > > > > > > > > > +		if (ret)
> > > > > > > > > > +			return ret;
> > > > > > > > > > +
> > > > > > > > > > +		ptr = (void *)ttm_kmap_obj_virtual(&map, &is_iomem) + offset;
> > > > > > > > > > +		WARN_ON_ONCE(is_iomem);
> > > > > > > > > > +		if (write)
> > > > > > > > > > +			memcpy(ptr, buf, bytes);
> > > > > > > > > > +		else
> > > > > > > > > > +			memcpy(buf, ptr, bytes);
> > > > > > > > > > +		ttm_bo_kunmap(&map);
> > > > > > > > > > +
> > > > > > > > > > +		page++;
> > > > > > > > > > +		buf += bytes;
> > > > > > > > > > +		bytes_left -= bytes;
> > > > > > > > > > +		offset = 0;
> > > > > > > > > > +	} while (bytes_left);
> > > > > > > > > > +
> > > > > > > > > > +	return len;
> > > > > > > > > > +}
> > > > > > > > > > +
> > > > > > > > > > +/**
> > > > > > > > > > + * ttm_bo_access - Helper to access a buffer object
> > > > > > > > > > + *
> > > > > > > > > > + * @bo: ttm buffer object
> > > > > > > > > > + * @offset: access offset into buffer object
> > > > > > > > > > + * @buf: pointer to caller memory to read into or write from
> > > > > > > > > > + * @len: length of access
> > > > > > > > > > + * @write: write access
> > > > > > > > > > + *
> > > > > > > > > > + * Utility function to access a buffer object. Useful when buffer object cannot
> > > > > > > > > > + * be easily mapped (non-contiguous, non-visible, etc...).
> > > > > > > > > > + *
> > > > > > > > > > + * Returns:
> > > > > > > > > > + * @len if successful, negative error code on failure.
> > > > > > > > > > + */
> > > > > > > > > > +int ttm_bo_access(struct ttm_buffer_object *bo, unsigned long offset,
> > > > > > > > > > +		  void *buf, int len, int write)
> > > > > > > > > > +{
> > > > > > > > > > +	int ret;
> > > > > > > > > > +
> > > > > > > > > > +	if (len < 1 || (offset + len) > bo->base.size)
> > > > > > > > > > +		return -EIO;
> > > > > > > > > > +
> > > > > > > > > > +	ret = ttm_bo_reserve(bo, true, false, NULL);
> > > > > > > > > > +	if (ret)
> > > > > > > > > > +		return ret;
> > > > > > > > > > +
> > > > > > > > > > +	switch (bo->resource->mem_type) {
> > > > > > > > > > +	case TTM_PL_SYSTEM:
> > > > > > > > > > +		fallthrough;
> > > > > > > > > > +	case TTM_PL_TT:
> > > > > > > > > > +		ret = ttm_bo_access_kmap(bo, offset, buf, len, write);
> > > > > > > > > > +		break;
> > > > > > > > > > +	default:
> > > > > > > > > > +		if (bo->bdev->funcs->access_memory)
> > > > > > > > > > +			ret = bo->bdev->funcs->access_memory
> > > > > > > > > > +				(bo, offset, buf, len, write);
> > > > > > > > > > +		else
> > > > > > > > > > +			ret = -EIO;
> > > > > > > > > > +	}
> > > > > > > > > > +
> > > > > > > > > > +	ttm_bo_unreserve(bo);
> > > > > > > > > > +
> > > > > > > > > > +	return ret;
> > > > > > > > > > +}
> > > > > > > > > > +EXPORT_SYMBOL(ttm_bo_access);
> > > > > > > > > > diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c
> > > > > > > > > > index 2c699ed1963a..20b1e5f78684 100644
> > > > > > > > > > --- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
> > > > > > > > > > +++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
> > > > > > > > > > @@ -366,45 +366,6 @@ void ttm_bo_vm_close(struct vm_area_struct *vma)
> > > > > > > > > >      }
> > > > > > > > > >      EXPORT_SYMBOL(ttm_bo_vm_close);
> > > > > > > > > > -static int ttm_bo_vm_access_kmap(struct ttm_buffer_object *bo,
> > > > > > > > > > -				 unsigned long offset,
> > > > > > > > > > -				 uint8_t *buf, int len, int write)
> > > > > > > > > > -{
> > > > > > > > > > -	unsigned long page = offset >> PAGE_SHIFT;
> > > > > > > > > > -	unsigned long bytes_left = len;
> > > > > > > > > > -	int ret;
> > > > > > > > > > -
> > > > > > > > > > -	/* Copy a page at a time, that way no extra virtual address
> > > > > > > > > > -	 * mapping is needed
> > > > > > > > > > -	 */
> > > > > > > > > > -	offset -= page << PAGE_SHIFT;
> > > > > > > > > > -	do {
> > > > > > > > > > -		unsigned long bytes = min(bytes_left, PAGE_SIZE - offset);
> > > > > > > > > > -		struct ttm_bo_kmap_obj map;
> > > > > > > > > > -		void *ptr;
> > > > > > > > > > -		bool is_iomem;
> > > > > > > > > > -
> > > > > > > > > > -		ret = ttm_bo_kmap(bo, page, 1, &map);
> > > > > > > > > > -		if (ret)
> > > > > > > > > > -			return ret;
> > > > > > > > > > -
> > > > > > > > > > -		ptr = (uint8_t *)ttm_kmap_obj_virtual(&map, &is_iomem) + offset;
> > > > > > > > > > -		WARN_ON_ONCE(is_iomem);
> > > > > > > > > > -		if (write)
> > > > > > > > > > -			memcpy(ptr, buf, bytes);
> > > > > > > > > > -		else
> > > > > > > > > > -			memcpy(buf, ptr, bytes);
> > > > > > > > > > -		ttm_bo_kunmap(&map);
> > > > > > > > > > -
> > > > > > > > > > -		page++;
> > > > > > > > > > -		buf += bytes;
> > > > > > > > > > -		bytes_left -= bytes;
> > > > > > > > > > -		offset = 0;
> > > > > > > > > > -	} while (bytes_left);
> > > > > > > > > > -
> > > > > > > > > > -	return len;
> > > > > > > > > > -}
> > > > > > > > > > -
> > > > > > > > > >      int ttm_bo_vm_access(struct vm_area_struct *vma, unsigned long addr,
> > > > > > > > > >      		     void *buf, int len, int write)
> > > > > > > > > >      {
> > > > > > > > > > @@ -412,32 +373,8 @@ int ttm_bo_vm_access(struct vm_area_struct *vma, unsigned long addr,
> > > > > > > > > >      	unsigned long offset = (addr) - vma->vm_start +
> > > > > > > > > >      		((vma->vm_pgoff - drm_vma_node_start(&bo->base.vma_node))
> > > > > > > > > >      		 << PAGE_SHIFT);
> > > > > > > > > > -	int ret;
> > > > > > > > > > -
> > > > > > > > > > -	if (len < 1 || (offset + len) > bo->base.size)
> > > > > > > > > > -		return -EIO;
> > > > > > > > > > -	ret = ttm_bo_reserve(bo, true, false, NULL);
> > > > > > > > > > -	if (ret)
> > > > > > > > > > -		return ret;
> > > > > > > > > > -
> > > > > > > > > > -	switch (bo->resource->mem_type) {
> > > > > > > > > > -	case TTM_PL_SYSTEM:
> > > > > > > > > > -		fallthrough;
> > > > > > > > > > -	case TTM_PL_TT:
> > > > > > > > > > -		ret = ttm_bo_vm_access_kmap(bo, offset, buf, len, write);
> > > > > > > > > > -		break;
> > > > > > > > > > -	default:
> > > > > > > > > > -		if (bo->bdev->funcs->access_memory)
> > > > > > > > > > -			ret = bo->bdev->funcs->access_memory(
> > > > > > > > > > -				bo, offset, buf, len, write);
> > > > > > > > > > -		else
> > > > > > > > > > -			ret = -EIO;
> > > > > > > > > > -	}
> > > > > > > > > > -
> > > > > > > > > > -	ttm_bo_unreserve(bo);
> > > > > > > > > > -
> > > > > > > > > > -	return ret;
> > > > > > > > > > +	return ttm_bo_access(bo, offset, buf, len, write);
> > > > > > > > > >      }
> > > > > > > > > >      EXPORT_SYMBOL(ttm_bo_vm_access);
> > > > > > > > > > diff --git a/include/drm/ttm/ttm_bo.h b/include/drm/ttm/ttm_bo.h
> > > > > > > > > > index 5804408815be..8ea11cd8df39 100644
> > > > > > > > > > --- a/include/drm/ttm/ttm_bo.h
> > > > > > > > > > +++ b/include/drm/ttm/ttm_bo.h
> > > > > > > > > > @@ -421,6 +421,8 @@ void ttm_bo_unpin(struct ttm_buffer_object *bo);
> > > > > > > > > >      int ttm_bo_evict_first(struct ttm_device *bdev,
> > > > > > > > > >      		       struct ttm_resource_manager *man,
> > > > > > > > > >      		       struct ttm_operation_ctx *ctx);
> > > > > > > > > > +int ttm_bo_access(struct ttm_buffer_object *bo, unsigned long offset,
> > > > > > > > > > +		  void *buf, int len, int write);
> > > > > > > > > >      vm_fault_t ttm_bo_vm_reserve(struct ttm_buffer_object *bo,
> > > > > > > > > >      			     struct vm_fault *vmf);
> > > > > > > > > >      vm_fault_t ttm_bo_vm_fault_reserved(struct vm_fault *vmf,
> > > > > > > > > > -- 
> > > > > > > > > > 2.34.1
> > > > > > > > > > 

* Re: [PATCH v6 2/8] drm/ttm: Add ttm_bo_access
  2024-11-06 17:00                     ` Matthew Brost
@ 2024-11-07  9:44                       ` Christian König
  2024-11-11  8:00                         ` Joonas Lahtinen
  0 siblings, 1 reply; 56+ messages in thread
From: Christian König @ 2024-11-07  9:44 UTC (permalink / raw)
  To: Matthew Brost
  Cc: Christian König, Rodrigo Vivi, Huang Rui, intel-xe,
	dri-devel, matthew.auld

Am 06.11.24 um 18:00 schrieb Matthew Brost:
> [SNIP]
> This is not a generic interface that anyone can freely access. The same
> permissions used by ptrace are checked when opening such an interface.
> See [1] [2].
>
> [1]https://patchwork.freedesktop.org/patch/617470/?series=136572&rev=2
> [2]https://patchwork.freedesktop.org/patch/617471/?series=136572&rev=2

Thanks a lot for those pointers, that is exactly what I was looking for.

And yeah, it is what I feared. You are re-implementing existing 
functionality, but see below.

>>> kmap/vmap are used everywhere in the DRM subsystem to access BOs, so I’m
>>> failing to see the problem with adding a simple helper based on existing
>>> code.
>> What's possible and often done is to do kmap/vmap if you need to implement a
>> CPU copy for scanout for example or for copying/validating command buffers.
>> But that usually requires accessing the whole BO and has separate security
>> checks.
>>
>> When you want to access only a few bytes of a BO that sounds massively like
>> a peek/poke like interface and we have already rejected that more than once.
>> There even used to be standardized GEM IOCTLs for that which have been
>> removed by now.
>>
>> If you need to access BOs which are placed in not CPU accessible memory then
>> implement the access callback for ptrace, see amdgpu_ttm_access_memory for
>> an example how to do this.
>>
> Ptrace access via vm_operations_struct.access → ttm_bo_vm_access.
>
> This series renames ttm_bo_vm_access to ttm_bo_access, with no code changes.
>
> The above function accesses a BO via kmap if it is in SYSTEM / TT,
> which is existing code.
>
> This function is only exposed to user space via ptrace permissions.
> In this series, we implement a function [3] similar to
> amdgpu_ttm_access_memory for the TTM vfunc access_memory. What is
> missing is non-visible CPU memory access, similar to
> amdgpu_ttm_access_memory_sdma. This will be addressed in a follow-up and
> was omitted in this series given its complexity.
>
> So, this looks more or less identical to AMD's ptrace implementation,
> but in GPU address space. Again, I fail to see what the problem is here.
> What am I missing?

The main question is why can't you use the existing interfaces directly?

In addition to the peek/poke interface of ptrace, Linux has the
pidfd_getfd system call; see
https://man7.org/linux/man-pages/man2/pidfd_getfd.2.html.

pidfd_getfd() lets you dup() the render node file descriptor into
your gdb process. That in turn gives you all the access you need from
gdb, including mapping BOs and command submission on behalf of the
application.
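
As a rough illustration of that flow, the sketch below duplicates a file
descriptor out of another process. It assumes Linux 5.6 or later, that the
caller passes the ptrace access check for the target, and that the fd
number of the render node inside the target is already known (for example
from /proc/<pid>/fd). dup_fd_from_pid is a hypothetical helper name; raw
syscall() is used because older C libraries do not ship wrappers, and the
fallback syscall numbers are the ones shared by most architectures.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Fallbacks for old headers; most architectures share these numbers. */
#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434
#endif
#ifndef SYS_pidfd_getfd
#define SYS_pidfd_getfd 438
#endif

/*
 * Duplicate file descriptor target_fd of process pid into the caller.
 * Returns the new local fd, or -1 with errno set.
 */
static int dup_fd_from_pid(pid_t pid, int target_fd)
{
	int pidfd, fd, saved_errno;

	assert(target_fd >= 0);

	pidfd = (int)syscall(SYS_pidfd_open, pid, 0);
	if (pidfd < 0)
		return -1;

	fd = (int)syscall(SYS_pidfd_getfd, pidfd, target_fd, 0);

	/* close() may clobber errno, so preserve the interesting value. */
	saved_errno = errno;
	close(pidfd);
	errno = saved_errno;

	return fd;
}
```

The returned descriptor refers to the same open file description as the
one in the target, much like a descriptor received over a UNIX socket.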

As far as I can see that allows for the same functionality as the 
eudebug interface, just without any driver specific code messing with 
ptrace permissions and peek/poke interfaces.

So the question is still why do you need the whole eudebug interface in 
the first place? I might be missing something, but that seems to be 
superfluous from a high level view.

It's true that the AMD KFD part still has similar functionality, but
that is because of the broken KFD design of tying driver state to the
CPU process (which makes it inaccessible to gdb even with an imported
render node fd).

Both Sima and I (and partially Dave as well) have pushed back on the KFD
approach. And the long term plan is to get rid of such
device-driver-specific interfaces, which re-implement existing
functionality just differently.

So you need to have a really really good explanation why the eudebug 
interface is actually necessary.

Regards,
Christian.

>
> Matt
>
> [3]https://patchwork.freedesktop.org/patch/622520/?series=140200&rev=6
>
>> Regards,
>> Christian.
>>
>>> Matt
>>>
>>>> Regards,
>>>> Christian.

* Re: [PATCH v6 2/8] drm/ttm: Add ttm_bo_access
  2024-11-07  9:44                       ` Christian König
@ 2024-11-11  8:00                         ` Joonas Lahtinen
  2024-11-11 10:10                           ` Simona Vetter
  2024-11-11 11:27                           ` Christian König
  0 siblings, 2 replies; 56+ messages in thread
From: Joonas Lahtinen @ 2024-11-11  8:00 UTC (permalink / raw)
  To: Christian König, Matthew Brost
  Cc: Christian König, Rodrigo Vivi, Huang Rui, intel-xe,
	dri-devel, matthew.auld, David Airlie, Simona Vetter

Back from some time off and will try to answer below.

Adding Dave and Sima as this topic has been previously discussed to some
extent, and it will be good to reach a common understanding of what the
series is trying to do and how it differs from the AMD debugging
model.

Quoting Christian König (2024-11-07 11:44:33)
> Am 06.11.24 um 18:00 schrieb Matthew Brost:
> 
>     [SNIP]
> 
>     This is not a generic interface that anyone can freely access. The same
>     permissions used by ptrace are checked when opening such an interface.
>     See [1] [2].
> 
>     [1] https://patchwork.freedesktop.org/patch/617470/?series=136572&rev=2
>     [2] https://patchwork.freedesktop.org/patch/617471/?series=136572&rev=2
> 
> 
> Thanks a lot for those pointers, that is exactly what I was looking for.
> 
> And yeah, it is what I feared. You are re-implementing existing functionality,
> but see below.

Could you elaborate on what this "existing functionality" exactly is?
I do not think this functionality exists at this time.

The EU debugging architecture for Xe specifically avoids the need for GDB
to attach with ptrace to the CPU process or to interfere with the CPU
process for debugging via parasitic threads or similar.

A debugger connection to the DRM driver is opened for a given PID (gated by
the ptrace_may_access check for now), after which all DRM clients of that
PID are exposed to the debugger process.

What we want to expose via that debugger connection is the ability for GDB to
read/write the different GPU VM address spaces (ppGTT for Intel GPUs) just like
the EU threads would see them. Note that the layout of the ppGTT is
completely up to the userspace driver to set up and mostly matches the CPU
address space only partially.

Specifically, as part of reading/writing the ppGTT for debugging purposes,
deep flushes are needed: for example, flushing the instruction cache
when adding/removing breakpoints.

Maybe that explains the background. I elaborate on this some more at the end.

>             kmap/vmap are used everywhere in the DRM subsystem to access BOs, so I’m
>             failing to see the problem with adding a simple helper based on existing
>             code.
> 
>         What's possible and often done is to do kmap/vmap if you need to implement a
>         CPU copy for scanout for example or for copying/validating command buffers.
>         But that usually requires accessing the whole BO and has separate security
>         checks.
> 
>         When you want to access only a few bytes of a BO that sounds massively like
>         a peek/poke like interface and we have already rejected that more than once.
>         There even used to be standardized GEM IOCTLs for that which have been
>         removed by now.

Referring to the explanation at the top: these IOCTLs are not for the debugging
target process to issue. The peek/poke interface is specifically for GDB only,
to facilitate the emulation of memory reads/writes on the GPU address
space as if they were done by the EUs themselves. And to recap: for modifying
instructions, for example (adding/removing a breakpoint), an extra level of
cache flushing is needed which is not available to regular userspace.

I specifically discussed the difference with Sima before moving forward with this
design originally. If something has changed since then, I'm of course happy to rediscuss.

However, if this code can't be added, I'm not sure how we would ever be able
to implement core dumps for GPU threads/memory.

>         If you need to access BOs which are placed in not CPU accessible memory then
>         implement the access callback for ptrace, see amdgpu_ttm_access_memory for
>         an example how to do this.

As also mentioned above, we don't work via ptrace at all when it comes
to debugging the EUs. The only thing used for now is ptrace_may_access, to
implement access restrictions similar to those ptrace has. This can be changed
to something else if needed.

>     Ptrace access via vm_operations_struct.access → ttm_bo_vm_access.
> 
>     This series renames ttm_bo_vm_access to ttm_bo_access, with no code changes.
> 
>     The above function accesses a BO via kmap if it is in SYSTEM / TT,
>     which is existing code.
> 
>     This function is only exposed to user space via ptrace permissions.

Maybe this sentence is what caused the confusion.

The peek/poke interface is never exposed to userspace in general, only
through the debugger connection, which is its own FD.

>     In this series, we implement a function [3] similar to
>     amdgpu_ttm_access_memory for the TTM vfunc access_memory. What is
>     missing is non-visible CPU memory access, similar to
>     amdgpu_ttm_access_memory_sdma. This will be addressed in a follow-up and
>     was omitted in this series given its complexity.
> 
>     So, this looks more or less identical to AMD's ptrace implementation,
>     but in GPU address space. Again, I fail to see what the problem is here.
>     What am I missing?
> 
> 
> The main question is why can't you use the existing interfaces directly?

We're not working on the CPU address space or BOs. We're working
strictly on the GPU address space as would be seen by an EU thread if it
accessed address X.

> Additional to the peek/poke interface of ptrace Linux has the pidfd_getfd
> system call, see here https://man7.org/linux/man-pages/man2/pidfd_getfd.2.html.
> 
> The pidfd_getfd() allows to dup() the render node file descriptor into your gdb
> process. That in turn gives you all the access you need from gdb, including
> mapping BOs and command submission on behalf of the application.

We're not operating on the CPU address space, nor are we operating on BOs
(there is no concept of a BO in the EU debug interface). Each VMA in the VM
could come from anywhere; only the start address and size matter. Nor do
we need to interfere with the command submission of the process under
debug.
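
A toy model of that view, for illustration only: a GPU VM is treated as a
list of (start, size) ranges, and a read or write at a GPU virtual address
is resolved range by range, whatever backs each range. None of this is Xe
or eudebug code. struct model_vma, struct model_vm and model_vm_access are
hypothetical names, the backing store is a plain CPU buffer, and the cache
flushing mentioned earlier is not modelled at all.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mapping in a GPU VM: only start and size identify it. */
struct model_vma {
	uint64_t start;		/* GPU virtual start address */
	uint64_t size;		/* length in bytes */
	unsigned char *backing;	/* stand-in for whatever backs the range */
};

struct model_vm {
	const struct model_vma *vmas;	/* non-overlapping ranges */
	size_t count;
};

/* Linear lookup of the range containing addr, or NULL for a hole. */
static const struct model_vma *model_vm_find(const struct model_vm *vm,
					     uint64_t addr)
{
	for (size_t i = 0; i < vm->count; i++) {
		const struct model_vma *v = &vm->vmas[i];

		if (addr >= v->start && addr - v->start < v->size)
			return v;
	}
	return NULL;
}

/*
 * Access len bytes at GPU virtual address addr.
 * Returns the number of bytes accessed before the first hole,
 * or -EFAULT if addr itself is unmapped.
 */
static long model_vm_access(const struct model_vm *vm, uint64_t addr,
			    void *buf, size_t len, int write)
{
	unsigned char *cbuf = buf;
	size_t done = 0;

	assert(vm && buf);

	if (len == 0)
		return 0;

	while (done < len) {
		const struct model_vma *v = model_vm_find(vm, addr + done);
		uint64_t off;
		size_t chunk;

		if (!v)
			break;

		off = addr + done - v->start;
		chunk = len - done;
		if (chunk > v->size - off)
			chunk = (size_t)(v->size - off);

		if (write)
			memcpy(v->backing + off, cbuf + done, chunk);
		else
			memcpy(cbuf + done, v->backing + off, chunk);
		done += chunk;
	}

	return done ? (long)done : -EFAULT;
}
```

The caller in this model never names a buffer object; it only supplies an
address and a length, which is the point being made above.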

> As far as I can see that allows for the same functionality as the eudebug
> interface, just without any driver specific code messing with ptrace
> permissions and peek/poke interfaces.
> 
> So the question is still why do you need the whole eudebug interface in the
> first place? I might be missing something, but that seems to be superfluous
> from a high level view.

Recapping from above: it is to allow the debugging of EU threads per DRM
client, completely independent of the CPU process. If ptrace_may_access
is the sore point, we could consider other permission checks, too. There
is no connection to ptrace in this architecture other than that single
permission check, which decides whether a PID is fair game for access by
the debugger process.

Why no parasitic thread or ptrace: going forward, binding the EU debugging to
the DRM client would also pave the way for extending the core dump generated
by the core kernel with each DRM client's EU thread/memory dump. We have a
similar feature called "Offline core dump" enabled in the downstream public
trees for i915, where we currently attach the EU thread dump to the i915 error
state and then later combine the i915 error state with the CPU core dump file
using a tool.

This is a relatively small amount of extra code, as this baseline series
already gives GDB the ability to perform the necessary actions.
It's just a matter of the kernel driver calling "stop all threads", then
copying the memory map and memory contents for GPU threads, just like is
done for CPU threads.

With parasitic thread injection, I'm not sure there is such a way forward,
as it would seem to require injecting quite a bit more logic into the core
kernel.

> It's true that the AMD KFD part has still similar functionality, but that is
> because of the broken KFD design of tying driver state to the CPU process
> (which makes it inaccessible for gdb even with imported render node fd).
> 
> Both Sima and I (and partially Dave as well) have pushed back on the KFD
> approach. And the long term plan is to get rid of such device driver specific
> interface which re-implement existing functionality just differently.

Recapping, this series is not adding it back. The debugger connection
is a separate FD from the DRM one, with a separate IOCTL set. We don't allow
the DRM FD any new operations based on whether ptrace is attached or not. We
never even do that check.

We only restrict the opening of the debugger connection for a given PID with
the ptrace_may_access check for now. That can be changed to something else,
if necessary.

> So you need to have a really really good explanation why the eudebug interface
> is actually necessary.

TL;DR: The main point is to decouple the debugging of the EU workloads from the
debugging of the CPU process. This avoids interfering with the CPU process
through parasitic thread injection. Further, this also allows generating a core dump
without any GDB connected. There are also many other smaller pros/cons
which can be discussed, but in the context of this patch, this is the
main one.

So unlike parasitic thread injection, we don't unlock any special IOCTL for
the process under debug to be performed by the parasitic thread, but we
allow the minimal set of operations to be performed by GDB as if those were
done on the EUs themselves.

One can think of it as a minimal subset of ptrace, but for EU threads,
not the CPU threads. Building on this, it's possible to extend
the core dumps generated by the core kernel with a DRM-specific extension
that would contain the EU thread/memory dump.

Regards, Joonas

> 
> Regards,
> Christian.
> 
> 
> 
>     Matt
> 
>     [3] https://patchwork.freedesktop.org/patch/622520/?series=140200&rev=6
> 
> 
>         Regards,
>         Christian.
> 
> 
>             Matt
> 
> 
>                 Regards,
>                 Christian.
>

* Re: [PATCH v6 2/8] drm/ttm: Add ttm_bo_access
  2024-11-11  8:00                         ` Joonas Lahtinen
@ 2024-11-11 10:10                           ` Simona Vetter
  2024-11-11 11:34                             ` Christian König
  2024-11-11 11:27                           ` Christian König
  1 sibling, 1 reply; 56+ messages in thread
From: Simona Vetter @ 2024-11-11 10:10 UTC (permalink / raw)
  To: Joonas Lahtinen
  Cc: Christian König, Matthew Brost, Christian König,
	Rodrigo Vivi, Huang Rui, intel-xe, dri-devel, matthew.auld,
	David Airlie, Simona Vetter

On Mon, Nov 11, 2024 at 10:00:17AM +0200, Joonas Lahtinen wrote:
> Back from some time off and will try to answer below.
> 
> Adding Dave and Sima as this topic has been previously discussed to some
> extent and will be good to reach common understanding about what the
> series is trying to do and what is the difference to the AMD debugging
> model.

I chatted about this thread a bit on irc with folks, and I think an
orthogonal issue is the question, what should be in ttm-utils? I've asked
Matt to type up a DOC patch once we have some consensus, since imo the
somewhat lackluster documentation situation for ttm is also somewhat a
cause for these big threads on various different topics. Aside from the
fact that gpu memory management is just hard.

On the uapi/design aspect, I think this would serve well with a patch to
drm-uapi.rst that adds a debugging section? At least once we have some
rough consensus across drivers, and more importantly userspace in the form
of gdb upstream (at least I'm not aware of any other upstream debugger
patches, I think amd's rocm stuff is also gdb-only).

Some wash-up thoughts from me below, but consider them fairly irrelevant
since I think the main driver for these big questions here should be
gdb/userspace.

> Quoting Christian König (2024-11-07 11:44:33)
> > Am 06.11.24 um 18:00 schrieb Matthew Brost:
> > 
> >     [SNIP]
> > 
> >     This is not a generic interface that anyone can freely access. The same
> >     permissions used by ptrace are checked when opening such an interface.
> >     See [1] [2].
> > 
> >     [1] https://patchwork.freedesktop.org/patch/617470/?series=136572&rev=2
> >     [2] https://patchwork.freedesktop.org/patch/617471/?series=136572&rev=2
> > 
> > 
> > Thanks a lot for those pointers, that is exactly what I was looking for.
> > 
> > And yeah, it is what I feared. You are re-implementing existing functionality,
> > but see below.
> 
> Could you elaborate on what this "existing functionality" exactly is?
> I do not think this functionality exists at this time.
> 
> The EU debugging architecture for Xe specifically avoids the need for GDB
> to attach with ptrace to the CPU process or interfere with the CPU process for
> the debugging via parasitic threads or so.
> 
> Debugger connection is opened to the DRM driver for given PID (which uses the
> ptrace may access check for now) after which the all DRM client of that
> PID are exposed to the debugger process.
> 
> What we want to expose via that debugger connection is the ability for GDB to
> read/write the different GPU VM address spaces (ppGTT for Intel GPUs) just like
> the EU threads would see them. Note that the layout of the ppGTT is
> completely up to the userspace driver to setup and is mostly only partially
> equal to the CPU address space.
> 
> Specifically as part of reading/writing the ppGTT for debugging purposes,
> there are deep flushes needed: for example flushing instruction cache
> when adding/removing breakpoints.
> 
> Maybe that will explain the background. I elaborate on this at the end some more.
> 
> >             kmap/vmap are used everywhere in the DRM subsystem to access BOs, so I’m
> >             failing to see the problem with adding a simple helper based on existing
> >             code.
> > 
> >         What's possible and often done is to do kmap/vmap if you need to implement a
> >         CPU copy for scanout for example or for copying/validating command buffers.
> >         But that usually requires accessing the whole BO and has separate security
> >         checks.
> > 
> >         When you want to access only a few bytes of a BO that sounds massively like
> >         a peek/poke like interface and we have already rejected that more than once.
> >         There even used to be standardized GEM IOCTLs for that which have been
> >         removed by now.
> 
> Referring to the explanation at top: These IOCTL are not for the debugging target
> process to issue. The peek/poke interface is specifically for GDB only
> to facilitate the emulation of memory reads/writes on the GPU address
> space as they were done by EUs themselves. And to recap: for modifying
> instructions for example (add/remove breakpoint), extra level of cache flushing is
> needed which is not available to regular userspace.
> 
> I specifically discussed with Sima on the difference before moving forward with this
> design originally. If something has changed since then, I'm of course happy to rediscuss.
> 
> However, if this code can't be added, not sure how we would ever be able
> to implement core dumps for GPU threads/memory?
> 
> >         If you need to access BOs which are placed in not CPU accessible memory then
> >         implement the access callback for ptrace, see amdgpu_ttm_access_memory for
> >         an example how to do this.
> 
> As also mentioned above, we don't work via ptrace at all when it comes
> to debugging the EUs. The only thing used for now is the ptrace_may_access to
> implement similar access restrictions as ptrace has. This can be changed
> to something else if needed.
> 
> >     Ptrace access via vm_operations_struct.access → ttm_bo_vm_access.
> > 
> >     This series renames ttm_bo_vm_access to ttm_bo_access, with no code changes.
> > 
> >     The above function accesses a BO via kmap if it is in SYSTEM / TT,
> >     which is existing code.
> > 
> >     This function is only exposed to user space via ptrace permissions.
> 
> Maybe this sentence is what caused the confusion.
> 
> Userspace is never exposed with peek/poke interface, only the debugger
> connection which is its own FD.
> 
> >     In this series, we implement a function [3] similar to
> >     amdgpu_ttm_access_memory for the TTM vfunc access_memory. What is
> >     missing is non-visible CPU memory access, similar to
> >     amdgpu_ttm_access_memory_sdma. This will be addressed in a follow-up and
> >     was omitted in this series given its complexity.
> > 
> >     So, this looks more or less identical to AMD's ptrace implementation,
> >     but in GPU address space. Again, I fail to see what the problem is here.
> >     What am I missing?
> > 
> > 
> > The main question is why can't you use the existing interfaces directly?
> 
> We're not working on the CPU address space or BOs. We're working
> strictly on the GPU address space as would be seen by an EU thread if it
> accessed address X.
> 
> > Additional to the peek/poke interface of ptrace Linux has the pidfd_getfd
> > system call, see here https://man7.org/linux/man-pages/man2/pidfd_getfd.2.html.
> > 
> > The pidfd_getfd() allows to dup() the render node file descriptor into your gdb
> > process. That in turn gives you all the access you need from gdb, including
> > mapping BOs and command submission on behalf of the application.
> 
> We're not operating on the CPU address space nor are we operating on BOs
> (there is no concept of BO in the EU debug interface). Each VMA in the VM
> could come from anywhere, only the start address and size matter. And
> neither do we need to interfere with the command submission of the
> process under debug.
> 
> > As far as I can see that allows for the same functionality as the eudebug
> > interface, just without any driver specific code messing with ptrace
> > permissions and peek/poke interfaces.
> > 
> > So the question is still why do you need the whole eudebug interface in the
> > first place? I might be missing something, but that seems to be superfluous
> > from a high level view.
> 
> Recapping from above. It is to allow the debugging of EU threads per DRM
> client, completely independent of the CPU process. If ptrace_may_access
> is the sore point, we could consider other permission checks, too. There
> is no other connection to ptrace in this architecture as single
> permission check to know if PID is fair game to access by debugger
> process.
> 
> Why no parasitic thread or ptrace: Going forward, binding the EU debugging to
> the DRM client would also pave way for being able to extend core kernel generated
> core dump with each DRM client's EU thread/memory dump. We have similar
> feature called "Offline core dump" enabled in the downstream public
> trees for i915, where we currently attach the EU thread dump to i915 error state
> and then later combine i915 error state with CPU core dump file with a
> tool.
> 
> This is relatively little amount of extra code, as this baseline series
> already introduces GDB the ability to perform the necessary actions.
> It's just the matter of kernel driver calling: "stop all threads", then
> copying the memory map and memory contents for GPU threads, just like is
> done for CPU threads.
> 
> With parasitic thread injection, not sure if there is such way forward,
> as it would seem to require to inject quite a bit more logic to core kernel?
> 
> > It's true that the AMD KFD part has still similar functionality, but that is
> > because of the broken KFD design of tying driver state to the CPU process
> > (which makes it inaccessible for gdb even with imported render node fd).
> > 
> > Both Sima and I (and partially Dave as well) have pushed back on the KFD
> > approach. And the long term plan is to get rid of such device driver specific
> > interface which re-implement existing functionality just differently.
> 
> Recapping, this series is not adding it back. The debugger connection
> is a separate FD from the DRM one, with a separate IOCTL set. We don't allow
> the DRM FD any new operations based on whether ptrace is attached or not. We
> never even do that check.
> 
> We only restrict the opening of the debugger connection to given PID with
> ptrace_may_access check for now. That can be changed to something else,
> if necessary.

Yeah I think unnecessarily tying gpu processes to cpu processes is a bad
thing, not least because even today all the svm discussions we have still hit
clear use-cases, where a 1:1 match is not wanted (like multiple gpu svm
sections with offsets). Not even speaking of all the gpu usecases where
the gpu vm space is still entirely independent of the cpu side.

So that's why I think this entirely separate approach looks like the right
one, with ptrace_may_access as the access control check to make sure we
match ptrace on the cpu side.

But there's very obviously a bikeshed to be had on what the actual uapi
should look like, especially how gdb opens up a gpu debug access fd. But I
also think that's not much on drm to decide, but whatever gdb wants. And
then we aim for some consistency on that lookup/access control part
(ideally, I might be missing some reasons why this is a bad idea) across
drm drivers.

> > So you need to have a really really good explanation why the eudebug interface
> > is actually necessary.
> 
> TL;DR The main point is to decouple the debugging of the EU workloads from the
> debugging of the CPU process. This avoids the interference with the CPU process with
> parasitic thread injection. Further this also allows generating a core dump
> without any GDB connected. There are also many other smaller pros/cons
> which can be discussed but for the context of this patch, this is the
> main one.
> 
> So unlike parasitic thread injection, we don't unlock any special IOCTL for
> the process under debug to be performed by the parasitic thread, but we
> allow the minimal set of operations to be performed by GDB as if those were
> done on the EUs themselves.
> 
> One can think of it like the minimal subset of ptrace but for EU threads,
> not the CPU threads. And thus, building on this it's possible to extend
> the core kernel generated core dumps with DRM specific extension which
> would contain the EU thread/memory dump.

It might be good to document (in that debugging doc patch probably) why
thread injection is not a great option, and why the tradeoffs for
debugging are different than for checkpoint/restore, where with CRIU
we landed on doing most of this in userspace, and often requiring
injection threads to make it all work.

Cheers, Sima

> 
> Regards, Joonas
> 
> > 
> > Regards,
> > Christian.
> > 
> > 
> > 
> >     Matt
> > 
> >     [3] https://patchwork.freedesktop.org/patch/622520/?series=140200&rev=6
> > 
> > 
> >         Regards,
> >         Christian.
> > 
> > 
> >             Matt
> > 
> > 
> >                 Regards,
> >                 Christian.
> >

-- 
Simona Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 2/8] drm/ttm: Add ttm_bo_access
  2024-11-11  8:00                         ` Joonas Lahtinen
  2024-11-11 10:10                           ` Simona Vetter
@ 2024-11-11 11:27                           ` Christian König
  1 sibling, 0 replies; 56+ messages in thread
From: Christian König @ 2024-11-11 11:27 UTC (permalink / raw)
  To: Joonas Lahtinen, David Airlie, Simona Vetter
  Cc: Christian König, Matthew Brost, Rodrigo Vivi, Huang Rui,
	intel-xe, dri-devel, matthew.auld

[-- Attachment #1: Type: text/plain, Size: 10303 bytes --]

Hi Joonas,

Am 11.11.24 um 09:00 schrieb Joonas Lahtinen:
> Back from some time off and will try to answer below.

welcome back, good to have the designer of this at hand.

> Adding Dave and Sima as this topic has been previously discussed to some
> extent and will be good to reach common understanding about what the
> series is trying to do and what is the difference to the AMD debugging
> model.

Yeah, I was already wondering why that wasn't raised before.

> Quoting Christian König (2024-11-07 11:44:33)
>> Am 06.11.24 um 18:00 schrieb Matthew Brost:
>>
>>      [SNIP]
>>
>>      This is not a generic interface that anyone can freely access. The same
>>      permissions used by ptrace are checked when opening such an interface.
>>      See [1] [2].
>>
>>      [1]https://patchwork.freedesktop.org/patch/617470/?series=136572&rev=2
>>      [2]https://patchwork.freedesktop.org/patch/617471/?series=136572&rev=2
>>
>>
>> Thanks a lot for those pointers, that is exactly what I was looking for.
>>
>> And yeah, it is what I feared. You are re-implementing existing functionality,
>> but see below.
> Could you elaborate on what this "existing functionality" exactly is?
> I do not think this functionality exists at this time.

You can get the exact same functionality by requesting the render FD 
from the debugged process.

This also doesn't cause any security concerns since it uses the existing 
systemcall interfaces, especially see pidfd_getfd() and fdinfo for 
reference.

> The EU debugging architecture for Xe specifically avoids the need for GDB
> to attach with ptrace to the CPU process or interfere with the CPU process for
> the debugging via parasitic threads or so.

I can understand why you don't want to use parasitic threads, but why
don't you want to attach with GDB to the process?

I mean you somehow need to prevent the debugging information you are
trying to gather or modify from changing while you access it.

> Debugger connection is opened to the DRM driver for given PID (which uses the
> ptrace may access check for now) after which the all DRM client of that
> PID are exposed to the debugger process.

That sounds extremely questionable and just re-implements existing 
functionality as far as I can see.

The fdinfo files under /proc already provide the necessary information
about which render nodes a PID uses, and the pidfd_getfd() system call
then gives you access to those render node file descriptors.

Why do you need that as separate and especially driver specific 
functionality?
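
For reference, a minimal userspace sketch of the fdinfo plus pidfd_getfd()
route mentioned above. It is not taken from this series or from gdb. The
helper names fdinfo_drm_driver() and dup_remote_fd() are hypothetical, the
syscall number fallbacks are an assumption for older libc headers, and the
only fdinfo key relied on is the "drm-driver:" line that DRM drivers print.
pidfd_getfd() needs Linux 5.6 or later and ptrace-attach level permission on
the target process.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Fallbacks for older libc headers (assumption: the target architecture
 * uses the common syscall numbering for these two calls). */
#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434
#endif
#ifndef SYS_pidfd_getfd
#define SYS_pidfd_getfd 438
#endif

/*
 * Hypothetical helper: scan the text of a /proc/<pid>/fdinfo/<fd> file for
 * the "drm-driver:" key. Copies the value into driver and returns 1 if the
 * key is present, 0 otherwise.
 */
int fdinfo_drm_driver(const char *fdinfo, char *driver, size_t len)
{
	static const char key[] = "drm-driver:";
	const char *line = fdinfo;

	if (len == 0)
		return 0;

	while (line && *line) {
		if (strncmp(line, key, sizeof(key) - 1) == 0) {
			const char *val = line + sizeof(key) - 1;
			size_t n;

			/* Skip the separator between key and value. */
			while (*val == ' ' || *val == '\t')
				val++;
			n = strcspn(val, "\n");
			if (n >= len)
				n = len - 1;
			memcpy(driver, val, n);
			driver[n] = '\0';
			return 1;
		}
		/* Advance to the start of the next line, if any. */
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return 0;
}

/*
 * Hypothetical helper: duplicate file descriptor target_fd of process pid
 * into the calling process. Returns the new fd, or -1 with errno set.
 */
int dup_remote_fd(pid_t pid, int target_fd)
{
	int pidfd, fd;

	pidfd = (int)syscall(SYS_pidfd_open, pid, 0);
	if (pidfd < 0)
		return -1;

	fd = (int)syscall(SYS_pidfd_getfd, pidfd, target_fd, 0);
	close(pidfd);
	return fd;
}
```

A debugger following this route would read each file in /proc/<pid>/fdinfo/,
pass its contents to fdinfo_drm_driver(), and call dup_remote_fd(pid, fd) for
the entries that match. Whether the duplicated render node FD is sufficient
for EU debugging is exactly what this thread is arguing about; the sketch
only shows the mechanics.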

> What we want to expose via that debugger connection is the ability for GDB to
> read/write the different GPU VM address spaces (ppGTT for Intel GPUs) just like
> the EU threads would see them. Note that the layout of the ppGTT is
> completely up to the userspace driver to setup and is mostly only partially
> equal to the CPU address space.
>
> Specifically as part of reading/writing the ppGTT for debugging purposes,
> there are deep flushes needed: for example flushing instruction cache
> when adding/removing breakpoints.

Is that not something you can do through the render node the PID uses as 
well?

If yes I think it would still be much cleaner to expose that as an
IOCTL on the render node.

> Maybe that will explain the background. I elaborate on this at the end some more.
>
>>              kmap/vmap are used everywhere in the DRM subsystem to access BOs, so I’m
>>              failing to see the problem with adding a simple helper based on existing
>>              code.
>>
>>          What's possible and often done is to do kmap/vmap if you need to implement a
>>          CPU copy for scanout for example or for copying/validating command buffers.
>>          But that usually requires accessing the whole BO and has separate security
>>          checks.
>>
>>          When you want to access only a few bytes of a BO that sounds massively like
>>          a peek/poke like interface and we have already rejected that more than once.
>>          There even used to be standardized GEM IOCTLs for that which have been
>>          removed by now.
> Referring to the explanation at top: These IOCTL are not for the debugging target
> process to issue. The peek/poke interface is specifically for GDB only
> to facilitate the emulation of memory reads/writes on the GPU address
> space as they were done by EUs themselves. And to recap: for modifying
> instructions for example (add/remove breakpoint), extra level of cache flushing is
> needed which is not available to regular userspace.
>
> I specifically discussed with Sima on the difference before moving forward with this
> design originally. If something has changed since then, I'm of course happy to rediscuss.

Do you have pointers to this discussion?

> However, if this code can't be added, not sure how we would ever be able
> to implement core dumps for GPU threads/memory?

Exactly that's what I tried to point out before. Use cases like core 
dumps or even CPU copies are valid use cases.

We do that inside AMDGPU as well or at least have plans for it, but we 
already figured out that you can't use the TTM interfaces for that.

When you want to do a core dump the GPU is usually stuck executing and 
when you try to call kmap/vmap it is possible that those calls wait for 
the stuck GPU to finish whatever it is executing.
That's why drivers need to use hardware specific approaches to access 
data for crash dumps.
[SNIP]
>> As far as I can see that allows for the same functionality as the eudebug
>> interface, just without any driver specific code messing with ptrace
>> permissions and peek/poke interfaces.
>>
>> So the question is still why do you need the whole eudebug interface in the
>> first place? I might be missing something, but that seems to be superfluous
>> from a high level view.
> Recapping from above. It is to allow the debugging of EU threads per DRM
> client, completely independent of the CPU process. If ptrace_may_access
> is the sore point, we could consider other permission checks, too. There
> is no other connection to ptrace in this architecture as single
> permission check to know if PID is fair game to access by debugger
> process.

I would rather say that trying to debug completely independently of the
CPU process is a really bad idea.

> Why no parasitic thread or ptrace: Going forward, binding the EU debugging to
> the DRM client would also pave way for being able to extend core kernel generated
> core dump with each DRM client's EU thread/memory dump. We have similar
> feature called "Offline core dump" enabled in the downstream public
> trees for i915, where we currently attach the EU thread dump to i915 error state
> and then later combine i915 error state with CPU core dump file with a
> tool.
>
> This is relatively little amount of extra code, as this baseline series
> already introduces GDB the ability to perform the necessary actions.
> It's just the matter of kernel driver calling: "stop all threads",

OH! Wait a second, you do WHAT? How do you guarantee dma_fence forward 
progress in that case?

See here: 
https://www.kernel.org/doc/html/latest/driver-api/dma-buf.html#indefinite-dma-fences

[SNIP]
>> So you need to have a really really good explanation why the eudebug interface
>> is actually necessary.
> TL;DR The main point is to decouple the debugging of the EU workloads from the
> debugging of the CPU process. This avoids the interference with the CPU process with
> parasitic thread injection. Further this also allows generating a core dump
> without any GDB connected. There are also many other smaller pros/cons
> which can be discussed but for the context of this patch, this is the
> main one.
>
> So unlike parasitic thread injection, we don't unlock any special IOCTL for
> the process under debug to be performed by the parasitic thread, but we
> allow the minimal set of operations to be performed by GDB as if those were
> done on the EUs themselves.
>
> One can think of it like the minimal subset of ptrace but for EU threads,
> not the CPU threads. And thus, building on this it's possible to extend
> the core kernel generated core dumps with DRM specific extension which
> would contain the EU thread/memory dump.

I can understand that you don't want to use complicated and hard to get
right approaches like parasitic debugging threads, but this should also
be absolutely unnecessary.

The problem is that when you completely avoid ptrace and the existing 
interface you also have to implement a lot of stuff which is already 
more or less there. In other words debuggers like gdb already have the 
ability to interact with device drivers through their file descriptors. 
And that includes all IOCTLs, mmap() as well as things like command 
submission etc...

It could be that you need some additional IOCTL, e.g. for flushing caches
etc., but you certainly don't need a separate file descriptor gated by
exporting ptrace access check functions. That's a really questionable
design.


But taking a step back: When you stop GPU execution and insert break 
points you need to guarantee that this will never affect any dma_fence. 
Otherwise the core memory management can run into a deadlock.

Neither the preemption fence XE uses for its threads nor the hardware
fence used for end of submission indication can be delayed while something
like a core dump is underway. That's why you also can't fully core dump
in the case of a GPU hang.

What is possible is that you wait for the XE preemption fence to signal
(which AFAIK is implemented internally in XE as stopping all threads), but
from skimming over the code this absolutely doesn't seem to be what you do.

So at least offhand that looks like a classic indefinite DMA fence
problem to me which will get you a whale big NAK from Sima and Dave.

While the peek/poke interface is maybe a bit ugly but more or less
doable, stopping the GPU without signaling the dma_fences is really
*not* something you can do.

Regards,
Christian.




>
> Regards, Joonas
>
>> Regards,
>> Christian.
>>
>>
>>
>>      Matt
>>
>>      [3]https://patchwork.freedesktop.org/patch/622520/?series=140200&rev=6
>>
>>
>>          Regards,
>>          Christian.
>>
>>
>>              Matt
>>
>>
>>                  Regards,
>>                  Christian.
>>

[-- Attachment #2: Type: text/html, Size: 14345 bytes --]

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 2/8] drm/ttm: Add ttm_bo_access
  2024-11-11 10:10                           ` Simona Vetter
@ 2024-11-11 11:34                             ` Christian König
  2024-11-11 14:00                               ` Joonas Lahtinen
  0 siblings, 1 reply; 56+ messages in thread
From: Christian König @ 2024-11-11 11:34 UTC (permalink / raw)
  To: Simona Vetter, Joonas Lahtinen
  Cc: Matthew Brost, Christian König, Rodrigo Vivi, Huang Rui,
	intel-xe, dri-devel, matthew.auld, David Airlie, Simona Vetter

Am 11.11.24 um 11:10 schrieb Simona Vetter:
> On Mon, Nov 11, 2024 at 10:00:17AM +0200, Joonas Lahtinen wrote:
>> Back from some time off and will try to answer below.
>>
>> Adding Dave and Sima as this topic has been previously discussed to some
>> extent and will be good to reach common understanding about what the
>> series is trying to do and what is the difference to the AMD debugging
>> model.
> I chatted about this thread a bit on irc with folks, and I think an
> orthogonal issue is the question, what should be in ttm-utils? I've asked
> Matt to type up a DOC patch once we have some consensus, since imo the
> somewhat lackluster documentation situation for ttm is also somewhat a
> cause for these big threads on various different topics. Aside from the
> fact that gpu memory management is just hard.
>
> On the uapi/design aspect, I think this would serve well with a patch to
> drm-uapi.rst that adds a debugging section? At least once we have some
> rough consensus across drivers, and more importantly userspace in the form
> of gdb upstream (at least I'm not aware of any other upstream debugger
> patches, I think amd's rocm stuff is also gdb-only).

Yeah that seems to be a really good idea. Similar design ideas came up
internally at AMD as well but were dropped after pointing people to
pidfd_getfd().

But the bigger problem seems to be that the design doesn't seem to take
the dma_fence requirements into account.

In other words attaching gdb to a pid seems to stop the GPU thread of
this pid without waiting for either the XE preemption fence or the end of
operation fence.

I mean if the GPU threads are preempted that could work, but yeah not 
like this :)

Regards,
Christian.

>
> Some wash-up thoughts from me below, but consider them fairly irrelevant
> since I think the main driver for these big questions here should be
> gdb/userspace.
>
>> Quoting Christian König (2024-11-07 11:44:33)
>>> Am 06.11.24 um 18:00 schrieb Matthew Brost:
>>>
>>>      [SNIP]
>>>
>>>      This is not a generic interface that anyone can freely access. The same
>>>      permissions used by ptrace are checked when opening such an interface.
>>>      See [1] [2].
>>>
>>>      [1] https://patchwork.freedesktop.org/patch/617470/?series=136572&rev=2
>>>      [2] https://patchwork.freedesktop.org/patch/617471/?series=136572&rev=2
>>>
>>>
>>> Thanks a lot for those pointers, that is exactly what I was looking for.
>>>
>>> And yeah, it is what I feared. You are re-implementing existing functionality,
>>> but see below.
>> Could you elaborate on what this "existing functionality" exactly is?
>> I do not think this functionality exists at this time.
>>
>> The EU debugging architecture for Xe specifically avoids the need for GDB
>> to attach with ptrace to the CPU process or interfere with the CPU process for
>> the debugging via parasitic threads or so.
>>
>> Debugger connection is opened to the DRM driver for given PID (which uses the
>> ptrace may access check for now) after which the all DRM client of that
>> PID are exposed to the debugger process.
>>
>> What we want to expose via that debugger connection is the ability for GDB to
>> read/write the different GPU VM address spaces (ppGTT for Intel GPUs) just like
>> the EU threads would see them. Note that the layout of the ppGTT is
>> completely up to the userspace driver to setup and is mostly only partially
>> equal to the CPU address space.
>>
>> Specifically as part of reading/writing the ppGTT for debugging purposes,
>> there are deep flushes needed: for example flushing instruction cache
>> when adding/removing breakpoints.
>>
>> Maybe that will explain the background. I elaborate on this at the end some more.
>>
>>>              kmap/vmap are used everywhere in the DRM subsystem to access BOs, so I’m
>>>              failing to see the problem with adding a simple helper based on existing
>>>              code.
>>>
>>>          What's possible and often done is to do kmap/vmap if you need to implement a
>>>          CPU copy for scanout for example or for copying/validating command buffers.
>>>          But that usually requires accessing the whole BO and has separate security
>>>          checks.
>>>
>>>          When you want to access only a few bytes of a BO that sounds massively like
>>>          a peek/poke like interface and we have already rejected that more than once.
>>>          There even used to be standardized GEM IOCTLs for that which have been
>>>          removed by now.
>> Referring to the explanation at top: These IOCTL are not for the debugging target
>> process to issue. The peek/poke interface is specifically for GDB only
>> to facilitate the emulation of memory reads/writes on the GPU address
>> space as they were done by EUs themselves. And to recap: for modifying
>> instructions for example (add/remove breakpoint), extra level of cache flushing is
>> needed which is not available to regular userspace.
>>
>> I specifically discussed with Sima on the difference before moving forward with this
>> design originally. If something has changed since then, I'm of course happy to rediscuss.
>>
>> However, if this code can't be added, not sure how we would ever be able
>> to implement core dumps for GPU threads/memory?
>>
>>>          If you need to access BOs which are placed in not CPU accessible memory then
>>>          implement the access callback for ptrace, see amdgpu_ttm_access_memory for
>>>          an example how to do this.
>> As also mentioned above, we don't work via ptrace at all when it comes
>> to debugging the EUs. The only thing used for now is the ptrace_may_access to
>> implement similar access restrictions as ptrace has. This can be changed
>> to something else if needed.
>>
>>>      Ptrace access via vm_operations_struct.access → ttm_bo_vm_access.
>>>
>>>      This series renames ttm_bo_vm_access to ttm_bo_access, with no code changes.
>>>
>>>      The above function accesses a BO via kmap if it is in SYSTEM / TT,
>>>      which is existing code.
>>>
>>>      This function is only exposed to user space via ptrace permissions.
>> Maybe this sentence is what caused the confusion.
>>
>> Userspace is never exposed with peek/poke interface, only the debugger
>> connection which is its own FD.
>>
>>>      In this series, we implement a function [3] similar to
>>>      amdgpu_ttm_access_memory for the TTM vfunc access_memory. What is
>>>      missing is non-visible CPU memory access, similar to
>>>      amdgpu_ttm_access_memory_sdma. This will be addressed in a follow-up and
>>>      was omitted in this series given its complexity.
>>>
>>>      So, this looks more or less identical to AMD's ptrace implementation,
>>>      but in GPU address space. Again, I fail to see what the problem is here.
>>>      What am I missing?
>>>
>>>
>>> The main question is why can't you use the existing interfaces directly?
>> We're not working on the CPU address space or BOs. We're working
>> strictly on the GPU address space as would be seen by an EU thread if it
>> accessed address X.
>>
>>> Additional to the peek/poke interface of ptrace Linux has the pidfd_getfd
>>> system call, see here https://man7.org/linux/man-pages/man2/pidfd_getfd.2.html.
>>>
>>> The pidfd_getfd() allows to dup() the render node file descriptor into your gdb
>>> process. That in turn gives you all the access you need from gdb, including
>>> mapping BOs and command submission on behalf of the application.
>> We're not operating on the CPU address space nor are we operating on BOs
>> (there is no concept of BO in the EU debug interface). Each VMA in the VM
>> could come from anywhere, only the start address and size matter. And
>> neither do we need to interfere with the command submission of the
>> process under debug.
>>
>>> As far as I can see that allows for the same functionality as the eudebug
>>> interface, just without any driver specific code messing with ptrace
>>> permissions and peek/poke interfaces.
>>>
>>> So the question is still why do you need the whole eudebug interface in the
>>> first place? I might be missing something, but that seems to be superfluous
>>> from a high level view.
>> Recapping from above. It is to allow the debugging of EU threads per DRM
>> client, completely independent of the CPU process. If ptrace_may_access
>> is the sore point, we could consider other permission checks, too. There
>> is no other connection to ptrace in this architecture than a single
>> permission check to know if the PID is fair game for the debugger
>> process to access.
>>
>> Why no parasitic thread or ptrace: Going forward, binding the EU debugging to
>> the DRM client would also pave way for being able to extend core kernel generated
>> core dump with each DRM client's EU thread/memory dump. We have similar
>> feature called "Offline core dump" enabled in the downstream public
>> trees for i915, where we currently attach the EU thread dump to i915 error state
>> and then later combine i915 error state with CPU core dump file with a
>> tool.
>>
>> This is relatively little amount of extra code, as this baseline series
>> already introduces GDB the ability to perform the necessary actions.
>> It's just the matter of kernel driver calling: "stop all threads", then
>> copying the memory map and memory contents for GPU threads, just like is
>> done for CPU threads.
>>
>> With parasitic thread injection, not sure if there is such a way forward,
>> as it would seem to require injecting quite a bit more logic into the core kernel?
>>
>>> It's true that the AMD KFD part has still similar functionality, but that is
>>> because of the broken KFD design of tying driver state to the CPU process
>>> (which makes it inaccessible for gdb even with imported render node fd).
>>>
>>> Both Sima and I (and partially Dave as well) have pushed back on the KFD
>>> approach. And the long term plan is to get rid of such device driver specific
>>> interface which re-implement existing functionality just differently.
>> Recapping, this series is not adding it back. The debugger connection
>> is a separate FD from the DRM one, with separate IOCTL set. We don't allow
>> the DRM FD any new operations based on whether ptrace is attached or not. We
>> never even do that check.
>>
>> We only restrict the opening of the debugger connection to given PID with
>> ptrace_may_access check for now. That can be changed to something else,
>> if necessary.
> Yeah I think unnecessarily tying gpu processes to cpu processes is a bad
> thing, least because even today all the svm discussions we have still hit
> clear use-cases, where a 1:1 match is not wanted (like multiple gpu svm
> sections with offsets). Not even speaking of all the gpu usecases where
> the gpu vm space is still entirely independent of the cpu side.
>
> So that's why I think this entirely separate approach looks like the right
> one, with ptrace_may_access as the access control check to make sure we
> match ptrace on the cpu side.
>
> But there's very obviously a bikeshed to be had on what the actual uapi
> should look like, especially how gdb opens up a gpu debug access fd. But I
> also think that's not much on drm to decide, but whatever gdb wants. And
> then we aim for some consistency on that lookup/access control part
> (ideally, I might be missing some reasons why this is a bad idea) across
> drm drivers.
>
>>> So you need to have a really really good explanation why the eudebug interface
>>> is actually necessary.
>> TL;DR The main point is to decouple the debugging of the EU workloads from the
>> debugging of the CPU process. This avoids the interference with the CPU process with
>> parasitic thread injection. Further this also allows generating a core dump
>> without any GDB connected. There are also many other smaller pros/cons
>> which can be discussed but for the context of this patch, this is the
>> main one.
>>
>> So unlike parasitic thread injection, we don't unlock any special IOCTL for
>> the process under debug to be performed by the parasitic thread, but we
>> allow the minimal set of operations to be performed by GDB as if those were
>> done on the EUs themselves.
>>
>> One can think of it like the minimal subset of ptrace but for EU threads,
>> not the CPU threads. And thus, building on this it's possible to extend
>> the core kernel generated core dumps with DRM specific extension which
>> would contain the EU thread/memory dump.
> It might be good to document (in that debugging doc patch probably) why
> thread injection is not a great option, and why the tradeoffs for
> debugging are different than for checkpoint/restore, where with CRIU
> we landed on doing most of this in userspace, and often requiring
> injection threads to make it all work.
>
> Cheers, Sima
>
>> Regards, Joonas
>>
>>> Regards,
>>> Christian.
>>>
>>>
>>>
>>>      Matt
>>>
>>>      [3] https://patchwork.freedesktop.org/patch/622520/?series=140200&rev=6
>>>
>>>
>>>          Regards,
>>>          Christian.
>>>
>>>
>>>              Matt
>>>
>>>
>>>                  Regards,
>>>                  Christian.
>>>



* Re: [PATCH v6 2/8] drm/ttm: Add ttm_bo_access
  2024-11-11 11:34                             ` Christian König
@ 2024-11-11 14:00                               ` Joonas Lahtinen
  2024-11-11 15:54                                 ` Christian König
  2024-11-12  8:28                                 ` Simona Vetter
  0 siblings, 2 replies; 56+ messages in thread
From: Joonas Lahtinen @ 2024-11-11 14:00 UTC (permalink / raw)
  To: Christian König, Simona Vetter
  Cc: Matthew Brost, Christian König, Rodrigo Vivi, Huang Rui,
	intel-xe, dri-devel, matthew.auld, David Airlie, Simona Vetter

Quoting Christian König (2024-11-11 13:34:12)
> Am 11.11.24 um 11:10 schrieb Simona Vetter:
> > On Mon, Nov 11, 2024 at 10:00:17AM +0200, Joonas Lahtinen wrote:
> >> Back from some time off and will try to answer below.
> >>
> >> Adding Dave and Sima as this topic has been previously discussed to some
> >> extent and will be good to reach common understanding about what the
> >> series is trying to do and what is the difference to the AMD debugging
> >> model.
> > I chatted about this thread a bit on irc with folks, and I think an
> > orthogonal issue is the question, what should be in ttm-utils? I've asked
> > Matt to type up a DOC patch once we have some consensus, since imo the
> > somewhat lackluster documentation situation for ttm is also somewhat a
> > cause for these big threads on various different topics. Aside from the
> > fact that gpu memory management is just hard.
> >
> > On the uapi/design aspect, I think this would serve well with a patch to
> > drm-uapi.rst that adds a debugging section? At least once we have some
> > rough consensus across drivers, and more importantly userspace in the form
> > of gdb upstream (at least I'm not aware of any other upstream debugger
> > patches, I think amd's rocm stuff is also gdb-only).
> 
> Yeah that seems to be a really good idea. Similar design ideas came up
> internally at AMD as well but were dropped after pointing people to
> pidfd_getfd().
> 
> But the bigger problem seems to be that the design doesn't seem to take
> the dma_fence requirements into account.

Where would you deduce that?

We specifically limit the debugging to Long Running contexts which don't
depend on dma_fences.

> In other words attaching gdb to a pid seems to stop the GPU thread of 
> this pid without waiting for the XE preemption nor end of operation fence.
> 
> I mean if the GPU threads are preempted that could work, but yeah not 
> like this :)

For us, hitting a breakpoint inside the workload would always violate
any dma_fence timeout for the submitted workload, as the HW context can't
be switched out while in the breakpoint.

For any dma_fence workload the guarantee is that it completes
within a reasonable time after submission (guaranteed by the submitter). I
don't see how you could really allow interactive debugging of a
breakpoint under those restrictions anyway even if pre-emption was
supported as the workload would not finish in <10 seconds?

For i915 we did have the "pre-emptable but indefinitely long dma_fence workloads"
concept at one point and that was rejected after a lengthy discussion.

So I think the only way to allow interactive debugging is to avoid
dma_fences. Curious to hear if there are ideas for doing it otherwise.

Regards, Joonas

> 
> Regards,
> Christian.
> 
> >
> > Some wash-up thoughts from me below, but consider them fairly irrelevant
> > since I think the main driver for these big questions here should be
> > gdb/userspace.
> >
> >> Quoting Christian König (2024-11-07 11:44:33)
> >>> Am 06.11.24 um 18:00 schrieb Matthew Brost:
> >>>
> >>>      [SNIP]
> >>>
> >>>      This is not a generic interface that anyone can freely access. The same
> >>>      permissions used by ptrace are checked when opening such an interface.
> >>>      See [1] [2].
> >>>
> >>>      [1] https://patchwork.freedesktop.org/patch/617470/?series=136572&rev=2
> >>>      [2] https://patchwork.freedesktop.org/patch/617471/?series=136572&rev=2
> >>>
> >>>
> >>> Thanks a lot for those pointers, that is exactly what I was looking for.
> >>>
> >>> And yeah, it is what I feared. You are re-implementing existing functionality,
> >>> but see below.
> >> Could you elaborate on what this "existing functionality" exactly is?
> >> I do not think this functionality exists at this time.
> >>
> >> The EU debugging architecture for Xe specifically avoids the need for GDB
> >> to attach with ptrace to the CPU process or interfere with the CPU process for
> >> the debugging via parasitic threads or so.
> >>
> >> Debugger connection is opened to the DRM driver for given PID (which uses the
> >> ptrace_may_access check for now), after which all DRM clients of that
> >> PID are exposed to the debugger process.
> >>
> >> What we want to expose via that debugger connection is the ability for GDB to
> >> read/write the different GPU VM address spaces (ppGTT for Intel GPUs) just like
> >> the EU threads would see them. Note that the layout of the ppGTT is
> >> completely up to the userspace driver to setup and is mostly only partially
> >> equal to the CPU address space.
> >>
> >> Specifically as part of reading/writing the ppGTT for debugging purposes,
> >> there are deep flushes needed: for example flushing instruction cache
> >> when adding/removing breakpoints.
> >>
> >> Maybe that will explain the background. I elaborate on this at the end some more.
> >>
> >>>              kmap/vmap are used everywhere in the DRM subsystem to access BOs, so I’m
> >>>              failing to see the problem with adding a simple helper based on existing
> >>>              code.
> >>>
> >>>          What's possible and often done is to do kmap/vmap if you need to implement a
> >>>          CPU copy for scanout for example or for copying/validating command buffers.
> >>>          But that usually requires accessing the whole BO and has separate security
> >>>          checks.
> >>>
> >>>          When you want to access only a few bytes of a BO that sounds massively like
> >>>          a peek/poke like interface and we have already rejected that more than once.
> >>>          There even used to be standardized GEM IOCTLs for that which have been
> >>>          removed by now.
> >> Referring to the explanation at top: These IOCTL are not for the debugging target
> >> process to issue. The peek/poke interface is specifically for GDB only
> >> to facilitate the emulation of memory reads/writes on the GPU address
> >> space as they were done by EUs themselves. And to recap: for modifying
> >> instructions for example (add/remove breakpoint), extra level of cache flushing is
> >> needed which is not available to regular userspace.
> >>
> >> I specifically discussed with Sima on the difference before moving forward with this
> >> design originally. If something has changed since then, I'm of course happy to rediscuss.
> >>
> >> However, if this code can't be added, not sure how we would ever be able
> >> to implement core dumps for GPU threads/memory?
> >>
> >>>          If you need to access BOs which are placed in not CPU accessible memory then
> >>>          implement the access callback for ptrace, see amdgpu_ttm_access_memory for
> >>>          an example how to do this.
> >> As also mentioned above, we don't work via ptrace at all when it comes
> >> to debugging the EUs. The only thing used for now is the ptrace_may_access to
> >> implement similar access restrictions as ptrace has. This can be changed
> >> to something else if needed.
> >>
> >>>      Ptrace access via vm_operations_struct.access → ttm_bo_vm_access.
> >>>
> >>>      This series renames ttm_bo_vm_access to ttm_bo_access, with no code changes.
> >>>
> >>>      The above function accesses a BO via kmap if it is in SYSTEM / TT,
> >>>      which is existing code.
> >>>
> >>>      This function is only exposed to user space via ptrace permissions.
> >> Maybe this sentence is what caused the confusion.
> >>
> >> Userspace is never exposed with peek/poke interface, only the debugger
> >> connection which is its own FD.
> >>
> >>>      In this series, we implement a function [3] similar to
> >>>      amdgpu_ttm_access_memory for the TTM vfunc access_memory. What is
> >>>      missing is non-visible CPU memory access, similar to
> >>>      amdgpu_ttm_access_memory_sdma. This will be addressed in a follow-up and
> >>>      was omitted in this series given its complexity.
> >>>
> >>>      So, this looks more or less identical to AMD's ptrace implementation,
> >>>      but in GPU address space. Again, I fail to see what the problem is here.
> >>>      What am I missing?
> >>>
> >>>
> >>> The main question is why can't you use the existing interfaces directly?
> >> We're not working on the CPU address space or BOs. We're working
> >> strictly on the GPU address space as would be seen by an EU thread if it
> >> accessed address X.
> >>
> >>> Additional to the peek/poke interface of ptrace Linux has the pidfd_getfd
> >>> system call, see here https://man7.org/linux/man-pages/man2/pidfd_getfd.2.html.
> >>>
> >>> The pidfd_getfd() allows to dup() the render node file descriptor into your gdb
> >>> process. That in turn gives you all the access you need from gdb, including
> >>> mapping BOs and command submission on behalf of the application.
> >> We're not operating on the CPU address space nor are we operating on BOs
> >> (there is no concept of BO in the EU debug interface). Each VMA in the VM
> >> could come from anywhere, only the start address and size matter. And
> >> neither do we need to interfere with the command submission of the
> >> process under debug.
> >>
> >>> As far as I can see that allows for the same functionality as the eudebug
> >>> interface, just without any driver specific code messing with ptrace
> >>> permissions and peek/poke interfaces.
> >>>
> >>> So the question is still why do you need the whole eudebug interface in the
> >>> first place? I might be missing something, but that seems to be superfluous
> >>> from a high level view.
> >> Recapping from above. It is to allow the debugging of EU threads per DRM
> >> client, completely independent of the CPU process. If ptrace_may_access
> >> is the sore point, we could consider other permission checks, too. There
> >> is no other connection to ptrace in this architecture than a single
> >> permission check to know if the PID is fair game for the debugger
> >> process to access.
> >>
> >> Why no parasitic thread or ptrace: Going forward, binding the EU debugging to
> >> the DRM client would also pave way for being able to extend core kernel generated
> >> core dump with each DRM client's EU thread/memory dump. We have similar
> >> feature called "Offline core dump" enabled in the downstream public
> >> trees for i915, where we currently attach the EU thread dump to i915 error state
> >> and then later combine i915 error state with CPU core dump file with a
> >> tool.
> >>
> >> This is relatively little amount of extra code, as this baseline series
> >> already introduces GDB the ability to perform the necessary actions.
> >> It's just the matter of kernel driver calling: "stop all threads", then
> >> copying the memory map and memory contents for GPU threads, just like is
> >> done for CPU threads.
> >>
> >> With parasitic thread injection, not sure if there is such a way forward,
> >> as it would seem to require injecting quite a bit more logic into the core kernel?
> >>
> >>> It's true that the AMD KFD part has still similar functionality, but that is
> >>> because of the broken KFD design of tying driver state to the CPU process
> >>> (which makes it inaccessible for gdb even with imported render node fd).
> >>>
> >>> Both Sima and I (and partially Dave as well) have pushed back on the KFD
> >>> approach. And the long term plan is to get rid of such device driver specific
> >>> interface which re-implement existing functionality just differently.
> >> Recapping, this series is not adding it back. The debugger connection
> >> is a separate FD from the DRM one, with separate IOCTL set. We don't allow
> >> the DRM FD any new operations based on whether ptrace is attached or not. We
> >> never even do that check.
> >>
> >> We only restrict the opening of the debugger connection to given PID with
> >> ptrace_may_access check for now. That can be changed to something else,
> >> if necessary.
> > Yeah I think unnecessarily tying gpu processes to cpu processes is a bad
> > thing, least because even today all the svm discussions we have still hit
> > clear use-cases, where a 1:1 match is not wanted (like multiple gpu svm
> > sections with offsets). Not even speaking of all the gpu usecases where
> > the gpu vm space is still entirely independent of the cpu side.
> >
> > So that's why I think this entirely separate approach looks like the right
> > one, with ptrace_may_access as the access control check to make sure we
> > match ptrace on the cpu side.
> >
> > But there's very obviously a bikeshed to be had on what the actual uapi
> > should look like, especially how gdb opens up a gpu debug access fd. But I
> > also think that's not much on drm to decide, but whatever gdb wants. And
> > then we aim for some consistency on that lookup/access control part
> > (ideally, I might be missing some reasons why this is a bad idea) across
> > drm drivers.
> >
> >>> So you need to have a really really good explanation why the eudebug interface
> >>> is actually necessary.
> >> TL;DR The main point is to decouple the debugging of the EU workloads from the
> >> debugging of the CPU process. This avoids the interference with the CPU process with
> >> parasitic thread injection. Further this also allows generating a core dump
> >> without any GDB connected. There are also many other smaller pros/cons
> >> which can be discussed but for the context of this patch, this is the
> >> main one.
> >>
> >> So unlike parasitic thread injection, we don't unlock any special IOCTL for
> >> the process under debug to be performed by the parasitic thread, but we
> >> allow the minimal set of operations to be performed by GDB as if those were
> >> done on the EUs themselves.
> >>
> >> One can think of it like the minimal subset of ptrace but for EU threads,
> >> not the CPU threads. And thus, building on this it's possible to extend
> >> the core kernel generated core dumps with DRM specific extension which
> >> would contain the EU thread/memory dump.
> > It might be good to document (in that debugging doc patch probably) why
> > thread injection is not a great option, and why the tradeoffs for
> > debugging are different than for checkpoint/restore, where with CRIU
> > we landed on doing most of this in userspace, and often requiring
> > injection threads to make it all work.
> >
> > Cheers, Sima
> >
> >> Regards, Joonas
> >>
> >>> Regards,
> >>> Christian.
> >>>
> >>>
> >>>
> >>>      Matt
> >>>
> >>>      [3] https://patchwork.freedesktop.org/patch/622520/?series=140200&rev=6
> >>>
> >>>
> >>>          Regards,
> >>>          Christian.
> >>>
> >>>
> >>>              Matt
> >>>
> >>>
> >>>                  Regards,
> >>>                  Christian.
> >>>
>


* Re: [PATCH v6 2/8] drm/ttm: Add ttm_bo_access
  2024-11-11 14:00                               ` Joonas Lahtinen
@ 2024-11-11 15:54                                 ` Christian König
  2024-11-11 22:45                                   ` Matthew Brost
  2024-11-12  8:28                                 ` Simona Vetter
  1 sibling, 1 reply; 56+ messages in thread
From: Christian König @ 2024-11-11 15:54 UTC (permalink / raw)
  To: Joonas Lahtinen, Simona Vetter
  Cc: Matthew Brost, Christian König, Rodrigo Vivi, Huang Rui,
	intel-xe, dri-devel, matthew.auld, David Airlie, Simona Vetter


Am 11.11.24 um 15:00 schrieb Joonas Lahtinen:
> Quoting Christian König (2024-11-11 13:34:12)
>> Am 11.11.24 um 11:10 schrieb Simona Vetter:
>>> On Mon, Nov 11, 2024 at 10:00:17AM +0200, Joonas Lahtinen wrote:
>>>> Back from some time off and will try to answer below.
>>>>
>>>> Adding Dave and Sima as this topic has been previously discussed to some
>>>> extent and will be good to reach common understanding about what the
>>>> series is trying to do and what is the difference to the AMD debugging
>>>> model.
>>> I chatted about this thread a bit on irc with folks, and I think an
>>> orthogonal issue is the question, what should be in ttm-utils? I've asked
>>> Matt to type up a DOC patch once we have some consensus, since imo the
>>> somewhat lackluster documentation situation for ttm is also somewhat a
>>> cause for these big threads on various different topics. Aside from the
>>> fact that gpu memory management is just hard.
>>>
>>> On the uapi/design aspect, I think this would serve well with a patch to
>>> drm-uapi.rst that adds a debugging section? At least once we have some
>>> rough consensus across drivers, and more importantly userspace in the form
>>> of gdb upstream (at least I'm not aware of any other upstream debugger
>>> patches, I think amd's rocm stuff is also gdb-only).
>> Yeah that seems to be a really good idea. Similar design ideas came up
>> internally at AMD as well but were dropped after pointing people to
>> pidfd_getfd().
>>
>> But the bigger problem seems to be that the design doesn't seem to take
>> the dma_fence requirements into account.
> Where would you deduce that?

XE's memory management is based on preemption fences.

> We specifically limit the debugging to Long Running contexts which don't
> depend on dma_fences.

That doesn't matter.

As long as you don't have page-fault (HMM) based memory management, you
still have that interdependency, and at least the publicly available XE
code doesn't seem to have page-fault based memory management.

>> In other words attaching gdb to a pid seems to stop the GPU thread of
>> this pid without waiting for the XE preemption nor end of operation fence.
>>
>> I mean if the GPU threads are preempted that could work, but yeah not
>> like this :)
> For us, hitting a breakpoint inside the workload would always violate
> any dma_fence timeout for the submitted workload, as the HW context can't
> be switched out while in the breakpoint.

That is clearly *not* something you can do without changing your memory 
management.

> For any dma_fence workload the guarantee is that it completes
> within a reasonable time after submission (guaranteed by the submitter). I
> don't see how you could really allow interactive debugging of a
> breakpoint under those restrictions anyway even if pre-emption was
> supported as the workload would not finish in <10 seconds?

Yeah that is the whole point, this is impossible as far as we know.

> For i915 we did have the "pre-emptable but indefinitely long dma_fence workloads"
> concept at one point and that was rejected after a lengthy discussion.

I mean I tried that more than once myself and we have multiple requests 
for this on the AMD side from customers. So far nobody came up with a 
solution which actually works correctly.

> So I think the only way to allow interactive debugging is to avoid
> dma_fences. Curious to hear if there are ideas for doing it otherwise.

You need to guarantee somehow that the process is taken off the
hardware so that the preemption fence can signal.

This means that a breakpoint or core dump doesn't halt GPU threads, but 
rather suspends them. E.g. all running wave data is collected into a 
state bag which can be restored later on.

I was under the impression that those long running compute threads do
exactly that, but if the hardware can't switch out the GPU
thread/process while in a break, then that isn't the case.

As long as you don't find a way to avoid that, this patch set is a pretty
clear NAK from my side as DMA-buf and TTM maintainer.

What might work is to keep the submission on the hardware in the break 
state but forbid any memory access. This way you can signal your 
preemption fence even when the hardware isn't made available.

Before you continue, XE sets up a new preemption fence and makes sure
that all page tables etc. are up to date.

Could be tricky to get this right if completion fence based submissions 
are mixed in as well, but that gives you at least a direction you could 
potentially go.
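
The direction above can be sketched as a small state model. This is only a minimal sketch under stated assumptions: every name below (struct dbg_ctx, ctx_enter_break, ctx_evict, ctx_resume) is hypothetical and exists neither in Xe nor in TTM, and the booleans merely stand in for real preemption fences and page-table state.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model only: none of these names exist in Xe or TTM. */
enum ctx_state {
	CTX_RUNNING,	/* on hardware, may access memory */
	CTX_BREAK,	/* still on hardware, memory access forbidden */
};

struct dbg_ctx {
	enum ctx_state state;
	bool mem_access_allowed;
	bool preempt_fence_armed;	/* an unsignaled preemption fence exists */
	bool page_tables_valid;
};

/* Memory access is only safe with an armed fence and valid page tables. */
bool ctx_invariant_ok(const struct dbg_ctx *c)
{
	return !c->mem_access_allowed ||
	       (c->preempt_fence_armed && c->page_tables_valid);
}

void ctx_init(struct dbg_ctx *c)
{
	c->state = CTX_RUNNING;
	c->mem_access_allowed = true;
	c->preempt_fence_armed = true;
	c->page_tables_valid = true;
}

/* Breakpoint hit: forbid memory access first, then signal the fence. */
int ctx_enter_break(struct dbg_ctx *c)
{
	if (c->state != CTX_RUNNING)
		return -1;
	c->mem_access_allowed = false;
	c->preempt_fence_armed = false;	/* models the fence signaling */
	c->state = CTX_BREAK;
	assert(ctx_invariant_ok(c));
	return 0;
}

/*
 * Memory manager moves a BO. In this simplified model that is refused
 * while a fence is still armed; a real driver would wait on the fence.
 */
int ctx_evict(struct dbg_ctx *c)
{
	if (c->preempt_fence_armed)
		return -1;
	c->page_tables_valid = false;
	assert(ctx_invariant_ok(c));
	return 0;
}

/* Continue: arm a new fence and refresh page tables before any access. */
int ctx_resume(struct dbg_ctx *c)
{
	if (c->state != CTX_BREAK)
		return -1;
	c->preempt_fence_armed = true;
	c->page_tables_valid = true;
	c->mem_access_allowed = true;
	c->state = CTX_RUNNING;
	assert(ctx_invariant_ok(c));
	return 0;
}
```

The point of the ordering is that the fence can signal while the submission stays on the hardware, because nothing can touch memory until ctx_resume() has set up a new fence and revalidated the page tables.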

Regards,
Christian.

>
> Regards, Joonas
>
>> Regards,
>> Christian.
>>
>>> Some wash-up thoughts from me below, but consider them fairly irrelevant
>>> since I think the main driver for these big questions here should be
>>> gdb/userspace.
>>>
>>>> Quoting Christian König (2024-11-07 11:44:33)
>>>>> Am 06.11.24 um 18:00 schrieb Matthew Brost:
>>>>>
>>>>>       [SNIP]
>>>>>
>>>>>       This is not a generic interface that anyone can freely access. The same
>>>>>       permissions used by ptrace are checked when opening such an interface.
>>>>>       See [1] [2].
>>>>>
>>>>>       [1] https://patchwork.freedesktop.org/patch/617470/?series=136572&rev=2
>>>>>       [2] https://patchwork.freedesktop.org/patch/617471/?series=136572&rev=2
>>>>>
>>>>>
>>>>> Thanks a lot for those pointers, that is exactly what I was looking for.
>>>>>
>>>>> And yeah, it is what I feared. You are re-implementing existing functionality,
>>>>> but see below.
>>>> Could you elaborate on what this "existing functionality" exactly is?
>>>> I do not think this functionality exists at this time.
>>>>
>>>> The EU debugging architecture for Xe specifically avoids the need for GDB
>>>> to attach with ptrace to the CPU process or interfere with the CPU process for
>>>> the debugging via parasitic threads or so.
>>>>
>>>> Debugger connection is opened to the DRM driver for given PID (which uses the
>>>> ptrace_may_access check for now), after which all DRM clients of that
>>>> PID are exposed to the debugger process.
>>>>
>>>> What we want to expose via that debugger connection is the ability for GDB to
>>>> read/write the different GPU VM address spaces (ppGTT for Intel GPUs) just like
>>>> the EU threads would see them. Note that the layout of the ppGTT is
>>>> completely up to the userspace driver to setup and is mostly only partially
>>>> equal to the CPU address space.
>>>>
>>>> Specifically as part of reading/writing the ppGTT for debugging purposes,
>>>> there are deep flushes needed: for example flushing instruction cache
>>>> when adding/removing breakpoints.
>>>>
>>>> Maybe that will explain the background. I elaborate on this at the end some more.
>>>>
>>>>>               kmap/vmap are used everywhere in the DRM subsystem to access BOs, so I’m
>>>>>               failing to see the problem with adding a simple helper based on existing
>>>>>               code.
>>>>>
>>>>>           What's possible and often done is to do kmap/vmap if you need to implement a
>>>>>           CPU copy for scanout for example or for copying/validating command buffers.
>>>>>           But that usually requires accessing the whole BO and has separate security
>>>>>           checks.
>>>>>
>>>>>           When you want to access only a few bytes of a BO that sounds massively like
>>>>>           a peek/poke like interface and we have already rejected that more than once.
>>>>>           There even used to be standardized GEM IOCTLs for that which have been
>>>>>           removed by now.
>>>> Referring to the explanation at top: These IOCTL are not for the debugging target
>>>> process to issue. The peek/poke interface is specifically for GDB only
>>>> to facilitate the emulation of memory reads/writes on the GPU address
>>>> space as they were done by EUs themselves. And to recap: for modifying
>>>> instructions for example (add/remove breakpoint), extra level of cache flushing is
>>>> needed which is not available to regular userspace.
>>>>
>>>> I specifically discussed with Sima on the difference before moving forward with this
>>>> design originally. If something has changed since then, I'm of course happy to rediscuss.
>>>>
>>>> However, if this code can't be added, not sure how we would ever be able
>>>> to implement core dumps for GPU threads/memory?
>>>>
>>>>>           If you need to access BOs which are placed in not CPU accessible memory then
>>>>>           implement the access callback for ptrace, see amdgpu_ttm_access_memory for
>>>>>           an example how to do this.
>>>> As also mentioned above, we don't work via ptrace at all when it comes
>>>> to debugging the EUs. The only thing used for now is the ptrace_may_access to
>>>> implement similar access restrictions as ptrace has. This can be changed
>>>> to something else if needed.
>>>>
>>>>>       Ptrace access via vm_operations_struct.access → ttm_bo_vm_access.
>>>>>
>>>>>       This series renames ttm_bo_vm_access to ttm_bo_access, with no code changes.
>>>>>
>>>>>       The above function accesses a BO via kmap if it is in SYSTEM / TT,
>>>>>       which is existing code.
>>>>>
>>>>>       This function is only exposed to user space via ptrace permissions.
>>>> Maybe this sentence is what caused the confusion.
>>>>
>>>> Userspace is never exposed with peek/poke interface, only the debugger
>>>> connection which is its own FD.
>>>>
>>>>>       In this series, we implement a function [3] similar to
>>>>>       amdgpu_ttm_access_memory for the TTM vfunc access_memory. What is
>>>>>       missing is non-visible CPU memory access, similar to
>>>>>       amdgpu_ttm_access_memory_sdma. This will be addressed in a follow-up and
>>>>>       was omitted in this series given its complexity.
>>>>>
>>>>>       So, this looks more or less identical to AMD's ptrace implementation,
>>>>>       but in GPU address space. Again, I fail to see what the problem is here.
>>>>>       What am I missing?
>>>>>
>>>>>
>>>>> The main question is why can't you use the existing interfaces directly?
>>>> We're not working on the CPU address space or BOs. We're working
>>>> strictly on the GPU address space as would be seen by an EU thread if it
>>>> accessed address X.
>>>>
>>>>> Additional to the peek/poke interface of ptrace Linux has the pidfd_getfd
>>>>> system call, see here https://man7.org/linux/man-pages/man2/pidfd_getfd.2.html.
>>>>>
>>>>> The pidfd_getfd() allows to dup() the render node file descriptor into your gdb
>>>>> process. That in turn gives you all the access you need from gdb, including
>>>>> mapping BOs and command submission on behalf of the application.
>>>> We're not operating on the CPU address space nor are we operating on BOs
>>>> (there is no concept of BO in the EU debug interface). Each VMA in the VM
>>>> could come from anywhere, only the start address and size matter. And
>>>> neither do we need to interfere with the command submission of the
>>>> process under debug.
>>>>
>>>>> As far as I can see that allows for the same functionality as the eudebug
>>>>> interface, just without any driver specific code messing with ptrace
>>>>> permissions and peek/poke interfaces.
>>>>>
>>>>> So the question is still why do you need the whole eudebug interface in the
>>>>> first place? I might be missing something, but that seems to be superfluous
>>>>> from a high level view.
>>>> Recapping from above. It is to allow the debugging of EU threads per DRM
>>>> client, completely independent of the CPU process. If ptrace_may_access
>>>> is the sore point, we could consider other permission checks, too. There
>>>> is no other connection to ptrace in this architecture as single
>>>> permission check to know if PID is fair game to access by debugger
>>>> process.
>>>>
>>>> Why no parasitic thread or ptrace: Going forward, binding the EU debugging to
>>>> the DRM client would also pave way for being able to extend core kernel generated
>>>> core dump with each DRM client's EU thread/memory dump. We have similar
>>>> feature called "Offline core dump" enabled in the downstream public
>>>> trees for i915, where we currently attach the EU thread dump to i915 error state
>>>> and then later combine i915 error state with CPU core dump file with a
>>>> tool.
>>>>
>>>> This is relatively little amount of extra code, as this baseline series
>>>> already introduces GDB the ability to perform the necessary actions.
>>>> It's just the matter of kernel driver calling: "stop all threads", then
>>>> copying the memory map and memory contents for GPU threads, just like is
>>>> done for CPU threads.
>>>>
>>>> With parasitic thread injection, not sure if there is such way forward,
>>>> as it would seem to require to inject quite a bit more logic to core kernel?
>>>>
>>>>> It's true that the AMD KFD part has still similar functionality, but that is
>>>>> because of the broken KFD design of tying driver state to the CPU process
>>>>> (which makes it inaccessible for gdb even with imported render node fd).
>>>>>
>>>>> Both Sima and I (and partially Dave as well) have pushed back on the KFD
>>>>> approach. And the long term plan is to get rid of such device driver specific
>>>>> interface which re-implement existing functionality just differently.
>>>> Recapping, this series is not adding it back. The debugger connection
>>>> is a separate FD from the DRM one, with separate IOCTL set. We don't allow
>>>> the DRM FD any new operations based on ptrace is attached or not. We
>>>> don't ever do that check even.
>>>>
>>>> We only restrict the opening of the debugger connection to given PID with
>>>> ptrace_may_access check for now. That can be changed to something else,
>>>> if necessary.
>>> Yeah I think unnecessarily tying gpu processes to cpu processes is a bad
>>> thing, least because even today all the svm discussions we have still hit
>>> clear use-cases, where a 1:1 match is not wanted (like multiple gpu svm
>>> sections with offsets). Not even speaking of all the gpu usecases where
>>> the gpu vm space is still entirely independent of the cpu side.
>>>
>>> So that's why I think this entirely separate approach looks like the right
>>> one, with ptrace_may_access as the access control check to make sure we
>>> match ptrace on the cpu side.
>>>
>>> But there's very obviously a bikeshed to be had on what the actual uapi
>>> should look like, especially how gdb opens up a gpu debug access fd. But I
>>> also think that's not much on drm to decide, but whatever gdb wants. And
>>> then we aim for some consistency on that lookup/access control part
>>> (ideally, I might be missing some reasons why this is a bad idea) across
>>> drm drivers.
>>>
>>>>> So you need to have a really really good explanation why the eudebug interface
>>>>> is actually necessary.
>>>> TL;DR The main point is to decouple the debugging of the EU workloads from the
>>>> debugging of the CPU process. This avoids the interference with the CPU process with
>>>> parasitic thread injection. Further this also allows generating a core dump
>>>> without any GDB connected. There are also many other smaller pros/cons
>>>> which can be discussed but for the context of this patch, this is the
>>>> main one.
>>>>
>>>> So unlike parasitic thread injection, we don't unlock any special IOCTL for
>>>> the process under debug to be performed by the parasitic thread, but we
>>>> allow the minimal set of operations to be performed by GDB as if those were
>>>> done on the EUs themselves.
>>>>
>>>> One can think of it like the minimal subset of ptrace but for EU threads,
>>>> not the CPU threads. And thus, building on this it's possible to extend
>>>> the core kernel generated core dumps with DRM specific extension which
>>>> would contain the EU thread/memory dump.
>>> It might be good to document (in that debugging doc patch probably) why
>>> thread injection is not a great option, and why the tradeoffs for
>>> debugging are different than for checkpoint/restore, where with CRIU
>>> we landed on doing most of this in userspace, and often requiring
>>> injection threads to make it all work.
>>>
>>> Cheers, Sima
>>>
>>>> Regards, Joonas
>>>>
>>>>> Regards,
>>>>> Christian.
>>>>>
>>>>>
>>>>>
>>>>>       Matt
>>>>>
>>>>>       [3] https://patchwork.freedesktop.org/patch/622520/?series=140200&rev=6
>>>>>
>>>>>
>>>>>           Regards,
>>>>>           Christian.
>>>>>
>>>>>
>>>>>               Matt
>>>>>
>>>>>
>>>>>                   Regards,
>>>>>                   Christian.
>>>>>


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 2/8] drm/ttm: Add ttm_bo_access
  2024-11-11 15:54                                 ` Christian König
@ 2024-11-11 22:45                                   ` Matthew Brost
  2024-11-12  9:23                                     ` Christian König
  0 siblings, 1 reply; 56+ messages in thread
From: Matthew Brost @ 2024-11-11 22:45 UTC (permalink / raw)
  To: Christian König
  Cc: Joonas Lahtinen, Simona Vetter, Christian König,
	Rodrigo Vivi, Huang Rui, intel-xe, dri-devel, matthew.auld,
	David Airlie, Simona Vetter

On Mon, Nov 11, 2024 at 04:54:57PM +0100, Christian König wrote:
> Am 11.11.24 um 15:00 schrieb Joonas Lahtinen:
> > Quoting Christian König (2024-11-11 13:34:12)
> > > Am 11.11.24 um 11:10 schrieb Simona Vetter:
> > > > On Mon, Nov 11, 2024 at 10:00:17AM +0200, Joonas Lahtinen wrote:
> > > > > Back from some time off and will try to answer below.
> > > > > 
> > > > > Adding Dave and Sima as this topic has been previously discussed to some
> > > > > extent and will be good to reach common understanding about what the
> > > > > series is trying to do and what is the difference to the AMD debugging
> > > > > model.
> > > > I chatted about this thread a bit on irc with folks, and I think an
> > > > orthogonal issue is the question, what should be in ttm-utils? I've asked
> > > > Matt to type up a DOC patch once we have some consensus, since imo the
> > > > somewhat lackluster documentation situation for ttm is also somewhat a
> > > > cause for these big threads on various different topics. Aside from the
> > > > fact that gpu memory management is just hard.
> > > > 
> > > > On the uapi/design aspect, I think this would serve well with a patch to
> > > > drm-uapi.rst that adds a debugging section? At least once we have some
> > > > rough consensus across drivers, and more importantly userspace in the form
> > > > of gdb upstream (at least I'm not aware of any other upstream debugger
> > > > patches, I think amd's rocm stuff is also gdb-only).
> > > Yeah that seems to be a really good idea. Similar design ideas came up
> > > AMD internally as well but were dropped after pointing people to
> > > pidfd_getfd().
> > > 
> > > But the bigger problem seems to be that the design doesn't seem to take
> > > the dma_fence requirements into account.
> > Where would you deduce that?
> 
> XE is based on a preemption fence based memory management.
> 
> > We specifically limit the debugging to Long Running contexts which don't
> > depend on dma_fences.
> 
> That doesn't matter.
> 
> As long as you don't have a page fault (HMM) based memory management you
> still have that inter dependency and at least the public available XE code
> doesn't seem to have that.
> 
> > > In other words attaching gdb to a pid seems to stop the GPU thread of
> > > this pid without waiting for the XE preemption nor end of operation fence.
> > > 
> > > I mean if the GPU threads are preempted that could work, but yeah not
> > > like this :)
> > For us, hitting a breakpoint inside the workload would always violate
> > any dma_fence timeout for the submitted workload, as the HW context can't
> > be switched out while in the breakpoint.
> 
> That is clearly *not* something you can do without changing your memory
> management.
> 
> > For any dma_fence workload the guarantee is that it completes
> > within reasonable time after submission (guaranteed by the submitter). I
> > don't see how you could really allow interactive debugging of a
> > breakpoint under those restrictions anyway even if pre-emption was
> > supported as the workload would not finish in <10 seconds?
> 
> Yeah that is the whole point, this is impossible as far as we know.
> 
> > For i915 we did have the "pre-emptable but indefinitely long dma_fence workloads"
> > concept at one point and that was rejected after the lengthy discussion.
> 
> I mean I tried that more than once myself and we have multiple requests for
> this on the AMD side from customers. So far nobody came up with a solution
> which actually works correctly.
> 
> > So I think only way to allow interactive debugging is to avoid the
> > dma_fences. Curious to hear if there are ideas for otherwise.
> 
> You need to guarantee somehow that the process is taken from the hardware so
> that the preemption fence can signal.
> 

Our preemption fences have this functionality.

A preemption fence issues a suspend execution command to the firmware. The
firmware, in turn, attempts to preempt the workload. If it doesn't respond
within a specified period, it resets the hardware queue, sends a message to KMD,
bans the software queue, and signals the preemption fence.

We provide even more protection than that. If, for some reason, the firmware
doesn't respond within a longer timeout period, the KMD performs a device reset,
bans the offending software queue(s), and signals the preemption fences.

This flow remains the same whether a debugger is attached or, for example, a
user submits a 10-minute non-preemptable workload. In either case, other
processes are guaranteed to make forward progress.
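
To make the escalation concrete, here is a toy userspace model of the flow
described above. It is only a sketch: the names (toy_queue, toy_preempt, the
timeout parameter) are made up for illustration and are not the Xe or firmware
interfaces, and the real timeouts and messages differ.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcomes of a preemption-fence suspend request. */
enum preempt_outcome {
	PREEMPT_CLEAN,		/* workload yielded within the firmware timeout */
	PREEMPT_QUEUE_RESET,	/* firmware reset the hardware queue */
	PREEMPT_DEVICE_RESET,	/* firmware unresponsive, KMD reset the device */
};

/* Hypothetical software queue state, not a real Xe structure. */
struct toy_queue {
	bool banned;		/* queue may no longer submit work */
	bool fence_signaled;	/* preemption fence has signaled */
};

/*
 * Model one suspend request.
 * yield_ms:      time the workload needs to yield, negative if it never
 *                does (e.g. EU threads sitting in a breakpoint).
 * fw_timeout_ms: how long the firmware waits before resetting the queue.
 * fw_responsive: false models firmware that never answers the KMD.
 */
static enum preempt_outcome toy_preempt(struct toy_queue *q, int yield_ms,
					int fw_timeout_ms, bool fw_responsive)
{
	enum preempt_outcome outcome;

	assert(q);

	if (!fw_responsive) {
		/* Longer KMD timeout expired: device reset, ban the queue. */
		outcome = PREEMPT_DEVICE_RESET;
		q->banned = true;
	} else if (yield_ms >= 0 && yield_ms <= fw_timeout_ms) {
		/* Workload was preempted in time, nothing is banned. */
		outcome = PREEMPT_CLEAN;
	} else {
		/* Firmware timeout expired: queue reset, ban the queue. */
		outcome = PREEMPT_QUEUE_RESET;
		q->banned = true;
	}

	/* Every path ends with the preemption fence signaled. */
	q->fence_signaled = true;
	return outcome;
}
```

The point of the model is the final assignment: whichever branch is taken, the
fence signals, so a stuck workload costs its own queue rather than blocking
everyone else.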

The example above illustrates the memory oversubscription case, where two
processes are each using 51% of the memory.

Another preemption scenario involves two processes sharing hardware resources.
Our firmware follows the same flow here. If a long-running (LR) workload is
using a hardware resource and a DMA-fence workload is waiting, and if the LR
workload doesn't preempt in a timely manner, the firmware issues a hardware
reset, notifies KMD, and bans the LR software queue. The DMA-fence workload can
then make forward progress.

With the above in mind, this is why I say that if a user tries to run a game and
a non-preemptable LR workload, either oversubscribing memory or sharing hardware
resources, it is unlikely to work well. However, I don't think this is a common
use case. I would expect that when a debugger is open, it is typically by a
power user who knows how to disable other GPU tasks (e.g., by enabling software
rendering or using a machine without any display).

Given this, please reconsider your position.

> This means that a breakpoint or core dump doesn't halt GPU threads, but
> rather suspends them. E.g. all running wave data is collected into a state
> bag which can be restored later on.
> 
> I was under the impression that those long running compute threads do
> exactly that, but when the hardware can't switch out the GPU thread/process
> while in a break then that isn't the case.
> 
> As long as you don't find a way to avoid that this patch set is a pretty
> clear NAK from my side as DMA-buf and TTM maintainer.
> 

I believe this is addressed above.

Matt

> What might work is to keep the submission on the hardware in the break state
> but forbid any memory access. This way you can signal your preemption fence
> even when the hardware isn't made available.
> 
> Before you continue XE sets up a new pre-emption fence and makes sure that
> all page tables etc... are up to date.
> 
> Could be tricky to get this right if completion fence based submissions are
> mixed in as well, but that gives you at least a direction you could
> potentially go.
> 
> Regards,
> Christian.
> 
> > 
> > Regards, Joonas
> > 
> > > Regards,
> > > Christian.
> > > 
> > > > Some wash-up thoughts from me below, but consider them fairly irrelevant
> > > > since I think the main driver for these big questions here should be
> > > > gdb/userspace.
> > > > 
> > > > > Quoting Christian König (2024-11-07 11:44:33)
> > > > > > Am 06.11.24 um 18:00 schrieb Matthew Brost:
> > > > > > 
> > > > > >       [SNIP]
> > > > > > 
> > > > > >       This is not a generic interface that anyone can freely access. The same
> > > > > >       permissions used by ptrace are checked when opening such an interface.
> > > > > >       See [1] [2].
> > > > > > 
> > > > > >       [1] https://patchwork.freedesktop.org/patch/617470/?series=136572&rev=2
> > > > > >       [2] https://patchwork.freedesktop.org/patch/617471/?series=136572&rev=2
> > > > > > 
> > > > > > 
> > > > > > Thanks a lot for those pointers, that is exactly what I was looking for.
> > > > > > 
> > > > > > And yeah, it is what I feared. You are re-implementing existing functionality,
> > > > > > but see below.
> > > > > Could you elaborate on what this "existing functionality" exactly is?
> > > > > I do not think this functionality exists at this time.
> > > > > 
> > > > > The EU debugging architecture for Xe specifically avoids the need for GDB
> > > > > to attach with ptrace to the CPU process or interfere with the CPU process for
> > > > > the debugging via parasitic threads or so.
> > > > > 
> > > > > Debugger connection is opened to the DRM driver for given PID (which uses the
> > > > > ptrace may access check for now) after which the all DRM client of that
> > > > > PID are exposed to the debugger process.
> > > > > 
> > > > > What we want to expose via that debugger connection is the ability for GDB to
> > > > > read/write the different GPU VM address spaces (ppGTT for Intel GPUs) just like
> > > > > the EU threads would see them. Note that the layout of the ppGTT is
> > > > > completely up to the userspace driver to setup and is mostly only partially
> > > > > equal to the CPU address space.
> > > > > 
> > > > > Specifically as part of reading/writing the ppGTT for debugging purposes,
> > > > > there are deep flushes needed: for example flushing instruction cache
> > > > > when adding/removing breakpoints.
> > > > > 
> > > > > Maybe that will explain the background. I elaborate on this at the end some more.
> > > > > 
> > > > > >               kmap/vmap are used everywhere in the DRM subsystem to access BOs, so I’m
> > > > > >               failing to see the problem with adding a simple helper based on existing
> > > > > >               code.
> > > > > > 
> > > > > >           What's possible and often done is to do kmap/vmap if you need to implement a
> > > > > >           CPU copy for scanout for example or for copying/validating command buffers.
> > > > > >           But that usually requires accessing the whole BO and has separate security
> > > > > >           checks.
> > > > > > 
> > > > > >           When you want to access only a few bytes of a BO that sounds massively like
> > > > > >           a peek/poke like interface and we have already rejected that more than once.
> > > > > >           There even used to be standardized GEM IOCTLs for that which have been
> > > > > >           removed by now.
> > > > > Referring to the explanation at top: These IOCTL are not for the debugging target
> > > > > process to issue. The peek/poke interface is specifically for GDB only
> > > > > to facilitate the emulation of memory reads/writes on the GPU address
> > > > > space as they were done by EUs themselves. And to recap: for modifying
> > > > > instructions for example (add/remove breakpoint), extra level of cache flushing is
> > > > > needed which is not available to regular userspace.
> > > > > 
> > > > > I specifically discussed with Sima on the difference before moving forward with this
> > > > > design originally. If something has changed since then, I'm of course happy to rediscuss.
> > > > > 
> > > > > However, if this code can't be added, not sure how we would ever be able
> > > > > to implement core dumps for GPU threads/memory?
> > > > > 
> > > > > >           If you need to access BOs which are placed in not CPU accessible memory then
> > > > > >           implement the access callback for ptrace, see amdgpu_ttm_access_memory for
> > > > > >           an example how to do this.
> > > > > As also mentioned above, we don't work via ptrace at all when it comes
> > > > > to debugging the EUs. The only thing used for now is the ptrace_may_access to
> > > > > implement similar access restrictions as ptrace has. This can be changed
> > > > > to something else if needed.
> > > > > 
> > > > > >       Ptrace access via vm_operations_struct.access → ttm_bo_vm_access.
> > > > > > 
> > > > > >       This series renames ttm_bo_vm_access to ttm_bo_access, with no code changes.
> > > > > > 
> > > > > >       The above function accesses a BO via kmap if it is in SYSTEM / TT,
> > > > > >       which is existing code.
> > > > > > 
> > > > > >       This function is only exposed to user space via ptrace permissions.
> > > > > Maybe this sentence is what caused the confusion.
> > > > > 
> > > > > Userspace is never exposed with peek/poke interface, only the debugger
> > > > > connection which is its own FD.
> > > > > 
> > > > > >       In this series, we implement a function [3] similar to
> > > > > >       amdgpu_ttm_access_memory for the TTM vfunc access_memory. What is
> > > > > >       missing is non-visible CPU memory access, similar to
> > > > > >       amdgpu_ttm_access_memory_sdma. This will be addressed in a follow-up and
> > > > > >       was omitted in this series given its complexity.
> > > > > > 
> > > > > >       So, this looks more or less identical to AMD's ptrace implementation,
> > > > > >       but in GPU address space. Again, I fail to see what the problem is here.
> > > > > >       What am I missing?
> > > > > > 
> > > > > > 
> > > > > > The main question is why can't you use the existing interfaces directly?
> > > > > We're not working on the CPU address space or BOs. We're working
> > > > > strictly on the GPU address space as would be seen by an EU thread if it
> > > > > accessed address X.
> > > > > 
> > > > > > Additional to the peek/poke interface of ptrace Linux has the pidfd_getfd
> > > > > > system call, see here https://man7.org/linux/man-pages/man2/pidfd_getfd.2.html.
> > > > > > 
> > > > > > The pidfd_getfd() allows to dup() the render node file descriptor into your gdb
> > > > > > process. That in turn gives you all the access you need from gdb, including
> > > > > > mapping BOs and command submission on behalf of the application.
> > > > > We're not operating on the CPU address space nor are we operating on BOs
> > > > > (there is no concept of BO in the EU debug interface). Each VMA in the VM
> > > > > could come from anywhere, only the start address and size matter. And
> > > > > neither do we need to interfere with the command submission of the
> > > > > process under debug.
> > > > > 
> > > > > > As far as I can see that allows for the same functionality as the eudebug
> > > > > > interface, just without any driver specific code messing with ptrace
> > > > > > permissions and peek/poke interfaces.
> > > > > > 
> > > > > > So the question is still why do you need the whole eudebug interface in the
> > > > > > first place? I might be missing something, but that seems to be superfluous
> > > > > > from a high level view.
> > > > > Recapping from above. It is to allow the debugging of EU threads per DRM
> > > > > client, completely independent of the CPU process. If ptrace_may_access
> > > > > is the sore point, we could consider other permission checks, too. There
> > > > > is no other connection to ptrace in this architecture as single
> > > > > permission check to know if PID is fair game to access by debugger
> > > > > process.
> > > > > 
> > > > > Why no parasitic thread or ptrace: Going forward, binding the EU debugging to
> > > > > the DRM client would also pave way for being able to extend core kernel generated
> > > > > core dump with each DRM client's EU thread/memory dump. We have similar
> > > > > feature called "Offline core dump" enabled in the downstream public
> > > > > trees for i915, where we currently attach the EU thread dump to i915 error state
> > > > > and then later combine i915 error state with CPU core dump file with a
> > > > > tool.
> > > > > 
> > > > > This is relatively little amount of extra code, as this baseline series
> > > > > already introduces GDB the ability to perform the necessary actions.
> > > > > It's just the matter of kernel driver calling: "stop all threads", then
> > > > > copying the memory map and memory contents for GPU threads, just like is
> > > > > done for CPU threads.
> > > > > 
> > > > > With parasitic thread injection, not sure if there is such way forward,
> > > > > as it would seem to require to inject quite a bit more logic to core kernel?
> > > > > 
> > > > > > It's true that the AMD KFD part has still similar functionality, but that is
> > > > > > because of the broken KFD design of tying driver state to the CPU process
> > > > > > (which makes it inaccessible for gdb even with imported render node fd).
> > > > > > 
> > > > > > Both Sima and I (and partially Dave as well) have pushed back on the KFD
> > > > > > approach. And the long term plan is to get rid of such device driver specific
> > > > > > interface which re-implement existing functionality just differently.
> > > > > Recapping, this series is not adding it back. The debugger connection
> > > > > is a separate FD from the DRM one, with separate IOCTL set. We don't allow
> > > > > the DRM FD any new operations based on ptrace is attached or not. We
> > > > > don't ever do that check even.
> > > > > 
> > > > > We only restrict the opening of the debugger connection to given PID with
> > > > > ptrace_may_access check for now. That can be changed to something else,
> > > > > if necessary.
> > > > Yeah I think unnecessarily tying gpu processes to cpu processes is a bad
> > > > thing, least because even today all the svm discussions we have still hit
> > > > clear use-cases, where a 1:1 match is not wanted (like multiple gpu svm
> > > > sections with offsets). Not even speaking of all the gpu usecases where
> > > > the gpu vm space is still entirely independent of the cpu side.
> > > > 
> > > > So that's why I think this entirely separate approach looks like the right
> > > > one, with ptrace_may_access as the access control check to make sure we
> > > > match ptrace on the cpu side.
> > > > 
> > > > But there's very obviously a bikeshed to be had on what the actual uapi
> > > > should look like, especially how gdb opens up a gpu debug access fd. But I
> > > > also think that's not much on drm to decide, but whatever gdb wants. And
> > > > then we aim for some consistency on that lookup/access control part
> > > > (ideally, I might be missing some reasons why this is a bad idea) across
> > > > drm drivers.
> > > > 
> > > > > > So you need to have a really really good explanation why the eudebug interface
> > > > > > is actually necessary.
> > > > > TL;DR The main point is to decouple the debugging of the EU workloads from the
> > > > > debugging of the CPU process. This avoids the interference with the CPU process with
> > > > > parasitic thread injection. Further this also allows generating a core dump
> > > > > without any GDB connected. There are also many other smaller pros/cons
> > > > > which can be discussed but for the context of this patch, this is the
> > > > > main one.
> > > > > 
> > > > > So unlike parasitic thread injection, we don't unlock any special IOCTL for
> > > > > the process under debug to be performed by the parasitic thread, but we
> > > > > allow the minimal set of operations to be performed by GDB as if those were
> > > > > done on the EUs themselves.
> > > > > 
> > > > > One can think of it like the minimal subset of ptrace but for EU threads,
> > > > > not the CPU threads. And thus, building on this it's possible to extend
> > > > > the core kernel generated core dumps with DRM specific extension which
> > > > > would contain the EU thread/memory dump.
> > > > It might be good to document (in that debugging doc patch probably) why
> > > > thread injection is not a great option, and why the tradeoffs for
> > > > debugging are different than for checkpoint/restore, where with CRIU
> > > > we landed on doing most of this in userspace, and often requiring
> > > > injection threads to make it all work.
> > > > 
> > > > Cheers, Sima
> > > > 
> > > > > Regards, Joonas
> > > > > 
> > > > > > Regards,
> > > > > > Christian.
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > >       Matt
> > > > > > 
> > > > > >       [3] https://patchwork.freedesktop.org/patch/622520/?series=140200&rev=6
> > > > > > 
> > > > > > 
> > > > > >           Regards,
> > > > > >           Christian.
> > > > > > 
> > > > > > 
> > > > > >               Matt
> > > > > > 
> > > > > > 
> > > > > >                   Regards,
> > > > > >                   Christian.
> > > > > > 

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 2/8] drm/ttm: Add ttm_bo_access
  2024-11-11 14:00                               ` Joonas Lahtinen
  2024-11-11 15:54                                 ` Christian König
@ 2024-11-12  8:28                                 ` Simona Vetter
  2024-11-12  8:58                                   ` Christian König
  1 sibling, 1 reply; 56+ messages in thread
From: Simona Vetter @ 2024-11-12  8:28 UTC (permalink / raw)
  To: Joonas Lahtinen
  Cc: Christian König, Simona Vetter, Matthew Brost,
	Christian König, Rodrigo Vivi, Huang Rui, intel-xe,
	dri-devel, matthew.auld, David Airlie, Simona Vetter

On Mon, Nov 11, 2024 at 04:00:02PM +0200, Joonas Lahtinen wrote:
> Quoting Christian König (2024-11-11 13:34:12)
> > Am 11.11.24 um 11:10 schrieb Simona Vetter:
> > > On Mon, Nov 11, 2024 at 10:00:17AM +0200, Joonas Lahtinen wrote:
> > >> Back from some time off and will try to answer below.
> > >>
> > >> Adding Dave and Sima as this topic has been previously discussed to some
> > >> extent and will be good to reach common understanding about what the
> > >> series is trying to do and what is the difference to the AMD debugging
> > >> model.
> > > I chatted about this thread a bit on irc with folks, and I think an
> > > orthogonal issue is the question, what should be in ttm-utils? I've asked
> > > Matt to type up a DOC patch once we have some consensus, since imo the
> > > somewhat lackluster documentation situation for ttm is also somewhat a
> > > cause for these big threads on various different topics. Aside from the
> > > fact that gpu memory management is just hard.
> > >
> > > On the uapi/design aspect, I think this would serve well with a patch to
> > > drm-uapi.rst that adds a debugging section? At least once we have some
> > > rough consensus across drivers, and more importantly userspace in the form
> > > of gdb upstream (at least I'm not aware of any other upstream debugger
> > > patches, I think amd's rocm stuff is also gdb-only).
> > 
> > Yeah that seems to be a really good idea. Similar design ideas came up 
> > > AMD internally as well but were dropped after pointing people to
> > pidfd_getfd().

Maybe I'm not awake enough yet, but how does pidfd_getfd() sort out
debugger uapi fun?

> > > But the bigger problem seems to be that the design doesn't seem to take
> > the dma_fence requirements into account.
> 
> Where would you deduce that?
> 
> We specifically limit the debugging to Long Running contexts which don't
> depend on dma_fences.
> 
> > In other words attaching gdb to a pid seems to stop the GPU thread of 
> > this pid without waiting for the XE preemption nor end of operation fence.
> > 
> > I mean if the GPU threads are preempted that could work, but yeah not 
> > like this :)
> 
> For us, hitting a breakpoint inside the workload would always violate
> any dma_fence timeout for the submitted workload, as the HW context can't
> be switched out while in the breakpoint.
> 
> > For any dma_fence workload the guarantee is that it completes
> within reasonable time after submission (guaranteed by the submitter). I
> don't see how you could really allow interactive debugging of a
> breakpoint under those restrictions anyway even if pre-emption was
> supported as the workload would not finish in <10 seconds?

It de facto amounts to being able to kill a gpu process (if your debugger
is stuck for too long), which is random because of memory management
dependencies that could happen anywhere in userspace execution. So
definitely not something we should enable by default, at most it's tech
preview level of robust.

But as long as the tdr is there and still works even if a debugger session
is attached I don't see a fundamental issue. But should document some uapi
expectations for sure in this area.

> For i915 we did have the "pre-emptable but indefinitely long dma_fence workloads"
> concept at one point and that was rejected after the lengthy discussion.
> 
> So I think only way to allow interactive debugging is to avoid the
> dma_fences. Curious to hear if there are ideas for otherwise.

Yeah, if gpu debugging holds up preemption then no dma_fence is the only
way out. Which means allowing gdb requires that the gpu context uses hw
page faults for everything, so that we can still nuke away memory from
underneath it.

It probably also means you need exclusive access to the gpu, if that mode
holds up other workloads. So that's maybe another access rights question
the uapi doc patch needs to sort out.

I think finally we might want to have some really tainting debug module
option that lifts some of the restrictions, for playing around or for
people who know what they're doing, as in, they're ok with their
application under debugging occasionally just dying in tdr because of
timeouts.

Cheers, Sima

> Regards, Joonas
> 
> > 
> > Regards,
> > Christian.
> > 
> > >
> > > Some wash-up thoughts from me below, but consider them fairly irrelevant
> > > since I think the main driver for these big questions here should be
> > > gdb/userspace.
> > >
> > >> Quoting Christian König (2024-11-07 11:44:33)
> > >>> Am 06.11.24 um 18:00 schrieb Matthew Brost:
> > >>>
> > >>>      [SNIP]
> > >>>
> > >>>      This is not a generic interface that anyone can freely access. The same
> > >>>      permissions used by ptrace are checked when opening such an interface.
> > >>>      See [1] [2].
> > >>>
> > >>>      [1] https://patchwork.freedesktop.org/patch/617470/?series=136572&rev=2
> > >>>      [2] https://patchwork.freedesktop.org/patch/617471/?series=136572&rev=2
> > >>>
> > >>>
> > >>> Thanks a lot for those pointers, that is exactly what I was looking for.
> > >>>
> > >>> And yeah, it is what I feared. You are re-implementing existing functionality,
> > >>> but see below.
> > >> Could you elaborate on what this "existing functionality" exactly is?
> > >> I do not think this functionality exists at this time.
> > >>
> > >> The EU debugging architecture for Xe specifically avoids the need for GDB
> > >> to attach with ptrace to the CPU process or interfere with the CPU process for
> > >> the debugging via parasitic threads or so.
> > >>
> > >> Debugger connection is opened to the DRM driver for given PID (which uses the
> > >> ptrace may access check for now) after which the all DRM client of that
> > >> PID are exposed to the debugger process.
> > >>
> > >> What we want to expose via that debugger connection is the ability for GDB to
> > >> read/write the different GPU VM address spaces (ppGTT for Intel GPUs) just like
> > >> the EU threads would see them. Note that the layout of the ppGTT is
> > >> completely up to the userspace driver to setup and is mostly only partially
> > >> equal to the CPU address space.
> > >>
> > >> Specifically as part of reading/writing the ppGTT for debugging purposes,
> > >> there are deep flushes needed: for example flushing instruction cache
> > >> when adding/removing breakpoints.
> > >>
> > >> Maybe that will explain the background. I elaborate on this at the end some more.
> > >>
> > >>>              kmap/vmap are used everywhere in the DRM subsystem to access BOs, so I’m
> > >>>              failing to see the problem with adding a simple helper based on existing
> > >>>              code.
> > >>>
> > >>>          What#s possible and often done is to do kmap/vmap if you need to implement a
> > >>>          CPU copy for scanout for example or for copying/validating command buffers.
> > >>>          But that usually requires accessing the whole BO and has separate security
> > >>>          checks.
> > >>>
> > >>>          When you want to access only a few bytes of a BO that sounds massively like
> > >>>          a peek/poke like interface and we have already rejected that more than once.
> > >>>          There even used to be standardized GEM IOCTLs for that which have been
> > >>>          removed by now.
> > >> Referring to the explanation at top: These IOCTL are not for the debugging target
> > >> process to issue. The peek/poke interface is specifically for GDB only
> > >> to facilitate the emulation of memory reads/writes on the GPU address
> > >> space as they were done by EUs themselves. And to recap: for modifying
> > >> instructions for example (add/remove breakpoint), extra level of cache flushing is
> > >> needed which is not available to regular userspace.
> > >>
> > >> I specifically discussed with Sima on the difference before moving forward with this
> > >> design originally. If something has changed since then, I'm of course happy to rediscuss.
> > >>
> > >> However, if this code can't be added, not sure how we would ever be able
> > >> to implement core dumps for GPU threads/memory?
> > >>
> > >>>          If you need to access BOs which are placed in not CPU accessible memory then
> > >>>          implement the access callback for ptrace, see amdgpu_ttm_access_memory for
> > >>>          an example how to do this.
> > >> As also mentioned above, we don't work via ptrace at all when it comes
> > >> to debugging the EUs. The only thing used for now is the ptrace_may_access to
> > >> implement similar access restrictions as ptrace has. This can be changed
> > >> to something else if needed.
> > >>
> > >>>      Ptrace access via vm_operations_struct.access → ttm_bo_vm_access.
> > >>>
> > >>>      This series renames ttm_bo_vm_access to ttm_bo_access, with no code changes.
> > >>>
> > >>>      The above function accesses a BO via kmap if it is in SYSTEM / TT,
> > >>>      which is existing code.
> > >>>
> > >>>      This function is only exposed to user space via ptrace permissions.
> > >> Maybe this sentence is what caused the confusion.
> > >>
> > >> Userspace is never exposed with peek/poke interface, only the debugger
> > >> connection which is its own FD.
> > >>
> > >>>      In this series, we implement a function [3] similar to
> > >>>      amdgpu_ttm_access_memory for the TTM vfunc access_memory. What is
> > >>>      missing is non-visible CPU memory access, similar to
> > >>>      amdgpu_ttm_access_memory_sdma. This will be addressed in a follow-up and
> > >>>      was omitted in this series given its complexity.
> > >>>
> > >>>      So, this looks more or less identical to AMD's ptrace implementation,
> > >>>      but in GPU address space. Again, I fail to see what the problem is here.
> > >>>      What am I missing?
> > >>>
> > >>>
> > >>> The main question is why can't you use the existing interfaces directly?
> > >> We're not working on the CPU address space or BOs. We're working
> > >> strictly on the GPU address space as would be seen by an EU thread if it
> > >> accessed address X.
> > >>
> > >>> Additional to the peek/poke interface of ptrace Linux has the pidfd_getfd
> > >>> system call, see here https://man7.org/linux/man-pages/man2/pidfd_getfd.2.html.
> > >>>
> > >>> The pidfd_getfd() allows to dup() the render node file descriptor into your gdb
> > >>> process. That in turn gives you all the access you need from gdb, including
> > >>> mapping BOs and command submission on behalf of the application.
> > >> We're not operating on the CPU address space nor are we operating on BOs
> > >> (there is no concept of BO in the EU debug interface). Each VMA in the VM
> > >> could come from anywhere, only the start address and size matter. And
> > >> neither do we need to interfere with the command submission of the
> > >> process under debug.
> > >>
> > >>> As far as I can see that allows for the same functionality as the eudebug
> > >>> interface, just without any driver specific code messing with ptrace
> > >>> permissions and peek/poke interfaces.
> > >>>
> > >>> So the question is still why do you need the whole eudebug interface in the
> > >>> first place? I might be missing something, but that seems to be superfluous
> > >>> from a high level view.
> > >> Recapping from above. It is to allow the debugging of EU threads per DRM
> > >> client, completely independent of the CPU process. If ptrace_may_acces
> > >> is the sore point, we could consider other permission checks, too. There
> > >> is no other connection to ptrace in this architecture as single
> > >> permission check to know if PID is fair game to access by debugger
> > >> process.
> > >>
> > >> Why no parasitic thread or ptrace: Going forward, binding the EU debugging to
> > >> the DRM client would also pave way for being able to extend core kernel generated
> > >> core dump with each DRM client's EU thread/memory dump. We have similar
> > >> feature called "Offline core dump" enabled in the downstream public
> > >> trees for i915, where we currently attach the EU thread dump to i915 error state
> > >> and then later combine i915 error state with CPU core dump file with a
> > >> tool.
> > >>
> > >> This is relatively little amount of extra code, as this baseline series
> > >> already introduces GDB the ability to perform the necessary actions.
> > >> It's just the matter of kernel driver calling: "stop all threads", then
> > >> copying the memory map and memory contents for GPU threads, just like is
> > >> done for CPU threads.
> > >>
> > >> With parasitic thread injection, not sure if there is such way forward,
> > >> as it would seem to require to inject quite abit more logic to core kernel?
> > >>
> > >>> It's true that the AMD KFD part has still similar functionality, but that is
> > >>> because of the broken KFD design of tying driver state to the CPU process
> > >>> (which makes it inaccessible for gdb even with imported render node fd).
> > >>>
> > >>> Both Sima and I (and partially Dave as well) have pushed back on the KFD
> > >>> approach. And the long term plan is to get rid of such device driver specific
> > >>> interface which re-implement existing functionality just differently.
> > >> Recapping, this series is not adding it back. The debugger connection
> > >> is a separate FD from the DRM one, with separate IOCTL set. We don't allow
> > >> the DRM FD any new operations based on ptrace is attached or not. We
> > >> don't ever do that check even.
> > >>
> > >> We only restrict the opening of the debugger connection to given PID with
> > >> ptrace_may_access check for now. That can be changed to something else,
> > >> if necessary.
> > > Yeah I think unnecessarily tying gpu processes to cpu processes is a bad
> > > thing, least because even today all the svm discussions we have still hit
> > > clear use-cases, where a 1:1 match is not wanted (like multiple gpu svm
> > > sections with offsets). Not even speaking of all the gpu usecases where
> > > the gpu vm space is still entirely independent of the cpu side.
> > >
> > > So that's why I think this entirely separate approach looks like the right
> > > one, with ptrace_may_access as the access control check to make sure we
> > > match ptrace on the cpu side.
> > >
> > > But there's very obviously a bikeshed to be had on what the actual uapi
> > > should look like, especially how gdb opens up a gpu debug access fd. But I
> > > also think that's not much on drm to decide, but whatever gdb wants. And
> > > then we aim for some consistency on that lookup/access control part
> > > (ideally, I might be missing some reasons why this is a bad idea) across
> > > drm drivers.
> > >
> > >>> So you need to have a really really good explanation why the eudebug interface
> > >>> is actually necessary.
> > >> TL;DR The main point is to decouple the debugging of the EU workloads from the
> > >> debugging of the CPU process. This avoids the interference with the CPU process with
> > >> parasitic thread injection. Further this also allows generating a core dump
> > >> without any GDB connected. There are also many other smaller pros/cons
> > >> which can be discussed but for the context of this patch, this is the
> > >> main one.
> > >>
> > >> So unlike parasitic thread injection, we don't unlock any special IOCTL for
> > >> the process under debug to be performed by the parasitic thread, but we
> > >> allow the minimal set of operations to be performed by GDB as if those were
> > >> done on the EUs themselves.
> > >>
> > >> One can think of it like the minimal subset of ptrace but for EU threads,
> > >> not the CPU threads. And thus, building on this it's possible to extend
> > >> the core kernel generated core dumps with DRM specific extension which
> > >> would contain the EU thread/memory dump.
> > > It might be good to document (in that debugging doc patch probably) why
> > > thread injection is not a great option, and why the tradeoffs for
> > > debugging are different than for for checkpoint/restore, where with CRIU
> > > we landed on doing most of this in userspace, and often requiring
> > > injection threads to make it all work.
> > >
> > > Cheers, Sima
> > >
> > >> Regards, Joonas
> > >>
> > >>> Regards,
> > >>> Christian.
> > >>>
> > >>>
> > >>>
> > >>>      Matt
> > >>>
> > >>>      [3] https://patchwork.freedesktop.org/patch/622520/?series=140200&rev=6
> > >>>
> > >>>
> > >>>          Regards,
> > >>>          Christian.
> > >>>
> > >>>
> > >>>              Matt
> > >>>
> > >>>
> > >>>                  Regards,
> > >>>                  Christian.
> > >>>
> >

-- 
Simona Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


* Re: [PATCH v6 2/8] drm/ttm: Add ttm_bo_access
  2024-11-12  8:28                                 ` Simona Vetter
@ 2024-11-12  8:58                                   ` Christian König
  2024-11-12 13:30                                     ` Joonas Lahtinen
  0 siblings, 1 reply; 56+ messages in thread
From: Christian König @ 2024-11-12  8:58 UTC (permalink / raw)
  To: Simona Vetter, Joonas Lahtinen
  Cc: Christian König, Matthew Brost, Rodrigo Vivi, Huang Rui,
	intel-xe, dri-devel, matthew.auld, David Airlie, Simona Vetter

Am 12.11.24 um 09:28 schrieb Simona Vetter:
> On Mon, Nov 11, 2024 at 04:00:02PM +0200, Joonas Lahtinen wrote:
>> Quoting Christian König (2024-11-11 13:34:12)
>>> Am 11.11.24 um 11:10 schrieb Simona Vetter:
>>>> On Mon, Nov 11, 2024 at 10:00:17AM +0200, Joonas Lahtinen wrote:
>>>>> Back from some time off and will try to answer below.
>>>>>
>>>>> Adding Dave and Sima as this topic has been previously discussed to some
>>>>> extent and will be good to reach common understanding about what the
>>>>> series is trying to do and what is the difference to the AMD debugging
>>>>> model.
>>>> I chatted about this thread a bit on irc with folks, and I think an
>>>> orthogonal issue is the question, what should be in ttm-utils? I've asked
>>>> Matt to type up a DOC patch once we have some consensus, since imo the
>>>> somewhat lackluster documentation situation for ttm is also somewhat a
>>>> cause for these big threads on various different topics. Aside from the
>>>> fact that gpu memory management is just hard.
>>>>
>>>> On the uapi/design aspect, I think this would serve well with a patch to
>>>> drm-uapi.rst that adds a debugging section? At least once we have some
>>>> rough consensus across drivers, and more importantly userspace in the form
>>>> of gdb upstream (at least I'm not aware of any other upstream debugger
>>>> patches, I think amd's rocm stuff is also gdb-only).
>>> Yeah that seems to be a really good idea. Similar design ideas came up
>>> AMD internally as well but where dropped after pointing people to
>>> pidfd_getfd().
> Maybe not yet awake enough yet, but how does pidfd_getfd() sort out
> debugger uapi fun?

It doesn't sort them out, but it is a good helper to have in the toolbox.

The key point is that it allows a debugger not only to suspend the CPU
threads, peek/poke into the address space, etc., but also to interact with
the kernel device driver in the same way as the debugged application would.

So you can, for example, mmap() KMS handles to inspect scanned-out
images, or do command submission in the context of the debugged
application.

At least for us that made it unnecessary to work with a parasitic thread
injected into the debugged application, i.e. it avoided the need for the
debugger to run code in the context of the debugged application.

You still need to define the approach, UAPI, etc., but you don't have to
worry about access restrictions any more because that is already checked
by pidfd_getfd(), and you can implement your debugging UAPI just as normal
driver IOCTLs on the DRM render node.
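
For illustration only, here is a minimal userspace sketch of the
pidfd_open() + pidfd_getfd() sequence described above. It is not taken
from any driver or debugger; the helper name steal_fd() is hypothetical,
and how the debugger learns the target's fd number (for example by
reading /proc/<pid>/fd) is left out. The raw syscalls are used because
older libc versions have no wrappers; the fallback numbers are the
generic ones used on x86-64 and arm64.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Fallback numbers for old libc headers (generic values, x86-64/arm64). */
#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434
#endif
#ifndef SYS_pidfd_getfd
#define SYS_pidfd_getfd 438
#endif

/*
 * Hypothetical helper: duplicate file descriptor `target_fd` of process
 * `pid` into the calling process. The kernel applies a ptrace-style
 * access check inside pidfd_getfd(), so no extra permission logic is
 * needed here. Returns the new local fd, or -1 with errno set.
 */
int steal_fd(pid_t pid, int target_fd)
{
	int pidfd, fd, saved_errno;

	/* Get a stable handle on the target process (Linux >= 5.3). */
	pidfd = (int)syscall(SYS_pidfd_open, pid, 0);
	if (pidfd < 0)
		return -1;

	/* dup() the target's descriptor into our fd table (Linux >= 5.6). */
	fd = (int)syscall(SYS_pidfd_getfd, pidfd, target_fd, 0);

	/* Preserve the errno of pidfd_getfd() across close(). */
	saved_errno = errno;
	close(pidfd);
	errno = saved_errno;

	return fd;
}
```

A debugger would call something like steal_fd(target_pid, render_fd) and
then issue ioctl()/mmap() on the returned descriptor as if it were the
application's own render node fd.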

Regards,
Christian.

>
>>> But the bigger problem seems to be that the design doesn't seems to take
>>> the dma_fence requirements into account.
>> Where would you deduce that?
>>
>> We specifically limit the debugging to Long Running contexts which don't
>> depend on dma_fences.
>>
>>> In other words attaching gdb to a pid seems to stop the GPU thread of
>>> this pid without waiting for the XE preemption nor end of operation fence.
>>>
>>> I mean if the GPU threads are preempted that could work, but yeah not
>>> like this :)
>> For us, hitting a breakpoint inside the workload would always violate
>> any dma_fence timeout for the submitted workload, as the HW context can't
>> be switched out while in the breakpoint.
>>
>> For any dma_fence workload the guarantee is that that it completes
>> within reasonable time after submission (guaranteed by the submitter). I
>> don't see how you could really allow interactive debugging of a
>> breakpoint under those restrictions anyway even if pre-emption was
>> supported as the workload would not finish in <10 seconds?
> It defacto amounts to being able to kill a gpu process (if your debugger
> is stuck for too long), which is random because of memory management
> dependencies that could happen anywhere in userspace execution. So
> definitely not something we should enable by default, at most it's tech
> preview level or robust.
>
> But as long as the tdr is there and still works even if a debugger session
> is attached I don't see a fundamental issue. But should document some uapi
> expectations for sure in this area.
>
>> For i915 we did have the "pre-emptable but indefinitely long dma_fence workloads"
>> concept at one point and that was rejected after the lengthy discussion.
>>
>> So I think only way to allow interactive debugging is to avoid the
>> dma_fences. Curious to hear if there are ideas for otherwise.
> Yeah, if gpu debugging holds up preemption then no dma_fence is the only
> way out. Which means allowing gdb requires that the gpu context uses hw
> page faults for everything, so that we can still nuke away memory from
> underneath it.
>
> It probably also means you need exclusive access to the gpu, if that mode
> holds up other workloads. So that's maybe another access rights question
> the uapi doc patch needs to sort out.
>
> I think finally we might want to have some really tainting debug module
> option know that lifts some of the restrictions, for playing around or
> people who know what they're doing, as in, they're ok with their
> application under debugging occasionally just dying in tdr because of
> timeouts.
>
> Cheers, Sima
>
>> Regards, Joonas
>>
>>> Regards,
>>> Christian.
>>>
>>>> Some wash-up thoughts from me below, but consider them fairly irrelevant
>>>> since I think the main driver for these big questions here should be
>>>> gdb/userspace.
>>>>
>>>>> Quoting Christian König (2024-11-07 11:44:33)
>>>>>> Am 06.11.24 um 18:00 schrieb Matthew Brost:
>>>>>>
>>>>>>       [SNIP]
>>>>>>
>>>>>>       This is not a generic interface that anyone can freely access. The same
>>>>>>       permissions used by ptrace are checked when opening such an interface.
>>>>>>       See [1] [2].
>>>>>>
>>>>>>       [1] https://patchwork.freedesktop.org/patch/617470/?series=136572&rev=2
>>>>>>       [2] https://patchwork.freedesktop.org/patch/617471/?series=136572&rev=2
>>>>>>
>>>>>>
>>>>>> Thanks a lot for those pointers, that is exactly what I was looking for.
>>>>>>
>>>>>> And yeah, it is what I feared. You are re-implementing existing functionality,
>>>>>> but see below.
>>>>> Could you elaborate on what this "existing functionality" exactly is?
>>>>> I do not think this functionality exists at this time.
>>>>>
>>>>> The EU debugging architecture for Xe specifically avoids the need for GDB
>>>>> to attach with ptrace to the CPU process or interfere with the CPU process for
>>>>> the debugging via parasitic threads or so.
>>>>>
>>>>> Debugger connection is opened to the DRM driver for given PID (which uses the
>>>>> ptrace may access check for now) after which the all DRM client of that
>>>>> PID are exposed to the debugger process.
>>>>>
>>>>> What we want to expose via that debugger connection is the ability for GDB to
>>>>> read/write the different GPU VM address spaces (ppGTT for Intel GPUs) just like
>>>>> the EU threads would see them. Note that the layout of the ppGTT is
>>>>> completely up to the userspace driver to setup and is mostly only partially
>>>>> equal to the CPU address space.
>>>>>
>>>>> Specifically as part of reading/writing the ppGTT for debugging purposes,
>>>>> there are deep flushes needed: for example flushing instruction cache
>>>>> when adding/removing breakpoints.
>>>>>
>>>>> Maybe that will explain the background. I elaborate on this at the end some more.
>>>>>
>>>>>>               kmap/vmap are used everywhere in the DRM subsystem to access BOs, so I’m
>>>>>>               failing to see the problem with adding a simple helper based on existing
>>>>>>               code.
>>>>>>
>>>>>>           What#s possible and often done is to do kmap/vmap if you need to implement a
>>>>>>           CPU copy for scanout for example or for copying/validating command buffers.
>>>>>>           But that usually requires accessing the whole BO and has separate security
>>>>>>           checks.
>>>>>>
>>>>>>           When you want to access only a few bytes of a BO that sounds massively like
>>>>>>           a peek/poke like interface and we have already rejected that more than once.
>>>>>>           There even used to be standardized GEM IOCTLs for that which have been
>>>>>>           removed by now.
>>>>> Referring to the explanation at top: These IOCTL are not for the debugging target
>>>>> process to issue. The peek/poke interface is specifically for GDB only
>>>>> to facilitate the emulation of memory reads/writes on the GPU address
>>>>> space as they were done by EUs themselves. And to recap: for modifying
>>>>> instructions for example (add/remove breakpoint), extra level of cache flushing is
>>>>> needed which is not available to regular userspace.
>>>>>
>>>>> I specifically discussed with Sima on the difference before moving forward with this
>>>>> design originally. If something has changed since then, I'm of course happy to rediscuss.
>>>>>
>>>>> However, if this code can't be added, not sure how we would ever be able
>>>>> to implement core dumps for GPU threads/memory?
>>>>>
>>>>>>           If you need to access BOs which are placed in not CPU accessible memory then
>>>>>>           implement the access callback for ptrace, see amdgpu_ttm_access_memory for
>>>>>>           an example how to do this.
>>>>> As also mentioned above, we don't work via ptrace at all when it comes
>>>>> to debugging the EUs. The only thing used for now is the ptrace_may_access to
>>>>> implement similar access restrictions as ptrace has. This can be changed
>>>>> to something else if needed.
>>>>>
>>>>>>       Ptrace access via vm_operations_struct.access → ttm_bo_vm_access.
>>>>>>
>>>>>>       This series renames ttm_bo_vm_access to ttm_bo_access, with no code changes.
>>>>>>
>>>>>>       The above function accesses a BO via kmap if it is in SYSTEM / TT,
>>>>>>       which is existing code.
>>>>>>
>>>>>>       This function is only exposed to user space via ptrace permissions.
>>>>> Maybe this sentence is what caused the confusion.
>>>>>
>>>>> Userspace is never exposed with peek/poke interface, only the debugger
>>>>> connection which is its own FD.
>>>>>
>>>>>>       In this series, we implement a function [3] similar to
>>>>>>       amdgpu_ttm_access_memory for the TTM vfunc access_memory. What is
>>>>>>       missing is non-visible CPU memory access, similar to
>>>>>>       amdgpu_ttm_access_memory_sdma. This will be addressed in a follow-up and
>>>>>>       was omitted in this series given its complexity.
>>>>>>
>>>>>>       So, this looks more or less identical to AMD's ptrace implementation,
>>>>>>       but in GPU address space. Again, I fail to see what the problem is here.
>>>>>>       What am I missing?
>>>>>>
>>>>>>
>>>>>> The main question is why can't you use the existing interfaces directly?
>>>>> We're not working on the CPU address space or BOs. We're working
>>>>> strictly on the GPU address space as would be seen by an EU thread if it
>>>>> accessed address X.
>>>>>
>>>>>> Additional to the peek/poke interface of ptrace Linux has the pidfd_getfd
>>>>>> system call, see here https://man7.org/linux/man-pages/man2/pidfd_getfd.2.html.
>>>>>>
>>>>>> The pidfd_getfd() allows to dup() the render node file descriptor into your gdb
>>>>>> process. That in turn gives you all the access you need from gdb, including
>>>>>> mapping BOs and command submission on behalf of the application.
>>>>> We're not operating on the CPU address space nor are we operating on BOs
>>>>> (there is no concept of BO in the EU debug interface). Each VMA in the VM
>>>>> could come from anywhere, only the start address and size matter. And
>>>>> neither do we need to interfere with the command submission of the
>>>>> process under debug.
>>>>>
>>>>>> As far as I can see that allows for the same functionality as the eudebug
>>>>>> interface, just without any driver specific code messing with ptrace
>>>>>> permissions and peek/poke interfaces.
>>>>>>
>>>>>> So the question is still why do you need the whole eudebug interface in the
>>>>>> first place? I might be missing something, but that seems to be superfluous
>>>>>> from a high level view.
>>>>> Recapping from above. It is to allow the debugging of EU threads per DRM
>>>>> client, completely independent of the CPU process. If ptrace_may_acces
>>>>> is the sore point, we could consider other permission checks, too. There
>>>>> is no other connection to ptrace in this architecture as single
>>>>> permission check to know if PID is fair game to access by debugger
>>>>> process.
>>>>>
>>>>> Why no parasitic thread or ptrace: Going forward, binding the EU debugging to
>>>>> the DRM client would also pave way for being able to extend core kernel generated
>>>>> core dump with each DRM client's EU thread/memory dump. We have similar
>>>>> feature called "Offline core dump" enabled in the downstream public
>>>>> trees for i915, where we currently attach the EU thread dump to i915 error state
>>>>> and then later combine i915 error state with CPU core dump file with a
>>>>> tool.
>>>>>
>>>>> This is relatively little amount of extra code, as this baseline series
>>>>> already introduces GDB the ability to perform the necessary actions.
>>>>> It's just the matter of kernel driver calling: "stop all threads", then
>>>>> copying the memory map and memory contents for GPU threads, just like is
>>>>> done for CPU threads.
>>>>>
>>>>> With parasitic thread injection, not sure if there is such way forward,
>>>>> as it would seem to require to inject quite abit more logic to core kernel?
>>>>>
>>>>>> It's true that the AMD KFD part has still similar functionality, but that is
>>>>>> because of the broken KFD design of tying driver state to the CPU process
>>>>>> (which makes it inaccessible for gdb even with imported render node fd).
>>>>>>
>>>>>> Both Sima and I (and partially Dave as well) have pushed back on the KFD
>>>>>> approach. And the long term plan is to get rid of such device driver specific
>>>>>> interface which re-implement existing functionality just differently.
>>>>> Recapping, this series is not adding it back. The debugger connection
>>>>> is a separate FD from the DRM one, with separate IOCTL set. We don't allow
>>>>> the DRM FD any new operations based on ptrace is attached or not. We
>>>>> don't ever do that check even.
>>>>>
>>>>> We only restrict the opening of the debugger connection to given PID with
>>>>> ptrace_may_access check for now. That can be changed to something else,
>>>>> if necessary.
>>>> Yeah I think unnecessarily tying gpu processes to cpu processes is a bad
>>>> thing, least because even today all the svm discussions we have still hit
>>>> clear use-cases, where a 1:1 match is not wanted (like multiple gpu svm
>>>> sections with offsets). Not even speaking of all the gpu usecases where
>>>> the gpu vm space is still entirely independent of the cpu side.
>>>>
>>>> So that's why I think this entirely separate approach looks like the right
>>>> one, with ptrace_may_access as the access control check to make sure we
>>>> match ptrace on the cpu side.
>>>>
>>>> But there's very obviously a bikeshed to be had on what the actual uapi
>>>> should look like, especially how gdb opens up a gpu debug access fd. But I
>>>> also think that's not much on drm to decide, but whatever gdb wants. And
>>>> then we aim for some consistency on that lookup/access control part
>>>> (ideally, I might be missing some reasons why this is a bad idea) across
>>>> drm drivers.
>>>>
>>>>>> So you need to have a really really good explanation why the eudebug interface
>>>>>> is actually necessary.
>>>>> TL;DR The main point is to decouple the debugging of the EU workloads from the
>>>>> debugging of the CPU process. This avoids the interference with the CPU process with
>>>>> parasitic thread injection. Further this also allows generating a core dump
>>>>> without any GDB connected. There are also many other smaller pros/cons
>>>>> which can be discussed but for the context of this patch, this is the
>>>>> main one.
>>>>>
>>>>> So unlike parasitic thread injection, we don't unlock any special IOCTL for
>>>>> the process under debug to be performed by the parasitic thread, but we
>>>>> allow the minimal set of operations to be performed by GDB as if those were
>>>>> done on the EUs themselves.
>>>>>
>>>>> One can think of it like the minimal subset of ptrace but for EU threads,
>>>>> not the CPU threads. And thus, building on this it's possible to extend
>>>>> the core kernel generated core dumps with DRM specific extension which
>>>>> would contain the EU thread/memory dump.
>>>> It might be good to document (in that debugging doc patch probably) why
>>>> thread injection is not a great option, and why the tradeoffs for
>>>> debugging are different than for for checkpoint/restore, where with CRIU
>>>> we landed on doing most of this in userspace, and often requiring
>>>> injection threads to make it all work.
>>>>
>>>> Cheers, Sima
>>>>
>>>>> Regards, Joonas
>>>>>
>>>>>> Regards,
>>>>>> Christian.
>>>>>>
>>>>>>
>>>>>>
>>>>>>       Matt
>>>>>>
>>>>>>       [3] https://patchwork.freedesktop.org/patch/622520/?series=140200&rev=6
>>>>>>
>>>>>>
>>>>>>           Regards,
>>>>>>           Christian.
>>>>>>
>>>>>>
>>>>>>               Matt
>>>>>>
>>>>>>
>>>>>>                   Regards,
>>>>>>                   Christian.
>>>>>>



* Re: [PATCH v6 2/8] drm/ttm: Add ttm_bo_access
  2024-11-11 22:45                                   ` Matthew Brost
@ 2024-11-12  9:23                                     ` Christian König
  2024-11-12 13:41                                       ` Joonas Lahtinen
  0 siblings, 1 reply; 56+ messages in thread
From: Christian König @ 2024-11-12  9:23 UTC (permalink / raw)
  To: Matthew Brost, Christian König
  Cc: Joonas Lahtinen, Simona Vetter, Rodrigo Vivi, Huang Rui, intel-xe,
	dri-devel, matthew.auld, David Airlie, Simona Vetter

[-- Attachment #1: Type: text/plain, Size: 17014 bytes --]

Am 11.11.24 um 23:45 schrieb Matthew Brost:
> [SNIP]
>>> So I think only way to allow interactive debugging is to avoid the
>>> dma_fences. Curious to hear if there are ideas for otherwise.
>> You need to guarantee somehow that the process is taken from the hardware so
>> that the preemption fence can signal.
>>
> Our preemption fences have this functionality.
>
> A preemption fence issues a suspend execution command to the firmware. The
> firmware, in turn, attempts to preempt the workload. If it doesn't respond
> within a specified period, it resets the hardware queue, sends a message to KMD,
> bans the software queue, and signals the preemption fence.
>
> We provide even more protection than that. If, for some reason, the firmware
> doesn't respond within a longer timeout period, the KMD performs a device reset,
> ban the offending software queue(s), and will signal the preemption fences.
>
> This flow remains the same whether a debugger is attached or, for example, a
> user submits a 10-minute non-preemptable workload. In either case, other
> processes are guaranteed to make forward progress.
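
As a reading aid, the escalation described in the quoted text above can
be modelled as a small pure function. This is only a sketch of the
described behaviour, not Xe or GuC code: every name below is
hypothetical, the timeout value is a free parameter, and the longer KMD
timeout is folded into a single "firmware responsive" flag.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical outcomes of a preemption request. */
enum preempt_outcome {
	PREEMPT_CLEAN,        /* workload yielded within the firmware timeout */
	PREEMPT_QUEUE_RESET,  /* firmware reset the hardware queue */
	PREEMPT_DEVICE_RESET, /* KMD reset the whole device */
};

struct preempt_result {
	enum preempt_outcome outcome;
	bool queue_banned;    /* offending software queue banned */
	bool fence_signaled;  /* preemption fence signaled */
};

/*
 * yield_ms:       time the workload needs to yield; UINT32_MAX models a
 *                 workload that never yields (e.g. stopped at a breakpoint)
 * fw_responsive:  whether the firmware answers before the longer KMD timeout
 * fw_timeout_ms:  firmware preemption timeout
 */
struct preempt_result model_preempt(uint32_t yield_ms, bool fw_responsive,
				    uint32_t fw_timeout_ms)
{
	struct preempt_result r = { PREEMPT_CLEAN, false, true };

	if (!fw_responsive) {
		/* Last line of defence: the KMD resets the device itself. */
		r.outcome = PREEMPT_DEVICE_RESET;
		r.queue_banned = true;
	} else if (yield_ms > fw_timeout_ms) {
		/* Firmware gives up on the workload and resets its queue. */
		r.outcome = PREEMPT_QUEUE_RESET;
		r.queue_banned = true;
	}

	/* In every branch the fence ends up signaled. */
	return r;
}
```

The point of the model is that fence_signaled is true on every path; what
varies is only whether the workload survives.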

Yeah, that is pretty much the same argument I have heard before, and
it turned out not to work.

> The example above illustrates the memory oversubscription case, where two
> processes are using 51% of the memory.

That isn't even necessary. We have seen applications dying just because
the core memory management tried to join small pages back into huge
pages in a userptr.

The core memory management can jump in at any time and request that the
pre-emption fence signals.

You can mitigate that a bit, Fedora for example disables joining back 
small pages into huge pages by default for example and we even had 
people suggesting to use mprotect() so that userptrs VMAs don't fork() 
any more (which is of course completely illegal).

But my long term take away is that you can't block all causes of sudden 
requests to let a pre-emption fence signal.

> Another preemption scenario involves two processes sharing hardware resources.
> Our firmware follows the same flow here. If an LR workload is using a hardware
> resource and a DMA-fence workload is waiting, and if the LR workload doesn't
> preempt in a timely manner, the firmware issues a hardware reset, notifies
> KMD, and bans the LR software queue. The DMA-fence workload then can make
> forward progress.
>
> With the above in mind, this is why I say that if a user tries to run a game and
> a non-preemptable LR workload, either oversubscribing memory or sharing hardware
> resources, it is unlikely to work well. However, I don't think this is a common
> use case. I would expect that when a debugger is open, it is typically by a
> power user who knows how to disable other GPU tasks (e.g., by enabling software
> rendering or using a machine without any display).
>
> Given this, please reconsider your position.

The key point here is that this isn't stable. You can do that as a tech
demo, but it can always happen that an application being debugged just
randomly dies. And believe me, AMD has tried this to a rather extreme
extent as well.

What could potentially work is to taint the kernel and make sure that
this function is only available to users who absolutely know what they
are doing.

But I would say we can only allow that if all other options have been
exhausted and doing it like this is really the only option left.

Regards,
Christian.

>> This means that a breakpoint or core dump doesn't halt GPU threads, but
>> rather suspends them. E.g. all running wave data is collected into a state
>> bag which can be restored later on.
>>
>> I was under the impression that those long running compute threads do
>> exactly that, but when the hardware can't switch out the GPU thread/process
>> while in a break then that isn't the case.
>>
>> As long as you don't find a way to avoid that this patch set is a pretty
>> clear NAK from my side as DMA-buf and TTM maintainer.
>>
> I believe this is addressed above.
>
> Matt
>
>> What might work is to keep the submission on the hardware in the break state
>> but forbid any memory access. This way you can signal your preemption fence
>> even when the hardware isn't made available.
>>
>> Before you continue XE setups a new pre-emption fence and makes sure that
>> all page tables etc... are up to date.
>>
>> Could be tricky to get this right if completion fence based submissions are
>> mixed in as well, but that gives you at least a direction you could
>> potentially go.
>>
>> Regards,
>> Christian.
>>
>>> Regards, Joonas
>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>>> Some wash-up thoughts from me below, but consider them fairly irrelevant
>>>>> since I think the main driver for these big questions here should be
>>>>> gdb/userspace.
>>>>>
>>>>>> Quoting Christian König (2024-11-07 11:44:33)
>>>>>>> Am 06.11.24 um 18:00 schrieb Matthew Brost:
>>>>>>>
>>>>>>>        [SNIP]
>>>>>>>
>>>>>>>        This is not a generic interface that anyone can freely access. The same
>>>>>>>        permissions used by ptrace are checked when opening such an interface.
>>>>>>>        See [1] [2].
>>>>>>>
>>>>>>>        [1] https://patchwork.freedesktop.org/patch/617470/?series=136572&rev=2
>>>>>>>        [2] https://patchwork.freedesktop.org/patch/617471/?series=136572&rev=2
>>>>>>>
>>>>>>>
>>>>>>> Thanks a lot for those pointers, that is exactly what I was looking for.
>>>>>>>
>>>>>>> And yeah, it is what I feared. You are re-implementing existing functionality,
>>>>>>> but see below.
>>>>>> Could you elaborate on what this "existing functionality" exactly is?
>>>>>> I do not think this functionality exists at this time.
>>>>>>
>>>>>> The EU debugging architecture for Xe specifically avoids the need for GDB
>>>>>> to attach with ptrace to the CPU process or interfere with the CPU process for
>>>>>> the debugging via parasitic threads or so.
>>>>>>
>>>>>> Debugger connection is opened to the DRM driver for a given PID (which uses the
>>>>>> ptrace_may_access check for now), after which all DRM clients of that
>>>>>> PID are exposed to the debugger process.
>>>>>>
>>>>>> What we want to expose via that debugger connection is the ability for GDB to
>>>>>> read/write the different GPU VM address spaces (ppGTT for Intel GPUs) just like
>>>>>> the EU threads would see them. Note that the layout of the ppGTT is
>>>>>> completely up to the userspace driver to setup and is mostly only partially
>>>>>> equal to the CPU address space.
>>>>>>
>>>>>> Specifically as part of reading/writing the ppGTT for debugging purposes,
>>>>>> there are deep flushes needed: for example flushing instruction cache
>>>>>> when adding/removing breakpoints.
>>>>>>
>>>>>> Maybe that will explain the background. I elaborate on this at the end some more.
>>>>>>
>>>>>>>                kmap/vmap are used everywhere in the DRM subsystem to access BOs, so I’m
>>>>>>>                failing to see the problem with adding a simple helper based on existing
>>>>>>>                code.
>>>>>>>
>>>>>>>            What's possible and often done is to do kmap/vmap if you need to implement a
>>>>>>>            CPU copy for scanout for example or for copying/validating command buffers.
>>>>>>>            But that usually requires accessing the whole BO and has separate security
>>>>>>>            checks.
>>>>>>>
>>>>>>>            When you want to access only a few bytes of a BO that sounds massively like
>>>>>>>            a peek/poke like interface and we have already rejected that more than once.
>>>>>>>            There even used to be standardized GEM IOCTLs for that which have been
>>>>>>>            removed by now.
>>>>>> Referring to the explanation at top: These IOCTL are not for the debugging target
>>>>>> process to issue. The peek/poke interface is specifically for GDB only
>>>>>> to facilitate the emulation of memory reads/writes on the GPU address
>>>>>> space as they were done by EUs themselves. And to recap: for modifying
>>>>>> instructions for example (add/remove breakpoint), extra level of cache flushing is
>>>>>> needed which is not available to regular userspace.
>>>>>>
>>>>>> I specifically discussed with Sima on the difference before moving forward with this
>>>>>> design originally. If something has changed since then, I'm of course happy to rediscuss.
>>>>>>
>>>>>> However, if this code can't be added, not sure how we would ever be able
>>>>>> to implement core dumps for GPU threads/memory?
>>>>>>
>>>>>>>            If you need to access BOs which are placed in not CPU accessible memory then
>>>>>>>            implement the access callback for ptrace, see amdgpu_ttm_access_memory for
>>>>>>>            an example how to do this.
>>>>>> As also mentioned above, we don't work via ptrace at all when it comes
>>>>>> to debugging the EUs. The only thing used for now is the ptrace_may_access to
>>>>>> implement similar access restrictions as ptrace has. This can be changed
>>>>>> to something else if needed.
>>>>>>
>>>>>>>        Ptrace access via vm_operations_struct.access → ttm_bo_vm_access.
>>>>>>>
>>>>>>>        This series renames ttm_bo_vm_access to ttm_bo_access, with no code changes.
>>>>>>>
>>>>>>>        The above function accesses a BO via kmap if it is in SYSTEM / TT,
>>>>>>>        which is existing code.
>>>>>>>
>>>>>>>        This function is only exposed to user space via ptrace permissions.
>>>>>> Maybe this sentence is what caused the confusion.
>>>>>>
>>>>>> Userspace is never exposed with peek/poke interface, only the debugger
>>>>>> connection which is its own FD.
>>>>>>
>>>>>>>        In this series, we implement a function [3] similar to
>>>>>>>        amdgpu_ttm_access_memory for the TTM vfunc access_memory. What is
>>>>>>>        missing is non-visible CPU memory access, similar to
>>>>>>>        amdgpu_ttm_access_memory_sdma. This will be addressed in a follow-up and
>>>>>>>        was omitted in this series given its complexity.
>>>>>>>
>>>>>>>        So, this looks more or less identical to AMD's ptrace implementation,
>>>>>>>        but in GPU address space. Again, I fail to see what the problem is here.
>>>>>>>        What am I missing?
>>>>>>>
>>>>>>>
>>>>>>> The main question is why can't you use the existing interfaces directly?
>>>>>> We're not working on the CPU address space or BOs. We're working
>>>>>> strictly on the GPU address space as would be seen by an EU thread if it
>>>>>> accessed address X.
>>>>>>
>>>>>>> Additional to the peek/poke interface of ptrace Linux has the pidfd_getfd
>>>>>>> system call, see here https://man7.org/linux/man-pages/man2/pidfd_getfd.2.html.
>>>>>>>
>>>>>>> The pidfd_getfd() allows to dup() the render node file descriptor into your gdb
>>>>>>> process. That in turn gives you all the access you need from gdb, including
>>>>>>> mapping BOs and command submission on behalf of the application.
>>>>>> We're not operating on the CPU address space nor are we operating on BOs
>>>>>> (there is no concept of BO in the EU debug interface). Each VMA in the VM
>>>>>> could come from anywhere, only the start address and size matter. And
>>>>>> neither do we need to interfere with the command submission of the
>>>>>> process under debug.
>>>>>>
>>>>>>> As far as I can see that allows for the same functionality as the eudebug
>>>>>>> interface, just without any driver specific code messing with ptrace
>>>>>>> permissions and peek/poke interfaces.
>>>>>>>
>>>>>>> So the question is still why do you need the whole eudebug interface in the
>>>>>>> first place? I might be missing something, but that seems to be superfluous
>>>>>>> from a high level view.
>>>>>> Recapping from above. It is to allow the debugging of EU threads per DRM
>>>>>> client, completely independent of the CPU process. If ptrace_may_access
>>>>>> is the sore point, we could consider other permission checks, too. There
>>>>>> is no other connection to ptrace in this architecture as single
>>>>>> permission check to know if PID is fair game to access by debugger
>>>>>> process.
>>>>>>
>>>>>> Why no parasitic thread or ptrace: Going forward, binding the EU debugging to
>>>>>> the DRM client would also pave way for being able to extend core kernel generated
>>>>>> core dump with each DRM client's EU thread/memory dump. We have similar
>>>>>> feature called "Offline core dump" enabled in the downstream public
>>>>>> trees for i915, where we currently attach the EU thread dump to i915 error state
>>>>>> and then later combine i915 error state with CPU core dump file with a
>>>>>> tool.
>>>>>>
>>>>>> This is relatively little amount of extra code, as this baseline series
>>>>>> already introduces GDB the ability to perform the necessary actions.
>>>>>> It's just the matter of kernel driver calling: "stop all threads", then
>>>>>> copying the memory map and memory contents for GPU threads, just like is
>>>>>> done for CPU threads.
>>>>>>
>>>>>> With parasitic thread injection, not sure if there is such way forward,
>>>>>> as it would seem to require injecting quite a bit more logic into core kernel?
>>>>>>
>>>>>>> It's true that the AMD KFD part has still similar functionality, but that is
>>>>>>> because of the broken KFD design of tying driver state to the CPU process
>>>>>>> (which makes it inaccessible for gdb even with imported render node fd).
>>>>>>>
>>>>>>> Both Sima and I (and partially Dave as well) have pushed back on the KFD
>>>>>>> approach. And the long term plan is to get rid of such device driver specific
>>>>>>> interface which re-implement existing functionality just differently.
>>>>>> Recapping, this series is not adding it back. The debugger connection
>>>>>> is a separate FD from the DRM one, with separate IOCTL set. We don't allow
>>>>>> the DRM FD any new operations based on ptrace is attached or not. We
>>>>>> don't ever do that check even.
>>>>>>
>>>>>> We only restrict the opening of the debugger connection to given PID with
>>>>>> ptrace_may_access check for now. That can be changed to something else,
>>>>>> if necessary.
>>>>> Yeah I think unnecessarily tying gpu processes to cpu processes is a bad
>>>>> thing, least because even today all the svm discussions we have still hit
>>>>> clear use-cases, where a 1:1 match is not wanted (like multiple gpu svm
>>>>> sections with offsets). Not even speaking of all the gpu usecases where
>>>>> the gpu vm space is still entirely independent of the cpu side.
>>>>>
>>>>> So that's why I think this entirely separate approach looks like the right
>>>>> one, with ptrace_may_access as the access control check to make sure we
>>>>> match ptrace on the cpu side.
>>>>>
>>>>> But there's very obviously a bikeshed to be had on what the actual uapi
>>>>> should look like, especially how gdb opens up a gpu debug access fd. But I
>>>>> also think that's not much on drm to decide, but whatever gdb wants. And
>>>>> then we aim for some consistency on that lookup/access control part
>>>>> (ideally, I might be missing some reasons why this is a bad idea) across
>>>>> drm drivers.
>>>>>
>>>>>>> So you need to have a really really good explanation why the eudebug interface
>>>>>>> is actually necessary.
>>>>>> TL;DR The main point is to decouple the debugging of the EU workloads from the
>>>>>> debugging of the CPU process. This avoids the interference with the CPU process with
>>>>>> parasitic thread injection. Further this also allows generating a core dump
>>>>>> without any GDB connected. There are also many other smaller pros/cons
>>>>>> which can be discussed but for the context of this patch, this is the
>>>>>> main one.
>>>>>>
>>>>>> So unlike parasitic thread injection, we don't unlock any special IOCTL for
>>>>>> the process under debug to be performed by the parasitic thread, but we
>>>>>> allow the minimal set of operations to be performed by GDB as if those were
>>>>>> done on the EUs themselves.
>>>>>>
>>>>>> One can think of it like the minimal subset of ptrace but for EU threads,
>>>>>> not the CPU threads. And thus, building on this it's possible to extend
>>>>>> the core kernel generated core dumps with DRM specific extension which
>>>>>> would contain the EU thread/memory dump.
>>>>> It might be good to document (in that debugging doc patch probably) why
>>>>> thread injection is not a great option, and why the tradeoffs for
>>>>> debugging are different than for checkpoint/restore, where with CRIU
>>>>> we landed on doing most of this in userspace, and often requiring
>>>>> injection threads to make it all work.
>>>>>
>>>>> Cheers, Sima
>>>>>
>>>>>> Regards, Joonas
>>>>>>
>>>>>>> Regards,
>>>>>>> Christian.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>        Matt
>>>>>>>
>>>>>>>        [3] https://patchwork.freedesktop.org/patch/622520/?series=140200&rev=6
>>>>>>>
>>>>>>>
>>>>>>>            Regards,
>>>>>>>            Christian.
>>>>>>>
>>>>>>>
>>>>>>>                Matt
>>>>>>>
>>>>>>>
>>>>>>>                    Regards,
>>>>>>>                    Christian.
>>>>>>>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 2/8] drm/ttm: Add ttm_bo_access
  2024-11-12  8:58                                   ` Christian König
@ 2024-11-12 13:30                                     ` Joonas Lahtinen
  0 siblings, 0 replies; 56+ messages in thread
From: Joonas Lahtinen @ 2024-11-12 13:30 UTC (permalink / raw)
  To: Christian König, Simona Vetter
  Cc: Christian König, Matthew Brost, Rodrigo Vivi, Huang Rui,
	intel-xe, dri-devel, matthew.auld, David Airlie, Simona Vetter,
	Thomas Hellström, Francois Dugast

(+ Thomas and Francois related to the page-faults and scheduling)

Quoting Christian König (2024-11-12 10:58:18)
> Am 12.11.24 um 09:28 schrieb Simona Vetter:
> > On Mon, Nov 11, 2024 at 04:00:02PM +0200, Joonas Lahtinen wrote:
> >> Quoting Christian König (2024-11-11 13:34:12)
> >>> Am 11.11.24 um 11:10 schrieb Simona Vetter:
> >>>> On Mon, Nov 11, 2024 at 10:00:17AM +0200, Joonas Lahtinen wrote:
> >>>>> Back from some time off and will try to answer below.
> >>>>>
> >>>>> Adding Dave and Sima as this topic has been previously discussed to some
> >>>>> extent and will be good to reach common understanding about what the
> >>>>> series is trying to do and what is the difference to the AMD debugging
> >>>>> model.
> >>>> I chatted about this thread a bit on irc with folks, and I think an
> >>>> orthogonal issue is the question, what should be in ttm-utils? I've asked
> >>>> Matt to type up a DOC patch once we have some consensus, since imo the
> >>>> somewhat lackluster documentation situation for ttm is also somewhat a
> >>>> cause for these big threads on various different topics. Aside from the
> >>>> fact that gpu memory management is just hard.
> >>>>
> >>>> On the uapi/design aspect, I think this would serve well with a patch to
> >>>> drm-uapi.rst that adds a debugging section? At least once we have some
> >>>> rough consensus across drivers, and more importantly userspace in the form
> >>>> of gdb upstream (at least I'm not aware of any other upstream debugger
> >>>> patches, I think amd's rocm stuff is also gdb-only).
> >>> Yeah that seems to be a really good idea. Similar design ideas came up
> >>> internally at AMD as well but were dropped after pointing people to
> >>> pidfd_getfd().
> > Maybe not awake enough yet, but how does pidfd_getfd() sort out
> > debugger uapi fun?
> 
> It doesn't sort them out, but it is a good helper to have in the toolbox.

I didn't find any instance of pidfd_getfd in current GDB sources. Maybe
you are referring to downstream trees for existing usage you mentioned
in the other thread, do you have a pointer to such code?

> The key point is it allows a debugger to not only suspend the CPU 
> threads, peek/poke into the address space etc..., but also interact with
> the kernel device driver in the same way as the debugged application would.

Unless you fully stop all the CPU threads, then do the pidfd_getfd and
close the dupped FD before resuming any CPU thread, you are interfering
with the FD lifetime perceived by the CPU process. Aren't you?
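
For reference, a minimal userspace sketch of the pidfd_getfd() path under
discussion. It assumes a Linux 5.6+ kernel and that the caller has ptrace
attach permission over the target; dup_fd_from_pid() is a hypothetical
helper name, and the fallback syscall numbers are the generic ones, used
only if the libc headers do not define them.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434	/* generic syscall number, Linux 5.3+ */
#endif
#ifndef SYS_pidfd_getfd
#define SYS_pidfd_getfd 438	/* generic syscall number, Linux 5.6+ */
#endif

/*
 * Duplicate file descriptor target_fd of process pid into the caller.
 * Returns the new local fd, or -1 with errno set (for example EPERM when
 * the caller may not ptrace-attach to pid, ENOSYS on older kernels).
 */
int dup_fd_from_pid(pid_t pid, int target_fd)
{
	int pidfd, local_fd, saved_errno;

	assert(target_fd >= 0);

	pidfd = (int)syscall(SYS_pidfd_open, pid, 0);
	if (pidfd < 0)
		return -1;

	local_fd = (int)syscall(SYS_pidfd_getfd, pidfd, target_fd, 0);

	/* Keep the errno of pidfd_getfd() even if close() changes it. */
	saved_errno = errno;
	close(pidfd);
	errno = saved_errno;

	return local_fd;
}
```

A debugger would call dup_fd_from_pid(target_pid, render_node_fd), where
both arguments are placeholders it has to discover itself (for example from
/proc/<pid>/fd). The returned descriptor refers to the same open file
description as the one in the target, so the file stays open until the
debugger closes its copy, which is the lifetime interaction raised above.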

I don't think stopping CPU threads is desirable, as per my understanding
stop-all mode (or whatever is the right lingo inside GDB) is not always
good for catching bugs in massively parallel code. So the expectation
is that user may want to disable the stop-all mode so that even if you
would hit a breakpoint in one CPU/GPU thread, the rest of the CPU and GPU
threads would continue running unimpacted.

What you are suggesting seems to inherently always require stopping the CPU
threads to implement GPU debugging actions. Otherwise there is the fear
of interfering with the running CPU process? Stopping always is probably
fine for CRIU type of functionality, where the CPU threads are frozen, too.

My limited understanding of the subject is that the need is to expose
the GPU threads as their own set of threads/inferiors under GDB.
Controlling them should not interfere with the CPU threads. Injecting a
parasitic thread hidden from the user alleviates part of that, but brings
its own set of problems as you seem to agree.

However, for the ultimate details about why interfering with the CPU
threads is bad when controlling the GPU threads, we will probably have
to refer to the GDB experts.

This implementation follows the guideline of not interfering with the CPU
process as a primary design requirement coming from the GDB folks, so if
that needs to be discussed further, we need to pull in other folks and
mailing lists I think.

> So you can for example do things like mmap() KMS handles to inspect scanned
> out images, or do things like command submission in the context of the 
> debugged application etc....

That's understood, but we do not need any of that functionality at this point
and do not foresee needing it in the future.

> At least for us that made it unnecessary to work with a parasitic thread
> injected into the debugged application, e.g. it avoided the need for the 
> debugger to run code in the context of the debugged application.

How do you avoid interfering with the application logic if user wants to
have the CPU threads running while single-stepping through the GPU code,
or do you simply not allow that?

> You still need to define the approach, UAPI etc.. but you don't have to 
> worry about access restrictions any more because that is already checked
> by pidfd_getfd() and you can implement your debugging UAPI just as normal
> driver IOCTLs on the DRM render node.

Avoiding the access check is neat indeed. My thinking is we could incorporate
just that part and only allow opening the debug connection if the initiating
process provides an FD that maps to the same DRM client that is targeted
for debugging (acquired via pidfd_getfd or explicit sharing). That would
eliminate the need to export ptrace_may_access.

<SNIP>

> >>> But the bigger problem seems to be that the design doesn't seem to take
> >>> the dma_fence requirements into account.
> >> Where would you deduce that?
> >>
> >> We specifically limit the debugging to Long Running contexts which don't
> >> depend on dma_fences.
> >>
> >>> In other words attaching gdb to a pid seems to stop the GPU thread of
> >>> this pid without waiting for the XE preemption nor end of operation fence.
> >>>
> >>> I mean if the GPU threads are preempted that could work, but yeah not
> >>> like this :)
> >> For us, hitting a breakpoint inside the workload would always violate
> >> any dma_fence timeout for the submitted workload, as the HW context can't
> >> be switched out while in the breakpoint.
> >>
> >> For any dma_fence workload the guarantee is that it completes
> >> within reasonable time after submission (guaranteed by the submitter). I
> >> don't see how you could really allow interactive debugging of a
> >> breakpoint under those restrictions anyway even if pre-emption was
> >> supported as the workload would not finish in <10 seconds?
> > It de facto amounts to being able to kill a gpu process (if your debugger
> > is stuck for too long), which is random because of memory management
> > dependencies that could happen anywhere in userspace execution. So
> > definitely not something we should enable by default, at most it's tech
> > preview level or robust.

It all exists behind an explicit enable flag that is disabled by default,
because on current hardware we have this blocking problem and we also
have to limit running contexts to a single application, since the stop-all
command impacts all active EUs for now.

> > But as long as the tdr is there and still works even if a debugger session
> > is attached I don't see a fundamental issue. But should document some uapi
> > expectations for sure in this area.

I think the uAPI expectations apply equally to any long-running workloads?

> >> For i915 we did have the "pre-emptable but indefinitely long dma_fence workloads"
> >> concept at one point and that was rejected after the lengthy discussion.
> >>
> >> So I think only way to allow interactive debugging is to avoid the
> >> dma_fences. Curious to hear if there are ideas for otherwise.
> > Yeah, if gpu debugging holds up preemption then no dma_fence is the only
> > way out. Which means allowing gdb requires that the gpu context uses hw
> > page faults for everything, so that we can still nuke away memory from
> > underneath it.

Yeah, that has been under discussion. It's anyway highly desirable to use the
HW page faults when debugging to be able to catch NULL dereference and
similar bugs.

> > It probably also means you need exclusive access to the gpu, if that mode
> > holds up other workloads. So that's maybe another access rights question
> > the uapi doc patch needs to sort out.

We already require exclusive access because the stop-all threads command
impacts all running threads on current hardware. That and the lack of
pre-emption while on a breakpoint are the big reasons why it's behind a
sysfs flag that is disabled by default.

> > I think finally we might want to have some really tainting debug module
> > option knob that lifts some of the restrictions, for playing around or
> > people who know what they're doing, as in, they're ok with their
> > application under debugging occasionally just dying in tdr because of
> > timeouts.

That might be an option when the page-faults are not enabled. But once
they are enabled and thus we can satisfy the memory pre-empt fence, my
understanding is that we should be able to avoid dma_fence workloads
from being blocked by the LR workloads.

This would mean limiting the blocking to other LR workloads, and turning
the problem into a sysadmin decision about exclusive use of an engine,
rather than one about core memory management going belly up due to
dma-fence dependencies.

And again, the question arises equally whether we're talking about a
stuck-in-a-breakpoint workload or just a long-running thread group. So the
problem is not specific to EU debugging. Thomas and Francois can probably
elaborate more if needed.

Regards, Joonas

> >
> > Cheers, Sima
> >
> >> Regards, Joonas
> >>
> >>> Regards,
> >>> Christian.
> >>>
> >>>> Some wash-up thoughts from me below, but consider them fairly irrelevant
> >>>> since I think the main driver for these big questions here should be
> >>>> gdb/userspace.
> >>>>
> >>>>> Quoting Christian König (2024-11-07 11:44:33)
> >>>>>> Am 06.11.24 um 18:00 schrieb Matthew Brost:
> >>>>>>
> >>>>>>       [SNIP]
> >>>>>>
> >>>>>>       This is not a generic interface that anyone can freely access. The same
> >>>>>>       permissions used by ptrace are checked when opening such an interface.
> >>>>>>       See [1] [2].
> >>>>>>
> >>>>>>       [1] https://patchwork.freedesktop.org/patch/617470/?series=136572&rev=2
> >>>>>>       [2] https://patchwork.freedesktop.org/patch/617471/?series=136572&rev=2
> >>>>>>
> >>>>>>
> >>>>>> Thanks a lot for those pointers, that is exactly what I was looking for.
> >>>>>>
> >>>>>> And yeah, it is what I feared. You are re-implementing existing functionality,
> >>>>>> but see below.
> >>>>> Could you elaborate on what this "existing functionality" exactly is?
> >>>>> I do not think this functionality exists at this time.
> >>>>>
> >>>>> The EU debugging architecture for Xe specifically avoids the need for GDB
> >>>>> to attach with ptrace to the CPU process or interfere with the CPU process for
> >>>>> the debugging via parasitic threads or so.
> >>>>>
> >>>>> Debugger connection is opened to the DRM driver for a given PID (which uses the
> >>>>> ptrace_may_access check for now), after which all DRM clients of that
> >>>>> PID are exposed to the debugger process.
> >>>>>
> >>>>> What we want to expose via that debugger connection is the ability for GDB to
> >>>>> read/write the different GPU VM address spaces (ppGTT for Intel GPUs) just like
> >>>>> the EU threads would see them. Note that the layout of the ppGTT is
> >>>>> completely up to the userspace driver to setup and is mostly only partially
> >>>>> equal to the CPU address space.
> >>>>>
> >>>>> Specifically as part of reading/writing the ppGTT for debugging purposes,
> >>>>> there are deep flushes needed: for example flushing instruction cache
> >>>>> when adding/removing breakpoints.
> >>>>>
> >>>>> Maybe that will explain the background. I elaborate on this at the end some more.
> >>>>>
> >>>>>>               kmap/vmap are used everywhere in the DRM subsystem to access BOs, so I’m
> >>>>>>               failing to see the problem with adding a simple helper based on existing
> >>>>>>               code.
> >>>>>>
> >>>>>>           What's possible and often done is to do kmap/vmap if you need to implement a
> >>>>>>           CPU copy for scanout for example or for copying/validating command buffers.
> >>>>>>           But that usually requires accessing the whole BO and has separate security
> >>>>>>           checks.
> >>>>>>
> >>>>>>           When you want to access only a few bytes of a BO that sounds massively like
> >>>>>>           a peek/poke like interface and we have already rejected that more than once.
> >>>>>>           There even used to be standardized GEM IOCTLs for that which have been
> >>>>>>           removed by now.
> >>>>> Referring to the explanation at top: These IOCTL are not for the debugging target
> >>>>> process to issue. The peek/poke interface is specifically for GDB only
> >>>>> to facilitate the emulation of memory reads/writes on the GPU address
> >>>>> space as they were done by EUs themselves. And to recap: for modifying
> >>>>> instructions for example (add/remove breakpoint), extra level of cache flushing is
> >>>>> needed which is not available to regular userspace.
> >>>>>
> >>>>> I specifically discussed with Sima on the difference before moving forward with this
> >>>>> design originally. If something has changed since then, I'm of course happy to rediscuss.
> >>>>>
> >>>>> However, if this code can't be added, not sure how we would ever be able
> >>>>> to implement core dumps for GPU threads/memory?
> >>>>>
> >>>>>>           If you need to access BOs which are placed in not CPU accessible memory then
> >>>>>>           implement the access callback for ptrace, see amdgpu_ttm_access_memory for
> >>>>>>           an example how to do this.
> >>>>> As also mentioned above, we don't work via ptrace at all when it comes
> >>>>> to debugging the EUs. The only thing used for now is the ptrace_may_access to
> >>>>> implement similar access restrictions as ptrace has. This can be changed
> >>>>> to something else if needed.
> >>>>>
> >>>>>>       Ptrace access via vm_operations_struct.access → ttm_bo_vm_access.
> >>>>>>
> >>>>>>       This series renames ttm_bo_vm_access to ttm_bo_access, with no code changes.
> >>>>>>
> >>>>>>       The above function accesses a BO via kmap if it is in SYSTEM / TT,
> >>>>>>       which is existing code.
> >>>>>>
> >>>>>>       This function is only exposed to user space via ptrace permissions.
> >>>>> Maybe this sentence is what caused the confusion.
> >>>>>
> >>>>> Userspace is never exposed with peek/poke interface, only the debugger
> >>>>> connection which is its own FD.
> >>>>>
> >>>>>>       In this series, we implement a function [3] similar to
> >>>>>>       amdgpu_ttm_access_memory for the TTM vfunc access_memory. What is
> >>>>>>       missing is non-visible CPU memory access, similar to
> >>>>>>       amdgpu_ttm_access_memory_sdma. This will be addressed in a follow-up and
> >>>>>>       was omitted in this series given its complexity.
> >>>>>>
> >>>>>>       So, this looks more or less identical to AMD's ptrace implementation,
> >>>>>>       but in GPU address space. Again, I fail to see what the problem is here.
> >>>>>>       What am I missing?
> >>>>>>
> >>>>>>
> >>>>>> The main question is why can't you use the existing interfaces directly?
> >>>>> We're not working on the CPU address space or BOs. We're working
> >>>>> strictly on the GPU address space as would be seen by an EU thread if it
> >>>>> accessed address X.
> >>>>>
> >>>>>> Additional to the peek/poke interface of ptrace Linux has the pidfd_getfd
> >>>>>> system call, see here https://man7.org/linux/man-pages/man2/pidfd_getfd.2.html.
> >>>>>>
> >>>>>> The pidfd_getfd() allows to dup() the render node file descriptor into your gdb
> >>>>>> process. That in turn gives you all the access you need from gdb, including
> >>>>>> mapping BOs and command submission on behalf of the application.
> >>>>> We're not operating on the CPU address space nor are we operating on BOs
> >>>>> (there is no concept of BO in the EU debug interface). Each VMA in the VM
> >>>>> could come from anywhere, only the start address and size matter. And
> >>>>> neither do we need to interfere with the command submission of the
> >>>>> process under debug.
> >>>>>
> >>>>>> As far as I can see that allows for the same functionality as the eudebug
> >>>>>> interface, just without any driver specific code messing with ptrace
> >>>>>> permissions and peek/poke interfaces.
> >>>>>>
> >>>>>> So the question is still why do you need the whole eudebug interface in the
> >>>>>> first place? I might be missing something, but that seems to be superfluous
> >>>>>> from a high level view.
> >>>>> Recapping from above. It is to allow the debugging of EU threads per DRM
> >>>>> client, completely independent of the CPU process. If ptrace_may_access
> >>>>> is the sore point, we could consider other permission checks, too. There
> >>>>> is no connection to ptrace in this architecture other than a single
> >>>>> permission check to know whether a PID is fair game for the debugger
> >>>>> process to access.
> >>>>>
> >>>>> Why no parasitic thread or ptrace: Going forward, binding the EU debugging to
> >>>>> the DRM client would also pave the way for extending the core kernel generated
> >>>>> core dump with each DRM client's EU thread/memory dump. We have a similar
> >>>>> feature called "Offline core dump" enabled in the downstream public
> >>>>> trees for i915, where we currently attach the EU thread dump to i915 error state
> >>>>> and then later combine i915 error state with CPU core dump file with a
> >>>>> tool.
> >>>>>
> >>>>> This is a relatively small amount of extra code, as this baseline series
> >>>>> already gives GDB the ability to perform the necessary actions.
> >>>>> It's just the matter of kernel driver calling: "stop all threads", then
> >>>>> copying the memory map and memory contents for GPU threads, just like is
> >>>>> done for CPU threads.
> >>>>>
> >>>>> With parasitic thread injection, not sure if there is such a way forward,
> >>>>> as it would seem to require injecting quite a bit more logic into the core kernel?
> >>>>>
> >>>>>> It's true that the AMD KFD part has still similar functionality, but that is
> >>>>>> because of the broken KFD design of tying driver state to the CPU process
> >>>>>> (which makes it inaccessible for gdb even with imported render node fd).
> >>>>>>
> >>>>>> Both Sima and I (and partially Dave as well) have pushed back on the KFD
> >>>>>> approach. And the long term plan is to get rid of such device driver specific
> >>>>>> interface which re-implement existing functionality just differently.
> >>>>> Recapping, this series is not adding it back. The debugger connection
> >>>>> is a separate FD from the DRM one, with separate IOCTL set. We don't allow
> >>>>> the DRM FD any new operations based on whether ptrace is attached or not. We
> >>>>> never even do that check.
> >>>>>
> >>>>> We only restrict the opening of the debugger connection to given PID with
> >>>>> ptrace_may_access check for now. That can be changed to something else,
> >>>>> if necessary.
> >>>> Yeah I think unnecessarily tying gpu processes to cpu processes is a bad
> >>>> thing, least because even today all the svm discussions we have still hit
> >>>> clear use-cases, where a 1:1 match is not wanted (like multiple gpu svm
> >>>> sections with offsets). Not even speaking of all the gpu usecases where
> >>>> the gpu vm space is still entirely independent of the cpu side.
> >>>>
> >>>> So that's why I think this entirely separate approach looks like the right
> >>>> one, with ptrace_may_access as the access control check to make sure we
> >>>> match ptrace on the cpu side.
> >>>>
> >>>> But there's very obviously a bikeshed to be had on what the actual uapi
> >>>> should look like, especially how gdb opens up a gpu debug access fd. But I
> >>>> also think that's not much on drm to decide, but whatever gdb wants. And
> >>>> then we aim for some consistency on that lookup/access control part
> >>>> (ideally, I might be missing some reasons why this is a bad idea) across
> >>>> drm drivers.
> >>>>
> >>>>>> So you need to have a really really good explanation why the eudebug interface
> >>>>>> is actually necessary.
> >>>>> TL;DR The main point is to decouple the debugging of the EU workloads from the
> >>>>> debugging of the CPU process. This avoids the interference with the CPU process with
> >>>>> parasitic thread injection. Further this also allows generating a core dump
> >>>>> without any GDB connected. There are also many other smaller pros/cons
> >>>>> which can be discussed but for the context of this patch, this is the
> >>>>> main one.
> >>>>>
> >>>>> So unlike parasitic thread injection, we don't unlock any special IOCTL for
> >>>>> the process under debug to be performed by the parasitic thread, but we
> >>>>> allow the minimal set of operations to be performed by GDB as if those were
> >>>>> done on the EUs themselves.
> >>>>>
> >>>>> One can think of it like the minimal subset of ptrace but for EU threads,
> >>>>> not the CPU threads. And thus, building on this it's possible to extend
> >>>>> the core kernel generated core dumps with DRM specific extension which
> >>>>> would contain the EU thread/memory dump.
> >>>> It might be good to document (in that debugging doc patch probably) why
> >>>> thread injection is not a great option, and why the tradeoffs for
> >>>> debugging are different than for checkpoint/restore, where with CRIU
> >>>> we landed on doing most of this in userspace, and often requiring
> >>>> injection threads to make it all work.
> >>>>
> >>>> Cheers, Sima
> >>>>
> >>>>> Regards, Joonas
> >>>>>
> >>>>>> Regards,
> >>>>>> Christian.
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>       Matt
> >>>>>>
> >>>>>>       [3] https://patchwork.freedesktop.org/patch/622520/?series=140200&rev=6
> >>>>>>
> >>>>>>
> >>>>>>           Regards,
> >>>>>>           Christian.
> >>>>>>
> >>>>>>
> >>>>>>               Matt
> >>>>>>
> >>>>>>
> >>>>>>                   Regards,
> >>>>>>                   Christian.
> >>>>>>
>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 2/8] drm/ttm: Add ttm_bo_access
  2024-11-12  9:23                                     ` Christian König
@ 2024-11-12 13:41                                       ` Joonas Lahtinen
  2024-11-12 16:22                                         ` Thomas Hellström
  0 siblings, 1 reply; 56+ messages in thread
From: Joonas Lahtinen @ 2024-11-12 13:41 UTC (permalink / raw)
  To: Christian König, Christian König, Matthew Brost
  Cc: Simona Vetter, Rodrigo Vivi, Huang Rui, intel-xe, dri-devel,
	matthew.auld, David Airlie, Simona Vetter, Thomas Hellström

(+ Thomas)

Quoting Christian König (2024-11-12 11:23:36)
> Am 11.11.24 um 23:45 schrieb Matthew Brost:
> 
>     [SNIP]
> 
>             So I think only way to allow interactive debugging is to avoid the
>             dma_fences. Curious to hear if there are ideas for otherwise.
> 
>         You need to guarantee somehow that the process is taken from the hardware so
>         that the preemption fence can signal.
> 
> 
>     Our preemption fences have this functionality.
> 
>     A preemption fence issues a suspend execution command to the firmware. The
>     firmware, in turn, attempts to preempt the workload. If it doesn't respond
>     within a specified period, it resets the hardware queue, sends a message to KMD,
>     bans the software queue, and signals the preemption fence.
> 
>     We provide even more protection than that. If, for some reason, the firmware
>     doesn't respond within a longer timeout period, the KMD performs a device reset,
>     bans the offending software queue(s), and signals the preemption fences.
> 
>     This flow remains the same whether a debugger is attached or, for example, a
>     user submits a 10-minute non-preemptable workload. In either case, other
>     processes are guaranteed to make forward progress.
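
A minimal sketch of the escalation described above, written as a user-space model and not as Xe driver or firmware code. The names preempt_outcome, preempt_result and resolve_preempt, and the modelling of firmware responsiveness as a boolean, are assumptions made for illustration. The property it tries to capture is that the preemption fence signals on every path.

```c
#include <stdbool.h>

/* Hypothetical model of the flow; not actual Xe or firmware code. */
enum preempt_outcome {
	PREEMPT_OK,           /* workload yielded within the firmware timeout */
	PREEMPT_QUEUE_RESET,  /* firmware reset the hardware queue */
	PREEMPT_DEVICE_RESET, /* firmware unresponsive, KMD reset the device */
};

struct preempt_result {
	enum preempt_outcome outcome;
	bool queue_banned;   /* offending software queue was banned */
	bool fence_signaled; /* preemption fence signaled */
};

/*
 * fw_responsive: whether the firmware answers the suspend request at all
 * yield_ms:      how long the workload would take to preempt
 * fw_timeout_ms: firmware preemption timeout
 */
struct preempt_result resolve_preempt(bool fw_responsive,
				      unsigned int yield_ms,
				      unsigned int fw_timeout_ms)
{
	/* In this model the fence signals regardless of the path taken. */
	struct preempt_result r = { .fence_signaled = true };

	if (!fw_responsive) {
		/* Longer KMD timeout expired: device reset, queue banned. */
		r.outcome = PREEMPT_DEVICE_RESET;
		r.queue_banned = true;
	} else if (yield_ms > fw_timeout_ms) {
		/* Workload did not yield: queue reset, queue banned. */
		r.outcome = PREEMPT_QUEUE_RESET;
		r.queue_banned = true;
	} else {
		/* Normal preemption, nothing is banned. */
		r.outcome = PREEMPT_OK;
		r.queue_banned = false;
	}

	return r;
}
```

A halted EU thread under debug corresponds to the second case: it does not yield, so the queue is reset and banned, which is the "randomly dies" behavior discussed in this thread.
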
> 
> 
> Yeah that is pretty much the same argumentation I have heard before and it
> turned out to not be working.
> 
> 
>     The example above illustrates the memory oversubscription case, where two
>     processes are using 51% of the memory.
> 
> 
> That isn't even necessary. We have seen applications dying just because the
> core memory management tried to join back small pages into huge pages in a
> userptr.
> 
> That the core memory management jumps in and requests that the pre-emption
> fence signals can happen all the time.

Ouch. Does there happen to be a known reproducer for this behavior, or maybe
a bug report?

> You can mitigate that a bit. Fedora, for example, disables joining back small
> pages into huge pages by default, and we even had people suggesting
> to use mprotect() so that userptr VMAs don't fork() any more (which is of
> course completely illegal).
> 
> But my long term take away is that you can't block all causes of sudden
> requests to let a pre-emption fence signal.

I think this problem equally applies to the LR-workloads like the EU
debugging ones.

>     Another preemption scenario involves two processes sharing hardware resources.
>     Our firmware follows the same flow here. If an LR workload is using a hardware
>     resource and a DMA-fence workload is waiting, and if the LR workload doesn't
>     preempt in a timely manner, the firmware issues a hardware reset, notifies
>     KMD, and bans the LR software queue. The DMA-fence workload then can make
>     forward progress.
> 
>     With the above in mind, this is why I say that if a user tries to run a game and
>     a non-preemptable LR workload, either oversubscribing memory or sharing hardware
>     resources, it is unlikely to work well. However, I don't think this is a common
>     use case. I would expect that when a debugger is open, it is typically by a
>     power user who knows how to disable other GPU tasks (e.g., by enabling software
>     rendering or using a machine without any display).
> 
>     Given this, please reconsider your position.
> 
> 
> The key point here is that this isn't stable, you can do that as a tech demo
> but it can always be that debugging an application just randomly dies. And
> believe me AMD has tried this to a rather extreme extent as well.

This is not limited to debuggable applications; normal LR workloads are
equally impacted, as far as I understand. The issue is just harder to
catch with LR workloads if the pre-emption fence signaling is sporadic.

> What could potentially work is to taint the kernel and make sure that this
> function is only available to users who absolutely know what they are doing.
> 
> But I would say we can only allow that if all other options have been exercised
> and doing it like this is really the only option left.

It sounds like servicing the memory pre-empt fence by stealing the
pages from underneath the workload would be the way to resolve this
issue.

This has been extensively discussed already, but was expected to really
only be needed for low-on-memory scenarios. However it now seems like
the need is much earlier due to the random userptr page joining by core
mm.

If that is done and the memory pre-empt fence is serviced even for
debuggable contexts, do you have further concerns with the presented approach
from dma-buf and drm/sched perspective?

Regards, Joonas

> 
> Regards,
> Christian.
> 
> 
>         This means that a breakpoint or core dump doesn't halt GPU threads, but
>         rather suspends them. E.g. all running wave data is collected into a state
>         bag which can be restored later on.
> 
>         I was under the impression that those long running compute threads do
>         exactly that, but when the hardware can't switch out the GPU thread/process
>         while in a break then that isn't the case.
> 
>         As long as you don't find a way to avoid that this patch set is a pretty
>         clear NAK from my side as DMA-buf and TTM maintainer.
> 
> 
>     I believe this is addressed above.
> 
>     Matt
> 
> 
>         What might work is to keep the submission on the hardware in the break state
>         but forbid any memory access. This way you can signal your preemption fence
>         even when the hardware isn't made available.
> 
>         Before you continue, Xe sets up a new pre-emption fence and makes sure that
>         all page tables etc... are up to date.
> 
>         Could be tricky to get this right if completion fence based submissions are
>         mixed in as well, but that gives you at least a direction you could
>         potentially go.
> 
>         Regards,
>         Christian.
> 
> 
>             Regards, Joonas
> 
> 
>                 Regards,
>                 Christian.
> 
> 
>                     Some wash-up thoughts from me below, but consider them fairly irrelevant
>                     since I think the main driver for these big questions here should be
>                     gdb/userspace.
> 
> 
>                         Quoting Christian König (2024-11-07 11:44:33)
> 
>                             Am 06.11.24 um 18:00 schrieb Matthew Brost:
> 
>                                   [SNIP]
> 
>                                   This is not a generic interface that anyone can freely access. The same
>                                   permissions used by ptrace are checked when opening such an interface.
>                                   See [1] [2].
> 
>                                   [1] https://patchwork.freedesktop.org/patch/617470/?series=136572&rev=2
>                                   [2] https://patchwork.freedesktop.org/patch/617471/?series=136572&rev=2
> 
> 
>                             Thanks a lot for those pointers, that is exactly what I was looking for.
> 
>                             And yeah, it is what I feared. You are re-implementing existing functionality,
>                             but see below.
> 
>                         Could you elaborate on what this "existing functionality" exactly is?
>                         I do not think this functionality exists at this time.
> 
>                         The EU debugging architecture for Xe specifically avoids the need for GDB
>                         to attach with ptrace to the CPU process or interfere with the CPU process for
>                         the debugging via parasitic threads or so.
> 
>                         The debugger connection is opened to the DRM driver for a given PID (which uses the
>                         ptrace_may_access check for now), after which all DRM clients of that
>                         PID are exposed to the debugger process.
> 
>                         What we want to expose via that debugger connection is the ability for GDB to
>                         read/write the different GPU VM address spaces (ppGTT for Intel GPUs) just like
>                         the EU threads would see them. Note that the layout of the ppGTT is
>                         completely up to the userspace driver to setup and is mostly only partially
>                         equal to the CPU address space.
> 
>                         Specifically as part of reading/writing the ppGTT for debugging purposes,
>                         there are deep flushes needed: for example flushing instruction cache
>                         when adding/removing breakpoints.
> 
>                         Maybe that will explain the background. I elaborate on this at the end some more.
> 
> 
>                                           kmap/vmap are used everywhere in the DRM subsystem to access BOs, so I’m
>                                           failing to see the problem with adding a simple helper based on existing
>                                           code.
> 
>                                       What's possible and often done is to do kmap/vmap if you need to implement a
>                                       CPU copy for scanout for example or for copying/validating command buffers.
>                                       But that usually requires accessing the whole BO and has separate security
>                                       checks.
> 
>                                       When you want to access only a few bytes of a BO that sounds massively like
>                                       a peek/poke like interface and we have already rejected that more than once.
>                                       There even used to be standardized GEM IOCTLs for that which have been
>                                       removed by now.
> 
>                         Referring to the explanation at top: These IOCTL are not for the debugging target
>                         process to issue. The peek/poke interface is specifically for GDB only
>                         to facilitate the emulation of memory reads/writes on the GPU address
>                         space as they were done by EUs themselves. And to recap: for modifying
>                         instructions for example (add/remove breakpoint), extra level of cache flushing is
>                         needed which is not available to regular userspace.
> 
>                         I specifically discussed with Sima on the difference before moving forward with this
>                         design originally. If something has changed since then, I'm of course happy to rediscuss.
> 
>                         However, if this code can't be added, not sure how we would ever be able
>                         to implement core dumps for GPU threads/memory?
> 
> 
>                                       If you need to access BOs which are placed in not CPU accessible memory then
>                                       implement the access callback for ptrace, see amdgpu_ttm_access_memory for
>                                       an example how to do this.
> 
>                         As also mentioned above, we don't work via ptrace at all when it comes
>                         to debugging the EUs. The only thing used for now is the ptrace_may_access to
>                         implement similar access restrictions as ptrace has. This can be changed
>                         to something else if needed.
> 
> 
>                                   Ptrace access via vm_operations_struct.access → ttm_bo_vm_access.
> 
>                                   This series renames ttm_bo_vm_access to ttm_bo_access, with no code changes.
> 
>                                   The above function accesses a BO via kmap if it is in SYSTEM / TT,
>                                   which is existing code.
> 
>                                   This function is only exposed to user space via ptrace permissions.
> 
>                         Maybe this sentence is what caused the confusion.
> 
>                         Userspace is never exposed with peek/poke interface, only the debugger
>                         connection which is its own FD.
> 
> 
>                                   In this series, we implement a function [3] similar to
>                                   amdgpu_ttm_access_memory for the TTM vfunc access_memory. What is
>                                   missing is non-visible CPU memory access, similar to
>                                   amdgpu_ttm_access_memory_sdma. This will be addressed in a follow-up and
>                                   was omitted in this series given its complexity.
> 
>                                   So, this looks more or less identical to AMD's ptrace implementation,
>                                   but in GPU address space. Again, I fail to see what the problem is here.
>                                   What am I missing?
> 
> 
>                             The main question is why can't you use the existing interfaces directly?
> 
>                         We're not working on the CPU address space or BOs. We're working
>                         strictly on the GPU address space as would be seen by an EU thread if it
>                         accessed address X.
> 
> 
>                             Additional to the peek/poke interface of ptrace Linux has the pidfd_getfd
>                             system call, see here https://man7.org/linux/man-pages/man2/pidfd_getfd.2.html.
> 
>                             The pidfd_getfd() allows to dup() the render node file descriptor into your gdb
>                             process. That in turn gives you all the access you need from gdb, including
>                             mapping BOs and command submission on behalf of the application.
> 
>                         We're not operating on the CPU address space nor are we operating on BOs
>                         (there is no concept of BO in the EU debug interface). Each VMA in the VM
>                         could come from anywhere, only the start address and size matter. And
>                         neither do we need to interfere with the command submission of the
>                         process under debug.
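
To illustrate the distinction drawn above, a small user-space sketch of access by GPU virtual address, where only the start address and size of each mapping matter. This is a hypothetical model, not the eudebug interface: struct gpu_vma, the backing pointer and gpu_vm_read are names invented for this example, and a real implementation would resolve each mapping to whatever memory backs it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mapping descriptor: only start and size are visible to
 * the debugger; 'backing' stands in for the resolved memory. */
struct gpu_vma {
	uint64_t start;
	uint64_t size;
	uint8_t *backing;
};

/*
 * Read len bytes at GPU virtual address addr, crossing mapping
 * boundaries as needed. Returns the number of bytes read and stops at
 * the first unmapped address.
 */
size_t gpu_vm_read(const struct gpu_vma *vmas, size_t nr_vmas,
		   uint64_t addr, void *buf, size_t len)
{
	uint8_t *out = buf;
	size_t done = 0;

	while (done < len) {
		const struct gpu_vma *vma = NULL;
		uint64_t off, chunk;
		size_t i;

		/* Find the mapping that contains the current address. */
		for (i = 0; i < nr_vmas; i++) {
			if (addr >= vmas[i].start &&
			    addr - vmas[i].start < vmas[i].size) {
				vma = &vmas[i];
				break;
			}
		}
		if (!vma)
			break;

		/* Copy up to the end of this mapping or of the request. */
		off = addr - vma->start;
		chunk = vma->size - off;
		if (chunk > len - done)
			chunk = len - done;

		memcpy(out + done, vma->backing + off, (size_t)chunk);
		done += (size_t)chunk;
		addr += chunk;
	}

	return done;
}
```

The caller never names a BO; a read that spans two adjacent mappings simply continues into the second one.
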
> 
> 
>                             As far as I can see that allows for the same functionality as the eudebug
>                             interface, just without any driver specific code messing with ptrace
>                             permissions and peek/poke interfaces.
> 
>                             So the question is still why do you need the whole eudebug interface in the
>                             first place? I might be missing something, but that seems to be superfluous
>                             from a high level view.
> 
>                         Recapping from above. It is to allow the debugging of EU threads per DRM
>                         client, completely independent of the CPU process. If ptrace_may_access
>                         is the sore point, we could consider other permission checks, too. There
>                         is no connection to ptrace in this architecture other than a single
>                         permission check to know whether a PID is fair game for the debugger
>                         process to access.
> 
>                         Why no parasitic thread or ptrace: Going forward, binding the EU debugging to
>                         the DRM client would also pave the way for extending the core kernel generated
>                         core dump with each DRM client's EU thread/memory dump. We have a similar
>                         feature called "Offline core dump" enabled in the downstream public
>                         trees for i915, where we currently attach the EU thread dump to i915 error state
>                         and then later combine i915 error state with CPU core dump file with a
>                         tool.
> 
>                         This is a relatively small amount of extra code, as this baseline series
>                         already gives GDB the ability to perform the necessary actions.
>                         It's just the matter of kernel driver calling: "stop all threads", then
>                         copying the memory map and memory contents for GPU threads, just like is
>                         done for CPU threads.
> 
>                         With parasitic thread injection, not sure if there is such a way forward,
>                         as it would seem to require injecting quite a bit more logic into the core kernel?
> 
> 
>                             It's true that the AMD KFD part has still similar functionality, but that is
>                             because of the broken KFD design of tying driver state to the CPU process
>                             (which makes it inaccessible for gdb even with imported render node fd).
> 
>                             Both Sima and I (and partially Dave as well) have pushed back on the KFD
>                             approach. And the long term plan is to get rid of such device driver specific
>                             interface which re-implement existing functionality just differently.
> 
>                         Recapping, this series is not adding it back. The debugger connection
>                         is a separate FD from the DRM one, with separate IOCTL set. We don't allow
>                         the DRM FD any new operations based on whether ptrace is attached or not. We
>                         never even do that check.
> 
>                         We only restrict the opening of the debugger connection to given PID with
>                         ptrace_may_access check for now. That can be changed to something else,
>                         if necessary.
> 
>                     Yeah I think unnecessarily tying gpu processes to cpu processes is a bad
>                     thing, least because even today all the svm discussions we have still hit
>                     clear use-cases, where a 1:1 match is not wanted (like multiple gpu svm
>                     sections with offsets). Not even speaking of all the gpu usecases where
>                     the gpu vm space is still entirely independent of the cpu side.
> 
>                     So that's why I think this entirely separate approach looks like the right
>                     one, with ptrace_may_access as the access control check to make sure we
>                     match ptrace on the cpu side.
> 
>                     But there's very obviously a bikeshed to be had on what the actual uapi
>                     should look like, especially how gdb opens up a gpu debug access fd. But I
>                     also think that's not much on drm to decide, but whatever gdb wants. And
>                     then we aim for some consistency on that lookup/access control part
>                     (ideally, I might be missing some reasons why this is a bad idea) across
>                     drm drivers.
> 
> 
>                             So you need to have a really really good explanation why the eudebug interface
>                             is actually necessary.
> 
>                         TL;DR The main point is to decouple the debugging of the EU workloads from the
>                         debugging of the CPU process. This avoids the interference with the CPU process with
>                         parasitic thread injection. Further this also allows generating a core dump
>                         without any GDB connected. There are also many other smaller pros/cons
>                         which can be discussed but for the context of this patch, this is the
>                         main one.
> 
>                         So unlike parasitic thread injection, we don't unlock any special IOCTL for
>                         the process under debug to be performed by the parasitic thread, but we
>                         allow the minimal set of operations to be performed by GDB as if those were
>                         done on the EUs themselves.
> 
>                         One can think of it like the minimal subset of ptrace but for EU threads,
>                         not the CPU threads. And thus, building on this it's possible to extend
>                         the core kernel generated core dumps with DRM specific extension which
>                         would contain the EU thread/memory dump.
> 
>                     It might be good to document (in that debugging doc patch probably) why
>                     thread injection is not a great option, and why the tradeoffs for
>                     debugging are different than for checkpoint/restore, where with CRIU
>                     we landed on doing most of this in userspace, and often requiring
>                     injection threads to make it all work.
> 
>                     Cheers, Sima
> 
> 
>                         Regards, Joonas
> 
> 
>                             Regards,
>                             Christian.
> 
> 
> 
>                                   Matt
> 
>                                   [3] https://patchwork.freedesktop.org/patch/622520/?series=140200&rev=6
> 
> 
>                                       Regards,
>                                       Christian.
> 
> 
>                                           Matt
> 
> 
>                                               Regards,
>                                               Christian.
> 
> 
>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 2/8] drm/ttm: Add ttm_bo_access
  2024-11-12 13:41                                       ` Joonas Lahtinen
@ 2024-11-12 16:22                                         ` Thomas Hellström
  2024-11-12 16:25                                           ` Christian König
  0 siblings, 1 reply; 56+ messages in thread
From: Thomas Hellström @ 2024-11-12 16:22 UTC (permalink / raw)
  To: Joonas Lahtinen, Christian König, Christian König,
	Matthew Brost
  Cc: Simona Vetter, Rodrigo Vivi, Huang Rui, intel-xe, dri-devel,
	matthew.auld, David Airlie, Simona Vetter

On Tue, 2024-11-12 at 15:41 +0200, Joonas Lahtinen wrote:
> (+ Thomas)
> 
> Quoting Christian König (2024-11-12 11:23:36)
> > Am 11.11.24 um 23:45 schrieb Matthew Brost:
> > 
> >     [SNIP]
> > 
> >             So I think only way to allow interactive debugging is
> > to avoid the
> >             dma_fences. Curious to hear if there are ideas for
> > otherwise.
> > 
> >         You need to guarantee somehow that the process is taken
> > from the hardware so
> >         that the preemption fence can signal.
> > 
> > 
> >     Our preemption fences have this functionality.
> > 
> >     A preemption fence issues a suspend execution command to the
> > firmware. The
> >     firmware, in turn, attempts to preempt the workload. If it
> > doesn't respond
> >     within a specified period, it resets the hardware queue, sends
> > a message to KMD,
> >     bans the software queue, and signals the preemption fence.
> > 
> >     We provide even more protection than that. If, for some reason,
> > the firmware
> >     doesn't respond within a longer timeout period, the KMD
> > performs a device reset,
> > >     bans the offending software queue(s), and signals the
> > preemption fences.
> > 
> >     This flow remains the same whether a debugger is attached or,
> > for example, a
> >     user submits a 10-minute non-preemptable workload. In either
> > case, other
> >     processes are guaranteed to make forward progress.
> > 
> > 
> > Yeah that is pretty much the same argumentation I have heard before
> > and it
> > turned out to not be working.
> > 
> > 
> >     The example above illustrates the memory oversubscription case,
> > where two
> >     processes are using 51% of the memory.
> > 
> > 
> > That isn't even necessary. We have seen applications dying just
> > because the
> > core memory management tried to join back small pages into huge
> > > pages in a
> > userptr.
> > 
> > That the core memory management jumps in and requests that the pre-
> > emption
> > fence signals can happen all the time.
> 
> Ouch. Does there happen to be a known reproducer for this behavior or
> maybe
> > a bug report?
> 
> > > You can mitigate that a bit. Fedora, for example, disables joining
> > > back small
> > > pages into huge pages by default, and we even had people
> > > suggesting
> > > to use mprotect() so that userptr VMAs don't fork() any more
> > (which is of
> > course completely illegal).
> > 
> > But my long term take away is that you can't block all causes of
> > sudden
> > requests to let a pre-emption fence signal.
> 
> I think this problem equally applies to the LR-workloads like the EU
> debugging ones.
> 
> >     Another preemption scenario involves two processes sharing
> > hardware resources.
> >     Our firmware follows the same flow here. If an LR workload is
> > using a hardware
> >     resource and a DMA-fence workload is waiting, and if the LR
> > workload doesn't
> > >     preempt in a timely manner, the firmware issues a hardware
> > > reset, notifies
> > >     KMD, and bans the LR software queue. The DMA-fence workload
> > > then can make
> > >     forward progress.
> > 
> >     With the above in mind, this is why I say that if a user tries
> > to run a game and
> >     a non-preemptable LR workload, either oversubscribing memory or
> > sharing hardware
> >     resources, it is unlikely to work well. However, I don't think
> > this is a common
> >     use case. I would expect that when a debugger is open, it is
> > typically by a
> >     power user who knows how to disable other GPU tasks (e.g., by
> > enabling software
> >     rendering or using a machine without any display).
> > 
> >     Given this, please reconsider your position.
> > 
> > 
> > The key point here is that this isn't stable, you can do that as a
> > tech demo
> > but it can always be that debugging an application just randomly
> > dies. And
> > believe me, AMD has tried this to a rather extreme extent as well.
> 
> It's not really only limited to the debuggable applications at all,
> the
> normal LR workloads are equally impacted as far as I understand. Just
> harder to catch the issue with LR-workloads if the pre-emption fence
> signaling is sporadic.
> 
> > What could potentially work is to taint the kernel and make
> > sure that this
> > function is only available to users who absolutely know what they
> > are doing.
> > 
> > But I would say we can only allow that if all other options have
> > been exercised
> > and doing it like this is really the only option left.
> 
> It sounds like servicing the memory pre-empt fence by stealing the
> pages from underneath the workload would be the way to resolve this
> issue.
> 
> This has been extensively discussed already, but was expected to
> really
> only be needed for low-on-memory scenarios. However it now seems like
> the need is much earlier due to the random userptr page joining by
> core
> mm.

Just to clarify here:
 
In Long-Running mode with recoverable pagefaults enabled we don't have
any preempt-fences, but rather just zap the PTEs pointing to the
affected memory and flush the TLB. So from a memory resource POV a
breakpoint should be safe, and no mmu notifier nor shrinker will be
blocked.

Nor will there be any jobs with published dma-fences depending on the
job blocked either temporarily by a pagefault or long-term by a
debugger breakpoint.

/Thomas
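
The difference between the two invalidation models can be illustrated with a small user-space toy. This is only a sketch under stated assumptions, not Xe code: `struct toy_vm` and the `toy_vm_*` helpers are hypothetical names invented for illustration. The point it tries to show is that an invalidation which only zaps PTEs and flushes the TLB never has to wait for the GPU threads, so it completes even while they are halted at a breakpoint.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NUM_PTES 8

/* Hypothetical toy GPU VM: one valid bit per page plus a TLB generation. */
struct toy_vm {
	bool pte_valid[TOY_NUM_PTES];
	unsigned int tlb_gen;         /* bumped on every TLB flush */
	bool halted_at_breakpoint;    /* EU threads stopped by the debugger */
	unsigned int faults_serviced; /* recoverable faults handled so far */
};

static void toy_vm_init(struct toy_vm *vm)
{
	assert(vm != NULL);
	for (size_t i = 0; i < TOY_NUM_PTES; i++)
		vm->pte_valid[i] = true;
	vm->tlb_gen = 0;
	vm->halted_at_breakpoint = false;
	vm->faults_serviced = 0;
}

/*
 * Invalidation path (what an MMU notifier or shrinker would trigger in
 * this toy): zap the PTEs and flush the TLB. It never waits for the GPU
 * threads, so it succeeds regardless of halted_at_breakpoint.
 */
static bool toy_vm_invalidate(struct toy_vm *vm, size_t first, size_t count)
{
	if (first >= TOY_NUM_PTES || count > TOY_NUM_PTES - first)
		return false;
	for (size_t i = first; i < first + count; i++)
		vm->pte_valid[i] = false;
	vm->tlb_gen++;
	return true;
}

/*
 * GPU access path: a halted thread issues no accesses. A running thread
 * that hits a zapped PTE takes a recoverable fault, and the toy fault
 * handler simply repopulates the entry.
 */
static bool toy_vm_access(struct toy_vm *vm, size_t page)
{
	if (page >= TOY_NUM_PTES || vm->halted_at_breakpoint)
		return false;
	if (!vm->pte_valid[page]) {
		vm->pte_valid[page] = true;
		vm->faults_serviced++;
	}
	return true;
}
```

In this model, calling `toy_vm_invalidate()` while `halted_at_breakpoint` is set still returns immediately; the cost is deferred to the fault taken on the first access after the thread resumes.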


> 
> If that is done and the memory pre-empt fence is serviced even for
> debuggable contexts, do you have further concerns with the presented
> approach
> from dma-buf and drm/sched perspective?
> 
> Regards, Joonas
> 
> > 
> > Regards,
> > Christian.
> > 
> > 
> >         This means that a breakpoint or core dump doesn't halt GPU
> > threads, but
> >         rather suspends them. E.g. all running wave data is
> > collected into a state
> >         bag which can be restored later on.
> > 
> >         I was under the impression that those long running compute
> > threads do
> >         exactly that, but when the hardware can't switch out the
> > GPU thread/process
> >         while in a break then that isn't the case.
> > 
> >         As long as you don't find a way to avoid that this patch
> > set is a pretty
> >         clear NAK from my side as DMA-buf and TTM maintainer.
> > 
> > 
> >     I believe this is addressed above.
> > 
> >     Matt
> > 
> > 
> >         What might work is to keep the submission on the hardware
> > in the break state
> >         but forbid any memory access. This way you can signal your
> > preemption fence
> >         even when the hardware isn't made available.
> > 
> >         Before you continue, XE sets up a new pre-emption fence and
> > makes sure that
> >         all page tables etc... are up to date.
> > 
> >         Could be tricky to get this right if completion fence based
> > submissions are
> >         mixed in as well, but that gives you at least a direction
> > you could
> >         potentially go.
> > 
> >         Regards,
> >         Christian.
> > 
> > 
> >             Regards, Joonas
> > 
> > 
> >                 Regards,
> >                 Christian.
> > 
> > 
> >                     Some wash-up thoughts from me below, but
> > consider them fairly irrelevant
> >                     since I think the main driver for these big
> > questions here should be
> >                     gdb/userspace.
> > 
> > 
> >                         Quoting Christian König (2024-11-07
> > 11:44:33)
> > 
> >                             Am 06.11.24 um 18:00 schrieb Matthew
> > Brost:
> > 
> >                                   [SNIP]
> > 
> >                                   This is not a generic interface
> > that anyone can freely access. The same
> >                                   permissions used by ptrace are
> > checked when opening such an interface.
> >                                   See [1] [2].
> > 
> >                                  [1] https://patchwork.freedesktop.org/patch/617470/?series=136572&rev=2
> >                                  [2] https://patchwork.freedesktop.org/patch/617471/?series=136572&rev=2
> > 
> > 
> >                             Thanks a lot for those pointers, that
> > is exactly what I was looking for.
> > 
> >                             And yeah, it is what I feared. You are
> > re-implementing existing functionality,
> >                             but see below.
> > 
> >                         Could you elaborate on what this "existing
> > functionality" exactly is?
> >                         I do not think this functionality exists at
> > this time.
> > 
> >                         The EU debugging architecture for Xe
> > specifically avoids the need for GDB
> >                         to attach with ptrace to the CPU process or
> > interfere with the CPU process for
> >                         the debugging via parasitic threads or so.
> > 
> >                         Debugger connection is opened to the DRM
> > driver for given PID (which uses the
> >                         ptrace_may_access check for now) after
> > which all DRM clients of that
> >                         PID are exposed to the debugger process.
> > 
> >                         What we want to expose via that debugger
> > connection is the ability for GDB to
> >                         read/write the different GPU VM address
> > spaces (ppGTT for Intel GPUs) just like
> >                         the EU threads would see them. Note that
> > the layout of the ppGTT is
> >                         completely up to the userspace driver to
> > setup and is mostly only partially
> >                         equal to the CPU address space.
> > 
> >                         Specifically as part of reading/writing the
> > ppGTT for debugging purposes,
> >                         there are deep flushes needed: for example
> > flushing instruction cache
> >                         when adding/removing breakpoints.
> > 
> >                         Maybe that will explain the background. I
> > elaborate on this at the end some more.
> > 
> > 
> >                                           kmap/vmap are used
> > everywhere in the DRM subsystem to access BOs, so I’m
> >                                           failing to see the
> > problem with adding a simple helper based on existing
> >                                           code.
> > 
> >                                       What's possible and often
> > done is to do kmap/vmap if you need to implement a
> >                                       CPU copy for scanout for
> > example or for copying/validating command buffers.
> >                                       But that usually requires
> > accessing the whole BO and has separate security
> >                                       checks.
> > 
> >                                       When you want to access only
> > a few bytes of a BO that sounds massively like
> >                                       a peek/poke like interface
> > and we have already rejected that more than once.
> >                                       There even used to be
> > standardized GEM IOCTLs for that which have been
> >                                       removed by now.
> > 
> >                         Referring to the explanation at top: These
> > IOCTL are not for the debugging target
> >                         process to issue. The peek/poke interface
> > is specifically for GDB only
> >                         to facilitate the emulation of memory
> > reads/writes on the GPU address
> >                         space as if they were done by EUs themselves.
> > And to recap: for modifying
> >                         instructions for example (add/remove
> > breakpoint), extra level of cache flushing is
> >                         needed which is not available to regular
> > userspace.
> > 
> >                         I specifically discussed with Sima on the
> > difference before moving forward with this
> >                         design originally. If something has changed
> > since then, I'm of course happy to rediscuss.
> > 
> >                         However, if this code can't be added, not
> > sure how we would ever be able
> >                         to implement core dumps for GPU
> > threads/memory?
> > 
> > 
> >                                       If you need to access BOs
> > which are placed in not CPU accessible memory then
> >                                       implement the access callback
> > for ptrace, see amdgpu_ttm_access_memory for
> >                                       an example how to do this.
> > 
> >                         As also mentioned above, we don't work via
> > ptrace at all when it comes
> >                         to debugging the EUs. The only thing used
> > for now is the ptrace_may_access to
> >                         implement similar access restrictions as
> > ptrace has. This can be changed
> >                         to something else if needed.
> > 
> > 
> >                                   Ptrace access via
> > vm_operations_struct.access → ttm_bo_vm_access.
> > 
> >                                   This series renames
> > ttm_bo_vm_access to ttm_bo_access, with no code changes.
> > 
> >                                   The above function accesses a BO
> > via kmap if it is in SYSTEM / TT,
> >                                   which is existing code.
> > 
> >                                   This function is only exposed to
> > user space via ptrace permissions.
> > 
> >                         Maybe this sentence is what caused the
> > confusion.
> > 
> >                         Userspace is never exposed with peek/poke
> > interface, only the debugger
> >                         connection which is its own FD.
> > 
> > 
> >                                   In this series, we implement a
> > function [3] similar to
> >                                   amdgpu_ttm_access_memory for the
> > TTM vfunc access_memory. What is
> >                                   missing is non-visible CPU memory
> > access, similar to
> >                                   amdgpu_ttm_access_memory_sdma.
> > This will be addressed in a follow-up and
> >                                   was omitted in this series given
> > its complexity.
> > 
> >                                   So, this looks more or less
> > identical to AMD's ptrace implementation,
> >                                   but in GPU address space. Again,
> > I fail to see what the problem is here.
> >                                   What am I missing?
> > 
> > 
> >                             The main question is why can't you use
> > the existing interfaces directly?
> > 
> >                         We're not working on the CPU address space
> > or BOs. We're working
> >                         strictly on the GPU address space as would
> > be seen by an EU thread if it
> >                         accessed address X.
> > 
> > 
> >                             In addition to the peek/poke interface
> > of ptrace, Linux has the pidfd_getfd
> >                             system call, see
> > here: https://man7.org/linux/man-pages/man2/pidfd_getfd.2.html.
> > 
> >                             pidfd_getfd() allows you to dup() the
> > render node file descriptor into your gdb
> >                             process. That in turn gives you all the
> > access you need from gdb, including
> >                             mapping BOs and command submission on
> > behalf of the application.
> > 
> >                         We're not operating on the CPU address
> > space nor are we operating on BOs
> >                         (there is no concept of BO in the EU debug
> > interface). Each VMA in the VM
> >                         could come from anywhere, only the start
> > address and size matter. And
> >                         neither do we need to interfere with the
> > command submission of the
> >                         process under debug.
> > 
> > 
> >                             As far as I can see that allows for the
> > same functionality as the eudebug
> >                             interface, just without any driver
> > specific code messing with ptrace
> >                             permissions and peek/poke interfaces.
> > 
> >                             So the question is still why do you
> > need the whole eudebug interface in the
> >                             first place? I might be missing
> > something, but that seems to be superfluous
> >                             from a high level view.
> > 
> >                         Recapping from above. It is to allow the
> > debugging of EU threads per DRM
> >                         client, completely independent of the CPU
> > process. If ptrace_may_access
> >                         is the sore point, we could consider other
> > permission checks, too. There
> >                         is no other connection to ptrace in this
> > architecture than a single
> >                         permission check to know if PID is fair
> > game to access by debugger
> >                         process.
> > 
> >                         Why no parasitic thread or ptrace: Going
> > forward, binding the EU debugging to
> >                         the DRM client would also pave way for
> > being able to extend core kernel generated
> >                         core dump with each DRM client's EU
> > thread/memory dump. We have similar
> >                         feature called "Offline core dump" enabled
> > in the downstream public
> >                         trees for i915, where we currently attach
> > the EU thread dump to i915 error state
> >                         and then later combine i915 error state
> > with CPU core dump file with a
> >                         tool.
> > 
> >                         This is relatively little amount of extra
> > code, as this baseline series
> >                         already gives GDB the ability to
> > perform the necessary actions.
> >                         It's just the matter of kernel driver
> > calling: "stop all threads", then
> >                         copying the memory map and memory contents
> > for GPU threads, just like is
> >                         done for CPU threads.
> > 
> >                         With parasitic thread injection, not sure
> > if there is such way forward,
> >                         as it would seem to require to inject quite
> > a bit more logic to core kernel?
> > 
> > 
> >                             It's true that the AMD KFD part has
> > still similar functionality, but that is
> >                             because of the broken KFD design of
> > tying driver state to the CPU process
> >                             (which makes it inaccessible for gdb
> > even with imported render node fd).
> > 
> >                             Both Sima and I (and partially Dave as
> > well) have pushed back on the KFD
> >                             approach. And the long term plan is to
> > get rid of such device driver specific
> >                             interface which re-implement existing
> > functionality just differently.
> > 
> >                         Recapping, this series is not adding it
> > back. The debugger connection
> >                         is a separate FD from the DRM one, with
> > separate IOCTL set. We don't allow
> >                         the DRM FD any new operations based on whether
> > ptrace is attached or not. We
> >                         don't ever do that check even.
> > 
> >                         We only restrict the opening of the
> > debugger connection to given PID with
> >                         ptrace_may_access check for now. That can
> > be changed to something else,
> >                         if necessary.
> > 
> >                     Yeah I think unnecessarily tying gpu processes
> > to cpu processes is a bad
> >                     thing, least because even today all the svm
> > discussions we have still hit
> >                     clear use-cases, where a 1:1 match is not
> > wanted (like multiple gpu svm
> >                     sections with offsets). Not even speaking of
> > all the gpu usecases where
> >                     the gpu vm space is still entirely independent
> > of the cpu side.
> > 
> >                     So that's why I think this entirely separate
> > approach looks like the right
> >                     one, with ptrace_may_access as the access
> > control check to make sure we
> >                     match ptrace on the cpu side.
> > 
> >                     But there's very obviously a bikeshed to be had
> > on what the actual uapi
> >                     should look like, especially how gdb opens up a
> > gpu debug access fd. But I
> >                     also think that's not much on drm to decide,
> > but whatever gdb wants. And
> >                     then we aim for some consistency on that
> > lookup/access control part
> >                     (ideally, I might be missing some reasons why
> > this is a bad idea) across
> >                     drm drivers.
> > 
> > 
> >                             So you need to have a really really
> > good explanation why the eudebug interface
> >                             is actually necessary.
> > 
> >                         TL;DR The main point is to decouple the
> > debugging of the EU workloads from the
> >                         debugging of the CPU process. This avoids
> > the interference with the CPU process with
> >                         parasitic thread injection. Further this
> > also allows generating a core dump
> >                         without any GDB connected. There are also
> > many other smaller pros/cons
> >                         which can be discussed but for the context
> > of this patch, this is the
> >                         main one.
> > 
> >                         So unlike parasitic thread injection, we
> > don't unlock any special IOCTL for
> >                         the process under debug to be performed by
> > the parasitic thread, but we
> >                         allow the minimal set of operations to be
> > performed by GDB as if those were
> >                         done on the EUs themselves.
> > 
> >                         One can think of it like the minimal subset
> > of ptrace but for EU threads,
> >                         not the CPU threads. And thus, building on
> > this it's possible to extend
> >                         the core kernel generated core dumps with
> > DRM specific extension which
> >                         would contain the EU thread/memory dump.
> > 
> >                     It might be good to document (in that debugging
> > doc patch probably) why
> >                     thread injection is not a great option, and why
> > the tradeoffs for
> >                     debugging are different than for
> > checkpoint/restore, where with CRIU
> >                     we landed on doing most of this in userspace,
> > and often requiring
> >                     injection threads to make it all work.
> > 
> >                     Cheers, Sima
> > 
> > 
> >                         Regards, Joonas
> > 
> > 
> >                             Regards,
> >                             Christian.
> > 
> > 
> > 
> >                                   Matt
> > 
> >                                  [3] https://patchwork.freedesktop.org/patch/622520/?series=140200&rev=6
> > 
> > 
> >                                       Regards,
> >                                       Christian.
> > 
> > 
> >                                           Matt
> > 
> > 
> >                                               Regards,
> >                                               Christian.
> > 
> > 
> > 



* Re: [PATCH v6 2/8] drm/ttm: Add ttm_bo_access
  2024-11-12 16:22                                         ` Thomas Hellström
@ 2024-11-12 16:25                                           ` Christian König
  2024-11-12 16:33                                             ` Thomas Hellström
  0 siblings, 1 reply; 56+ messages in thread
From: Christian König @ 2024-11-12 16:25 UTC (permalink / raw)
  To: Thomas Hellström, Joonas Lahtinen, Christian König,
	Matthew Brost
  Cc: Simona Vetter, Rodrigo Vivi, Huang Rui, intel-xe, dri-devel,
	matthew.auld, David Airlie, Simona Vetter

Am 12.11.24 um 17:22 schrieb Thomas Hellström:
> On Tue, 2024-11-12 at 15:41 +0200, Joonas Lahtinen wrote:
>> (+ Thomas)
>>
>> Quoting Christian König (2024-11-12 11:23:36)
>>> Am 11.11.24 um 23:45 schrieb Matthew Brost:
>>>
>>>      [SNIP]
>>>
>>>              So I think only way to allow interactive debugging is
>>> to avoid the
>>>              dma_fences. Curious to hear if there are ideas for
>>> otherwise.
>>>
>>>          You need to guarantee somehow that the process is taken
>>> from the hardware so
>>>          that the preemption fence can signal.
>>>
>>>
>>>      Our preemption fences have this functionality.
>>>
>>>      A preemption fence issues a suspend execution command to the
>>> firmware. The
>>>      firmware, in turn, attempts to preempt the workload. If it
>>> doesn't respond
>>>      within a specified period, it resets the hardware queue, sends
>>> a message to KMD,
>>>      bans the software queue, and signals the preemption fence.
>>>
>>>      We provide even more protection than that. If, for some reason,
>>> the firmware
>>>      doesn't respond within a longer timeout period, the KMD
>>> performs a device reset,
>>>      bans the offending software queue(s), and signals the
>>> preemption fences.
>>>
>>>      This flow remains the same whether a debugger is attached or,
>>> for example, a
>>>      user submits a 10-minute non-preemptable workload. In either
>>> case, other
>>>      processes are guaranteed to make forward progress.
>>>
>>>
>>> Yeah that is pretty much the same argumentation I have heard before
>>> and it
>>> turned out to not be working.
>>>
>>>
>>>      The example above illustrates the memory oversubscription case,
>>> where two
>>>      processes are using 51% of the memory.
>>>
>>>
>>> That isn't even necessary. We have seen applications dying just
>>> because the
>>> core memory management tried to join back small pages into huge
>>> pages in a
>>> userptr.
>>>
>>> That the core memory management jumps in and requests that the
>>> pre-emption
>>> fence signals can happen all the time.
>> Ouch. Does there happen to be a known reproducer for this behavior or
>> maybe
>> bug report?
>>
>>> You can mitigate that a bit; Fedora, for example, disables joining
>>> back small
>>> pages into huge pages by default, and we even had people
>>> suggesting
>>> to use mprotect() so that userptr VMAs don't fork() any more
>>> (which is of
>>> course completely illegal).
>>>
>>> But my long term take away is that you can't block all causes of
>>> sudden
>>> requests to let a pre-emption fence signal.
>> I think this problem equally applies to the LR-workloads like the EU
>> debugging ones.
>>
>>>      Another preemption scenario involves two processes sharing
>>> hardware resources.
>>>      Our firmware follows the same flow here. If an LR workload is
>>> using a hardware
>>>      resource and a DMA-fence workload is waiting, and if the LR
>>> workload doesn't
>>>      preempt in a timely manner, the firmware issues a hardware
>>> reset, notifies
>>>      KMD, and bans the LR software queue. The DMA-fence workload
>>> then can make
>>>      forward progress.
>>>
>>>      With the above in mind, this is why I say that if a user tries
>>> to run a game and
>>>      a non-preemptable LR workload, either oversubscribing memory or
>>> sharing hardware
>>>      resources, it is unlikely to work well. However, I don't think
>>> this is a common
>>>      use case. I would expect that when a debugger is open, it is
>>> typically by a
>>>      power user who knows how to disable other GPU tasks (e.g., by
>>> enabling software
>>>      rendering or using a machine without any display).
>>>
>>>      Given this, please reconsider your position.
>>>
>>>
>>> The key point here is that this isn't stable, you can do that as a
>>> tech demo
>>> but it can always be that debugging an application just randomly
>>> dies. And
>>> believe me, AMD has tried this to a rather extreme extent as well.
>> It's not really only limited to the debuggable applications at all,
>> the
>> normal LR workloads are equally impacted as far as I understand. Just
>> harder to catch the issue with LR-workloads if the pre-emption fence
>> signaling is sporadic.
>>
>>> What could potentially work is to taint the kernel and make
>>> sure that this
>>> function is only available to users who absolutely know what they
>>> are doing.
>>>
>>> But I would say we can only allow that if all other options have
>>> been exercised
>>> and doing it like this is really the only option left.
>> It sounds like servicing the memory pre-empt fence by stealing the
>> pages from underneath the workload would be the way to resolve this
>> issue.
>>
>> This has been extensively discussed already, but was expected to
>> really
>> only be needed for low-on-memory scenarios. However it now seems like
>> the need is much earlier due to the random userptr page joining by
>> core
>> mm.
> Just to clarify here:
>   
> In Long-Running mode with recoverable pagefaults enabled we don't have
> any preempt-fences, but rather just zap the PTEs pointing to the
> affected memory and flush the TLB. So from a memory resource POV a
> breakpoint should be safe, and no mmu notifier nor shrinker will be
> blocked.

That sounds like an HMM-based approach, which would clearly work.

But where is that? I don't see any HMM-based approach anywhere in the XE
driver.

Regards,
Christian.
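
For readers unfamiliar with the term, the core of an HMM-style approach is a sequence-count retry loop around page-table population. The following user-space toy is a rough sketch of that pattern only. It is loosely modeled on the kernel's mmu_interval_read_begin() / mmu_interval_read_retry() pairing used together with hmm_range_fault(), but `struct toy_notifier` and the `toy_notifier_*` helpers are hypothetical names, and nothing here is taken from the Xe driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an interval notifier: just a sequence count. */
struct toy_notifier {
	unsigned long invalidate_seq; /* bumped by every invalidation */
};

/* Sample the sequence before collecting pages. */
static unsigned long toy_notifier_read_begin(const struct toy_notifier *n)
{
	return n->invalidate_seq;
}

/* True if an invalidation ran since the matching read_begin. */
static bool toy_notifier_read_retry(const struct toy_notifier *n,
				    unsigned long seq)
{
	return n->invalidate_seq != seq;
}

/* Invalidation side: never blocks, only marks in-flight binds as stale. */
static void toy_notifier_invalidate(struct toy_notifier *n)
{
	n->invalidate_seq++;
}

/*
 * Bind side: collect pages, then commit only if no invalidation raced
 * with the collection; otherwise start over. racing_invalidations
 * simulates that many invalidations arriving mid-bind. Returns the
 * number of attempts that were needed.
 */
static unsigned int toy_notifier_bind(struct toy_notifier *n,
				      unsigned int racing_invalidations)
{
	unsigned int attempts = 0;

	assert(n != NULL);
	for (;;) {
		unsigned long seq = toy_notifier_read_begin(n);

		attempts++;
		/* A real driver would fault in and collect CPU pages here. */
		if (racing_invalidations) {
			racing_invalidations--;
			toy_notifier_invalidate(n);
		}
		if (!toy_notifier_read_retry(n, seq))
			return attempts; /* safe to commit the GPU PTEs */
	}
}
```

The property that matters for this discussion is that the invalidation side only bumps a counter and never waits on the GPU; the bind side absorbs the cost by retrying.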

>
> Nor will there be any jobs with published dma-fences depending on the
> job blocked either temporarily by a pagefault or long-term by a
> debugger breakpoint.
>
> /Thomas
>
>
>> If that is done and the memory pre-empt fence is serviced even for
>> debuggable contexts, do you have further concerns with the presented
>> approach
>> from dma-buf and drm/sched perspective?
>>
>> Regards, Joonas
>>
>>> Regards,
>>> Christian.
>>>
>>>
>>>          This means that a breakpoint or core dump doesn't halt GPU
>>> threads, but
>>>          rather suspends them. E.g. all running wave data is
>>> collected into a state
>>>          bag which can be restored later on.
>>>
>>>          I was under the impression that those long running compute
>>> threads do
>>>          exactly that, but when the hardware can't switch out the
>>> GPU thread/process
>>>          while in a break then that isn't the case.
>>>
>>>          As long as you don't find a way to avoid that this patch
>>> set is a pretty
>>>          clear NAK from my side as DMA-buf and TTM maintainer.
>>>
>>>
>>>      I believe this is addressed above.
>>>
>>>      Matt
>>>
>>>
>>>          What might work is to keep the submission on the hardware
>>> in the break state
>>>          but forbid any memory access. This way you can signal your
>>> preemption fence
>>>          even when the hardware isn't made available.
>>>
>>>          Before you continue, XE sets up a new pre-emption fence and
>>> makes sure that
>>>          all page tables etc... are up to date.
>>>
>>>          Could be tricky to get this right if completion fence based
>>> submissions are
>>>          mixed in as well, but that gives you at least a direction
>>> you could
>>>          potentially go.
>>>
>>>          Regards,
>>>          Christian.
>>>
>>>
>>>              Regards, Joonas
>>>
>>>
>>>                  Regards,
>>>                  Christian.
>>>
>>>
>>>                      Some wash-up thoughts from me below, but
>>> consider them fairly irrelevant
>>>                      since I think the main driver for these big
>>> questions here should be
>>>                      gdb/userspace.
>>>
>>>
>>>                          Quoting Christian König (2024-11-07
>>> 11:44:33)
>>>
>>>                              Am 06.11.24 um 18:00 schrieb Matthew
>>> Brost:
>>>
>>>                                    [SNIP]
>>>
>>>                                    This is not a generic interface
>>> that anyone can freely access. The same
>>>                                    permissions used by ptrace are
>>> checked when opening such an interface.
>>>                                    See [1] [2].
>>>
>>>                                   [1] https://patchwork.freedesktop.org/patch/617470/?series=136572&rev=2
>>>                                   [2] https://patchwork.freedesktop.org/patch/617471/?series=136572&rev=2
>>>
>>>
>>>                              Thanks a lot for those pointers, that
>>> is exactly what I was looking for.
>>>
>>>                              And yeah, it is what I feared. You are
>>> re-implementing existing functionality,
>>>                              but see below.
>>>
>>>                          Could you elaborate on what this "existing
>>> functionality" exactly is?
>>>                          I do not think this functionality exists at
>>> this time.
>>>
>>>                          The EU debugging architecture for Xe
>>> specifically avoids the need for GDB
>>>                          to attach with ptrace to the CPU process or
>>> interfere with the CPU process for
>>>                          the debugging via parasitic threads or so.
>>>
>>>                          Debugger connection is opened to the DRM
>>> driver for given PID (which uses the
>>>                          ptrace_may_access check for now) after
>>> which all DRM clients of that
>>>                          PID are exposed to the debugger process.
>>>
>>>                          What we want to expose via that debugger
>>> connection is the ability for GDB to
>>>                          read/write the different GPU VM address
>>> spaces (ppGTT for Intel GPUs) just like
>>>                          the EU threads would see them. Note that
>>> the layout of the ppGTT is
>>>                          completely up to the userspace driver to
>>> setup and is mostly only partially
>>>                          equal to the CPU address space.
>>>
>>>                          Specifically as part of reading/writing the
>>> ppGTT for debugging purposes,
>>>                          there are deep flushes needed: for example
>>> flushing instruction cache
>>>                          when adding/removing breakpoints.
>>>
>>>                          Maybe that will explain the background. I
>>> elaborate on this at the end some more.
>>>
>>>
>>>                                            kmap/vmap are used
>>> everywhere in the DRM subsystem to access BOs, so I’m
>>>                                            failing to see the
>>> problem with adding a simple helper based on existing
>>>                                            code.
>>>
>>>                                        What's possible and often
>>> done is to do kmap/vmap if you need to implement a
>>>                                        CPU copy for scanout for
>>> example or for copying/validating command buffers.
>>>                                        But that usually requires
>>> accessing the whole BO and has separate security
>>>                                        checks.
>>>
>>>                                        When you want to access only
>>> a few bytes of a BO that sounds massively like
>>>                                        a peek/poke like interface
>>> and we have already rejected that more than once.
>>>                                        There even used to be
>>> standardized GEM IOCTLs for that which have been
>>>                                        removed by now.
>>>
>>>                          Referring to the explanation at top: These
>>> IOCTL are not for the debugging target
>>>                          process to issue. The peek/poke interface
>>> is specifically for GDB only
>>>                          to facilitate the emulation of memory
>>> reads/writes on the GPU address
>>>                          space as if they were done by EUs themselves.
>>> And to recap: for modifying
>>>                          instructions for example (add/remove
>>> breakpoint), extra level of cache flushing is
>>>                          needed which is not available to regular
>>> userspace.
>>>
>>>                          I specifically discussed with Sima on the
>>> difference before moving forward with this
>>>                          design originally. If something has changed
>>> since then, I'm of course happy to rediscuss.
>>>
>>>                          However, if this code can't be added, not
>>> sure how we would ever be able
>>>                          to implement core dumps for GPU
>>> threads/memory?
>>>
>>>
>>>                                        If you need to access BOs
>>> which are placed in not CPU accessible memory then
>>>                                        implement the access callback
>>> for ptrace, see amdgpu_ttm_access_memory for
>>>                                        an example how to do this.
>>>
>>>                          As also mentioned above, we don't work via
>>> ptrace at all when it comes
>>>                          to debugging the EUs. The only thing used
>>> for now is the ptrace_may_access to
>>>                          implement similar access restrictions as
>>> ptrace has. This can be changed
>>>                          to something else if needed.
>>>
>>>
>>>                                    Ptrace access via
>>> vm_operations_struct.access → ttm_bo_vm_access.
>>>
>>>                                    This series renames
>>> ttm_bo_vm_access to ttm_bo_access, with no code changes.
>>>
>>>                                    The above function accesses a BO
>>> via kmap if it is in SYSTEM / TT,
>>>                                    which is existing code.
>>>
>>>                                    This function is only exposed to
>>> user space via ptrace permissions.
>>>
>>>                          Maybe this sentence is what caused the
>>> confusion.
>>>
>>>                          Userspace is never exposed with peek/poke
>>> interface, only the debugger
>>>                          connection which is its own FD.
>>>
>>>
>>>                                    In this series, we implement a
>>> function [3] similar to
>>>                                    amdgpu_ttm_access_memory for the
>>> TTM vfunc access_memory. What is
>>>                                    missing is non-visible CPU memory
>>> access, similar to
>>>                                    amdgpu_ttm_access_memory_sdma.
>>> This will be addressed in a follow-up and
>>>                                    was omitted in this series given
>>> its complexity.
>>>
>>>                                    So, this looks more or less
>>> identical to AMD's ptrace implementation,
>>>                                    but in GPU address space. Again,
>>> I fail to see what the problem is here.
>>>                                    What am I missing?
>>>
>>>
>>>                              The main question is why can't you use
>>> the existing interfaces directly?
>>>
>>>                          We're not working on the CPU address space
>>> or BOs. We're working
>>>                          strictly on the GPU address space as would
>>> be seen by an EU thread if it
>>>                          accessed address X.
>>>
>>>
>>>                              Additional to the peek/poke interface
>>> of ptrace Linux has the pidfd_getfd
>>>                              system call, see
>>> here: https://man7.org/linux/man-pages/man2/pidfd_getfd.2.html.
>>>
>>>                              The pidfd_getfd() allows to dup() the
>>> render node file descriptor into your gdb
>>>                              process. That in turn gives you all the
>>> access you need from gdb, including
>>>                              mapping BOs and command submission on
>>> behalf of the application.
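
For illustration only, the pidfd_getfd() flow suggested above could look roughly like the following sketch. The helper name dup_fd_from_pid is hypothetical, and the fallback syscall numbers are an assumption that holds on the common architectures (x86-64, arm64, riscv) but should be checked elsewhere. The kernel gates pidfd_getfd() behind a ptrace attach-level permission check on the target process.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Fallbacks for older libc headers (assumed values, see the note above). */
#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434
#endif
#ifndef SYS_pidfd_getfd
#define SYS_pidfd_getfd 438
#endif

/*
 * Hypothetical helper: duplicate file descriptor target_fd of process pid
 * into the calling process. Returns the new local descriptor on success,
 * or -1 with errno set on failure.
 */
static int dup_fd_from_pid(pid_t pid, int target_fd)
{
	int pidfd, local_fd, saved_errno;

	/* Obtain a pidfd referring to the target process. */
	pidfd = (int)syscall(SYS_pidfd_open, pid, 0u);
	if (pidfd < 0)
		return -1;

	/* Ask the kernel to duplicate the target's descriptor for us. */
	local_fd = (int)syscall(SYS_pidfd_getfd, pidfd, target_fd, 0u);

	/* Preserve errno from pidfd_getfd across close(). */
	saved_errno = errno;
	close(pidfd);
	errno = saved_errno;

	return local_fd;
}
```

A debugger would pass the PID of the target and the number of its render node descriptor, which it has to discover separately (for example from /proc/<pid>/fd).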
>>>
>>>                          We're not operating on the CPU address
>>> space nor are we operating on BOs
>>>                          (there is no concept of BO in the EU debug
>>> interface). Each VMA in the VM
>>>                          could come from anywhere, only the start
>>> address and size matter. And
>>>                          neither do we need to interfere with the
>>> command submission of the
>>>                          process under debug.
>>>
>>>
>>>                              As far as I can see that allows for the
>>> same functionality as the eudebug
>>>                              interface, just without any driver
>>> specific code messing with ptrace
>>>                              permissions and peek/poke interfaces.
>>>
>>>                              So the question is still why do you
>>> need the whole eudebug interface in the
>>>                              first place? I might be missing
>>> something, but that seems to be superfluous
>>>                              from a high level view.
>>>
>>>                          Recapping from above. It is to allow the
>>> debugging of EU threads per DRM
>>>                          client, completely independent of the CPU
>>> process. If ptrace_may_access
>>>                          is the sore point, we could consider other
>>> permission checks, too. There
>>>                          is no other connection to ptrace in this
>>> architecture as single
>>>                          permission check to know if PID is fair
>>> game to access by debugger
>>>                          process.
>>>
>>>                          Why no parasitic thread or ptrace: Going
>>> forward, binding the EU debugging to
>>>                          the DRM client would also pave way for
>>> being able to extend core kernel generated
>>>                          core dump with each DRM client's EU
>>> thread/memory dump. We have similar
>>>                          feature called "Offline core dump" enabled
>>> in the downstream public
>>>                          trees for i915, where we currently attach
>>> the EU thread dump to i915 error state
>>>                          and then later combine i915 error state
>>> with CPU core dump file with a
>>>                          tool.
>>>
>>>                          This is relatively little amount of extra
>>> code, as this baseline series
>>>                          already introduces GDB the ability to
>>> perform the necessary actions.
>>>                          It's just the matter of kernel driver
>>> calling: "stop all threads", then
>>>                          copying the memory map and memory contents
>>> for GPU threads, just like is
>>>                          done for CPU threads.
>>>
>>>                          With parasitic thread injection, not sure
>>> if there is such way forward,
>>>                          as it would seem to require to inject quite
>>> a bit more logic to core kernel?
>>>
>>>
>>>                              It's true that the AMD KFD part has
>>> still similar functionality, but that is
>>>                              because of the broken KFD design of
>>> tying driver state to the CPU process
>>>                              (which makes it inaccessible for gdb
>>> even with imported render node fd).
>>>
>>>                              Both Sima and I (and partially Dave as
>>> well) have pushed back on the KFD
>>>                              approach. And the long term plan is to
>>> get rid of such device driver specific
>>>                              interface which re-implement existing
>>> functionality just differently.
>>>
>>>                          Recapping, this series is not adding it
>>> back. The debugger connection
>>>                          is a separate FD from the DRM one, with
>>> separate IOCTL set. We don't allow
>>>                          the DRM FD any new operations based on
>>> whether ptrace is attached or not. We
>>>                          don't ever do that check even.
>>>
>>>                          We only restrict the opening of the
>>> debugger connection to given PID with
>>>                          ptrace_may_access check for now. That can
>>> be changed to something else,
>>>                          if necessary.
>>>
>>>                      Yeah I think unnecessarily tying gpu processes
>>> to cpu processes is a bad
>>>                      thing, not least because even today all the svm
>>> discussions we have still hit
>>>                      clear use-cases, where a 1:1 match is not
>>> wanted (like multiple gpu svm
>>>                      sections with offsets). Not even speaking of
>>> all the gpu usecases where
>>>                      the gpu vm space is still entirely independent
>>> of the cpu side.
>>>
>>>                      So that's why I think this entirely separate
>>> approach looks like the right
>>>                      one, with ptrace_may_access as the access
>>> control check to make sure we
>>>                      match ptrace on the cpu side.
>>>
>>>                      But there's very obviously a bikeshed to be had
>>> on what the actual uapi
>>>                      should look like, especially how gdb opens up a
>>> gpu debug access fd. But I
>>>                      also think that's not much on drm to decide,
>>> but whatever gdb wants. And
>>>                      then we aim for some consistency on that
>>> lookup/access control part
>>>                      (ideally, I might be missing some reasons why
>>> this is a bad idea) across
>>>                      drm drivers.
>>>
>>>
>>>                              So you need to have a really really
>>> good explanation why the eudebug interface
>>>                              is actually necessary.
>>>
>>>                          TL;DR The main point is to decouple the
>>> debugging of the EU workloads from the
>>>                          debugging of the CPU process. This avoids
>>> the interference with the CPU process with
>>>                          parasitic thread injection. Further this
>>> also allows generating a core dump
>>>                          without any GDB connected. There are also
>>> many other smaller pros/cons
>>>                          which can be discussed but for the context
>>> of this patch, this is the
>>>                          main one.
>>>
>>>                          So unlike parasitic thread injection, we
>>> don't unlock any special IOCTL for
>>>                          the process under debug to be performed by
>>> the parasitic thread, but we
>>>                          allow the minimal set of operations to be
>>> performed by GDB as if those were
>>>                          done on the EUs themselves.
>>>
>>>                          One can think of it like the minimal subset
>>> of ptrace but for EU threads,
>>>                          not the CPU threads. And thus, building on
>>> this it's possible to extend
>>>                          the core kernel generated core dumps with
>>> DRM specific extension which
>>>                          would contain the EU thread/memory dump.
>>>
>>>                      It might be good to document (in that debugging
>>> doc patch probably) why
>>>                      thread injection is not a great option, and why
>>> the tradeoffs for
>>>                      debugging are different than for
>>> checkpoint/restore, where with CRIU
>>>                      we landed on doing most of this in userspace,
>>> and often requiring
>>>                      injection threads to make it all work.
>>>
>>>                      Cheers, Sima
>>>
>>>
>>>                          Regards, Joonas
>>>
>>>
>>>                              Regards,
>>>                              Christian.
>>>
>>>
>>>
>>>                                    Matt
>>>
>>>                                   
>>> [3] https://patchwork.freedesktop.org/patch/622520/?series=140200&rev=6
>>>
>>>
>>>                                        Regards,
>>>                                        Christian.
>>>
>>>
>>>                                            Matt
>>>
>>>
>>>                                                Regards,
>>>                                                Christian.
>>>
>>>
>>>


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 2/8] drm/ttm: Add ttm_bo_access
  2024-11-12 16:25                                           ` Christian König
@ 2024-11-12 16:33                                             ` Thomas Hellström
  2024-11-13  8:37                                               ` Christian König
  0 siblings, 1 reply; 56+ messages in thread
From: Thomas Hellström @ 2024-11-12 16:33 UTC (permalink / raw)
  To: Christian König, Joonas Lahtinen, Christian König,
	Matthew Brost
  Cc: Simona Vetter, Rodrigo Vivi, Huang Rui, intel-xe, dri-devel,
	matthew.auld, David Airlie, Simona Vetter

On Tue, 2024-11-12 at 17:25 +0100, Christian König wrote:
> Am 12.11.24 um 17:22 schrieb Thomas Hellström:
> > On Tue, 2024-11-12 at 15:41 +0200, Joonas Lahtinen wrote:
> > > (+ Thomas)
> > > 
> > > Quoting Christian König (2024-11-12 11:23:36)
> > > > Am 11.11.24 um 23:45 schrieb Matthew Brost:
> > > > 
> > > >      [SNIP]
> > > > 
> > > >              So I think only way to allow interactive debugging
> > > > is
> > > > to avoid the
> > > >              dma_fences. Curious to hear if there are ideas for
> > > > otherwise.
> > > > 
> > > >          You need to guarantee somehow that the process is
> > > > taken
> > > > from the hardware so
> > > >          that the preemption fence can signal.
> > > > 
> > > > 
> > > >      Our preemption fences have this functionality.
> > > > 
> > > >      A preemption fence issues a suspend execution command to
> > > > the
> > > > firmware. The
> > > >      firmware, in turn, attempts to preempt the workload. If it
> > > > doesn't respond
> > > >      within a specified period, it resets the hardware queue,
> > > > sends
> > > > a message to KMD,
> > > >      bans the software queue, and signals the preemption fence.
> > > > 
> > > >      We provide even more protection than that. If, for some
> > > > reason,
> > > > the firmware
> > > >      doesn't respond within a longer timeout period, the KMD
> > > > performs a device reset,
> > > > >      bans the offending software queue(s), and signals the
> > > > preemption fences.
> > > > 
> > > >      This flow remains the same whether a debugger is attached
> > > > or,
> > > > for example, a
> > > >      user submits a 10-minute non-preemptable workload. In
> > > > either
> > > > case, other
> > > >      processes are guaranteed to make forward progress.
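
As a rough model of the escalation described above (not the actual Xe or firmware code; every name here is hypothetical), the three paths can be written as a small decision function. The point the sketch makes is the one claimed in the text: whichever path is taken, the preemption fence ends up signaled.

```c
#include <stdbool.h>

/* Hypothetical outcome labels for this toy model. */
enum preempt_outcome {
	PREEMPT_CLEAN,       /* workload was preempted within the firmware period */
	PREEMPT_QUEUE_RESET, /* firmware reset the hardware queue */
	PREEMPT_DEVICE_RESET /* KMD fell back to a full device reset */
};

struct preempt_result {
	enum preempt_outcome outcome;
	bool queue_banned;   /* software queue banned from further submission */
	bool fence_signaled; /* preemption fence signaled */
};

/*
 * firmware_responds: firmware answers the suspend request at all.
 * workload_yields:   workload can be preempted within the firmware period.
 */
static struct preempt_result
model_preempt_fence(bool firmware_responds, bool workload_yields)
{
	struct preempt_result r = { PREEMPT_CLEAN, false, false };

	if (!firmware_responds) {
		/* Longer KMD timeout expired: device reset, ban the queue. */
		r.outcome = PREEMPT_DEVICE_RESET;
		r.queue_banned = true;
	} else if (!workload_yields) {
		/* Firmware timeout expired: queue reset, KMD notified, ban. */
		r.outcome = PREEMPT_QUEUE_RESET;
		r.queue_banned = true;
	}

	/* Every path ends with the preemption fence signaled. */
	r.fence_signaled = true;
	return r;
}
```

The model deliberately ignores timing; it only records which of the two timeouts, if any, had to fire.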
> > > > 
> > > > 
> > > > Yeah that is pretty much the same argumentation I have heard
> > > > before
> > > > and it
> > > > turned out to not be working.
> > > > 
> > > > 
> > > >      The example above illustrates the memory oversubscription
> > > > case,
> > > > where two
> > > >      processes are using 51% of the memory.
> > > > 
> > > > 
> > > > That isn't even necessary. We have seen applications dying just
> > > > because the
> > > > core memory management tried to join back small pages into huge
> > > > > pages in a
> > > > userptr.
> > > > 
> > > > That the core memory management jumps in and requests that the
> > > > pre-
> > > > emption
> > > > fence signals can happen all the time.
> > > Ouch. Does there happen to be a known reproducer for this
> > > behavior or
> > > maybe
> > > bug report?
> > > 
> > > > You can mitigate that a bit, Fedora for example disables
> > > > joining
> > > > back small
> > > > pages into huge pages by default for example and we even had
> > > > people
> > > > suggesting
> > > > to use mprotect() so that userptrs VMAs don't fork() any more
> > > > (which is of
> > > > course completely illegal).
> > > > 
> > > > But my long term take away is that you can't block all causes
> > > > of
> > > > sudden
> > > > requests to let a pre-emption fence signal.
> > > I think this problem equally applies to the LR-workloads like the
> > > EU
> > > debugging ones.
> > > 
> > > >      Another preemption scenario involves two processes sharing
> > > > hardware resources.
> > > >      Our firmware follows the same flow here. If an LR workload
> > > > is
> > > > using a hardware
> > > >      resource and a DMA-fence workload is waiting, and if the
> > > > LR
> > > > workload doesn't
> > > > >      preempt in a timely manner, the firmware issues a
> > > > hardware
> > > > reset, notifies
> > > >      KMD, and bans the LR software queue. The DMA-fence
> > > > workload
> > > > then can make
> > > >      forward progress
> > > > 
> > > >      With the above in mind, this is why I say that if a user
> > > > tries
> > > > to run a game and
> > > >      a non-preemptable LR workload, either oversubscribing
> > > > memory or
> > > > sharing hardware
> > > >      resources, it is unlikely to work well. However, I don't
> > > > think
> > > > this is a common
> > > >      use case. I would expect that when a debugger is open, it
> > > > is
> > > > typically by a
> > > >      power user who knows how to disable other GPU tasks (e.g.,
> > > > by
> > > > enabling software
> > > >      rendering or using a machine without any display).
> > > > 
> > > > >      Given this, please reconsider your position.
> > > > 
> > > > 
> > > > The key point here is that this isn't stable, you can do that
> > > > as a
> > > > tech demo
> > > > but it can always be that debugging an application just
> > > > randomly
> > > > dies. And
> > > > > believe me AMD has tried this to a rather extreme extent as
> > > > well.
> > > It's not really only limited to the debuggable applications at
> > > all,
> > > the
> > > normal LR workloads are equally impacted as far as I understand.
> > > Just
> > > harder to catch the issue with LR-workloads if the pre-emption
> > > fence
> > > signaling is sporadic.
> > > 
> > > > > What could potentially work is to taint the kernel and make
> > > > sure that this
> > > > function is only available to user who absolutely know what
> > > > they
> > > > are doing.
> > > > 
> > > > But I would say we can only allow that if all other options
> > > > have
> > > > been exercised
> > > > and doing it like this is really the only option left.
> > > It sounds like servicing the memory pre-empt fence by stealing
> > > the
> > > pages from underneath the workload would be the way to resolve
> > > this
> > > issue.
> > > 
> > > This has been extensively discussed already, but was expected to
> > > really
> > > only be needed for low-on-memory scenarios. However it now seems
> > > like
> > > the need is much earlier due to the random userptr page joining
> > > by
> > > core
> > > mm.
> > Just to clarify here:
> >   
> > In Long-Running mode with recoverable pagefaults enabled we don't
> > have
> > any preempt-fences, but rather just zap the PTEs pointing to the
> > affected memory and flush TLB. So from a memory resource POV a
> > breakpoint should be safe, and no mmu notifier nor shrinker will be
> > blocked.
> 
> That sounds like a HMM based approach which would clearly work.
> 
> But where is that? I don't see any HMM based approach anywhere in the
> XE 
> driver.

This is a mode that uses recoverable pagefaults to fault either full
userptrs or full BOs, and is used with DRM_XE_VM_CREATE_FLAG_FAULT_MODE
(not SVM)!

userptrs in xe are bo-less, and using the vm's resv, but otherwise
using hmm similar to amdgpu: xe_hmm.c

fault servicing:
xe_gt_pagefault.c

PTE zapping on eviction and notifier:
xe_vm_invalidate_vma(), xe_vm.c
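
A toy model of that zap-and-refault flow, as a sketch only: it assumes nothing about the real xe_vm_invalidate_vma() or xe_gt_pagefault.c code, and every name below is hypothetical. Invalidation clears the PTEs and flushes the TLB without waiting on any fence; the next access to a zapped page takes a recoverable fault that repopulates the PTE.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_NUM_PAGES 8

/* Hypothetical stand-in for a GPU VM with one level of PTEs. */
struct toy_vm {
	bool pte_valid[TOY_NUM_PAGES];
	unsigned int tlb_flushes;     /* number of TLB flushes issued */
	unsigned int faults_serviced; /* number of recoverable faults handled */
};

/* Eviction / notifier path: zap the PTEs of a range, then flush the TLB. */
static void toy_vm_invalidate(struct toy_vm *vm, size_t first, size_t count)
{
	for (size_t i = first; i < first + count && i < TOY_NUM_PAGES; i++)
		vm->pte_valid[i] = false;
	vm->tlb_flushes++;
}

/*
 * GPU access path: a hit proceeds directly, a miss is serviced by
 * repopulating the PTE. Returns false only for an out-of-range page.
 */
static bool toy_vm_access(struct toy_vm *vm, size_t page)
{
	if (page >= TOY_NUM_PAGES)
		return false;

	if (!vm->pte_valid[page]) {
		vm->pte_valid[page] = true; /* fault handler rebinds the page */
		vm->faults_serviced++;
	}
	return true;
}
```

Nothing in toy_vm_invalidate() depends on the state of the running workload, which is the property that keeps a thread stopped at a breakpoint from blocking an mmu notifier or the shrinker.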

Thanks,
Thomas

> 
> Regards,
> Christian.
> 
> > 
> > Nor will there be any jobs with published dma-fences depending on
> > the
> > job blocked either temporarily by a pagefault or long-term by a
> > debugger breakpoint.
> > 
> > /Thomas
> > 
> > 
> > > If that is done and the memory pre-empt fence is serviced even
> > > for
> > > debuggable contexts, do you have further concerns with the
> > > presented
> > > approach
> > > from dma-buf and drm/sched perspective?
> > > 
> > > Regards, Joonas
> > > 
> > > > Regards,
> > > > Christian.
> > > > 
> > > > 
> > > >          This means that a breakpoint or core dump doesn't halt
> > > > GPU
> > > > threads, but
> > > >          rather suspends them. E.g. all running wave data is
> > > > collected into a state
> > > >          bag which can be restored later on.
> > > > 
> > > >          I was under the impression that those long running
> > > > compute
> > > > threads do
> > > >          exactly that, but when the hardware can't switch out
> > > > the
> > > > GPU thread/process
> > > >          while in a break then that isn't the case.
> > > > 
> > > >          As long as you don't find a way to avoid that this
> > > > patch
> > > > set is a pretty
> > > >          clear NAK from my side as DMA-buf and TTM maintainer.
> > > > 
> > > > 
> > > >      I believe this is addressed above.
> > > > 
> > > >      Matt
> > > > 
> > > > 
> > > >          What might work is to keep the submission on the
> > > > hardware
> > > > in the break state
> > > >          but forbid any memory access. This way you can signal
> > > > your
> > > > preemption fence
> > > >          even when the hardware isn't made available.
> > > > 
> > > > >          Before you continue XE sets up a new pre-emption fence
> > > > and
> > > > makes sure that
> > > >          all page tables etc... are up to date.
> > > > 
> > > >          Could be tricky to get this right if completion fence
> > > > based
> > > > submissions are
> > > >          mixed in as well, but that gives you at least a
> > > > direction
> > > > you could
> > > >          potentially go.
> > > > 
> > > >          Regards,
> > > >          Christian.
> > > > 
> > > > 
> > > >              Regards, Joonas
> > > > 
> > > > 
> > > >                  Regards,
> > > >                  Christian.
> > > > 
> > > > 
> > > >                      Some wash-up thoughts from me below, but
> > > > consider them fairly irrelevant
> > > >                      since I think the main driver for these
> > > > big
> > > > questions here should be
> > > >                      gdb/userspace.
> > > > 
> > > > 
> > > >                          Quoting Christian König (2024-11-07
> > > > 11:44:33)
> > > > 
> > > >                              Am 06.11.24 um 18:00 schrieb
> > > > Matthew
> > > > Brost:
> > > > 
> > > >                                    [SNIP]
> > > > 
> > > >                                    This is not a generic
> > > > interface
> > > > that anyone can freely access. The same
> > > >                                    permissions used by ptrace
> > > > are
> > > > checked when opening such an interface.
> > > >                                    See [1] [2].
> > > > 
> > > >                                   
> > > > [1]
> > > > https://patchwork.freedesktop.org/patch/617470/?series=136572&r
> > > > e
> > > > v=2
> > > >                                   
> > > > [2]
> > > > https://patchwork.freedesktop.org/patch/617471/?series=136572&r
> > > > e
> > > > v=2
> > > > 
> > > > 
> > > >                              Thanks a lot for those pointers,
> > > > that
> > > > is exactly what I was looking for.
> > > > 
> > > >                              And yeah, it is what I feared. You
> > > > are
> > > > re-implementing existing functionality,
> > > >                              but see below.
> > > > 
> > > >                          Could you elaborate on what this
> > > > "existing
> > > > functionality" exactly is?
> > > >                          I do not think this functionality
> > > > exists at
> > > > this time.
> > > > 
> > > >                          The EU debugging architecture for Xe
> > > > specifically avoids the need for GDB
> > > >                          to attach with ptrace to the CPU
> > > > process or
> > > > interfere with the CPU process for
> > > >                          the debugging via parasitic threads or
> > > > so.
> > > > 
> > > >                          Debugger connection is opened to the
> > > > DRM
> > > > driver for given PID (which uses the
> > > >                          ptrace may access check for now) after
> > > > > which all DRM clients of that
> > > >                          PID are exposed to the debugger
> > > > process.
> > > > 
> > > >                          What we want to expose via that
> > > > debugger
> > > > connection is the ability for GDB to
> > > >                          read/write the different GPU VM
> > > > address
> > > > spaces (ppGTT for Intel GPUs) just like
> > > >                          the EU threads would see them. Note
> > > > that
> > > > the layout of the ppGTT is
> > > >                          completely up to the userspace driver
> > > > to
> > > > setup and is mostly only partially
> > > >                          equal to the CPU address space.
> > > > 
> > > >                          Specifically as part of
> > > > reading/writing the
> > > > ppGTT for debugging purposes,
> > > >                          there are deep flushes needed: for
> > > > example
> > > > flushing instruction cache
> > > >                          when adding/removing breakpoints.
> > > > 
> > > >                          Maybe that will explain the
> > > > background. I
> > > > elaborate on this at the end some more.
> > > > 
> > > > 
> > > >                                            kmap/vmap are used
> > > > everywhere in the DRM subsystem to access BOs, so I’m
> > > >                                            failing to see the
> > > > problem with adding a simple helper based on existing
> > > >                                            code.
> > > > 
> > > >                                        What#s possible and
> > > > often
> > > > done is to do kmap/vmap if you need to implement a
> > > >                                        CPU copy for scanout for
> > > > example or for copying/validating command buffers.
> > > >                                        But that usually
> > > > requires
> > > > accessing the whole BO and has separate security
> > > >                                        checks.
> > > > 
> > > >                                        When you want to access
> > > > only
> > > > a few bytes of a BO that sounds massively like
> > > >                                        a peek/poke like
> > > > interface
> > > > and we have already rejected that more than once.
> > > >                                        There even used to be
> > > > standardized GEM IOCTLs for that which have been
> > > >                                        removed by now.
> > > > 
> > > >                          Referring to the explanation at top:
> > > > These
> > > > IOCTL are not for the debugging target
> > > >                          process to issue. The peek/poke
> > > > interface
> > > > is specifically for GDB only
> > > >                          to facilitate the emulation of memory
> > > > reads/writes on the GPU address
> > > >                          space as they were done by EUs
> > > > themselves.
> > > > And to recap: for modifying
> > > >                          instructions for example (add/remove
> > > > breakpoint), extra level of cache flushing is
> > > >                          needed which is not available to
> > > > regular
> > > > userspace.
> > > > 
> > > >                          I specifically discussed with Sima on
> > > > the
> > > > difference before moving forward with this
> > > >                          design originally. If something has
> > > > changed
> > > > since then, I'm of course happy to rediscuss.
> > > > 
> > > >                          However, if this code can't be added,
> > > > not
> > > > sure how we would ever be able
> > > >                          to implement core dumps for GPU
> > > > threads/memory?
> > > > 
> > > > 
> > > >                                        If you need to access
> > > > BOs
> > > > which are placed in not CPU accessible memory then
> > > >                                        implement the access
> > > > callback
> > > > for ptrace, see amdgpu_ttm_access_memory for
> > > >                                        an example how to do
> > > > this.
> > > > 
> > > >                          As also mentioned above, we don't work
> > > > via
> > > > ptrace at all when it comes
> > > >                          to debugging the EUs. The only thing
> > > > used
> > > > for now is the ptrace_may_access to
> > > >                          implement similar access restrictions
> > > > as
> > > > ptrace has. This can be changed
> > > >                          to something else if needed.
> > > > 
> > > > 
> > > >                                    Ptrace access via
> > > > vm_operations_struct.access → ttm_bo_vm_access.
> > > > 
> > > >                                    This series renames
> > > > ttm_bo_vm_access to ttm_bo_access, with no code changes.
> > > > 
> > > >                                    The above function accesses
> > > > a BO
> > > > via kmap if it is in SYSTEM / TT,
> > > >                                    which is existing code.
> > > > 
> > > >                                    This function is only
> > > > exposed to
> > > > user space via ptrace permissions.
> > > > 
> > > >                          Maybe this sentence is what caused the
> > > > confusion.
> > > > 
> > > >                          Userspace is never exposed with
> > > > peek/poke
> > > > interface, only the debugger
> > > >                          connection which is its own FD.
> > > > 
> > > > 
> > > >                                    In this series, we implement
> > > > a
> > > > function [3] similar to
> > > >                                    amdgpu_ttm_access_memory for
> > > > the
> > > > TTM vfunc access_memory. What is
> > > >                                    missing is non-visible CPU
> > > > memory
> > > > access, similar to
> > > >                                   
> > > > amdgpu_ttm_access_memory_sdma.
> > > > This will be addressed in a follow-up and
> > > >                                    was omitted in this series
> > > > given
> > > > its complexity.
> > > > 
> > > >                                    So, this looks more or less
> > > > identical to AMD's ptrace implementation,
> > > >                                    but in GPU address space.
> > > > Again,
> > > > I fail to see what the problem is here.
> > > >                                    What am I missing?
> > > > 
> > > > 
> > > >                              The main question is why can't you
> > > > use
> > > > the existing interfaces directly?
> > > > 
> > > >                          We're not working on the CPU address
> > > > space
> > > > or BOs. We're working
> > > >                          strictly on the GPU address space as
> > > > would
> > > > be seen by an EU thread if it
> > > >                          accessed address X.
> > > > 
> > > > 
> > > >                              Additional to the peek/poke
> > > > interface
> > > > of ptrace Linux has the pidfd_getfd
> > > >                              system call, see
> > > > here: https://man7.org/linux/man-pages/man2/pidfd_getfd.2.html.
> > > > 
> > > >                              The pidfd_getfd() allows to dup()
> > > > the
> > > > render node file descriptor into your gdb
> > > >                              process. That in turn gives you
> > > > all the
> > > > access you need from gdb, including
> > > >                              mapping BOs and command submission
> > > > on
> > > > behalf of the application.
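
The pidfd_getfd() route described in the quoted paragraph can be sketched roughly as below. This is only a minimal illustration, assuming Linux 5.6 or newer and that the caller passes the ptrace-attach permission check on the target; the helper name dup_fd_from_pid is hypothetical, and raw syscall() is used in case the libc has no wrappers. It shows duplicating a file descriptor (for example a render node fd) out of another process, nothing more.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Fallback syscall numbers for older libc headers (same value on all
 * architectures that use the unified syscall numbering). */
#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434
#endif
#ifndef SYS_pidfd_getfd
#define SYS_pidfd_getfd 438
#endif

/*
 * Hypothetical helper: duplicate descriptor `target_fd` of process `pid`
 * into the calling process. Returns the new local fd, or -1 with errno set.
 */
static int dup_fd_from_pid(pid_t pid, int target_fd)
{
	int pidfd, fd, saved_errno;

	/* Obtain a stable handle on the target process. */
	pidfd = (int)syscall(SYS_pidfd_open, pid, 0);
	if (pidfd < 0)
		return -1;

	/* Ask the kernel to dup the target's fd into our fd table. */
	fd = (int)syscall(SYS_pidfd_getfd, pidfd, target_fd, 0);

	/* Preserve the errno from pidfd_getfd across close(). */
	saved_errno = errno;
	close(pidfd);
	errno = saved_errno;

	return fd;
}
```

A debugger would call something like dup_fd_from_pid(target_pid, render_fd) and then use the returned descriptor as if it had opened the node itself. As the replies in this thread point out, that operates on the DRM file descriptor and its BOs, not on the GPU address space as seen by an EU thread.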
> > > > 
> > > >                          We're not operating on the CPU address
> > > > space nor are we operating on BOs
> > > >                          (there is no concept of BO in the EU
> > > > debug
> > > > interface). Each VMA in the VM
> > > >                          could come from anywhere, only the
> > > > start
> > > > address and size matter. And
> > > >                          neither do we need to interfere with
> > > > the
> > > > command submission of the
> > > >                          process under debug.
> > > > 
> > > > 
> > > >                              As far as I can see that allows
> > > > for the
> > > > same functionality as the eudebug
> > > >                              interface, just without any driver
> > > > specific code messing with ptrace
> > > >                              permissions and peek/poke
> > > > interfaces.
> > > > 
> > > >                              So the question is still why do
> > > > you
> > > > need the whole eudebug interface in the
> > > >                              first place? I might be missing
> > > > something, but that seems to be superfluous
> > > >                              from a high level view.
> > > > 
> > > >                          Recapping from above. It is to allow
> > > > the
> > > > debugging of EU threads per DRM
> > > >                          client, completely independent of the
> > > > CPU
> > > > process. If ptrace_may_access
> > > >                          is the sore point, we could consider
> > > > other
> > > > permission checks, too. There
> > > >                          is no other connection to ptrace in
> > > > this
> > > > architecture as single
> > > >                          permission check to know if PID is
> > > > fair
> > > > game to access by debugger
> > > >                          process.
> > > > 
> > > >                          Why no parasitic thread or ptrace:
> > > > Going
> > > > forward, binding the EU debugging to
> > > >                          the DRM client would also pave way for
> > > > being able to extend core kernel generated
> > > >                          core dump with each DRM client's EU
> > > > thread/memory dump. We have similar
> > > >                          feature called "Offline core dump"
> > > > enabled
> > > > in the downstream public
> > > >                          trees for i915, where we currently
> > > > attach
> > > > the EU thread dump to i915 error state
> > > >                          and then later combine i915 error
> > > > state
> > > > with CPU core dump file with a
> > > >                          tool.
> > > > 
> > > >                          This is relatively little amount of
> > > > extra
> > > > code, as this baseline series
> > > >                          already introduces GDB the ability to
> > > > perform the necessary actions.
> > > >                          It's just the matter of kernel driver
> > > > calling: "stop all threads", then
> > > >                          copying the memory map and memory
> > > > contents
> > > > for GPU threads, just like is
> > > >                          done for CPU threads.
> > > > 
> > > >                          With parasitic thread injection, not
> > > > sure
> > > > if there is such way forward,
> > > >                          as it would seem to require to inject
> > > > quite
> > > > a bit more logic to core kernel?
> > > > 
> > > > 
> > > >                              It's true that the AMD KFD part
> > > > has
> > > > still similar functionality, but that is
> > > >                              because of the broken KFD design
> > > > of
> > > > tying driver state to the CPU process
> > > >                              (which makes it inaccessible for
> > > > gdb
> > > > even with imported render node fd).
> > > > 
> > > >                              Both Sima and I (and partially
> > > > Dave as
> > > > well) have pushed back on the KFD
> > > >                              approach. And the long term plan
> > > > is to
> > > > get rid of such device driver specific
> > > >                              interface which re-implement
> > > > existing
> > > > functionality just differently.
> > > > 
> > > >                          Recapping, this series is not adding
> > > > it
> > > > back. The debugger connection
> > > >                          is a separate FD from the DRM one,
> > > > with
> > > > separate IOCTL set. We don't allow
> > > >                          the DRM FD any new operations based on
> > > > ptrace is attached or not. We
> > > >                          don't ever do that check even.
> > > > 
> > > >                          We only restrict the opening of the
> > > > debugger connection to given PID with
> > > >                          ptrace_may_access check for now. That
> > > > can
> > > > be changed to something else,
> > > >                          if necessary.
> > > > 
> > > >                      Yeah I think unnecessarily tying gpu
> > > > processes
> > > > to cpu processes is a bad
> > > >                      thing, least because even today all the
> > > > svm
> > > > discussions we have still hit
> > > >                      clear use-cases, where a 1:1 match is not
> > > > wanted (like multiple gpu svm
> > > >                      sections with offsets). Not even speaking
> > > > of
> > > > all the gpu usecases where
> > > >                      the gpu vm space is still entirely
> > > > independent
> > > > of the cpu side.
> > > > 
> > > >                      So that's why I think this entirely
> > > > separate
> > > > approach looks like the right
> > > >                      one, with ptrace_may_access as the access
> > > > control check to make sure we
> > > >                      match ptrace on the cpu side.
> > > > 
> > > >                      But there's very obviously a bikeshed to
> > > > be had
> > > > on what the actual uapi
> > > >                      should look like, especially how gdb opens
> > > > up a
> > > > gpu debug access fd. But I
> > > >                      also think that's not much on drm to
> > > > decide,
> > > > but whatever gdb wants. And
> > > >                      then we aim for some consistency on that
> > > > lookup/access control part
> > > >                      (ideally, I might be missing some reasons
> > > > why
> > > > this is a bad idea) across
> > > >                      drm drivers.
> > > > 
> > > > 
> > > >                              So you need to have a really
> > > > really
> > > > good explanation why the eudebug interface
> > > >                              is actually necessary.
> > > > 
> > > >                          TL;DR The main point is to decouple
> > > > the
> > > > debugging of the EU workloads from the
> > > >                          debugging of the CPU process. This
> > > > avoids
> > > > the interference with the CPU process with
> > > >                          parasitic thread injection. Further
> > > > this
> > > > also allows generating a core dump
> > > >                          without any GDB connected. There are
> > > > also
> > > > many other smaller pros/cons
> > > >                          which can be discussed but for the
> > > > context
> > > > of this patch, this is the
> > > >                          main one.
> > > > 
> > > >                          So unlike parasitic thread injection,
> > > > we
> > > > don't unlock any special IOCTL for
> > > >                          the process under debug to be
> > > > performed by
> > > > the parasitic thread, but we
> > > >                          allow the minimal set of operations to
> > > > be
> > > > performed by GDB as if those were
> > > >                          done on the EUs themselves.
> > > > 
> > > >                          One can think of it like the minimal
> > > > subset
> > > > of ptrace but for EU threads,
> > > >                          not the CPU threads. And thus,
> > > > building on
> > > > this it's possible to extend
> > > >                          the core kernel generated core dumps
> > > > with
> > > > DRM specific extension which
> > > >                          would contain the EU thread/memory
> > > > dump.
> > > > 
> > > >                      It might be good to document (in that
> > > > debugging
> > > > doc patch probably) why
> > > >                      thread injection is not a great option,
> > > > and why
> > > > the tradeoffs for
> > > >                      debugging are different than for
> > > > checkpoint/restore, where with CRIU
> > > >                      we landed on doing most of this in
> > > > userspace,
> > > > and often requiring
> > > >                      injection threads to make it all work.
> > > > 
> > > >                      Cheers, Sima
> > > > 
> > > > 
> > > >                          Regards, Joonas
> > > > 
> > > > 
> > > >                              Regards,
> > > >                              Christian.
> > > > 
> > > > 
> > > > 
> > > >                                    Matt
> > > > 
> > > >                                   
> > > > [3] https://patchwork.freedesktop.org/patch/622520/?series=140200&rev=6
> > > > 
> > > > 
> > > >                                        Regards,
> > > >                                        Christian.
> > > > 
> > > > 
> > > >                                            Matt
> > > > 
> > > > 
> > > >                                                Regards,
> > > >                                                Christian.
> > > > 
> > > > 
> > > > 
> 



* Re: [PATCH v6 2/8] drm/ttm: Add ttm_bo_access
  2024-11-12 16:33                                             ` Thomas Hellström
@ 2024-11-13  8:37                                               ` Christian König
  2024-11-13 10:44                                                 ` Thomas Hellström
  0 siblings, 1 reply; 56+ messages in thread
From: Christian König @ 2024-11-13  8:37 UTC (permalink / raw)
  To: Thomas Hellström, Joonas Lahtinen, Christian König,
	Matthew Brost
  Cc: Simona Vetter, Rodrigo Vivi, Huang Rui, intel-xe, dri-devel,
	matthew.auld, David Airlie, Simona Vetter

On 12.11.24 at 17:33, Thomas Hellström wrote:
> [SNIP]
>>>> This has been extensively discussed already, but was expected to
>>>> really
>>>> only be needed for low-on-memory scenarios. However it now seems
>>>> like
>>>> the need is much earlier due to the random userptr page joining
>>>> by
>>>> core
>>>> mm.
>>> Just to clarify here:
>>>    
>>> In Long-Running mode with recoverable pagefaults enabled we don't
>>> have
>>> any preempt-fences, but rather just zap the PTEs pointing to the
>>> affected memory and flush TLB. So from a memory resource POV a
>>> breakpoint should be safe, and no mmu notifier nor shrinker will be
>>> blocked.
>> That sounds like a HMM based approach which would clearly work.
>>
>> But where is that? I don't see any HMM based approach anywhere in the
>> XE
>> driver.
> This is a mode that uses recoverable pagefaults to fault either full
> userptr or full bos, and used with DRM_XE_VM_CREATE_FLAG_FAULT_MODE.
> (not SVM)!
>
> userptrs in xe are bo-less, and using the vm's resv, but otherwise
> using hmm similar to amdgpu: xe_hmm.c

Yeah, I have seen that one.

> fault servicing:
> xe_gt_pagefault.c
>
> PTE zapping on eviction and notifier:
> xe_vm_invalidate_vma(), xe_vm.c

Ah, that was the stuff I was missing.

So the implementation in xe_preempt_fence.c is just for graphics 
submissions? That would make the whole thing much easier to handle.

The only remaining question I can then see is if long-running
submissions with DRM_XE_VM_CREATE_FLAG_FAULT_MODE could potentially block
graphics submissions without this flag from accessing the hardware?

Thanks a lot for pointing this out,
Christian.

>
> Thanks,
> Thomas
>
>> Regards,
>> Christian.
>>
>>> Nor will there be any jobs with published dma-fences depending on
>>> the
>>> job blocked either temporarily by a pagefault or long-term by a
>>> debugger breakpoint.
>>>
>>> /Thomas
>>>
>>>
>>>> If that is done and the memory pre-empt fence is serviced even
>>>> for
>>>> debuggable contexts, do you have further concerns with the
>>>> presented
>>>> approach
>>>> from dma-buf and drm/sched perspective?
>>>>
>>>> Regards, Joonas
>>>>
>>>>> Regards,
>>>>> Christian.
>>>>>
>>>>>
>>>>>           This means that a breakpoint or core dump doesn't halt
>>>>> GPU
>>>>> threads, but
>>>>>           rather suspends them. E.g. all running wave data is
>>>>> collected into a state
>>>>>           bag which can be restored later on.
>>>>>
>>>>>           I was under the impression that those long running
>>>>> compute
>>>>> threads do
>>>>>           exactly that, but when the hardware can't switch out
>>>>> the
>>>>> GPU thread/process
>>>>>           while in a break then that isn't the case.
>>>>>
>>>>>           As long as you don't find a way to avoid that this
>>>>> patch
>>>>> set is a pretty
>>>>>           clear NAK from my side as DMA-buf and TTM maintainer.
>>>>>
>>>>>
>>>>>       I believe this is addressed above.
>>>>>
>>>>>       Matt
>>>>>
>>>>>
>>>>>           What might work is to keep the submission on the
>>>>> hardware
>>>>> in the break state
>>>>>           but forbid any memory access. This way you can signal
>>>>> your
>>>>> preemption fence
>>>>>           even when the hardware isn't made available.
>>>>>
>>>>>           Before you continue XE sets up a new pre-emption fence
>>>>> and
>>>>> makes sure that
>>>>>           all page tables etc... are up to date.
>>>>>
>>>>>           Could be tricky to get this right if completion fence
>>>>> based
>>>>> submissions are
>>>>>           mixed in as well, but that gives you at least a
>>>>> direction
>>>>> you could
>>>>>           potentially go.
>>>>>
>>>>>           Regards,
>>>>>           Christian.
>>>>>
>>>>>
>>>>>               Regards, Joonas
>>>>>
>>>>>
>>>>>                   Regards,
>>>>>                   Christian.
>>>>>
>>>>>
>>>>>                       Some wash-up thoughts from me below, but
>>>>> consider them fairly irrelevant
>>>>>                       since I think the main driver for these
>>>>> big
>>>>> questions here should be
>>>>>                       gdb/userspace.
>>>>>
>>>>>
>>>>>                           Quoting Christian König (2024-11-07
>>>>> 11:44:33)
>>>>>
>>>>>                               On 06.11.24 at 18:00, Matthew Brost wrote:
>>>>>
>>>>>                                     [SNIP]
>>>>>
>>>>>                                     This is not a generic
>>>>> interface
>>>>> that anyone can freely access. The same
>>>>>                                     permissions used by ptrace
>>>>> are
>>>>> checked when opening such an interface.
>>>>>                                     See [1] [2].
>>>>>
>>>>>                                    
>>>>> [1] https://patchwork.freedesktop.org/patch/617470/?series=136572&rev=2
>>>>> [2] https://patchwork.freedesktop.org/patch/617471/?series=136572&rev=2
>>>>>
>>>>>
>>>>>                               Thanks a lot for those pointers,
>>>>> that
>>>>> is exactly what I was looking for.
>>>>>
>>>>>                               And yeah, it is what I feared. You
>>>>> are
>>>>> re-implementing existing functionality,
>>>>>                               but see below.
>>>>>
>>>>>                           Could you elaborate on what this
>>>>> "existing
>>>>> functionality" exactly is?
>>>>>                           I do not think this functionality
>>>>> exists at
>>>>> this time.
>>>>>
>>>>>                           The EU debugging architecture for Xe
>>>>> specifically avoids the need for GDB
>>>>>                           to attach with ptrace to the CPU
>>>>> process or
>>>>> interfere with the CPU process for
>>>>>                           the debugging via parasitic threads or
>>>>> so.
>>>>>
>>>>>                           Debugger connection is opened to the
>>>>> DRM
>>>>> driver for given PID (which uses the
>>>>>                           ptrace may access check for now) after
>>>>> which all DRM clients of that
>>>>>                           PID are exposed to the debugger
>>>>> process.
>>>>>
>>>>>                           What we want to expose via that
>>>>> debugger
>>>>> connection is the ability for GDB to
>>>>>                           read/write the different GPU VM
>>>>> address
>>>>> spaces (ppGTT for Intel GPUs) just like
>>>>>                           the EU threads would see them. Note
>>>>> that
>>>>> the layout of the ppGTT is
>>>>>                           completely up to the userspace driver
>>>>> to
>>>>> setup and is mostly only partially
>>>>>                           equal to the CPU address space.
>>>>>
>>>>>                           Specifically as part of
>>>>> reading/writing the
>>>>> ppGTT for debugging purposes,
>>>>>                           there are deep flushes needed: for
>>>>> example
>>>>> flushing instruction cache
>>>>>                           when adding/removing breakpoints.
>>>>>
>>>>>                           Maybe that will explain the
>>>>> background. I
>>>>> elaborate on this at the end some more.
>>>>>
>>>>>
>>>>>                                             kmap/vmap are used
>>>>> everywhere in the DRM subsystem to access BOs, so I’m
>>>>>                                             failing to see the
>>>>> problem with adding a simple helper based on existing
>>>>>                                             code.
>>>>>
>>>>>                                         What's possible and
>>>>> often
>>>>> done is to do kmap/vmap if you need to implement a
>>>>>                                         CPU copy for scanout for
>>>>> example or for copying/validating command buffers.
>>>>>                                         But that usually
>>>>> requires
>>>>> accessing the whole BO and has separate security
>>>>>                                         checks.
>>>>>
>>>>>                                         When you want to access
>>>>> only
>>>>> a few bytes of a BO that sounds massively like
>>>>>                                         a peek/poke like
>>>>> interface
>>>>> and we have already rejected that more than once.
>>>>>                                         There even used to be
>>>>> standardized GEM IOCTLs for that which have been
>>>>>                                         removed by now.
>>>>>
>>>>>                           Referring to the explanation at top:
>>>>> These
>>>>> IOCTL are not for the debugging target
>>>>>                           process to issue. The peek/poke
>>>>> interface
>>>>> is specifically for GDB only
>>>>>                           to facilitate the emulation of memory
>>>>> reads/writes on the GPU address
>>>>>                           space as they were done by EUs
>>>>> themselves.
>>>>> And to recap: for modifying
>>>>>                           instructions for example (add/remove
>>>>> breakpoint), extra level of cache flushing is
>>>>>                           needed which is not available to
>>>>> regular
>>>>> userspace.
>>>>>
>>>>>                           I specifically discussed with Sima on
>>>>> the
>>>>> difference before moving forward with this
>>>>>                           design originally. If something has
>>>>> changed
>>>>> since then, I'm of course happy to rediscuss.
>>>>>
>>>>>                           However, if this code can't be added,
>>>>> not
>>>>> sure how we would ever be able
>>>>>                           to implement core dumps for GPU
>>>>> threads/memory?
>>>>>
>>>>>
>>>>>                                         If you need to access
>>>>> BOs
>>>>> which are placed in not CPU accessible memory then
>>>>>                                         implement the access
>>>>> callback
>>>>> for ptrace, see amdgpu_ttm_access_memory for
>>>>>                                         an example how to do
>>>>> this.
>>>>>
>>>>>                           As also mentioned above, we don't work
>>>>> via
>>>>> ptrace at all when it comes
>>>>>                           to debugging the EUs. The only thing
>>>>> used
>>>>> for now is the ptrace_may_access to
>>>>>                           implement similar access restrictions
>>>>> as
>>>>> ptrace has. This can be changed
>>>>>                           to something else if needed.
>>>>>
>>>>>
>>>>>                                     Ptrace access via
>>>>> vm_operations_struct.access → ttm_bo_vm_access.
>>>>>
>>>>>                                     This series renames
>>>>> ttm_bo_vm_access to ttm_bo_access, with no code changes.
>>>>>
>>>>>                                     The above function accesses
>>>>> a BO
>>>>> via kmap if it is in SYSTEM / TT,
>>>>>                                     which is existing code.
>>>>>
>>>>>                                     This function is only
>>>>> exposed to
>>>>> user space via ptrace permissions.
>>>>>
>>>>>                           Maybe this sentence is what caused the
>>>>> confusion.
>>>>>
>>>>>                           Userspace is never exposed with
>>>>> peek/poke
>>>>> interface, only the debugger
>>>>>                           connection which is its own FD.
>>>>>
>>>>>
>>>>>                                     In this series, we implement
>>>>> a
>>>>> function [3] similar to
>>>>>                                     amdgpu_ttm_access_memory for
>>>>> the
>>>>> TTM vfunc access_memory. What is
>>>>>                                     missing is non-visible CPU
>>>>> memory
>>>>> access, similar to
>>>>>                                    
>>>>> amdgpu_ttm_access_memory_sdma.
>>>>> This will be addressed in a follow-up and
>>>>>                                     was omitted in this series
>>>>> given
>>>>> its complexity.
>>>>>
>>>>>                                     So, this looks more or less
>>>>> identical to AMD's ptrace implementation,
>>>>>                                     but in GPU address space.
>>>>> Again,
>>>>> I fail to see what the problem is here.
>>>>>                                     What am I missing?
>>>>>
>>>>>
>>>>>                               The main question is why can't you
>>>>> use
>>>>> the existing interfaces directly?
>>>>>
>>>>>                           We're not working on the CPU address
>>>>> space
>>>>> or BOs. We're working
>>>>>                           strictly on the GPU address space as
>>>>> would
>>>>> be seen by an EU thread if it
>>>>>                           accessed address X.
>>>>>
>>>>>
>>>>>                               Additional to the peek/poke
>>>>> interface
>>>>> of ptrace Linux has the pidfd_getfd
>>>>>                               system call, see
>>>>> here: https://man7.org/linux/man-pages/man2/pidfd_getfd.2.html.
>>>>>
>>>>>                               The pidfd_getfd() allows to dup()
>>>>> the
>>>>> render node file descriptor into your gdb
>>>>>                               process. That in turn gives you
>>>>> all the
>>>>> access you need from gdb, including
>>>>>                               mapping BOs and command submission
>>>>> on
>>>>> behalf of the application.
>>>>>
>>>>>                           We're not operating on the CPU address
>>>>> space nor are we operating on BOs
>>>>>                           (there is no concept of BO in the EU
>>>>> debug
>>>>> interface). Each VMA in the VM
>>>>>                           could come from anywhere, only the
>>>>> start
>>>>> address and size matter. And
>>>>>                           neither do we need to interfere with
>>>>> the
>>>>> command submission of the
>>>>>                           process under debug.
>>>>>
>>>>>
>>>>>                               As far as I can see that allows
>>>>> for the
>>>>> same functionality as the eudebug
>>>>>                               interface, just without any driver
>>>>> specific code messing with ptrace
>>>>>                               permissions and peek/poke
>>>>> interfaces.
>>>>>
>>>>>                               So the question is still why do
>>>>> you
>>>>> need the whole eudebug interface in the
>>>>>                               first place? I might be missing
>>>>> something, but that seems to be superfluous
>>>>>                               from a high level view.
>>>>>
>>>>>                           Recapping from above. It is to allow
>>>>> the
>>>>> debugging of EU threads per DRM
>>>>>                           client, completely independent of the
>>>>> CPU
>>>>> process. If ptrace_may_access
>>>>>                           is the sore point, we could consider
>>>>> other
>>>>> permission checks, too. There
>>>>>                           is no other connection to ptrace in
>>>>> this
>>>>> architecture as single
>>>>>                           permission check to know if PID is
>>>>> fair
>>>>> game to access by debugger
>>>>>                           process.
>>>>>
>>>>>                           Why no parasitic thread or ptrace:
>>>>> Going
>>>>> forward, binding the EU debugging to
>>>>>                           the DRM client would also pave way for
>>>>> being able to extend core kernel generated
>>>>>                           core dump with each DRM client's EU
>>>>> thread/memory dump. We have similar
>>>>>                           feature called "Offline core dump"
>>>>> enabled
>>>>> in the downstream public
>>>>>                           trees for i915, where we currently
>>>>> attach
>>>>> the EU thread dump to i915 error state
>>>>>                           and then later combine i915 error
>>>>> state
>>>>> with CPU core dump file with a
>>>>>                           tool.
>>>>>
>>>>>                           This is relatively little amount of
>>>>> extra
>>>>> code, as this baseline series
>>>>>                           already introduces GDB the ability to
>>>>> perform the necessary actions.
>>>>>                           It's just the matter of kernel driver
>>>>> calling: "stop all threads", then
>>>>>                           copying the memory map and memory
>>>>> contents
>>>>> for GPU threads, just like is
>>>>>                           done for CPU threads.
>>>>>
>>>>>                           With parasitic thread injection, not
>>>>> sure
>>>>> if there is such way forward,
>>>>>                           as it would seem to require to inject
>>>>> quite
>>>>> a bit more logic to core kernel?
>>>>>
>>>>>
>>>>>                               It's true that the AMD KFD part
>>>>> has
>>>>> still similar functionality, but that is
>>>>>                               because of the broken KFD design
>>>>> of
>>>>> tying driver state to the CPU process
>>>>>                               (which makes it inaccessible for
>>>>> gdb
>>>>> even with imported render node fd).
>>>>>
>>>>>                               Both Sima and I (and partially
>>>>> Dave as
>>>>> well) have pushed back on the KFD
>>>>>                               approach. And the long term plan
>>>>> is to
>>>>> get rid of such device driver specific
>>>>>                               interface which re-implement
>>>>> existing
>>>>> functionality just differently.
>>>>>
>>>>>                           Recapping, this series is not adding
>>>>> it
>>>>> back. The debugger connection
>>>>>                           is a separate FD from the DRM one,
>>>>> with
>>>>> separate IOCTL set. We don't allow
>>>>>                           the DRM FD any new operations based on
>>>>> ptrace is attached or not. We
>>>>>                           don't ever do that check even.
>>>>>
>>>>>                           We only restrict the opening of the
>>>>> debugger connection to given PID with
>>>>>                           ptrace_may_access check for now. That
>>>>> can
>>>>> be changed to something else,
>>>>>                           if necessary.
>>>>>
>>>>>                       Yeah I think unnecessarily tying gpu
>>>>> processes
>>>>> to cpu processes is a bad
>>>>>                       thing, least because even today all the
>>>>> svm
>>>>> discussions we have still hit
>>>>>                       clear use-cases, where a 1:1 match is not
>>>>> wanted (like multiple gpu svm
>>>>>                       sections with offsets). Not even speaking
>>>>> of
>>>>> all the gpu usecases where
>>>>>                       the gpu vm space is still entirely
>>>>> independent
>>>>> of the cpu side.
>>>>>
>>>>>                       So that's why I think this entirely
>>>>> separate
>>>>> approach looks like the right
>>>>>                       one, with ptrace_may_access as the access
>>>>> control check to make sure we
>>>>>                       match ptrace on the cpu side.
>>>>>
>>>>>                       But there's very obviously a bikeshed to
>>>>> be had
>>>>> on what the actual uapi
>>>>>                       should look like, especially how gdb opens
>>>>> up a
>>>>> gpu debug access fd. But I
>>>>>                       also think that's not much on drm to
>>>>> decide,
>>>>> but whatever gdb wants. And
>>>>>                       then we aim for some consistency on that
>>>>> lookup/access control part
>>>>>                       (ideally, I might be missing some reasons
>>>>> why
>>>>> this is a bad idea) across
>>>>>                       drm drivers.
>>>>>
>>>>>
>>>>>                               So you need to have a really
>>>>> really
>>>>> good explanation why the eudebug interface
>>>>>                               is actually necessary.
>>>>>
>>>>>                           TL;DR The main point is to decouple
>>>>> the
>>>>> debugging of the EU workloads from the
>>>>>                           debugging of the CPU process. This
>>>>> avoids
>>>>> the interference with the CPU process with
>>>>>                           parasitic thread injection. Further
>>>>> this
>>>>> also allows generating a core dump
>>>>>                           without any GDB connected. There are
>>>>> also
>>>>> many other smaller pros/cons
>>>>>                           which can be discussed but for the
>>>>> context
>>>>> of this patch, this is the
>>>>>                           main one.
>>>>>
>>>>>                           So unlike parasitic thread injection,
>>>>> we
>>>>> don't unlock any special IOCTL for
>>>>>                           the process under debug to be
>>>>> performed by
>>>>> the parasitic thread, but we
>>>>>                           allow the minimal set of operations to
>>>>> be
>>>>> performed by GDB as if those were
>>>>>                           done on the EUs themselves.
>>>>>
>>>>>                           One can think of it like the minimal
>>>>> subset
>>>>> of ptrace but for EU threads,
>>>>>                           not the CPU threads. And thus,
>>>>> building on
>>>>> this it's possible to extend
>>>>>                           the core kernel generated core dumps
>>>>> with
>>>>> DRM specific extension which
>>>>>                           would contain the EU thread/memory
>>>>> dump.
>>>>>
>>>>>                       It might be good to document (in that
>>>>> debugging
>>>>> doc patch probably) why
>>>>>                       thread injection is not a great option,
>>>>> and why
>>>>> the tradeoffs for
>>>>>                       debugging are different than for for
>>>>> checkpoint/restore, where with CRIU
>>>>>                       we landed on doing most of this in
>>>>> userspace,
>>>>> and often requiring
>>>>>                       injection threads to make it all work.
>>>>>
>>>>>                       Cheers, Sima
>>>>>
>>>>>
>>>>>                           Regards, Joonas
>>>>>
>>>>>
>>>>>                               Regards,
>>>>>                               Christian.
>>>>>
>>>>>
>>>>>
>>>>>                                     Matt
>>>>>
>>>>>                                    
>>>>> [3]
>>>>> https://patchwork.freedesktop.org/patch/622520/?series=140200&r
>>>>> e
>>>>> v=6
>>>>>
>>>>>
>>>>>                                         Regards,
>>>>>                                         Christian.
>>>>>
>>>>>
>>>>>                                             Matt
>>>>>
>>>>>
>>>>>                                                 Regards,
>>>>>                                                 Christian.
>>>>>
>>>>>
>>>>>

[-- Attachment #2: Type: text/html, Size: 45048 bytes --]

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 2/8] drm/ttm: Add ttm_bo_access
  2024-11-13  8:37                                               ` Christian König
@ 2024-11-13 10:44                                                 ` Thomas Hellström
  2024-11-13 11:42                                                   ` Christian König
  0 siblings, 1 reply; 56+ messages in thread
From: Thomas Hellström @ 2024-11-13 10:44 UTC (permalink / raw)
  To: Christian König, Joonas Lahtinen, Christian König,
	Matthew Brost
  Cc: Simona Vetter, Rodrigo Vivi, Huang Rui, intel-xe, dri-devel,
	matthew.auld, David Airlie, Simona Vetter

On Wed, 2024-11-13 at 09:37 +0100, Christian König wrote:
> Am 12.11.24 um 17:33 schrieb Thomas Hellström:
> > [SNIP]
> > > > > This has been extensively discussed already, but was expected to
> > > > > really only be needed for low-on-memory scenarios. However it now
> > > > > seems like the need is much earlier due to the random userptr page
> > > > > joining by core mm.
> > > > Just to clarify here:
> > > >    
> > > > In Long-Running mode with recoverable pagefaults enabled we don't
> > > > have any preempt-fences, but rather just zap the PTEs pointing to
> > > > the affected memory and flush the TLB. So from a memory resource POV
> > > > a breakpoint should be safe, and no mmu notifier nor shrinker will
> > > > be blocked.
> > > That sounds like a HMM based approach which would clearly work.
> > > 
> > > But where is that? I don't see any HMM based approach anywhere in
> > > the XE driver.
> > This is a mode that uses recoverable pagefaults to fault either full
> > userptrs or full bos, and is used with DRM_XE_VM_CREATE_FLAG_FAULT_MODE
> > (not SVM)!
> > 
> > userptrs in xe are bo-less, and using the vm's resv, but otherwise
> > using hmm similar to amdgpu: xe_hmm.c
> 
> Yeah, I have seen that one.
> 
> > fault servicing:
> > xe_gt_pagefault.c
> > 
> > PTE zapping on eviction and notifier:
> > xe_vm_invalidate_vma(), xe_vm.c
> 
> Ah, that was the stuff I was missing.
> 
> So the implementation in xe_preempt_fence.c is just for graphics 
> submissions? That would make the whole thing much easier to handle.

Actually it's not, it's intended for long-running mode, but as a
consequence the debugger would be allowed only in fault mode.

> 
> The only remaining question I can then see is if long running
> submissions with DRM_XE_VM_CREATE_FLAG_FAULT_MODE could potentially
> block graphics submissions without this flag from accessing the hardware?

Yes and no. We have a mechanism in place that allows either only
fault-mode jobs or only non-faulting jobs on the same "engine group",
as we call it. A pagefault on an engine group would block or hamper
progress of other jobs on that engine group.

So let's say a dma-fence job is submitted to an engine group that is
currently running a faulting job. We'd then need to switch mode of the
engine group and, in the exec ioctl we'd (explicitly without preempt-
fences) preempt the faulting job before submitting the dma-fence job
and publishing its fence. This preemption will incur a delay which is
typically the delay of servicing any outstanding pagefaults. It's not
ideal, but the best we can do, and it doesn't affect core memory
management nor does it affect migration blits.

In the debugger case, this delay could be long due to breakpoints, and
that's why enabling the debugger would sit behind a flag and not be
enabled by default (I think this was discussed earlier in the thread).
Still, core memory management would be unaffected, and of course the
migration blits are completely independent.

/Thomas
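
The engine-group mode switch described above can be modelled as a small
state machine. What follows is only a sketch under stated assumptions: every
type and function name is hypothetical and none of it is taken from the xe
driver. In particular, the handling of a fault-mode job that arrives while
dma-fence jobs are still pending (it is simply refused here) and the
automatic resume of preempted jobs are guesses that the mail does not
describe. The preemption delay itself (servicing outstanding pagefaults, or
waiting on a debugger breakpoint) is reduced to a counter update.

```c
#include <assert.h>

/* Hypothetical model; none of these names come from the xe driver. */
enum group_mode { MODE_FAULT, MODE_DMA_FENCE };
enum job_kind { JOB_FAULT, JOB_DMA_FENCE };

struct engine_group {
	enum group_mode mode;
	unsigned int running_fault_jobs;     /* long-running jobs on the hardware */
	unsigned int suspended_fault_jobs;   /* preempted, waiting to be resumed */
	unsigned int pending_dma_fence_jobs; /* jobs whose fence has not signalled */
};

/*
 * Stand-in for the explicit preemption done in the exec path. A real
 * driver would wait here until outstanding pagefaults are serviced.
 */
void preempt_fault_jobs(struct engine_group *g)
{
	g->suspended_fault_jobs += g->running_fault_jobs;
	g->running_fault_jobs = 0;
}

/*
 * Returns 1 if the group had to switch mode, 0 if not, and -1 if the
 * job cannot be admitted yet (assumption: fault-mode jobs wait until
 * all dma-fence jobs on the group have completed).
 */
int engine_group_submit(struct engine_group *g, enum job_kind kind)
{
	int switched = 0;

	if (kind == JOB_DMA_FENCE) {
		if (g->mode == MODE_FAULT) {
			/* Preempt first, then publish the new job's fence. */
			preempt_fault_jobs(g);
			g->mode = MODE_DMA_FENCE;
			switched = 1;
		}
		/* No faulting job may run alongside a dma-fence job. */
		assert(g->running_fault_jobs == 0);
		g->pending_dma_fence_jobs++;
		return switched;
	}

	if (g->mode == MODE_DMA_FENCE) {
		if (g->pending_dma_fence_jobs > 0)
			return -1;
		g->mode = MODE_FAULT;
		g->running_fault_jobs += g->suspended_fault_jobs;
		g->suspended_fault_jobs = 0;
		switched = 1;
	}
	g->running_fault_jobs++;
	return switched;
}

/* Called when a dma-fence job signals; resumes fault mode once idle. */
void engine_group_dma_fence_done(struct engine_group *g)
{
	assert(g->pending_dma_fence_jobs > 0);
	g->pending_dma_fence_jobs--;
	if (g->pending_dma_fence_jobs == 0 && g->suspended_fault_jobs > 0) {
		g->mode = MODE_FAULT;
		g->running_fault_jobs += g->suspended_fault_jobs;
		g->suspended_fault_jobs = 0;
	}
}
```

The point the model tries to capture is the ordering: the faulting job is
taken off the hardware before the dma-fence job's fence exists, so no
published fence ever depends on a job that may be blocked by a pagefault or
a breakpoint.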



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 2/8] drm/ttm: Add ttm_bo_access
  2024-11-13 10:44                                                 ` Thomas Hellström
@ 2024-11-13 11:42                                                   ` Christian König
  2024-11-15 18:27                                                     ` Matthew Brost
  0 siblings, 1 reply; 56+ messages in thread
From: Christian König @ 2024-11-13 11:42 UTC (permalink / raw)
  To: Thomas Hellström, Joonas Lahtinen, Christian König,
	Matthew Brost
  Cc: Simona Vetter, Rodrigo Vivi, Huang Rui, intel-xe, dri-devel,
	matthew.auld, David Airlie, Simona Vetter

[-- Attachment #1: Type: text/plain, Size: 31122 bytes --]

Am 13.11.24 um 11:44 schrieb Thomas Hellström:
> On Wed, 2024-11-13 at 09:37 +0100, Christian König wrote:
>> Am 12.11.24 um 17:33 schrieb Thomas Hellström:
>>> [SNIP]
>>>>>> This has been extensively discussed already, but was expected to
>>>>>> really only be needed for low-on-memory scenarios. However it now
>>>>>> seems like the need is much earlier due to the random userptr page
>>>>>> joining by core mm.
>>>>> Just to clarify here:
>>>>>     
>>>>> In Long-Running mode with recoverable pagefaults enabled we don't
>>>>> have any preempt-fences, but rather just zap the PTEs pointing to
>>>>> the affected memory and flush the TLB. So from a memory resource POV
>>>>> a breakpoint should be safe, and no mmu notifier nor shrinker will
>>>>> be blocked.
>>>> That sounds like a HMM based approach which would clearly work.
>>>>
>>>> But where is that? I don't see any HMM based approach anywhere in
>>>> the
>>>> XE
>>>> driver.
>>> This is a mode that uses recoverable pagefaults to fault either
>>> full
>>> userptr or full bos, and used with DRM_XE_VM_CREATE_FLAG_FAULT_MODE.
>>> (not SVM)!
>>>
>>> userptrs in xe are bo-less, and using the vm's resv, but otherwise
>>> using hmm similar to amdgpu: xe_hmm.c
>> Yeah, I have seen that one.
>>
>>> fault servicing:
>>> xe_gt_pagefault.c
>>>
>>> PTE zapping on eviction and notifier:
>>> xe_vm_invalidate_vma(), xe_vm.c
>> Ah, that was the stuff I was missing.
>>
>> So the implementation in xe_preempt_fence.c is just for graphics
>> submissions? That would make the whole thing much easier to handle.
> Actually it's not, it's intended for long-running mode, but as a
> consequence the debugger would be allowed only in fault mode.

Makes sense, yes.

>> The only remaining question I can then see is if long running
>> submissions with DRM_XE_VM_CREATE_FLAG_FAULT_MODE could potentially
>> block
>> graphics submissions without this flag from accessing the hardware?
> Yes and no. We have a mechanism in place that allows either only fault
> mode jobs or non-faulting jobs on the same, what we call "engine
> group".
> A pagefault on an engine group would block or hamper progress of other
> jobs on that engine group.
>
> So let's say a dma-fence job is submitted to an engine group that is
> currently running a faulting job. We'd then need to switch mode of the
> engine group and, in the exec ioctl we'd (explicitly without preempt-
> fences) preempt the faulting job before submitting the dma-fence job
> and publishing its fence. This preemption will incur a delay which is
> typically the delay of servicing any outstanding pagefaults. It's not
> ideal, but the best we can do, and it doesn't affect core memory
> management nor does it affect migration blits.
>
> In the debugger case, this delay could be long due to breakpoints, and
> that's why enabling the debugger would sit behind a flag and not
> something default (I think this was discussed earlier in the thread).
> Still, core memory management would be unaffected, and also ofc the
> migration blits are completely independent.

Yeah, that sounds totally sane to me.

Sorry for the noise then. I didn't realize that you have two separate
modes of operation.

Going to reply on the other open questions separately.

Regards,
Christian.

>
> /Thomas
>
>> Thanks a lot for pointing this out,
>> Christian.
>>
>>> Thanks,
>>> Thomas
>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>>> Nor will there be any jobs with published dma-fences depending
>>>>> on
>>>>> the
>>>>> job blocked either temporarily by a pagefault or long-term by a
>>>>> debugger breakpoint.
>>>>>
>>>>> /Thomas
>>>>>
>>>>>
>>>>>> If that is done and the memory pre-empt fence is serviced
>>>>>> even
>>>>>> for
>>>>>> debuggable contexts, do you have further concerns with the
>>>>>> presented
>>>>>> approach
>>>>>> from dma-buf and drm/sched perspective?
>>>>>>
>>>>>> Regards, Joonas
>>>>>>
>>>>>>> Regards,
>>>>>>> Christian.
>>>>>>>
>>>>>>>
>>>>>>>            This means that a breakpoint or core dump doesn't
>>>>>>> halt
>>>>>>> GPU
>>>>>>> threads, but
>>>>>>>            rather suspends them. E.g. all running wave data
>>>>>>> is
>>>>>>> collected into a state
>>>>>>>            bag which can be restored later on.
>>>>>>>
>>>>>>>            I was under the impression that those long
>>>>>>> running
>>>>>>> compute
>>>>>>> threads do
>>>>>>>            exactly that, but when the hardware can't switch
>>>>>>> out
>>>>>>> the
>>>>>>> GPU thread/process
>>>>>>>            while in a break then that isn't the case.
>>>>>>>
>>>>>>>            As long as you don't find a way to avoid that
>>>>>>> this
>>>>>>> patch
>>>>>>> set is a pretty
>>>>>>>            clear NAK from my side as DMA-buf and TTM
>>>>>>> maintainer.
>>>>>>>
>>>>>>>
>>>>>>>        I believe this is addressed above.
>>>>>>>
>>>>>>>        Matt
>>>>>>>
>>>>>>>
>>>>>>>            What might work is to keep the submission on the
>>>>>>> hardware
>>>>>>> in the break state
>>>>>>>            but forbid any memory access. This way you can
>>>>>>> signal
>>>>>>> your
>>>>>>> preemption fence
>>>>>>>            even when the hardware isn't made available.
>>>>>>>
>>>>>>>            Before you continue XE sets up a new pre-emption
>>>>>>> fence
>>>>>>> and
>>>>>>> makes sure that
>>>>>>>            all page tables etc... are up to date.
>>>>>>>
>>>>>>>            Could be tricky to get this right if completion
>>>>>>> fence
>>>>>>> based
>>>>>>> submissions are
>>>>>>>            mixed in as well, but that gives you at least a
>>>>>>> direction
>>>>>>> you could
>>>>>>>            potentially go.
>>>>>>>
>>>>>>>            Regards,
>>>>>>>            Christian.
>>>>>>>
>>>>>>>
>>>>>>>                Regards, Joonas
>>>>>>>
>>>>>>>
>>>>>>>                    Regards,
>>>>>>>                    Christian.
>>>>>>>
>>>>>>>
>>>>>>>                        Some wash-up thoughts from me below,
>>>>>>> but
>>>>>>> consider them fairly irrelevant
>>>>>>>                        since I think the main driver for
>>>>>>> these
>>>>>>> big
>>>>>>> questions here should be
>>>>>>>                        gdb/userspace.
>>>>>>>
>>>>>>>
>>>>>>>                            Quoting Christian König (2024-11-
>>>>>>> 07
>>>>>>> 11:44:33)
>>>>>>>
>>>>>>>                                Am 06.11.24 um 18:00 schrieb
>>>>>>> Matthew
>>>>>>> Brost:
>>>>>>>
>>>>>>>                                      [SNIP]
>>>>>>>
>>>>>>>                                      This is not a generic
>>>>>>> interface
>>>>>>> that anyone can freely access. The same
>>>>>>>                                      permissions used by
>>>>>>> ptrace
>>>>>>> are
>>>>>>> checked when opening such an interface.
>>>>>>>                                      See [1] [2].
>>>>>>>
>>>>>>>                                     
>>>>>>> [1] https://patchwork.freedesktop.org/patch/617470/?series=136572&rev=2
>>>>>>>                                     
>>>>>>> [2] https://patchwork.freedesktop.org/patch/617471/?series=136572&rev=2
>>>>>>>
>>>>>>>
>>>>>>>                                Thanks a lot for those
>>>>>>> pointers,
>>>>>>> that
>>>>>>> is exactly what I was looking for.
>>>>>>>
>>>>>>>                                And yeah, it is what I
>>>>>>> feared. You
>>>>>>> are
>>>>>>> re-implementing existing functionality,
>>>>>>>                                but see below.
>>>>>>>
>>>>>>>                            Could you elaborate on what this
>>>>>>> "existing
>>>>>>> functionality" exactly is?
>>>>>>>                            I do not think this functionality
>>>>>>> exists at
>>>>>>> this time.
>>>>>>>
>>>>>>>                            The EU debugging architecture for
>>>>>>> Xe
>>>>>>> specifically avoids the need for GDB
>>>>>>>                            to attach with ptrace to the CPU
>>>>>>> process or
>>>>>>> interfere with the CPU process for
>>>>>>>                            the debugging via parasitic
>>>>>>> threads or
>>>>>>> so.
>>>>>>>
>>>>>>>                            Debugger connection is opened to
>>>>>>> the
>>>>>>> DRM
>>>>>>> driver for given PID (which uses the
>>>>>>>                            ptrace may access check for now)
>>>>>>> after
>>>>>>> which the all DRM client of that
>>>>>>>                            PID are exposed to the debugger
>>>>>>> process.
>>>>>>>
>>>>>>>                            What we want to expose via that
>>>>>>> debugger
>>>>>>> connection is the ability for GDB to
>>>>>>>                            read/write the different GPU VM
>>>>>>> address
>>>>>>> spaces (ppGTT for Intel GPUs) just like
>>>>>>>                            the EU threads would see them.
>>>>>>> Note
>>>>>>> that
>>>>>>> the layout of the ppGTT is
>>>>>>>                            completely up to the userspace
>>>>>>> driver
>>>>>>> to
>>>>>>> setup and is mostly only partially
>>>>>>>                            equal to the CPU address space.
>>>>>>>
>>>>>>>                            Specifically as part of
>>>>>>> reading/writing the
>>>>>>> ppGTT for debugging purposes,
>>>>>>>                            there are deep flushes needed:
>>>>>>> for
>>>>>>> example
>>>>>>> flushing instruction cache
>>>>>>>                            when adding/removing breakpoints.
>>>>>>>
>>>>>>>                            Maybe that will explain the
>>>>>>> background. I
>>>>>>> elaborate on this at the end some more.
>>>>>>>
>>>>>>>
>>>>>>>                                              kmap/vmap are
>>>>>>> used
>>>>>>> everywhere in the DRM subsystem to access BOs, so I’m
>>>>>>>                                              failing to see
>>>>>>> the
>>>>>>> problem with adding a simple helper based on existing
>>>>>>>                                              code.
>>>>>>>
>>>>>>>                                          What's possible and
>>>>>>> often
>>>>>>> done is to do kmap/vmap if you need to implement a
>>>>>>>                                          CPU copy for
>>>>>>> scanout for
>>>>>>> example or for copying/validating command buffers.
>>>>>>>                                          But that usually
>>>>>>> requires
>>>>>>> accessing the whole BO and has separate security
>>>>>>>                                          checks.
>>>>>>>
>>>>>>>                                          When you want to
>>>>>>> access
>>>>>>> only
>>>>>>> a few bytes of a BO that sounds massively like
>>>>>>>                                          a peek/poke like
>>>>>>> interface
>>>>>>> and we have already rejected that more than once.
>>>>>>>                                          There even used to
>>>>>>> be
>>>>>>> standardized GEM IOCTLs for that which have been
>>>>>>>                                          removed by now.
>>>>>>>
>>>>>>>                            Referring to the explanation at
>>>>>>> top:
>>>>>>> These
>>>>>>> IOCTL are not for the debugging target
>>>>>>>                            process to issue. The peek/poke
>>>>>>> interface
>>>>>>> is specifically for GDB only
>>>>>>>                            to facilitate the emulation of
>>>>>>> memory
>>>>>>> reads/writes on the GPU address
>>>>>>>                            space as they were done by EUs
>>>>>>> themselves.
>>>>>>> And to recap: for modifying
>>>>>>>                            instructions for example
>>>>>>> (add/remove
>>>>>>> breakpoint), extra level of cache flushing is
>>>>>>>                            needed which is not available to
>>>>>>> regular
>>>>>>> userspace.
>>>>>>>
>>>>>>>                            I specifically discussed with
>>>>>>> Sima on
>>>>>>> the
>>>>>>> difference before moving forward with this
>>>>>>>                            design originally. If something
>>>>>>> has
>>>>>>> changed
>>>>>>> since then, I'm of course happy to rediscuss.
>>>>>>>
>>>>>>>                            However, if this code can't be
>>>>>>> added,
>>>>>>> not
>>>>>>> sure how we would ever be able
>>>>>>>                            to implement core dumps for GPU
>>>>>>> threads/memory?
>>>>>>>
>>>>>>>
>>>>>>>                                          If you need to
>>>>>>> access
>>>>>>> BOs
>>>>>>> which are placed in not CPU accessible memory then
>>>>>>>                                          implement the
>>>>>>> access
>>>>>>> callback
>>>>>>> for ptrace, see amdgpu_ttm_access_memory for
>>>>>>>                                          an example how to
>>>>>>> do
>>>>>>> this.
>>>>>>>
>>>>>>>                            As also mentioned above, we don't
>>>>>>> work
>>>>>>> via
>>>>>>> ptrace at all when it comes
>>>>>>>                            to debugging the EUs. The only
>>>>>>> thing
>>>>>>> used
>>>>>>> for now is the ptrace_may_access to
>>>>>>>                            implement similar access
>>>>>>> restrictions
>>>>>>> as
>>>>>>> ptrace has. This can be changed
>>>>>>>                            to something else if needed.
>>>>>>>
>>>>>>>
>>>>>>>                                      Ptrace access via
>>>>>>> vm_operations_struct.access → ttm_bo_vm_access.
>>>>>>>
>>>>>>>                                      This series renames
>>>>>>> ttm_bo_vm_access to ttm_bo_access, with no code changes.
>>>>>>>
>>>>>>>                                      The above function
>>>>>>> accesses
>>>>>>> a BO
>>>>>>> via kmap if it is in SYSTEM / TT,
>>>>>>>                                      which is existing code.
>>>>>>>
>>>>>>>                                      This function is only
>>>>>>> exposed to
>>>>>>> user space via ptrace permissions.
>>>>>>>
>>>>>>>                            Maybe this sentence is what
>>>>>>> caused the
>>>>>>> confusion.
>>>>>>>
>>>>>>>                            Userspace is never exposed with
>>>>>>> peek/poke
>>>>>>> interface, only the debugger
>>>>>>>                            connection which is its own FD.
>>>>>>>
>>>>>>>
>>>>>>>                                      In this series, we
>>>>>>> implement
>>>>>>> a
>>>>>>> function [3] similar to
>>>>>>>                                     
>>>>>>> amdgpu_ttm_access_memory for
>>>>>>> the
>>>>>>> TTM vfunc access_memory. What is
>>>>>>>                                      missing is non-visible
>>>>>>> CPU
>>>>>>> memory
>>>>>>> access, similar to
>>>>>>>                                     
>>>>>>> amdgpu_ttm_access_memory_sdma.
>>>>>>> This will be addressed in a follow-up and
>>>>>>>                                      was omitted in this
>>>>>>> series
>>>>>>> given
>>>>>>> its complexity.
>>>>>>>
>>>>>>>                                      So, this looks more or
>>>>>>> less
>>>>>>> identical to AMD's ptrace implementation,
>>>>>>>                                      but in GPU address
>>>>>>> space.
>>>>>>> Again,
>>>>>>> I fail to see what the problem is here.
>>>>>>>                                      What am I missing?
>>>>>>>
>>>>>>>
>>>>>>>                                The main question is why
>>>>>>> can't you
>>>>>>> use
>>>>>>> the existing interfaces directly?
>>>>>>>
>>>>>>>                            We're not working on the CPU
>>>>>>> address
>>>>>>> space
>>>>>>> or BOs. We're working
>>>>>>>                            strictly on the GPU address space
>>>>>>> as
>>>>>>> would
>>>>>>> be seen by an EU thread if it
>>>>>>>                            accessed address X.
>>>>>>>
>>>>>>>
>>>>>>>                                Additional to the peek/poke
>>>>>>> interface
>>>>>>> of ptrace Linux has the pidfd_getfd
>>>>>>>                                system call, see
>>>>>>> here
>>>>>>> https://man7.org/linux/man-pages/man2/pidfd_getfd.2.html.
>>>>>>>
>>>>>>>                                The pidfd_getfd() allows to
>>>>>>> dup()
>>>>>>> the
>>>>>>> render node file descriptor into your gdb
>>>>>>>                                process. That in turn gives
>>>>>>> you
>>>>>>> all the
>>>>>>> access you need from gdb, including
>>>>>>>                                mapping BOs and command
>>>>>>> submission
>>>>>>> on
>>>>>>> behalf of the application.
>>>>>>>
>>>>>>>                            We're not operating on the CPU
>>>>>>> address
>>>>>>> space nor are we operating on BOs
>>>>>>>                            (there is no concept of BO in the
>>>>>>> EU
>>>>>>> debug
>>>>>>> interface). Each VMA in the VM
>>>>>>>                            could come from anywhere, only
>>>>>>> the
>>>>>>> start
>>>>>>> address and size matter. And
>>>>>>>                            neither do we need to interfere
>>>>>>> with
>>>>>>> the
>>>>>>> command submission of the
>>>>>>>                            process under debug.
>>>>>>>
>>>>>>>
>>>>>>>                                As far as I can see that
>>>>>>> allows
>>>>>>> for the
>>>>>>> same functionality as the eudebug
>>>>>>>                                interface, just without any
>>>>>>> driver
>>>>>>> specific code messing with ptrace
>>>>>>>                                permissions and peek/poke
>>>>>>> interfaces.
>>>>>>>
>>>>>>>                                So the question is still why
>>>>>>> do
>>>>>>> you
>>>>>>> need the whole eudebug interface in the
>>>>>>>                                first place? I might be
>>>>>>> missing
>>>>>>> something, but that seems to be superfluous
>>>>>>>                                from a high level view.
>>>>>>>
>>>>>>>                            Recapping from above. It is to
>>>>>>> allow
>>>>>>> the
>>>>>>> debugging of EU threads per DRM
>>>>>>>                            client, completely independent of
>>>>>>> the
>>>>>>> CPU
>>>>>>> process. If ptrace_may_access
>>>>>>>                            is the sore point, we could
>>>>>>> consider
>>>>>>> other
>>>>>>> permission checks, too. There
>>>>>>>                            is no other connection to ptrace
>>>>>>> in
>>>>>>> this
>>>>>>> architecture as single
>>>>>>>                            permission check to know if PID
>>>>>>> is
>>>>>>> fair
>>>>>>> game to access by debugger
>>>>>>>                            process.
>>>>>>>
>>>>>>>                            Why no parasitic thread or
>>>>>>> ptrace:
>>>>>>> Going
>>>>>>> forward, binding the EU debugging to
>>>>>>>                            the DRM client would also pave
>>>>>>> way for
>>>>>>> being able to extend core kernel generated
>>>>>>>                            core dump with each DRM client's
>>>>>>> EU
>>>>>>> thread/memory dump. We have similar
>>>>>>>                            feature called "Offline core
>>>>>>> dump"
>>>>>>> enabled
>>>>>>> in the downstream public
>>>>>>>                            trees for i915, where we
>>>>>>> currently
>>>>>>> attach
>>>>>>> the EU thread dump to i915 error state
>>>>>>>                            and then later combine i915 error
>>>>>>> state
>>>>>>> with CPU core dump file with a
>>>>>>>                            tool.
>>>>>>>
>>>>>>>                            This is relatively little amount
>>>>>>> of
>>>>>>> extra
>>>>>>> code, as this baseline series
>>>>>>>                            already introduces GDB the
>>>>>>> ability to
>>>>>>> perform the necessary actions.
>>>>>>>                            It's just the matter of kernel
>>>>>>> driver
>>>>>>> calling: "stop all threads", then
>>>>>>>                            copying the memory map and memory
>>>>>>> contents
>>>>>>> for GPU threads, just like is
>>>>>>>                            done for CPU threads.
>>>>>>>
>>>>>>>                            With parasitic thread injection,
>>>>>>> not
>>>>>>> sure
>>>>>>> if there is such way forward,
>>>>>>>                            as it would seem to require to
>>>>>>> inject
>>>>>>> quite
>>>>>>> a bit more logic to core kernel?
>>>>>>>
>>>>>>>
>>>>>>>                                It's true that the AMD KFD
>>>>>>> part
>>>>>>> has
>>>>>>> still similar functionality, but that is
>>>>>>>                                because of the broken KFD
>>>>>>> design
>>>>>>> of
>>>>>>> tying driver state to the CPU process
>>>>>>>                                (which makes it inaccessible
>>>>>>> for
>>>>>>> gdb
>>>>>>> even with imported render node fd).
>>>>>>>
>>>>>>>                                Both Sima and I (and
>>>>>>> partially
>>>>>>> Dave as
>>>>>>> well) have pushed back on the KFD
>>>>>>>                                approach. And the long term
>>>>>>> plan
>>>>>>> is to
>>>>>>> get rid of such device driver specific
>>>>>>>                                interface which re-implement
>>>>>>> existing
>>>>>>> functionality just differently.
>>>>>>>
>>>>>>>                            Recapping, this series is not
>>>>>>> adding
>>>>>>> it
>>>>>>> back. The debugger connection
>>>>>>>                            is a separate FD from the DRM
>>>>>>> one,
>>>>>>> with
>>>>>>> separate IOCTL set. We don't allow
>>>>>>>                            the DRM FD any new operations
>>>>>>> based on
>>>>>>> ptrace is attached or not. We
>>>>>>>                            don't ever do that check even.
>>>>>>>
>>>>>>>                            We only restrict the opening of
>>>>>>> the
>>>>>>> debugger connection to given PID with
>>>>>>>                            ptrace_may_access check for now.
>>>>>>> That
>>>>>>> can
>>>>>>> be changed to something else,
>>>>>>>                            if necessary.
>>>>>>>
>>>>>>>                        Yeah I think unnecessarily tying gpu
>>>>>>> processes
>>>>>>> to cpu processes is a bad
>>>>>>>                        thing, least because even today all
>>>>>>> the
>>>>>>> svm
>>>>>>> discussions we have still hit
>>>>>>>                        clear use-cases, where a 1:1 match is
>>>>>>> not
>>>>>>> wanted (like multiple gpu svm
>>>>>>>                        sections with offsets). Not even
>>>>>>> speaking
>>>>>>> of
>>>>>>> all the gpu usecases where
>>>>>>>                        the gpu vm space is still entirely
>>>>>>> independent
>>>>>>> of the cpu side.
>>>>>>>
>>>>>>>                        So that's why I think this entirely
>>>>>>> separate
>>>>>>> approach looks like the right
>>>>>>>                        one, with ptrace_may_access as the
>>>>>>> access
>>>>>>> control check to make sure we
>>>>>>>                        match ptrace on the cpu side.
>>>>>>>
>>>>>>>                        But there's very obviously a bikeshed
>>>>>>> to
>>>>>>> be had
>>>>>>> on what the actual uapi
>>>>>>>                        should look like, especially how gdb
>>>>>>> opens
>>>>>>> up a
>>>>>>> gpu debug access fd. But I
>>>>>>>                        also think that's not much on drm to
>>>>>>> decide,
>>>>>>> but whatever gdb wants. And
>>>>>>>                        then we aim for some consistency on
>>>>>>> that
>>>>>>> lookup/access control part
>>>>>>>                        (ideally, I might be missing some
>>>>>>> reasons
>>>>>>> why
>>>>>>> this is a bad idea) across
>>>>>>>                        drm drivers.
>>>>>>>
>>>>>>>
>>>>>>>                                So you need to have a really
>>>>>>> really
>>>>>>> good explanation why the eudebug interface
>>>>>>>                                is actually necessary.
>>>>>>>
>>>>>>>                            TL;DR The main point is to
>>>>>>> decouple
>>>>>>> the
>>>>>>> debugging of the EU workloads from the
>>>>>>>                            debugging of the CPU process.
>>>>>>> This
>>>>>>> avoids
>>>>>>> the interference with the CPU process with
>>>>>>>                            parasitic thread injection.
>>>>>>> Further
>>>>>>> this
>>>>>>> also allows generating a core dump
>>>>>>>                            without any GDB connected. There
>>>>>>> are
>>>>>>> also
>>>>>>> many other smaller pros/cons
>>>>>>>                            which can be discussed but for
>>>>>>> the
>>>>>>> context
>>>>>>> of this patch, this is the
>>>>>>>                            main one.
>>>>>>>
>>>>>>>                            So unlike parasitic thread
>>>>>>> injection,
>>>>>>> we
>>>>>>> don't unlock any special IOCTL for
>>>>>>>                            the process under debug to be
>>>>>>> performed by
>>>>>>> the parasitic thread, but we
>>>>>>>                            allow the minimal set of
>>>>>>> operations to
>>>>>>> be
>>>>>>> performed by GDB as if those were
>>>>>>>                            done on the EUs themselves.
>>>>>>>
>>>>>>>                            One can think of it like the
>>>>>>> minimal
>>>>>>> subset
>>>>>>> of ptrace but for EU threads,
>>>>>>>                            not the CPU threads. And thus,
>>>>>>> building on
>>>>>>> this it's possible to extend
>>>>>>>                            the core kernel generated core
>>>>>>> dumps
>>>>>>> with
>>>>>>> DRM specific extension which
>>>>>>>                            would contain the EU
>>>>>>> thread/memory
>>>>>>> dump.
>>>>>>>
>>>>>>>                        It might be good to document (in that
>>>>>>> debugging
>>>>>>> doc patch probably) why
>>>>>>>                        thread injection is not a great
>>>>>>> option,
>>>>>>> and why
>>>>>>> the tradeoffs for
>>>>>>>                        debugging are different than for
>>>>>>> checkpoint/restore, where with CRIU
>>>>>>>                        we landed on doing most of this in
>>>>>>> userspace,
>>>>>>> and often requiring
>>>>>>>                        injection threads to make it all
>>>>>>> work.
>>>>>>>
>>>>>>>                        Cheers, Sima
>>>>>>>
>>>>>>>
>>>>>>>                            Regards, Joonas
>>>>>>>
>>>>>>>
>>>>>>>                                Regards,
>>>>>>>                                Christian.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>                                      Matt
>>>>>>>
>>>>>>>                                     
>>>>>>> [3] https://patchwork.freedesktop.org/patch/622520/?series=140200&rev=6
>>>>>>>
>>>>>>>
>>>>>>>                                          Regards,
>>>>>>>                                          Christian.
>>>>>>>
>>>>>>>
>>>>>>>                                              Matt
>>>>>>>
>>>>>>>
>>>>>>>                                                  Regards,
>>>>>>>                                                  Christian.
>>>>>>>
>>>>>>>
>>>>>>>



* Re: [PATCH v6 2/8] drm/ttm: Add ttm_bo_access
  2024-11-13 11:42                                                   ` Christian König
@ 2024-11-15 18:27                                                     ` Matthew Brost
  2024-11-25 15:29                                                       ` Matthew Brost
  0 siblings, 1 reply; 56+ messages in thread
From: Matthew Brost @ 2024-11-15 18:27 UTC (permalink / raw)
  To: Christian König
  Cc: Thomas Hellström, Joonas Lahtinen, Christian König,
	Simona Vetter, Rodrigo Vivi, Huang Rui, intel-xe, dri-devel,
	matthew.auld, David Airlie, Simona Vetter

On Wed, Nov 13, 2024 at 12:42:35PM +0100, Christian König wrote:
> Am 13.11.24 um 11:44 schrieb Thomas Hellström:
> > On Wed, 2024-11-13 at 09:37 +0100, Christian König wrote:
> > > Am 12.11.24 um 17:33 schrieb Thomas Hellström:
> > > > [SNIP]
> > > > > > > This has been extensively discussed already, but was expected
> > > > > > > to
> > > > > > > really
> > > > > > > only be needed for low-on-memory scenarios. However it now
> > > > > > > seems
> > > > > > > like
> > > > > > > the need is much earlier due to the random userptr page
> > > > > > > joining
> > > > > > > by
> > > > > > > core
> > > > > > > mm.
> > > > > > Just to clarify here:
> > > > > > In Long-Running mode with recoverable pagefaults enabled we
> > > > > > don't
> > > > > > have
> > > > > > any preempt-fences, but rather just zap the PTEs pointing to
> > > > > > the
> > > > > > > affected memory and flush TLB. So from a memory resource POV a
> > > > > > breakpoint should be safe, and no mmu notifier nor shrinker
> > > > > > will be
> > > > > > blocked.
> > > > > That sounds like a HMM based approach which would clearly work.
> > > > > 
> > > > > But where is that? I don't see any HMM based approach anywhere in
> > > > > the
> > > > > XE
> > > > > driver.
> > > > This is a mode that uses recoverable pagefaults to fault either
> > > > full
> > > > > userptr or full bos, and used with DRM_XE_VM_CREATE_FLAG_FAULT_MODE.
> > > > (not SVM)!
> > > > 
> > > > userptrs in xe are bo-less, and using the vm's resv, but otherwise
> > > > using hmm similar to amdgpu: xe_hmm.c
> > > Yeah, I have seen that one.
> > > 
> > > > fault servicing:
> > > > xe_gt_pagefault.c
> > > > 
> > > > PTE zapping on eviction and notifier:
> > > > xe_vm_invalidate_vma(), xe_vm.c
> > > Ah, that was the stuff I was missing.
> > > 
> > > So the implementation in xe_preempt_fence.c is just for graphics
> > > submissions? That would make the whole thing much easier to handle.
> > Actually it's not, it's intended for long-running mode, but as a
> > consequence the debugger would be allowed only in fault mode.
> 
> > Makes sense, yes.
> 
> > > The only remaining question I can then see is if long running
> > > > submissions with DRM_XE_VM_CREATE_FLAG_FAULT_MODE could potentially
> > > block
> > > graphics submissions without this flag from accessing the hardware?
> > Yes and no. We have a mechanism in place that allows either only fault
> > mode jobs or non-faulting jobs on the same, what we call "engine
> > group".
> > A pagefault on an engine group would block or hamper progress of other
> > jobs on that engine group.
> > 
> > So let's say a dma-fence job is submitted to an engine group that is
> > currently running a faulting job. We'd then need to switch mode of the
> > engine group and, in the exec ioctl we'd (explicitly without preempt-
> > fences) preempt the faulting job before submitting the dma-fence job
> > and publishing its fence. This preemption will incur a delay which is
> > typically the delay of servicing any outstanding pagefaults. It's not
> > ideal, but the best we can do, and it doesn't affect core memory
> > management nor does it affect migration blits.
> > 
> > In the debugger case, this delay could be long due to breakpoints, and
> > that's why enabling the debugger would sit behind a flag and not
> > something default (I think this was discussed earlier in the thread).
> > Still, core memory management would be unaffected, and also ofc the
> > migration blits are completely independent.
> 
> Yeah, that sounds totally sane to me.
> 

Nice, glad to see this part of the thread resolved.
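
For anyone skimming, the engine-group mode switch Thomas describes in the
quoted text can be modeled roughly as below. This is only a userspace
sketch under stated assumptions: the toy_* names are hypothetical and are
not Xe driver symbols, and the "preemption" is reduced to a counter rather
than an actual wait for outstanding pagefaults.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: an engine group runs jobs of exactly one mode. */
enum toy_mode { TOY_MODE_DMA_FENCE, TOY_MODE_FAULT };

struct toy_engine_group {
	enum toy_mode mode;   /* mode the group is currently in */
	int running_jobs;     /* jobs of that mode currently on the group */
	int preemptions;      /* number of forced mode switches so far */
};

/*
 * Submit a job of @job_mode. If the group is in the other mode, the
 * running jobs are preempted first (this is where the delay described
 * above would be paid), then the group switches mode. Returns true when
 * a mode switch was needed.
 */
static bool toy_submit(struct toy_engine_group *g, enum toy_mode job_mode)
{
	bool switched = false;

	if (g->mode != job_mode) {
		if (g->running_jobs > 0) {
			g->preemptions++;
			g->running_jobs = 0;
		}
		g->mode = job_mode;
		switched = true;
	}

	g->running_jobs++;
	return switched;
}
```

In this model a dma-fence job submitted to a group that is running a
faulting job pays for exactly one preemption before it is admitted, and
nothing outside the group is touched.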

Setting aside the peek/poke and FD PID duplication issues (which seem to
be part of a larger discussion, with Joonas as the point of contact for
that), we have another use case for this helper in my current series.

We use this interface to read a BO marked with a dumpable flag during a
GPU hang in our error capture code. This is an internal KMD feature, not
directly exposed to user space. Would adding this helper be acceptable
for this use case? I can add kernel doc indicating the current restrictions
of the helper (do not directly expose to user space) too if that would
help.
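
To make the error capture use case concrete, here is a minimal userspace
sketch of what such a helper has to do when the backing pages are not
contiguous: walk the range page by page and copy each chunk. It is an
illustration only, assuming a BO modeled as an array of page pointers;
toy_bo, toy_bo_access and TOY_PAGE_SIZE are hypothetical names, the -1
error return is a placeholder, and none of this is the TTM implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096u

/* Hypothetical BO: backing pages may live anywhere in memory. */
struct toy_bo {
	unsigned char **pages;
	size_t num_pages;
};

/*
 * Copy @len bytes between @buf and the BO starting at @offset, one page
 * at a time. @write != 0 copies from @buf into the BO, otherwise from
 * the BO into @buf. Returns the number of bytes copied, or -1 if the
 * range is empty or falls outside the BO.
 */
static int toy_bo_access(struct toy_bo *bo, size_t offset, void *buf,
			 size_t len, int write)
{
	unsigned char *p = buf;
	size_t size = bo->num_pages * TOY_PAGE_SIZE;
	size_t done = 0;

	if (len == 0 || offset >= size || len > size - offset)
		return -1;

	while (done < len) {
		size_t page = (offset + done) / TOY_PAGE_SIZE;
		size_t in_page = (offset + done) % TOY_PAGE_SIZE;
		size_t chunk = TOY_PAGE_SIZE - in_page;

		/* Never copy past the end of the request. */
		if (chunk > len - done)
			chunk = len - done;

		if (write)
			memcpy(bo->pages[page] + in_page, p + done, chunk);
		else
			memcpy(p + done, bo->pages[page] + in_page, chunk);

		done += chunk;
	}

	return (int)done;
}
```

A capture path built this way never needs a single linear mapping of the
whole BO, which is the property that matters for non-contiguous VRAM.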

Matt

> Sorry for the noise then. I didn't realize that you have two separate modes
> of operation.
> 
> Going to reply on the other open questions separately.
> 
> Regards,
> Christian.
> 
> > 
> > /Thomas
> > 
> > > Thanks a lot for pointing this out,
> > > Christian.
> > > 
> > > > Thanks,
> > > > Thomas
> > > > 
> > > > > Regards,
> > > > > Christian.
> > > > > 
> > > > > > Nor will there be any jobs with published dma-fences depending
> > > > > > on
> > > > > > the
> > > > > > job blocked either temporarily by a pagefault or long-term by a
> > > > > > debugger breakpoint.
> > > > > > 
> > > > > > /Thomas
> > > > > > 
> > > > > > 
> > > > > > > If that is done and the memory pre-empt fence is serviced
> > > > > > > even
> > > > > > > for
> > > > > > > debuggable contexts, do you have further concerns with the
> > > > > > > presented
> > > > > > > approach
> > > > > > > from dma-buf and drm/sched perspective?
> > > > > > > 
> > > > > > > Regards, Joonas
> > > > > > > 
> > > > > > > > Regards,
> > > > > > > > Christian.
> > > > > > > > 
> > > > > > > > 
> > > > > > > >            This means that a breakpoint or core dump doesn't
> > > > > > > > halt
> > > > > > > > GPU
> > > > > > > > threads, but
> > > > > > > >            rather suspends them. E.g. all running wave data
> > > > > > > > is
> > > > > > > > collected into a state
> > > > > > > >            bag which can be restored later on.
> > > > > > > > 
> > > > > > > >            I was under the impression that those long
> > > > > > > > running
> > > > > > > > compute
> > > > > > > > threads do
> > > > > > > >            exactly that, but when the hardware can't switch
> > > > > > > > out
> > > > > > > > the
> > > > > > > > GPU thread/process
> > > > > > > >            while in a break then that isn't the case.
> > > > > > > > 
> > > > > > > >            As long as you don't find a way to avoid that
> > > > > > > > this
> > > > > > > > patch
> > > > > > > > set is a pretty
> > > > > > > >            clear NAK from my side as DMA-buf and TTM
> > > > > > > > maintainer.
> > > > > > > > 
> > > > > > > > 
> > > > > > > >        I believe this is addressed above.
> > > > > > > > 
> > > > > > > >        Matt
> > > > > > > > 
> > > > > > > > 
> > > > > > > >            What might work is to keep the submission on the
> > > > > > > > hardware
> > > > > > > > in the break state
> > > > > > > >            but forbid any memory access. This way you can
> > > > > > > > signal
> > > > > > > > your
> > > > > > > > preemption fence
> > > > > > > >            even when the hardware isn't made available.
> > > > > > > > 
> > > > > > > >            Before you continue XE setups a new pre-emption
> > > > > > > > fence
> > > > > > > > and
> > > > > > > > makes sure that
> > > > > > > >            all page tables etc... are up to date.
> > > > > > > > 
> > > > > > > >            Could be tricky to get this right if completion
> > > > > > > > fence
> > > > > > > > based
> > > > > > > > submissions are
> > > > > > > >            mixed in as well, but that gives you at least a
> > > > > > > > direction
> > > > > > > > you could
> > > > > > > >            potentially go.
> > > > > > > > 
> > > > > > > >            Regards,
> > > > > > > >            Christian.
> > > > > > > > 
> > > > > > > > 
> > > > > > > >                Regards, Joonas
> > > > > > > > 
> > > > > > > > 
> > > > > > > >                    Regards,
> > > > > > > >                    Christian.
> > > > > > > > 
> > > > > > > > 
> > > > > > > >                        Some wash-up thoughts from me below,
> > > > > > > > but
> > > > > > > > consider them fairly irrelevant
> > > > > > > >                        since I think the main driver for
> > > > > > > > these
> > > > > > > > big
> > > > > > > > questions here should be
> > > > > > > >                        gdb/userspace.
> > > > > > > > 
> > > > > > > > 
> > > > > > > >                            Quoting Christian König (2024-11-
> > > > > > > > 07
> > > > > > > > 11:44:33)
> > > > > > > > 
> > > > > > > >                                Am 06.11.24 um 18:00 schrieb
> > > > > > > > Matthew
> > > > > > > > Brost:
> > > > > > > > 
> > > > > > > >                                      [SNIP]
> > > > > > > > 
> > > > > > > >                                      This is not a generic
> > > > > > > > interface
> > > > > > > > that anyone can freely access. The same
> > > > > > > >                                      permissions used by
> > > > > > > > ptrace
> > > > > > > > are
> > > > > > > > checked when opening such an interface.
> > > > > > > >                                      See [1] [2].
> > > > > > > > 
> > > > > > > > [1]
> > > > > > > > https://patchwork.freedesktop.org/patch/617470/?series=136572&r
> > > > > > > > e
> > > > > > > > v=2
> > > > > > > > [2]
> > > > > > > > https://patchwork.freedesktop.org/patch/617471/?series=136572&r
> > > > > > > > e
> > > > > > > > v=2
> > > > > > > > 
> > > > > > > > 
> > > > > > > >                                Thanks a lot for those
> > > > > > > > pointers,
> > > > > > > > that
> > > > > > > > is exactly what I was looking for.
> > > > > > > > 
> > > > > > > >                                And yeah, it is what I
> > > > > > > > feared. You
> > > > > > > > are
> > > > > > > > re-implementing existing functionality,
> > > > > > > >                                but see below.
> > > > > > > > 
> > > > > > > >                            Could you elaborate on what this
> > > > > > > > "existing
> > > > > > > > functionality" exactly is?
> > > > > > > >                            I do not think this functionality
> > > > > > > > exists at
> > > > > > > > this time.
> > > > > > > > 
> > > > > > > >                            The EU debugging architecture for
> > > > > > > > Xe
> > > > > > > > specifically avoids the need for GDB
> > > > > > > >                            to attach with ptrace to the CPU
> > > > > > > > process or
> > > > > > > > interfere with the CPU process for
> > > > > > > >                            the debugging via parasitic
> > > > > > > > threads or
> > > > > > > > so.
> > > > > > > > 
> > > > > > > >                            Debugger connection is opened to
> > > > > > > > the
> > > > > > > > DRM
> > > > > > > > driver for given PID (which uses the
> > > > > > > >                            ptrace may access check for now)
> > > > > > > > after
> > > > > > > > which the all DRM client of that
> > > > > > > >                            PID are exposed to the debugger
> > > > > > > > process.
> > > > > > > > 
> > > > > > > >                            What we want to expose via that
> > > > > > > > debugger
> > > > > > > > connection is the ability for GDB to
> > > > > > > >                            read/write the different GPU VM
> > > > > > > > address
> > > > > > > > spaces (ppGTT for Intel GPUs) just like
> > > > > > > >                            the EU threads would see them.
> > > > > > > > Note
> > > > > > > > that
> > > > > > > > the layout of the ppGTT is
> > > > > > > >                            completely up to the userspace
> > > > > > > > driver
> > > > > > > > to
> > > > > > > > setup and is mostly only partially
> > > > > > > >                            equal to the CPU address space.
> > > > > > > > 
> > > > > > > >                            Specifically as part of
> > > > > > > > reading/writing the
> > > > > > > > ppGTT for debugging purposes,
> > > > > > > >                            there are deep flushes needed:
> > > > > > > > for
> > > > > > > > example
> > > > > > > > flushing instruction cache
> > > > > > > >                            when adding/removing breakpoints.
> > > > > > > > 
> > > > > > > >                            Maybe that will explain the
> > > > > > > > background. I
> > > > > > > > elaborate on this at the end some more.
> > > > > > > > 
> > > > > > > > 
> > > > > > > >                                              kmap/vmap are
> > > > > > > > used
> > > > > > > > everywhere in the DRM subsystem to access BOs, so I’m
> > > > > > > >                                              failing to see
> > > > > > > > the
> > > > > > > > problem with adding a simple helper based on existing
> > > > > > > >                                              code.
> > > > > > > > 
> > > > > > > >                                          What#s possible and
> > > > > > > > often
> > > > > > > > done is to do kmap/vmap if you need to implement a
> > > > > > > >                                          CPU copy for
> > > > > > > > scanout for
> > > > > > > > example or for copying/validating command buffers.
> > > > > > > >                                          But that usually
> > > > > > > > requires
> > > > > > > > accessing the whole BO and has separate security
> > > > > > > >                                          checks.
> > > > > > > > 
> > > > > > > >                                          When you want to
> > > > > > > > access
> > > > > > > > only
> > > > > > > > a few bytes of a BO that sounds massively like
> > > > > > > >                                          a peek/poke like
> > > > > > > > interface
> > > > > > > > and we have already rejected that more than once.
> > > > > > > >                                          There even used to
> > > > > > > > be
> > > > > > > > standardized GEM IOCTLs for that which have been
> > > > > > > >                                          removed by now.
> > > > > > > > 
> > > > > > > >                            Referring to the explanation at
> > > > > > > > top:
> > > > > > > > These
> > > > > > > > IOCTL are not for the debugging target
> > > > > > > >                            process to issue. The peek/poke
> > > > > > > > interface
> > > > > > > > is specifically for GDB only
> > > > > > > >                            to facilitate the emulation of
> > > > > > > > memory
> > > > > > > > reads/writes on the GPU address
> > > > > > > >                            space as they were done by EUs
> > > > > > > > themselves.
> > > > > > > > And to recap: for modifying
> > > > > > > >                            instructions for example
> > > > > > > > (add/remove
> > > > > > > > breakpoint), extra level of cache flushing is
> > > > > > > >                            needed which is not available to
> > > > > > > > regular
> > > > > > > > userspace.
> > > > > > > > 
> > > > > > > >                            I specifically discussed with
> > > > > > > > Sima on
> > > > > > > > the
> > > > > > > > difference before moving forward with this
> > > > > > > >                            design originally. If something
> > > > > > > > has
> > > > > > > > changed
> > > > > > > > since then, I'm of course happy to rediscuss.
> > > > > > > > 
> > > > > > > >                            However, if this code can't be
> > > > > > > > added,
> > > > > > > > not
> > > > > > > > sure how we would ever be able
> > > > > > > >                            to implement core dumps for GPU
> > > > > > > > threads/memory?
> > > > > > > > 
> > > > > > > > 
> > > > > > > >                                          If you need to
> > > > > > > > access
> > > > > > > > BOs
> > > > > > > > which are placed in not CPU accessible memory then
> > > > > > > >                                          implement the
> > > > > > > > access
> > > > > > > > callback
> > > > > > > > for ptrace, see amdgpu_ttm_access_memory for
> > > > > > > >                                          an example how to
> > > > > > > > do
> > > > > > > > this.
> > > > > > > > 
> > > > > > > >                            As also mentioned above, we don't
> > > > > > > > work
> > > > > > > > via
> > > > > > > > ptrace at all when it comes
> > > > > > > >                            to debugging the EUs. The only
> > > > > > > > thing
> > > > > > > > used
> > > > > > > > for now is the ptrace_may_access to
> > > > > > > >                            implement similar access
> > > > > > > > restrictions
> > > > > > > > as
> > > > > > > > ptrace has. This can be changed
> > > > > > > >                            to something else if needed.
> > > > > > > > 
> > > > > > > > 
> > > > > > > >                                      Ptrace access via
> > > > > > > > vm_operations_struct.access → ttm_bo_vm_access.
> > > > > > > > 
> > > > > > > >                                      This series renames
> > > > > > > > ttm_bo_vm_access to ttm_bo_access, with no code changes.
> > > > > > > > 
> > > > > > > >                                      The above function
> > > > > > > > accesses
> > > > > > > > a BO
> > > > > > > > via kmap if it is in SYSTEM / TT,
> > > > > > > >                                      which is existing code.
> > > > > > > > 
> > > > > > > >                                      This function is only
> > > > > > > > exposed to
> > > > > > > > user space via ptrace permissions.
> > > > > > > > 
> > > > > > > >                            Maybe this sentence is what
> > > > > > > > caused the
> > > > > > > > confusion.
> > > > > > > > 
> > > > > > > >                            Userspace is never exposed with
> > > > > > > > peek/poke
> > > > > > > > interface, only the debugger
> > > > > > > >                            connection which is its own FD.
> > > > > > > > 
> > > > > > > > 
> > > > > > > >                                      In this series, we
> > > > > > > > implement
> > > > > > > > a
> > > > > > > > function [3] similar to
> > > > > > > > amdgpu_ttm_access_memory for
> > > > > > > > the
> > > > > > > > TTM vfunc access_memory. What is
> > > > > > > >                                      missing is non-visible
> > > > > > > > CPU
> > > > > > > > memory
> > > > > > > > access, similar to
> > > > > > > > amdgpu_ttm_access_memory_sdma.
> > > > > > > > This will be addressed in a follow-up and
> > > > > > > >                                      was omitted in this
> > > > > > > > series
> > > > > > > > given
> > > > > > > > its complexity.
> > > > > > > > 
> > > > > > > >                                      So, this looks more or
> > > > > > > > less
> > > > > > > > identical to AMD's ptrace implementation,
> > > > > > > >                                      but in GPU address
> > > > > > > > space.
> > > > > > > > Again,
> > > > > > > > I fail to see what the problem is here.
> > > > > > > >                                      What am I missing?
> > > > > > > > 
> > > > > > > > 
> > > > > > > >                                The main question is why
> > > > > > > > can't you
> > > > > > > > use
> > > > > > > > the existing interfaces directly?
> > > > > > > > 
> > > > > > > >                            We're not working on the CPU
> > > > > > > > address
> > > > > > > > space
> > > > > > > > or BOs. We're working
> > > > > > > >                            strictly on the GPU address space
> > > > > > > > as
> > > > > > > > would
> > > > > > > > be seen by an EU thread if it
> > > > > > > >                            accessed address X.
> > > > > > > > 
> > > > > > > > 
> > > > > > > >                                Additional to the peek/poke
> > > > > > > > interface
> > > > > > > > of ptrace Linux has the pidfd_getfd
> > > > > > > >                                system call, see
> > > > > > > > here
> > > > > > > > https://man7.org/linux/man-pages/man2/pidfd_getfd.2.html.
> > > > > > > > 
> > > > > > > >                                The pidfd_getfd() allows to
> > > > > > > > dup()
> > > > > > > > the
> > > > > > > > render node file descriptor into your gdb
> > > > > > > >                                process. That in turn gives
> > > > > > > > you
> > > > > > > > all the
> > > > > > > > access you need from gdb, including
> > > > > > > >                                mapping BOs and command
> > > > > > > > submission
> > > > > > > > on
> > > > > > > > behalf of the application.
> > > > > > > > 
> > > > > > > >                            We're not operating on the CPU
> > > > > > > > address
> > > > > > > > space nor are we operating on BOs
> > > > > > > >                            (there is no concept of BO in the
> > > > > > > > EU
> > > > > > > > debug
> > > > > > > > interface). Each VMA in the VM
> > > > > > > >                            could come from anywhere, only
> > > > > > > > the
> > > > > > > > start
> > > > > > > > address and size matter. And
> > > > > > > >                            neither do we need to interfere
> > > > > > > > with
> > > > > > > > the
> > > > > > > > command submission of the
> > > > > > > >                            process under debug.
> > > > > > > > 
> > > > > > > > 
> > > > > > > >                                As far as I can see that
> > > > > > > > allows
> > > > > > > > for the
> > > > > > > > same functionality as the eudebug
> > > > > > > >                                interface, just without any
> > > > > > > > driver
> > > > > > > > specific code messing with ptrace
> > > > > > > >                                permissions and peek/poke
> > > > > > > > interfaces.
> > > > > > > > 
> > > > > > > >                                So the question is still why
> > > > > > > > do
> > > > > > > > you
> > > > > > > > need the whole eudebug interface in the
> > > > > > > >                                first place? I might be
> > > > > > > > missing
> > > > > > > > something, but that seems to be superfluous
> > > > > > > >                                from a high level view.
> > > > > > > > 
> > > > > > > >                            Recapping from above. It is to
> > > > > > > > allow
> > > > > > > > the
> > > > > > > > debugging of EU threads per DRM
> > > > > > > >                            client, completely independent of
> > > > > > > > the
> > > > > > > > CPU
> > > > > > > > process. If ptrace_may_acces
> > > > > > > >                            is the sore point, we could
> > > > > > > > consider
> > > > > > > > other
> > > > > > > > permission checks, too. There
> > > > > > > >                            is no other connection to ptrace
> > > > > > > > in
> > > > > > > > this
> > > > > > > > architecture as single
> > > > > > > >                            permission check to know if PID
> > > > > > > > is
> > > > > > > > fair
> > > > > > > > game to access by debugger
> > > > > > > >                            process.
> > > > > > > > 
> > > > > > > >                            Why no parasitic thread or
> > > > > > > > ptrace:
> > > > > > > > Going
> > > > > > > > forward, binding the EU debugging to
> > > > > > > >                            the DRM client would also pave
> > > > > > > > way for
> > > > > > > > being able to extend core kernel generated
> > > > > > > >                            core dump with each DRM client's
> > > > > > > > EU
> > > > > > > > thread/memory dump. We have similar
> > > > > > > >                            feature called "Offline core
> > > > > > > > dump"
> > > > > > > > enabled
> > > > > > > > in the downstream public
> > > > > > > >                            trees for i915, where we
> > > > > > > > currently
> > > > > > > > attach
> > > > > > > > the EU thread dump to i915 error state
> > > > > > > >                            and then later combine i915 error
> > > > > > > > state
> > > > > > > > with CPU core dump file with a
> > > > > > > >                            tool.
> > > > > > > > 
> > > > > > > >                            This is relatively little amount
> > > > > > > > of
> > > > > > > > extra
> > > > > > > > code, as this baseline series
> > > > > > > >                            already introduces GDB the
> > > > > > > > ability to
> > > > > > > > perform the necessary actions.
> > > > > > > >                            It's just the matter of kernel
> > > > > > > > driver
> > > > > > > > calling: "stop all threads", then
> > > > > > > >                            copying the memory map and memory
> > > > > > > > contents
> > > > > > > > for GPU threads, just like is
> > > > > > > >                            done for CPU threads.
> > > > > > > > 
> > > > > > > >                            With parasitic thread injection,
> > > > > > > > not
> > > > > > > > sure
> > > > > > > > if there is such way forward,
> > > > > > > >                            as it would seem to require to
> > > > > > > > inject
> > > > > > > > quite
> > > > > > > > abit more logic to core kernel?
> > > > > > > > 
> > > > > > > > 
> > > > > > > >                                It's true that the AMD KFD
> > > > > > > > part
> > > > > > > > has
> > > > > > > > still similar functionality, but that is
> > > > > > > >                                because of the broken KFD
> > > > > > > > design
> > > > > > > > of
> > > > > > > > tying driver state to the CPU process
> > > > > > > >                                (which makes it inaccessible
> > > > > > > > for
> > > > > > > > gdb
> > > > > > > > even with imported render node fd).
> > > > > > > > 
> > > > > > > >                                Both Sima and I (and
> > > > > > > > partially
> > > > > > > > Dave as
> > > > > > > > well) have pushed back on the KFD
> > > > > > > >                                approach. And the long term
> > > > > > > > plan
> > > > > > > > is to
> > > > > > > > get rid of such device driver specific
> > > > > > > >                                interface which re-implement
> > > > > > > > existing
> > > > > > > > functionality just differently.
> > > > > > > > 
> > > > > > > >                            Recapping, this series is not
> > > > > > > > adding
> > > > > > > > it
> > > > > > > > back. The debugger connection
> > > > > > > >                            is a separate FD from the DRM
> > > > > > > > one,
> > > > > > > > with
> > > > > > > > separate IOCTL set. We don't allow
> > > > > > > >                            the DRM FD any new operations
> > > > > > > > based on
> > > > > > > > ptrace is attached or not. We
> > > > > > > >                            don't ever do that check even.
> > > > > > > > 
> > > > > > > >                            We only restrict the opening of
> > > > > > > > the
> > > > > > > > debugger connection to given PID with
> > > > > > > >                            ptrace_may_access check for now.
> > > > > > > > That
> > > > > > > > can
> > > > > > > > be changed to something else,
> > > > > > > >                            if necessary.
> > > > > > > > 
> > > > > > > >                        Yeah I think unnecessarily tying gpu
> > > > > > > > processes
> > > > > > > > to cpu processes is a bad
> > > > > > > >                        thing, least because even today all
> > > > > > > > the
> > > > > > > > svm
> > > > > > > > discussions we have still hit
> > > > > > > >                        clear use-cases, where a 1:1 match is
> > > > > > > > not
> > > > > > > > wanted (like multiple gpu svm
> > > > > > > >                        sections with offsets). Not even
> > > > > > > > speaking
> > > > > > > > of
> > > > > > > > all the gpu usecases where
> > > > > > > >                        the gpu vm space is still entirely
> > > > > > > > independent
> > > > > > > > of the cpu side.
> > > > > > > > 
> > > > > > > >                        So that's why I think this entirely
> > > > > > > > separate
> > > > > > > > approach looks like the right
> > > > > > > >                        one, with ptrace_may_access as the
> > > > > > > > access
> > > > > > > > control check to make sure we
> > > > > > > >                        match ptrace on the cpu side.
> > > > > > > > 
> > > > > > > >                        But there's very obviously a bikeshed
> > > > > > > > to
> > > > > > > > be had
> > > > > > > > on what the actual uapi
> > > > > > > >                        should look like, especially how gdb
> > > > > > > > opens
> > > > > > > > up a
> > > > > > > > gpu debug access fd. But I
> > > > > > > >                        also think that's not much on drm to
> > > > > > > > decide,
> > > > > > > > but whatever gdb wants. And
> > > > > > > >                        then we aim for some consistency on
> > > > > > > > that
> > > > > > > > lookup/access control part
> > > > > > > >                        (ideally, I might be missing some
> > > > > > > > reasons
> > > > > > > > why
> > > > > > > > this is a bad idea) across
> > > > > > > >                        drm drivers.
> > > > > > > > 
> > > > > > > > 
> > > > > > > >                                So you need to have a really
> > > > > > > > really
> > > > > > > > good explanation why the eudebug interface
> > > > > > > >                                is actually necessary.
> > > > > > > > 
> > > > > > > >                            TL;DR The main point is to
> > > > > > > > decouple
> > > > > > > > the
> > > > > > > > debugging of the EU workloads from the
> > > > > > > >                            debugging of the CPU process.
> > > > > > > > This
> > > > > > > > avoids
> > > > > > > > the interference with the CPU process with
> > > > > > > >                            parasitic thread injection.
> > > > > > > > Further
> > > > > > > > this
> > > > > > > > also allows generating a core dump
> > > > > > > >                            without any GDB connected. There
> > > > > > > > are
> > > > > > > > also
> > > > > > > > many other smaller pros/cons
> > > > > > > >                            which can be discussed but for
> > > > > > > > the
> > > > > > > > context
> > > > > > > > of this patch, this is the
> > > > > > > >                            main one.
> > > > > > > > 
> > > > > > > >                            So unlike parasitic thread
> > > > > > > > injection,
> > > > > > > > we
> > > > > > > > don't unlock any special IOCTL for
> > > > > > > >                            the process under debug to be
> > > > > > > > performed by
> > > > > > > > the parasitic thread, but we
> > > > > > > >                            allow the minimal set of
> > > > > > > > operations to
> > > > > > > > be
> > > > > > > > performed by GDB as if those were
> > > > > > > >                            done on the EUs themselves.
> > > > > > > > 
> > > > > > > >                            One can think of it like the
> > > > > > > > minimal
> > > > > > > > subset
> > > > > > > > of ptrace but for EU threads,
> > > > > > > >                            not the CPU threads. And thus,
> > > > > > > > building on
> > > > > > > > this it's possible to extend
> > > > > > > >                            the core kernel generated core
> > > > > > > > dumps
> > > > > > > > with
> > > > > > > > DRM specific extension which
> > > > > > > >                            would contain the EU
> > > > > > > > thread/memory
> > > > > > > > dump.
> > > > > > > > 
> > > > > > > >                        It might be good to document (in that
> > > > > > > > debugging
> > > > > > > > doc patch probably) why
> > > > > > > >                        thread injection is not a great
> > > > > > > > option,
> > > > > > > > and why
> > > > > > > > the tradeoffs for
> > > > > > > >                        debugging are different than for for
> > > > > > > > checkpoint/restore, where with CRIU
> > > > > > > >                        we landed on doing most of this in
> > > > > > > > userspace,
> > > > > > > > and often requiring
> > > > > > > >                        injection threads to make it all
> > > > > > > > work.
> > > > > > > > 
> > > > > > > >                        Cheers, Sima
> > > > > > > > 
> > > > > > > > 
> > > > > > > >                            Regards, Joonas
> > > > > > > > 
> > > > > > > > 
> > > > > > > >                                Regards,
> > > > > > > >                                Christian.
> > > > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > > > >                                      Matt
> > > > > > > > 
> > > > > > > > [3]
> > > > > > > > https://patchwork.freedesktop.org/patch/622520/?series=140200&r
> > > > > > > > e
> > > > > > > > v=6
> > > > > > > > 
> > > > > > > > 
> > > > > > > >                                          Regards,
> > > > > > > >                                          Christian.
> > > > > > > > 
> > > > > > > > 
> > > > > > > >                                              Matt
> > > > > > > > 
> > > > > > > > 
> > > > > > > >                                                  Regards,
> > > > > > > >                                                  Christian.
> > > > > > > > 
> > > > > > > > 
> > > > > > > > 

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 2/8] drm/ttm: Add ttm_bo_access
  2024-11-15 18:27                                                     ` Matthew Brost
@ 2024-11-25 15:29                                                       ` Matthew Brost
  2024-11-25 16:19                                                         ` Christian König
  0 siblings, 1 reply; 56+ messages in thread
From: Matthew Brost @ 2024-11-25 15:29 UTC (permalink / raw)
  To: Christian König
  Cc: Thomas Hellström, Joonas Lahtinen, Christian König,
	Simona Vetter, Rodrigo Vivi, Huang Rui, intel-xe, dri-devel,
	matthew.auld, David Airlie, Simona Vetter

On Fri, Nov 15, 2024 at 10:27:59AM -0800, Matthew Brost wrote:
> On Wed, Nov 13, 2024 at 12:42:35PM +0100, Christian König wrote:
> > > On 13.11.24 at 11:44, Thomas Hellström wrote:
> > > > On Wed, 2024-11-13 at 09:37 +0100, Christian König wrote:
> > > > > On 12.11.24 at 17:33, Thomas Hellström wrote:
> > > > > [SNIP]
> > > > > > > > > This has been extensively discussed already, but was expected to really
> > > > > > > > > only be needed for low-on-memory scenarios. However it now seems like
> > > > > > > > > the need is much earlier due to the random userptr page joining by core
> > > > > > > > > mm.
> > > > > > > Just to clarify here:
> > > > > > > > In Long-Running mode with recoverable pagefaults enabled we don't have
> > > > > > > > any preempt-fences, but rather just zap the PTEs pointing to the
> > > > > > > > affected memory and flush TLB. So from a memory resource POV a
> > > > > > > > breakpoint should be safe, and no mmu notifier nor shrinker will be
> > > > > > > > blocked.
> > > > > > That sounds like a HMM based approach which would clearly work.
> > > > > > 
> > > > > > > But where is that? I don't see any HMM based approach anywhere in the XE
> > > > > > > driver.
> > > > > > This is a mode that uses recoverable pagefaults to fault either full
> > > > > > userptr or full bos, and used with DRM_XE_VM_CREATE_FLAG_FAULT_MODE.
> > > > > > (not SVM)!
> > > > > 
> > > > > userptrs in xe are bo-less, and using the vm's resv, but otherwise
> > > > > using hmm similar to amdgpu: xe_hmm.c
> > > > Yeah, I have seen that one.
> > > > 
> > > > > fault servicing:
> > > > > xe_gt_pagefault.c
> > > > > 
> > > > > PTE zapping on eviction and notifier:
> > > > > xe_vm_invalidate_vma(), xe_vm.c
> > > > Ah, that was the stuff I was missing.
> > > > 
> > > > So the implementation in xe_preempt_fence.c is just for graphics
> > > > submissions? That would make the whole thing much easier to handle.
> > > Actually it's not, it's intended for long-running mode, but as a
> > > consequence the debugger would be allowed only in fault mode.
> > 
> > Makes sense, yes.
> > 
> > > > > The only remaining question I can then see is if long running
> > > > > submissions with DRM_XE_VM_CREATE_FLAG_FAULT_MODE could potentially block
> > > > > graphics submissions without this flag from accessing the hardware?
> > > Yes and no. We have a mechanism in place that allows either only fault
> > > mode jobs or non-faulting jobs on the same, what we call "engine
> > > group".
> > > A pagefault on an engine group would block or hamper progress of other
> > > jobs on that engine group.
> > > 
> > > So let's say a dma-fence job is submitted to an engine group that is
> > > currently running a faulting job. We'd then need to switch mode of the
> > > engine group and, in the exec ioctl we'd (explicitly without preempt-
> > > fences) preempt the faulting job before submitting the dma-fence job
> > > and publishing its fence. This preemption will incur a delay which is
> > > typically the delay of servicing any outstanding pagefaults. It's not
> > > ideal, but the best we can do, and it doesn't affect core memory
> > > management nor does it affect migration blits.
> > > 
> > > In the debugger case, this delay could be long due to breakpoints, and
> > > that's why enabling the debugger would sit behind a flag and not
> > > something default (I think this was discussed earlier in the thread).
> > > Still, core memory management would be unaffected, and also ofc the
> > > migration blits are completely independent.
> > 
> > Yeah, that sounds totally sane to me.
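
For anyone skimming the thread, the engine group behaviour Thomas describes
above can be modelled roughly as below. This is a toy sketch and not Xe code:
every name in it is hypothetical, and the switch in the other direction
(dma-fence mode to fault mode) is not described above, so it is reduced to a
plain mode flip here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names; an engine group runs jobs of one kind at a time. */
enum sketch_mode { SKETCH_MODE_DMA_FENCE, SKETCH_MODE_FAULT };

struct sketch_engine_group {
	enum sketch_mode mode;  /* kind of job the group currently runs */
	int running_fault_jobs; /* faulting jobs currently on the hardware */
	int preemptions;        /* explicit preemptions done so far */
};

/*
 * Submit a job of kind job_mode to the group.
 * Returns true if the group had to switch mode first.
 */
bool sketch_submit(struct sketch_engine_group *g, enum sketch_mode job_mode)
{
	bool switched = false;

	assert(g);
	if (g->mode != job_mode) {
		if (g->mode == SKETCH_MODE_FAULT && g->running_fault_jobs) {
			/*
			 * Preempt the faulting jobs explicitly (no preempt
			 * fences) before the dma-fence job is submitted. In
			 * the real driver this is where the delay of
			 * servicing outstanding pagefaults would be paid.
			 */
			g->preemptions += g->running_fault_jobs;
			g->running_fault_jobs = 0;
		}
		g->mode = job_mode;
		switched = true;
	}

	if (job_mode == SKETCH_MODE_FAULT)
		g->running_fault_jobs++;

	return switched;
}
```

The point of the model is only that the cost of the switch lands on the
submitting exec call and is bounded by the faulting jobs on that one group.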
> > 
> 
> Nice, glad to see this part of the thread resolved.
> 
> Setting aside the peek/poke and FD PID duplication issues (which seem to
> be part of a larger discussion, with Joonas as the point of contact for
> that), we have another use case for this helper in my current series.
> 
> We use this interface to read a BO marked with a dumpable flag during a
> GPU hang in our error capture code. This is an internal KMD feature, not
> directly exposed to user space. Would adding this helper be acceptable
> for this use case? I can add kernel doc indicating the current restrictions
> of the helper (do not directly expose to user space) too if that would
> help.
> 

Christian - ping on above.


> Matt
> 
> > Sorry for the noise then. I didn't realize that you have two separate modes
> > of operation.
> > 
> > Going to reply on the other open questions separately.
> > 
> > Regards,
> > Christian.
> > 
> > > 
> > > /Thomas
> > > 
> > > > Thanks a lot for pointing this out,
> > > > Christian.
> > > > 
> > > > > Thanks,
> > > > > Thomas
> > > > > 
> > > > > > Regards,
> > > > > > Christian.
> > > > > > 
> > > > > > > Nor will there be any jobs with published dma-fences depending
> > > > > > > on
> > > > > > > the
> > > > > > > job blocked either temporarily by a pagefault or long-term by a
> > > > > > > debugger breakpoint.
> > > > > > > 
> > > > > > > /Thomas
> > > > > > > 
> > > > > > > 
> > > > > > > > If that is done and the memory pre-empt fence is serviced
> > > > > > > > even
> > > > > > > > for
> > > > > > > > debuggable contexts, do you have further concerns with the
> > > > > > > > presented
> > > > > > > > approach
> > > > > > > > from dma-buf and drm/sched perspective?
> > > > > > > > 
> > > > > > > > Regards, Joonas
> > > > > > > > 
> > > > > > > > > Regards,
> > > > > > > > > Christian.
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > >            This means that a breakpoint or core dump doesn't
> > > > > > > > > halt
> > > > > > > > > GPU
> > > > > > > > > threads, but
> > > > > > > > >            rather suspends them. E.g. all running wave data
> > > > > > > > > is
> > > > > > > > > collected into a state
> > > > > > > > >            bag which can be restored later on.
> > > > > > > > > 
> > > > > > > > >            I was under the impression that those long
> > > > > > > > > running
> > > > > > > > > compute
> > > > > > > > > threads do
> > > > > > > > >            exactly that, but when the hardware can't switch
> > > > > > > > > out
> > > > > > > > > the
> > > > > > > > > GPU thread/process
> > > > > > > > >            while in a break then that isn't the case.
> > > > > > > > > 
> > > > > > > > >            As long as you don't find a way to avoid that
> > > > > > > > > this
> > > > > > > > > patch
> > > > > > > > > set is a pretty
> > > > > > > > >            clear NAK from my side as DMA-buf and TTM
> > > > > > > > > maintainer.
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > >        I believe this is addressed above.
> > > > > > > > > 
> > > > > > > > >        Matt
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > >            What might work is to keep the submission on the
> > > > > > > > > hardware
> > > > > > > > > in the break state
> > > > > > > > >            but forbid any memory access. This way you can
> > > > > > > > > signal
> > > > > > > > > your
> > > > > > > > > preemption fence
> > > > > > > > >            even when the hardware isn't made available.
> > > > > > > > > 
> > > > > > > > > >            Before you continue, XE sets up a new pre-emption
> > > > > > > > > fence
> > > > > > > > > and
> > > > > > > > > makes sure that
> > > > > > > > >            all page tables etc... are up to date.
> > > > > > > > > 
> > > > > > > > >            Could be tricky to get this right if completion
> > > > > > > > > fence
> > > > > > > > > based
> > > > > > > > > submissions are
> > > > > > > > >            mixed in as well, but that gives you at least a
> > > > > > > > > direction
> > > > > > > > > you could
> > > > > > > > >            potentially go.
> > > > > > > > > 
> > > > > > > > >            Regards,
> > > > > > > > >            Christian.
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > >                Regards, Joonas
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > >                    Regards,
> > > > > > > > >                    Christian.
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > >                        Some wash-up thoughts from me below,
> > > > > > > > > but
> > > > > > > > > consider them fairly irrelevant
> > > > > > > > >                        since I think the main driver for
> > > > > > > > > these
> > > > > > > > > big
> > > > > > > > > questions here should be
> > > > > > > > >                        gdb/userspace.
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > >                            Quoting Christian König (2024-11-
> > > > > > > > > 07
> > > > > > > > > 11:44:33)
> > > > > > > > > 
> > > > > > > > >                                Am 06.11.24 um 18:00 schrieb
> > > > > > > > > Matthew
> > > > > > > > > Brost:
> > > > > > > > > 
> > > > > > > > >                                      [SNIP]
> > > > > > > > > 
> > > > > > > > >                                      This is not a generic
> > > > > > > > > interface
> > > > > > > > > that anyone can freely access. The same
> > > > > > > > >                                      permissions used by
> > > > > > > > > ptrace
> > > > > > > > > are
> > > > > > > > > checked when opening such an interface.
> > > > > > > > >                                      See [1] [2].
> > > > > > > > > 
> > > > > > > > > [1]
> > > > > > > > > https://patchwork.freedesktop.org/patch/617470/?series=136572&r
> > > > > > > > > e
> > > > > > > > > v=2
> > > > > > > > > [2]
> > > > > > > > > https://patchwork.freedesktop.org/patch/617471/?series=136572&r
> > > > > > > > > e
> > > > > > > > > v=2
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > >                                Thanks a lot for those
> > > > > > > > > pointers,
> > > > > > > > > that
> > > > > > > > > is exactly what I was looking for.
> > > > > > > > > 
> > > > > > > > >                                And yeah, it is what I
> > > > > > > > > feared. You
> > > > > > > > > are
> > > > > > > > > re-implementing existing functionality,
> > > > > > > > >                                but see below.
> > > > > > > > > 
> > > > > > > > >                            Could you elaborate on what this
> > > > > > > > > "existing
> > > > > > > > > functionality" exactly is?
> > > > > > > > >                            I do not think this functionality
> > > > > > > > > exists at
> > > > > > > > > this time.
> > > > > > > > > 
> > > > > > > > >                            The EU debugging architecture for
> > > > > > > > > Xe
> > > > > > > > > specifically avoids the need for GDB
> > > > > > > > >                            to attach with ptrace to the CPU
> > > > > > > > > process or
> > > > > > > > > interfere with the CPU process for
> > > > > > > > >                            the debugging via parasitic
> > > > > > > > > threads or
> > > > > > > > > so.
> > > > > > > > > 
> > > > > > > > >                            Debugger connection is opened to
> > > > > > > > > the
> > > > > > > > > DRM
> > > > > > > > > driver for given PID (which uses the
> > > > > > > > >                            ptrace may access check for now)
> > > > > > > > > after
> > > > > > > > > > which all DRM clients of that
> > > > > > > > >                            PID are exposed to the debugger
> > > > > > > > > process.
> > > > > > > > > 
> > > > > > > > >                            What we want to expose via that
> > > > > > > > > debugger
> > > > > > > > > connection is the ability for GDB to
> > > > > > > > >                            read/write the different GPU VM
> > > > > > > > > address
> > > > > > > > > spaces (ppGTT for Intel GPUs) just like
> > > > > > > > >                            the EU threads would see them.
> > > > > > > > > Note
> > > > > > > > > that
> > > > > > > > > the layout of the ppGTT is
> > > > > > > > >                            completely up to the userspace
> > > > > > > > > driver
> > > > > > > > > to
> > > > > > > > > setup and is mostly only partially
> > > > > > > > >                            equal to the CPU address space.
> > > > > > > > > 
> > > > > > > > >                            Specifically as part of
> > > > > > > > > reading/writing the
> > > > > > > > > ppGTT for debugging purposes,
> > > > > > > > >                            there are deep flushes needed:
> > > > > > > > > for
> > > > > > > > > example
> > > > > > > > > flushing instruction cache
> > > > > > > > >                            when adding/removing breakpoints.
> > > > > > > > > 
> > > > > > > > >                            Maybe that will explain the
> > > > > > > > > background. I
> > > > > > > > > elaborate on this at the end some more.
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > >                                              kmap/vmap are
> > > > > > > > > used
> > > > > > > > > everywhere in the DRM subsystem to access BOs, so I’m
> > > > > > > > >                                              failing to see
> > > > > > > > > the
> > > > > > > > > problem with adding a simple helper based on existing
> > > > > > > > >                                              code.
> > > > > > > > > 
> > > > > > > > >                                          What#s possible and
> > > > > > > > > often
> > > > > > > > > done is to do kmap/vmap if you need to implement a
> > > > > > > > >                                          CPU copy for
> > > > > > > > > scanout for
> > > > > > > > > example or for copying/validating command buffers.
> > > > > > > > >                                          But that usually
> > > > > > > > > requires
> > > > > > > > > accessing the whole BO and has separate security
> > > > > > > > >                                          checks.
> > > > > > > > > 
> > > > > > > > >                                          When you want to
> > > > > > > > > access
> > > > > > > > > only
> > > > > > > > > a few bytes of a BO that sounds massively like
> > > > > > > > >                                          a peek/poke like
> > > > > > > > > interface
> > > > > > > > > and we have already rejected that more than once.
> > > > > > > > >                                          There even used to
> > > > > > > > > be
> > > > > > > > > standardized GEM IOCTLs for that which have been
> > > > > > > > >                                          removed by now.
> > > > > > > > > 
> > > > > > > > >                            Referring to the explanation at
> > > > > > > > > top:
> > > > > > > > > These
> > > > > > > > > IOCTL are not for the debugging target
> > > > > > > > >                            process to issue. The peek/poke
> > > > > > > > > interface
> > > > > > > > > is specifically for GDB only
> > > > > > > > >                            to facilitate the emulation of
> > > > > > > > > memory
> > > > > > > > > reads/writes on the GPU address
> > > > > > > > >                            space as they were done by EUs
> > > > > > > > > themselves.
> > > > > > > > > And to recap: for modifying
> > > > > > > > >                            instructions for example
> > > > > > > > > (add/remove
> > > > > > > > > breakpoint), extra level of cache flushing is
> > > > > > > > >                            needed which is not available to
> > > > > > > > > regular
> > > > > > > > > userspace.
> > > > > > > > > 
> > > > > > > > >                            I specifically discussed with
> > > > > > > > > Sima on
> > > > > > > > > the
> > > > > > > > > difference before moving forward with this
> > > > > > > > >                            design originally. If something
> > > > > > > > > has
> > > > > > > > > changed
> > > > > > > > > since then, I'm of course happy to rediscuss.
> > > > > > > > > 
> > > > > > > > >                            However, if this code can't be
> > > > > > > > > added,
> > > > > > > > > not
> > > > > > > > > sure how we would ever be able
> > > > > > > > >                            to implement core dumps for GPU
> > > > > > > > > threads/memory?
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > >                                          If you need to
> > > > > > > > > access
> > > > > > > > > BOs
> > > > > > > > > which are placed in not CPU accessible memory then
> > > > > > > > >                                          implement the
> > > > > > > > > access
> > > > > > > > > callback
> > > > > > > > > for ptrace, see amdgpu_ttm_access_memory for
> > > > > > > > >                                          an example how to
> > > > > > > > > do
> > > > > > > > > this.
> > > > > > > > > 
> > > > > > > > >                            As also mentioned above, we don't
> > > > > > > > > work
> > > > > > > > > via
> > > > > > > > > ptrace at all when it comes
> > > > > > > > >                            to debugging the EUs. The only
> > > > > > > > > thing
> > > > > > > > > used
> > > > > > > > > for now is the ptrace_may_access to
> > > > > > > > >                            implement similar access
> > > > > > > > > restrictions
> > > > > > > > > as
> > > > > > > > > ptrace has. This can be changed
> > > > > > > > >                            to something else if needed.
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > >                                      Ptrace access via
> > > > > > > > > vm_operations_struct.access → ttm_bo_vm_access.
> > > > > > > > > 
> > > > > > > > >                                      This series renames
> > > > > > > > > ttm_bo_vm_access to ttm_bo_access, with no code changes.
> > > > > > > > > 
> > > > > > > > >                                      The above function
> > > > > > > > > accesses
> > > > > > > > > a BO
> > > > > > > > > via kmap if it is in SYSTEM / TT,
> > > > > > > > >                                      which is existing code.
> > > > > > > > > 
> > > > > > > > >                                      This function is only
> > > > > > > > > exposed to
> > > > > > > > > user space via ptrace permissions.
> > > > > > > > > 
> > > > > > > > >                            Maybe this sentence is what
> > > > > > > > > caused the
> > > > > > > > > confusion.
> > > > > > > > > 
> > > > > > > > >                            Userspace is never exposed with
> > > > > > > > > peek/poke
> > > > > > > > > interface, only the debugger
> > > > > > > > >                            connection which is its own FD.
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > >                                      In this series, we
> > > > > > > > > implement
> > > > > > > > > a
> > > > > > > > > function [3] similar to
> > > > > > > > > amdgpu_ttm_access_memory for
> > > > > > > > > the
> > > > > > > > > TTM vfunc access_memory. What is
> > > > > > > > >                                      missing is non-visible
> > > > > > > > > CPU
> > > > > > > > > memory
> > > > > > > > > access, similar to
> > > > > > > > > amdgpu_ttm_access_memory_sdma.
> > > > > > > > > This will be addressed in a follow-up and
> > > > > > > > >                                      was omitted in this
> > > > > > > > > series
> > > > > > > > > given
> > > > > > > > > its complexity.
> > > > > > > > > 
> > > > > > > > >                                      So, this looks more or
> > > > > > > > > less
> > > > > > > > > identical to AMD's ptrace implementation,
> > > > > > > > >                                      but in GPU address
> > > > > > > > > space.
> > > > > > > > > Again,
> > > > > > > > > I fail to see what the problem is here.
> > > > > > > > >                                      What am I missing?
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > >                                The main question is why
> > > > > > > > > can't you
> > > > > > > > > use
> > > > > > > > > the existing interfaces directly?
> > > > > > > > > 
> > > > > > > > >                            We're not working on the CPU
> > > > > > > > > address
> > > > > > > > > space
> > > > > > > > > or BOs. We're working
> > > > > > > > >                            strictly on the GPU address space
> > > > > > > > > as
> > > > > > > > > would
> > > > > > > > > be seen by an EU thread if it
> > > > > > > > >                            accessed address X.
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > >                                Additional to the peek/poke
> > > > > > > > > interface
> > > > > > > > > of ptrace Linux has the pidfd_getfd
> > > > > > > > >                                system call, see
> > > > > > > > > here
> > > > > > > > > https://man7.org/linux/man-pages/man2/pidfd_getfd.2.html.
> > > > > > > > > 
> > > > > > > > >                                The pidfd_getfd() allows to
> > > > > > > > > dup()
> > > > > > > > > the
> > > > > > > > > render node file descriptor into your gdb
> > > > > > > > >                                process. That in turn gives
> > > > > > > > > you
> > > > > > > > > all the
> > > > > > > > > access you need from gdb, including
> > > > > > > > >                                mapping BOs and command
> > > > > > > > > submission
> > > > > > > > > on
> > > > > > > > > behalf of the application.
> > > > > > > > > 
> > > > > > > > >                            We're not operating on the CPU
> > > > > > > > > address
> > > > > > > > > space nor are we operating on BOs
> > > > > > > > >                            (there is no concept of BO in the
> > > > > > > > > EU
> > > > > > > > > debug
> > > > > > > > > interface). Each VMA in the VM
> > > > > > > > >                            could come from anywhere, only
> > > > > > > > > the
> > > > > > > > > start
> > > > > > > > > address and size matter. And
> > > > > > > > >                            neither do we need to interfere
> > > > > > > > > with
> > > > > > > > > the
> > > > > > > > > command submission of the
> > > > > > > > >                            process under debug.
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > >                                As far as I can see that
> > > > > > > > > allows
> > > > > > > > > for the
> > > > > > > > > same functionality as the eudebug
> > > > > > > > >                                interface, just without any
> > > > > > > > > driver
> > > > > > > > > specific code messing with ptrace
> > > > > > > > >                                permissions and peek/poke
> > > > > > > > > interfaces.
> > > > > > > > > 
> > > > > > > > >                                So the question is still why
> > > > > > > > > do
> > > > > > > > > you
> > > > > > > > > need the whole eudebug interface in the
> > > > > > > > >                                first place? I might be
> > > > > > > > > missing
> > > > > > > > > something, but that seems to be superfluous
> > > > > > > > >                                from a high level view.
> > > > > > > > > 
> > > > > > > > >                            Recapping from above. It is to
> > > > > > > > > allow
> > > > > > > > > the
> > > > > > > > > debugging of EU threads per DRM
> > > > > > > > >                            client, completely independent of
> > > > > > > > > the
> > > > > > > > > CPU
> > > > > > > > > > process. If ptrace_may_access
> > > > > > > > >                            is the sore point, we could
> > > > > > > > > consider
> > > > > > > > > other
> > > > > > > > > permission checks, too. There
> > > > > > > > >                            is no other connection to ptrace
> > > > > > > > > in
> > > > > > > > > this
> > > > > > > > > architecture as single
> > > > > > > > >                            permission check to know if PID
> > > > > > > > > is
> > > > > > > > > fair
> > > > > > > > > game to access by debugger
> > > > > > > > >                            process.
> > > > > > > > > 
> > > > > > > > >                            Why no parasitic thread or
> > > > > > > > > ptrace:
> > > > > > > > > Going
> > > > > > > > > forward, binding the EU debugging to
> > > > > > > > >                            the DRM client would also pave
> > > > > > > > > way for
> > > > > > > > > being able to extend core kernel generated
> > > > > > > > >                            core dump with each DRM client's
> > > > > > > > > EU
> > > > > > > > > thread/memory dump. We have similar
> > > > > > > > >                            feature called "Offline core
> > > > > > > > > dump"
> > > > > > > > > enabled
> > > > > > > > > in the downstream public
> > > > > > > > >                            trees for i915, where we
> > > > > > > > > currently
> > > > > > > > > attach
> > > > > > > > > the EU thread dump to i915 error state
> > > > > > > > >                            and then later combine i915 error
> > > > > > > > > state
> > > > > > > > > with CPU core dump file with a
> > > > > > > > >                            tool.
> > > > > > > > > 
> > > > > > > > >                            This is relatively little amount
> > > > > > > > > of
> > > > > > > > > extra
> > > > > > > > > code, as this baseline series
> > > > > > > > >                            already introduces GDB the
> > > > > > > > > ability to
> > > > > > > > > perform the necessary actions.
> > > > > > > > >                            It's just the matter of kernel
> > > > > > > > > driver
> > > > > > > > > calling: "stop all threads", then
> > > > > > > > >                            copying the memory map and memory
> > > > > > > > > contents
> > > > > > > > > for GPU threads, just like is
> > > > > > > > >                            done for CPU threads.
> > > > > > > > > 
> > > > > > > > >                            With parasitic thread injection,
> > > > > > > > > not
> > > > > > > > > sure
> > > > > > > > > if there is such way forward,
> > > > > > > > >                            as it would seem to require to
> > > > > > > > > inject
> > > > > > > > > quite
> > > > > > > > > > a bit more logic to core kernel?
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > >                                It's true that the AMD KFD
> > > > > > > > > part
> > > > > > > > > has
> > > > > > > > > still similar functionality, but that is
> > > > > > > > >                                because of the broken KFD
> > > > > > > > > design
> > > > > > > > > of
> > > > > > > > > tying driver state to the CPU process
> > > > > > > > >                                (which makes it inaccessible
> > > > > > > > > for
> > > > > > > > > gdb
> > > > > > > > > even with imported render node fd).
> > > > > > > > > 
> > > > > > > > >                                Both Sima and I (and
> > > > > > > > > partially
> > > > > > > > > Dave as
> > > > > > > > > well) have pushed back on the KFD
> > > > > > > > >                                approach. And the long term
> > > > > > > > > plan
> > > > > > > > > is to
> > > > > > > > > get rid of such device driver specific
> > > > > > > > >                                interface which re-implement
> > > > > > > > > existing
> > > > > > > > > functionality just differently.
> > > > > > > > > 
> > > > > > > > >                            Recapping, this series is not
> > > > > > > > > adding
> > > > > > > > > it
> > > > > > > > > back. The debugger connection
> > > > > > > > >                            is a separate FD from the DRM
> > > > > > > > > one,
> > > > > > > > > with
> > > > > > > > > separate IOCTL set. We don't allow
> > > > > > > > >                            the DRM FD any new operations
> > > > > > > > > based on
> > > > > > > > > ptrace is attached or not. We
> > > > > > > > >                            don't ever do that check even.
> > > > > > > > > 
> > > > > > > > >                            We only restrict the opening of
> > > > > > > > > the
> > > > > > > > > debugger connection to given PID with
> > > > > > > > >                            ptrace_may_access check for now.
> > > > > > > > > That
> > > > > > > > > can
> > > > > > > > > be changed to something else,
> > > > > > > > >                            if necessary.
> > > > > > > > > 
> > > > > > > > >                        Yeah I think unnecessarily tying gpu
> > > > > > > > > processes
> > > > > > > > > to cpu processes is a bad
> > > > > > > > >                        thing, least because even today all
> > > > > > > > > the
> > > > > > > > > svm
> > > > > > > > > discussions we have still hit
> > > > > > > > >                        clear use-cases, where a 1:1 match is
> > > > > > > > > not
> > > > > > > > > wanted (like multiple gpu svm
> > > > > > > > >                        sections with offsets). Not even
> > > > > > > > > speaking
> > > > > > > > > of
> > > > > > > > > all the gpu usecases where
> > > > > > > > >                        the gpu vm space is still entirely
> > > > > > > > > independent
> > > > > > > > > of the cpu side.
> > > > > > > > > 
> > > > > > > > >                        So that's why I think this entirely
> > > > > > > > > separate
> > > > > > > > > approach looks like the right
> > > > > > > > >                        one, with ptrace_may_access as the
> > > > > > > > > access
> > > > > > > > > control check to make sure we
> > > > > > > > >                        match ptrace on the cpu side.
> > > > > > > > > 
> > > > > > > > >                        But there's very obviously a bikeshed
> > > > > > > > > to
> > > > > > > > > be had
> > > > > > > > > on what the actual uapi
> > > > > > > > >                        should look like, especially how gdb
> > > > > > > > > opens
> > > > > > > > > up a
> > > > > > > > > gpu debug access fd. But I
> > > > > > > > >                        also think that's not much on drm to
> > > > > > > > > decide,
> > > > > > > > > but whatever gdb wants. And
> > > > > > > > >                        then we aim for some consistency on
> > > > > > > > > that
> > > > > > > > > lookup/access control part
> > > > > > > > >                        (ideally, I might be missing some
> > > > > > > > > reasons
> > > > > > > > > why
> > > > > > > > > this is a bad idea) across
> > > > > > > > >                        drm drivers.
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > >                                So you need to have a really
> > > > > > > > > really
> > > > > > > > > good explanation why the eudebug interface
> > > > > > > > >                                is actually necessary.
> > > > > > > > > 
> > > > > > > > >                            TL;DR The main point is to
> > > > > > > > > decouple
> > > > > > > > > the
> > > > > > > > > debugging of the EU workloads from the
> > > > > > > > >                            debugging of the CPU process.
> > > > > > > > > This
> > > > > > > > > avoids
> > > > > > > > > the interference with the CPU process with
> > > > > > > > >                            parasitic thread injection.
> > > > > > > > > Further
> > > > > > > > > this
> > > > > > > > > also allows generating a core dump
> > > > > > > > >                            without any GDB connected. There
> > > > > > > > > are
> > > > > > > > > also
> > > > > > > > > many other smaller pros/cons
> > > > > > > > >                            which can be discussed but for
> > > > > > > > > the
> > > > > > > > > context
> > > > > > > > > of this patch, this is the
> > > > > > > > >                            main one.
> > > > > > > > > 
> > > > > > > > >                            So unlike parasitic thread
> > > > > > > > > injection,
> > > > > > > > > we
> > > > > > > > > don't unlock any special IOCTL for
> > > > > > > > >                            the process under debug to be
> > > > > > > > > performed by
> > > > > > > > > the parasitic thread, but we
> > > > > > > > >                            allow the minimal set of
> > > > > > > > > operations to
> > > > > > > > > be
> > > > > > > > > performed by GDB as if those were
> > > > > > > > >                            done on the EUs themselves.
> > > > > > > > > 
> > > > > > > > >                            One can think of it like the
> > > > > > > > > minimal
> > > > > > > > > subset
> > > > > > > > > of ptrace but for EU threads,
> > > > > > > > >                            not the CPU threads. And thus,
> > > > > > > > > building on
> > > > > > > > > this it's possible to extend
> > > > > > > > >                            the core kernel generated core
> > > > > > > > > dumps
> > > > > > > > > with
> > > > > > > > > DRM specific extension which
> > > > > > > > >                            would contain the EU
> > > > > > > > > thread/memory
> > > > > > > > > dump.
> > > > > > > > > 
> > > > > > > > >                        It might be good to document (in that
> > > > > > > > > debugging
> > > > > > > > > doc patch probably) why
> > > > > > > > >                        thread injection is not a great
> > > > > > > > > option,
> > > > > > > > > and why
> > > > > > > > > the tradeoffs for
> > > > > > > > > >                        debugging are different than for
> > > > > > > > > checkpoint/restore, where with CRIU
> > > > > > > > >                        we landed on doing most of this in
> > > > > > > > > userspace,
> > > > > > > > > and often requiring
> > > > > > > > >                        injection threads to make it all
> > > > > > > > > work.
> > > > > > > > > 
> > > > > > > > >                        Cheers, Sima
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > >                            Regards, Joonas
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > >                                Regards,
> > > > > > > > >                                Christian.
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > >                                      Matt
> > > > > > > > > 
> > > > > > > > > [3]
> > > > > > > > > https://patchwork.freedesktop.org/patch/622520/?series=140200&r
> > > > > > > > > e
> > > > > > > > > v=6
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > >                                          Regards,
> > > > > > > > >                                          Christian.
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > >                                              Matt
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > >                                                  Regards,
> > > > > > > > >                                                  Christian.
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > 

* Re: [PATCH v6 2/8] drm/ttm: Add ttm_bo_access
  2024-11-25 15:29                                                       ` Matthew Brost
@ 2024-11-25 16:19                                                         ` Christian König
  2024-11-25 17:27                                                           ` Matthew Brost
  0 siblings, 1 reply; 56+ messages in thread
From: Christian König @ 2024-11-25 16:19 UTC (permalink / raw)
  To: Matthew Brost
  Cc: Thomas Hellström, Joonas Lahtinen, Christian König,
	Simona Vetter, Rodrigo Vivi, Huang Rui, intel-xe, dri-devel,
	matthew.auld, David Airlie, Simona Vetter

Am 25.11.24 um 16:29 schrieb Matthew Brost:
> On Fri, Nov 15, 2024 at 10:27:59AM -0800, Matthew Brost wrote:
>> [SNIP]
>> We use this interface to read a BO marked with a dumpable flag during a
>> GPU hang in our error capture code. This is an internal KMD feature, not
>> directly exposed to user space. Would adding this helper be acceptable
>> for this use case? I can add kernel doc indicating the current restrictions
>> of the helper (do not directly expose to user space) too if that would
>> help.
>>
> Christian - ping on above.

Sorry, I will try to give those mailing list tasks a bit more time
before the xmas holidays.

That is an acceptable use case, but the problem is that this helper 
won't work for that.

See during a GPU hang you can't lock BOs, so how do you want to look 
into their content with the peek helper?

The only thing you could potentially do is to trylock the BO and then 
dump, but that would most likely be a bit unreliable.

Regards,
Christian.

>> Matt

* Re: [PATCH v6 2/8] drm/ttm: Add ttm_bo_access
  2024-11-25 16:19                                                         ` Christian König
@ 2024-11-25 17:27                                                           ` Matthew Brost
  2024-11-26  8:19                                                             ` Christian König
  0 siblings, 1 reply; 56+ messages in thread
From: Matthew Brost @ 2024-11-25 17:27 UTC (permalink / raw)
  To: Christian König
  Cc: Thomas Hellström, Joonas Lahtinen, Christian König,
	Simona Vetter, Rodrigo Vivi, Huang Rui, intel-xe, dri-devel,
	matthew.auld, David Airlie, Simona Vetter

On Mon, Nov 25, 2024 at 05:19:54PM +0100, Christian König wrote:
> Am 25.11.24 um 16:29 schrieb Matthew Brost:
> > On Fri, Nov 15, 2024 at 10:27:59AM -0800, Matthew Brost wrote:
> > > [SNIP]
> > > We use this interface to read a BO marked with a dumpable flag during a
> > > GPU hang in our error capture code. This is an internal KMD feature, not
> > > directly exposed to user space. Would adding this helper be acceptable
> > > for this use case? I can add kernel indicating the current restrictions
> > > of the helper (do not directly expose to user space) too if that would
> > > help.
> > > 
> > Christian - ping on above.
> 
> Sorry, I will try to give those mailing list tasks a bit more time in before
> the xmas holidays.
> 
> That is an acceptable use case, but the problem is that this helper won't
> work for that.
> 
> See during a GPU hang you can't lock BOs, so how do you want to look into
> their content with the peek helper?
> 

Agreed, we cannot lock a BO directly in the GPU hang path (TDR). Our
error capture code takes a snapshot of some of the GPU state, which is
small and safe to capture in TDR, and kicks a worker which
opportunistically captures the VM state that has been marked for
capture. The helper is called from that worker, where it is safe to lock
the BO.
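
As a rough illustration of that split (a user-space sketch only, not the
actual Xe or TTM code), the example below uses pthreads in place of
kernel locking and workqueues. Every name in it (fake_bo, fake_snapshot,
hang_path_capture, delayed_capture_worker, run_capture_demo) is
hypothetical. The hang path records only small state and shows why a
trylock there can fail, while the deferred worker is free to block on
the lock before copying the contents.

```c
#include <errno.h>
#include <pthread.h>
#include <string.h>

/* Hypothetical stand-in for a buffer object; not a real TTM/Xe type. */
struct fake_bo {
	pthread_mutex_t lock;
	char data[16];
};

/* Hypothetical snapshot: small hang-path state plus a deferred copy. */
struct fake_snapshot {
	int hw_status;    /* small state, captured without the BO lock */
	int trylock_busy; /* 1 if a trylock in the hang path failed */
	int captured;     /* 1 once the worker has copied the BO contents */
	struct fake_bo *bo;
	char copy[16];
};

/* Hang path: never blocks on the BO lock, only records small state. */
static void hang_path_capture(struct fake_snapshot *s, struct fake_bo *bo,
			      int status)
{
	int ret;

	memset(s, 0, sizeof(*s));
	s->hw_status = status;
	s->bo = bo;

	/* A trylock here fails whenever someone else holds the BO. */
	ret = pthread_mutex_trylock(&bo->lock);
	if (ret == 0)
		pthread_mutex_unlock(&bo->lock);
	else
		s->trylock_busy = 1;
}

/* Deferred worker: runs outside the hang path, so it may block. */
static void *delayed_capture_worker(void *arg)
{
	struct fake_snapshot *s = arg;

	pthread_mutex_lock(&s->bo->lock);
	memcpy(s->copy, s->bo->data, sizeof(s->copy));
	pthread_mutex_unlock(&s->bo->lock);
	s->captured = 1;
	return NULL;
}

/* Runs the whole sequence once and returns the resulting snapshot. */
struct fake_snapshot run_capture_demo(void)
{
	struct fake_bo bo;
	struct fake_snapshot snap;
	pthread_t worker;

	pthread_mutex_init(&bo.lock, NULL);
	memset(bo.data, 0, sizeof(bo.data));
	strcpy(bo.data, "vm-state");

	/* Simulate another user holding the BO while the hang is handled. */
	pthread_mutex_lock(&bo.lock);
	hang_path_capture(&snap, &bo, 1);
	pthread_mutex_unlock(&bo.lock);

	/* Later, the worker takes the lock normally and copies the data. */
	if (pthread_create(&worker, NULL, delayed_capture_worker, &snap) == 0)
		pthread_join(worker, NULL);

	snap.bo = NULL; /* bo does not outlive this function */
	pthread_mutex_destroy(&bo.lock);
	return snap;
}
```

In this sketch the trylock in the hang path reports busy because the
lock is held at that moment, which is the unreliability mentioned for a
trylock-and-dump approach; the worker's blocking lock does not have that
problem because it runs later, outside the hang path.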

Matt

> The only thing you could potentially do is to trylock the BO and then dump,
> but that would most likely be a bit unreliable.
> 
> Regards,
> Christian.
> 
> > > Matt


* Re: [PATCH v6 2/8] drm/ttm: Add ttm_bo_access
  2024-11-25 17:27                                                           ` Matthew Brost
@ 2024-11-26  8:19                                                             ` Christian König
  2024-11-26 17:49                                                               ` Matthew Brost
  0 siblings, 1 reply; 56+ messages in thread
From: Christian König @ 2024-11-26  8:19 UTC (permalink / raw)
  To: Matthew Brost, Christian König
  Cc: Thomas Hellström, Joonas Lahtinen, Simona Vetter,
	Rodrigo Vivi, Huang Rui, intel-xe, dri-devel, matthew.auld,
	David Airlie, Simona Vetter

Am 25.11.24 um 18:27 schrieb Matthew Brost:
> On Mon, Nov 25, 2024 at 05:19:54PM +0100, Christian König wrote:
>> Am 25.11.24 um 16:29 schrieb Matthew Brost:
>>> On Fri, Nov 15, 2024 at 10:27:59AM -0800, Matthew Brost wrote:
>>>> [SNIP]
>>>> We use this interface to read a BO marked with a dumpable flag during a
>>>> GPU hang in our error capture code. This is an internal KMD feature, not
>>>> directly exposed to user space. Would adding this helper be acceptable
>>>> for this use case? I can add kernel indicating the current restrictions
>>>> of the helper (do not directly expose to user space) too if that would
>>>> help.
>>>>
>>> Christian - ping on above.
>> Sorry, I will try to give those mailing list tasks a bit more time in before
>> the xmas holidays.
>>
>> That is an acceptable use case, but the problem is that this helper won't
>> work for that.
>>
>> See during a GPU hang you can't lock BOs, so how do you want to look into
>> their content with the peek helper?
>>
> Agree we cannot lock BO directly in GPU hang path (TDR). Our error
> capture code takes a snapshot of some the GPU state which is small and
> safe to capture in TDR and kicks a worker which opportunistically
> captures the VM state which has been marked to be captured. This is
> where the helper is called and it is safe to lock the BO.

Yeah that sounds like it should work.

No objections from my side for that use case, but I would rather 
keep the code inside ttm_bo_vm.c.

Crash dumping is usually something associated with the VMA even if it's 
a bit special here for the VM state.

Regards,
Christian.

>
> Matt
>
>> The only thing you could potentially do is to trylock the BO and then dump,
>> but that would most likely be a bit unreliable.
>>
>> Regards,
>> Christian.
>>
>>>> Matt



* Re: [PATCH v6 2/8] drm/ttm: Add ttm_bo_access
  2024-11-26  8:19                                                             ` Christian König
@ 2024-11-26 17:49                                                               ` Matthew Brost
  2024-11-27 13:21                                                                 ` Christian König
  0 siblings, 1 reply; 56+ messages in thread
From: Matthew Brost @ 2024-11-26 17:49 UTC (permalink / raw)
  To: Christian König
  Cc: Christian König, Thomas Hellström, Joonas Lahtinen,
	Simona Vetter, Rodrigo Vivi, Huang Rui, intel-xe, dri-devel,
	matthew.auld, David Airlie, Simona Vetter

On Tue, Nov 26, 2024 at 09:19:47AM +0100, Christian König wrote:
> Am 25.11.24 um 18:27 schrieb Matthew Brost:
> > On Mon, Nov 25, 2024 at 05:19:54PM +0100, Christian König wrote:
> > > Am 25.11.24 um 16:29 schrieb Matthew Brost:
> > > > On Fri, Nov 15, 2024 at 10:27:59AM -0800, Matthew Brost wrote:
> > > > > [SNIP]
> > > > > We use this interface to read a BO marked with a dumpable flag during a
> > > > > GPU hang in our error capture code. This is an internal KMD feature, not
> > > > > directly exposed to user space. Would adding this helper be acceptable
> > > > > for this use case? I can add kernel indicating the current restrictions
> > > > > of the helper (do not directly expose to user space) too if that would
> > > > > help.
> > > > > 
> > > > Christian - ping on above.
> > > Sorry, I will try to give those mailing list tasks a bit more time in before
> > > the xmas holidays.
> > > 
> > > That is an acceptable use case, but the problem is that this helper won't
> > > work for that.
> > > 
> > > See during a GPU hang you can't lock BOs, so how do you want to look into
> > > their content with the peek helper?
> > > 
> > Agree we cannot lock BO directly in GPU hang path (TDR). Our error
> > capture code takes a snapshot of some the GPU state which is small and
> > safe to capture in TDR and kicks a worker which opportunistically
> > captures the VM state which has been marked to be captured. This is
> > where the helper is called and it is safe to lock the BO.
> 
> Yeah that sounds like it should work.
> 
> No objections from my side for that use case, but I would rather like to
> keep the code inside ttm_bo_vm.c.
>

Thanks, reposted with the code inside ttm_bo_vm.c. Any objection to
merging the entire series through drm-xe-next and then backporting the
single TTM patch to drm-misc-next?

Matt

> Crash dumping is usually something associated with the VMA even if it's a
> bit special here for the VM state.
> 
> Regards,
> Christian.
> 
> > 
> > Matt
> > 
> > > The only thing you could potentially do is to trylock the BO and then dump,
> > > but that would most likely be a bit unreliable.
> > > 
> > > Regards,
> > > Christian.
> > > 
> > > > > Matt
> 


* Re: [PATCH v6 2/8] drm/ttm: Add ttm_bo_access
  2024-11-26 17:49                                                               ` Matthew Brost
@ 2024-11-27 13:21                                                                 ` Christian König
  0 siblings, 0 replies; 56+ messages in thread
From: Christian König @ 2024-11-27 13:21 UTC (permalink / raw)
  To: Matthew Brost, Christian König
  Cc: Thomas Hellström, Joonas Lahtinen, Simona Vetter,
	Rodrigo Vivi, Huang Rui, intel-xe, dri-devel, matthew.auld,
	David Airlie, Simona Vetter

Am 26.11.24 um 18:49 schrieb Matthew Brost:
> On Tue, Nov 26, 2024 at 09:19:47AM +0100, Christian König wrote:
>> Am 25.11.24 um 18:27 schrieb Matthew Brost:
>>> On Mon, Nov 25, 2024 at 05:19:54PM +0100, Christian König wrote:
>>>> Am 25.11.24 um 16:29 schrieb Matthew Brost:
>>>>> On Fri, Nov 15, 2024 at 10:27:59AM -0800, Matthew Brost wrote:
>>>>>> [SNIP]
>>>>>> We use this interface to read a BO marked with a dumpable flag during a
>>>>>> GPU hang in our error capture code. This is an internal KMD feature, not
>>>>>> directly exposed to user space. Would adding this helper be acceptable
>>>>>> for this use case? I can add kernel indicating the current restrictions
>>>>>> of the helper (do not directly expose to user space) too if that would
>>>>>> help.
>>>>>>
>>>>> Christian - ping on above.
>>>> Sorry, I will try to give those mailing list tasks a bit more time in before
>>>> the xmas holidays.
>>>>
>>>> That is an acceptable use case, but the problem is that this helper won't
>>>> work for that.
>>>>
>>>> See during a GPU hang you can't lock BOs, so how do you want to look into
>>>> their content with the peek helper?
>>>>
>>> Agree we cannot lock BO directly in GPU hang path (TDR). Our error
>>> capture code takes a snapshot of some the GPU state which is small and
>>> safe to capture in TDR and kicks a worker which opportunistically
>>> captures the VM state which has been marked to be captured. This is
>>> where the helper is called and it is safe to lock the BO.
>> Yeah that sounds like it should work.
>>
>> No objections from my side for that use case, but I would rather like to
>> keep the code inside ttm_bo_vm.c.
>>
> Thanks, reposted with code inside ttm_bo_vm.c. Any objection to merging
> entire series through drm-xe-next and then backporting single TTM patch
> drm-misc-next?

No need for a backport as long as nobody in drm-misc-next depends on that.

As far as I can see the change is small enough to not cause any 
conflicts, so merging through drm-xe-next is fine with me.

Christian.

>
> Matt
>
>> Crash dumping is usually something associated with the VMA even if it's a
>> bit special here for the VM state.
>>
>> Regards,
>> Christian.
>>
>>> Matt
>>>
>>>> The only thing you could potentially do is to trylock the BO and then dump,
>>>> but that would most likely be a bit unreliable.
>>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>>>> Matt



end of thread, other threads:[~2024-11-27 13:21 UTC | newest]

Thread overview: 56+ messages
2024-10-31 18:10 [PATCH v6 0/8] Fix non-contiguous VRAM BO access in Xe Matthew Brost
2024-10-31 18:10 ` [PATCH v6 1/8] drm/xe: Add xe_bo_vm_access Matthew Brost
2024-10-31 18:10 ` [PATCH v6 2/8] drm/ttm: Add ttm_bo_access Matthew Brost
2024-10-31 23:43   ` Matthew Brost
2024-11-04 17:34     ` Rodrigo Vivi
2024-11-04 19:28       ` Christian König
2024-11-04 21:49         ` Matthew Brost
2024-11-05  7:41           ` Christian König
2024-11-05 18:35             ` Matthew Brost
2024-11-06  9:48               ` Christian König
2024-11-06 15:25                 ` Matthew Brost
2024-11-06 15:44                   ` Christian König
2024-11-06 17:00                     ` Matthew Brost
2024-11-07  9:44                       ` Christian König
2024-11-11  8:00                         ` Joonas Lahtinen
2024-11-11 10:10                           ` Simona Vetter
2024-11-11 11:34                             ` Christian König
2024-11-11 14:00                               ` Joonas Lahtinen
2024-11-11 15:54                                 ` Christian König
2024-11-11 22:45                                   ` Matthew Brost
2024-11-12  9:23                                     ` Christian König
2024-11-12 13:41                                       ` Joonas Lahtinen
2024-11-12 16:22                                         ` Thomas Hellström
2024-11-12 16:25                                           ` Christian König
2024-11-12 16:33                                             ` Thomas Hellström
2024-11-13  8:37                                               ` Christian König
2024-11-13 10:44                                                 ` Thomas Hellström
2024-11-13 11:42                                                   ` Christian König
2024-11-15 18:27                                                     ` Matthew Brost
2024-11-25 15:29                                                       ` Matthew Brost
2024-11-25 16:19                                                         ` Christian König
2024-11-25 17:27                                                           ` Matthew Brost
2024-11-26  8:19                                                             ` Christian König
2024-11-26 17:49                                                               ` Matthew Brost
2024-11-27 13:21                                                                 ` Christian König
2024-11-12  8:28                                 ` Simona Vetter
2024-11-12  8:58                                   ` Christian König
2024-11-12 13:30                                     ` Joonas Lahtinen
2024-11-11 11:27                           ` Christian König
2024-11-04 19:47     ` Christian König
2024-11-04 21:30       ` Matthew Brost
2024-11-04 22:26         ` Rodrigo Vivi
2024-10-31 18:10 ` [PATCH v6 3/8] drm/xe: Add xe_ttm_access_memory Matthew Brost
2024-10-31 18:10 ` [PATCH v6 4/8] drm/xe: Take PM ref in delayed snapshot capture worker Matthew Brost
2024-10-31 18:10 ` [PATCH v6 5/8] drm/xe/display: Update intel_bo_read_from_page to use ttm_bo_access Matthew Brost
2024-10-31 18:10 ` [PATCH v6 6/8] drm/xe: Use ttm_bo_access in xe_vm_snapshot_capture_delayed Matthew Brost
2024-10-31 18:10 ` [PATCH v6 7/8] drm/xe: Set XE_BO_FLAG_PINNED in migrate selftest BOs Matthew Brost
2024-10-31 18:10 ` [PATCH v6 8/8] drm/xe: Only allow contiguous BOs to use xe_bo_vmap Matthew Brost
2024-10-31 18:15 ` ✓ CI.Patch_applied: success for Fix non-contiguous VRAM BO access in Xe (rev6) Patchwork
2024-10-31 18:15 ` ✗ CI.checkpatch: warning " Patchwork
2024-10-31 18:17 ` ✓ CI.KUnit: success " Patchwork
2024-10-31 18:28 ` ✓ CI.Build: " Patchwork
2024-10-31 18:31 ` ✓ CI.Hooks: " Patchwork
2024-10-31 18:32 ` ✗ CI.checksparse: warning " Patchwork
2024-10-31 18:57 ` ✓ CI.BAT: success " Patchwork
2024-10-31 21:27 ` ✗ CI.FULL: failure " Patchwork
