[PATCH V11 00/12] Add memory page offlining support

Intel-XE Archive on lore.kernel.org

* [PATCH V11 00/12] Add memory page offlining support
@ 2026-06-05 12:38 Tejas Upadhyay
  2026-06-05 12:38 ` [PATCH V11 01/12] drm/xe: Link VRAM object with gpu buddy Tejas Upadhyay
                   ` (11 more replies)
  0 siblings, 12 replies; 15+ messages in thread
From: Tejas Upadhyay @ 2026-06-05 12:38 UTC (permalink / raw)
  To: intel-xe
  Cc: matthew.auld, matthew.brost, thomas.hellstrom,
	himal.prasad.ghimiray, Tejas Upadhyay

This functionality represents a significant step in making
the xe driver gracefully handle hardware memory degradation.
By integrating with the DRM Buddy allocator, the driver
can permanently "carve out" faulty memory so it isn't reused
by subsequent allocations.

This series adds memory page offlining support with the following:
1. Link VRAM object with gpu buddy
2. Integrate lockdep for gpu buddy manager
3. Link and track TTM BOs with physical addresses
4. Link LRC BO and its execution queue
5. Extend BO purge to handle VRAM pages as well
6. Handle a reported physical address error by reserving the address's 4K page
7. Add supporting debugfs to automate injection of physical address errors
8. Add buddy block allocation dump for debugging buddy-related issues
9. Add configfs for VRAM bad page reservation policy
10. Add sysfs entry to provide statistics of bad GPU VRAM pages for user info
11. Expose ban reason in EXEC_QUEUE_GET_PROPERTY_BAN
12. Add soft/hard offline mode for VRAM page retirement

v11:
- Add BAN reason for UMD to know about offlining
- Add support for soft offline mode
v10:
- Remove RFC
v7:
- Improve debugfs warning messages
- Use scope_guard for locking(MattB)
- Adapt addition of queue member of LRC BO(MattB)
- Extend and use xe_ttm_bo_purge API for vram pages(MattB)
- Handle dma_buf_map requests for native and remote(MattB)
- Address if in never initialized block, set block to NULL
- Add lockdep in gpu buddy (MattB)
- Correct allocated_addr_to_block logic (MattA)
V6:
- Add more specific tests to noncritical bo sections
- Handle smooth exit of user created exec queues
- Break code and make purge specific static API
V5:
- Sysfs "max_pages" addition
- Reset block->private NULL post purge
- Remove wedge; return -EIO so the system controller will initiate a reset
- Add debugfs tests to trigger different test scenarios manually and via igt
- Rename addr_to_tbo to addr_to_block and move under gpu/buddy.c
V4: API reworks, add configfs for policy reservation and apply config everywhere
V3: use res_to_mem_region to avoid use of block->private (MattA)
V2:
- some fixes and clean up on errors
- Added xe_vram_addr_to_region helper to avoid other use of block->private (MattB)

Debugfs can be used to test different scenarios, for example:
echo 0 > /sys/kernel/debug/dri/bdf/invalid_addr_vram0
where the written value (0 here) selects one of the address types below:
enum mempage_offline_mode {
        MEMPAGE_OFFLINE_UNALLOCATED = 0,
        MEMPAGE_OFFLINE_USER_ALLOCATED = 1,
        MEMPAGE_OFFLINE_KERNEL_USER_GGTT_ALLOCATED = 2,
        MEMPAGE_OFFLINE_KERNEL_USER_PPGTT_ALLOCATED = 3,
        MEMPAGE_OFFLINE_KERNEL_CRITICAL_ALLOCATED = 4,
        MEMPAGE_OFFLINE_RESERVED = 5,
};
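
As a minimal user-space sketch of the injection step above: the helpers
build_inject_path() and inject_mode() are hypothetical and not part of
this series. The sketch assumes that "bdf" in the path stands for the
device's directory name under debugfs and that the file accepts the
mode as a decimal string, as the echo command suggests.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Mirrors the enum from the cover letter. */
enum mempage_offline_mode {
	MEMPAGE_OFFLINE_UNALLOCATED = 0,
	MEMPAGE_OFFLINE_USER_ALLOCATED = 1,
	MEMPAGE_OFFLINE_KERNEL_USER_GGTT_ALLOCATED = 2,
	MEMPAGE_OFFLINE_KERNEL_USER_PPGTT_ALLOCATED = 3,
	MEMPAGE_OFFLINE_KERNEL_CRITICAL_ALLOCATED = 4,
	MEMPAGE_OFFLINE_RESERVED = 5,
};

/* Hypothetical helper: format the debugfs path for one device directory. */
static int build_inject_path(char *buf, size_t len, const char *bdf)
{
	int n;

	assert(buf && bdf);
	if (strlen(bdf) == 0)
		return -1;

	n = snprintf(buf, len, "/sys/kernel/debug/dri/%s/invalid_addr_vram0", bdf);
	/* Reject formatting errors and truncated paths. */
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Hypothetical helper: write the mode like "echo N > file" would. */
static int inject_mode(const char *path, enum mempage_offline_mode mode)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;

	ok = fprintf(f, "%d\n", (int)mode) > 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

Writing to debugfs normally requires root and a mounted debugfs, so
inject_mode() is expected to fail with -1 on systems without the file.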

IGT tests that exercise the core memory side of this feature standalone:
https://patchwork.freedesktop.org/patch/714751/

Tejas Upadhyay (12):
  drm/xe: Link VRAM object with gpu buddy
  drm/buddy: Integrate lockdep annotations for gpu buddy manager
  drm/gpu: Add gpu_buddy_allocated_addr_to_block helper
  drm/xe: Link LRC BO and its execution Queue
  drm/xe: Extend BO purge to handle vram pages as well
  drm/xe: Handle physical memory address error
  drm/xe/cri: Add debugfs to inject faulty vram address
  gpu/buddy: Add routine to dump allocated buddy blocks
  drm/xe/configfs: Add vram bad page reservation policy
  drm/xe/cri: Add sysfs interface for bad gpu vram pages
  drm/xe/uapi: Expose ban reason in EXEC_QUEUE_GET_PROPERTY_BAN
  drm/xe: Add soft/hard offline mode for VRAM page retirement

 drivers/gpu/buddy.c                        | 107 +++++
 drivers/gpu/drm/drm_buddy.c                |   1 +
 drivers/gpu/drm/xe/xe_bo.c                 |  16 +-
 drivers/gpu/drm/xe/xe_bo.h                 |   5 +-
 drivers/gpu/drm/xe/xe_bo_types.h           |   3 +
 drivers/gpu/drm/xe/xe_configfs.c           |  64 ++-
 drivers/gpu/drm/xe/xe_configfs.h           |   2 +
 drivers/gpu/drm/xe/xe_debugfs.c            | 168 ++++++++
 drivers/gpu/drm/xe/xe_device_sysfs.c       |   7 +
 drivers/gpu/drm/xe/xe_dma_buf.c            |   3 +
 drivers/gpu/drm/xe/xe_exec_queue.c         |  13 +-
 drivers/gpu/drm/xe/xe_exec_queue_types.h   |   7 +-
 drivers/gpu/drm/xe/xe_execlist.c           |   4 +-
 drivers/gpu/drm/xe/xe_guc_submit.c         |  24 +-
 drivers/gpu/drm/xe/xe_lrc.c                |   1 +
 drivers/gpu/drm/xe/xe_pt.c                 |   3 +-
 drivers/gpu/drm/xe/xe_ttm_vram_mgr.c       | 442 +++++++++++++++++++--
 drivers/gpu/drm/xe/xe_ttm_vram_mgr.h       |   2 +
 drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h |  44 ++
 include/linux/gpu_buddy.h                  |  44 ++
 include/uapi/drm/xe_drm.h                  |  12 +-
 21 files changed, 927 insertions(+), 45 deletions(-)

-- 
2.52.0



* [PATCH V11 01/12] drm/xe: Link VRAM object with gpu buddy
  2026-06-05 12:38 [PATCH V11 00/12] Add memory page offlining support Tejas Upadhyay
@ 2026-06-05 12:38 ` Tejas Upadhyay
  2026-06-05 12:38 ` [PATCH V11 02/12] drm/buddy: Integrate lockdep annotations for gpu buddy manager Tejas Upadhyay
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 15+ messages in thread
From: Tejas Upadhyay @ 2026-06-05 12:38 UTC (permalink / raw)
  To: intel-xe
  Cc: matthew.auld, matthew.brost, thomas.hellstrom,
	himal.prasad.ghimiray, Tejas Upadhyay

Set up a link from each gpu buddy block to the TTM buffer object that
owns it. This functionality is critical for supporting the memory page
offline feature on CRI, where identified faulty pages must be traced
back to their originating buffer for safe removal.
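
The back-pointer idea can be illustrated with a small stand-alone
model. This is only a sketch: struct model_block and both helpers are
hypothetical stand-ins, and a singly linked list replaces the kernel
list used by the real code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a buddy block carrying an owner back-pointer. */
struct model_block {
	struct model_block *next;
	void *private;
};

/*
 * Stamp every block of one allocation with its owner. Passing NULL as
 * the owner clears the link, which is what the free path needs before
 * the blocks go back to the allocator.
 */
static void model_blocks_set_owner(struct model_block *head, void *owner)
{
	struct model_block *b;

	for (b = head; b; b = b->next)
		b->private = owner;
}

/* Look up the owner of a block; NULL means no buffer is linked. */
static void *model_block_owner(const struct model_block *b)
{
	assert(b);
	return b->private;
}
```

Clearing the pointer on free matters because a stale owner would
otherwise be visible to a later lookup on a recycled block.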

V2(MattB): Clear block->private in xe_ttm_vram_mgr_del as well

Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
---
 drivers/gpu/drm/xe/xe_ttm_vram_mgr.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
index 9dad5cf5b50a..ffba13961528 100644
--- a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
+++ b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
@@ -54,6 +54,7 @@ static int xe_ttm_vram_mgr_new(struct ttm_resource_manager *man,
 	struct xe_ttm_vram_mgr *mgr = to_xe_ttm_vram_mgr(man);
 	struct xe_ttm_vram_mgr_resource *vres;
 	struct gpu_buddy *mm = &mgr->mm;
+	struct gpu_buddy_block *block;
 	u64 size, min_page_size;
 	unsigned long lpfn;
 	int err;
@@ -138,6 +139,8 @@ static int xe_ttm_vram_mgr_new(struct ttm_resource_manager *man,
 	}
 
 	mgr->visible_avail -= vres->used_visible_size;
+	list_for_each_entry(block, &vres->blocks, link)
+		block->private = tbo;
 	mutex_unlock(&mgr->lock);
 
 	if (!(vres->base.placement & TTM_PL_FLAG_CONTIGUOUS) &&
@@ -176,8 +179,11 @@ static void xe_ttm_vram_mgr_del(struct ttm_resource_manager *man,
 		to_xe_ttm_vram_mgr_resource(res);
 	struct xe_ttm_vram_mgr *mgr = to_xe_ttm_vram_mgr(man);
 	struct gpu_buddy *mm = &mgr->mm;
+	struct gpu_buddy_block *block;
 
 	mutex_lock(&mgr->lock);
+	list_for_each_entry(block, &vres->blocks, link)
+		block->private = NULL;
 	gpu_buddy_free_list(mm, &vres->blocks, 0);
 	mgr->visible_avail += vres->used_visible_size;
 	mutex_unlock(&mgr->lock);
-- 
2.52.0



* [PATCH V11 02/12] drm/buddy: Integrate lockdep annotations for gpu buddy manager
  2026-06-05 12:38 [PATCH V11 00/12] Add memory page offlining support Tejas Upadhyay
  2026-06-05 12:38 ` [PATCH V11 01/12] drm/xe: Link VRAM object with gpu buddy Tejas Upadhyay
@ 2026-06-05 12:38 ` Tejas Upadhyay
  2026-06-05 12:38 ` [PATCH V11 03/12] drm/gpu: Add gpu_buddy_allocated_addr_to_block helper Tejas Upadhyay
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 15+ messages in thread
From: Tejas Upadhyay @ 2026-06-05 12:38 UTC (permalink / raw)
  To: intel-xe
  Cc: matthew.auld, matthew.brost, thomas.hellstrom,
	himal.prasad.ghimiray, Tejas Upadhyay, Arunpravin Paneer Selvam

gpu_buddy APIs are expected to be called with the driver-provided lock
held, but there is no runtime enforcement of this contract. Add lockdep
annotations to catch locking violations early.

Introduce gpu_buddy_driver_set_lock() for the driver to register the
lock that protects the buddy manager. Add gpu_buddy_driver_lock_held()
assertions to all exported gpu_buddy and drm_buddy APIs that
access/modify the manager state. The lock_dep_map field is only compiled
in when CONFIG_LOCKDEP is enabled, adding zero overhead to production
builds.

Wire up xe_ttm_vram_mgr to register its mutex with the buddy manager
after initialization.
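
The contract can be modelled in user space as follows. This is a
sketch under stated assumptions: struct model_lock, struct model_mm
and the model_* functions are hypothetical, and the model returns an
error where the kernel would instead emit a lockdep warning.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lock that only records whether it is currently held. */
struct model_lock {
	bool held;
};

/* Hypothetical manager with an optional registered lock. */
struct model_mm {
	const struct model_lock *registered;
	unsigned long avail;
};

/* Register the protecting lock; a second registration is refused. */
static int model_set_lock(struct model_mm *mm, const struct model_lock *lock)
{
	assert(mm && lock);
	if (mm->registered)
		return -1;
	mm->registered = lock;
	return 0;
}

/* With no lock registered there is nothing to check, as in the patch. */
static bool model_lock_ok(const struct model_mm *mm)
{
	return !mm->registered || mm->registered->held;
}

/* An API entry point that checks the contract before touching state. */
static int model_free(struct model_mm *mm, unsigned long size)
{
	if (!model_lock_ok(mm))
		return -1;
	mm->avail += size;
	return 0;
}
```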

Assisted-by: Copilot:claude-opus-4.6
Suggested-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Link: https://patch.msgid.link/20260508065544.4049240-2-tejas.upadhyay@intel.com
(cherry picked from commit 35b535db69589ea0025ec3f06df08f2e3faad26f)
---
 drivers/gpu/buddy.c                  | 11 ++++++++
 drivers/gpu/drm/drm_buddy.c          |  1 +
 drivers/gpu/drm/xe/xe_ttm_vram_mgr.c |  1 +
 include/linux/gpu_buddy.h            | 41 ++++++++++++++++++++++++++++
 4 files changed, 54 insertions(+)

diff --git a/drivers/gpu/buddy.c b/drivers/gpu/buddy.c
index 52686672e99f..eb1457376307 100644
--- a/drivers/gpu/buddy.c
+++ b/drivers/gpu/buddy.c
@@ -437,6 +437,9 @@ int gpu_buddy_init(struct gpu_buddy *mm, u64 size, u64 chunk_size)
 		root_count++;
 	} while (size);
 
+#ifdef CONFIG_LOCKDEP
+	mm->lock_dep_map = NULL;
+#endif
 	return 0;
 
 out_free_roots:
@@ -538,6 +541,7 @@ void gpu_buddy_reset_clear(struct gpu_buddy *mm, bool is_clear)
 	unsigned int order;
 	int i;
 
+	gpu_buddy_driver_lock_held(mm);
 	size = mm->size;
 	for (i = 0; i < mm->n_roots; ++i) {
 		order = ilog2(size) - ilog2(mm->chunk_size);
@@ -580,6 +584,7 @@ EXPORT_SYMBOL(gpu_buddy_reset_clear);
 void gpu_buddy_free_block(struct gpu_buddy *mm,
 			  struct gpu_buddy_block *block)
 {
+	gpu_buddy_driver_lock_held(mm);
 	BUG_ON(!gpu_buddy_block_is_allocated(block));
 	mm->avail += gpu_buddy_block_size(mm, block);
 	if (gpu_buddy_block_is_clear(block))
@@ -633,6 +638,7 @@ void gpu_buddy_free_list(struct gpu_buddy *mm,
 {
 	bool mark_clear = flags & GPU_BUDDY_CLEARED;
 
+	gpu_buddy_driver_lock_held(mm);
 	__gpu_buddy_free_list(mm, objects, mark_clear, !mark_clear);
 }
 EXPORT_SYMBOL(gpu_buddy_free_list);
@@ -1172,6 +1178,8 @@ int gpu_buddy_block_trim(struct gpu_buddy *mm,
 	u64 new_start;
 	int err;
 
+	gpu_buddy_driver_lock_held(mm);
+
 	if (!list_is_singular(blocks))
 		return -EINVAL;
 
@@ -1287,6 +1295,8 @@ int gpu_buddy_alloc_blocks(struct gpu_buddy *mm,
 	unsigned long pages;
 	int err;
 
+	gpu_buddy_driver_lock_held(mm);
+
 	if (size < mm->chunk_size)
 		return -EINVAL;
 
@@ -1475,6 +1485,7 @@ void gpu_buddy_print(struct gpu_buddy *mm)
 {
 	int order;
 
+	gpu_buddy_driver_lock_held(mm);
 	pr_info("chunk_size: %lluKiB, total: %lluMiB, free: %lluMiB, clear_free: %lluMiB\n",
 		mm->chunk_size >> 10, mm->size >> 20, mm->avail >> 20, mm->clear_avail >> 20);
 
diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c
index 841f3de5f307..faa025498de4 100644
--- a/drivers/gpu/drm/drm_buddy.c
+++ b/drivers/gpu/drm/drm_buddy.c
@@ -42,6 +42,7 @@ void drm_buddy_print(struct gpu_buddy *mm, struct drm_printer *p)
 {
 	int order;
 
+	gpu_buddy_driver_lock_held(mm);
 	drm_printf(p, "chunk_size: %lluKiB, total: %lluMiB, free: %lluMiB, clear_free: %lluMiB\n",
 		   mm->chunk_size >> 10, mm->size >> 20, mm->avail >> 20, mm->clear_avail >> 20);
 
diff --git a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
index ffba13961528..5ab5dfdb183c 100644
--- a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
+++ b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
@@ -327,6 +327,7 @@ int __xe_ttm_vram_mgr_init(struct xe_device *xe, struct xe_ttm_vram_mgr *mgr,
 	if (err)
 		return err;
 
+	gpu_buddy_driver_set_lock(&mgr->mm, &mgr->lock);
 	ttm_set_driver_manager(&xe->ttm, mem_type, &mgr->manager);
 	ttm_resource_manager_set_used(&mgr->manager, true);
 
diff --git a/include/linux/gpu_buddy.h b/include/linux/gpu_buddy.h
index 5fa917ba5450..71941a039648 100644
--- a/include/linux/gpu_buddy.h
+++ b/include/linux/gpu_buddy.h
@@ -154,6 +154,7 @@ struct gpu_buddy_block {
  * @avail: Total free space currently available for allocation in bytes.
  * @clear_avail: Free space available in the clear tree (zeroed memory) in bytes.
  *               This is a subset of @avail.
+ * @lock_dep_map: Annotates gpu_buddy API with a driver provided lock.
  */
 struct gpu_buddy {
 /* private: */
@@ -179,8 +180,48 @@ struct gpu_buddy {
 	u64 size;
 	u64 avail;
 	u64 clear_avail;
+#ifdef CONFIG_LOCKDEP
+	struct lockdep_map *lock_dep_map;
+#endif
 };
 
+#ifdef CONFIG_LOCKDEP
+/**
+ * gpu_buddy_driver_set_lock() - Set the lock protecting accesses to GPU BUDDY
+ * @mm: Pointer to GPU buddy structure.
+ * @lock: the lock used to protect the gpu buddy. The locking primitive
+ * must contain a dep_map field.
+ *
+ * Call this to annotate gpu_buddy APIs which access/modify gpu_buddy manager
+ */
+#define gpu_buddy_driver_set_lock(mm, lock) \
+	do { \
+		struct gpu_buddy *__mm = (mm); \
+		if (!WARN(__mm->lock_dep_map, "GPU BUDDY MM lock should be set only once.")) \
+			__mm->lock_dep_map = &(lock)->dep_map; \
+	} while (0)
+#else
+#define gpu_buddy_driver_set_lock(mm, lock) do { (void)(mm); (void)(lock); } while (0)
+#endif
+
+#ifdef CONFIG_LOCKDEP
+/**
+ * gpu_buddy_driver_lock_held() - Assert GPU BUDDY manager lock is held
+ * @mm: Pointer to the GPU BUDDY structure.
+ *
+ * Ensure driver lock is held.
+ */
+static inline void gpu_buddy_driver_lock_held(struct gpu_buddy *mm)
+{
+	if (mm->lock_dep_map)
+		lockdep_assert(lock_is_held_type(mm->lock_dep_map, 0));
+}
+#else
+static inline void gpu_buddy_driver_lock_held(struct gpu_buddy *mm)
+{
+}
+#endif
+
 static inline u64
 gpu_buddy_block_offset(const struct gpu_buddy_block *block)
 {
-- 
2.52.0



* [PATCH V11 03/12] drm/gpu: Add gpu_buddy_allocated_addr_to_block helper
  2026-06-05 12:38 [PATCH V11 00/12] Add memory page offlining support Tejas Upadhyay
  2026-06-05 12:38 ` [PATCH V11 01/12] drm/xe: Link VRAM object with gpu buddy Tejas Upadhyay
  2026-06-05 12:38 ` [PATCH V11 02/12] drm/buddy: Integrate lockdep annotations for gpu buddy manager Tejas Upadhyay
@ 2026-06-05 12:38 ` Tejas Upadhyay
  2026-06-05 12:38 ` [PATCH V11 04/12] drm/xe: Link LRC BO and its execution Queue Tejas Upadhyay
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 15+ messages in thread
From: Tejas Upadhyay @ 2026-06-05 12:38 UTC (permalink / raw)
  To: intel-xe
  Cc: matthew.auld, matthew.brost, thomas.hellstrom,
	himal.prasad.ghimiray, Tejas Upadhyay, Arunpravin Paneer Selvam,
	dri-devel

Add a helper whose primary purpose is to efficiently trace a specific
physical memory address back to its corresponding TTM buffer object.
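
The lookup can be pictured as a descent through a binary buddy tree.
The sketch below is a simplified model, not the helper from this
patch: it handles a single root and a single address, and it walks
straight down to the covering child instead of using the list-based
depth-first search with an overlap test. All model_* names are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum model_state { NODE_FREE, NODE_ALLOCATED, NODE_SPLIT };

/* Hypothetical buddy tree node covering [start, start + size). */
struct model_node {
	uint64_t start;
	uint64_t size;
	enum model_state state;
	struct model_node *left;
	struct model_node *right;
};

/*
 * Return the allocated block covering addr, or NULL when addr lies in
 * a free block or outside the tree.
 */
static struct model_node *model_addr_to_block(struct model_node *root,
					       uint64_t addr)
{
	struct model_node *n = root;

	while (n) {
		if (addr < n->start || addr - n->start >= n->size)
			return NULL;
		if (n->state == NODE_ALLOCATED)
			return n;
		if (n->state == NODE_FREE)
			return NULL;

		/* Split node: both halves exist and each is size / 2. */
		assert(n->left && n->right);
		n = (addr - n->start < n->size / 2) ? n->left : n->right;
	}

	return NULL;
}
```

Each step halves the candidate range, so the walk visits at most one
node per tree level.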

v3:
- use mm->chunk_size minimum allocation granularity (Arun)
v2:
- %s/gpu_buddy_addr_to_block/gpu_buddy_allocated_addr_to_block(MattA)
- remove clear->avail and split nodes check(MattA)
- Adapt lockdep(MattB)

Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
Cc: Arunpravin Paneer Selvam <arunpravin.paneerselvam@amd.com>
Cc: dri-devel@lists.freedesktop.org
---
 drivers/gpu/buddy.c       | 53 +++++++++++++++++++++++++++++++++++++++
 include/linux/gpu_buddy.h |  2 ++
 2 files changed, 55 insertions(+)

diff --git a/drivers/gpu/buddy.c b/drivers/gpu/buddy.c
index eb1457376307..315f860f4346 100644
--- a/drivers/gpu/buddy.c
+++ b/drivers/gpu/buddy.c
@@ -594,6 +594,59 @@ void gpu_buddy_free_block(struct gpu_buddy *mm,
 }
 EXPORT_SYMBOL(gpu_buddy_free_block);
 
+/**
+ * gpu_buddy_allocated_addr_to_block - given relative address find the allocated block
+ *
+ * @mm: GPU buddy manager
+ * @addr: Relative address
+ *
+ * Returns:
+ * gpu_buddy_block on success, NULL or error code on failure
+ */
+struct gpu_buddy_block *gpu_buddy_allocated_addr_to_block(struct gpu_buddy *mm, u64 addr)
+{
+	struct gpu_buddy_block *block;
+	LIST_HEAD(dfs);
+	u64 end;
+	int i;
+
+	gpu_buddy_driver_lock_held(mm);
+
+	end = addr + mm->chunk_size - 1;
+	for (i = 0; i < mm->n_roots; ++i)
+		list_add_tail(&mm->roots[i]->tmp_link, &dfs);
+
+	do {
+		u64 block_start;
+		u64 block_end;
+
+		block = list_first_entry_or_null(&dfs,
+						 struct gpu_buddy_block,
+						 tmp_link);
+		if (!block)
+			break;
+
+		list_del(&block->tmp_link);
+
+		block_start = gpu_buddy_block_offset(block);
+		block_end = block_start + gpu_buddy_block_size(mm, block) - 1;
+
+		if (!overlaps(addr, end, block_start, block_end))
+			continue;
+
+		if (gpu_buddy_block_is_allocated(block))
+			return block;
+		else if (gpu_buddy_block_is_free(block))
+			return NULL;
+
+		list_add(&block->right->tmp_link, &dfs);
+		list_add(&block->left->tmp_link, &dfs);
+	} while (1);
+
+	return ERR_PTR(-ENXIO);
+}
+EXPORT_SYMBOL(gpu_buddy_allocated_addr_to_block);
+
 static void __gpu_buddy_free_list(struct gpu_buddy *mm,
 				  struct list_head *objects,
 				  bool mark_clear,
diff --git a/include/linux/gpu_buddy.h b/include/linux/gpu_buddy.h
index 71941a039648..e7e22fa05ee2 100644
--- a/include/linux/gpu_buddy.h
+++ b/include/linux/gpu_buddy.h
@@ -272,6 +272,8 @@ void gpu_buddy_reset_clear(struct gpu_buddy *mm, bool is_clear);
 
 void gpu_buddy_free_block(struct gpu_buddy *mm, struct gpu_buddy_block *block);
 
+struct gpu_buddy_block *gpu_buddy_allocated_addr_to_block(struct gpu_buddy *mm, u64 addr);
+
 void gpu_buddy_free_list(struct gpu_buddy *mm,
 			 struct list_head *objects,
 			 unsigned int flags);
-- 
2.52.0



* [PATCH V11 04/12] drm/xe: Link LRC BO and its execution Queue
  2026-06-05 12:38 [PATCH V11 00/12] Add memory page offlining support Tejas Upadhyay
                   ` (2 preceding siblings ...)
  2026-06-05 12:38 ` [PATCH V11 03/12] drm/gpu: Add gpu_buddy_allocated_addr_to_block helper Tejas Upadhyay
@ 2026-06-05 12:38 ` Tejas Upadhyay
  2026-06-05 12:38 ` [PATCH V11 05/12] drm/xe: Extend BO purge to handle vram pages as well Tejas Upadhyay
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 15+ messages in thread
From: Tejas Upadhyay @ 2026-06-05 12:38 UTC (permalink / raw)
  To: intel-xe
  Cc: matthew.auld, matthew.brost, thomas.hellstrom,
	himal.prasad.ghimiray, Tejas Upadhyay

Link an LRC BO (Logical Ring Context Buffer Object) to its
corresponding execution queue by storing a back-pointer to the
queue in the BO. This allows the driver to identify and take
corrective action on the specific queue if the LRC BO encounters
an error (e.g., memory corruption or eviction issues).

V2(MattB):
- Handle multiqueue

Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
---
 drivers/gpu/drm/xe/xe_bo_types.h   | 3 +++
 drivers/gpu/drm/xe/xe_exec_queue.c | 5 +++++
 drivers/gpu/drm/xe/xe_lrc.c        | 1 +
 3 files changed, 9 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_bo_types.h b/drivers/gpu/drm/xe/xe_bo_types.h
index fcc63ae3f455..3d456f0fc14e 100644
--- a/drivers/gpu/drm/xe/xe_bo_types.h
+++ b/drivers/gpu/drm/xe/xe_bo_types.h
@@ -20,6 +20,7 @@
 struct xe_device;
 struct xe_mem_pool_node;
 struct xe_vm;
+struct xe_exec_queue;
 
 #define XE_BO_MAX_PLACEMENTS	3
 
@@ -40,6 +41,8 @@ struct xe_bo {
 	u32 flags;
 	/** @vm: VM this BO is attached to, for extobj this will be NULL */
 	struct xe_vm *vm;
+	/** @q: Queue this BO is attached to, mostly for LRC BO, NULL otherwise */
+	struct xe_exec_queue *q;
 	/** @tile: Tile this BO is attached to (kernel BO only) */
 	struct xe_tile *tile;
 	/** @placements: valid placements for this BO */
diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
index 1b5ca3ce578a..835d2f336777 100644
--- a/drivers/gpu/drm/xe/xe_exec_queue.c
+++ b/drivers/gpu/drm/xe/xe_exec_queue.c
@@ -390,6 +390,11 @@ static int __xe_exec_queue_init(struct xe_exec_queue *q, u32 exec_queue_flags)
 				goto err_lrc;
 			}
 
+			/*
+			 * The queue ref counts the LRCs, thus it safe for the LRC BO to hold a
+			 * pointer to queue without reference.
+			 */
+			lrc->bo->q = xe_exec_queue_multi_queue_primary(q);
 			xe_exec_queue_set_lrc(q, lrc, i);
 
 			if (__lrc)
diff --git a/drivers/gpu/drm/xe/xe_lrc.c b/drivers/gpu/drm/xe/xe_lrc.c
index a4292a11391d..55d00c7daaf2 100644
--- a/drivers/gpu/drm/xe/xe_lrc.c
+++ b/drivers/gpu/drm/xe/xe_lrc.c
@@ -1066,6 +1066,7 @@ static void xe_lrc_set_ppgtt(struct xe_lrc *lrc, struct xe_vm *vm)
 static void xe_lrc_finish(struct xe_lrc *lrc)
 {
 	xe_hw_fence_ctx_finish(&lrc->fence_ctx);
+	lrc->bo->q = NULL;
 	xe_bo_unpin_map_no_vm(lrc->bo);
 	xe_bo_unpin_map_no_vm(lrc->seqno_bo);
 }
-- 
2.52.0



* [PATCH V11 05/12] drm/xe: Extend BO purge to handle vram pages as well
  2026-06-05 12:38 [PATCH V11 00/12] Add memory page offlining support Tejas Upadhyay
                   ` (3 preceding siblings ...)
  2026-06-05 12:38 ` [PATCH V11 04/12] drm/xe: Link LRC BO and its execution Queue Tejas Upadhyay
@ 2026-06-05 12:38 ` Tejas Upadhyay
  2026-06-05 12:38 ` [PATCH V11 06/12] drm/xe: Handle physical memory address error Tejas Upadhyay
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 15+ messages in thread
From: Tejas Upadhyay @ 2026-06-05 12:38 UTC (permalink / raw)
  To: intel-xe
  Cc: matthew.auld, matthew.brost, thomas.hellstrom,
	himal.prasad.ghimiray, Tejas Upadhyay, Arvind Yadav

A recent driver update introduced support for purgeable buffer
objects (BOs). Extend the purge API to cover VRAM pages as well,
to better manage memory pressure and enable memory offlining.

Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
Reviewed-by: Arvind Yadav <arvind.yadav@intel.com>
---
 drivers/gpu/drm/xe/xe_bo.c | 5 +----
 drivers/gpu/drm/xe/xe_bo.h | 1 +
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
index 4c80bac67622..3c2513e4cb17 100644
--- a/drivers/gpu/drm/xe/xe_bo.c
+++ b/drivers/gpu/drm/xe/xe_bo.c
@@ -909,7 +909,7 @@ void xe_bo_set_purgeable_state(struct xe_bo *bo,
  *
  * Return: 0 on success, negative error code on failure
  */
-static int xe_ttm_bo_purge(struct ttm_buffer_object *ttm_bo, struct ttm_operation_ctx *ctx)
+int xe_ttm_bo_purge(struct ttm_buffer_object *ttm_bo, struct ttm_operation_ctx *ctx)
 {
 	struct xe_bo *bo = ttm_to_xe_bo(ttm_bo);
 	struct ttm_placement place = {};
@@ -917,9 +917,6 @@ static int xe_ttm_bo_purge(struct ttm_buffer_object *ttm_bo, struct ttm_operatio
 
 	xe_bo_assert_held(bo);
 
-	if (!ttm_bo->ttm)
-		return 0;
-
 	if (!xe_bo_madv_is_dontneed(bo))
 		return 0;
 
diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
index 6340317f7d2e..8f9f16043c4c 100644
--- a/drivers/gpu/drm/xe/xe_bo.h
+++ b/drivers/gpu/drm/xe/xe_bo.h
@@ -584,6 +584,7 @@ struct xe_bo_shrink_flags {
 long xe_bo_shrink(struct ttm_operation_ctx *ctx, struct ttm_buffer_object *bo,
 		  const struct xe_bo_shrink_flags flags,
 		  unsigned long *scanned);
+int xe_ttm_bo_purge(struct ttm_buffer_object *ttm_bo, struct ttm_operation_ctx *ctx);
 
 /**
  * xe_bo_is_mem_type - Whether the bo currently resides in the given
-- 
2.52.0



* [PATCH V11 06/12] drm/xe: Handle physical memory address error
  2026-06-05 12:38 [PATCH V11 00/12] Add memory page offlining support Tejas Upadhyay
                   ` (4 preceding siblings ...)
  2026-06-05 12:38 ` [PATCH V11 05/12] drm/xe: Extend BO purge to handle vram pages as well Tejas Upadhyay
@ 2026-06-05 12:38 ` Tejas Upadhyay
  2026-06-05 12:38 ` [PATCH V11 07/12] drm/xe/cri: Add debugfs to inject faulty vram address Tejas Upadhyay
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 15+ messages in thread
From: Tejas Upadhyay @ 2026-06-05 12:38 UTC (permalink / raw)
  To: intel-xe
  Cc: matthew.auld, matthew.brost, thomas.hellstrom,
	himal.prasad.ghimiray, Tejas Upadhyay

This functionality represents a significant step in making
the xe driver gracefully handle hardware memory degradation.
By integrating with the DRM Buddy allocator, the driver
can permanently "carve out" faulty memory so it isn't reused
by subsequent allocations.

Buddy Block Reservation:
----------------------
When a memory address is reported as faulty, the driver instructs
the DRM Buddy allocator to reserve a block of the specific page
size (typically 4KB). This marks the memory as "dirty/used"
indefinitely.
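
The address arithmetic behind such a reservation can be sketched as
below, assuming the reservation is expressed as a range covering
exactly one page. The struct and function names are hypothetical, and
the 4 KiB default is taken from the text above.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096ULL

/* Half-open byte range [start, end) to hand to a range-restricted allocator. */
struct model_page_range {
	uint64_t start;
	uint64_t end;
};

/* Round a faulty address down to its page and return that page's range. */
static struct model_page_range model_faulty_page_range(uint64_t addr,
							uint64_t page_size)
{
	struct model_page_range r;

	/* The mask trick below only works for power-of-two page sizes. */
	assert(page_size && !(page_size & (page_size - 1)));

	r.start = addr & ~(page_size - 1);
	r.end = r.start + page_size;
	return r;
}
```

For example, a fault reported at 0x12345 maps to the page range
[0x12000, 0x13000).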

Two-Stage Tracking:
-----------------
Offlined Pages:
Pages that have been successfully isolated and removed from the
available memory pool.

Queued Pages:
Addresses that have been flagged as faulty but are currently in
use by a process. These are tracked until the associated buffer
object (BO) is released or migrated, at which point they move
to the "offlined" state.
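
The two stages amount to a small state machine per faulty page. The
sketch below models only the state transitions and counters described
above; the types and functions are hypothetical and do not reflect the
data structures used in this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum model_page_state {
	MODEL_PAGE_HEALTHY,
	MODEL_PAGE_QUEUED,	/* faulty, but still in use by a BO */
	MODEL_PAGE_OFFLINED,	/* isolated from the available pool */
};

struct model_bad_page {
	uint64_t addr;
	enum model_page_state state;
};

/* Counters for the two tracked stages. */
struct model_tracker {
	unsigned int queued;
	unsigned int offlined;
};

/* A fault on an unused page offlines it at once; otherwise it is queued. */
static void model_report_fault(struct model_tracker *t,
			       struct model_bad_page *p, bool in_use)
{
	assert(p->state == MODEL_PAGE_HEALTHY);

	if (in_use) {
		p->state = MODEL_PAGE_QUEUED;
		t->queued++;
	} else {
		p->state = MODEL_PAGE_OFFLINED;
		t->offlined++;
	}
}

/* Called once the owning BO is released or migrated away. */
static void model_owner_released(struct model_tracker *t,
				 struct model_bad_page *p)
{
	if (p->state != MODEL_PAGE_QUEUED)
		return;

	p->state = MODEL_PAGE_OFFLINED;
	t->queued--;
	t->offlined++;
}
```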

V8(MattA):
- introduce helper for vram_buddy_alloc and free to avoid code dup
- Add WARN_ON for -ENXIO
v7:
- keep vm ref during vm kill and fix some typos
- FW communication code is moved in RAS, keep comment for same
V6:
- Use scope_guard for locking(MattB)
- Adapt addition of queue member of LRC BO(MattB)
- Extend and use xe_ttm_bo_purge API for vram pages(MattB)
- Handle dma_buf_map requests for native and remote(MattB)
- Address if in never initialized block, set block to NULL
V5:
- Categorise and handle BOs accordingly
- Fix crash found with new debugfs tests
V4:
- Set block->private NULL post bo purge
- Filter out gsm address early on
- Rebase
V3:
- Rename API, remove tile dependency and add status of reservation
V2:
- Fix mm->avail counter issue
- Remove unused code and handle clean up in case of error

Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
---
 drivers/gpu/drm/xe/xe_bo.c                 |  11 +-
 drivers/gpu/drm/xe/xe_bo.h                 |   4 +-
 drivers/gpu/drm/xe/xe_dma_buf.c            |   3 +
 drivers/gpu/drm/xe/xe_exec_queue.c         |   8 +-
 drivers/gpu/drm/xe/xe_pt.c                 |   3 +-
 drivers/gpu/drm/xe/xe_ttm_vram_mgr.c       | 343 +++++++++++++++++++--
 drivers/gpu/drm/xe/xe_ttm_vram_mgr.h       |   1 +
 drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h |  28 ++
 8 files changed, 365 insertions(+), 36 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
index 3c2513e4cb17..0e8466bf9289 100644
--- a/drivers/gpu/drm/xe/xe_bo.c
+++ b/drivers/gpu/drm/xe/xe_bo.c
@@ -158,7 +158,16 @@ bool xe_bo_is_vm_bound(struct xe_bo *bo)
 	return !list_empty(&bo->ttm.base.gpuva.list);
 }
 
-static bool xe_bo_is_user(struct xe_bo *bo)
+/**
+ * xe_bo_is_user - check if BO is user created BO
+ * @bo: The BO
+ *
+ * Check if  BO is user created BO. This requires the
+ * reservation lock for the BO to be held.
+ *
+ * Returns: boolean
+ */
+bool xe_bo_is_user(struct xe_bo *bo)
 {
 	return bo->flags & XE_BO_FLAG_USER;
 }
diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
index 8f9f16043c4c..2328cf419c44 100644
--- a/drivers/gpu/drm/xe/xe_bo.h
+++ b/drivers/gpu/drm/xe/xe_bo.h
@@ -361,7 +361,8 @@ static inline void xe_bo_unpin_map_no_vm(struct xe_bo *bo)
 {
 	if (likely(bo)) {
 		xe_bo_lock(bo, false);
-		xe_bo_unpin(bo);
+		if (!xe_bo_is_purged(bo))
+			xe_bo_unpin(bo);
 		xe_bo_unlock(bo);
 
 		xe_bo_put(bo);
@@ -585,6 +586,7 @@ long xe_bo_shrink(struct ttm_operation_ctx *ctx, struct ttm_buffer_object *bo,
 		  const struct xe_bo_shrink_flags flags,
 		  unsigned long *scanned);
 int xe_ttm_bo_purge(struct ttm_buffer_object *ttm_bo, struct ttm_operation_ctx *ctx);
+bool xe_bo_is_user(struct xe_bo *bo);
 
 /**
  * xe_bo_is_mem_type - Whether the bo currently resides in the given
diff --git a/drivers/gpu/drm/xe/xe_dma_buf.c b/drivers/gpu/drm/xe/xe_dma_buf.c
index 8a920e58245c..4eaff2430b57 100644
--- a/drivers/gpu/drm/xe/xe_dma_buf.c
+++ b/drivers/gpu/drm/xe/xe_dma_buf.c
@@ -104,6 +104,9 @@ static struct sg_table *xe_dma_buf_map(struct dma_buf_attachment *attach,
 	struct sg_table *sgt;
 	int r = 0;
 
+	if (xe_bo_is_purged(bo))
+		return ERR_PTR(-ENOENT);
+
 	if (!attach->peer2peer && !xe_bo_can_migrate(bo, XE_PL_TT))
 		return ERR_PTR(-EOPNOTSUPP);
 
diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
index 835d2f336777..e5bd94ca103b 100644
--- a/drivers/gpu/drm/xe/xe_exec_queue.c
+++ b/drivers/gpu/drm/xe/xe_exec_queue.c
@@ -1559,8 +1559,12 @@ void xe_exec_queue_update_run_ticks(struct xe_exec_queue *q)
 	 * errors.
 	 */
 	lrc = q->lrc[0];
-	new_ts = xe_lrc_update_timestamp(lrc, &old_ts);
-	q->xef->run_ticks[q->class] += (new_ts - old_ts) * q->width;
+	xe_bo_lock(lrc->bo, false);
+	if (!xe_bo_is_purged(lrc->bo)) {
+		new_ts = xe_lrc_update_timestamp(lrc, &old_ts);
+		q->xef->run_ticks[q->class] += (new_ts - old_ts) * q->width;
+	}
+	xe_bo_unlock(lrc->bo);
 
 	drm_dev_exit(idx);
 }
diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
index 15ce77ce7793..67000e45aa19 100644
--- a/drivers/gpu/drm/xe/xe_pt.c
+++ b/drivers/gpu/drm/xe/xe_pt.c
@@ -212,7 +212,8 @@ void xe_pt_destroy(struct xe_pt *pt, u32 flags, struct llist_head *deferred)
 		return;
 
 	XE_WARN_ON(!list_empty(&pt->bo->ttm.base.gpuva.list));
-	xe_bo_unpin(pt->bo);
+	if (!xe_bo_is_purged(pt->bo))
+		xe_bo_unpin(pt->bo);
 	xe_bo_put_deferred(pt->bo, deferred);
 
 	if (pt->level > 0 && pt->num_live) {
diff --git a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
index 5ab5dfdb183c..0e968ae47fd9 100644
--- a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
+++ b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
@@ -13,7 +13,10 @@
 
 #include "xe_bo.h"
 #include "xe_device.h"
+#include "xe_exec_queue.h"
+#include "xe_lrc.h"
 #include "xe_res_cursor.h"
+#include "xe_ttm_stolen_mgr.h"
 #include "xe_ttm_vram_mgr.h"
 #include "xe_vram_types.h"
 
@@ -46,6 +49,40 @@ static inline bool xe_is_vram_mgr_blocks_contiguous(struct gpu_buddy *mm,
 	return true;
 }
 
+static int xe_ttm_vram_buddy_alloc(struct xe_ttm_vram_mgr *mgr, u64 start,
+				   u64 end, u64 size, u64 min_page_size,
+				   struct list_head *blocks, unsigned long flags,
+				   void *priv, u64 *used_visible)
+{
+	struct gpu_buddy *mm = &mgr->mm;
+	struct gpu_buddy_block *block;
+	int err;
+
+	err = gpu_buddy_alloc_blocks(mm, start, end, size, min_page_size, blocks, flags);
+	if (err)
+		return err;
+
+	list_for_each_entry(block, blocks, link)
+		block->private = priv;
+
+	if (end <= mgr->visible_size) {
+		*used_visible = size;
+	} else {
+		list_for_each_entry(block, blocks, link) {
+			u64 blk_start = gpu_buddy_block_offset(block);
+
+			if (blk_start < mgr->visible_size) {
+				u64 blk_end = blk_start + gpu_buddy_block_size(mm, block);
+
+				*used_visible += min(blk_end, mgr->visible_size) - blk_start;
+			}
+		}
+	}
+
+	mgr->visible_avail -= *used_visible;
+	return 0;
+}
+
 static int xe_ttm_vram_mgr_new(struct ttm_resource_manager *man,
 			       struct ttm_buffer_object *tbo,
 			       const struct ttm_place *place,
@@ -54,7 +91,6 @@ static int xe_ttm_vram_mgr_new(struct ttm_resource_manager *man,
 	struct xe_ttm_vram_mgr *mgr = to_xe_ttm_vram_mgr(man);
 	struct xe_ttm_vram_mgr_resource *vres;
 	struct gpu_buddy *mm = &mgr->mm;
-	struct gpu_buddy_block *block;
 	u64 size, min_page_size;
 	unsigned long lpfn;
 	int err;
@@ -115,32 +151,12 @@ static int xe_ttm_vram_mgr_new(struct ttm_resource_manager *man,
 		goto error_unlock;
 	}
 
-	err = gpu_buddy_alloc_blocks(mm, (u64)place->fpfn << PAGE_SHIFT,
-				     (u64)lpfn << PAGE_SHIFT, size,
-				     min_page_size, &vres->blocks, vres->flags);
+	err = xe_ttm_vram_buddy_alloc(mgr, (u64)place->fpfn << PAGE_SHIFT,
+				      (u64)lpfn << PAGE_SHIFT, size,
+				      min_page_size, &vres->blocks, vres->flags,
+				      tbo, &vres->used_visible_size);
 	if (err)
 		goto error_unlock;
-
-	if (lpfn <= mgr->visible_size >> PAGE_SHIFT) {
-		vres->used_visible_size = size;
-	} else {
-		struct gpu_buddy_block *block;
-
-		list_for_each_entry(block, &vres->blocks, link) {
-			u64 start = gpu_buddy_block_offset(block);
-
-			if (start < mgr->visible_size) {
-				u64 end = start + gpu_buddy_block_size(mm, block);
-
-				vres->used_visible_size +=
-					min(end, mgr->visible_size) - start;
-			}
-		}
-	}
-
-	mgr->visible_avail -= vres->used_visible_size;
-	list_for_each_entry(block, &vres->blocks, link)
-		block->private = tbo;
 	mutex_unlock(&mgr->lock);
 
 	if (!(vres->base.placement & TTM_PL_FLAG_CONTIGUOUS) &&
@@ -172,20 +188,27 @@ static int xe_ttm_vram_mgr_new(struct ttm_resource_manager *man,
 	return err;
 }
 
+static void xe_ttm_vram_buddy_free(struct xe_ttm_vram_mgr *mgr,
+				   struct list_head *blocks,
+				   u64 used_visible)
+{
+	struct gpu_buddy_block *block;
+
+	list_for_each_entry(block, blocks, link)
+		block->private = NULL;
+	gpu_buddy_free_list(&mgr->mm, blocks, 0);
+	mgr->visible_avail += used_visible;
+}
+
 static void xe_ttm_vram_mgr_del(struct ttm_resource_manager *man,
 				struct ttm_resource *res)
 {
 	struct xe_ttm_vram_mgr_resource *vres =
 		to_xe_ttm_vram_mgr_resource(res);
 	struct xe_ttm_vram_mgr *mgr = to_xe_ttm_vram_mgr(man);
-	struct gpu_buddy *mm = &mgr->mm;
-	struct gpu_buddy_block *block;
 
 	mutex_lock(&mgr->lock);
-	list_for_each_entry(block, &vres->blocks, link)
-		block->private = NULL;
-	gpu_buddy_free_list(mm, &vres->blocks, 0);
-	mgr->visible_avail += vres->used_visible_size;
+	xe_ttm_vram_buddy_free(mgr, &vres->blocks, vres->used_visible_size);
 	mutex_unlock(&mgr->lock);
 
 	ttm_resource_fini(man, res);
@@ -280,6 +303,24 @@ static const struct ttm_resource_manager_func xe_ttm_vram_mgr_func = {
 	.debug	= xe_ttm_vram_mgr_debug
 };
 
+static void xe_ttm_vram_free_bad_pages(struct drm_device *dev, struct xe_ttm_vram_mgr *mgr)
+{
+	struct xe_ttm_vram_offline_resource *pos, *n;
+
+	list_for_each_entry_safe(pos, n, &mgr->offlined_pages, offlined_link) {
+		xe_ttm_vram_buddy_free(mgr, &pos->blocks, pos->used_visible_size);
+		list_del(&pos->offlined_link);
+		--mgr->n_offlined_pages;
+		kfree(pos);
+	}
+	list_for_each_entry_safe(pos, n, &mgr->queued_pages, queued_link) {
+		xe_ttm_vram_buddy_free(mgr, &pos->blocks, 0);
+		list_del(&pos->queued_link);
+		--mgr->n_queued_pages;
+		kfree(pos);
+	}
+}
+
 static void xe_ttm_vram_mgr_fini(struct drm_device *dev, void *arg)
 {
 	struct xe_device *xe = to_xe_device(dev);
@@ -291,6 +332,10 @@ static void xe_ttm_vram_mgr_fini(struct drm_device *dev, void *arg)
 	if (ttm_resource_manager_evict_all(&xe->ttm, man))
 		return;
 
+	mutex_lock(&mgr->lock);
+	xe_ttm_vram_free_bad_pages(dev, mgr);
+	mutex_unlock(&mgr->lock);
+
 	WARN_ON_ONCE(mgr->visible_avail != mgr->visible_size);
 
 	gpu_buddy_fini(&mgr->mm);
@@ -318,6 +363,8 @@ int __xe_ttm_vram_mgr_init(struct xe_device *xe, struct xe_ttm_vram_mgr *mgr,
 	err = drmm_mutex_init(&xe->drm, &mgr->lock);
 	if (err)
 		return err;
+	INIT_LIST_HEAD(&mgr->offlined_pages);
+	INIT_LIST_HEAD(&mgr->queued_pages);
 	mgr->default_page_size = default_page_size;
 	mgr->visible_size = io_size;
 	mgr->visible_avail = io_size;
@@ -474,3 +521,237 @@ u64 xe_ttm_vram_get_avail(struct ttm_resource_manager *man)
 
 	return avail;
 }
+
+static int xe_ttm_vram_purge_page(struct xe_device *xe, struct xe_bo *bo)
+{
+	struct ttm_operation_ctx ctx = {};
+	struct xe_vm *vm = NULL;
+	u32	flags;
+	int ret = 0;
+
+	xe_bo_lock(bo, false);
+	if (bo->vm)
+		vm = xe_vm_get(bo->vm);
+	flags = bo->flags;
+	xe_bo_unlock(bo);
+	/*  Ban VM if BO is PPGTT */
+	if (vm && (flags & XE_BO_FLAG_PAGETABLE)) {
+		down_write(&vm->lock);
+		xe_vm_kill(vm, true);
+		up_write(&vm->lock);
+	}
+	if (vm)
+		xe_vm_put(vm);
+
+	xe_bo_lock(bo, false);
+	/*  Ban exec queue if BO is lrc */
+	if (bo->q && xe_exec_queue_get_unless_zero(bo->q)) {
+		/* ban queue */
+		xe_exec_queue_kill(bo->q);
+		xe_exec_queue_put(bo->q);
+	}
+
+	xe_bo_set_purgeable_state(bo, XE_MADV_PURGEABLE_DONTNEED);
+	ttm_bo_unmap_virtual(&bo->ttm);   /* nuke CPU mmap + VRAM IO mappings */
+	if (xe_bo_is_pinned(bo))
+		xe_bo_unpin(bo);
+	ret = xe_ttm_bo_purge(&bo->ttm, &ctx);
+	xe_bo_unlock(bo);
+
+	return ret;
+}
+
+static bool xe_ttm_vram_page_already_processed(struct xe_ttm_vram_mgr *mgr,
+					       unsigned long addr)
+{
+	struct xe_ttm_vram_offline_resource *pos;
+
+	lockdep_assert_held(&mgr->lock);
+
+	list_for_each_entry(pos, &mgr->offlined_pages, offlined_link) {
+		if (pos->addr == addr)
+			return true;
+	}
+
+	list_for_each_entry(pos, &mgr->queued_pages, queued_link) {
+		if (pos->addr == addr)
+			return true;
+	}
+
+	return false;
+}
+
+static int xe_ttm_vram_reserve_page_at_addr(struct xe_device *xe, unsigned long addr,
+					    struct xe_ttm_vram_mgr *vram_mgr, struct gpu_buddy *mm)
+{
+	struct xe_ttm_vram_offline_resource *nentry;
+	struct ttm_buffer_object *tbo = NULL;
+	struct gpu_buddy_block *block;
+	enum reserve_status {
+		pending = 0,
+		fail
+	};
+	u64 size = SZ_4K;
+	int ret = 0;
+
+	scoped_guard(mutex, &vram_mgr->lock) {
+		block = gpu_buddy_allocated_addr_to_block(mm, addr);
+		if (WARN_ON(IS_ERR(block)))
+			return PTR_ERR(block);
+
+		nentry = kzalloc_obj(*nentry);
+		if (!nentry)
+			return -ENOMEM;
+		INIT_LIST_HEAD(&nentry->blocks);
+		nentry->status = pending;
+		nentry->addr = addr;
+
+		if (block) {
+			struct xe_bo *pbo;
+
+			WARN_ON(!block->private);
+			tbo = block->private;
+			pbo = ttm_to_xe_bo(tbo);
+
+			/* Get reference safely - BO may have zero refcount */
+			if (!xe_bo_get_unless_zero(pbo)) {
+				kfree(nentry);
+				return -ENOENT;
+			}
+			/* Critical kernel BO? */
+			if ((pbo->ttm.type == ttm_bo_type_kernel &&
+			     !(pbo->flags & XE_BO_FLAG_PINNED_LATE_RESTORE)) ||
+			    (xe_bo_is_user(pbo) && xe_bo_is_pinned(pbo))) {
+				kfree(nentry);
+				xe_ttm_vram_free_bad_pages(&xe->drm, vram_mgr);
+				xe_bo_put(pbo);
+				drm_err(&xe->drm,
+					"%s: addr: 0x%lx is critical kernel bo, requesting SBR\n",
+					__func__, addr);
+				/* Hint System controller driver for reset with -EIO  */
+				return -EIO;
+			}
+			nentry->id = ++vram_mgr->n_queued_pages;
+			list_add(&nentry->queued_link, &vram_mgr->queued_pages);
+		}
+	}
+	if (block) {
+		struct xe_ttm_vram_offline_resource *pos, *n;
+		struct xe_bo *pbo = ttm_to_xe_bo(tbo);
+
+		/* Purge BO containing address - reference held from above */
+		ret = xe_ttm_vram_purge_page(xe, pbo);
+		xe_bo_put(pbo);
+		if (ret) {
+			nentry->status = fail;
+			return ret;
+		}
+
+		/* Reserve page at address addr */
+		scoped_guard(mutex, &vram_mgr->lock) {
+			ret = xe_ttm_vram_buddy_alloc(vram_mgr, addr, addr + size,
+						      size, size, &nentry->blocks,
+						      GPU_BUDDY_RANGE_ALLOCATION,
+						      NULL, &nentry->used_visible_size);
+			if (ret) {
+				drm_warn(&xe->drm, "Could not reserve page at addr:0x%lx, ret:%d\n",
+					 addr, ret);
+				nentry->status = fail;
+				return ret;
+			}
+
+			list_for_each_entry_safe(pos, n, &vram_mgr->queued_pages, queued_link) {
+				if (pos->id == nentry->id) {
+					--vram_mgr->n_queued_pages;
+					list_del(&pos->queued_link);
+					break;
+				}
+			}
+			list_add(&nentry->offlined_link, &vram_mgr->offlined_pages);
+			/* RAS will send command to FW for offlining page based on ret value */
+			++vram_mgr->n_offlined_pages;
+			return ret;
+		}
+	} else {
+		scoped_guard(mutex, &vram_mgr->lock) {
+			ret = xe_ttm_vram_buddy_alloc(vram_mgr, addr, addr + size,
+						      size, size, &nentry->blocks,
+						      GPU_BUDDY_RANGE_ALLOCATION,
+						      NULL, &nentry->used_visible_size);
+			if (ret) {
+				drm_warn(&xe->drm, "Could not reserve page at addr:0x%lx, ret:%d\n",
+					 addr, ret);
+				nentry->status = fail;
+				return ret;
+			}
+
+			nentry->id = ++vram_mgr->n_offlined_pages;
+			list_add(&nentry->offlined_link, &vram_mgr->offlined_pages);
+			/* RAS will send command to FW for offlining page based on ret value */
+		}
+	}
+	/* Success */
+	return ret;
+}
+
+static struct xe_vram_region *xe_ttm_vram_addr_to_region(struct xe_device *xe,
+							 resource_size_t addr)
+{
+	unsigned long stolen_base = xe_ttm_stolen_gpu_offset(xe);
+	struct xe_vram_region *vr;
+	struct xe_tile *tile;
+	int id;
+
+	/* Addr from stolen memory? */
+	if (addr + SZ_4K >= stolen_base)
+		return NULL;
+
+	for_each_tile(tile, xe, id) {
+		vr = tile->mem.vram;
+		if ((addr <= vr->dpa_base + vr->actual_physical_size) &&
+		    (addr + SZ_4K >= vr->dpa_base))
+			return vr;
+	}
+	return NULL;
+}
+
+/**
+ * xe_ttm_vram_handle_addr_fault - Handle a flagged VRAM physical address error
+ * @xe: pointer to parent device
+ * @addr: faulty physical address
+ *
+ * Handle the faulty physical address error on the specific tile.
+ *
+ * Returns 0 for success, negative error code otherwise as follows:
+ * * %-EIO - critical BO or address outside any VRAM region; next action is reset.
+ * * %-EOPNOTSUPP - log-only policy; no further action.
+ * * %-ENOMEM - allocation failure; next action is reset.
+ * * %-ENXIO - address not found in buddy; next action is reset.
+ * * %-EEXIST - address already processed; no further action.
+ * * Any other negative error - next action is reset.
+ */
+int xe_ttm_vram_handle_addr_fault(struct xe_device *xe, unsigned long addr)
+{
+	struct xe_ttm_vram_mgr *vram_mgr;
+	struct xe_vram_region *vr;
+	struct gpu_buddy *mm;
+
+	vr = xe_ttm_vram_addr_to_region(xe, addr);
+	if (!vr) {
+		drm_err(&xe->drm, "%s:%d addr:%lx error requesting SBR\n",
+			__func__, __LINE__, addr);
+		/* Hint System controller driver for reset with -EIO  */
+		return -EIO;
+	}
+	vram_mgr = &vr->ttm;
+	mm = &vram_mgr->mm;
+
+	scoped_guard(mutex, &vram_mgr->lock) {
+		if (xe_ttm_vram_page_already_processed(vram_mgr, addr))
+			return -EEXIST;
+	}
+
+	/* Reserve page at address */
+	return xe_ttm_vram_reserve_page_at_addr(xe, addr, vram_mgr, mm);
+}
+EXPORT_SYMBOL(xe_ttm_vram_handle_addr_fault);
diff --git a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.h b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.h
index 87b7fae5edba..8ef06d9d44f7 100644
--- a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.h
+++ b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.h
@@ -31,6 +31,7 @@ u64 xe_ttm_vram_get_cpu_visible_size(struct ttm_resource_manager *man);
 void xe_ttm_vram_get_used(struct ttm_resource_manager *man,
 			  u64 *used, u64 *used_visible);
 
+int xe_ttm_vram_handle_addr_fault(struct xe_device *xe, unsigned long addr);
 static inline struct xe_ttm_vram_mgr_resource *
 to_xe_ttm_vram_mgr_resource(struct ttm_resource *res)
 {
diff --git a/drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h b/drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h
index 9106da056b49..3ad7966798eb 100644
--- a/drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h
+++ b/drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h
@@ -19,6 +19,14 @@ struct xe_ttm_vram_mgr {
 	struct ttm_resource_manager manager;
 	/** @mm: DRM buddy allocator which manages the VRAM */
 	struct gpu_buddy mm;
+	/** @offlined_pages: List of offlined pages */
+	struct list_head offlined_pages;
+	/** @n_offlined_pages: Number of offlined pages */
+	u16 n_offlined_pages;
+	/** @queued_pages: List of queued pages */
+	struct list_head queued_pages;
+	/** @n_queued_pages: Number of queued pages */
+	u16 n_queued_pages;
 	/** @visible_size: Proped size of the CPU visible portion */
 	u64 visible_size;
 	/** @visible_avail: CPU visible portion still unallocated */
@@ -45,4 +53,24 @@ struct xe_ttm_vram_mgr_resource {
 	unsigned long flags;
 };
 
+/**
+ * struct xe_ttm_vram_offline_resource - Xe TTM VRAM offline resource
+ */
+struct xe_ttm_vram_offline_resource {
+	/** @offlined_link: Link to offlined pages */
+	struct list_head offlined_link;
+	/** @queued_link: Link to queued pages */
+	struct list_head queued_link;
+	/** @blocks: list of DRM buddy blocks */
+	struct list_head blocks;
+	/** @used_visible_size: How many CPU visible bytes this resource is using */
+	u64 used_visible_size;
+	/** @id: The id of an offline resource */
+	u16 id;
+	/** @addr: Address of faulty memory location reported by HW */
+	unsigned long addr;
+	/** @status: reservation status of resource */
+	bool status;
+};
+
 #endif
-- 
2.52.0



* [PATCH V11 07/12] drm/xe/cri: Add debugfs to inject faulty vram address
  2026-06-05 12:38 [PATCH V11 00/12] Add memory page offlining support Tejas Upadhyay
                   ` (5 preceding siblings ...)
  2026-06-05 12:38 ` [PATCH V11 06/12] drm/xe: Handle physical memory address error Tejas Upadhyay
@ 2026-06-05 12:38 ` Tejas Upadhyay
  2026-06-05 12:38 ` [PATCH V11 08/12] gpu/buddy: Add routine to dump allocated buddy blocks Tejas Upadhyay
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 15+ messages in thread
From: Tejas Upadhyay @ 2026-06-05 12:38 UTC (permalink / raw)
  To: intel-xe
  Cc: matthew.auld, matthew.brost, thomas.hellstrom,
	himal.prasad.ghimiray, Tejas Upadhyay

Add a debugfs interface to the drm/xe driver for manual error injection.
It allows faulty VRAM addresses to be injected, so the CRI memory page
offline feature can be tested before it is fully functional. The entry
is created at
/sys/kernel/debug/dri/bdf/invalid_addr_vram0
and accepts a mode that selects which type of address is reported as
faulty.

For example,
echo 0 > /sys/kernel/debug/dri/bdf/invalid_addr_vram0
where 0 selects one of the address types below:
enum mempage_offline_mode {
        MEMPAGE_OFFLINE_UNALLOCATED = 0,
        MEMPAGE_OFFLINE_USER_ALLOCATED = 1,
        MEMPAGE_OFFLINE_KERNEL_USER_GGTT_ALLOCATED = 2,
        MEMPAGE_OFFLINE_KERNEL_USER_PPGTT_ALLOCATED = 3,
        MEMPAGE_OFFLINE_KERNEL_CRITICAL_ALLOCATED = 4,
        MEMPAGE_OFFLINE_RESERVED = 5
};
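
As a rough illustration of the write shown above, a user-space test
helper could look like the sketch below. It is not part of this patch:
inject_offline_mode() is a hypothetical name, and the caller is assumed
to pass the real debugfs path of the device under test.

```c
#include <stdio.h>

/* Mirrors the enum listed in the commit message above. */
enum mempage_offline_mode {
	MEMPAGE_OFFLINE_UNALLOCATED = 0,
	MEMPAGE_OFFLINE_USER_ALLOCATED = 1,
	MEMPAGE_OFFLINE_KERNEL_USER_GGTT_ALLOCATED = 2,
	MEMPAGE_OFFLINE_KERNEL_USER_PPGTT_ALLOCATED = 3,
	MEMPAGE_OFFLINE_KERNEL_CRITICAL_ALLOCATED = 4,
	MEMPAGE_OFFLINE_RESERVED = 5
};

/*
 * Hypothetical helper: write @mode as a decimal string to the debugfs
 * node at @path. Returns 0 on success and -1 on any failure.
 */
static int inject_offline_mode(const char *path, enum mempage_offline_mode mode)
{
	int m = (int)mode;
	FILE *f;
	int ok;

	/* Reject values outside the enum before touching the file. */
	if (m < MEMPAGE_OFFLINE_UNALLOCATED || m > MEMPAGE_OFFLINE_RESERVED)
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;

	ok = fprintf(f, "%d\n", m) > 0;
	/* Buffered data is only flushed on fclose(), so check it as well. */
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}
```

A test would call inject_offline_mode() with the invalid_addr_vram0 path
and one of the modes, then look in dmesg for the "passed" or "failed"
message printed by the driver.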

v5:
- Remove redundant code
v4:
- Use scope_guard around lock, adapt bo->q and enhance warn messages
- %s/gpu_buddy_addr_to_block/gpu_buddy_allocated_addr_to_block
v3:
- Add more specific noncritical bo tests
v2:
- Add mode based automated test vs manual address feed

Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
---
 drivers/gpu/drm/xe/xe_debugfs.c            | 168 +++++++++++++++++++++
 drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h |   2 +
 2 files changed, 170 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_debugfs.c b/drivers/gpu/drm/xe/xe_debugfs.c
index 22b471303984..bc1f39c90c8e 100644
--- a/drivers/gpu/drm/xe/xe_debugfs.c
+++ b/drivers/gpu/drm/xe/xe_debugfs.c
@@ -14,6 +14,7 @@
 #include "regs/xe_pmt.h"
 #include "xe_bo.h"
 #include "xe_device.h"
+#include "xe_exec_queue_types.h"
 #include "xe_force_wake.h"
 #include "xe_gt.h"
 #include "xe_gt_debugfs.h"
@@ -21,6 +22,7 @@
 #include "xe_guc_ads.h"
 #include "xe_hw_engine.h"
 #include "xe_mmio.h"
+#include "xe_migrate.h"
 #include "xe_pm.h"
 #include "xe_psmi.h"
 #include "xe_pxp_debugfs.h"
@@ -29,6 +31,8 @@
 #include "xe_sriov_vf.h"
 #include "xe_step.h"
 #include "xe_tile_debugfs.h"
+#include "xe_ttm_stolen_mgr.h"
+#include "xe_ttm_vram_mgr.h"
 #include "xe_vsec.h"
 #include "xe_wa.h"
 
@@ -40,6 +44,14 @@
 
 DECLARE_FAULT_ATTR(gt_reset_failure);
 DECLARE_FAULT_ATTR(inject_csc_hw_error);
+enum mempage_offline_mode {
+	MEMPAGE_OFFLINE_UNALLOCATED = 0,
+	MEMPAGE_OFFLINE_USER_ALLOCATED = 1,
+	MEMPAGE_OFFLINE_KERNEL_USER_GGTT_ALLOCATED = 2,
+	MEMPAGE_OFFLINE_KERNEL_USER_PPGTT_ALLOCATED = 3,
+	MEMPAGE_OFFLINE_KERNEL_CRITICAL_ALLOCATED = 4,
+	MEMPAGE_OFFLINE_RESERVED = 5,
+};
 
 static void read_residency_counter(struct xe_device *xe, struct xe_mmio *mmio,
 				   u32 offset, const char *name, struct drm_printer *p)
@@ -544,6 +556,151 @@ static const struct file_operations disable_late_binding_fops = {
 	.write = disable_late_binding_set,
 };
 
+static ssize_t addr_fault_reporting_show(struct file *f, char __user *ubuf,
+					 size_t size, loff_t *pos)
+{
+	struct xe_device *xe = file_inode(f)->i_private;
+	char buf[32];
+	int len;
+
+	len = scnprintf(buf, sizeof(buf), "%llu\n", xe->mem.vram->ttm.offline_mode);
+
+	return simple_read_from_buffer(ubuf, size, pos, buf, len);
+}
+
+static int mempage_exec_offline(struct xe_device *xe, u64 mode)
+{
+	struct xe_tile *tile = xe_device_get_root_tile(xe);
+	struct xe_vram_region *vr = tile->mem.vram;
+	struct ttm_buffer_object *tbo = NULL;
+	struct xe_ttm_vram_mgr *vram_mgr;
+	struct gpu_buddy_block *block;
+	bool do_offline = false;
+	struct gpu_buddy *mm;
+	struct xe_bo *bo;
+	u64 addr = 0x0;
+	int ret = 0;
+
+	vram_mgr = &vr->ttm;
+	mm = &vram_mgr->mm;
+	addr = vr->dpa_base;
+	while (addr <= vr->dpa_base + vr->actual_physical_size) {
+		scoped_guard(mutex, &vram_mgr->lock) {
+			block = gpu_buddy_allocated_addr_to_block(mm, addr);
+			if (!block && mode == MEMPAGE_OFFLINE_UNALLOCATED)
+				do_offline = true;
+			if (block && PTR_ERR(block) != -ENXIO) {
+				if (!block->private) {
+					addr = addr + SZ_4K;
+					do_offline = false;
+					continue;
+				}
+				tbo = block->private;
+				bo = ttm_to_xe_bo(tbo);
+				if (bo->ttm.type == ttm_bo_type_device &&
+				    bo->flags & XE_BO_FLAG_USER &&
+				    bo->flags & XE_BO_FLAG_VRAM_MASK &&
+				    mode == MEMPAGE_OFFLINE_USER_ALLOCATED) {
+					do_offline = true;
+				} else if (bo->q &&
+					   mode == MEMPAGE_OFFLINE_KERNEL_USER_GGTT_ALLOCATED) {
+					/* lrc */
+					struct xe_vm *migrate_vm;
+
+					migrate_vm = xe_migrate_get_vm(tile->migrate);
+					if (migrate_vm != bo->q->vm)
+						do_offline = true;
+					xe_vm_put(migrate_vm);
+				} else if (bo->ttm.type == ttm_bo_type_kernel &&
+					   bo->flags & XE_BO_FLAG_FORCE_USER_VRAM &&
+					   bo->flags & XE_BO_FLAG_PAGETABLE &&
+					   mode == MEMPAGE_OFFLINE_KERNEL_USER_PPGTT_ALLOCATED) {
+					/* ppgtt */
+					do_offline = true;
+				} else if (bo->ttm.type == ttm_bo_type_kernel &&
+					   !(bo->flags & XE_BO_FLAG_FORCE_USER_VRAM) &&
+					   mode == MEMPAGE_OFFLINE_KERNEL_CRITICAL_ALLOCATED) {
+					do_offline = true;
+				}
+			}
+		}
+		if (do_offline) {
+			/* Report fault */
+			ret = xe_ttm_vram_handle_addr_fault(xe, addr);
+			if (ret) {
+				if ((ret == -EIO) &&
+				    mode == MEMPAGE_OFFLINE_KERNEL_USER_GGTT_ALLOCATED) {
+					addr = addr + SZ_4K;
+					do_offline = false;
+					continue;
+				}
+				break;
+			}
+			/* Verify addr + SZ_4K is allocated */
+			scoped_guard(mutex, &vram_mgr->lock) {
+				block = gpu_buddy_allocated_addr_to_block(mm, addr);
+				if (!block || PTR_ERR(block) == -ENXIO || block->private)
+					ret = -EBUSY;
+			}
+			break;
+		}
+		addr = addr + SZ_4K;
+	}
+	if (!do_offline)
+		drm_warn(&xe->drm, "no such object, ret:%d\n", ret);
+
+	return ret;
+}
+
+static ssize_t addr_fault_reporting_set(struct file *f, const char __user *ubuf,
+					size_t size, loff_t *pos)
+{
+	struct xe_device *xe = file_inode(f)->i_private;
+	int ret = 0;
+	u64 mode;
+
+	ret = kstrtou64_from_user(ubuf, size, 0, &mode);
+	if (ret)
+		return ret;
+
+	switch (mode) {
+	case MEMPAGE_OFFLINE_UNALLOCATED:
+	case MEMPAGE_OFFLINE_USER_ALLOCATED:
+	case MEMPAGE_OFFLINE_KERNEL_USER_GGTT_ALLOCATED:
+	case MEMPAGE_OFFLINE_KERNEL_USER_PPGTT_ALLOCATED:
+	case MEMPAGE_OFFLINE_KERNEL_CRITICAL_ALLOCATED:
+		ret = mempage_exec_offline(xe, mode);
+		break;
+	case MEMPAGE_OFFLINE_RESERVED: {
+		u64 stolen_base = xe_ttm_stolen_gpu_offset(xe);
+
+		ret = xe_ttm_vram_handle_addr_fault(xe, stolen_base);
+		break;
+	}
+	default:
+		ret = -EINVAL;
+		break;
+	}
+
+	xe->mem.vram->ttm.offline_mode = mode;
+	if (!ret || (ret == -EIO &&
+		     (mode == MEMPAGE_OFFLINE_KERNEL_CRITICAL_ALLOCATED ||
+		      mode == MEMPAGE_OFFLINE_RESERVED))) {
+		drm_info(&xe->drm, "offline mode %llu passed ret:%d\n", mode, ret);
+	} else {
+		drm_warn(&xe->drm, "offline mode %llu failed, ret:%d\n", mode, ret);
+		return ret;
+	}
+
+	return size;
+}
+
+static const struct file_operations addr_fault_reporting_fops = {
+	.owner = THIS_MODULE,
+	.read = addr_fault_reporting_show,
+	.write = addr_fault_reporting_set,
+};
+
 void xe_debugfs_register(struct xe_device *xe)
 {
 	struct ttm_device *bdev = &xe->ttm;
@@ -600,6 +757,17 @@ void xe_debugfs_register(struct xe_device *xe)
 	if (man)
 		ttm_resource_manager_create_debugfs(man, root, "stolen_mm");
 
+	if (xe->info.platform == XE_CRESCENTISLAND) {
+		man = ttm_manager_type(bdev, XE_PL_VRAM0);
+		if (man) {
+			char name[20];
+
+			snprintf(name, sizeof(name), "invalid_addr_vram%d", 0);
+			debugfs_create_file(name, 0600, root, xe,
+					    &addr_fault_reporting_fops);
+		}
+	}
+
 	for_each_tile(tile, xe, tile_id)
 		xe_tile_debugfs_register(tile);
 
diff --git a/drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h b/drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h
index 3ad7966798eb..07ed88b47e04 100644
--- a/drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h
+++ b/drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h
@@ -37,6 +37,8 @@ struct xe_ttm_vram_mgr {
 	struct mutex lock;
 	/** @mem_type: The TTM memory type */
 	u32 mem_type;
+	/** @offline_mode: debugfs hook for setting page offline mode */
+	u64 offline_mode;
 };
 
 /**
-- 
2.52.0



* [PATCH V11 08/12] gpu/buddy: Add routine to dump allocated buddy blocks
  2026-06-05 12:38 [PATCH V11 00/12] Add memory page offlining support Tejas Upadhyay
                   ` (6 preceding siblings ...)
  2026-06-05 12:38 ` [PATCH V11 07/12] drm/xe/cri: Add debugfs to inject faulty vram address Tejas Upadhyay
@ 2026-06-05 12:38 ` Tejas Upadhyay
  2026-06-05 12:38 ` [PATCH V11 09/12] drm/xe/configfs: Add vram bad page reservation policy Tejas Upadhyay
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 15+ messages in thread
From: Tejas Upadhyay @ 2026-06-05 12:38 UTC (permalink / raw)
  To: intel-xe
  Cc: matthew.auld, matthew.brost, thomas.hellstrom,
	himal.prasad.ghimiray, Tejas Upadhyay, Arunpravin Paneer Selvam,
	dri-devel

Introduce a new API to dump the allocated blocks of a specific VRAM
instance. Existing debug output mostly shows the free block lists; this
addition provides a view of all currently allocated blocks.

The dump looks like:

[  +0.000003] xe 0000:03:00.0: [drm] 0x00000002f8000000-0x00000002f8800000: 8388608
[  +0.000005] xe 0000:03:00.0: [drm] 0x00000002f8800000-0x00000002f8840000: 262144
[  +0.000004] xe 0000:03:00.0: [drm] 0x00000002f8840000-0x00000002f8860000: 131072
[  +0.000004] xe 0000:03:00.0: [drm] 0x00000002f8860000-0x00000002f8870000: 65536
[  +0.000005] xe 0000:03:00.0: [drm] 0x00000002f9000000-0x00000002f9800000: 8388608
[  +0.000004] xe 0000:03:00.0: [drm] 0x00000002f9800000-0x00000002f9880000: 524288
[  +0.000005] xe 0000:03:00.0: [drm] 0x00000002f9880000-0x00000002f9884000: 16384
[  +0.000004] xe 0000:03:00.0: [drm] 0x00000002f9900000-0x00000002f9980000: 524288
[  +0.000005] xe 0000:03:00.0: [drm] 0x00000002f9980000-0x00000002f9988000: 32768
[  +0.000004] xe 0000:03:00.0: [drm] 0x00000002f9988000-0x00000002f998c000: 16384
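
The walk that produces such a dump can be modelled in user space as in
the sketch below. This is a toy model under stated assumptions, not the
code added by this patch: struct toy_block and toy_dump_allocated() are
hypothetical names, and an array is used as the stack where the patch
uses a list of blocks.

```c
#include <stdio.h>
#include <stddef.h>

#define TOY_STACK_SIZE 128

/* Toy stand-in for a buddy block; left/right are non-NULL only when split. */
struct toy_block {
	unsigned long long offset;
	unsigned long long size;
	int allocated;
	struct toy_block *left;
	struct toy_block *right;
};

/*
 * Iterative depth-first walk from @root. Prints "start-end: size" for
 * every allocated block and returns how many blocks were printed.
 */
static int toy_dump_allocated(struct toy_block *root)
{
	struct toy_block *stack[TOY_STACK_SIZE];
	int top = 0;
	int count = 0;

	stack[top++] = root;
	while (top > 0) {
		struct toy_block *b = stack[--top];

		if (b->allocated) {
			printf("%#018llx-%#018llx: %llu\n",
			       b->offset, b->offset + b->size, b->size);
			count++;
		}

		/* Push right first so the left child is visited first. */
		if (b->left && b->right && top + 2 <= TOY_STACK_SIZE) {
			stack[top++] = b->right;
			stack[top++] = b->left;
		}
	}

	return count;
}
```

Visiting the left child before the right one keeps the output in
ascending address order, which matches the ordering of the sample dump
above.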

v2(MattB):
- Add lockdep assert

Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
Cc: Arunpravin Paneer Selvam <arunpravin.paneerselvam@amd.com>
Cc: dri-devel@lists.freedesktop.org
---
 drivers/gpu/buddy.c       | 43 +++++++++++++++++++++++++++++++++++++++
 include/linux/gpu_buddy.h |  1 +
 2 files changed, 44 insertions(+)

diff --git a/drivers/gpu/buddy.c b/drivers/gpu/buddy.c
index 315f860f4346..9a6e91eb9a58 100644
--- a/drivers/gpu/buddy.c
+++ b/drivers/gpu/buddy.c
@@ -10,6 +10,7 @@
 #include <linux/sizes.h>
 
 #include <linux/gpu_buddy.h>
+#include <drm/drm_print.h>
 
 /**
  * gpu_buddy_assert - assert a condition in the buddy allocator
@@ -1294,6 +1295,48 @@ int gpu_buddy_block_trim(struct gpu_buddy *mm,
 }
 EXPORT_SYMBOL(gpu_buddy_block_trim);
 
+/**
+ * gpu_buddy_dump_allocated_blocks - print all allocated blocks in drm buddy
+ *
+ * @mm: DRM buddy manager to look into
+ *
+ * Walk every block in the buddy manager and, for each allocated block,
+ * print its address range and size.
+ *
+ * Returns:
+ * void
+ */
+void gpu_buddy_dump_allocated_blocks(struct gpu_buddy *mm)
+{
+	struct gpu_buddy_block *block;
+	LIST_HEAD(dfs);
+	int i;
+
+	gpu_buddy_driver_lock_held(mm);
+
+	for (i = 0; i < mm->n_roots; ++i)
+		list_add_tail(&mm->roots[i]->tmp_link, &dfs);
+
+	do {
+		block = list_first_entry_or_null(&dfs,
+						 struct gpu_buddy_block,
+						 tmp_link);
+		if (!block)
+			break;
+
+		list_del(&block->tmp_link);
+
+		if (gpu_buddy_block_is_allocated(block))
+			gpu_buddy_block_print(mm, block);
+
+		if (gpu_buddy_block_is_split(block)) {
+			list_add(&block->right->tmp_link, &dfs);
+			list_add(&block->left->tmp_link, &dfs);
+		}
+	} while (1);
+}
+EXPORT_SYMBOL(gpu_buddy_dump_allocated_blocks);
+
 static struct gpu_buddy_block *
 __gpu_buddy_alloc_blocks(struct gpu_buddy *mm,
 			 u64 start, u64 end,
diff --git a/include/linux/gpu_buddy.h b/include/linux/gpu_buddy.h
index e7e22fa05ee2..aaebdceb1c44 100644
--- a/include/linux/gpu_buddy.h
+++ b/include/linux/gpu_buddy.h
@@ -267,6 +267,7 @@ int gpu_buddy_block_trim(struct gpu_buddy *mm,
 			 u64 *start,
 			 u64 new_size,
 			 struct list_head *blocks);
+void gpu_buddy_dump_allocated_blocks(struct gpu_buddy *mm);
 
 void gpu_buddy_reset_clear(struct gpu_buddy *mm, bool is_clear);
 
-- 
2.52.0



* [PATCH V11 09/12] drm/xe/configfs: Add vram bad page reservation policy
  2026-06-05 12:38 [PATCH V11 00/12] Add memory page offlining support Tejas Upadhyay
                   ` (7 preceding siblings ...)
  2026-06-05 12:38 ` [PATCH V11 08/12] gpu/buddy: Add routine to dump allocated buddy blocks Tejas Upadhyay
@ 2026-06-05 12:38 ` Tejas Upadhyay
  2026-06-05 12:38 ` [PATCH V11 10/12] drm/xe/cri: Add sysfs interface for bad gpu vram pages Tejas Upadhyay
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 15+ messages in thread
From: Tejas Upadhyay @ 2026-06-05 12:38 UTC (permalink / raw)
  To: intel-xe
  Cc: matthew.auld, matthew.brost, thomas.hellstrom,
	himal.prasad.ghimiray, Tejas Upadhyay

Add a configfs attribute, bad_page_reservation, that sets the policy
for handling bad VRAM pages. This helps keep the system stable when
VRAM degradation occurs.
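
The effect of the policy on the fault handler can be modelled in user
space as in the sketch below. It is a simplified illustration, not the
driver code: all toy_* names are hypothetical, the three booleans stand
in for the region lookup, the already-processed check and the configfs
setting, and the reservation step is stubbed to always succeed.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the reservation path; always succeeds here. */
static int toy_reserve_page(unsigned long addr)
{
	(void)addr;
	return 0;
}

/*
 * Toy model of the decision order in the fault handler:
 * region lookup, duplicate check, policy check, then reservation.
 */
static int toy_handle_addr_fault(unsigned long addr, bool in_vram,
				 bool already_processed,
				 bool bad_page_reservation)
{
	if (!in_vram)
		return -EIO;		/* caller is expected to reset */

	if (already_processed)
		return -EEXIST;		/* nothing more to do */

	if (!bad_page_reservation)
		return -EOPNOTSUPP;	/* log-only policy, page is not reserved */

	return toy_reserve_page(addr);
}
```

With reservation enabled the fault ends in the reservation path; with it
disabled the handler stops at -EOPNOTSUPP, which callers can treat as
"logged, no further action".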

The default policy is "reserve". It can be changed to "logging" only,
in which case the bad address is reported in dmesg but not reserved.

v3:
- All FW communication moved under RAS
v2:
- Add CRI check and rebase

Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
---
 drivers/gpu/drm/xe/xe_configfs.c     | 64 +++++++++++++++++++++++++++-
 drivers/gpu/drm/xe/xe_configfs.h     |  2 +
 drivers/gpu/drm/xe/xe_ttm_vram_mgr.c | 10 +++++
 3 files changed, 75 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/xe/xe_configfs.c b/drivers/gpu/drm/xe/xe_configfs.c
index 32102600a148..e07a6a74896b 100644
--- a/drivers/gpu/drm/xe/xe_configfs.c
+++ b/drivers/gpu/drm/xe/xe_configfs.c
@@ -61,7 +61,8 @@
  *	    ├── survivability_mode
  *	    ├── gt_types_allowed
  *	    ├── engines_allowed
- *	    └── enable_psmi
+ *	    ├── enable_psmi
+ *	    └── bad_page_reservation
  *
  * After configuring the attributes as per next section, the device can be
  * probed with::
@@ -159,6 +160,16 @@
  *
  * This attribute can only be set before binding to the device.
  *
+ * Bad page reservation
+ * --------------------
+ *
+ * Disable VRAM bad page reservation and only report bad pages in dmesg.
+ * Example to disable it::
+ *
+ *      # echo 0 > /sys/kernel/config/xe/0000:03:00.0/bad_page_reservation
+ *
+ * This attribute can only be set before binding to the device.
+ *
  * Context restore BB
  * ------------------
  *
@@ -262,6 +273,7 @@ struct xe_config_group_device {
 		struct wa_bb ctx_restore_mid_bb[XE_ENGINE_CLASS_MAX];
 		bool survivability_mode;
 		bool enable_psmi;
+		bool bad_page_reservation;
 		struct {
 			unsigned int max_vfs;
 			bool admin_only_pf;
@@ -281,6 +293,7 @@ static const struct xe_config_device device_defaults = {
 	.engines_allowed = U64_MAX,
 	.survivability_mode = false,
 	.enable_psmi = false,
+	.bad_page_reservation = true,
 	.sriov = {
 		.max_vfs = XE_DEFAULT_MAX_VFS,
 		.admin_only_pf = XE_DEFAULT_ADMIN_ONLY_PF,
@@ -575,6 +588,32 @@ static ssize_t enable_psmi_store(struct config_item *item, const char *page, siz
 	return len;
 }
 
+static ssize_t bad_page_reservation_show(struct config_item *item, char *page)
+{
+	struct xe_config_device *dev = to_xe_config_device(item);
+
+	return sprintf(page, "%d\n", dev->bad_page_reservation);
+}
+
+static ssize_t bad_page_reservation_store(struct config_item *item, const char *page, size_t len)
+{
+	struct xe_config_group_device *dev = to_xe_config_group_device(item);
+	bool val;
+	int ret;
+
+	ret = kstrtobool(page, &val);
+	if (ret)
+		return ret;
+
+	guard(mutex)(&dev->lock);
+	if (is_bound(dev))
+		return -EBUSY;
+
+	dev->config.bad_page_reservation = val;
+
+	return len;
+}
+
 static bool wa_bb_read_advance(bool dereference, char **p,
 			       const char *append, size_t len,
 			       size_t *max_size)
@@ -813,6 +852,7 @@ static ssize_t ctx_restore_post_bb_store(struct config_item *item,
 CONFIGFS_ATTR(, ctx_restore_mid_bb);
 CONFIGFS_ATTR(, ctx_restore_post_bb);
 CONFIGFS_ATTR(, enable_psmi);
+CONFIGFS_ATTR(, bad_page_reservation);
 CONFIGFS_ATTR(, engines_allowed);
 CONFIGFS_ATTR(, gt_types_allowed);
 CONFIGFS_ATTR(, survivability_mode);
@@ -821,6 +861,7 @@ static struct configfs_attribute *xe_config_device_attrs[] = {
 	&attr_ctx_restore_mid_bb,
 	&attr_ctx_restore_post_bb,
 	&attr_enable_psmi,
+	&attr_bad_page_reservation,
 	&attr_engines_allowed,
 	&attr_gt_types_allowed,
 	&attr_survivability_mode,
@@ -1098,6 +1139,7 @@ static void dump_custom_dev_config(struct pci_dev *pdev,
 	PRI_CUSTOM_ATTR("%llx", gt_types_allowed);
 	PRI_CUSTOM_ATTR("%llx", engines_allowed);
 	PRI_CUSTOM_ATTR("%d", enable_psmi);
+	PRI_CUSTOM_ATTR("%d", bad_page_reservation);
 	PRI_CUSTOM_ATTR("%d", survivability_mode);
 	PRI_CUSTOM_ATTR("%u", sriov.admin_only_pf);
 
@@ -1225,6 +1267,26 @@ bool xe_configfs_get_psmi_enabled(struct pci_dev *pdev)
 	return ret;
 }
 
+/**
+ * xe_configfs_get_bad_page_reservation - get configfs bad_page_reservation setting
+ * @pdev: pci device
+ *
+ * Return: bad_page_reservation setting in configfs
+ */
+bool xe_configfs_get_bad_page_reservation(struct pci_dev *pdev)
+{
+	struct xe_config_group_device *dev = find_xe_config_group_device(pdev);
+	bool ret;
+
+	if (!dev)
+		return device_defaults.bad_page_reservation;
+
+	ret = dev->config.bad_page_reservation;
+	config_group_put(&dev->group);
+
+	return ret;
+}
+
 /**
  * xe_configfs_get_ctx_restore_mid_bb - get configfs ctx_restore_mid_bb setting
  * @pdev: pci device
diff --git a/drivers/gpu/drm/xe/xe_configfs.h b/drivers/gpu/drm/xe/xe_configfs.h
index 07d62bf0c152..c107d84b2c62 100644
--- a/drivers/gpu/drm/xe/xe_configfs.h
+++ b/drivers/gpu/drm/xe/xe_configfs.h
@@ -23,6 +23,7 @@ bool xe_configfs_primary_gt_allowed(struct pci_dev *pdev);
 bool xe_configfs_media_gt_allowed(struct pci_dev *pdev);
 u64 xe_configfs_get_engines_allowed(struct pci_dev *pdev);
 bool xe_configfs_get_psmi_enabled(struct pci_dev *pdev);
+bool xe_configfs_get_bad_page_reservation(struct pci_dev *pdev);
 u32 xe_configfs_get_ctx_restore_mid_bb(struct pci_dev *pdev,
 				       enum xe_engine_class class,
 				       const u32 **cs);
@@ -42,6 +43,7 @@ static inline bool xe_configfs_primary_gt_allowed(struct pci_dev *pdev) { return
 static inline bool xe_configfs_media_gt_allowed(struct pci_dev *pdev) { return true; }
 static inline u64 xe_configfs_get_engines_allowed(struct pci_dev *pdev) { return U64_MAX; }
 static inline bool xe_configfs_get_psmi_enabled(struct pci_dev *pdev) { return false; }
+static inline bool xe_configfs_get_bad_page_reservation(struct pci_dev *pdev) { return true; }
 static inline u32 xe_configfs_get_ctx_restore_mid_bb(struct pci_dev *pdev,
 						     enum xe_engine_class class,
 						     const u32 **cs) { return 0; }
diff --git a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
index 0e968ae47fd9..f18eb51e98a1 100644
--- a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
+++ b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
@@ -12,6 +12,7 @@
 #include <drm/ttm/ttm_range_manager.h>
 
 #include "xe_bo.h"
+#include "xe_configfs.h"
 #include "xe_device.h"
 #include "xe_exec_queue.h"
 #include "xe_lrc.h"
@@ -735,6 +736,7 @@ int xe_ttm_vram_handle_addr_fault(struct xe_device *xe, unsigned long addr)
 	struct xe_ttm_vram_mgr *vram_mgr;
 	struct xe_vram_region *vr;
 	struct gpu_buddy *mm;
+	bool policy;
 
 	vr = xe_ttm_vram_addr_to_region(xe, addr);
 	if (!vr) {
@@ -751,6 +753,14 @@ int xe_ttm_vram_handle_addr_fault(struct xe_device *xe, unsigned long addr)
 			return -EEXIST;
 	}
 
+	policy = xe_configfs_get_bad_page_reservation(to_pci_dev(xe->drm.dev));
+	if (!policy) {
+		drm_err(&xe->drm, "0x%lx is reported as corrupted address by HW\n",
+			addr);
+		/* Let RAS report to FW to drop addr from SRAM queue */
+		return -EOPNOTSUPP;
+	}
+
 	/* Reserve page at address */
 	return xe_ttm_vram_reserve_page_at_addr(xe, addr, vram_mgr, mm);
 }
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH V11 10/12] drm/xe/cri: Add sysfs interface for bad gpu vram pages
  2026-06-05 12:38 [PATCH V11 00/12] Add memory page offlining support Tejas Upadhyay
                   ` (8 preceding siblings ...)
  2026-06-05 12:38 ` [PATCH V11 09/12] drm/xe/configfs: Add vram bad page reservation policy Tejas Upadhyay
@ 2026-06-05 12:38 ` Tejas Upadhyay
  2026-06-05 12:38 ` [PATCH V11 11/12] drm/xe/uapi: Expose ban reason in EXEC_QUEUE_GET_PROPERTY_BAN Tejas Upadhyay
  2026-06-05 12:38 ` [PATCH V11 12/12] drm/xe: Add soft/hard offline mode for VRAM page retirement Tejas Upadhyay
  11 siblings, 0 replies; 15+ messages in thread
From: Tejas Upadhyay @ 2026-06-05 12:38 UTC (permalink / raw)
  To: intel-xe
  Cc: matthew.auld, matthew.brost, thomas.hellstrom,
	himal.prasad.ghimiray, Tejas Upadhyay

Starting with CRI, add a sysfs interface that exposes information
about bad VRAM pages, i.e. pages identified as having hardware faults
(e.g., ECC errors). This interface allows userspace tools and
administrators to monitor the health of the GPU's local memory and
track the status of page retirement. Details on bad GPU VRAM pages
can be found under /sys/bus/pci/devices/bdf/vram_bad_pages.

The format of each entry is: pfn : gpu page size : flags

flags:
R: reserved, this GPU page is reserved.
P: pending reservation, this GPU page is marked as bad and will be
   reserved in the next window of page_reserve.
F: failed, this GPU page could not be reserved.

For example, reading with cat /sys/bus/pci/devices/bdf/vram_bad_pages
gives:
max_pages : 10000
0x00000000 : 0x00001000 : R
0x00001234 : 0x00001000 : P
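
A userspace consumer of the entry format above could look roughly like
the sketch below. It is only an illustration under assumptions: struct
bad_page_entry and parse_bad_page_line() are hypothetical names that do
not exist in this series, and the code relies only on the
"pfn : size : flag" layout shown in the example.

```c
#include <stdio.h>

/* Hypothetical userspace record; not part of this patch. */
struct bad_page_entry {
	unsigned long long pfn;
	unsigned long long size;
	char flag;	/* 'R', 'P' or 'F' */
};

/*
 * Parse one "pfn : size : flag" line. Returns 0 on success and -1 when
 * the line does not match, e.g. for the leading max_pages line.
 */
static int parse_bad_page_line(const char *line, struct bad_page_entry *e)
{
	/* %llx accepts an optional 0x prefix, as strtoull() with base 16. */
	if (sscanf(line, "%llx : %llx : %c", &e->pfn, &e->size, &e->flag) != 3)
		return -1;
	if (e->flag != 'R' && e->flag != 'P' && e->flag != 'F')
		return -1;
	return 0;
}
```

A tool would read the file line by line, skip lines for which the
helper returns -1, and count entries per flag.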

v3:
- Move FW communication in RAS code
v2:
- Add max_pages info as per updated design doc
- Rebase

Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
---
 drivers/gpu/drm/xe/xe_device_sysfs.c       |  7 ++
 drivers/gpu/drm/xe/xe_ttm_vram_mgr.c       | 79 ++++++++++++++++++++++
 drivers/gpu/drm/xe/xe_ttm_vram_mgr.h       |  1 +
 drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h |  2 +
 4 files changed, 89 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_device_sysfs.c b/drivers/gpu/drm/xe/xe_device_sysfs.c
index a73e0e957cb0..47c5be4180fe 100644
--- a/drivers/gpu/drm/xe/xe_device_sysfs.c
+++ b/drivers/gpu/drm/xe/xe_device_sysfs.c
@@ -8,12 +8,14 @@
 #include <linux/pci.h>
 #include <linux/sysfs.h>
 
+#include "xe_configfs.h"
 #include "xe_device.h"
 #include "xe_device_sysfs.h"
 #include "xe_mmio.h"
 #include "xe_pcode_api.h"
 #include "xe_pcode.h"
 #include "xe_pm.h"
+#include "xe_ttm_vram_mgr.h"
 
 /**
  * DOC: Xe device sysfs
@@ -267,6 +269,7 @@ static const struct attribute_group auto_link_downgrade_attr_group = {
 int xe_device_sysfs_init(struct xe_device *xe)
 {
 	struct device *dev = xe->drm.dev;
+	bool policy;
 	int ret;
 
 	if (xe->d3cold.capable) {
@@ -285,5 +288,9 @@ int xe_device_sysfs_init(struct xe_device *xe)
 			return ret;
 	}
 
+	policy = xe_configfs_get_bad_page_reservation(to_pci_dev(dev));
+	if (xe->info.platform == XE_CRESCENTISLAND && policy)
+		xe_ttm_vram_sysfs_init(xe);
+
 	return 0;
 }
diff --git a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
index f18eb51e98a1..35b5eaf590fa 100644
--- a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
+++ b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
@@ -765,3 +765,82 @@ int xe_ttm_vram_handle_addr_fault(struct xe_device *xe, unsigned long addr)
 	return xe_ttm_vram_reserve_page_at_addr(xe, addr, vram_mgr, mm);
 }
 EXPORT_SYMBOL(xe_ttm_vram_handle_addr_fault);
+
+static void xe_ttm_vram_dump_bad_pages_info(char *buf, struct xe_ttm_vram_mgr *mgr)
+{
+	const unsigned int element_size = sizeof("0xabcdabcd : 0x12345678 : R\n") - 1;
+	const unsigned int maxpage_size = sizeof("max_pages: 10000\n") - 1;
+	struct xe_ttm_vram_offline_resource *pos, *n;
+	struct gpu_buddy_block *block;
+	ssize_t s = 0;
+
+	mutex_lock(&mgr->lock);
+	s += scnprintf(&buf[s], maxpage_size + 1, "max_pages: %d\n", mgr->max_pages);
+	list_for_each_entry_safe(pos, n, &mgr->offlined_pages, offlined_link) {
+		block = list_first_entry(&pos->blocks,
+					 struct gpu_buddy_block,
+					 link);
+		s += scnprintf(&buf[s], element_size + 1,
+			       "0x%08llx : 0x%08llx : %1s\n",
+			       gpu_buddy_block_offset(block) >> PAGE_SHIFT,
+			       gpu_buddy_block_size(&mgr->mm, block),
+			       "R");
+	}
+	list_for_each_entry_safe(pos, n, &mgr->queued_pages, queued_link) {
+		block = list_first_entry(&pos->blocks,
+					 struct gpu_buddy_block,
+					 link);
+		s += scnprintf(&buf[s], element_size + 1,
+			       "0x%08llx : 0x%08llx : %1s\n",
+			       gpu_buddy_block_offset(block) >> PAGE_SHIFT,
+			       gpu_buddy_block_size(&mgr->mm, block),
+			       pos->status ? "P" : "F");
+	}
+	mutex_unlock(&mgr->lock);
+}
+
+static ssize_t vram_bad_pages_show(struct device *dev, struct device_attribute *attr, char *buf)
+{
+	struct pci_dev *pdev = to_pci_dev(dev);
+	struct xe_device *xe = pdev_to_xe_device(pdev);
+	struct ttm_resource_manager *man;
+	struct xe_ttm_vram_mgr *mgr;
+
+	man = ttm_manager_type(&xe->ttm, XE_PL_VRAM0);
+	if (man) {
+		mgr = to_xe_ttm_vram_mgr(man);
+		xe_ttm_vram_dump_bad_pages_info(buf, mgr);
+	}
+
+	return sysfs_emit(buf, "%s\n", buf);
+}
+static DEVICE_ATTR_RO(vram_bad_pages);
+
+static void xe_ttm_vram_sysfs_fini(void *arg)
+{
+	struct xe_device *xe = arg;
+
+	device_remove_file(xe->drm.dev, &dev_attr_vram_bad_pages);
+}
+
+/**
+ * xe_ttm_vram_sysfs_init - Initialize vram sysfs component
+ * @xe: Xe device object
+ *
+ * It needs to be initialized after the main tile component is ready
+ *
+ * Returns: 0 on success, negative error code on error.
+ */
+int xe_ttm_vram_sysfs_init(struct xe_device *xe)
+{
+	int err;
+
+	err = device_create_file(xe->drm.dev, &dev_attr_vram_bad_pages);
+	if (err) {
+		dev_err(xe->drm.dev, "Failed to create vram_bad_pages sysfs file: %d\n", err);
+		return 0;
+	}
+
+	return devm_add_action_or_reset(xe->drm.dev, xe_ttm_vram_sysfs_fini, xe);
+}
+EXPORT_SYMBOL(xe_ttm_vram_sysfs_init);
diff --git a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.h b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.h
index 8ef06d9d44f7..c33e1a8d9217 100644
--- a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.h
+++ b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.h
@@ -32,6 +32,7 @@ void xe_ttm_vram_get_used(struct ttm_resource_manager *man,
 			  u64 *used, u64 *used_visible);
 
 int xe_ttm_vram_handle_addr_fault(struct xe_device *xe, unsigned long addr);
+int xe_ttm_vram_sysfs_init(struct xe_device *xe);
 static inline struct xe_ttm_vram_mgr_resource *
 to_xe_ttm_vram_mgr_resource(struct ttm_resource *res)
 {
diff --git a/drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h b/drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h
index 07ed88b47e04..b23796066a1a 100644
--- a/drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h
+++ b/drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h
@@ -39,6 +39,8 @@ struct xe_ttm_vram_mgr {
 	u32 mem_type;
 	/** @offline_mode: debugfs hook for setting page offline mode */
 	u64 offline_mode;
+	/** @max_pages: max pages that can be in offline queue retrieved from FW */
+	u16 max_pages;
 };
 
 /**
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH V11 11/12] drm/xe/uapi: Expose ban reason in EXEC_QUEUE_GET_PROPERTY_BAN
  2026-06-05 12:38 [PATCH V11 00/12] Add memory page offlining support Tejas Upadhyay
                   ` (9 preceding siblings ...)
  2026-06-05 12:38 ` [PATCH V11 10/12] drm/xe/cri: Add sysfs interface for bad gpu vram pages Tejas Upadhyay
@ 2026-06-05 12:38 ` Tejas Upadhyay
  2026-06-08 14:03   ` Souza, Jose
  2026-06-05 12:38 ` [PATCH V11 12/12] drm/xe: Add soft/hard offline mode for VRAM page retirement Tejas Upadhyay
  11 siblings, 1 reply; 15+ messages in thread
From: Tejas Upadhyay @ 2026-06-05 12:38 UTC (permalink / raw)
  To: intel-xe
  Cc: matthew.auld, matthew.brost, thomas.hellstrom,
	himal.prasad.ghimiray, Tejas Upadhyay, Mrozek, Michal,
	José Roberto de Souza, Vivi, Rodrigo

Extend DRM_XE_EXEC_QUEUE_GET_PROPERTY_BAN to return a bitmask indicating
the reason for the ban, rather than a simple boolean. This allows
userspace to distinguish between different ban causes:

- DRM_XE_EXEC_QUEUE_BAN_REASON_GPU_HANG (bit 0): exec queue was banned
  due to a GPU hang or job timeout detected by the TDR.
- DRM_XE_EXEC_QUEUE_BAN_REASON_PAGE_OFFLINE (bit 1): exec queue was
  banned because a VRAM page backing its resources was taken offline.

The ban_reason field is added to struct xe_exec_queue and set at the
point where the ban is triggered:
- In guc_exec_queue_timedout_job() for GPU hang.
- In xe_ttm_vram_purge_page() for memory page offline, before calling
  xe_exec_queue_kill() or xe_vm_kill().

The reset_status op is updated to return u64 with the reason bitmask.
When a queue is banned but no explicit reason was recorded (e.g., from a
generic CAT error), it defaults to GPU_HANG for backward compatibility.
A value of 0 means the exec queue is not banned.
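
For illustration, userspace could decode the returned value along the
lines of the sketch below. This is a minimal example under assumptions:
the two macros are copied from the uAPI hunk in this patch and are
redefined locally because they are not in released headers, and
ban_reason_str() is a hypothetical helper, not something this series
adds.

```c
#include <stdint.h>

/* Copied from the xe_drm.h hunk in this patch; local copies for the sketch. */
#define DRM_XE_EXEC_QUEUE_BAN_REASON_GPU_HANG		(1 << 0)
#define DRM_XE_EXEC_QUEUE_BAN_REASON_PAGE_OFFLINE	(1 << 1)

/* Hypothetical helper: map the property value to a short description. */
static const char *ban_reason_str(uint64_t value)
{
	if (!value)
		return "not banned";
	if ((value & DRM_XE_EXEC_QUEUE_BAN_REASON_GPU_HANG) &&
	    (value & DRM_XE_EXEC_QUEUE_BAN_REASON_PAGE_OFFLINE))
		return "gpu hang and page offline";
	if (value & DRM_XE_EXEC_QUEUE_BAN_REASON_PAGE_OFFLINE)
		return "page offline";
	if (value & DRM_XE_EXEC_QUEUE_BAN_REASON_GPU_HANG)
		return "gpu hang";
	return "unknown reason";
}
```

Callers that only test the value for non-zero keep working unchanged,
while a caller comparing the value against 1 would no longer match a
queue banned solely for PAGE_OFFLINE.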

Assisted-by: Copilot:claude-opus-4.6
Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
Cc: Mrozek, Michal <michal.mrozek@intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Cc: Vivi, Rodrigo <rodrigo.vivi@intel.com>
---
 drivers/gpu/drm/xe/xe_exec_queue_types.h |  7 +++++--
 drivers/gpu/drm/xe/xe_execlist.c         |  4 ++--
 drivers/gpu/drm/xe/xe_guc_submit.c       | 24 +++++++++++++++++++-----
 drivers/gpu/drm/xe/xe_ttm_vram_mgr.c     |  7 +++++++
 include/uapi/drm/xe_drm.h                | 12 +++++++++++-
 5 files changed, 44 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h b/drivers/gpu/drm/xe/xe_exec_queue_types.h
index 2f5ccf294675..77a621da4487 100644
--- a/drivers/gpu/drm/xe/xe_exec_queue_types.h
+++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h
@@ -143,6 +143,9 @@ struct xe_exec_queue {
 	 */
 	unsigned long flags;
 
+	/** @ban_reason: Bitmask of ban reasons (DRM_XE_EXEC_QUEUE_BAN_REASON_*) */
+	u32 ban_reason;
+
 	union {
 		/** @multi_gt_list: list head for VM bind engines if multi-GT */
 		struct list_head multi_gt_list;
@@ -316,8 +319,8 @@ struct xe_exec_queue_ops {
 	 * signalled when this function is called.
 	 */
 	void (*resume)(struct xe_exec_queue *q);
-	/** @reset_status: check exec queue reset status */
-	bool (*reset_status)(struct xe_exec_queue *q);
+	/** @reset_status: check exec queue ban status, returns ban reason bitmask */
+	u64 (*reset_status)(struct xe_exec_queue *q);
 	/** @active: check exec queue is active */
 	bool (*active)(struct xe_exec_queue *q);
 };
diff --git a/drivers/gpu/drm/xe/xe_execlist.c b/drivers/gpu/drm/xe/xe_execlist.c
index 9fb99c038ea8..35e6e05ba418 100644
--- a/drivers/gpu/drm/xe/xe_execlist.c
+++ b/drivers/gpu/drm/xe/xe_execlist.c
@@ -452,10 +452,10 @@ static void execlist_exec_queue_resume(struct xe_exec_queue *q)
 	/* NIY */
 }
 
-static bool execlist_exec_queue_reset_status(struct xe_exec_queue *q)
+static u64 execlist_exec_queue_reset_status(struct xe_exec_queue *q)
 {
 	/* NIY */
-	return false;
+	return 0;
 }
 
 static bool execlist_exec_queue_active(struct xe_exec_queue *q)
diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
index 4b247a3019d2..ff28eab7cee2 100644
--- a/drivers/gpu/drm/xe/xe_guc_submit.c
+++ b/drivers/gpu/drm/xe/xe_guc_submit.c
@@ -6,6 +6,7 @@
 #include "xe_guc_submit.h"
 
 #include <linux/bitfield.h>
+#include <uapi/drm/xe_drm.h>
 #include <linux/bitmap.h>
 #include <linux/circ_buf.h>
 #include <linux/dma-fence-array.h>
@@ -1530,6 +1531,7 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
 	if (!exec_queue_killed(q))
 		wedged = guc_submit_hint_wedged(exec_queue_to_guc(q));
 
+	q->ban_reason |= DRM_XE_EXEC_QUEUE_BAN_REASON_GPU_HANG;
 	set_exec_queue_banned(q);
 
 	/* Kick job / queue off hardware */
@@ -2211,13 +2213,25 @@ static void guc_exec_queue_resume(struct xe_exec_queue *q)
 	xe_sched_msg_unlock(sched);
 }
 
-static bool guc_exec_queue_reset_status(struct xe_exec_queue *q)
+static u64 guc_exec_queue_reset_status(struct xe_exec_queue *q)
 {
-	if (xe_exec_queue_is_multi_queue_secondary(q) &&
-	    guc_exec_queue_reset_status(xe_exec_queue_multi_queue_primary(q)))
-		return true;
+	if (xe_exec_queue_is_multi_queue_secondary(q)) {
+		u64 status = guc_exec_queue_reset_status(
+				xe_exec_queue_multi_queue_primary(q));
+		if (status)
+			return status;
+	}
+
+	if (exec_queue_reset(q) || exec_queue_killed_or_banned_or_wedged(q)) {
+		u64 reason = q->ban_reason;
 
-	return exec_queue_reset(q) || exec_queue_killed_or_banned_or_wedged(q);
+		/* If no specific reason was recorded, default to GPU hang */
+		if (!reason)
+			reason = DRM_XE_EXEC_QUEUE_BAN_REASON_GPU_HANG;
+		return reason;
+	}
+
+	return 0;
 }
 
 static bool guc_exec_queue_active(struct xe_exec_queue *q)
diff --git a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
index 35b5eaf590fa..3765e8fcdcec 100644
--- a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
+++ b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
@@ -7,6 +7,7 @@
 #include <drm/drm_managed.h>
 #include <drm/drm_drv.h>
 #include <drm/drm_buddy.h>
+#include <uapi/drm/xe_drm.h>
 
 #include <drm/ttm/ttm_placement.h>
 #include <drm/ttm/ttm_range_manager.h>
@@ -537,10 +538,15 @@ static int xe_ttm_vram_purge_page(struct xe_device *xe, struct xe_bo *bo)
 	xe_bo_unlock(bo);
 	/*  Ban VM if BO is PPGTT */
 	if (vm && (flags & XE_BO_FLAG_PAGETABLE)) {
+		struct xe_exec_queue *eq;
+
 		down_write(&vm->lock);
+		list_for_each_entry(eq, &vm->preempt.exec_queues, lr.link)
+			eq->ban_reason |= DRM_XE_EXEC_QUEUE_BAN_REASON_PAGE_OFFLINE;
 		xe_vm_kill(vm, true);
 		up_write(&vm->lock);
 	}
+
 	if (vm)
 		xe_vm_put(vm);
 
@@ -548,6 +554,7 @@ static int xe_ttm_vram_purge_page(struct xe_device *xe, struct xe_bo *bo)
 	/*  Ban exec queue if BO is lrc */
 	if (bo->q && xe_exec_queue_get_unless_zero(bo->q)) {
 		/* ban queue */
+		bo->q->ban_reason |= DRM_XE_EXEC_QUEUE_BAN_REASON_PAGE_OFFLINE;
 		xe_exec_queue_kill(bo->q);
 		xe_exec_queue_put(bo->q);
 	}
diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
index 48e9f1fdb78d..904d58b039fe 100644
--- a/include/uapi/drm/xe_drm.h
+++ b/include/uapi/drm/xe_drm.h
@@ -1503,7 +1503,17 @@ struct drm_xe_exec_queue_get_property {
 	/** @property: property to get */
 	__u32 property;
 
-	/** @value: property value */
+	/**
+	 * @value: property value
+	 *
+	 * For %DRM_XE_EXEC_QUEUE_GET_PROPERTY_BAN, this is a bitmask of:
+	 *  - %DRM_XE_EXEC_QUEUE_BAN_REASON_GPU_HANG - banned due to GPU hang/timeout
+	 *  - %DRM_XE_EXEC_QUEUE_BAN_REASON_PAGE_OFFLINE - banned due to memory page offline
+	 *
+	 * Value of 0 means the exec queue is not banned.
+	 */
+#define DRM_XE_EXEC_QUEUE_BAN_REASON_GPU_HANG		(1 << 0)
+#define DRM_XE_EXEC_QUEUE_BAN_REASON_PAGE_OFFLINE	(1 << 1)
 	__u64 value;
 
 	/** @reserved: Reserved */
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH V11 12/12] drm/xe: Add soft/hard offline mode for VRAM page retirement
  2026-06-05 12:38 [PATCH V11 00/12] Add memory page offlining support Tejas Upadhyay
                   ` (10 preceding siblings ...)
  2026-06-05 12:38 ` [PATCH V11 11/12] drm/xe/uapi: Expose ban reason in EXEC_QUEUE_GET_PROPERTY_BAN Tejas Upadhyay
@ 2026-06-05 12:38 ` Tejas Upadhyay
  11 siblings, 0 replies; 15+ messages in thread
From: Tejas Upadhyay @ 2026-06-05 12:38 UTC (permalink / raw)
  To: intel-xe
  Cc: matthew.auld, matthew.brost, thomas.hellstrom,
	himal.prasad.ghimiray, Tejas Upadhyay

Introduce a mode-based approach for VRAM page offlining to distinguish
between soft (temporary) and hard (permanent) page retirement:

- Add enum xe_page_offline_mode with XE_PAGE_OFFLINE_SOFT and
  XE_PAGE_OFFLINE_HARD states.
- Add 'mode' field to struct xe_ttm_vram_offline_resource.

On first fault at an address:
- Reserve the block, add to offlined_pages with mode = SOFT.
- Page stays out of the allocator pool but is not permanently retired.

On second fault at the same address:
- xe_ttm_vram_page_already_processed() finds the entry and promotes
  mode from SOFT to HARD.
- Returns -EEXIST to the caller, which signals FW to permanently
  retire the page.

This avoids unnecessary permanent retirement on transient single-bit
errors while ensuring persistent faults get hard-offlined.
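
The two-step promotion described above can be modelled in isolation as
in the sketch below. It is a simplified stand-alone model, not the
driver code: struct offline_table, handle_addr_fault() and the fixed
MAX_ENTRIES array are hypothetical stand-ins for the offlined_pages
list, and locking, block reservation and FW signalling are left out.

```c
#include <errno.h>
#include <stddef.h>

enum page_offline_mode {
	PAGE_OFFLINE_SOFT = 0,
	PAGE_OFFLINE_HARD,
};

struct offline_entry {
	unsigned long addr;
	enum page_offline_mode mode;
};

#define MAX_ENTRIES 16	/* arbitrary bound for the sketch */

struct offline_table {
	struct offline_entry entries[MAX_ENTRIES];
	size_t count;
};

/*
 * First fault at an address: record it as SOFT and return 0.
 * Repeat fault at the same address: promote SOFT to HARD and return
 * -EEXIST, which the caller would treat as "retire permanently".
 */
static int handle_addr_fault(struct offline_table *t, unsigned long addr)
{
	size_t i;

	for (i = 0; i < t->count; i++) {
		if (t->entries[i].addr == addr) {
			if (t->entries[i].mode == PAGE_OFFLINE_SOFT)
				t->entries[i].mode = PAGE_OFFLINE_HARD;
			return -EEXIST;
		}
	}

	if (t->count == MAX_ENTRIES)
		return -ENOSPC;

	t->entries[t->count].addr = addr;
	t->entries[t->count].mode = PAGE_OFFLINE_SOFT;
	t->count++;
	return 0;
}
```

In this model a third fault at the same address still returns -EEXIST
and leaves the entry in HARD mode, so promotion happens only once.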

Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
---
 drivers/gpu/drm/xe/xe_ttm_vram_mgr.c       | 10 +++++++++-
 drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h | 12 ++++++++++++
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
index 3765e8fcdcec..71da7ee6ba7e 100644
--- a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
+++ b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
@@ -577,8 +577,15 @@ static bool xe_ttm_vram_page_already_processed(struct xe_ttm_vram_mgr *mgr,
 	lockdep_assert_held(&mgr->lock);
 
 	list_for_each_entry(pos, &mgr->offlined_pages, offlined_link) {
-		if (pos->addr == addr)
+		if (pos->addr == addr) {
+			/*
+			 * Second fault at same addr: promote soft to hard.
+			 * Caller returns -EEXIST to FW for permanent retirement.
+			 */
+			if (pos->mode == XE_PAGE_OFFLINE_SOFT)
+				pos->mode = XE_PAGE_OFFLINE_HARD;
 			return true;
+		}
 	}
 
 	list_for_each_entry(pos, &mgr->queued_pages, queued_link) {
@@ -613,6 +620,7 @@ static int xe_ttm_vram_reserve_page_at_addr(struct xe_device *xe, unsigned long
 		INIT_LIST_HEAD(&nentry->blocks);
 		nentry->status = pending;
 		nentry->addr = addr;
+		nentry->mode = XE_PAGE_OFFLINE_SOFT;
 
 		if (block) {
 			struct xe_bo *pbo;
diff --git a/drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h b/drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h
index b23796066a1a..f73dfd1ad82b 100644
--- a/drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h
+++ b/drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h
@@ -57,6 +57,16 @@ struct xe_ttm_vram_mgr_resource {
 	unsigned long flags;
 };
 
+/**
+ * enum xe_page_offline_mode - Mode of page offline
+ * @XE_PAGE_OFFLINE_SOFT: Page is soft-offlined (scrubbed, can be restored)
+ * @XE_PAGE_OFFLINE_HARD: Page is hard-offlined (permanently retired)
+ */
+enum xe_page_offline_mode {
+	XE_PAGE_OFFLINE_SOFT = 0,
+	XE_PAGE_OFFLINE_HARD,
+};
+
 /**
  * struct xe_ttm_vram_offline_resource - Xe TTM VRAM offline  resource
  */
@@ -75,6 +85,8 @@ struct xe_ttm_vram_offline_resource {
 	unsigned long addr;
 	/** @status: reservation status of resource */
 	bool status;
+	/** @mode: offline mode (soft or hard) */
+	enum xe_page_offline_mode mode;
 };
 
 #endif
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH V11 11/12] drm/xe/uapi: Expose ban reason in EXEC_QUEUE_GET_PROPERTY_BAN
  2026-06-05 12:38 ` [PATCH V11 11/12] drm/xe/uapi: Expose ban reason in EXEC_QUEUE_GET_PROPERTY_BAN Tejas Upadhyay
@ 2026-06-08 14:03   ` Souza, Jose
  2026-06-09  0:37     ` Rodrigo Vivi
  0 siblings, 1 reply; 15+ messages in thread
From: Souza, Jose @ 2026-06-08 14:03 UTC (permalink / raw)
  To: intel-xe@lists.freedesktop.org, Upadhyay, Tejas
  Cc: Brost, Matthew, Vivi, Rodrigo, Ghimiray, Himal Prasad,
	Auld, Matthew, thomas.hellstrom@linux.intel.com, Mrozek, Michal

On Fri, 2026-06-05 at 18:08 +0530, Tejas Upadhyay wrote:
> Extend DRM_XE_EXEC_QUEUE_GET_PROPERTY_BAN to return a bitmask
> indicating
> the reason for the ban, rather than a simple boolean. This allows
> userspace to distinguish between different ban causes:
> 
> - DRM_XE_EXEC_QUEUE_BAN_REASON_GPU_HANG (bit 0): exec queue was
> banned
>   due to a GPU hang or job timeout detected by the TDR.
> - DRM_XE_EXEC_QUEUE_BAN_REASON_PAGE_OFFLINE (bit 1): exec queue was
>   banned because a VRAM page backing its resources was taken offline.
> 
> The ban_reason field is added to struct xe_exec_queue and set at the
> point where the ban is triggered:
> - In guc_exec_queue_timedout_job() for GPU hang.
> - In xe_ttm_vram_purge_page() for memory page offline, before calling
>   xe_exec_queue_kill() or xe_vm_kill().
> 
> The reset_status op is updated to return u64 with the reason bitmask.
> When a queue is banned but no explicit reason was recorded (e.g.,
> from a
> generic CAT error), it defaults to GPU_HANG for backward
> compatibility.
> A value of 0 means the exec queue is not banned.
> 

Acked-by: José Roberto de Souza <jose.souza@intel.com>

> Assisted-by: Copilot:claude-opus-4.6
> Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
> cc: Mrozek, Michal <michal.mrozek@intel.com>
> cc: José Roberto de Souza <jose.souza@intel.com>
> cc: Vivi, Rodrigo <rodrigo.vivi@intel.com>
> ---
>  drivers/gpu/drm/xe/xe_exec_queue_types.h |  7 +++++--
>  drivers/gpu/drm/xe/xe_execlist.c         |  4 ++--
>  drivers/gpu/drm/xe/xe_guc_submit.c       | 24 +++++++++++++++++++---
> --
>  drivers/gpu/drm/xe/xe_ttm_vram_mgr.c     |  7 +++++++
>  include/uapi/drm/xe_drm.h                | 12 +++++++++++-
>  5 files changed, 44 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h
> b/drivers/gpu/drm/xe/xe_exec_queue_types.h
> index 2f5ccf294675..77a621da4487 100644
> --- a/drivers/gpu/drm/xe/xe_exec_queue_types.h
> +++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h
> @@ -143,6 +143,9 @@ struct xe_exec_queue {
>  	 */
>  	unsigned long flags;
>  
> +	/** @ban_reason: Bitmask of ban reasons
> (DRM_XE_EXEC_QUEUE_BAN_REASON_*) */
> +	u32 ban_reason;
> +
>  	union {
>  		/** @multi_gt_list: list head for VM bind engines if
> multi-GT */
>  		struct list_head multi_gt_list;
> @@ -316,8 +319,8 @@ struct xe_exec_queue_ops {
>  	 * signalled when this function is called.
>  	 */
>  	void (*resume)(struct xe_exec_queue *q);
> -	/** @reset_status: check exec queue reset status */
> -	bool (*reset_status)(struct xe_exec_queue *q);
> +	/** @reset_status: check exec queue ban status, returns ban
> reason bitmask */
> +	u64 (*reset_status)(struct xe_exec_queue *q);
>  	/** @active: check exec queue is active */
>  	bool (*active)(struct xe_exec_queue *q);
>  };
> diff --git a/drivers/gpu/drm/xe/xe_execlist.c
> b/drivers/gpu/drm/xe/xe_execlist.c
> index 9fb99c038ea8..35e6e05ba418 100644
> --- a/drivers/gpu/drm/xe/xe_execlist.c
> +++ b/drivers/gpu/drm/xe/xe_execlist.c
> @@ -452,10 +452,10 @@ static void execlist_exec_queue_resume(struct
> xe_exec_queue *q)
>  	/* NIY */
>  }
>  
> -static bool execlist_exec_queue_reset_status(struct xe_exec_queue
> *q)
> +static u64 execlist_exec_queue_reset_status(struct xe_exec_queue *q)
>  {
>  	/* NIY */
> -	return false;
> +	return 0;
>  }
>  
>  static bool execlist_exec_queue_active(struct xe_exec_queue *q)
> diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c
> b/drivers/gpu/drm/xe/xe_guc_submit.c
> index 4b247a3019d2..ff28eab7cee2 100644
> --- a/drivers/gpu/drm/xe/xe_guc_submit.c
> +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
> @@ -6,6 +6,7 @@
>  #include "xe_guc_submit.h"
>  
>  #include <linux/bitfield.h>
> +#include <uapi/drm/xe_drm.h>
>  #include <linux/bitmap.h>
>  #include <linux/circ_buf.h>
>  #include <linux/dma-fence-array.h>
> @@ -1530,6 +1531,7 @@ guc_exec_queue_timedout_job(struct
> drm_sched_job *drm_job)
>  	if (!exec_queue_killed(q))
>  		wedged =
> guc_submit_hint_wedged(exec_queue_to_guc(q));
>  
> +	q->ban_reason |= DRM_XE_EXEC_QUEUE_BAN_REASON_GPU_HANG;
>  	set_exec_queue_banned(q);
>  
>  	/* Kick job / queue off hardware */
> @@ -2211,13 +2213,25 @@ static void guc_exec_queue_resume(struct
> xe_exec_queue *q)
>  	xe_sched_msg_unlock(sched);
>  }
>  
> -static bool guc_exec_queue_reset_status(struct xe_exec_queue *q)
> +static u64 guc_exec_queue_reset_status(struct xe_exec_queue *q)
>  {
> -	if (xe_exec_queue_is_multi_queue_secondary(q) &&
> -	   
> guc_exec_queue_reset_status(xe_exec_queue_multi_queue_primary(q)))
> -		return true;
> +	if (xe_exec_queue_is_multi_queue_secondary(q)) {
> +		u64 status = guc_exec_queue_reset_status(
> +				xe_exec_queue_multi_queue_primary(q)
> );
> +		if (status)
> +			return status;
> +	}
> +
> +	if (exec_queue_reset(q) ||
> exec_queue_killed_or_banned_or_wedged(q)) {
> +		u64 reason = q->ban_reason;
>  
> -	return exec_queue_reset(q) ||
> exec_queue_killed_or_banned_or_wedged(q);
> +		/* If no specific reason was recorded, default to
> GPU hang */
> +		if (!reason)
> +			reason =
> DRM_XE_EXEC_QUEUE_BAN_REASON_GPU_HANG;
> +		return reason;
> +	}
> +
> +	return 0;
>  }
>  
>  static bool guc_exec_queue_active(struct xe_exec_queue *q)
> diff --git a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
> b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
> index 35b5eaf590fa..3765e8fcdcec 100644
> --- a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
> +++ b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
> @@ -7,6 +7,7 @@
>  #include <drm/drm_managed.h>
>  #include <drm/drm_drv.h>
>  #include <drm/drm_buddy.h>
> +#include <uapi/drm/xe_drm.h>
>  
>  #include <drm/ttm/ttm_placement.h>
>  #include <drm/ttm/ttm_range_manager.h>
> @@ -537,10 +538,15 @@ static int xe_ttm_vram_purge_page(struct
> xe_device *xe, struct xe_bo *bo)
>  	xe_bo_unlock(bo);
>  	/*  Ban VM if BO is PPGTT */
>  	if (vm && (flags & XE_BO_FLAG_PAGETABLE)) {
> +		struct xe_exec_queue *eq;
> +
>  		down_write(&vm->lock);
> +		list_for_each_entry(eq, &vm->preempt.exec_queues,
> lr.link)
> +			eq->ban_reason |=
> DRM_XE_EXEC_QUEUE_BAN_REASON_PAGE_OFFLINE;
>  		xe_vm_kill(vm, true);
>  		up_write(&vm->lock);
>  	}
> +
>  	if (vm)
>  		xe_vm_put(vm);
>  
> @@ -548,6 +554,7 @@ static int xe_ttm_vram_purge_page(struct
> xe_device *xe, struct xe_bo *bo)
>  	/*  Ban exec queue if BO is lrc */
>  	if (bo->q && xe_exec_queue_get_unless_zero(bo->q)) {
>  		/* ban queue */
> +		bo->q->ban_reason |=
> DRM_XE_EXEC_QUEUE_BAN_REASON_PAGE_OFFLINE;
>  		xe_exec_queue_kill(bo->q);
>  		xe_exec_queue_put(bo->q);
>  	}
> diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
> index 48e9f1fdb78d..904d58b039fe 100644
> --- a/include/uapi/drm/xe_drm.h
> +++ b/include/uapi/drm/xe_drm.h
> @@ -1503,7 +1503,17 @@ struct drm_xe_exec_queue_get_property {
>  	/** @property: property to get */
>  	__u32 property;
>  
> -	/** @value: property value */
> +	/**
> +	 * @value: property value
> +	 *
> +	 * For %DRM_XE_EXEC_QUEUE_GET_PROPERTY_BAN, this is a
> bitmask of:
> +	 *  - %DRM_XE_EXEC_QUEUE_BAN_REASON_GPU_HANG - banned due to
> GPU hang/timeout
> +	 *  - %DRM_XE_EXEC_QUEUE_BAN_REASON_PAGE_OFFLINE - banned
> due to memory page offline
> +	 *
> +	 * Value of 0 means the exec queue is not banned.
> +	 */
> +#define DRM_XE_EXEC_QUEUE_BAN_REASON_GPU_HANG		(1 << 0)
> +#define DRM_XE_EXEC_QUEUE_BAN_REASON_PAGE_OFFLINE	(1 << 1)
>  	__u64 value;
>  
>  	/** @reserved: Reserved */

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH V11 11/12] drm/xe/uapi: Expose ban reason in EXEC_QUEUE_GET_PROPERTY_BAN
  2026-06-08 14:03   ` Souza, Jose
@ 2026-06-09  0:37     ` Rodrigo Vivi
  0 siblings, 0 replies; 15+ messages in thread
From: Rodrigo Vivi @ 2026-06-09  0:37 UTC (permalink / raw)
  To: Souza, Jose
  Cc: intel-xe@lists.freedesktop.org, Upadhyay, Tejas, Brost, Matthew,
	Ghimiray, Himal Prasad, Auld, Matthew,
	thomas.hellstrom@linux.intel.com, Mrozek, Michal

On Mon, Jun 08, 2026 at 10:03:50AM -0400, Souza, Jose wrote:
> On Fri, 2026-06-05 at 18:08 +0530, Tejas Upadhyay wrote:
> > Extend DRM_XE_EXEC_QUEUE_GET_PROPERTY_BAN to return a bitmask
> > indicating
> > the reason for the ban, rather than a simple boolean. This allows
> > userspace to distinguish between different ban causes:
> > 
> > - DRM_XE_EXEC_QUEUE_BAN_REASON_GPU_HANG (bit 0): exec queue was
> > banned
> >   due to a GPU hang or job timeout detected by the TDR.
> > - DRM_XE_EXEC_QUEUE_BAN_REASON_PAGE_OFFLINE (bit 1): exec queue was
> >   banned because a VRAM page backing its resources was taken offline.
> > 
> > The ban_reason field is added to struct xe_exec_queue and set at the
> > point where the ban is triggered:
> > - In guc_exec_queue_timedout_job() for GPU hang.
> > - In xe_ttm_vram_purge_page() for memory page offline, before calling
> >   xe_exec_queue_kill() or xe_vm_kill().
> > 
> > The reset_status op is updated to return u64 with the reason bitmask.
> > When a queue is banned but no explicit reason was recorded (e.g.,
> > from a
> > generic CAT error), it defaults to GPU_HANG for backward
> > compatibility.
> > A value of 0 means the exec queue is not banned.
> > 
> 
> Acked-by: José Roberto de Souza <jose.souza@intel.com>

Do we already have a userspace change that uses this?

Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>

Thomas, any thoughts on this vs. the watch_queue you have, or are they orthogonal?

> 
> > Assisted-by: Copilot:claude-opus-4.6
> > Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
> > cc: Mrozek, Michal <michal.mrozek@intel.com>
> > cc: José Roberto de Souza <jose.souza@intel.com>
> > cc: Vivi, Rodrigo <rodrigo.vivi@intel.com>
> > ---
> >  drivers/gpu/drm/xe/xe_exec_queue_types.h |  7 +++++--
> >  drivers/gpu/drm/xe/xe_execlist.c         |  4 ++--
> >  drivers/gpu/drm/xe/xe_guc_submit.c       | 24 +++++++++++++++++++---
> > --
> >  drivers/gpu/drm/xe/xe_ttm_vram_mgr.c     |  7 +++++++
> >  include/uapi/drm/xe_drm.h                | 12 +++++++++++-
> >  5 files changed, 44 insertions(+), 10 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h
> > b/drivers/gpu/drm/xe/xe_exec_queue_types.h
> > index 2f5ccf294675..77a621da4487 100644
> > --- a/drivers/gpu/drm/xe/xe_exec_queue_types.h
> > +++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h
> > @@ -143,6 +143,9 @@ struct xe_exec_queue {
> >  	 */
> >  	unsigned long flags;
> >  
> > +	/** @ban_reason: Bitmask of ban reasons
> > (DRM_XE_EXEC_QUEUE_BAN_REASON_*) */
> > +	u32 ban_reason;
> > +
> >  	union {
> >  		/** @multi_gt_list: list head for VM bind engines if
> > multi-GT */
> >  		struct list_head multi_gt_list;
> > @@ -316,8 +319,8 @@ struct xe_exec_queue_ops {
> >  	 * signalled when this function is called.
> >  	 */
> >  	void (*resume)(struct xe_exec_queue *q);
> > -	/** @reset_status: check exec queue reset status */
> > -	bool (*reset_status)(struct xe_exec_queue *q);
> > +	/** @reset_status: check exec queue ban status, returns ban
> > reason bitmask */
> > +	u64 (*reset_status)(struct xe_exec_queue *q);
> >  	/** @active: check exec queue is active */
> >  	bool (*active)(struct xe_exec_queue *q);
> >  };
> > diff --git a/drivers/gpu/drm/xe/xe_execlist.c
> > b/drivers/gpu/drm/xe/xe_execlist.c
> > index 9fb99c038ea8..35e6e05ba418 100644
> > --- a/drivers/gpu/drm/xe/xe_execlist.c
> > +++ b/drivers/gpu/drm/xe/xe_execlist.c
> > @@ -452,10 +452,10 @@ static void execlist_exec_queue_resume(struct
> > xe_exec_queue *q)
> >  	/* NIY */
> >  }
> >  
> > -static bool execlist_exec_queue_reset_status(struct xe_exec_queue *q)
> > +static u64 execlist_exec_queue_reset_status(struct xe_exec_queue *q)
> >  {
> >  	/* NIY */
> > -	return false;
> > +	return 0;
> >  }
> >  
> >  static bool execlist_exec_queue_active(struct xe_exec_queue *q)
> > diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
> > index 4b247a3019d2..ff28eab7cee2 100644
> > --- a/drivers/gpu/drm/xe/xe_guc_submit.c
> > +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
> > @@ -6,6 +6,7 @@
> >  #include "xe_guc_submit.h"
> >  
> >  #include <linux/bitfield.h>
> > +#include <uapi/drm/xe_drm.h>
> >  #include <linux/bitmap.h>
> >  #include <linux/circ_buf.h>
> >  #include <linux/dma-fence-array.h>
> > @@ -1530,6 +1531,7 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
> >  	if (!exec_queue_killed(q))
> >  		wedged = guc_submit_hint_wedged(exec_queue_to_guc(q));
> >  
> > +	q->ban_reason |= DRM_XE_EXEC_QUEUE_BAN_REASON_GPU_HANG;
> >  	set_exec_queue_banned(q);
> >  
> >  	/* Kick job / queue off hardware */
> > @@ -2211,13 +2213,25 @@ static void guc_exec_queue_resume(struct xe_exec_queue *q)
> >  	xe_sched_msg_unlock(sched);
> >  }
> >  
> > -static bool guc_exec_queue_reset_status(struct xe_exec_queue *q)
> > +static u64 guc_exec_queue_reset_status(struct xe_exec_queue *q)
> >  {
> > -	if (xe_exec_queue_is_multi_queue_secondary(q) &&
> > -	    guc_exec_queue_reset_status(xe_exec_queue_multi_queue_primary(q)))
> > -		return true;
> > +	if (xe_exec_queue_is_multi_queue_secondary(q)) {
> > +		u64 status = guc_exec_queue_reset_status(
> > +				xe_exec_queue_multi_queue_primary(q));
> > +		if (status)
> > +			return status;
> > +	}
> > +
> > +	if (exec_queue_reset(q) || exec_queue_killed_or_banned_or_wedged(q)) {
> > +		u64 reason = q->ban_reason;
> >  
> > -	return exec_queue_reset(q) || exec_queue_killed_or_banned_or_wedged(q);
> > +		/* If no specific reason was recorded, default to GPU hang */
> > +		if (!reason)
> > +			reason = DRM_XE_EXEC_QUEUE_BAN_REASON_GPU_HANG;
> > +		return reason;
> > +	}
> > +
> > +	return 0;
> >  }
> >  
> >  static bool guc_exec_queue_active(struct xe_exec_queue *q)
> > diff --git a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
> > index 35b5eaf590fa..3765e8fcdcec 100644
> > --- a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
> > +++ b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
> > @@ -7,6 +7,7 @@
> >  #include <drm/drm_managed.h>
> >  #include <drm/drm_drv.h>
> >  #include <drm/drm_buddy.h>
> > +#include <uapi/drm/xe_drm.h>
> >  
> >  #include <drm/ttm/ttm_placement.h>
> >  #include <drm/ttm/ttm_range_manager.h>
> > @@ -537,10 +538,15 @@ static int xe_ttm_vram_purge_page(struct xe_device *xe, struct xe_bo *bo)
> >  	xe_bo_unlock(bo);
> >  	/*  Ban VM if BO is PPGTT */
> >  	if (vm && (flags & XE_BO_FLAG_PAGETABLE)) {
> > +		struct xe_exec_queue *eq;
> > +
> >  		down_write(&vm->lock);
> > +		list_for_each_entry(eq, &vm->preempt.exec_queues, lr.link)
> > +			eq->ban_reason |= DRM_XE_EXEC_QUEUE_BAN_REASON_PAGE_OFFLINE;
> >  		xe_vm_kill(vm, true);
> >  		up_write(&vm->lock);
> >  	}
> > +
> >  	if (vm)
> >  		xe_vm_put(vm);
> >  
> > @@ -548,6 +554,7 @@ static int xe_ttm_vram_purge_page(struct xe_device *xe, struct xe_bo *bo)
> >  	/*  Ban exec queue if BO is lrc */
> >  	if (bo->q && xe_exec_queue_get_unless_zero(bo->q)) {
> >  		/* ban queue */
> > +		bo->q->ban_reason |= DRM_XE_EXEC_QUEUE_BAN_REASON_PAGE_OFFLINE;
> >  		xe_exec_queue_kill(bo->q);
> >  		xe_exec_queue_put(bo->q);
> >  	}
> > diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
> > index 48e9f1fdb78d..904d58b039fe 100644
> > --- a/include/uapi/drm/xe_drm.h
> > +++ b/include/uapi/drm/xe_drm.h
> > @@ -1503,7 +1503,17 @@ struct drm_xe_exec_queue_get_property {
> >  	/** @property: property to get */
> >  	__u32 property;
> >  
> > -	/** @value: property value */
> > +	/**
> > +	 * @value: property value
> > +	 *
> > +	 * For %DRM_XE_EXEC_QUEUE_GET_PROPERTY_BAN, this is a bitmask of:
> > +	 *  - %DRM_XE_EXEC_QUEUE_BAN_REASON_GPU_HANG - banned due to GPU hang/timeout
> > +	 *  - %DRM_XE_EXEC_QUEUE_BAN_REASON_PAGE_OFFLINE - banned due to memory page offline
> > +	 *
> > +	 * Value of 0 means the exec queue is not banned.
> > +	 */
> > +#define DRM_XE_EXEC_QUEUE_BAN_REASON_GPU_HANG		(1 << 0)
> > +#define DRM_XE_EXEC_QUEUE_BAN_REASON_PAGE_OFFLINE	(1 << 1)
> >  	__u64 value;
> >  
> >  	/** @reserved: Reserved */
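
For readers following the uAPI hunk quoted above, here is a minimal
userspace-side sketch of how the returned value might be decoded once
DRM_XE_EXEC_QUEUE_GET_PROPERTY_BAN reports a bitmask instead of a boolean.
The two macro values are copied from the quoted patch and may change before
merge. The helpers append_reason() and ban_reason_str(), and the strings
they produce, are hypothetical names used only for illustration. The ioctl
call itself is left out; the sketch only assumes the caller already holds
the __u64 value field.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Values copied from the quoted uAPI hunk (assumption: unchanged on merge). */
#define DRM_XE_EXEC_QUEUE_BAN_REASON_GPU_HANG		(1 << 0)
#define DRM_XE_EXEC_QUEUE_BAN_REASON_PAGE_OFFLINE	(1 << 1)

/* Hypothetical helper: append one reason name, separated by '|'. */
static void append_reason(char *buf, size_t len, const char *name)
{
	size_t used = strlen(buf);

	/* Skip silently if there is no room left for more text. */
	if (used + 1 < len)
		snprintf(buf + used, len - used, "%s%s", used ? "|" : "", name);
}

/*
 * Hypothetical helper: turn the property value into readable text.
 * A value of 0 means the exec queue is not banned, per the patch comment.
 */
static const char *ban_reason_str(uint64_t value, char *buf, size_t len)
{
	const uint64_t known = DRM_XE_EXEC_QUEUE_BAN_REASON_GPU_HANG |
			       DRM_XE_EXEC_QUEUE_BAN_REASON_PAGE_OFFLINE;

	assert(buf && len);

	if (!value) {
		snprintf(buf, len, "not banned");
		return buf;
	}

	buf[0] = '\0';
	if (value & DRM_XE_EXEC_QUEUE_BAN_REASON_GPU_HANG)
		append_reason(buf, len, "gpu-hang");
	if (value & DRM_XE_EXEC_QUEUE_BAN_REASON_PAGE_OFFLINE)
		append_reason(buf, len, "page-offline");
	/* Bits this sketch does not know about are reported, not dropped. */
	if (value & ~known)
		append_reason(buf, len, "unknown");

	return buf;
}
```

Because the reasons are OR-ed together in the kernel, a UMD should test
individual bits rather than compare the value for equality; code that only
checks for non-zero keeps the previous boolean behaviour.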

end of thread, other threads:[~2026-06-09  0:38 UTC | newest]

Thread overview: 15+ messages
2026-06-05 12:38 [PATCH V11 00/12] Add memory page offlining support Tejas Upadhyay
2026-06-05 12:38 ` [PATCH V11 01/12] drm/xe: Link VRAM object with gpu buddy Tejas Upadhyay
2026-06-05 12:38 ` [PATCH V11 02/12] drm/buddy: Integrate lockdep annotations for gpu buddy manager Tejas Upadhyay
2026-06-05 12:38 ` [PATCH V11 03/12] drm/gpu: Add gpu_buddy_allocated_addr_to_block helper Tejas Upadhyay
2026-06-05 12:38 ` [PATCH V11 04/12] drm/xe: Link LRC BO and its execution Queue Tejas Upadhyay
2026-06-05 12:38 ` [PATCH V11 05/12] drm/xe: Extend BO purge to handle vram pages as well Tejas Upadhyay
2026-06-05 12:38 ` [PATCH V11 06/12] drm/xe: Handle physical memory address error Tejas Upadhyay
2026-06-05 12:38 ` [PATCH V11 07/12] drm/xe/cri: Add debugfs to inject faulty vram address Tejas Upadhyay
2026-06-05 12:38 ` [PATCH V11 08/12] gpu/buddy: Add routine to dump allocated buddy blocks Tejas Upadhyay
2026-06-05 12:38 ` [PATCH V11 09/12] drm/xe/configfs: Add vram bad page reservation policy Tejas Upadhyay
2026-06-05 12:38 ` [PATCH V11 10/12] drm/xe/cri: Add sysfs interface for bad gpu vram pages Tejas Upadhyay
2026-06-05 12:38 ` [PATCH V11 11/12] drm/xe/uapi: Expose ban reason in EXEC_QUEUE_GET_PROPERTY_BAN Tejas Upadhyay
2026-06-08 14:03   ` Souza, Jose
2026-06-09  0:37     ` Rodrigo Vivi
2026-06-05 12:38 ` [PATCH V11 12/12] drm/xe: Add soft/hard offline mode for VRAM page retirement Tejas Upadhyay
