* [RFC PATCH V4 1/7] drm/xe/svm: Use res_to_mem_region
2026-02-27 13:44 [RFC PATCH V4 0/7] Add memory page offlining support Tejas Upadhyay
@ 2026-02-27 13:44 ` Tejas Upadhyay
2026-02-27 13:44 ` [RFC PATCH V4 2/7] drm/xe: Implement VRAM object tracking ability using physical address Tejas Upadhyay
` (5 subsequent siblings)
6 siblings, 0 replies; 12+ messages in thread
From: Tejas Upadhyay @ 2026-02-27 13:44 UTC (permalink / raw)
To: intel-xe; +Cc: matthew.auld, thomas.hellstrom, matthew.brost, Tejas Upadhyay
Replace the direct use of block->private with the helper function
res_to_mem_region to get vram region.
V3(MattB): kernel-doc and move declarations out of loop
V2(MattA): Use res_to_mem_region
Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
---
drivers/gpu/drm/xe/xe_bo.c | 18 +++++++++++++-----
drivers/gpu/drm/xe/xe_bo.h | 1 +
drivers/gpu/drm/xe/xe_svm.c | 10 ++--------
3 files changed, 16 insertions(+), 13 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
index d6c2cb959cdd..e276eb91d54a 100644
--- a/drivers/gpu/drm/xe/xe_bo.c
+++ b/drivers/gpu/drm/xe/xe_bo.c
@@ -173,7 +173,15 @@ mem_type_to_migrate(struct xe_device *xe, u32 mem_type)
return tile->migrate;
}
-static struct xe_vram_region *res_to_mem_region(struct ttm_resource *res)
+/**
+ * xe_get_memory_region_from_resource - find memory region from resource used by BO
+ * @res: The ttm_resource used by BO
+ *
+ * Extract memory region from resource used by BO
+ *
+ * Returns: A pointer to struct xe_vram_region
+ */
+struct xe_vram_region *xe_get_memory_region_from_resource(struct ttm_resource *res)
{
struct xe_device *xe = ttm_to_xe_device(res->bo->bdev);
struct ttm_resource_manager *mgr;
@@ -637,7 +645,7 @@ static int xe_ttm_io_mem_reserve(struct ttm_device *bdev,
return 0;
case XE_PL_VRAM0:
case XE_PL_VRAM1: {
- struct xe_vram_region *vram = res_to_mem_region(mem);
+ struct xe_vram_region *vram = xe_get_memory_region_from_resource(mem);
if (!xe_ttm_resource_visible(mem))
return -EINVAL;
@@ -1506,7 +1514,7 @@ static unsigned long xe_ttm_io_mem_pfn(struct ttm_buffer_object *ttm_bo,
if (ttm_bo->resource->mem_type == XE_PL_STOLEN)
return xe_ttm_stolen_io_offset(bo, page_offset << PAGE_SHIFT) >> PAGE_SHIFT;
- vram = res_to_mem_region(ttm_bo->resource);
+ vram = xe_get_memory_region_from_resource(ttm_bo->resource);
xe_res_first(ttm_bo->resource, (u64)page_offset << PAGE_SHIFT, 0, &cursor);
return (vram->io_start + cursor.start) >> PAGE_SHIFT;
}
@@ -1658,7 +1666,7 @@ static int xe_ttm_access_memory(struct ttm_buffer_object *ttm_bo,
goto out;
}
- vram = res_to_mem_region(ttm_bo->resource);
+ vram = xe_get_memory_region_from_resource(ttm_bo->resource);
xe_res_first(ttm_bo->resource, offset & PAGE_MASK,
xe_bo_size(bo) - (offset & PAGE_MASK), &cursor);
@@ -2752,7 +2760,7 @@ uint64_t vram_region_gpu_offset(struct ttm_resource *res)
case XE_PL_SYSTEM:
return 0;
default:
- return res_to_mem_region(res)->dpa_base;
+ return xe_get_memory_region_from_resource(res)->dpa_base;
}
return 0;
}
diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
index c914ab719f20..7ef6e0d87e68 100644
--- a/drivers/gpu/drm/xe/xe_bo.h
+++ b/drivers/gpu/drm/xe/xe_bo.h
@@ -311,6 +311,7 @@ int xe_bo_dumb_create(struct drm_file *file_priv,
struct drm_mode_create_dumb *args);
bool xe_bo_needs_ccs_pages(struct xe_bo *bo);
+struct xe_vram_region *xe_get_memory_region_from_resource(struct ttm_resource *res);
static inline size_t xe_bo_ccs_pages_start(struct xe_bo *bo)
{
diff --git a/drivers/gpu/drm/xe/xe_svm.c b/drivers/gpu/drm/xe/xe_svm.c
index 78f4b2c60670..4c52c59bad41 100644
--- a/drivers/gpu/drm/xe/xe_svm.c
+++ b/drivers/gpu/drm/xe/xe_svm.c
@@ -758,12 +758,12 @@ static int xe_svm_populate_devmem_pfn(struct drm_pagemap_devmem *devmem_allocati
struct xe_bo *bo = to_xe_bo(devmem_allocation);
struct ttm_resource *res = bo->ttm.resource;
struct list_head *blocks = &to_xe_ttm_vram_mgr_resource(res)->blocks;
+ struct xe_vram_region *vr = xe_get_memory_region_from_resource(res);
+ struct drm_buddy *buddy = vram_to_buddy(vr);
struct drm_buddy_block *block;
int j = 0;
list_for_each_entry(block, blocks, link) {
- struct xe_vram_region *vr = block->private;
- struct drm_buddy *buddy = vram_to_buddy(vr);
u64 block_pfn = block_offset_to_pfn(devmem_allocation->dpagemap,
drm_buddy_block_offset(block));
int i;
@@ -1033,9 +1033,7 @@ static int xe_drm_pagemap_populate_mm(struct drm_pagemap *dpagemap,
struct dma_fence *pre_migrate_fence = NULL;
struct xe_device *xe = vr->xe;
struct device *dev = xe->drm.dev;
- struct drm_buddy_block *block;
struct xe_validation_ctx vctx;
- struct list_head *blocks;
struct drm_exec exec;
struct xe_bo *bo;
int err = 0, idx;
@@ -1072,10 +1070,6 @@ static int xe_drm_pagemap_populate_mm(struct drm_pagemap *dpagemap,
&dpagemap_devmem_ops, dpagemap, end - start,
pre_migrate_fence);
- blocks = &to_xe_ttm_vram_mgr_resource(bo->ttm.resource)->blocks;
- list_for_each_entry(block, blocks, link)
- block->private = vr;
-
xe_bo_get(bo);
/* Ensure the device has a pm ref while there are device pages active. */
--
2.52.0
^ permalink raw reply related [flat|nested] 12+ messages in thread* [RFC PATCH V4 2/7] drm/xe: Implement VRAM object tracking ability using physical address
2026-02-27 13:44 [RFC PATCH V4 0/7] Add memory page offlining support Tejas Upadhyay
2026-02-27 13:44 ` [RFC PATCH V4 1/7] drm/xe/svm: Use res_to_mem_region Tejas Upadhyay
@ 2026-02-27 13:44 ` Tejas Upadhyay
2026-02-27 13:44 ` [RFC PATCH V4 3/7] drm/xe: Handle physical memory address error Tejas Upadhyay
` (4 subsequent siblings)
6 siblings, 0 replies; 12+ messages in thread
From: Tejas Upadhyay @ 2026-02-27 13:44 UTC (permalink / raw)
To: intel-xe; +Cc: matthew.auld, thomas.hellstrom, matthew.brost, Tejas Upadhyay
Implement the capability to track and identify TTM buffer objects
using a specific faulty memory address in VRAM. This functionality
is critical for supporting the memory page offline feature on CRI,
where identified faulty pages must be traced back to their
originating buffer for safe removal.
Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
---
drivers/gpu/drm/xe/xe_ttm_vram_mgr.c | 75 ++++++++++++++++++++++++++++
drivers/gpu/drm/xe/xe_ttm_vram_mgr.h | 2 +-
2 files changed, 76 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
index d6aa61e55f4d..4e852eed5170 100644
--- a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
+++ b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
@@ -56,6 +56,7 @@ static int xe_ttm_vram_mgr_new(struct ttm_resource_manager *man,
u64 size, min_page_size;
unsigned long lpfn;
int err;
+ struct drm_buddy_block *block;
lpfn = place->lpfn;
if (!lpfn || lpfn > man->size >> PAGE_SHIFT)
@@ -137,6 +138,8 @@ static int xe_ttm_vram_mgr_new(struct ttm_resource_manager *man,
}
mgr->visible_avail -= vres->used_visible_size;
+ list_for_each_entry(block, &vres->blocks, link)
+ block->private = tbo;
mutex_unlock(&mgr->lock);
if (!(vres->base.placement & TTM_PL_FLAG_CONTIGUOUS) &&
@@ -467,3 +470,75 @@ u64 xe_ttm_vram_get_avail(struct ttm_resource_manager *man)
return avail;
}
+
+static inline bool overlaps(u64 s1, u64 e1, u64 s2, u64 e2)
+{
+ return s1 <= e2 && e1 >= s2;
+}
+
+static inline bool contains(u64 s1, u64 e1, u64 s2, u64 e2)
+{
+ return s1 <= s2 && e1 <= e2;
+}
+
+static struct ttm_buffer_object *xe_ttm_vram_addr_to_tbo(struct drm_buddy *mm, u64 start)
+{
+ struct drm_buddy_block *block;
+ u64 end;
+ LIST_HEAD(dfs);
+ int i;
+
+ end = start + SZ_4K - 1;
+ for (i = 0; i < mm->n_roots; ++i)
+ list_add_tail(&mm->roots[i]->tmp_link, &dfs);
+
+ do {
+ u64 block_start;
+ u64 block_end;
+
+ block = list_first_entry_or_null(&dfs,
+ struct drm_buddy_block,
+ tmp_link);
+ if (!block)
+ break;
+
+ list_del(&block->tmp_link);
+
+ block_start = drm_buddy_block_offset(block);
+ block_end = block_start + drm_buddy_block_size(mm, block) - 1;
+
+ if (!overlaps(start, end, block_start, block_end))
+ continue;
+
+ if (contains(start, end, block_start, block_end) &&
+ !drm_buddy_block_is_split(block)) {
+ if (drm_buddy_block_is_free(block)) {
+ return NULL;
+ } else if (drm_buddy_block_is_allocated(block) && !mm->clear_avail) {
+ struct ttm_buffer_object *tbo = block->private;
+
+ WARN_ON(!tbo);
+ return tbo;
+ }
+ }
+
+ if (drm_buddy_block_is_split(block)) {
+ list_add(&block->right->tmp_link, &dfs);
+ list_add(&block->left->tmp_link, &dfs);
+ }
+ } while (1);
+
+ return NULL;
+}
+
+int xe_ttm_tbo_handle_addr_fault(struct xe_tile *tile, unsigned long addr)
+{
+ struct xe_ttm_vram_mgr *vram_mgr = &tile->mem.vram->ttm;
+ struct drm_buddy mm = vram_mgr->mm;
+ struct ttm_buffer_object *tbo;
+
+ tbo = xe_ttm_vram_addr_to_tbo(&mm, addr);
+
+ return 0;
+}
+EXPORT_SYMBOL(xe_ttm_tbo_handle_addr_fault);
diff --git a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.h b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.h
index 87b7fae5edba..1d6075411ebf 100644
--- a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.h
+++ b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.h
@@ -30,7 +30,7 @@ u64 xe_ttm_vram_get_avail(struct ttm_resource_manager *man);
u64 xe_ttm_vram_get_cpu_visible_size(struct ttm_resource_manager *man);
void xe_ttm_vram_get_used(struct ttm_resource_manager *man,
u64 *used, u64 *used_visible);
-
+int xe_ttm_tbo_handle_addr_fault(struct xe_tile *tile, unsigned long addr);
static inline struct xe_ttm_vram_mgr_resource *
to_xe_ttm_vram_mgr_resource(struct ttm_resource *res)
{
--
2.52.0
^ permalink raw reply related [flat|nested] 12+ messages in thread* [RFC PATCH V4 3/7] drm/xe: Handle physical memory address error
2026-02-27 13:44 [RFC PATCH V4 0/7] Add memory page offlining support Tejas Upadhyay
2026-02-27 13:44 ` [RFC PATCH V4 1/7] drm/xe/svm: Use res_to_mem_region Tejas Upadhyay
2026-02-27 13:44 ` [RFC PATCH V4 2/7] drm/xe: Implement VRAM object tracking ability using physical address Tejas Upadhyay
@ 2026-02-27 13:44 ` Tejas Upadhyay
2026-03-02 5:11 ` Aravind Iddamsetty
2026-02-27 13:44 ` [RFC PATCH V4 4/7] [DO_NOT_REVIEW]]drm/xe/cri: Add debugfs to inject faulty vram address Tejas Upadhyay
` (3 subsequent siblings)
6 siblings, 1 reply; 12+ messages in thread
From: Tejas Upadhyay @ 2026-02-27 13:44 UTC (permalink / raw)
To: intel-xe; +Cc: matthew.auld, thomas.hellstrom, matthew.brost, Tejas Upadhyay
This functionality represents a significant step in making
the xe driver gracefully handle hardware memory degradation.
By integrating with the DRM Buddy allocator, the driver
can permanently "carve out" faulty memory so it isn't reused
by subsequent allocations.
Buddy Block Reservation:
----------------------
When a memory address is reported as faulty, the driver instructs
the DRM Buddy allocator to reserve a block of the specific page
size (typically 4KB). This marks the memory as "dirty/used"
indefinitely.
Two-Stage Tracking:
-----------------
Offlined Pages:
Pages that have been successfully isolated and removed from the
available memory pool.
Queued Pages:
Addresses that have been flagged as faulty but are currently in
use by a process. These are tracked until the associated buffer
object (BO) is released or migrated, at which point they move
to the "offlined" state.
Sysfs Reporting:
--------------
The patch exposes these metrics through a standard interface,
allowing administrators to monitor VRAM health:
/sys/bus/pci/devices/<device_id>/vram_bad_pages
V3:
-rename api, remove tile dependency and add status of reservation
V2:
- Fix mm->avail counter issue
- Remove unused code and handle clean up in case of error
Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
---
drivers/gpu/drm/xe/xe_ttm_vram_mgr.c | 214 ++++++++++++++++++++-
drivers/gpu/drm/xe/xe_ttm_vram_mgr.h | 2 +-
drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h | 23 +++
3 files changed, 231 insertions(+), 8 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
index 4e852eed5170..42d531b1dabf 100644
--- a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
+++ b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
@@ -276,6 +276,26 @@ static const struct ttm_resource_manager_func xe_ttm_vram_mgr_func = {
.debug = xe_ttm_vram_mgr_debug
};
+static void xe_ttm_vram_free_bad_pages(struct drm_device *dev, struct xe_ttm_vram_mgr *mgr)
+{
+ struct xe_ttm_offline_resource *pos, *n;
+
+ mutex_lock(&mgr->lock);
+ list_for_each_entry_safe(pos, n, &mgr->offlined_pages, offlined_link) {
+ --mgr->n_offlined_pages;
+ drm_buddy_free_list(&mgr->mm, &pos->blocks, 0);
+ mgr->visible_avail += pos->used_visible_size;
+ list_del(&pos->offlined_link);
+ kfree(pos);
+ }
+ list_for_each_entry_safe(pos, n, &mgr->queued_pages, queued_link) {
+ list_del(&pos->queued_link);
+ mgr->n_queued_pages--;
+ kfree(pos);
+ }
+ mutex_unlock(&mgr->lock);
+}
+
static void xe_ttm_vram_mgr_fini(struct drm_device *dev, void *arg)
{
struct xe_device *xe = to_xe_device(dev);
@@ -287,6 +307,8 @@ static void xe_ttm_vram_mgr_fini(struct drm_device *dev, void *arg)
if (ttm_resource_manager_evict_all(&xe->ttm, man))
return;
+ xe_ttm_vram_free_bad_pages(dev, mgr);
+
WARN_ON_ONCE(mgr->visible_avail != mgr->visible_size);
drm_buddy_fini(&mgr->mm);
@@ -315,6 +337,8 @@ int __xe_ttm_vram_mgr_init(struct xe_device *xe, struct xe_ttm_vram_mgr *mgr,
man->func = &xe_ttm_vram_mgr_func;
mgr->mem_type = mem_type;
mutex_init(&mgr->lock);
+ INIT_LIST_HEAD(&mgr->offlined_pages);
+ INIT_LIST_HEAD(&mgr->queued_pages);
mgr->default_page_size = default_page_size;
mgr->visible_size = io_size;
mgr->visible_avail = io_size;
@@ -531,14 +555,190 @@ static struct ttm_buffer_object *xe_ttm_vram_addr_to_tbo(struct drm_buddy *mm, u
return NULL;
}
-int xe_ttm_tbo_handle_addr_fault(struct xe_tile *tile, unsigned long addr)
+static int xe_ttm_vram_reserve_page_at_addr(struct xe_device *xe, unsigned long addr,
+ struct xe_ttm_vram_mgr *vram_mgr, struct drm_buddy *mm)
{
- struct xe_ttm_vram_mgr *vram_mgr = &tile->mem.vram->ttm;
- struct drm_buddy mm = vram_mgr->mm;
- struct ttm_buffer_object *tbo;
+ int ret = 0;
+ u64 size = SZ_4K;
+ struct ttm_buffer_object *tbo = NULL;
+ struct xe_ttm_offline_resource *nentry;
+ enum reserve_status {
+ pending = 0,
+ fail
+ };
+
+ mutex_lock(&vram_mgr->lock);
+ tbo = xe_ttm_vram_addr_to_tbo(mm, addr);
+
+ nentry = kzalloc(sizeof(*nentry), GFP_KERNEL);
+ if (!nentry)
+ return -ENOMEM;
+ INIT_LIST_HEAD(&nentry->blocks);
+ nentry->status = pending;
+
+ if (tbo) {
+ struct xe_ttm_vram_mgr_resource *pvres;
+ struct ttm_placement place = {};
+ struct ttm_operation_ctx ctx = {
+ .interruptible = false,
+ .gfp_retry_mayfail = false,
+ };
+ bool locked;
+ struct xe_ttm_offline_resource *pos, *n;
+ struct xe_bo *pbo = ttm_to_xe_bo(tbo);
+
+ xe_bo_get(pbo);
+ /* Critical kernel BO? */
+ if (pbo->ttm.type == ttm_bo_type_kernel &&
+ !(pbo->flags & XE_BO_FLAG_FORCE_USER_VRAM)) {
+ mutex_unlock(&vram_mgr->lock);
+ kfree(nentry);
+ xe_ttm_vram_free_bad_pages(&xe->drm, vram_mgr);
+ xe_bo_put(pbo);
+ drm_warn(&xe->drm,
+ "%s: corrupt addr: 0x%lx in critical kernel bo, wedge now\n",
+ __func__, addr);
+ /* Wedge the device */
+ xe_device_declare_wedged(xe);
+ return -EIO;
+ }
+ pvres = to_xe_ttm_vram_mgr_resource(pbo->ttm.resource);
+ nentry->id = ++vram_mgr->n_queued_pages;
+ nentry->blocks = pvres->blocks;
+ list_add(&nentry->queued_link, &vram_mgr->queued_pages);
+ mutex_unlock(&vram_mgr->lock);
+
+ /* Purge BO containing address */
+ spin_lock(&pbo->ttm.bdev->lru_lock);
+ locked = dma_resv_trylock(pbo->ttm.base.resv);
+ spin_unlock(&pbo->ttm.bdev->lru_lock);
+ WARN_ON(!locked);
+ ret = ttm_bo_validate(&pbo->ttm, &place, &ctx);
+ drm_WARN_ON(&xe->drm, ret);
+ xe_bo_put(pbo);
+ if (locked)
+ dma_resv_unlock(pbo->ttm.base.resv);
+
+ /* Reserve page at address addr*/
+ mutex_lock(&vram_mgr->lock);
+ ret = drm_buddy_alloc_blocks(mm, addr, addr + size,
+ size, size, &nentry->blocks,
+ DRM_BUDDY_RANGE_ALLOCATION);
+
+ if (ret) {
+ drm_warn(&xe->drm, "Could not reserve page at addr:0x%lx, ret:%d\n",
+ addr, ret);
+ nentry->status = fail;
+ mutex_unlock(&vram_mgr->lock);
+ return ret;
+ }
+ if ((addr + size) <= vram_mgr->visible_size) {
+ nentry->used_visible_size = size;
+ } else {
+ struct drm_buddy_block *block;
- tbo = xe_ttm_vram_addr_to_tbo(&mm, addr);
+ list_for_each_entry(block, &nentry->blocks, link) {
+ u64 start = drm_buddy_block_offset(block);
- return 0;
+ if (start < vram_mgr->visible_size) {
+ u64 end = start + drm_buddy_block_size(mm, block);
+
+ nentry->used_visible_size +=
+ min(end, vram_mgr->visible_size) - start;
+ }
+ }
+ }
+ vram_mgr->visible_avail -= nentry->used_visible_size;
+ list_for_each_entry_safe(pos, n, &vram_mgr->queued_pages, queued_link) {
+ if (pos->id == nentry->id) {
+ --vram_mgr->n_queued_pages;
+ list_del(&pos->queued_link);
+ break;
+ }
+ }
+ list_add(&nentry->offlined_link, &vram_mgr->offlined_pages);
+ /* TODO: FW Integration: Send command to FW for offlining page */
+ ++vram_mgr->n_offlined_pages;
+ mutex_unlock(&vram_mgr->lock);
+ return ret;
+
+ } else {
+ ret = drm_buddy_alloc_blocks(mm, addr, addr + size,
+ size, size, &nentry->blocks,
+ DRM_BUDDY_RANGE_ALLOCATION);
+ if (ret) {
+ drm_warn(&xe->drm, "Could not reserve page at addr:0x%lx, ret:%d\n",
+ addr, ret);
+ nentry->status = fail;
+ mutex_unlock(&vram_mgr->lock);
+ return ret;
+ }
+ if ((addr + size) <= vram_mgr->visible_size) {
+ nentry->used_visible_size = size;
+ } else {
+ struct drm_buddy_block *block;
+
+ list_for_each_entry(block, &nentry->blocks, link) {
+ u64 start = drm_buddy_block_offset(block);
+
+ if (start < vram_mgr->visible_size) {
+ u64 end = start + drm_buddy_block_size(mm, block);
+
+ nentry->used_visible_size +=
+ min(end, vram_mgr->visible_size) - start;
+ }
+ }
+ }
+ vram_mgr->visible_avail -= nentry->used_visible_size;
+ nentry->id = ++vram_mgr->n_offlined_pages;
+ list_add(&nentry->offlined_link, &vram_mgr->offlined_pages);
+ /* TODO: FW Integration: Send command to FW for offlining page */
+ mutex_unlock(&vram_mgr->lock);
+ }
+ /* Success */
+ return ret;
+}
+
+static struct xe_vram_region *xe_ttm_vram_addr_to_region(struct xe_device *xe,
+ resource_size_t addr)
+{
+ struct xe_vram_region *vr;
+ struct xe_tile *tile;
+ int id;
+
+ for_each_tile(tile, xe, id) {
+ vr = tile->mem.vram;
+ if ((addr <= vr->dpa_base + vr->actual_physical_size) &&
+ (addr + SZ_4K >= vr->dpa_base))
+ return vr;
+ }
+ return NULL;
+}
+
+/**
+ * xe_ttm_vram_handle_addr_fault - Handle vram physical address error flagged
+ * @xe: pointer to parent device
+ * @addr: physical faulty address
+ *
+ * Handle the physical faulty address error on a specific tile.
+ *
+ * Returns 0 for success, negative error code otherwise.
+ */
+int xe_ttm_vram_handle_addr_fault(struct xe_device *xe, unsigned long addr)
+{
+ struct xe_ttm_vram_mgr *vram_mgr;
+ struct xe_vram_region *vr;
+ struct drm_buddy *mm;
+ int ret;
+
+ vr = xe_ttm_vram_addr_to_region(xe, addr);
+ WARN_ON(!vr);
+ vram_mgr = &vr->ttm;
+ mm = &vram_mgr->mm;
+ /* Reserve page at address */
+ ret = xe_ttm_vram_reserve_page_at_addr(xe, addr, vram_mgr, mm);
+ if (ret == -EIO)
+ return 0; /* success, wedged by kernel. */
+ return ret;
}
-EXPORT_SYMBOL(xe_ttm_tbo_handle_addr_fault);
+EXPORT_SYMBOL(xe_ttm_vram_handle_addr_fault);
diff --git a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.h b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.h
index 1d6075411ebf..8cc528434ceb 100644
--- a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.h
+++ b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.h
@@ -30,7 +30,7 @@ u64 xe_ttm_vram_get_avail(struct ttm_resource_manager *man);
u64 xe_ttm_vram_get_cpu_visible_size(struct ttm_resource_manager *man);
void xe_ttm_vram_get_used(struct ttm_resource_manager *man,
u64 *used, u64 *used_visible);
-int xe_ttm_tbo_handle_addr_fault(struct xe_tile *tile, unsigned long addr);
+int xe_ttm_vram_handle_addr_fault(struct xe_device *xe, unsigned long addr);
static inline struct xe_ttm_vram_mgr_resource *
to_xe_ttm_vram_mgr_resource(struct ttm_resource *res)
{
diff --git a/drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h b/drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h
index a71e14818ec2..e1b48db27cfd 100644
--- a/drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h
+++ b/drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h
@@ -19,6 +19,14 @@ struct xe_ttm_vram_mgr {
struct ttm_resource_manager manager;
/** @mm: DRM buddy allocator which manages the VRAM */
struct drm_buddy mm;
+ /** @offlined_pages: List of offlined pages */
+ struct list_head offlined_pages;
+ /** @n_offlined_pages: Number of offlined pages */
+ u16 n_offlined_pages;
+ /** @queued_pages: List of queued pages */
+ struct list_head queued_pages;
+ /** @n_queued_pages: Number of queued pages */
+ u16 n_queued_pages;
/** @visible_size: Proped size of the CPU visible portion */
u64 visible_size;
/** @visible_avail: CPU visible portion still unallocated */
@@ -45,4 +53,19 @@ struct xe_ttm_vram_mgr_resource {
unsigned long flags;
};
+struct xe_ttm_offline_resource {
+ /** @offlined_link: Link to offlined pages */
+ struct list_head offlined_link;
+ /** @queued_link: Link to queued pages */
+ struct list_head queued_link;
+ /** @blocks: list of DRM buddy blocks */
+ struct list_head blocks;
+ /** @used_visible_size: How many CPU visible bytes this resource is using */
+ u64 used_visible_size;
+ /** @id: The id of an offline resource */
+ u16 id;
+ /** @status: reservation status of resource */
+ bool status;
+};
+
#endif
--
2.52.0
^ permalink raw reply related [flat|nested] 12+ messages in thread* Re: [RFC PATCH V4 3/7] drm/xe: Handle physical memory address error
2026-02-27 13:44 ` [RFC PATCH V4 3/7] drm/xe: Handle physical memory address error Tejas Upadhyay
@ 2026-03-02 5:11 ` Aravind Iddamsetty
2026-03-05 6:40 ` Upadhyay, Tejas
0 siblings, 1 reply; 12+ messages in thread
From: Aravind Iddamsetty @ 2026-03-02 5:11 UTC (permalink / raw)
To: Tejas Upadhyay, intel-xe
Cc: matthew.auld, thomas.hellstrom, matthew.brost, Riana Tauro
On 27-02-2026 19:14, Tejas Upadhyay wrote:
> This functionality represents a significant step in making
> the xe driver gracefully handle hardware memory degradation.
> By integrating with the DRM Buddy allocator, the driver
> can permanently "carve out" faulty memory so it isn't reused
> by subsequent allocations.
>
> Buddy Block Reservation:
> ----------------------
> When a memory address is reported as faulty, the driver instructs
> the DRM Buddy allocator to reserve a block of the specific page
> size (typically 4KB). This marks the memory as "dirty/used"
> indefinitely.
>
> Two-Stage Tracking:
> -----------------
> Offlined Pages:
> Pages that have been successfully isolated and removed from the
> available memory pool.
>
> Queued Pages:
> Addresses that have been flagged as faulty but are currently in
> use by a process. These are tracked until the associated buffer
> object (BO) is released or migrated, at which point they move
> to the "offlined" state.
>
> Sysfs Reporting:
> --------------
> The patch exposes these metrics through a standard interface,
> allowing administrators to monitor VRAM health:
> /sys/bus/pci/devices/<device_id>/vram_bad_bad_pages
>
> V3:
> -rename api, remove tile dependency and add status of reservation
> V2:
> - Fix mm->avail counter issue
> - Remove unused code and handle clean up in case of error
>
> Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
> ---
> drivers/gpu/drm/xe/xe_ttm_vram_mgr.c | 214 ++++++++++++++++++++-
> drivers/gpu/drm/xe/xe_ttm_vram_mgr.h | 2 +-
> drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h | 23 +++
> 3 files changed, 231 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
> index 4e852eed5170..42d531b1dabf 100644
> --- a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
> +++ b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
> @@ -276,6 +276,26 @@ static const struct ttm_resource_manager_func xe_ttm_vram_mgr_func = {
> .debug = xe_ttm_vram_mgr_debug
> };
>
> +static void xe_ttm_vram_free_bad_pages(struct drm_device *dev, struct xe_ttm_vram_mgr *mgr)
> +{
> + struct xe_ttm_offline_resource *pos, *n;
> +
> + mutex_lock(&mgr->lock);
> + list_for_each_entry_safe(pos, n, &mgr->offlined_pages, offlined_link) {
> + --mgr->n_offlined_pages;
> + drm_buddy_free_list(&mgr->mm, &pos->blocks, 0);
> + mgr->visible_avail += pos->used_visible_size;
> + list_del(&pos->offlined_link);
> + kfree(pos);
> + }
> + list_for_each_entry_safe(pos, n, &mgr->queued_pages, queued_link) {
> + list_del(&pos->queued_link);
> + mgr->n_queued_pages--;
> + kfree(pos);
> + }
> + mutex_unlock(&mgr->lock);
> +}
> +
> static void xe_ttm_vram_mgr_fini(struct drm_device *dev, void *arg)
> {
> struct xe_device *xe = to_xe_device(dev);
> @@ -287,6 +307,8 @@ static void xe_ttm_vram_mgr_fini(struct drm_device *dev, void *arg)
> if (ttm_resource_manager_evict_all(&xe->ttm, man))
> return;
>
> + xe_ttm_vram_free_bad_pages(dev, mgr);
> +
> WARN_ON_ONCE(mgr->visible_avail != mgr->visible_size);
>
> drm_buddy_fini(&mgr->mm);
> @@ -315,6 +337,8 @@ int __xe_ttm_vram_mgr_init(struct xe_device *xe, struct xe_ttm_vram_mgr *mgr,
> man->func = &xe_ttm_vram_mgr_func;
> mgr->mem_type = mem_type;
> mutex_init(&mgr->lock);
> + INIT_LIST_HEAD(&mgr->offlined_pages);
> + INIT_LIST_HEAD(&mgr->queued_pages);
> mgr->default_page_size = default_page_size;
> mgr->visible_size = io_size;
> mgr->visible_avail = io_size;
> @@ -531,14 +555,190 @@ static struct ttm_buffer_object *xe_ttm_vram_addr_to_tbo(struct drm_buddy *mm, u
> return NULL;
> }
>
> -int xe_ttm_tbo_handle_addr_fault(struct xe_tile *tile, unsigned long addr)
> +static int xe_ttm_vram_reserve_page_at_addr(struct xe_device *xe, unsigned long addr,
> + struct xe_ttm_vram_mgr *vram_mgr, struct drm_buddy *mm)
> {
> - struct xe_ttm_vram_mgr *vram_mgr = &tile->mem.vram->ttm;
> - struct drm_buddy mm = vram_mgr->mm;
> - struct ttm_buffer_object *tbo;
> + int ret = 0;
> + u64 size = SZ_4K;
> + struct ttm_buffer_object *tbo = NULL;
> + struct xe_ttm_offline_resource *nentry;
> + enum reserve_status {
> + pending = 0,
> + fail
> + };
> +
> + mutex_lock(&vram_mgr->lock);
> + tbo = xe_ttm_vram_addr_to_tbo(mm, addr);
> +
> + nentry = kzalloc(sizeof(*nentry), GFP_KERNEL);
> + if (!nentry)
> + return -ENOMEM;
> + INIT_LIST_HEAD(&nentry->blocks);
> + nentry->status = pending;
> +
> + if (tbo) {
> + struct xe_ttm_vram_mgr_resource *pvres;
> + struct ttm_placement place = {};
> + struct ttm_operation_ctx ctx = {
> + .interruptible = false,
> + .gfp_retry_mayfail = false,
> + };
> + bool locked;
> + struct xe_ttm_offline_resource *pos, *n;
> + struct xe_bo *pbo = ttm_to_xe_bo(tbo);
> +
> + xe_bo_get(pbo);
> + /* Critical kernel BO? */
There is a scope for recovery from KMD without relying on USER.
I believe this call will be executed as part of AER callback, so if you
had identified this case you could request for SBR and in the next boot
you can offline the page. In addition to this there shall be a check if
the address belongs to reserved memory and as well request SBR for that.
FYI , Riana.
> + if (pbo->ttm.type == ttm_bo_type_kernel &&
> + !(pbo->flags & XE_BO_FLAG_FORCE_USER_VRAM)) {
> + mutex_unlock(&vram_mgr->lock);
> + kfree(nentry);
> + xe_ttm_vram_free_bad_pages(&xe->drm, vram_mgr);
> + xe_bo_put(pbo);
> + drm_warn(&xe->drm,
> + "%s: corrupt addr: 0x%lx in critical kernel bo, wedge now\n",
> + __func__, addr);
> + /* Wedge the device */
> + xe_device_declare_wedged(xe);
> + return -EIO;
> + }
> + pvres = to_xe_ttm_vram_mgr_resource(pbo->ttm.resource);
> + nentry->id = ++vram_mgr->n_queued_pages;
> + nentry->blocks = pvres->blocks;
> + list_add(&nentry->queued_link, &vram_mgr->queued_pages);
> + mutex_unlock(&vram_mgr->lock);
> +
Also, how will this behave if the BO is a ppgtt table, ring buffers,
LRCA etc.., will you signal fences and ban the context?
Thanks,
Aravind.
> + /* Purge BO containing address */
> + spin_lock(&pbo->ttm.bdev->lru_lock);
> + locked = dma_resv_trylock(pbo->ttm.base.resv);
> + spin_unlock(&pbo->ttm.bdev->lru_lock);
> + WARN_ON(!locked);
> + ret = ttm_bo_validate(&pbo->ttm, &place, &ctx);
> + drm_WARN_ON(&xe->drm, ret);
> + xe_bo_put(pbo);
> + if (locked)
> + dma_resv_unlock(pbo->ttm.base.resv);
> +
> + /* Reserve page at address addr*/
> + mutex_lock(&vram_mgr->lock);
> + ret = drm_buddy_alloc_blocks(mm, addr, addr + size,
> + size, size, &nentry->blocks,
> + DRM_BUDDY_RANGE_ALLOCATION);
> +
> + if (ret) {
> + drm_warn(&xe->drm, "Could not reserve page at addr:0x%lx, ret:%d\n",
> + addr, ret);
> + nentry->status = fail;
> + mutex_unlock(&vram_mgr->lock);
> + return ret;
> + }
> + if ((addr + size) <= vram_mgr->visible_size) {
> + nentry->used_visible_size = size;
> + } else {
> + struct drm_buddy_block *block;
>
> - tbo = xe_ttm_vram_addr_to_tbo(&mm, addr);
> + list_for_each_entry(block, &nentry->blocks, link) {
> + u64 start = drm_buddy_block_offset(block);
>
> - return 0;
> + if (start < vram_mgr->visible_size) {
> + u64 end = start + drm_buddy_block_size(mm, block);
> +
> + nentry->used_visible_size +=
> + min(end, vram_mgr->visible_size) - start;
> + }
> + }
> + }
> + vram_mgr->visible_avail -= nentry->used_visible_size;
> + list_for_each_entry_safe(pos, n, &vram_mgr->queued_pages, queued_link) {
> + if (pos->id == nentry->id) {
> + --vram_mgr->n_queued_pages;
> + list_del(&pos->queued_link);
> + break;
> + }
> + }
> + list_add(&nentry->offlined_link, &vram_mgr->offlined_pages);
> + /* TODO: FW Integration: Send command to FW for offlining page */
> + ++vram_mgr->n_offlined_pages;
> + mutex_unlock(&vram_mgr->lock);
> + return ret;
> +
> + } else {
> + ret = drm_buddy_alloc_blocks(mm, addr, addr + size,
> + size, size, &nentry->blocks,
> + DRM_BUDDY_RANGE_ALLOCATION);
> + if (ret) {
> + drm_warn(&xe->drm, "Could not reserve page at addr:0x%lx, ret:%d\n",
> + addr, ret);
> + nentry->status = fail;
> + mutex_unlock(&vram_mgr->lock);
> + return ret;
> + }
> + if ((addr + size) <= vram_mgr->visible_size) {
> + nentry->used_visible_size = size;
> + } else {
> + struct drm_buddy_block *block;
> +
> + list_for_each_entry(block, &nentry->blocks, link) {
> + u64 start = drm_buddy_block_offset(block);
> +
> + if (start < vram_mgr->visible_size) {
> + u64 end = start + drm_buddy_block_size(mm, block);
> +
> + nentry->used_visible_size +=
> + min(end, vram_mgr->visible_size) - start;
> + }
> + }
> + }
> + vram_mgr->visible_avail -= nentry->used_visible_size;
> + nentry->id = ++vram_mgr->n_offlined_pages;
> + list_add(&nentry->offlined_link, &vram_mgr->offlined_pages);
> + /* TODO: FW Integration: Send command to FW for offlining page */
> + mutex_unlock(&vram_mgr->lock);
> + }
> + /* Success */
> + return ret;
> +}
> +
> +static struct xe_vram_region *xe_ttm_vram_addr_to_region(struct xe_device *xe,
> + resource_size_t addr)
> +{
> + struct xe_vram_region *vr;
> + struct xe_tile *tile;
> + int id;
> +
> + for_each_tile(tile, xe, id) {
> + vr = tile->mem.vram;
> + if ((addr <= vr->dpa_base + vr->actual_physical_size) &&
> + (addr + SZ_4K >= vr->dpa_base))
> + return vr;
> + }
> + return NULL;
> +}
> +
> +/**
> + * xe_ttm_vram_handle_addr_fault - Handle vram physical address error flaged
> + * @xe: pointer to parent device
> + * @addr: physical faulty address
> + *
> + * Handle the physcial faulty address error on specific tile.
> + *
> + * Returns 0 for success, negative error code otherwise.
> + */
> +int xe_ttm_vram_handle_addr_fault(struct xe_device *xe, unsigned long addr)
> +{
> + struct xe_ttm_vram_mgr *vram_mgr;
> + struct xe_vram_region *vr;
> + struct drm_buddy *mm;
> + int ret;
> +
> + vr = xe_ttm_vram_addr_to_region(xe, addr);
> + WARN_ON(!vr);
> + vram_mgr = &vr->ttm;
> + mm = &vram_mgr->mm;
> + /* Reserve page at address */
> + ret = xe_ttm_vram_reserve_page_at_addr(xe, addr, vram_mgr, mm);
> + if (ret == -EIO)
> + return 0; /* success, wedged by kernel. */
> + return ret;
> }
> -EXPORT_SYMBOL(xe_ttm_tbo_handle_addr_fault);
> +EXPORT_SYMBOL(xe_ttm_vram_handle_addr_fault);
> diff --git a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.h b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.h
> index 1d6075411ebf..8cc528434ceb 100644
> --- a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.h
> +++ b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.h
> @@ -30,7 +30,7 @@ u64 xe_ttm_vram_get_avail(struct ttm_resource_manager *man);
> u64 xe_ttm_vram_get_cpu_visible_size(struct ttm_resource_manager *man);
> void xe_ttm_vram_get_used(struct ttm_resource_manager *man,
> u64 *used, u64 *used_visible);
> -int xe_ttm_tbo_handle_addr_fault(struct xe_tile *tile, unsigned long addr);
> +int xe_ttm_vram_handle_addr_fault(struct xe_device *xe, unsigned long addr);
> static inline struct xe_ttm_vram_mgr_resource *
> to_xe_ttm_vram_mgr_resource(struct ttm_resource *res)
> {
> diff --git a/drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h b/drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h
> index a71e14818ec2..e1b48db27cfd 100644
> --- a/drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h
> +++ b/drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h
> @@ -19,6 +19,14 @@ struct xe_ttm_vram_mgr {
> struct ttm_resource_manager manager;
> /** @mm: DRM buddy allocator which manages the VRAM */
> struct drm_buddy mm;
> + /** @offlined_pages: List of offlined pages */
> + struct list_head offlined_pages;
> + /** @n_offlined_pages: Number of offlined pages */
> + u16 n_offlined_pages;
> + /** @queued_pages: List of queued pages */
> + struct list_head queued_pages;
> + /** @n_queued_pages: Number of queued pages */
> + u16 n_queued_pages;
> /** @visible_size: Proped size of the CPU visible portion */
> u64 visible_size;
> /** @visible_avail: CPU visible portion still unallocated */
> @@ -45,4 +53,19 @@ struct xe_ttm_vram_mgr_resource {
> unsigned long flags;
> };
>
> +struct xe_ttm_offline_resource {
> + /** @offlined_link: Link to offlined pages */
> + struct list_head offlined_link;
> + /** @queued_link: Link to queued pages */
> + struct list_head queued_link;
> + /** @blocks: list of DRM buddy blocks */
> + struct list_head blocks;
> + /** @used_visible_size: How many CPU visible bytes this resource is using */
> + u64 used_visible_size;
> + /** @id: The id of an offline resource */
> + u16 id;
> + /** @status: reservation status of resource */
> + bool status;
> +};
> +
> #endif
^ permalink raw reply [flat|nested] 12+ messages in thread* RE: [RFC PATCH V4 3/7] drm/xe: Handle physical memory address error
2026-03-02 5:11 ` Aravind Iddamsetty
@ 2026-03-05 6:40 ` Upadhyay, Tejas
2026-03-06 10:29 ` Aravind Iddamsetty
2026-03-16 16:34 ` Upadhyay, Tejas
0 siblings, 2 replies; 12+ messages in thread
From: Upadhyay, Tejas @ 2026-03-05 6:40 UTC (permalink / raw)
To: Aravind Iddamsetty, intel-xe@lists.freedesktop.org,
thomas.hellstrom@linux.intel.com, Brost, Matthew,
Ghimiray, Himal Prasad
Cc: Auld, Matthew, Tauro, Riana
> -----Original Message-----
> From: Aravind Iddamsetty <aravind.iddamsetty@linux.intel.com>
> Sent: 02 March 2026 10:41
> To: Upadhyay, Tejas <tejas.upadhyay@intel.com>; intel-
> xe@lists.freedesktop.org
> Cc: Auld, Matthew <matthew.auld@intel.com>;
> thomas.hellstrom@linux.intel.com; Brost, Matthew
> <matthew.brost@intel.com>; Tauro, Riana <riana.tauro@intel.com>
> Subject: Re: [RFC PATCH V4 3/7] drm/xe: Handle physical memory address
> error
>
>
> On 27-02-2026 19:14, Tejas Upadhyay wrote:
> > This functionality represents a significant step in making the xe
> > driver gracefully handle hardware memory degradation.
> > By integrating with the DRM Buddy allocator, the driver can
> > permanently "carve out" faulty memory so it isn't reused by subsequent
> > allocations.
> >
> > Buddy Block Reservation:
> > ----------------------
> > When a memory address is reported as faulty, the driver instructs the
> > DRM Buddy allocator to reserve a block of the specific page size
> > (typically 4KB). This marks the memory as "dirty/used"
> > indefinitely.
> >
> > Two-Stage Tracking:
> > -----------------
> > Offlined Pages:
> > Pages that have been successfully isolated and removed from the
> > available memory pool.
> >
> > Queued Pages:
> > Addresses that have been flagged as faulty but are currently in use by
> > a process. These are tracked until the associated buffer object (BO)
> > is released or migrated, at which point they move to the "offlined"
> > state.
> >
> > Sysfs Reporting:
> > --------------
> > The patch exposes these metrics through a standard interface, allowing
> > administrators to monitor VRAM health:
> > /sys/bus/pci/devices/<device_id>/vram_bad_bad_pages
> >
> > V3:
> > -rename api, remove tile dependency and add status of reservation
> > V2:
> > - Fix mm->avail counter issue
> > - Remove unused code and handle clean up in case of error
> >
> > Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
> > ---
> > drivers/gpu/drm/xe/xe_ttm_vram_mgr.c | 214
> ++++++++++++++++++++-
> > drivers/gpu/drm/xe/xe_ttm_vram_mgr.h | 2 +-
> > drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h | 23 +++
> > 3 files changed, 231 insertions(+), 8 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
> > b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
> > index 4e852eed5170..42d531b1dabf 100644
> > --- a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
> > +++ b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
> > @@ -276,6 +276,26 @@ static const struct ttm_resource_manager_func
> xe_ttm_vram_mgr_func = {
> > .debug = xe_ttm_vram_mgr_debug
> > };
> >
> > +static void xe_ttm_vram_free_bad_pages(struct drm_device *dev, struct
> > +xe_ttm_vram_mgr *mgr) {
> > + struct xe_ttm_offline_resource *pos, *n;
> > +
> > + mutex_lock(&mgr->lock);
> > + list_for_each_entry_safe(pos, n, &mgr->offlined_pages, offlined_link)
> {
> > + --mgr->n_offlined_pages;
> > + drm_buddy_free_list(&mgr->mm, &pos->blocks, 0);
> > + mgr->visible_avail += pos->used_visible_size;
> > + list_del(&pos->offlined_link);
> > + kfree(pos);
> > + }
> > + list_for_each_entry_safe(pos, n, &mgr->queued_pages, queued_link)
> {
> > + list_del(&pos->queued_link);
> > + mgr->n_queued_pages--;
> > + kfree(pos);
> > + }
> > + mutex_unlock(&mgr->lock);
> > +}
> > +
> > static void xe_ttm_vram_mgr_fini(struct drm_device *dev, void *arg)
> > {
> > struct xe_device *xe = to_xe_device(dev); @@ -287,6 +307,8 @@
> static
> > void xe_ttm_vram_mgr_fini(struct drm_device *dev, void *arg)
> > if (ttm_resource_manager_evict_all(&xe->ttm, man))
> > return;
> >
> > + xe_ttm_vram_free_bad_pages(dev, mgr);
> > +
> > WARN_ON_ONCE(mgr->visible_avail != mgr->visible_size);
> >
> > drm_buddy_fini(&mgr->mm);
> > @@ -315,6 +337,8 @@ int __xe_ttm_vram_mgr_init(struct xe_device *xe,
> struct xe_ttm_vram_mgr *mgr,
> > man->func = &xe_ttm_vram_mgr_func;
> > mgr->mem_type = mem_type;
> > mutex_init(&mgr->lock);
> > + INIT_LIST_HEAD(&mgr->offlined_pages);
> > + INIT_LIST_HEAD(&mgr->queued_pages);
> > mgr->default_page_size = default_page_size;
> > mgr->visible_size = io_size;
> > mgr->visible_avail = io_size;
> > @@ -531,14 +555,190 @@ static struct ttm_buffer_object
> *xe_ttm_vram_addr_to_tbo(struct drm_buddy *mm, u
> > return NULL;
> > }
> >
> > -int xe_ttm_tbo_handle_addr_fault(struct xe_tile *tile, unsigned long
> > addr)
> > +static int xe_ttm_vram_reserve_page_at_addr(struct xe_device *xe,
> unsigned long addr,
> > + struct xe_ttm_vram_mgr
> *vram_mgr, struct drm_buddy *mm)
> > {
> > - struct xe_ttm_vram_mgr *vram_mgr = &tile->mem.vram->ttm;
> > - struct drm_buddy mm = vram_mgr->mm;
> > - struct ttm_buffer_object *tbo;
> > + int ret = 0;
> > + u64 size = SZ_4K;
> > + struct ttm_buffer_object *tbo = NULL;
> > + struct xe_ttm_offline_resource *nentry;
> > + enum reserve_status {
> > + pending = 0,
> > + fail
> > + };
> > +
> > + mutex_lock(&vram_mgr->lock);
> > + tbo = xe_ttm_vram_addr_to_tbo(mm, addr);
> > +
> > + nentry = kzalloc(sizeof(*nentry), GFP_KERNEL);
> > + if (!nentry)
> > + return -ENOMEM;
> > + INIT_LIST_HEAD(&nentry->blocks);
> > + nentry->status = pending;
> > +
> > + if (tbo) {
> > + struct xe_ttm_vram_mgr_resource *pvres;
> > + struct ttm_placement place = {};
> > + struct ttm_operation_ctx ctx = {
> > + .interruptible = false,
> > + .gfp_retry_mayfail = false,
> > + };
> > + bool locked;
> > + struct xe_ttm_offline_resource *pos, *n;
> > + struct xe_bo *pbo = ttm_to_xe_bo(tbo);
> > +
> > + xe_bo_get(pbo);
> > + /* Critical kernel BO? */
>
> There is a scope for recovery from KMD without relying on USER.
>
> I believe this call will be executed as part of AER callback, so if you had
> identified this case you could request for SBR and in the next boot you can
> offline the page. In addition to this there shall be a check if the address belongs
> to reserved memory and as well request SBR for that.
Okay, so reserved memory won't be available for any use, right? It should go via the bootup path; also, we won't get any BO there, so it will go into the else case below.
For other critical BOs it was decided to wedge the system. @Ghimiray, Himal Prasad @Brost, Matthew @thomas.hellstrom@linux.intel.com any input here? Should we request SBR instead?
>
> FYI , Riana.
>
> > + if (pbo->ttm.type == ttm_bo_type_kernel &&
> > + !(pbo->flags & XE_BO_FLAG_FORCE_USER_VRAM)) {
> > + mutex_unlock(&vram_mgr->lock);
> > + kfree(nentry);
> > + xe_ttm_vram_free_bad_pages(&xe->drm, vram_mgr);
> > + xe_bo_put(pbo);
> > + drm_warn(&xe->drm,
> > + "%s: corrupt addr: 0x%lx in critical kernel bo,
> wedge now\n",
> > + __func__, addr);
> > + /* Wedge the device */
> > + xe_device_declare_wedged(xe);
> > + return -EIO;
> > + }
> > + pvres = to_xe_ttm_vram_mgr_resource(pbo->ttm.resource);
> > + nentry->id = ++vram_mgr->n_queued_pages;
> > + nentry->blocks = pvres->blocks;
> > + list_add(&nentry->queued_link, &vram_mgr-
> >queued_pages);
> > + mutex_unlock(&vram_mgr->lock);
> > +
>
> Also, how will this behave if the BO is a ppgtt table, ring buffers, LRCA etc..,
> will you signal fences and ban the context?
Right, the LRCA/ring buffer is in GGTT; right now it is considered a critical BO and we wedge if the faulty address belongs to it. Instead, you mean I should consider it a non-critical BO, purge it, and ban the specific context that created it?
The PPGTT is a kernel BO but not critical, so we purge it. Maybe I need to handle this in some proper cleanup path.
Tejas
>
> Thanks,
> Aravind.
> > + /* Purge BO containing address */
> > + spin_lock(&pbo->ttm.bdev->lru_lock);
> > + locked = dma_resv_trylock(pbo->ttm.base.resv);
> > + spin_unlock(&pbo->ttm.bdev->lru_lock);
> > + WARN_ON(!locked);
> > + ret = ttm_bo_validate(&pbo->ttm, &place, &ctx);
> > + drm_WARN_ON(&xe->drm, ret);
> > + xe_bo_put(pbo);
> > + if (locked)
> > + dma_resv_unlock(pbo->ttm.base.resv);
> > +
> > + /* Reserve page at address addr*/
> > + mutex_lock(&vram_mgr->lock);
> > + ret = drm_buddy_alloc_blocks(mm, addr, addr + size,
> > + size, size, &nentry->blocks,
> > +
> DRM_BUDDY_RANGE_ALLOCATION);
> > +
> > + if (ret) {
> > + drm_warn(&xe->drm, "Could not reserve page at
> addr:0x%lx, ret:%d\n",
> > + addr, ret);
> > + nentry->status = fail;
> > + mutex_unlock(&vram_mgr->lock);
> > + return ret;
> > + }
> > + if ((addr + size) <= vram_mgr->visible_size) {
> > + nentry->used_visible_size = size;
> > + } else {
> > + struct drm_buddy_block *block;
> >
> > - tbo = xe_ttm_vram_addr_to_tbo(&mm, addr);
> > + list_for_each_entry(block, &nentry->blocks, link) {
> > + u64 start = drm_buddy_block_offset(block);
> >
> > - return 0;
> > + if (start < vram_mgr->visible_size) {
> > + u64 end = start +
> drm_buddy_block_size(mm, block);
> > +
> > + nentry->used_visible_size +=
> > + min(end, vram_mgr-
> >visible_size) - start;
> > + }
> > + }
> > + }
> > + vram_mgr->visible_avail -= nentry->used_visible_size;
> > + list_for_each_entry_safe(pos, n, &vram_mgr->queued_pages,
> queued_link) {
> > + if (pos->id == nentry->id) {
> > + --vram_mgr->n_queued_pages;
> > + list_del(&pos->queued_link);
> > + break;
> > + }
> > + }
> > + list_add(&nentry->offlined_link, &vram_mgr-
> >offlined_pages);
> > + /* TODO: FW Integration: Send command to FW for offlining
> page */
> > + ++vram_mgr->n_offlined_pages;
> > + mutex_unlock(&vram_mgr->lock);
> > + return ret;
> > +
> > + } else {
> > + ret = drm_buddy_alloc_blocks(mm, addr, addr + size,
> > + size, size, &nentry->blocks,
> > +
> DRM_BUDDY_RANGE_ALLOCATION);
> > + if (ret) {
> > + drm_warn(&xe->drm, "Could not reserve page at
> addr:0x%lx, ret:%d\n",
> > + addr, ret);
> > + nentry->status = fail;
> > + mutex_unlock(&vram_mgr->lock);
> > + return ret;
> > + }
> > + if ((addr + size) <= vram_mgr->visible_size) {
> > + nentry->used_visible_size = size;
> > + } else {
> > + struct drm_buddy_block *block;
> > +
> > + list_for_each_entry(block, &nentry->blocks, link) {
> > + u64 start = drm_buddy_block_offset(block);
> > +
> > + if (start < vram_mgr->visible_size) {
> > + u64 end = start +
> drm_buddy_block_size(mm, block);
> > +
> > + nentry->used_visible_size +=
> > + min(end, vram_mgr-
> >visible_size) - start;
> > + }
> > + }
> > + }
> > + vram_mgr->visible_avail -= nentry->used_visible_size;
> > + nentry->id = ++vram_mgr->n_offlined_pages;
> > + list_add(&nentry->offlined_link, &vram_mgr-
> >offlined_pages);
> > + /* TODO: FW Integration: Send command to FW for offlining
> page */
> > + mutex_unlock(&vram_mgr->lock);
> > + }
> > + /* Success */
> > + return ret;
> > +}
> > +
> > +static struct xe_vram_region *xe_ttm_vram_addr_to_region(struct
> xe_device *xe,
> > + resource_size_t addr)
> > +{
> > + struct xe_vram_region *vr;
> > + struct xe_tile *tile;
> > + int id;
> > +
> > + for_each_tile(tile, xe, id) {
> > + vr = tile->mem.vram;
> > + if ((addr <= vr->dpa_base + vr->actual_physical_size) &&
> > + (addr + SZ_4K >= vr->dpa_base))
> > + return vr;
> > + }
> > + return NULL;
> > +}
> > +
> > +/**
> > + * xe_ttm_vram_handle_addr_fault - Handle vram physical address error
> > +flaged
> > + * @xe: pointer to parent device
> > + * @addr: physical faulty address
> > + *
> > + * Handle the physcial faulty address error on specific tile.
> > + *
> > + * Returns 0 for success, negative error code otherwise.
> > + */
> > +int xe_ttm_vram_handle_addr_fault(struct xe_device *xe, unsigned long
> > +addr) {
> > + struct xe_ttm_vram_mgr *vram_mgr;
> > + struct xe_vram_region *vr;
> > + struct drm_buddy *mm;
> > + int ret;
> > +
> > + vr = xe_ttm_vram_addr_to_region(xe, addr);
> > + WARN_ON(!vr);
> > + vram_mgr = &vr->ttm;
> > + mm = &vram_mgr->mm;
> > + /* Reserve page at address */
> > + ret = xe_ttm_vram_reserve_page_at_addr(xe, addr, vram_mgr, mm);
> > + if (ret == -EIO)
> > + return 0; /* success, wedged by kernel. */
> > + return ret;
> > }
> > -EXPORT_SYMBOL(xe_ttm_tbo_handle_addr_fault);
> > +EXPORT_SYMBOL(xe_ttm_vram_handle_addr_fault);
> > diff --git a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.h
> > b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.h
> > index 1d6075411ebf..8cc528434ceb 100644
> > --- a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.h
> > +++ b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.h
> > @@ -30,7 +30,7 @@ u64 xe_ttm_vram_get_avail(struct
> > ttm_resource_manager *man);
> > u64 xe_ttm_vram_get_cpu_visible_size(struct ttm_resource_manager
> > *man); void xe_ttm_vram_get_used(struct ttm_resource_manager *man,
> > u64 *used, u64 *used_visible);
> > -int xe_ttm_tbo_handle_addr_fault(struct xe_tile *tile, unsigned long
> > addr);
> > +int xe_ttm_vram_handle_addr_fault(struct xe_device *xe, unsigned long
> > +addr);
> > static inline struct xe_ttm_vram_mgr_resource *
> > to_xe_ttm_vram_mgr_resource(struct ttm_resource *res) { diff --git
> > a/drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h
> > b/drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h
> > index a71e14818ec2..e1b48db27cfd 100644
> > --- a/drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h
> > +++ b/drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h
> > @@ -19,6 +19,14 @@ struct xe_ttm_vram_mgr {
> > struct ttm_resource_manager manager;
> > /** @mm: DRM buddy allocator which manages the VRAM */
> > struct drm_buddy mm;
> > + /** @offlined_pages: List of offlined pages */
> > + struct list_head offlined_pages;
> > + /** @n_offlined_pages: Number of offlined pages */
> > + u16 n_offlined_pages;
> > + /** @queued_pages: List of queued pages */
> > + struct list_head queued_pages;
> > + /** @n_queued_pages: Number of queued pages */
> > + u16 n_queued_pages;
> > /** @visible_size: Proped size of the CPU visible portion */
> > u64 visible_size;
> > /** @visible_avail: CPU visible portion still unallocated */ @@
> > -45,4 +53,19 @@ struct xe_ttm_vram_mgr_resource {
> > unsigned long flags;
> > };
> >
> > +struct xe_ttm_offline_resource {
> > + /** @offlined_link: Link to offlined pages */
> > + struct list_head offlined_link;
> > + /** @queued_link: Link to queued pages */
> > + struct list_head queued_link;
> > + /** @blocks: list of DRM buddy blocks */
> > + struct list_head blocks;
> > + /** @used_visible_size: How many CPU visible bytes this resource is
> using */
> > + u64 used_visible_size;
> > + /** @id: The id of an offline resource */
> > + u16 id;
> > + /** @status: reservation status of resource */
> > + bool status;
> > +};
> > +
> > #endif
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: [RFC PATCH V4 3/7] drm/xe: Handle physical memory address error
2026-03-05 6:40 ` Upadhyay, Tejas
@ 2026-03-06 10:29 ` Aravind Iddamsetty
2026-03-16 16:34 ` Upadhyay, Tejas
1 sibling, 0 replies; 12+ messages in thread
From: Aravind Iddamsetty @ 2026-03-06 10:29 UTC (permalink / raw)
To: Upadhyay, Tejas, intel-xe@lists.freedesktop.org,
thomas.hellstrom@linux.intel.com, Brost, Matthew,
Ghimiray, Himal Prasad
Cc: Auld, Matthew, Tauro, Riana
On 05-03-2026 12:10, Upadhyay, Tejas wrote:
>
>> -----Original Message-----
>> From: Aravind Iddamsetty <aravind.iddamsetty@linux.intel.com>
>> Sent: 02 March 2026 10:41
>> To: Upadhyay, Tejas <tejas.upadhyay@intel.com>; intel-
>> xe@lists.freedesktop.org
>> Cc: Auld, Matthew <matthew.auld@intel.com>;
>> thomas.hellstrom@linux.intel.com; Brost, Matthew
>> <matthew.brost@intel.com>; Tauro, Riana <riana.tauro@intel.com>
>> Subject: Re: [RFC PATCH V4 3/7] drm/xe: Handle physical memory address
>> error
>>
>>
>> On 27-02-2026 19:14, Tejas Upadhyay wrote:
>>> This functionality represents a significant step in making the xe
>>> driver gracefully handle hardware memory degradation.
>>> By integrating with the DRM Buddy allocator, the driver can
>>> permanently "carve out" faulty memory so it isn't reused by subsequent
>>> allocations.
>>>
>>> Buddy Block Reservation:
>>> ----------------------
>>> When a memory address is reported as faulty, the driver instructs the
>>> DRM Buddy allocator to reserve a block of the specific page size
>>> (typically 4KB). This marks the memory as "dirty/used"
>>> indefinitely.
>>>
>>> Two-Stage Tracking:
>>> -----------------
>>> Offlined Pages:
>>> Pages that have been successfully isolated and removed from the
>>> available memory pool.
>>>
>>> Queued Pages:
>>> Addresses that have been flagged as faulty but are currently in use by
>>> a process. These are tracked until the associated buffer object (BO)
>>> is released or migrated, at which point they move to the "offlined"
>>> state.
>>>
>>> Sysfs Reporting:
>>> --------------
>>> The patch exposes these metrics through a standard interface, allowing
>>> administrators to monitor VRAM health:
>>> /sys/bus/pci/devices/<device_id>/vram_bad_bad_pages
>>>
>>> V3:
>>> -rename api, remove tile dependency and add status of reservation
>>> V2:
>>> - Fix mm->avail counter issue
>>> - Remove unused code and handle clean up in case of error
>>>
>>> Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
>>> ---
>>> drivers/gpu/drm/xe/xe_ttm_vram_mgr.c | 214
>> ++++++++++++++++++++-
>>> drivers/gpu/drm/xe/xe_ttm_vram_mgr.h | 2 +-
>>> drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h | 23 +++
>>> 3 files changed, 231 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
>>> b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
>>> index 4e852eed5170..42d531b1dabf 100644
>>> --- a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
>>> +++ b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
>>> @@ -276,6 +276,26 @@ static const struct ttm_resource_manager_func
>> xe_ttm_vram_mgr_func = {
>>> .debug = xe_ttm_vram_mgr_debug
>>> };
>>>
>>> +static void xe_ttm_vram_free_bad_pages(struct drm_device *dev, struct
>>> +xe_ttm_vram_mgr *mgr) {
>>> + struct xe_ttm_offline_resource *pos, *n;
>>> +
>>> + mutex_lock(&mgr->lock);
>>> + list_for_each_entry_safe(pos, n, &mgr->offlined_pages, offlined_link)
>> {
>>> + --mgr->n_offlined_pages;
>>> + drm_buddy_free_list(&mgr->mm, &pos->blocks, 0);
>>> + mgr->visible_avail += pos->used_visible_size;
>>> + list_del(&pos->offlined_link);
>>> + kfree(pos);
>>> + }
>>> + list_for_each_entry_safe(pos, n, &mgr->queued_pages, queued_link)
>> {
>>> + list_del(&pos->queued_link);
>>> + mgr->n_queued_pages--;
>>> + kfree(pos);
>>> + }
>>> + mutex_unlock(&mgr->lock);
>>> +}
>>> +
>>> static void xe_ttm_vram_mgr_fini(struct drm_device *dev, void *arg)
>>> {
>>> struct xe_device *xe = to_xe_device(dev); @@ -287,6 +307,8 @@
>> static
>>> void xe_ttm_vram_mgr_fini(struct drm_device *dev, void *arg)
>>> if (ttm_resource_manager_evict_all(&xe->ttm, man))
>>> return;
>>>
>>> + xe_ttm_vram_free_bad_pages(dev, mgr);
>>> +
>>> WARN_ON_ONCE(mgr->visible_avail != mgr->visible_size);
>>>
>>> drm_buddy_fini(&mgr->mm);
>>> @@ -315,6 +337,8 @@ int __xe_ttm_vram_mgr_init(struct xe_device *xe,
>> struct xe_ttm_vram_mgr *mgr,
>>> man->func = &xe_ttm_vram_mgr_func;
>>> mgr->mem_type = mem_type;
>>> mutex_init(&mgr->lock);
>>> + INIT_LIST_HEAD(&mgr->offlined_pages);
>>> + INIT_LIST_HEAD(&mgr->queued_pages);
>>> mgr->default_page_size = default_page_size;
>>> mgr->visible_size = io_size;
>>> mgr->visible_avail = io_size;
>>> @@ -531,14 +555,190 @@ static struct ttm_buffer_object
>> *xe_ttm_vram_addr_to_tbo(struct drm_buddy *mm, u
>>> return NULL;
>>> }
>>>
>>> -int xe_ttm_tbo_handle_addr_fault(struct xe_tile *tile, unsigned long
>>> addr)
>>> +static int xe_ttm_vram_reserve_page_at_addr(struct xe_device *xe,
>> unsigned long addr,
>>> + struct xe_ttm_vram_mgr
>> *vram_mgr, struct drm_buddy *mm)
>>> {
>>> - struct xe_ttm_vram_mgr *vram_mgr = &tile->mem.vram->ttm;
>>> - struct drm_buddy mm = vram_mgr->mm;
>>> - struct ttm_buffer_object *tbo;
>>> + int ret = 0;
>>> + u64 size = SZ_4K;
>>> + struct ttm_buffer_object *tbo = NULL;
>>> + struct xe_ttm_offline_resource *nentry;
>>> + enum reserve_status {
>>> + pending = 0,
>>> + fail
>>> + };
>>> +
>>> + mutex_lock(&vram_mgr->lock);
>>> + tbo = xe_ttm_vram_addr_to_tbo(mm, addr);
>>> +
>>> + nentry = kzalloc(sizeof(*nentry), GFP_KERNEL);
>>> + if (!nentry)
>>> + return -ENOMEM;
>>> + INIT_LIST_HEAD(&nentry->blocks);
>>> + nentry->status = pending;
>>> +
>>> + if (tbo) {
>>> + struct xe_ttm_vram_mgr_resource *pvres;
>>> + struct ttm_placement place = {};
>>> + struct ttm_operation_ctx ctx = {
>>> + .interruptible = false,
>>> + .gfp_retry_mayfail = false,
>>> + };
>>> + bool locked;
>>> + struct xe_ttm_offline_resource *pos, *n;
>>> + struct xe_bo *pbo = ttm_to_xe_bo(tbo);
>>> +
>>> + xe_bo_get(pbo);
>>> + /* Critical kernel BO? */
>> There is a scope for recovery from KMD without relying on USER.
>>
>> I believe this call will be executed as part of AER callback, so if you had
>> identified this case you could request for SBR and in the next boot you can
>> offline the page. In addition to this there shall be a check if the address belongs
>> to reserved memory and as well request SBR for that.
> Okay, so reserved memory wont be available for any use right?, it should go via bootup path, also we wont get any BO there so it will go in below else case.
This will be carved-out memory; I don't think you can have a BO there or
reserve it — you should bail out immediately.
>
> For other critical BO's it was decided to wedge the system. @Ghimiray, Himal Prasad @Brost, Matthew @thomas.hellstrom@linux.intel.com any input here? Should we request SBR instead?
>
>> FYI , Riana.
>>
>>> + if (pbo->ttm.type == ttm_bo_type_kernel &&
>>> + !(pbo->flags & XE_BO_FLAG_FORCE_USER_VRAM)) {
>>> + mutex_unlock(&vram_mgr->lock);
>>> + kfree(nentry);
>>> + xe_ttm_vram_free_bad_pages(&xe->drm, vram_mgr);
>>> + xe_bo_put(pbo);
>>> + drm_warn(&xe->drm,
>>> + "%s: corrupt addr: 0x%lx in critical kernel bo,
>> wedge now\n",
>>> + __func__, addr);
>>> + /* Wedge the device */
>>> + xe_device_declare_wedged(xe);
>>> + return -EIO;
>>> + }
>>> + pvres = to_xe_ttm_vram_mgr_resource(pbo->ttm.resource);
>>> + nentry->id = ++vram_mgr->n_queued_pages;
>>> + nentry->blocks = pvres->blocks;
>>> + list_add(&nentry->queued_link, &vram_mgr-
>>> queued_pages);
>>> + mutex_unlock(&vram_mgr->lock);
>>> +
>> Also, how will this behave if the BO is a ppgtt table, ring buffers, LRCA etc..,
>> will you signal fences and ban the context?
> Right, LRCA/ring buff is in GGTT, right now considered critical BO and wedging if faulty address belongs to it, instead I would need to consider it non-critical bo, purge and ban specific context who has created you mean?
Ideally yes, similar to PPGTT handling.
Thanks,
Aravind.
> Ppgtt, is kernel BO but not critical so purging it. May be I need to take this in some proper clean up path.
>
> Tejas
>> Thanks,
>> Aravind.
>>> + /* Purge BO containing address */
>>> + spin_lock(&pbo->ttm.bdev->lru_lock);
>>> + locked = dma_resv_trylock(pbo->ttm.base.resv);
>>> + spin_unlock(&pbo->ttm.bdev->lru_lock);
>>> + WARN_ON(!locked);
>>> + ret = ttm_bo_validate(&pbo->ttm, &place, &ctx);
>>> + drm_WARN_ON(&xe->drm, ret);
>>> + xe_bo_put(pbo);
>>> + if (locked)
>>> + dma_resv_unlock(pbo->ttm.base.resv);
>>> +
>>> + /* Reserve page at address addr*/
>>> + mutex_lock(&vram_mgr->lock);
>>> + ret = drm_buddy_alloc_blocks(mm, addr, addr + size,
>>> + size, size, &nentry->blocks,
>>> +
>> DRM_BUDDY_RANGE_ALLOCATION);
>>> +
>>> + if (ret) {
>>> + drm_warn(&xe->drm, "Could not reserve page at
>> addr:0x%lx, ret:%d\n",
>>> + addr, ret);
>>> + nentry->status = fail;
>>> + mutex_unlock(&vram_mgr->lock);
>>> + return ret;
>>> + }
>>> + if ((addr + size) <= vram_mgr->visible_size) {
>>> + nentry->used_visible_size = size;
>>> + } else {
>>> + struct drm_buddy_block *block;
>>>
>>> - tbo = xe_ttm_vram_addr_to_tbo(&mm, addr);
>>> + list_for_each_entry(block, &nentry->blocks, link) {
>>> + u64 start = drm_buddy_block_offset(block);
>>>
>>> - return 0;
>>> + if (start < vram_mgr->visible_size) {
>>> + u64 end = start +
>> drm_buddy_block_size(mm, block);
>>> +
>>> + nentry->used_visible_size +=
>>> + min(end, vram_mgr-
>>> visible_size) - start;
>>> + }
>>> + }
>>> + }
>>> + vram_mgr->visible_avail -= nentry->used_visible_size;
>>> + list_for_each_entry_safe(pos, n, &vram_mgr->queued_pages,
>> queued_link) {
>>> + if (pos->id == nentry->id) {
>>> + --vram_mgr->n_queued_pages;
>>> + list_del(&pos->queued_link);
>>> + break;
>>> + }
>>> + }
>>> + list_add(&nentry->offlined_link, &vram_mgr-
>>> offlined_pages);
>>> + /* TODO: FW Integration: Send command to FW for offlining
>> page */
>>> + ++vram_mgr->n_offlined_pages;
>>> + mutex_unlock(&vram_mgr->lock);
>>> + return ret;
>>> +
>>> + } else {
>>> + ret = drm_buddy_alloc_blocks(mm, addr, addr + size,
>>> + size, size, &nentry->blocks,
>>> +
>> DRM_BUDDY_RANGE_ALLOCATION);
>>> + if (ret) {
>>> + drm_warn(&xe->drm, "Could not reserve page at
>> addr:0x%lx, ret:%d\n",
>>> + addr, ret);
>>> + nentry->status = fail;
>>> + mutex_unlock(&vram_mgr->lock);
>>> + return ret;
>>> + }
>>> + if ((addr + size) <= vram_mgr->visible_size) {
>>> + nentry->used_visible_size = size;
>>> + } else {
>>> + struct drm_buddy_block *block;
>>> +
>>> + list_for_each_entry(block, &nentry->blocks, link) {
>>> + u64 start = drm_buddy_block_offset(block);
>>> +
>>> + if (start < vram_mgr->visible_size) {
>>> + u64 end = start +
>> drm_buddy_block_size(mm, block);
>>> +
>>> + nentry->used_visible_size +=
>>> + min(end, vram_mgr-
>>> visible_size) - start;
>>> + }
>>> + }
>>> + }
>>> + vram_mgr->visible_avail -= nentry->used_visible_size;
>>> + nentry->id = ++vram_mgr->n_offlined_pages;
>>> + list_add(&nentry->offlined_link, &vram_mgr-
>>> offlined_pages);
>>> + /* TODO: FW Integration: Send command to FW for offlining
>> page */
>>> + mutex_unlock(&vram_mgr->lock);
>>> + }
>>> + /* Success */
>>> + return ret;
>>> +}
>>> +
>>> +static struct xe_vram_region *xe_ttm_vram_addr_to_region(struct
>> xe_device *xe,
>>> + resource_size_t addr)
>>> +{
>>> + struct xe_vram_region *vr;
>>> + struct xe_tile *tile;
>>> + int id;
>>> +
>>> + for_each_tile(tile, xe, id) {
>>> + vr = tile->mem.vram;
>>> + if ((addr <= vr->dpa_base + vr->actual_physical_size) &&
>>> + (addr + SZ_4K >= vr->dpa_base))
>>> + return vr;
>>> + }
>>> + return NULL;
>>> +}
>>> +
>>> +/**
>>> + * xe_ttm_vram_handle_addr_fault - Handle vram physical address error
>>> +flaged
>>> + * @xe: pointer to parent device
>>> + * @addr: physical faulty address
>>> + *
>>> + * Handle the physcial faulty address error on specific tile.
>>> + *
>>> + * Returns 0 for success, negative error code otherwise.
>>> + */
>>> +int xe_ttm_vram_handle_addr_fault(struct xe_device *xe, unsigned long
>>> +addr) {
>>> + struct xe_ttm_vram_mgr *vram_mgr;
>>> + struct xe_vram_region *vr;
>>> + struct drm_buddy *mm;
>>> + int ret;
>>> +
>>> + vr = xe_ttm_vram_addr_to_region(xe, addr);
>>> + WARN_ON(!vr);
>>> + vram_mgr = &vr->ttm;
>>> + mm = &vram_mgr->mm;
>>> + /* Reserve page at address */
>>> + ret = xe_ttm_vram_reserve_page_at_addr(xe, addr, vram_mgr, mm);
>>> + if (ret == -EIO)
>>> + return 0; /* success, wedged by kernel. */
>>> + return ret;
>>> }
>>> -EXPORT_SYMBOL(xe_ttm_tbo_handle_addr_fault);
>>> +EXPORT_SYMBOL(xe_ttm_vram_handle_addr_fault);
>>> diff --git a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.h
>>> b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.h
>>> index 1d6075411ebf..8cc528434ceb 100644
>>> --- a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.h
>>> +++ b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.h
>>> @@ -30,7 +30,7 @@ u64 xe_ttm_vram_get_avail(struct
>>> ttm_resource_manager *man);
>>> u64 xe_ttm_vram_get_cpu_visible_size(struct ttm_resource_manager
>>> *man); void xe_ttm_vram_get_used(struct ttm_resource_manager *man,
>>> u64 *used, u64 *used_visible);
>>> -int xe_ttm_tbo_handle_addr_fault(struct xe_tile *tile, unsigned long
>>> addr);
>>> +int xe_ttm_vram_handle_addr_fault(struct xe_device *xe, unsigned long
>>> +addr);
>>> static inline struct xe_ttm_vram_mgr_resource *
>>> to_xe_ttm_vram_mgr_resource(struct ttm_resource *res) { diff --git
>>> a/drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h
>>> b/drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h
>>> index a71e14818ec2..e1b48db27cfd 100644
>>> --- a/drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h
>>> +++ b/drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h
>>> @@ -19,6 +19,14 @@ struct xe_ttm_vram_mgr {
>>> struct ttm_resource_manager manager;
>>> /** @mm: DRM buddy allocator which manages the VRAM */
>>> struct drm_buddy mm;
>>> + /** @offlined_pages: List of offlined pages */
>>> + struct list_head offlined_pages;
>>> + /** @n_offlined_pages: Number of offlined pages */
>>> + u16 n_offlined_pages;
>>> + /** @queued_pages: List of queued pages */
>>> + struct list_head queued_pages;
>>> + /** @n_queued_pages: Number of queued pages */
>>> + u16 n_queued_pages;
>>> /** @visible_size: Proped size of the CPU visible portion */
>>> u64 visible_size;
>>> /** @visible_avail: CPU visible portion still unallocated */ @@
>>> -45,4 +53,19 @@ struct xe_ttm_vram_mgr_resource {
>>> unsigned long flags;
>>> };
>>>
>>> +struct xe_ttm_offline_resource {
>>> + /** @offlined_link: Link to offlined pages */
>>> + struct list_head offlined_link;
>>> + /** @queued_link: Link to queued pages */
>>> + struct list_head queued_link;
>>> + /** @blocks: list of DRM buddy blocks */
>>> + struct list_head blocks;
>>> + /** @used_visible_size: How many CPU visible bytes this resource is
>> using */
>>> + u64 used_visible_size;
>>> + /** @id: The id of an offline resource */
>>> + u16 id;
>>> + /** @status: reservation status of resource */
>>> + bool status;
>>> +};
>>> +
>>> #endif
^ permalink raw reply [flat|nested] 12+ messages in thread* RE: [RFC PATCH V4 3/7] drm/xe: Handle physical memory address error
2026-03-05 6:40 ` Upadhyay, Tejas
2026-03-06 10:29 ` Aravind Iddamsetty
@ 2026-03-16 16:34 ` Upadhyay, Tejas
1 sibling, 0 replies; 12+ messages in thread
From: Upadhyay, Tejas @ 2026-03-16 16:34 UTC (permalink / raw)
To: Upadhyay, Tejas, Aravind Iddamsetty,
intel-xe@lists.freedesktop.org, thomas.hellstrom@linux.intel.com,
Brost, Matthew, Ghimiray, Himal Prasad, Auld, Matthew
Cc: Tauro, Riana
> -----Original Message-----
> From: Intel-xe <intel-xe-bounces@lists.freedesktop.org> On Behalf Of
> Upadhyay, Tejas
> Sent: 05 March 2026 12:10
> To: Aravind Iddamsetty <aravind.iddamsetty@linux.intel.com>; intel-
> xe@lists.freedesktop.org; thomas.hellstrom@linux.intel.com; Brost, Matthew
> <matthew.brost@intel.com>; Ghimiray, Himal Prasad
> <himal.prasad.ghimiray@intel.com>
> Cc: Auld, Matthew <matthew.auld@intel.com>; Tauro, Riana
> <riana.tauro@intel.com>
> Subject: RE: [RFC PATCH V4 3/7] drm/xe: Handle physical memory address
> error
>
>
>
> > -----Original Message-----
> > From: Aravind Iddamsetty <aravind.iddamsetty@linux.intel.com>
> > Sent: 02 March 2026 10:41
> > To: Upadhyay, Tejas <tejas.upadhyay@intel.com>; intel-
> > xe@lists.freedesktop.org
> > Cc: Auld, Matthew <matthew.auld@intel.com>;
> > thomas.hellstrom@linux.intel.com; Brost, Matthew
> > <matthew.brost@intel.com>; Tauro, Riana <riana.tauro@intel.com>
> > Subject: Re: [RFC PATCH V4 3/7] drm/xe: Handle physical memory address
> > error
> >
> >
> > On 27-02-2026 19:14, Tejas Upadhyay wrote:
> > > This functionality represents a significant step in making the xe
> > > driver gracefully handle hardware memory degradation.
> > > By integrating with the DRM Buddy allocator, the driver can
> > > permanently "carve out" faulty memory so it isn't reused by
> > > subsequent allocations.
> > >
> > > Buddy Block Reservation:
> > > ----------------------
> > > When a memory address is reported as faulty, the driver instructs
> > > the DRM Buddy allocator to reserve a block of the specific page size
> > > (typically 4KB). This marks the memory as "dirty/used"
> > > indefinitely.
> > >
> > > Two-Stage Tracking:
> > > -----------------
> > > Offlined Pages:
> > > Pages that have been successfully isolated and removed from the
> > > available memory pool.
> > >
> > > Queued Pages:
> > > Addresses that have been flagged as faulty but are currently in use
> > > by a process. These are tracked until the associated buffer object
> > > (BO) is released or migrated, at which point they move to the "offlined"
> > > state.
> > >
> > > Sysfs Reporting:
> > > --------------
> > > The patch exposes these metrics through a standard interface,
> > > allowing administrators to monitor VRAM health:
> > > /sys/bus/pci/devices/<device_id>/vram_bad_pages
> > >
> > > V3:
> > > -rename api, remove tile dependency and add status of reservation
> > > V2:
> > > - Fix mm->avail counter issue
> > > - Remove unused code and handle clean up in case of error
> > >
> > > Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
> > > ---
> > > drivers/gpu/drm/xe/xe_ttm_vram_mgr.c | 214
> > ++++++++++++++++++++-
> > > drivers/gpu/drm/xe/xe_ttm_vram_mgr.h | 2 +-
> > > drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h | 23 +++
> > > 3 files changed, 231 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
> > > b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
> > > index 4e852eed5170..42d531b1dabf 100644
> > > --- a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
> > > +++ b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
> > > @@ -276,6 +276,26 @@ static const struct ttm_resource_manager_func
> > xe_ttm_vram_mgr_func = {
> > > .debug = xe_ttm_vram_mgr_debug
> > > };
> > >
> > > +static void xe_ttm_vram_free_bad_pages(struct drm_device *dev,
> > > +struct xe_ttm_vram_mgr *mgr) {
> > > + struct xe_ttm_offline_resource *pos, *n;
> > > +
> > > + mutex_lock(&mgr->lock);
> > > + list_for_each_entry_safe(pos, n, &mgr->offlined_pages,
> > > +offlined_link)
> > {
> > > + --mgr->n_offlined_pages;
> > > + drm_buddy_free_list(&mgr->mm, &pos->blocks, 0);
> > > + mgr->visible_avail += pos->used_visible_size;
> > > + list_del(&pos->offlined_link);
> > > + kfree(pos);
> > > + }
> > > + list_for_each_entry_safe(pos, n, &mgr->queued_pages, queued_link)
> > {
> > > + list_del(&pos->queued_link);
> > > + mgr->n_queued_pages--;
> > > + kfree(pos);
> > > + }
> > > + mutex_unlock(&mgr->lock);
> > > +}
> > > +
> > > static void xe_ttm_vram_mgr_fini(struct drm_device *dev, void *arg)
> > > {
> > > struct xe_device *xe = to_xe_device(dev); @@ -287,6 +307,8 @@
> > static
> > > void xe_ttm_vram_mgr_fini(struct drm_device *dev, void *arg)
> > > if (ttm_resource_manager_evict_all(&xe->ttm, man))
> > > return;
> > >
> > > + xe_ttm_vram_free_bad_pages(dev, mgr);
> > > +
> > > WARN_ON_ONCE(mgr->visible_avail != mgr->visible_size);
> > >
> > > drm_buddy_fini(&mgr->mm);
> > > @@ -315,6 +337,8 @@ int __xe_ttm_vram_mgr_init(struct xe_device *xe,
> > struct xe_ttm_vram_mgr *mgr,
> > > man->func = &xe_ttm_vram_mgr_func;
> > > mgr->mem_type = mem_type;
> > > mutex_init(&mgr->lock);
> > > + INIT_LIST_HEAD(&mgr->offlined_pages);
> > > + INIT_LIST_HEAD(&mgr->queued_pages);
> > > mgr->default_page_size = default_page_size;
> > > mgr->visible_size = io_size;
> > > mgr->visible_avail = io_size;
> > > @@ -531,14 +555,190 @@ static struct ttm_buffer_object
> > *xe_ttm_vram_addr_to_tbo(struct drm_buddy *mm, u
> > > return NULL;
> > > }
> > >
> > > -int xe_ttm_tbo_handle_addr_fault(struct xe_tile *tile, unsigned
> > > long
> > > addr)
> > > +static int xe_ttm_vram_reserve_page_at_addr(struct xe_device *xe,
> > unsigned long addr,
> > > + struct xe_ttm_vram_mgr
> > *vram_mgr, struct drm_buddy *mm)
> > > {
> > > - struct xe_ttm_vram_mgr *vram_mgr = &tile->mem.vram->ttm;
> > > - struct drm_buddy mm = vram_mgr->mm;
> > > - struct ttm_buffer_object *tbo;
> > > + int ret = 0;
> > > + u64 size = SZ_4K;
> > > + struct ttm_buffer_object *tbo = NULL;
> > > + struct xe_ttm_offline_resource *nentry;
> > > + enum reserve_status {
> > > + pending = 0,
> > > + fail
> > > + };
> > > +
> > > + mutex_lock(&vram_mgr->lock);
> > > + tbo = xe_ttm_vram_addr_to_tbo(mm, addr);
> > > +
> > > + nentry = kzalloc(sizeof(*nentry), GFP_KERNEL);
> > > + if (!nentry)
> > > + return -ENOMEM;
> > > + INIT_LIST_HEAD(&nentry->blocks);
> > > + nentry->status = pending;
> > > +
> > > + if (tbo) {
> > > + struct xe_ttm_vram_mgr_resource *pvres;
> > > + struct ttm_placement place = {};
> > > + struct ttm_operation_ctx ctx = {
> > > + .interruptible = false,
> > > + .gfp_retry_mayfail = false,
> > > + };
> > > + bool locked;
> > > + struct xe_ttm_offline_resource *pos, *n;
> > > + struct xe_bo *pbo = ttm_to_xe_bo(tbo);
> > > +
> > > + xe_bo_get(pbo);
> > > + /* Critical kernel BO? */
> >
> > There is a scope for recovery from KMD without relying on USER.
> >
> > I believe this call will be executed as part of AER callback, so if
> > you had identified this case you could request for SBR and in the next
> > boot you can offline the page. In addition to this there shall be a
> > check if the address belongs to reserved memory and as well request SBR for
> that.
>
> Okay, so reserved memory wont be available for any use right?, it should go
> via bootup path, also we wont get any BO there so it will go in below else case.
>
> For other critical BO's it was decided to wedge the system. @Ghimiray, Himal
> Prasad @Brost, Matthew @thomas.hellstrom@linux.intel.com any input
> here? Should we request SBR instead?
I again went through official documentation and it already recommends device reset for physical address error in reserved(stolen) region. We can probably do same in case of critical kernel bo, device reset and on next boot early on based on policy, page will be reserved. Thoughts @Ghimiray, Himal Prasad @thomas.hellstrom@linux.intel.com @Auld, Matthew @Brost, Matthew? @Aravind Iddamsetty I still have a doubt, do we have ability to reserve stolen memory even on next boot !
Tejas
>
> >
> > FYI , Riana.
> >
> > > + if (pbo->ttm.type == ttm_bo_type_kernel &&
> > > + !(pbo->flags & XE_BO_FLAG_FORCE_USER_VRAM)) {
> > > + mutex_unlock(&vram_mgr->lock);
> > > + kfree(nentry);
> > > + xe_ttm_vram_free_bad_pages(&xe->drm, vram_mgr);
> > > + xe_bo_put(pbo);
> > > + drm_warn(&xe->drm,
> > > + "%s: corrupt addr: 0x%lx in critical kernel bo,
> > wedge now\n",
> > > + __func__, addr);
> > > + /* Wedge the device */
> > > + xe_device_declare_wedged(xe);
> > > + return -EIO;
> > > + }
> > > + pvres = to_xe_ttm_vram_mgr_resource(pbo->ttm.resource);
> > > + nentry->id = ++vram_mgr->n_queued_pages;
> > > + nentry->blocks = pvres->blocks;
> > > + list_add(&nentry->queued_link, &vram_mgr-
> > >queued_pages);
> > > + mutex_unlock(&vram_mgr->lock);
> > > +
> >
> > Also, how will this behave if the BO is a ppgtt table, ring buffers,
> > LRCA etc.., will you signal fences and ban the context?
>
> Right, LRCA/ring buff is in GGTT, right now considered critical BO and wedging
> if faulty address belongs to it, instead I would need to consider it non-critical
> bo, purge and ban specific context who has created you mean?
> Ppgtt, is kernel BO but not critical so purging it. May be I need to take this in
> some proper clean up path.
>
> Tejas
> >
> > Thanks,
> > Aravind.
> > > + /* Purge BO containing address */
> > > + spin_lock(&pbo->ttm.bdev->lru_lock);
> > > + locked = dma_resv_trylock(pbo->ttm.base.resv);
> > > + spin_unlock(&pbo->ttm.bdev->lru_lock);
> > > + WARN_ON(!locked);
> > > + ret = ttm_bo_validate(&pbo->ttm, &place, &ctx);
> > > + drm_WARN_ON(&xe->drm, ret);
> > > + xe_bo_put(pbo);
> > > + if (locked)
> > > + dma_resv_unlock(pbo->ttm.base.resv);
> > > +
> > > + /* Reserve page at address addr*/
> > > + mutex_lock(&vram_mgr->lock);
> > > + ret = drm_buddy_alloc_blocks(mm, addr, addr + size,
> > > + size, size, &nentry->blocks,
> > > +
> > DRM_BUDDY_RANGE_ALLOCATION);
> > > +
> > > + if (ret) {
> > > + drm_warn(&xe->drm, "Could not reserve page at
> > addr:0x%lx, ret:%d\n",
> > > + addr, ret);
> > > + nentry->status = fail;
> > > + mutex_unlock(&vram_mgr->lock);
> > > + return ret;
> > > + }
> > > + if ((addr + size) <= vram_mgr->visible_size) {
> > > + nentry->used_visible_size = size;
> > > + } else {
> > > + struct drm_buddy_block *block;
> > >
> > > - tbo = xe_ttm_vram_addr_to_tbo(&mm, addr);
> > > + list_for_each_entry(block, &nentry->blocks, link) {
> > > + u64 start = drm_buddy_block_offset(block);
> > >
> > > - return 0;
> > > + if (start < vram_mgr->visible_size) {
> > > + u64 end = start +
> > drm_buddy_block_size(mm, block);
> > > +
> > > + nentry->used_visible_size +=
> > > + min(end, vram_mgr-
> > >visible_size) - start;
> > > + }
> > > + }
> > > + }
> > > + vram_mgr->visible_avail -= nentry->used_visible_size;
> > > + list_for_each_entry_safe(pos, n, &vram_mgr->queued_pages,
> > queued_link) {
> > > + if (pos->id == nentry->id) {
> > > + --vram_mgr->n_queued_pages;
> > > + list_del(&pos->queued_link);
> > > + break;
> > > + }
> > > + }
> > > + list_add(&nentry->offlined_link, &vram_mgr-
> > >offlined_pages);
> > > + /* TODO: FW Integration: Send command to FW for offlining
> > page */
> > > + ++vram_mgr->n_offlined_pages;
> > > + mutex_unlock(&vram_mgr->lock);
> > > + return ret;
> > > +
> > > + } else {
> > > + ret = drm_buddy_alloc_blocks(mm, addr, addr + size,
> > > + size, size, &nentry->blocks,
> > > +
> > DRM_BUDDY_RANGE_ALLOCATION);
> > > + if (ret) {
> > > + drm_warn(&xe->drm, "Could not reserve page at
> > addr:0x%lx, ret:%d\n",
> > > + addr, ret);
> > > + nentry->status = fail;
> > > + mutex_unlock(&vram_mgr->lock);
> > > + return ret;
> > > + }
> > > + if ((addr + size) <= vram_mgr->visible_size) {
> > > + nentry->used_visible_size = size;
> > > + } else {
> > > + struct drm_buddy_block *block;
> > > +
> > > + list_for_each_entry(block, &nentry->blocks, link) {
> > > + u64 start = drm_buddy_block_offset(block);
> > > +
> > > + if (start < vram_mgr->visible_size) {
> > > + u64 end = start +
> > drm_buddy_block_size(mm, block);
> > > +
> > > + nentry->used_visible_size +=
> > > + min(end, vram_mgr-
> > >visible_size) - start;
> > > + }
> > > + }
> > > + }
> > > + vram_mgr->visible_avail -= nentry->used_visible_size;
> > > + nentry->id = ++vram_mgr->n_offlined_pages;
> > > + list_add(&nentry->offlined_link, &vram_mgr-
> > >offlined_pages);
> > > + /* TODO: FW Integration: Send command to FW for offlining
> > page */
> > > + mutex_unlock(&vram_mgr->lock);
> > > + }
> > > + /* Success */
> > > + return ret;
> > > +}
> > > +
> > > +static struct xe_vram_region *xe_ttm_vram_addr_to_region(struct
> > xe_device *xe,
> > > + resource_size_t addr)
> > > +{
> > > + struct xe_vram_region *vr;
> > > + struct xe_tile *tile;
> > > + int id;
> > > +
> > > + for_each_tile(tile, xe, id) {
> > > + vr = tile->mem.vram;
> > > + if ((addr <= vr->dpa_base + vr->actual_physical_size) &&
> > > + (addr + SZ_4K >= vr->dpa_base))
> > > + return vr;
> > > + }
> > > + return NULL;
> > > +}
> > > +
> > > +/**
> > > + * xe_ttm_vram_handle_addr_fault - Handle vram physical address
> > > +error flagged
> > > + * @xe: pointer to parent device
> > > + * @addr: physical faulty address
> > > + *
> > > + * Handle the physical faulty address error on specific tile.
> > > + *
> > > + * Returns 0 for success, negative error code otherwise.
> > > + */
> > > +int xe_ttm_vram_handle_addr_fault(struct xe_device *xe, unsigned
> > > +long
> > > +addr) {
> > > + struct xe_ttm_vram_mgr *vram_mgr;
> > > + struct xe_vram_region *vr;
> > > + struct drm_buddy *mm;
> > > + int ret;
> > > +
> > > + vr = xe_ttm_vram_addr_to_region(xe, addr);
> > > + WARN_ON(!vr);
> > > + vram_mgr = &vr->ttm;
> > > + mm = &vram_mgr->mm;
> > > + /* Reserve page at address */
> > > + ret = xe_ttm_vram_reserve_page_at_addr(xe, addr, vram_mgr, mm);
> > > + if (ret == -EIO)
> > > + return 0; /* success, wedged by kernel. */
> > > + return ret;
> > > }
> > > -EXPORT_SYMBOL(xe_ttm_tbo_handle_addr_fault);
> > > +EXPORT_SYMBOL(xe_ttm_vram_handle_addr_fault);
> > > diff --git a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.h
> > > b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.h
> > > index 1d6075411ebf..8cc528434ceb 100644
> > > --- a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.h
> > > +++ b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.h
> > > @@ -30,7 +30,7 @@ u64 xe_ttm_vram_get_avail(struct
> > > ttm_resource_manager *man);
> > > u64 xe_ttm_vram_get_cpu_visible_size(struct ttm_resource_manager
> > > *man); void xe_ttm_vram_get_used(struct ttm_resource_manager *man,
> > > u64 *used, u64 *used_visible); -int
> > > xe_ttm_tbo_handle_addr_fault(struct xe_tile *tile, unsigned long
> > > addr);
> > > +int xe_ttm_vram_handle_addr_fault(struct xe_device *xe, unsigned
> > > +long addr);
> > > static inline struct xe_ttm_vram_mgr_resource *
> > > to_xe_ttm_vram_mgr_resource(struct ttm_resource *res) { diff --git
> > > a/drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h
> > > b/drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h
> > > index a71e14818ec2..e1b48db27cfd 100644
> > > --- a/drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h
> > > +++ b/drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h
> > > @@ -19,6 +19,14 @@ struct xe_ttm_vram_mgr {
> > > struct ttm_resource_manager manager;
> > > /** @mm: DRM buddy allocator which manages the VRAM */
> > > struct drm_buddy mm;
> > > + /** @offlined_pages: List of offlined pages */
> > > + struct list_head offlined_pages;
> > > + /** @n_offlined_pages: Number of offlined pages */
> > > + u16 n_offlined_pages;
> > > + /** @queued_pages: List of queued pages */
> > > + struct list_head queued_pages;
> > > + /** @n_queued_pages: Number of queued pages */
> > > + u16 n_queued_pages;
> > > /** @visible_size: Proped size of the CPU visible portion */
> > > u64 visible_size;
> > > /** @visible_avail: CPU visible portion still unallocated */ @@
> > > -45,4 +53,19 @@ struct xe_ttm_vram_mgr_resource {
> > > unsigned long flags;
> > > };
> > >
> > > +struct xe_ttm_offline_resource {
> > > + /** @offlined_link: Link to offlined pages */
> > > + struct list_head offlined_link;
> > > + /** @queued_link: Link to queued pages */
> > > + struct list_head queued_link;
> > > + /** @blocks: list of DRM buddy blocks */
> > > + struct list_head blocks;
> > > + /** @used_visible_size: How many CPU visible bytes this resource
> > > +is
> > using */
> > > + u64 used_visible_size;
> > > + /** @id: The id of an offline resource */
> > > + u16 id;
> > > + /** @status: reservation status of resource */
> > > + bool status;
> > > +};
> > > +
> > > #endif
^ permalink raw reply [flat|nested] 12+ messages in thread
* [RFC PATCH V4 4/7] [DO_NOT_REVIEW]]drm/xe/cri: Add debugfs to inject faulty vram address
2026-02-27 13:44 [RFC PATCH V4 0/7] Add memory page offlining support Tejas Upadhyay
` (2 preceding siblings ...)
2026-02-27 13:44 ` [RFC PATCH V4 3/7] drm/xe: Handle physical memory address error Tejas Upadhyay
@ 2026-02-27 13:44 ` Tejas Upadhyay
2026-02-27 13:44 ` [RFC PATCH V4 5/7] drm/buddy: Add routine to dump allocated buddy blocks Tejas Upadhyay
` (2 subsequent siblings)
6 siblings, 0 replies; 12+ messages in thread
From: Tejas Upadhyay @ 2026-02-27 13:44 UTC (permalink / raw)
To: intel-xe; +Cc: matthew.auld, thomas.hellstrom, matthew.brost, Tejas Upadhyay
Add debugfs which can help testing feature with manual error injection.
Adding a debugfs interface to the drm/xe driver allows manual injection
of faulty VRAM addresses, facilitating the testing of the CRI memory
page offline feature before it is fully functional. The implementation
involves creating a debugfs entry, likely under
/sys/kernel/debug/dri/bdf/invalid_addr_vram0,
to accept specific faulty addresses for validation.
For example,
echo 0x1000 > /sys/kernel/debug/dri/bdf/invalid_addr_vram0
where 0x1000 is the faulty address being injected.
Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
---
drivers/gpu/drm/xe/xe_debugfs.c | 49 ++++++++++++++++++++++
drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h | 2 +
2 files changed, 51 insertions(+)
diff --git a/drivers/gpu/drm/xe/xe_debugfs.c b/drivers/gpu/drm/xe/xe_debugfs.c
index 844cfafe1ec7..2f66da81c002 100644
--- a/drivers/gpu/drm/xe/xe_debugfs.c
+++ b/drivers/gpu/drm/xe/xe_debugfs.c
@@ -27,6 +27,7 @@
#include "xe_sriov_vf.h"
#include "xe_step.h"
#include "xe_tile_debugfs.h"
+#include "xe_ttm_vram_mgr.h"
#include "xe_vsec.h"
#include "xe_wa.h"
@@ -509,12 +510,48 @@ static const struct file_operations disable_late_binding_fops = {
.write = disable_late_binding_set,
};
+static ssize_t addr_fault_reporting_show(struct file *f, char __user *ubuf,
+ size_t size, loff_t *pos)
+{
+ struct xe_device *xe = file_inode(f)->i_private;
+ char buf[32];
+ int len;
+
+ len = scnprintf(buf, sizeof(buf), "%lld\n", xe->mem.vram->ttm.fault_addr);
+
+ return simple_read_from_buffer(ubuf, size, pos, buf, len);
+}
+
+static ssize_t addr_fault_reporting_set(struct file *f, const char __user *ubuf,
+ size_t size, loff_t *pos)
+{
+ struct xe_device *xe = file_inode(f)->i_private;
+ u64 addr;
+ int ret;
+
+ ret = kstrtou64_from_user(ubuf, size, 0, &addr);
+ if (ret)
+ return ret;
+
+ xe->mem.vram->ttm.fault_addr = addr;
+ xe_ttm_vram_handle_addr_fault(xe, xe->mem.vram->ttm.fault_addr);
+
+ return size;
+}
+
+static const struct file_operations addr_fault_reporting_fops = {
+ .owner = THIS_MODULE,
+ .read = addr_fault_reporting_show,
+ .write = addr_fault_reporting_set,
+};
+
void xe_debugfs_register(struct xe_device *xe)
{
struct ttm_device *bdev = &xe->ttm;
struct drm_minor *minor = xe->drm.primary;
struct dentry *root = minor->debugfs_root;
struct ttm_resource_manager *man;
+ u8 mem_type = XE_PL_VRAM1;
struct xe_tile *tile;
struct xe_gt *gt;
u8 tile_id;
@@ -565,6 +602,18 @@ void xe_debugfs_register(struct xe_device *xe)
if (man)
ttm_resource_manager_create_debugfs(man, root, "stolen_mm");
+ do {
+ man = ttm_manager_type(bdev, mem_type);
+ if (man) {
+ char name[20];
+
+ snprintf(name, sizeof(name), "invalid_addr_vram%d", mem_type - XE_PL_VRAM0);
+ debugfs_create_file(name, 0600, root, xe,
+ &addr_fault_reporting_fops);
+ }
+ --mem_type;
+ } while (mem_type >= XE_PL_VRAM0);
+
for_each_tile(tile, xe, tile_id)
xe_tile_debugfs_register(tile);
diff --git a/drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h b/drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h
index e1b48db27cfd..2c69663f540a 100644
--- a/drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h
+++ b/drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h
@@ -37,6 +37,8 @@ struct xe_ttm_vram_mgr {
struct mutex lock;
/** @mem_type: The TTM memory type */
u32 mem_type;
+ /** @fault_addr: debugfs hook for setting faulty address */
+ u64 fault_addr;
};
/**
--
2.52.0
^ permalink raw reply related [flat|nested] 12+ messages in thread* [RFC PATCH V4 5/7] drm/buddy: Add routine to dump allocated buddy blocks
2026-02-27 13:44 [RFC PATCH V4 0/7] Add memory page offlining support Tejas Upadhyay
` (3 preceding siblings ...)
2026-02-27 13:44 ` [RFC PATCH V4 4/7] [DO_NOT_REVIEW]]drm/xe/cri: Add debugfs to inject faulty vram address Tejas Upadhyay
@ 2026-02-27 13:44 ` Tejas Upadhyay
2026-02-27 13:44 ` [RFC PATCH V4 6/7] drm/xe/cri: Add sysfs interface for bad gpu vram pages Tejas Upadhyay
2026-02-27 13:44 ` [RFC PATCH V4 7/7] drm/xe/configfs: Add vram bad page reservation policy Tejas Upadhyay
6 siblings, 0 replies; 12+ messages in thread
From: Tejas Upadhyay @ 2026-02-27 13:44 UTC (permalink / raw)
To: intel-xe; +Cc: matthew.auld, thomas.hellstrom, matthew.brost, Tejas Upadhyay
To implement the ability to see allocated blocks under a specific VRAM
instance in the drm driver, new api is introduced. While existing structs
often show the free block list, this addition provides a comprehensive view
of all currently resident VRAM allocations.
Dump will look like,
[ +0.000003] xe 0000:03:00.0: [drm] 0x00000002f8000000-0x00000002f8800000: 8388608
[ +0.000005] xe 0000:03:00.0: [drm] 0x00000002f8800000-0x00000002f8840000: 262144
[ +0.000004] xe 0000:03:00.0: [drm] 0x00000002f8840000-0x00000002f8860000: 131072
[ +0.000004] xe 0000:03:00.0: [drm] 0x00000002f8860000-0x00000002f8870000: 65536
[ +0.000005] xe 0000:03:00.0: [drm] 0x00000002f9000000-0x00000002f9800000: 8388608
[ +0.000004] xe 0000:03:00.0: [drm] 0x00000002f9800000-0x00000002f9880000: 524288
[ +0.000005] xe 0000:03:00.0: [drm] 0x00000002f9880000-0x00000002f9884000: 16384
[ +0.000004] xe 0000:03:00.0: [drm] 0x00000002f9900000-0x00000002f9980000: 524288
[ +0.000005] xe 0000:03:00.0: [drm] 0x00000002f9980000-0x00000002f9988000: 32768
[ +0.000004] xe 0000:03:00.0: [drm] 0x00000002f9988000-0x00000002f998c000: 16384
Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
---
drivers/gpu/drm/drm_buddy.c | 43 +++++++++++++++++++++++++++++++++++++
1 file changed, 43 insertions(+)
diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c
index 2f279b46bd2c..6baafbad49f2 100644
--- a/drivers/gpu/drm/drm_buddy.c
+++ b/drivers/gpu/drm/drm_buddy.c
@@ -1065,6 +1065,49 @@ int drm_buddy_block_trim(struct drm_buddy *mm,
}
EXPORT_SYMBOL(drm_buddy_block_trim);
+/**
+ * drm_buddy_dump_allocated_blocks - print all allocated blocks in drm buddy
+ *
+ * @dev: drm device
+ * @mm: DRM buddy manager to look into
+ * @p: drm printer to print info
+ *
+ * Looks into buddy manager for each block and their status and if allocated
+ * print allocated block range and size
+ *
+ * Returns:
+ * void
+ */
+void drm_buddy_dump_allocated_blocks(struct drm_device *dev, struct drm_buddy *mm,
+ struct drm_printer *p)
+{
+ struct drm_buddy_block *block;
+ LIST_HEAD(dfs);
+ int i;
+
+ for (i = 0; i < mm->n_roots; ++i)
+ list_add_tail(&mm->roots[i]->tmp_link, &dfs);
+
+ do {
+ block = list_first_entry_or_null(&dfs,
+ struct drm_buddy_block,
+ tmp_link);
+ if (!block)
+ break;
+
+ list_del(&block->tmp_link);
+
+ if (drm_buddy_block_is_allocated(block))
+ drm_buddy_block_print(mm, block, p);
+
+ if (drm_buddy_block_is_split(block)) {
+ list_add(&block->right->tmp_link, &dfs);
+ list_add(&block->left->tmp_link, &dfs);
+ }
+ } while (1);
+}
+EXPORT_SYMBOL(drm_buddy_dump_allocated_blocks);
+
static struct drm_buddy_block *
__drm_buddy_alloc_blocks(struct drm_buddy *mm,
u64 start, u64 end,
--
2.52.0
^ permalink raw reply related [flat|nested] 12+ messages in thread* [RFC PATCH V4 6/7] drm/xe/cri: Add sysfs interface for bad gpu vram pages
2026-02-27 13:44 [RFC PATCH V4 0/7] Add memory page offlining support Tejas Upadhyay
` (4 preceding siblings ...)
2026-02-27 13:44 ` [RFC PATCH V4 5/7] drm/buddy: Add routine to dump allocated buddy blocks Tejas Upadhyay
@ 2026-02-27 13:44 ` Tejas Upadhyay
2026-02-27 13:44 ` [RFC PATCH V4 7/7] drm/xe/configfs: Add vram bad page reservation policy Tejas Upadhyay
6 siblings, 0 replies; 12+ messages in thread
From: Tejas Upadhyay @ 2026-02-27 13:44 UTC (permalink / raw)
To: intel-xe; +Cc: matthew.auld, thomas.hellstrom, matthew.brost, Tejas Upadhyay
Starting CRI, Include a sysfs interface designed to expose information
about bad VRAM pages—those identified as having hardware faults
(e.g., ECC errors). This interface allows userspace tools and
administrators to monitor the health of the GPU's local memory and
track the status of page retirement. Details on bad gpu vram
pages can be found under /sys/bus/pci/devices/bdf/vram_bad_pages.
Where The format is, pfn : gpu page size : flags
flags:
R: reserved, this gpu page is reserved.
P: pending for reserve, this gpu page is marked as bad, will be reserved in next window of page_reserve.
F: unable to reserve. this gpu page can’t be reserved due to some reasons.
For example if you read using cat /sys/bus/pci/devices/bdf/vram_bad_pages,
0x00000000 : 0x00001000 : R
0x00001234 : 0x00001000 : P
Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
---
drivers/gpu/drm/xe/xe_device_sysfs.c | 4 ++
drivers/gpu/drm/xe/xe_ttm_vram_mgr.c | 78 ++++++++++++++++++++++++++++
drivers/gpu/drm/xe/xe_ttm_vram_mgr.h | 1 +
3 files changed, 83 insertions(+)
diff --git a/drivers/gpu/drm/xe/xe_device_sysfs.c b/drivers/gpu/drm/xe/xe_device_sysfs.c
index a73e0e957cb0..796dd6bedd97 100644
--- a/drivers/gpu/drm/xe/xe_device_sysfs.c
+++ b/drivers/gpu/drm/xe/xe_device_sysfs.c
@@ -14,6 +14,7 @@
#include "xe_pcode_api.h"
#include "xe_pcode.h"
#include "xe_pm.h"
+#include "xe_ttm_vram_mgr.h"
/**
* DOC: Xe device sysfs
@@ -285,5 +286,8 @@ int xe_device_sysfs_init(struct xe_device *xe)
return ret;
}
+ if (xe->info.platform == XE_CRESCENTISLAND)
+ xe_ttm_vram_sysfs_init(xe);
+
return 0;
}
diff --git a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
index 42d531b1dabf..417ad563e80e 100644
--- a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
+++ b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
@@ -742,3 +742,81 @@ int xe_ttm_vram_handle_addr_fault(struct xe_device *xe, unsigned long addr)
return ret;
}
EXPORT_SYMBOL(xe_ttm_vram_handle_addr_fault);
+
+static void xe_ttm_vram_dump_bad_pages_info(char *buf, struct xe_ttm_vram_mgr *mgr)
+{
+ const unsigned int element_size = sizeof("0xabcdabcd : 0x12345678 : R\n") - 1;
+ struct xe_ttm_offline_resource *pos, *n;
+ struct drm_buddy_block *block;
+ ssize_t s = 0;
+
+ mutex_lock(&mgr->lock);
+ list_for_each_entry_safe(pos, n, &mgr->offlined_pages, offlined_link) {
+ block = list_first_entry(&pos->blocks,
+ struct drm_buddy_block,
+ link);
+ s += scnprintf(&buf[s], element_size + 1,
+ "0x%08llx : 0x%08llx : %1s\n",
+ drm_buddy_block_offset(block) >> PAGE_SHIFT,
+ drm_buddy_block_size(&mgr->mm, block),
+ "R");
+ }
+ list_for_each_entry_safe(pos, n, &mgr->queued_pages, queued_link) {
+ block = list_first_entry(&pos->blocks,
+ struct drm_buddy_block,
+ link);
+ s += scnprintf(&buf[s], element_size + 1,
+ "0x%08llx : 0x%08llx : %1s\n",
+ drm_buddy_block_offset(block) >> PAGE_SHIFT,
+ drm_buddy_block_size(&mgr->mm, block),
+ pos->status ? "P" : "F");
+ }
+ mutex_unlock(&mgr->lock);
+}
+
+static ssize_t vram_bad_pages_show(struct device *dev, struct device_attribute *attr, char *buf)
+{
+ struct pci_dev *pdev = to_pci_dev(dev);
+ struct xe_device *xe = pdev_to_xe_device(pdev);
+ struct ttm_resource_manager *man;
+ u8 mem_type = XE_PL_VRAM1;
+
+ do {
+ man = ttm_manager_type(&xe->ttm, mem_type);
+ struct xe_ttm_vram_mgr *mgr = to_xe_ttm_vram_mgr(man);
+
+ if (man)
+ xe_ttm_vram_dump_bad_pages_info(buf, mgr);
+ --mem_type;
+ } while (mem_type >= XE_PL_VRAM0);
+
+ return sysfs_emit(buf, "%s\n", buf);
+}
+static DEVICE_ATTR_RO(vram_bad_pages);
+
+static void xe_ttm_vram_sysfs_fini(void *arg)
+{
+ struct xe_device *xe = arg;
+
+ device_remove_file(xe->drm.dev, &dev_attr_vram_bad_pages);
+}
+
+/**
+ * xe_ttm_vram_sysfs_init - Initialize vram sysfs component
+ * @xe: Xe device object
+ *
+ * It needs to be initialized after the main tile component is ready
+ *
+ * Returns: 0 on success, negative error code on error.
+ */
+int xe_ttm_vram_sysfs_init(struct xe_device *xe)
+{
+ int err;
+
+ err = device_create_file(xe->drm.dev, &dev_attr_vram_bad_pages);
+ if (err)
+ return 0;
+
+ return devm_add_action_or_reset(xe->drm.dev, xe_ttm_vram_sysfs_fini, xe);
+}
+EXPORT_SYMBOL(xe_ttm_vram_sysfs_init);
diff --git a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.h b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.h
index 8cc528434ceb..6fcbe0b7fed8 100644
--- a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.h
+++ b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.h
@@ -31,6 +31,7 @@ u64 xe_ttm_vram_get_cpu_visible_size(struct ttm_resource_manager *man);
void xe_ttm_vram_get_used(struct ttm_resource_manager *man,
u64 *used, u64 *used_visible);
int xe_ttm_vram_handle_addr_fault(struct xe_device *xe, unsigned long addr);
+int xe_ttm_vram_sysfs_init(struct xe_device *xe);
static inline struct xe_ttm_vram_mgr_resource *
to_xe_ttm_vram_mgr_resource(struct ttm_resource *res)
{
--
2.52.0
^ permalink raw reply related [flat|nested] 12+ messages in thread* [RFC PATCH V4 7/7] drm/xe/configfs: Add vram bad page reservation policy
2026-02-27 13:44 [RFC PATCH V4 0/7] Add memory page offlining support Tejas Upadhyay
` (5 preceding siblings ...)
2026-02-27 13:44 ` [RFC PATCH V4 6/7] drm/xe/cri: Add sysfs interface for bad gpu vram pages Tejas Upadhyay
@ 2026-02-27 13:44 ` Tejas Upadhyay
6 siblings, 0 replies; 12+ messages in thread
From: Tejas Upadhyay @ 2026-02-27 13:44 UTC (permalink / raw)
To: intel-xe; +Cc: matthew.auld, thomas.hellstrom, matthew.brost, Tejas Upadhyay
The interface enables setting the policy for how bad pages are
handled in VRAM. This is crucial for maintaining system
stability in scenarios where VRAM degradation occurs.
By default policy will be "reserve", which can be changed to
"logging" only.
Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
---
drivers/gpu/drm/xe/xe_configfs.c | 64 +++++++++++++++++++++++++++-
drivers/gpu/drm/xe/xe_configfs.h | 2 +
drivers/gpu/drm/xe/xe_device.c | 41 ++++++++++++++++++
drivers/gpu/drm/xe/xe_device_sysfs.c | 5 ++-
drivers/gpu/drm/xe/xe_ttm_vram_mgr.c | 10 +++++
5 files changed, 120 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_configfs.c b/drivers/gpu/drm/xe/xe_configfs.c
index d8c3fbe81aa6..1e5c24c19bc3 100644
--- a/drivers/gpu/drm/xe/xe_configfs.c
+++ b/drivers/gpu/drm/xe/xe_configfs.c
@@ -61,7 +61,8 @@
* ├── survivability_mode
* ├── gt_types_allowed
* ├── engines_allowed
- * └── enable_psmi
+ * ├── enable_psmi
+ * └── bad_page_reservation
*
* After configuring the attributes as per next section, the device can be
* probed with::
@@ -159,6 +160,16 @@
*
* This attribute can only be set before binding to the device.
*
+ * Bad pages reservation:
+ * ---------------------
+ *
+ * Disable vram bad pages reservation, instead just report it in dmesg.
+ * Example to disable it::
+ *
+ * # echo 0 > /sys/kernel/config/xe/0000:03:00.0/bad_page_reservation
+ *
+ * This attribute can only be set before binding to the device.
+ *
* Context restore BB
* ------------------
*
@@ -262,6 +273,7 @@ struct xe_config_group_device {
struct wa_bb ctx_restore_mid_bb[XE_ENGINE_CLASS_MAX];
bool survivability_mode;
bool enable_psmi;
+ bool bad_page_reservation;
struct {
unsigned int max_vfs;
bool admin_only_pf;
@@ -281,6 +293,7 @@ static const struct xe_config_device device_defaults = {
.engines_allowed = U64_MAX,
.survivability_mode = false,
.enable_psmi = false,
+ .bad_page_reservation = true,
.sriov = {
.max_vfs = XE_DEFAULT_MAX_VFS,
.admin_only_pf = XE_DEFAULT_ADMIN_ONLY_PF,
@@ -575,6 +588,32 @@ static ssize_t enable_psmi_store(struct config_item *item, const char *page, siz
return len;
}
+static ssize_t bad_page_reservation_show(struct config_item *item, char *page)
+{
+ struct xe_config_device *dev = to_xe_config_device(item);
+
+ return sprintf(page, "%d\n", dev->bad_page_reservation);
+}
+
+static ssize_t bad_page_reservation_store(struct config_item *item, const char *page, size_t len)
+{
+ struct xe_config_group_device *dev = to_xe_config_group_device(item);
+ bool val;
+ int ret;
+
+ ret = kstrtobool(page, &val);
+ if (ret)
+ return ret;
+
+ guard(mutex)(&dev->lock);
+ if (is_bound(dev))
+ return -EBUSY;
+
+ dev->config.bad_page_reservation = val;
+
+ return len;
+}
+
static bool wa_bb_read_advance(bool dereference, char **p,
const char *append, size_t len,
size_t *max_size)
@@ -813,6 +852,7 @@ static ssize_t ctx_restore_post_bb_store(struct config_item *item,
CONFIGFS_ATTR(, ctx_restore_mid_bb);
CONFIGFS_ATTR(, ctx_restore_post_bb);
CONFIGFS_ATTR(, enable_psmi);
+CONFIGFS_ATTR(, bad_page_reservation);
CONFIGFS_ATTR(, engines_allowed);
CONFIGFS_ATTR(, gt_types_allowed);
CONFIGFS_ATTR(, survivability_mode);
@@ -821,6 +861,7 @@ static struct configfs_attribute *xe_config_device_attrs[] = {
&attr_ctx_restore_mid_bb,
&attr_ctx_restore_post_bb,
&attr_enable_psmi,
+ &attr_bad_page_reservation,
&attr_engines_allowed,
&attr_gt_types_allowed,
&attr_survivability_mode,
@@ -1097,6 +1138,7 @@ static void dump_custom_dev_config(struct pci_dev *pdev,
PRI_CUSTOM_ATTR("%llx", gt_types_allowed);
PRI_CUSTOM_ATTR("%llx", engines_allowed);
PRI_CUSTOM_ATTR("%d", enable_psmi);
+ PRI_CUSTOM_ATTR("%d", bad_page_reservation);
PRI_CUSTOM_ATTR("%d", survivability_mode);
PRI_CUSTOM_ATTR("%u", sriov.admin_only_pf);
@@ -1224,6 +1266,26 @@ bool xe_configfs_get_psmi_enabled(struct pci_dev *pdev)
return ret;
}
+/**
+ * xe_configfs_get_bad_page_reservation - get configfs bad_page_reservation setting
+ * @pdev: pci device
+ *
+ * Return: bad_page_reservation setting in configfs
+ */
+bool xe_configfs_get_bad_page_reservation(struct pci_dev *pdev)
+{
+ struct xe_config_group_device *dev = find_xe_config_group_device(pdev);
+ bool ret;
+
+ if (!dev)
+ return device_defaults.bad_page_reservation;
+
+ ret = dev->config.bad_page_reservation;
+ config_group_put(&dev->group);
+
+ return ret;
+}
+
/**
* xe_configfs_get_ctx_restore_mid_bb - get configfs ctx_restore_mid_bb setting
* @pdev: pci device
diff --git a/drivers/gpu/drm/xe/xe_configfs.h b/drivers/gpu/drm/xe/xe_configfs.h
index 07d62bf0c152..c107d84b2c62 100644
--- a/drivers/gpu/drm/xe/xe_configfs.h
+++ b/drivers/gpu/drm/xe/xe_configfs.h
@@ -23,6 +23,7 @@ bool xe_configfs_primary_gt_allowed(struct pci_dev *pdev);
bool xe_configfs_media_gt_allowed(struct pci_dev *pdev);
u64 xe_configfs_get_engines_allowed(struct pci_dev *pdev);
bool xe_configfs_get_psmi_enabled(struct pci_dev *pdev);
+bool xe_configfs_get_bad_page_reservation(struct pci_dev *pdev);
u32 xe_configfs_get_ctx_restore_mid_bb(struct pci_dev *pdev,
enum xe_engine_class class,
const u32 **cs);
@@ -42,6 +43,7 @@ static inline bool xe_configfs_primary_gt_allowed(struct pci_dev *pdev) { return
static inline bool xe_configfs_media_gt_allowed(struct pci_dev *pdev) { return true; }
static inline u64 xe_configfs_get_engines_allowed(struct pci_dev *pdev) { return U64_MAX; }
static inline bool xe_configfs_get_psmi_enabled(struct pci_dev *pdev) { return false; }
+static inline bool xe_configfs_get_bad_page_reservation(struct pci_dev *pdev) { return true; }
static inline u32 xe_configfs_get_ctx_restore_mid_bb(struct pci_dev *pdev,
enum xe_engine_class class,
const u32 **cs) { return 0; }
diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
index 4b68a2d55651..fa88aab123f2 100644
--- a/drivers/gpu/drm/xe/xe_device.c
+++ b/drivers/gpu/drm/xe/xe_device.c
@@ -25,6 +25,7 @@
#include "regs/xe_regs.h"
#include "xe_bo.h"
#include "xe_bo_evict.h"
+#include "xe_configfs.h"
#include "xe_debugfs.h"
#include "xe_defaults.h"
#include "xe_devcoredump.h"
@@ -68,6 +69,7 @@
#include "xe_tile.h"
#include "xe_ttm_stolen_mgr.h"
#include "xe_ttm_sys_mgr.h"
+#include "xe_ttm_vram_mgr.h"
#include "xe_vm.h"
#include "xe_vm_madvise.h"
#include "xe_vram.h"
@@ -834,6 +836,41 @@ static void detect_preproduction_hw(struct xe_device *xe)
}
}
+static int xe_device_process_bad_pages(struct xe_device *xe)
+{
+ unsigned long offlined[1] = {0x0};
+ unsigned long queued[1] = {0x3000};
+ int n_bad_pages = ARRAY_SIZE(offlined) + ARRAY_SIZE(queued);
+ unsigned long *bad_pages;
+ bool policy;
+ u8 i;
+
+ /* TODO: FW Integration: Query FW for offline/queued pages */
+
+ if (!n_bad_pages)
+ return 0;
+ bad_pages = kmalloc_array(n_bad_pages, sizeof(unsigned long), GFP_KERNEL);
+ if (!bad_pages)
+ return -ENOMEM;
+
+ for (int i = 0; i < ARRAY_SIZE(offlined); i++)
+ bad_pages[i] = offlined[i];
+ for (int i = 0; i < ARRAY_SIZE(queued); i++)
+ bad_pages[ARRAY_SIZE(offlined) + i] = queued[i];
+
+ /* Read policy from configfs */
+ policy = xe_configfs_get_bad_page_reservation(to_pci_dev(xe->drm.dev));
+ for (i = 0; i < n_bad_pages; i++) {
+ if (!policy)
+ drm_err(&xe->drm, "0x%lx is reported as corrupted address by HW\n",
+ bad_pages[i]);
+ else
+ xe_ttm_vram_handle_addr_fault(xe, bad_pages[i]);
+ }
+ kfree(bad_pages);
+ return 0;
+}
+
int xe_device_probe(struct xe_device *xe)
{
struct xe_tile *tile;
@@ -890,6 +927,10 @@ int xe_device_probe(struct xe_device *xe)
return err;
}
+ err = xe_device_process_bad_pages(xe);
+ if (err)
+ return err;
+
/*
* Allow allocations only now to ensure xe_display_init_early()
* is the first to allocate, always.
diff --git a/drivers/gpu/drm/xe/xe_device_sysfs.c b/drivers/gpu/drm/xe/xe_device_sysfs.c
index 796dd6bedd97..47c5be4180fe 100644
--- a/drivers/gpu/drm/xe/xe_device_sysfs.c
+++ b/drivers/gpu/drm/xe/xe_device_sysfs.c
@@ -8,6 +8,7 @@
#include <linux/pci.h>
#include <linux/sysfs.h>
+#include "xe_configfs.h"
#include "xe_device.h"
#include "xe_device_sysfs.h"
#include "xe_mmio.h"
@@ -268,6 +269,7 @@ static const struct attribute_group auto_link_downgrade_attr_group = {
int xe_device_sysfs_init(struct xe_device *xe)
{
struct device *dev = xe->drm.dev;
+ bool policy;
int ret;
if (xe->d3cold.capable) {
@@ -286,7 +288,8 @@ int xe_device_sysfs_init(struct xe_device *xe)
return ret;
}
- if (xe->info.platform == XE_CRESCENTISLAND)
+ policy = xe_configfs_get_bad_page_reservation(to_pci_dev(dev));
+ if (xe->info.platform == XE_CRESCENTISLAND && policy)
xe_ttm_vram_sysfs_init(xe);
return 0;
diff --git a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
index 417ad563e80e..8dfa47c19721 100644
--- a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
+++ b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
@@ -11,6 +11,7 @@
#include <drm/ttm/ttm_range_manager.h>
#include "xe_bo.h"
+#include "xe_configfs.h"
#include "xe_device.h"
#include "xe_res_cursor.h"
#include "xe_ttm_vram_mgr.h"
@@ -729,12 +730,21 @@ int xe_ttm_vram_handle_addr_fault(struct xe_device *xe, unsigned long addr)
struct xe_ttm_vram_mgr *vram_mgr;
struct xe_vram_region *vr;
struct drm_buddy *mm;
+ bool policy;
int ret;
vr = xe_ttm_vram_addr_to_region(xe, addr);
WARN_ON(!vr);
vram_mgr = &vr->ttm;
mm = &vram_mgr->mm;
+
+ policy = xe_configfs_get_bad_page_reservation(to_pci_dev(xe->drm.dev));
+ if (!policy) {
+ drm_err(&xe->drm, "0x%lx is reported as corrupted address by HW\n",
+ addr);
+ /* TODO: FW Integration: Report to FW to drop addr from SRAM queue */
+ return -EOPNOTSUPP;
+ }
/* Reserve page at address */
ret = xe_ttm_vram_reserve_page_at_addr(xe, addr, vram_mgr, mm);
if (ret == -EIO)
--
2.52.0
^ permalink raw reply related [flat|nested] 12+ messages in thread