From: Matthew Brost <matthew.brost@intel.com>
To: "Upadhyay, Tejas" <tejas.upadhyay@intel.com>
Cc: Aravind Iddamsetty <aravind.iddamsetty@linux.intel.com>,
"intel-xe@lists.freedesktop.org" <intel-xe@lists.freedesktop.org>,
"Auld, Matthew" <matthew.auld@intel.com>,
"thomas.hellstrom@linux.intel.com"
<thomas.hellstrom@linux.intel.com>,
"Ghimiray, Himal Prasad" <himal.prasad.ghimiray@intel.com>
Subject: Re: [RFC PATCH V6 3/7] drm/xe: Handle physical memory address error
Date: Thu, 2 Apr 2026 13:20:26 -0700 [thread overview]
Message-ID: <ac7PikDrbonyVuqw@gsse-cloud1.jf.intel.com> (raw)
In-Reply-To: <SJ1PR11MB62045BE3CC2D605CEEB271CD8151A@SJ1PR11MB6204.namprd11.prod.outlook.com>
On Thu, Apr 02, 2026 at 04:30:47AM -0600, Upadhyay, Tejas wrote:
>
>
> > -----Original Message-----
> > From: Brost, Matthew <matthew.brost@intel.com>
> > Sent: 02 April 2026 06:34
> > To: Upadhyay, Tejas <tejas.upadhyay@intel.com>
> > Cc: intel-xe@lists.freedesktop.org; Auld, Matthew
> > <matthew.auld@intel.com>; thomas.hellstrom@linux.intel.com; Ghimiray,
> > Himal Prasad <himal.prasad.ghimiray@intel.com>
> > Subject: Re: [RFC PATCH V6 3/7] drm/xe: Handle physical memory address
> > error
> >
> > On Fri, Mar 27, 2026 at 05:18:16PM +0530, Tejas Upadhyay wrote:
> > > This functionality represents a significant step in making the xe
> > > driver gracefully handle hardware memory degradation.
> > > By integrating with the DRM Buddy allocator, the driver can
> > > permanently "carve out" faulty memory so it isn't reused by subsequent
> > > allocations.
> > >
> > > Buddy Block Reservation:
> > > ----------------------
> > > When a memory address is reported as faulty, the driver instructs the
> > > DRM Buddy allocator to reserve a block of the specific page size
> > > (typically 4KB). This marks the memory as "dirty/used"
> > > indefinitely.
> > >
> > > Two-Stage Tracking:
> > > -----------------
> > > Offlined Pages:
> > > Pages that have been successfully isolated and removed from the
> > > available memory pool.
> > >
> > > Queued Pages:
> > > Addresses that have been flagged as faulty but are currently in use by
> > > a process. These are tracked until the associated buffer object (BO)
> > > is released or migrated, at which point they move to the "offlined"
> > > state.
> > >
> > > Sysfs Reporting:
> > > --------------
> > > The patch exposes these metrics through a standard interface, allowing
> > > administrators to monitor VRAM health:
> > > /sys/bus/pci/devices/<device_id>/vram_bad_pages
> > >
> > > V5:
> > > - Categorise and handle BOs accordingly
> > > - Fix crash found with new debugfs tests
> > > V4:
> > > - Set block->private NULL post bo purge
> > > - Filter out gsm address early on
> > > - Rebase
> > > V3:
> > > - Rename API, remove tile dependency and add status of reservation
> > > V2:
> > > - Fix mm->avail counter issue
> > > - Remove unused code and handle clean up in case of error
> > >
> > > Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
> > > ---
> > >  drivers/gpu/drm/xe/xe_ttm_vram_mgr.c       | 336 +++++++++++++++++++++
> > >  drivers/gpu/drm/xe/xe_ttm_vram_mgr.h       |   1 +
> > >  drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h |  26 ++
> > >  3 files changed, 363 insertions(+)
> > >
> > > diff --git a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
> > > index c627dbf94552..0fec7b332501 100644
> > > --- a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
> > > +++ b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
> > > @@ -13,7 +13,10 @@
> > >
> > > #include "xe_bo.h"
> > > #include "xe_device.h"
> > > +#include "xe_exec_queue.h"
> > > +#include "xe_lrc.h"
> > > #include "xe_res_cursor.h"
> > > +#include "xe_ttm_stolen_mgr.h"
> > > #include "xe_ttm_vram_mgr.h"
> > > #include "xe_vram_types.h"
> > >
> > > @@ -277,6 +280,26 @@ static const struct ttm_resource_manager_func xe_ttm_vram_mgr_func = {
> > >  	.debug = xe_ttm_vram_mgr_debug
> > >  };
> > >
> > > +static void xe_ttm_vram_free_bad_pages(struct drm_device *dev, struct xe_ttm_vram_mgr *mgr)
> > > +{
> > > +	struct xe_ttm_vram_offline_resource *pos, *n;
> > > +
> > > +	mutex_lock(&mgr->lock);
> > > +	list_for_each_entry_safe(pos, n, &mgr->offlined_pages, offlined_link) {
> > > +		--mgr->n_offlined_pages;
> > > +		gpu_buddy_free_list(&mgr->mm, &pos->blocks, 0);
> > > +		mgr->visible_avail += pos->used_visible_size;
> > > +		list_del(&pos->offlined_link);
> > > +		kfree(pos);
> > > +	}
> > > +	list_for_each_entry_safe(pos, n, &mgr->queued_pages, queued_link) {
> > > +		list_del(&pos->queued_link);
> > > +		mgr->n_queued_pages--;
> > > +		kfree(pos);
> > > +	}
> > > +	mutex_unlock(&mgr->lock);
> > > +}
> > > +
> > >  static void xe_ttm_vram_mgr_fini(struct drm_device *dev, void *arg)
> > >  {
> > >  	struct xe_device *xe = to_xe_device(dev);
> > > @@ -288,6 +311,8 @@ static void xe_ttm_vram_mgr_fini(struct drm_device *dev, void *arg)
> > >  	if (ttm_resource_manager_evict_all(&xe->ttm, man))
> > >  		return;
> > >
> > > +	xe_ttm_vram_free_bad_pages(dev, mgr);
> > > +
> > >  	WARN_ON_ONCE(mgr->visible_avail != mgr->visible_size);
> > >
> > >  	gpu_buddy_fini(&mgr->mm);
> > > @@ -316,6 +341,8 @@ int __xe_ttm_vram_mgr_init(struct xe_device *xe, struct xe_ttm_vram_mgr *mgr,
> > >  	man->func = &xe_ttm_vram_mgr_func;
> > >  	mgr->mem_type = mem_type;
> > >  	mutex_init(&mgr->lock);
> > > +	INIT_LIST_HEAD(&mgr->offlined_pages);
> > > +	INIT_LIST_HEAD(&mgr->queued_pages);
> > >  	mgr->default_page_size = default_page_size;
> > >  	mgr->visible_size = io_size;
> > >  	mgr->visible_avail = io_size;
> > > @@ -471,3 +498,312 @@ u64 xe_ttm_vram_get_avail(struct ttm_resource_manager *man)
> > >
> > >  	return avail;
> > >  }
> > > +
> > > +static bool is_ttm_vram_migrate_lrc(struct xe_device *xe, struct xe_bo *pbo)
> >
> > As discussed in prior reply [1] - I think this can be dropped.
> >
> > [1] https://patchwork.freedesktop.org/patch/714756/?series=161473&rev=6#comment_1318048
> >
> > > +{
> > > +	if (pbo->ttm.type == ttm_bo_type_kernel &&
> > > +	    pbo->flags & XE_BO_FLAG_FORCE_USER_VRAM &&
> > > +	    (pbo->flags & (XE_BO_FLAG_GGTT | XE_BO_FLAG_GGTT_INVALIDATE)) &&
> > > +	    !(pbo->flags & XE_BO_FLAG_PAGETABLE)) {
> > > +		unsigned long idx;
> > > +		struct xe_exec_queue *q;
> > > +		struct drm_device *dev = &xe->drm;
> > > +		struct drm_file *file;
> > > +		struct xe_lrc *lrc;
> > > +
> > > +		/* TODO : Need to extend to multitile in future if needed */
> > > +		mutex_lock(&dev->filelist_mutex);
> > > +		list_for_each_entry(file, &dev->filelist, lhead) {
> > > +			struct xe_file *xef = file->driver_priv;
> > > +
> > > +			mutex_lock(&xef->exec_queue.lock);
> > > +			xa_for_each(&xef->exec_queue.xa, idx, q) {
> > > +				xe_exec_queue_get(q);
> > > +				mutex_unlock(&xef->exec_queue.lock);
> > > +
> > > +				for (int i = 0; i < q->width; i++) {
> > > +					lrc = xe_exec_queue_get_lrc(q, i);
> > > +					if (lrc->bo == pbo) {
> > > +						xe_lrc_put(lrc);
> > > +						mutex_lock(&xef->exec_queue.lock);
> > > +						xe_exec_queue_put(q);
> > > +						mutex_unlock(&xef->exec_queue.lock);
> > > +						mutex_unlock(&dev->filelist_mutex);
> > > +						return false;
> > > +					}
> > > +					xe_lrc_put(lrc);
> > > +				}
> > > +				mutex_lock(&xef->exec_queue.lock);
> > > +				xe_exec_queue_put(q);
> > > +				mutex_unlock(&xef->exec_queue.lock);
> > > +			}
> > > +		}
> > > +		mutex_unlock(&dev->filelist_mutex);
> > > +		return true;
> > > +	}
> > > +	return false;
> > > +}
> > > +
> > > +static void xe_ttm_vram_purge_page(struct xe_device *xe, struct xe_bo *pbo)
> > > +{
> > > +	struct ttm_placement place = {};
> > > +	struct ttm_operation_ctx ctx = {
> > > +		.interruptible = false,
> > > +		.gfp_retry_mayfail = false,
> > > +	};
> > > +	bool locked;
> > > +	int ret = 0;
> > > +
> > > +	/* Ban VM if BO is PPGTT */
> > > +	if (pbo->ttm.type == ttm_bo_type_kernel &&
> > > +	    pbo->flags & XE_BO_FLAG_FORCE_USER_VRAM &&
> > > +	    pbo->flags & XE_BO_FLAG_PAGETABLE) {
> >
> > I think XE_BO_FLAG_PAGETABLE and XE_BO_FLAG_FORCE_USER_VRAM are
> > sufficient here.
> >
> > Also, if XE_BO_FLAG_PAGETABLE is set but XE_BO_FLAG_FORCE_USER_VRAM
> > is clear, that means this is a kernel VM and we probably have to wedge the
> > device, right?
>
> I am looking at all the other review comments; meanwhile, as a quick response: @Aravind Iddamsetty was suggesting doing an SBR reset for critical BOs in place of a wedge.
>
That very well could be correct — it would have to be some kind of
global event if a critical BO encounters an error and the driver needs
to recover. Part of the recovery process has to be replacing the critical BO
with a new BO too, right?
A few more comments below.
> Tejas
> >
> > > +		down_write(&pbo->vm->lock);
> > > +		xe_vm_kill(pbo->vm, true);
> > > +		up_write(&pbo->vm->lock);
> > > +	}
> > > +
> > > +	/* Ban exec queue if BO is lrc */
> > > +	if (pbo->ttm.type == ttm_bo_type_kernel &&
> > > +	    pbo->flags & XE_BO_FLAG_FORCE_USER_VRAM &&
> > > +	    (pbo->flags & (XE_BO_FLAG_GGTT | XE_BO_FLAG_GGTT_INVALIDATE)) &&
> > > +	    !(pbo->flags & XE_BO_FLAG_PAGETABLE)) {
> >
> >
> > This is a huge if statement just to determine whether this is an LRC. At a
> > minimum, we’d need to normalize this, and it looks very fragile—if we change
> > flags elsewhere in the driver, this if statement could easily break.
> >
> > Also, I can’t say I’m a fan of searching just to kill an individual queue.
> >
> > It’s a bit unfortunate that LRCs are created without a VM (I forget the exact
> > reasoning, but I seem to recall it was related to multi-q?)
> >
> > I think what we really want to do is:
> >
> > - If we find a PT or LRC BO, kill the VM.
If the PT encounters an error, we likely also want to immediately
invalidate the VM’s page table structure to avoid a device page-table
walk reading a corrupted PT. xe_vm_close() does this in the code below:
	if (bound) {
		for_each_tile(tile, xe, id)
			if (vm->pt_root[id])
				xe_pt_clear(xe, vm->pt_root[id]);

		for_each_gt(gt, xe, id)
			xe_tlb_inval_vm(&gt->tlb_inval, vm);
	}
Maybe this could be extracted into a helper and called here, likely
after xe_vm_kill(). There is also a weird corner case where the PT that
encounters the error is vm->pt_root[id], which is particularly bad,
because in that case we can’t call xe_pt_clear(). That operation
involves a CPU write, and if I remember correctly, things go really
badly then.
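For concreteness, the extracted helper might look roughly like this (a sketch only: the helper name is hypothetical, the body is lifted from the xe_vm_close() excerpt above, and the pt_root corner case is deliberately left open):

```c
/* Hypothetical helper pulled out of the tail of xe_vm_close():
 * clear the VM's page-table roots and invalidate TLBs so the device
 * can't walk a corrupted PT. Sketch only, not a tested implementation. */
static void xe_vm_clear_and_inval_pt(struct xe_device *xe, struct xe_vm *vm)
{
	struct xe_tile *tile;
	struct xe_gt *gt;
	u8 id;

	for_each_tile(tile, xe, id)
		if (vm->pt_root[id])
			xe_pt_clear(xe, vm->pt_root[id]);

	for_each_gt(gt, xe, id)
		xe_tlb_inval_vm(&gt->tlb_inval, vm);

	/* Open question: if the faulty page *is* vm->pt_root[id],
	 * xe_pt_clear() would CPU-write the bad page itself. */
}
```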
> > - Update ‘kill VM’ to kill all exec queues. I honestly forget why we
> > only kill preempt/rebind queues—it’s likely some nonsensical reasoning
> > that we never cleaned up. We already have xe_vm_add_exec_queue(), which
> > is short-circuited on xe->info.has_ctx_tlb_inval, but we can just
> > remove that.
> > - Normalize this with an LRC BO flag and store the user_vm in the BO for
> > LRCs.
> > - Critical kernel BOs normalized with BO flag -> wedge the device
> >
> > The difference between killing a queue and killing a VM doesn’t really matter
> > from a user-space point of view, since typically a single-queue hang leads to
> > the entire process crashing or restarting—at least for Mesa 3D. We should
> > confirm with compute whether this is also what we’re targeting for CRI, but I
> > suspect the answer is the same. Even if it isn’t, I’m not convinced per-queue
> > killing is worthwhile. And if we decide it is, the filelist / exec_queue.xa search is
> > pretty much a non-starter for me—for example, we’d need to make this much
> > simpler and avoid taking a bunch of locks here, which looks pretty scary.
> >
> > > +		struct drm_device *dev = &xe->drm;
> > > +		struct xe_exec_queue *q;
> > > +		struct drm_file *file;
> > > +		struct xe_lrc *lrc;
> > > +		unsigned long idx;
> > > +
> > > +		/* TODO : Need to extend to multitile in future if needed */
> > > +		mutex_lock(&dev->filelist_mutex);
> > > +		list_for_each_entry(file, &dev->filelist, lhead) {
> > > +			struct xe_file *xef = file->driver_priv;
> > > +
> > > +			mutex_lock(&xef->exec_queue.lock);
> > > +			xa_for_each(&xef->exec_queue.xa, idx, q) {
> > > +				xe_exec_queue_get(q);
> > > +				mutex_unlock(&xef->exec_queue.lock);
> > > +
> > > +				for (int i = 0; i < q->width; i++) {
> > > +					lrc = xe_exec_queue_get_lrc(q, i);
> > > +					if (lrc->bo == pbo) {
> > > +						xe_lrc_put(lrc);
> > > +						xe_exec_queue_kill(q);
> > > +					} else {
> > > +						xe_lrc_put(lrc);
> > > +					}
> > > +				}
> > > +
> > > +				mutex_lock(&xef->exec_queue.lock);
> > > +				xe_exec_queue_put(q);
> > > +				mutex_unlock(&xef->exec_queue.lock);
> > > +			}
> > > +		}
> > > +		mutex_unlock(&dev->filelist_mutex);
> > > +	}
> > > +
> > > + spin_lock(&pbo->ttm.bdev->lru_lock);
> > > + locked = dma_resv_trylock(pbo->ttm.base.resv);
> > > + spin_unlock(&pbo->ttm.bdev->lru_lock);
> > > + WARN_ON(!locked);
> >
> > Is there any reason why we can’t just take a sleeping dma_resv_lock here (e.g.
> > xe_bo_lock)? Also, I think the trick with the LRU lock only works once the BO’s
> > dma_resv has been individualized (kref == 0), which is clearly not the case
> > here.
> >
> > > + ret = ttm_bo_validate(&pbo->ttm, &place, &ctx);
I thought I had typed this out, but with the purging-BO series now
merged, we have a helper for purging: xe_ttm_bo_purge(). I don’t think
it works exactly for this case, but with small updates, I believe it
could be made to work. It also does things like remove the page tables
for the BO, which I think is desired.
Matt
> > > + drm_WARN_ON(&xe->drm, ret);
> > > + xe_bo_put(pbo);
> > > + if (locked)
> > > + dma_resv_unlock(pbo->ttm.base.resv);
> > > +}
> > > +
> > > +static int xe_ttm_vram_reserve_page_at_addr(struct xe_device *xe, unsigned long addr,
> > > +					    struct xe_ttm_vram_mgr *vram_mgr,
> > > +					    struct gpu_buddy *mm)
> > > +{
> > > +	struct xe_ttm_vram_offline_resource *nentry;
> > > +	struct ttm_buffer_object *tbo = NULL;
> > > +	struct gpu_buddy_block *block;
> > > +	struct gpu_buddy_block *b, *m;
> > > +	enum reserve_status {
> > > +		pending = 0,
> > > +		fail
> > > +	};
> > > +	u64 size = SZ_4K;
> > > +	int ret = 0;
> > > +
> > > +	mutex_lock(&vram_mgr->lock);
> >
> > You’re going to have to fix the locking here. For example, the lock is released
> > inside nested if statements below, which makes this function very difficult to
> > follow. Personally, I can’t really focus on anything else until this is cleaned up.
> > I’m not saying we don’t already have bad locking patterns in Xe—I’m sure we
> > do—but let’s avoid introducing new code with those patterns.
> >
> > For example, it should look more like this:
> >
> > mutex_lock(&vram_mgr->lock);
> > /* Do the minimal work that requires the lock */
> > mutex_unlock(&vram_mgr->lock);
> >
> > /* Do other work where &vram_mgr->lock needs to be dropped */
> >
> > mutex_lock(&vram_mgr->lock);
> > /* Do more work that requires the lock */
> > mutex_unlock(&vram_mgr->lock);
> >
> > Also, strongly prefer guard() or scoped_guard() too.
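Expanding on the guard() suggestion above with a standalone userspace illustration: this uses pthreads plus the compiler's cleanup attribute to stand in for the kernel's guard() from <linux/cleanup.h>. All names here (toy_mgr, toy_reserve_page, and so on) are illustrative only, not the driver's API.

```c
#include <pthread.h>

/* Toy stand-in for xe_ttm_vram_mgr: two counters behind one mutex. */
struct toy_mgr {
	pthread_mutex_t lock;
	int n_queued_pages;
	int n_offlined_pages;
};

/* Emulate the kernel's guard(mutex): the lock is dropped automatically
 * when the guard variable goes out of scope. */
static void toy_unlock(pthread_mutex_t **m) { pthread_mutex_unlock(*m); }
#define TOY_GUARD(m) \
	pthread_mutex_t *_guard __attribute__((cleanup(toy_unlock))) = (m); \
	pthread_mutex_lock(_guard)

static void purge_bo_unlocked(void)
{
	/* Placeholder for work (e.g. a BO purge) that must run unlocked. */
}

/* The suggested structure: minimal locked section, unlocked work,
 * second minimal locked section, and no path that leaks the lock. */
int toy_reserve_page(struct toy_mgr *mgr)
{
	{	/* Stage 1: queue the faulty page. */
		TOY_GUARD(&mgr->lock);
		mgr->n_queued_pages++;
	}

	purge_bo_unlocked();

	{	/* Stage 2: move it from queued to offlined. */
		TOY_GUARD(&mgr->lock);
		mgr->n_queued_pages--;
		mgr->n_offlined_pages++;
		return mgr->n_offlined_pages;
	}
}
```

The point is purely structural: each locked region is its own scope, a return can never escape with the lock held, and the unlocked work is clearly fenced off from the locked sections.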
> >
> > > + block = gpu_buddy_addr_to_block(mm, addr);
> > > + if (PTR_ERR(block) == -ENXIO) {
> > > + mutex_unlock(&vram_mgr->lock);
> > > + return -ENXIO;
> > > + }
> > > +
> > > + nentry = kzalloc_obj(*nentry);
> > > + if (!nentry)
> > > + return -ENOMEM;
> > > + INIT_LIST_HEAD(&nentry->blocks);
> > > + nentry->status = pending;
> > > +
> > > + if (block) {
> > > + struct xe_ttm_vram_offline_resource *pos, *n;
> > > + struct xe_bo *pbo;
> > > +
> > > + WARN_ON(!block->private);
> > > + tbo = block->private;
> > > + pbo = ttm_to_xe_bo(tbo);
> > > +
> > > + xe_bo_get(pbo);
> >
> > This probably needs a kref get if it’s non‑zero. If this is a zombie BO, it should
> > already be getting destroyed. Also, we’re going to need to look into gutting the
> > TTM pipeline as well, where TTM resources are transferred to different BOs—
> > but there’s enough to clean up here first before we get to that.
> >
> > I'm going to stop here as there is quite a bit to cleanup / simplify before I can
> > dig in more.
> >
> > Matt
> >
> > > +		/* Critical kernel BO? */
> > > +		if (pbo->ttm.type == ttm_bo_type_kernel &&
> > > +		    (!(pbo->flags & XE_BO_FLAG_FORCE_USER_VRAM) ||
> > > +		     is_ttm_vram_migrate_lrc(xe, pbo))) {
> > > +			mutex_unlock(&vram_mgr->lock);
> > > +			kfree(nentry);
> > > +			xe_ttm_vram_free_bad_pages(&xe->drm, vram_mgr);
> > > +			xe_bo_put(pbo);
> > > +			drm_err(&xe->drm,
> > > +				"%s: corrupt addr: 0x%lx in critical kernel bo, request reset\n",
> > > +				__func__, addr);
> > > +			/* Hint System controller driver for reset with -EIO */
> > > +			return -EIO;
> > > +		}
> > > +		nentry->id = ++vram_mgr->n_queued_pages;
> > > +		list_add(&nentry->queued_link, &vram_mgr->queued_pages);
> > > +		mutex_unlock(&vram_mgr->lock);
> > > +
> > > +		/* Purge BO containing address */
> > > +		xe_ttm_vram_purge_page(xe, pbo);
> > > +
> > > +		/* Reserve page at address addr */
> > > +		mutex_lock(&vram_mgr->lock);
> > > +		ret = gpu_buddy_alloc_blocks(mm, addr, addr + size,
> > > +					     size, size, &nentry->blocks,
> > > +					     GPU_BUDDY_RANGE_ALLOCATION);
> > > +
> > > +		if (ret) {
> > > +			drm_warn(&xe->drm, "Could not reserve page at addr:0x%lx, ret:%d\n",
> > > +				 addr, ret);
> > > +			nentry->status = fail;
> > > +			mutex_unlock(&vram_mgr->lock);
> > > +			return ret;
> > > +		}
> > > +
> > > +		list_for_each_entry_safe(b, m, &nentry->blocks, link)
> > > +			b->private = NULL;
> > > +
> > > +		if ((addr + size) <= vram_mgr->visible_size) {
> > > +			nentry->used_visible_size = size;
> > > +		} else {
> > > +			list_for_each_entry(b, &nentry->blocks, link) {
> > > +				u64 start = gpu_buddy_block_offset(b);
> > > +
> > > +				if (start < vram_mgr->visible_size) {
> > > +					u64 end = start + gpu_buddy_block_size(mm, b);
> > > +
> > > +					nentry->used_visible_size +=
> > > +						min(end, vram_mgr->visible_size) - start;
> > > +				}
> > > +			}
> > > +		}
> > > +		vram_mgr->visible_avail -= nentry->used_visible_size;
> > > +		list_for_each_entry_safe(pos, n, &vram_mgr->queued_pages, queued_link) {
> > > +			if (pos->id == nentry->id) {
> > > +				--vram_mgr->n_queued_pages;
> > > +				list_del(&pos->queued_link);
> > > +				break;
> > > +			}
> > > +		}
> > > +		list_add(&nentry->offlined_link, &vram_mgr->offlined_pages);
> > > +		/* TODO: FW Integration: Send command to FW for offlining page */
> > > +		++vram_mgr->n_offlined_pages;
> > > +		mutex_unlock(&vram_mgr->lock);
> > > +		return ret;
> > > +
> > > +	} else {
> > > +		ret = gpu_buddy_alloc_blocks(mm, addr, addr + size,
> > > +					     size, size, &nentry->blocks,
> > > +					     GPU_BUDDY_RANGE_ALLOCATION);
> > > +		if (ret) {
> > > +			drm_warn(&xe->drm, "Could not reserve page at addr:0x%lx, ret:%d\n",
> > > +				 addr, ret);
> > > +			nentry->status = fail;
> > > +			mutex_unlock(&vram_mgr->lock);
> > > +			return ret;
> > > +		}
> > > +
> > > +		list_for_each_entry_safe(b, m, &nentry->blocks, link)
> > > +			b->private = NULL;
> > > +
> > > +		if ((addr + size) <= vram_mgr->visible_size) {
> > > +			nentry->used_visible_size = size;
> > > +		} else {
> > > +			struct gpu_buddy_block *block;
> > > +
> > > +			list_for_each_entry(block, &nentry->blocks, link) {
> > > +				u64 start = gpu_buddy_block_offset(block);
> > > +
> > > +				if (start < vram_mgr->visible_size) {
> > > +					u64 end = start + gpu_buddy_block_size(mm, block);
> > > +
> > > +					nentry->used_visible_size +=
> > > +						min(end, vram_mgr->visible_size) - start;
> > > +				}
> > > +			}
> > > +		}
> > > +		vram_mgr->visible_avail -= nentry->used_visible_size;
> > > +		nentry->id = ++vram_mgr->n_offlined_pages;
> > > +		list_add(&nentry->offlined_link, &vram_mgr->offlined_pages);
> > > +		/* TODO: FW Integration: Send command to FW for offlining page */
> > > +		mutex_unlock(&vram_mgr->lock);
> > > +	}
> > > +	/* Success */
> > > +	return ret;
> > > +}
> > > +
> > > +static struct xe_vram_region *xe_ttm_vram_addr_to_region(struct xe_device *xe,
> > > +							 resource_size_t addr)
> > > +{
> > > +	unsigned long stolen_base = xe_ttm_stolen_gpu_offset(xe);
> > > +	struct xe_vram_region *vr;
> > > +	struct xe_tile *tile;
> > > +	int id;
> > > +
> > > +	/* Addr from stolen memory? */
> > > +	if (addr + SZ_4K >= stolen_base)
> > > +		return NULL;
> > > +
> > > +	for_each_tile(tile, xe, id) {
> > > +		vr = tile->mem.vram;
> > > +		if ((addr <= vr->dpa_base + vr->actual_physical_size) &&
> > > +		    (addr + SZ_4K >= vr->dpa_base))
> > > +			return vr;
> > > +	}
> > > +	return NULL;
> > > +}
> > > +
> > > +/**
> > > + * xe_ttm_vram_handle_addr_fault - Handle a flagged VRAM physical address error
> > > + * @xe: pointer to parent device
> > > + * @addr: physical faulty address
> > > + *
> > > + * Handle the physical faulty address error on a specific tile.
> > > + *
> > > + * Returns 0 for success, negative error code otherwise.
> > > + */
> > > +int xe_ttm_vram_handle_addr_fault(struct xe_device *xe, unsigned long addr)
> > > +{
> > > +	struct xe_ttm_vram_mgr *vram_mgr;
> > > +	struct xe_vram_region *vr;
> > > +	struct gpu_buddy *mm;
> > > +	int ret;
> > > +
> > > +	vr = xe_ttm_vram_addr_to_region(xe, addr);
> > > +	if (!vr) {
> > > +		drm_err(&xe->drm, "%s:%d addr:%lx error requesting SBR\n",
> > > +			__func__, __LINE__, addr);
> > > +		/* Hint System controller driver for reset with -EIO */
> > > +		return -EIO;
> > > +	}
> > > +	vram_mgr = &vr->ttm;
> > > +	mm = &vram_mgr->mm;
> > > +	/* Reserve page at address */
> > > +	ret = xe_ttm_vram_reserve_page_at_addr(xe, addr, vram_mgr, mm);
> > > +	return ret;
> > > +}
> > > +EXPORT_SYMBOL(xe_ttm_vram_handle_addr_fault);
> > > diff --git a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.h b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.h
> > > index 87b7fae5edba..8ef06d9d44f7 100644
> > > --- a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.h
> > > +++ b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.h
> > > @@ -31,6 +31,7 @@ u64 xe_ttm_vram_get_cpu_visible_size(struct ttm_resource_manager *man);
> > >  void xe_ttm_vram_get_used(struct ttm_resource_manager *man,
> > >  			  u64 *used, u64 *used_visible);
> > >
> > > +int xe_ttm_vram_handle_addr_fault(struct xe_device *xe, unsigned long addr);
> > >  static inline struct xe_ttm_vram_mgr_resource *
> > >  to_xe_ttm_vram_mgr_resource(struct ttm_resource *res)
> > >  {
> > > diff --git a/drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h b/drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h
> > > index 9106da056b49..94eaf9d875f1 100644
> > > --- a/drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h
> > > +++ b/drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h
> > > @@ -19,6 +19,14 @@ struct xe_ttm_vram_mgr {
> > >  	struct ttm_resource_manager manager;
> > >  	/** @mm: DRM buddy allocator which manages the VRAM */
> > >  	struct gpu_buddy mm;
> > > +	/** @offlined_pages: List of offlined pages */
> > > +	struct list_head offlined_pages;
> > > +	/** @n_offlined_pages: Number of offlined pages */
> > > +	u16 n_offlined_pages;
> > > +	/** @queued_pages: List of queued pages */
> > > +	struct list_head queued_pages;
> > > +	/** @n_queued_pages: Number of queued pages */
> > > +	u16 n_queued_pages;
> > >  	/** @visible_size: Proped size of the CPU visible portion */
> > >  	u64 visible_size;
> > >  	/** @visible_avail: CPU visible portion still unallocated */
> > > @@ -45,4 +53,22 @@ struct xe_ttm_vram_mgr_resource {
> > >  	unsigned long flags;
> > >  };
> > >
> > > +/**
> > > + * struct xe_ttm_vram_offline_resource - Xe TTM VRAM offline resource
> > > + */
> > > +struct xe_ttm_vram_offline_resource {
> > > +	/** @offlined_link: Link to offlined pages */
> > > +	struct list_head offlined_link;
> > > +	/** @queued_link: Link to queued pages */
> > > +	struct list_head queued_link;
> > > +	/** @blocks: list of DRM buddy blocks */
> > > +	struct list_head blocks;
> > > +	/** @used_visible_size: How many CPU visible bytes this resource is using */
> > > +	u64 used_visible_size;
> > > +	/** @id: The id of an offline resource */
> > > +	u16 id;
> > > +	/** @status: reservation status of resource */
> > > +	bool status;
> > > +};
> > > +
> > >  #endif
> > > --
> > > 2.52.0
> > >
Thread overview: 24+ messages
2026-03-27 11:48 [RFC PATCH V6 0/7] Add memory page offlining support Tejas Upadhyay
2026-03-27 11:48 ` [RFC PATCH V6 1/7] drm/xe: Link VRAM object with gpu buddy Tejas Upadhyay
2026-04-01 23:56 ` Matthew Brost
2026-04-02 9:10 ` Upadhyay, Tejas
2026-04-02 20:50 ` Matthew Brost
2026-04-06 11:04 ` Upadhyay, Tejas
2026-03-27 11:48 ` [RFC PATCH V6 2/7] drm/gpu: Add gpu_buddy_addr_to_block helper Tejas Upadhyay
2026-04-02 0:09 ` Matthew Brost
2026-04-02 10:16 ` Matthew Auld
2026-04-02 9:12 ` Matthew Auld
2026-03-27 11:48 ` [RFC PATCH V6 3/7] drm/xe: Handle physical memory address error Tejas Upadhyay
2026-04-01 23:53 ` Matthew Brost
2026-04-02 1:03 ` Matthew Brost
2026-04-02 10:30 ` Upadhyay, Tejas
2026-04-02 20:20 ` Matthew Brost [this message]
2026-04-07 12:03 ` Upadhyay, Tejas
2026-03-27 11:48 ` [RFC PATCH V6 4/7] drm/xe/cri: Add debugfs to inject faulty vram address Tejas Upadhyay
2026-03-27 11:48 ` [RFC PATCH V6 5/7] gpu/buddy: Add routine to dump allocated buddy blocks Tejas Upadhyay
2026-03-27 11:48 ` [RFC PATCH V6 6/7] drm/xe/configfs: Add vram bad page reservation policy Tejas Upadhyay
2026-03-27 11:48 ` [RFC PATCH V6 7/7] drm/xe/cri: Add sysfs interface for bad gpu vram pages Tejas Upadhyay
2026-03-27 12:24 ` ✗ CI.checkpatch: warning for Add memory page offlining support (rev6) Patchwork
2026-03-27 12:26 ` ✓ CI.KUnit: success " Patchwork
2026-03-27 13:16 ` ✓ Xe.CI.BAT: " Patchwork
2026-03-28 4:49 ` ✓ Xe.CI.FULL: " Patchwork