From: Matthew Brost <matthew.brost@intel.com>
To: Tejas Upadhyay <tejas.upadhyay@intel.com>
Cc: <intel-xe@lists.freedesktop.org>, <matthew.auld@intel.com>,
<thomas.hellstrom@linux.intel.com>,
<himal.prasad.ghimiray@intel.com>
Subject: Re: [RFC PATCH V6 3/7] drm/xe: Handle physical memory address error
Date: Wed, 1 Apr 2026 18:03:47 -0700
Message-ID: <ac3Ac4n//gB3raXc@gsse-cloud1.jf.intel.com>
In-Reply-To: <20260327114829.2678240-12-tejas.upadhyay@intel.com>
On Fri, Mar 27, 2026 at 05:18:16PM +0530, Tejas Upadhyay wrote:
> This functionality represents a significant step in making
> the xe driver gracefully handle hardware memory degradation.
> By integrating with the DRM Buddy allocator, the driver
> can permanently "carve out" faulty memory so it isn't reused
> by subsequent allocations.
>
> Buddy Block Reservation:
> ----------------------
> When a memory address is reported as faulty, the driver instructs
> the DRM Buddy allocator to reserve a block of the specific page
> size (typically 4KB). This marks the memory as "dirty/used"
> indefinitely.
>
> Two-Stage Tracking:
> -----------------
> Offlined Pages:
> Pages that have been successfully isolated and removed from the
> available memory pool.
>
> Queued Pages:
> Addresses that have been flagged as faulty but are currently in
> use by a process. These are tracked until the associated buffer
> object (BO) is released or migrated, at which point they move
> to the "offlined" state.
>
> Sysfs Reporting:
> --------------
> The patch exposes these metrics through a standard interface,
> allowing administrators to monitor VRAM health:
> /sys/bus/pci/devices/<device_id>/vram_bad_pages
>
> V5:
> - Categorise and handle BOs accordingly
> - Fix crash found with new debugfs tests
> V4:
> - Set block->private NULL post bo purge
> - Filter out gsm address early on
> - Rebase
> V3:
> - Rename API, remove tile dependency and add status of reservation
> V2:
> - Fix mm->avail counter issue
> - Remove unused code and handle clean up in case of error
>
> Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
> ---
> drivers/gpu/drm/xe/xe_ttm_vram_mgr.c | 336 +++++++++++++++++++++
> drivers/gpu/drm/xe/xe_ttm_vram_mgr.h | 1 +
> drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h | 26 ++
> 3 files changed, 363 insertions(+)
>
> diff --git a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
> index c627dbf94552..0fec7b332501 100644
> --- a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
> +++ b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
> @@ -13,7 +13,10 @@
>
> #include "xe_bo.h"
> #include "xe_device.h"
> +#include "xe_exec_queue.h"
> +#include "xe_lrc.h"
> #include "xe_res_cursor.h"
> +#include "xe_ttm_stolen_mgr.h"
> #include "xe_ttm_vram_mgr.h"
> #include "xe_vram_types.h"
>
> @@ -277,6 +280,26 @@ static const struct ttm_resource_manager_func xe_ttm_vram_mgr_func = {
> .debug = xe_ttm_vram_mgr_debug
> };
>
> +static void xe_ttm_vram_free_bad_pages(struct drm_device *dev, struct xe_ttm_vram_mgr *mgr)
> +{
> + struct xe_ttm_vram_offline_resource *pos, *n;
> +
> + mutex_lock(&mgr->lock);
> + list_for_each_entry_safe(pos, n, &mgr->offlined_pages, offlined_link) {
> + --mgr->n_offlined_pages;
> + gpu_buddy_free_list(&mgr->mm, &pos->blocks, 0);
> + mgr->visible_avail += pos->used_visible_size;
> + list_del(&pos->offlined_link);
> + kfree(pos);
> + }
> + list_for_each_entry_safe(pos, n, &mgr->queued_pages, queued_link) {
> + list_del(&pos->queued_link);
> + mgr->n_queued_pages--;
> + kfree(pos);
> + }
> + mutex_unlock(&mgr->lock);
> +}
> +
> static void xe_ttm_vram_mgr_fini(struct drm_device *dev, void *arg)
> {
> struct xe_device *xe = to_xe_device(dev);
> @@ -288,6 +311,8 @@ static void xe_ttm_vram_mgr_fini(struct drm_device *dev, void *arg)
> if (ttm_resource_manager_evict_all(&xe->ttm, man))
> return;
>
> + xe_ttm_vram_free_bad_pages(dev, mgr);
> +
> WARN_ON_ONCE(mgr->visible_avail != mgr->visible_size);
>
> gpu_buddy_fini(&mgr->mm);
> @@ -316,6 +341,8 @@ int __xe_ttm_vram_mgr_init(struct xe_device *xe, struct xe_ttm_vram_mgr *mgr,
> man->func = &xe_ttm_vram_mgr_func;
> mgr->mem_type = mem_type;
> mutex_init(&mgr->lock);
> + INIT_LIST_HEAD(&mgr->offlined_pages);
> + INIT_LIST_HEAD(&mgr->queued_pages);
> mgr->default_page_size = default_page_size;
> mgr->visible_size = io_size;
> mgr->visible_avail = io_size;
> @@ -471,3 +498,312 @@ u64 xe_ttm_vram_get_avail(struct ttm_resource_manager *man)
>
> return avail;
> }
> +
> +static bool is_ttm_vram_migrate_lrc(struct xe_device *xe, struct xe_bo *pbo)
As discussed in a prior reply [1], I think this can be dropped.
[1] https://patchwork.freedesktop.org/patch/714756/?series=161473&rev=6#comment_1318048
> +{
> + if (pbo->ttm.type == ttm_bo_type_kernel &&
> + pbo->flags & XE_BO_FLAG_FORCE_USER_VRAM &&
> + (pbo->flags & (XE_BO_FLAG_GGTT | XE_BO_FLAG_GGTT_INVALIDATE)) &&
> + !(pbo->flags & XE_BO_FLAG_PAGETABLE)) {
> + unsigned long idx;
> + struct xe_exec_queue *q;
> + struct drm_device *dev = &xe->drm;
> + struct drm_file *file;
> + struct xe_lrc *lrc;
> +
> + /* TODO : Need to extend to multitile in future if needed */
> + mutex_lock(&dev->filelist_mutex);
> + list_for_each_entry(file, &dev->filelist, lhead) {
> + struct xe_file *xef = file->driver_priv;
> +
> + mutex_lock(&xef->exec_queue.lock);
> + xa_for_each(&xef->exec_queue.xa, idx, q) {
> + xe_exec_queue_get(q);
> + mutex_unlock(&xef->exec_queue.lock);
> +
> + for (int i = 0; i < q->width; i++) {
> + lrc = xe_exec_queue_get_lrc(q, i);
> + if (lrc->bo == pbo) {
> + xe_lrc_put(lrc);
> + mutex_lock(&xef->exec_queue.lock);
> + xe_exec_queue_put(q);
> + mutex_unlock(&xef->exec_queue.lock);
> + mutex_unlock(&dev->filelist_mutex);
> + return false;
> + }
> + xe_lrc_put(lrc);
> + }
> + mutex_lock(&xef->exec_queue.lock);
> + xe_exec_queue_put(q);
> + mutex_unlock(&xef->exec_queue.lock);
> + }
> + }
> + mutex_unlock(&dev->filelist_mutex);
> + return true;
> + }
> + return false;
> +}
> +
> +static void xe_ttm_vram_purge_page(struct xe_device *xe, struct xe_bo *pbo)
> +{
> + struct ttm_placement place = {};
> + struct ttm_operation_ctx ctx = {
> + .interruptible = false,
> + .gfp_retry_mayfail = false,
> + };
> + bool locked;
> + int ret = 0;
> +
> + /* Ban VM if BO is PPGTT */
> + if (pbo->ttm.type == ttm_bo_type_kernel &&
> + pbo->flags & XE_BO_FLAG_FORCE_USER_VRAM &&
> + pbo->flags & XE_BO_FLAG_PAGETABLE) {
I think XE_BO_FLAG_PAGETABLE and XE_BO_FLAG_FORCE_USER_VRAM are
sufficient here.
Also, if XE_BO_FLAG_PAGETABLE is set but XE_BO_FLAG_FORCE_USER_VRAM is
clear, that means this is a kernel VM and we probably have to wedge the
device, right?
> + down_write(&pbo->vm->lock);
> + xe_vm_kill(pbo->vm, true);
> + up_write(&pbo->vm->lock);
> + }
> +
> + /* Ban exec queue if BO is lrc */
> + if (pbo->ttm.type == ttm_bo_type_kernel &&
> + pbo->flags & XE_BO_FLAG_FORCE_USER_VRAM &&
> + (pbo->flags & (XE_BO_FLAG_GGTT | XE_BO_FLAG_GGTT_INVALIDATE)) &&
> + !(pbo->flags & XE_BO_FLAG_PAGETABLE)) {
This is a huge if statement just to determine whether this is an LRC. At
a minimum, we’d need to normalize this, and it looks very fragile—if we
change flags elsewhere in the driver, this if statement could easily
break.
Also, I can’t say I’m a fan of searching just to kill an individual
queue.
It’s a bit unfortunate that LRCs are created without a VM (I forget the
exact reasoning, but I seem to recall it was related to multi-q?)
I think what we really want to do is:
- If we find a PT or LRC BO, kill the VM.
- Update ‘kill VM’ to kill all exec queues. I honestly forget why we
only kill preempt/rebind queues—it’s likely some nonsensical reasoning
that we never cleaned up. We already have xe_vm_add_exec_queue(), which
is short-circuited on xe->info.has_ctx_tlb_inval, but we can just
remove that.
- Normalize this with an LRC BO flag and store the user_vm in the BO for
LRCs.
- Critical kernel BOs normalized with BO flag -> wedge the device
The difference between killing a queue and killing a VM doesn’t really
matter from a user-space point of view, since typically a single-queue
hang leads to the entire process crashing or restarting—at least for
Mesa 3D. We should confirm with compute whether this is also what we’re
targeting for CRI, but I suspect the answer is the same. Even if it
isn’t, I’m not convinced per-queue killing is worthwhile. And if we
decide it is, the filelist / exec_queue.xa search is pretty much a
non-starter for me—for example, we’d need to make this much simpler and
avoid taking a bunch of locks here, which looks pretty scary.
> + struct drm_device *dev = &xe->drm;
> + struct xe_exec_queue *q;
> + struct drm_file *file;
> + struct xe_lrc *lrc;
> + unsigned long idx;
> +
> + /* TODO : Need to extend to multitile in future if needed */
> + mutex_lock(&dev->filelist_mutex);
> + list_for_each_entry(file, &dev->filelist, lhead) {
> + struct xe_file *xef = file->driver_priv;
> +
> + mutex_lock(&xef->exec_queue.lock);
> + xa_for_each(&xef->exec_queue.xa, idx, q) {
> + xe_exec_queue_get(q);
> + mutex_unlock(&xef->exec_queue.lock);
> +
> + for (int i = 0; i < q->width; i++) {
> + lrc = xe_exec_queue_get_lrc(q, i);
> + if (lrc->bo == pbo) {
> + xe_lrc_put(lrc);
> + xe_exec_queue_kill(q);
> + } else {
> + xe_lrc_put(lrc);
> + }
> + }
> +
> + mutex_lock(&xef->exec_queue.lock);
> + xe_exec_queue_put(q);
> + mutex_unlock(&xef->exec_queue.lock);
> + }
> + }
> + mutex_unlock(&dev->filelist_mutex);
> + }
> +
> + spin_lock(&pbo->ttm.bdev->lru_lock);
> + locked = dma_resv_trylock(pbo->ttm.base.resv);
> + spin_unlock(&pbo->ttm.bdev->lru_lock);
> + WARN_ON(!locked);
Is there any reason why we can’t just take a sleeping dma_resv_lock
here (e.g. xe_bo_lock)? Also, I think the trick with the LRU lock only
works once the BO’s dma_resv has been individualized (kref == 0), which
is clearly not the case here.
> + ret = ttm_bo_validate(&pbo->ttm, &place, &ctx);
> + drm_WARN_ON(&xe->drm, ret);
> + xe_bo_put(pbo);
> + if (locked)
> + dma_resv_unlock(pbo->ttm.base.resv);
> +}
> +
> +static int xe_ttm_vram_reserve_page_at_addr(struct xe_device *xe, unsigned long addr,
> + struct xe_ttm_vram_mgr *vram_mgr, struct gpu_buddy *mm)
> +{
> + struct xe_ttm_vram_offline_resource *nentry;
> + struct ttm_buffer_object *tbo = NULL;
> + struct gpu_buddy_block *block;
> + struct gpu_buddy_block *b, *m;
> + enum reserve_status {
> + pending = 0,
> + fail
> + };
> + u64 size = SZ_4K;
> + int ret = 0;
> +
> + mutex_lock(&vram_mgr->lock);
You’re going to have to fix the locking here. For example, the lock is
released inside nested if statements below, which makes this function
very difficult to follow. Personally, I can’t really focus on anything
else until this is cleaned up. I’m not saying we don’t already have bad
locking patterns in Xe—I’m sure we do—but let’s avoid introducing new
code with those patterns.
For example, it should look more like this:
mutex_lock(&vram_mgr->lock);
/* Do the minimal work that requires the lock */
mutex_unlock(&vram_mgr->lock);
/* Do other work where &vram_mgr->lock needs to be dropped */
mutex_lock(&vram_mgr->lock);
/* Do more work that requires the lock */
mutex_unlock(&vram_mgr->lock);
Also, strongly prefer guard() or scoped_guard() here too.
> + block = gpu_buddy_addr_to_block(mm, addr);
> + if (PTR_ERR(block) == -ENXIO) {
> + mutex_unlock(&vram_mgr->lock);
> + return -ENXIO;
> + }
> +
> + nentry = kzalloc_obj(*nentry);
> + if (!nentry)
> + return -ENOMEM;
> + INIT_LIST_HEAD(&nentry->blocks);
> + nentry->status = pending;
> +
> + if (block) {
> + struct xe_ttm_vram_offline_resource *pos, *n;
> + struct xe_bo *pbo;
> +
> + WARN_ON(!block->private);
> + tbo = block->private;
> + pbo = ttm_to_xe_bo(tbo);
> +
> + xe_bo_get(pbo);
This probably needs a kref_get_unless_zero() style get: if this is a
zombie BO whose refcount has already hit zero, it is already being
destroyed. Also, we're going to need to
look into gutting the TTM pipeline as well, where TTM resources are
transferred to different BOs—but there’s enough to clean up here first
before we get to that.
I'm going to stop here as there is quite a bit to cleanup / simplify
before I can dig in more.
Matt
> + /* Critical kernel BO? */
> + if (pbo->ttm.type == ttm_bo_type_kernel &&
> + (!(pbo->flags & XE_BO_FLAG_FORCE_USER_VRAM) ||
> + is_ttm_vram_migrate_lrc(xe, pbo))) {
> + mutex_unlock(&vram_mgr->lock);
> + kfree(nentry);
> + xe_ttm_vram_free_bad_pages(&xe->drm, vram_mgr);
> + xe_bo_put(pbo);
> + drm_err(&xe->drm,
> + "%s: corrupt addr: 0x%lx in critical kernel bo, request reset\n",
> + __func__, addr);
> + /* Hint System controller driver for reset with -EIO */
> + return -EIO;
> + }
> + nentry->id = ++vram_mgr->n_queued_pages;
> + list_add(&nentry->queued_link, &vram_mgr->queued_pages);
> + mutex_unlock(&vram_mgr->lock);
> +
> + /* Purge BO containing address */
> + xe_ttm_vram_purge_page(xe, pbo);
> +
> + /* Reserve page at address addr */
> + mutex_lock(&vram_mgr->lock);
> + ret = gpu_buddy_alloc_blocks(mm, addr, addr + size,
> + size, size, &nentry->blocks,
> + GPU_BUDDY_RANGE_ALLOCATION);
> +
> + if (ret) {
> + drm_warn(&xe->drm, "Could not reserve page at addr:0x%lx, ret:%d\n",
> + addr, ret);
> + nentry->status = fail;
> + mutex_unlock(&vram_mgr->lock);
> + return ret;
> + }
> +
> + list_for_each_entry_safe(b, m, &nentry->blocks, link)
> + b->private = NULL;
> +
> + if ((addr + size) <= vram_mgr->visible_size) {
> + nentry->used_visible_size = size;
> + } else {
> + list_for_each_entry(b, &nentry->blocks, link) {
> + u64 start = gpu_buddy_block_offset(b);
> +
> + if (start < vram_mgr->visible_size) {
> + u64 end = start + gpu_buddy_block_size(mm, b);
> +
> + nentry->used_visible_size +=
> + min(end, vram_mgr->visible_size) - start;
> + }
> + }
> + }
> + vram_mgr->visible_avail -= nentry->used_visible_size;
> + list_for_each_entry_safe(pos, n, &vram_mgr->queued_pages, queued_link) {
> + if (pos->id == nentry->id) {
> + --vram_mgr->n_queued_pages;
> + list_del(&pos->queued_link);
> + break;
> + }
> + }
> + list_add(&nentry->offlined_link, &vram_mgr->offlined_pages);
> + /* TODO: FW Integration: Send command to FW for offlining page */
> + ++vram_mgr->n_offlined_pages;
> + mutex_unlock(&vram_mgr->lock);
> + return ret;
> +
> + } else {
> + ret = gpu_buddy_alloc_blocks(mm, addr, addr + size,
> + size, size, &nentry->blocks,
> + GPU_BUDDY_RANGE_ALLOCATION);
> + if (ret) {
> + drm_warn(&xe->drm, "Could not reserve page at addr:0x%lx, ret:%d\n",
> + addr, ret);
> + nentry->status = fail;
> + mutex_unlock(&vram_mgr->lock);
> + return ret;
> + }
> +
> + list_for_each_entry_safe(b, m, &nentry->blocks, link)
> + b->private = NULL;
> +
> + if ((addr + size) <= vram_mgr->visible_size) {
> + nentry->used_visible_size = size;
> + } else {
> + struct gpu_buddy_block *block;
> +
> + list_for_each_entry(block, &nentry->blocks, link) {
> + u64 start = gpu_buddy_block_offset(block);
> +
> + if (start < vram_mgr->visible_size) {
> + u64 end = start + gpu_buddy_block_size(mm, block);
> +
> + nentry->used_visible_size +=
> + min(end, vram_mgr->visible_size) - start;
> + }
> + }
> + }
> + vram_mgr->visible_avail -= nentry->used_visible_size;
> + nentry->id = ++vram_mgr->n_offlined_pages;
> + list_add(&nentry->offlined_link, &vram_mgr->offlined_pages);
> + /* TODO: FW Integration: Send command to FW for offlining page */
> + mutex_unlock(&vram_mgr->lock);
> + }
> + /* Success */
> + return ret;
> +}
> +
> +static struct xe_vram_region *xe_ttm_vram_addr_to_region(struct xe_device *xe,
> + resource_size_t addr)
> +{
> + unsigned long stolen_base = xe_ttm_stolen_gpu_offset(xe);
> + struct xe_vram_region *vr;
> + struct xe_tile *tile;
> + int id;
> +
> + /* Addr from stolen memory? */
> + if (addr + SZ_4K >= stolen_base)
> + return NULL;
> +
> + for_each_tile(tile, xe, id) {
> + vr = tile->mem.vram;
> + if ((addr <= vr->dpa_base + vr->actual_physical_size) &&
> + (addr + SZ_4K >= vr->dpa_base))
> + return vr;
> + }
> + return NULL;
> +}
> +
> +/**
> + * xe_ttm_vram_handle_addr_fault - Handle vram physical address error flagged
> + * @xe: pointer to parent device
> + * @addr: physical faulty address
> + *
> + * Handle the physical faulty address error on a specific tile.
> + *
> + * Returns 0 for success, negative error code otherwise.
> + */
> +int xe_ttm_vram_handle_addr_fault(struct xe_device *xe, unsigned long addr)
> +{
> + struct xe_ttm_vram_mgr *vram_mgr;
> + struct xe_vram_region *vr;
> + struct gpu_buddy *mm;
> + int ret;
> +
> + vr = xe_ttm_vram_addr_to_region(xe, addr);
> + if (!vr) {
> + drm_err(&xe->drm, "%s:%d addr:%lx error requesting SBR\n",
> + __func__, __LINE__, addr);
> + /* Hint System controller driver for reset with -EIO */
> + return -EIO;
> + }
> + vram_mgr = &vr->ttm;
> + mm = &vram_mgr->mm;
> + /* Reserve page at address */
> + ret = xe_ttm_vram_reserve_page_at_addr(xe, addr, vram_mgr, mm);
> + return ret;
> +}
> +EXPORT_SYMBOL(xe_ttm_vram_handle_addr_fault);
> diff --git a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.h b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.h
> index 87b7fae5edba..8ef06d9d44f7 100644
> --- a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.h
> +++ b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.h
> @@ -31,6 +31,7 @@ u64 xe_ttm_vram_get_cpu_visible_size(struct ttm_resource_manager *man);
> void xe_ttm_vram_get_used(struct ttm_resource_manager *man,
> u64 *used, u64 *used_visible);
>
> +int xe_ttm_vram_handle_addr_fault(struct xe_device *xe, unsigned long addr);
> static inline struct xe_ttm_vram_mgr_resource *
> to_xe_ttm_vram_mgr_resource(struct ttm_resource *res)
> {
> diff --git a/drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h b/drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h
> index 9106da056b49..94eaf9d875f1 100644
> --- a/drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h
> +++ b/drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h
> @@ -19,6 +19,14 @@ struct xe_ttm_vram_mgr {
> struct ttm_resource_manager manager;
> /** @mm: DRM buddy allocator which manages the VRAM */
> struct gpu_buddy mm;
> + /** @offlined_pages: List of offlined pages */
> + struct list_head offlined_pages;
> + /** @n_offlined_pages: Number of offlined pages */
> + u16 n_offlined_pages;
> + /** @queued_pages: List of queued pages */
> + struct list_head queued_pages;
> + /** @n_queued_pages: Number of queued pages */
> + u16 n_queued_pages;
> /** @visible_size: Proped size of the CPU visible portion */
> u64 visible_size;
> /** @visible_avail: CPU visible portion still unallocated */
> @@ -45,4 +53,22 @@ struct xe_ttm_vram_mgr_resource {
> unsigned long flags;
> };
>
> +/**
> + * struct xe_ttm_vram_offline_resource - Xe TTM VRAM offline resource
> + */
> +struct xe_ttm_vram_offline_resource {
> + /** @offlined_link: Link to offlined pages */
> + struct list_head offlined_link;
> + /** @queued_link: Link to queued pages */
> + struct list_head queued_link;
> + /** @blocks: list of DRM buddy blocks */
> + struct list_head blocks;
> + /** @used_visible_size: How many CPU visible bytes this resource is using */
> + u64 used_visible_size;
> + /** @id: The id of an offline resource */
> + u16 id;
> + /** @status: reservation status of resource */
> + bool status;
> +};
> +
> #endif
> --
> 2.52.0
>