Intel-XE Archive on lore.kernel.org
* [RFC PATCH 0/6] Add memory page offlining support
@ 2026-02-13  9:25 Tejas Upadhyay
From: Tejas Upadhyay @ 2026-02-13  9:25 UTC
  To: intel-xe; +Cc: matthew.auld, matthew.brost, himal.prasad.ghimiray,
	Tejas Upadhyay

This functionality represents a significant step in making
the xe driver gracefully handle hardware memory degradation.
By integrating with the DRM Buddy allocator, the driver
can permanently "carve out" faulty memory so it isn't reused
by subsequent allocations.

This series adds memory page offlining support with the following:
1. drm/xe/svm: Use res_to_mem_region, avoid block->private usage
2. Link and track TTM BOs by physical address
3. Handle a reported physical address error by reserving the 4K page containing the address
4. Add a debugfs entry to manually inject a physical address error
5. Add a buddy block allocation dump for debugging buddy-related issues
6. Add a sysfs entry exposing statistics on bad GPU VRAM pages to users
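
To illustrate item 3, here is a minimal standalone sketch of the 4K carve-out
idea. It is not code from this series: every name in it (toy_bad_pages,
toy_offline_page, and so on) is hypothetical, and the fixed-size array is only
a stand-in for whatever tracking the driver actually uses. It shows how a
faulty physical address maps to its 4K page, and how a later allocation range
can be checked against the offlined pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE	4096ULL	/* 4K offlining granularity, per the cover letter */
#define TOY_MAX_BAD	64	/* arbitrary capacity chosen for this sketch */

/* Hypothetical bookkeeping; not the xe driver's data structure. */
struct toy_bad_pages {
	uint64_t start[TOY_MAX_BAD];	/* 4K-aligned start of each offlined page */
	size_t count;
};

/* Round a faulty physical address down to the start of its 4K page. */
static uint64_t toy_page_start(uint64_t addr)
{
	return addr & ~(TOY_PAGE_SIZE - 1);
}

/* Return true if the page containing addr has already been offlined. */
static bool toy_is_offlined(const struct toy_bad_pages *set, uint64_t addr)
{
	uint64_t page = toy_page_start(addr);
	size_t i;

	assert(set);
	for (i = 0; i < set->count; i++)
		if (set->start[i] == page)
			return true;
	return false;
}

/*
 * Offline the page containing addr.
 * Returns 0 on success (also when the page was already offlined),
 * -1 when the tracking array is full.
 */
static int toy_offline_page(struct toy_bad_pages *set, uint64_t addr)
{
	assert(set);
	if (toy_is_offlined(set, addr))
		return 0;
	if (set->count == TOY_MAX_BAD)
		return -1;
	set->start[set->count++] = toy_page_start(addr);
	return 0;
}

/* Does the range [start, start + size) overlap any offlined page? */
static bool toy_range_hits_bad_page(const struct toy_bad_pages *set,
				    uint64_t start, uint64_t size)
{
	size_t i;

	assert(set);
	for (i = 0; i < set->count; i++) {
		uint64_t bad = set->start[i];

		if (start < bad + TOY_PAGE_SIZE && bad < start + size)
			return true;
	}
	return false;
}
```

In this sketch, two faulty addresses within the same 4K page result in a
single tracked entry, so the tracked count reflects distinct bad pages rather
than the number of reported errors.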


Opens:
1. The dump_allocated_blocks() and xe_ttm_vram_addr_to_tbo() APIs will move under drm_buddy;
for now they are part of the xe code just to showcase the concept.

V3: use res_to_mem_region to avoid use of block->private (MattA)
V2:
- some fixes and clean up on errors
- Added xe_vram_addr_to_region helper to avoid other use of block->private (MattB)

Tejas Upadhyay (6):
  drm/xe/svm: Use res_to_mem_region
  drm/xe: Implement VRAM object tracking ability using physical address
  drm/xe: Handle physical memory address error
  [DO NOT REVIEW]drm/xe/cri: Add debugfs to inject faulty vram address
  drm/xe: Add routine to dump allocated VRAM blocks
  [DO NOT REVIEW]]drm/xe/cri: Add sysfs interface for bad gpu vram pages

 drivers/gpu/drm/xe/xe_bo.c                 |   2 +-
 drivers/gpu/drm/xe/xe_bo.h                 |   1 +
 drivers/gpu/drm/xe/xe_debugfs.c            |  49 +++
 drivers/gpu/drm/xe/xe_device_sysfs.c       |   2 +
 drivers/gpu/drm/xe/xe_svm.c                |   8 +-
 drivers/gpu/drm/xe/xe_ttm_vram_mgr.c       | 355 +++++++++++++++++++++
 drivers/gpu/drm/xe/xe_ttm_vram_mgr.h       |   6 +-
 drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h |  23 ++
 8 files changed, 437 insertions(+), 9 deletions(-)

-- 
2.52.0



Thread overview: 13+ messages
2026-02-13  9:25 [RFC PATCH 0/6] Add memory page offlining support Tejas Upadhyay
2026-02-13  9:25 ` [RFC PATCH 1/6] drm/xe/svm: Use res_to_mem_region Tejas Upadhyay
2026-02-24  2:16   ` Matthew Brost
2026-02-13  9:25 ` [RFC PATCH 2/6] drm/xe: Implement VRAM object tracking ability using physical address Tejas Upadhyay
2026-02-13  9:25 ` [RFC PATCH 3/6] drm/xe: Handle physical memory address error Tejas Upadhyay
2026-02-13  9:25 ` [RFC PATCH 4/6] [DO NOT REVIEW]drm/xe/cri: Add debugfs to inject faulty vram address Tejas Upadhyay
2026-02-13  9:25 ` [RFC PATCH 5/6] drm/xe: Add routine to dump allocated VRAM blocks Tejas Upadhyay
2026-02-13  9:25 ` [RFC PATCH 6/6] [DO NOT REVIEW]]drm/xe/cri: Add sysfs interface for bad gpu vram pages Tejas Upadhyay
2026-02-18  0:37   ` Rodrigo Vivi
2026-02-20 11:18     ` Aravind Iddamsetty
2026-02-20 14:52       ` Vivi, Rodrigo
2026-02-22  5:32         ` Aravind Iddamsetty
2026-02-23 21:26           ` Rodrigo Vivi
