Intel-XE Archive on lore.kernel.org
From: Tejas Upadhyay <tejas.upadhyay@intel.com>
To: intel-xe@lists.freedesktop.org
Cc: matthew.auld@intel.com, thomas.hellstrom@linux.intel.com,
	matthew.brost@intel.com,
	Tejas Upadhyay <tejas.upadhyay@intel.com>
Subject: [RFC PATCH V4 0/7] Add memory page offlining support
Date: Fri, 27 Feb 2026 19:14:37 +0530
Message-ID: <20260227134453.1814649-9-tejas.upadhyay@intel.com>

This functionality represents a significant step in making
the xe driver gracefully handle hardware memory degradation.
By integrating with the DRM Buddy allocator, the driver
can permanently "carve out" faulty memory so it isn't reused
by subsequent allocations.

This series adds memory page offlining support with the following changes:
1. drm/xe/svm: Use res_to_mem_region, avoid block->private usage
2. Link and track TTM BOs by physical address
3. Handle a reported physical address error by reserving the affected 4K page
4. Add a debugfs entry to manually inject a physical address error
5. Add a buddy block allocation dump for debugging buddy-related issues
6. Add a sysfs entry that reports statistics on bad GPU VRAM pages to the user
7. Add a configfs attribute for the VRAM bad page reservation policy
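
The carve-out idea can be illustrated with a small userspace sketch. This is
not code from the series and does not use the DRM buddy allocator: it assumes
a toy per-page state array, and every name in it (toy_vram, toy_offline_addr,
toy_alloc_page, toy_free_page) is hypothetical. It only shows the intended
behaviour: a faulty byte address is rounded down to its 4K page, the page is
marked offline for good, a counter of bad pages is kept, and later
allocations skip the page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE_4K 4096u
#define VRAM_PAGES   64u	/* toy VRAM size: 64 pages = 256 KiB */

enum page_state { PAGE_FREE = 0, PAGE_USED, PAGE_OFFLINE };

/* Hypothetical toy model; the real driver tracks this via the buddy allocator. */
struct toy_vram {
	enum page_state state[VRAM_PAGES];
	unsigned int bad_pages;	/* kind of statistic a sysfs entry could report */
};

/* Round a byte address down to the index of its 4K page. */
static size_t addr_to_page(uint64_t addr)
{
	return (size_t)(addr / PAGE_SIZE_4K);
}

/*
 * Permanently offline the 4K page containing addr.
 * Returns -1 if addr is out of range, 1 if the page was in use (the owner
 * still has to be dealt with by the caller), 0 otherwise.
 */
static int toy_offline_addr(struct toy_vram *v, uint64_t addr)
{
	size_t page = addr_to_page(addr);
	bool was_used;

	if (page >= VRAM_PAGES)
		return -1;
	if (v->state[page] == PAGE_OFFLINE)
		return 0;	/* already reserved, do not count twice */

	was_used = (v->state[page] == PAGE_USED);
	v->state[page] = PAGE_OFFLINE;
	v->bad_pages++;
	return was_used ? 1 : 0;
}

/* First-fit allocation of one page; returns its byte offset or UINT64_MAX. */
static uint64_t toy_alloc_page(struct toy_vram *v)
{
	for (size_t i = 0; i < VRAM_PAGES; i++) {
		if (v->state[i] == PAGE_FREE) {
			v->state[i] = PAGE_USED;
			return (uint64_t)i * PAGE_SIZE_4K;
		}
	}
	return UINT64_MAX;
}

/* Free a page; an offline page is never returned to the free pool. */
static void toy_free_page(struct toy_vram *v, uint64_t offset)
{
	size_t page = addr_to_page(offset);

	if (page < VRAM_PAGES && v->state[page] == PAGE_USED)
		v->state[page] = PAGE_FREE;
}
```

In this sketch, offlining address 0x1234 reserves page 1 (offset 0x1000), so
two consecutive allocations from an empty toy_vram return offsets 0x0 and
0x2000. What to do when the faulty page is already in use is left to the
caller here.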


Opens:
1. If the faulty address falls in a critical BO, whether to wedge the device
or reset via the system controller is still under discussion.

V4: API rework; add configfs for the reservation policy and apply the config everywhere
V3: use res_to_mem_region to avoid use of block->private (MattA)
V2:
- some fixes and clean up on errors
- Added xe_vram_addr_to_region helper to avoid other use of block->private (MattB)


Tejas Upadhyay (7):
  drm/xe/svm: Use res_to_mem_region
  drm/xe: Implement VRAM object tracking ability using physical address
  drm/xe: Handle physical memory address error
  [DO_NOT_REVIEW]]drm/xe/cri: Add debugfs to inject faulty vram address
  drm/buddy: Add routine to dump allocated buddy blocks
  drm/xe/cri: Add sysfs interface for bad gpu vram pages
  drm/xe/configfs: Add vram bad page reservation policy

 drivers/gpu/drm/drm_buddy.c                |  43 +++
 drivers/gpu/drm/xe/xe_bo.c                 |  18 +-
 drivers/gpu/drm/xe/xe_bo.h                 |   1 +
 drivers/gpu/drm/xe/xe_configfs.c           |  64 +++-
 drivers/gpu/drm/xe/xe_configfs.h           |   2 +
 drivers/gpu/drm/xe/xe_debugfs.c            |  49 +++
 drivers/gpu/drm/xe/xe_device.c             |  41 +++
 drivers/gpu/drm/xe/xe_device_sysfs.c       |   7 +
 drivers/gpu/drm/xe/xe_svm.c                |  10 +-
 drivers/gpu/drm/xe/xe_ttm_vram_mgr.c       | 363 +++++++++++++++++++++
 drivers/gpu/drm/xe/xe_ttm_vram_mgr.h       |   3 +-
 drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h |  25 ++
 12 files changed, 611 insertions(+), 15 deletions(-)

-- 
2.52.0



Thread overview: 12+ messages
2026-02-27 13:44 Tejas Upadhyay [this message]
2026-02-27 13:44 ` [RFC PATCH V4 1/7] drm/xe/svm: Use res_to_mem_region Tejas Upadhyay
2026-02-27 13:44 ` [RFC PATCH V4 2/7] drm/xe: Implement VRAM object tracking ability using physical address Tejas Upadhyay
2026-02-27 13:44 ` [RFC PATCH V4 3/7] drm/xe: Handle physical memory address error Tejas Upadhyay
2026-03-02  5:11   ` Aravind Iddamsetty
2026-03-05  6:40     ` Upadhyay, Tejas
2026-03-06 10:29       ` Aravind Iddamsetty
2026-03-16 16:34       ` Upadhyay, Tejas
2026-02-27 13:44 ` [RFC PATCH V4 4/7] [DO_NOT_REVIEW]]drm/xe/cri: Add debugfs to inject faulty vram address Tejas Upadhyay
2026-02-27 13:44 ` [RFC PATCH V4 5/7] drm/buddy: Add routine to dump allocated buddy blocks Tejas Upadhyay
2026-02-27 13:44 ` [RFC PATCH V4 6/7] drm/xe/cri: Add sysfs interface for bad gpu vram pages Tejas Upadhyay
2026-02-27 13:44 ` [RFC PATCH V4 7/7] drm/xe/configfs: Add vram bad page reservation policy Tejas Upadhyay
