From: Matthew Brost <matthew.brost@intel.com>
To: "Souza, Jose" <jose.souza@intel.com>
Cc: "intel-xe@lists.freedesktop.org" <intel-xe@lists.freedesktop.org>,
	"Yadav,  Arvind" <arvind.yadav@intel.com>,
	"Mishra, Pallavi" <pallavi.mishra@intel.com>,
	"Ghimiray, Himal Prasad" <himal.prasad.ghimiray@intel.com>,
	"Vivi, Rodrigo" <rodrigo.vivi@intel.com>,
	"thomas.hellstrom@linux.intel.com"
	<thomas.hellstrom@linux.intel.com>
Subject: Re: [PATCH v6 00/12] drm/xe/madvise: Add support for purgeable buffer objects
Date: Tue, 3 Mar 2026 14:49:27 -0800
Message-ID: <aadld+E4gyjpAYY4@lstrano-desk.jf.intel.com>
In-Reply-To: <c9c1f3065d8283936bca96abfcd46ba7141406a3.camel@intel.com>

On Tue, Mar 03, 2026 at 03:05:59PM -0700, Souza, Jose wrote:
> On Tue, 2026-03-03 at 20:49 +0530, Arvind Yadav wrote:
> > This patch series introduces comprehensive support for purgeable
> > buffer objects
> > in the Xe driver, enabling userspace to provide memory usage hints
> > for better
> > memory management under system pressure.
> > 
> > Overview:
> > 
> > Purgeable memory allows applications to mark buffer objects as "not
> > currently
> > needed" (DONTNEED), making them eligible for kernel reclamation
> > during memory
> > pressure. This helps prevent OOM conditions and enables more
> > efficient GPU
> > memory utilization for workloads with temporary or regeneratable data
> > (caches,
> > intermediate results, decoded frames, etc.).
> > 
> > Purgeable BO Lifecycle:
> > 1. WILLNEED (default): BO actively needed, kernel preserves backing
> > store
> > 2. DONTNEED (user hint): BO contents discardable, eligible for
> > purging
> > 3. PURGED (kernel action): Backing store reclaimed during memory
> > pressure
> > 
> > Key Design Principles:
> >   - i915 compatibility: "Once purged, always purged" semantics -
> > purged BOs
> >     remain permanently invalid and must be destroyed/recreated
> >   - Per-VMA state tracking: Each VMA tracks its own purgeable state,
> > BO is
> >     only marked DONTNEED when ALL VMAs across ALL VMs agree (Thomas
> > Hellström)
> >   - Safety first: Imported/exported dma-bufs blocked from purgeable
> > state -
> >     no visibility into external device usage (Matt Roper)
> >   - Multiple protection layers: Validation in madvise, VM bind, mmap,
> > CPU
> >     and GPU fault handlers. GPU page faults on DONTNEED BOs are
> > rejected in
> >     xe_pagefault_begin() to preserve the GPU PTE invalidation done at
> > madvise
> >     time; without this the rebind path would re-map real pages and
> > undo the
> >     PTE zap, preventing the shrinker from ever reclaiming the BO.
> >   - Correct GPU PTE zapping: madvise_purgeable() explicitly sets
> >     skip_invalidation per VMA (false for DONTNEED, true for WILLNEED,
> > purged
> >     and dmabuf-shared BOs) so DONTNEED always triggers a GPU PTE zap
> >     regardless of prior madvise state.
> >   - Scratch PTE support: Fault-mode VMs use scratch pages for safe
> > zero reads
> >     on purged BO access.
> >   - TTM shrinker integration: Encapsulated helpers manage xe_ttm_tt-
> > >purgeable
> >     flag and shrinker page accounting (shrinkable vs purgeable
> > buckets)
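
The lifecycle and the "once purged, always purged" rule quoted above can be
sketched as a small state machine. This is only an illustration under stated
assumptions, not the series' code: every sketch_* name is hypothetical, the
rule that PURGED is reachable only from DONTNEED is a reading of the lifecycle
list, and sketch_retained() is a guess at what an i915-style "retained" value
reports (backing store still present).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the three states named in the cover letter. */
enum sketch_purge_state {
	SKETCH_WILLNEED,	/* default: backing store is preserved */
	SKETCH_DONTNEED,	/* user hint: contents may be discarded */
	SKETCH_PURGED,		/* kernel action: backing store reclaimed */
};

/* Return true if moving from 'from' to 'to' is allowed in this sketch. */
static inline bool sketch_transition_allowed(enum sketch_purge_state from,
					     enum sketch_purge_state to)
{
	/* Once purged, always purged: no transition out of PURGED. */
	if (from == SKETCH_PURGED)
		return to == SKETCH_PURGED;

	/* Assumption: only a DONTNEED BO is eligible to be purged. */
	if (to == SKETCH_PURGED)
		return from == SKETCH_DONTNEED;

	/* WILLNEED <-> DONTNEED are plain user hints. */
	return true;
}

/* Centralized update, loosely modelled on a void-returning setter. */
static inline void sketch_set_state(enum sketch_purge_state *state,
				    enum sketch_purge_state next)
{
	assert(sketch_transition_allowed(*state, next));
	*state = next;
}

/* Assumed meaning of "retained": the backing store has not been purged. */
static inline bool sketch_retained(enum sketch_purge_state state)
{
	return state != SKETCH_PURGED;
}
```

With these rules a BO can go WILLNEED -> DONTNEED -> WILLNEED any number of
times, but after DONTNEED -> PURGED the only option left to userspace is to
destroy and recreate it.
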
> 
> 
> I get Engine memory CAT errors when using this feature:
> 
> [  240.301213] xe 0000:00:02.0: [drm] Tile0: GT0: Fault response:
> Unsuccessful -EINVAL
> [  240.301301] xe 0000:00:02.0: [drm] Tile0: GT0: Engine memory CAT
> error [18]: class=rcs, logical_mask: 0x1, guc_id=17
> [  240.302871] xe 0000:00:02.0: [drm] Tile0: GT0: Engine reset:
> engine_class=rcs, logical_mask: 0x1, guc_id=17, state=0x249
> [  240.302885] xe 0000:00:02.0: [drm] Tile0: GT0: Timedout job:
> seqno=4294967169, lrc_seqno=4294967169, guc_id=17, flags=0x0 in
> arb_map_buffer_ [3374]
> [  240.302892] xe 0000:00:02.0: [drm:xe_devcoredump [xe]] Multiple
> hangs are occurring, but only the first snapshot was taken
> 
> Mesa creates VM with DRM_XE_VM_CREATE_FLAG_SCRATCH_PAGE, probably you
> don't have a IGT test with this scenario.
> 
> @cc Rodrigo
> 
> Other issue is not related to your patches but drm_xe_madvise only
> works with non-canonical addresses and some time ago was agreed that
> all the user-visible addresses would be in canonical format.
> Not sure if we can do anything at this point but letting you know.
> 

We actually might be able to fix it to accept canonical addresses; what we
can't blindly do is make non-canonical addresses stop working.

It might create a weird scenario for UMDs, though, if canonical addresses
work on some kernels but not others. But since this is Mesa's first use of
madvise, perhaps we can get this in as part of the purgeable series, and
only NEO would have to deal with that scenario.
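
To make the idea concrete, here is a minimal sketch of an address check that
accepts both forms. It is not the Xe implementation: the sketch_* helpers are
hypothetical, and the 48-bit width used in the usage note is only an assumed
example (the real width depends on the platform). A canonical address is taken
to mean that bit (va_bits - 1) is sign-extended into the upper bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sign-extend bit (va_bits - 1) of addr into the upper bits. */
static inline uint64_t sketch_canonicalize(uint64_t addr, unsigned int va_bits)
{
	assert(va_bits >= 1 && va_bits <= 64);
	if (va_bits == 64)
		return addr;

	uint64_t mask = (UINT64_C(1) << va_bits) - 1;
	uint64_t low = addr & mask;

	if (low & (UINT64_C(1) << (va_bits - 1)))
		return low | ~mask;
	return low;
}

/* Clear the upper (sign-extension) bits of addr. */
static inline uint64_t sketch_decanonicalize(uint64_t addr, unsigned int va_bits)
{
	assert(va_bits >= 1 && va_bits <= 64);
	if (va_bits == 64)
		return addr;
	return addr & ((UINT64_C(1) << va_bits) - 1);
}

/*
 * Accept addr if it is either non-canonical (upper bits all zero) or
 * properly canonical, and store the non-canonical form in *out.
 * Any other upper-bit pattern is rejected.
 */
static inline bool sketch_normalize_user_addr(uint64_t addr, unsigned int va_bits,
					      uint64_t *out)
{
	uint64_t stripped = sketch_decanonicalize(addr, va_bits);

	if (addr != stripped && addr != sketch_canonicalize(stripped, va_bits))
		return false;

	*out = stripped;
	return true;
}
```

With va_bits = 48, both 0x0000800000001000 and 0xFFFF800000001000 normalize
to 0x0000800000001000, so existing non-canonical users keep working, while
something like 0x1234800000001000 is rejected.
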

Matt

> 
> > 
> > v2 Changes:
> >   - Reordered patches: Moved shared BO helper before main
> > implementation for
> >     proper dependency order
> >   - Fixed reference counting in mmap offset validation (use
> > drm_gem_object_put)
> >   - Removed incorrect claims about madvise(WILLNEED) restoring purged
> > BOs
> >   - Fixed error code documentation inconsistencies
> >   - Initialize purge_state_val fields to prevent kernel memory leaks
> >   - Use xe_bo_trigger_rebind() for async TLB invalidation (Thomas
> > Hellström)
> >   - Add NULL rebind with scratch PTEs for fault mode (Thomas
> > Hellström)
> >   - Implement i915-compatible retained field logic (Thomas Hellström)
> >   - Skip BO validation for purged BOs in page fault handler (crash
> > fix)
> >   - Add scratch VM check in page fault path (non-scratch VMs fail
> > fault)
> > 
> > v3 Changes (addressing Matt and Thomas Hellström feedback):
> >   - Per-VMA purgeable state tracking: Added xe_vma->purgeable_state
> > field
> >   - Complete VMA check: xe_bo_all_vmas_dontneed() walks all VMAs
> > across all
> >     VMs to ensure unanimous DONTNEED before marking BO purgeable
> >   - VMA unbind recheck: Added xe_bo_recheck_purgeable_on_vma_unbind()
> > to
> >     re-evaluate BO state when VMAs are destroyed
> >   - Block external dma-bufs: Added xe_bo_is_external_dmabuf() check
> > using
> >     drm_gem_is_imported() and obj->dma_buf to prevent purging
> > imported/exported BOs
> >   - Consistent lockdep enforcement: Added xe_bo_assert_held() to all
> > helpers
> >     that access madv_purgeable state
> >   - Simplified page table logic: Renamed is_null to is_null_or_purged
> > in
> >     xe_pt_stage_bind_entry() - purged BOs treated identically to null
> > VMAs
> >   - Removed unnecessary checks: Dropped redundant "&& bo" check in
> > xe_ttm_bo_purge()
> >   - Xe-specific warnings: Changed drm_warn() to XE_WARN_ON() in purge
> > path
> >   - Moved purge checks under locks: Purge state validation now done
> > after
> >     acquiring dma-resv lock in vma_lock_and_validate() and
> > xe_pagefault_begin()
> >   - Race-free fault handling: Removed unlocked purge check from
> >     xe_pagefault_handle_vma(), moved to locked xe_pagefault_begin()
> >   - Shrinker helper functions: Added xe_bo_set_purgeable_shrinker()
> > and
> >     xe_bo_clear_purgeable_shrinker() to encapsulate TTM purgeable
> > flag updates
> >     and shrinker page accounting, improving code clarity and
> > maintainability
> > 
> > v4 Changes (addressing Matt and Thomas Hellström feedback):
> >   - UAPI: Removed '__u64 reserved' field from purge_state_val union
> > to fit
> >     16-byte size constraint (Matt)
> >   - Changed madv_purgeable from atomic_t to u32 across all patches
> > (Matt)
> >   - CPU fault handling: Added purged check to fastpath
> > (xe_bo_cpu_fault_fastpath)
> >     to prevent hang when accessing existing mmap of purged BO
> > 
> > v5 Changes (addressing Matt and Thomas Hellström feedback):
> >   - Add locking documentation to madv_purgeable field comment (Matt)
> >   - Introduce xe_bo_set_purgeable_state() helper (void return) to
> > centralize
> >     madv_purgeable updates with xe_bo_assert_held() and state
> > transition
> >     validation using explicit enum checks (no transition out of
> > PURGED) (Matt)
> >   - Make xe_ttm_bo_purge() return int and propagate failures from
> >     xe_bo_move(); handle xe_bo_trigger_rebind() failures (e.g.
> > no_wait_gpu
> >     paths) rather than silently ignoring (Matt)
> >   - Replace drm_WARN_ON with xe_assert for better Xe-specific
> > assertions (Matt)
> >   - Hook purgeable handling into
> > madvise_funcs[DRM_XE_VMA_ATTR_PURGEABLE_STATE]
> >     instead of special-case path in xe_vm_madvise_ioctl() (Matt)
> >   - Track purgeable retained return via xe_madvise_details and
> > perform
> >     copy_to_user() from xe_madvise_details_fini() after locks are
> > dropped (Matt)
> >   - Set madvise_funcs[DRM_XE_VMA_ATTR_PURGEABLE_STATE] to NULL with
> >     __maybe_unused on madvise_purgeable() to maintain bisectability
> > until
> >     shrinker integration is complete in final patch (Matt)
> >   - Call xe_bo_recheck_purgeable_on_vma_unbind() from
> > xe_vma_destroy()
> >     right after drm_gpuva_unlink() where we already hold the BO lock,
> >     drop the trylock-based late destroy path (Matt)
> >   - Move purgeable_state into xe_vma_mem_attr with the other madvise
> >     attributes (Matt)
> >   - Drop READ_ONCE since the BO lock already protects us (Matt)
> >   - Keep returning false when there are no VMAs - otherwise we'd mark
> >     BOs purgeable without any user hint (Matt)
> >   -  Use struct xe_vma_lock_and_validate_flags instead of multiple
> > bool
> >     parameters to improve readability and prevent argument
> > transposition (Matt)
> >   - Fix LRU crash while running shrink test
> >   - Skip xe_bo_validate() for purged BOs in xe_gpuvm_validate()
> >   - Split ghost BO and zero-refcount handling in xe_bo_shrink()
> > (Thomas)
> > 
> > v6 Changes (addressing Jose Souza, Thomas Hellström and Matt Brost
> > feedback):
> >   - Document DONTNEED blocking behavior in uAPI: Clearly describe
> > which
> >     operations are blocked and with what error codes. (Thomas, Matt)
> >   - Block VM_BIND to DONTNEED BOs: Return -EBUSY to prevent creating
> > new
> >     VMAs to purgeable BOs (undefined behavior). (Thomas, Matt)
> >   - Block CPU faults to DONTNEED BOs: Return VM_FAULT_SIGBUS in both
> > fastpath
> >     and slowpath to prevent undefined behavior. (Thomas, Matt)
> >   - Block new mmap() to DONTNEED/purged BOs: Return -EBUSY for
> > DONTNEED,
> >     -EINVAL for PURGED. (Thomas, Matt)
> >   - Block dma-buf export of DONTNEED/purged BOs: Return -EBUSY for
> > DONTNEED,
> >     -EINVAL for PURGED. (Thomas, Matt)
> >   - Fix state transition bug: xe_bo_all_vmas_dontneed() now returns
> > enum to
> >     distinguish NO_VMAS (preserve state) from WILLNEED (has active
> > VMAs),
> >     preventing incorrect DONTNEED → WILLNEED flip on last VMA unmap
> > (Matt)
> >   - Set skip_invalidation explicitly in madvise_purgeable() to ensure
> >     DONTNEED always zaps GPU PTEs regardless of prior madvise state.
> >   - Add DRM_XE_QUERY_CONFIG_FLAG_HAS_PURGING_SUPPORT for userspace
> >     feature detection. (Jose)
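
Two of the v6 items above lend themselves to a short sketch: the tri-state
result that separates "no VMAs" from "some VMA still wants the BO", and the
error codes listed for new mmap() and dma-buf export. All sketch_* names are
hypothetical, and how the vote is applied back to the BO in sketch_recheck()
is an assumption drawn from the changelog wording, not from the patches.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum sketch_bo_state {
	SKETCH_BO_WILLNEED,
	SKETCH_BO_DONTNEED,
	SKETCH_BO_PURGED,
};

enum sketch_vma_vote {
	SKETCH_VOTE_NO_VMAS,	/* nothing mapped: keep the current BO state */
	SKETCH_VOTE_WILLNEED,	/* at least one VMA still needs the BO */
	SKETCH_VOTE_DONTNEED,	/* every VMA agrees on DONTNEED */
};

/* hints[i] is the per-VMA hint of the i-th mapping of one BO. */
static inline enum sketch_vma_vote
sketch_all_vmas_dontneed(const enum sketch_bo_state *hints, size_t n)
{
	size_t i;

	if (n == 0)
		return SKETCH_VOTE_NO_VMAS;

	for (i = 0; i < n; i++) {
		/* A per-VMA hint is only ever WILLNEED or DONTNEED here. */
		assert(hints[i] != SKETCH_BO_PURGED);
		if (hints[i] != SKETCH_BO_DONTNEED)
			return SKETCH_VOTE_WILLNEED;
	}
	return SKETCH_VOTE_DONTNEED;
}

/* Re-evaluate the BO state, e.g. after a VMA goes away. */
static inline enum sketch_bo_state
sketch_recheck(enum sketch_bo_state cur, enum sketch_vma_vote vote)
{
	if (cur == SKETCH_BO_PURGED)
		return cur;	/* no transition out of PURGED */

	switch (vote) {
	case SKETCH_VOTE_DONTNEED:
		return SKETCH_BO_DONTNEED;
	case SKETCH_VOTE_WILLNEED:
		return SKETCH_BO_WILLNEED;
	default:
		return cur;	/* NO_VMAS: preserve, do not flip to WILLNEED */
	}
}

/* Error codes as listed for new mmap() and dma-buf export. */
static inline int sketch_check_mmap_or_export(enum sketch_bo_state state)
{
	switch (state) {
	case SKETCH_BO_DONTNEED:
		return -EBUSY;
	case SKETCH_BO_PURGED:
		return -EINVAL;
	default:
		return 0;
	}
}
```

The point of the third vote value is visible in sketch_recheck(): with a
plain bool, unmapping the last VMA of a DONTNEED BO would look the same as
"not all VMAs agree" and flip the BO back to WILLNEED.
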
> > 
> > Arvind Yadav (11):
> >   drm/xe/bo: Add purgeable bo state tracking and field madv to xe_bo
> >   drm/xe/madvise: Implement purgeable buffer object support
> >   drm/xe/bo: Block CPU faults to purgeable buffer objects
> >   drm/xe/vm: Prevent binding of purged buffer objects
> >   drm/xe/madvise: Implement per-VMA purgeable state tracking
> >   drm/xe/madvise: Block imported and exported dma-bufs
> >   drm/xe/bo: Block mmap of DONTNEED/purged BOs
> >   drm/xe/dma_buf: Block export of DONTNEED/purged BOs
> >   drm/xe/bo: Add purgeable shrinker state helpers
> >   drm/xe/madvise: Enable purgeable buffer object IOCTL support
> >   drm/xe/bo: Skip zero-refcount BOs in shrinker
> > 
> > Himal Prasad Ghimiray (1):
> >   drm/xe/uapi: Add UAPI support for purgeable buffer objects
> > 
> >  drivers/gpu/drm/xe/xe_bo.c         | 223 +++++++++++++++++++++--
> >  drivers/gpu/drm/xe/xe_bo.h         |  60 ++++++
> >  drivers/gpu/drm/xe/xe_bo_types.h   |   6 +
> >  drivers/gpu/drm/xe/xe_dma_buf.c    |  21 +++
> >  drivers/gpu/drm/xe/xe_pagefault.c  |  19 ++
> >  drivers/gpu/drm/xe/xe_pt.c         |  40 +++-
> >  drivers/gpu/drm/xe/xe_query.c      |   2 +
> >  drivers/gpu/drm/xe/xe_svm.c        |   1 +
> >  drivers/gpu/drm/xe/xe_vm.c         | 100 ++++++++--
> >  drivers/gpu/drm/xe/xe_vm_madvise.c | 283
> > +++++++++++++++++++++++++++++
> >  drivers/gpu/drm/xe/xe_vm_madvise.h |   3 +
> >  drivers/gpu/drm/xe/xe_vm_types.h   |  11 ++
> >  include/uapi/drm/xe_drm.h          |  60 ++++++
> >  13 files changed, 793 insertions(+), 36 deletions(-)

