From: "Yadav, Arvind" <arvind.yadav@intel.com>
To: Matthew Brost <matthew.brost@intel.com>
Cc: <intel-xe@lists.freedesktop.org>,
<himal.prasad.ghimiray@intel.com>,
<thomas.hellstrom@linux.intel.com>
Subject: Re: [RFC 4/7] drm/xe/vm: Add madvise autoreset interval notifier worker infrastructure
Date: Mon, 9 Mar 2026 12:37:15 +0530 [thread overview]
Message-ID: <70f3a306-15e4-4eb1-82da-74818f35b437@intel.com> (raw)
In-Reply-To: <aZ+HHMnSK49omV2Y@lstrano-desk.jf.intel.com>
[-- Attachment #1: Type: text/plain, Size: 22856 bytes --]
On 26-02-2026 05:04, Matthew Brost wrote:
> On Thu, Feb 19, 2026 at 02:43:09PM +0530, Arvind Yadav wrote:
>> MADVISE_AUTORESET needs to reset VMA attributes when userspace unmaps
>> CPU-only ranges, but the MMU invalidate callback cannot take vm->lock
>> due to lock ordering (mmap_lock is already held).
>>
>> Add mmu_interval_notifier that queues work items for MMU_NOTIFY_UNMAP
>> events. The worker runs under vm->lock and resets attributes for VMAs
>> still marked XE_VMA_CPU_AUTORESET_ACTIVE (i.e., not yet GPU-touched).
>>
>> Work items are allocated from a mempool to handle atomic context in the
>> callback. The notifier is deactivated when GPU touches the VMA.
>>
>> Cc: Matthew Brost<matthew.brost@intel.com>
>> Cc: Thomas Hellström<thomas.hellstrom@linux.intel.com>
>> Cc: Himal Prasad Ghimiray<himal.prasad.ghimiray@intel.com>
>> Signed-off-by: Arvind Yadav<arvind.yadav@intel.com>
>> ---
>> drivers/gpu/drm/xe/xe_vm_madvise.c | 394 +++++++++++++++++++++++++++++
>> drivers/gpu/drm/xe/xe_vm_madvise.h | 8 +
>> drivers/gpu/drm/xe/xe_vm_types.h | 41 +++
>> 3 files changed, 443 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_vm_madvise.c b/drivers/gpu/drm/xe/xe_vm_madvise.c
>> index 52147f5eaaa0..4c0ffb100bcc 100644
>> --- a/drivers/gpu/drm/xe/xe_vm_madvise.c
>> +++ b/drivers/gpu/drm/xe/xe_vm_madvise.c
>> @@ -6,9 +6,12 @@
>> #include "xe_vm_madvise.h"
>>
>> #include <linux/nospec.h>
>> +#include <linux/mempool.h>
>> +#include <linux/workqueue.h>
>> #include <drm/xe_drm.h>
>>
>> #include "xe_bo.h"
>> +#include "xe_macros.h"
>> #include "xe_pat.h"
>> #include "xe_pt.h"
>> #include "xe_svm.h"
>> @@ -500,3 +503,394 @@ int xe_vm_madvise_ioctl(struct drm_device *dev, void *data, struct drm_file *fil
>> xe_vm_put(vm);
>> return err;
>> }
>> +
>> +/**
>> + * struct xe_madvise_work_item - Work item for unmap processing
>> + * @work: work_struct
>> + * @vm: VM reference
>> + * @pool: Mempool for recycling
>> + * @start: Start address
>> + * @end: End address
>> + */
>> +struct xe_madvise_work_item {
>> + struct work_struct work;
>> + struct xe_vm *vm;
>> + mempool_t *pool;
> Why mempool? Seems like we could just do kmalloc with correct gfp flags.
I tried kmalloc first, but ran into two issues:
GFP_KERNEL — fails because MMU notifier callbacks must not block, and
GFP_KERNEL can sleep waiting for memory reclaim.
GFP_ATOMIC — triggers a circular lockdep warning: the MMU notifier holds
mmu_notifier_invalidate_range_start, and GFP_ATOMIC internally tries to
acquire fs_reclaim, which already depends on the MMU notifier lock.
Agreed. mempool looks unnecessary here. I re-tested this with
kmalloc(..., GFP_NOWAIT) and that avoids both blocking and the
reclaim-related lockdep issue I saw with the earlier approach. I will
switch to that and drop the pool in the next version.
>
>> + u64 start;
>> + u64 end;
>> +};
>> +
>> +static void xe_vma_set_default_attributes(struct xe_vma *vma)
>> +{
>> + vma->attr.preferred_loc.devmem_fd = DRM_XE_PREFERRED_LOC_DEFAULT_DEVICE;
>> + vma->attr.preferred_loc.migration_policy = DRM_XE_MIGRATE_ALL_PAGES;
>> + vma->attr.pat_index = vma->attr.default_pat_index;
>> + vma->attr.atomic_access = DRM_XE_ATOMIC_UNDEFINED;
>> +}
>> +
>> +/**
>> + * xe_vm_madvise_process_unmap - Process munmap for all VMAs in range
>> + * @vm: VM
>> + * @start: Start of unmap range
>> + * @end: End of unmap range
>> + *
>> + * Processes all VMAs overlapping the unmap range. An unmap can span multiple
>> + * VMAs, so we need to loop and process each segment.
>> + *
>> + * Return: 0 on success, negative error otherwise
>> + */
>> +static int xe_vm_madvise_process_unmap(struct xe_vm *vm, u64 start, u64 end)
>> +{
>> + u64 addr = start;
>> + int err;
>> +
>> + lockdep_assert_held_write(&vm->lock);
>> +
>> + if (xe_vm_is_closed_or_banned(vm))
>> + return 0;
>> +
>> + while (addr < end) {
>> + struct xe_vma *vma;
>> + u64 seg_start, seg_end;
>> + bool has_default_attr;
>> +
>> + vma = xe_vm_find_overlapping_vma(vm, addr, end);
>> + if (!vma)
>> + break;
>> +
>> + /* Skip GPU-touched VMAs - SVM handles them */
>> + if (!xe_vma_has_cpu_autoreset_active(vma)) {
>> + addr = xe_vma_end(vma);
>> + continue;
>> + }
>> +
>> + has_default_attr = xe_vma_has_default_mem_attrs(vma);
>> + seg_start = max(addr, xe_vma_start(vma));
>> + seg_end = min(end, xe_vma_end(vma));
>> +
>> + /* Expand for merging if VMA already has default attrs */
>> + if (has_default_attr &&
>> + xe_vma_start(vma) >= start &&
>> + xe_vma_end(vma) <= end) {
>> + seg_start = xe_vma_start(vma);
>> + seg_end = xe_vma_end(vma);
>> + xe_vm_find_cpu_addr_mirror_vma_range(vm, &seg_start, &seg_end);
>> + } else if (xe_vma_start(vma) == seg_start && xe_vma_end(vma) == seg_end) {
>> + xe_vma_set_default_attributes(vma);
>> + addr = seg_end;
>> + continue;
>> + }
>> +
>> + if (xe_vma_start(vma) == seg_start &&
>> + xe_vma_end(vma) == seg_end &&
>> + has_default_attr) {
>> + addr = seg_end;
>> + continue;
>> + }
>> +
>> + err = xe_vm_alloc_cpu_addr_mirror_vma(vm, seg_start, seg_end - seg_start);
>> + if (err) {
>> + if (err == -ENOENT) {
>> + addr = seg_end;
>> + continue;
>> + }
>> + return err;
>> + }
>> +
>> + addr = seg_end;
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +/**
>> + * xe_madvise_work_func - Worker to process unmap
>> + * @w: work_struct
>> + *
>> + * Processes a single unmap by taking vm->lock and calling the helper.
>> + * Each unmap has its own work item, so no interval loss.
>> + */
>> +static void xe_madvise_work_func(struct work_struct *w)
>> +{
>> + struct xe_madvise_work_item *item = container_of(w, struct xe_madvise_work_item, work);
>> + struct xe_vm *vm = item->vm;
>> + int err;
>> +
>> + down_write(&vm->lock);
>> + err = xe_vm_madvise_process_unmap(vm, item->start, item->end);
>> + if (err)
>> + drm_warn(&vm->xe->drm,
>> + "madvise autoreset failed [%#llx-%#llx]: %d\n",
>> + item->start, item->end, err);
>> + /*
>> + * Best-effort: Log failure and continue.
>> + * Core correctness from CPU_AUTORESET_ACTIVE flag.
>> + */
>> + up_write(&vm->lock);
>> + xe_vm_put(vm);
>> + mempool_free(item, item->pool);
>> +}
>> +
>> +/**
>> + * xe_madvise_notifier_callback - MMU notifier callback for CPU munmap
>> + * @mni: mmu_interval_notifier
>> + * @range: mmu_notifier_range
>> + * @cur_seq: current sequence number
>> + *
>> + * Queues work to reset VMA attributes. Cannot take vm->lock (circular locking),
>> + * so uses workqueue. GFP_ATOMIC allocation may fail; drops event if so.
>> + *
>> + * Return: true (never blocks)
>> + */
>> +static bool xe_madvise_notifier_callback(struct mmu_interval_notifier *mni,
>> + const struct mmu_notifier_range *range,
>> + unsigned long cur_seq)
>> +{
>> + struct xe_madvise_notifier *notifier =
>> + container_of(mni, struct xe_madvise_notifier, mmu_notifier);
>> + struct xe_vm *vm = notifier->vm;
>> + struct xe_madvise_work_item *item;
>> + struct workqueue_struct *wq;
>> + mempool_t *pool;
>> + u64 start, end;
>> +
>> + if (range->event != MMU_NOTIFY_UNMAP)
>> + return true;
>> +
>> + /*
>> + * Best-effort: skip in non-blockable contexts to avoid building up work.
>> + * Correctness does not rely on this notifier - CPU_AUTORESET_ACTIVE flag
>> + * prevents GPU PTE zaps on CPU-only VMAs in the zap path.
>> + */
>> + if (!mmu_notifier_range_blockable(range))
>> + return true;
>> +
>> + /* Consume seq (interval-notifier convention) */
>> + mmu_interval_set_seq(mni, cur_seq);
>> +
>> + /* Best-effort: core correctness from CPU_AUTORESET_ACTIVE check in zap path */
>> +
>> + start = max_t(u64, range->start, notifier->vma_start);
>> + end = min_t(u64, range->end, notifier->vma_end);
>> +
>> + if (start >= end)
>> + return true;
>> +
>> + pool = READ_ONCE(vm->svm.madvise_work.pool);
>> + wq = READ_ONCE(vm->svm.madvise_work.wq);
>> + if (!pool || !wq || atomic_read(&vm->svm.madvise_work.closing))
> Can you explain the use of READ_ONCE, xchg, and atomics? At first glance
> it seems unnecessary or overly complicated. Let’s start with the problem
> this is trying to solve and see if we can find a simpler approach.
>
> My initial thought is a VM-wide rwsem, marked as reclaim-safe. The
> notifiers would take it in read mode to check whether the VM is tearing
> down, and the fini path would take it in write mode to initiate
> teardown...
Agreed. This got more complicated than it needs to be. I reworked it to
use a VM-wide rw_semaphore for teardown serialization, so the atomic_t,
READ_ONCE(), and xchg() go away..
>
>> + return true;
>> +
>> + /* GFP_ATOMIC to avoid fs_reclaim lockdep in notifier context */
>> + item = mempool_alloc(pool, GFP_ATOMIC);
> Again, probably just use kmalloc. Also s/GFP_ATOMIC/GFP_NOWAIT. We
> really shouldn’t be using GFP_ATOMIC in Xe per the DRM docs unless a
> failed memory allocation would take down the device. We likely abuse
> GFP_ATOMIC in several places that we should clean up, but in this case
> it’s pretty clear GFP_NOWAIT is what we want, as failure isn’t
> fatal—just sub-optimal.
Agreed. This should be |GFP_NOWAIT|, not |GFP_ATOMIC|. Allocation
failure here is non-fatal, so |GFP_NOWAIT| is the right fit. I willl
switch to |kmalloc(..., GFP_NOWAIT)| and drop the mempool.
>
>> + if (!item)
>> + return true;
>> +
>> + memset(item, 0, sizeof(*item));
>> + INIT_WORK(&item->work, xe_madvise_work_func);
>> + item->vm = xe_vm_get(vm);
>> + item->pool = pool;
>> + item->start = start;
>> + item->end = end;
>> +
>> + if (unlikely(atomic_read(&vm->svm.madvise_work.closing))) {
> Same as above the atomic usage...
Noted, Will remove.
>
>> + xe_vm_put(item->vm);
>> + mempool_free(item, pool);
>> + return true;
>> + }
>> +
>> + queue_work(wq, &item->work);
>> +
>> + return true;
>> +}
>> +
>> +static const struct mmu_interval_notifier_ops xe_madvise_notifier_ops = {
>> + .invalidate = xe_madvise_notifier_callback,
>> +};
>> +
>> +/**
>> + * xe_vm_madvise_init - Initialize madvise notifier infrastructure
>> + * @vm: VM
>> + *
>> + * Sets up workqueue and mempool for async munmap processing.
>> + *
>> + * Return: 0 on success, -ENOMEM on failure
>> + */
>> +int xe_vm_madvise_init(struct xe_vm *vm)
>> +{
>> + struct workqueue_struct *wq;
>> + mempool_t *pool;
>> +
>> + /* Always initialize list and mutex - fini may be called on partial init */
>> + INIT_LIST_HEAD(&vm->svm.madvise_notifiers.list);
>> + mutex_init(&vm->svm.madvise_notifiers.lock);
>> +
>> + wq = READ_ONCE(vm->svm.madvise_work.wq);
>> + pool = READ_ONCE(vm->svm.madvise_work.pool);
>> +
>> + /* Guard against double initialization and detect partial init */
>> + if (wq || pool) {
>> + XE_WARN_ON(!wq || !pool);
>> + return 0;
>> + }
>> +
>> + WRITE_ONCE(vm->svm.madvise_work.wq, NULL);
>> + WRITE_ONCE(vm->svm.madvise_work.pool, NULL);
>> + atomic_set(&vm->svm.madvise_work.closing, 1);
>> +
>> + /*
>> + * WQ_UNBOUND: best-effort optimization, not critical path.
>> + * No WQ_MEM_RECLAIM: worker allocates memory (VMA ops with GFP_KERNEL).
>> + * Not on reclaim path - merely resets attributes after munmap.
>> + */
>> + vm->svm.madvise_work.wq = alloc_workqueue("xe_madvise", WQ_UNBOUND, 0);
>> + if (!vm->svm.madvise_work.wq)
>> + return -ENOMEM;
>> +
>> + /* Mempool for GFP_ATOMIC allocs in notifier callback */
>> + vm->svm.madvise_work.pool =
>> + mempool_create_kmalloc_pool(64,
>> + sizeof(struct xe_madvise_work_item));
>> + if (!vm->svm.madvise_work.pool) {
>> + destroy_workqueue(vm->svm.madvise_work.wq);
>> + WRITE_ONCE(vm->svm.madvise_work.wq, NULL);
>> + return -ENOMEM;
>> + }
>> +
>> + atomic_set(&vm->svm.madvise_work.closing, 0);
>> +
>> + return 0;
>> +}
>> +
>> +/**
>> + * xe_vm_madvise_fini - Cleanup all madvise notifiers
>> + * @vm: VM
>> + *
>> + * Tears down notifiers and drains workqueue. Safe if init partially failed.
>> + * Order: closing flag → remove notifiers (SRCU sync) → drain wq → destroy.
>> + */
>> +void xe_vm_madvise_fini(struct xe_vm *vm)
>> +{
>> + struct xe_madvise_notifier *notifier, *next;
>> + struct workqueue_struct *wq;
>> + mempool_t *pool;
>> + LIST_HEAD(tmp);
>> +
>> + atomic_set(&vm->svm.madvise_work.closing, 1);
>> +
>> + /*
>> + * Detach notifiers under lock, then remove outside lock (SRCU sync can be slow).
>> + * Splice avoids holding mutex across mmu_interval_notifier_remove() SRCU sync.
>> + * Removing notifiers first (before drain) prevents new invalidate callbacks.
>> + */
>> + mutex_lock(&vm->svm.madvise_notifiers.lock);
>> + list_splice_init(&vm->svm.madvise_notifiers.list, &tmp);
>> + mutex_unlock(&vm->svm.madvise_notifiers.lock);
>> +
>> + /* Now remove notifiers without holding lock - mmu_interval_notifier_remove() SRCU-syncs */
>> + list_for_each_entry_safe(notifier, next, &tmp, list) {
>> + list_del(¬ifier->list);
>> + mmu_interval_notifier_remove(¬ifier->mmu_notifier);
>> + xe_vm_put(notifier->vm);
>> + kfree(notifier);
>> + }
>> +
>> + /* Drain and destroy workqueue */
>> + wq = xchg(&vm->svm.madvise_work.wq, NULL);
>> + if (wq) {
>> + drain_workqueue(wq);
> Work items in wq call xe_madvise_work_func, which takes vm->lock in
> write mode. If we try to drain here after the work item executing
> xe_madvise_work_func has started or is queued, I think we could
> deadlock. Lockdep should complain about this if you run a test that
> triggers xe_madvise_work_func at least once — or at least it should. If
> it doesn’t, then workqueues likely have an issue in their lockdep
> implementation as 'drain_workqueue' should touch its lockdep map which
> has tainted vm->lock (i.e., is outside of it).
>
> So perhaps call this function without vm->lock and take as need in the
> this function, then drop it drain the work queue, etc...
Good catch. Draining the workqueue while holding |vm->lock| can deadlock
against a worker that takes |vm->lock|. I fixed that by dropping
|vm->lock| before |xe_vm_madvise_fini()|. In the reworked teardown path,
|drain_workqueue()| runs with neither |vm->lock| nor the teardown
semaphore held.
>
>> + destroy_workqueue(wq);
>> + }
>> +
>> + pool = xchg(&vm->svm.madvise_work.pool, NULL);
>> + if (pool)
>> + mempool_destroy(pool);
>> +}
>> +
>> +/**
>> + * xe_vm_madvise_register_notifier_range - Register MMU notifier for address range
>> + * @vm: VM
>> + * @start: Start address (page-aligned)
>> + * @end: End address (page-aligned)
>> + *
>> + * Registers interval notifier for munmap tracking. Uses addresses (not VMA pointers)
>> + * to avoid UAF after dropping vm->lock. Deduplicates by range.
>> + *
>> + * Return: 0 on success, negative error code on failure
>> + */
>> +int xe_vm_madvise_register_notifier_range(struct xe_vm *vm, u64 start, u64 end)
>> +{
>> + struct xe_madvise_notifier *notifier, *existing;
>> + int err;
>> +
> I see this isn’t called under the vm->lock write lock. Is there a reason
> not to? I think taking it under the write lock would help with the
> teardown sequence, since you wouldn’t be able to get here if
> xe_vm_is_closed_or_banned were stable—and we wouldn’t enter this
> function if that helper returned true.
I can make the closed/banned check stable at the call site under
|vm->lock|, but I don’t think I can hold it across
|mmu_interval_notifier_insert()| itself since that may take |mmap_lock|
internally. I’ll restructure this so the state check happens under
|vm->lock|, while the actual insert remains outside that lock.
>
>> + if (!IS_ALIGNED(start, PAGE_SIZE) || !IS_ALIGNED(end, PAGE_SIZE))
>> + return -EINVAL;
>> +
>> + if (WARN_ON_ONCE(end <= start))
>> + return -EINVAL;
>> +
>> + if (atomic_read(&vm->svm.madvise_work.closing))
>> + return -ENOENT;
>> +
>> + if (!READ_ONCE(vm->svm.madvise_work.wq) ||
>> + !READ_ONCE(vm->svm.madvise_work.pool))
>> + return -ENOMEM;
>> +
>> + /* Check mm early to avoid allocation if it's missing */
>> + if (!vm->svm.gpusvm.mm)
>> + return -EINVAL;
>> +
>> + /* Dedupe: check if notifier exists for this range */
>> + mutex_lock(&vm->svm.madvise_notifiers.lock);
> If we had the vm->lock in write mode we could likely just drop
> svm.madvise_notifiers.lock for now, but once we move to fine grained
> locking in page faults [1] we'd in fact need a dedicated lock. So let's
> keep this.
>
> [1]https://patchwork.freedesktop.org/patch/707238/?series=162167&rev=2
Agreed. We should keep a dedicated lock here.
I donot think |vm->lock| can cover |mmu_interval_notifier_insert()|
itself, since that path may take |mmap_lock| internally and would risk
inverting the existing |mmap_lock -> vm->lock| ordering.
So I will keep |svm.madvise_notifiers.lock| in place. That also lines up
better with the planned fine-grained page-fault locking work.
>
>> + list_for_each_entry(existing, &vm->svm.madvise_notifiers.list, list) {
>> + if (existing->vma_start == start && existing->vma_end == end) {
> This is O(N) which typically isn't ideal. Better structure here? mtree?
> Does an mtree have its own locking so svm.madvise_notifiers.lock could
> just be dropped? I'd look into this.
Agreed. I switched this over to a maple tree, so the exact-range lookup
is no longer O(N). That also lets me drop the list walk in the duplicate
check.
>
>> + mutex_unlock(&vm->svm.madvise_notifiers.lock);
>> + return 0;
>> + }
>> + }
>> + mutex_unlock(&vm->svm.madvise_notifiers.lock);
>> +
>> + notifier = kzalloc(sizeof(*notifier), GFP_KERNEL);
>> + if (!notifier)
>> + return -ENOMEM;
>> +
>> + notifier->vm = xe_vm_get(vm);
>> + notifier->vma_start = start;
>> + notifier->vma_end = end;
>> + INIT_LIST_HEAD(¬ifier->list);
>> +
>> + err = mmu_interval_notifier_insert(¬ifier->mmu_notifier,
>> + vm->svm.gpusvm.mm,
>> + start,
>> + end - start,
>> + &xe_madvise_notifier_ops);
>> + if (err) {
>> + xe_vm_put(notifier->vm);
>> + kfree(notifier);
>> + return err;
>> + }
>> +
>> + /* Re-check closing to avoid teardown race */
>> + if (unlikely(atomic_read(&vm->svm.madvise_work.closing))) {
>> + mmu_interval_notifier_remove(¬ifier->mmu_notifier);
>> + xe_vm_put(notifier->vm);
>> + kfree(notifier);
>> + return -ENOENT;
>> + }
>> +
>> + /* Add to list - check again for concurrent registration race */
>> + mutex_lock(&vm->svm.madvise_notifiers.lock);
> If we had the vm->lock in write mode, we couldn't get concurrent
> registrations.
>
> I likely have more comments, but I have enough concerns with the locking
> and structure in this patch that I’m going to pause reviewing the series
> until most of my comments are addressed. It’s hard to focus on anything
> else until we get these issues worked out.
I think the main issue is exactly the locking story around notifier
insert/remove. We cannot hold |vm->lock| across
|mmu_interval_notifier_insert()| because that may take |mmap_lock|
internally and invert the existing ordering.
I have reworked this to simplify the teardown/registration side: drop
the atomic/READ_ONCE/xchg handling, use a single teardown |rwsem|, and
replace the list-based dedupe with a maple tree.
I will send a cleaned-up version with the locking documented more
clearly. Sorry for the churn here.
Thanks,
Arvind
>
> Matt
>
>> + list_for_each_entry(existing, &vm->svm.madvise_notifiers.list, list) {
>> + if (existing->vma_start == start && existing->vma_end == end) {
>> + mutex_unlock(&vm->svm.madvise_notifiers.lock);
>> + mmu_interval_notifier_remove(¬ifier->mmu_notifier);
>> + xe_vm_put(notifier->vm);
>> + kfree(notifier);
>> + return 0;
>> + }
>> + }
>> + list_add(¬ifier->list, &vm->svm.madvise_notifiers.list);
>> + mutex_unlock(&vm->svm.madvise_notifiers.lock);
>> +
>> + return 0;
>> +}
>> diff --git a/drivers/gpu/drm/xe/xe_vm_madvise.h b/drivers/gpu/drm/xe/xe_vm_madvise.h
>> index b0e1fc445f23..ba9cd7912113 100644
>> --- a/drivers/gpu/drm/xe/xe_vm_madvise.h
>> +++ b/drivers/gpu/drm/xe/xe_vm_madvise.h
>> @@ -6,10 +6,18 @@
>> #ifndef _XE_VM_MADVISE_H_
>> #define _XE_VM_MADVISE_H_
>>
>> +#include <linux/types.h>
>> +
>> struct drm_device;
>> struct drm_file;
>> +struct xe_vm;
>> +struct xe_vma;
>>
>> int xe_vm_madvise_ioctl(struct drm_device *dev, void *data,
>> struct drm_file *file);
>>
>> +int xe_vm_madvise_init(struct xe_vm *vm);
>> +void xe_vm_madvise_fini(struct xe_vm *vm);
>> +int xe_vm_madvise_register_notifier_range(struct xe_vm *vm, u64 start, u64 end);
>> +
>> #endif
>> diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
>> index 29ff63503d4c..eb978995000c 100644
>> --- a/drivers/gpu/drm/xe/xe_vm_types.h
>> +++ b/drivers/gpu/drm/xe/xe_vm_types.h
>> @@ -12,6 +12,7 @@
>>
>> #include <linux/dma-resv.h>
>> #include <linux/kref.h>
>> +#include <linux/mempool.h>
>> #include <linux/mmu_notifier.h>
>> #include <linux/scatterlist.h>
>>
>> @@ -29,6 +30,26 @@ struct xe_user_fence;
>> struct xe_vm;
>> struct xe_vm_pgtable_update_op;
>>
>> +/**
>> + * struct xe_madvise_notifier - CPU madvise notifier for memory attribute reset
>> + *
>> + * Tracks CPU munmap operations on SVM CPU address mirror VMAs.
>> + * When userspace unmaps CPU memory, this notifier processes attribute reset
>> + * via work queue to avoid circular locking (can't take vm->lock in callback).
>> + */
>> +struct xe_madvise_notifier {
>> + /** @mmu_notifier: MMU interval notifier */
>> + struct mmu_interval_notifier mmu_notifier;
>> + /** @vm: VM this notifier belongs to (holds reference via xe_vm_get) */
>> + struct xe_vm *vm;
>> + /** @vma_start: Start address of VMA being tracked */
>> + u64 vma_start;
>> + /** @vma_end: End address of VMA being tracked */
>> + u64 vma_end;
>> + /** @list: Link in vm->svm.madvise_notifiers.list */
>> + struct list_head list;
>> +};
>> +
>> #if IS_ENABLED(CONFIG_DRM_XE_DEBUG)
>> #define TEST_VM_OPS_ERROR
>> #define FORCE_OP_ERROR BIT(31)
>> @@ -212,6 +233,26 @@ struct xe_vm {
>> struct xe_pagemap *pagemaps[XE_MAX_TILES_PER_DEVICE];
>> /** @svm.peer: Used for pagemap connectivity computations. */
>> struct drm_pagemap_peer peer;
>> +
>> + /**
>> + * @svm.madvise_notifiers: Active CPU madvise notifiers
>> + */
>> + struct {
>> + /** @svm.madvise_notifiers.list: List of active notifiers */
>> + struct list_head list;
>> + /** @svm.madvise_notifiers.lock: Protects notifiers list */
>> + struct mutex lock;
>> + } madvise_notifiers;
>> +
>> + /** @svm.madvise_work: Workqueue for async munmap processing */
>> + struct {
>> + /** @svm.madvise_work.wq: Workqueue */
>> + struct workqueue_struct *wq;
>> + /** @svm.madvise_work.pool: Mempool for work items */
>> + mempool_t *pool;
>> + /** @svm.madvise_work.closing: Teardown flag */
>> + atomic_t closing;
>> + } madvise_work;
>> } svm;
>>
>> struct xe_device *xe;
>> --
>> 2.43.0
>>
[-- Attachment #2: Type: text/html, Size: 27452 bytes --]
next prev parent reply other threads:[~2026-03-09 7:07 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-02-19 9:13 [RFC 0/7] drm/xe/svm: Add MMU notifier-based madvise autoreset on munmap Arvind Yadav
2026-02-19 9:13 ` [RFC 1/7] drm/xe/vm: Add CPU_AUTORESET_ACTIVE VMA flag Arvind Yadav
2026-02-19 9:13 ` [RFC 2/7] drm/xe/vm: Preserve CPU_AUTORESET_ACTIVE across GPUVA operations Arvind Yadav
2026-02-19 9:13 ` [RFC 3/7] drm/xe/svm: Clear CPU_AUTORESET_ACTIVE on first GPU fault Arvind Yadav
2026-02-20 20:12 ` Matthew Brost
2026-02-20 22:33 ` Matthew Brost
2026-03-05 3:38 ` Yadav, Arvind
2026-02-19 9:13 ` [RFC 4/7] drm/xe/vm: Add madvise autoreset interval notifier worker infrastructure Arvind Yadav
2026-02-25 23:34 ` Matthew Brost
2026-03-09 7:07 ` Yadav, Arvind [this message]
2026-03-09 9:32 ` Thomas Hellström
2026-03-11 6:34 ` Yadav, Arvind
2026-02-19 9:13 ` [RFC 5/7] drm/xe/vm: Deactivate madvise notifier on GPU touch Arvind Yadav
2026-02-19 9:13 ` [RFC 6/7] drm/xe/vm: Wire MADVISE_AUTORESET notifiers into VM lifecycle Arvind Yadav
2026-02-19 9:13 ` [RFC 7/7] drm/xe/svm: Correct memory attribute reset for partial unmap Arvind Yadav
2026-02-19 9:40 ` ✗ CI.checkpatch: warning for drm/xe/svm: Add MMU notifier-based madvise autoreset on munmap Patchwork
2026-02-19 9:42 ` ✓ CI.KUnit: success " Patchwork
2026-02-19 10:40 ` ✓ Xe.CI.BAT: " Patchwork
2026-02-19 13:04 ` ✗ Xe.CI.FULL: failure " Patchwork
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=70f3a306-15e4-4eb1-82da-74818f35b437@intel.com \
--to=arvind.yadav@intel.com \
--cc=himal.prasad.ghimiray@intel.com \
--cc=intel-xe@lists.freedesktop.org \
--cc=matthew.brost@intel.com \
--cc=thomas.hellstrom@linux.intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox