Re: [RFC 3/7] drm/xe/svm: Clear CPU_AUTORESET_ACTIVE on first GPU fault

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Matthew Brost <matthew.brost@intel.com>
To: Arvind Yadav <arvind.yadav@intel.com>
Cc: <intel-xe@lists.freedesktop.org>,
	<himal.prasad.ghimiray@intel.com>,
	<thomas.hellstrom@linux.intel.com>
Subject: Re: [RFC 3/7] drm/xe/svm: Clear CPU_AUTORESET_ACTIVE on first GPU fault
Date: Fri, 20 Feb 2026 12:12:32 -0800	[thread overview]
Message-ID: <aZjAMO2aGh8cHbYg@lstrano-desk.jf.intel.com> (raw)
In-Reply-To: <20260219091312.796749-4-arvind.yadav@intel.com>

On Thu, Feb 19, 2026 at 02:43:08PM +0530, Arvind Yadav wrote:
> Clear XE_VMA_CPU_AUTORESET_ACTIVE before installing GPU PTEs for CPU
> address mirror VMAs.
> 
> This marks the one-way transition from CPU-only to GPU-touched so munmap
> handling can switch from the MADVISE autoreset notifier to the existing
> SVM notifier.
> 
> Cc: Matthew Brost <matthew.brost@intel.com>
> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
> Signed-off-by: Arvind Yadav <arvind.yadav@intel.com>
> ---
>  drivers/gpu/drm/xe/xe_svm.c | 10 ++++++++++
>  drivers/gpu/drm/xe/xe_vm.h  | 11 +++++++++++
>  2 files changed, 21 insertions(+)
> 
> diff --git a/drivers/gpu/drm/xe/xe_svm.c b/drivers/gpu/drm/xe/xe_svm.c
> index cda3bf7e2418..b9dbbb245779 100644
> --- a/drivers/gpu/drm/xe/xe_svm.c
> +++ b/drivers/gpu/drm/xe/xe_svm.c
> @@ -1209,6 +1209,9 @@ static int __xe_svm_handle_pagefault(struct xe_vm *vm, struct xe_vma *vma,
>  	lockdep_assert_held_write(&vm->lock);
>  	xe_assert(vm->xe, xe_vma_is_cpu_addr_mirror(vma));
>  
> +	/* Invariant: CPU_AUTORESET_ACTIVE cleared before reaching here. */
> +	WARN_ON_ONCE(xe_vma_has_cpu_autoreset_active(vma));
> +
>  	xe_gt_stats_incr(gt, XE_GT_STATS_ID_SVM_PAGEFAULT_COUNT, 1);
>  
>  retry:
> @@ -1360,6 +1363,13 @@ int xe_svm_handle_pagefault(struct xe_vm *vm, struct xe_vma *vma,
>  			    bool atomic)
>  {
>  	int need_vram, ret;
> +
> +	lockdep_assert_held_write(&vm->lock);
> +
> +	/* Transition CPU-only -> GPU-touched before installing PTEs. */
> +	if (xe_vma_has_cpu_autoreset_active(vma))
> +		xe_vma_gpu_touch(vma);

I don’t think this will work going forward. I plan on making the fault
handler run under vm->lock in read mode [1], and VMA state will only be
allowed to be modified under the lockdep constraints in [2], which are
vm->lock in write mode or vm->lock in read mode plus the garbage
collector lock. Maybe this is fine for now, and we can rework it once
[1] and [2] land—most likely by taking the garbage collector mutex
introduced in [1] before touching this VMA’s flags.

Another issue is what happens if we don’t want to taint the VMA unless
we actually fault in a range. It is valid to not find a range to fault
in if this is a prefetch, as that fault just gets suppressed on the
device. So at a minimum, this needs to be moved to where the function
returns zero (i.e., by the out label).

[1] https://gitlab.freedesktop.org/mbrost/xe-kernel-driver-svn-perf-6-15-2025/-/commit/08fa2b95800583e804a91caf477f9c30b3440a33#33e7d2d9323cd529c8d587d9d3801e353439d783_181_179
[2] https://gitlab.freedesktop.org/mbrost/xe-kernel-driver-svn-perf-6-15-2025/-/commit/08fa2b95800583e804a91caf477f9c30b3440a33#71bf077daec46f3ebd785235e8ecc786681aff99_1112_1128


> +
>  retry:
>  	need_vram = xe_vma_need_vram_for_atomic(vm->xe, vma, atomic);
>  	if (need_vram < 0)
> diff --git a/drivers/gpu/drm/xe/xe_vm.h b/drivers/gpu/drm/xe/xe_vm.h
> index 7bf400f068ce..3dc549550c91 100644
> --- a/drivers/gpu/drm/xe/xe_vm.h
> +++ b/drivers/gpu/drm/xe/xe_vm.h
> @@ -423,4 +423,15 @@ static inline struct drm_exec *xe_vm_validation_exec(struct xe_vm *vm)
>  	((READ_ONCE(tile_present) & ~READ_ONCE(tile_invalidated)) & BIT((tile)->id))
>  
>  void xe_vma_mem_attr_copy(struct xe_vma_mem_attr *to, struct xe_vma_mem_attr *from);
> +
> +/**
> + * xe_vma_gpu_touch() - Mark VMA as GPU-touched
> + * @vma: VMA to mark
> + *
> + * Clear XE_VMA_CPU_AUTORESET_ACTIVE. Must be done before first GPU PTE install.
> + */
> +static inline void xe_vma_gpu_touch(struct xe_vma *vma)
> +{
> +	vma->gpuva.flags &= ~XE_VMA_CPU_AUTORESET_ACTIVE;

Thinking out loud — not strictly related to your series, but I think we
should route all accesses to vma->gpuva.flags through helpers with
lockdep annotations to prove we aren’t violating the rules I mentioned
above (right now this would just require vm->lock in write mode).

Perhaps when I post [1] and [2], I can clean all of that up, or if you
want to transition all access to vma->gpuva.flags to helpers (e.g.,
xe_vma_write_flags(vma, mask), xe_vma_read_flags(vma, mask),
xe_vma_clear_flags(vma, mask)), I wouldn’t complain.

Matt

> +}
>  #endif
> -- 
> 2.43.0
>

next prev parent reply	other threads:[~2026-02-20 20:12 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-19  9:13 [RFC 0/7] drm/xe/svm: Add MMU notifier-based madvise autoreset on munmap Arvind Yadav
2026-02-19  9:13 ` [RFC 1/7] drm/xe/vm: Add CPU_AUTORESET_ACTIVE VMA flag Arvind Yadav
2026-02-19  9:13 ` [RFC 2/7] drm/xe/vm: Preserve CPU_AUTORESET_ACTIVE across GPUVA operations Arvind Yadav
2026-02-19  9:13 ` [RFC 3/7] drm/xe/svm: Clear CPU_AUTORESET_ACTIVE on first GPU fault Arvind Yadav
2026-02-20 20:12   ` Matthew Brost [this message]
2026-02-20 22:33     ` Matthew Brost
2026-03-05  3:38       ` Yadav, Arvind
2026-02-19  9:13 ` [RFC 4/7] drm/xe/vm: Add madvise autoreset interval notifier worker infrastructure Arvind Yadav
2026-02-25 23:34   ` Matthew Brost
2026-03-09  7:07     ` Yadav, Arvind
2026-03-09  9:32       ` Thomas Hellström
2026-03-11  6:34         ` Yadav, Arvind
2026-02-19  9:13 ` [RFC 5/7] drm/xe/vm: Deactivate madvise notifier on GPU touch Arvind Yadav
2026-02-19  9:13 ` [RFC 6/7] drm/xe/vm: Wire MADVISE_AUTORESET notifiers into VM lifecycle Arvind Yadav
2026-02-19  9:13 ` [RFC 7/7] drm/xe/svm: Correct memory attribute reset for partial unmap Arvind Yadav
2026-02-19  9:40 ` ✗ CI.checkpatch: warning for drm/xe/svm: Add MMU notifier-based madvise autoreset on munmap Patchwork
2026-02-19  9:42 ` ✓ CI.KUnit: success " Patchwork
2026-02-19 10:40 ` ✓ Xe.CI.BAT: " Patchwork
2026-02-19 13:04 ` ✗ Xe.CI.FULL: failure " Patchwork

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aZjAMO2aGh8cHbYg@lstrano-desk.jf.intel.com \
    --to=matthew.brost@intel.com \
    --cc=arvind.yadav@intel.com \
    --cc=himal.prasad.ghimiray@intel.com \
    --cc=intel-xe@lists.freedesktop.org \
    --cc=thomas.hellstrom@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.