All of lore.kernel.org
 help / color / mirror / Atom feed
From: sashiko-bot@kernel.org
To: "Honglei Huang" <honglei1.huang@amd.com>
Cc: dri-devel@lists.freedesktop.org
Subject: Re: [PATCH v5 1/5] drm/gpusvm: split MM state flags out of drm_gpusvm_pages_flags
Date: Sat, 27 Jun 2026 04:10:35 +0000	[thread overview]
Message-ID: <20260627041035.DF0091F000E9@smtp.kernel.org> (raw)
In-Reply-To: <20260627035418.3796481-1-honglei1.huang@amd.com>

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [High] The page fault handler silently drops the `-EACCES` error and incorrectly returns 0 (success) when migration to device memory is not permitted, leading to an infinite page fault loop.
- [Medium] Concurrent lockless read of `migrate_devmem` races with bitfield update of `unmapped`, causing undefined behavior (KCSAN warning).
--

commit d82186ac8e065d21c32eea3f706b65fa120ad8c5
Author: Honglei Huang <honghuan@amd.com>

drm/gpusvm: split MM state flags out of drm_gpusvm_pages_flags

This commit splits the MM state flags (like unmapped and partial_unmap) out
of drm_gpusvm_pages_flags into a new drm_gpusvm_range_flags struct. It also
updates drivers to check these new range-level flags to avoid handling
unmapped pages.

Link: https://gitlab.freedesktop.org/mbrost/xe-kernel-driver-svn-perf-6-15-2025/-/commit/623f6a50c037d9e44f6c9fbe6859a0ba7ad50177

> diff --git a/drivers/gpu/drm/drm_gpusvm.c b/drivers/gpu/drm/drm_gpusvm.c
> index 958cb605aedd5..df900553f21a4 100644
> --- a/drivers/gpu/drm/drm_gpusvm.c
> +++ b/drivers/gpu/drm/drm_gpusvm.c
[ ... ]
> @@ -1784,20 +1784,35 @@ EXPORT_SYMBOL_GPL(drm_gpusvm_has_mapping);
>  /**
>   * drm_gpusvm_range_set_unmapped() - Mark a GPU SVM range as unmapped
>   * @range: Pointer to the GPU SVM range structure.
> + * @pages: Pointer to the GPU SVM pages structure(s).
> + * @pages_count: Number of GPU SVM pages structure(s) passed in.
>   * @mmu_range: Pointer to the MMU notifier range structure.
>   *
>   * This function marks a GPU SVM range as unmapped and sets the partial_unmap flag
>   * if the range partially falls within the provided MMU notifier range.
>   */
>  void drm_gpusvm_range_set_unmapped(struct drm_gpusvm_range *range,
> +				   struct drm_gpusvm_pages *pages,
> +				   unsigned int pages_count,
>  				   const struct mmu_notifier_range *mmu_range)
>  {
> +	unsigned int i;
> +
>  	lockdep_assert_held_write(&range->gpusvm->notifier_lock);
>  
> -	range->pages.flags.unmapped = true;
> +	range->flags.unmapped = true;

[Severity: Medium]
Will this direct bitfield assignment cause a data race?

This assignment generates a read-modify-write on the shared u16 word. If a
concurrent GPU page fault triggers __xe_svm_handle_pagefault() and does a
lockless read of range->base.flags.migrate_devmem, it can race with this
update and trigger undefined behavior.

The patch correctly uses a WRITE_ONCE on a union for pages[i].flags just
below this to prevent this exact issue. Is the same union and WRITE_ONCE
pattern needed here for range->flags?

> +	for (i = 0; i < pages_count; ++i) {
> +		struct drm_gpusvm_pages_flags flags = {
> +			.__flags = pages[i].flags.__flags,
> +		};
> +
> +		flags.unmapped = true;
> +		/* WRITE_ONCE pairs with READ_ONCE for opportunistic checks */
> +		WRITE_ONCE(pages[i].flags.__flags, flags.__flags);
> +	}
[ ... ]
> diff --git a/drivers/gpu/drm/xe/xe_svm.c b/drivers/gpu/drm/xe/xe_svm.c
> index b1e1ac26c66d9..63da149f3b7fd 100644
> --- a/drivers/gpu/drm/xe/xe_svm.c
> +++ b/drivers/gpu/drm/xe/xe_svm.c
[ ... ]
> @@ -1248,8 +1249,10 @@ static int __xe_svm_handle_pagefault(struct xe_vm *vm, struct xe_vma *vma,
>  
>  	xe_svm_range_fault_count_stats_incr(gt, range);
>  
> -	if (ctx.devmem_only && !range->base.pages.flags.migrate_devmem)
> -		return -EACCES;
> +	if (ctx.devmem_only && !range->base.flags.migrate_devmem) {
> +		err = -EACCES;
> +		goto out;
> +	}

[Severity: High]
Does this silently drop the -EACCES error?

Looking at the rest of __xe_svm_handle_pagefault(), the out label
unconditionally returns 0:

out:
	xe_svm_range_fault_us_stats_incr(gt, range, start);
	return 0;

err_out:
	if (err == -EAGAIN) {

Jumping to out instead of returning the error (or jumping to err_out) signals
incorrect success for the unhandled GPU page fault. Could this lead to an
infinite page fault loop where the hardware continually retries and faults
again on the same address?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260627035418.3796481-1-honglei1.huang@amd.com?part=1

      parent reply	other threads:[~2026-06-27  4:10 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-27  3:54 [PATCH v5 1/5] drm/gpusvm: split MM state flags out of drm_gpusvm_pages_flags Honglei Huang
2026-06-27  3:54 ` [PATCH v5 2/5] drm/gpusvm: embed struct drm_device into drm_gpusvm_pages Honglei Huang
2026-06-27  4:07   ` sashiko-bot
2026-06-27  3:54 ` [PATCH v5 3/5] drm/xe: have xe_svm_range embed one drm_gpusvm_pages Honglei Huang
2026-06-27  4:16   ` sashiko-bot
2026-06-27  3:54 ` [PATCH v5 4/5] drm/gpusvm: move struct drm_gpusvm_pages out of struct drm_gpusvm_range Honglei Huang
2026-06-27  4:33   ` sashiko-bot
2026-06-27  3:54 ` [PATCH v5 5/5] drm/gpusvm: let the drm_gpusvm core context purely MM level Honglei Huang
2026-06-27  3:54 ` [PATCH v5 0/5] drm/gpusvm: split MM and device state across gpusvm/range/pages Honglei Huang
2026-06-27  4:10 ` sashiko-bot [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260627041035.DF0091F000E9@smtp.kernel.org \
    --to=sashiko-bot@kernel.org \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=honglei1.huang@amd.com \
    --cc=sashiko-reviews@lists.linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.