AMD-GFX Archive on lore.kernel.org
From: "Chen, Xiaogang" <xiaogang.chen@amd.com>
To: Philip Yang <Philip.Yang@amd.com>, amd-gfx@lists.freedesktop.org
Cc: Felix.Kuehling@amd.com, harish.kasiviswanathan@amd.com
Subject: Re: [PATCH v2 3/3] drm/amdkfd: Don't stuck in svm restore worker
Date: Fri, 3 Oct 2025 14:10:43 -0500
Message-ID: <d6a70010-cf58-4e9b-9980-ed1e45fee891@amd.com>
In-Reply-To: <20251003181518.24270-3-Philip.Yang@amd.com>


Reviewed-by: Xiaogang Chen <Xiaogang.Chen@amd.com>

On 10/3/2025 1:15 PM, Philip Yang wrote:
> If the vma is not found, the application has freed the memory using
> madvise MADV_FREE, but the driver did not receive the unmap from the CPU
> MMU notifier callback, so the memory is still mapped on the GPUs. The svm
> restore work then reschedules itself to retry forever, the user queues
> are never resumed, and the application hangs waiting for the queues to
> finish.
>
> The svm restore work should unmap the memory range from the GPUs and then
> resume the queues. If a GPU page fault then happens on the unmapped
> address, it is an application use-after-free bug.
>
> Signed-off-by: Philip Yang <Philip.Yang@amd.com>
> ---
>   drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 75 ++++++++++++++--------------
>   1 file changed, 38 insertions(+), 37 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> index 0aadd20be56a..e87c9b3533b9 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> @@ -1708,50 +1708,51 @@ static int svm_range_validate_and_map(struct mm_struct *mm,
>   		bool readonly;
>   
>   		vma = vma_lookup(mm, addr);
> -		if (vma) {
> -			readonly = !(vma->vm_flags & VM_WRITE);
> +		next = vma ? min(vma->vm_end, end) : end;
> +		npages = (next - addr) >> PAGE_SHIFT;
>   
> -			next = min(vma->vm_end, end);
> -			npages = (next - addr) >> PAGE_SHIFT;
> +		if (!vma || !(vma->vm_flags & VM_READ)) {
>   			/* HMM requires at least READ permissions. If provided with PROT_NONE,
>   			 * unmap the memory. If it's not already mapped, this is a no-op
>   			 * If PROT_WRITE is provided without READ, warn first then unmap
> +			 * If vma is not found, addr is invalid, unmap from GPUs
>   			 */
> -			if (!(vma->vm_flags & VM_READ)) {
> -				unsigned long e, s;
> -
> -				svm_range_lock(prange);
> -				if (vma->vm_flags & VM_WRITE)
> -					pr_debug("VM_WRITE without VM_READ is not supported");
> -				s = max(addr >> PAGE_SHIFT, prange->start);
> -				e = s + npages - 1;
> -				r = svm_range_unmap_from_gpus(prange, s, e,
> -						       KFD_SVM_UNMAP_TRIGGER_UNMAP_FROM_CPU);
> -				svm_range_unlock(prange);
> -				/* If unmap returns non-zero, we'll bail on the next for loop
> -				 * iteration, so just leave r and continue
> -				 */
> -				addr = next;
> -				continue;
> -			}
> +			unsigned long e, s;
> +
> +			svm_range_lock(prange);
> +			if (!vma)
> +				pr_debug("vma not found\n");
> +			else if (vma->vm_flags & VM_WRITE)
> +				pr_debug("VM_WRITE without VM_READ is not supported");
> +
> +			s = max(addr >> PAGE_SHIFT, prange->start);
> +			e = s + npages - 1;
> +			r = svm_range_unmap_from_gpus(prange, s, e,
> +					       KFD_SVM_UNMAP_TRIGGER_UNMAP_FROM_CPU);
> +			svm_range_unlock(prange);
> +			/* If unmap returns non-zero, we'll bail on the next for loop
> +			 * iteration, so just leave r and continue
> +			 */
> +			addr = next;
> +			continue;
> +		}
>   
> -			hmm_range = kzalloc(sizeof(*hmm_range), GFP_KERNEL);
> -			if (unlikely(!hmm_range)) {
> -				r = -ENOMEM;
> -			} else {
> -				WRITE_ONCE(p->svms.faulting_task, current);
> -				r = amdgpu_hmm_range_get_pages(&prange->notifier, addr, npages,
> -							       readonly, owner,
> -							       hmm_range);
> -				WRITE_ONCE(p->svms.faulting_task, NULL);
> -				if (r) {
> -					kfree(hmm_range);
> -					hmm_range = NULL;
> -					pr_debug("failed %d to get svm range pages\n", r);
> -				}
> -			}
> +		readonly = !(vma->vm_flags & VM_WRITE);
> +
> +		hmm_range = kzalloc(sizeof(*hmm_range), GFP_KERNEL);
> +		if (unlikely(!hmm_range)) {
> +			r = -ENOMEM;
>   		} else {
> -			r = -EFAULT;
> +			WRITE_ONCE(p->svms.faulting_task, current);
> +			r = amdgpu_hmm_range_get_pages(&prange->notifier, addr, npages,
> +						       readonly, owner,
> +						       hmm_range);
> +			WRITE_ONCE(p->svms.faulting_task, NULL);
> +			if (r) {
> +				kfree(hmm_range);
> +				hmm_range = NULL;
> +				pr_debug("failed %d to get svm range pages\n", r);
> +			}
>   		}
>   
>   		if (!r) {
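
For readers following the control-flow change, here is a minimal userspace
sketch of the restructured loop. It is not the driver code. Every sketch_*
name, the flag values, and the fixed 4 KiB page size are assumptions made
for illustration, and addresses are assumed to be page aligned. It only
counts pages, to show that a missing vma now takes the same "unmap and
continue" path as a vma without read permission, instead of returning
-EFAULT.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel definitions; values are illustrative. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_VM_READ    0x1UL
#define SKETCH_VM_WRITE   0x2UL

struct sketch_vma {
	unsigned long vm_start;   /* inclusive */
	unsigned long vm_end;     /* exclusive */
	unsigned long vm_flags;
};

struct sketch_stats {
	unsigned long pages_mapped;    /* pages that would go on to HMM and GPU mapping */
	unsigned long pages_unmapped;  /* pages that would be unmapped from the GPUs */
};

/* Linear search standing in for vma_lookup(): NULL when addr is in no vma. */
static const struct sketch_vma *
sketch_vma_lookup(const struct sketch_vma *vmas, size_t n, unsigned long addr)
{
	for (size_t i = 0; i < n; i++)
		if (addr >= vmas[i].vm_start && addr < vmas[i].vm_end)
			return &vmas[i];
	return NULL;
}

/* Walk [start, end) the way the patched loop does. */
static struct sketch_stats
sketch_validate_and_map(const struct sketch_vma *vmas, size_t n,
			unsigned long start, unsigned long end)
{
	struct sketch_stats st = { 0, 0 };

	assert(start <= end);

	for (unsigned long addr = start; addr < end; ) {
		const struct sketch_vma *vma = sketch_vma_lookup(vmas, n, addr);
		/* Without a vma, the rest of the range is handled in one step. */
		unsigned long next = (vma && vma->vm_end < end) ? vma->vm_end : end;
		unsigned long npages = (next - addr) >> SKETCH_PAGE_SHIFT;

		if (!vma || !(vma->vm_flags & SKETCH_VM_READ)) {
			/* Missing vma or no READ permission: unmap, then move on. */
			st.pages_unmapped += npages;
			addr = next;
			continue;
		}

		/* Readable vma: the real code fetches pages and maps them here. */
		st.pages_mapped += npages;
		addr = next;
	}
	return st;
}
```

Walking 0x1000..0x5000 with a single readable vma covering 0x1000..0x3000
counts two pages as mapped and two as unmapped. Because the loop always
advances addr, the restore worker has nothing left to retry.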
