AMD-GFX Archive on lore.kernel.org
From: Philip Yang <Philip.Yang@amd.com>
To: <amd-gfx@lists.freedesktop.org>
Cc: <Felix.Kuehling@amd.com>, <harish.kasiviswanathan@amd.com>,
	Philip Yang <Philip.Yang@amd.com>
Subject: [PATCH 3/3] drm/amdkfd: Don't stuck in svm restore worker
Date: Thu, 2 Oct 2025 13:43:07 -0400
Message-ID: <20251002174307.10583-3-Philip.Yang@amd.com>
In-Reply-To: <20251002174307.10583-1-Philip.Yang@amd.com>

If the vma is not found, the application has freed the memory using
madvise(MADV_FREE), but the driver did not receive an unmap from the CPU
MMU notifier callback, so the memory is still mapped on the GPUs. The
svm restore work then reschedules itself and retries forever. As a
result, the user queues are never resumed and the application hangs
waiting for the queues to finish.

The svm restore work should instead unmap the memory range from the
GPUs and then resume the queues. If a GPU page fault later happens on
the unmapped address, it is an application use-after-free bug.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 78 ++++++++++++++--------------
 1 file changed, 40 insertions(+), 38 deletions(-)
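
For review purposes, the control-flow change can be modelled in user
space. The following is only a sketch under stated assumptions, not the
driver code: every model_* name is hypothetical, PAGE_SHIFT is assumed
to be 12, and the HMM path is reduced to a page counter. With
patched == false a missing vma yields -EFAULT (the removed else branch),
which according to the commit message is what makes the restore worker
retry forever. With patched == true the missing vma takes the same unmap
path as a vma without VM_READ, and the loop runs to completion.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define PAGE_SHIFT 12          /* assumption: 4 KiB pages */
#define VM_READ    0x1UL
#define VM_WRITE   0x2UL

/* Hypothetical stand-in for a vma covering [vm_start, vm_end). */
struct model_vma {
	unsigned long vm_start;
	unsigned long vm_end;
	unsigned long vm_flags;
};

/* What the modelled loop did; all fields are illustrative only. */
struct model_result {
	int err;                    /* 0 or a negative errno */
	unsigned long pages_mapped; /* pages that reached the HMM path */
	unsigned int unmap_calls;   /* times the unmap path was taken */
	unsigned long unmap_first;  /* first page of the last unmap */
	unsigned long unmap_last;   /* last page of the last unmap */
};

static unsigned long max_ul(unsigned long a, unsigned long b)
{
	return a > b ? a : b;
}

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/* Linear search standing in for vma_lookup(); NULL if addr is unmapped. */
static const struct model_vma *
model_vma_lookup(const struct model_vma *vmas, size_t n, unsigned long addr)
{
	for (size_t i = 0; i < n; i++)
		if (addr >= vmas[i].vm_start && addr < vmas[i].vm_end)
			return &vmas[i];
	return NULL;
}

/*
 * Walk [start, end) in bytes. prange_start and prange_last are page
 * numbers, as in the diff. The unmap window is clamped from start/end,
 * not from addr/next, mirroring the s/e computation in the patch.
 */
struct model_result
model_validate(const struct model_vma *vmas, size_t n,
	       unsigned long start, unsigned long end,
	       unsigned long prange_start, unsigned long prange_last,
	       bool patched)
{
	struct model_result res = {0};
	unsigned long addr = start;

	while (!res.err && addr < end) {
		const struct model_vma *vma = model_vma_lookup(vmas, n, addr);
		unsigned long next = vma ? min_ul(vma->vm_end, end) : end;

		if (!vma && !patched) {
			/* Old behaviour: fail, the caller retries later. */
			res.err = -EFAULT;
			break;
		}

		if (!vma || !(vma->vm_flags & VM_READ)) {
			unsigned long s = max_ul(start >> PAGE_SHIFT, prange_start);
			unsigned long e = min_ul((end - 1) >> PAGE_SHIFT, prange_last);

			if (e >= s) {
				res.unmap_calls++;
				res.unmap_first = s;
				res.unmap_last = e;
			}
			addr = next;
			continue;
		}

		res.pages_mapped += (next - addr) >> PAGE_SHIFT;
		addr = next;
	}
	return res;
}
```

As a usage example, take a single readable vma covering
[0x1000, 0x3000) and a range of [0x1000, 0x5000) with prange pages 1..4.
The model returns err == -EFAULT when patched is false. When patched is
true it returns err == 0, with two pages sent to the HMM path and one
unmap covering pages 1..4.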

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 742c28833650..608a25c6c865 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -1708,51 +1708,53 @@ static int svm_range_validate_and_map(struct mm_struct *mm,
 		bool readonly;
 
 		vma = vma_lookup(mm, addr);
-		if (vma) {
-			readonly = !(vma->vm_flags & VM_WRITE);
+		next = vma ? min(vma->vm_end, end) : end;
 
-			next = min(vma->vm_end, end);
-			npages = (next - addr) >> PAGE_SHIFT;
+		if (!vma || !(vma->vm_flags & VM_READ)) {
 			/* HMM requires at least READ permissions. If provided with PROT_NONE,
 			 * unmap the memory. If it's not already mapped, this is a no-op
 			 * If PROT_WRITE is provided without READ, warn first then unmap
+			 * If vma is not found, addr is invalid, unmap from GPUs
 			 */
-			if (!(vma->vm_flags & VM_READ)) {
-				unsigned long e, s;
-
-				svm_range_lock(prange);
-				if (vma->vm_flags & VM_WRITE)
-					pr_debug("VM_WRITE without VM_READ is not supported");
-				s = max(start >> PAGE_SHIFT, prange->start);
-				e = min((end - 1) >> PAGE_SHIFT, prange->last);
-				if (e >= s)
-					r = svm_range_unmap_from_gpus(prange, s, e,
-						       KFD_SVM_UNMAP_TRIGGER_UNMAP_FROM_CPU);
-				svm_range_unlock(prange);
-				/* If unmap returns non-zero, we'll bail on the next for loop
-				 * iteration, so just leave r and continue
-				 */
-				addr = next;
-				continue;
-			}
+			unsigned long e, s;
+
+			svm_range_lock(prange);
+
+			if (!vma)
+				pr_debug("vma not found\n");
+			else if (vma->vm_flags & VM_WRITE)
+				pr_debug("VM_WRITE without VM_READ is not supported");
+
+			s = max(start >> PAGE_SHIFT, prange->start);
+			e = min((end - 1) >> PAGE_SHIFT, prange->last);
+			if (e >= s)
+				r = svm_range_unmap_from_gpus(prange, s, e,
+					       KFD_SVM_UNMAP_TRIGGER_UNMAP_FROM_CPU);
+			svm_range_unlock(prange);
+			/* If unmap returns non-zero, we'll bail on the next for loop
+			 * iteration, so just leave r and continue
+			 */
+			addr = next;
+			continue;
+		}
 
-			hmm_range = kzalloc(sizeof(*hmm_range), GFP_KERNEL);
-			if (unlikely(!hmm_range)) {
-				r = -ENOMEM;
-			} else {
-				WRITE_ONCE(p->svms.faulting_task, current);
-				r = amdgpu_hmm_range_get_pages(&prange->notifier, addr, npages,
-							       readonly, owner,
-							       hmm_range);
-				WRITE_ONCE(p->svms.faulting_task, NULL);
-				if (r) {
-					kfree(hmm_range);
-					hmm_range = NULL;
-					pr_debug("failed %d to get svm range pages\n", r);
-				}
-			}
+		readonly = !(vma->vm_flags & VM_WRITE);
+		npages = (next - addr) >> PAGE_SHIFT;
+
+		hmm_range = kzalloc(sizeof(*hmm_range), GFP_KERNEL);
+		if (unlikely(!hmm_range)) {
+			r = -ENOMEM;
 		} else {
-			r = -EFAULT;
+			WRITE_ONCE(p->svms.faulting_task, current);
+			r = amdgpu_hmm_range_get_pages(&prange->notifier, addr, npages,
+						       readonly, owner,
+						       hmm_range);
+			WRITE_ONCE(p->svms.faulting_task, NULL);
+			if (r) {
+				kfree(hmm_range);
+				hmm_range = NULL;
+				pr_debug("failed %d to get svm range pages\n", r);
+			}
 		}
 
 		if (!r) {
-- 
2.49.0



Thread overview: 8+ messages
2025-10-02 17:43 [PATCH 1/3] drm/amdgpu: svm check hmm range kzalloc return NULL Philip Yang
2025-10-02 17:43 ` [PATCH 2/3] drm/amdkfd: svm unmap use page aligned address Philip Yang
2025-10-02 22:04   ` Chen, Xiaogang
2025-10-03 15:03     ` Philip Yang
2025-10-02 17:43 ` [PATCH 3/3] drm/amdkfd: Don't stuck in svm restore worker Philip Yang [this message]
2025-10-02 21:48 ` [PATCH 1/3] drm/amdgpu: svm check hmm range kzalloc return NULL Chen, Xiaogang
2025-10-03 15:12   ` Philip Yang
2025-10-03 15:39     ` Chen, Xiaogang
