From: Philip Yang <Philip.Yang@amd.com>
To: <amd-gfx@lists.freedesktop.org>
Cc: <Felix.Kuehling@amd.com>, Philip Yang <Philip.Yang@amd.com>
Subject: [PATCH] drm/amdkfd: Fix svm_bo and vram page refcount
Date: Fri, 26 Sep 2025 17:03:30 -0400
Message-ID: <20250926210331.17401-1-Philip.Yang@amd.com>
zone_device_page_init uses set_page_count to set the vram page refcount
to 1. This races with the CPU page fault handler if step 2 below happens
between steps 1 and 3:

1. The CPU page fault handler gets the vram page and migrates it to a
   system page.
2. A GPU page fault migrates to the same vram page and sets the page
   refcount to 1.
3. The CPU page fault handler puts the vram page; the page refcount
   drops to 0, which reduces the vram_bo refcount.
4. The vram_bo refcount is now off by one because the vram page is
   still in use.

Afterwards, this causes a use-after-free bug and a page refcount
warning.

zone_device_page_init should not be used in page migration. Use
get_page instead to fix the race.

Add a WARN_ONCE to report this issue early, because the refcount bug is
hard to investigate.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
---
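Not part of the commit: the race in steps 1 to 4 can be replayed with a
small userspace sketch. Everything named toy_* below is hypothetical and
only models the counters; it is not the kernel's struct page, the svm_bo
kref, or the real migration path, and it ignores locking and the old
range's svm_bo. The overwrite flag stands in for set_page_count(page, 1),
and the increment branch stands in for get_page().

```c
#include <assert.h>

/* Toy stand-ins for the svm_bo kref and the vram page refcount. */
struct toy_bo { int kref; };
struct toy_page { int refcount; struct toy_bo *bo; };
struct toy_result { int page_refcount; int bo_kref; };

/* Drop one page reference. When the count reaches 0 the page gives up
 * its bo reference, the role the commit message assigns to the page
 * free callback. */
static void toy_put_page(struct toy_page *page)
{
	assert(page->refcount > 0);
	if (--page->refcount == 0 && page->bo)
		page->bo->kref--;
}

/* Hand the page to a new migration to VRAM (step 2).
 * overwrite != 0 models set_page_count(page, 1);
 * overwrite == 0 models get_page(page). */
static void toy_get_vram_page(struct toy_page *page, struct toy_bo *bo,
			      int overwrite)
{
	bo->kref++;		/* the page takes one bo reference */
	page->bo = bo;
	if (overwrite)
		page->refcount = 1;
	else
		page->refcount++;
}

/* Replay steps 1 to 3 and report the counters afterwards. */
static struct toy_result toy_replay_race(int overwrite)
{
	struct toy_bo new_bo = { .kref = 1 };	/* held by the vram node */
	/* Step 1: the CPU fault handler holds the only page reference. */
	struct toy_page page = { .refcount = 1, .bo = 0 };
	struct toy_result res;

	toy_get_vram_page(&page, &new_bo, overwrite);	/* step 2 */
	toy_put_page(&page);				/* step 3 */

	res.page_refcount = page.refcount;
	res.bo_kref = new_bo.kref;
	return res;
}
```

In this model, toy_replay_race(1) ends with a page refcount of 0 and a bo
kref of 1 although the page was just handed to the new migration, which
is the off-by-one state of step 4. toy_replay_race(0) ends with a page
refcount of 1 and a bo kref of 2, so the page's bo reference survives the
CPU fault handler's put.
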
drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 14 +++++++++++++-
1 file changed, 13 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
index d10c6673f4de..15ab2db4af1d 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
@@ -217,7 +217,8 @@ svm_migrate_get_vram_page(struct svm_range *prange, unsigned long pfn)
page = pfn_to_page(pfn);
svm_range_bo_ref(prange->svm_bo);
page->zone_device_data = prange->svm_bo;
- zone_device_page_init(page);
+ get_page(page);
+ lock_page(page);
}
static void
@@ -552,6 +553,17 @@ svm_migrate_ram_to_vram(struct svm_range *prange, uint32_t best_loc,
if (mpages) {
prange->actual_loc = best_loc;
prange->vram_pages += mpages;
+ /*
+ * To guarantee that we hold the correct page refcount for all prange
+ * vram pages and the correct svm_bo refcount:
+ * After the prange is migrated to VRAM, each vram page refcount holds
+ * one svm_bo refcount, and the vram node holds one refcount.
+ * After a page is migrated to system memory, the vram page refcount
+ * drops to 0 and svm_migrate_page_free reduces the svm_bo refcount.
+ * svm_range_vram_node_free will free the svm_bo.
+ */
+ WARN_ONCE(prange->vram_pages == kref_read(&prange->svm_bo->kref),
+ "svm_bo refcount leaking\n");
} else if (!prange->actual_loc) {
/* if no page migrated and all pages from prange are at
* sys ram drop svm_bo got from svm_range_vram_node_new
--
2.49.0