Intel-XE Archive on lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] drm/xe/userptr: fix notifier vs folio deadlock
@ 2025-04-14 13:25 Matthew Auld
  2025-04-14 14:07 ` Hellstrom, Thomas
                   ` (9 more replies)
  0 siblings, 10 replies; 11+ messages in thread
From: Matthew Auld @ 2025-04-14 13:25 UTC (permalink / raw)
  To: intel-xe; +Cc: Thomas Hellström, Matthew Brost, stable

User is reporting what smells like notifier vs folio deadlock, where
migrate_pages_batch() on core kernel side is holding folio lock(s) and
then interacting with the mappings of it, however those mappings are
tied to some userptr, which means calling into the notifier callback and
grabbing the notifier lock. With perfect timing it looks possible that
the pages we pulled from the hmm fault can get sniped by
migrate_pages_batch() at the same time that we are holding the notifier
lock to mark the pages as accessed/dirty, but at this point we also want
to grab the folio locks(s) to mark them as dirty, but if they are
contended from notifier/migrate_pages_batch side then we deadlock since
folio lock won't be dropped until we drop the notifier lock.

Fortunately the mark_page_accessed/dirty is not really needed in the
first place it seems and should have already been done by hmm fault, so
just remove it.

Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4765
Fixes: 0a98219bcc96 ("drm/xe/hmm: Don't dereference struct page pointers without notifier lock")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: <stable@vger.kernel.org> # v6.10+
---
 drivers/gpu/drm/xe/xe_hmm.c | 24 ------------------------
 1 file changed, 24 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_hmm.c b/drivers/gpu/drm/xe/xe_hmm.c
index c3cc0fa105e8..57b71956ddf4 100644
--- a/drivers/gpu/drm/xe/xe_hmm.c
+++ b/drivers/gpu/drm/xe/xe_hmm.c
@@ -19,29 +19,6 @@ static u64 xe_npages_in_range(unsigned long start, unsigned long end)
 	return (end - start) >> PAGE_SHIFT;
 }
 
-/**
- * xe_mark_range_accessed() - mark a range is accessed, so core mm
- * have such information for memory eviction or write back to
- * hard disk
- * @range: the range to mark
- * @write: if write to this range, we mark pages in this range
- * as dirty
- */
-static void xe_mark_range_accessed(struct hmm_range *range, bool write)
-{
-	struct page *page;
-	u64 i, npages;
-
-	npages = xe_npages_in_range(range->start, range->end);
-	for (i = 0; i < npages; i++) {
-		page = hmm_pfn_to_page(range->hmm_pfns[i]);
-		if (write)
-			set_page_dirty_lock(page);
-
-		mark_page_accessed(page);
-	}
-}
-
 static int xe_alloc_sg(struct xe_device *xe, struct sg_table *st,
 		       struct hmm_range *range, struct rw_semaphore *notifier_sem)
 {
@@ -331,7 +308,6 @@ int xe_hmm_userptr_populate_range(struct xe_userptr_vma *uvma,
 	if (ret)
 		goto out_unlock;
 
-	xe_mark_range_accessed(&hmm_range, write);
 	userptr->sg = &userptr->sgt;
 	xe_hmm_userptr_set_mapped(uvma);
 	userptr->notifier_seq = hmm_range.notifier_seq;
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2025-04-15 22:41 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-04-14 13:25 [PATCH] drm/xe/userptr: fix notifier vs folio deadlock Matthew Auld
2025-04-14 14:07 ` Hellstrom, Thomas
2025-04-14 14:07 ` ✓ CI.Patch_applied: success for " Patchwork
2025-04-14 14:08 ` ✓ CI.checkpatch: " Patchwork
2025-04-14 14:09 ` ✓ CI.KUnit: " Patchwork
2025-04-14 14:17 ` ✓ CI.Build: " Patchwork
2025-04-14 14:19 ` ✓ CI.Hooks: " Patchwork
2025-04-14 14:21 ` ✓ CI.checksparse: " Patchwork
2025-04-14 15:27 ` ✓ Xe.CI.BAT: " Patchwork
2025-04-14 20:47 ` ✗ Xe.CI.Full: failure " Patchwork
2025-04-15 22:42 ` [PATCH] " Matthew Brost

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox