From: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
To: intel-xe@lists.freedesktop.org
Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>,
"Alistair Popple" <apopple@nvidia.com>,
"Ralph Campbell" <rcampbell@nvidia.com>,
"Christoph Hellwig" <hch@lst.de>,
"Jason Gunthorpe" <jgg@mellanox.com>,
"Jason Gunthorpe" <jgg@ziepe.ca>,
"Leon Romanovsky" <leon@kernel.org>,
"Andrew Morton" <akpm@linux-foundation.org>,
"Matthew Brost" <matthew.brost@intel.com>,
"John Hubbard" <jhubbard@nvidia.com>,
linux-mm@kvack.org, dri-devel@lists.freedesktop.org,
stable@vger.kernel.org
Subject: [PATCH v3] mm: Fix a hmm_range_fault() livelock / starvation problem
Date: Tue, 3 Feb 2026 15:34:34 +0100 [thread overview]
Message-ID: <20260203143434.16349-1-thomas.hellstrom@linux.intel.com> (raw)
If hmm_range_fault() fails a folio_trylock() in do_swap_page,
trying to acquire the lock of a device-private folio for migration,
to ram, the function will spin until it succeeds grabbing the lock.
However, if the process holding the lock is depending on a work
item to be completed, which is scheduled on the same CPU as the
spinning hmm_range_fault(), that work item might be starved and
we end up in a livelock / starvation situation which is never
resolved.
This can happen, for example if the process holding the
device-private folio lock is stuck in
migrate_device_unmap()->lru_add_drain_all()
The lru_add_drain_all() function requires a short work-item
to be run on all online cpus to complete.
A prerequisite for this to happen is:
a) Both zone device and system memory folios are considered in
migrate_device_unmap(), so that there is a reason to call
lru_add_drain_all() for a system memory folio while a
folio lock is held on a zone device folio.
b) The zone device folio has an initial mapcount > 1 which causes
at least one migration PTE entry insertion to be deferred to
try_to_migrate(), which can happen after the call to
lru_add_drain_all().
c) No or voluntary only preemption.
This all seems pretty unlikely to happen, but indeed is hit by
the "xe_exec_system_allocator" igt test.
Resolve this by waiting for the folio to be unlocked if the
folio_trylock() fails in the do_swap_page() function.
Future code improvements might consider moving
the lru_add_drain_all() call in migrate_device_unmap() to be
called *after* all pages have migration entries inserted.
That would eliminate also b) above.
v2:
- Instead of a cond_resched() in the hmm_range_fault() function,
eliminate the problem by waiting for the folio to be unlocked
in do_swap_page() (Alistair Popple, Andrew Morton)
v3:
- Add a stub migration_entry_wait_on_locked() for the
!CONFIG_MIGRATION case. (Kernel Test Robot)
Suggested-by: Alistair Popple <apopple@nvidia.com>
Fixes: 1afaeb8293c9 ("mm/migrate: Trylock device page in do_swap_page")
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: linux-mm@kvack.org
Cc: <dri-devel@lists.freedesktop.org>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: <stable@vger.kernel.org> # v6.15+
---
include/linux/migrate.h | 6 ++++++
mm/memory.c | 3 ++-
2 files changed, 8 insertions(+), 1 deletion(-)
diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index 26ca00c325d9..800ec174b601 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -97,6 +97,12 @@ static inline int set_movable_ops(const struct movable_operations *ops, enum pag
return -ENOSYS;
}
+static inline void migration_entry_wait_on_locked(softleaf_t entry, spinlock_t *ptl)
+ __releases(ptl)
+{
+ spin_unlock(ptl);
+}
+
#endif /* CONFIG_MIGRATION */
#ifdef CONFIG_NUMA_BALANCING
diff --git a/mm/memory.c b/mm/memory.c
index da360a6eb8a4..ed20da5570d5 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4684,7 +4684,8 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
unlock_page(vmf->page);
put_page(vmf->page);
} else {
- pte_unmap_unlock(vmf->pte, vmf->ptl);
+ pte_unmap(vmf->pte);
+ migration_entry_wait_on_locked(entry, vmf->ptl);
}
} else if (softleaf_is_hwpoison(entry)) {
ret = VM_FAULT_HWPOISON;
--
2.52.0
next reply other threads:[~2026-02-03 14:35 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-02-03 14:34 Thomas Hellström [this message]
2026-02-03 15:34 ` ✓ CI.KUnit: success for mm: Fix a hmm_range_fault() livelock / starvation problem (rev2) Patchwork
2026-02-03 16:08 ` ✗ Xe.CI.BAT: failure " Patchwork
2026-02-04 1:52 ` [PATCH v3] mm: Fix a hmm_range_fault() livelock / starvation problem John Hubbard
2026-02-04 10:59 ` Alistair Popple
2026-02-04 11:47 ` Thomas Hellström
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260203143434.16349-1-thomas.hellstrom@linux.intel.com \
--to=thomas.hellstrom@linux.intel.com \
--cc=akpm@linux-foundation.org \
--cc=apopple@nvidia.com \
--cc=dri-devel@lists.freedesktop.org \
--cc=hch@lst.de \
--cc=intel-xe@lists.freedesktop.org \
--cc=jgg@mellanox.com \
--cc=jgg@ziepe.ca \
--cc=jhubbard@nvidia.com \
--cc=leon@kernel.org \
--cc=linux-mm@kvack.org \
--cc=matthew.brost@intel.com \
--cc=rcampbell@nvidia.com \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.