* [PATCH v5] mm: Fix a hmm_range_fault() livelock / starvation problem
@ 2026-02-10 11:56 Thomas Hellström
2026-02-10 22:40 ` Alistair Popple
` (2 more replies)
0 siblings, 3 replies; 9+ messages in thread
From: Thomas Hellström @ 2026-02-10 11:56 UTC (permalink / raw)
To: intel-xe
Cc: Thomas Hellström, Alistair Popple, Ralph Campbell,
Christoph Hellwig, Jason Gunthorpe, Jason Gunthorpe,
Leon Romanovsky, Andrew Morton, Matthew Brost, John Hubbard,
linux-mm, dri-devel, stable
If hmm_range_fault() fails a folio_trylock() in do_swap_page,
trying to acquire the lock of a device-private folio for migration,
to ram, the function will spin until it succeeds grabbing the lock.
However, if the process holding the lock is depending on a work
item to be completed, which is scheduled on the same CPU as the
spinning hmm_range_fault(), that work item might be starved and
we end up in a livelock / starvation situation which is never
resolved.
This can happen, for example if the process holding the
device-private folio lock is stuck in
migrate_device_unmap()->lru_add_drain_all()
sinc lru_add_drain_all() requires a short work-item
to be run on all online cpus to complete.
A prerequisite for this to happen is:
a) Both zone device and system memory folios are considered in
migrate_device_unmap(), so that there is a reason to call
lru_add_drain_all() for a system memory folio while a
folio lock is held on a zone device folio.
b) The zone device folio has an initial mapcount > 1 which causes
at least one migration PTE entry insertion to be deferred to
try_to_migrate(), which can happen after the call to
lru_add_drain_all().
c) No or voluntary only preemption.
This all seems pretty unlikely to happen, but indeed is hit by
the "xe_exec_system_allocator" igt test.
Resolve this by waiting for the folio to be unlocked if the
folio_trylock() fails in do_swap_page().
Rename migration_entry_wait_on_locked() to
softleaf_entry_wait_unlock() and update its documentation to
indicate the new use-case.
Future code improvements might consider moving
the lru_add_drain_all() call in migrate_device_unmap() to be
called *after* all pages have migration entries inserted.
That would eliminate also b) above.
v2:
- Instead of a cond_resched() in hmm_range_fault(),
eliminate the problem by waiting for the folio to be unlocked
in do_swap_page() (Alistair Popple, Andrew Morton)
v3:
- Add a stub migration_entry_wait_on_locked() for the
!CONFIG_MIGRATION case. (Kernel Test Robot)
v4:
- Rename migrate_entry_wait_on_locked() to
softleaf_entry_wait_on_locked() and update docs (Alistair Popple)
v5:
- Add a WARN_ON_ONCE() for the !CONFIG_MIGRATION
version of softleaf_entry_wait_on_locked().
- Modify wording around function names in the commit message
(Andrew Morton)
Suggested-by: Alistair Popple <apopple@nvidia.com>
Fixes: 1afaeb8293c9 ("mm/migrate: Trylock device page in do_swap_page")
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: linux-mm@kvack.org
Cc: <dri-devel@lists.freedesktop.org>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: <stable@vger.kernel.org> # v6.15+
Reviewed-by: John Hubbard <jhubbard@nvidia.com> #v3
---
include/linux/migrate.h | 10 +++++++++-
mm/filemap.c | 15 ++++++++++-----
mm/memory.c | 3 ++-
mm/migrate.c | 8 ++++----
mm/migrate_device.c | 2 +-
5 files changed, 26 insertions(+), 12 deletions(-)
diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index 26ca00c325d9..d5af2b7f577b 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -65,7 +65,7 @@ bool isolate_folio_to_list(struct folio *folio, struct list_head *list);
int migrate_huge_page_move_mapping(struct address_space *mapping,
struct folio *dst, struct folio *src);
-void migration_entry_wait_on_locked(softleaf_t entry, spinlock_t *ptl)
+void softleaf_entry_wait_on_locked(softleaf_t entry, spinlock_t *ptl)
__releases(ptl);
void folio_migrate_flags(struct folio *newfolio, struct folio *folio);
int folio_migrate_mapping(struct address_space *mapping,
@@ -97,6 +97,14 @@ static inline int set_movable_ops(const struct movable_operations *ops, enum pag
return -ENOSYS;
}
+static inline void softleaf_entry_wait_on_locked(softleaf_t entry, spinlock_t *ptl)
+ __releases(ptl)
+{
+ WARN_ON_ONCE(1);
+
+ spin_unlock(ptl);
+}
+
#endif /* CONFIG_MIGRATION */
#ifdef CONFIG_NUMA_BALANCING
diff --git a/mm/filemap.c b/mm/filemap.c
index ebd75684cb0a..d98e4883f13d 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1379,14 +1379,16 @@ static inline int folio_wait_bit_common(struct folio *folio, int bit_nr,
#ifdef CONFIG_MIGRATION
/**
- * migration_entry_wait_on_locked - Wait for a migration entry to be removed
- * @entry: migration swap entry.
+ * softleaf_entry_wait_on_locked - Wait for a migration entry or
+ * device_private entry to be removed.
+ * @entry: migration or device_private swap entry.
* @ptl: already locked ptl. This function will drop the lock.
*
- * Wait for a migration entry referencing the given page to be removed. This is
+ * Wait for a migration entry referencing the given page, or device_private
+ * entry referencing a dvice_private page to be unlocked. This is
* equivalent to folio_put_wait_locked(folio, TASK_UNINTERRUPTIBLE) except
* this can be called without taking a reference on the page. Instead this
- * should be called while holding the ptl for the migration entry referencing
+ * should be called while holding the ptl for @entry referencing
* the page.
*
* Returns after unlocking the ptl.
@@ -1394,7 +1396,7 @@ static inline int folio_wait_bit_common(struct folio *folio, int bit_nr,
* This follows the same logic as folio_wait_bit_common() so see the comments
* there.
*/
-void migration_entry_wait_on_locked(softleaf_t entry, spinlock_t *ptl)
+void softleaf_entry_wait_on_locked(softleaf_t entry, spinlock_t *ptl)
__releases(ptl)
{
struct wait_page_queue wait_page;
@@ -1428,6 +1430,9 @@ void migration_entry_wait_on_locked(softleaf_t entry, spinlock_t *ptl)
* If a migration entry exists for the page the migration path must hold
* a valid reference to the page, and it must take the ptl to remove the
* migration entry. So the page is valid until the ptl is dropped.
+ * Similarly any path attempting to drop the last reference to a
+ * device-private page needs to grab the ptl to remove the device-private
+ * entry.
*/
spin_unlock(ptl);
diff --git a/mm/memory.c b/mm/memory.c
index da360a6eb8a4..20172476a57f 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4684,7 +4684,8 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
unlock_page(vmf->page);
put_page(vmf->page);
} else {
- pte_unmap_unlock(vmf->pte, vmf->ptl);
+ pte_unmap(vmf->pte);
+ softleaf_entry_wait_on_locked(entry, vmf->ptl);
}
} else if (softleaf_is_hwpoison(entry)) {
ret = VM_FAULT_HWPOISON;
diff --git a/mm/migrate.c b/mm/migrate.c
index 4688b9e38cd2..cf6449b4202e 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -499,7 +499,7 @@ void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
if (!softleaf_is_migration(entry))
goto out;
- migration_entry_wait_on_locked(entry, ptl);
+ softleaf_entry_wait_on_locked(entry, ptl);
return;
out:
spin_unlock(ptl);
@@ -531,10 +531,10 @@ void migration_entry_wait_huge(struct vm_area_struct *vma, unsigned long addr, p
* If migration entry existed, safe to release vma lock
* here because the pgtable page won't be freed without the
* pgtable lock released. See comment right above pgtable
- * lock release in migration_entry_wait_on_locked().
+ * lock release in softleaf_entry_wait_on_locked().
*/
hugetlb_vma_unlock_read(vma);
- migration_entry_wait_on_locked(entry, ptl);
+ softleaf_entry_wait_on_locked(entry, ptl);
return;
}
@@ -552,7 +552,7 @@ void pmd_migration_entry_wait(struct mm_struct *mm, pmd_t *pmd)
ptl = pmd_lock(mm, pmd);
if (!pmd_is_migration_entry(*pmd))
goto unlock;
- migration_entry_wait_on_locked(softleaf_from_pmd(*pmd), ptl);
+ softleaf_entry_wait_on_locked(softleaf_from_pmd(*pmd), ptl);
return;
unlock:
spin_unlock(ptl);
diff --git a/mm/migrate_device.c b/mm/migrate_device.c
index 23379663b1e1..deab89fd4541 100644
--- a/mm/migrate_device.c
+++ b/mm/migrate_device.c
@@ -176,7 +176,7 @@ static int migrate_vma_collect_huge_pmd(pmd_t *pmdp, unsigned long start,
}
if (softleaf_is_migration(entry)) {
- migration_entry_wait_on_locked(entry, ptl);
+ softleaf_entry_wait_on_locked(entry, ptl);
spin_unlock(ptl);
return -EAGAIN;
}
--
2.52.0
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH v5] mm: Fix a hmm_range_fault() livelock / starvation problem
2026-02-10 11:56 [PATCH v5] mm: Fix a hmm_range_fault() livelock / starvation problem Thomas Hellström
@ 2026-02-10 22:40 ` Alistair Popple
2026-02-11 22:23 ` Davidlohr Bueso
2026-04-06 12:54 ` Lorenzo Stoakes (Oracle)
2 siblings, 0 replies; 9+ messages in thread
From: Alistair Popple @ 2026-02-10 22:40 UTC (permalink / raw)
To: Thomas Hellström
Cc: intel-xe, Ralph Campbell, Christoph Hellwig, Jason Gunthorpe,
Jason Gunthorpe, Leon Romanovsky, Andrew Morton, Matthew Brost,
John Hubbard, linux-mm, dri-devel, stable
On 2026-02-10 at 22:56 +1100, Thomas Hellström <thomas.hellstrom@linux.intel.com> wrote...
> If hmm_range_fault() fails a folio_trylock() in do_swap_page,
> trying to acquire the lock of a device-private folio for migration,
> to ram, the function will spin until it succeeds grabbing the lock.
>
> However, if the process holding the lock is depending on a work
> item to be completed, which is scheduled on the same CPU as the
> spinning hmm_range_fault(), that work item might be starved and
> we end up in a livelock / starvation situation which is never
> resolved.
>
> This can happen, for example if the process holding the
> device-private folio lock is stuck in
> migrate_device_unmap()->lru_add_drain_all()
> sinc lru_add_drain_all() requires a short work-item
> to be run on all online cpus to complete.
>
> A prerequisite for this to happen is:
> a) Both zone device and system memory folios are considered in
> migrate_device_unmap(), so that there is a reason to call
> lru_add_drain_all() for a system memory folio while a
> folio lock is held on a zone device folio.
> b) The zone device folio has an initial mapcount > 1 which causes
> at least one migration PTE entry insertion to be deferred to
> try_to_migrate(), which can happen after the call to
> lru_add_drain_all().
> c) No or voluntary only preemption.
>
> This all seems pretty unlikely to happen, but indeed is hit by
> the "xe_exec_system_allocator" igt test.
>
> Resolve this by waiting for the folio to be unlocked if the
> folio_trylock() fails in do_swap_page().
>
> Rename migration_entry_wait_on_locked() to
> softleaf_entry_wait_unlock() and update its documentation to
> indicate the new use-case.
>
> Future code improvements might consider moving
> the lru_add_drain_all() call in migrate_device_unmap() to be
> called *after* all pages have migration entries inserted.
> That would eliminate also b) above.
>
> v2:
> - Instead of a cond_resched() in hmm_range_fault(),
> eliminate the problem by waiting for the folio to be unlocked
> in do_swap_page() (Alistair Popple, Andrew Morton)
> v3:
> - Add a stub migration_entry_wait_on_locked() for the
> !CONFIG_MIGRATION case. (Kernel Test Robot)
> v4:
> - Rename migrate_entry_wait_on_locked() to
> softleaf_entry_wait_on_locked() and update docs (Alistair Popple)
> v5:
> - Add a WARN_ON_ONCE() for the !CONFIG_MIGRATION
> version of softleaf_entry_wait_on_locked().
Thanks!
Reviewed-by: Alistair Popple <apopple@nvidia.com>
> - Modify wording around function names in the commit message
> (Andrew Morton)
>
> Suggested-by: Alistair Popple <apopple@nvidia.com>
> Fixes: 1afaeb8293c9 ("mm/migrate: Trylock device page in do_swap_page")
> Cc: Ralph Campbell <rcampbell@nvidia.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Jason Gunthorpe <jgg@mellanox.com>
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Leon Romanovsky <leon@kernel.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Matthew Brost <matthew.brost@intel.com>
> Cc: John Hubbard <jhubbard@nvidia.com>
> Cc: Alistair Popple <apopple@nvidia.com>
> Cc: linux-mm@kvack.org
> Cc: <dri-devel@lists.freedesktop.org>
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Cc: <stable@vger.kernel.org> # v6.15+
> Reviewed-by: John Hubbard <jhubbard@nvidia.com> #v3
> ---
> include/linux/migrate.h | 10 +++++++++-
> mm/filemap.c | 15 ++++++++++-----
> mm/memory.c | 3 ++-
> mm/migrate.c | 8 ++++----
> mm/migrate_device.c | 2 +-
> 5 files changed, 26 insertions(+), 12 deletions(-)
>
> diff --git a/include/linux/migrate.h b/include/linux/migrate.h
> index 26ca00c325d9..d5af2b7f577b 100644
> --- a/include/linux/migrate.h
> +++ b/include/linux/migrate.h
> @@ -65,7 +65,7 @@ bool isolate_folio_to_list(struct folio *folio, struct list_head *list);
>
> int migrate_huge_page_move_mapping(struct address_space *mapping,
> struct folio *dst, struct folio *src);
> -void migration_entry_wait_on_locked(softleaf_t entry, spinlock_t *ptl)
> +void softleaf_entry_wait_on_locked(softleaf_t entry, spinlock_t *ptl)
> __releases(ptl);
> void folio_migrate_flags(struct folio *newfolio, struct folio *folio);
> int folio_migrate_mapping(struct address_space *mapping,
> @@ -97,6 +97,14 @@ static inline int set_movable_ops(const struct movable_operations *ops, enum pag
> return -ENOSYS;
> }
>
> +static inline void softleaf_entry_wait_on_locked(softleaf_t entry, spinlock_t *ptl)
> + __releases(ptl)
> +{
> + WARN_ON_ONCE(1);
> +
> + spin_unlock(ptl);
> +}
> +
> #endif /* CONFIG_MIGRATION */
>
> #ifdef CONFIG_NUMA_BALANCING
> diff --git a/mm/filemap.c b/mm/filemap.c
> index ebd75684cb0a..d98e4883f13d 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -1379,14 +1379,16 @@ static inline int folio_wait_bit_common(struct folio *folio, int bit_nr,
>
> #ifdef CONFIG_MIGRATION
> /**
> - * migration_entry_wait_on_locked - Wait for a migration entry to be removed
> - * @entry: migration swap entry.
> + * softleaf_entry_wait_on_locked - Wait for a migration entry or
> + * device_private entry to be removed.
> + * @entry: migration or device_private swap entry.
> * @ptl: already locked ptl. This function will drop the lock.
> *
> - * Wait for a migration entry referencing the given page to be removed. This is
> + * Wait for a migration entry referencing the given page, or device_private
> + * entry referencing a dvice_private page to be unlocked. This is
> * equivalent to folio_put_wait_locked(folio, TASK_UNINTERRUPTIBLE) except
> * this can be called without taking a reference on the page. Instead this
> - * should be called while holding the ptl for the migration entry referencing
> + * should be called while holding the ptl for @entry referencing
> * the page.
> *
> * Returns after unlocking the ptl.
> @@ -1394,7 +1396,7 @@ static inline int folio_wait_bit_common(struct folio *folio, int bit_nr,
> * This follows the same logic as folio_wait_bit_common() so see the comments
> * there.
> */
> -void migration_entry_wait_on_locked(softleaf_t entry, spinlock_t *ptl)
> +void softleaf_entry_wait_on_locked(softleaf_t entry, spinlock_t *ptl)
> __releases(ptl)
> {
> struct wait_page_queue wait_page;
> @@ -1428,6 +1430,9 @@ void migration_entry_wait_on_locked(softleaf_t entry, spinlock_t *ptl)
> * If a migration entry exists for the page the migration path must hold
> * a valid reference to the page, and it must take the ptl to remove the
> * migration entry. So the page is valid until the ptl is dropped.
> + * Similarly any path attempting to drop the last reference to a
> + * device-private page needs to grab the ptl to remove the device-private
> + * entry.
> */
> spin_unlock(ptl);
>
> diff --git a/mm/memory.c b/mm/memory.c
> index da360a6eb8a4..20172476a57f 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4684,7 +4684,8 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
> unlock_page(vmf->page);
> put_page(vmf->page);
> } else {
> - pte_unmap_unlock(vmf->pte, vmf->ptl);
> + pte_unmap(vmf->pte);
> + softleaf_entry_wait_on_locked(entry, vmf->ptl);
> }
> } else if (softleaf_is_hwpoison(entry)) {
> ret = VM_FAULT_HWPOISON;
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 4688b9e38cd2..cf6449b4202e 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -499,7 +499,7 @@ void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
> if (!softleaf_is_migration(entry))
> goto out;
>
> - migration_entry_wait_on_locked(entry, ptl);
> + softleaf_entry_wait_on_locked(entry, ptl);
> return;
> out:
> spin_unlock(ptl);
> @@ -531,10 +531,10 @@ void migration_entry_wait_huge(struct vm_area_struct *vma, unsigned long addr, p
> * If migration entry existed, safe to release vma lock
> * here because the pgtable page won't be freed without the
> * pgtable lock released. See comment right above pgtable
> - * lock release in migration_entry_wait_on_locked().
> + * lock release in softleaf_entry_wait_on_locked().
> */
> hugetlb_vma_unlock_read(vma);
> - migration_entry_wait_on_locked(entry, ptl);
> + softleaf_entry_wait_on_locked(entry, ptl);
> return;
> }
>
> @@ -552,7 +552,7 @@ void pmd_migration_entry_wait(struct mm_struct *mm, pmd_t *pmd)
> ptl = pmd_lock(mm, pmd);
> if (!pmd_is_migration_entry(*pmd))
> goto unlock;
> - migration_entry_wait_on_locked(softleaf_from_pmd(*pmd), ptl);
> + softleaf_entry_wait_on_locked(softleaf_from_pmd(*pmd), ptl);
> return;
> unlock:
> spin_unlock(ptl);
> diff --git a/mm/migrate_device.c b/mm/migrate_device.c
> index 23379663b1e1..deab89fd4541 100644
> --- a/mm/migrate_device.c
> +++ b/mm/migrate_device.c
> @@ -176,7 +176,7 @@ static int migrate_vma_collect_huge_pmd(pmd_t *pmdp, unsigned long start,
> }
>
> if (softleaf_is_migration(entry)) {
> - migration_entry_wait_on_locked(entry, ptl);
> + softleaf_entry_wait_on_locked(entry, ptl);
> spin_unlock(ptl);
> return -EAGAIN;
> }
> --
> 2.52.0
>
>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v5] mm: Fix a hmm_range_fault() livelock / starvation problem
2026-02-10 11:56 [PATCH v5] mm: Fix a hmm_range_fault() livelock / starvation problem Thomas Hellström
2026-02-10 22:40 ` Alistair Popple
@ 2026-02-11 22:23 ` Davidlohr Bueso
2026-02-11 22:54 ` Alistair Popple
2026-04-06 12:54 ` Lorenzo Stoakes (Oracle)
2 siblings, 1 reply; 9+ messages in thread
From: Davidlohr Bueso @ 2026-02-11 22:23 UTC (permalink / raw)
To: Thomas Hellstr�m
Cc: intel-xe, Alistair Popple, Ralph Campbell, Christoph Hellwig,
Jason Gunthorpe, Jason Gunthorpe, Leon Romanovsky, Andrew Morton,
Matthew Brost, John Hubbard, linux-mm, dri-devel, stable
On Tue, 10 Feb 2026, Thomas Hellstr�m wrote:
>@@ -176,7 +176,7 @@ static int migrate_vma_collect_huge_pmd(pmd_t *pmdp, unsigned long start,
> }
>
> if (softleaf_is_migration(entry)) {
>- migration_entry_wait_on_locked(entry, ptl);
>+ softleaf_entry_wait_on_locked(entry, ptl);
> spin_unlock(ptl);
softleaf_entry_wait_on_locked() unconditionally drops the ptl.
> return -EAGAIN;
> }
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v5] mm: Fix a hmm_range_fault() livelock / starvation problem
2026-02-11 22:23 ` Davidlohr Bueso
@ 2026-02-11 22:54 ` Alistair Popple
2026-02-11 23:22 ` Matthew Brost
0 siblings, 1 reply; 9+ messages in thread
From: Alistair Popple @ 2026-02-11 22:54 UTC (permalink / raw)
To: Thomas Hellstr�m, intel-xe, Ralph Campbell,
Christoph Hellwig, Jason Gunthorpe, Jason Gunthorpe,
Leon Romanovsky, Andrew Morton, Matthew Brost, John Hubbard,
linux-mm, dri-devel, stable
On 2026-02-12 at 09:23 +1100, Davidlohr Bueso <dave@stgolabs.net> wrote...
> On Tue, 10 Feb 2026, Thomas Hellstr�m wrote:
>
> > @@ -176,7 +176,7 @@ static int migrate_vma_collect_huge_pmd(pmd_t *pmdp, unsigned long start,
> > }
> >
> > if (softleaf_is_migration(entry)) {
> > - migration_entry_wait_on_locked(entry, ptl);
> > + softleaf_entry_wait_on_locked(entry, ptl);
> > spin_unlock(ptl);
>
> softleaf_entry_wait_on_locked() unconditionally drops the ptl.
As does migration_entry_wait_on_locked() so obviously a pre-existing issue.
I'm not sure why we would wait on a migration entry here though, maybe Balbir
can help?
> > return -EAGAIN;
> > }
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v5] mm: Fix a hmm_range_fault() livelock / starvation problem
2026-02-11 22:54 ` Alistair Popple
@ 2026-02-11 23:22 ` Matthew Brost
0 siblings, 0 replies; 9+ messages in thread
From: Matthew Brost @ 2026-02-11 23:22 UTC (permalink / raw)
To: Alistair Popple
Cc: Thomas Hellstr�m, intel-xe, Ralph Campbell,
Christoph Hellwig, Jason Gunthorpe, Jason Gunthorpe,
Leon Romanovsky, Andrew Morton, John Hubbard, linux-mm, dri-devel,
stable
On Thu, Feb 12, 2026 at 09:54:50AM +1100, Alistair Popple wrote:
> On 2026-02-12 at 09:23 +1100, Davidlohr Bueso <dave@stgolabs.net> wrote...
> > On Tue, 10 Feb 2026, Thomas Hellstr�m wrote:
> >
> > > @@ -176,7 +176,7 @@ static int migrate_vma_collect_huge_pmd(pmd_t *pmdp, unsigned long start,
> > > }
> > >
> > > if (softleaf_is_migration(entry)) {
> > > - migration_entry_wait_on_locked(entry, ptl);
> > > + softleaf_entry_wait_on_locked(entry, ptl);
> > > spin_unlock(ptl);
> >
> > softleaf_entry_wait_on_locked() unconditionally drops the ptl.
>
> As does migration_entry_wait_on_locked() so obviously a pre-existing issue.
> I'm not sure why we would wait on a migration entry here though, maybe Balbir
> can help?
I noticed this recently as being odd, given that we don’t wait on PTE
migration entries.
Looking again, this is unreachable code, since we bail out just above
this if statement on !softleaf_is_device_private(entry). So we should
just delete this entire if statement.
Matt
>
> > > return -EAGAIN;
> > > }
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v5] mm: Fix a hmm_range_fault() livelock / starvation problem
2026-02-10 11:56 [PATCH v5] mm: Fix a hmm_range_fault() livelock / starvation problem Thomas Hellström
2026-02-10 22:40 ` Alistair Popple
2026-02-11 22:23 ` Davidlohr Bueso
@ 2026-04-06 12:54 ` Lorenzo Stoakes (Oracle)
2026-04-06 12:56 ` Lorenzo Stoakes (Oracle)
2026-04-06 14:08 ` Jason Gunthorpe
2 siblings, 2 replies; 9+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-04-06 12:54 UTC (permalink / raw)
To: Thomas Hellström
Cc: intel-xe, Alistair Popple, Ralph Campbell, Christoph Hellwig,
Jason Gunthorpe, Jason Gunthorpe, Leon Romanovsky, Andrew Morton,
Matthew Brost, John Hubbard, linux-mm, dri-devel, stable,
linux-fsdevel, David Hildenbrand, Zi Yan, Joshua Hahn, Rakie Kim,
Byungchul Park, Gregory Price, Ying Huang,
Matthew Wilcox (Oracle), Liam R. Howlett, Vlastimil Babka,
Mike Rapoport, Suren Baghdasaryan, Michal Hocko
Hi guys,
+cc missing M/R, fsdevel list
So this was merged upstream, and touches mm/, and even has a mm:
prefix... but was taken through a non-mm tree.
I'm confused as to why an mm: patch didn't go through the mm tree, but when
we do stuff like this, isn't the done thing to get maintainer signoff
before doing something like that, anyway?
I see John gave a tag (and he's great so that gives me confidence here),
but we should really follow the procedure on this properly.
So:
$ scripts/get_maintainer.pl --no-git include/linux/migrate.h mm/filemap.c mm/memory.c mm/migrate.c mm/migrate_device.c | grep maintainer
Andrew Morton <akpm@linux-foundation.org> (maintainer:MEMORY MANAGEMENT - MEMORY POLICY AND MIGRATION)
David Hildenbrand <david@kernel.org> (maintainer:MEMORY MANAGEMENT - MEMORY POLICY AND MIGRATION)
"Matthew Wilcox (Oracle)" <willy@infradead.org> (maintainer:PAGE CACHE)
Are the maintainers, and there are also 14 reviewers too.
None were even cc'd here which isn't helpful :)
I realise mm process has _perhaps_ not been quite so prescriptive as this
in the past, but we definitely require this going forward, and in general
you should run get_maintainers.pl and cc relevant people to help with the
pain that is kernel email :)
As I don't _think_ anybody noticed it, unfortunately. linux-mm is just too
busy to keep track there, generally.
We are semi-tetchy about this as we've had... fun in the past with some
bigger changes that went through other trees. This is, at least, very
isolated.
Thanks, Lorenzo
On Tue, Feb 10, 2026 at 12:56:53PM +0100, Thomas Hellström wrote:
> If hmm_range_fault() fails a folio_trylock() in do_swap_page,
> trying to acquire the lock of a device-private folio for migration,
> to ram, the function will spin until it succeeds grabbing the lock.
>
> However, if the process holding the lock is depending on a work
> item to be completed, which is scheduled on the same CPU as the
> spinning hmm_range_fault(), that work item might be starved and
> we end up in a livelock / starvation situation which is never
> resolved.
>
> This can happen, for example if the process holding the
> device-private folio lock is stuck in
> migrate_device_unmap()->lru_add_drain_all()
> sinc lru_add_drain_all() requires a short work-item
> to be run on all online cpus to complete.
>
> A prerequisite for this to happen is:
> a) Both zone device and system memory folios are considered in
> migrate_device_unmap(), so that there is a reason to call
> lru_add_drain_all() for a system memory folio while a
> folio lock is held on a zone device folio.
> b) The zone device folio has an initial mapcount > 1 which causes
> at least one migration PTE entry insertion to be deferred to
> try_to_migrate(), which can happen after the call to
> lru_add_drain_all().
> c) No or voluntary only preemption.
>
> This all seems pretty unlikely to happen, but indeed is hit by
> the "xe_exec_system_allocator" igt test.
>
> Resolve this by waiting for the folio to be unlocked if the
> folio_trylock() fails in do_swap_page().
>
> Rename migration_entry_wait_on_locked() to
> softleaf_entry_wait_unlock() and update its documentation to
> indicate the new use-case.
>
> Future code improvements might consider moving
> the lru_add_drain_all() call in migrate_device_unmap() to be
> called *after* all pages have migration entries inserted.
> That would eliminate also b) above.
>
> v2:
> - Instead of a cond_resched() in hmm_range_fault(),
> eliminate the problem by waiting for the folio to be unlocked
> in do_swap_page() (Alistair Popple, Andrew Morton)
> v3:
> - Add a stub migration_entry_wait_on_locked() for the
> !CONFIG_MIGRATION case. (Kernel Test Robot)
> v4:
> - Rename migrate_entry_wait_on_locked() to
> softleaf_entry_wait_on_locked() and update docs (Alistair Popple)
> v5:
> - Add a WARN_ON_ONCE() for the !CONFIG_MIGRATION
> version of softleaf_entry_wait_on_locked().
> - Modify wording around function names in the commit message
> (Andrew Morton)
>
> Suggested-by: Alistair Popple <apopple@nvidia.com>
> Fixes: 1afaeb8293c9 ("mm/migrate: Trylock device page in do_swap_page")
> Cc: Ralph Campbell <rcampbell@nvidia.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Jason Gunthorpe <jgg@mellanox.com>
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Leon Romanovsky <leon@kernel.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Matthew Brost <matthew.brost@intel.com>
> Cc: John Hubbard <jhubbard@nvidia.com>
> Cc: Alistair Popple <apopple@nvidia.com>
> Cc: linux-mm@kvack.org
> Cc: <dri-devel@lists.freedesktop.org>
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Cc: <stable@vger.kernel.org> # v6.15+
> Reviewed-by: John Hubbard <jhubbard@nvidia.com> #v3
> ---
> include/linux/migrate.h | 10 +++++++++-
> mm/filemap.c | 15 ++++++++++-----
> mm/memory.c | 3 ++-
> mm/migrate.c | 8 ++++----
> mm/migrate_device.c | 2 +-
> 5 files changed, 26 insertions(+), 12 deletions(-)
>
> diff --git a/include/linux/migrate.h b/include/linux/migrate.h
> index 26ca00c325d9..d5af2b7f577b 100644
> --- a/include/linux/migrate.h
> +++ b/include/linux/migrate.h
> @@ -65,7 +65,7 @@ bool isolate_folio_to_list(struct folio *folio, struct list_head *list);
>
> int migrate_huge_page_move_mapping(struct address_space *mapping,
> struct folio *dst, struct folio *src);
> -void migration_entry_wait_on_locked(softleaf_t entry, spinlock_t *ptl)
> +void softleaf_entry_wait_on_locked(softleaf_t entry, spinlock_t *ptl)
> __releases(ptl);
> void folio_migrate_flags(struct folio *newfolio, struct folio *folio);
> int folio_migrate_mapping(struct address_space *mapping,
> @@ -97,6 +97,14 @@ static inline int set_movable_ops(const struct movable_operations *ops, enum pag
> return -ENOSYS;
> }
>
> +static inline void softleaf_entry_wait_on_locked(softleaf_t entry, spinlock_t *ptl)
> + __releases(ptl)
> +{
> + WARN_ON_ONCE(1);
> +
> + spin_unlock(ptl);
> +}
> +
> #endif /* CONFIG_MIGRATION */
>
> #ifdef CONFIG_NUMA_BALANCING
> diff --git a/mm/filemap.c b/mm/filemap.c
> index ebd75684cb0a..d98e4883f13d 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -1379,14 +1379,16 @@ static inline int folio_wait_bit_common(struct folio *folio, int bit_nr,
>
> #ifdef CONFIG_MIGRATION
> /**
> - * migration_entry_wait_on_locked - Wait for a migration entry to be removed
> - * @entry: migration swap entry.
> + * softleaf_entry_wait_on_locked - Wait for a migration entry or
> + * device_private entry to be removed.
> + * @entry: migration or device_private swap entry.
> * @ptl: already locked ptl. This function will drop the lock.
> *
> - * Wait for a migration entry referencing the given page to be removed. This is
> + * Wait for a migration entry referencing the given page, or device_private
> + * entry referencing a dvice_private page to be unlocked. This is
> * equivalent to folio_put_wait_locked(folio, TASK_UNINTERRUPTIBLE) except
> * this can be called without taking a reference on the page. Instead this
> - * should be called while holding the ptl for the migration entry referencing
> + * should be called while holding the ptl for @entry referencing
> * the page.
> *
> * Returns after unlocking the ptl.
> @@ -1394,7 +1396,7 @@ static inline int folio_wait_bit_common(struct folio *folio, int bit_nr,
> * This follows the same logic as folio_wait_bit_common() so see the comments
> * there.
> */
> -void migration_entry_wait_on_locked(softleaf_t entry, spinlock_t *ptl)
> +void softleaf_entry_wait_on_locked(softleaf_t entry, spinlock_t *ptl)
> __releases(ptl)
I mean I have to say though I'm happy you propagated the softleaf thing
further at least ;)
> {
> struct wait_page_queue wait_page;
> @@ -1428,6 +1430,9 @@ void migration_entry_wait_on_locked(softleaf_t entry, spinlock_t *ptl)
> * If a migration entry exists for the page the migration path must hold
> * a valid reference to the page, and it must take the ptl to remove the
> * migration entry. So the page is valid until the ptl is dropped.
> + * Similarly any path attempting to drop the last reference to a
> + * device-private page needs to grab the ptl to remove the device-private
> + * entry.
> */
> spin_unlock(ptl);
>
> diff --git a/mm/memory.c b/mm/memory.c
> index da360a6eb8a4..20172476a57f 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4684,7 +4684,8 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
> unlock_page(vmf->page);
> put_page(vmf->page);
> } else {
> - pte_unmap_unlock(vmf->pte, vmf->ptl);
> + pte_unmap(vmf->pte);
> + softleaf_entry_wait_on_locked(entry, vmf->ptl);
> }
> } else if (softleaf_is_hwpoison(entry)) {
> ret = VM_FAULT_HWPOISON;
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 4688b9e38cd2..cf6449b4202e 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -499,7 +499,7 @@ void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
> if (!softleaf_is_migration(entry))
> goto out;
>
> - migration_entry_wait_on_locked(entry, ptl);
> + softleaf_entry_wait_on_locked(entry, ptl);
> return;
> out:
> spin_unlock(ptl);
> @@ -531,10 +531,10 @@ void migration_entry_wait_huge(struct vm_area_struct *vma, unsigned long addr, p
> * If migration entry existed, safe to release vma lock
> * here because the pgtable page won't be freed without the
> * pgtable lock released. See comment right above pgtable
> - * lock release in migration_entry_wait_on_locked().
> + * lock release in softleaf_entry_wait_on_locked().
> */
> hugetlb_vma_unlock_read(vma);
> - migration_entry_wait_on_locked(entry, ptl);
> + softleaf_entry_wait_on_locked(entry, ptl);
> return;
> }
>
> @@ -552,7 +552,7 @@ void pmd_migration_entry_wait(struct mm_struct *mm, pmd_t *pmd)
> ptl = pmd_lock(mm, pmd);
> if (!pmd_is_migration_entry(*pmd))
> goto unlock;
> - migration_entry_wait_on_locked(softleaf_from_pmd(*pmd), ptl);
> + softleaf_entry_wait_on_locked(softleaf_from_pmd(*pmd), ptl);
> return;
> unlock:
> spin_unlock(ptl);
> diff --git a/mm/migrate_device.c b/mm/migrate_device.c
> index 23379663b1e1..deab89fd4541 100644
> --- a/mm/migrate_device.c
> +++ b/mm/migrate_device.c
> @@ -176,7 +176,7 @@ static int migrate_vma_collect_huge_pmd(pmd_t *pmdp, unsigned long start,
> }
>
> if (softleaf_is_migration(entry)) {
> - migration_entry_wait_on_locked(entry, ptl);
> + softleaf_entry_wait_on_locked(entry, ptl);
> spin_unlock(ptl);
> return -EAGAIN;
> }
> --
> 2.52.0
>
>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v5] mm: Fix a hmm_range_fault() livelock / starvation problem
2026-04-06 12:54 ` Lorenzo Stoakes (Oracle)
@ 2026-04-06 12:56 ` Lorenzo Stoakes (Oracle)
2026-04-06 19:11 ` Matthew Brost
2026-04-06 14:08 ` Jason Gunthorpe
1 sibling, 1 reply; 9+ messages in thread
From: Lorenzo Stoakes (Oracle) @ 2026-04-06 12:56 UTC (permalink / raw)
To: Thomas Hellström
Cc: intel-xe, Alistair Popple, Ralph Campbell, Christoph Hellwig,
Jason Gunthorpe, Jason Gunthorpe, Leon Romanovsky, Andrew Morton,
Matthew Brost, John Hubbard, linux-mm, dri-devel, stable,
linux-fsdevel, David Hildenbrand, Zi Yan, Joshua Hahn, Rakie Kim,
Byungchul Park, Gregory Price, Ying Huang,
Matthew Wilcox (Oracle), Liam R. Howlett, Vlastimil Babka,
Mike Rapoport, Suren Baghdasaryan, Michal Hocko
On Mon, Apr 06, 2026 at 01:54:13PM +0100, Lorenzo Stoakes (Oracle) wrote:
> I see John gave a tag (and he's great so that gives me confidence here),
> but we should really follow the procedure on this properly.
Oh and just noticed Alistair also :) so that adds further confidence; this is
really a point about the cc/maintainer signoff requirement going forwards.
Thanks, Lorenzo
* Re: [PATCH v5] mm: Fix a hmm_range_fault() livelock / starvation problem
2026-04-06 12:54 ` Lorenzo Stoakes (Oracle)
2026-04-06 12:56 ` Lorenzo Stoakes (Oracle)
@ 2026-04-06 14:08 ` Jason Gunthorpe
1 sibling, 0 replies; 9+ messages in thread
From: Jason Gunthorpe @ 2026-04-06 14:08 UTC (permalink / raw)
To: Lorenzo Stoakes (Oracle)
Cc: Thomas Hellström, intel-xe, Alistair Popple, Ralph Campbell,
Christoph Hellwig, Leon Romanovsky, Andrew Morton, Matthew Brost,
John Hubbard, linux-mm, dri-devel, stable, linux-fsdevel,
David Hildenbrand, Zi Yan, Joshua Hahn, Rakie Kim, Byungchul Park,
Gregory Price, Ying Huang, Matthew Wilcox (Oracle),
Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko
On Mon, Apr 06, 2026 at 01:54:13PM +0100, Lorenzo Stoakes (Oracle) wrote:
> Hi guys,
>
> +cc missing M/R, fsdevel list
>
> So this was merged upstream, and touches mm/, and even has a mm:
> prefix... but was taken through a non-mm tree.
Also, how come there is no email on lore reporting someone accepted it?
Jason
* Re: [PATCH v5] mm: Fix a hmm_range_fault() livelock / starvation problem
2026-04-06 12:56 ` Lorenzo Stoakes (Oracle)
@ 2026-04-06 19:11 ` Matthew Brost
0 siblings, 0 replies; 9+ messages in thread
From: Matthew Brost @ 2026-04-06 19:11 UTC (permalink / raw)
To: Lorenzo Stoakes (Oracle)
Cc: Thomas Hellström, intel-xe, Alistair Popple, Ralph Campbell,
Christoph Hellwig, Jason Gunthorpe, Jason Gunthorpe,
Leon Romanovsky, Andrew Morton, John Hubbard, linux-mm, dri-devel,
stable, linux-fsdevel, David Hildenbrand, Zi Yan, Joshua Hahn,
Rakie Kim, Byungchul Park, Gregory Price, Ying Huang,
Matthew Wilcox (Oracle), Liam R. Howlett, Vlastimil Babka,
Mike Rapoport, Suren Baghdasaryan, Michal Hocko
On Mon, Apr 06, 2026 at 01:56:49PM +0100, Lorenzo Stoakes (Oracle) wrote:
> On Mon, Apr 06, 2026 at 01:54:13PM +0100, Lorenzo Stoakes (Oracle) wrote:
> > I see John gave a tag (and he's great so that gives me confidence here),
> > but we should really follow the procedure on this properly.
>
> Oh and just noticed Alistair also :) so that adds further confidence; this is
> really a point about the cc/maintainer signoff requirement going forwards.
+1.
Andrew did ACK this via DRM here [1].
When we take external subsystem patches through DRM, our merge script
requires ACKs from an external maintainer, as determined by
get_maintainers.pl.
I’m not sure what happened here, but it looks like Andrew’s ACK was lost
on the patch, and somehow our merge tool allowed it to go in regardless.
We will be more diligent going forward.
Matt
[1] https://patchwork.freedesktop.org/patch/703183/?series=161082&rev=3#comment_1294670
>
> Thanks, Lorenzo
Thread overview: 9+ messages
2026-02-10 11:56 [PATCH v5] mm: Fix a hmm_range_fault() livelock / starvation problem Thomas Hellström
2026-02-10 22:40 ` Alistair Popple
2026-02-11 22:23 ` Davidlohr Bueso
2026-02-11 22:54 ` Alistair Popple
2026-02-11 23:22 ` Matthew Brost
2026-04-06 12:54 ` Lorenzo Stoakes (Oracle)
2026-04-06 12:56 ` Lorenzo Stoakes (Oracle)
2026-04-06 19:11 ` Matthew Brost
2026-04-06 14:08 ` Jason Gunthorpe