[PATCH] mm/migrate: fix hugetlbfs deadlock by respecting lock ordering

* [PATCH] mm/migrate: fix hugetlbfs deadlock by respecting lock ordering
@ 2026-01-09  3:47 Jinchao Wang
  2026-01-09  4:06 ` Matthew Wilcox
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Jinchao Wang @ 2026-01-09  3:47 UTC (permalink / raw)
  To: Matthew Wilcox, Andrew Morton, David Hildenbrand, Zi Yan,
	Matthew Brost, Joshua Hahn, Rakie Kim, Byungchul Park,
	Gregory Price, Ying Huang, Alistair Popple, linux-mm,
	linux-kernel
  Cc: Jinchao Wang, syzbot+2d9c96466c978346b55f

Fix an AB-BA deadlock between hugetlbfs_punch_hole() and page migration.

The deadlock occurs because migration violates the lock ordering defined
in mm/rmap.c for hugetlbfs:

  * hugetlbfs PageHuge() take locks in this order:
  * hugetlb_fault_mutex
  * vma_lock
  * mapping->i_mmap_rwsem
  * folio_lock

The following trace illustrates the inversion:

Task A (punch_hole):             Task B (migration):
--------------------             -------------------
1. i_mmap_lock_write(mapping)    1. folio_lock(folio)
2. folio_lock(folio)             2. i_mmap_lock_read(mapping)
   (blocks waiting for B)           (blocks waiting for A)

Task A is blocked in the punch-hole path:
  hugetlbfs_fallocate
    hugetlbfs_punch_hole
      hugetlbfs_zero_partial_page
        folio_lock

Task B is blocked in the migration path:
  migrate_pages
    unmap_and_move_huge_page
      remove_migration_ptes
        __rmap_walk_file
          i_mmap_lock_read

To fix this, adjust unmap_and_move_huge_page() to respect the established
hierarchy. If i_mmap_rwsem is acquired during try_to_migrate(), hold it
until remove_migration_ptes() completes.

This utilizes the existing retry logic, which unlocks the folio and
returns -EAGAIN if hugetlb_folio_mapping_lock_write() fails.

Link: https://lore.kernel.org/all/68e9715a.050a0220.1186a4.000d.GAE@google.com/
Link: https://lore.kernel.org/all/20260108123957.1123502-2-wangjinchao600@gmail.com
Reported-by: syzbot+2d9c96466c978346b55f@syzkaller.appspotmail.com
Suggested-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jinchao Wang <wangjinchao600@gmail.com>
---
 mm/migrate.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index 5169f9717f60..bcaa13541acc 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1458,6 +1458,7 @@ static int unmap_and_move_huge_page(new_folio_t get_new_folio,
 	int page_was_mapped = 0;
 	struct anon_vma *anon_vma = NULL;
 	struct address_space *mapping = NULL;
+	enum ttu_flags ttu = 0;
 
 	if (folio_ref_count(src) == 1) {
 		/* page was freed from under us. So we are done. */
@@ -1498,8 +1499,6 @@ static int unmap_and_move_huge_page(new_folio_t get_new_folio,
 		goto put_anon;
 
 	if (folio_mapped(src)) {
-		enum ttu_flags ttu = 0;
-
 		if (!folio_test_anon(src)) {
 			/*
 			 * In shared mappings, try_to_unmap could potentially
@@ -1516,16 +1515,17 @@ static int unmap_and_move_huge_page(new_folio_t get_new_folio,
 
 		try_to_migrate(src, ttu);
 		page_was_mapped = 1;
-
-		if (ttu & TTU_RMAP_LOCKED)
-			i_mmap_unlock_write(mapping);
 	}
 
 	if (!folio_mapped(src))
 		rc = move_to_new_folio(dst, src, mode);
 
 	if (page_was_mapped)
-		remove_migration_ptes(src, !rc ? dst : src, 0);
+		remove_migration_ptes(src, !rc ? dst : src,
+				      ttu ? RMP_LOCKED : 0);
+
+	if (ttu & TTU_RMAP_LOCKED)
+		i_mmap_unlock_write(mapping);
 
 unlock_put_anon:
 	folio_unlock(dst);
-- 
2.43.0




* Re: [PATCH] mm/migrate: fix hugetlbfs deadlock by respecting lock ordering
  2026-01-09  3:47 [PATCH] mm/migrate: fix hugetlbfs deadlock by respecting lock ordering Jinchao Wang
@ 2026-01-09  4:06 ` Matthew Wilcox
  2026-01-09  5:17   ` Jinchao Wang
  2026-01-09  6:37 ` Huang, Ying
  2026-01-09 13:39 ` David Hildenbrand (Red Hat)
  2 siblings, 1 reply; 10+ messages in thread
From: Matthew Wilcox @ 2026-01-09  4:06 UTC (permalink / raw)
  To: Jinchao Wang
  Cc: Andrew Morton, David Hildenbrand, Zi Yan, Matthew Brost,
	Joshua Hahn, Rakie Kim, Byungchul Park, Gregory Price, Ying Huang,
	Alistair Popple, linux-mm, linux-kernel,
	syzbot+2d9c96466c978346b55f

On Fri, Jan 09, 2026 at 11:47:16AM +0800, Jinchao Wang wrote:
> Link: https://lore.kernel.org/all/68e9715a.050a0220.1186a4.000d.GAE@google.com/
> Link: https://lore.kernel.org/all/20260108123957.1123502-2-wangjinchao600@gmail.com
> Reported-by: syzbot+2d9c96466c978346b55f@syzkaller.appspotmail.com
> Suggested-by: Matthew Wilcox <willy@infradead.org>
> Signed-off-by: Jinchao Wang <wangjinchao600@gmail.com>

... and by "Suggested-by", you mean "completely written by", right?

Or did you change it in some way?



* Re: [PATCH] mm/migrate: fix hugetlbfs deadlock by respecting lock ordering
  2026-01-09  4:06 ` Matthew Wilcox
@ 2026-01-09  5:17   ` Jinchao Wang
  0 siblings, 0 replies; 10+ messages in thread
From: Jinchao Wang @ 2026-01-09  5:17 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Andrew Morton, David Hildenbrand, Zi Yan, Matthew Brost,
	Joshua Hahn, Rakie Kim, Byungchul Park, Gregory Price, Ying Huang,
	Alistair Popple, linux-mm, linux-kernel,
	syzbot+2d9c96466c978346b55f

On Fri, Jan 09, 2026 at 04:06:22AM +0000, Matthew Wilcox wrote:
> On Fri, Jan 09, 2026 at 11:47:16AM +0800, Jinchao Wang wrote:
> > Link: https://lore.kernel.org/all/68e9715a.050a0220.1186a4.000d.GAE@google.com/
> > Link: https://lore.kernel.org/all/20260108123957.1123502-2-wangjinchao600@gmail.com
> > Reported-by: syzbot+2d9c96466c978346b55f@syzkaller.appspotmail.com
> > Suggested-by: Matthew Wilcox <willy@infradead.org>
> > Signed-off-by: Jinchao Wang <wangjinchao600@gmail.com>
> 
> ... and by "Suggested-by", you mean "completely written by", right?
> 
> Or did you change it in some way?

Yes, it is completely written by you. I verified it against the syzkaller
reproducer and reviewed the code logic.

If you prefer, I am happy to update the attribution, for example by replacing
Suggested-by with Co-developed-by, or by listing you as the author instead.  I
can also drop my patch if that is more appropriate.

Please let me know what you prefer.

Thanks.



* Re: [PATCH] mm/migrate: fix hugetlbfs deadlock by respecting lock ordering
  2026-01-09  3:47 [PATCH] mm/migrate: fix hugetlbfs deadlock by respecting lock ordering Jinchao Wang
  2026-01-09  4:06 ` Matthew Wilcox
@ 2026-01-09  6:37 ` Huang, Ying
  2026-01-09  8:08   ` Jinchao Wang
  2026-01-09 13:39 ` David Hildenbrand (Red Hat)
  2 siblings, 1 reply; 10+ messages in thread
From: Huang, Ying @ 2026-01-09  6:37 UTC (permalink / raw)
  To: Jinchao Wang
  Cc: Matthew Wilcox, Andrew Morton, David Hildenbrand, Zi Yan,
	Matthew Brost, Joshua Hahn, Rakie Kim, Byungchul Park,
	Gregory Price, Alistair Popple, linux-mm, linux-kernel,
	syzbot+2d9c96466c978346b55f

Jinchao Wang <wangjinchao600@gmail.com> writes:

> Fix an AB-BA deadlock between hugetlbfs_punch_hole() and page migration.
>
> The deadlock occurs because migration violates the lock ordering defined
> in mm/rmap.c for hugetlbfs:
>
>   * hugetlbfs PageHuge() take locks in this order:
>   * hugetlb_fault_mutex
>   * vma_lock
>   * mapping->i_mmap_rwsem
>   * folio_lock
>
> The following trace illustrates the inversion:
>
> Task A (punch_hole):             Task B (migration):
> --------------------             -------------------
> 1. i_mmap_lock_write(mapping)    1. folio_lock(folio)
> 2. folio_lock(folio)             2. i_mmap_lock_read(mapping)
>    (blocks waiting for B)           (blocks waiting for A)
>
> Task A is blocked in the punch-hole path:
>   hugetlbfs_fallocate
>     hugetlbfs_punch_hole
>       hugetlbfs_zero_partial_page
>         folio_lock
>
> Task B is blocked in the migration path:
>   migrate_pages
>     unmap_and_move_huge_page
>       remove_migration_ptes
>         __rmap_walk_file
>           i_mmap_lock_read
>
> To fix this, adjust unmap_and_move_huge_page() to respect the established
> hierarchy. If i_mmap_rwsem is acquired during try_to_migrate(), hold it
> until remove_migration_ptes() completes.
>
> This utilizes the existing retry logic, which unlocks the folio and
> returns -EAGAIN if hugetlb_folio_mapping_lock_write() fails.
>
> Link: https://lore.kernel.org/all/68e9715a.050a0220.1186a4.000d.GAE@google.com/
> Link: https://lore.kernel.org/all/20260108123957.1123502-2-wangjinchao600@gmail.com
> Reported-by: syzbot+2d9c96466c978346b55f@syzkaller.appspotmail.com
> Suggested-by: Matthew Wilcox <willy@infradead.org>
> Signed-off-by: Jinchao Wang <wangjinchao600@gmail.com>

Can you provide a "Fixes:" tag?  That is helpful for backporting the bug
fix.

---
Best Regards,
Huang, Ying



* Re: [PATCH] mm/migrate: fix hugetlbfs deadlock by respecting lock ordering
  2026-01-09  6:37 ` Huang, Ying
@ 2026-01-09  8:08   ` Jinchao Wang
  0 siblings, 0 replies; 10+ messages in thread
From: Jinchao Wang @ 2026-01-09  8:08 UTC (permalink / raw)
  To: Huang, Ying
  Cc: Matthew Wilcox, Andrew Morton, David Hildenbrand, Zi Yan,
	Matthew Brost, Joshua Hahn, Rakie Kim, Byungchul Park,
	Gregory Price, Alistair Popple, linux-mm, linux-kernel,
	syzbot+2d9c96466c978346b55f

On Fri, Jan 09, 2026 at 02:37:28PM +0800, Huang, Ying wrote:
> Jinchao Wang <wangjinchao600@gmail.com> writes:
> 
> > Fix an AB-BA deadlock between hugetlbfs_punch_hole() and page migration.
> >
> > The deadlock occurs because migration violates the lock ordering defined
> > in mm/rmap.c for hugetlbfs:
> >
> >   * hugetlbfs PageHuge() take locks in this order:
> >   * hugetlb_fault_mutex
> >   * vma_lock
> >   * mapping->i_mmap_rwsem
> >   * folio_lock
> >
> > The following trace illustrates the inversion:
> >
> > Task A (punch_hole):             Task B (migration):
> > --------------------             -------------------
> > 1. i_mmap_lock_write(mapping)    1. folio_lock(folio)
> > 2. folio_lock(folio)             2. i_mmap_lock_read(mapping)
> >    (blocks waiting for B)           (blocks waiting for A)
> >
> > Task A is blocked in the punch-hole path:
> >   hugetlbfs_fallocate
> >     hugetlbfs_punch_hole
> >       hugetlbfs_zero_partial_page
> >         folio_lock
> >
> > Task B is blocked in the migration path:
> >   migrate_pages
> >     unmap_and_move_huge_page
> >       remove_migration_ptes
> >         __rmap_walk_file
> >           i_mmap_lock_read
> >
> > To fix this, adjust unmap_and_move_huge_page() to respect the established
> > hierarchy. If i_mmap_rwsem is acquired during try_to_migrate(), hold it
> > until remove_migration_ptes() completes.
> >
> > This utilizes the existing retry logic, which unlocks the folio and
> > returns -EAGAIN if hugetlb_folio_mapping_lock_write() fails.
> >
> > Link: https://lore.kernel.org/all/68e9715a.050a0220.1186a4.000d.GAE@google.com/
> > Link: https://lore.kernel.org/all/20260108123957.1123502-2-wangjinchao600@gmail.com
> > Reported-by: syzbot+2d9c96466c978346b55f@syzkaller.appspotmail.com
> > Suggested-by: Matthew Wilcox <willy@infradead.org>
> > Signed-off-by: Jinchao Wang <wangjinchao600@gmail.com>
> 
> Can you provide a "Fixes:" tag?  That is helpful for backporting the bug
> fix.

Thanks for the suggestion.
The lock-ordering violation behind this deadlock appears to have been
introduced by commit 336bf30eb765 ("hugetlbfs: fix anon huge page migration
race"). Although commit 68d32527d340 ("hugetlbfs: zero partial pages during
fallocate hole punch") was the one that first exposed the problem,
I believe commit 336bf30eb765 is the root cause.

I will add the following tag to v2:
Fixes: 336bf30eb765 ("hugetlbfs: fix anon huge page migration race")
> 
> ---
> Best Regards,
> Huang, Ying



* Re: [PATCH] mm/migrate: fix hugetlbfs deadlock by respecting lock ordering
  2026-01-09  3:47 [PATCH] mm/migrate: fix hugetlbfs deadlock by respecting lock ordering Jinchao Wang
  2026-01-09  4:06 ` Matthew Wilcox
  2026-01-09  6:37 ` Huang, Ying
@ 2026-01-09 13:39 ` David Hildenbrand (Red Hat)
  2026-01-09 14:16   ` Jinchao Wang
  2 siblings, 1 reply; 10+ messages in thread
From: David Hildenbrand (Red Hat) @ 2026-01-09 13:39 UTC (permalink / raw)
  To: Jinchao Wang, Matthew Wilcox, Andrew Morton, Zi Yan,
	Matthew Brost, Joshua Hahn, Rakie Kim, Byungchul Park,
	Gregory Price, Ying Huang, Alistair Popple, linux-mm,
	linux-kernel
  Cc: syzbot+2d9c96466c978346b55f

On 1/9/26 04:47, Jinchao Wang wrote:
> Fix an AB-BA deadlock between hugetlbfs_punch_hole() and page migration.
> 
> The deadlock occurs because migration violates the lock ordering defined
> in mm/rmap.c for hugetlbfs:
> 
>    * hugetlbfs PageHuge() take locks in this order:
>    * hugetlb_fault_mutex
>    * vma_lock
>    * mapping->i_mmap_rwsem
>    * folio_lock
> 
> The following trace illustrates the inversion:
> 
> Task A (punch_hole):             Task B (migration):
> --------------------             -------------------
> 1. i_mmap_lock_write(mapping)    1. folio_lock(folio)
> 2. folio_lock(folio)             2. i_mmap_lock_read(mapping)
>     (blocks waiting for B)           (blocks waiting for A)
> 
> Task A is blocked in the punch-hole path:
>    hugetlbfs_fallocate
>      hugetlbfs_punch_hole
>        hugetlbfs_zero_partial_page
>          folio_lock
> 
> Task B is blocked in the migration path:
>    migrate_pages
>      unmap_and_move_huge_page
>        remove_migration_ptes
>          __rmap_walk_file
>            i_mmap_lock_read
> 
> To fix this, adjust unmap_and_move_huge_page() to respect the established
> hierarchy. If i_mmap_rwsem is acquired during try_to_migrate(), hold it


I'm confused. Isn't it unmap_and_move_huge_page() that grabs the 
i_mmap_rwsem during hugetlb_page_mapping_lock_write() (where we do a 
try-lock)?


We now handle file-backed folios correctly I think. Could we somehow 
also be in trouble for anon folios? Because there, we'd still take the 
rmap lock after grabbing the folio lock.


-- 
Cheers

David



* Re: [PATCH] mm/migrate: fix hugetlbfs deadlock by respecting lock ordering
  2026-01-09 13:39 ` David Hildenbrand (Red Hat)
@ 2026-01-09 14:16   ` Jinchao Wang
  2026-01-09 14:18     ` David Hildenbrand (Red Hat)
  0 siblings, 1 reply; 10+ messages in thread
From: Jinchao Wang @ 2026-01-09 14:16 UTC (permalink / raw)
  To: David Hildenbrand (Red Hat)
  Cc: Matthew Wilcox, Andrew Morton, Zi Yan, Matthew Brost, Joshua Hahn,
	Rakie Kim, Byungchul Park, Gregory Price, Ying Huang,
	Alistair Popple, linux-mm, linux-kernel,
	syzbot+2d9c96466c978346b55f

On Fri, Jan 09, 2026 at 02:39:08PM +0100, David Hildenbrand (Red Hat) wrote:
> On 1/9/26 04:47, Jinchao Wang wrote:
> > Fix an AB-BA deadlock between hugetlbfs_punch_hole() and page migration.
> > 
> > The deadlock occurs because migration violates the lock ordering defined
> > in mm/rmap.c for hugetlbfs:
> > 
> >    * hugetlbfs PageHuge() take locks in this order:
> >    * hugetlb_fault_mutex
> >    * vma_lock
> >    * mapping->i_mmap_rwsem
> >    * folio_lock
> > 
> > The following trace illustrates the inversion:
> > 
> > Task A (punch_hole):             Task B (migration):
> > --------------------             -------------------
> > 1. i_mmap_lock_write(mapping)    1. folio_lock(folio)
> > 2. folio_lock(folio)             2. i_mmap_lock_read(mapping)
> >     (blocks waiting for B)           (blocks waiting for A)
> > 
> > Task A is blocked in the punch-hole path:
> >    hugetlbfs_fallocate
> >      hugetlbfs_punch_hole
> >        hugetlbfs_zero_partial_page
> >          folio_lock
> > 
> > Task B is blocked in the migration path:
> >    migrate_pages
> >      unmap_and_move_huge_page
> >        remove_migration_ptes
> >          __rmap_walk_file
> >            i_mmap_lock_read
> > 
> > To fix this, adjust unmap_and_move_huge_page() to respect the established
> > hierarchy. If i_mmap_rwsem is acquired during try_to_migrate(), hold it
> 
> 
> I'm confused. Isn't it unmap_and_move_huge_page() that grabs the
> i_mmap_rwsem during hugetlb_page_mapping_lock_write() (where we do a
> try-lock)?
Yes, but the lock is released before remove_migration_ptes().

Task A can enter the race window between
	i_mmap_unlock_write(mapping)
and
	remove_migration_ptes() -> i_mmap_lock_read(mapping).

This window was introduced by the change below:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/diff/mm/migrate.c?id=336bf30eb765

> 
> 
> We now handle file-backed folios correctly I think. Could we somehow also be
> in trouble for anon folios? Because there, we'd still take the rmap lock
> after grabbing the folio lock.
> 
> 
> -- 
> Cheers
> 
> David



* Re: [PATCH] mm/migrate: fix hugetlbfs deadlock by respecting lock ordering
  2026-01-09 14:16   ` Jinchao Wang
@ 2026-01-09 14:18     ` David Hildenbrand (Red Hat)
  2026-01-09 15:32       ` Jinchao Wang
  0 siblings, 1 reply; 10+ messages in thread
From: David Hildenbrand (Red Hat) @ 2026-01-09 14:18 UTC (permalink / raw)
  To: Jinchao Wang
  Cc: Matthew Wilcox, Andrew Morton, Zi Yan, Matthew Brost, Joshua Hahn,
	Rakie Kim, Byungchul Park, Gregory Price, Ying Huang,
	Alistair Popple, linux-mm, linux-kernel,
	syzbot+2d9c96466c978346b55f

On 1/9/26 15:16, Jinchao Wang wrote:
> On Fri, Jan 09, 2026 at 02:39:08PM +0100, David Hildenbrand (Red Hat) wrote:
>> On 1/9/26 04:47, Jinchao Wang wrote:
>>> Fix an AB-BA deadlock between hugetlbfs_punch_hole() and page migration.
>>>
>>> The deadlock occurs because migration violates the lock ordering defined
>>> in mm/rmap.c for hugetlbfs:
>>>
>>>     * hugetlbfs PageHuge() take locks in this order:
>>>     * hugetlb_fault_mutex
>>>     * vma_lock
>>>     * mapping->i_mmap_rwsem
>>>     * folio_lock
>>>
>>> The following trace illustrates the inversion:
>>>
>>> Task A (punch_hole):             Task B (migration):
>>> --------------------             -------------------
>>> 1. i_mmap_lock_write(mapping)    1. folio_lock(folio)
>>> 2. folio_lock(folio)             2. i_mmap_lock_read(mapping)
>>>      (blocks waiting for B)           (blocks waiting for A)
>>>
>>> Task A is blocked in the punch-hole path:
>>>     hugetlbfs_fallocate
>>>       hugetlbfs_punch_hole
>>>         hugetlbfs_zero_partial_page
>>>           folio_lock
>>>
>>> Task B is blocked in the migration path:
>>>     migrate_pages
>>>       unmap_and_move_huge_page
>>>         remove_migration_ptes
>>>           __rmap_walk_file
>>>             i_mmap_lock_read
>>>
>>> To fix this, adjust unmap_and_move_huge_page() to respect the established
>>> hierarchy. If i_mmap_rwsem is acquired during try_to_migrate(), hold it
>>
>>
>> I'm confused. Isn't it unmap_and_move_huge_page() that grabs the
>> i_mmap_rwsem during hugetlb_page_mapping_lock_write() (where we do a
>> try-lock)?
> Yes, but the lock is released before remove_migration_ptes().
> 
> Task A can enter the race window between
> 	i_mmap_unlock_write(mapping)
> and
> 	remove_migration_ptes() -> i_mmap_lock_read(mapping).
> 
> This window was introduced by the change below:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/diff/mm/migrate.c?id=336bf30eb765

try_to_migrate() is not the problem, but remove_migration_ptes() ?

Anyhow, I saw that Willy sent out a version.

-- 
Cheers

David



* Re: [PATCH] mm/migrate: fix hugetlbfs deadlock by respecting lock ordering
  2026-01-09 14:18     ` David Hildenbrand (Red Hat)
@ 2026-01-09 15:32       ` Jinchao Wang
  2026-01-09 15:41         ` David Hildenbrand (Red Hat)
  0 siblings, 1 reply; 10+ messages in thread
From: Jinchao Wang @ 2026-01-09 15:32 UTC (permalink / raw)
  To: David Hildenbrand (Red Hat)
  Cc: Matthew Wilcox, Andrew Morton, Zi Yan, Matthew Brost, Joshua Hahn,
	Rakie Kim, Byungchul Park, Gregory Price, Ying Huang,
	Alistair Popple, linux-mm, linux-kernel,
	syzbot+2d9c96466c978346b55f

On Fri, Jan 09, 2026 at 03:18:37PM +0100, David Hildenbrand (Red Hat) wrote:
> On 1/9/26 15:16, Jinchao Wang wrote:
> > On Fri, Jan 09, 2026 at 02:39:08PM +0100, David Hildenbrand (Red Hat) wrote:
> > > On 1/9/26 04:47, Jinchao Wang wrote:
> > > > Fix an AB-BA deadlock between hugetlbfs_punch_hole() and page migration.
> > > > 
> > > > The deadlock occurs because migration violates the lock ordering defined
> > > > in mm/rmap.c for hugetlbfs:
> > > > 
> > > >     * hugetlbfs PageHuge() take locks in this order:
> > > >     * hugetlb_fault_mutex
> > > >     * vma_lock
> > > >     * mapping->i_mmap_rwsem
> > > >     * folio_lock
> > > > 
> > > > The following trace illustrates the inversion:
> > > > 
> > > > Task A (punch_hole):             Task B (migration):
> > > > --------------------             -------------------
> > > > 1. i_mmap_lock_write(mapping)    1. folio_lock(folio)
> > > > 2. folio_lock(folio)             2. i_mmap_lock_read(mapping)
> > > >      (blocks waiting for B)           (blocks waiting for A)
> > > > 
> > > > Task A is blocked in the punch-hole path:
> > > >     hugetlbfs_fallocate
> > > >       hugetlbfs_punch_hole
> > > >         hugetlbfs_zero_partial_page
> > > >           folio_lock
> > > > 
> > > > Task B is blocked in the migration path:
> > > >     migrate_pages
> > > >       unmap_and_move_huge_page
> > > >         remove_migration_ptes
> > > >           __rmap_walk_file
> > > >             i_mmap_lock_read
> > > > 
> > > > To fix this, adjust unmap_and_move_huge_page() to respect the established
> > > > hierarchy. If i_mmap_rwsem is acquired during try_to_migrate(), hold it
> > > 
> > > 
> > > I'm confused. Isn't it unmap_and_move_huge_page() that grabs the
> > > i_mmap_rwsem during hugetlb_page_mapping_lock_write() (where we do a
> > > try-lock)?
> > Yes, but the lock is released before remove_migration_ptes().
> > 
> > Task A can enter the race window between
> > 	i_mmap_unlock_write(mapping)
> > and
> > 	remove_migration_ptes() -> i_mmap_lock_read(mapping).
> > 
> > This window was introduced by the change below:
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/diff/mm/migrate.c?id=336bf30eb765
> 
> try_to_migrate() is not the problem, but remove_migration_ptes() ?
> 
> Anyhow, I saw that Willy sent out a version.
Thank you for letting me know.

> 
> -- 
> Cheers
> 
> David



* Re: [PATCH] mm/migrate: fix hugetlbfs deadlock by respecting lock ordering
  2026-01-09 15:32       ` Jinchao Wang
@ 2026-01-09 15:41         ` David Hildenbrand (Red Hat)
  0 siblings, 0 replies; 10+ messages in thread
From: David Hildenbrand (Red Hat) @ 2026-01-09 15:41 UTC (permalink / raw)
  To: Jinchao Wang
  Cc: Matthew Wilcox, Andrew Morton, Zi Yan, Matthew Brost, Joshua Hahn,
	Rakie Kim, Byungchul Park, Gregory Price, Ying Huang,
	Alistair Popple, linux-mm, linux-kernel,
	syzbot+2d9c96466c978346b55f

On 1/9/26 16:32, Jinchao Wang wrote:
> On Fri, Jan 09, 2026 at 03:18:37PM +0100, David Hildenbrand (Red Hat) wrote:
>> On 1/9/26 15:16, Jinchao Wang wrote:
>>> On Fri, Jan 09, 2026 at 02:39:08PM +0100, David Hildenbrand (Red Hat) wrote:
>>>> On 1/9/26 04:47, Jinchao Wang wrote:
>>>>> Fix an AB-BA deadlock between hugetlbfs_punch_hole() and page migration.
>>>>>
>>>>> The deadlock occurs because migration violates the lock ordering defined
>>>>> in mm/rmap.c for hugetlbfs:
>>>>>
>>>>>      * hugetlbfs PageHuge() take locks in this order:
>>>>>      * hugetlb_fault_mutex
>>>>>      * vma_lock
>>>>>      * mapping->i_mmap_rwsem
>>>>>      * folio_lock
>>>>>
>>>>> The following trace illustrates the inversion:
>>>>>
>>>>> Task A (punch_hole):             Task B (migration):
>>>>> --------------------             -------------------
>>>>> 1. i_mmap_lock_write(mapping)    1. folio_lock(folio)
>>>>> 2. folio_lock(folio)             2. i_mmap_lock_read(mapping)
>>>>>       (blocks waiting for B)           (blocks waiting for A)
>>>>>
>>>>> Task A is blocked in the punch-hole path:
>>>>>      hugetlbfs_fallocate
>>>>>        hugetlbfs_punch_hole
>>>>>          hugetlbfs_zero_partial_page
>>>>>            folio_lock
>>>>>
>>>>> Task B is blocked in the migration path:
>>>>>      migrate_pages
>>>>>        unmap_and_move_huge_page
>>>>>          remove_migration_ptes
>>>>>            __rmap_walk_file
>>>>>              i_mmap_lock_read
>>>>>
>>>>> To fix this, adjust unmap_and_move_huge_page() to respect the established
>>>>> hierarchy. If i_mmap_rwsem is acquired during try_to_migrate(), hold it
>>>>
>>>>
>>>> I'm confused. Isn't it unmap_and_move_huge_page() that grabs the
>>>> i_mmap_rwsem during hugetlb_page_mapping_lock_write() (where we do a
>>>> try-lock)?
>>> Yes, but the lock is released before remove_migration_ptes().
>>>
>>> Task A can enter the race window between
>>> 	i_mmap_unlock_write(mapping)
>>> and
>>> 	remove_migration_ptes() -> i_mmap_lock_read(mapping).
>>>
>>> This window was introduced by the change below:
>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/diff/mm/migrate.c?id=336bf30eb765
>>
>> try_to_migrate() is not the problem, but remove_migration_ptes() ?
>>
>> Anyhow, I saw that Willy sent out a version.
> Thank you for letting me know.

For reference:

https://lkml.kernel.org/r/20260109041345.3863089-1-willy@infradead.org

-- 
Cheers

David

