* [PATCH v2 1/1] mm/hugetlb: fix possible deadlocks in hugetlb VMA unmap paths
@ 2025-11-10 11:15 Lance Yang
2025-11-10 12:17 ` Harry Yoo
2025-11-10 15:19 ` [syzbot ci] " syzbot ci
0 siblings, 2 replies; 7+ messages in thread
From: Lance Yang @ 2025-11-10 11:15 UTC
To: akpm
Cc: syzbot+3f5f9a0d292454409ca6, syzbot+ci5a676d3d210999ee, david,
linux-kernel, linux-mm, muchun.song, osalvador, syzkaller-bugs,
syzbot, syzbot, Lance Yang
From: Lance Yang <lance.yang@linux.dev>
The hugetlb VMA unmap path contains several potential deadlocks, as
reported by syzbot. These deadlocks occur in __hugetlb_zap_begin(),
move_hugetlb_page_tables(), and the retry path of
hugetlb_unmap_file_folio() (affecting remove_inode_hugepages() and
unmap_vmas()), where vma_lock is acquired before i_mmap_lock. This lock
ordering conflicts with other paths like hugetlb_fault(), which establish
the correct dependency as i_mmap_lock -> vma_lock.
Possible unsafe locking scenario:
       CPU0                    CPU1
       ----                    ----
  lock(&vma_lock->rw_sema);
                               lock(&i_mmap_lock);
                               lock(&vma_lock->rw_sema);
  lock(&i_mmap_lock);
Resolve the circular dependencies reported by syzbot across multiple call
chains by reordering the locks in all conflicting paths to consistently
follow the established i_mmap_lock -> vma_lock order.
Reported-by: syzbot+3f5f9a0d292454409ca6@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-mm/69113a97.a70a0220.22f260.00ca.GAE@google.com/
Signed-off-by: Lance Yang <lance.yang@linux.dev>
---
V1 -> V2:
- Update changelog
- Resolve three related deadlock scenarios reported by syzbot
  https://lore.kernel.org/linux-mm/6911ad38.a70a0220.22f260.00dc.GAE@google.com/
- https://lore.kernel.org/linux-mm/20251110051421.29436-1-lance.yang@linux.dev/
 fs/hugetlbfs/inode.c | 2 +-
 mm/hugetlb.c         | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 3919fca56553..d1b0b5346728 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -447,8 +447,8 @@ static void hugetlb_unmap_file_folio(struct hstate *h,
 		 * a reference. We must 'open code' vma locking as we do
 		 * not know if vma_lock is still attached to vma.
 		 */
-		down_write(&vma_lock->rw_sema);
 		i_mmap_lock_write(mapping);
+		down_write(&vma_lock->rw_sema);
 
 		vma = vma_lock->vma;
 		if (!vma) {
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index b1f47b87ae65..f0212d2579f6 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5110,8 +5110,8 @@ int move_hugetlb_page_tables(struct vm_area_struct *vma,
 	mmu_notifier_invalidate_range_start(&range);
 	last_addr_mask = hugetlb_mask_last_page(h);
 	/* Prevent race with file truncation */
-	hugetlb_vma_lock_write(vma);
 	i_mmap_lock_write(mapping);
+	hugetlb_vma_lock_write(vma);
 	for (; old_addr < old_end; old_addr += sz, new_addr += sz) {
 		src_pte = hugetlb_walk(vma, old_addr, sz);
 		if (!src_pte) {
@@ -5327,9 +5327,9 @@ void __hugetlb_zap_begin(struct vm_area_struct *vma,
 		return;
 
 	adjust_range_if_pmd_sharing_possible(vma, start, end);
-	hugetlb_vma_lock_write(vma);
 	if (vma->vm_file)
 		i_mmap_lock_write(vma->vm_file->f_mapping);
+	hugetlb_vma_lock_write(vma);
 }
 
 void __hugetlb_zap_end(struct vm_area_struct *vma,
void __hugetlb_zap_end(struct vm_area_struct *vma,
--
2.49.0
* Re: [PATCH v2 1/1] mm/hugetlb: fix possible deadlocks in hugetlb VMA unmap paths
2025-11-10 11:15 [PATCH v2 1/1] mm/hugetlb: fix possible deadlocks in hugetlb VMA unmap paths Lance Yang
@ 2025-11-10 12:17 ` Harry Yoo
2025-11-10 16:39 ` Lance Yang
2025-11-10 15:19 ` [syzbot ci] " syzbot ci
1 sibling, 1 reply; 7+ messages in thread
From: Harry Yoo @ 2025-11-10 12:17 UTC
To: Lance Yang
Cc: akpm, syzbot+3f5f9a0d292454409ca6, syzbot+ci5a676d3d210999ee,
david, linux-kernel, linux-mm, muchun.song, osalvador,
syzkaller-bugs, syzbot, syzbot
On Mon, Nov 10, 2025 at 07:15:53PM +0800, Lance Yang wrote:
> From: Lance Yang <lance.yang@linux.dev>
>
> The hugetlb VMA unmap path contains several potential deadlocks, as
> reported by syzbot. These deadlocks occur in __hugetlb_zap_begin(),
> move_hugetlb_page_tables(), and the retry path of
> hugetlb_unmap_file_folio() (affecting remove_inode_hugepages() and
> unmap_vmas()), where vma_lock is acquired before i_mmap_lock. This lock
> ordering conflicts with other paths like hugetlb_fault(), which establish
> the correct dependency as i_mmap_lock -> vma_lock.
>
> Possible unsafe locking scenario:
>
>        CPU0                    CPU1
>        ----                    ----
>   lock(&vma_lock->rw_sema);
>                                lock(&i_mmap_lock);
>                                lock(&vma_lock->rw_sema);
>   lock(&i_mmap_lock);
>
> Resolve the circular dependencies reported by syzbot across multiple call
> chains by reordering the locks in all conflicting paths to consistently
> follow the established i_mmap_lock -> vma_lock order.
But mm/rmap.c says:
> * hugetlbfs PageHuge() take locks in this order:
> * hugetlb_fault_mutex (hugetlbfs specific page fault mutex)
> * vma_lock (hugetlb specific lock for pmd_sharing)
> * mapping->i_mmap_rwsem (also used for hugetlb pmd sharing)
> * folio_lock
> */
I think the commit message should explain why the locking order described
above is incorrect (or when it became incorrect) and fix the comment?
> Reported-by: syzbot+3f5f9a0d292454409ca6@syzkaller.appspotmail.com
> Closes: https://lore.kernel.org/linux-mm/69113a97.a70a0220.22f260.00ca.GAE@google.com/
> Signed-off-by: Lance Yang <lance.yang@linux.dev>
> ---
> V1 -> V2:
> - Update changelog
> - Resolve three related deadlock scenarios reported by syzbot
>   https://lore.kernel.org/linux-mm/6911ad38.a70a0220.22f260.00dc.GAE@google.com/
> - https://lore.kernel.org/linux-mm/20251110051421.29436-1-lance.yang@linux.dev/
>
>  fs/hugetlbfs/inode.c | 2 +-
>  mm/hugetlb.c         | 4 ++--
>  2 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> index 3919fca56553..d1b0b5346728 100644
> --- a/fs/hugetlbfs/inode.c
> +++ b/fs/hugetlbfs/inode.c
> @@ -447,8 +447,8 @@ static void hugetlb_unmap_file_folio(struct hstate *h,
>  		 * a reference. We must 'open code' vma locking as we do
>  		 * not know if vma_lock is still attached to vma.
>  		 */
> -		down_write(&vma_lock->rw_sema);
>  		i_mmap_lock_write(mapping);
> +		down_write(&vma_lock->rw_sema);
>  
>  		vma = vma_lock->vma;
>  		if (!vma) {
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index b1f47b87ae65..f0212d2579f6 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -5110,8 +5110,8 @@ int move_hugetlb_page_tables(struct vm_area_struct *vma,
>  	mmu_notifier_invalidate_range_start(&range);
>  	last_addr_mask = hugetlb_mask_last_page(h);
>  	/* Prevent race with file truncation */
> -	hugetlb_vma_lock_write(vma);
>  	i_mmap_lock_write(mapping);
> +	hugetlb_vma_lock_write(vma);
>  	for (; old_addr < old_end; old_addr += sz, new_addr += sz) {
>  		src_pte = hugetlb_walk(vma, old_addr, sz);
>  		if (!src_pte) {
> @@ -5327,9 +5327,9 @@ void __hugetlb_zap_begin(struct vm_area_struct *vma,
>  		return;
>  
>  	adjust_range_if_pmd_sharing_possible(vma, start, end);
> -	hugetlb_vma_lock_write(vma);
>  	if (vma->vm_file)
>  		i_mmap_lock_write(vma->vm_file->f_mapping);
> +	hugetlb_vma_lock_write(vma);
>  }
>  
>  void __hugetlb_zap_end(struct vm_area_struct *vma,
> --
> 2.49.0
>
>
--
Cheers,
Harry / Hyeonggon
* [syzbot ci] Re: mm/hugetlb: fix possible deadlocks in hugetlb VMA unmap paths
2025-11-10 11:15 [PATCH v2 1/1] mm/hugetlb: fix possible deadlocks in hugetlb VMA unmap paths Lance Yang
2025-11-10 12:17 ` Harry Yoo
@ 2025-11-10 15:19 ` syzbot ci
1 sibling, 0 replies; 7+ messages in thread
From: syzbot ci @ 2025-11-10 15:19 UTC
To: akpm, david, lance.yang, linux-kernel, linux-mm, muchun.song,
osalvador, syzbot, syzbot, syzkaller-bugs
Cc: syzbot, syzkaller-bugs
syzbot ci has tested the following series
[v2] mm/hugetlb: fix possible deadlocks in hugetlb VMA unmap paths
https://lore.kernel.org/all/20251110111553.88384-1-lance.yang@linux.dev
* [PATCH v2 1/1] mm/hugetlb: fix possible deadlocks in hugetlb VMA unmap paths
and found the following issue:
possible deadlock in unmap_vmas
Full report is available here:
https://ci.syzbot.org/series/3106300b-af9f-40dc-985b-33cb937712aa
***
possible deadlock in unmap_vmas
tree: mm-new
URL: https://kernel.googlesource.com/pub/scm/linux/kernel/git/akpm/mm.git
base: 02dafa01ec9a00c3758c1c6478d82fe601f5f1ba
arch: amd64
compiler: Debian clang version 20.1.8 (++20250708063551+0c9f909b7976-1~exp1~20250708183702.136), Debian LLD 20.1.8
config: https://ci.syzbot.org/builds/089d8dad-bc85-4724-8a79-2ad934becf17/config
C repro: https://ci.syzbot.org/findings/3dded3f6-aef2-4f4c-9dba-b23214df5e12/c_repro
syz repro: https://ci.syzbot.org/findings/3dded3f6-aef2-4f4c-9dba-b23214df5e12/syz_repro
======================================================
WARNING: possible circular locking dependency detected
syzkaller #0 Not tainted
------------------------------------------------------
syz.0.17/5963 is trying to acquire lock:
ffff88816cbda8e8 (&resv_map->rw_sema){+.+.}-{4:4}, at: hugetlb_zap_begin include/linux/hugetlb.h:257 [inline]
ffff88816cbda8e8 (&resv_map->rw_sema){+.+.}-{4:4}, at: unmap_vmas+0x23d/0x580 mm/memory.c:2098
but task is already holding lock:
ffff888172c80418 (&hugetlbfs_i_mmap_rwsem_key){+.+.}-{4:4}, at: i_mmap_lock_write include/linux/fs.h:548 [inline]
ffff888172c80418 (&hugetlbfs_i_mmap_rwsem_key){+.+.}-{4:4}, at: __hugetlb_zap_begin+0x2aa/0x3a0 mm/hugetlb.c:5331
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (&hugetlbfs_i_mmap_rwsem_key){+.+.}-{4:4}:
lock_acquire+0x120/0x360 kernel/locking/lockdep.c:5868
down_write+0x96/0x1f0 kernel/locking/rwsem.c:1590
i_mmap_lock_write include/linux/fs.h:548 [inline]
hugetlb_change_protection+0x56a/0x18c0 mm/hugetlb.c:6439
change_protection+0x386/0x38c0 mm/mprotect.c:655
mprotect_fixup+0x6fe/0x9c0 mm/mprotect.c:774
do_mprotect_pkey+0x8c5/0xcd0 mm/mprotect.c:930
__do_sys_mprotect mm/mprotect.c:951 [inline]
__se_sys_mprotect mm/mprotect.c:948 [inline]
__x64_sys_mprotect+0x80/0x90 mm/mprotect.c:948
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xfa/0xfa0 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
-> #0 (&resv_map->rw_sema){+.+.}-{4:4}:
check_prev_add kernel/locking/lockdep.c:3165 [inline]
check_prevs_add kernel/locking/lockdep.c:3284 [inline]
validate_chain+0xb9b/0x2140 kernel/locking/lockdep.c:3908
__lock_acquire+0xab9/0xd20 kernel/locking/lockdep.c:5237
lock_acquire+0x120/0x360 kernel/locking/lockdep.c:5868
down_write+0x96/0x1f0 kernel/locking/rwsem.c:1590
hugetlb_zap_begin include/linux/hugetlb.h:257 [inline]
unmap_vmas+0x23d/0x580 mm/memory.c:2098
exit_mmap+0x23e/0xb30 mm/mmap.c:1277
__mmput+0x118/0x430 kernel/fork.c:1133
exit_mm+0x1da/0x2c0 kernel/exit.c:582
do_exit+0x648/0x2300 kernel/exit.c:954
do_group_exit+0x21c/0x2d0 kernel/exit.c:1107
__do_sys_exit_group kernel/exit.c:1118 [inline]
__se_sys_exit_group kernel/exit.c:1116 [inline]
__x64_sys_exit_group+0x3f/0x40 kernel/exit.c:1116
x64_sys_call+0x21f7/0x2200 arch/x86/include/generated/asm/syscalls_64.h:232
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xfa/0xfa0 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
other info that might help us debug this:
Possible unsafe locking scenario:
       CPU0                    CPU1
       ----                    ----
  lock(&hugetlbfs_i_mmap_rwsem_key);
                               lock(&resv_map->rw_sema);
                               lock(&hugetlbfs_i_mmap_rwsem_key);
  lock(&resv_map->rw_sema);
*** DEADLOCK ***
2 locks held by syz.0.17/5963:
#0: ffff88810e0f8ca0 (&mm->mmap_lock){++++}-{4:4}, at: mmap_read_lock include/linux/mmap_lock.h:365 [inline]
#0: ffff88810e0f8ca0 (&mm->mmap_lock){++++}-{4:4}, at: exit_mmap+0x126/0xb30 mm/mmap.c:1262
#1: ffff888172c80418 (&hugetlbfs_i_mmap_rwsem_key){+.+.}-{4:4}, at: i_mmap_lock_write include/linux/fs.h:548 [inline]
#1: ffff888172c80418 (&hugetlbfs_i_mmap_rwsem_key){+.+.}-{4:4}, at: __hugetlb_zap_begin+0x2aa/0x3a0 mm/hugetlb.c:5331
stack backtrace:
CPU: 1 UID: 0 PID: 5963 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x189/0x250 lib/dump_stack.c:120
print_circular_bug+0x2ee/0x310 kernel/locking/lockdep.c:2043
check_noncircular+0x134/0x160 kernel/locking/lockdep.c:2175
check_prev_add kernel/locking/lockdep.c:3165 [inline]
check_prevs_add kernel/locking/lockdep.c:3284 [inline]
validate_chain+0xb9b/0x2140 kernel/locking/lockdep.c:3908
__lock_acquire+0xab9/0xd20 kernel/locking/lockdep.c:5237
lock_acquire+0x120/0x360 kernel/locking/lockdep.c:5868
down_write+0x96/0x1f0 kernel/locking/rwsem.c:1590
hugetlb_zap_begin include/linux/hugetlb.h:257 [inline]
unmap_vmas+0x23d/0x580 mm/memory.c:2098
exit_mmap+0x23e/0xb30 mm/mmap.c:1277
__mmput+0x118/0x430 kernel/fork.c:1133
exit_mm+0x1da/0x2c0 kernel/exit.c:582
do_exit+0x648/0x2300 kernel/exit.c:954
do_group_exit+0x21c/0x2d0 kernel/exit.c:1107
__do_sys_exit_group kernel/exit.c:1118 [inline]
__se_sys_exit_group kernel/exit.c:1116 [inline]
__x64_sys_exit_group+0x3f/0x40 kernel/exit.c:1116
x64_sys_call+0x21f7/0x2200 arch/x86/include/generated/asm/syscalls_64.h:232
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xfa/0xfa0 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f6bb058efc9
Code: Unable to access opcode bytes at 0x7f6bb058ef9f.
RSP: 002b:00007ffedbb6b2f8 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f6bb058efc9
RDX: 0000000000000064 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 0000000000000003 R08: 00000002dbb6b3ef R09: 00007f6bb07b1280
R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f6bb07b1280 R14: 0000000000000003 R15: 00007ffedbb6b3b0
</TASK>
***
If these findings have caused you to resend the series or submit a
separate fix, please add the following tag to your commit message:
Tested-by: syzbot@syzkaller.appspotmail.com
---
This report is generated by a bot. It may contain errors.
syzbot ci engineers can be reached at syzkaller@googlegroups.com.
* Re: [PATCH v2 1/1] mm/hugetlb: fix possible deadlocks in hugetlb VMA unmap paths
2025-11-10 12:17 ` Harry Yoo
@ 2025-11-10 16:39 ` Lance Yang
2025-11-10 23:07 ` Hillf Danton
0 siblings, 1 reply; 7+ messages in thread
From: Lance Yang @ 2025-11-10 16:39 UTC
To: Harry Yoo
Cc: akpm, syzbot+3f5f9a0d292454409ca6, syzbot+ci5a676d3d210999ee,
david, linux-kernel, linux-mm, muchun.song, osalvador,
syzkaller-bugs, syzbot, syzbot
On 2025/11/10 20:17, Harry Yoo wrote:
> On Mon, Nov 10, 2025 at 07:15:53PM +0800, Lance Yang wrote:
>> From: Lance Yang <lance.yang@linux.dev>
>>
>> The hugetlb VMA unmap path contains several potential deadlocks, as
>> reported by syzbot. These deadlocks occur in __hugetlb_zap_begin(),
>> move_hugetlb_page_tables(), and the retry path of
>> hugetlb_unmap_file_folio() (affecting remove_inode_hugepages() and
>> unmap_vmas()), where vma_lock is acquired before i_mmap_lock. This lock
>> ordering conflicts with other paths like hugetlb_fault(), which establish
>> the correct dependency as i_mmap_lock -> vma_lock.
>>
>> Possible unsafe locking scenario:
>>
>>        CPU0                    CPU1
>>        ----                    ----
>>   lock(&vma_lock->rw_sema);
>>                                lock(&i_mmap_lock);
>>                                lock(&vma_lock->rw_sema);
>>   lock(&i_mmap_lock);
>>
>> Resolve the circular dependencies reported by syzbot across multiple call
>> chains by reordering the locks in all conflicting paths to consistently
>> follow the established i_mmap_lock -> vma_lock order.
>
> But mm/rmap.c says:
>> * hugetlbfs PageHuge() take locks in this order:
>> * hugetlb_fault_mutex (hugetlbfs specific page fault mutex)
>> * vma_lock (hugetlb specific lock for pmd_sharing)
>> * mapping->i_mmap_rwsem (also used for hugetlb pmd sharing)
>> * folio_lock
>> */
Thanks! You are right, I was mistaken ...
>
> I think the commit message should explain why the locking order described
> above is incorrect (or when it became incorrect) and fix the comment?
I think the locking order documented in mm/rmap.c (vma_lock -> i_mmap_lock)
is indeed the correct one to follow.
This fix has it backwards then. I'll rework it to fix the actual violations.
Thanks,
Lance
* Re: [PATCH v2 1/1] mm/hugetlb: fix possible deadlocks in hugetlb VMA unmap paths
2025-11-10 16:39 ` Lance Yang
@ 2025-11-10 23:07 ` Hillf Danton
2025-11-11 3:20 ` Lance Yang
0 siblings, 1 reply; 7+ messages in thread
From: Hillf Danton @ 2025-11-10 23:07 UTC
To: Lance Yang; +Cc: Harry Yoo, akpm, linux-kernel, linux-mm
On Tue, 11 Nov 2025 00:39:29 +0800 Lance Yang wrote:
> On 2025/11/10 20:17, Harry Yoo wrote:
> > On Mon, Nov 10, 2025 at 07:15:53PM +0800, Lance Yang wrote:
> >> From: Lance Yang <lance.yang@linux.dev>
> >>
> >> The hugetlb VMA unmap path contains several potential deadlocks, as
> >> reported by syzbot. These deadlocks occur in __hugetlb_zap_begin(),
> >> move_hugetlb_page_tables(), and the retry path of
> >> hugetlb_unmap_file_folio() (affecting remove_inode_hugepages() and
> >> unmap_vmas()), where vma_lock is acquired before i_mmap_lock. This lock
> >> ordering conflicts with other paths like hugetlb_fault(), which establish
> >> the correct dependency as i_mmap_lock -> vma_lock.
> >>
> >> Possible unsafe locking scenario:
> >>
> >>        CPU0                    CPU1
> >>        ----                    ----
> >>   lock(&vma_lock->rw_sema);
> >>                                lock(&i_mmap_lock);
> >>                                lock(&vma_lock->rw_sema);
> >>   lock(&i_mmap_lock);
> >>
> >> Resolve the circular dependencies reported by syzbot across multiple call
> >> chains by reordering the locks in all conflicting paths to consistently
> >> follow the established i_mmap_lock -> vma_lock order.
> >
> > But mm/rmap.c says:
> >> * hugetlbfs PageHuge() take locks in this order:
> >> * hugetlb_fault_mutex (hugetlbfs specific page fault mutex)
> >> * vma_lock (hugetlb specific lock for pmd_sharing)
> >> * mapping->i_mmap_rwsem (also used for hugetlb pmd sharing)
> >> * folio_lock
> >> */
>
> Thanks! You are right, I was mistaken ...
>
> >
> > I think the commit message should explain why the locking order described
> > above is incorrect (or when it became incorrect) and fix the comment?
>
> I think the locking order documented in mm/rmap.c (vma_lock -> i_mmap_lock)
> is indeed the correct one to follow.
>
> This fix has it backwards then. I'll rework it to fix the actual violations.
>
Break a leg, better after taking a look at ffa1e7ada456 ("block: Make
request_queue lockdep splats show up earlier")
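The trick in that commit, roughly: take and release the locks once, in
the intended order, at initialization time, so lockdep records the
dependency up front and flags an inversion the first time any path
violates it, instead of only when the rare path actually runs. A
hypothetical hugetlb-flavoured sketch (our naming; not actual kernel
code):

```
/*
 * Hypothetical sketch, not actual kernel code: prime lockdep with the
 * documented vma_lock -> i_mmap_rwsem ordering at setup time, in the
 * spirit of ffa1e7ada456, so a later inverted acquisition splats
 * immediately.
 */
static void hugetlb_vma_lock_prime_lockdep(struct hugetlb_vma_lock *vma_lock,
					   struct address_space *mapping)
{
	if (!IS_ENABLED(CONFIG_LOCKDEP))
		return;

	down_write(&vma_lock->rw_sema);	/* documented order: vma_lock first */
	i_mmap_lock_write(mapping);	/* ... then mapping->i_mmap_rwsem */
	i_mmap_unlock_write(mapping);
	up_write(&vma_lock->rw_sema);
}
```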
* Re: [PATCH v2 1/1] mm/hugetlb: fix possible deadlocks in hugetlb VMA unmap paths
2025-11-10 23:07 ` Hillf Danton
@ 2025-11-11 3:20 ` Lance Yang
2025-11-11 3:25 ` Lance Yang
0 siblings, 1 reply; 7+ messages in thread
From: Lance Yang @ 2025-11-11 3:20 UTC
To: mike.kravetz; +Cc: Harry Yoo, akpm, linux-kernel, linux-mm, Hillf Danton
+Mike
On 2025/11/11 07:07, Hillf Danton wrote:
> On Tue, 11 Nov 2025 00:39:29 +0800 Lance Yang wrote:
>> On 2025/11/10 20:17, Harry Yoo wrote:
>>> On Mon, Nov 10, 2025 at 07:15:53PM +0800, Lance Yang wrote:
>>>> From: Lance Yang <lance.yang@linux.dev>
>>>>
>>>> The hugetlb VMA unmap path contains several potential deadlocks, as
>>>> reported by syzbot. These deadlocks occur in __hugetlb_zap_begin(),
>>>> move_hugetlb_page_tables(), and the retry path of
>>>> hugetlb_unmap_file_folio() (affecting remove_inode_hugepages() and
>>>> unmap_vmas()), where vma_lock is acquired before i_mmap_lock. This lock
>>>> ordering conflicts with other paths like hugetlb_fault(), which establish
>>>> the correct dependency as i_mmap_lock -> vma_lock.
>>>>
>>>> Possible unsafe locking scenario:
>>>>
>>>>>        CPU0                    CPU1
>>>>>        ----                    ----
>>>>>   lock(&vma_lock->rw_sema);
>>>>>                                lock(&i_mmap_lock);
>>>>>                                lock(&vma_lock->rw_sema);
>>>>>   lock(&i_mmap_lock);
>>>>
>>>> Resolve the circular dependencies reported by syzbot across multiple call
>>>> chains by reordering the locks in all conflicting paths to consistently
>>>> follow the established i_mmap_lock -> vma_lock order.
>>>
>>> But mm/rmap.c says:
>>>> * hugetlbfs PageHuge() take locks in this order:
>>>> * hugetlb_fault_mutex (hugetlbfs specific page fault mutex)
>>>> * vma_lock (hugetlb specific lock for pmd_sharing)
>>>> * mapping->i_mmap_rwsem (also used for hugetlb pmd sharing)
>>>> * folio_lock
>>>> */
>>
>> Thanks! You are right, I was mistaken ...
>>
>>>
>>> I think the commit message should explain why the locking order described
>>> above is incorrect (or when it became incorrect) and fix the comment?
>>
>> I think the locking order documented in mm/rmap.c (vma_lock -> i_mmap_lock)
>> is indeed the correct one to follow.
Looking at the commit[1] that introduced the vma_lock, it seems possible that
the deadlock reported by syzbot[2] is a false positive ...
From the commit message:
```
The vma_lock is used as follows:
- During fault processing. The lock is acquired in read mode before
doing a page table lock and allocation (huge_pte_alloc). The lock is
held until code is finished with the page table entry (ptep).
- The lock must be held in write mode whenever huge_pmd_unshare is
called.
Lock ordering issues come into play when unmapping a page from all
vmas mapping the page. The i_mmap_rwsem must be held to search for the
vmas, and the vma lock must be held before calling unmap which will
call huge_pmd_unshare. This is done today in:
- try_to_migrate_one and try_to_unmap_ for page migration and memory
error handling. In these routines we 'try' to obtain the vma lock and
fail to unmap if unsuccessful. Calling routines already deal with the
failure of unmapping.
- hugetlb_vmdelete_list for truncation and hole punch. This routine
also tries to acquire the vma lock. If it fails, it skips the
unmapping. However, we can not have file truncation or hole punch
fail because of contention. After hugetlb_vmdelete_list, truncation
and hole punch call remove_inode_hugepages. remove_inode_hugepages
checks for mapped pages and call hugetlb_unmap_file_page to unmap them.
hugetlb_unmap_file_page is designed to drop locks and reacquire in the
correct order to guarantee unmap success.
```
The locking logic is a bit tricky; some paths can't follow a strict lock
order and must use trylock or a drop/retry pattern to avoid deadlocking :)
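Concretely, the two patterns look roughly like this (an abridged sketch
of the hugetlb_vmdelete_list() / hugetlb_unmap_file_folio() logic
described above, not the verbatim code):

```
/* Abridged sketch, not verbatim kernel code; i_mmap_rwsem is held on
 * entry in both cases. */

/* 1) Opportunistic (migration, hwpoison, hugetlb_vmdelete_list): just
 *    skip the VMA if the lock is contended; callers tolerate that. */
if (!hugetlb_vma_trylock_write(vma))
	continue;

/* 2) Must-succeed (truncate/hole punch via hugetlb_unmap_file_folio):
 *    pin vma_lock, drop the rwsem, then reacquire everything in the
 *    documented vma_lock -> i_mmap_rwsem order and re-validate. */
kref_get(&vma_lock->refs);		/* keep vma_lock alive across the drop */
i_mmap_unlock_write(mapping);
down_write(&vma_lock->rw_sema);		/* safe to sleep: no rwsem held */
i_mmap_lock_write(mapping);
vma = vma_lock->vma;			/* the VMA may be gone; re-check */
```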
Hoping Mike can take a look and confirm!
[1] https://lore.kernel.org/all/20220914221810.95771-9-mike.kravetz@oracle.com/
[2] https://lore.kernel.org/linux-mm/69113a97.a70a0220.22f260.00ca.GAE@google.com/
Thanks,
Lance
>>
>> This fix has it backwards then. I'll rework it to fix the actual violations.
>>
> Break a leg, better after taking a look at ffa1e7ada456 ("block: Make
> request_queue lockdep splats show up earlier")
* Re: [PATCH v2 1/1] mm/hugetlb: fix possible deadlocks in hugetlb VMA unmap paths
2025-11-11 3:20 ` Lance Yang
@ 2025-11-11 3:25 ` Lance Yang
0 siblings, 0 replies; 7+ messages in thread
From: Lance Yang @ 2025-11-11 3:25 UTC
To: mike.kravetz
Cc: Harry Yoo, akpm, linux-kernel, linux-mm, Hillf Danton,
muchun.song, osalvador, david
Cc: HugeTLB folks
On 2025/11/11 11:20, Lance Yang wrote:
> +Mike
>
> On 2025/11/11 07:07, Hillf Danton wrote:
>> On Tue, 11 Nov 2025 00:39:29 +0800 Lance Yang wrote:
>>> On 2025/11/10 20:17, Harry Yoo wrote:
>>>> On Mon, Nov 10, 2025 at 07:15:53PM +0800, Lance Yang wrote:
>>>>> From: Lance Yang <lance.yang@linux.dev>
>>>>>
>>>>> The hugetlb VMA unmap path contains several potential deadlocks, as
>>>>> reported by syzbot. These deadlocks occur in __hugetlb_zap_begin(),
>>>>> move_hugetlb_page_tables(), and the retry path of
>>>>> hugetlb_unmap_file_folio() (affecting remove_inode_hugepages() and
>>>>> unmap_vmas()), where vma_lock is acquired before i_mmap_lock. This
>>>>> lock
>>>>> ordering conflicts with other paths like hugetlb_fault(), which
>>>>> establish
>>>>> the correct dependency as i_mmap_lock -> vma_lock.
>>>>>
>>>>> Possible unsafe locking scenario:
>>>>>
>>>>>        CPU0                    CPU1
>>>>>        ----                    ----
>>>>>   lock(&vma_lock->rw_sema);
>>>>>                                lock(&i_mmap_lock);
>>>>>                                lock(&vma_lock->rw_sema);
>>>>>   lock(&i_mmap_lock);
>>>>>
>>>>> Resolve the circular dependencies reported by syzbot across multiple call
>>>>> chains by reordering the locks in all conflicting paths to consistently
>>>>> follow the established i_mmap_lock -> vma_lock order.
>>>>
>>>> But mm/rmap.c says:
>>>>> * hugetlbfs PageHuge() take locks in this order:
>>>>> * hugetlb_fault_mutex (hugetlbfs specific page fault mutex)
>>>>> * vma_lock (hugetlb specific lock for pmd_sharing)
>>>>> * mapping->i_mmap_rwsem (also used for hugetlb pmd sharing)
>>>>> * folio_lock
>>>>> */
>>>
>>> Thanks! You are right, I was mistaken ...
>>>
>>>>
>>>> I think the commit message should explain why the locking order described
>>>> above is incorrect (or when it became incorrect) and fix the comment?
>>>
>>> I think the locking order documented in mm/rmap.c (vma_lock -> i_mmap_lock)
>>> is indeed the correct one to follow.
>
> Looking at the commit[1] that introduced the vma_lock, it seems possible that
> the deadlock reported by syzbot[2] is a false positive ...
>
> From the commit message:
>
> ```
> The vma_lock is used as follows:
> - During fault processing. The lock is acquired in read mode before
> doing a page table lock and allocation (huge_pte_alloc). The lock is
> held until code is finished with the page table entry (ptep).
> - The lock must be held in write mode whenever huge_pmd_unshare is
> called.
>
> Lock ordering issues come into play when unmapping a page from all
> vmas mapping the page. The i_mmap_rwsem must be held to search for the
> vmas, and the vma lock must be held before calling unmap which will
> call huge_pmd_unshare. This is done today in:
> - try_to_migrate_one and try_to_unmap_ for page migration and memory
> error handling. In these routines we 'try' to obtain the vma lock and
> fail to unmap if unsuccessful. Calling routines already deal with the
> failure of unmapping.
> - hugetlb_vmdelete_list for truncation and hole punch. This routine
> also tries to acquire the vma lock. If it fails, it skips the
> unmapping. However, we can not have file truncation or hole punch
> fail because of contention. After hugetlb_vmdelete_list, truncation
> and hole punch call remove_inode_hugepages. remove_inode_hugepages
> checks for mapped pages and call hugetlb_unmap_file_page to unmap them.
> hugetlb_unmap_file_page is designed to drop locks and reacquire in the
> correct order to guarantee unmap success.
> ```
>
> The locking logic is a bit tricky; some paths can't follow a strict lock
> order and must use trylock or a drop/retry pattern to avoid deadlocking :)
>
> Hoping Mike can take a look and confirm!
>
> [1] https://lore.kernel.org/all/20220914221810.95771-9-mike.kravetz@oracle.com/
> [2] https://lore.kernel.org/linux-mm/69113a97.a70a0220.22f260.00ca.GAE@google.com/
>
> Thanks,
> Lance
>
>>>
>>> This fix has it backwards then. I'll rework it to fix the actual violations.
>>>
>> Break a leg, better after taking a look at ffa1e7ada456 ("block: Make
>> request_queue lockdep splats show up earlier")
>