From: Lance Yang <lance.yang@linux.dev>
To: mike.kravetz@oracle.com
Cc: Harry Yoo <harry.yoo@oracle.com>,
akpm@linux-foundation.org, linux-kernel@vger.kernel.org,
linux-mm@kvack.org, Hillf Danton <hdanton@sina.com>,
muchun.song@linux.dev, osalvador@suse.de, david@kernel.org
Subject: Re: [PATCH v2 1/1] mm/hugetlb: fix possible deadlocks in hugetlb VMA unmap paths
Date: Tue, 11 Nov 2025 11:25:42 +0800 [thread overview]
Message-ID: <0432bee6-b384-4cd7-ac1c-e9123c7b393f@linux.dev> (raw)
In-Reply-To: <aef27438-7636-4c38-a5c7-beda95efa02b@linux.dev>
Cc: HugeTLB folks
On 2025/11/11 11:20, Lance Yang wrote:
> +Mike
>
> On 2025/11/11 07:07, Hillf Danton wrote:
>> On Tue, 11 Nov 2025 00:39:29 +0800 Lance Yang wrote:
>>> On 2025/11/10 20:17, Harry Yoo wrote:
>>>> On Mon, Nov 10, 2025 at 07:15:53PM +0800, Lance Yang wrote:
>>>>> From: Lance Yang <lance.yang@linux.dev>
>>>>>
>>>>> The hugetlb VMA unmap path contains several potential deadlocks, as
>>>>> reported by syzbot. These deadlocks occur in __hugetlb_zap_begin(),
>>>>> move_hugetlb_page_tables(), and the retry path of
>>>>> hugetlb_unmap_file_folio() (affecting remove_inode_hugepages() and
>>>>> unmap_vmas()), where vma_lock is acquired before i_mmap_lock. This
>>>>> lock
>>>>> ordering conflicts with other paths like hugetlb_fault(), which
>>>>> establish
>>>>> the correct dependency as i_mmap_lock -> vma_lock.
>>>>>
>>>>> Possible unsafe locking scenario:
>>>>>
>>>>> CPU0 CPU1
>>>>> ---- ----
>>>>> lock(&vma_lock->rw_sema);
>>>>> lock(&i_mmap_lock);
>>>>> lock(&vma_lock->rw_sema);
>>>>> lock(&i_mmap_lock);
>>>>>
>>>>> Resolve the circular dependencies reported by syzbot across
>>>>> multiple call
>>>>> chains by reordering the locks in all conflicting paths to
>>>>> consistently
>>>>> follow the established i_mmap_lock -> vma_lock order.
>>>>
>>>> But mm/rmap.c says:
>>>>> * hugetlbfs PageHuge() take locks in this order:
>>>>> * hugetlb_fault_mutex (hugetlbfs specific page fault mutex)
>>>>> * vma_lock (hugetlb specific lock for pmd_sharing)
>>>>> * mapping->i_mmap_rwsem (also used for hugetlb pmd sharing)
>>>>> * folio_lock
>>>>> */
>>>
>>> Thanks! You are right, I was mistaken ...
>>>
>>>>
>>>> I think the commit message should explain why the locking order
>>>> described
>>>> above is incorrect (or when it became incorrect) and fix the comment?
>>>
>>> I think the locking order documented in mm/rmap.c (vma_lock ->
>>> i_mmap_lock)
>>> is indeed the correct one to follow.
>
> Looking at the commit[1] that introduced the vma_lock, it seems possible
> that
> the deadlock reported by syzbot[2] is a false positive ...
>
> From the commit message:
>
> ```
> The vma_lock is used as follows:
> - During fault processing. The lock is acquired in read mode before
> doing a page table lock and allocation (huge_pte_alloc). The lock is
> held until code is finished with the page table entry (ptep).
> - The lock must be held in write mode whenever huge_pmd_unshare is
> called.
>
> Lock ordering issues come into play when unmapping a page from all
> vmas mapping the page. The i_mmap_rwsem must be held to search for the
> vmas, and the vma lock must be held before calling unmap which will
> call huge_pmd_unshare. This is done today in:
> - try_to_migrate_one and try_to_unmap_ for page migration and memory
> error handling. In these routines we 'try' to obtain the vma lock and
> fail to unmap if unsuccessful. Calling routines already deal with the
> failure of unmapping.
> - hugetlb_vmdelete_list for truncation and hole punch. This routine
> also tries to acquire the vma lock. If it fails, it skips the
> unmapping. However, we can not have file truncation or hole punch
> fail because of contention. After hugetlb_vmdelete_list, truncation
> and hole punch call remove_inode_hugepages. remove_inode_hugepages
> checks for mapped pages and call hugetlb_unmap_file_page to unmap them.
> hugetlb_unmap_file_page is designed to drop locks and reacquire in the
> correct order to guarantee unmap success.```
>
> The locking logic is a bit tricky; some paths can't follow a strict lock
> order
> and must use trylock or a drop/retry pattern to avoid deadlocking :)
>
> Hoping Mike can take a look and confirm!
>
> [1] https://lore.kernel.org/all/20220914221810.95771-9-
> mike.kravetz@oracle.com/
> [2] https://lore.kernel.org/linux-
> mm/69113a97.a70a0220.22f260.00ca.GAE@google.com/
>
> Thanks,
> Lance
>
>>>
>>> This fix has it backwards then. I'll rework it to fix the actual
>>> violations.
>>>
>> Break a leg, better after taking a look at ffa1e7ada456 ("block: Make
>> request_queue lockdep splats show up earlier")
>
next prev parent reply other threads:[~2025-11-11 3:26 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-11-10 11:15 [PATCH v2 1/1] mm/hugetlb: fix possible deadlocks in hugetlb VMA unmap paths Lance Yang
2025-11-10 12:17 ` Harry Yoo
2025-11-10 16:39 ` Lance Yang
2025-11-10 23:07 ` Hillf Danton
2025-11-11 3:20 ` Lance Yang
2025-11-11 3:25 ` Lance Yang [this message]
2025-11-10 15:19 ` [syzbot ci] " syzbot ci
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=0432bee6-b384-4cd7-ac1c-e9123c7b393f@linux.dev \
--to=lance.yang@linux.dev \
--cc=akpm@linux-foundation.org \
--cc=david@kernel.org \
--cc=harry.yoo@oracle.com \
--cc=hdanton@sina.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mike.kravetz@oracle.com \
--cc=muchun.song@linux.dev \
--cc=osalvador@suse.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.