From: Lance Yang <lance.yang@linux.dev>
To: mike.kravetz@oracle.com
Cc: Harry Yoo <harry.yoo@oracle.com>,
akpm@linux-foundation.org, linux-kernel@vger.kernel.org,
linux-mm@kvack.org, Hillf Danton <hdanton@sina.com>,
muchun.song@linux.dev, osalvador@suse.de, david@kernel.org
Subject: Re: [PATCH v2 1/1] mm/hugetlb: fix possible deadlocks in hugetlb VMA unmap paths
Date: Tue, 11 Nov 2025 11:25:42 +0800 [thread overview]
Message-ID: <0432bee6-b384-4cd7-ac1c-e9123c7b393f@linux.dev> (raw)
In-Reply-To: <aef27438-7636-4c38-a5c7-beda95efa02b@linux.dev>
Cc: HugeTLB folks
On 2025/11/11 11:20, Lance Yang wrote:
> +Mike
>
> On 2025/11/11 07:07, Hillf Danton wrote:
>> On Tue, 11 Nov 2025 00:39:29 +0800 Lance Yang wrote:
>>> On 2025/11/10 20:17, Harry Yoo wrote:
>>>> On Mon, Nov 10, 2025 at 07:15:53PM +0800, Lance Yang wrote:
>>>>> From: Lance Yang <lance.yang@linux.dev>
>>>>>
>>>>> The hugetlb VMA unmap path contains several potential deadlocks, as
>>>>> reported by syzbot. These deadlocks occur in __hugetlb_zap_begin(),
>>>>> move_hugetlb_page_tables(), and the retry path of
>>>>> hugetlb_unmap_file_folio() (affecting remove_inode_hugepages() and
>>>>> unmap_vmas()), where vma_lock is acquired before i_mmap_lock. This
>>>>> lock ordering conflicts with other paths like hugetlb_fault(), which
>>>>> establish the correct dependency as i_mmap_lock -> vma_lock.
>>>>>
>>>>> Possible unsafe locking scenario:
>>>>>
>>>>>        CPU0                    CPU1
>>>>>        ----                    ----
>>>>>   lock(&vma_lock->rw_sema);
>>>>>                                lock(&i_mmap_lock);
>>>>>                                lock(&vma_lock->rw_sema);
>>>>>   lock(&i_mmap_lock);
>>>>>
>>>>> Resolve the circular dependencies reported by syzbot across multiple
>>>>> call chains by reordering the locks in all conflicting paths to
>>>>> consistently follow the established i_mmap_lock -> vma_lock order.
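
As a userspace toy (pthread mutexes standing in for the kernel rwsems; not
actual hugetlb code, and lock_a/lock_b are invented names), the "pick one
order and stick to it" rule looks like this — whichever single order ends
up being the documented one:

```c
#include <pthread.h>

/* Toy model: two "unmap paths" that both take the same pair of locks.
 * As long as every path uses one agreed order -- whichever order that
 * is -- the circular wait shown in the lockdep report above cannot
 * form. */

static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;
static int completed;

static void *unmap_path(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&lock_a);	/* every path: lock_a first */
	pthread_mutex_lock(&lock_b);	/* ... then lock_b */
	completed++;			/* protected by both locks */
	pthread_mutex_unlock(&lock_b);
	pthread_mutex_unlock(&lock_a);
	return NULL;
}

static int run_both_paths(void)
{
	pthread_t t1, t2;

	pthread_create(&t1, NULL, unmap_path, NULL);
	pthread_create(&t2, NULL, unmap_path, NULL);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	return completed;	/* 2 means both finished: no deadlock */
}
```

If one of the two threads instead took lock_b before lock_a, the two-CPU
interleaving quoted above becomes possible.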
>>>>
>>>> But mm/rmap.c says:
>>>>> * hugetlbfs PageHuge() take locks in this order:
>>>>> * hugetlb_fault_mutex (hugetlbfs specific page fault mutex)
>>>>> * vma_lock (hugetlb specific lock for pmd_sharing)
>>>>> * mapping->i_mmap_rwsem (also used for hugetlb pmd sharing)
>>>>> * folio_lock
>>>>> */
>>>
>>> Thanks! You are right, I was mistaken ...
>>>
>>>>
>>>> I think the commit message should explain why the locking order
>>>> described above is incorrect (or when it became incorrect) and fix
>>>> the comment?
>>>
>>> I think the locking order documented in mm/rmap.c
>>> (vma_lock -> i_mmap_lock) is indeed the correct one to follow.
>
> Looking at the commit[1] that introduced the vma_lock, it seems
> possible that the deadlock reported by syzbot[2] is a false positive ...
>
> From the commit message:
>
> ```
> The vma_lock is used as follows:
> - During fault processing. The lock is acquired in read mode before
> doing a page table lock and allocation (huge_pte_alloc). The lock is
> held until code is finished with the page table entry (ptep).
> - The lock must be held in write mode whenever huge_pmd_unshare is
> called.
>
> Lock ordering issues come into play when unmapping a page from all
> vmas mapping the page. The i_mmap_rwsem must be held to search for the
> vmas, and the vma lock must be held before calling unmap which will
> call huge_pmd_unshare. This is done today in:
> - try_to_migrate_one and try_to_unmap_ for page migration and memory
> error handling. In these routines we 'try' to obtain the vma lock and
> fail to unmap if unsuccessful. Calling routines already deal with the
> failure of unmapping.
> - hugetlb_vmdelete_list for truncation and hole punch. This routine
> also tries to acquire the vma lock. If it fails, it skips the
> unmapping. However, we can not have file truncation or hole punch
> fail because of contention. After hugetlb_vmdelete_list, truncation
> and hole punch call remove_inode_hugepages. remove_inode_hugepages
> checks for mapped pages and call hugetlb_unmap_file_page to unmap them.
> hugetlb_unmap_file_page is designed to drop locks and reacquire in the
> correct order to guarantee unmap success.
> ```
>
> The locking logic is a bit tricky; some paths can't follow a strict
> lock order and must use trylock or a drop/retry pattern to avoid
> deadlocking :)
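
A rough userspace sketch of those two strategies (invented names, pthread
mutexes standing in for the kernel rwsems; not actual kernel code):

```c
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>

/* Stand-ins for the real kernel locks, not the actual objects. */
static pthread_mutex_t i_mmap_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t vma_lock    = PTHREAD_MUTEX_INITIALIZER;

/* Strategy 1: paths that are allowed to fail (migration, memory error
 * handling) take the outer lock, then only trylock the inner one and
 * skip the unmap on contention. */
static bool try_unmap_one(void)
{
	pthread_mutex_lock(&i_mmap_lock);	/* outer lock, fixed order */
	if (pthread_mutex_trylock(&vma_lock)) {	/* contended: give up */
		pthread_mutex_unlock(&i_mmap_lock);
		return false;			/* caller handles failure */
	}
	/* ... unmap work ... */
	pthread_mutex_unlock(&vma_lock);
	pthread_mutex_unlock(&i_mmap_lock);
	return true;
}

/* Strategy 2: paths that must not fail (truncate, hole punch) drop
 * everything on contention and retry; since nothing is held between
 * attempts, reacquiring in order cannot deadlock. */
static void unmap_must_succeed(void)
{
	while (!try_unmap_one())
		sched_yield();	/* back off, then retry from scratch */
}
```

Here try_unmap_one mirrors the trylock-and-skip idea and
unmap_must_succeed the drop/retry idea; the real code obviously
reacquires at a different granularity.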
>
> Hoping Mike can take a look and confirm!
>
> [1] https://lore.kernel.org/all/20220914221810.95771-9-mike.kravetz@oracle.com/
> [2] https://lore.kernel.org/linux-mm/69113a97.a70a0220.22f260.00ca.GAE@google.com/
>
> Thanks,
> Lance
>
>>>
>>> This fix has it backwards then. I'll rework it to fix the actual
>>> violations.
>>>
>> Break a leg, better after taking a look at ffa1e7ada456 ("block: Make
>> request_queue lockdep splats show up earlier")
>
Thread overview: 7+ messages
2025-11-10 11:15 [PATCH v2 1/1] mm/hugetlb: fix possible deadlocks in hugetlb VMA unmap paths Lance Yang
2025-11-10 12:17 ` Harry Yoo
2025-11-10 16:39   ` Lance Yang
2025-11-10 23:07     ` Hillf Danton
2025-11-11  3:20       ` Lance Yang
2025-11-11  3:25         ` Lance Yang [this message]
2025-11-10 15:19 ` [syzbot ci] " syzbot ci