From: Lance Yang <lance.yang@linux.dev>
To: mike.kravetz@oracle.com
Cc: Harry Yoo <harry.yoo@oracle.com>,
	akpm@linux-foundation.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, Hillf Danton <hdanton@sina.com>
Subject: Re: [PATCH v2 1/1] mm/hugetlb: fix possible deadlocks in hugetlb VMA unmap paths
Date: Tue, 11 Nov 2025 11:20:22 +0800	[thread overview]
Message-ID: <aef27438-7636-4c38-a5c7-beda95efa02b@linux.dev> (raw)
In-Reply-To: <20251110230745.9105-1-hdanton@sina.com>

+Mike

On 2025/11/11 07:07, Hillf Danton wrote:
> On Tue, 11 Nov 2025 00:39:29 +0800 Lance Yang wrote:
>> On 2025/11/10 20:17, Harry Yoo wrote:
>>> On Mon, Nov 10, 2025 at 07:15:53PM +0800, Lance Yang wrote:
>>>> From: Lance Yang <lance.yang@linux.dev>
>>>>
>>>> The hugetlb VMA unmap path contains several potential deadlocks, as
>>>> reported by syzbot. These deadlocks occur in __hugetlb_zap_begin(),
>>>> move_hugetlb_page_tables(), and the retry path of
>>>> hugetlb_unmap_file_folio() (affecting remove_inode_hugepages() and
>>>> unmap_vmas()), where vma_lock is acquired before i_mmap_lock. This lock
>>>> ordering conflicts with other paths like hugetlb_fault(), which establish
>>>> the correct dependency as i_mmap_lock -> vma_lock.
>>>>
>>>> Possible unsafe locking scenario:
>>>>
>>>> CPU0                                 CPU1
>>>> ----                                 ----
>>>> lock(&vma_lock->rw_sema);
>>>>                                        lock(&i_mmap_lock);
>>>>                                        lock(&vma_lock->rw_sema);
>>>> lock(&i_mmap_lock);
>>>>
>>>> Resolve the circular dependencies reported by syzbot across multiple call
>>>> chains by reordering the locks in all conflicting paths to consistently
>>>> follow the established i_mmap_lock -> vma_lock order.
>>>
>>> But mm/rmap.c says:
>>>> * hugetlbfs PageHuge() take locks in this order:
>>>> *   hugetlb_fault_mutex (hugetlbfs specific page fault mutex)
>>>> *     vma_lock (hugetlb specific lock for pmd_sharing)
>>>> *       mapping->i_mmap_rwsem (also used for hugetlb pmd sharing)
>>>> *         folio_lock
>>>> */
>>
>> Thanks! You are right, I was mistaken ...
>>
>>>
>>> I think the commit message should explain why the locking order described
>>> above is incorrect (or when it became incorrect) and fix the comment?
>>
>> I think the locking order documented in mm/rmap.c (vma_lock -> i_mmap_lock)
>> is indeed the correct one to follow.

Looking at the commit[1] that introduced the vma_lock, it seems possible that the deadlock reported by syzbot[2] is a false positive ...

From the commit message:

```
The vma_lock is used as follows:
- During fault processing. The lock is acquired in read mode before
   doing a page table lock and allocation (huge_pte_alloc).  The lock is
   held until code is finished with the page table entry (ptep).
- The lock must be held in write mode whenever huge_pmd_unshare is
   called.

Lock ordering issues come into play when unmapping a page from all
vmas mapping the page.  The i_mmap_rwsem must be held to search for the
vmas, and the vma lock must be held before calling unmap which will
call huge_pmd_unshare.  This is done today in:
- try_to_migrate_one and try_to_unmap_ for page migration and memory
   error handling.  In these routines we 'try' to obtain the vma lock and
   fail to unmap if unsuccessful.  Calling routines already deal with the
   failure of unmapping.
- hugetlb_vmdelete_list for truncation and hole punch.  This routine
   also tries to acquire the vma lock.  If it fails, it skips the
   unmapping.  However, we can not have file truncation or hole punch
   fail because of contention.  After hugetlb_vmdelete_list, truncation
   and hole punch call remove_inode_hugepages.  remove_inode_hugepages
   checks for mapped pages and call hugetlb_unmap_file_page to unmap them.
   hugetlb_unmap_file_page is designed to drop locks and reacquire in the
   correct order to guarantee unmap success.
```
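
To make the two usage modes above concrete, here is a minimal sketch (the bodies are placeholders, not the kernel's actual code; only the hugetlb_vma_lock_*() helpers are the real ones from mm/hugetlb.c):

```c
#include <linux/hugetlb.h>
#include <linux/mm.h>

/* Fault path: vma_lock is held in read mode around page table setup. */
static void fault_path_sketch(struct vm_area_struct *vma)
{
	hugetlb_vma_lock_read(vma);
	/* ... huge_pte_alloc() and all use of the returned ptep ... */
	hugetlb_vma_unlock_read(vma);
}

/* Unshare path: huge_pmd_unshare() requires the lock in write mode. */
static void unshare_path_sketch(struct vm_area_struct *vma)
{
	hugetlb_vma_lock_write(vma);
	/* ... huge_pmd_unshare() may only run under the write lock ... */
	hugetlb_vma_unlock_write(vma);
}
```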

The locking logic is a bit tricky; some paths can't follow a strict lock order and must use trylock or a drop/retry pattern to avoid deadlocking :)
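
Roughly, the two patterns look like this (a sketch only; lifetime handling, such as taking a reference on the vma_lock before dropping i_mmap_rwsem, is elided):

```c
#include <linux/fs.h>
#include <linux/hugetlb.h>

/*
 * Best-effort pattern, as in hugetlb_vmdelete_list(): if the vma lock
 * is contended, skip; remove_inode_hugepages() catches leftovers later.
 */
static void try_unmap_sketch(struct vm_area_struct *vma)
{
	if (!hugetlb_vma_trylock_write(vma))
		return;
	/* ... unmap_hugepage_range(vma, ...) ... */
	hugetlb_vma_unlock_write(vma);
}

/*
 * Must-succeed pattern, as in hugetlb_unmap_file_folio(): back out,
 * then reacquire in the documented order (vma_lock -> i_mmap_rwsem)
 * before the caller re-validates the folio and retries the unmap.
 */
static void retry_in_order_sketch(struct address_space *mapping,
				  struct vm_area_struct *vma)
{
	i_mmap_unlock_write(mapping);	/* drop the lock held out of order */
	hugetlb_vma_lock_write(vma);	/* sleep for the vma lock first    */
	i_mmap_lock_write(mapping);	/* then retake i_mmap_rwsem        */
	/* ... re-check that the folio is still mapped, then retry ... */
}
```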

Hoping Mike can take a look and confirm!

[1] https://lore.kernel.org/all/20220914221810.95771-9-mike.kravetz@oracle.com/
[2] https://lore.kernel.org/linux-mm/69113a97.a70a0220.22f260.00ca.GAE@google.com/

Thanks,
Lance

>>
>> This fix has it backwards then. I'll rework it to fix the actual violations.
>>
> Break a leg; better after taking a look at ffa1e7ada456 ("block: Make
> request_queue lockdep splats show up earlier")
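
If I understand ffa1e7ada456 correctly, the idea is to teach lockdep the intended ordering at initialization time, so a violation splats on the first run rather than only when two rare paths race. A minimal sketch of that priming idea (outer_sem/inner_sem are placeholder stand-ins, not the real block-layer or hugetlb locks, and this is not the commit's actual code):

```c
#include <linux/init.h>
#include <linux/rwsem.h>

static DECLARE_RWSEM(outer_sem);	/* stand-in, e.g. i_mmap_rwsem      */
static DECLARE_RWSEM(inner_sem);	/* stand-in, e.g. vma_lock->rw_sema */

/*
 * Take and release the locks once, in the intended order, at init.
 * lockdep records the dependency immediately, so any path that later
 * nests them the other way around is reported right away.
 */
static int __init prime_lock_order(void)
{
	down_write(&outer_sem);
	down_write(&inner_sem);
	up_write(&inner_sem);
	up_write(&outer_sem);
	return 0;
}
early_initcall(prime_lock_order);
```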


