From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from out-187.mta1.migadu.com (out-187.mta1.migadu.com [95.215.58.187]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 344A4314A8F for ; Tue, 11 Nov 2025 03:25:58 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=95.215.58.187 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1762831560; cv=none; b=VdUk45REwdn2AfOOHbn2/sV5veIk2YDgQm3Ncr0Ya0rHSg2fj4lKbvp1JSesZY0XbqipfGinRQNRBy4AgK16IsELxRjJ5gyEMUY5UyNz8ToTON4TCO913j2hutTPe2Xnd20GRWfAevIlsahCYpJIUOg1mThaOLiLd7oCMA+euoo= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1762831560; c=relaxed/simple; bh=+3A/dqDBUaiD/fqCHMYkpoRB9/fB6Lb8nuvP9o5hNkM=; h=Message-ID:Date:MIME-Version:Subject:From:To:Cc:References: In-Reply-To:Content-Type; b=Fpc9847ppcf4PJT9KdGpf+jqCKrIyzhQzx+KIXl22bVEuVuVZdk+LTMoFt+2Jl09corvxTBSn9ZNAOfabaiBa1OWZVqH7HrJhwoqP7keFkSZvsiX01Ao7PpuYkDpOaqojz8Uocus+LnH+CaYEZOX4ITFUJ91UXbI8tmh3kUQkdM= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=kRjP4hwt; arc=none smtp.client-ip=95.215.58.187 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="kRjP4hwt" Message-ID: <0432bee6-b384-4cd7-ac1c-e9123c7b393f@linux.dev> DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1762831556; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=MWmeOmbaA7+VgqiMTnj4K9aVuIljbTl3qUWGKZSf2ZI=; b=kRjP4hwtLD/JKJhFT4zQmm9kFq1RTjPBFxj/uY9nDzwWWJ+0CxeWDkGfDw0cgwdDzBgWh9 1qhVlRRJTFNiNdEaMuKaXVneQlZa9ZpQI6kBrhlWqcqUYjjj73FvsAPztH8Gaa1CNjy7sf 7fqgBNkxyZYUM4R44GqIieGGbWGVi0U= Date: Tue, 11 Nov 2025 11:25:42 +0800 Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Subject: Re: [PATCH v2 1/1] mm/hugetlb: fix possible deadlocks in hugetlb VMA unmap paths Content-Language: en-US X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. From: Lance Yang To: mike.kravetz@oracle.com Cc: Harry Yoo , akpm@linux-foundation.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Hillf Danton , muchun.song@linux.dev, osalvador@suse.de, david@kernel.org References: <20251110230745.9105-1-hdanton@sina.com> In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Migadu-Flow: FLOW_OUT Cc: HugeTLB folks On 2025/11/11 11:20, Lance Yang wrote: > +Mike > > On 2025/11/11 07:07, Hillf Danton wrote: >> On Tue, 11 Nov 2025 00:39:29 +0800 Lance Yang wrote: >>> On 2025/11/10 20:17, Harry Yoo wrote: >>>> On Mon, Nov 10, 2025 at 07:15:53PM +0800, Lance Yang wrote: >>>>> From: Lance Yang >>>>> >>>>> The hugetlb VMA unmap path contains several potential deadlocks, as >>>>> reported by syzbot. These deadlocks occur in __hugetlb_zap_begin(), >>>>> move_hugetlb_page_tables(), and the retry path of >>>>> hugetlb_unmap_file_folio() (affecting remove_inode_hugepages() and >>>>> unmap_vmas()), where vma_lock is acquired before i_mmap_lock. This >>>>> lock >>>>> ordering conflicts with other paths like hugetlb_fault(), which >>>>> establish >>>>> the correct dependency as i_mmap_lock -> vma_lock. >>>>> >>>>> Possible unsafe locking scenario: >>>>> >>>>> CPU0                                 CPU1 >>>>> ----                                 ---- >>>>> lock(&vma_lock->rw_sema); >>>>>                                        lock(&i_mmap_lock); >>>>>                                        lock(&vma_lock->rw_sema); >>>>> lock(&i_mmap_lock); >>>>> >>>>> Resolve the circular dependencies reported by syzbot across >>>>> multiple call >>>>> chains by reordering the locks in all conflicting paths to >>>>> consistently >>>>> follow the established i_mmap_lock -> vma_lock order. >>>> >>>> But mm/rmap.c says: >>>>> * hugetlbfs PageHuge() take locks in this order: >>>>> *   hugetlb_fault_mutex (hugetlbfs specific page fault mutex) >>>>> *     vma_lock (hugetlb specific lock for pmd_sharing) >>>>> *       mapping->i_mmap_rwsem (also used for hugetlb pmd sharing) >>>>> *         folio_lock >>>>> */ >>> >>> Thanks! You are right, I was mistaken ... >>> >>>> >>>> I think the commit message should explain why the locking order >>>> described >>>> above is incorrect (or when it became incorrect) and fix the comment? >>> >>> I think the locking order documented in mm/rmap.c (vma_lock -> >>> i_mmap_lock) >>> is indeed the correct one to follow. > > Looking at the commit[1] that introduced the vma_lock, it seems possible > that > the deadlock reported by syzbot[2] is a false positive ... > > From the commit message: > > ``` > The vma_lock is used as follows: > - During fault processing. The lock is acquired in read mode before >   doing a page table lock and allocation (huge_pte_alloc).  The lock is >   held until code is finished with the page table entry (ptep). > - The lock must be held in write mode whenever huge_pmd_unshare is >   called. > > Lock ordering issues come into play when unmapping a page from all > vmas mapping the page.  The i_mmap_rwsem must be held to search for the > vmas, and the vma lock must be held before calling unmap which will > call huge_pmd_unshare.  This is done today in: > - try_to_migrate_one and try_to_unmap_ for page migration and memory >   error handling.  In these routines we 'try' to obtain the vma lock and >   fail to unmap if unsuccessful.  Calling routines already deal with the >   failure of unmapping. > - hugetlb_vmdelete_list for truncation and hole punch.  This routine >   also tries to acquire the vma lock.  If it fails, it skips the >   unmapping.  However, we can not have file truncation or hole punch >   fail because of contention.  After hugetlb_vmdelete_list, truncation >   and hole punch call remove_inode_hugepages.  remove_inode_hugepages >   checks for mapped pages and call hugetlb_unmap_file_page to unmap them. >   hugetlb_unmap_file_page is designed to drop locks and reacquire in the >   correct order to guarantee unmap success.``` > > The locking logic is a bit tricky; some paths can't follow a strict lock > order > and must use trylock or a drop/retry pattern to avoid deadlocking :) > > Hoping Mike can take a look and confirm! > > [1] https://lore.kernel.org/all/20220914221810.95771-9- > mike.kravetz@oracle.com/ > [2] https://lore.kernel.org/linux- > mm/69113a97.a70a0220.22f260.00ca.GAE@google.com/ > > Thanks, > Lance > >>> >>> This fix has it backwards then. I'll rework it to fix the actual >>> violations. >>> >> Break a leg, better after taking a look at ffa1e7ada456 ("block: Make >> request_queue lockdep splats show up earlier") >